Developer Onboarding

If you have been developing modern software for very long, it’s likely that you’ve worked at a company with some kind of backup and restore policy for databases (or any other type of datastore). An oft-uttered pearl of wisdom about backups is something along the lines of “You don’t really have a backup if you don’t test restoring from it”. You can extrapolate from this statement and apply it to your overall Disaster Recovery plan. For example, it’s great that you have automation that can construct a new environment in a different AWS Region, but if you never run it and test cutting over to the result, how confident are you really that it works?

I tend to do a lot of work spelunking through old code bases, kind of like a digital Indiana Jones, looking for technical debt that is standing in the way of modernizing a system or preventing us from operating at scale. This typically involves getting the system stood up and working locally before I can really begin my quest to find Tutankhamun’s hidden lair of single-threaded background processes. While on such an adventure recently, it occurred to me that Developer Onboarding should be treated exactly like Backup and Restore and Disaster Recovery plans:

  • Documented thoroughly
  • Updated frequently
  • Tested regularly
  • Made as turnkey as possible

First, what’s the scope of “Developer Onboarding”?

For the purposes of this post, the scope of “Developer Onboarding” is the entire process. This includes the work needed to set up accounts before a new hire’s first day, the flow to issue equipment, and then all the steps they’d need to “get productive” as quickly as is reasonable.

Some of these things, such as corporate accounts like e-mail and the issuing of equipment, are likely not in the direct control of the engineering org. However, we should always aspire to control the chaos we can, so just dial down anything in this post that’s outside of your control. Ideally you’d work with your partners in corporate IT, HR, and so on to ensure their parts of the equation balance with the intent of this post: smooth and consistent onboarding.

Documented Thoroughly

If it isn’t documented in the location your organization has chosen as the documentation hub, then it doesn’t exist. Chances are your organization keeps documentation in several relevant places: a wiki system like Confluence or Notion, README files in code repositories, or even a set of folders in something like Google Drive or Office365. You want to be consistent so that all top-level onboarding documents are in the same place, but it’s fine if a top-level document links out to other locations.

Regardless of where your top-level documents are, the outcome you are after is to be able to hand everyone in the process a single document they can follow through to complete the onboarding flow. It doesn’t have to be the same document for everyone, but each person should get just one. For the hiring manager, the document is probably going to include a lot of information about what requests to make, and to whom, for things like account setup and equipment. One of those steps should be handing off the relevant top-level document to the developer on their first day.

The new engineer’s onboarding document is probably going to be a mix of inline steps like “Log in to Slack and post a ‘hello’ in your team channel” and steps that link to a README in a git repository with additional instructions for getting their local development environment set up. Again, it doesn’t matter if the paths branch, but they should always be able to circle back to the top-level document to find out what to do next.

Updated Frequently

Once you have a documented and tested onboarding process, the world is going to change on you. Your company is going to move from Slack to Microsoft Teams, your software CI/CD is going to be in CircleCI instead of an in-house Jenkins server, and so on. So you will constantly be bringing your onboarding documentation up to date.

Hopefully, some of these changes are good changes and result in some of the steps being removed. In any case, the outcome you are after is to treat the onboarding documentation like a living document and make sure to feed and water it as necessary. It’s part of your team and how you do business so you can even talk about it at retros. Hopefully, you’re even talking to newly onboarded engineers to find out what worked and what didn’t.

Tested Regularly

Many blessings be upon you, and may your company always have a healthy growth curve so that you are always testing your onboarding process on new developers! The sad reality though is that won’t always be the case and there will be lags in time between new developers joining the team. So then, how do you test the onboarding flow?

On an existing developer of course! Just “offboard” them as far up the chain as is practical (e.g. you probably aren’t removing them from e-mail, Slack, SSO, etc.) and then onboard them again. Maybe the furthest you go is wiping their local development setup and having them run back through that part of things. Go as far as you can and do it on a similar cadence to other Disaster Recovery processes in your organization.

This is going to make sure that your onboarding process still works smoothly, keep any existing points of friction top of mind, and maybe uncover something new that you want to use to reduce a few steps.
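As a rough sketch of what a lightweight drill could look like, the function below runs a setup command inside a fresh temporary directory, so that state already on the developer’s machine can’t hide a missing step, and reports how long it took. The function name `onboarding_drill` and the idea of passing the whole setup as a single command string are assumptions made for illustration, not part of any particular tool.

```shell
# Hypothetical helper: run a setup command in a clean temporary
# directory and report the elapsed time and exit status.
onboarding_drill() {
  local workdir start end status
  workdir="$(mktemp -d)" || return 1
  start="$(date +%s)"
  status=0
  # Run in a subshell so the caller's working directory is untouched.
  ( cd "$workdir" && sh -c "$1" ) || status=$?
  end="$(date +%s)"
  rm -rf "$workdir"
  printf 'Drill finished in %ss with exit status %s\n' "$((end - start))" "$status"
  return "$status"
}
```

You would call it with whatever your README tells a new engineer to run, for example a clone of your own repository followed by its build command. Tracking the reported time from drill to drill gives you a simple signal for whether onboarding is getting easier or harder.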

When your Product Manager (rightfully) asks why the team is going to expend precious capacity on something like this instead of on Feature X, you can explain that it provides the following benefits:

  • What happens if an existing engineer’s laptop explodes or becomes corrupted or they get new equipment? Don’t you want them up and running again quickly?
  • If the software is easy to set up and host locally, it’s likely written such that it’s easy to set up and host on “real” infrastructure too.
  • When the company hits its next growth cycle (when Feature X ships, obviously) you want to be prepared to get new engineers onboarded to build Feature Y.

In short, lost developer productivity is a business continuity risk and it needs to be mitigated the same way that other risks are: preparation and redundancy.

Made as Turnkey as Possible

Efficiency is all about getting the biggest result from the fewest steps possible. This means that if your Developer Onboarding document has 37 steps in it, you should try to cut that down to 30. Then 25, and so on, over as many iterations as it takes.

For the hiring manager, this could mean applying the same types of scripting that operations staff use to manage infrastructure to the back office. Recently I used Terraform and the GitHub Provider to automate creation of Teams and Team Membership in our GitHub organization. Adding a newly onboarded engineer is now much simpler. There may be other places you can do the same, especially if you are using modern SaaS software.

Other opportunities are consolidating systems so you can have a single vendor to onboard users to. The Atlassian suite of tools isn’t everyone’s favorite, but you do have the ability to cover many of your bases with a single onboarding flow.

In terms of the developer’s part, which is predominantly installing tools on their laptop and setting up a local development environment, the same questions apply. How much of that can be scripted? Can an image be created and kept up to date that jumpstarts the process by installing commonly used tools? If not an image, what about using something like Ansible to script out installations?
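Even before reaching for an image or Ansible, a small preflight script can replace a page of “make sure you have X installed” instructions. The sketch below is one minimal way to do that, using the standard `command -v` lookup. The function names `missing_tools` and `check_tools`, and the tool list in the usage note, are placeholders for illustration.

```shell
# Hypothetical preflight check for a developer laptop.

# Print the name of each required tool that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Return 0 when every tool is present; otherwise list what is missing
# on stderr and return 1.
check_tools() {
  local missing
  missing="$(missing_tools "$@")"
  if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
  printf 'All required tools are installed.\n'
}
```

A README could then open with a single line such as `check_tools git make docker`, with the list adjusted to whatever your stack actually needs.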

Technologies like Docker, Docker Compose, and minikube, among others, make it possible to automate the setup of even complicated stacks. Ideally you would get your system back to the glory days of the Open Source movement, where you would clone a code repository and do the following:

make
make install
./run.sh
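If your setup can’t be collapsed quite that far yet, one intermediate option is a thin wrapper that runs the remaining steps in order and stops at the first failure, so a new engineer sees exactly which step broke. This is only a sketch; the name `run_steps` and the convention of passing each step as a quoted command string are assumptions for illustration.

```shell
# Hypothetical wrapper: run each command in order and stop at the
# first one that fails, naming the failing step on stderr.
run_steps() {
  local step
  for step in "$@"; do
    printf '==> %s\n' "$step"
    if ! sh -c "$step"; then
      printf 'Step failed: %s\n' "$step" >&2
      return 1
    fi
  done
  printf 'Setup complete.\n'
}
```

With the commands above, the call would be `run_steps "make" "make install" "./run.sh"`.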

When you reduce the number of steps involved you aren’t just shortening the time it takes for a developer to get set up, you are also reducing the number of opportunities for human error, such as missing a step or executing a command incorrectly. If a mistake like that happens (and it will) in a manual setup process, who is the new person going to reach out to? Most likely one of your other engineers, who will either have to put down their current task to help out or make the choice to let your new engineer sit stuck in limbo until they can get around to helping.

You also gain consistency between developers so that it’s easier for them to pull down another engineer’s branch of code and have it running for some deeper code reviews or pairing sessions.

Lastly, and as stated earlier, if software is written in such a way that it is easy to configure and run locally via automation, the same is likely to carry through all the way out to production. In fact, for a legacy system, it’s often the case that the hosting of the software improves locally before those improvements become realized in other environments in the path to production.

Summary

Having a healthy Developer Onboarding flow should carry the same level of importance as any other type of business continuity solution, such as having a working Disaster Recovery plan. It should be thoroughly documented, updated frequently, tested regularly, and made as turnkey as possible. No system will get there overnight, but every step you take towards the goal is going to pay off, so it’s worth investing in over time. Maybe even ask the question occasionally in your team retrospectives, “What can we do to remove a few steps from the onboarding document?”