The power of checklists

Checklists and runbooks can make your life easier.

Dan Moore
Nov 8th, 2019

Why create a checklist?

I love checklists. They make sure that I follow complicated procedures correctly. They can be shared. I can write a checklist on a wiki page and go back in three days or three months and follow the exact same steps (time travel!).

The Checklist Manifesto is one of my favorite books. In this slim volume, the author, Atul Gawande, covers how effective checklists can be in a wide variety of circumstances. Whether the task is surgery or flying a plane, checklists help decrease mistakes and increase the overall success rate of most processes. It is really quite inspiring and applicable to many spheres of life. What works in life-and-death matters can also help in software development and operations. Checklists abound in software development companies, where they are often called playbooks or runbooks.

However, there’s an even more powerful option for many software systems – you can automate your checklist. If you have a checklist for code formatting and syntax, you can either manually run through the checklist every time someone commits code, or you can write a script to do so. (Or, more likely, find an existing tool and apply a set of rules that your company has standardized on.)
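As a rough illustration of turning a formatting checklist into a script, here is a minimal Python sketch. The three rules and all of the names (ChecklistItem, run_checklist, and the check functions) are made up for this example; in practice you would more likely configure an existing linter or formatter with your team’s rules.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ChecklistItem:
    """One line of the checklist: a description plus a function that verifies it."""
    description: str
    check: Callable[[str], bool]


def no_trailing_whitespace(source: str) -> bool:
    # Every line must be unchanged after stripping whitespace on the right.
    return all(line == line.rstrip() for line in source.splitlines())


def no_tabs(source: str) -> bool:
    return "\t" not in source


def ends_with_newline(source: str) -> bool:
    # An empty file is allowed; anything else must end with a newline.
    return source == "" or source.endswith("\n")


# Hypothetical rules, standing in for whatever your team has standardized on.
CHECKLIST = [
    ChecklistItem("no trailing whitespace", no_trailing_whitespace),
    ChecklistItem("no tab characters", no_tabs),
    ChecklistItem("file ends with a newline", ends_with_newline),
]


def run_checklist(source: str, checklist: List[ChecklistItem] = CHECKLIST) -> List[Tuple[str, bool]]:
    """Run every item against the source text and return (description, passed) pairs."""
    return [(item.description, item.check(source)) for item in checklist]


# Small demonstration on an in-memory snippet with deliberate problems.
sample = "x = 1 \n\ty = 2"
for description, passed in run_checklist(sample):
    print(f"[{'x' if passed else ' '}] {description}")
```

The point of the design is that each checklist line maps to one small function, so adding a rule means adding one entry to the list rather than remembering one more manual step.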

The same is true for deployments. I remember when I shifted from manual deployments following a wiki page to a more automated deployment using the wiki page and shell scripts. The more complicated scenarios still required human involvement, but simpler things like applying database changes could be scripted. This, along with a healthy dose of cutting and pasting, made such deployments less risky.
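One way to picture that halfway point between a wiki page and full automation is a runbook script in which some steps run code and others simply wait for a human. The sketch below assumes that shape; the Step class, the run_runbook function, and the example steps are hypothetical, and the database step is a placeholder rather than a real migration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """A runbook step. If action is None, a human performs the step by hand."""
    description: str
    action: Optional[Callable[[], None]] = None


def run_runbook(
    steps: List[Step],
    confirm: Callable[[str], str] = input,
    log: Callable[[str], None] = print,
) -> List[str]:
    """Walk through the steps in order and return the descriptions completed."""
    completed = []
    for number, step in enumerate(steps, start=1):
        if step.action is None:
            # Manual step: show the instruction and wait for the operator.
            confirm(f"Step {number} (manual): {step.description}. Press Enter when done. ")
        else:
            # Automated step: announce it, then run the scripted action.
            log(f"Step {number} (automated): {step.description}")
            step.action()
        completed.append(step.description)
    return completed


def apply_database_changes() -> None:
    # Placeholder: a real script would run your migration tool here.
    print("applying database changes (placeholder)")


# Hypothetical deployment runbook mixing manual and scripted steps.
DEPLOY_STEPS = [
    Step("Announce the deployment to the team"),
    Step("Apply database changes", apply_database_changes),
    Step("Verify the application in a browser"),
]
```

Calling run_runbook(DEPLOY_STEPS) prompts at each manual step. As a manual step becomes well understood, you can give it an action and it becomes automated without changing the rest of the runbook.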

Is it worth it?

Sometimes it doesn’t make sense to automate a checklist for a process. This is the case when the process:

  • is new and still evolving
  • is continuously changing based on factors outside of your control
  • happens only rarely and in low-risk situations
  • has a lot of human interaction

You can definitely optimize your processes prematurely, a trap that a well-known XKCD comic illustrates.

But the benefits of conceptualizing and sharing checklists are often well worth the effort.

What makes sense?

What are some processes of software development and operations for which you can use checklists? Here are some examples:

  • code reviews
  • new environment setup
  • deploying to production or other environments
  • troubleshooting issues
  • onboarding a new customer
  • setting up a new AWS environment
  • migrating data or infrastructure

Here’s my spectrum of checklists. There’s a tension that is worth acknowledging: the further down this list a checklist sits, the more useful and scalable it is (often to other people), but the more effort it takes to create and change.

  • personal notes in a notebook
  • documentation in a personal wiki, Confluence, Google Doc, etc.
  • public how-tos
  • personal aliases
  • cutting and pasting from documentation
  • scripts in your ~/bin directory
  • scripts in a public repository
  • infrastructure as code (e.g. Terraform, CloudFormation)
  • configuration management as code (e.g. Puppet, Ansible)
  • CI/CD configuration and scripts

All of these checklists do have one problem, though: running through a checklist or playbook is more or less a private process. From private notebooks to personal scripts, knowledge is locked up. Sometimes that is because the knowledge is ephemeral or you are unsure of its value. Sometimes it is because sharing is hard (it’s tough to share scripts with non-technical users).

When you run terraform apply, no one necessarily knows you have done it, though they can see what changed if you version control your Terraform files. (Please version control your Terraform files.) Even the CI/CD pipeline, which is the most public option, is often not accessed by all team members. There’s a reason most CI/CD pipelines push results into Slack.

But even with that issue, having a checklist will lead to more reliable, repeatable processes. Which of your processes will you go write a checklist for?