Update Docker container without downtime

Let's say I have a Docker container with a web server (like Apache 2). Now I want to update the OS under it. This SF answer says the best way is to rebuild the base image and my Apache image. But deploying the new image means downtime: only one container can bind to ports 80/443 at a time, so I have to delete the old container before I can create the new one.

But how can I deploy this update with zero downtime? Should I use a load balancer and use inter-container communication? And how do I update the load balancer?

The ideal target scenario

Yes, you should use a load balancer and update one instance at a time. I'm not sure where inter-container communication comes in.

As an example, imagine you have a load balancer that serves your site, A. Users connect only to the load balancer and know the site only as "A". The load balancer knows that there are two or more backends (B, C, etc.); whether they're VMs or containers doesn't matter.

Then, you want to upgrade the backends, which in this case are Apache instances.

  1. take B out of the load balancer's pool of eligible backends so it no longer receives new traffic
  2. wait for in-flight requests to finish and existing connections to close
  3. update the container or underlying VM that serves B
  4. restart B and wait for it to load and start working
  5. test B to make sure it's serving new requests properly
  6. add B back to the load balancer's backend pool to re-enable traffic

Then, do the same process for C, D, etc.
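The six steps can be sketched as a loop. This is only a minimal illustration, not a real load balancer API: `BackendPool` is a hypothetical in-memory stand-in, and `drain`, `update`, and `healthy` are callables you would have to supply for your own load balancer and container setup.

```python
from typing import Callable, List


class BackendPool:
    """Hypothetical in-memory stand-in for a load balancer's backend pool."""

    def __init__(self, backends: List[str]) -> None:
        self.active = list(backends)

    def remove(self, backend: str) -> None:
        self.active.remove(backend)

    def add(self, backend: str) -> None:
        self.active.append(backend)


def rolling_update(
    pool: BackendPool,
    drain: Callable[[str], None],
    update: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> List[str]:
    """Update every backend one at a time; return the names that were updated."""
    updated = []
    for backend in list(pool.active):  # iterate over a snapshot of the pool
        if len(pool.active) < 2:
            raise RuntimeError("need at least two active backends to avoid downtime")
        pool.remove(backend)      # step 1: stop sending new traffic
        drain(backend)            # step 2: wait for in-flight requests
        update(backend)           # steps 3-4: update and restart
        if not healthy(backend):  # step 5: test before re-enabling
            raise RuntimeError(f"{backend} failed its health check; left out of the pool")
        pool.add(backend)         # step 6: re-enable traffic
        updated.append(backend)
    return updated
```

A backend that fails its check stays out of the pool, so the remaining backends keep serving while you investigate.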

Note that there's an open request for in-place upgrades of Docker containers, from Nov 2013, but it doesn't appear to have made much progress, so the above solution is what you should do in the meantime.

What to do for an existing live site

Presumably, you're asking this because you're already running a live site in this model and you would like to upgrade it without downtime. So, we need to get to the ideal target state above, but incrementally.

Let's assume that:

  • you have a DNS name pointing to your container
  • your container runs on some IP address
  • your users don't know the container's IP address and it's not hard-coded anywhere

If any of these assumptions is false, fix that first.

Then, follow these steps:

  1. create a load balancer at a new IP and point it at the existing container as its only backend
  2. change DNS to point to the load balancer rather than the container IP directly
  3. add an identical Apache backend with the same VM + container setup
  4. now you have a load balancer with two backends, B and C, so follow the directions in the "ideal target scenario" section to upgrade them one at a time
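After the DNS change in step 2, it can be worth confirming that the name really resolves to the load balancer before relying on it. The following is a rough sketch using Python's standard `socket.gethostbyname_ex`; the function name `resolves_only_to` and the addresses in the usage note are made up for illustration. It only reports what the local resolver sees, so other resolvers may still return the old container IP until the record's TTL expires.

```python
import socket
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyname_ex: (hostname, aliases, IPv4 addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def resolves_only_to(
    hostname: str,
    expected_ips: List[str],
    resolve: Resolver = socket.gethostbyname_ex,
) -> bool:
    """True when every IPv4 address returned for hostname is in expected_ips."""
    _name, _aliases, addresses = resolve(hostname)
    return bool(addresses) and set(addresses) <= set(expected_ips)
```

For example, `resolves_only_to("www.example.com", ["203.0.113.10"])` would return True only once the local resolver hands out nothing but the (hypothetical) load balancer address 203.0.113.10.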

How to update a load balancer

The easy (hosted) way

The easiest option is not to run your own load balancer. For example, if you're using a cloud platform that provides load balancing as a service, consider using it; maintaining and updating the load balancer is then no longer your problem.

The manual way

If you are running your own load balancer, adding another layer of indirection (i.e., DNS) will help. Let's assume the following:

  • we have a host name that resolves to the IP of load balancer A, which we would like to update
  • our load balancer has a backend pool of P1, P2, etc.

We proceed as follows:

  • create a new load balancer B with the new software version
  • add all backend pool instances P1, P2, etc. to our new load balancer B as backends
  • add B's IP address to the DNS resolution along with A

    • now we're effectively using DNS as a load balancer
    • if the entries for A and B are unweighted, they're effectively 50-50
    • now watch to see how B performs, whether there are any errors, etc.
    • if anything is wrong with B, undo as follows:

      1. remove B from the DNS config
      2. wait for the B entry in the DNS to disappear (i.e., wait for TTL to expire)
      3. turn down B
  • assume you've done the "burn-in" test for B and everything is fine
  • update the priority and weight for B in DNS gradually
  • remove A from DNS entirely
  • wait for DNS TTL to expire; A should not be getting any requests anymore
  • turn down A

and you're done.
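To make the gradual weight change concrete, here is a small sketch of how the expected split between A and B moves as the weights change. It assumes your DNS provider supports weighted records; the function names and the weight total of 100 are illustrative, not any provider's API. Real traffic will lag behind these numbers because resolvers cache answers until the TTL expires.

```python
from typing import Dict, List


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of fresh lookups that resolve to each record."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one record needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


def shift_schedule(steps: int, total: int = 100) -> List[Dict[str, int]]:
    """Weights for A and B, moving from an even split to B only in equal steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    half = total // 2
    schedule = []
    for i in range(steps + 1):
        b = half + ((total - half) * i) // steps  # B's weight grows each step
        schedule.append({"A": total - b, "B": b})
    return schedule
```

With `shift_schedule(5)`, B's weight goes 50, 60, 70, 80, 90, 100. You would apply one entry, watch B for errors, and only then move to the next; the final entry corresponds to removing A from DNS.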

Details, diagrams, and tooling

See these write-ups and tools that can help you automate the process, but the general idea is the same:

The Moral

"All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections."David Wheeler