Some commands, like up -d service_name or start service_name, return right away. This is useful if you don't want the running containers to depend on the state of the shell, as they do with a regular up service_name. One use case is running them from a continuous integration/delivery server.
But this way of running/starting services does not provide any feedback about the actual state of the service afterwards.
I had a similar need. However, I have restart: always set in my environment, so it can be tricky to detect when something is crashing and restarting in a loop.
I made an Icinga/Nagios check that also compares the created and start times. Maybe it's useful to someone else down the line:
#!/usr/bin/env python
from __future__ import print_function
import argparse
from datetime import timedelta
from datetime import datetime
import sys
from dateutil.parser import parse as parse_date
import docker
import pytz

parser = argparse.ArgumentParser()
parser.add_argument("compose_project",
                    help="The name of the docker-compose project")
parser.add_argument("compose_service",
                    help="The name of the docker-compose service")
args = vars(parser.parse_args())

client = docker.from_env()
# Note: containers.list() returns only running containers unless all=True is passed.
service_containers = client.containers.list(filters={
    "label": [
        "com.docker.compose.oneoff=False",
        "com.docker.compose.project={}".format(args["compose_project"]),
        "com.docker.compose.service={}".format(args["compose_service"])
    ]})

# Expect exactly one container for the service.
if len(service_containers) == 0:
    print("CRITICAL: project({})/service({}) doesn't exist!".format(
        args["compose_project"], args["compose_service"]))
    sys.exit(2)
elif len(service_containers) > 1:
    print("CRITICAL: project({})/service({}) has more than 1 "
          "container!".format(
              args["compose_project"], args["compose_service"]))
    sys.exit(2)

service_container = service_containers[0]
created_at = parse_date(service_container.attrs['Created'])
status = service_container.attrs['State']['Status']
started_at = parse_date(service_container.attrs['State']['StartedAt'])
now = datetime.utcnow().replace(tzinfo=pytz.utc)
uptime = now - started_at

if status in ['stopped', 'exited', 'dead']:
    print("CRITICAL: project({})/service({}) is status={}".format(
        args["compose_project"], args["compose_service"], status))
    sys.exit(2)

# Started long after creation but up for only a few seconds:
# the container has been restarted and is probably crash-looping.
if (started_at - created_at) > timedelta(minutes=5):
    if uptime < timedelta(seconds=5):
        print("CRITICAL: project({})/service({}) appears to be "
              "crash-looping".format(
                  args["compose_project"], args["compose_service"]))
        sys.exit(2)

if status == "restarting":
    print("WARNING: project({})/service({}) is restarting".format(
        args["compose_project"], args["compose_service"]))
    sys.exit(1)

print("OK: project({})/service({}) is up for {}".format(
    args["compose_project"], args["compose_service"], uptime
))
sys.exit(0)
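
The decision rules in this check can also be pulled out into a small function, so they can be exercised without a Docker daemon. This is only a sketch: the name classify and the (exit_code, label) return shape are made up for illustration, and the thresholds are simply the ones used in the script.

```python
from datetime import datetime, timedelta, timezone


def classify(status, created_at, started_at, now):
    """Map container state to a Nagios-style (exit_code, label) pair.

    Hypothetical helper that mirrors the rules of the check script.
    """
    uptime = now - started_at
    # A container that is not running at all is always critical.
    if status in ("stopped", "exited", "dead"):
        return 2, "CRITICAL"
    # Started long after creation but up for only a few seconds:
    # it has been restarted and may be crash-looping.
    if (started_at - created_at > timedelta(minutes=5)
            and uptime < timedelta(seconds=5)):
        return 2, "CRITICAL"
    if status == "restarting":
        return 1, "WARNING"
    return 0, "OK"


# Demo: restarted an hour after creation, up for only 2 seconds.
created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
restarted = created + timedelta(hours=1)
print(classify("running", created, restarted, restarted + timedelta(seconds=2)))
```

Passing now in as an argument, instead of reading the clock inside the function, is what makes the rules easy to test.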
This returns only the status of the Docker container. If you want to check the actual state of your application, add a HEALTHCHECK to your Dockerfile (https://docs.docker.com/engine/reference/builder/#healthcheck). Afterwards you can inspect the result with docker inspect, which reports it under State.Health.
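
As a rough sketch of reading that health state from Python: when an image defines a HEALTHCHECK, the inspect data carries State.Health.Status (starting, healthy or unhealthy), and containers without a health check have no Health entry. The helper names health_status and container_health, and the "none" fallback value, are assumptions made for this example.

```python
def health_status(attrs):
    """Return the health status from a `docker inspect` style dict.

    `attrs` is the inspect data, e.g. `container.attrs` in the Docker SDK
    for Python. Returns "none" (our own placeholder, not a Docker value)
    when the container defines no health check.
    """
    health = attrs.get("State", {}).get("Health")
    if not health:
        return "none"
    return health.get("Status", "none")


def container_health(name):
    """Look up a container by name or ID and return its health status."""
    import docker  # imported lazily so health_status() works without the SDK

    client = docker.from_env()
    return health_status(client.containers.get(name).attrs)


# Example with hand-written inspect data (no Docker daemon needed).
sample = {"State": {"Status": "running", "Health": {"Status": "unhealthy"}}}
print(health_status(sample))
```

In a monitoring check, the result of container_health could be mapped to exit codes in the same way as the container status.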
Assuming that containers either start and run indefinitely or stop immediately with an error code (e.g. because of missing configuration), and that you do the check only once, after docker-compose up -d returns, you can check whether any container has stopped due to an error with:
docker ps -a | grep 'Exited (255)'
This check works correctly even for containers that are expected to stop immediately without an error (e.g. data containers), because their status (from docker ps -a) is shown as Exited (0).
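
The same idea can be widened to any non-zero exit code rather than only 255. The sketch below parses lines of the form "name status", as printed by docker ps -a --format '{{.Names}} {{.Status}}'. The function name failed_containers and the sample data are invented for illustration.

```python
import re

# Matches e.g. "web_1 Exited (1) 5 minutes ago"; captures name and exit code.
EXITED = re.compile(r"^(\S+)\s+Exited \((\d+)\)")


def failed_containers(ps_output):
    """Return (name, exit_code) for every container that exited non-zero."""
    failed = []
    for line in ps_output.splitlines():
        match = EXITED.match(line.strip())
        if match and int(match.group(2)) != 0:
            failed.append((match.group(1), int(match.group(2))))
    return failed


# Hand-written sample in the shape of
#   docker ps -a --format '{{.Names}} {{.Status}}'
sample = """\
app_web_1 Up 3 minutes
app_data_1 Exited (0) 3 minutes ago
app_worker_1 Exited (255) 2 minutes ago
"""
print(failed_containers(sample))
```

Real output could be fed in via subprocess.check_output(["docker", "ps", "-a", "--format", "{{.Names}} {{.Status}}"], text=True). A CI job would then fail whenever the returned list is not empty.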
For example, in our docker-compose.yml, we start our containers with:
You can grep for (healthy) and/or (unhealthy) containers and act accordingly.
In this example, I poll docker-compose every 5 seconds for a running service with (healthy) status.
If the script finds such a service, it stops waiting.
If the script runs for more than 300 seconds, it exits with an error code.
#!/bin/bash
# Poll every 5 seconds until <service> reports (healthy); give up after LIMIT seconds.
SECONDS=0   # bash built-in: counts the seconds elapsed since this assignment
LIMIT=300
x=$(docker-compose -f /mnt/<service>/docker-compose.yaml ps <service> | grep -c '(healthy)')
while [[ $x == "0" ]]; do
    echo "Please wait until <service> becomes healthy"
    sleep 5
    x=$(docker-compose -f /mnt/<service>/docker-compose.yaml ps <service> | grep -c '(healthy)')
    EXPIRED=$SECONDS
    if [[ $x == "1" ]]; then
        echo "<service> is healthy..."
        break
    elif [[ $LIMIT -lt $EXPIRED ]]; then
        echo "<service> startup has exceeded 5m timeout, exiting!"
        exit 1
    fi
done
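
The polling pattern itself (retry a check at a fixed interval, give up after a deadline) does not depend on Docker. A minimal Python sketch of it follows. wait_until and its parameters are hypothetical names, and the check passed in stands in for something like counting (healthy) in the docker-compose ps output.

```python
import time


def wait_until(check, timeout=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success and False once `timeout` seconds have passed.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo with a fake check that succeeds on the third attempt.
attempts = []


def fake_check():
    attempts.append(1)
    return len(attempts) >= 3


print(wait_until(fake_check, timeout=1.0, interval=0.01))
```

Unlike the shell loop, this version runs the check once before the first sleep, so a service that is already healthy returns immediately.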