After reading the docs, I am somewhat confused about how best to manage production application/service data.
There seem to be 3 options:
- Simply map a volume to a host directory (i.e. the -v argument for …)
- Create a Docker container image for data (i.e. a separate container and …)
- Create a Docker volume (i.e. docker volume create)
Now, it seems that the accepted practice is option #2, but then I wonder what the purpose of #3 is.
In particular, how do you correctly handle the following scenarios with docker volume, and is it better to use a data volume container or a named volume in each situation?
- You need application data in a separate volume and/or storage tier in your server
- Backing up
- Restoring data
As of Docker 1.9, Named Volumes created with the Volumes API (docker volume create --name mydata) are preferred over a Data Volume Container. As of February 2016, the Docker volumes documentation is woefully out of date. Folks at Docker themselves suggest that Data Volume Containers “are no longer considered a recommended pattern,” “named volumes should be able to replace data-only volumes in most (if not all) cases,” and “no reason I can see to use data-only containers.”
I think #2 and #3 are pretty much the same thing; the main difference is that there is no stopped container with #3 (it is literally just a named volume). For example, you can create a named volume and do much the same as you would with #2.
Create a named volume:
$ docker volume create --name test
Mount and write some data to that volume from a container:
$ docker run -v test:/opt/test alpine touch /opt/test/hello
You can then mount that same test volume in another container and read the data:
$ docker run -v test:/opt/test alpine ls -al /opt/test
drwxr-xr-x 2 root root 4096 Jan 23 22:28 .
drwxr-xr-x 3 root root 4096 Jan 23 22:29 ..
-rw-r--r-- 1 root root 0 Jan 23 22:28 hello
The advantage here is that the volume won't accidentally disappear if you remove the data-only container. You now manage it with the docker volume sub-command.
$ docker volume ls
DRIVER VOLUME NAME
It also opens up the possibility of volume drivers down the road, so you might be able to share volumes between hosts (i.e. named volumes over NFS). Flocker and Convoy are examples of this. As for moving or backing up data specifically, Convoy has dedicated sub-commands for backups and can store data on NFS or EBS, external to your host.
For this reason, I think the more new-school way (Docker 1.9+) is to use a named volume rather than a data-only container.
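Regarding the backup and restore scenarios from the question: independent of any volume driver's own tooling, a named volume can be archived by mounting it into a throwaway container alongside a host directory and running tar. Below is a minimal sketch assuming the alpine image (its BusyBox tar does the work) and archives kept in the current directory; backup_volume and restore_volume are hypothetical helper names, not Docker commands.

```shell
#!/bin/sh
# Hypothetical helpers (NOT part of the Docker CLI) for backing up and
# restoring a named volume through a throwaway alpine container.
# ARCHIVE is a file name relative to the current directory.

backup_volume() {
  vol="${1:-}"
  if [ -z "$vol" ]; then
    echo "usage: backup_volume VOLUME [ARCHIVE]" >&2
    return 2
  fi
  archive="${2:-$vol.tar.gz}"
  # Mount the volume read-only at /data and the current directory at /backup,
  # then write a gzipped tarball of the volume's contents.
  docker run --rm \
    -v "$vol":/data:ro \
    -v "$(pwd)":/backup \
    alpine tar -czf "/backup/$archive" -C /data .
}

restore_volume() {
  vol="${1:-}"
  if [ -z "$vol" ]; then
    echo "usage: restore_volume VOLUME [ARCHIVE]" >&2
    return 2
  fi
  archive="${2:-$vol.tar.gz}"
  # If VOLUME does not exist yet, docker run creates it as a named volume.
  # tar adds and overwrites files but never deletes files already present.
  docker run --rm \
    -v "$vol":/data \
    -v "$(pwd)":/backup:ro \
    alpine tar -xzf "/backup/$archive" -C /data
}
```

With the test volume from above, usage might look like this (the restore can run on the same host or another one):
$ backup_volume test
$ restore_volume test test.tar.gz
Restoring into a fresh, empty volume is safest, since extraction only adds and overwrites files. If you need a consistent snapshot, stop the containers that write to the volume first; the :ro mount only stops the backup container itself from modifying the data.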
@MichaelHampton Why? The data may not be dockerized, but the host OS is still managed by an infrastructure team that monitors it and backs it up.
@MichaelHampton I realized I should rephrase my question
@dukeofgaming Not to mention that you can run btrfs scrub on it to find and correct damaged files. I am not sure how dockerized stuff works, but I guess it does not protect against data rot, so I would always need a full restore if something bad happens instead of just restoring individual files. Another thought: it adds another layer of abstraction, which slows down file reading and writing even more. I somehow don’t see the advantages of #2 and #3, but I am not experienced with Docker, so this might change.
#1 is not a serious option for production; it should basically never be done if an alternative exists.