How to decide between a docker volume container and a docker volume?

Skylar · April 27, 2022, 12:24pm

After reading the docs I found myself somewhat confused as to how best to manage production application/service data.

There seem to be 3 options:

1. Simply map a volume to a host directory (i.e. the -v argument for docker run)
2. Create a separate data-only container (i.e. a data volume container mounted with --volumes-from)
3. Create a docker volume (i.e. docker volume create)

Now, it seems that the accepted practice is option #2, but then I wonder what is the purpose of #3.

In particular, how do you correctly handle the following scenarios with docker volume, and is a data volume container or a docker volume the better choice for each?

Keeping application data in a separate volume and/or storage tier in your server
Backing up data
Restoring data

Eli_D · April 27, 2022, 12:28pm

As of Docker 1.9, creating Named Volumes with the Volumes API (docker volume create --name mydata) is preferred over using a Data Volume Container. As of February 2016, the Docker volumes documentation is woefully out-of-date. Folks at Docker themselves say that Data Volume Containers “are no longer considered a recommended pattern,” that “named volumes should be able to replace data-only volumes in most (if not all) cases,” and that there is “no reason I can see to use data-only containers.”

Alex_Pank · April 27, 2022, 12:33pm

I think #2 and #3 are pretty much the same thing; the main difference is that with #3 there is no stopped container (it is literally just a named volume). For example, you can create a named volume and, using -v, do much the same as you would with #2.

Create a named volume:

$ docker volume create --name test

Mount and write some data to that volume from a container:

$ docker run -v test:/opt/test alpine touch /opt/test/hello

You can then mount that same test volume in another container and read the data:

$ docker run -v test:/opt/test alpine ls -al /opt/test
total 8
drwxr-xr-x    2 root     root          4096 Jan 23 22:28 .
drwxr-xr-x    3 root     root          4096 Jan 23 22:29 ..
-rw-r--r--    1 root     root             0 Jan 23 22:28 hello

The advantage here is that there is no data-only container to remove, so the volume won't accidentally disappear along with it. You now manage it with the docker volume sub-command.

$ docker volume ls
DRIVER              VOLUME NAME
local               test
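
For contrast, here is a rough sketch of the older data-only container pattern (#2) next to the named-volume clean-up. The container name datastore and the function names are made up for illustration, and the DOCKER variable is only a hook so you can swap in echo for a dry run; treat it as a sketch, not a recommended setup.

```shell
#!/bin/sh
# Sketch only: "datastore" and the function names are illustrative assumptions.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Option #2: a never-started container that exists only to own the /opt/test volume.
create_data_container() {
    $DOCKER create -v /opt/test --name datastore alpine /bin/true
}

# Other containers borrow that volume with --volumes-from.
write_via_data_container() {
    $DOCKER run --volumes-from datastore alpine touch /opt/test/hello
}

# Removing the container with -v also deletes its anonymous volume (the data).
remove_data_container() {
    $DOCKER rm -v datastore
}

# Option #3: a named volume is only deleted when you ask for it by name.
remove_named_volume() {
    $DOCKER volume rm "$1"
}
```

With #2, the data lives and dies with the datastore container; with #3, nothing happens to the test volume until you explicitly call something like remove_named_volume test.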

It also opens up the possibility of volume drivers down the road, so you might be able to share volumes between hosts (i.e. named volumes over NFS). Examples of this might be Flocker and Convoy. To your point specifically about moving or backing up data, Convoy has specific sub-commands for backing up data and allows for storage on NFS or EBS external to your host.

For this reason, I think the more new-school way (Docker 1.9+) is to use a named volume rather than a data-only container.
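
On the backup and restore scenarios: without any volume driver, one common approach is to mount the named volume into a throwaway container and tar it to a host directory. The sketch below assumes the stock alpine image (whose tar supports these flags); the function names and the VOLUME.tar.gz naming are my own, and DOCKER is again just a hook for a dry run with echo.

```shell
#!/bin/sh
# Sketch only: function names and the archive naming are illustrative assumptions.
# Set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"

# backup_volume VOLUME HOST_DIR
# Archives the volume's contents to HOST_DIR/VOLUME.tar.gz.
# HOST_DIR must be an absolute path, otherwise Docker treats it as a volume name.
backup_volume() {
    $DOCKER run --rm \
        -v "$1":/data:ro \
        -v "$2":/backup \
        alpine tar czf "/backup/$1.tar.gz" -C /data .
}

# restore_volume VOLUME HOST_DIR
# Unpacks HOST_DIR/VOLUME.tar.gz into the volume (Docker creates it if missing).
restore_volume() {
    $DOCKER run --rm \
        -v "$1":/data \
        -v "$2":/backup:ro \
        alpine tar xzf "/backup/$1.tar.gz" -C /data
}
```

For the test volume from above, that would be backup_volume test "$(pwd)" and later restore_volume test "$(pwd)". Stop anything writing to the volume first if you need a consistent archive.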

Blake_P · April 27, 2022, 12:37pm

@MichaelHampton Why? The data may not be dockerized, but the host OS is still managed by an infrastructure team that monitors it and backs it up.

Gene_H · April 27, 2022, 12:42pm

@MichaelHampton I realized I should rephrase my question

Harley · April 27, 2022, 12:47pm

@dukeofgaming Not to mention that you can run btrfs scrub on it to find and correct damaged files. I am not sure how dockerized stuff works, but I guess it does not protect against data rot, so I would always need a full restore if something bad happens instead of just restoring individual files. Another thought is that it adds another layer of abstraction, so it slows down file reading and writing even more. I somehow don’t see the advantages of #2 and #3, but I am not experienced with docker, so this might change.

Mason_D · April 27, 2022, 12:51pm

Miller_A · April 27, 2022, 12:56pm

#1 is not a serious option for production; it should basically never be done if an alternative exists.