What is the difference between creating a volume or a mount in docker containers?

Docker provides two ways to back up and sync container data on the local machine: volumes and mounts. Both behave the same way, except for a few things I noticed:

  1. A volume always keeps data in /var/lib/docker/volumes, while mount points can be created wherever we want.
  2. If a container which is assigned a mount point is also assigned a volume then all data from the mount point is copied to the volume automatically, while the opposite is not true.
  3. We cannot declare a mount point in a Dockerfile, but we can declare volumes there.

OK, so each approach has some advantages and disadvantages, but is there any further classification, or are there differences in terms of optimization?

Please provide an explained answer.

There are actually three types of volumes:

  • Host Volume: what you refer to as a mount in a container; the more common term is a bind mount.
  • Named Volume: any volume managed by Docker which you give a name.
  • Anonymous Volume: any volume without a source. Docker creates it as a local volume with a long unique ID, and it behaves like a named volume.

Volumes have a source and a target. The source identifies the type of volume: a path (including the leading slash) to a file or directory results in a host volume, while a plain name results in a named volume. If you do not provide a source, you get an anonymous volume. If you define a volume inside a Dockerfile, you cannot specify a source there, so by default Docker will create anonymous volumes unless you direct it otherwise at runtime.
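
The rule above (leading slash, plain name, or no source at all) can be sketched as a small shell function. This is only an illustration of the rule for the `-v source:target` form, not how Docker itself parses the argument; `volume_type` is a made-up name, and the sketch ignores relative paths and Windows paths.

```shell
# Hypothetical helper: classify a `docker run -v` argument by its source.
volume_type() {
  spec=$1
  case "$spec" in
    *:*) src=${spec%%:*} ;;   # text before the first colon is the source
    *)   src="" ;;            # no colon: only a target was given
  esac
  case "$src" in
    "")  echo anonymous ;;    # no source
    /*)  echo host ;;         # path with a leading slash
    *)   echo named ;;        # plain name
  esac
}

volume_type /home/user/data:/data   # prints "host"
volume_type mydata:/data            # prints "named"
volume_type /data                   # prints "anonymous"
```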

For each type, here are the pros/cons:

  • Host:
    • Pro: easy to access the underlying files from the host
    • Con: uid/gid permission issues occur when the container user's uid/gid does not match the owner of the files on the host
    • Con: data is not initialized from the image contents
  • Named:
    • Pro: easy to create and reuse between different containers/images. If you only give it a name with no other settings, the local driver will default to storing your data in /var/lib/docker/volumes, which should only be accessible by root from outside of Docker.
    • Pro: initializes content to the image contents when it is empty/new and the container is created. This initialization includes file owners and permissions from the image, which can resolve most uid/gid issues.
    • Pro: Can connect to anything that a mount command can, including a bind mount or NFS mount, with a local driver. Other drivers let you reference data in even more locations (e.g. cloud providers).
    • Con: managing content should be done via a container.
  • Anonymous:
    • Pro: requires no planning to use
    • Con: data stored here is easily lost, since there is no mapping from the volume back to the container/image that created it. This is the worst way to store volumes in my opinion, and the reason that no one should ever define a volume inside their Dockerfile.
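
The uid/gid mismatch listed for host volumes can be checked on the host before starting a container. Here is a minimal sketch, assuming a POSIX shell; `owner_uid` and `uid_matches` are hypothetical helper names, and the uid the container runs as is something you have to look up yourself (for example from the image's USER setting).

```shell
# Hypothetical helper: print the numeric uid that owns a host path.
owner_uid() {
  # ls -n prints numeric ids; the third field is the owner's uid
  ls -ldn -- "$1" | awk '{print $3}'
}

# Hypothetical helper: succeed if the path is owned by the given uid.
uid_matches() {
  [ "$(owner_uid "$1")" = "$2" ]
}

# Example call (1000 is a placeholder for the container user's uid):
#   uid_matches /home/user/test 1000 || echo "uid mismatch"
```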

When possible, I use a named volume. The initialization of data and better handling of uid/gid issues trump the convenience of a host volume. If I really need access outside of docker directly to the data, then I try to use a named volume that points to a bind mount instead of the default local driver settings. A simple example of this is:

$ docker volume create --driver local \
  --opt type=none \
  --opt device=/home/user/test \
  --opt o=bind \
  test_vol
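
Once created, the volume is referenced by name like any other named volume. The sketch below wraps a throwaway container that lists the volume's contents, which is also one way to manage a volume's content through a container. `list_volume` is a hypothetical name, and the busybox image and the /data target path are arbitrary choices. As far as I know, the host directory given as `device` has to exist before the volume is first mounted.

```shell
# Hypothetical helper: list a named volume's contents from a throwaway container.
# The busybox image and the /data target path are arbitrary choices.
list_volume() {
  docker run --rm -v "$1":/data busybox ls -la /data
}

# Example call:
#   list_volume test_vol
```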

For defining my volumes, since you do not want to do this in a Dockerfile, I use a docker-compose.yml and define my volumes in there. If it's deployed with swarm mode, I'll point to an NFS server with a named volume to allow the data to be reached as the containers migrate to different hosts. Otherwise it's a local named volume that can be easily used with docker-compose.
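
For the NFS case, a named volume can be pointed at an NFS export using the same local driver options as the bind example. This is a rough sketch rather than my exact setup: `create_nfs_volume` is a hypothetical wrapper, and the server address and export path in the example call are placeholders. The same three options can be given under `driver_opts` for a volume in docker-compose.yml.

```shell
# Hypothetical helper: create a named volume backed by an NFS export.
# Arguments: volume name, NFS server address, exported path on that server.
create_nfs_volume() {
  docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=$2,rw" \
    --opt "device=:$3" \
    "$1"
}

# Example call (placeholder server and export path):
#   create_nfs_volume app_data 192.0.2.10 /exports/app_data
```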

Volumes in the Dockerfile allow a path to be specified in the image that should always be created as a volume. This inherently bypasses the union filesystem Docker uses.

Users of such an image will always get a volume at that location when running

docker run <imagename>

i.e. there is no reason to ever add -v /my/mount/point:/mount/here and thus users need not be concerned with it.

Bind mounts (like the example above with -v) must always be specified at run time if they are required, and they are not portable between images.

The effective differences for optimization are these:

  • volumes suit paths with a lot of r/w operations that have no business writing to the union filesystem (think databases)
  • volumes are a poor fit for mounting data sets that already exist on the host. You can do it, but you take an enormous r/w hit, because there is no reason for that data to be in the union filesystem.
  • bind mounts, however, handle that case quite well: they simply mount the existing directory at a path within the container and ignore the union filesystem for that directory altogether.

Does this make sense?