Ensuring data and code confidentiality in container images
Over the past few years, the cloud industry has seen a major shift from deploying monolithic applications on virtual machines to splitting applications into smaller components (microservices) and packaging them into containers. The popularity of containerization today is largely due to the work of Docker, the company that has become the main driving force behind containers: it provides an easy-to-use tool for building and running Docker containers and a Docker container registry that solves the distribution challenge.
The success of containerization technology depends heavily on the security of containers at the various stages of their life cycle. One security concern is the presence of vulnerabilities inside individual containers. To identify them, the DevOps pipelines used to build containers are supplemented with scanners that look for packages with known vulnerabilities and alert the container owners or maintainers when any are found. Vulnerability Advisor on IBM Cloud is an example of such a utility.
Another aspect of security is making sure that the container being launched is the one you intended and that it has not been modified. This issue is addressed with digital signatures, which make any modification of a container image detectable. Docker Notary is an example of a public repository that stores image signatures. Using Notary, a client can verify the signature of a container image to ensure that the image has not been altered since it was signed with the key of its owner or maintainer.
Another potential security issue is container isolation. Linux runtime security technologies such as namespaces, cgroups, Linux capabilities, and SELinux, AppArmor, and seccomp profiles help confine container processes and isolate containers from each other at runtime.
This article addresses a still-pressing enterprise security issue: the confidentiality of data and code in container images. The main security goal when working with container images is to allow the creation and distribution of encrypted container images so that they are available only to a specific set of recipients. Others may have access to these images, but they will not be able to run them or see the sensitive data inside them. Container image encryption builds on existing cryptography such as Rivest-Shamir-Adleman (RSA), elliptic curve, and Advanced Encryption Standard (AES) technologies; AES, also known as Rijndael, is a symmetric block cipher.
Prerequisites
To get the most out of this article, you should be familiar with Linux containers and container images, and have an understanding of the basics of security.
Related work on encryption and containers
As far as we know, there is no work in the field of encrypting container images. However, there are many implementations and products that support data confidentiality and anti-theft encryption through file system, block device, or hardware encryption. The latter is implemented using self-encrypting disks. There are also encrypted virtual machine images.
Encrypted file systems exist on many enterprise operating systems and can support mounting encrypted partitions and directories. Encrypted file systems can even support booting from an encrypted boot drive. Linux supports block device encryption through the dm-crypt driver; ecryptfs is one example of an encrypted file system. Other open source file encryption solutions are also available for Linux. In Windows, encryption is supported by the NTFS v3.0 file system. In addition, many manufacturers produce self-encrypting drives. For virtual machine images, there are solutions similar to encrypted disks: the open source QEMU machine emulator and VMware virtualization products support encrypted virtual machine images.
Data encryption is usually aimed at protecting against data theft while the system is offline. A related technology is signing the container image with a key provided by the client and a Docker Notary server. The Docker Notary server runs alongside the container image registry. Users of the Docker client tool have the option to sign a container image and upload the signature to their accounts via Docker Notary. During this process, the signature is bound to the container image through the pathname of the image and its versions. The signature is created over a hash that is calculated from the description of the entire contents of the image. This description is called the container image manifest. Container image signing solves the problem of protecting container images from tampering and helps to determine the origin of a container image.
Structure
The Docker ecosystem evolved to standardize container image formats through the Open Container Initiative (OCI) group of standards, which now governs the container runtime format (runtime-spec) and the container image format (image-spec). Since our team's work required an extension of the existing container image format, we defined an extension to the standard to support encrypted images. The following sections describe the existing container image format and the extension.
At the top level, a container image can consist of a JavaScript Object Notation (JSON) document that is a list of image manifests. For example, this list of manifests is used when a container image is available for multiple architectures or platforms. The manifest list contains references to container manifests, one for each combination of architecture and operating system. For example, supported architectures include amd64, arm, and ppc64le, and supported operating systems include Linux and Windows.
[Screenshot: example of a list of manifests]
The mediaType field describes the exact format of the referenced document. It allows future extension and lets a client select the appropriate parser for the document involved.
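As a rough illustration of the structure described above, the following Go sketch builds and prints a manifest list. The struct and function names are hypothetical names chosen for this sketch, the per-platform manifests are placeholder bytes rather than real manifests, and the media type strings are the Docker schema 2 types, which are assumed here; an OCI image index would use different media types.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// platform identifies the architecture and operating system a manifest targets.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

// descriptor is a reference to another document: its format, size, and digest.
type descriptor struct {
	MediaType string    `json:"mediaType"`
	Size      int       `json:"size"`
	Digest    string    `json:"digest"`
	Platform  *platform `json:"platform,omitempty"`
}

// manifestList is the top-level document listing one manifest per platform.
type manifestList struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Manifests     []descriptor `json:"manifests"`
}

// digestOf returns the "sha256:<hex>" digest string of a document.
func digestOf(doc []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(doc))
}

// buildManifestList references one placeholder manifest per architecture.
func buildManifestList(archs []string) manifestList {
	list := manifestList{
		SchemaVersion: 2,
		MediaType:     "application/vnd.docker.distribution.manifest.list.v2+json",
	}
	for _, arch := range archs {
		// Placeholder bytes standing in for a real per-platform manifest (assumption).
		manifest := []byte(`{"placeholder-manifest-for":"` + arch + `"}`)
		list.Manifests = append(list.Manifests, descriptor{
			MediaType: "application/vnd.docker.distribution.manifest.v2+json",
			Size:      len(manifest),
			Digest:    digestOf(manifest),
			Platform:  &platform{Architecture: arch, OS: "linux"},
		})
	}
	return list
}

func main() {
	out, err := json.MarshalIndent(buildManifestList([]string{"amd64", "arm", "ppc64le"}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A client would pick the entry whose platform matches the local machine and then fetch the manifest with that digest.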
The level below the list of manifests is the manifest. The manifest is also a JSON document and contains an ordered list of references to image layers. Each reference carries a mediaType describing the format of the layer, including whether the layer is compressed and, if so, how. For example, each layer can be stored as a .tar file containing the files that were added at a specific stage of the build when running docker build on a Dockerfile. Layers are often compressed with gzip to improve storage efficiency.
[Screenshot: example of a manifest document]
Manifests and layers are referenced through a "digest", which is usually a SHA-256 hash of the referenced content. Manifests and layers are usually stored as files on the file system. Often, the filenames are the hashes of the content, which makes the files easy to find and load. A consequence of this hashing scheme is that a small change in a referenced document changes every document that references it, all the way up to the list of manifests.
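The way a change propagates upward through the digests can be seen in a small Go sketch. The JSON documents here are heavily simplified stand-ins for real manifests, and all function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// digestOf returns the "sha256:<hex>" digest string of some content.
func digestOf(content []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(content))
}

// manifestFor builds a simplified manifest that references a layer by digest.
func manifestFor(layer []byte) []byte {
	return []byte(fmt.Sprintf(`{"layers":[{"digest":%q}]}`, digestOf(layer)))
}

// manifestListFor builds a simplified manifest list that references a manifest by digest.
func manifestListFor(manifest []byte) []byte {
	return []byte(fmt.Sprintf(`{"manifests":[{"digest":%q}]}`, digestOf(manifest)))
}

// topDigest returns the digest of the manifest list built on top of a layer.
func topDigest(layer []byte) string {
	return digestOf(manifestListFor(manifestFor(layer)))
}

func main() {
	// A one-byte change in the layer yields a different digest at the very top.
	fmt.Println(topDigest([]byte("layer contents")))
	fmt.Println(topDigest([]byte("layer contents!")))
}
```

This is also why encrypting a layer produces a new manifest and a new manifest list: the encrypted layer has a different digest than the plain one.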
As part of our team's project, we built image encryption on a hybrid encryption scheme that uses public and symmetric keys. Symmetric keys are used for bulk data encryption (the encryption of layers), and public keys are used for wrapping the symmetric keys. We used three different public-key-based encryption technologies: OpenPGP, JSON Web Encryption (JWE), and PKCS#7.
OpenPGP
OpenPGP is an encryption and signature technology commonly used to encrypt and sign email messages. Open source communities also often use it to sign commits and tags of source code in git repositories. It is an Internet standard defined by the IETF in RFC 4880 and can be viewed as an open version of the earlier proprietary PGP technology.
OpenPGP has its own format for RSA keys. Keys are usually stored in a keyring file and can be imported from plain OpenPGP key files. The most convenient aspect of an OpenPGP keyring is that public keys can be associated with the email addresses of their owners. You can address multiple recipients of a message simply by listing their email addresses, which are then mapped to the recipients' public keys. In addition, a web of trust has been built around this technology: you can find the public keys of many users, indexed by their email addresses. For example, these keys are often used for signing git tags.
You can use the OpenPGP encrypted message format to encrypt a bulk message for multiple recipients. The OpenPGP message header contains one block for each recipient. Each block contains a 64-bit key identifier that tells the decryption algorithm which private key to try for decryption. Once the encrypted blob inside the block is decrypted, it reveals the symmetric key that can be used to decrypt the bulk message. The blob encrypted with each recipient's public key yields the same symmetric key.
We used OpenPGP in a similar way, but in this case the encrypted message is not the layer itself. Instead, it contains a JSON document, which in turn contains the symmetric key used to encrypt and decrypt the layer, together with the initialization vector. We call this key the layer encryption key (LEK); it is a form of data encryption key. The advantage of this method is that we need only one LEK. With the LEK we encrypt a layer for one or more recipients. Each recipient of the container image can have a different key type, and it does not have to be an OpenPGP key. For example, it could be a simple RSA key. As long as we can use that key to encrypt the LEK, we can work with multiple recipients with different key types.
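The idea of wrapping one LEK for several recipients can be sketched in Go as follows. This is not the OpenPGP, JWE, or PKCS#7 message format: plain RSA-OAEP from the Go standard library is swapped in as a stand-in for the key-wrapping step, and the JSON field names and function names are assumptions made for illustration.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"encoding/json"
	"errors"
	"fmt"
)

// layerKeys is the small JSON document that is wrapped for every recipient.
// The field names are illustrative, not those of the real implementation.
type layerKeys struct {
	LEK []byte `json:"lek"` // layer encryption key (symmetric)
	IV  []byte `json:"iv"`  // initialization vector for the layer cipher
}

// wrap encrypts the JSON-encoded keys once per recipient public key.
func wrap(keys layerKeys, recipients []*rsa.PublicKey) ([][]byte, error) {
	payload, err := json.Marshal(keys)
	if err != nil {
		return nil, err
	}
	var messages [][]byte
	for _, pub := range recipients {
		msg, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, payload, nil)
		if err != nil {
			return nil, err
		}
		messages = append(messages, msg)
	}
	return messages, nil
}

// unwrap tries each message with the private key and returns the first keys it recovers.
func unwrap(messages [][]byte, priv *rsa.PrivateKey) (layerKeys, error) {
	for _, msg := range messages {
		payload, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, msg, nil)
		if err != nil {
			continue // this message was wrapped for a different recipient
		}
		var keys layerKeys
		err = json.Unmarshal(payload, &keys)
		return keys, err
	}
	return layerKeys{}, errors.New("no message could be decrypted with this private key")
}

// demo wraps one LEK for two recipients. It reports whether the second recipient
// recovers the same LEK and whether an outsider's key is rejected.
func demo() (bool, bool, error) {
	var privs []*rsa.PrivateKey
	for i := 0; i < 3; i++ { // two recipients and one outsider
		priv, err := rsa.GenerateKey(rand.Reader, 2048)
		if err != nil {
			return false, false, err
		}
		privs = append(privs, priv)
	}
	keys := layerKeys{LEK: make([]byte, 32), IV: make([]byte, 12)}
	if _, err := rand.Read(keys.LEK); err != nil {
		return false, false, err
	}
	if _, err := rand.Read(keys.IV); err != nil {
		return false, false, err
	}
	messages, err := wrap(keys, []*rsa.PublicKey{&privs[0].PublicKey, &privs[1].PublicKey})
	if err != nil {
		return false, false, err
	}
	got, err := unwrap(messages, privs[1])
	if err != nil {
		return false, false, err
	}
	_, outsiderErr := unwrap(messages, privs[2])
	return string(got.LEK) == string(keys.LEK), outsiderErr != nil, nil
}

func main() {
	recovered, rejected, err := demo()
	fmt.Println("recipient recovered LEK:", recovered, "outsider rejected:", rejected, "error:", err)
}
```

The layer itself is encrypted only once with the LEK; adding a recipient costs one more small wrapped message, not another copy of the layer.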
JSON Web Encryption (JWE)
JSON Web Encryption, also known as JWE, is another IETF Internet standard and is defined in RFC 7516. It is a newer encryption standard than OpenPGP, and therefore uses more recent low-level ciphers designed to meet more stringent encryption requirements.
At a high level, JWE works in a similar way to OpenPGP in that it also supports a list of recipients and a bulk message encrypted with a symmetric key that every recipient of the message can access. Recipients of a JWE message can have different types of keys, such as RSA keys, specific elliptic curve key types for encryption, and symmetric keys. Because it is a newer standard, JWE can still be extended to support keys in hardware devices such as TPMs or hardware security modules (HSMs) using PKCS#11 or Key Management Interoperability Protocol (KMIP) interfaces. We use JWE in a similar manner to OpenPGP when the recipients have RSA or elliptic curve keys. In the future, we could extend it to support symmetric keys, or keys held inside an HSM and accessed through KMIP.
PKCS#7
PKCS#7, also known as Cryptographic Message Syntax (CMS), is defined in IETF RFC 5652. According to the Wikipedia article on CMS, "It can be used to digitally sign, digest, authenticate, or encrypt any form of digital data."
It is similar to the two previously described technologies in that it allows multiple recipients and encryption of bulk messages. Therefore, we used it just like other technologies, but only for recipients who provide certificates for encryption keys.
To support the previously described encryption technologies, we have extended the manifest document to include the following information:
- OpenPGP, JWE, and PKCS#7 messages are stored in an annotation map, which is part of the manifest.
- Each referenced layer has one annotation map. An annotation map is basically a dictionary with strings as keys and strings as values (key-value pairs).
To support image encryption, we have defined the following keys:
- org.opencontainers.image.enc.keys.openpgp
- org.opencontainers.image.enc.keys.jwe
- org.opencontainers.image.enc.keys.pkcs7
The value referenced by each key contains one or more encrypted messages for the corresponding encryption technology. Since these messages can be in binary format, they are base64 encoded. An encrypted layer must have at least one such annotation, but it can have all of them if its recipients hold keys of sufficiently different types.
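A minimal Go sketch of such an annotation map follows. The three annotation keys are the ones listed above; the helper names are hypothetical, and joining several messages under one key with a comma is an assumption of this sketch, not something the text specifies.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// Annotation keys defined for encrypted layers.
const (
	keyOpenPGP = "org.opencontainers.image.enc.keys.openpgp"
	keyJWE     = "org.opencontainers.image.enc.keys.jwe"
	keyPKCS7   = "org.opencontainers.image.enc.keys.pkcs7"
)

// addMessage base64-encodes a (possibly binary) wrapped-key message and stores it
// under the annotation key of its scheme. Joining with "," is an assumption.
func addMessage(annotations map[string]string, key string, message []byte) {
	encoded := base64.StdEncoding.EncodeToString(message)
	if existing, ok := annotations[key]; ok {
		annotations[key] = existing + "," + encoded
		return
	}
	annotations[key] = encoded
}

// messages decodes all messages stored under one annotation key.
func messages(annotations map[string]string, key string) ([][]byte, error) {
	value, ok := annotations[key]
	if !ok {
		return nil, nil
	}
	var out [][]byte
	for _, part := range strings.Split(value, ",") {
		raw, err := base64.StdEncoding.DecodeString(part)
		if err != nil {
			return nil, err
		}
		out = append(out, raw)
	}
	return out, nil
}

// hasKeyAnnotation reports whether a layer carries at least one wrapped-key annotation.
func hasKeyAnnotation(annotations map[string]string) bool {
	for _, key := range []string{keyOpenPGP, keyJWE, keyPKCS7} {
		if _, ok := annotations[key]; ok {
			return true
		}
	}
	return false
}

func main() {
	annotations := map[string]string{}
	addMessage(annotations, keyJWE, []byte{0x00, 0x01, 0xfe, 0xff}) // placeholder bytes, not a real JWE message
	fmt.Println(annotations[keyJWE], hasKeyAnnotation(annotations))
}
```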
To indicate that a layer was encrypted with an LEK, we extended the existing media types with the suffix '+encrypted', as shown in the following examples:
- application/vnd.docker.image.rootfs.diff.tar+encrypted
- application/vnd.docker.image.rootfs.diff.tar.gzip+encrypted
These examples show that the layer is either archived in a .tar file and encrypted, or archived in a .tar file, compressed with gzip, and encrypted.
[Screenshot: example of a manifest that references encrypted layers, with an annotation map containing the encrypted JWE message]
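Handling the suffix can be sketched with a few Go helpers. The function names are hypothetical; only the '+encrypted' suffix and the two media types come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// encryptedSuffix marks a layer media type as encrypted.
const encryptedSuffix = "+encrypted"

// isEncrypted reports whether a layer media type carries the suffix.
func isEncrypted(mediaType string) bool {
	return strings.HasSuffix(mediaType, encryptedSuffix)
}

// encryptedMediaType returns the media type to record after encrypting a layer.
func encryptedMediaType(mediaType string) string {
	if isEncrypted(mediaType) {
		return mediaType // already marked, do not append the suffix twice
	}
	return mediaType + encryptedSuffix
}

// plainMediaType returns the media type to record after decrypting a layer.
func plainMediaType(mediaType string) string {
	return strings.TrimSuffix(mediaType, encryptedSuffix)
}

func main() {
	for _, mt := range []string{
		"application/vnd.docker.image.rootfs.diff.tar",
		"application/vnd.docker.image.rootfs.diff.tar.gzip",
	} {
		enc := encryptedMediaType(mt)
		fmt.Println(enc, isEncrypted(enc), plainMediaType(enc) == mt)
	}
}
```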
Layer encryption using symmetric keys
For symmetric encryption with LEK, our team chose a cipher that supports authenticated encryption and is based on the AES encryption standard with 128- and 256-bit keys.
Example implementation: containerd
We implemented our variant in a newer container runtime project called containerd, whose source code is written in Go. The Docker daemon uses containerd to run some of its services, and Kubernetes has a plugin to use containerd directly. Therefore, we hope that our extensions to support encrypted container images will be useful to both.
The implementation of layer encryption using the LEK sits at the lowest architectural level of the extensions. One of the implementation requirements was to accommodate large layers of several gigabytes while keeping the memory used by the process performing the cryptographic operation on the layer down to a few megabytes.
Go's support for authenticated encryption algorithms takes a byte array as input and performs the entire encryption (seal) or decryption (open) step on it in one call; it does not allow further arrays to be passed in and appended as a stream. Since this crypto API would have required loading the entire layer into memory or inventing some scheme to change the initialization vector (IV) for each block, we decided not to use Go's built-in authenticated encryption with associated data (AEAD) support directly. Instead, we used the miscreant Go crypto library, which supports AEAD on streams (blocks) and implements its own scheme for changing the IV for each block. In our implementation, we split the layer into 1 MB blocks, which we pass one by one for encryption. This approach keeps memory usage low when using an authenticated cipher. On the decryption side, we do the opposite and pay attention to the errors returned by the Open() function to ensure that the encrypted blocks were not tampered with.
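The block-wise approach can be illustrated with the following Go sketch. It does not use the miscreant library: AES-GCM from the standard library is swapped in, and the per-block nonce layout (random prefix, block counter, final-block flag) is an assumption of this sketch, not the scheme miscreant implements. All function names are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"io"
	"strings"
)

// nonceFor derives a unique 12-byte nonce for each chunk:
// 7-byte random prefix | 4-byte chunk counter | 1-byte "final chunk" flag.
func nonceFor(prefix []byte, counter uint32, final bool) []byte {
	nonce := make([]byte, 12)
	copy(nonce, prefix[:7])
	binary.BigEndian.PutUint32(nonce[7:11], counter)
	if final {
		nonce[11] = 1
	}
	return nonce
}

// processChunks reads src in pieces of readSize bytes, transforms each piece under
// its own nonce, and writes the result to dst, holding one chunk in memory at a time.
func processChunks(dst io.Writer, src io.Reader, prefix []byte, readSize int,
	transform func(nonce, chunk []byte) ([]byte, error)) error {
	br := bufio.NewReader(src)
	buf := make([]byte, readSize)
	for counter := uint32(0); ; counter++ {
		n, err := io.ReadFull(br, buf)
		if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
			return err
		}
		// Look one byte ahead to learn whether this is the final chunk.
		_, peekErr := br.Peek(1)
		if peekErr != nil && peekErr != io.EOF {
			return peekErr
		}
		final := peekErr == io.EOF
		out, err := transform(nonceFor(prefix, counter, final), buf[:n])
		if err != nil {
			return fmt.Errorf("chunk %d: %w", counter, err)
		}
		if _, err := dst.Write(out); err != nil {
			return err
		}
		if final {
			return nil
		}
	}
}

// encryptStream seals each plaintext chunk of chunkSize bytes.
func encryptStream(dst io.Writer, src io.Reader, aead cipher.AEAD, prefix []byte, chunkSize int) error {
	return processChunks(dst, src, prefix, chunkSize, func(nonce, chunk []byte) ([]byte, error) {
		return aead.Seal(nil, nonce, chunk, nil), nil
	})
}

// decryptStream opens each sealed chunk; Open fails on tampering, reordering, or truncation.
func decryptStream(dst io.Writer, src io.Reader, aead cipher.AEAD, prefix []byte, chunkSize int) error {
	return processChunks(dst, src, prefix, chunkSize+aead.Overhead(), func(nonce, chunk []byte) ([]byte, error) {
		return aead.Open(nil, nonce, chunk, nil)
	})
}

// roundTrip encrypts and decrypts plaintext. A flipAt >= 0 corrupts that ciphertext byte first.
func roundTrip(plaintext string, chunkSize, flipAt int) (string, error) {
	secret := make([]byte, 32+7) // random stand-ins for the LEK and the nonce prefix
	if _, err := rand.Read(secret); err != nil {
		return "", err
	}
	block, err := aes.NewCipher(secret[:32])
	if err != nil {
		return "", err
	}
	aead, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	var sealed, opened bytes.Buffer
	if err := encryptStream(&sealed, strings.NewReader(plaintext), aead, secret[32:], chunkSize); err != nil {
		return "", err
	}
	if flipAt >= 0 {
		sealed.Bytes()[flipAt] ^= 0xff
	}
	err = decryptStream(&opened, &sealed, aead, secret[32:], chunkSize)
	return opened.String(), err
}

func main() {
	out, err := roundTrip("a layer far smaller than the 1 MB chunks of the article", 16, -1)
	fmt.Println(out, err)
}
```

The demo uses 16-byte chunks only to keep the example small; with a chunk size of 1 MB the buffers stay at about that size regardless of how large the layer is. Calling roundTrip with a non-negative flipAt shows the decryption error that signals tampering.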
On top of the symmetric encryption, asymmetric cryptographic schemes encrypt the LEK and the initialization vector (IV). To make it possible to add or remove cryptographic schemes, each asymmetric cryptographic implementation is registered. When the API of the asymmetric cryptography code is called to encrypt a layer, we call the registered cryptographic handlers one by one, passing the recipients' public keys. After all the recipient keys have been used for encryption, we return an annotation map with the identifiers of the asymmetric crypto algorithms as the map keys and the OpenPGP-, JWE-, and PKCS#7-encoded messages as the values. Each message contains the wrapped LEK and IV. Annotation maps are stored in the manifest document, as described earlier.
We can also add recipients to an already encrypted image. Image authors can add recipients only if they are on the recipient list themselves, because a private key of an existing recipient is required to unwrap the LEK and IV of the layers. We then wrap the LEK and IV into a new message using the new recipient's key and add this message to the annotation map.
We have used three types of asymmetric encryption schemes for different types of keys. We use OpenPGP keys to encrypt OpenPGP messages. Our use of PKCS#7 requires X.509 certificates for the encryption keys. JWE handles all other key types, such as simple RSA keys, elliptic curve keys, and symmetric keys. We have prototyped an extension for JWE that allows cryptographic operations using keys managed by a KMIP server.
The containerd runtime includes a ctr client tool to interact with it. We extended ctr to enable testing of our changes and to make them accessible to container users. ctr already implements a subcommand for image operations, such as pulling images from and pushing them to an image registry.
We have extended this subcommand by adding functionality to encrypt images and enable encryption of specific layers of specific architectures using a specific set of keys. This approach allows users to encrypt only those layers that contain sensitive data and leave other layers unencrypted. The latter can be deduplicated, but this is hardly possible for encrypted layers.
Likewise, we can decrypt individual layers of individual architectures. We have added a layerinfo subcommand that shows the encryption status of each layer and displays the encryption technologies used for it. For OpenPGP, we can also display the key IDs needed for decryption, or convert them to their recipients' email addresses using a keyring.
Additionally, you can export and import container images. We have implemented support for layer encryption on export and decryption on import. Even though we decrypt the layers to create the container's rootfs file system, the encrypted layers and the original metadata files such as its manifests are preserved. This approach allows you to export an encrypted image and perform authorization checks when users want to start a container with an encrypted image.
When a plain (unencrypted) image is pulled from the registry, it is automatically unpacked and decompressed so that containers can be created from it immediately. To make this just as easy for encrypted images, we suggest passing the private key to the command that unpacks the image so that it can decrypt the layers before unpacking them. If the image is encrypted with multiple keys, multiple keys can be passed to the pull command. After an encrypted image has been pulled successfully from the registry, anyone with access to containerd can create a container from the image. To confirm that the user has the right to use the container image, we suggest that the user provide the private keys used to decrypt the container image. We use the keys to check the user's authorization: if they can decrypt the LEK of each encrypted layer, we allow the container to start.
A step-by-step guide to encryption using containerd
In this section, we demonstrate the encryption steps performed with containerd using ctr on the command line. We show how to encrypt and decrypt a container image.
First of all, you need to clone the containerd/imgcrypt git repository, a subproject of containerd that can encrypt and decrypt container images. Then you need to build containerd and run it. To complete these steps, you need to know how to set up a Go development environment:
imgcrypt requires containerd version 1.3 or higher.
Build and install imgcrypt:
# make
# sudo make install
Run containerd with the configuration file shown in the example below. To avoid conflicts with an existing containerd installation, use directories under /tmp. Also build containerd version 1.3 from source, but do not install it.
# cat config.toml
disable_plugins = ["cri"]
root = "/tmp/var/lib/containerd"
state = "/tmp/run/containerd"
[grpc]
address = "/tmp/run/containerd/containerd.sock"
uid = 0
gid = 0
[stream_processors]
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
returns = "application/vnd.oci.image.layer.v1.tar+gzip"
path = "/usr/local/bin/ctd-decoder"
[stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
returns = "application/vnd.oci.image.layer.v1.tar"
path = "/usr/local/bin/ctd-decoder"
# sudo ~/src/github.com/containerd/containerd/bin/containerd -c config.toml
Create an RSA key pair using the openssl command line tool and encrypt the image:
# openssl genrsa -out mykey.pem
Generating RSA private key, 2048 bit long modulus (2 primes)
...............................................+++++
............................+++++
e is 65537 (0x010001)
# openssl rsa -in mykey.pem -pubout -out mypubkey.pem
writing RSA key
# sudo chmod 0666 /tmp/run/containerd/containerd.sock
# CTR="/usr/local/bin/ctr-enc -a /tmp/run/containerd/containerd.sock"
# $CTR images pull --all-platforms docker.io/library/bash:latest
[...]
# $CTR images layerinfo --platform linux/amd64 docker.io/library/bash:latest
# DIGEST PLATFORM SIZE ENCRYPTION RECIPIENTS
0 sha256:9d48c3bd43c520dc2784e868a780e976b207cbf493eaff8c6596eb871cbd9609 linux/amd64 2789669
1 sha256:7dd01fd971d4ec7058c5636a505327b24e5fc8bd7f62816a9d518472bd9b15c0 linux/amd64 3174665
2 sha256:691cfbca522787898c8b37f063dd20e5524e7d103e1a3b298bd2e2b8da54faf5 linux/amd64 340
# $CTR images encrypt --recipient jwe:mypubkey.pem --platform linux/amd64 docker.io/library/bash:latest bash.enc:latest
Encrypting docker.io/library/bash:latest to bash.enc:latest
# $CTR images layerinfo --platform linux/amd64 bash.enc:latest
# DIGEST PLATFORM SIZE ENCRYPTION RECIPIENTS
0 sha256:360be141b01f69b25427a9085b36ba8ad7d7a335449013fa6b32c1ecb894ab5b linux/amd64 2789669 jwe [jwe]
1 sha256:ac601e66cdd275ee0e10afead03a2722e153a60982122d2d369880ea54fe82f8 linux/amd64 3174665 jwe [jwe]
2 sha256:41e47064fd00424e328915ad2f7f716bd86ea2d0d8315edaf33ecaa6a2464530 linux/amd64 340 jwe [jwe]
Start a local image registry so that you can upload the encrypted image to it. To store encrypted container images, you need a recent version of the registry.
# docker pull registry:latest
# docker run -d -p 5000:5000 --restart=always --name registry registry
Upload the encrypted image to your local registry, pull it using ctr-enc, and then run the image:
# $CTR images tag bash.enc:latest localhost:5000/bash.enc:latest
# $CTR images push localhost:5000/bash.enc:latest
# $CTR images rm localhost:5000/bash.enc:latest bash.enc:latest
# $CTR images pull localhost:5000/bash.enc:latest
# sudo $CTR run --rm localhost:5000/bash.enc:latest test echo 'Hello World!'
ctr: you are not authorized to use this image: missing private key needed for decryption
# sudo $CTR run --rm --key mykey.pem localhost:5000/bash.enc:latest test echo 'Hello World!'
Hello World!
Conclusion
Encrypting container images is a good addition to their security: it ensures the confidentiality of data and the integrity of container images at rest. The proposed technology is based on the publicly available RSA, elliptic curve, and AES encryption technologies. It applies keys to higher-level encryption schemes such as OpenPGP, JWE, and PKCS#7. If you know how to work with OpenPGP, you can encrypt container images for OpenPGP recipients using their email addresses, while simple RSA and elliptic curve keys are used for encryption with JWE.