Today we are sharing a translation of an article by an IBM DevOps engineer on using a Makefile to automate the build of fast-building, easily debugged Docker images for Python projects. The project not only makes debugging in Docker easier, but also takes care of the quality of your code. Details below.
Every project, whether it is a web application, Data Science or Artificial Intelligence, can benefit from well-tuned CI/CD, Docker images that are debuggable during development and optimized for production, and code quality tools such as CodeClimate or SonarCloud. This article covers all of these and shows how to add them to a Python project.
Debuggable containers for development
Some people don't like Docker because containers can be difficult to debug, or because images take a long time to build. So let's start by building images that are ideal for development: fast to build and easy to debug. To make an image easy to debug, you need a base image that includes all the tools you might ever need for debugging, such as bash, vim, netcat, wget, cat, find, grep and others.
The python:3.8.1-buster image seems like a perfect candidate for this task. It includes many tools out of the box, and missing tools are easy to install. The image is large, but that doesn't matter here: it will only be used in development. As you've probably noticed, the image tag is very specific. Pinning the Python and Debian versions is intentional: you want to minimize the risk of breakage caused by new, possibly incompatible versions of Python or Debian. An Alpine-based image is a possible alternative, but it can cause some problems: internally it uses musl libc instead of glibc, which Python relies on. Keep this in mind if you decide to go with Alpine.
In terms of speed, we'll use multi-stage builds to cache as many layers as possible. That way, tools like gcc and the application dependencies listed in requirements.txt are not downloaded on every build. To speed things up further, a custom base image is created from the previously mentioned python:3.8.1-buster. It has everything we need, since the steps required to download and install these tools cannot be cached in the final runner image. But enough talk, let's take a look at the Dockerfile:
# dev.Dockerfile
FROM python:3.8.1-buster AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-venv gcc libpython3-dev && \
    python3 -m venv /venv && \
    /venv/bin/pip install --upgrade pip
FROM builder AS builder-venv
COPY requirements.txt /requirements.txt
RUN /venv/bin/pip install -r /requirements.txt
FROM builder-venv AS tester
COPY . /app
WORKDIR /app
RUN /venv/bin/pytest
FROM martinheinz/python-3.8.1-buster-tools:latest AS runner
COPY --from=tester /venv /venv
COPY --from=tester /app /app
WORKDIR /app
ENTRYPOINT ["/venv/bin/python3", "-m", "blueprint"]
USER 1001
LABEL name={NAME}
LABEL version={VERSION}
Above you can see that the build goes through 3 intermediate images before creating the final runner image. The first is builder. It downloads everything needed to build the application, including gcc and the Python virtual environment package. After installation, the actual virtual environment is created, and it is used by the following images. Next comes builder-venv, which copies the list of dependencies (requirements.txt) into the image and then installs them. This intermediate image is necessary for caching: the libraries should only be installed when requirements.txt changes, otherwise we just use the cache.
Before we create the final image, let's first run the application's tests. That is what the tester image does: it copies the source code and runs the tests. When the tests pass, we move on to the runner image. It uses a custom base image with some additional tools not found in the regular Debian image: vim and netcat. This image is on Docker Hub, and you can also look at its very simple Dockerfile in base.Dockerfile. So what do we do in this final image? First we copy the virtual environment, which holds all the installed dependencies, from the tester image, then we copy the tested application. Now that all the sources are in the image, we switch to the directory where the application lives and set ENTRYPOINT so that the application starts when the image is launched. For security reasons, USER is set to 1001: best practice recommends never running containers as root. The final 2 lines set the image labels. Their placeholders are replaced when the image is built through the make target, which we will see a little later.
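To see why the separate builder-venv stage pays off, here is a toy Python model of layer caching. It is only a sketch under simplifying assumptions: a layer's cache key is derived from its parent's key, the instruction text, and the content of any files the instruction copies. The function names and the sample file contents are hypothetical, and real Docker caching has more rules than this.

```python
import hashlib
from typing import List


def layer_key(parent_key: str, instruction: str, copied_content: bytes = b"") -> str:
    """Toy cache key: parent key + instruction text + content of copied files."""
    digest = hashlib.sha256()
    digest.update(parent_key.encode())
    digest.update(instruction.encode())
    digest.update(copied_content)
    return digest.hexdigest()


def build_keys(requirements: bytes, source: bytes) -> List[str]:
    """Return the cache keys of the layers in a simplified dev.Dockerfile."""
    keys = []
    key = layer_key("", "FROM python:3.8.1-buster")
    keys.append(key)
    key = layer_key(key, "COPY requirements.txt /requirements.txt", requirements)
    keys.append(key)
    key = layer_key(key, "RUN /venv/bin/pip install -r /requirements.txt")
    keys.append(key)
    key = layer_key(key, "COPY . /app", source)
    keys.append(key)
    return keys


def reusable_layers(old: List[str], new: List[str]) -> int:
    """Count how many leading layers could be taken from the cache."""
    count = 0
    for old_key, new_key in zip(old, new):
        if old_key != new_key:
            break
        count += 1
    return count


first = build_keys(b"flask==1.1.1\n", b"print('v1')")
code_change = build_keys(b"flask==1.1.1\n", b"print('v2')")
deps_change = build_keys(b"flask==1.1.2\n", b"print('v1')")

print(reusable_layers(first, code_change))  # 3: the pip install layer is reused
print(reusable_layers(first, deps_change))  # 1: only the base layer is reused
```

In this model a source-only change keeps the dependency layers cached, while a change to requirements.txt invalidates everything after the base layer, which matches the motivation given for the builder-venv stage.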
Optimized containers for the production environment
When it comes to production-grade images, you want to make sure they're small, secure and fast. My personal favorite in this sense is the Python image from the Distroless project. But what is "Distroless"? Let's put it this way: in an ideal world, everyone would build their own image using FROM scratch as the base (that is, an empty image). But that's not what most of us want, as it requires statically linking binaries and so on. That's where Distroless comes into play: it's FROM scratch for everyone. And now for what Distroless really is: a set of images created by Google that contain the absolute minimum required by the application. This means there are no shells, package managers, or other tools that bloat the image and generate noise for security scanners (such as CVE reports), making it harder to establish compliance. Now that we know what we are dealing with, let's take a look at the production Dockerfile. In fact, not much needs to change, only 2 lines:
# prod.Dockerfile
# 1. Line - Change builder image
FROM debian:buster-slim AS builder
# ...
# 17. Line - Switch to Distroless image
FROM gcr.io/distroless/python3-debian10 AS runner
# ... Rest of the Dockerfile
All we needed to change were the base images used to build and run the app! But the difference is pretty big: the development image weighed 1.03 GB, and this one only 103 MB. I can already hear you: "Alpine can weigh even less!" Yes, it can, but size doesn't matter that much. You only notice the size of an image when pulling or pushing it, which does not happen very often. While the image is running, size doesn't matter. What matters more than size is security, and in this respect Distroless is definitely superior to Alpine: Alpine ships many additional packages that increase the attack surface.
The last thing worth mentioning about Distroless is debugging. Since Distroless does not contain any shell (not even sh), debugging and poking around becomes quite difficult. For this, there are debug versions of all Distroless images. That way, when trouble happens, you can build your production image using the debug tag and deploy it alongside your normal image, exec into the debug image and take, for example, a thread dump. The debug version of the python3 image can be used like this:
docker run --entrypoint=sh -ti gcr.io/distroless/python3-debian10:debug
One command for everything
With all the Dockerfiles ready, let's automate the whole thing with a Makefile! The first thing we want to do is build the application using Docker. To build the development image, we run make build-dev, which executes the following target:
# The binary to build (just the basename).
MODULE := blueprint
# Where to push the docker image.
REGISTRY ?= docker.pkg.github.com/martinheinz/python-project-blueprint
IMAGE := $(REGISTRY)/$(MODULE)
# This version-strategy uses git tags to set the version string
TAG := $(shell git describe --tags --always --dirty)
build-dev:
	@echo "\n${BLUE}Building Development image with labels:\n"
	@echo "name: $(MODULE)"
	@echo "version: $(TAG)${NC}\n"
	@sed \
	    -e 's|{NAME}|$(MODULE)|g' \
	    -e 's|{VERSION}|$(TAG)|g' \
	    dev.Dockerfile | docker build -t $(IMAGE):$(TAG) -f- .
This target builds the image by first replacing the label placeholders at the bottom of dev.Dockerfile with the image name and the tag produced by git describe, and then running docker build. Next, we build for the production environment using make build-prod VERSION=1.0.0:
build-prod:
	@echo "\n${BLUE}Building Production image with labels:\n"
	@echo "name: $(MODULE)"
	@echo "version: $(VERSION)${NC}\n"
	@sed \
	    -e 's|{NAME}|$(MODULE)|g' \
	    -e 's|{VERSION}|$(VERSION)|g' \
	    prod.Dockerfile | docker build -t $(IMAGE):$(VERSION) -f- .
This target is very similar to the previous one, but instead of using the git tag as the version, it uses the version passed as an argument, which is 1.0.0 in the example above. When everything runs in Docker, at some point you also need to debug everything in Docker. There is a target for this:
# Example: make shell CMD="-c 'date > datefile'"
shell: build-dev
	@echo "\n${BLUE}Launching a shell in the containerized build environment...${NC}\n"
	@docker run \
	    -ti \
	    --rm \
	    --entrypoint /bin/bash \
	    -u $$(id -u):$$(id -g) \
	    $(IMAGE):$(TAG) \
	    $(CMD)
In the code above, you can see that the entry point is overridden with bash, and the container command is overridden with the CMD argument. Thus, we can either just enter the container and look around, or execute a one-off command, as in the example above. Once we have finished coding and want to push the image to the Docker registry, we can use make push VERSION=0.0.2. Let's see what this target does:
REGISTRY ?= docker.pkg.github.com/martinheinz/python-project-blueprint
push: build-prod
	@echo "\n${BLUE}Pushing image to GitHub Docker Registry...${NC}\n"
	@docker push $(IMAGE):$(VERSION)
It first runs the build-prod target discussed earlier, and then simply runs docker push. This assumes you are logged into the Docker registry, so docker login needs to be run before executing this target. The last target cleans up Docker artifacts. It uses the name label, which was substituted into the Dockerfile during the build, to filter and find the artifacts that need to be removed:
docker-clean:
	@docker system prune -f --filter "label=name=$(MODULE)"
All Makefile code is in the repository.
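For readers less familiar with sed, the placeholder substitution performed by build-dev and build-prod can be sketched in Python. This is an illustration only and not part of the project's Makefile: the helper names render_dockerfile and build_command are hypothetical, and the template is a shortened stand-in for the real Dockerfile.

```python
from typing import List


def render_dockerfile(template: str, name: str, version: str) -> str:
    """Replace the {NAME} and {VERSION} placeholders, as the two sed expressions do."""
    return template.replace("{NAME}", name).replace("{VERSION}", version)


def build_command(registry: str, module: str, version: str) -> List[str]:
    """Assemble the docker build call; -f- reads the Dockerfile from stdin."""
    image = f"{registry}/{module}"
    return ["docker", "build", "-t", f"{image}:{version}", "-f-", "."]


# Shortened stand-in for the last lines of dev.Dockerfile.
template = "USER 1001\nLABEL name={NAME}\nLABEL version={VERSION}\n"
rendered = render_dockerfile(template, "blueprint", "1.0.0")
command = build_command(
    "docker.pkg.github.com/martinheinz/python-project-blueprint",
    "blueprint",
    "1.0.0",
)

print(rendered)
print(" ".join(command))
```

To actually build, the rendered text would be piped to the command on stdin, for example with subprocess.run(command, input=rendered, text=True, check=True), which requires a working Docker installation.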
CI / CD with GitHub Actions
To set up CI/CD, the project uses make, GitHub Actions and the GitHub Package Registry to build pipelines (jobs) and store our images. But what are they?
- GitHub Actions are jobs/pipelines that help automate development workflows. You can use them to create individual tasks and then combine them into custom workflows that are executed, for example, on every push to the repository or when a release is created.
- The GitHub Package Registry is a package hosting service fully integrated with GitHub. It allows you to store different types of packages, such as Ruby gems or npm packages. The project uses it to store Docker images. You can learn more about the GitHub Package Registry in its documentation.
To use GitHub Actions, we create workflows in the project that run based on the selected triggers (one example of a trigger is a push to the repository). These workflows are YAML files in the .github/workflows directory:
.github
└── workflows
    ├── build-test.yml
    └── push.yml
The build-test.yml file contains 2 jobs that run each time code is pushed to the repository. They are shown below:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Run Makefile build for Development
        run: make build-dev
The first job, called build, verifies that the application can be built by running the make build-dev target. Before that, however, it checks out the repository using the checkout action published on GitHub.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - uses: actions/setup-python@v1
        with:
          python-version: '3.8'
      - name: Install Dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run Makefile test
        run: make test
      - name: Install Linters
        run: |
          pip install pylint
          pip install flake8
          pip install bandit
      - name: Run Linters
        run: make lint
The second job is a little more involved. It runs the tests against the application, as well as 3 linters (code quality checkers). As in the previous job, the checkout@v1 action is used to get the source code. After that, another published action called setup-python@v1 is run, which sets up the Python environment (see the action's documentation for details). Now that we have a Python environment, we need the application dependencies from requirements.txt, which are installed using pip. At this point we run the make test target, which runs the Pytest test suite. If the tests pass, we proceed to installing the previously mentioned linters: pylint, flake8 and bandit. Finally, we run the make lint target, which in turn runs each of these linters. That's all for the build/test jobs, but what about pushing the image? Let's go over that as well:
on:
  push:
    tags:
      - '*'
jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v1
      - name: Set env
        run: echo ::set-env name=RELEASE_VERSION::$(echo ${GITHUB_REF:10})
      - name: Log into Registry
        run: echo "${{ secrets.REGISTRY_TOKEN }}" | docker login docker.pkg.github.com -u ${{ github.actor }} --password-stdin
      - name: Push to GitHub Package Registry
        run: make push VERSION=${{ env.RELEASE_VERSION }}
The first 4 lines define when the job starts. We specify that this job should only be triggered when tags are pushed to the repository (* is a name pattern, here matching all tags). This way we do not push the Docker image to the GitHub Package Registry on every push to the repository, but only when a tag indicating a new version of our application is pushed. Now for the body of the job: it starts by checking out the source code and setting the RELEASE_VERSION environment variable to the pushed git tag. This is done using the built-in GitHub Actions command ::set-env. (GitHub has since disabled set-env; current workflows append NAME=value lines to the file named by $GITHUB_ENV instead.) Then the job logs into the Docker registry using the REGISTRY_TOKEN secret stored in the repository and the login of the user who initiated the workflow (github.actor). Finally, the last line runs the push target, which builds the production image and pushes it to the registry with the previously pushed git tag as the image tag. Check out all the code in my repository files.
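The expression ${GITHUB_REF:10} in the Set env step drops the first 10 characters of the ref, which is exactly the length of the refs/tags/ prefix. Here is a small Python sketch of the same idea, with a hypothetical helper name and sample refs. Unlike the shell one-liner, it also rejects refs that are not tags.

```python
TAG_PREFIX = "refs/tags/"  # 10 characters, hence the offset in ${GITHUB_REF:10}


def release_version(github_ref: str) -> str:
    """Return the tag name from a ref such as refs/tags/1.0.0."""
    if not github_ref.startswith(TAG_PREFIX):
        raise ValueError(f"not a tag ref: {github_ref!r}")
    return github_ref[len(TAG_PREFIX):]


print(len(TAG_PREFIX))                     # 10
print(release_version("refs/tags/1.0.0"))  # 1.0.0
```

Because the workflow only triggers on tag pushes, the plain substring is enough there; the explicit prefix check simply makes the assumption visible.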
Code quality checks with CodeClimate and SonarCloud
Last but not least, let's add code quality checks using CodeClimate and SonarCloud. They run together with the test job shown above. We add a few lines to it:
      # test, lint...
      - name: Send report to CodeClimate
        run: |
          export GIT_BRANCH="${GITHUB_REF/refs\/heads\//}"
          curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
          chmod +x ./cc-test-reporter
          ./cc-test-reporter format-coverage -t coverage.py coverage.xml
          ./cc-test-reporter upload-coverage -r "${{ secrets.CC_TEST_REPORTER_ID }}"
      - name: SonarCloud scanner
        uses: sonarsource/sonarcloud-github-action@master
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
Starting with CodeClimate: we export the GIT_BRANCH variable, which is derived from the GITHUB_REF environment variable. Then we download the CodeClimate test reporter and make it executable. Next we use it to format the coverage report produced by the test suite. In the last line, we send the report to CodeClimate together with the test reporter ID, which is stored in the repository secrets. For SonarCloud, you need to create a sonar-project.properties file. The values for this file can be found on the SonarCloud dashboard in the lower right corner, and the file looks like this:
sonar.organization=martinheinz-github
sonar.projectKey=MartinHeinz_python-project-blueprint
sonar.sources=blueprint
Beyond that, we can simply use the existing sonarcloud-github-action, which does all the work for us. All we have to do is provide two tokens: the GitHub one, which is available in the repository by default, and the SonarCloud one, which we got from the SonarCloud website. Note: the steps for obtaining and setting all the mentioned tokens and secrets are described in the README of the repository.
Conclusion
That's all! With these tools, configurations, and code, you're ready to customize and automate every aspect of your next Python project! If you need more information about the topics shown or discussed in this article, check out the documentation and code in my repository. If you have any suggestions or issues, please submit an issue in the repository, or just star this little project if you like it.