
Docker builds, images and tags

The Apache Superset community extensively uses Docker for development, release, and productionizing Superset. This page details our Docker builds and tag naming schemes to help users navigate our offerings.

Images are built and pushed to the Superset Docker Hub repository using GitHub Actions. Different sets of images are built and/or published at different times:

  • Published releases (release): tagged with the release version (e.g. 3.0.0) and, for the newest release, with latest.
  • Pull request iterations (pull_request): each pull request triggers a Docker build to validate that the image still builds, but for security reasons these images are not published; the workflow simply runs docker build --load.
  • Merges to the main branch (push): each merge produces a new commit SHA and images tagged with that SHA; the most recent build of master is also published under tags prefixed with master (e.g. master, master-dev).
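The publishing rules above can be restated as a small lookup. The following is a hypothetical bash sketch, not code from build_docker.py or docker.yml: the function name superset_tags_for_event is illustrative, it covers only lean images, and it assumes the release being built is the newest one (so it also receives latest).

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not build_docker.py): which lean-image tags each
# GitHub event publishes, following the rules described on this page.
superset_tags_for_event() {
  local event="$1" ref="$2" # ref: release version (e.g. 3.0.0) or commit SHA
  case "$event" in
    release)
      # Version tag plus "latest" (assumes this is the newest release)
      printf '%s latest\n' "$ref"
      ;;
    push)
      # Merge to master: SHA tag plus the moving "master" tag
      printf '%s master\n' "$ref"
      ;;
    pull_request)
      # Built with --load for validation only; nothing is published
      ;;
    *)
      echo "unknown event: $event" >&2
      return 1
      ;;
  esac
}
```

For example, superset_tags_for_event release 3.0.0 prints 3.0.0 latest, while a pull_request event prints nothing because those images stay local to the CI runner.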

Build presets

We maintain a set of build "presets", each representing a combination of build parameters. Most presets select a different target layer of the Dockerfile and/or a different base image.

Here are the build presets that are exposed through the build_docker.py script:

  • lean: The default Docker image, including both frontend and backend. Tags without a build_preset suffix are lean builds (e.g. latest, 4.0.0, 3.0.0, ...). lean builds do not contain database drivers, meaning you need to install your own. That applies to analytics databases AND the metadata database. You'll likely want to add either mysqlclient or psycopg2-binary, depending on the metadata database you choose for your installation, plus the drivers required to connect to your analytics database(s).
  • dev: For development, with a headless browser, dev-related utilities and root access. This includes some commonly used database drivers like mysqlclient and psycopg2-binary, plus some others used for development/CI.
  • py311 (an example of a Python-version preset): Similar to lean but with a different Python version (3.11 in this case).
  • ci: For certain CI workloads.
  • websocket: For Superset clusters supporting advanced features.
  • dockerize: Used by Helm.
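Since most presets select a Dockerfile target layer, building one locally roughly amounts to passing that target to docker build. This hypothetical sketch only prints the command; it assumes the Dockerfile defines stages named lean, dev and ci, uses an illustrative image name (superset-local), and omits the tags, build args and cache settings that build_docker.py adds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print (not run) a local build command for a preset.
# Assumes the Dockerfile has stages named after the lean, dev and ci presets.
preset_build_cmd() {
  local preset="$1"
  case "$preset" in
    lean | dev | ci)
      printf 'docker build --target %s -t superset-local:%s .\n' "$preset" "$preset"
      ;;
    *)
      # Other presets (py311, websocket, dockerize) need a different base
      # image or build setup; not covered by this sketch
      echo "preset not covered by this sketch: $preset" >&2
      return 1
      ;;
  esac
}
```

For example, preset_build_cmd dev prints docker build --target dev -t superset-local:dev . which you can inspect before running it from the repository root.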

Key tags examples

  • latest: The latest official release build.
  • latest-dev: The -dev image of the latest official release build, with a headless browser and root access.
  • master: The latest build from the master branch, implicitly using the lean build preset.
  • master-dev: Similar to master but includes a headless browser and root access.
  • pr-5252: The build of the latest commit in PR 5252.
  • 30948dc401b40982cb7c0dbf6ebbe443b2748c1b-dev: A dev build for this specific commit SHA, which could come from a master merge or a release.
  • websocket-latest: The WebSocket image for use in a Superset cluster.
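The examples above suggest a simple naming pattern: lean adds nothing to the base tag, dev appends -dev, and websocket uses a websocket- prefix. The hypothetical bash helper below captures just that pattern; its name is illustrative, and presets whose naming isn't shown above are rejected rather than guessed.

```bash
#!/usr/bin/env bash
# Hypothetical helper reproducing the tag examples above:
# lean -> <ref>, dev -> <ref>-dev, websocket -> websocket-<ref>.
superset_tag() {
  local ref="$1" preset="${2:-lean}"
  case "$preset" in
    lean) printf '%s\n' "$ref" ;;
    dev) printf '%s-dev\n' "$ref" ;;
    websocket) printf 'websocket-%s\n' "$ref" ;;
    *)
      # Naming for other presets is not documented here, so don't guess
      echo "unknown preset: $preset" >&2
      return 1
      ;;
  esac
}
```

Combined with the apache/superset Docker Hub repository, docker pull "apache/superset:$(superset_tag master dev)" would pull the master-dev image.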

To understand or modify the build matrix and tagging conventions, see the build_docker.py script and the docker.yml GitHub Actions workflow.

Key ARGs in Dockerfile

  • BUILD_TRANSLATIONS: whether to build the translations into the image. When translations are not built, the frontend build tells webpack to strip out all locales other than en from the moment-timezone library, and the backend build skips compiling the *.po translation files.
  • DEV_MODE: whether to skip the frontend build. Our docker-compose dev setup uses this: it mounts the local source tree as a volume and runs webpack in --watch mode inside a container dedicated to that purpose, so the frontend is rebuilt continuously as you edit files locally. Skipping the frontend build in the image makes the initial docker-compose build much faster and lighter on resources.
  • INCLUDE_CHROMIUM: whether to include Chromium in the backend build so that it can be used as a headless browser for "Alerts & Reports" and thumbnail generation.
  • INCLUDE_FIREFOX: same as above, but for Firefox.
  • PY_VER: specifies the base image for the Python backend. We don't recommend changing it unless you're working on forward or backward compatibility.
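When building locally, these ARGs are passed with docker build --build-arg NAME=VALUE. The hypothetical helper below turns NAME=VALUE pairs into flags and rejects names outside the list on this page, which catches typos such as INCLUDE_CHROME. This page only lists the key ARGs, so the Dockerfile may accept others, and the true/false values in the usage example are an assumption to check against the Dockerfile.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn NAME=VALUE pairs into --build-arg flags, rejecting
# names outside the key ARGs listed on this page (catches typos early).
build_arg_flags() {
  local kv name out=""
  for kv in "$@"; do
    name="${kv%%=*}" # part before the first "="
    case "$name" in
      BUILD_TRANSLATIONS | DEV_MODE | INCLUDE_CHROMIUM | INCLUDE_FIREFOX | PY_VER) ;;
      *)
        echo "unknown build arg: $name" >&2
        return 1
        ;;
    esac
    out="$out --build-arg $kv"
  done
  printf '%s\n' "${out# }" # drop the leading space
}
```

For example: docker build $(build_arg_flags DEV_MODE=true BUILD_TRANSLATIONS=false) --target dev . The unquoted expansion relies on word splitting, so values must not contain spaces.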

Caching

To accelerate builds, we follow Docker best practices and use the apache/superset-cache repository as a registry-backed layer cache.
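Docker BuildKit can use an image repository as a build cache through the --cache-from and --cache-to options of docker buildx build with type=registry. The sketch below shows that general shape against apache/superset-cache; the per-preset cache tag is a hypothetical naming choice, not necessarily the one our workflow uses. Writing to the cache requires push access, and exporting a registry cache typically needs a buildx builder using the docker-container driver.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: docker buildx registry-cache flags for a preset.
# The "apache/superset-cache:<preset>" tag layout is an assumption.
cache_flags() {
  local preset="$1" push="${2:-no}"
  local ref="apache/superset-cache:${preset}"
  local flags="--cache-from=type=registry,ref=${ref}"
  if [ "$push" = "yes" ]; then
    # mode=max also caches intermediate stages; needs push access to the repo
    flags="$flags --cache-to=type=registry,mode=max,ref=${ref}"
  fi
  printf '%s\n' "$flags"
}
```

A read-only local build could then run docker buildx build $(cache_flags lean) --target lean . and reuse whatever layers the cache already holds.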

About database drivers

Our Docker images come with little or no database driver support, since each environment requires different drivers, and maintaining a build with wide database support would be both challenging (dozens of databases, Python drivers, and OS dependencies) and inefficient (longer build times, larger images, lower layer cache hit rate, ...).

For production use cases, we recommend that you derive your own image from our lean image(s) and add support for the databases you need.
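A derived image can be as small as one Dockerfile layered on a lean tag. The hypothetical helper below prints such a Dockerfile; it assumes the image's runtime user is named superset and that pip is on the PATH, which may not hold for every tag (newer images may install Python packages differently), so check the image you are extending.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a Dockerfile that adds Python database drivers on
# top of a lean apache/superset tag. Assumes the image's runtime user is named
# "superset" and that pip is on PATH; verify both for the tag you extend.
derived_dockerfile() {
  local base_tag="$1"
  shift
  printf 'FROM apache/superset:%s\n' "$base_tag"
  printf 'USER root\n' # installing into system site-packages needs root
  printf 'RUN pip install --no-cache-dir %s\n' "$*"
  printf 'USER superset\n' # drop back to the unprivileged user
}
```

For instance, derived_dockerfile 4.0.0 psycopg2-binary | docker build -t my-superset - builds it without a context directory, since docker build reads the Dockerfile from stdin. Drivers such as mysqlclient also need OS packages (a compiler and client headers), which would require an extra apt-get step before the pip install.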

On supporting different platforms (namely arm64 AND amd64)

Currently, all automated builds are multi-platform, supporting both linux/arm64 and linux/amd64. This enables higher-level tools such as Helm and Docker Compose to point to these images and be multi-platform as well.

Pull request and master builds produce one image per platform, so that platforms can be built in parallel. Their build matrix is also sparser: we don't need to build every build preset on every platform and can generally be more selective. For these builds, tags carry a -arm suffix where it applies.
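To pick the matching per-platform image of a pull request or master build on a given machine, a script could map uname -m to a tag. This hypothetical sketch assumes the -arm suffix is appended to the end of the full tag (e.g. master-dev-arm); verify the actual tags on Docker Hub before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: choose the per-platform tag of a PR/master build for
# this machine. Appending "-arm" to the end of the tag is an assumption.
per_platform_tag() {
  local tag="$1" arch="${2:-$(uname -m)}"
  case "$arch" in
    arm64 | aarch64) printf '%s-arm\n' "$tag" ;;
    x86_64 | amd64) printf '%s\n' "$tag" ;;
    *)
      echo "unsupported architecture: $arch" >&2
      return 1
      ;;
  esac
}
```

Passing the architecture explicitly (per_platform_tag master-dev aarch64) makes the mapping easy to check without an ARM machine.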

Working with Apple silicon

Apple's current generation of computers uses ARM-based CPUs, and Docker running on Macs seems to require linux/arm64/v8 (at least one user's M2 was configured that way). Setting the environment variable DOCKER_DEFAULT_PLATFORM to linux/amd64 appears to work for running and building upon the Superset images provided here.

export DOCKER_DEFAULT_PLATFORM=linux/amd64

Presumably, linux/arm64/v8 would be more optimized for this generation of chips, but less compatible across the ARM ecosystem.
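If you share shell configuration between Intel and Apple silicon machines, the workaround can be applied conditionally. A minimal sketch, assuming uname reports Darwin and arm64 on Apple silicon (a shell running under Rosetta reports x86_64 instead); the function name maybe_force_amd64 is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: apply the DOCKER_DEFAULT_PLATFORM workaround only on Apple silicon.
# OS and architecture default to uname output but can be passed in for testing.
maybe_force_amd64() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    export DOCKER_DEFAULT_PLATFORM=linux/amd64
  fi
}
```

Calling maybe_force_amd64 from your shell startup file sets the variable on Apple silicon Macs and leaves other machines untouched.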