Java containerization for modular PF4J applications

No, I couldn’t use jib.

Here’s a brief story about container optimization that came about due to frustration over long docker build times.

The existing software architecture

We’re dealing with a modular PF4J application whose containers contain (apologies for the alliteration):

an entrypoint JAR file;
exactly one plugin JAR.
The former interacts with Kafka and triggers the latter upon receiving records belonging to a specific topic:

services:
  one-of-the-plugins:
    image: ${ENTRYPOINT_JAR_IMAGE}
    command: [
      "--kafka_server", "${KAFKA_INTERNAL_ADDR}",
      "--plugin_list", "OPAL", # always one :/
      "--topic", "OPAL=input.topic.name",
      ...
    ]

It looks like the entrypoint JAR was originally meant to orchestrate multiple plugins in a non-dockerized environment, and shipping it with a single plugin was probably the fastest way of containerizing plugins without breaking things. Because of this setup, jib is not an option.

Let’s speed up the dev process!

I’m a big fan of building applications in Dockerfile stages, as it allows me to reach a higher level of automation during development (e.g., the docker-compose build service configuration, and skaffold for Kubernetes).

THE CONTAINERIZATION BEGINS

Unfortunately, jib is out of play due to the presence of PF4J, so we’ll have to dockerize JARs in order not to break things.
One could use the single-line mvn install command used by the CI:

#
# Stage 1a: building all the necessary JARs
#
FROM maven:3.8.1-jdk-11-slim AS build-jars
COPY . /home/app
RUN mvn --file /home/app/pom.xml install --projects server,:my-plugin --also-make
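But that would be a disaster for the Docker cache: COPY . invalidates its layer, and every layer after it, whenever any file in the repository changes, so each build would start from scratch.
A brute-force solution would be to build all Maven projects individually: that’s what we’re gonna do.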

SLOW AND STEADY WINS THE RACE

What does my plugin need, exactly?

$ mvn validate --projects :my-plugin --also-make
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] parent [pom]
[INFO] core [jar]
[INFO] plugin [pom]
[INFO] my-plugin [jar]
Two JARs, two POM-only projects, plus server, as the CI command says… roger!
Let’s build these entities following this order to make the most of the Docker cache:

#
# Stage 1a: building all the necessary JARs
#
FROM maven:3.8.1-jdk-11-slim AS build-jars

# Building parent
COPY pom.xml /home/app/
RUN mvn --file /home/app/pom.xml install --projects :parent

# Building core
COPY core/ /home/app/core
RUN mvn --file /home/app/pom.xml install --projects core

# Building server
COPY server/ /home/app/server
RUN mvn --file /home/app/pom.xml install --projects :server

# Building the generic plugin
RUN mvn --file /home/app/pom.xml install --projects plugin

# Building my plugin
COPY plugins/my-plugin/ /home/app/plugins/my-plugin
RUN mvn --file /home/app/pom.xml install --projects :my-plugin
If only it were that simple!
We now enter some Maven-specific madness: parent needs the pom.xml files of core, server, and all plugins!
To cover the first two:

COPY core/pom.xml /home/app/core/
COPY server/pom.xml /home/app/server/
But what about all the other plugins?
Copying the entire folder containing all plugins would defeat the purpose of this optimization since a change in our plugin’s source code would invalidate this (and all the following) layers!
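
To make the cache argument concrete, here's a minimal shell sketch. It only approximates what Docker does: the build cache reuses a COPY layer when the checksums of the copied files are unchanged, and the script imitates that idea with cksum over a throwaway folder. The layout and file names are hypothetical.

```shell
#!/bin/sh
# Approximation of a COPY cache key: a checksum over the contents of the
# files that would be copied. Not Docker's real algorithm, just the same idea.
copy_cache_key() {
    # $1: directory to "copy", $2: file name pattern to include
    (cd "$1" && find . -type f -name "$2" -exec cksum {} + | LC_ALL=C sort | cksum)
}

# Hypothetical layout with a single plugin.
plugins_dir=$(mktemp -d)
mkdir -p "$plugins_dir/my-plugin/src"
echo "<project/>" > "$plugins_dir/my-plugin/pom.xml"
echo "class Plugin {}" > "$plugins_dir/my-plugin/src/Plugin.java"

whole_before=$(copy_cache_key "$plugins_dir" "*")
poms_before=$(copy_cache_key "$plugins_dir" "pom.xml")

# A source-only change, as it normally happens during development.
echo "// tweak" >> "$plugins_dir/my-plugin/src/Plugin.java"

whole_after=$(copy_cache_key "$plugins_dir" "*")
poms_after=$(copy_cache_key "$plugins_dir" "pom.xml")

if [ "$whole_before" != "$whole_after" ]; then
    echo "whole folder: cache invalidated"
fi
if [ "$poms_before" = "$poms_after" ]; then
    echo "pom.xml files only: cache reused"
fi
```

Both messages get printed: the key over the whole folder changes with the source edit, while the key over the pom.xml files stays the same.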

What to do?
To follow, more madness.

AS PROMISED, MORE MADNESS

We want to copy all plugins’ pom.xml files without having to copy their source code too.
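Unluckily, Docker’s COPY instruction can’t do this on its own: it accepts wildcards, but it copies every match into the same destination and drops the per-plugin folder structure. We can use another stage to get around this: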

#
# Stage 0: layer with plugins’ pom.xml files only. Used for caching purposes.
#
FROM alpine:3.14.0 AS list-plugins-pom-files

# Copying the entirety of all plugins
COPY plugins /home

# Finding and removing non-pom.xml files
RUN find /home -mindepth 2 -maxdepth 2 ! -name "pom.xml" -print | xargs rm -rf
We then add the following to stage 1a:

COPY --from=list-plugins-pom-files /home/ /home/app/plugins/
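
The effect of that find command can be checked outside Docker. Here's a small sketch that runs the same pruning on a throwaway folder; the plugin names, and the assumption that the generic plugin's POM sits at the top of the plugins folder, are made up for the demonstration.

```shell
#!/bin/sh
# Same pruning idea as stage 0: keep only <plugin>/pom.xml, drop the rest.
prune_non_pom() {
    # $1: directory containing one sub-folder per plugin
    find "$1" -mindepth 2 -maxdepth 2 ! -name "pom.xml" -print | xargs rm -rf
}

# Hypothetical plugins folder, created only for this demonstration.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/my-plugin/src/main/java" "$demo_root/other-plugin/src"
touch "$demo_root/pom.xml"   # assumed location; depth 1, so -mindepth 2 leaves it alone
touch "$demo_root/my-plugin/pom.xml" "$demo_root/my-plugin/README.md"
touch "$demo_root/my-plugin/src/main/java/Plugin.java"
touch "$demo_root/other-plugin/pom.xml" "$demo_root/other-plugin/src/Other.java"

prune_non_pom "$demo_root"

# List what survived, relative to the demo folder.
remaining=$(cd "$demo_root" && find . -type f | LC_ALL=C sort)
echo "$remaining"
```

Only the three pom.xml files should be listed. Note that piping -print into xargs splits on whitespace, so this form relies on plugin folder names without spaces.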


Evaluation

I’ll build my-plugin twice: once at the beginning, and then after making a change exclusively in its source code, without affecting the other Maven projects it needs (as it normally happens during development).

I’ll run this experiment first on a pre-optimization commit, and then again on the optimized one.

Note that, for the first build of each experiment, I’ll do a mvn clean, a docker system prune, and a docker build with the --no-cache flag. I’ll also skip tests as they represent an equal overhead in both cases.
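
For anyone who wants to repeat the measurement, here's a rough timing helper. It's a sketch, not the tooling behind the numbers below: it has whole-second resolution only, and it times a stand-in command so that it runs without Docker or Maven.

```shell
#!/bin/sh
# Prints the wall-clock seconds a command took (whole-second resolution).
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Stand-in for a build so the sketch runs without Docker or Maven.
fake_build_time=$(elapsed_seconds sleep 1)
echo "stand-in build took ${fake_build_time}s"
```

For a real run, the stand-in would become something like `elapsed_seconds docker build --no-cache --tag my-plugin:bench .` for the first build, and the same command without `--no-cache` after touching the plugin's source. The tag and the build context are placeholders.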

Results

Pre-optimization build times (skipping tests):

First build (--no-cache): 2min 3s
After a change in the plugin’s source code: 1min 44s


Post-optimization build times (skipping tests):

First build (--no-cache): 2min 44s
After a change in the plugin’s source code: 21.5s

Conclusions

The first build gets slower, 122.9 seconds vs. 163.9 seconds (41 seconds, or about 33% more): this might be because the post-optimization Dockerfile uses many more layers than the pre-optimization one.

This is justified, however, by the time saved on subsequent builds:

|              | Pre-optimization | Post-optimization |
|--------------|------------------|-------------------|
| First build  | 122.9s           | 163.9s            |
| Second build | 103.6s           | 21.5s             |
| Time saved   | 19.3s            | 142.4s            |

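Compared with a cold build, the second build saved only (122.9 - 103.6)s = 19.3s without optimization, and (163.9 - 21.5)s = 142.4s with it.
That’s 2 minutes and 22 seconds saved over a cold build every time one wants to containerize the plugin after making a change exclusively in its source code. Compared with the pre-optimization rebuild, each rebuild is (103.6 - 21.5)s = 82.1s faster.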

Not bad!
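
The arithmetic is easy to re-check in a few lines of shell. The numbers are the measured build times from the table; the variable and function names are just for illustration.

```shell
#!/bin/sh
# Build times in seconds, copied from the results above.
pre_first=122.9
pre_second=103.6
post_first=163.9
post_second=21.5

# Prints a - b with one decimal; awk is used because POSIX sh has no floats.
minus() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a - b }'
}

saved_pre=$(minus "$pre_first" "$pre_second")
saved_post=$(minus "$post_first" "$post_second")
rebuild_gain=$(minus "$pre_second" "$post_second")
cold_build_cost=$(minus "$post_first" "$pre_first")

echo "saved by the cache, pre-optimization:  ${saved_pre}s"
echo "saved by the cache, post-optimization: ${saved_post}s"
echo "rebuild speed-up over the old Dockerfile: ${rebuild_gain}s"
echo "extra cost of the first build: ${cold_build_cost}s"
```

The last two figures show the trade-off: the first build costs 41.0s more, while every rebuild is 82.1s faster than before, so the optimization pays for itself at the first rebuild.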

A VERY LAST, VERY STUPID MISTAKE

The CI takes almost 10 minutes to package my application, do a docker build, push it to the registry, etc., in different steps: what the heck?
Ok, it runs tests, but still…

TL;DR

It wasn’t using BuildKit.

Explanation

To maintain backward compatibility, the CI docker build command should expect already-existing JARs, packaged in a previous step.
That’s easily achievable using a different last stage… so what’s the matter?

It turns out that the classic docker build --target package-ci was executing all stages, including the unused ones, no matter what.

Don’t forget to enable BuildKit with DOCKER_BUILDKIT=1!
It now only takes 5 minutes (tests included)… phew.
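
To avoid repeating this mistake, a CI step could refuse to build unless BuildKit is explicitly enabled. Here's a sketch under a few assumptions: the function names are hypothetical, the build context is taken to be the current folder, and the command is printed rather than executed so the snippet runs without a Docker daemon.

```shell
#!/bin/sh
# Succeeds only when BuildKit has been explicitly enabled via the environment.
require_buildkit() {
    if [ "${DOCKER_BUILDKIT:-}" = "1" ]; then
        return 0
    fi
    echo "DOCKER_BUILDKIT=1 is not set: the classic builder might be used" >&2
    return 1
}

# Prints the build command instead of running it, so the sketch works
# without a Docker daemon. The "." build context is an assumption.
ci_build_command() {
    require_buildkit || return 1
    echo "docker build --target package-ci ."
}

with_buildkit=$(DOCKER_BUILDKIT=1 ci_build_command)
echo "$with_buildkit"
```

In a real pipeline the equivalent one-liner would be `DOCKER_BUILDKIT=1 docker build --target package-ci .`, so that only the stages package-ci depends on get built.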

Sebastian
