Optimize the Docker file - fixes #1 #153

darknetehf · 2024-11-25T02:21:29Z

Hello,

This is an attempt to optimize the Docker file.
I noticed that the base image webrecorder/browsertrix-crawler weighs 2.44 GB, which is already a lot, and the resulting auto-archiver image adds about 1.85 GB on top of that (4.29 GB on my end). There is clearly some bloat.

I think we could improve this by using a multi-stage build and removing some unwanted layers.

The proposed version should be functionally equivalent. The image size is 3.33 GB on my end, about 0.96 GB (roughly 22%) smaller than before.
One stage builds the virtual environment, which is then copied into the next stage. This means we can ditch pipenv and run Python directly from the virtual environment. See the adapted entry point.

Another benefit of the multi-stage build is that when only the Python code changes, the earlier layers do not need to be rebuilt unless the requirements list has also changed. Thanks to layer caching, generating a new Docker image will be quicker.
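
For readers who want a picture of the layout described above, here is a minimal sketch of a two-stage Dockerfile. It is not the file from this PR. The paths (`/app`, `/app/.venv`, `./src/`), the `requirements.txt` file name, and the `auto_archiver` module name are assumptions made for illustration, and it also assumes the base image ships `python3` with the `venv` module.

```dockerfile
# Stage 1: build the virtual environment.
# All paths and file names below are hypothetical.
FROM webrecorder/browsertrix-crawler AS build

WORKDIR /app
# Copy only the dependency list first, so this layer and the install
# step below stay cached until requirements.txt itself changes.
COPY requirements.txt .
RUN python3 -m venv /app/.venv \
    && /app/.venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime image.
FROM webrecorder/browsertrix-crawler AS runtime

WORKDIR /app
# Bring over the finished virtual environment at the same path it was
# created at, since venv scripts embed absolute paths.
COPY --from=build /app/.venv /app/.venv
# Application code is copied last: editing it invalidates only the
# layers from this point on.
COPY ./src/ .

# Call the venv interpreter directly instead of going through pipenv.
# "auto_archiver" is an assumed module name.
ENTRYPOINT ["/app/.venv/bin/python3", "-m", "auto_archiver"]
```

Both stages use the same base image here, so the interpreter the virtual environment points to exists in the runtime stage as well.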

However, I need to stress that this has not been thoroughly tested and should be treated as a PoC to be validated before going to production.

Optimize the Docker file - fixes #1

16d47c1
