Optimize the Docker file - fixes #1 #153
Open
Hello,
This is an attempt to optimize the Dockerfile.
I noticed that the base image webrecorder/browsertrix-crawler weighs 2.44 GB, which is already a lot, and the resulting auto-archiver image is nearly twice that size (4.29 GB on my end). There is clearly some bloat.
I think we could improve this with multi-stage builds and remove some unwanted layers.
The proposed version should be functionally equivalent. The resulting image is 3.33 GB on my end, down from 4.29 GB.
One stage builds the virtual environment, which is then copied into the next stage. As a result, we can drop pipenv and run Python directly; see the adapted entry point.
Another benefit of the multi-stage build is that when only the Python code changes, the earlier layers do not need to be rebuilt unless the requirements list has changed. Thanks to layer caching, generating a new Docker image will be quicker.
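For readers who want the general shape without opening the diff, here is a minimal sketch of the multi-stage layout described above. It is not the exact Dockerfile from this PR: the image tag, the `requirements.txt` file, the `/app` paths, the `src/` directory, and the `auto_archiver` module name are all assumptions made for illustration, and it assumes the base image ships `python3` with the `venv` module.

```dockerfile
# Illustrative sketch only; paths, file names and the module name are assumptions.

# Stage 1: build the virtual environment.
FROM webrecorder/browsertrix-crawler:latest AS build
WORKDIR /app
# Assumes python3 and its venv module are available in the base image.
RUN python3 -m venv /app/.venv
# Hypothetical requirements file (for example, exported from the Pipfile).
COPY requirements.txt .
RUN /app/.venv/bin/pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime image, which receives only the finished environment.
FROM webrecorder/browsertrix-crawler:latest AS runtime
WORKDIR /app
# Same path as in the build stage, so the scripts inside the venv keep working.
COPY --from=build /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
# Source code is copied last, so a code-only change reuses the cached layers above.
COPY src/ ./src/
# Hypothetical entry point: Python is called directly, without pipenv.
ENTRYPOINT ["python3", "-m", "auto_archiver"]
```

With this ordering, the dependency install step is only re-run when `requirements.txt` changes, while edits under `src/` invalidate just the final `COPY` layer.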
However, I need to stress that this has not been tested thoroughly and should be treated as a proof of concept, to be validated before going to production.