Removed stage from CDC data dockerfile to reduce image size (datacommonsorg#4818)

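* Previously, pip dependencies were installed into /workspace/venv in
Stage 2 and copied into Stage 3 twice: the COPY of /workspace/ already
included the venv, and a second COPY copied /workspace/venv again. The
dependencies were therefore stored in two layers, doubling their size
in the final image.
* Combined Stages 2 and 3 into a single stage that installs the
dependencies in place, eliminating the duplicated pip dependencies and
reducing the final image size.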
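The duplication described above comes from two COPY instructions in the runtime stage, not from the multi-stage layout itself: layers created in a builder stage are not shipped, only what is copied out of it. As a minimal sketch, here is that pattern next to one alternative fix that keeps a separate builder stage but copies the venv exactly once. This is not what this commit does (it merges the stages instead). The stage name builder and the requirements.txt path at the root of the build context are hypothetical assumptions.

```dockerfile
# Sketch only: the stage name "builder" and the build-context path
# requirements.txt are hypothetical, not taken from this repository.

# Builder stage: create the venv and install dependencies into it.
# Layers created here are NOT shipped in the final image.
FROM python:3.11.4-slim AS builder
WORKDIR /workspace
COPY requirements.txt .
RUN python -m venv /workspace/venv \
    && /workspace/venv/bin/pip install --no-cache-dir -r requirements.txt

# Runtime stage: every COPY here adds a layer to the final image.
FROM python:3.11.4-slim AS runner
WORKDIR /workspace

# Duplicating pattern (as in the removed Stage 3), shown commented out:
#   COPY --from=builder /workspace/ .
#       -> this layer already contains /workspace/venv
#   COPY --from=builder /workspace/venv /workspace/venv
#       -> this layer stores the same venv files a second time

# Copy the venv exactly once, to the same path it was created at,
# so the absolute interpreter paths inside the venv stay valid.
COPY --from=builder /workspace/venv /workspace/venv
ENV PATH="/workspace/venv/bin:$PATH"
```

Merging the stages, as this commit does, reaches the same result without any COPY step for the dependencies: they exist only in the RUN layers of the runtime stage.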
dwnoble authored Dec 26, 2024
1 parent 7eea9bf commit 76845fe
Showing 1 changed file with 13 additions and 16 deletions: build/cdc_data/Dockerfile
@@ -21,11 +21,17 @@ RUN mkdir -p /tmp/datcom-nl-models \
&& gsutil -m cp -R gs://datcom-nl-models/ft_final_v20230717230459.all-MiniLM-L6-v2/ /tmp/datcom-nl-models/


# #### Stage 2: Install python dependencies. ####
FROM python:3.11.4-slim AS dependencies-installer
# #### Stage 2: Python runtime. ####
FROM python:3.11.4-slim AS runner

ARG ENV
ENV ENV=${ENV}

WORKDIR /workspace

# Copy models
COPY --from=model-downloader /tmp/datcom-nl-models /tmp/datcom-nl-models

# Copy simple importer requirements.
COPY import/simple/requirements.txt ./import/simple/requirements.txt

@@ -40,26 +46,17 @@ ARG PIP_NO_CACHE_DIR=1
# Create a virtual env, add it to path, and install all requirements.
RUN python -m venv /workspace/venv
ENV PATH="/workspace/venv/bin:$PATH"

# TODO: Install requirements for embeddings importer and data importer in separate virtual envs.
# Install embeddings importer requirements.
RUN pip3 install -r ./import/simple/requirements.txt

# Install data requirements.
# Remove lancedb - it is not used by custom dc.
RUN sed -i'' '/lancedb/d' /workspace/nl_requirements.txt \
&& pip3 install torch==2.2.2 --extra-index-url https://download.pytorch.org/whl/cpu \
&& pip3 install -r ./tools/nl/embeddings/requirements.txt


# #### Stage 3: Runtime env. ####
FROM python:3.11.4-slim AS runner

ARG ENV
ENV ENV=${ENV}

WORKDIR /workspace

# Copy models and dependencies.
COPY --from=dependencies-installer /workspace/ .
COPY --from=dependencies-installer /workspace/venv /workspace/venv
COPY --from=model-downloader /tmp/datcom-nl-models /tmp/datcom-nl-models

# Copy the embeddings builder module.
COPY tools/nl/embeddings/. ./tools/nl/embeddings/
# Copy the shared module.
