Install text-generation-server from poetry.lock export #2786

Status: Open — wants to merge 5 commits into base: main
Dockerfile — 23 changes: 20 additions & 3 deletions
@@ -174,7 +174,7 @@ COPY server/Makefile-flashinfer Makefile
RUN make install-flashinfer

# Text Generation Inference base image
-FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS base
+FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS conda-install

# Conda env
ENV PATH=/opt/conda/bin:$PATH \
@@ -198,6 +198,23 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-ins
# Copy conda with PyTorch installed
COPY --from=pytorch-install /opt/conda /opt/conda

+# Export text-generation-server Python requirements from poetry lock file
+FROM conda-install AS poetry-requirements
+
+COPY server/poetry.lock poetry.lock
+COPY server/pyproject.toml pyproject.toml
+
+RUN pip install poetry && \
+    poetry self add poetry-plugin-export && \
+    poetry export -f requirements.txt \
+    --extras "attention bnb accelerate compressed-tensors marlin moe quantize peft outlines" \
+    --output requirements_poetry.txt
+
+FROM conda-install AS base
+
+# Copy the requirements file generated from the poetry lock
+COPY --from=poetry-requirements /usr/src/requirements_poetry.txt server/requirements_poetry.txt
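For reference, the export stage introduced above can be reproduced outside Docker to inspect exactly what gets pinned (a sketch, not part of the PR; it assumes you run it from a `server/` checkout containing `poetry.lock` and `pyproject.toml`):

```shell
# Sketch: reproduce the poetry export step locally, mirroring the
# Dockerfile's poetry-requirements stage (requires network access)
pip install poetry
poetry self add poetry-plugin-export
poetry export -f requirements.txt \
    --extras "attention bnb accelerate compressed-tensors marlin moe quantize peft outlines" \
    --output requirements_poetry.txt

# Each exported line carries an exact version pin taken from poetry.lock,
# which is what makes later `pip install -r` reproducible
head requirements_poetry.txt
```

Inspecting the exported file is a quick way to confirm the lock is actually driving the install, which is the point of this PR.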

# Copy build artifacts from flash attention builder
COPY --from=flash-att-builder /usr/src/flash-attention/build/lib.linux-x86_64-cpython-311 /opt/conda/lib/python3.11/site-packages
COPY --from=flash-att-builder /usr/src/flash-attention/csrc/layer_norm/build/lib.linux-x86_64-cpython-311 /opt/conda/lib/python3.11/site-packages
@@ -233,7 +250,8 @@ COPY server/Makefile server/Makefile
RUN cd server && \
make gen-server && \
pip install -r requirements_cuda.txt && \
-    pip install ".[attention, bnb, accelerate, compressed-tensors, marlin, moe, quantize, peft, outlines]" --no-cache-dir && \
+    pip install -r requirements_poetry.txt --no-cache-dir && \
Collaborator commented:
What's the issue again? Why are we not using poetry directly?

Also, maybe we should take the opportunity to switch to uv? https://docs.astral.sh/uv/pip/compile/#locking-requirements I have no idea, but it seems like the poetry lock isn't locking tightly enough, maybe?

Member (Author) replied:

Sorry for the delay on this PR, and happy new year BTW! So the issue is that since we're not using poetry as the pip installer, the locked dependencies are not used, leading to potential issues when partners such as Google build the containers ahead of time, because those dependencies are installed from the pyproject.toml rather than from the lock.

I know this is not an issue impacting TGI directly per se, but for robustness and reproducibility we should lock those dependencies and install them from a lock file. The installation from the lock can be done either via poetry (the current approach) or by switching to uv and installing from the uv.lock.

Maybe we should start an internal conversation and decide what's best; happy to help with whatever the solution / proposal is!
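For comparison, the uv alternative mentioned in this thread might look roughly like the following (a hypothetical sketch, not something this PR implements; the extras list is copied from the Dockerfile, and the output filename `requirements_uv.txt` is made up for illustration):

```shell
# Hypothetical uv-based equivalent of the poetry export step
# (requires network access and a checkout containing pyproject.toml)
pip install uv
uv pip compile pyproject.toml \
    --extra attention --extra bnb --extra accelerate \
    --extra compressed-tensors --extra marlin --extra moe \
    --extra quantize --extra peft --extra outlines \
    -o requirements_uv.txt
```

The compiled file could then be installed with `pip install -r requirements_uv.txt`, analogous to how this PR installs the poetry export.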

pip install . --no-cache-dir && \
pip install nvidia-nccl-cu12==2.22.3

ENV LD_PRELOAD=/opt/conda/lib/python3.11/site-packages/nvidia/nccl/lib/libnccl.so.2
@@ -258,7 +276,6 @@ COPY --from=builder /usr/src/target/release-opt/text-generation-router /usr/loca
# Install launcher
COPY --from=builder /usr/src/target/release-opt/text-generation-launcher /usr/local/bin/text-generation-launcher


# AWS Sagemaker compatible image
FROM base AS sagemaker
