“How to Fine-Tune LLMs in 2024 with Hugging Face”, but with Dagster, Modal and Llama3 code
truskovskiyk committed Apr 22, 2024
1 parent 9d61b4e commit 15158fa
Showing 14 changed files with 436 additions and 1 deletion.
1 change: 1 addition & 0 deletions .dockerignore
@@ -0,0 +1 @@
duckdb-text2sql-codellama
45 changes: 45 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,45 @@
name: Publish Docker image

on:
  push:
    branches:
      - main
      - migrate-to-github-registry-for-docker-images

jobs:
  container:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Log in to the Container registry
        uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
        with:
          images: ghcr.io/kyryl-opens-ml/fine-tune-llm-in-2024

      # See explanation: https://github.com/orgs/community/discussions/25678
      - name: Clean disk
        run: |
          rm -rf /opt/hostedtoolcache
      - name: Build and push Docker image
        uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}

16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM huggingface/transformers-pytorch-gpu:4.35.2

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
RUN MAX_JOBS=4 pip3 install flash-attn==2.5.7 --no-build-isolation

ENV DAGSTER_HOME=/app/dagster_data
RUN mkdir -p $DAGSTER_HOME

ENV PYTHONPATH=/app
RUN ln -s /usr/bin/python3 /usr/bin/python

COPY text2sql_training text2sql_training
CMD dagster dev -f text2sql_training/llm_stf.py -p 3000 -h 0.0.0.0
11 changes: 11 additions & 0 deletions Makefile
@@ -0,0 +1,11 @@
style_check:
	ruff check text2sql_training/

style_fix:
	ruff format text2sql_training/

docker_build:
	docker build -t fine-tune-llm-in-2024:latest -f Dockerfile .

docker_run:
	docker run -it --gpus all --ipc=host --net=host -v $(CURDIR):/app fine-tune-llm-in-2024:latest /bin/bash
31 changes: 30 additions & 1 deletion README.md
@@ -1 +1,30 @@
# fine-tune-llms-in-2024-with-trl

Full code for [“How to Fine-Tune LLMs in 2024 with Hugging Face”, but with Dagster, Modal and Llama3!](https://kyrylai.com/2024/04/21/how-to-fine-tune-llms-in-2024-with-hugging-face-but-with-dagster-and-modal/) blog post.


## TL;DR: Training Llama3 with Dagster and Modal


![alt text](./docs/final.png)



## Docker setup

```
make docker_build
make docker_run
```

## Required access

Make sure you have a `.env` file with the following variables:

```
HF_TOKEN=hf_
HF_TOKEN_WRITE=hf_
MODAL_TOKEN_ID=ak-
MODAL_TOKEN_SECRET=as-
```
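
Before starting a run, it can help to confirm that all four variables are actually set. The snippet below is a minimal sketch using only the standard library. The variable names come from the list above, while the helper names (`parse_env_file`, `missing_vars`, `check_env_file`) and the assumption that `.env` sits in the repository root are illustrative and not part of this repository.

```python
import os
from pathlib import Path

# Names taken from the README; everything else in this sketch is hypothetical.
REQUIRED_VARS = ("HF_TOKEN", "HF_TOKEN_WRITE", "MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return the required names that are absent or empty, in README order."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


def check_env_file(path: str = ".env") -> list[str]:
    """Combine the .env file with the process environment and report gaps."""
    env_path = Path(path)  # assumed location: the repository root
    file_values = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    # Variables already exported in the shell take precedence over the file.
    exported = {k: os.environ[k] for k in REQUIRED_VARS if k in os.environ}
    return missing_vars({**file_values, **exported})
```

Calling `check_env_file()` from the repository root returns an empty list when every variable is available, and otherwise the names that still need a value.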

Binary file added docs/data.png
Binary file added docs/final.png
Binary file added docs/model.png
Binary file added docs/training-data.png
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -0,0 +1,8 @@
[tool.ruff]
line-length = 120

[tool.ruff.lint]
# Add the `line-too-long` rule to the enforced rule set. By default, Ruff omits rules that
# overlap with the use of a formatter, like Black, but we can override this behavior by
# explicitly adding the rule.
extend-select = ["E501"]
14 changes: 14 additions & 0 deletions requirements.txt
@@ -0,0 +1,14 @@
torch==2.1.0
transformers==4.38.2
datasets==2.16.1
accelerate==0.26.1
evaluate==0.4.1
bitsandbytes==0.42.0
trl==0.7.11
peft==0.8.2
dagster==1.7.1
dagster-webserver==1.7.1
ipython==8.12.3
modal==0.62.97
packaging==23.2
ninja==1.11.1.1
Empty file added text2sql_training/__init__.py