This tutorial series will show you how to build an end-to-end data flywheel for Large Language Models (LLMs).
As a running example, we will be summarising arXiv abstracts.
You will learn how to:
- Build a training set with GPT-4 or GPT-3.5 (an example record is sketched after this list)
- Fine-tune an open-source LLM
- Create a set of evals for the model
- Collect human feedback to improve the model
- Deploy the model to an inference endpoint
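
To make the first step concrete, a training example for the summarisation task could be as simple as a JSONL record pairing an abstract with a GPT-written summary. The field names below are illustrative assumptions, not the schema this repo actually uses:

```python
# Hypothetical training record for the summarisation task.
# Field names ("abstract", "summary", "model") are illustrative only.
import json

record = {
    "abstract": "We present a method for ...",        # the full arXiv abstract
    "summary": "A one-sentence TL;DR of the paper.",  # teacher-model output
    "model": "gpt-3.5-turbo",                         # which teacher produced it
}

# Append the record to a JSONL file, one example per line.
with open("summaries.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```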
We will use the following tools:
- wandb for experiment tracking; this is where we will record all our artifacts (datasets, models, code) and metrics (see the logging sketch after this list)
- modal for running jobs in the cloud
- huggingface for all things LLM
- argilla for labelling our data
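
As a sketch of how the generated dataset could be recorded in wandb, the snippet below logs it as a versioned artifact; the project and artifact names are placeholders, not the ones used later in the series:

```python
import wandb

# Minimal sketch: record the generated dataset as a versioned W&B artifact
# so later steps (fine-tuning, evals) can pull the exact same copy.
# "llm-flywheel" and "arxiv-summaries" are placeholder names.
run = wandb.init(project="llm-flywheel", job_type="build-dataset")

artifact = wandb.Artifact("arxiv-summaries", type="dataset")
artifact.add_file("summaries.jsonl")
run.log_artifact(artifact)

run.finish()
```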
In this tutorial, we will use GPT-3.5 to generate a training set for the summarisation task. To build the dataset, run:

`modal run src/llm_stack/scripts/build_dataset_summaries.py`
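
For intuition, here is a minimal sketch of what such a Modal job could look like. It is an assumption about the script's general shape, not the contents of `build_dataset_summaries.py`; the app name, secret name, prompt, and placeholder abstracts are all made up for illustration.

```python
import modal

app = modal.App("build-dataset-summaries")  # placeholder app name
image = modal.Image.debian_slim().pip_install("openai")


@app.function(image=image, secrets=[modal.Secret.from_name("openai-secret")])
def summarise(abstract: str) -> str:
    """Ask GPT-3.5 for a one-sentence summary of a single arXiv abstract."""
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY injected by the Modal secret
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Summarise the abstract in one sentence."},
            {"role": "user", "content": abstract},
        ],
    )
    return response.choices[0].message.content


@app.local_entrypoint()
def main():
    # Placeholder input; the real script would load abstracts from arXiv data.
    abstracts = ["We present a method for ..."]
    for summary in summarise.map(abstracts):
        print(summary)
```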
Found any mistakes or want to contribute? Feel free to open a PR or an issue.