cleaned up notebook (#28)
GokuMohandas authored Sep 1, 2023
1 parent 500a1f6 commit 428e239
Showing 127 changed files with 110,292 additions and 40,214 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -17,7 +17,7 @@ repos:
rev: v1.4.0
hooks:
- id: detect-secrets
exclude: "notebooks"
exclude: "notebooks|experiments"
- repo: local
hooks:
- id: clean
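
The widened `exclude` value is a regular expression. pre-commit applies `exclude` patterns to file paths with Python's `re.search`, so the alternation skips any path that contains either directory name. A minimal sketch of that matching follows; the helper `is_excluded` is hypothetical and only illustrates how the two patterns behave, it is not part of pre-commit or this repository:

```python
import re

# Patterns taken from the old and new `exclude` values in the hunk above.
OLD_EXCLUDE = re.compile(r"notebooks")
NEW_EXCLUDE = re.compile(r"notebooks|experiments")


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    """Return True if the pattern matches anywhere in the path (re.search semantics)."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    paths = [
        "notebooks/rag.ipynb",
        "experiments/responses/gpt-4-with-source.json",
        "app/config.py",
    ]
    for path in paths:
        print(path, is_excluded(OLD_EXCLUDE, path), is_excluded(NEW_EXCLUDE, path))
```

Because the match is unanchored, a path such as `docs/experiments.md` would also be skipped; anchoring with `^experiments/` would be one way to narrow it if that matters.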
147 changes: 37 additions & 110 deletions README.md
@@ -1,25 +1,27 @@
# LLM Applications

An end-to-end guide for scaling and serving LLM applications in production.

This repo currently contains one such application: a retrieval-augmented generation (RAG)
app for answering questions about supplied information. By default, the app uses
the [Ray documentation](https://docs.ray.io/en/master/) as the source of information.
This app first [indexes](./app/index.py) the documentation in a vector database
and then uses an LLM to generate responses for questions that got augmented with
relevant info retrieved from the index.
An end-to-end guide for scaling and serving LLM applications in production. This repo currently contains one such application: a retrieval-augmented generation (RAG) app for answering questions about supplied information.

## Setup

### API keys
We'll be using [OpenAI](https://platform.openai.com/docs/models/) to access ChatGPT models like `gpt-3.5-turbo`, `gpt-4`, etc. and [Anyscale Endpoints](https://endpoints.anyscale.com/) to access OSS LLMs like `Llama-2-70b`. Be sure to create your accounts for both and have your credentials ready.

### Compute
- Start a new [Anyscale workspace on staging](https://console.anyscale-staging.com/o/anyscale-internal/workspaces)
using a [`g3.8xlarge`](https://instances.vantage.sh/aws/ec2/g3.8xlarge) head node on an AWS cloud.
- Start a new [Anyscale workspace on staging](https://console.anyscale-staging.com/o/anyscale-internal/workspaces) using a [`g3.8xlarge`](https://instances.vantage.sh/aws/ec2/g3.8xlarge) head node (you can also add GPU worker nodes to run the workloads faster).
- Use the [`default_cluster_env_2.6.2_py39`](https://docs.anyscale.com/reference/base-images/ray-262/py39#ray-2-6-2-py39) cluster environment.
- Use the `us-east-1` region if you'd like to use the artifacts in our shared storage (source docs, vector DB dumps, etc.).

### Repository
```bash
git clone https://github.com/ray-project/llm-applications.git . # git checkout -b goku origin/goku
git config --global user.name <GITHUB-USERNAME>
git config --global user.email <EMAIL-ADDRESS>
```

First, clone this repository.

### Data
Our data is already available at `/efs/shared_storage/goku/docs.ray.io/en/master/` (on Staging, `us-east-1`). To load it yourself, run this bash command, replacing `/desired/output/directory` with a path on the shared storage so that the workers can access it:
```bash
git clone https://github.com/ray-project/llm-applications.git .
```
@@ -30,116 +32,41 @@ Then set up the environment correctly by specifying the values in your `.env` fi
and installing the dependencies:

```bash
cp ./envs/.env_template .envs
source .envs
pip install --user -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$PWD
pre-commit install
pre-commit autoupdate
```

### Data

Our data is already ready at `/efs/shared_storage/pcmoritz/docs.ray.io/en/master/`
(on Staging) but if you wanted to load it yourself, run this bash command:

### Variables
```bash
bash scrape-docs.sh
touch .env
# Add environment variables to .env
OPENAI_API_BASE="https://api.openai.com/v1"
OPENAI_API_KEY="" # https://platform.openai.com/account/api-keys
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_KEY="" # https://app.endpoints.anyscale.com/credentials
DB_CONNECTION_STRING="dbname=postgres user=postgres host=localhost password=postgres"
source .env
```
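
The `DB_CONNECTION_STRING` above is a space-separated list of `key=value` pairs. A minimal sketch of splitting it into its parts, for example to log the host without the password; `parse_connection_string` is a hypothetical helper that is not part of this repository, and it assumes values contain no spaces or quotes:

```python
def parse_connection_string(conn: str) -> dict:
    """Split 'key=value key=value ...' into a dict.

    Assumption: no quoted values and no spaces inside values.
    """
    return dict(pair.split("=", 1) for pair in conn.split())


if __name__ == "__main__":
    # Value copied from the .env example above.
    conn = "dbname=postgres user=postgres host=localhost password=postgres"
    params = parse_connection_string(conn)
    # Drop the password before printing anything.
    safe = {key: value for key, value in params.items() if key != "password"}
    print(safe)
```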

### Vector DB

<details>
<summary>Local installation with brew on MacOS</summary>
## Steps

1. Open [rag.ipynb](notebooks/rag.ipynb) to interactively go through all the concepts and run experiments.
2. Use the best configuration (in `serve.py`) from the notebook experiments to serve the LLM.
```bash
brew install postgresql
brew install pgvector
psql -c "CREATE USER postgres WITH SUPERUSER;"
# pragma: allowlist nextline secret
psql -c "ALTER USER postgres with password 'postgres';"
psql -c "CREATE EXTENSION vector;"
psql -f migrations/initial.sql
python app/index.py create-index
python app/main.py
```
</details>

```bash
bash setup-pgvector.sh
sudo -u postgres psql -f migrations/initial.sql
python app/index.py create-index
```

### Query
Just a sample and uses the current index that's been created.
3. Query your service.
```python
import json
from app.query import QueryAgent
query = "What is the default batch size for map_batches?"
system_content = "Your job is to answer a question using the additional context provided."
agent = QueryAgent(
embedding_model="thenlper/gte-base",
llm="meta-llama/Llama-2-7b-chat-hf",
max_context_length=4096,
system_content=system_content,
)
result = agent.get_response(query=query)
print(json.dumps(result, indent=2))
import requests
data = {"query": "What is the default batch size for map_batches?"}
response = requests.post("http://127.0.0.1:8000/query", json=data)
print(response.text)
```

### Experiments

#### Generate responses

```bash
python app/main.py generate-responses \
--system-content "Answer the {query} using the additional {context} provided."
```

#### Evaluate responses

```bash
python app/main.py evaluate-responses \
--system-content """
Your job is to rate the quality of our generated answer {generated_answer}
given a query {query} and a reference answer {reference_answer}.
Your score has to be between 1 and 5.
You must return your response in a line with only the score.
Do not return answers in any other format.
On a separate line provide your reasoning for the score as well.
"""
```

### Dashboard
```bash
export APP_PORT=8501
echo https://$APP_PORT-port-$ANYSCALE_SESSION_DOMAIN
streamlit run dashboard/Home.py
4. Shut down the service.
```python
from ray import serve
serve.shutdown()
```
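
The query step above uses `requests`. Where that package is not installed, the same POST can be sketched with the standard library alone. The endpoint URL and payload shape are taken from the snippet above; the helper names `build_query_request` and `ask` and the 60-second timeout are assumptions for illustration, and the response is returned as raw text because its schema is not shown here:

```python
import json
import urllib.request


def build_query_request(query: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the /query endpoint."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query: str, timeout: float = 60.0) -> str:
    """Send the query to a running service and return the raw response body."""
    with urllib.request.urlopen(build_query_request(query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the request; call ask(...) once the service is running.
    request = build_query_request("What is the default batch size for map_batches?")
    print(request.get_method(), request.full_url)
```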

### TODO
- [x] notebook cleanup
- [x] evaluator (ex. GPT4) response script
- [x] DB dump & load
- [ ] experiments (in order and fixing choices along the way)
- Evaluator
- [ ] GPT-4 best experiment
- [ ] Llama-70b consistency with GPT4
- [ ] OSS vs. Closed (gpt-3.5 vs. llama)
- [ ] w/ and w/out context (value of RAG)
- [ ] # of chunks to use in context
- Does using more resources help/harm?
- 1, 5, 10 will all fit in the smallest context length of 4K)
- [ ] Chunking size/overlap
- related to # of chunks + context length, but we'll treat as independent variable
- [ ] Embedding (top 3 in leaderboard)
- global leaderboard may not be your leaderboard (empirically validate)
- Later
- [ ] Commercial Assistant evaluation
- [ ] Human Assistant evaluation
- [ ] Data sources
- Much later
- [ ] Prompt
- [ ] Prompt-tuning on query
- [ ] Embedding vs. LLM for retrieval
- [ ] Ray Tune to tweak a subset of components
- [ ] CI/CD workflows
57 changes: 17 additions & 40 deletions app/config.py
@@ -1,44 +1,21 @@
import os
from pathlib import Path

# Directories
EFS_DIR = Path("/efs/shared_storage/goku")
ROOT_DIR = Path(__file__).parent.parent.absolute()


DB_CONNECTION_STRING = os.environ.get("DB_CONNECTION_STRING")
DOCS_PATH = os.environ.get("DOCS_PATH")

# Credentials
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.endpoints.anyscale.com/v1")
OPENAI_API_KEY = os.environ.get(
"OPENAI_API_KEY", ""
) # https://app.endpoints.anyscale.com/credentials

# Indexing and model properties
DEVICE = os.environ.get("DEVICE", "cuda")
EMBEDDING_BATCH_SIZE = os.environ.get("EMBEDDING_BATCH_SIZE", 100)
EMBEDDING_ACTORS = os.environ.get("EMBEDDING_ACTORS", 2)
NUM_GPUS = os.environ.get("NUM_GPUS", 1)
INDEXING_ACTORS = os.environ.get("INDEXING_ACTORS", 20)
INDEXING_BATCH_SIZE = os.environ.get("INDEXING_BATCH_SIZE", 128)

# Response generation properties
EXPERIMENT_NAME = os.environ.get("EXPERIMENT_NAME", "llama-2-7b-gtebase")
DATA_PATH = os.environ.get("DATA_PATH", "datasets/eval-dataset-v1.jsonl")
CHUNK_SIZE = os.environ.get("CHUNK_SIZE", 300)
CHUNK_OVERLAP = os.environ.get("CHUNK_OVERLAP", 50)
EMBEDDING_MODEL = os.environ.get("EMBEDDING_MODEL", "thenlper/gte-base")
LLM = os.environ.get("LLM", "meta-llama/Llama-2-7b-chat-hf")
TEMPERATURE = os.environ.get("TEMPERATURE", 0)
MAX_CONTEXT_LENGTH = os.environ.get("MAX_CONTEXT_LENGTH", 4096)

# Evaluation properties
REFERENCE_LOC = os.environ.get("REFERENCE_LOC", "experiments/responses/gpt-4-with-source.json")
RESPONSE_LOC = os.environ.get("RESPONSE_LOC", "experiments/responses/$EXPERIMENT_NAME.json")
EVALUATOR = os.environ.get("EVALUATOR", "meta-llama/Llama-2-70b-chat-hf")
EVALUATOR_TEMPERATURE = os.environ.get("EVALUATOR_TEMPERATURE", 0)
EVALUATOR_MAX_CONTEXT_LENGTH = os.environ.get("EVALUATOR_MAX_CONTEXT_LENGTH", 4096)

# Slack bot integration
SLACK_APP_TOKEN = os.environ.get("SLACK_APP_TOKEN", "")
SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN", "")
EXPERIMENTS_DIR = Path(ROOT_DIR, "experiments")

# Mappings
EMBEDDING_DIMENSIONS = {
"thenlper/gte-base": 768,
"BAAI/bge-large-en": 1024,
"text-embedding-ada-002": 1536,
}
MAX_CONTEXT_LENGTHS = {
"gpt-4": 8192,
"gpt-3.5-turbo": 4096,
"gpt-3.5-turbo-16k": 16384,
"meta-llama/Llama-2-7b-chat-hf": 4096,
"meta-llama/Llama-2-13b-chat-hf": 4096,
"meta-llama/Llama-2-70b-chat-hf": 4096,
}
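
The two mappings replace the per-value environment lookups with tables keyed by model name. A minimal sketch of how such tables might be consulted, with the dictionaries copied from the hunk above; the `lookup` helper is hypothetical and not part of `app/config.py`:

```python
# Copied from the new app/config.py shown above.
EMBEDDING_DIMENSIONS = {
    "thenlper/gte-base": 768,
    "BAAI/bge-large-en": 1024,
    "text-embedding-ada-002": 1536,
}
MAX_CONTEXT_LENGTHS = {
    "gpt-4": 8192,
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
    "meta-llama/Llama-2-7b-chat-hf": 4096,
    "meta-llama/Llama-2-13b-chat-hf": 4096,
    "meta-llama/Llama-2-70b-chat-hf": 4096,
}


def lookup(mapping: dict, name: str, kind: str) -> int:
    """Return the value for a model name, or raise a readable error for unknown names."""
    try:
        return mapping[name]
    except KeyError:
        known = ", ".join(sorted(mapping))
        raise ValueError(f"Unknown {kind} {name!r}; known: {known}") from None


if __name__ == "__main__":
    embedding_dim = lookup(EMBEDDING_DIMENSIONS, "thenlper/gte-base", "embedding model")
    max_context = lookup(MAX_CONTEXT_LENGTHS, "meta-llama/Llama-2-7b-chat-hf", "LLM")
    print(embedding_dim, max_context)
```

Failing loudly on an unknown model name is a design choice in this sketch: a silent default could give a vector column of the wrong width or a prompt that overflows the model's context.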