hf readme.
b4rtaz committed Jun 7, 2024
1 parent 0929f4e commit ee40fc4
Showing 2 changed files with 23 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -26,6 +26,7 @@ Python 3 and C++ compiler required. The command will download the model and the
Supported architectures: Llama, Mixtral, Grok

* [How to Convert Llama 2, Llama 3](./docs/LLAMA.md)
* [How to Convert Hugging Face Model](./docs/HUGGINGFACE.md)

### 🚧 Known Limitations

22 changes: 22 additions & 0 deletions docs/HUGGINGFACE.md
@@ -0,0 +1,22 @@
# How to Run Hugging Face 🤗 Model

Currently, Distributed Llama supports three types of Hugging Face models: `llama`, `mistral`, and `mixtral`. You can try to convert any compatible Hugging Face model and run it with Distributed Llama.

> [!IMPORTANT]
> All converters are in the early stages of development. After conversion, the model may not work correctly.

1. Download a model, for example: [Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3/tree/main).
2. The downloaded model directory should contain `config.json`, `tokenizer.json`, `tokenizer_config.json`, `tokenizer.model`, and the safetensors files.
3. Run the model converter:
```sh
cd converter
python convert-hf.py path/to/hf/model q40 mistral-7b-0.3
```
4. Run the tokenizer converter:
```sh
python convert-tokenizer-hf.py path/to/hf/model mistral-7b-0.3
```
5. That's it! Now you can run Distributed Llama.
```sh
./dllama inference --model dllama_model_mistral-7b-0.3_q40.m --tokenizer dllama_tokenizer_mistral-7b-0.3.t --buffer-float-type q80 --prompt "Hello world"
```
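
The file check in step 2 can be scripted before running the converters. A minimal sketch; the `check_model_dir` helper and the `path/to/hf/model` path are illustrative assumptions, not part of Distributed Llama:

```sh
# check_model_dir DIR — hypothetical helper that warns about any file the
# converter expects (see step 2) that is missing from the model directory.
check_model_dir() {
  dir="$1"
  for f in config.json tokenizer.json tokenizer_config.json tokenizer.model; do
    [ -f "$dir/$f" ] || echo "missing: $f"
  done
  # at least one safetensors shard must also be present
  ls "$dir"/*.safetensors >/dev/null 2>&1 || echo "missing: *.safetensors"
}

check_model_dir path/to/hf/model  # e.g. the downloaded Mistral-7B-v0.3 directory
```

If the function prints nothing, the directory has every file the conversion steps read.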
