hf readme.
b4rtaz committed Jun 7, 2024
1 parent 0929f4e commit ee40fc4
Showing 2 changed files with 23 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -26,6 +26,7 @@ Python 3 and C++ compiler required. The command will download the model and the
Supported architectures: Llama, Mixtral, Grok

* [How to Convert Llama 2, Llama 3](./docs/LLAMA.md)
* [How to Convert Hugging Face Model](./docs/HUGGINGFACE.md)

### 🚧 Known Limitations

22 changes: 22 additions & 0 deletions docs/HUGGINGFACE.md
@@ -0,0 +1,22 @@
# How to Run Hugging Face 🤗 Model

Currently, Distributed Llama supports three types of Hugging Face models: `llama`, `mistral`, and `mixtral`. You can try to convert any compatible Hugging Face model and run it with Distributed Llama.

> [!IMPORTANT]
> All converters are in the early stages of development. After conversion, the model may not work correctly.

1. Download a model, for example: [Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3/tree/main).
2. The downloaded model directory should contain `config.json`, `tokenizer.json`, `tokenizer_config.json`, `tokenizer.model`, and the safetensors files.
3. Run the model converter:
```sh
cd converter
python convert-hf.py path/to/hf/model q40 mistral-7b-0.3
```
4. Run the tokenizer converter:
```sh
python convert-tokenizer-hf.py path/to/hf/model mistral-7b-0.3
```
5. That's it! Now you can run Distributed Llama.
```sh
./dllama inference --model dllama_model_mistral-7b-0.3_q40.m --tokenizer dllama_tokenizer_mistral-7b-0.3.t --buffer-float-type q80 --prompt "Hello world"
```
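
The file check in step 2 can be scripted before running the converters. A minimal sketch; the `check_model_dir` helper and the `path/to/hf/model` path are illustrative assumptions, not part of Distributed Llama:

```sh
# check_model_dir DIR — hypothetical helper that warns about any file the
# converter expects (see step 2) that is missing from the model directory.
check_model_dir() {
  dir="$1"
  for f in config.json tokenizer.json tokenizer_config.json tokenizer.model; do
    [ -f "$dir/$f" ] || echo "missing: $f"
  done
  # at least one safetensors shard must also be present
  ls "$dir"/*.safetensors >/dev/null 2>&1 || echo "missing: *.safetensors"
}

check_model_dir path/to/hf/model  # e.g. the downloaded Mistral-7B-v0.3 directory
```

If the function prints nothing, the directory has every file the conversion steps read.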
