diff --git a/_posts/2024-03-21-LLM.md b/_posts/2024-03-21-LLM.md
index b4131a55..85fb2c58 100644
--- a/_posts/2024-03-21-LLM.md
+++ b/_posts/2024-03-21-LLM.md
@@ -254,6 +254,11 @@ Phind-70B is significantly faster than GPT-4 Turbo, running at 80+ tokens per se
 **model:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)<br>
 ![](https://scontent.ftpe3-2.fna.fbcdn.net/v/t39.2365-6/438922663_1135166371264105_805978695964769385_n.png?_nc_cat=107&ccb=1-7&_nc_sid=e280be&_nc_ohc=VlupGTPFG1UAb6vzA2n&_nc_ht=scontent.ftpe3-2.fna&oh=00_AfCRhEYLtA4OYbvEyNCRXBdU9riWMtwtBWnk79O-SIigbg&oe=663C005E)
+---
+### Phi-3
+**model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)<br>
+**Blog:** [Introducing Phi-3: Redefining what’s possible with SLMs](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)
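+
+A minimal sketch (not from the model card) of running the linked checkpoint with Hugging Face `transformers`; the chat prompt and generation settings are illustrative:
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "microsoft/Phi-3-mini-4k-instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
+    device_map="auto",
+    trust_remote_code=True,
+)
+
+# Build a chat-formatted prompt and generate one reply.
+messages = [{"role": "user", "content": "Explain weight quantization in one sentence."}]
+input_ids = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+output = model.generate(input_ids, max_new_tokens=128)
+print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+```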
+
 ---
 ## LLM running locally
@@ -294,6 +299,12 @@
 Get up and running with large language models locally.<br>
 **Paper:** [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)<br>
 **Code:** [https://github.com/artidoro/qlora](https://github.com/artidoro/qlora)<br>
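+
+A minimal sketch of the QLoRA recipe with `bitsandbytes` + `peft`: the base model is loaded frozen in 4-bit NF4 with double quantization, and only LoRA adapters are trained. The model id and LoRA hyperparameters here are illustrative, not from the paper:
+```python
+import torch
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",            # NormalFloat4 data type from the paper
+    bnb_4bit_use_double_quant=True,       # double quantization of the quant constants
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+base = AutoModelForCausalLM.from_pretrained(
+    "meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative base model
+    quantization_config=bnb_config,
+    device_map="auto",
+)
+base = prepare_model_for_kbit_training(base)
+
+lora = LoraConfig(
+    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+)
+model = get_peft_model(base, lora)
+model.print_trainable_parameters()  # only the adapter weights are trainable
+```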
+---
+### AWQ
+**Paper:** [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978)<br>
+**Code:** [https://github.com/mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
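+
+A hedged sketch of 4-bit AWQ quantization using the community AutoAWQ package rather than the linked repo's own scripts; the model path and output directory are illustrative:
+```python
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer
+
+model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative model choice
+quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
+
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+model = AutoAWQForCausalLM.from_pretrained(model_path)
+
+# AWQ rescales salient weight channels using activation statistics from a
+# small calibration set before rounding weights to 4 bits.
+model.quantize(tokenizer, quant_config=quant_config)
+model.save_quantized("llama3-8b-awq")
+tokenizer.save_pretrained("llama3-8b-awq")
+```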
+![](https://github.com/mit-han-lab/llm-awq/raw/main/figures/overview.png)
+
 ---
 ### [Platypus](https://platypus-llm.github.io/)
 **Paper:** [Platypus: Quick, Cheap, and Powerful Refinement of LLMs](https://arxiv.org/abs/2308.07317)<br>