diff --git a/_posts/2024-03-21-LLM.md b/_posts/2024-03-21-LLM.md
index b4131a55..85fb2c58 100644
--- a/_posts/2024-03-21-LLM.md
+++ b/_posts/2024-03-21-LLM.md
@@ -254,6 +254,36 @@ Phind-70B is significantly faster than GPT-4 Turbo, running at 80+ tokens per se
**model:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)
![](https://scontent.ftpe3-2.fna.fbcdn.net/v/t39.2365-6/438922663_1135166371264105_805978695964769385_n.png?_nc_cat=107&ccb=1-7&_nc_sid=e280be&_nc_ohc=VlupGTPFG1UAb6vzA2n&_nc_ht=scontent.ftpe3-2.fna&oh=00_AfCRhEYLtA4OYbvEyNCRXBdU9riWMtwtBWnk79O-SIigbg&oe=663C005E)
+---
+### Phi-3
+**model:** [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
+**Blog:** [Introducing Phi-3: Redefining what’s possible with SLMs](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)
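+
+A minimal sketch (not from the announcement) of running Phi-3-mini with Hugging Face `transformers`; the prompt and generation settings here are illustrative:
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "microsoft/Phi-3-mini-4k-instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,  # bf16 fits the 3.8B model on a single consumer GPU
+    device_map="auto",
+    trust_remote_code=True,      # Phi-3 shipped with custom modeling code at release
+)
+
+# The instruct variant is chat-tuned, so build the prompt with the chat template
+messages = [{"role": "user", "content": "Summarize what a small language model is."}]
+inputs = tokenizer.apply_chat_template(
+    messages, add_generation_prompt=True, return_tensors="pt"
+).to(model.device)
+
+outputs = model.generate(inputs, max_new_tokens=64)
+print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
+```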
+
---
## LLM running locally
@@ -294,6 +324,64 @@ Get up and running with large language models locally.
**Paper:** [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
**Code:** [https://github.com/artidoro/qlora](https://github.com/artidoro/qlora)
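+
+As a quick illustration (not from the paper's repo), here is a QLoRA setup with `bitsandbytes` + `peft`: the frozen base model is loaded in 4-bit NF4 and only small LoRA adapters are trained. The model id and LoRA hyperparameters are placeholders:
+
+```python
+import torch
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+
+model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder base model
+
+# QLoRA storage: 4-bit NF4 weights with double quantization, bf16 compute
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, quantization_config=bnb_config, device_map="auto"
+)
+model = prepare_model_for_kbit_training(model)
+
+# Gradients flow only through the LoRA adapters, not the 4-bit base weights
+lora_config = LoraConfig(
+    r=64, lora_alpha=16, lora_dropout=0.05,
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()
+```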
+---
+### AWQ
+**Paper:** [AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration](https://arxiv.org/abs/2306.00978)
+**Code:** [https://github.com/mit-han-lab/llm-awq](https://github.com/mit-han-lab/llm-awq)
+![](https://github.com/mit-han-lab/llm-awq/raw/main/figures/overview.png)
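+
+A sketch of the workflow using the community AutoAWQ package (`pip install autoawq`), which implements the paper's method; the base model and output path are placeholders:
+
+```python
+from awq import AutoAWQForCausalLM
+from transformers import AutoTokenizer
+
+model_path = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder base model
+quant_path = "llama-3-8b-instruct-awq"              # placeholder output dir
+
+model = AutoAWQForCausalLM.from_pretrained(model_path)
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+# AWQ picks per-channel scales from activation statistics to protect the
+# small fraction of salient weights, then quantizes weights to 4 bits
+quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}
+model.quantize(tokenizer, quant_config=quant_config)
+
+model.save_quantized(quant_path)
+tokenizer.save_pretrained(quant_path)
+```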
+
---
### [Platypus](https://platypus-llm.github.io/)
**Paper:** [Platypus: Quick, Cheap, and Powerful Refinement of LLMs](https://arxiv.org/abs/2308.07317)