From d58ff7e6c8599166880ffc5b7523823c205b0ed8 Mon Sep 17 00:00:00 2001 From: Richard Kuo Date: Wed, 18 Sep 2024 10:37:04 +0800 Subject: [PATCH] Delete _posts/2024-08-15-LLM.md --- _posts/2024-08-15-LLM.md | 521 --------------------------------------- 1 file changed, 521 deletions(-) delete mode 100644 _posts/2024-08-15-LLM.md diff --git a/_posts/2024-08-15-LLM.md b/_posts/2024-08-15-LLM.md deleted file mode 100644 index a9bd3219..00000000 --- a/_posts/2024-08-15-LLM.md +++ /dev/null @@ -1,521 +0,0 @@ ---- -layout: post -title: Large Language Models -author: [Richard Kuo] -category: [Lecture] -tags: [jekyll, ai] ---- - -Introduction to LLMs, Deep LLM, Time-series LLMs, Applications, etc. - ---- -## History of LLMs -[A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)
### Timeline of Large Language Models (>10B parameters)
![](https://api.wandb.ai/files/vincenttu/images/projects/37228380/5a69d608.png)

---
### Growth of Compute and Memory vs. Transformer Size
[AI and Memory Wall](https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8)<br>
![](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*U-7GJqBZ2tY1W5Iu)

---
### Scaling Law
A model's final capability can be predicted from its size, the dataset size, and the total training compute, usually through a relatively simple functional form (e.g., a linear relationship after a suitable transformation).<br>
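As a toy illustration of such a fit, the sketch below fits a power law to made-up (compute, loss) pairs with NumPy and extrapolates it to a larger budget; the numbers are invented and only the procedure matters:

```
import numpy as np

# Hypothetical (compute, loss) measurements from small training runs (made-up).
compute = np.array([1e18, 1e19, 1e20, 1e21])   # training FLOPs
loss    = np.array([3.10, 2.65, 2.28, 1.98])   # final validation loss

# A power law L = a * C^b is linear in log-log space, so fit a line there.
b, log_a = np.polyfit(np.log(compute), np.log(loss), deg=1)

# Extrapolate to a much larger training budget.
target_compute = 1e23
predicted_loss = np.exp(log_a) * target_compute**b
print(f"fitted exponent b = {b:.3f}")
print(f"predicted loss at {target_compute:.0e} FLOPs ~ {predicted_loss:.2f}")
```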
- -**Blog:** [【LLM 10大觀念-1】Scaling Law](https://axk51013.medium.com/llm%E5%B0%88%E6%AC%84-%E8%BF%8E%E6%8E%A52024%E5%B9%B4-10%E5%80%8B%E5%BF%85%E9%A0%88%E8%A6%81%E6%90%9E%E6%87%82%E7%9A%84llm%E6%A6%82%E5%BF%B5-1-scaling-law-5f6a409d35c5)
![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*_NvxS3TwabOY6OPvaj2Peg.png)
*This figure shows that OpenAI had accurately predicted GPT-4's capability level before the model was trained.*<br>
-Ref: [GPT-4 Technical Report. OpenAI. 2023](https://arxiv.org/pdf/2303.08774.pdf)
**Papers:**
- [Hestness et al.](https://arxiv.org/abs/1712.00409) showed in 2017 that scaling laws appear in machine translation, language modeling, speech recognition, and image classification.
- OpenAI's [Kaplan et al. 2020](https://arxiv.org/abs/2001.08361) analyzed scaling laws separately with respect to compute, dataset size, and parameter count.
- [Rosenfeld et al.](https://arxiv.org/abs/2108.07686) published a survey on scaling laws in 2021, further verifying their generality across architectures.

---
### Chinchilla Scaling Law
**Paper:** [Training Compute-Optimal Large Language Models](https://arxiv.org/abs/2203.15556)<br>
If we accept the original definition of the scaling law (model performance can be predicted from parameter count, dataset size, and compute), two important questions immediately follow:<br>
**Return**: Under a fixed training compute budget, what is the best performance we can reach?<br>
**Allocation**: How should we split that budget between model parameters and dataset size?<br>
(Assuming compute ≈ parameters × dataset size: should we train a large model on a small dataset, a medium model on a medium dataset, or a small model on a large dataset?)<br>
In 2022, DeepMind proposed the Chinchilla scaling law, which answers both questions and improved how large models were trained at the time.
They derived the scaling law for LLM training in three ways:<br>
1. Fix the model size and vary the amount of training data.
2. Fix the compute budget (FLOPs) and vary the model size.
3. Fit a parametric loss function directly to all experimental results.

![](https://miro.medium.com/v2/resize:fit:720/format:webp/1*PWkg8x3Dtr64q-7BI2rrJA.png)
`Method 3 result of the Chinchilla scaling law: N is the number of model parameters, D is the number of training tokens, and the remaining terms are fitted coefficients`<br>
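The method-3 fit has the form L(N, D) = E + A/N^α + B/D^β. The small sketch below plugs in coefficients close to the values reported in the Chinchilla paper (treat the exact numbers as approximate) to show how the predicted loss responds to more data:

```
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Parametric loss L(N, D) = E + A/N^alpha + B/D^beta (method-3 style fit)."""
    return E + A / N**alpha + B / D**beta

# Example: a 7B-parameter model trained on 140B vs. 1.4T tokens.
for tokens in (140e9, 1.4e12):
    print(f"D = {tokens:.1e} tokens -> predicted loss = {chinchilla_loss(7e9, tokens):.3f}")
```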
The final LLM loss (perplexity) decreases as the model is scaled up and as more data is used, and the relationship becomes linear after a power-law (log-scale) transformation.<br>
Chinchilla's biggest contribution is its answer to the allocation question; the authors found that:<br>
* **The number of training tokens should be roughly 20× the number of model parameters,**
* **and data and model size should be scaled up in proportion (e.g., doubling the model size means doubling the training data).**

---
## Large Language Models

### [Open LLM Leaderboard](https://chat.lmsys.org/?leaderboard)

### Transformer
**Paper:** [Attention Is All You Need](https://arxiv.org/abs/1706.03762)<br>
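As a reminder of what the Transformer paper introduces, here is a minimal NumPy sketch of single-head scaled dot-product attention (no masking, illustrative only):

```
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ V

# Toy example: 4 tokens with 8-dimensional queries/keys/values.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```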
-![](https://miro.medium.com/max/407/1*3pxDWM3c1R_WSW7hVKoaRA.png) - - - -### ChatGPT -[ChatGPT: Optimizing Language Models for Dialogue](https://openai.com/blog/chatgpt/)
-ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022.
- -![](https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagram.svg) - - - - - - - ---- -### GPT4 -**Paper:** [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)
-![](https://image-cdn.learnin.tw/bnextmedia/image/album/2023-03/img-1679884936-23656.png?w=1200&output=webp) -**Paper:** [From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting](https://arxiv.org/abs/2309.04269)
-**Blog:** [GPT-4 Code Interpreter: The Next Big Thing in AI](https://medium.com/@aaabulkhair/gpt-4-code-interpreter-the-next-big-thing-in-ai-56bbf72d746)
- ---- -### [LLaMA](https://huggingface.co/docs/transformers/main/model_doc/llama) -**Paper:** [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
-![](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*nt-ydHhSVsaLXq_HZRaLQA.png) -**Blog:** [Building a Million-Parameter LLM from Scratch Using Python](https://levelup.gitconnected.com/building-a-million-parameter-llm-from-scratch-using-python-f612398f06c2)
-**Kaggle:** [LLM LLaMA from scratch](https://www.kaggle.com/rkuo2000/llm-llama-from-scratch/)
- ---- -### BloombergGPT -**Paper:** [BloombergGPT: A Large Language Model for Finance](https://arxiv.org/abs/2303.17564)
-**Blog:** [Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance](https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/)
- ---- -### Pythia -**Paper:** [Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling](https://arxiv.org/abs/2304.01373)
-**Dataset:**
-[The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
-[Datasheet for the Pile](https://arxiv.org/abs/2201.07311)
-**Code:** [Pythia: Interpreting Transformers Across Time and Scale](https://github.com/EleutherAI/pythia)
- ---- -### MPT-7B -**model:** [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
-**Code:** [https://github.com/mosaicml/llm-foundry](https://github.com/mosaicml/llm-foundry)
-**Blog:** [Announcing MPT-7B-8K: 8K Context Length for Document Understanding](https://www.mosaicml.com/blog/long-context-mpt-7b-8k)
-**Blog:** [Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs](https://www.mosaicml.com/blog/mpt-7b)
- ---- -### Falcon-40B -**model:** [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
-**Paper:** [The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only](https://arxiv.org/abs/2306.01116)
- ---- -### Orca -**Paper:** [Orca: Progressive Learning from Complex Explanation Traces of GPT-4](https://arxiv.org/abs/2306.02707)
- ---- -### OpenLLaMA -**model:** [openlm-research/open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2)
-**model:** [openlm-research/open_llama_7b_v2](https://huggingface.co/openlm-research/open_llama_7b_v2)
-**Code:** [https://github.com/openlm-research/open_llama](https://github.com/openlm-research/open_llama)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-openllama](https://www.kaggle.com/code/rkuo2000/llm-openllama)
- ---- -### Vicuna -**model:** [lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)
-**Paper:** [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://arxiv.org/abs/2306.05685)
-**Code:** [https://github.com/lm-sys/FastChat](https://github.com/lm-sys/FastChat)
- ---- -### [LLaMA-2](https://huggingface.co/meta-llama) -**model:** [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
-**Paper:** [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
-**Code:** [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)
- ---- -### Sheared LLaMA -**model_name** = ["princeton-nlp/Sheared-LLaMA-1.3B"](https://huggingface.co/princeton-nlp/Sheared-LLaMA-1.3B), [princeton-nlp/Sheared-LLaMA-2.7B](https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B) | [princeton-nlp/Sheared-Pythia-160m](https://huggingface.co/princeton-nlp/Sheared-Pythia-160m/tree/main)
-**Paper:** [Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning](https://arxiv.org/abs/2310.06694)
-**Code:** [https://github.com/princeton-nlp/LLM-Shearing](https://github.com/princeton-nlp/LLM-Shearing)
- ---- -### Neural-Chat-7B (Intel) -**model_name** = ["Intel/neural-chat-7b-v3-1"](https://huggingface.co/Intel/neural-chat-7b-v3-1)
-**Blog:** [Intel neural-chat-7b Model Achieves Top Ranking on LLM Leaderboard!](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-neural-chat-7b-Model-Achieves-Top-Ranking-on-LLM/post/1549386)
- ---- -### Mistral -**model_name** = ["mistralai/Mistral-7B-Instruct-v0.2"](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
-**Paper:** [Mistral 7B](https://arxiv.org/abs/2310.06825)
-**Code:** [https://github.com/mistralai/mistral-src](https://github.com/mistralai/mistral-src)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-mistral-7b-instruct](https://www.kaggle.com/code/rkuo2000/llm-mistral-7b-instruct)
![](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmistral-7B-2.8625353c.png&w=1920&q=75)

---
### Mixtral 8x7B
**model:** [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1)<br>
-**Paper:** [Mixtral of Experts](https://arxiv.org/abs/2401.04088)
-![](https://miro.medium.com/v2/resize:fit:720/format:webp/0*91yEJMc_q-QlU-bk.png) - ---- -### Starling-LM -**model:** [Nexusflow/Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta)
-**Paper:** [RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback](https://arxiv.org/abs/2309.00267)
-**Blog:** [Starling-7B: Increasing LLM Helpfulness & Harmlessness with RLAIF](https://starling.cs.berkeley.edu/)
-![](https://starling.cs.berkeley.edu/rlaif_dataset.png) - ---- -### Zephyr -**model:** [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
-**Paper:** [Zephyr: Direct Distillation of LM Alignment](https://arxiv.org/abs/2310.16944)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-zephyr-7b](https://www.kaggle.com/code/rkuo2000/llm-zephyr-7b)
-![](https://i3.res.bangqu.com/farm/liang/news/2023/10/28/9e3a1a498f94b147fd57608b4beaefe0.jpg) -**Blog:** [Zephyr-7B : HuggingFace’s Hyper-Optimized LLM Built on Top of Mistral 7B](https://www.unite.ai/zephyr-7b-huggingfaces-hyper-optimized-llm-built-on-top-of-mistral-7b/)
-![](https://www.unite.ai/wp-content/uploads/2023/11/Model-Performace-768x418.png) - ---- -### Orca 2 -**model:** [microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
**Paper:** [Orca 2: Teaching Small Language Models How to Reason](https://arxiv.org/abs/2311.11045)<br>
-**Blog:** [Microsoft's Orca 2 LLM Outperforms Models That Are 10x Larger](https://www.infoq.com/news/2023/12/microsoft-orca-2-llm/)
-

- ---- -### [BlueLM (VIVO)](https://developers.vivo.com/product/ai/bluelm) -**model:** [vivo-ai/BlueLM-7B-Chat-4bits](https://huggingface.co/vivo-ai/BlueLM-7B-Chat-4bits)
-**Code:** [https://github.com/vivo-ai-lab/BlueLM/](https://github.com/vivo-ai-lab/BlueLM/)
- ---- -### Taiwan-LLM (優必達+台大) -**model:** [yentinglin/Taiwan-LLM-7B-v2.1-chat](https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat)
-**Paper:** [TAIWAN-LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model](https://arxiv.org/abs/2311.17487)
-**Blog:** [專屬台灣!優必達攜手台大打造「Taiwan LLM」,為何我們需要本土化的AI?](https://www.bnext.com.tw/article/77335/ubitus-ai-taiwan-llm)
-**Code:** [https://github.com/MiuLab/Taiwan-LLM](https://github.com/MiuLab/Taiwan-LLM)
- ---- -### Phi-2 (Transformer with 2.7B parameters) -**model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)
-**Blog:** [Phi-2: The surprising power of small language models](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-phi-2](https://www.kaggle.com/code/rkuo2000/llm-phi-2)
- ---- -### [Mamba](https://huggingface.co/collections/Q-bert/mamba-65869481595e25821853d20d) -**model:** [Q-bert/Mamba-130M](https://huggingface.co/Q-bert/Mamba-130M)
-**Paper:** [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-mamba-130m](https://www.kaggle.com/code/rkuo2000/llm-mamba-130m)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-mamba-3b](https://www.kaggle.com/code/rkuo2000/llm-mamba-3b)
-![](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*OqdcbRFPuqNd_4hyBcMGHQ.png) - ---- -### SOLAR-10.7B ~ Depth Upscaling -**Paper:** [SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling](https://arxiv.org/abs/2312.15166)
-**Code:** [https://huggingface.co/upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)
-Depth-Upscaled SOLAR-10.7B has remarkable performance. It outperforms models with up to 30B parameters, even surpassing the recent Mixtral 8X7B model.
Leveraging state-of-the-art instruction fine-tuning methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO), the researchers used a diverse set of datasets for training. The fine-tuned model, SOLAR-10.7B-Instruct-v1.0, achieves a Model H6 score of 74.20, demonstrating its effectiveness in single-turn dialogue scenarios.<br>
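Depth up-scaling itself is easy to sketch: duplicate the base network, trim a block of middle layers from each copy, and stack the two copies before continued pretraining. The layer counts below follow the commonly cited 32-layer → 48-layer recipe, but treat them as an assumption:

```
# Illustrative sketch of depth up-scaling (DUS) on layer indices only.
# Assumed setup: a 32-layer base model, 8 layers trimmed from each copy,
# giving a 48-layer up-scaled network.
n_layers, n_trim = 32, 8

base = list(range(n_layers))
top_copy    = base[: n_layers - n_trim]   # layers 0..23 from copy A
bottom_copy = base[n_trim:]               # layers 8..31 from copy B

upscaled = top_copy + bottom_copy         # stacked depth-up-scaled network
print(len(upscaled))                      # 48
```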
- ---- -### Qwen (通义千问) -**model:** [Qwen/Qwen1.5-7B-Chat](https://huggingface.co/Qwen/Qwen1.5-7B-Chat)
-**Blog:** [Introducing Qwen1.5](https://qwenlm.github.io/blog/qwen1.5/)
-**Code:** [https://github.com/QwenLM/Qwen1.5](https://github.com/QwenLM/Qwen1.5)
-**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-qwen1-5](https://www.kaggle.com/code/rkuo2000/llm-qwen1-5)
-

- ---- -### Yi (零一万物) -**model:** [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
-**Paper:** [CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark](https://arxiv.org/abs/2401.11944)
-**Paper:** [Yi: Open Foundation Models by 01.AI](https://arxiv.org/abs/2403.04652)
- ---- -### Orca-Math -**Paper:** [Orca-Math: Unlocking the potential of SLMs in Grade School Math](https://arxiv.org/abs/2402.14830)
-**Dataset:** [https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k)
- ---- -### Breeze (達哥) -**model:** [MediaTek-Research/Breeze-7B-Instruct-v0_1](https://huggingface.co/MediaTek-Research/Breeze-7B-Instruct-v0_1)
-**Paper:** [Breeze-7B Technical Report](https://arxiv.org/abs/2403.02712)
-**Blog:** [Breeze-7B: 透過 Mistral-7B Fine-Tune 出來的繁中開源模型](https://blog.infuseai.io/quick-demo-3-breeze-7b-mediatek-intro-3e2f8e2f6da9)
---
### Bailong (白龍)
Bilingual transfer learning based on QLoRA and zip-tie embedding<br>
-**model:** [INX-TEXT/Bailong-instruct-7B](https://huggingface.co/INX-TEXT/Bailong-instruct-7B)
- ---- -### [TAIDE](https://taide.tw/index) -**model:** [taide/TAIDE-LX-7B-Chat](https://huggingface.co/taide/TAIDE-LX-7B-Chat)
* TAIDE-LX-7B: Continuously pretrained from LLaMA2-7b on Traditional Chinese data only; intended for users who plan to fine-tune the model further. Because the pretrained model has not been fine-tuned or preference-aligned, it may produce malicious or unsafe outputs, so use it with care.
* TAIDE-LX-7B-Chat: Built on TAIDE-LX-7B with instruction tuning to strengthen common office tasks and multi-turn Q&A, suitable for chat and task-assistance scenarios. A 4-bit quantized version is also provided for convenience; quantization may degrade performance and introduce unexpected issues, so please be aware of this.

---
### Gemma
**model:** [google/gemma-1.1-7b-it](https://huggingface.co/google/gemma-1.1-7b-it)<br>
-**Blog:** [Gemma: Introducing new state-of-the-art open models](https://blog.google/technology/developers/gemma-open-models/)
-**Kaggle:** [https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora](https://www.kaggle.com/code/nilaychauhan/fine-tune-gemma-models-in-keras-using-lora)
- ---- -### Claude 3 -[Introducing the next generation of Claude](https://www.anthropic.com/news/claude-3-family)
-![](https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2F9ad98d612086fe52b3042f9183414669b4d2a3da-2200x1954.png&w=3840&q=75) - ---- -### InflectionAI -**Blog:** [Inflection AI 發表新基礎模型「Inflection-2.5 」,能力逼近 GPT-4!](https://www.inside.com.tw/article/34418-inflection-ai-launches-inflection-2-5)
- ---- -### Phind-70B -**Blog:** [Introducing Phind-70B – closing the code quality gap with GPT-4 Turbo while running 4x faster](https://www.phind.com/blog/introducing-phind-70b)
-**Blog:** [Phind - 給工程師的智慧搜尋引擎](https://ywctech.net/ml-ai/phind-first-impression)
Phind-70B is significantly faster than GPT-4 Turbo, running at 80+ tokens per second versus GPT-4 Turbo's ~20 tokens per second. Phind achieves this by running NVIDIA's TensorRT-LLM library on H100 GPUs and is working on further optimizations to increase Phind-70B's inference speed.

---
### [Llama-3](https://ai.meta.com/blog/meta-llama-3/)
**model:** [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)<br>
-**Code:** [https://github.com/meta-llama/llama3/](https://github.com/meta-llama/llama3/)
-![](https://scontent.ftpe3-2.fna.fbcdn.net/v/t39.2365-6/438922663_1135166371264105_805978695964769385_n.png?_nc_cat=107&ccb=1-7&_nc_sid=e280be&_nc_ohc=VlupGTPFG1UAb6vzA2n&_nc_ht=scontent.ftpe3-2.fna&oh=00_AfCRhEYLtA4OYbvEyNCRXBdU9riWMtwtBWnk79O-SIigbg&oe=663C005E) - ---- -### Phi-3 -**model:** [microsoft/Phi-3-mini-4k-instruct"](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
-**Blog:** [Introducing Phi-3: Redefining what’s possible with SLMs](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/)
- ---- -### Octopus v4 -**model:** [NexaAIDev/Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4)
-**Paper:** [Octopus v4: Graph of language models](https://arxiv.org/abs/2404.19296)
-**Code:** [https://github.com/NexaAI/octopus-v4](https://github.com/NexaAI/octopus-v4)
-[design demo](https://graph.nexa4ai.com/)
---
### [Llama 3.1](https://ai.meta.com/blog/meta-llama-3-1/)
**model:** [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)<br>
-![](https://scontent.ftpe3-1.fna.fbcdn.net/v/t39.2365-6/452673884_1646111879501055_1352920258421649752_n.png?_nc_cat=100&ccb=1-7&_nc_sid=e280be&_nc_ohc=tHtG_dev9lcQ7kNvgFirAn9&_nc_ht=scontent.ftpe3-1.fna&oh=00_AYDvpGKrjgZlN5eVxo7ppCqw2Umq03ytaednlW3mTpCDvQ&oe=66C45828) - ---- -### Grok-2 -Grok-2 & Grok-2 mini, achieve performance levels competitive to other frontier models in areas such as graduate-level science knowledge (GPQA), general knowledge (MMLU, MMLU-Pro), and math competition problems (MATH). Additionally, Grok-2 excels in vision-based tasks, delivering state-of-the-art performance in visual math reasoning (MathVista) and in document-based question answering (DocVQA). -![](https://github.com/rkuo2000/AI-course/blob/main/images/Grok-2-benchmark.png?raw=true) - ---- -### Phi-3.5 -**model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
-**model:** [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
-**model:** [microsoft/Phi-3.5-MoE-instruct](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)
-**News:** [Microsoft Unveils Phi-3.5: Powerful AI Models Punch Above Their Weight](https://www.maginative.com/article/microsoft-unveils-phi-3-5-powerful-ai-models-punch-above-their-weight/)
-![](https://www.maginative.com/content/images/size/w1600/2024/08/Screenshot-2024-08-21-at-2.11.41-AM.png) -![](https://www.maginative.com/content/images/size/w1600/2024/08/Screenshot-2024-08-21-at-2.09.00-AM.png) - ---- -## LLM running locally - -### [ollama](https://ollama.com/download) -`ollama -v`
-`ollama`
-`ollama pull llava`
-`ollama run llava`
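Once the ollama server is running, the pulled model can also be called from code. A minimal sketch against ollama's local REST endpoint (the default `http://localhost:11434` is assumed here):

```
import json
import urllib.request

payload = {
    "model": "llava",
    "prompt": "Describe what a multimodal model can do.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```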
- -**Code:** [Github](https://github.com/ollama/ollama)
-**[Examples](https://github.com/ollama/ollama/tree/main/examples)**:
-* [langchain-python-rag-privategpt](https://github.com/ollama/ollama/tree/main/examples/langchain-python-rag-privategpt)
- ---- -### [LM Studio](https://lmstudio.ai/) -![](https://github.com/rkuo2000/AI-course/blob/main/images/LM_studio_0.2.19.png?raw=true) - ---- -### PrivateGPT -**Code:** [https://github.com/zylon-ai/private-gpt](https://github.com/zylon-ai/private-gpt)
-![](https://github.com/zylon-ai/private-gpt/raw/main/fern/docs/assets/ui.png?raw=true) - ---- -### [GPT4All](https://gpt4all.io/index.html) -``` -chmod +x gpt4all-installer-linux.run -./gpt4all-installer-linux.run -cd ~/gpt4all -./bin/chat -``` -![](https://gpt4all.io/landing.gif) - ---- -### [GPT4FREE](https://github.com/xtekky/gpt4free) -`pip install g4f`
- ---- -## Deep LLM - -### Deep Language Networks -**Paper:** [Joint Prompt Optimization of Stacked LLMs using Variational Inference](https://arxiv.org/abs/2306.12509)
-**Code:** [https://github.com/microsoft/deep-language-networks](https://github.com/microsoft/deep-language-networks)
-![](https://github.com/rkuo2000/AI-course/blob/main/images/DLN-2_benchmark.png?raw=true) - ---- -### Constitutional AI -**Paper:** [Constitutional AI: Harmlessness from AI Feedback](https://arxiv.org/abs/2212.08073) -Two key phases:
1. Supervised Learning (SL) Phase
- Step 1: Start by sampling responses from the initial model.
- Step 2: From these samples, the model generates self-critiques and revisions.
- Step 3: Fine-tune the original model on the revised responses.
2. Reinforcement Learning (RL) Phase
- Step 1: Sample outputs from the fine-tuned model.
- Step 2: Use a model to compare pairs of sampled outputs.
- Step 3: Decide which sample is better, producing a dataset of AI preferences.
- Step 4: Train a "preference model" on this dataset of AI preferences.
The preference model is then used as the reward signal to train the policy with RL.
This is RLAIF (Reinforcement Learning from AI Feedback).

---
### Chain-of-Thought Prompting
**Paper:** [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)<br>
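A tiny illustration of the difference between a standard prompt and a few-shot chain-of-thought prompt, using the worked example popularized by the paper (the exemplar text is paraphrased):

```
question = ("A cafeteria had 23 apples. They used 20 to make lunch "
            "and bought 6 more. How many apples do they have?")

# Standard prompting: question in, answer out.
standard_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompting: the exemplar shows intermediate reasoning steps.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    f"Q: {question}\nA:"
)

print(cot_prompt)  # send this string to any LLM; the exemplar elicits step-by-step reasoning
```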
-![](https://ar5iv.labs.arxiv.org/html/2201.11903/assets/x1.png) - ---- -### [ReAct Prompting](https://react-lm.github.io/) -**Paper:** [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
-**Code:** [https://github.com/ysymyth/ReAct](https://github.com/ysymyth/ReAct)
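ReAct interleaves Thought / Action / Observation steps. The toy loop below uses a scripted stand-in for the LLM and a fake search tool purely to show the control flow; it is not the authors' implementation:

```
def fake_llm(transcript: str) -> str:
    # Scripted stand-in for the model: first think and act, then answer.
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: The observation answers the question.\nAction: finish[Paris]"

def search(query: str) -> str:
    return "Paris is the capital and largest city of France."

transcript = "Question: What is the capital of France?"
for _ in range(5):                      # cap the number of reasoning steps
    step = fake_llm(transcript)
    transcript += "\n" + step
    action = step.split("Action:")[-1].strip()
    if action.startswith("finish["):
        print("Answer:", action[len("finish["):-1])
        break
    if action.startswith("search["):
        transcript += "\nObservation: " + search(action[len("search["):-1])
```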
-![](https://react-lm.github.io/files/diagram.png) - ---- -### Tree-of-Thoughts -**Paper:** [Tree of Thoughts: Deliberate Problem Solving with Large Language Models](https://arxiv.org/abs/2305.10601)
-**Code:** [https://github.com/princeton-nlp/tree-of-thought-llm](https://github.com/princeton-nlp/tree-of-thought-llm)
-**Code:** [https://github.com/kyegomez/tree-of-thoughts](https://github.com/kyegomez/tree-of-thoughts)
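Tree-of-Thoughts branches into several candidate thoughts at each step, scores them, and keeps only the best few (a breadth-first search over partial solutions). A schematic sketch with placeholder propose/score functions:

```
def propose_thoughts(state: str, k: int = 3) -> list[str]:
    # Placeholder: in practice an LLM proposes k candidate next steps.
    return [f"{state} -> option{i}" for i in range(k)]

def score(state: str) -> float:
    # Placeholder: in practice an LLM (or heuristic) rates each partial solution.
    return float(len(state) % 7)

def tree_of_thoughts_bfs(root: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [root]
    for _ in range(depth):
        candidates = [t for s in frontier for t in propose_thoughts(s)]
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

print(tree_of_thoughts_bfs("problem"))
```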
-![](https://github.com/princeton-nlp/tree-of-thought-llm/blob/master/pics/teaser.png?raw=true) - ---- -### Tabular CoT -**Paper:** [Tab-CoT: Zero-shot Tabular Chain of Thought](https://arxiv.org/abs/2305.17812)
-**Code:** [https://github.com/Xalp/Tab-CoT](https://github.com/Xalp/Tab-CoT)
-![](https://github.com/Xalp/Tab-CoT/raw/main/intro.jpg) - ---- -### Survey of Chain-of-Thought -**Paper:** [A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future](https://arxiv.org/abs/2309.15402)
- ---- -### Chain-of-Thought Hub -**Paper:** [Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance](https://arxiv.org/abs/2305.17306)
-**Code:** [https://github.com/FranxYao/chain-of-thought-hub](https://github.com/FranxYao/chain-of-thought-hub)
- ---- -### Everything-of-Thoughts -**Paper:** [Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation](https://arxiv.org/abs/2311.04254)
-**Code:** [https://github.com/microsoft/Everything-of-Thoughts-XoT](https://github.com/microsoft/Everything-of-Thoughts-XoT)
-![](https://github.com/microsoft/Everything-of-Thoughts-XoT/raw/main/assets/compare.jpg) - ---- -### R3 -**Paper:** [Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning](https://arxiv.org/abs/2402.05808)
-**Code:** [https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL](https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL)
-![](https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL/raw/master/src/figures/main.png) - ---- -## Time-series LLM -**Paper:** [Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook](https://arxiv.org/abs/2310.10196)
**Papers:** [https://github.com/DaoSword/Time-Series-Forecasting-and-Deep-Learning](https://github.com/DaoSword/Time-Series-Forecasting-and-Deep-Learning)<br>
- -### Time-LLM -**Paper:** [Time-LLM: Time Series Forecasting by Reprogramming Large Language Models](https://arxiv.org/abs/2310.01728)
-**Code:** [https://github.com/KimMeen/Time-LLM](https://github.com/KimMeen/Time-LLM)
![](https://github.com/KimMeen/Time-LLM/raw/main/figures/framework.png)
![](https://github.com/KimMeen/Time-LLM/raw/main/figures/method-detailed-illustration.png)

---
### TimeGPT-1
**Paper:** [TimeGPT-1](https://arxiv.org/abs/2310.03589)<br>
- - ---- -### TEMPO -**Paper:** [TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting](https://arxiv.org/abs/2310.04948)
-**Code:** [https://github.com/liaoyuhua/tempo-pytorch](https://github.com/liaoyuhua/tempo-pytorch)
-![](https://github.com/liaoyuhua/tempo-pytorch/raw/main/assets/struct.png) - ---- -### Lag-LLaMA -**Paper:** [Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting](https://arxiv.org/abs/2310.08278)
-**Blog:** [From RNN/LSTM to Temporal Fusion Transformers and Lag-Llama](https://dataman-ai.medium.com/from-rnn-lstm-to-temporal-fusion-transformers-and-lag-llama-6e6a62c811bd)
-**Code:** [https://github.com/time-series-foundation-models/lag-llama](https://github.com/time-series-foundation-models/lag-llama)
-**Colab:**
-* [Lag-Llama Fine-Tuning Demo](https://colab.research.google.com/drive/1uvTmh-pe1zO5TeaaRVDdoEWJ5dFDI-pA?usp=sharing)
-* [Lag-Llama Zero-shot Forecasting Demo](https://colab.research.google.com/drive/1XxrLW9VGPlZDw3efTvUi0hQimgJOwQG6?usp=sharing)
- ---- -### Timer -**Paper:** [Timer: Transformers for Time Series Analysis at Scale](https://arxiv.org/abs/2402.02368)
- ---- -## Applications - -### [FunSearch](https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/) -[DeepMind發展用LLM解困難數學問題的方法](https://www.ithome.com.tw/news/160354)
-![](https://s4.itho.me/sites/default/files/styles/picture_size_large/public/field/image/2108_-_funsearch_making_new_discoveries_in_mathematical_sciences_using_lar_-_deepmind.google.jpg?itok=mAy4ydAE) - ---- -### Automatic Evaluation -**Paper:** [Can Large Language Models Be an Alternative to Human Evaluation?](https://arxiv.org/abs/2305.01937)
-![](https://aisholar.s3.ap-northeast-1.amazonaws.com/posts/July2023/Can_Large_Language_Models_Be_an_Alternative_to_Human_Evaluation_fig1.png) - -**Paper:** [A Closer Look into Automatic Evaluation Using Large Language Models](https://arxiv.org/abs/2310.05657)
-**Code:** [https://github.com/d223302/A-Closer-Look-To-LLM-Evaluation](https://github.com/d223302/A-Closer-Look-To-LLM-Evaluation)
- ---- -### BrainGPT -**Paper:** [DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation](https://arxiv.org/abs/2309.14030)
-**Blog:** [New Mind-Reading "BrainGPT" Turns Thoughts Into Text On Screen](https://www.iflscience.com/new-mind-reading-braingpt-turns-thoughts-into-text-on-screen-72054)
-![](https://i3.res.bangqu.com/farm/liang/news/2023/12/18/339b9a2158e1fd28e1e39ee4b1557df2.jpg) -![](https://i3.res.bangqu.com/farm/liang/news/2023/12/18/79ca704627e4cadc1e23afc1b2f029cb.jpg) - ---- -### Designing Silicon Brains using LLM -**Paper:** [Designing Silicon Brains using LLM: Leveraging ChatGPT for Automated Description of a Spiking Neuron Array -](https://arxiv.org/abs/2402.10920)
-![](https://github.com/rkuo2000/AI-course/blob/main/images/ChatGPT_design_Spiking_Neuron_Array.png?raw=true) - ---- -### MyGirlGPT -**Code:** [https://github.com/Synthintel0/MyGirlGPT](https://github.com/Synthintel0/MyGirlGPT)
- - ---- -### Robotic Manipulation -**Paper:** [Language-conditioned Learning for Robotic Manipulation: A Survey](https://arxiv.org/abs/2312.10807)
-**Paper:** [Human Demonstrations are Generalizable Knowledge for Robots](https://arxiv.org/abs/2312.02419)
- ---- -### [ALTER-LLM](https://tnoinkwms.github.io/ALTER-LLM/) -**Paper:** [From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"](https://arxiv.org/abs/2312.06571)
- -![](https://tnoinkwms.github.io/ALTER-LLM/architecture_2.png) -![](https://tnoinkwms.github.io/ALTER-LLM/feedback.png) - -
-
- -*This site was last updated {{ site.time | date: "%B %d, %Y" }}.* -