---
layout: post
title: Large Language Models
author: [Richard Kuo]
category: [Lecture]
tags: [jekyll, ai]
---

Introduction to Large Language Models (LLMs), LLMs in Vision, LLMs in Robotics, and advanced topics.

---
## History of LLM
[A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)<br>
*Since the introduction of the Transformer model in 2017, large language models (LLMs) have evolved significantly.*<br>
*ChatGPT saw 1.6B visits in May 2023.*<br>
*Meta also released three versions of LLaMA-2 (7B, 13B, 70B) free for commercial use in July 2023.*<br>

---
### Evolution of the Four Generations of Language Models by Task-Solving Ability
The evolution of the four generations of language models (LMs), viewed from the perspective of task-solving capacity.<br>
![](https://d3i71xaburhd42.cloudfront.net/c61d54644e9aedcfc756e5d6fe4cc8b78c87755d/2-Figure2-1.png)

---
### Statistics of Large Language Models
![](https://d3i71xaburhd42.cloudfront.net/c61d54644e9aedcfc756e5d6fe4cc8b78c87755d/8-Table1-1.png)

---
### Timeline of Recent Large Language Models (>10B)
![](https://d3i71xaburhd42.cloudfront.net/c61d54644e9aedcfc756e5d6fe4cc8b78c87755d/9-Figure3-1.png)

---
### Industry Landscape of Large Language Models
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F2e8bfd65-5272-4cf1-8b86-954bab975bab_2400x1350.png)

---
### Technical Taxonomy of Large Language Models
![](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*vZK250i8PIWid6BiaZ1QCA.png)

---
### Growth of Compute Memory vs. Transformer Size
[AI and Memory Wall](https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8)<br>
![](https://miro.medium.com/v2/resize:fit:4800/format:webp/0*U-7GJqBZ2tY1W5Iu)

---
### LLMOps: Adapting the MLOps Stack for Generative AI Use Cases
![](https://www.insightpartners.com/wp-content/uploads/2023/10/llmops-market-map-1.png)

---
## Large Language Models

### ChatGPT
[ChatGPT: Optimizing Language Models for Dialogue](https://openai.com/blog/chatgpt/)<br>
ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022.<br>

![](https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagram.svg)

<iframe width="640" height="455" src="https://www.youtube.com/embed/e0aKI2GGZNg" title="Chat GPT (可能)是怎麼煉成的 - GPT 社會化的過程" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

---
### GPT-4
**Paper:** [GPT-4 Technical Report](https://arxiv.org/abs/2303.08774)<br>
![](https://image-cdn.learnin.tw/bnextmedia/image/album/2023-03/img-1679884936-23656.png?w=1200&output=webp)
**Paper:** [From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting](https://arxiv.org/abs/2309.04269)<br>
**Blog:** [GPT-4 Code Interpreter: The Next Big Thing in AI](https://medium.com/@aaabulkhair/gpt-4-code-interpreter-the-next-big-thing-in-ai-56bbf72d746)<br>

---
### Open LLMs
**[Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)**<br>

---
### [LLaMA](https://huggingface.co/docs/transformers/main/model_doc/llama)
*LLaMA is a collection of foundation language models ranging from 7B to 65B parameters.*<br>
**Paper:** [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)<br>
![](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*nt-ydHhSVsaLXq_HZRaLQA.png)

---
### [OpenLLaMA](https://github.com/openlm-research/open_llama)
**model:** [https://huggingface.co/openlm-research/open_llama_3b_v2](https://huggingface.co/openlm-research/open_llama_3b_v2)<br>
**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-openllama](https://www.kaggle.com/code/rkuo2000/llm-openllama)<br>

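A minimal sketch of loading this checkpoint with Hugging Face `transformers` (this is not the Kaggle notebook's exact code; `use_fast=False` follows the OpenLLaMA project's note to prefer the slow tokenizer, and `device_map="auto"` assumes `accelerate` and a GPU are available):

```python
# Minimal sketch: run OpenLLaMA-3B-v2 with transformers (assumes torch, transformers, accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)  # slow tokenizer, per the project's note
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is a large language model?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
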
---
**Blog:** [Building a Million-Parameter LLM from Scratch Using Python](https://levelup.gitconnected.com/building-a-million-parameter-llm-from-scratch-using-python-f612398f06c2)<br>
**Kaggle:** [LLM LLaMA from scratch](https://www.kaggle.com/rkuo2000/llm-llama-from-scratch/)<br>

---
### Pythia
**Paper:** [Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling](https://arxiv.org/abs/2304.01373)<br>
**Dataset:** <br>
[The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)<br>
[Datasheet for the Pile](https://arxiv.org/abs/2201.07311)<br>
**Code:** [Pythia: Interpreting Transformers Across Time and Scale](https://github.com/EleutherAI/pythia)<br>

---
### Falcon-40B
**Paper:** [The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only](https://arxiv.org/abs/2306.01116)<br>
**Dataset:** [https://huggingface.co/datasets/tiiuae/falcon-refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)<br>
**Code:** [https://huggingface.co/tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)<br>

---
### LLaMA-2
**Paper:** [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)<br>
**Code:** [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)<br>
**models:** [https://huggingface.co/meta-llama](https://huggingface.co/meta-llama)<br>

---
### Orca
**Paper:** [Orca: Progressive Learning from Complex Explanation Traces of GPT-4](https://arxiv.org/abs/2306.02707)<br>

---
### Vicuna
**Paper:** [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://arxiv.org/abs/2306.05685)<br>
**model:** [https://huggingface.co/lmsys/vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5)<br>
**Code:** [https://github.com/lm-sys/FastChat](https://github.com/lm-sys/FastChat)<br>

---
### [Platypus](https://platypus-llm.github.io/)
**Paper:** [Platypus: Quick, Cheap, and Powerful Refinement of LLMs](https://arxiv.org/abs/2308.07317)<br>
**Code:** [https://github.com/arielnlee/Platypus/](https://github.com/arielnlee/Platypus/)<br>

---
### Mistral
**Paper:** [Mistral 7B](https://arxiv.org/abs/2310.06825)<br>
**Code:** [https://github.com/mistralai/mistral-src](https://github.com/mistralai/mistral-src)<br>
**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-mistral-7b-instruct](https://www.kaggle.com/code/rkuo2000/llm-mistral-7b-instruct)<br>
![](https://www.promptingguide.ai/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmistral-7B-2.8625353c.png&w=1920&q=75)

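As a rough sketch of instruct-model inference (not the Kaggle notebook's exact code; it assumes the `mistralai/Mistral-7B-Instruct-v0.1` checkpoint and a `transformers` release with chat-template support):

```python
# Hedged sketch: run a Mistral-7B instruct checkpoint via transformers chat templates.
# Assumes transformers >= 4.34, torch, and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain sliding-window attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
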
---
### Mixtral 8x7B
![](https://miro.medium.com/v2/resize:fit:720/format:webp/0*91yEJMc_q-QlU-bk.png)

---
### Zephyr
**Paper:** [Zephyr: Direct Distillation of LM Alignment](https://arxiv.org/abs/2310.16944)<br>
**Code:** [https://huggingface.co/HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)<br>
**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-zephyr-7b](https://www.kaggle.com/code/rkuo2000/llm-zephyr-7b)<br>
![](https://i3.res.bangqu.com/farm/liang/news/2023/10/28/9e3a1a498f94b147fd57608b4beaefe0.jpg)

---
**Blog:** [Zephyr-7B : HuggingFace’s Hyper-Optimized LLM Built on Top of Mistral 7B](https://www.unite.ai/zephyr-7b-huggingfaces-hyper-optimized-llm-built-on-top-of-mistral-7b/)<br>
![](https://www.unite.ai/wp-content/uploads/2023/11/Knowledge-distillation-r.png)
![](https://www.unite.ai/wp-content/uploads/2023/11/knowledge-distillation.png)
![](https://www.unite.ai/wp-content/uploads/2023/11/Model-Performace-768x418.png)

---
### SOLAR-10.7B: Depth Upscaling
**Code:** [https://huggingface.co/upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)<br>
Depth-upscaled SOLAR-10.7B shows remarkable performance: it outperforms models with up to 30B parameters and even surpasses the recent Mixtral 8x7B model.<br>
Leveraging state-of-the-art instruction fine-tuning methods, including supervised fine-tuning (SFT) and direct preference optimization (DPO),
the researchers trained on a diverse set of datasets. The resulting model, SOLAR-10.7B-Instruct-v1.0, achieves a remarkable H6 score of 74.20,
demonstrating its effectiveness in single-turn dialogue scenarios.<br>

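For reference, the DPO objective mentioned above can be written compactly. The sketch below is illustrative only (it is not the SOLAR training code) and assumes per-sequence log-probabilities of the chosen and rejected responses have already been computed under the policy and a frozen reference model:

```python
# Illustrative DPO loss (not SOLAR's implementation): prefer the chosen response y_w over the
# rejected response y_l by maximizing the margin of implicit rewards
#   r(y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Negative log-sigmoid of the reward margin, averaged over the batch.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```
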
---
### Phi-2 (Transformer with 2.7B parameters)
**Blog:** [Phi-2: The surprising power of small language models](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/)<br>
**Code:** [https://huggingface.co/microsoft/phi-2](https://huggingface.co/microsoft/phi-2)<br>
**Kaggle:** [https://www.kaggle.com/code/rkuo2000/llm-phi-2](https://www.kaggle.com/code/rkuo2000/llm-phi-2)<br>

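A minimal sketch for trying Phi-2 locally (not the Kaggle notebook's exact code; older `transformers` releases may additionally require `trust_remote_code=True`):

```python
# Hedged sketch: run microsoft/phi-2 via the transformers text-generation pipeline.
# Assumes transformers, torch, and accelerate; fp16 weights need roughly 6 GB of GPU memory.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-2",
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Instruct: Summarize in one sentence what a language model does.\nOutput:"
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```
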
---
### Orca 2
**Paper:** [Orca 2: Teaching Small Language Models How to Reason](https://arxiv.org/abs/2311.11045)<br>
**model:** [https://huggingface.co/microsoft/Orca-2-13b](https://huggingface.co/microsoft/Orca-2-13b)<br>
**Blog:** [Microsoft's Orca 2 LLM Outperforms Models That Are 10x Larger](https://www.infoq.com/news/2023/12/microsoft-orca-2-llm/)<br>
<p><img src="https://s4.itho.me/sites/default/files/images/1123-Orca-2-microsoft-600.png" width="50%" height="50%"></p>

---
**Paper:** [Variety and Quality over Quantity: Towards Versatile Instruction Curation](https://arxiv.org/abs/2312.11508)<br>

---
### [Next-GPT](https://next-gpt.github.io/)
**Paper:** [NExT-GPT: Any-to-Any Multimodal LLM](https://arxiv.org/abs/2309.05519)<br>
![](https://next-gpt.github.io/static/images/framework.png)

---
### MLLM FineTuning
**Paper:** [Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning](https://arxiv.org/abs/2312.11420)<br>

---
## LLM in Vision
**Papers:** [https://github.com/DirtyHarryLYL/LLM-in-Vision](https://github.com/DirtyHarryLYL/LLM-in-Vision)<br>

---
### VisionLLM
**Paper:** [VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks](https://arxiv.org/abs/2305.11175)<br>
![](https://api.wandb.ai/files/byyoung3/images/projects/37269171/01ab3dba.png)

---
### MiniGPT-v2
**Paper:** [MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning](https://arxiv.org/abs/2310.09478)<br>
**Code:** [https://github.com/Vision-CAIR/MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4)<br>
![](https://github.com/Vision-CAIR/MiniGPT-4/raw/main/figs/minigpt2_demo.png)

---
### GPT4-V
**Paper:** [Assessing GPT4-V on Structured Reasoning Tasks](https://arxiv.org/abs/2312.11524)<br>

---
### Gemini
**Paper:** [Gemini: A Family of Highly Capable Multimodal Models](https://arxiv.org/abs/2312.11805)<br>
![](https://github.com/rkuo2000/AI-course/blob/main/images/Gemini.png?raw=true)

---
### [LLaVA](https://llava-vl.github.io/)
**Paper:** [Visual Instruction Tuning](https://arxiv.org/abs/2304.08485)<br>
**Paper:** [Improved Baselines with Visual Instruction Tuning](https://arxiv.org/abs/2310.03744)<br>
**Code:** [https://github.com/haotian-liu/LLaVA](https://github.com/haotian-liu/LLaVA)<br>
**Demo:** [https://llava.hliu.cc/](https://llava.hliu.cc/)

---
### [VLFeedback and Silkie](https://vlf-silkie.github.io/)
A GPT-4V-annotated preference dataset for large vision-language models.<br>
**Paper:** [Silkie: Preference Distillation for Large Visual Language Models](https://arxiv.org/abs/2312.10665)<br>
**Code:** [https://github.com/vlf-silkie/VLFeedback](https://github.com/vlf-silkie/VLFeedback)<br>
![](https://github.com/vlf-silkie/VLFeedback/raw/main/imgs/annotate_framework.png)

---
## LLM in Robotics
**Paper:** [Language-conditioned Learning for Robotic Manipulation: A Survey](https://arxiv.org/abs/2312.10807)<br>

---
**Paper:** [Human Demonstrations are Generalizable Knowledge for Robots](https://arxiv.org/abs/2312.02419)<br>

---
## Advanced Topics

### XoT
**Paper:** [Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation](https://arxiv.org/abs/2311.04254)<br>
![](https://miro.medium.com/v2/resize:fit:720/format:webp/0*r_a44DuxG3D8DGZO.png)

---
### FunSearch
[DeepMind's method for using LLMs to solve hard mathematical problems](https://www.ithome.com.tw/news/160354)<br>
![](https://s4.itho.me/sites/default/files/styles/picture_size_large/public/field/image/2108_-_funsearch_making_new_discoveries_in_mathematical_sciences_using_lar_-_deepmind.google.jpg?itok=mAy4ydAE)

---
### [ALTER-LLM](https://tnoinkwms.github.io/ALTER-LLM/)
**Paper:** [From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"](https://arxiv.org/abs/2312.06571)<br>
<iframe width="593" height="346" src="https://www.youtube.com/embed/SAc-O5FDJ4k" title="play the metal" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
![](https://tnoinkwms.github.io/ALTER-LLM/architecture_2.png)
![](https://tnoinkwms.github.io/ALTER-LLM/feedback.png)

---
### BrainGPT
**Paper:** [DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation](https://arxiv.org/abs/2309.14030)<br>
**Blog:** [New Mind-Reading "BrainGPT" Turns Thoughts Into Text On Screen](https://www.iflscience.com/new-mind-reading-braingpt-turns-thoughts-into-text-on-screen-72054)<br>
![](https://i3.res.bangqu.com/farm/liang/news/2023/12/18/339b9a2158e1fd28e1e39ee4b1557df2.jpg)
![](https://i3.res.bangqu.com/farm/liang/news/2023/12/18/79ca704627e4cadc1e23afc1b2f029cb.jpg)
<iframe width="993" height="559" src="https://www.youtube.com/embed/crJst7Yfzj4" title="UTS HAI Research - BrainGPT" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>

<br> | ||
<br> | ||

*This site was last updated {{ site.time | date: "%B %d, %Y" }}.*
|