From 072f5041a38a58b97620927854457fcacc4abd73 Mon Sep 17 00:00:00 2001
From: Richard Kuo
Date: Sun, 9 Jun 2024 03:05:57 +0800
Subject: [PATCH] Update 2024-04-04-VLM.md

---
 _posts/2024-04-04-VLM.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/_posts/2024-04-04-VLM.md b/_posts/2024-04-04-VLM.md
index 8e074565..6ca3bfa5 100644
--- a/_posts/2024-04-04-VLM.md
+++ b/_posts/2024-04-04-VLM.md
@@ -173,8 +173,20 @@ Multimodal Language Models](https://publications.reka.ai/reka-core-tech-report.p
 **InternLM-XComposer2-4KHD** could further understand 4K Resolution images.<br>
 ![](https://github.com/InternLM/InternLM-XComposer/raw/main/assets/4khd_radar.png)
 
+---
+### [Phi-3](https://azure.microsoft.com/en-us/blog/new-models-added-to-the-phi-3-family-available-on-microsoft-azure/)
+**model:** [microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)<br>
+* Phi-3-vision is a **4.2B** parameter multimodal model with language and vision capabilities.
+* Phi-3-mini is a 3.8B parameter language model, available in two context lengths (128K and 4K).
+* Phi-3-small is a 7B parameter language model, available in two context lengths (128K and 8K).
+* Phi-3-medium is a 14B parameter language model, available in two context lengths (128K and 4K).
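+
+A minimal usage sketch for the vision checkpoint (assumed: the `transformers` remote-code loading path and the `<|image_1|>` prompt format described on the Hugging Face model card; the image path is a placeholder):
+
+```python
+# Sketch: ask Phi-3-vision one question about one local image.
+from PIL import Image
+from transformers import AutoModelForCausalLM, AutoProcessor
+
+model_id = "microsoft/Phi-3-vision-128k-instruct"
+model = AutoModelForCausalLM.from_pretrained(
+    model_id, device_map="cuda", torch_dtype="auto", trust_remote_code=True
+)
+processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+
+# <|image_1|> marks where the image is inserted in the chat prompt.
+messages = [{"role": "user", "content": "<|image_1|>\nDescribe this image."}]
+image = Image.open("test.jpg")  # placeholder: any local RGB image
+
+prompt = processor.tokenizer.apply_chat_template(
+    messages, tokenize=False, add_generation_prompt=True
+)
+inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
+
+out = model.generate(**inputs, max_new_tokens=256,
+                     eos_token_id=processor.tokenizer.eos_token_id)
+answer = processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
+                                skip_special_tokens=True)[0]
+print(answer)
+```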
+
+---
+### [MiniCPM-V](https://github.com/OpenBMB/MiniCPM-V)
+**model:** [openbmb/MiniCPM-Llama3-V-2_5-int4](https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-int4)<br>
+![](https://github.com/OpenBMB/MiniCPM-V/raw/main/assets/MiniCPM-Llama3-V-2.5-peformance.png)
+<br>
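+
+A minimal usage sketch for the int4 checkpoint (assumed: the `chat()` helper exposed via `trust_remote_code`, as shown on the model card; loading the 4-bit weights needs `bitsandbytes`; image path and question are placeholders):
+
+```python
+# Sketch: single-turn visual question answering with MiniCPM-Llama3-V 2.5 (int4).
+from PIL import Image
+from transformers import AutoModel, AutoTokenizer
+
+model_id = "openbmb/MiniCPM-Llama3-V-2_5-int4"
+model = AutoModel.from_pretrained(model_id, trust_remote_code=True)  # 4-bit weights
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model.eval()
+
+image = Image.open("test.jpg").convert("RGB")  # placeholder: any local image
+msgs = [{"role": "user", "content": "What is in the image?"}]
+
+answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
+                    sampling=True, temperature=0.7)
+print(answer)
+```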

 <br>
 
 *This site was last updated {{ site.time | date: "%B %d, %Y" }}.*
-