- Supported higher-resolution input using `google/siglip-so400m-patch14-384` as the vision encoder for more detailed visual understanding (see the loading sketch after this list).
- Changed `capacity_factor` to 1.5 to support a stronger MoE-LLaVA (illustrated in the second sketch below).
- Added MME benchmark results and the evaluation pipeline.
- Improved docs.
- Fixed typos.
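
For reference, a minimal sketch of loading the new vision encoder with Hugging Face `transformers`; this is an illustrative snippet, not the repo's exact integration code:

```python
from PIL import Image
from transformers import SiglipImageProcessor, SiglipVisionModel

MODEL_ID = "google/siglip-so400m-patch14-384"

# Load the image processor (resizes inputs to 384x384) and the vision tower.
processor = SiglipImageProcessor.from_pretrained(MODEL_ID)
vision_tower = SiglipVisionModel.from_pretrained(MODEL_ID)

image = Image.new("RGB", (640, 480))  # placeholder; use a real image in practice
inputs = processor(images=image, return_tensors="pt")
outputs = vision_tower(**inputs)

# Patch features fed to the multimodal projector: shape (1, 729, 1152)
# for this checkpoint (27x27 patches at 384px input, hidden size 1152).
features = outputs.last_hidden_state
```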
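For context on the second item: in a routed MoE layer, `capacity_factor` caps how many tokens each expert may process per batch, and overflow tokens beyond that cap are dropped by the router. Raising it from 1.0 to 1.5 lets each expert absorb more routed tokens. A hypothetical illustration following the common top-k MoE convention (the function name and formula are illustrative, not necessarily this repo's exact code):

```python
def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float = 1.5) -> int:
    """Tokens each expert may process before overflow tokens are dropped."""
    return int(capacity_factor * num_tokens / num_experts)

# With 4 experts and a 1024-token batch, each expert accepts up to
# 384 tokens at capacity_factor=1.5 versus 256 at 1.0, so fewer
# routed tokens are dropped at the cost of extra compute.
print(expert_capacity(1024, 4, 1.5))  # 384
print(expert_capacity(1024, 4, 1.0))  # 256
```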
We hope to draw the community's attention to the fact that large vision-language models can also be sparsified, and can even perform better after sparsification.