
Release v1.0.0

@LinB203 released this 04 Feb 08:53 · 188d462
  • Supported higher-resolution input by using google/siglip-so400m-patch14-384 as the vision encoder, for more detailed visual understanding (see the loader sketch after this list).
  • Changed capacity_factor to 1.5 to support a stronger MoE-LLaVA (see the capacity sketch after this list).
  • Added MME benchmark results and an evaluation pipeline.
  • Improved docs.
  • Fixed typos.
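
For reference, here is a minimal sketch of loading the new vision tower with Hugging Face Transformers. `SiglipVisionModel` and `SiglipImageProcessor` are standard Transformers classes, but how MoE-LLaVA wires the tower into its model is not shown, and freezing the encoder is an assumption rather than confirmed project behavior.

```python
# Sketch only: load google/siglip-so400m-patch14-384 as a vision tower.
from transformers import SiglipImageProcessor, SiglipVisionModel

vision_tower_name = "google/siglip-so400m-patch14-384"

image_processor = SiglipImageProcessor.from_pretrained(vision_tower_name)
vision_tower = SiglipVisionModel.from_pretrained(vision_tower_name)
vision_tower.requires_grad_(False)  # assumed: the vision encoder stays frozen

# 384px input with 14px patches yields about 729 patch tokens, versus 576
# for a 336px CLIP-ViT-L/14 tower, hence the more detailed visual features.
```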
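As for capacity_factor, the sketch below shows what it controls in a common top-k MoE router. The formula is the usual GShard/DeepSpeed-style definition and the argument defaults are illustrative assumptions, not MoE-LLaVA's exact implementation.

```python
import math

def expert_capacity(num_tokens: int, num_experts: int,
                    capacity_factor: float = 1.5, top_k: int = 2) -> int:
    """Max tokens each expert accepts per batch before overflow tokens are dropped."""
    return math.ceil(capacity_factor * top_k * num_tokens / num_experts)

# Raising capacity_factor from 1.0 to 1.5 lets each expert take 50% more tokens,
# so fewer tokens are dropped during routing at the cost of some extra compute.
print(expert_capacity(num_tokens=2048, num_experts=4))  # -> 1536
```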

We hope community researchers will take note that large vision-language models can also be sparsified, and can even perform better when they are.