
Use q3f16 models #630

Open

bil-ash opened this issue Nov 16, 2024 · 0 comments

bil-ash (Contributor) commented Nov 16, 2024

@CharlieFRuan From the docs, it can be seen that MLC supports q3f16 quantization. However, the models used in web-llm are all either q4 or q0. Is q3f16 quantization unsupported in web-llm (i.e., only supported in mlc-llm)? I am asking because, if q3 is indeed supported in web-llm, it should be used: mobile browsers are generally memory constrained, and q3 weights would shrink the footprint further.
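For context, if q3f16 is in fact supported by the runtime, a q3f16 model would presumably be loaded like any other custom model, by passing an `appConfig` to `CreateMLCEngine`. A minimal sketch, assuming hypothetical `model` and `model_lib` URLs (no q3f16 artifacts appear in the prebuilt model list, so both would have to be built and hosted yourself):

```ts
import { CreateMLCEngine, AppConfig } from "@mlc-ai/web-llm";

// Hypothetical q3f16 model record: both URLs below are placeholders,
// not published artifacts.
const appConfig: AppConfig = {
  model_list: [
    {
      model: "https://huggingface.co/my-org/Llama-3.2-1B-Instruct-q3f16_1-MLC",
      model_id: "Llama-3.2-1B-Instruct-q3f16_1-MLC",
      model_lib: "https://example.com/Llama-3.2-1B-Instruct-q3f16_1-webgpu.wasm",
    },
  ],
};

// Load the custom record instead of a prebuilt one.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q3f16_1-MLC", {
  appConfig,
});

// Standard OpenAI-style chat completion to verify the model runs.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);
```

The weights and wasm would first need to be produced with `mlc_llm convert_weight` / `mlc_llm compile` using the `q3f16_1` quantization, assuming the web-llm runtime accepts that format; that is exactly the open question here.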
