v1.2.0

@yejunjin yejunjin released this 24 Jun 05:32
· 20 commits to main since this release
3a0417b

Expand context length to 32K and support flash attention on the Intel AVX-512 platform

  • remove currently unsupported cache mode
  • examples: update qwen prompt template, add print func to examples
  • support glm-4-9b-chat by
  • switch to size_t to avoid overflow when the sequence is long
  • update README to reflect 32K context length support
  • add flash attention on the Intel AVX-512 platform
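
The size_t change in the list above addresses a common class of bug: element counts that grow with the square of the sequence length can exceed 32 bits at 32K context. The sketch below is illustrative only and is not taken from this release; the function names, the head count of 32, and the choice of an attention score buffer are assumptions, and it presumes a platform where size_t is 64 bits wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the release): number of elements in an
// attention score buffer of shape [num_heads, seq_len, seq_len].
// With size_t operands the product is computed in 64 bits on 64-bit targets.
std::size_t score_elems(std::size_t num_heads, std::size_t seq_len) {
    return num_heads * seq_len * seq_len;
}

// The same computation in 32-bit unsigned arithmetic, shown for contrast.
// Unsigned overflow wraps modulo 2^32, so large products silently truncate.
std::uint32_t score_elems_32bit(std::uint32_t num_heads, std::uint32_t seq_len) {
    return num_heads * seq_len * seq_len;
}
```

With 32 heads and a sequence length of 32768, the true count is 2^35 = 34359738368 elements. The 64-bit version returns that value, while the 32-bit version wraps to 0, which would lead to an undersized allocation or a wrong offset.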