v1.2.0
Expand the context length to 32K and add flash attention support on the Intel AVX-512 platform.
- remove currently unsupported cache mode
- examples: update the Qwen prompt template and add a print function (see the template sketch after this list)
- support glm-4-9b-chat
- use size_t for index arithmetic to avoid integer overflow on long sequences (see the overflow sketch after this list)
- update the README to reflect the new 32K context length
- add flash attention on the Intel AVX-512 platform (see the scalar sketch after this list)
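
For reference, Qwen chat models use the ChatML format, so the updated example template presumably follows this shape (the system prompt shown is the Qwen default; `{user prompt}` is a placeholder):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user prompt}<|im_end|>
<|im_start|>assistant
```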
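The `size_t` fix matters at 32K context because offset math can exceed 2^31. A minimal sketch of the failure mode, with hypothetical names and glm-4-9b-sized dimensions (none of these identifiers are taken from the repo):

```cpp
#include <cstdint>
#include <cstdio>

// Illustration: a byte offset into a KV cache at the new 32K context
// length. The product exceeds 2^31, so 32-bit arithmetic silently wraps;
// size_t keeps the whole computation in 64 bits on typical 64-bit targets.
int main() {
    const uint32_t seq_len = 32768;  // 32K context
    const uint32_t hidden  = 4096;   // hypothetical hidden size
    const uint32_t n_layer = 40;     // hypothetical layer count
    const uint32_t elem    = 2;      // bytes per fp16 element

    // Buggy: every operand is 32-bit, so the multiply wraps before the
    // result is widened (unsigned shown to avoid signed-overflow UB).
    uint32_t bad = seq_len * hidden * n_layer * elem;

    // Fixed: promote to size_t first so the whole product is 64-bit.
    size_t good = (size_t)seq_len * hidden * n_layer * elem;

    std::printf("32-bit: %u\n", bad);    // 2147483648 (wrapped)
    std::printf("size_t: %zu\n", good);  // 10737418240
    return 0;
}
```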
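For the flash attention item, below is a scalar sketch of the online-softmax recurrence at the heart of the algorithm, processing one query row against streamed K/V rows. The actual kernel would vectorize these loops with AVX-512 intrinsics and tile K/V to stay in cache; all names here are illustrative, not taken from the repo.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One query row of flash attention: a single pass over K/V that keeps a
// running max (m), running softmax denominator (l), and running weighted
// sum of V (acc), rescaling the old state whenever a new max appears so
// the softmax stays numerically stable without a second pass.
void flash_attn_row(const float* q,  // [d]
                    const float* K,  // [n, d] row-major
                    const float* V,  // [n, d] row-major
                    float* out,      // [d]
                    std::size_t n, std::size_t d) {
    float m = -INFINITY;              // running max of scores
    float l = 0.0f;                   // running softmax denominator
    std::vector<float> acc(d, 0.0f);  // running weighted sum of V rows
    const float scale = 1.0f / std::sqrt((float)d);

    for (std::size_t i = 0; i < n; ++i) {
        // score = (q . K[i]) * scale
        float s = 0.0f;
        for (std::size_t j = 0; j < d; ++j) s += q[j] * K[i * d + j];
        s *= scale;

        // Fold the new score in, rescaling previous state by exp(m - m_new).
        float m_new = s > m ? s : m;
        float corr  = std::exp(m - m_new);
        float p     = std::exp(s - m_new);
        l = l * corr + p;
        for (std::size_t j = 0; j < d; ++j)
            acc[j] = acc[j] * corr + p * V[i * d + j];
        m = m_new;
    }
    for (std::size_t j = 0; j < d; ++j) out[j] = acc[j] / l;
}
```

Because the state is updated incrementally, no n-by-n score matrix is ever materialized, which is what makes the 32K context practical memory-wise.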