
Refact convert scripts #35

Closed
wants to merge 7 commits

Conversation

@zhenwei-intel (Contributor) commented Jan 8, 2024

Type of Change

Feature

Description

  • Convert LLaMA online, without saving to a local path (see the usage sketch below)
  • Support online quantization (q4_0 / jblas) without writing an intermediate fp32.bin
  • Enable GPTQ and AWQ for all other models
  • Support bf16/fp16 conversion
  • Integrate the GGUF conversion function
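
A minimal sketch of the new online convert-and-quantize path, assuming a `Model` class with `init`/`generate` methods along the lines of what `scripts/python_api_example.py` exercises; the method names and keyword arguments here are assumptions inferred from the run log below, not a confirmed API:

```python
# Hypothetical driver for the online path; Model/init/generate and the
# keyword arguments are assumptions based on the QuantConfig in the log.
import os
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

model_path = os.path.expanduser("~/models/llama/Llama-2-7b-chat-hf/")
prompt = "Once upon a time"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = Model()
# Convert and quantize in one step: no intermediate fp32.bin is written.
model.init(model_path,
           weight_dtype="int4", alg="sym", group_size=32,
           scale_dtype="fp32", compute_dtype="int8")
model.generate(inputs, streamer=streamer, max_new_tokens=300)
```

The run below exercises this path end to end: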

```
zhenweil@icx-1 ~/c/neural-speed (lzw/online_llama) [1]> python scripts/python_api_example.py ~/models/llama/Llama-2-7b-chat-hf/
QuantConfig(weight_dtype='int4', alg='sym', group_size=32, scale_dtype='fp32', compute_dtype='int8', use_ggml=False, not_quant=False, use_gptq=False, use_awq=False)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.70it/s]
Loading vocab file /home/zhenweil/models/llama/Llama-2-7b-chat-hf/tokenizer.model
Processing layers: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [00:57<00:00,  1.79s/it]
Success! saved as runtime_outs/ne_llama_q.bin
AVX:1 AVX2:1 AVX512F:1 AVX_VNNI:0 AVX512_VNNI:1 AMX_INT8:0 AMX_BF16:0 AVX512_BF16:0 AVX512_FP16:0
beam_size: 1, do_sample: 1, top_k: 40, top_p: 0.950000
model.cpp: loading model from runtime_outs/ne_llama_q.bin
init: n_vocab    = 32000
init: n_embd     = 4096
init: n_mult     = 256
init: n_head     = 32
init: n_head_kv  = 32
init: n_layer    = 32
init: n_rot      = 128
init: n_ff       = 11008
init: n_parts    = 1
load: ne ctx size = 5271.06 MB
load: mem required  = 7321.06 MB (+ memory per state)
....................................................................................
model_init_from_file: support_bestla_kv = 0
model_init_from_file: kv self size =  128.00 MB
<s> Once upon a time, a little girl named Lily lived in a small village nestled between two great mountains. everyone in the village loved Lily, and she was known for her kindness and
```
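
For context on the QuantConfig printed above (weight_dtype='int4', alg='sym', group_size=32, scale_dtype='fp32'), here is a minimal NumPy sketch of symmetric 4-bit group-wise quantization. It is illustrative only; the real q4_0/jblas kernels use their own packing and memory layout:

```python
import numpy as np

def quantize_q4_sym(weights: np.ndarray, group_size: int = 32):
    """Symmetric int4 group-wise quantization (illustrative sketch)."""
    w = weights.reshape(-1, group_size)                 # one group per row
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # map max |w| onto int4 range
    scale[scale == 0] = 1.0                             # guard all-zero groups
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale.astype(np.float32)                  # fp32 scales, per the config

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(4096).astype(np.float32)            # size divisible by group_size
q, s = quantize_q4_sym(w)
print("max abs reconstruction error:", np.abs(dequantize(q, s) - w).max())
```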

Expected Behavior & Potential Risk

Models convert and quantize online without intermediate files on disk; the run log above shows the end-to-end flow producing normal generation output.

How has this PR been tested?

Reproduced by running `python scripts/python_api_example.py ~/models/llama/Llama-2-7b-chat-hf/` on an AVX512_VNNI-capable Intel CPU (host icx-1, no AMX); see the log above.

Dependency Change?

None noted in this PR.

@zhenwei-intel marked this pull request as draft on January 9, 2024 at 02:49
@zhenwei-intel changed the title from "convert llama online without saving to local path" to "Refact convert scripts" on Jan 11, 2024
Signed-off-by: zhenwei-intel <[email protected]>