This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
resubmit "Implement the YaRN rop scaling feature" #147
Resubmit "Implement the YaRN rop scaling feature (#109)"
Fix Windows build error.
Type of Change
Implement the YaRN feature.
Add two new APIs for YaRN RoPE scaling:
NE_API struct ne_tensor* ne_rope_custom_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, float freq_base, float freq_scale, int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow);
NE_API struct ne_tensor* ne_rope_custom_shift_inplace(struct ne_context* ctx, struct ne_tensor* a, int n_shift, int n_dims,
int mode, int prompt_size, int n_keep, struct ne_tensor* cossin,
float freq_base, float freq_scale, int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow);
Change the ne_layer internal API:
original:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
bool padding_left, float freq_base, float freq_scale)
new API:
struct ne_tensor* ne_rope_impl(struct ne_context* ctx, struct ne_tensor* a, int n_past, int n_dims, int mode,
int prompt_size, bool inplace, int n_keep, struct ne_tensor* cossin, int* n_padding,
bool padding_left, float freq_base, float freq_scale,
int yarn_orig_ctx, float ext_factor, float attn_factor,
float beta_fast, float beta_slow)
Description
Expected Behavior & Potential Risk
No behavior change for any model, because the YaRN path is not called.
How has this PR been tested?
Tested llama2 7B in fp32, q40, and q4j formats. No changes found so far.
Dependency Change?
No.