Commit 5d02676: revert embedding size on CPU

luoyu-intel committed Jun 21, 2024
1 parent 42fa774
Showing 1 changed file with 1 addition and 1 deletion.

neural_speed/models/llama/llama_utils.cpp
@@ -97,7 +97,7 @@ void Llama::load(model_context* ctx, model_progress_callback progress_callback,
   int n_cpu_layer = n_layer - n_gpu_layer;
   n_cpu_layer = n_cpu_layer < 0 ? 0 : n_cpu_layer;
   fprintf(stderr, "%s: ctx size = %7.2f MB\n", __func__, ctx_size / 1024.0 / 1024.0);
-  auto host_size = (ctx_size + (50 << 20)) * n_cpu_layer / n_layer + (50 << 20);
+  auto host_size = (ctx_size + (50 << 20)) * n_cpu_layer / n_layer + n_embd * n_vocab * sizeof(float);  // embedding on CPU
   auto device_size = (ctx_size + (50 << 20)) * n_gpu_layer / n_layer + (50 << 20);
   fprintf(stderr, "%s: host ctx size = %7.2f MB\n", __func__, host_size / 1024.0 / 1024.0);
 #ifdef NS_SYCL
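For context, the changed line replaces the host side's fixed 50 MB pad with the exact size of the token-embedding table (n_embd × n_vocab floats), which the inline comment indicates stays resident on the CPU. Below is a minimal standalone sketch of the sizing arithmetic in this hunk; the main() wrapper and the sample values for ctx_size, n_layer, n_gpu_layer, n_embd, and n_vocab are hypothetical stand-ins (in neural_speed they come from the loaded model), while the two size formulas are copied from the diff:

#include <cstddef>
#include <cstdio>

int main() {
  // Hypothetical sample values, chosen to resemble a 7B-class llama model.
  size_t ctx_size = 4096ull << 20;  // total weight/context bytes (~4 GiB here)
  int n_layer = 32;                 // total transformer layers
  int n_gpu_layer = 24;             // layers offloaded to the device
  int n_embd = 4096;                // embedding width
  int n_vocab = 32000;              // vocabulary size

  // Same clamping as the context lines above: never fewer than 0 CPU layers.
  int n_cpu_layer = n_layer - n_gpu_layer;
  n_cpu_layer = n_cpu_layer < 0 ? 0 : n_cpu_layer;

  // Host buffer: the CPU layers' share of ctx_size (padded by 50 MB) plus the
  // token-embedding table, which this commit keeps on the CPU.
  size_t host_size = (ctx_size + (50ull << 20)) * n_cpu_layer / n_layer +
                     (size_t)n_embd * n_vocab * sizeof(float);

  // Device buffer: the GPU layers' share plus a fixed 50 MB of slack.
  size_t device_size =
      (ctx_size + (50ull << 20)) * n_gpu_layer / n_layer + (50ull << 20);

  printf("host ctx size   = %7.2f MB\n", host_size / 1024.0 / 1024.0);
  printf("device ctx size = %7.2f MB\n", device_size / 1024.0 / 1024.0);
  return 0;
}

With these sample numbers the host buffer comes to roughly 1.5 GB (8 of 32 layers plus a ~500 MB embedding table) and the device buffer to roughly 3.1 GB (24 of 32 layers); the point of the change is that the host reservation now scales with the vocabulary size instead of assuming a flat 50 MB pad covers the embedding.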
