Fix llama meta tensor loading in AutoTP and kernel injected inference (#3608)

* Adapt to Llama when using meta tensor to load
* Fix gated mlp parameter mp
* Re-enable meta tensor for kernel injection
* Fix layer params loading in meta tensor
* Revert mlp_inter_mp for gated mlp as it is fixed
* Monkey patch for fixing llama output
* Fix formatting
* Add comment

Co-authored-by: Lev Kurilenko <[email protected]>
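As context for the changes above, here is a minimal sketch of what "meta tensor" loading means, using plain PyTorch only; the actual DeepSpeed AutoTP / kernel-injection code path is not reproduced here. Tensors on the `meta` device carry shape and dtype but no storage, so a large model can be instantiated cheaply and its real weights materialized later (e.g. from checkpoint shards).

```python
import torch

# Instantiate a layer on the meta device: no weight storage is allocated,
# only shape/dtype metadata. (Requires PyTorch >= 2.0 for the device
# context manager.)
with torch.device("meta"):
    layer = torch.nn.Linear(4096, 4096)

print(layer.weight.is_meta)        # True: no real storage yet
print(tuple(layer.weight.shape))   # (4096, 4096)

# Materialize real (uninitialized) storage before loading checkpoint
# weights into it; frameworks then copy the actual parameters in.
layer = layer.to_empty(device="cpu")
print(layer.weight.is_meta)        # False after materialization
```

Loading a model this way is what lets tensor-parallel inference shard weights across ranks without first allocating the full model on every process.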