
Add support for Grouped Query Attention on Llama Model #393

Merged
merged 1 commit into from
Nov 15, 2023
Conversation

felladrin (Contributor) commented Nov 15, 2023

This change allows using models like ahxt/llama2_xs_460M_experimental.
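For context, the core of Grouped Query Attention is that several query heads share one key/value head, so the cached K/V projections are expanded before the attention scores are computed. The sketch below is only an illustration of that head-sharing idea, not the library's actual implementation; the function name `repeatKVHeads` and the head counts are hypothetical.

```javascript
// Illustrative sketch of the key/value head sharing behind Grouped Query
// Attention (GQA). Each of the numKVHeads key/value heads is repeated so
// that a group of query heads attends against the same K/V projection.
// (Hypothetical helper; not the transformers.js internals.)
function repeatKVHeads(kvHeads, numHeads) {
  const numKVHeads = kvHeads.length;
  if (numHeads % numKVHeads !== 0) {
    throw new Error("numHeads must be a multiple of numKVHeads");
  }
  const groupSize = numHeads / numKVHeads;
  const expanded = [];
  for (const head of kvHeads) {
    for (let i = 0; i < groupSize; ++i) {
      expanded.push(head); // each KV head serves groupSize query heads
    }
  }
  return expanded;
}

// Example: 8 query heads sharing 2 KV heads -> each KV head repeated 4 times.
const kv = [[0.1, 0.2], [0.3, 0.4]];
const expanded = repeatKVHeads(kv, 8);
console.log(expanded.length); // 8
```

When `numHeads === numKVHeads` this reduces to standard multi-head attention, which is why GQA support lets smaller checkpoints like the one above load with the same Llama code path.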

How to test

```js
import { AutoModelForCausalLM, AutoTokenizer } from "http://localhost:8080/dist/transformers.js";

const model_path = "Felladrin/onnx-int8-llama2_xs_460M_experimental";
const model = await AutoModelForCausalLM.from_pretrained(model_path);
const tokenizer = await AutoTokenizer.from_pretrained(model_path);

const prompt = "Q: What is the largest bird?\nA:";
const input_ids = tokenizer(prompt).input_ids;
const tokens = await model.generate(input_ids, { max_length: 20 });
console.log(tokenizer.decode(tokens[0], { skip_special_tokens: true }));
```

Confirm it also works with the pipeline API:

```js
import { pipeline } from "http://localhost:8080/dist/transformers.js";

const generator = await pipeline("text-generation", "Felladrin/onnx-int8-llama2_xs_460M_experimental");
const [output] = await generator("Once upon a time,", { max_length: 20 });
console.log(output.generated_text);
```

@xenova xenova merged commit 4e4148c into huggingface:main Nov 15, 2023
3 of 4 checks passed
xenova (Collaborator) commented Nov 15, 2023

Thanks so much @felladrin! 🤗

@felladrin felladrin deleted the patch-1 branch November 15, 2023 16:09
Linked issue (may be closed by this pull request):

[Feature request] Support for ahxt/llama2_xs_460M_experimental model