
Add support for Grouped Query Attention on Llama Model #393

Merged
merged 1 commit into from
Nov 15, 2023
Conversation

felladrin (Contributor) commented Nov 15, 2023

This change allows using models like ahxt/llama2_xs_460M_experimental.
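For context, the core of Grouped Query Attention is that several query heads share one key/value head, so the cached K/V projections are expanded before the attention scores are computed. The sketch below is only an illustration of that head-sharing idea, not the library's actual implementation; the function name `repeatKVHeads` and the head counts are hypothetical.

```javascript
// Illustrative sketch of the key/value head sharing behind Grouped Query
// Attention (GQA). Each of the numKVHeads key/value heads is repeated so
// that a group of query heads attends against the same K/V projection.
// (Hypothetical helper; not the transformers.js internals.)
function repeatKVHeads(kvHeads, numHeads) {
  const numKVHeads = kvHeads.length;
  if (numHeads % numKVHeads !== 0) {
    throw new Error("numHeads must be a multiple of numKVHeads");
  }
  const groupSize = numHeads / numKVHeads;
  const expanded = [];
  for (const head of kvHeads) {
    for (let i = 0; i < groupSize; ++i) {
      expanded.push(head); // each KV head serves groupSize query heads
    }
  }
  return expanded;
}

// Example: 8 query heads sharing 2 KV heads -> each KV head repeated 4 times.
const kv = [[0.1, 0.2], [0.3, 0.4]];
const expanded = repeatKVHeads(kv, 8);
console.log(expanded.length); // 8
```

When `numHeads === numKVHeads` this reduces to standard multi-head attention, which is why GQA support lets smaller checkpoints like the one above load with the same Llama code path.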

How to test

```js
import { AutoModelForCausalLM, AutoTokenizer } from "http://localhost:8080/dist/transformers.js";

const model_path = "Felladrin/onnx-int8-llama2_xs_460M_experimental";
const model = await AutoModelForCausalLM.from_pretrained(model_path);
const tokenizer = await AutoTokenizer.from_pretrained(model_path);

const prompt = "Q: What is the largest bird?\nA:";
const input_ids = tokenizer(prompt).input_ids;
const tokens = await model.generate(input_ids, { max_length: 20 });
console.log(tokenizer.decode(tokens[0], { skip_special_tokens: true }));
```

Confirm it also works with the pipeline API:

```js
import { pipeline } from "http://localhost:8080/dist/transformers.js";

const generator = await pipeline("text-generation", "Felladrin/onnx-int8-llama2_xs_460M_experimental");
const [output] = await generator("Once upon a time,", { max_length: 20 });
console.log(output.generated_text);
```

@xenova xenova merged commit 4e4148c into huggingface:main Nov 15, 2023
3 of 4 checks passed
xenova (Collaborator) commented Nov 15, 2023

Thanks so much @felladrin! 🤗

@felladrin felladrin deleted the patch-1 branch November 15, 2023 16:09
Linked issue (may be closed by this pull request):

[Feature request] Support for ahxt/llama2_xs_460M_experimental model