
Allow custom answer generation function in WWB #507

Merged: 1 commit merged into openvinotoolkit:master from ea/custom_answer_gen on Jun 20, 2024

Conversation

@eaidova (Collaborator) commented Jun 14, 2024

This functionality will help enable model APIs that have a different interface for generating answers (e.g. OpenVINO GenAI).

Example with GenAI:

from transformers import AutoModelForCausalLM, AutoTokenizer
import huggingface_hub as hf_hub
import whowhatbench
import openvino_genai

model_id = "databricks/dolly-v2-3b"
base_model = AutoModelForCausalLM.from_pretrained(model_id)
ov_model_dir = "./dolly-v2-3b-int4-ov"

hf_hub.snapshot_download("OpenVINO/dolly-v2-3b-int4-ov", local_dir=ov_model_dir)
optimized_model = openvino_genai.LLMPipeline(ov_model_dir, "CPU")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hook passed as gen_answer_fn to evaluator.score(); WWB calls it with
# (model, tokenizer, question, max_new_tokens, skip_question).
def genai_gen_answer(model, tokenizer, question, max_new_tokens, skip_question):
    out = model.generate(question, max_new_tokens=max_new_tokens)
    return out.texts[0]

evaluator = whowhatbench.Evaluator(base_model=base_model, tokenizer=tokenizer)
metrics_per_prompt, metrics = evaluator.score(optimized_model, gen_answer_fn=genai_gen_answer)
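
The same hook shape also covers HF-style generate APIs (for example a model loaded through optimum-intel). The sketch below is only illustrative; in particular, the handling of skip_question assumes it means the prompt should be stripped from the returned text:

# Minimal sketch, not part of this PR: model here could be e.g. an
# optimum-intel OVModelForCausalLM or a plain transformers model.
def hf_gen_answer(model, tokenizer, question, max_new_tokens, skip_question):
    # Standard transformers-style generation: tokenize, generate, decode.
    inputs = tokenizer(question, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Rough prompt stripping; assumes skip_question asks for the answer only.
    return answer[len(question):] if skip_question else answer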

github-actions bot added the category: llm_bench label (tool/llm_bench folder) on Jun 14, 2024
eaidova force-pushed the ea/custom_answer_gen branch from 73deda5 to ab0970b on June 14, 2024 08:45
eaidova force-pushed the ea/custom_answer_gen branch from ab0970b to 659b327 on June 14, 2024 08:50
@andreyanufr (Contributor) commented

@eaidova Maybe it would be better to move gen_answer_fn into the whowhatbench.Evaluator() init? In the current implementation, base_model and the optimized model will use different inference paths.
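
For illustration only, the suggestion might look roughly like this (a gen_answer_fn keyword on the constructor is hypothetical and not what this PR adds):

# Hypothetical alternative, not the merged API: register the hook once at
# construction so base and optimized answers go through the same code path.
# Note: the GenAI-style hook would then also have to fit the HF base model.
evaluator = whowhatbench.Evaluator(
    base_model=base_model,
    tokenizer=tokenizer,
    gen_answer_fn=genai_gen_answer,  # hypothetical keyword argument
)
metrics_per_prompt, metrics = evaluator.score(optimized_model)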

@eaidova (Collaborator, Author) commented Jun 14, 2024

@andreyanufr It is expected that they may have different inference. I want to compare model answers between GenAI and the original model or optimum-intel; in this case, the base and the optimized model may be used via different APIs.

Do you think I should implement this for the base model too?
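
If it were extended to the base model as well, one possible shape (purely hypothetical, base_gen_answer_fn is not part of this PR) could be:

# Hypothetical extension: a separate hook for the reference model, e.g. when the
# reference answers should also come from a GenAI pipeline. The FP16 model path
# below is an assumption for illustration.
base_pipeline = openvino_genai.LLMPipeline("./dolly-v2-3b-fp16-ov", "CPU")

evaluator = whowhatbench.Evaluator(
    base_model=base_pipeline,
    tokenizer=tokenizer,
    base_gen_answer_fn=genai_gen_answer,  # hypothetical keyword argument
)
metrics_per_prompt, metrics = evaluator.score(optimized_model, gen_answer_fn=genai_gen_answer)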

eaidova merged commit c7c592d into openvinotoolkit:master on Jun 20, 2024
28 checks passed
eaidova deleted the ea/custom_answer_gen branch on June 20, 2024 07:41