Merge branch 'yaml_multilingual_tasks'
KlaudiaTH committed Jul 16, 2024
2 parents a0a2fec + 6092ac5 commit 467e766
Showing 2,042 changed files with 20,012 additions and 25 deletions.
8 changes: 8 additions & 0 deletions lm_eval/__main__.py
@@ -86,6 +86,13 @@ def parse_eval_args() -> argparse.Namespace:
        help="Limit the number of examples per task. "
        "If <1, limit is a percentage of the total number of examples.",
    )
    parser.add_argument(
        "--bootstrap_iters",
        type=int,
        default=100000,
        metavar="N",
        help="Number of bootstrapping iterations for metric standard error estimation.",
    )
    parser.add_argument(
        "--use_cache",
        "-c",
@@ -238,6 +245,7 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
        device=args.device,
        use_cache=args.use_cache,
        limit=args.limit,
        bootstrap_iters=args.bootstrap_iters,
        decontamination_ngrams_path=args.decontamination_ngrams_path,
        check_integrity=args.check_integrity,
        write_out=args.write_out,
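The new `--bootstrap_iters` option (default 100000) sets how many resampling rounds are used when estimating each metric's standard error. As a rough illustration of what those iterations do, the sketch below bootstraps the standard error of a mean from per-example scores. It is a standalone, single-process example; the function name `bootstrap_se`, its parameters, and the fixed seed are hypothetical choices, not the harness's own implementation.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bootstrap_se(
    scores: Sequence[float],
    aggregate: Callable[[Sequence[float]], float] = statistics.mean,
    iters: int = 1000,
    seed: int = 1234,
) -> float:
    """Estimate the standard error of `aggregate(scores)` by resampling.

    Each iteration draws len(scores) items with replacement and applies the
    aggregation; the standard deviation of those replicate values is the
    standard-error estimate.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    n = len(scores)
    replicates: List[float] = []
    for _ in range(iters):
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        replicates.append(aggregate(sample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    # Per-example accuracy (1 = correct, 0 = wrong) for a toy task.
    acc = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
    print(f"acc = {statistics.mean(acc):.3f} +/- {bootstrap_se(acc, iters=2000):.3f}")
```

With 0/1 accuracy scores the estimate should land near the binomial value sqrt(p(1 - p)/n). Raising `iters` mainly reduces the Monte Carlo noise in the estimate itself, at a cost that grows linearly with `iters`.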
14 changes: 13 additions & 1 deletion lm_eval/api/metrics.py
@@ -159,9 +159,21 @@ def exact_match_fn(**kwargs):
    output_type="loglikelihood",
    aggregation="perplexity",
)
def perplexity_fn(items): # This is a passthrough function
def perplexity_fn(items):
    return items

@register_aggregation("nll")
def nll(items):
    return -mean(items)

@register_metric(
    metric="nll",
    higher_is_better=False,
    output_type="loglikelihood",
    aggregation="nll",
)
def nll_fn(items):
    return items

@register_metric(
    metric="word_perplexity",
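The new `nll` aggregation negates the mean of the per-example log-likelihoods that `nll_fn` passes through, so it reports the average negative log-likelihood. Assuming the existing `perplexity` aggregation (not shown in this hunk) computes the exponential of the negated mean, the two are related by perplexity = exp(nll). The standalone functions below reuse the names from the diff for readability but are not imported from `lm_eval`; the log-likelihoods are toy values in natural log.

```python
import math
from statistics import mean
from typing import Sequence


def nll(items: Sequence[float]) -> float:
    """Average negative log-likelihood, as computed by the new `nll` aggregation."""
    return -mean(items)


def perplexity(items: Sequence[float]) -> float:
    """Perplexity as exp of the average negative log-likelihood (assumed definition)."""
    return math.exp(nll(items))


# Per-example log-likelihoods (natural log) for three toy continuations.
logliks = [math.log(0.5), math.log(0.25), math.log(0.125)]
print(nll(logliks))         # mean of ln 2, ln 4 and ln 8, i.e. 2 * ln 2
print(perplexity(logliks))  # approximately 4.0
```

Reporting `nll` alongside `perplexity` keeps the metric on a log scale, where differences between models are additive rather than multiplicative.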
38 changes: 15 additions & 23 deletions lm_eval/models/huggingface.py
@@ -742,34 +742,26 @@ def _select_cont_toks(self, logits, contlen=None, inplen=None):

        return logits

    def _encode_pair(
        self, context: str, continuation: str
    ) -> Tuple[List[int], List[int]]:
        n_spaces = len(context) - len(context.rstrip())
        if n_spaces > 0:
            continuation = context[-n_spaces:] + continuation
            context = context[:-n_spaces]

        whole_enc = self.tok_encode(context + continuation, add_special_tokens=False)
        context_enc = self.tok_encode(context, add_special_tokens=False)

        # whole_enc = self.tok_encode(context + continuation)
        # context_enc = self.tok_encode(context, add_special_tokens=False)
        context_enc_len = len(context_enc)
        continuation_enc = whole_enc[context_enc_len:]
        return context_enc, continuation_enc

    def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
        new_reqs = []
        for context, continuation in [req.args for req in requests]:
            continuation_enc = self.tok_encode(continuation)

            if context == "":
                # end of text as context
                context_enc, continuation_enc = (
                    [self.eot_token_id],
                    self.tok_encode(continuation),
                )
                context_enc = [self.eot_token_id]
            else:
                context_enc, continuation_enc = self._encode_pair(context, continuation)
                context_enc = self.tok_encode(context, add_special_tokens=False)
                ctx_cont_enc = self.tok_encode(context + continuation, add_special_tokens=False)

                if context_enc + continuation_enc != ctx_cont_enc:
                    if ctx_cont_enc[: len(context_enc)] == context_enc:
                        continuation_enc = ctx_cont_enc[len(context_enc) :]
                    elif ctx_cont_enc[-len(continuation_enc) :] == continuation_enc:
                        context_enc = ctx_cont_enc[: -len(continuation_enc)]
                    else:
                        print(
                            f"WARNING: Unnatural tokenization of concatenated context ...{repr(context[-20:])} and continuation {repr(continuation)}"
                        )

            new_reqs.append(((context, continuation), context_enc, continuation_enc))

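The rewritten `loglikelihood` encodes the context, the continuation, and their concatenation separately. When the two pieces do not join into the concatenated encoding, it keeps whichever side still appears unchanged as a prefix or suffix of `ctx_cont_enc` and re-derives the other side from it; if neither matches, it only prints a warning and keeps the separate encodings. The hypothetical `align` helper below isolates that decision logic as a pure function so it can be exercised with made-up token ids instead of a real tokenizer.

```python
from typing import List, Tuple

Tokens = List[int]


def align(
    context_enc: Tokens, continuation_enc: Tokens, ctx_cont_enc: Tokens
) -> Tuple[Tokens, Tokens, bool]:
    """Mirror the boundary check from the diff (hypothetical helper).

    Returns the possibly adjusted context and continuation encodings, plus a
    flag that is False when neither side could be reconciled with the
    encoding of the concatenated string (the diff prints a warning then).
    """
    if context_enc + continuation_enc == ctx_cont_enc:
        return context_enc, continuation_enc, True
    if ctx_cont_enc[: len(context_enc)] == context_enc:
        # Context survived intact: everything after it is the continuation.
        return context_enc, ctx_cont_enc[len(context_enc):], True
    if ctx_cont_enc[-len(continuation_enc):] == continuation_enc:
        # Continuation survived intact: everything before it is the context.
        return ctx_cont_enc[: -len(continuation_enc)], continuation_enc, True
    return context_enc, continuation_enc, False


# Continuation encoded alone picked up an extra leading id (99).
print(align([10, 11], [99, 12], [10, 11, 12]))  # ([10, 11], [12], True)
# The context's last token changed when joined; the continuation did not.
print(align([10, 11], [12], [10, 13, 12]))      # ([10, 13], [12], True)
# A single token spans the boundary, so neither side matches.
print(align([10, 11], [12], [10, 14]))          # ([10, 11], [12], False)
```

Because the prefix test runs first, the context encoding wins whenever both sides would match, and the continuation absorbs the difference. The first example is relevant because `continuation_enc` in the diff is encoded without passing `add_special_tokens=False`, so depending on the tokenizer settings it may start with a special token that the concatenated encoding does not contain.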
47 changes: 47 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/README.md
@@ -0,0 +1,47 @@
# Multilingual ARC

### Paper

Title: `Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback`

Abstract: https://arxiv.org/abs/2307.16039

A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. Two major approaches for instruction tuning characterize supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca, Vicuna, to name a few. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, thus hindering their impacts and accessibility to many other languages in the world. Among a few very recent work to explore instruction tuning for LLMs in multiple languages, SFT has been used as the only approach to instruction-tune LLMs for multiple languages. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.

Homepage: `https://github.com/nlp-uoregon/Okapi`


### Citation

```
@article{dac2023okapi,
title={Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback},
author={Dac Lai, Viet and Van Nguyen, Chien and Ngo, Nghia Trung and Nguyen, Thuat and Dernoncourt, Franck and Rossi, Ryan A and Nguyen, Thien Huu},
journal={arXiv e-prints},
pages={arXiv--2307},
year={2023}
}
```

### Groups and Tasks

#### Groups

- arc_multilingual

#### Tasks

- `arc_{ar,bn,ca,da,de,es,eu,fr,gu,hi,hr,hu,hy,id,it,kn,ml,mr,ne,nl,pt,ro,ru,sk,sr,sv,ta,te,uk,vi,zh}`

### Checklist

For adding novel benchmarks/datasets to the library:
* [x] Is the task an existing benchmark in the literature?
* [x] Have you referenced the original paper that introduced the task?
* [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?


If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
23 changes: 23 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/_arc_yaml
@@ -0,0 +1,23 @@
group:
  - arc_multilingual
dataset_path: null
dataset_name: null
output_type: multiple_choice
training_split: train
validation_split: validation
test_split: test
process_docs: !function utils.process_docs
doc_to_text: "query"
doc_to_target: "gold"
doc_to_choice: "choices"
should_decontaminate: true
doc_to_decontamination_query: "query"
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
metadata:
  version: 1.0
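`_arc_yaml` scores every multiple-choice item with both `acc` and `acc_norm`. A common way to compute them from per-choice log-likelihoods is sketched below: `acc` picks the choice with the highest log-likelihood, while `acc_norm` first divides each log-likelihood by the character length of the choice text to offset the bias toward short answers. This is an assumption about how the harness scores `multiple_choice` tasks, written as standalone Python; the example choices and scores are made up.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    # Index of the largest value; ties resolve to the first occurrence.
    return max(range(len(values)), key=lambda i: values[i])


def acc(logliks: Sequence[float], gold: int) -> float:
    # 1.0 if the raw log-likelihood ranks the gold choice highest.
    return 1.0 if argmax(logliks) == gold else 0.0


def acc_norm(logliks: Sequence[float], choices: Sequence[str], gold: int) -> float:
    # Normalize each choice's log-likelihood by its length in characters.
    normed = [ll / len(choice) for ll, choice in zip(logliks, choices)]
    return 1.0 if argmax(normed) == gold else 0.0


choices = ["Sun", "the Earth's core"]
logliks = [-4.0, -9.0]  # made-up scores: the short answer has the higher raw likelihood
gold = 1
print(acc(logliks, gold), acc_norm(logliks, choices, gold))  # 0.0 1.0
```

The example shows why both metrics are kept: the raw score favors the three-character answer, while per-character normalization ranks the longer gold answer first.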
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ar.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ar
dataset_path: alexandrainst/m_arc
dataset_name: ar
training_split: train
validation_split: validation
test_split: test
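Each per-language file sets `include: _arc_yaml` and then overrides only the keys that differ, such as `task`, `dataset_path`, and `dataset_name`. The sketch below models that override with plain dictionaries, with keys from the including file winning over the base. It assumes a shallow merge and leaves out file loading and the `!function` tag, so it illustrates the idea rather than reproducing the harness's actual config loader; `resolve`, `BASE`, and `ARC_AR` are hypothetical names.

```python
from typing import Any, Dict

Config = Dict[str, Any]

# Abbreviated copy of the shared defaults from _arc_yaml.
BASE: Config = {
    "group": ["arc_multilingual"],
    "dataset_path": None,
    "dataset_name": None,
    "output_type": "multiple_choice",
    "test_split": "test",
    "doc_to_text": "query",
}

# Contents of arc_ar.yaml, minus the split keys that repeat the base values.
ARC_AR: Config = {
    "include": "_arc_yaml",
    "task": "arc_ar",
    "dataset_path": "alexandrainst/m_arc",
    "dataset_name": "ar",
}


def resolve(child: Config, base: Config) -> Config:
    """Shallow merge: child keys override base keys; `include` itself is dropped."""
    merged = dict(base)
    merged.update({k: v for k, v in child.items() if k != "include"})
    return merged


cfg = resolve(ARC_AR, BASE)
print(cfg["task"], cfg["dataset_path"], cfg["dataset_name"], cfg["output_type"])
# arc_ar alexandrainst/m_arc ar multiple_choice
```

Under this model the `null` dataset fields in `_arc_yaml` act as placeholders that every language file is expected to fill in.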
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_bn.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_bn
dataset_path: alexandrainst/m_arc
dataset_name: bn
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ca.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ca
dataset_path: alexandrainst/m_arc
dataset_name: ca
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_da.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_da
dataset_path: alexandrainst/m_arc
dataset_name: da
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_de.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_de
dataset_path: alexandrainst/m_arc
dataset_name: de
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_es.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_es
dataset_path: alexandrainst/m_arc
dataset_name: es
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_eu.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_eu
dataset_path: alexandrainst/m_arc
dataset_name: eu
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_fr.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_fr
dataset_path: alexandrainst/m_arc
dataset_name: fr
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_gu.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_gu
dataset_path: alexandrainst/m_arc
dataset_name: gu
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_hi.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_hi
dataset_path: alexandrainst/m_arc
dataset_name: hi
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_hr.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_hr
dataset_path: alexandrainst/m_arc
dataset_name: hr
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_hu.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_hu
dataset_path: alexandrainst/m_arc
dataset_name: hu
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_hy.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_hy
dataset_path: alexandrainst/m_arc
dataset_name: hy
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_id.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_id
dataset_path: alexandrainst/m_arc
dataset_name: id
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_it.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_it
dataset_path: alexandrainst/m_arc
dataset_name: it
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_kn.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_kn
dataset_path: alexandrainst/m_arc
dataset_name: kn
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ml.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ml
dataset_path: alexandrainst/m_arc
dataset_name: ml
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_mr.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_mr
dataset_path: alexandrainst/m_arc
dataset_name: mr
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ne.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ne
dataset_path: alexandrainst/m_arc
dataset_name: ne
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_nl.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_nl
dataset_path: alexandrainst/m_arc
dataset_name: nl
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_pt.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_pt
dataset_path: alexandrainst/m_arc
dataset_name: pt
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ro.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ro
dataset_path: alexandrainst/m_arc
dataset_name: ro
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ru.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ru
dataset_path: alexandrainst/m_arc
dataset_name: ru
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_sk.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_sk
dataset_path: alexandrainst/m_arc
dataset_name: sk
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_sr.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_sr
dataset_path: alexandrainst/m_arc
dataset_name: sr
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_sv.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_sv
dataset_path: alexandrainst/m_arc
dataset_name: sv
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_ta.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_ta
dataset_path: alexandrainst/m_arc
dataset_name: ta
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_te.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_te
dataset_path: alexandrainst/m_arc
dataset_name: te
training_split: train
validation_split: validation
test_split: test
7 changes: 7 additions & 0 deletions lm_eval/tasks/okapi/arc_multilingual/arc_uk.yaml
@@ -0,0 +1,7 @@
include: _arc_yaml
task: arc_uk
dataset_path: alexandrainst/m_arc
dataset_name: uk
training_split: train
validation_split: validation
test_split: test
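The per-language files above differ only in their language code. A small generator like the hypothetical one below could produce them instead of maintaining each by hand. The language list is taken from the README's task list, and the template reproduces the seven-line layout shown in the diff. Whether the repository uses such a script is not shown in this commit; `LANGS`, `render`, and `write_configs` are illustrative names.

```python
from pathlib import Path
from typing import Iterable

# Language codes from the README's task list (31 entries).
LANGS = (
    "ar bn ca da de es eu fr gu hi hr hu hy id it kn "
    "ml mr ne nl pt ro ru sk sr sv ta te uk vi zh"
).split()

TEMPLATE = """\
include: _arc_yaml
task: arc_{lang}
dataset_path: alexandrainst/m_arc
dataset_name: {lang}
training_split: train
validation_split: validation
test_split: test
"""


def render(lang: str) -> str:
    """Return the YAML text for one language's task file."""
    return TEMPLATE.format(lang=lang)


def write_configs(out_dir: Path, langs: Iterable[str] = LANGS) -> None:
    """Write one arc_<lang>.yaml file per language into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for lang in langs:
        (out_dir / f"arc_{lang}.yaml").write_text(render(lang), encoding="utf-8")


if __name__ == "__main__":
    # Preview a single file rather than writing anything to disk.
    print(render("ar"), end="")
```

Calling `write_configs(Path("lm_eval/tasks/okapi/arc_multilingual"))` would write all 31 files; since the three split keys already match `_arc_yaml`, a generator could also omit them without changing the resolved configuration.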