
Commit

Adding Simplified Coding Tasks (#645)
* adding simple tasks

* add simple human_eval

* fix yaml

* fix yaml

* remove breakpoint

* remove breakpoint

* change bsz

* merge main

* add udpated readme

* fix precommit

* restor line

* restor line

* add link to codegeex

* restor hf eval

---------

Co-authored-by: Jeremy Dohmann <[email protected]>
Co-authored-by: Jeremy D <[email protected]>
Co-authored-by: Daniel King <[email protected]>
4 people authored Oct 11, 2023
1 parent 0045ae6 commit cdb1c28
Showing 10 changed files with 792 additions and 15 deletions.
39 changes: 37 additions & 2 deletions scripts/eval/local_data/MODEL_GAUNTLET.md
@@ -257,8 +257,43 @@ Language understanding tasks evaluate the model’s ability to understand the st
### Programming
Programming tasks evaluate the model's ability to understand code, write functionally correct code given a specification, simulate code, and document code. Right now we just have HumanEval but later versions will include more.

35. HumanEval code generation
- Description: HumanEval consists of 164 python programming challenges, in which the model is presented with the method signature and docstring comment for a python program and is expected to complete the program. We then test the resultant code’s functional correctness on a number of test input/output pairs.
35. HumanEval Python code generation
- Description: HumanEval Python consists of 164 python programming challenges, in which the model is presented with the method signature and docstring comment for a python program and is expected to complete the program. We then test the resultant code’s functional correctness on a number of test input/output pairs.
- Year released: 2022
- Number of few shot examples: 0
- Random baseline accuracy: 0%
36. HumanEval C++ code generation
- Description: HumanEval C++ consists of 161 C++ programming challenges, in which the model is presented with the method signature and docstring comment for a C++ program and is expected to complete the program. We then test the resultant code’s functional correctness on a number of test input/output pairs. The C++ translation of HumanEval comes from the [CodeGeex](https://huggingface.co/datasets/THUDM/humaneval-x/viewer/cpp) project.
- Year released: 2022
- Number of few shot examples: 0
- Random baseline accuracy: 0%
37. HumanEval JS code generation
- Description: HumanEval JS consists of 164 Javascript programming challenges, in which the model is presented with the method signature and docstring comment for a Javascript program and is expected to complete the program. We then test the resultant code’s functional correctness on a number of test input/output pairs. The JS translation of HumanEval comes from the [CodeGeex](https://huggingface.co/datasets/THUDM/humaneval-x/viewer/cpp) project.
- Year released: 2022
- Number of few shot examples: 0
- Random baseline accuracy: 0%
38. HumanEval Python 25% code generation
- Description: HumanEval Python 25% is an easier variant of HumanEval Python in which, in addition to the original method signature, the model is also provided 25% of the lines in the canonical solution and expected to complete the remainder of the program. It consists of 164 samples.
- Year released: 2023
- Number of few shot examples: 0
- Random baseline accuracy: 0%
39. HumanEval Python 50% code generation
- Description: HumanEval Python 50% is an easier variant of HumanEval Python in which, in addition to the original method signature, the model is also provided 50% of the lines in the canonical solution and expected to complete the remainder of the program. It consists of 164 samples.
- Year released: 2023
- Number of few shot examples: 0
- Random baseline accuracy: 0%
40. HumanEval Python 75% code generation
- Description: HumanEval Python 75% is an easier variant of HumanEval Python in which, in addition to the original method signature, the model is also provided 75% of the lines in the canonical solution and expected to complete the remainder of the program. It consists of 164 samples.
- Year released: 2023
- Number of few shot examples: 0
- Random baseline accuracy: 0%
41. HumanEval Python simple return statement code generation
- Description: HumanEval Python simple return statement is an easier variant of HumanEval Python in which the model is provided all of the canonical solution with the exception of the return statement and is expected to complete the return statement. Additionally, this set contains only the problems for which the canonical solution has a "simple" return statement consisting only of a line of the form `return VARIABLE_NAME`. There are 37 samples.
- Year released: 2023
- Number of few shot examples: 0
- Random baseline accuracy: 0%
42. HumanEval Python complex return statement code generation
- Description: HumanEval Python complex return statement is an easier variant of HumanEval Python in which the model is provided all of the canonical solution with the exception of the return statement and is expected to complete the return statement. Additionally, this set contains only the problems for which the canonical solution does not have a "simple" return statement as defined above. There are 127 samples.
- Year released: 2023
- Number of few shot examples: 0
- Random baseline accuracy: 0%
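The truncation and return-statement split described above can be sketched as follows. This is an illustrative sketch only: `truncate_solution` and `is_simple_return` are hypothetical names, not functions from this repository, and the harness's actual prompt construction may differ.

```python
import re

def truncate_solution(solution: str, fraction: float) -> str:
    # Keep the first `fraction` of the canonical solution's lines,
    # as in the 25%/50%/75% easier variants described above.
    lines = solution.splitlines()
    keep = int(len(lines) * fraction)
    return "\n".join(lines[:keep])

# A "simple" return statement is a lone line of the form `return VARIABLE_NAME`.
SIMPLE_RETURN = re.compile(r"^\s*return\s+[A-Za-z_][A-Za-z0-9_]*\s*$")

def is_simple_return(line: str) -> bool:
    # Classify a return line as simple (variable only) or complex (any expression).
    return bool(SIMPLE_RETURN.match(line))
```

For example, truncating a 4-line solution at 50% keeps its first 2 lines, and `return c` is classified as simple while `return a + b` is not.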
164 changes: 164 additions & 0 deletions scripts/eval/local_data/programming/human_eval-0.25.jsonl

Large diffs are not rendered by default.

164 changes: 164 additions & 0 deletions scripts/eval/local_data/programming/human_eval-0.5.jsonl

Large diffs are not rendered by default.

164 changes: 164 additions & 0 deletions scripts/eval/local_data/programming/human_eval-0.75.jsonl

Large diffs are not rendered by default.

127 changes: 127 additions & 0 deletions scripts/eval/local_data/programming/human_eval_return_complex.jsonl

Large diffs are not rendered by default.

37 changes: 37 additions & 0 deletions scripts/eval/local_data/programming/human_eval_return_simple.jsonl

Large diffs are not rendered by default.

This file was deleted.

44 changes: 42 additions & 2 deletions scripts/eval/yamls/coding_tasks.yaml
@@ -5,22 +5,62 @@ icl_tasks:
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
icl_task_type: code_evaluation
batch_size: 1
icl_task_type: code_evaluation

-
label: human_eval_cpp
dataset_uri: eval/local_data/programming/processed_human_eval_cpp.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
icl_task_type: code_evaluation
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_js
dataset_uri: eval/local_data/programming/processed_human_eval_js.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_return_simple
dataset_uri: eval/local_data/programming/human_eval_return_simple.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_return_complex
dataset_uri: eval/local_data/programming/human_eval_return_complex.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_25
dataset_uri: eval/local_data/programming/human_eval-0.25.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_50
dataset_uri: eval/local_data/programming/human_eval-0.5.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_75
dataset_uri: eval/local_data/programming/human_eval-0.75.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
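In the config above, `num_beams: 20` generates 20 candidate completions per problem and `pass_at_k: 1` scores them with pass@1. The standard unbiased pass@k estimator (introduced in the original HumanEval paper) can be sketched as below; this is the textbook formula, and the harness's exact implementation may differ.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: probability that at least one of k
    # samples drawn (without replacement) from n generated completions,
    # c of which are correct, passes the tests.
    if n - c < k:
        # Fewer incorrect samples than k: a correct one is always drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For instance, with 2 generations of which 1 is correct, pass@1 is 0.5; with 20 correct out of 20, pass@1 is 1.0.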
15 changes: 15 additions & 0 deletions scripts/eval/yamls/eval_gauntlet.yaml
@@ -123,6 +123,21 @@ eval_gauntlet:
- name: human_eval_js
num_fewshot: 0
random_baseline: 0.0
- name: human_eval_return_simple
num_fewshot: 0
random_baseline: 0.0
- name: human_eval_return_complex
num_fewshot: 0
random_baseline: 0.0
- name: human_eval_25
num_fewshot: 0
random_baseline: 0.0
- name: human_eval_50
num_fewshot: 0
random_baseline: 0.0
- name: human_eval_75
num_fewshot: 0
random_baseline: 0.0
- name: world_knowledge_lm_task_subscore
benchmarks:
- name: jeopardy
44 changes: 42 additions & 2 deletions scripts/eval/yamls/tasks.yaml
@@ -179,21 +179,61 @@ icl_tasks:
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
icl_task_type: code_evaluation
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_cpp
dataset_uri: eval/local_data/programming/processed_human_eval_cpp.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
icl_task_type: code_evaluation
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_js
dataset_uri: eval/local_data/programming/processed_human_eval_js.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_return_simple
dataset_uri: eval/local_data/programming/human_eval_return_simple.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_return_complex
dataset_uri: eval/local_data/programming/human_eval_return_complex.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_25
dataset_uri: eval/local_data/programming/human_eval-0.25.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_50
dataset_uri: eval/local_data/programming/human_eval-0.5.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
-
label: human_eval_75
dataset_uri: eval/local_data/programming/human_eval-0.75.jsonl # ADD YOUR OWN DATASET URI
num_fewshot: [0]
pass_at_k: 1
num_beams: 20
batch_size: 1
icl_task_type: code_evaluation
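Each `dataset_uri` above points to a JSONL file: one JSON object per line, one eval problem per object. A minimal loader sketch, where `load_tasks` is an illustrative name rather than a function from this repository:

```python
import json

def load_tasks(path: str) -> list[dict]:
    # Read a JSONL eval dataset, skipping any blank lines.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

The field names inside each record (prompt, canonical solution, tests) depend on the harness's `code_evaluation` task schema, so inspect one line of the file before relying on a particular key.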
