Fix_function_category #155

Open
Wants to merge 122 commits into base: main

Commits (122)
b716e1e
create linear_intensity_profile.ipynb #36
marabuuu Jul 8, 2024
392ced5
update linear_intensity_profile #36
marabuuu Jul 17, 2024
dd15f53
add docstring #36
marabuuu Jul 17, 2024
b5ac5ff
Added identify_centroids evaluation task
shinzlet Jul 31, 2024
a15bc40
Add find_closest_neighbors test
shinzlet Jul 31, 2024
9de0ca0
Added load_bgr_tif_and_output_rgb test case
shinzlet Aug 1, 2024
2228801
Merge pull request #82 from marabuuu/linear-intensity-profile
haesleinhuepf Aug 2, 2024
e4e43ac
Adjusted floating point comparison to account for error, made dtype e…
shinzlet Aug 2, 2024
5c236be
Add `local_maxima_from_distance_transform` test case
ilan-theodoro Aug 4, 2024
f2114b2
Add `flow_field_deformation` test case
ilan-theodoro Aug 5, 2024
2b1e508
Add `detect_ellipse` test case
ilan-theodoro Aug 5, 2024
9db4767
Fix typo
JoOkuma Aug 7, 2024
ed76d68
readme typo
JoOkuma Aug 7, 2024
18bfa09
Adding test case for selecting cells that co-express some gene (inten…
JoOkuma Aug 7, 2024
161cc05
clean up jupyter outputs
JoOkuma Aug 7, 2024
7843c13
Add new test to register 3D timelapses
JoOkuma Aug 7, 2024
68ce25e
added two test cases: gaussian fit to spot, and distance between two …
Aug 10, 2024
aef5fc4
Merge pull request #87 from JoOkuma/fix-positive-typo
haesleinhuepf Aug 12, 2024
e3ded1c
more detailed prompt, more asserts
haesleinhuepf Aug 12, 2024
0c9e9aa
Merge branch 'main' of https://github.com/shinzlet/human-eval-bia int…
haesleinhuepf Aug 12, 2024
5463133
rerun notebook
haesleinhuepf Aug 12, 2024
5c2aa76
renamed and rerun notebook
haesleinhuepf Aug 12, 2024
f41b084
Merge pull request #85 from shinzlet/main
haesleinhuepf Aug 12, 2024
2f889ce
made test more flexible, added test-cases
haesleinhuepf Aug 12, 2024
1ea7faa
tiny documentation update
haesleinhuepf Aug 12, 2024
9e35fda
test ignoring order of points, return points as list of coordinates
haesleinhuepf Aug 12, 2024
f30008c
Merge pull request #86 from ilan-theodoro/main
haesleinhuepf Aug 12, 2024
90e2d67
add another example assert, rerun notebook
haesleinhuepf Aug 12, 2024
79fb8cc
Merge pull request #91 from TeunHuijben/two-test-cases
haesleinhuepf Aug 12, 2024
fc5caf4
reran notebook
haesleinhuepf Aug 12, 2024
747b08b
be more precise in the docstring
haesleinhuepf Aug 12, 2024
0bc2d53
Merge pull request #90 from JoOkuma/registration
haesleinhuepf Aug 12, 2024
84fb711
more precise docstring
haesleinhuepf Aug 12, 2024
25f3947
Merge pull request #88 from JoOkuma/cell-coexp
haesleinhuepf Aug 12, 2024
3e39781
Added FFT spectrum testcase. It is a hard one :-)
royerloic Aug 13, 2024
4e0da0c
Addressed feedback from Robert.
royerloic Aug 13, 2024
812cca7
Addressed feedback from Robert and fixed typos
royerloic Aug 13, 2024
269abcb
add test case translate 3d image #7
marabuuu Aug 27, 2024
d574186
update function name in check function #7
marabuuu Aug 29, 2024
8fb18cf
test case for image histogram
pr4deepr Sep 3, 2024
4d65794
updated to include num_bins as argument
pr4deepr Sep 3, 2024
a5c5af9
Merge pull request #105 from pr4deepr/main
haesleinhuepf Sep 3, 2024
bbab4a5
Update open_image_return_dimensions.ipynb
tischi Sep 3, 2024
fa5b2bf
Merge pull request #106 from haesleinhuepf/tischi-patch-1
haesleinhuepf Sep 3, 2024
78ffd66
reran the notebook
haesleinhuepf Sep 3, 2024
bdce242
Merge pull request #94 from royerloic/main
haesleinhuepf Sep 3, 2024
2f3f096
renamed notebook, simplified docstring
haesleinhuepf Sep 3, 2024
56c749d
Merge pull request #101 from marabuuu/translate-image
haesleinhuepf Sep 3, 2024
d4e38f8
save_image_voxel_size notebook
pr4deepr Sep 3, 2024
346a023
modified function to save the image and return save path
pr4deepr Sep 4, 2024
cc70493
creating new categories
pr4deepr Sep 5, 2024
ae8e242
ensured all tasks are in categories. Plot results
pr4deepr Sep 6, 2024
9c6cd9d
remove all samples and result data except gpt-4o-2024-05-13, deepseek…
haesleinhuepf Sep 6, 2024
bd5d6c6
rephrased almost all prompts, to have more precise type hints
haesleinhuepf Sep 6, 2024
798e99a
renamed old result files file to have "_old_prompt" in their model name
haesleinhuepf Sep 6, 2024
2df0b84
regenerated prompt list and problem file with new prompts
haesleinhuepf Sep 6, 2024
535d60b
recreated samples for gpt-4o2024-05-13 and reference
haesleinhuepf Sep 6, 2024
462d6c9
sampled deepseek-coder-v2 with new prompts
haesleinhuepf Sep 6, 2024
3a09c4f
evaluated samples
haesleinhuepf Sep 6, 2024
997c218
redraw all figures with old and new prompts comparison
haesleinhuepf Sep 6, 2024
ce54212
Fixed sum_intensity_projection docstring
ian-coccimiglio Sep 7, 2024
cf639aa
Merge pull request #117 from ian-coccimiglio/fix_docstring
haesleinhuepf Sep 7, 2024
5208a15
fix sum images notebook to add two images rather than lists
ian-coccimiglio Sep 7, 2024
1106276
add interpolate_stack.ipynb #6
marabuuu Sep 8, 2024
8b81759
Fixed functions requesting images providing lists
ian-coccimiglio Sep 9, 2024
c2dac0a
More flexible standard deviation acceptance
ian-coccimiglio Sep 9, 2024
f3fd45d
Allowed flexible interpretation of image dimensions based on PIL or N…
ian-coccimiglio Sep 9, 2024
4173d72
Removed extra blocks
ian-coccimiglio Sep 9, 2024
380fecc
replace function with np.linspace to allow more intermediate slices #6
marabuuu Sep 9, 2024
d91385f
add categorise_functions yaml
pr4deepr Sep 9, 2024
8a6b51b
Code to check if all test cases in yaml file
pr4deepr Sep 9, 2024
48b164f
removed old files
pr4deepr Sep 9, 2024
de3de62
read new yaml file and plot model performance by category
pr4deepr Sep 9, 2024
c2bf2ae
updated pull request template
pr4deepr Sep 9, 2024
e2ba3f6
update dock string #6
marabuuu Sep 10, 2024
6ae01fa
test case for scaling and affine transform
pr4deepr Sep 10, 2024
ccc37b1
Merge branch 'haesleinhuepf:development-collecting-new-test-cases' in…
pr4deepr Sep 10, 2024
77bcb51
scale image affine transform test case
pr4deepr Sep 10, 2024
d5bda33
remove print statement and added clarity for return values
pr4deepr Sep 11, 2024
84a923c
cleared outputs
pr4deepr Sep 11, 2024
f6646dc
polygon and multipolygon test cases shapely
pr4deepr Sep 11, 2024
e255420
Revert "scale image affine transform test case"
pr4deepr Sep 11, 2024
fb12e77
Merge pull request #132 from pr4deepr/shapely
haesleinhuepf Sep 11, 2024
07dbe46
remove comment, too obvious
haesleinhuepf Sep 11, 2024
d0639d7
remove comment
haesleinhuepf Sep 11, 2024
67608cf
just added another simple assert
haesleinhuepf Sep 11, 2024
2e89826
Merge pull request #125 from marabuuu/interpolate-stack
haesleinhuepf Sep 11, 2024
603fdc0
rename notebook, format prompt
haesleinhuepf Sep 11, 2024
3adede0
Merge pull request #110 from pr4deepr/development-collecting-new-test…
haesleinhuepf Sep 11, 2024
3d7003f
Updated label images for more flexible inputs
ian-coccimiglio Sep 12, 2024
128fe25
clean rerun notebook
haesleinhuepf Sep 13, 2024
4b1cc18
Merge pull request #126 from ian-coccimiglio/fix_stdev
haesleinhuepf Sep 13, 2024
b0b37fb
Reverted change to workflow_batch_process_folder_count_labels
ian-coccimiglio Sep 13, 2024
54dafef
updated test cases jsonl + readme
haesleinhuepf Oct 14, 2024
d6416b4
Added new cases studies for metadata reading plus some image analysis…
rmassei Oct 17, 2024
115d08a
Correct the prompting of the new test cases
rmassei Oct 18, 2024
0a062eb
Remove test-case which didnt pass
haesleinhuepf Nov 21, 2024
8594164
Remove test which seems impossible to pass
haesleinhuepf Nov 21, 2024
988338d
remove dyfunctional test-case
haesleinhuepf Nov 21, 2024
384a9c3
remove test with to simple assert statements
haesleinhuepf Nov 21, 2024
39915c0
renamed notebooks and functions, executed notebooks, refined docstrings
haesleinhuepf Nov 21, 2024
e0c2df7
Merge pull request #142 from rmassei/main
haesleinhuepf Nov 21, 2024
9bee740
Merge branch 'main' into pr/129
haesleinhuepf Nov 21, 2024
db2e3fd
reran notebook
haesleinhuepf Nov 21, 2024
d4ab9f8
Merge pull request #129 from pr4deepr/function_categorize
haesleinhuepf Nov 21, 2024
fa65fe7
Merge pull request #134 from ian-coccimiglio/bfix_label
haesleinhuepf Nov 21, 2024
1117b11
Merge pull request #127 from ian-coccimiglio/bfix_imdim
haesleinhuepf Nov 21, 2024
8f3e180
Merge branch 'main' into prompt-refinement
haesleinhuepf Nov 21, 2024
ca19c87
sync with dev branch
haesleinhuepf Nov 21, 2024
683ad2e
sync with dev branch
haesleinhuepf Nov 21, 2024
f33b93a
Merge branch 'development-collecting-new-test-cases' into prompt-refi…
haesleinhuepf Nov 21, 2024
861114b
Merge pull request #118 from haesleinhuepf/prompt-refinement
haesleinhuepf Nov 21, 2024
875a81d
Merge pull request #121 from ian-coccimiglio/fix_sum_images
haesleinhuepf Nov 21, 2024
4cc51e9
deleted all sample files
haesleinhuepf Nov 21, 2024
d12113b
Merge branch 'development-collecting-new-test-cases' of https://githu…
haesleinhuepf Nov 21, 2024
17130bb
deleted all sample files
haesleinhuepf Nov 21, 2024
87913fd
remove test case which didn't pass
haesleinhuepf Nov 21, 2024
d2a5fb0
test-case formatting
haesleinhuepf Nov 21, 2024
7bc4208
categorised new test-cases, re-generated jsonl file + readme
haesleinhuepf Nov 21, 2024
c040396
sampled reference, redraw plots
haesleinhuepf Nov 21, 2024
2117f80
update documentation
haesleinhuepf Nov 21, 2024
0957819
fixed summarize by case notebook for reference
pr4deepr Nov 27, 2024
1 change: 1 addition & 0 deletions .github/pull_request_template.md
@@ -1,6 +1,7 @@
 This PR contains:
 * [ ] a new test-case for the benchmark
 * [ ] I hereby confirm that NO LLM-based technology (such as github copilot) was used while writing this benchmark
+* [ ] I have added my function into the `data/human-eval-bia-categories.yaml` file and specified the category. If it is a new category, justify it below.
 * [ ] new dependencies in requirements.txt
 * [ ] The environment.yml file was updated using the command `conda env export > environment.yml`
 * [ ] new generator-functions allowing to sample from other LLMs
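The new checklist item asks contributors to register each test case in `data/human-eval-bia-categories.yaml`, keyed by the test case's filename. Below is a minimal sketch of a consistency check for that rule. It assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that test cases are `.ipynb` notebooks whose stem is the key. `find_uncategorised` and `find_stale_entries` are hypothetical names, and this is not the checking code added in this PR.

```python
from pathlib import Path
from typing import Iterable, Mapping, Sequence


def find_uncategorised(notebook_names: Iterable[str],
                       categories: Mapping[str, Sequence[str]]) -> list[str]:
    """Test cases (notebook stems) with no entry, or an empty entry, in the mapping."""
    # Keys are expected to equal the test case's filename without extension,
    # e.g. "binary_closing.ipynb" -> "binary_closing".
    stems = {Path(name).stem for name in notebook_names}
    return sorted(stem for stem in stems if not categories.get(stem))


def find_stale_entries(notebook_names: Iterable[str],
                       categories: Mapping[str, Sequence[str]]) -> list[str]:
    """Mapping keys that no longer match any notebook (e.g. after a rename)."""
    stems = {Path(name).stem for name in notebook_names}
    return sorted(set(categories) - stems)


if __name__ == "__main__":
    # Hypothetical inputs; in practice the names could come from
    # [p.name for p in Path("test_cases").glob("*.ipynb")]  (folder name assumed).
    notebooks = ["binary_closing.ipynb", "t_test.ipynb", "my_new_case.ipynb"]
    categories = {
        "binary_closing": ["morphological_operations"],
        "t_test": ["statistical_analysis"],
        "old_case": ["measurement"],
    }
    print(find_uncategorised(notebooks, categories))  # ['my_new_case']
    print(find_stale_entries(notebooks, categories))  # ['old_case']
```

Treating an empty category list the same as a missing key catches entries that were added but never given a category.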
2 changes: 1 addition & 1 deletion README.md
@@ -96,7 +96,7 @@ def sum(a, b):
     return a + b
 ```
 * This function must have a meaningful docstring between """ and """. It must be so meaningful that a language model could possibly write the entire function.
-* There must be another code cell that starts with `def check(candiate):` and contains test code to test the generated code.
+* There must be another code cell that starts with `def check(candidate):` and contains test code to test the generated code.
 * The text code must use `assert` statements and call the `candidate` function. E.g. if a given function to test is `sum`, then a valid test for `sum` would be:
 ```
 def check(candidate):
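The README requires that `check(candidate)` calls `candidate`. This implies an evaluation step that first defines the generated function and then passes it to `check`. The following is a simplified, in-process sketch of that flow. `run_check` is a hypothetical helper, not this benchmark's evaluation code, and the example `check` body was written for this sketch rather than taken from the README. A real harness would normally run untrusted generated code in a separate process with a timeout, and this sketch does neither.

```python
from typing import Callable


def run_check(generated_code: str, check_code: str, entry_point: str) -> bool:
    """Return True if check(candidate) passes for the generated function.

    In-process sketch only: it does not sandbox the generated code or apply a
    timeout, which a real evaluation harness normally should.
    """
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # defines the candidate function
        exec(check_code, namespace)      # defines check(candidate)
        candidate: Callable = namespace[entry_point]
        namespace["check"](candidate)    # a failing assert raises AssertionError
    except Exception:
        # Syntax errors, missing names, failed asserts and runtime errors all count as a fail.
        return False
    return True


if __name__ == "__main__":
    generated = "def sum(a, b):\n    return a + b\n"
    # Illustrative check written for this sketch, not taken from the README.
    check = "def check(candidate):\n    assert candidate(3, 4) == 7\n"
    print(run_check(generated, check, "sum"))                    # True
    print(run_check(generated.replace("+", "-"), check, "sum"))  # False
```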
189 changes: 189 additions & 0 deletions data/human-eval-bia-categories.yaml
@@ -0,0 +1,189 @@
#Define categories for each function. Function name should be the filename of your test_case
#Categories can be a combination of:
#'segmentation','morphological_operations', 'statistical_analysis', 'feature_extraction', 'measurement', 'image_preprocessing','file_i_o', 'hello_world', 'workflow_automation'
apply_otsu_threshold_and_count_positive_pixels:
- segmentation
binary_closing:
- morphological_operations
binary_skeleton:
- morphological_operations
bland_altman:
- statistical_analysis
combine_columns_of_tables:
- data_wrangling
convex_hull_measure_area:
- feature_extraction
- measurement
convolve_images:
- image_filtering
count_number_of_touching_neighbors:
- feature_extraction
- measurement
count_objects_over_time:
- measurement
count_overlapping_regions:
- measurement
create_umap:
- feature_extraction
- measurement
crop_quarter_image:
- image_transformation
deconvolve_image:
- image_filtering
detect_edges:
- segmentation
expand_labels_without_overlap:
- segmentation_post_processing
extract_surface_measure_area:
- measurement
fit_circle:
- measurement
label_binary_image_and_count_labels:
- segmentation_post_processing
- measurement
label_sequentially:
- segmentation_post_processing
list_image_files_in_folder:
- file_i_o
map_pixel_count_of_labels:
- measurement
mask_image:
- segmentation_post_processing
maximum_intensity_projection:
- image_transformation
mean_squared_error:
- statistical_analysis
mean_std_column:
- statistical_analysis
measure_aspect_ratio_of_regions:
- feature_extraction
- measurement
measure_intensity_of_labels:
- feature_extraction
- measurement
measure_intensity_over_time:
- feature_extraction
- measurement
measure_mean_image_intensity:
- feature_extraction
- measurement
measure_pixel_count_of_labels:
- measurement
measure_properties_of_regions:
- feature_extraction
- measurement
open_image_read_voxel_size:
- file_i_o
open_image_return_dimensions:
- file_i_o
open_nifti_image:
- file_i_o
open_zarr:
- file_i_o
pair_wise_correlation_matrix:
- statistical_analysis
radial_intensity_profile:
- image_transformation
region_growing_segmentation:
- segmentation
remove_labels_on_edges:
- segmentation_post_processing
remove_noise_edge_preserving:
- image_preprocessing
remove_small_labels:
- segmentation_post_processing
return_hello_world:
- hello_world
rgb_to_grey_image_transform:
- image_transformation
rotate_image_by_90_degrees:
- image_transformation
subsample_image:
- image_transformation
subtract_background_tophat:
- image_filtering
sum_images:
- image_transformation
sum_intensity_projection:
- image_transformation
t_test:
- statistical_analysis
tiled_image_processing:
- image_filtering
- workflow_automation
transpose_image_axes:
- image_transformation
workflow_batch_process_folder_count_labels:
- workflow_automation
- measurement
workflow_batch_process_folder_measure_intensity:
- workflow_automation
- measurement
workflow_segment_measure_umap:
- feature_extraction
- measurement
- segmentation
- workflow_automation
workflow_segmentation_counting:
- measurement
- segmentation
- workflow_automation
workflow_segmentation_measurement_summary:
- measurement
- segmentation
- workflow_automation
workflow_watershed_segmentation_correction_measurement:
- measurement
- segmentation
- workflow_automation
convert_points_polygon:
- data_wrangling
create_multipolygon_from_coordinates:
- data_wrangling
dataframe_column_rename:
- data_wrangling
detect_ellipse:
- segmentation
distance_between_maxima:
- measurement
fft_spectrum:
- image_transformation
find_closest_neighbors:
- measurement
fit_gaussian_to_spot:
- segmentation
flow_field_deformation:
- image_transformation
generate_image_histogram:
- measurement
identify_centroids:
- measurement
interpolate_stack:
- image_transformation
linear_intensity_profile:
- image_transformation
load_tif_and_output_rgb:
- file_i_o
local_maxima_from_distance_transform:
- segmentation
read_imagej_tif_metadata:
- file_i_o
read_ome_metadata_from_ome_xml:
- file_i_o
register_timelapse:
- image_transformation
reshape_array:
- image_transformation
roi_imagej_to_ezomero:
- data_wrangling
save_image_with_voxel_size:
- file_i_o
scale_image_affine_transform:
- image_transformation
select_coexpressing_cells:
- measurement
- segmentation_post_processing
stack_and_merge:
- image_transformation
translate_3d_image_along_vector:
- image_transformation
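The header comment of this file lists nine categories. However, the entries below it also use `data_wrangling`, `image_filtering`, `image_transformation` and `segmentation_post_processing`. Below is a minimal sketch that flags such undocumented categories and computes a per-category pass rate, in the spirit of the "plot model performance by category" commit. It assumes the file has been loaded into a dict with PyYAML's `yaml.safe_load` and that results are available as a name-to-boolean dict. The function names and that result format are hypothetical, and this is not the notebook code in this PR.

```python
from collections import defaultdict
from typing import Mapping, Sequence

# Category names listed in the header comment of data/human-eval-bia-categories.yaml.
DOCUMENTED_CATEGORIES = {
    "segmentation", "morphological_operations", "statistical_analysis",
    "feature_extraction", "measurement", "image_preprocessing",
    "file_i_o", "hello_world", "workflow_automation",
}


def undocumented_categories(mapping: Mapping[str, Sequence[str]]) -> set[str]:
    """Categories used by some test case but missing from the header comment."""
    used = {cat for cats in mapping.values() for cat in cats}
    return used - DOCUMENTED_CATEGORIES


def pass_rate_by_category(mapping: Mapping[str, Sequence[str]],
                          passed: Mapping[str, bool]) -> dict[str, float]:
    """Fraction of passed test cases per category.

    A test case listed under several categories counts toward each of them;
    test cases without a result in `passed` are skipped.
    """
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for name, cats in mapping.items():
        if name not in passed:
            continue
        for cat in cats:
            totals[cat] += 1
            successes[cat] += int(passed[name])
    return {cat: successes[cat] / totals[cat] for cat in sorted(totals)}


if __name__ == "__main__":
    # A few entries copied from the YAML above; the pass/fail values are made up.
    mapping = {
        "binary_closing": ["morphological_operations"],
        "combine_columns_of_tables": ["data_wrangling"],
        "convex_hull_measure_area": ["feature_extraction", "measurement"],
        "fit_circle": ["measurement"],
    }
    print(undocumented_categories(mapping))  # {'data_wrangling'}
    results = {"binary_closing": True, "convex_hull_measure_area": True, "fit_circle": False}
    print(pass_rate_by_category(mapping, results))
    # {'feature_extraction': 1.0, 'measurement': 0.5, 'morphological_operations': 1.0}
```

Counting a multi-category test case toward every one of its categories means the per-category rates are not independent. A test case such as `workflow_segment_measure_umap` contributes to four categories at once.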