[Fixing Test Cases] Label Images - When should 0 be considered a label? #130

ian-coccimiglio · 2024-09-10T09:10:36Z

Continuing my quest to figure out why LLMs fail on basic test cases (#76), I've found that a common failure mode is that LLMs do not know whether to include or exclude 0 when processing label images.

This issue affects (at least) the following test-cases:

Test Case	Current Pass-Rate	Current prompt
measure_pixel_count_of_labels	37 / 230 (16%)	Takes a label image and returns a list of counts of number of pixels per label.
workflow_batch_process_folder_count_labels	24 / 230 (10.4%)	This functions goes through all .tif image files in a specified folder, loads the images and count labels each image. It returns a dictionary with filenames and corresponding counts.
map_pixel_count_of_labels	19 / 230 (8%)	Takes a label_image, determines the pixel-count per label and creates an image where the label values are replaced by the corresponding pixel count.
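To make the ambiguity concrete, here is a minimal sketch of the two readings of "counts of number of pixels per label". The label image is a made-up example and the variable names are illustrative, not taken from the benchmark.

```python
import numpy as np

# Hypothetical label image: 0 is background, 1 and 2 are objects.
label_image = np.array([
    [0, 0, 0, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
])

# Distinct values in the image and how many pixels carry each value.
labels, counts = np.unique(label_image, return_counts=True)

# Reading A: every distinct value is a label, including 0.
counts_with_background = counts.tolist()

# Reading B: 0 is background, so only positive labels are counted.
counts_without_background = counts[labels != 0].tolist()

print(counts_with_background)     # [5, 3, 4]
print(counts_without_background)  # [3, 4]
```

Both outputs are defensible answers to the prompt as written, which is why a test that accepts only one of them will fail many otherwise reasonable solutions.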

In bioimaging, 0 is commonly treated as background (and therefore not a 'real' label), but our task descriptions were arguably ambiguous about whether 0 counts. I can propose a few solutions:

1. Modify the prompts for these questions to explicitly say 'positive labels'
2. Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)
3. Leave as-is, as a 'good' bioimaging LLM should be able to interpret whether to exclude/include 0.
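As a rough sketch of what the "accept either answer" option could look like for measure_pixel_count_of_labels: the check below compares a candidate's output against both reference answers. The helper names `reference_pixel_counts` and `check_flexible` are hypothetical and not taken from the actual test suite.

```python
import numpy as np

def reference_pixel_counts(label_image, include_background):
    """Pixel count per label, optionally dropping the 0 (background) entry."""
    labels, counts = np.unique(label_image, return_counts=True)
    if not include_background:
        counts = counts[labels != 0]
    return counts.tolist()

def check_flexible(candidate, label_image):
    """Return True if the candidate matches either interpretation of 0."""
    # Normalize lists, tuples, and NumPy arrays to a list of plain ints.
    result = [int(c) for c in candidate(label_image)]
    accepted = (
        reference_pixel_counts(label_image, include_background=True),
        reference_pixel_counts(label_image, include_background=False),
    )
    return result in accepted

# Hypothetical test image: 0 is background, 1 and 2 are objects.
test_image = np.array([
    [0, 0, 0, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
])
```

With this check, a candidate that returns `[5, 3, 4]` and one that returns `[3, 4]` both pass, while an answer that matches neither reference still fails.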

Let me know what you all think

pr4deepr · 2024-09-11T00:05:46Z

> Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)

I think this is a good solution

ian-coccimiglio · 2024-09-12T05:31:36Z

> Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)

After implementing the above solution (allowing labels to include/exclude 0):

- measure_pixel_count_of_labels: pass rate increased to 160 / 230 (69.6%)
- workflow_batch_process_folder_count_labels: pass rate increased to 64 / 230 (27.8%)
- map_pixel_count_of_labels: pass rate increased to 160 / 230 (69.6%)
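For map_pixel_count_of_labels the same idea applies to images rather than lists. The following is only a sketch under my own assumptions (in particular, that an answer excluding 0 leaves background pixels at 0); `reference_count_map` and `matches_either` are hypothetical names, and the merged test may be written differently.

```python
import numpy as np

def reference_count_map(label_image, include_background):
    """Replace each label value by that label's pixel count."""
    labels, counts = np.unique(label_image, return_counts=True)
    result = np.zeros_like(label_image)
    for label, count in zip(labels, counts):
        if label == 0 and not include_background:
            continue  # assumption: background pixels stay 0 in this reading
        result[label_image == label] = count
    return result

def matches_either(candidate_image, label_image):
    """Return True if the candidate equals either reference image."""
    return any(
        np.array_equal(candidate_image, reference_count_map(label_image, flag))
        for flag in (True, False)
    )

# Hypothetical test image: 0 is background, 1 and 2 are objects.
test_image = np.array([
    [0, 0, 0, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
])
```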

ian-coccimiglio mentioned this issue Sep 12, 2024

[Fixing Test Cases] Allow 0 in label image test cases #134

Merged

9 tasks
