
[Fixing Test Cases] Label Images - When should 0 be considered a label? #130

Open
ian-coccimiglio opened this issue Sep 10, 2024 · 2 comments · Fixed by #134

Comments

@ian-coccimiglio
Contributor

Still on my quest to figure out why LLMs fail on basic test cases (#76), I've found that a rather common failure mode is that LLMs do not know whether to include or exclude 0 when processing label images.

This issue affects (at least) the following test-cases:

| Test Case | Current Pass-Rate | Current prompt |
| --- | --- | --- |
| measure_pixel_count_of_labels | 37 / 230 (16%) | Takes a label image and returns a list of counts of number of pixels per label. |
| workflow_batch_process_folder_count_labels | 24 / 230 (10.4%) | This functions goes through all .tif image files in a specified folder, loads the images and count labels each image. It returns a dictionary with filenames and corresponding counts. |
| map_pixel_count_of_labels | 19 / 230 (8%) | Takes a label_image, determines the pixel-count per label and creates an image where the label values are replaced by the corresponding pixel count. |

It is common in bioimaging to consider 0 as background (and therefore not a 'real' label), but it's also plausible that our prompts were ambiguous about whether 0 should be counted. I can propose a couple of solutions (a short sketch of the ambiguity follows the list):

  • Modify the prompts for these questions to explicitly say 'positive labels'
  • Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)
  • Leave as-is, as a 'good' bioimaging LLM should be able to interpret whether to exclude/include 0.
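
For concreteness, here is a minimal numpy sketch of the ambiguity for measure_pixel_count_of_labels. This is not the benchmark's reference solution; the include_background flag is just my own illustration of the two interpretations:

```python
import numpy as np

def measure_pixel_count_of_labels(label_image, include_background=False):
    # Count pixels per unique label value in the label image.
    labels, counts = np.unique(label_image, return_counts=True)
    if not include_background:
        # Drop label 0, treating it as background.
        keep = labels != 0
        labels, counts = labels[keep], counts[keep]
    return list(counts)

label_image = np.array([[0, 0, 1],
                        [0, 2, 1],
                        [2, 1, 1]])
print(measure_pixel_count_of_labels(label_image))                           # [4, 2]   (labels 1 and 2)
print(measure_pixel_count_of_labels(label_image, include_background=True))  # [3, 4, 2] (labels 0, 1, 2)
```

Both outputs are defensible readings of the current prompt, which is exactly why the pass-rates are so low.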

Let me know what you all think.

@pr4deepr
Contributor

> Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)

I think this is a good solution.

@ian-coccimiglio
Contributor Author

> Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)

After implementing the above solution (allowing labels to include or exclude 0; a sketch of the flexible check follows the results):

  • measure_pixel_count_of_labels pass-rate increased to 160 / 230 (69.5% pass-rate)
  • workflow_batch_process_folder_count_labels increased to 64 / 230 (27.8% pass-rate)
  • map_pixel_count_of_labels increased to 160 / 230 (69.5% pass-rate)
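
Roughly, the flexible check works along these lines. This is a hedged sketch with hypothetical names, not the actual test code in the linked PR:

```python
import numpy as np

def check_pixel_counts(candidate_counts, label_image):
    # Compute both reference answers from the label image.
    labels, counts = np.unique(label_image, return_counts=True)
    with_background = sorted(counts.tolist())
    without_background = sorted(counts[labels != 0].tolist())
    # Accept the candidate if it matches either interpretation.
    return sorted(candidate_counts) in (with_background, without_background)
```

The same idea carries over to the other two test cases: compute the reference both with and without the background label, and pass the candidate if it matches either one.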
