Still on my quest to figure out why LLMs fail on basic test-cases (#76), I've found that a rather common failure mode is that LLMs do not know whether to include or exclude the label 0 when processing label images.
This issue affects (at least) the following test-cases:
| Test Case | Current Pass-Rate | Current prompt |
| --- | --- | --- |
| `measure_pixel_count_of_labels` | 37 / 230 (16%) | Takes a label image and returns a list of counts of number of pixels per label. |
| `workflow_batch_process_folder_count_labels` | 24 / 230 (10.4%) | This functions goes through all .tif image files in a specified folder, loads the images and count labels each image. It returns a dictionary with filenames and corresponding counts. |
| `map_pixel_count_of_labels` | 19 / 230 (8%) | Takes a label_image, determines the pixel-count per label and creates an image where the label values are replaced by the corresponding pixel count. |
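To make the ambiguity concrete, here is a minimal sketch of how the two readings of the `measure_pixel_count_of_labels` prompt diverge on a tiny label image. The helper `pixel_counts` and its `include_background` flag are hypothetical names used only for illustration; they are not taken from the benchmark's reference solutions.

```python
import numpy as np

# Toy 3x3 label image: 0 appears four times, label 1 twice, label 2 three times.
label_image = np.array([
    [0, 0, 1],
    [2, 2, 1],
    [2, 0, 0],
])

def pixel_counts(label_image, include_background):
    """Return pixel counts per label, ordered by label value.

    Hypothetical helper: `include_background` decides whether 0 is
    treated as a label or skipped as background.
    """
    labels, counts = np.unique(label_image, return_counts=True)
    if not include_background:
        keep = labels != 0
        counts = counts[keep]
    return counts.tolist()

with_zero = pixel_counts(label_image, include_background=True)
without_zero = pixel_counts(label_image, include_background=False)

print(with_zero)     # [4, 2, 3]
print(without_zero)  # [2, 3]
```

Both outputs are defensible answers to the prompt as currently worded, yet a strict comparison against a single reference would mark one of them as a failure.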
It is common in bioimaging to treat 0 as background (and therefore not a 'real' label), but it's also plausible that our tasks were ambiguous about whether 0 should be counted. I can propose a few solutions:
- Modify the prompts for these questions to explicitly say 'positive labels'
- Modify the solution to be flexible and accept either answer (as per other cases, this is my preferred solution)
- Leave as-is, as a 'good' bioimaging LLM should be able to interpret whether to exclude/include 0.
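As a rough sketch of what the second option (accepting either answer) could look like for `measure_pixel_count_of_labels`: the check below compares a candidate's output against both interpretations and passes if either matches. The names `reference_counts` and `check_flexible` and the three example candidates are hypothetical, and this is not the benchmark's actual test code.

```python
import numpy as np

def reference_counts(label_image, include_background):
    """Pixel count per label, ordered by label value (hypothetical reference)."""
    labels, counts = np.unique(label_image, return_counts=True)
    if not include_background:
        counts = counts[labels != 0]
    return [int(c) for c in counts]

def check_flexible(candidate, label_image):
    """Return True if the candidate matches either interpretation of label 0."""
    result = [int(c) for c in candidate(label_image)]
    accepted = [reference_counts(label_image, flag) for flag in (True, False)]
    return result in accepted

label_image = np.array([[0, 0, 1],
                        [2, 2, 1],
                        [2, 0, 0]])

# Example candidates; the bincount slicing assumes contiguous labels 1..N.
def counts_with_background(img):
    return np.bincount(img.ravel()).tolist()

def counts_without_background(img):
    return np.bincount(img.ravel())[1:].tolist()

def counts_wrong(img):
    return [img.size]

print(check_flexible(counts_with_background, label_image))     # True
print(check_flexible(counts_without_background, label_image))  # True
print(check_flexible(counts_wrong, label_image))               # False
```

The trade-off is that a flexible check no longer measures whether the model follows the bioimaging convention for background, which is what the third option would preserve.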
Let me know what you all think.