Should we represent images using default numpy.asarray()? #115
Hey @ian-coccimiglio, great point! I must admit, in daily practice I never use OpenCV (because most of the image data I work with is not compatible). We also expressed that in the paper:
If you think this statement does not reflect reality, we need to formulate it differently. We could also introduce new test-cases specifically for OpenCV. I'm just certainly the wrong person for writing them ;-)
Also, I hardly ever convert image dtypes as part of my routine. There was a similar discussion in #111, where we concluded that we want to try to modify all/many prompts so that variable types are clearly specified, e.g. "image provided as numpy array" instead of just "image". Please note, I made some of your proposed modifications in #118. It would be great if you could express your opinion there!
I wanted to answer this particular point here. I also rarely convert image datatypes. I think that's because I normally start processing with images and end up with images; numpy arrays just act as intermediates. My problem is that I think our test-cases sometimes conflate "all images are really just numpy arrays" with "all numpy arrays are really just images". This statement is 'usually true', but when it fails, it dramatically lowers the success rate. A similar thing happens with "lists of lists of ints" being treated as images, which is why I think …
In many (possibly most) of our test cases, we consider an image generated with np.asarray([...]) to be our representation of an image. However, this generates numpy arrays of 64-bit precision (int64 for Python ints, float64 for Python floats, on typical platforms).
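For illustration, here is a quick check of the default dtypes (assuming a typical 64-bit platform):

```python
import numpy as np

# When no dtype is given, plain Python ints become int64
# and plain Python floats become float64.
print(np.asarray([[0, 128, 255]]).dtype)    # int64
print(np.asarray([[0.0, 0.5, 1.0]]).dtype)  # float64
```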
As I was continuing work on #76, I noticed some problems emerging from this abstraction. I think most computer-vision algorithms are written expecting 8- to 32-bit images (I really don't know of many images specified at 64-bit precision). I'll focus on one question which surprised me: "Why are so many models failing to perform an Otsu threshold?"
The following error is super common across many LLMs.
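Here is a minimal sketch that reproduces the failure (assuming OpenCV's cv2.threshold with THRESH_OTSU, which the failing answers typically call; the exact exception type varies across OpenCV versions):

```python
import cv2
import numpy as np

# np.asarray on a plain list of Python ints defaults to int64,
# a dtype OpenCV's Otsu threshold does not accept.
image = np.asarray([[0, 50, 200], [10, 150, 255]])

try:
    cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
except (cv2.error, TypeError) as e:
    # Depending on the OpenCV version, the 64-bit input is rejected
    # with cv2.error or TypeError.
    print("64-bit input rejected:", e)

# Converting to 8-bit first succeeds.
_, binary = cv2.threshold(image.astype(np.uint8), 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(binary.dtype)  # uint8
```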
Essentially, these models fail because Otsu thresholding (at least OpenCV's implementation) only allows for 8- or 16-bit inputs. Perhaps we expect the LLMs to perform this type-checking, or perhaps we don't. But in my experience, it's uncommon for image analysts to work on 64-bit images, so perhaps we should avoid failing LLMs that don't assume this either.
My idea: for every test-case that generates a numpy array, specify a plausible data-type. I'd vote for unsigned 16-bit as the default. What do others think?
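Concretely, the change would be as small as adding an explicit dtype argument (the values here are illustrative):

```python
import numpy as np

# Current style: dtype silently defaults to 64 bits.
image = np.asarray([[0, 50, 200], [10, 150, 255]])

# Proposed style: unsigned 16-bit, which most computer-vision
# libraries accept out of the box.
image = np.asarray([[0, 50, 200], [10, 150, 255]], dtype=np.uint16)
print(image.dtype)  # uint16
```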