Should we represent images using default numpy.asarray()? #115

Open
ian-coccimiglio opened this issue Sep 6, 2024 · 2 comments · May be fixed by #123
Comments

@ian-coccimiglio (Contributor) commented Sep 6, 2024

In many (possibly most) of our test cases, we treat an array generated with np.asarray([...]) as our representation of an image. However, by default this produces numpy arrays with 64-bit precision.

As I was continuing work on #76, I noticed some problems emerging from this abstraction. I think most computer-vision algorithms are written expecting 8- to 32-bit images (I really don't know if many images are ever specified at 64-bit precision). I'll focus on one question that surprised me: why are so many models failing to perform an Otsu threshold?

The following error is super common across many LLMs:

> OpenCV(4.10.0) /io/opencv/modules/imgproc/src/thresh.cpp:1559: error: (-2:Unspecified error) in function 'double cv::threshold(cv::InputArray, cv::OutputArray, double, double, int)'
> THRESH_OTSU mode:
>     'src_type == CV_8UC1 || src_type == CV_16UC1'
> where
>     'src_type' is 4 (CV_32SC1)

Essentially, these models fail because Otsu thresholding in OpenCV only accepts 8-bit or 16-bit unsigned, single-channel inputs. Perhaps we expect the LLMs to perform this kind of type-checking, or perhaps we don't. But in my experience it's uncommon for image analysts to work with 64-bit images, so perhaps we shouldn't fail LLMs that don't assume this either.

My idea: for every test case that generates a numpy array, specify a plausible data type. I'd vote for unsigned 16-bit as the default. What do others think?
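To illustrate, here is a minimal sketch (with made-up values, not taken from an actual benchmark test case) of how the default dtype from np.asarray trips up cv2.threshold with THRESH_OTSU, and how specifying a plausible image dtype up front avoids it:

```python
import numpy as np
import cv2

# np.asarray on a plain list of Python ints defaults to a 64-bit integer dtype
# (platform dependent, but never an 8- or 16-bit unsigned image type).
img_default = np.asarray([[0, 50], [200, 255]])
print(img_default.dtype)  # e.g. int64

# A 32-bit signed array (CV_32SC1 in OpenCV terms) reproduces the quoted error:
img_i32 = img_default.astype(np.int32)
try:
    cv2.threshold(img_i32, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
except cv2.error as e:
    print("cv2.error:", e)

# Specifying a plausible image dtype up front (uint8, or uint16 per the
# proposal above, which the check also accepts as CV_16UC1) works fine.
img_u8 = np.asarray([[0, 50], [200, 255]], dtype=np.uint8)
_, binary = cv2.threshold(img_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(binary)
```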

@haesleinhuepf (Owner)

Hey @ian-coccimiglio,

great point! I must admit, in daily practice I never use OpenCV (most of the image data I work with is not compatible with it). We also expressed this in the paper:

> On the other hand, in natural image processing, libraries such as OpenCV \citep{itseez2015opencv} are common, while our community often uses scikit-image \citep{scikit-image} for similar purposes. As natural image processing is a very active research field, the LLM’s training data may contain more examples from that domain.

If you think this statement does not reflect reality, we need to formulate it differently. We could also introduce new test cases specifically for OpenCV. I'm certainly the wrong person to write them, though ;-)

> My idea: for every test case that generates a numpy array, specify a plausible data type. I'd vote for unsigned 16-bit as the default. What do others think?

Also, I hardly ever convert image dtypes in routine work. There was a similar discussion in #111, where we concluded that we want to try to modify all/many prompts so that variable types are clearly specified, e.g. "image provided as numpy array" instead of just "image".

Please note, I made some of the modifications you proposed in #118. It would be great if you could share your opinion there!

@ian-coccimiglio (Contributor, Author)

> Also, I hardly ever convert image dtypes in routine work. There was a similar discussion in #111, where we concluded that we want to try to modify all/many prompts so that variable types are clearly specified, e.g. "image provided as numpy array" instead of just "image".

I wanted to answer this particular point here. I also rarely convert image datatypes; I think that's because I normally start with images and end up with images, and numpy arrays are just the intermediate representation. My problem is that some of our test cases conflate "all images are really just numpy arrays" with "all numpy arrays are really just images". That assumption is usually true, but when it fails, it dramatically lowers the success rate. Something similar is happening with "lists of lists of ints" being treated as images, which is why I think sum_images.ipynb is failing (I'll be sending in a PR for that one shortly).
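To make that concrete, here is a minimal sketch of the kind of conflation I mean (made-up values, not the actual notebook): a plain list of lists behaves very differently from a numpy array with an image dtype when you "sum two images":

```python
import numpy as np

# Two "images" represented as plain lists of lists of ints.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# For Python lists, + means concatenation, not pixel-wise addition.
print(a + b)  # [[1, 2], [3, 4], [5, 6], [7, 8]]

# As numpy arrays with an explicit image dtype, + is element-wise,
# which is what "summing two images" actually means.
a_img = np.asarray(a, dtype=np.uint16)
b_img = np.asarray(b, dtype=np.uint16)
print(a_img + b_img)  # [[ 6  8] [10 12]]
```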
