Epic: Image Classifier #1
Comments
Stumbled upon these two, which might be relevant to revisit at a later stage:
Yeah, saw
I've thought about the best way of doing this and I've found a fair share of resources that I think may help us get close to what we want.
Image Captioning models
Most mainstream LLMs, such as Llama2 or Claude2, only accept text input. I took a gander at https://github.com/bentoml/OpenLLM, as I've stated in the comment above. However, it's not really useful to us, as these LLMs do not understand image inputs (though maybe some of them can understand vector representations of images). Therefore, we have to forgo these more "mainstream" LLMs for this use case.
There are, however, computer vision models we can definitely use. I started my dive in https://github.com/salesforce/LAVIS#image-captioning, which led to me discovering I'm not going to explain how You can find a demo at https://huggingface.co/spaces/Salesforce/BLIP2.
Langchain 🦜
I had been hearing about Langchain for a few months: how it makes it easy to create LLM-based applications and to chain different models together to produce the output a given use case needs. And the fact that you can easily deploy it to I was thinking of using
Good research/summary. Thanks. 👌
As @nelsonic suggested, we can give https://github.com/elixir-image/image a whirl, as well.
@LuchoTurtle I've lowered the priority on this issue to reflect the fact that it's a very "nice to have" feature but isn't "core" to the experience of our Ref: dwyl/product-roadmap#40 we need to work on the
Having said that, when you take "breaks" from the
It will be an awesome enhancement to add image recognition to the images people upload in the
@LuchoTurtle given that we are
Once we are uploading images dwyl/imgup#51
We want to classify the images and suggest meta tags to describe the images so that they become "searchable".
That means pulling any text out of images using OCR.
And attempting to find any detail in images that can be useful.
We aren't going to build our own models from scratch, but we are going to ...
Todo
Research the available models and services/APIs that we can send an image to for classification.
Research available OCR services or models.
Images that are uploaded from a Camera or Smart Phone contain metadata including camera type/model, location (where the photo was taken), ISO, Shutter, Focal Length, Original Resolution, etc. We want to capture this and feed it into the classifier. Feat: Store Metadata and Image Classification/Info #3
The objective of the classifier is to attempt to describe the image and return a few keywords.
If it makes more sense to have this as a standalone app (separate from imgup) then feel free to create a new repo! Then just send the data to the standalone app and receive JSON data in response. 💭
@LuchoTurtle please leave comments with your research. 🙏
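To make the "send the data, receive JSON" idea above a bit more concrete, here is a minimal sketch of what the exchange with a standalone classifier app could look like. Everything in it is an assumption for illustration: the field names (`image_url`, `metadata`, `description`, `keywords`), the metadata key spellings and the helper names `build_request` / `parse_response` are hypothetical and not an agreed API. It is written in Python only because the captioning models discussed in the comments are Python-based; the real app may well look different.

```python
import json

# Metadata fields mentioned in this issue. The exact key spellings are
# assumptions; a real EXIF library will use its own tag names.
METADATA_FIELDS = (
    "camera_model",
    "location",
    "iso",
    "shutter",
    "focal_length",
    "original_resolution",
)


def build_request(image_url: str, metadata: dict) -> str:
    """Serialize the image reference plus the selected metadata as JSON."""
    # Keep only the fields we have agreed to send; ignore anything else.
    selected = {key: metadata[key] for key in METADATA_FIELDS if key in metadata}
    return json.dumps({"image_url": image_url, "metadata": selected})


def parse_response(body: str) -> dict:
    """Read the classifier's JSON reply, tolerating missing fields."""
    data = json.loads(body)
    # Normalize keywords so they can be used as search tags.
    keywords = [str(word).strip().lower() for word in data.get("keywords", [])]
    return {
        "description": data.get("description", ""),
        "keywords": [word for word in keywords if word],
    }


if __name__ == "__main__":
    # Hypothetical request for a photo, with made-up metadata values.
    print(build_request("https://example.com/kitchen.jpg", {"iso": 100}))
    # Hypothetical reply from the standalone app.
    reply = '{"description": "a messy kitchen", "keywords": ["Kitchen", " dishes "]}'
    print(parse_response(reply))
```

Filtering the metadata to a fixed list keeps the payload small and avoids leaking fields we never meant to share; lower-casing and trimming the keywords is one simple way to make the resulting tags consistent for search.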
Context
We want to be able to upload images in our App and have them become an item of content.
i.e. I take a photo of a messy kitchen and it becomes "Tidy The Kitchen" with a small thumbnail of the image.
If I tap on the thumbnail I see the image full-screen. But the Text is the important part.
The reason we want to have a "Visual Todo List" is that it becomes easy for people who don't yet read (think toddlers) or people who don't read well (people who only have basic literacy) to follow instructions.