Unleash your Imagination: Craft compelling narratives from your Images using the magic of Gemini 1.5 Flash.

Image-to-text.mp4

Project Explanation:

This project harnesses the power of Gemini 1.5 Flash, Google's advanced large language model, to analyze images and generate insightful information based on user prompts. It works by:

Image Input: The user provides an image, which the project processes to extract relevant features and data.
Prompt Input: The user also supplies a prompt, guiding the type of information they want to generate from the image (e.g., "Describe the scene," "Identify objects," or "Write a creative story").
Gemini 1.5 Flash Integration: The processed image data and prompt are fed into Gemini 1.5 Flash, which uses its powerful understanding of language and context to produce the desired information.
Output: The generated information is presented to the user, offering insights, descriptions, or creative content based on the image and prompt.

Potential Applications:

Image Captioning: Automatically generate descriptive captions for images.
Object Recognition: Identify and label objects within images.
Scene Understanding: Provide detailed descriptions of the scene depicted in an image.
Creative Writing: Generate stories, poems, or other creative content inspired by images.
Accessibility: Help visually impaired users understand the content of images.

click here to view the Application.

How it Works

1. Environment Setup:

The GOOGLE-API-KEY environment variable needs to be set with your valid Gemini API key.
Necessary libraries are imported: streamlit, google.generativeai, os, textwrap, PIL (Pillow), and pathlib.

2. Streamlit App Initialization:

Sets the page title to "Gemini".
Creates a header with the title "Gemini Image to Text Application 🖼️ 🖥️ 📄".

3. User Input:

A text input field is provided for the user to enter a prompt (optional).
A file uploader allows the user to upload an image.
If an image is uploaded, it is displayed in the app.

4. Gemini Interaction:

The get_gemini_response function handles the communication with the Gemini 1.5 Flash model.
It takes the user's prompt and the uploaded image as input.
If a prompt is provided, it is included in the request to Gemini along with the image otherwise, only the image is sent to Gemini.
The generated text response from Gemini is returned.

5. Output Presentation:

When the "Do the Magic" button is clicked:

The get_gemini_response function is called to generate the text.
The generated text is displayed under the subheader "The Response is".

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
image-to-text.py		image-to-text.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unleash your Imagination: Craft compelling narratives from your Images using the magic of Gemini 1.5 Flash.

Project Explanation:

Potential Applications:

How it Works

About

Releases

Packages

Languages

License

mukul-mschauhan/image-to-text

Folders and files

Latest commit

History

Repository files navigation

Unleash your Imagination: Craft compelling narratives from your Images using the magic of Gemini 1.5 Flash.

Project Explanation:

Potential Applications:

How it Works

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages