MoonDreamMojo

Moondream is a tiny multimodal vision-language model with 1.6B parameters. It accepts both text and image inputs and generates text outputs, such as captions or answers to questions about the image. In this repository, model inference is implemented entirely in Mojo, while pre-processing (such as tokenization) and post-processing are handled in Python. The original PyTorch implementation of the model can be found here.

How to use the model:

Clone the repo:

git clone https://github.com/taalhaataahir0102/MoonDreamMojo.git

Download the weights file from here.

Extract the weights file into the cloned repository directory.

Make the run script executable:

chmod +x run.sh

Run the model using:

./run.sh

The model will prompt for an input image and a question.

Requirements:

RAM: 16 GB

Hard-disk space: 10 GB
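
The requirements above can be checked before downloading the weights. The following is a minimal sketch, not part of this repository: the function names are hypothetical, the thresholds are simply the two figures listed above, and the RAM lookup relies on `os.sysconf`, which is only available on Unix-like systems. Reported RAM is usually slightly below the nominal installed amount, so treat the result as advisory.

```python
import os
import shutil

# Thresholds taken from the requirements listed above.
REQUIRED_RAM_GB = 16
REQUIRED_DISK_GB = 10


def total_ram_gb():
    """Return total physical RAM in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are unavailable on this platform.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3


def free_disk_gb(path="."):
    """Return free disk space in GiB for the filesystem containing path."""
    return shutil.disk_usage(path).free / 1024**3


def check_requirements(ram_gb, disk_gb,
                       need_ram=REQUIRED_RAM_GB, need_disk=REQUIRED_DISK_GB):
    """Return a list of problems; an empty list means no shortfall was found."""
    problems = []
    # An unknown RAM size (None) is skipped rather than reported as a failure.
    if ram_gb is not None and ram_gb < need_ram:
        problems.append(f"RAM: {ram_gb:.1f} GB found, {need_ram} GB required")
    if disk_gb < need_disk:
        problems.append(f"Disk: {disk_gb:.1f} GB free, {need_disk} GB required")
    return problems


if __name__ == "__main__":
    issues = check_requirements(total_ram_gb(), free_disk_gb("."))
    if issues:
        for issue in issues:
            print("Warning:", issue)
    else:
        print("No shortfall detected against the listed requirements.")
```

Run it from the cloned repository directory so that the disk check covers the filesystem where the weights will be extracted.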

Examples:

Question: What is the flower wearing?

Answer: The flower is wearing sunglasses.

Question: Describe the image

Answer: The image features a group of three paper sculptures of animals, including an elephant, a zebra, and a lion, set against a backdrop of a sunset. The sculptures are arranged in a way that showcases the animals together in a natural setting.

