scaleway · nerda-codes · Nov 28, 2024 · Nov 18, 2024 · Nov 18, 2024 · Nov 18, 2024
@@ -0,0 +1,168 @@
+---
+meta:
+  title: Understanding the Molmo-72b-0924 model
+  description: Deploy your own secure Molmo-72b-0924 model with Scaleway Managed Inference. Privacy-focused, fully managed.
+content:
+  h1:  Understanding the Molmo-72b-0924 model
+  paragraph: This page provides information on the Molmo-72b-0924 model
+tags: 
+dates:
+  validation: 2024-11-18
+  posted: 2024-11-18
+categories:
+  - ai-data
+---
+
+## Model overview
+
+| Attribute       | Details                            |
+|-----------------|------------------------------------|
+| Provider        | [Allen Institute for AI](https://molmo.allenai.org/blog)                         |
+| License        | Apache 2.0  |                |
+| Compatible Instances | H100-2 (FP8)                 |
+| Context size | 50k tokens    |
+
+## Model name
+
+```bash
+allenai/molmo-72b-0924:fp8
+```
+
+## Compatible Instances
+
+| Instance type  | Max context length |
+| ------------- |-------------|
+| H100-2      | 50k (FP8)
+
+## Model introduction
+
+Molmo 72B is the powerhouse of the Molmo family, multimodal models developed by the renowned research lab Allen Institute for AI.
+Vision-language model like Molmo can analyze images and offer insights from visual content alongside text. This multimodal functionality creates new opportunities for applications that need both visual and textual comprehension. 
+
+Molmo is open-weight and distributed under the Apache 2.0 license. All artifacts (code, data set, evaluations) are also expected to be full open-source.
+Its base model is Qwen2-72B ((Twonyi Qianwen license)[https://huggingface.co/Qwen/Qwen2-72B/blob/main/LICENSE]).
+
+## Why is it useful?
+
+- Molmo-72b allows you to process real world and high resolution images, unlocking capacities such as transcribing handwritten files or payment receipts, extracting information from graphs, captioning images, etc.
+- This model achieves the [highest academic benchmark scores and second rank on human evaluations](https://huggingface.co/allenai/Molmo-72B-0924#evaluations) as of writing (September 2024)
+
+<Message type="note">
+  Molmo-72b can understand and analyze images, not generate them. You will use it through the /v1/chat/completions endpoint.
+</Message>
+
+<Message type="important">
+  Molmo-72b was reported to struggle with transparent images. The official recommandation is to add white or dark background to images for the time being.
+</Message>
+
+## How to use it
+
+### Sending Inference requests
+
+<Message type="tip">
+  Unlike regular chat models, Molmo-72b can take an `image_url` in the content array.
+</Message>
+
+To perform inference tasks with your Molmo-72b model deployed at Scaleway, use the following command:
+
+```bash
+curl -s \
+-H "Authorization: Bearer <IAM API key>" \
+-H "Content-Type: application/json" \
+--request POST \
+--url "https://<Deployment UUID>.ifr.fr-par.scw.cloud/v1/chat/completions" \
+--data '{
+       "model": "allenai/molmo-72b-0924:fp8",
+       "messages": [
+        {
+          "role": "user",
+          "content": [
+              {"type" : "text", "text": "Describe this image in detail please."},
+              {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/32/512/512"}},
+              {"type" : "text", "text": "and this one as well."},
+              {"type": "image_url", "image_url": {"url": "https://www.wolframcloud.com/obj/resourcesystem/images/a0e/a0ee3983-46c6-4c92-b85d-059044639928/6af8cfb971db031b.png"}}
+          ]
+        }
+       ],
+       "top_p": 1, 
+       "temperature": 0.7, 
+       "stream": false
+}'
+```
+
+Make sure to replace `<IAM API key>` and `<Deployment UUID>` with your actual [IAM API key](/identity-and-access-management/iam/how-to/create-api-keys/) and the Deployment UUID you are targeting.
+
+<Message type="tip">
+  The model name allows Scaleway to put your prompts in the expected format.
+</Message>
+
+<Message type="note">
+  Ensure that the `messages` array is properly formatted with roles (system, user, assistant) and content.
+</Message>
+
+### Passing images to Molmo-72b
+
+1. Image URLs
+If the image is available online, you can just include the image URL in your request as demonstrated above. This approach is simple and does not require any encoding.
+
+2. Base64 encoded image
+Base64 encoding is a standard way to transform binary data, like images, into a text format, making it easier to transmit over the internet.
+
+The following Python code sample shows you how to encode an image in base64 format and pass it to your request payload.
+
+
+```python
+import base64
+from io import BytesIO
+from PIL import Image
+
+def encode_image(img):
+    buffered = BytesIO()
+    img.save(buffered, format="JPEG")
+    encoded_string = base64.b64encode(buffered.getvalue()).decode("utf-8")
+    return encoded_string
+
+img = Image.open("path_to_your_image.jpg")
+base64_img = encode_image(img)
+
+payload = {
+    "messages": [
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is this image?"},
+                {
+                    "type": "image_url",
+                    "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"},
+                },
+            ],
+        }
+    ],
+    ... # other parameters
+}
+
+```
+
+### Receiving Managed Inference responses
+
+Upon sending the HTTP request to the public or private endpoints exposed by the server, you will receive inference responses from the managed Managed Inference server. 
+Process the output data according to your application's needs. The response will contain the output generated by the visual language model based on the input provided in the request.
+
+<Message type="note">
+  Despite efforts for accuracy, the possibility of generated text containing inaccuracies or [hallucinations](/ai-data/managed-inference/concepts/#hallucinations) exists. Always verify the content generated independently.
+</Message>
+
+## Frequently Asked Questions
+
+#### What types of images are supported by Molmo-72b?
+- Bitmap (or raster) image formats, meaning storing images as grids of individual pixels, are supported: PNG, JPEG, WEBP, and non-animated GIFs in particular.
+- Vector image formats (SVG, PSD) are not supported.
+
+#### Are other files supported?
+Only bitmaps can be analyzed by Molmo, PDFs and videos are not supported.
+
+#### Is there a limit to the size of each image?
+The only limitation is in context window (1 token for each 16x16 pixel).
+
+#### What is the maximum amount of images per conversation?
+One conversation can handle up to 12 images (per request). The 13rd will return a 413 error.
diff --git a/menu/navigation.json b/menu/navigation.json
@@ -623,6 +623,10 @@
                     "label": "Mixtral-8x7b-instruct-v0.1 model",
                     "slug": "mixtral-8x7b-instruct-v0.1"
                   },
+                  {
+                    "label": "Molmo-72b-0924 model",
+                    "slug": "molmo-72b-0924"
+                  },
                   {
                     "label": "Sentence-t5-xxl model",
                     "slug": "sentence-t5-xxl"