Moondream TypeScript Client

A lightweight TypeScript client for the Moondream AI vision-language model. This client provides an easy-to-use interface for interacting with the Moondream model, supporting both image captioning and visual question answering.

Features

  • Image captioning
  • Visual question answering
  • Streaming support for real-time responses
  • Support for multiple image input types (ImageData, HTMLImageElement, File)
  • Configurable settings via environment variables or constructor options
  • Both CommonJS and ESM builds
  • TypeScript support out of the box

Installation

Clone the repository:

git clone https://github.com/yourusername/moondream-ts.git
cd moondream-ts

# Using pnpm (recommended)
pnpm install

# Build the project
pnpm build

Usage

Basic Usage

import { VL } from './dist';

// Initialize the client
const vl = new VL();

// Generate a caption for an image
const captionResult = await vl.caption(imageFile);
console.log(captionResult.caption);

// Ask a question about an image
const queryResult = await vl.query(imageFile, "What is in this image?");
console.log(queryResult.answer);
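
The snippets above assume an imageFile is already in scope. In a browser, one way to obtain a File to pass to the client is from a file input element (a minimal sketch; the #image-input selector is illustrative and not part of the client):

// Read a File from an <input type="file" id="image-input"> element
const input = document.querySelector<HTMLInputElement>('#image-input');
const imageFile = input?.files?.[0];

if (imageFile) {
  const { caption } = await vl.caption(imageFile);
  console.log(caption);
}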

Streaming Responses

// Stream caption tokens
const streamResult = await vl.caption(imageFile, 'normal', true);
for await (const token of streamResult.caption) {
  process.stdout.write(token);
}

// Stream query response
const queryStream = await vl.query(
  imageFile, 
  "What is in this image?", 
  true
);
for await (const token of queryStream.answer) {
  process.stdout.write(token);
}

Configuration

You can configure the client either through environment variables or constructor options.

Environment Variables

Create a .env file in your project root:

MOONDREAM_BASE_URL=http://localhost:3000
MOONDREAM_MAX_TOKENS=2048

Constructor Options

const vl = new VL({
  baseUrl: 'http://localhost:3000',
  timeout: 5000
});

Advanced Usage

// Custom sampling settings
const result = await vl.caption(imageFile, 'normal', false, {
  maxTokens: 100
});

// Pre-encode image for multiple queries
const encodedImage = await vl.encodeImage(imageFile);
const caption = await vl.caption(encodedImage);
const answer = await vl.query(encodedImage, "What colors do you see?");
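
The documented parameters can also be combined. A minimal sketch, assuming pre-encoding and streaming behave as shown above, that streams a query over an encoded image with a custom token limit:

// Stream a query over a pre-encoded image with a custom token limit
const encoded = await vl.encodeImage(imageFile);
const streamed = await vl.query(encoded, "Describe the scene.", true, {
  maxTokens: 256
});
for await (const token of streamed.answer) {
  process.stdout.write(token);
}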

Development

Setup Development Environment

  1. Clone and install dependencies:
git clone https://github.com/yourusername/moondream-ts.git
cd moondream-ts
pnpm install
  2. Start development:
pnpm dev

Running Tests

# Run tests once
pnpm test

# Run tests in watch mode
pnpm test:watch

Linting and Formatting

# Run ESLint
pnpm lint

# Format code with Prettier
pnpm format

API Reference

VL Class

Constructor

new VL(config?: ClientConfig)

Methods

caption()

async caption(
  image: ImageData | HTMLImageElement | File | EncodedImage,
  length?: string,
  stream?: boolean,
  settings?: SamplingSettings
): Promise<CaptionOutput>

query()

async query(
  image: ImageData | HTMLImageElement | File | EncodedImage,
  question: string,
  stream?: boolean,
  settings?: SamplingSettings
): Promise<QueryOutput>

encodeImage()

async encodeImage(
  image: ImageData | HTMLImageElement | File | EncodedImage
): Promise<EncodedImage>

Types

interface ClientConfig {
  baseUrl?: string;
  timeout?: number;
}

interface SamplingSettings {
  maxTokens?: number;
}

interface CaptionOutput {
  caption: string | AsyncGenerator<string, void, unknown>;
}

interface QueryOutput {
  answer: string | AsyncGenerator<string, void, unknown>;
}