
Add mixedbread-ai/mxbai-rerank-base-v1 #632

Closed
fakerybakery opened this issue Mar 7, 2024 · 2 comments
Labels: new model (Request a new model)

fakerybakery commented Mar 7, 2024

Hi,
Please add mxbai-rerank-base-v1.
Thanks!

fakerybakery added the new model label Mar 7, 2024

xenova (Collaborator) commented Mar 7, 2024

Hi there! Fortunately, the model is already supported by Transformers.js (see the README; JavaScript example).

Here it is for reference:

import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const model_id = 'mixedbread-ai/mxbai-rerank-base-v1';
const model = await AutoModelForSequenceClassification.from_pretrained(model_id);
const tokenizer = await AutoTokenizer.from_pretrained(model_id);

/**
 * Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
 * @param {string} query A single query
 * @param {string[]} documents A list of documents
 * @param {Object} [options] Options for ranking
 * @param {number} [options.top_k=undefined] Return the top-k documents. If undefined, all documents are returned.
 * @param {boolean} [options.return_documents=false] If true, also returns the documents. If false, only returns the indices and scores.
 * @returns {Promise<{corpus_id: number, score: number, text?: string}[]>} The ranked results, sorted by descending score.
 */
async function rank(query, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    // Pair the query with each document so the cross-encoder scores every (query, document) pair.
    const inputs = tokenizer(
        new Array(documents.length).fill(query),
        {
            text_pair: documents,
            padding: true,
            truncation: true,
        }
    );
    const { logits } = await model(inputs);
    return logits
        .sigmoid()
        .tolist()
        .map(([score], i) => ({
            corpus_id: i,
            score,
            ...(return_documents ? { text: documents[i] } : {})
        }))
        .sort((a, b) => b.score - a.score)
        .slice(0, top_k);
}

// Example usage:
const query = "Who wrote 'To Kill a Mockingbird'?";
const documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."
];

const results = await rank(query, documents, { return_documents: true, top_k: 3 });
console.log(results);
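
The post-processing that rank applies to the logits (sigmoid, mapping to corpus_id and score, sorting by descending score, then slicing to top_k) can be reproduced on plain arrays, which makes those steps easy to check without downloading the model. This is only a sketch: rankFromLogits, exampleLogits, and exampleDocs are hypothetical names that are not part of Transformers.js, and the logits are made-up values rather than real model output.

```javascript
// Hypothetical helper that mirrors the post-processing in rank() on plain arrays.
// `logits` is assumed to have shape [num_documents, 1], like the result of logits.tolist().
function rankFromLogits(logits, documents, {
    top_k = undefined,
    return_documents = false,
} = {}) {
    const sigmoid = (x) => 1 / (1 + Math.exp(-x));
    return logits
        .map(([logit], i) => ({
            corpus_id: i,
            score: sigmoid(logit),
            ...(return_documents ? { text: documents[i] } : {}),
        }))
        .sort((a, b) => b.score - a.score) // highest score first
        .slice(0, top_k); // slice(0, undefined) keeps every result
}

// Made-up logits for three documents (not real model output).
const exampleLogits = [[-2.0], [3.0], [0.0]];
const exampleDocs = ['doc A', 'doc B', 'doc C'];

const top2 = rankFromLogits(exampleLogits, exampleDocs, { top_k: 2, return_documents: true });
console.log(top2.map((r) => r.corpus_id)); // [ 1, 2 ]
```

Because the sigmoid is monotonic, sorting by score gives the same order as sorting by raw logit; it only rescales the scores into the range 0 to 1.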

We also built a demo for it here (source code):
[Image: reranking-demo]

xenova closed this as completed Mar 7, 2024

fakerybakery (Author) commented Mar 7, 2024

Ah, sorry about the duplicate! Must've missed that – thanks for the demo!
