removing old models #5

Merged · 1 commit · May 3, 2024
53 changes: 2 additions & 51 deletions fern/docs/pages/models/details.mdx
@@ -16,7 +16,7 @@ LLMs are hosted by Prediction Guard in a secure, privacy conserving environment

Open access models are amazing these days! Each of these models was trained by a talented team and released publicly under a permissive license. The data used to train each model and the prompt formatting for each model varies. We've tried to give you some of the relevant details here, but shoot us a message [in Slack](support) with any questions.

### The best models (start here)
### Models available in `/completions` and `/chat/completions` endpoints

| Model Name | Type | Use Case | Prompt Format | Context Length | More Info |
| ---------------------------- | --------------- | ------------------------------------------------------- | ---------------------------------- | -------------- | ----------------------------------------------------------------------- |
@@ -25,53 +25,4 @@ Open access models are amazing these days! Each of these models was trained by a
| Hermes-2-Pro-Mistral-7B | Chat | Instruction following or chat-like applications | [ChatML](prompts#chatml) | 4096 | [link](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B) |
| Neural-Chat-7B | Chat | Instruction following or chat-like applications | [Neural Chat](prompts#neural-chat) | 4096 | [link](https://huggingface.co/Intel/neural-chat-7b-v3-1) |
| Yi-34B-Chat | Chat | Instruction following in English or Chinese | [ChatML](prompts#chatml) | 2048 | [link](https://huggingface.co/01-ai/Yi-34B-Chat) |
| deepseek-coder-6.7b-instruct | Code Generation | Generating computer code or answering tech questions | [Deepseek](prompts#deepseek) | 4096 | [link](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct) |
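As a rough illustration of how one of the chat models in the table above might be called through the `/chat/completions` endpoint, here is a minimal sketch. The base URL, the `x-api-key` header name, and the response shape are assumptions modeled on OpenAI-style APIs, not confirmed details of the Prediction Guard API; check the API reference for the real values.

```python
import json
import os

# Hypothetical base URL -- verify against the actual API documentation.
BASE_URL = "https://api.predictionguard.com"

def build_chat_request(model, messages, max_tokens=100):
    """Assemble an OpenAI-style chat completion payload for one of the
    chat models listed above (e.g. Hermes-2-Pro-Mistral-7B)."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "Hermes-2-Pro-Mistral-7B",
    [{"role": "user", "content": "Summarize what an LLM is in one sentence."}],
)
print(json.dumps(payload, indent=2))

# Only attempt a real call if an API key is configured in the environment.
api_key = os.environ.get("PREDICTIONGUARD_API_KEY")
if api_key:
    import requests  # third-party HTTP client

    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"x-api-key": api_key},
        json=payload,
        timeout=30,
    )
    print(resp.json()["choices"][0]["message"]["content"])
```

The prompt-format column in the table matters when using the raw `/completions` endpoint; with a chat-style endpoint the server typically applies the model's template (ChatML, Neural Chat, etc.) for you.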

### Other models available

The models below are available in our API. However, these models scale to zero (i.e., they might not be ready for you to interact with). These models are less frequently accessed by our users, so we suggest you start with the models above. If your company requires one of these models to be up and running 24/7, [reach out to us](support) and we will help make that happen!

| Model Name | Model Card | Parameters | Context Length |
| ---------------------------- | --------------------------------------------------------------------------------- | ---------- | -------------- |
| Llama-2-13B | [link](https://huggingface.co/meta-llama/Llama-2-13b-hf) | 13B | 4096 |
| Llama-2-7B | [link](https://huggingface.co/meta-llama/Llama-2-7b-hf) | 7B | 4096 |
| Nous-Hermes-Llama2-7B | [link](https://huggingface.co/NousResearch/Nous-Hermes-Llama2-7b) | 7B | 4096 |
| Camel-5B | [link](https://huggingface.co/Writer/camel-5b-hf) | 5B | 2048 |
| Dolly-3B | [link](https://huggingface.co/databricks/dolly-v2-3b) | 3B | 2048 |
| Dolly-7B | [link](https://huggingface.co/databricks/dolly-v2-7b) | 7B | 2048 |
| Falcon-7B-Instruct | [link](https://huggingface.co/tiiuae/falcon-7b-instruct) | 7B | 2048 |
| h2oGPT-6_9B | [link](https://huggingface.co/h2oai/h2ogpt-oig-oasst1-512-6_9b) | 6.9B | 2048 |
| MPT-7B-Instruct | [link](https://huggingface.co/mosaicml/mpt-7b-instruct) | 7B | 4096 |
| Pythia-6_9-Deduped | [link](https://huggingface.co/EleutherAI/pythia-6.9b-deduped) | 6.9B | 2048 |
| RedPajama-INCITE-Instruct-7B | [link](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1) | 7B | 2048 |
| WizardCoder | [link](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0) | 15.5B | 8192 |
| StarCoder | [link](https://huggingface.co/bigcode/starcoder) | 15.5B | 8192 |

import { Callout } from "nextra-theme-docs";

<Callout type="info" emoji="ℹ️">
Note: if you aren't actively using these models, they are scaled down. As
such, your first call to a model might need to "wake up" its inference
server, in which case you will get the message "Waking up model. Try again
in a few minutes." Waking the server typically takes around 5-15 minutes
depending on the size of the model. We are actively working on reducing
these cold start times.
</Callout>
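Client code can handle the cold start described in the callout above by retrying while the server reports it is still waking the model. Below is a minimal sketch of such a retry loop; the `call_with_wakeup_retry` helper and its parameters are illustrative, and it simply matches on the "Waking up model" message quoted above rather than any documented error code.

```python
import time

def call_with_wakeup_retry(call, max_attempts=15, poll_s=60):
    """Invoke `call()` repeatedly until the response no longer contains the
    "Waking up model" marker, sleeping `poll_s` seconds between attempts.
    `call` is any zero-argument function returning the response text."""
    for attempt in range(max_attempts):
        result = call()
        if "Waking up model" not in result:
            return result
        if attempt < max_attempts - 1:
            time.sleep(poll_s)
    raise TimeoutError(f"model did not wake up after {max_attempts} attempts")

# Demo with a stand-in for the real API call: waking twice, then ready.
_responses = iter([
    "Waking up model. Try again in a few minutes.",
    "Waking up model. Try again in a few minutes.",
    "Hello!",
])
print(call_with_wakeup_retry(lambda: next(_responses), poll_s=0))  # -> Hello!
```

With the default `poll_s=60` and `max_attempts=15`, the loop gives up after roughly 15 minutes, matching the upper end of the wake-up window mentioned above.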

## Closed LLMs (if you t̶r̶u̶s̶t̶ need them)

These models are integrated into our API, but they are not hosted by Prediction Guard in the same manner as the models above.

**Note - You will need your own OpenAI API key to use the models below. Customers concerned about data privacy, IP/PII leakage, HIPAA compliance, etc. should look into the "Open Access LLMs" above and/or our enterprise deployment. [Contact support](support) with any questions.**

| Model Name | Generation | Context Length |
| ----------------------------- | ---------- | -------------- |
| OpenAI-gpt-3.5-turbo-instruct | GPT-3.5 | 4097 |
| OpenAI-davinci-002 | GPT-3.5 | 4097 |
| OpenAI-babbage-002 | GPT-3 | 2049 |

<Callout type="info" emoji="ℹ️">
To use the OpenAI models above, make sure you either: (1) define the
environment variable `OPENAI_API_KEY` if you are using the Python client; or
(2) set the header parameter `OpenAI-ApiKey` if you are using the REST API.
</Callout>
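The REST path described in the callout above can be sketched as follows. `OPENAI_API_KEY` and `OpenAI-ApiKey` are the exact names quoted in the callout; the helper function itself is a hypothetical convenience, not part of any client library.

```python
import os

def openai_passthrough_headers():
    """Build REST request headers that forward the user's own OpenAI key
    in the `OpenAI-ApiKey` header, as described in the callout above."""
    key = os.environ.get("OPENAI_API_KEY")
    if key is None:
        raise RuntimeError("set OPENAI_API_KEY to use the OpenAI-backed models")
    return {
        "OpenAI-ApiKey": key,
        "Content-Type": "application/json",
    }

# Demo: pretend a key is configured, then build the headers.
os.environ.setdefault("OPENAI_API_KEY", "sk-example")
print(openai_passthrough_headers()["OpenAI-ApiKey"])
```

When using the Python client instead of raw REST, setting the `OPENAI_API_KEY` environment variable alone should suffice, per the callout.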