diff --git a/fern/pages/get-started/datasets.mdx b/fern/pages/get-started/datasets.mdx
index 32e85c39..5f86cafe 100644
--- a/fern/pages/get-started/datasets.mdx
+++ b/fern/pages/get-started/datasets.mdx
@@ -175,8 +175,8 @@ The following table describes the types of datasets supported by the Dataset API
 
 | Dataset Type | Description | Schema | Rules | Task Type | Status | File Types Supported | Are Metadata Fields Supported? | Sample File |
 |----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|---------------------------|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `single-label-classification-finetune-input` | A file containing text and a single label (class) for each text | `text:string label:string` | You must include 40 valid train examples, \nwith five examples per label. A label cannot be present in all examples \nThere must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `csv` and `jsonl` | No | [Art classification file](https://drive.google.com/file/d/15-CchSiALUQwto4b-yAMWhdUqz8vfwQ1/view?usp=drive_link) |
-| `multi-label-classification-finetune-input` | A file containing text and an array of label(s) (class) for each text | `text:string label:list[string]` | You must include 40 valid train examples, with five examples per label \nA label cannot be present in all examples. There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `jsonl` | No | n/a |
+| `single-label-classification-finetune-input` | A file containing text and a single label (class) for each text | `text:string label:string` | You must include 40 valid train examples, with five examples per label. A label cannot be present in all examples. There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `csv` and `jsonl` | No | [Art classification file](https://drive.google.com/file/d/15-CchSiALUQwto4b-yAMWhdUqz8vfwQ1/view?usp=drive_link) |
+| `multi-label-classification-finetune-input` | A file containing text and an array of label(s) (class) for each text | `text:string label:list[string]` | You must include 40 valid train examples, with five examples per label. A label cannot be present in all examples. There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `jsonl` | No | n/a |
 | `reranker-finetune-input` | A file containing queries and an array of passages relevant to the query. There must also be "hard negatives", passages semantically similar but ultimately not relevant. | `query:string relevant_passages:list[string] hard_negatives:list[string]` | There must be 256 train examples and at least 64 evaluation examples. There must be at least one relevant passage, with no overlap between relevant passage and hard negatives. | Rerank Fine-tuning | Supported | `jsonl` | No | [train_valid.json](https://drive.google.com/file/d/1CmXWfQRedVyWBDCsSkeF9g8gyqmpUA7C/view?usp=drive_link) |
 | `chat-finetune-input` | A file containing conversations | `messages: list[{role: string, content: string}]` | There must be two valid train examples and one valid evaluation example. | Chat Fine-tuning | In progress/not supported | `jsonl` | No | [train_celestial_fox.json](https://drive.google.com/file/d/19x6sOPXNWoZj9Jo989h09wd4IJ6Su9by/view?usp=drive_link) |
 | `embed-input` | A file containing text to be embedded | `text:string` | None of the rows in the file can be empty. | Embed job | Supported | `csv` and `jsonl` | Yes | [embed_jobs_sample_data.jsonl](https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl) / [embed_jobs_sample_data.csv](https://github.com/cohere-ai/notebooks/blob/main/notebooks/data/embed_jobs_sample_data.csv) |
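For reviewers who want to check a sample file against the classification rules in the rows changed above, a rough local pre-check could look like the Python sketch below. It is not the Dataset API's own validation. The function names (`check_classification_rules`, `is_valid`, `labels_of`, `load_jsonl`) are hypothetical, the counts from the table are treated as minimums, a "valid" example is assumed to mean non-empty `text` plus at least one non-empty `label`, and the "a label cannot be present in all examples" rule is assumed to apply to the train split only.

```python
import json
from collections import Counter
from typing import Iterable

# Thresholds copied from the table. ASSUMPTION: they are minimums, not exact counts.
MIN_TRAIN_EXAMPLES = 40
MIN_EXAMPLES_PER_LABEL = 5
MIN_EVAL_EXAMPLES = 24


def labels_of(row: dict) -> list[str]:
    """Return a row's labels, accepting either `label:string` or `label:list[string]`."""
    label = row.get("label")
    if isinstance(label, str):
        return [label] if label.strip() else []
    if isinstance(label, list):
        return [lab for lab in label if isinstance(lab, str) and lab.strip()]
    return []


def is_valid(row: dict) -> bool:
    """ASSUMPTION: 'valid' means non-empty text and at least one non-empty label."""
    text = row.get("text")
    return isinstance(text, str) and bool(text.strip()) and bool(labels_of(row))


def check_classification_rules(train: Iterable[dict], evaluation: Iterable[dict]) -> list[str]:
    """Return rule violations as messages; an empty list means every check passed."""
    valid_train = [r for r in train if is_valid(r)]
    valid_eval = [r for r in evaluation if is_valid(r)]
    problems = []

    if len(valid_train) < MIN_TRAIN_EXAMPLES:
        problems.append(f"only {len(valid_train)} valid train examples (need {MIN_TRAIN_EXAMPLES})")

    # Count each label once per example, so a label repeated inside one row counts once.
    counts = Counter(lab for r in valid_train for lab in set(labels_of(r)))
    for lab, n in sorted(counts.items()):
        if n < MIN_EXAMPLES_PER_LABEL:
            problems.append(f"label {lab!r} has {n} examples (need {MIN_EXAMPLES_PER_LABEL})")
        # ASSUMPTION: the "not present in all examples" rule is checked on the train split.
        if n == len(valid_train):
            problems.append(f"label {lab!r} is present in every train example")

    if len(valid_eval) < MIN_EVAL_EXAMPLES:
        problems.append(f"only {len(valid_eval)} valid evaluation examples (need {MIN_EVAL_EXAMPLES})")
    return problems


def load_jsonl(path: str) -> list[dict]:
    """Read one JSON object per non-blank line of a `jsonl` file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `check_classification_rules(load_jsonl("train.jsonl"), load_jsonl("eval.jsonl"))` returns an empty list when all checks pass; the file names are placeholders. The same function handles both classification types because `labels_of` normalizes a single string label into a one-element list.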