Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove random newlines from datasets docs #66

Merged
merged 2 commits into from
Aug 23, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions fern/pages/get-started/datasets.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -175,10 +175,10 @@ The following table describes the types of datasets supported by the Dataset API

| Dataset Type | Description | Schema | Rules | Task Type | Status | File Types Supported | Are Metadata Fields Supported? | Sample File |
|----------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|---------------------------|--------------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `single-label-classification-finetune-input` | A file containing text and a single label (class) for each text | `text:string \nlabel:string` | You must include 40 valid train examples, \nwith five examples per label. A label cannot be present in all examples \nThere must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `csv` and `jsonl` | No | [Art classification file](https://drive.google.com/file/d/15-CchSiALUQwto4b-yAMWhdUqz8vfwQ1/view?usp=drive_link) |
| `multi-label-classification-finetune-input` | A file containing text and an array of label(s) (class) for each text | `text:string \nlabel:list[string]` | You must include 40 valid train examples, with five examples per label \nA label cannot be present in all examples. There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `jsonl` | No | n/a |
| `reranker-finetune-input` | A file containing queries and an array of passages relevant to the query. There must also be "hard negatives", passages semantically similar but ultimately not relevant. | `query:string \nrelevant_passages:list[string] \nhard_negatives:list[string]` | There must be 256 train examples and at least 64 evaluation examples. There must be at least one relevant passage, with no overlap between relevant passage and hard negatives. | Rerank Fine-tuning | Supported | `jsonl` | No | [train_valid.json](https://drive.google.com/file/d/1CmXWfQRedVyWBDCsSkeF9g8gyqmpUA7C/view?usp=drive_link) |
| `chat-finetune-input` | A file containing conversations | `messages: list[Message]` \n \n`- Message - \n role: text \n context: text` | There must be two valid train examples and one valid evaluation example. | Chat Fine-tuning | In progress/not supported | `jsonl` | No | [train_celestial_fox.json](https://drive.google.com/file/d/19x6sOPXNWoZj9Jo989h09wd4IJ6Su9by/view?usp=drive_link) |
| `single-label-classification-finetune-input` | A file containing text and a single label (class) for each text | `text:string label:string` | You must include 40 valid train examples, with five examples per label. A label cannot be present in all examples There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `csv` and `jsonl` | No | [Art classification file](https://drive.google.com/file/d/15-CchSiALUQwto4b-yAMWhdUqz8vfwQ1/view?usp=drive_link) |
| `multi-label-classification-finetune-input` | A file containing text and an array of label(s) (class) for each text | `text:string label:list[string]` | You must include 40 valid train examples, with five examples per label. A label cannot be present in all examples. There must be 24 valid evaluation examples. | Classification Fine-tuning | Supported | `jsonl` | No | n/a |
| `reranker-finetune-input` | A file containing queries and an array of passages relevant to the query. There must also be "hard negatives", passages semantically similar but ultimately not relevant. | `query:string relevant_passages:list[string] hard_negatives:list[string]` | There must be 256 train examples and at least 64 evaluation examples. There must be at least one relevant passage, with no overlap between relevant passage and hard negatives. | Rerank Fine-tuning | Supported | `jsonl` | No | [train_valid.json](https://drive.google.com/file/d/1CmXWfQRedVyWBDCsSkeF9g8gyqmpUA7C/view?usp=drive_link) |
| `chat-finetune-input` | A file containing conversations | `messages: list[{role: string, content: string}]` | There must be two valid train examples and one valid evaluation example. | Chat Fine-tuning | In progress/not supported | `jsonl` | No | [train_celestial_fox.json](https://drive.google.com/file/d/19x6sOPXNWoZj9Jo989h09wd4IJ6Su9by/view?usp=drive_link) |
| `embed-input` | A file containing text to be embedded | `text:string` | None of the rows in the file can be empty. | Embed job | Supported | `csv` and `jsonl` | Yes | [embed_jobs_sample_data.jsonl](https://raw.githubusercontent.com/cohere-ai/notebooks/main/notebooks/data/embed_jobs_sample_data.jsonl) / [embed_jobs_sample_data.csv](https://github.com/cohere-ai/notebooks/blob/main/notebooks/data/embed_jobs_sample_data.csv) |


Expand Down
Loading