From bb51883e725e8593e37331658713726940536198 Mon Sep 17 00:00:00 2001
From: Bertrand Thia <56003053+bt2513@users.noreply.github.com>
Date: Mon, 22 Jul 2024 13:08:27 -0400
Subject: [PATCH] [RoBERTa] Minor clarifications to model doc (#31949)

* minor edits and clarifications

* address comment

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/model_doc/roberta.md | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/source/en/model_doc/roberta.md b/docs/source/en/model_doc/roberta.md
index 364b5b37e5f3f0..2a1843d8885abe 100644
--- a/docs/source/en/model_doc/roberta.md
+++ b/docs/source/en/model_doc/roberta.md
@@ -51,19 +51,19 @@ This model was contributed by [julien-c](https://huggingface.co/julien-c). The o
 
 ## Usage tips
 
-- This implementation is the same as [`BertModel`] with a tiny embeddings tweak as well as a setup
-  for Roberta pretrained models.
-- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
+- This implementation is the same as [`BertModel`] with a minor tweak to the embeddings, as well as a setup
+  for RoBERTa pretrained models.
+- RoBERTa has the same architecture as BERT but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
   different pretraining scheme.
-- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just
-  separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
-- Same as BERT with better pretraining tricks:
-
-    * dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all
-    * together to reach 512 tokens (so the sentences are in an order than may span several documents)
-    * train with larger batches
-    * use BPE with bytes as a subunit and not characters (because of unicode characters)
-- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to this page for usage examples.
+- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
+  separate your segments with the separation token `tokenizer.sep_token` (or `</s>`).
+- RoBERTa is similar to BERT but with better pretraining techniques:
+
+    * Dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all.
+    * Sentence packing: Sentences are packed together to reach 512 tokens (so the sentences are in an order that may span several documents).
+    * Larger batches: Training uses larger batches.
+    * Byte-level BPE vocabulary: Uses BPE with bytes as a subunit instead of characters, accommodating Unicode characters.
+- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to its model page for usage examples.
 
 ## Resources
 
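A minimal sketch (not part of the patch itself) of the tokenizer behavior described in the updated usage tips, assuming the `transformers` library and the public `roberta-base` checkpoint:

```python
from transformers import AutoTokenizer

# Load a RoBERTa tokenizer (byte-level BPE, same scheme as GPT-2).
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# RoBERTa's separation token is </s>.
print(tokenizer.sep_token)  # </s>

# Encoding a segment pair returns no `token_type_ids`; the two segments are
# simply joined with the separator token instead.
encoded = tokenizer("Hello world", "How are you?")
print(list(encoded.keys()))                    # ['input_ids', 'attention_mask']
print(tokenizer.decode(encoded["input_ids"]))  # <s>Hello world</s></s>How are you?</s>
```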