[docs] Spanish translation of tokenizer_summary.md #31154

aaronjimv · 2024-05-31T01:19:24Z

What does this PR do?

Add the Spanish version of tokenizer_summary.md to transformers/docs/source/es.

Fix some broken links in the en/ version.

#28936

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@stevhliu

aaronjimv · 2024-05-31T01:26:08Z

cc: @tadeodonegana, @gisturiz

Hello guys! I always appreciate your help with the translation reviews, and I am open to any feedback. Thank you for your help.

HuggingFaceDocBuilderDev · 2024-05-31T16:03:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

stevhliu

LGTM, thanks for also fixing the links in the English version 🤗

docs/source/es/tokenizer_summary.md

tadeodonegana

Great work as always @aaronjimv, I just left one minor comment!

tadeodonegana · 2024-06-03T19:50:00Z

docs/source/es/tokenizer_summary.md

+
+<Youtube id="VFp38yj8h3A"/>
+
+Como vimos en [el tutorial de preprocessamiento](preprocessing), tokenizar un texto es dividirlo en palabras o subpalabras, que luego se convierten en indices o ids a través de una tabla de búsqueda. Convertir palabras o subpalabras en ids es sencillo, así que en esta descripción general, nos centraremos en dividir un texto en palabras o subpalabras (es decir, tokenizar un texto). Más específicamente, examinaremos los tres principales tipos de tokenizadores utilizados en 🤗 Transformers: [Byte-Pair Encoding (BPE)](#byte-pair-encoding), [WordPiece](#wordpiece) y [SentencePiece](#sentencepiece), y mostraremos ejemplos de qué tipo de tokenizador se utiliza en cada modelo.


I would change "preprocessamiento" to "preprocesamiento"; I believe this is a typo.

Hi @tadeodonegana, thanks for the comment! I appreciate the help.

stevhliu

Great job!

* add tokenizer_summary to es/_toctree.yml
* add tokenizer_summary to es/
* fix link to Transformes XL in en/
* translate until Subword tokenization section
* fix GPT link in en/
* fix other GPT link in en/
* fix typo in en/
* translate the doc
* run make fixup
* Remove .md in Transformer XL link
* fix some link issues in es/
* fix typo

aaronjimv added 8 commits May 30, 2024 10:29

add tokenizer_summary to es/_toctree.yml

cd2f30d

add tokenizer_summary to es/

2d1c1cf

fix link to Transformes XL in en/

5272c80

translate until Subword tokenization section

71b8561

fix GPT link in en/

28eba0a

fix other GPT link in en/

3f63289

fix typo in en/

dbb6295

translate the doc

90cacaa

aaronjimv added 3 commits May 30, 2024 20:53

Merge branch 'main' into translate_tokenizer_summary

2d9aebb

run make fixup

2ddebe8

Remove .md in Transformer XL link

15a6a82

stevhliu approved these changes May 31, 2024

docs/source/es/tokenizer_summary.md: three review threads (outdated, resolved)

aaronjimv added 2 commits May 31, 2024 14:03

Merge branch 'main' into translate_tokenizer_summary

dcd4ac7

fix some link issues in es/

fbc8fb7

tadeodonegana suggested changes Jun 3, 2024

aaronjimv requested a review from stevhliu June 3, 2024 20:15

tadeodonegana approved these changes Jun 3, 2024

stevhliu approved these changes Jun 3, 2024

stevhliu merged commit c73ee13 into huggingface:main Jun 3, 2024
8 checks passed

aaronjimv added 2 commits June 3, 2024 16:05

Merge branch 'main' into translate_tokenizer_summary

ec38286

fix typo

01e5f82

aaronjimv deleted the translate_tokenizer_summary branch June 12, 2024 13:16