diff --git a/docs/source/ko/_toctree.yml b/docs/source/ko/_toctree.yml
index f7a5f640107526..e955fae4ea9c3f 100644
--- a/docs/source/ko/_toctree.yml
+++ b/docs/source/ko/_toctree.yml
@@ -87,8 +87,8 @@
       title: 🤗 Tokenizers 라이브러리에서 토크나이저 사용하기
     - local: multilingual
       title: 다국어 모델 추론하기
-    - local: in_translation
-      title: (번역중) Customize text generation strategy
+    - local: generation_strategies
+      title: 텍스트 생성 전략 사용자 정의
     - local: create_a_model
       title: 모델별 API 사용하기
     - local: custom_models
diff --git a/docs/source/ko/generation_strategies.md b/docs/source/ko/generation_strategies.md
new file mode 100644
index 00000000000000..fd7b9bf905aa0a
--- /dev/null
+++ b/docs/source/ko/generation_strategies.md
@@ -0,0 +1,337 @@

# Text generation strategies[[text-generation-strategies]]

Text generation is essential to many NLP tasks, such as open-ended text generation, summarization, and translation. It also plays a role in a variety of mixed-modality applications that have text as an output, such as speech-to-text and vision-to-text. Some of the models that can generate text include GPT2, XLNet, OpenAI GPT, CTRL, TransformerXL, XLM, Bart, T5, GIT, and Whisper.

Check out a few examples that use the [`~transformers.generation_utils.GenerationMixin.generate`] method to produce text outputs for different tasks:
* [Text summarization](./tasks/summarization#inference)
* [Image captioning](./model_doc/git#transformers.GitForCausalLM.forward.example)
* [Audio transcription](./model_doc/whisper#transformers.WhisperForConditionalGeneration.forward.example)

Note that the inputs to the `generate` method depend on the model's modality. They are returned by the model's preprocessor class, such as AutoTokenizer or AutoProcessor. If a model's preprocessor creates more than one kind of input, pass all of them to `generate()`. You can learn more about each model's preprocessor in the corresponding model's documentation.

The process of selecting output tokens to generate text is known as decoding, and you can customize the decoding strategy that the `generate()` method will use. Modifying a decoding strategy does not change the values of any trainable parameters, but it can have a noticeable impact on the quality of the generated output. It can help reduce repetition in the text and make it more coherent.

This guide describes:
* the default generation configuration
* common decoding strategies and their main parameters
* saving and sharing custom generation configurations with your fine-tuned model on the 🤗 Hub

## Default text generation configuration[[default-text-generation-configuration]]

A decoding strategy for a model is defined in its generation configuration. When using pre-trained models for inference within a [`pipeline`], the model calls the `PreTrainedModel.generate()` method, which applies a default generation configuration under the hood. The default configuration is also used when no custom configuration has been saved with the model.

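As a quick illustration, a bare [`pipeline`] call relies on exactly this mechanism. The sketch below reuses the checkpoint shown later in this guide; any text-generation checkpoint would do:

```python
>>> from transformers import pipeline

>>> generator = pipeline("text-generation", model="distilbert/distilgpt2")
>>> # Under the hood, the pipeline calls `model.generate()` with the model's default generation config
>>> generator("I look forward to")  # doctest: +SKIP
```
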
+ +๋ชจ๋ธ์„ ๋ช…์‹œ์ ์œผ๋กœ ๋กœ๋“œํ•  ๋•Œ, `model.generation_config`์„ ํ†ตํ•ด ์ œ๊ณต๋˜๋Š” ์ƒ์„ฑ ์„ค์ •์„ ๊ฒ€์‚ฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. + +```python +>>> from transformers import AutoModelForCausalLM + +>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2") +>>> model.generation_config +GenerationConfig { + "bos_token_id": 50256, + "eos_token_id": 50256, +} +``` + + `model.generation_config`๋ฅผ ์ถœ๋ ฅํ•˜๋ฉด ๊ธฐ๋ณธ ์„ค์ •๊ณผ ๋‹ค๋ฅธ ๊ฐ’๋“ค๋งŒ ํ‘œ์‹œ๋˜๊ณ , ๊ธฐ๋ณธ๊ฐ’๋“ค์€ ๋‚˜์—ด๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. + +๊ธฐ๋ณธ ์ƒ์„ฑ ์„ค์ •์€ ์ž…๋ ฅ ํ”„๋กฌํ”„ํŠธ์™€ ์ถœ๋ ฅ์„ ํ•ฉ์นœ ์ตœ๋Œ€ ํฌ๊ธฐ๋ฅผ 20 ํ† ํฐ์œผ๋กœ ์ œํ•œํ•˜์—ฌ ๋ฆฌ์†Œ์Šค ๋ถ€์กฑ์„ ๋ฐฉ์ง€ํ•ฉ๋‹ˆ๋‹ค. ๊ธฐ๋ณธ ๋””์ฝ”๋”ฉ ์ „๋žต์€ ํƒ์š• ํƒ์ƒ‰(greedy search)์œผ๋กœ, ๋‹ค์Œ ํ† ํฐ์œผ๋กœ ๊ฐ€์žฅ ๋†’์€ ํ™•๋ฅ ์„ ๊ฐ€์ง„ ํ† ํฐ์„ ์„ ํƒํ•˜๋Š” ๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ๋””์ฝ”๋”ฉ ์ „๋žต์ž…๋‹ˆ๋‹ค. ๋งŽ์€ ์ž‘์—…๊ณผ ์ž‘์€ ์ถœ๋ ฅ ํฌ๊ธฐ์— ๋Œ€ํ•ด์„œ๋Š” ์ด ๋ฐฉ๋ฒ•์ด ์ž˜ ์ž‘๋™ํ•˜์ง€๋งŒ, ๋” ๊ธด ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•  ๋•Œ ์‚ฌ์šฉํ•˜๋ฉด ๋งค์šฐ ๋ฐ˜๋ณต์ ์ธ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•˜๊ฒŒ ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. + +## ํ…์ŠคํŠธ ์ƒ์„ฑ ์‚ฌ์šฉ์ž ์ •์˜[[customize-text-generation]] + +ํŒŒ๋ผ๋ฏธํ„ฐ์™€ ํ•ด๋‹น ๊ฐ’์„ [`generate`] ๋ฉ”์†Œ๋“œ์— ์ง์ ‘ ์ „๋‹ฌํ•˜์—ฌ `generation_config`์„ ์žฌ์ •์˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: + +```python +>>> my_model.generate(**inputs, num_beams=4, do_sample=True) # doctest: +SKIP +``` + +๊ธฐ๋ณธ ๋””์ฝ”๋”ฉ ์ „๋žต์ด ๋Œ€๋ถ€๋ถ„์˜ ์ž‘์—…์— ์ž˜ ์ž‘๋™ํ•œ๋‹ค ํ•˜๋”๋ผ๋„, ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ๋ช‡ ๊ฐ€์ง€ ํŒŒ๋ผ๋ฏธํ„ฐ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ์กฐ์ •๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ์—๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒƒ๋“ค์ด ํฌํ•จ๋ฉ๋‹ˆ๋‹ค: + +- `max_new_tokens`: ์ƒ์„ฑํ•  ์ตœ๋Œ€ ํ† ํฐ ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ฆ‰, ํ”„๋กฌํ”„ํŠธ์— ์žˆ๋Š” ํ† ํฐ์„ ์ œ์™ธํ•œ ์ถœ๋ ฅ ์‹œํ€€์Šค์˜ ํฌ๊ธฐ์ž…๋‹ˆ๋‹ค. ์ถœ๋ ฅ์˜ ๊ธธ์ด๋ฅผ ์ค‘๋‹จ ๊ธฐ์ค€์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์‹ , ์ „์ฒด ์ƒ์„ฑ๋ฌผ์ด ์ผ์ • ์‹œ๊ฐ„์„ ์ดˆ๊ณผํ•  ๋•Œ ์ƒ์„ฑ์„ ์ค‘๋‹จํ•˜๊ธฐ๋กœ ์„ ํƒํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๋” ์•Œ์•„๋ณด๋ ค๋ฉด [`StoppingCriteria`]๋ฅผ ํ™•์ธํ•˜์„ธ์š”. +- `num_beams`: 1๋ณด๋‹ค ํฐ ์ˆ˜์˜ ๋น”์„ ์ง€์ •ํ•จ์œผ๋กœ์จ, ํƒ์š• ํƒ์ƒ‰(greedy search)์—์„œ ๋น” ํƒ์ƒ‰(beam search)์œผ๋กœ ์ „ํ™˜ํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์ด ์ „๋žต์€ ๊ฐ ์‹œ๊ฐ„ ๋‹จ๊ณ„์—์„œ ์—ฌ๋Ÿฌ ๊ฐ€์„ค์„ ํ‰๊ฐ€ํ•˜๊ณ  ๊ฒฐ๊ตญ ์ „์ฒด ์‹œํ€€์Šค์— ๋Œ€ํ•ด ๊ฐ€์žฅ ๋†’์€ ํ™•๋ฅ ์„ ๊ฐ€์ง„ ๊ฐ€์„ค์„ ์„ ํƒํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์ดˆ๊ธฐ ํ† ํฐ์˜ ํ™•๋ฅ ์ด ๋‚ฎ์•„ ํƒ์š• ํƒ์ƒ‰์— ์˜ํ•ด ๋ฌด์‹œ๋˜์—ˆ์„ ๋†’์€ ํ™•๋ฅ ์˜ ์‹œํ€€์Šค๋ฅผ ์‹๋ณ„ํ•  ์ˆ˜ ์žˆ๋Š” ์žฅ์ ์„ ๊ฐ€์ง‘๋‹ˆ๋‹ค. +- `do_sample`: ์ด ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ `True`๋กœ ์„ค์ •ํ•˜๋ฉด, ๋‹คํ•ญ ์ƒ˜ํ”Œ๋ง, ๋น” ํƒ์ƒ‰ ๋‹คํ•ญ ์ƒ˜ํ”Œ๋ง, Top-K ์ƒ˜ํ”Œ๋ง ๋ฐ Top-p ์ƒ˜ํ”Œ๋ง๊ณผ ๊ฐ™์€ ๋””์ฝ”๋”ฉ ์ „๋žต์„ ํ™œ์„ฑํ™”ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ „๋žต๋“ค์€ ์ „์ฒด ์–ดํœ˜์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ์—์„œ ๋‹ค์Œ ํ† ํฐ์„ ์„ ํƒํ•˜๋ฉฐ, ์ „๋žต๋ณ„๋กœ ํŠน์ • ์กฐ์ •์ด ์ ์šฉ๋ฉ๋‹ˆ๋‹ค. +- `num_return_sequences`: ๊ฐ ์ž…๋ ฅ์— ๋Œ€ํ•ด ๋ฐ˜ํ™˜ํ•  ์‹œํ€€์Šค ํ›„๋ณด์˜ ์ˆ˜์ž…๋‹ˆ๋‹ค. ์ด ์˜ต์…˜์€ ๋น” ํƒ์ƒ‰(beam search)์˜ ๋ณ€ํ˜•๊ณผ ์ƒ˜ํ”Œ๋ง๊ณผ ๊ฐ™์ด ์—ฌ๋Ÿฌ ์‹œํ€€์Šค ํ›„๋ณด๋ฅผ ์ง€์›ํ•˜๋Š” ๋””์ฝ”๋”ฉ ์ „๋žต์—๋งŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํƒ์š• ํƒ์ƒ‰(greedy search)๊ณผ ๋Œ€์กฐ ํƒ์ƒ‰(contrastive search) ๊ฐ™์€ ๋””์ฝ”๋”ฉ ์ „๋žต์€ ๋‹จ์ผ ์ถœ๋ ฅ ์‹œํ€€์Šค๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. + +## ๋ชจ๋ธ์— ์‚ฌ์šฉ์ž ์ •์˜ ๋””์ฝ”๋”ฉ ์ „๋žต ์ €์žฅ[[save-a-custom-decoding-strategy-with-your-model]] + +ํŠน์ • ์ƒ์„ฑ ์„ค์ •์„ ๊ฐ€์ง„ ๋ฏธ์„ธ ์กฐ์ •๋œ ๋ชจ๋ธ์„ ๊ณต์œ ํ•˜๊ณ ์ž ํ•  ๋•Œ, ๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ๋”ฐ๋ฅผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: +* [`GenerationConfig`] ํด๋ž˜์Šค ์ธ์Šคํ„ด์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. +* ๋””์ฝ”๋”ฉ ์ „๋žต ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. 
+* ์ƒ์„ฑ ์„ค์ •์„ [`GenerationConfig.save_pretrained`]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅํ•˜๋ฉฐ, `config_file_name` ์ธ์ž๋Š” ๋น„์›Œ๋‘ก๋‹ˆ๋‹ค. +* ๋ชจ๋ธ์˜ ์ €์žฅ์†Œ์— ์„ค์ •์„ ์—…๋กœ๋“œํ•˜๊ธฐ ์œ„ํ•ด `push_to_hub`๋ฅผ `True`๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค. + +```python +>>> from transformers import AutoModelForCausalLM, GenerationConfig + +>>> model = AutoModelForCausalLM.from_pretrained("my_account/my_model") # doctest: +SKIP +>>> generation_config = GenerationConfig( +... max_new_tokens=50, do_sample=True, top_k=50, eos_token_id=model.config.eos_token_id +... ) +>>> generation_config.save_pretrained("my_account/my_model", push_to_hub=True) # doctest: +SKIP +``` + +๋‹จ์ผ ๋””๋ ‰ํ† ๋ฆฌ์— ์—ฌ๋Ÿฌ ์ƒ์„ฑ ์„ค์ •์„ ์ €์žฅํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด๋•Œ [`GenerationConfig.save_pretrained`]์˜ `config_file_name` ์ธ์ž๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋‚˜์ค‘์— [`GenerationConfig.from_pretrained`]๋กœ ์ด๋“ค์„ ์ธ์Šคํ„ด์Šคํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ๋‹จ์ผ ๋ชจ๋ธ์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ ์ƒ์„ฑ ์„ค์ •์„ ์ €์žฅํ•˜๊ณ  ์‹ถ์„ ๋•Œ ์œ ์šฉํ•ฉ๋‹ˆ๋‹ค(์˜ˆ: ์ƒ˜ํ”Œ๋ง์„ ์ด์šฉํ•œ ์ฐฝ์˜์  ํ…์ŠคํŠธ ์ƒ์„ฑ์„ ์œ„ํ•œ ํ•˜๋‚˜, ๋น” ํƒ์ƒ‰์„ ์ด์šฉํ•œ ์š”์•ฝ์„ ์œ„ํ•œ ๋‹ค๋ฅธ ํ•˜๋‚˜ ๋“ฑ). ๋ชจ๋ธ์— ์„ค์ • ํŒŒ์ผ์„ ์ถ”๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด ์ ์ ˆํ•œ Hub ๊ถŒํ•œ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. + +```python +>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig + +>>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") +>>> model = AutoModelForSeq2SeqLM.from_pretrained("google-t5/t5-small") + +>>> translation_generation_config = GenerationConfig( +... num_beams=4, +... early_stopping=True, +... decoder_start_token_id=0, +... eos_token_id=model.config.eos_token_id, +... pad_token=model.config.pad_token_id, +... ) + +>>> # ํŒ: Hub์— pushํ•˜๋ ค๋ฉด `push_to_hub=True`๋ฅผ ์ถ”๊ฐ€ +>>> translation_generation_config.save_pretrained("/tmp", "translation_generation_config.json") + +>>> # ๋ช…๋ช…๋œ ์ƒ์„ฑ ์„ค์ • ํŒŒ์ผ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ƒ์„ฑ์„ ๋งค๊ฐœ๋ณ€์ˆ˜ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. +>>> generation_config = GenerationConfig.from_pretrained("/tmp", "translation_generation_config.json") +>>> inputs = tokenizer("translate English to French: Configuration files are easy to use!", return_tensors="pt") +>>> outputs = model.generate(**inputs, generation_config=generation_config) +>>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True)) +['Les fichiers de configuration sont faciles ร  utiliser!'] +``` + +## ์ŠคํŠธ๋ฆฌ๋ฐ[[streaming]] + +`generate()` ๋ฉ”์†Œ๋“œ๋Š” `streamer` ์ž…๋ ฅ์„ ํ†ตํ•ด ์ŠคํŠธ๋ฆฌ๋ฐ์„ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. `streamer` ์ž…๋ ฅ์€ `put()`๊ณผ `end()` ๋ฉ”์†Œ๋“œ๋ฅผ ๊ฐ€์ง„ ํด๋ž˜์Šค์˜ ์ธ์Šคํ„ด์Šค์™€ ํ˜ธํ™˜๋ฉ๋‹ˆ๋‹ค. ๋‚ด๋ถ€์ ์œผ๋กœ, `put()`์€ ์ƒˆ ํ† ํฐ์„ ์ถ”๊ฐ€ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋˜๋ฉฐ, `end()`๋Š” ํ…์ŠคํŠธ ์ƒ์„ฑ์˜ ๋์„ ํ‘œ์‹œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. + + + +์ŠคํŠธ๋ฆฌ๋จธ ํด๋ž˜์Šค์˜ API๋Š” ์•„์ง ๊ฐœ๋ฐœ ์ค‘์ด๋ฉฐ, ํ–ฅํ›„ ๋ณ€๊ฒฝ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. + + + +์‹ค์ œ๋กœ ๋‹ค์–‘ํ•œ ๋ชฉ์ ์„ ์œ„ํ•ด ์ž์ฒด ์ŠคํŠธ๋ฆฌ๋ฐ ํด๋ž˜์Šค๋ฅผ ๋งŒ๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! ๋˜ํ•œ, ๊ธฐ๋ณธ์ ์ธ ์ŠคํŠธ๋ฆฌ๋ฐ ํด๋ž˜์Šค๋“ค๋„ ์ค€๋น„๋˜์–ด ์žˆ์–ด ๋ฐ”๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 
์˜ˆ๋ฅผ ๋“ค์–ด, [`TextStreamer`] ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ `generate()`์˜ ์ถœ๋ ฅ์„ ํ™”๋ฉด์— ํ•œ ๋‹จ์–ด์”ฉ ์ŠคํŠธ๋ฆฌ๋ฐํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค: + +```python +>>> from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer + +>>> tok = AutoTokenizer.from_pretrained("openai-community/gpt2") +>>> model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2") +>>> inputs = tok(["An increasing sequence: one,"], return_tensors="pt") +>>> streamer = TextStreamer(tok) + +>>> # ์ŠคํŠธ๋ฆฌ๋จธ๋Š” ํ‰์†Œ์™€ ๊ฐ™์€ ์ถœ๋ ฅ๊ฐ’์„ ๋ฐ˜ํ™˜ํ•  ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ƒ์„ฑ๋œ ํ…์ŠคํŠธ๋„ ํ‘œ์ค€ ์ถœ๋ ฅ(stdout)์œผ๋กœ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. +>>> _ = model.generate(**inputs, streamer=streamer, max_new_tokens=20) +An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven, +``` + +## ๋””์ฝ”๋”ฉ ์ „๋žต[[decoding-strategies]] + +`generate()` ๋งค๊ฐœ๋ณ€์ˆ˜์™€ ๊ถ๊ทน์ ์œผ๋กœ `generation_config`์˜ ํŠน์ • ์กฐํ•ฉ์„ ์‚ฌ์šฉํ•˜์—ฌ ํŠน์ • ๋””์ฝ”๋”ฉ ์ „๋žต์„ ํ™œ์„ฑํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๊ฐœ๋…์ด ์ฒ˜์Œ์ด๋ผ๋ฉด, ํ”ํžˆ ์‚ฌ์šฉ๋˜๋Š” ๋””์ฝ”๋”ฉ ์ „๋žต์ด ์–ด๋–ป๊ฒŒ ์ž‘๋™ํ•˜๋Š”์ง€ ์„ค๋ช…ํ•˜๋Š” [์ด ๋ธ”๋กœ๊ทธ ํฌ์ŠคํŠธ](https://huggingface.co/blog/how-to-generate)๋ฅผ ์ฝ์–ด๋ณด๋Š” ๊ฒƒ์„ ์ถ”์ฒœํ•ฉ๋‹ˆ๋‹ค. + +์—ฌ๊ธฐ์„œ๋Š” ๋””์ฝ”๋”ฉ ์ „๋žต์„ ์ œ์–ดํ•˜๋Š” ๋ช‡ ๊ฐ€์ง€ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ๋ณด์—ฌ์ฃผ๊ณ , ์ด๋ฅผ ์–ด๋–ป๊ฒŒ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. + +### ํƒ์š• ํƒ์ƒ‰(Greedy Search)[[greedy-search]] + +[`generate`]๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ ํƒ์š• ํƒ์ƒ‰ ๋””์ฝ”๋”ฉ์„ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ ์ด๋ฅผ ํ™œ์„ฑํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ณ„๋„์˜ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ง€์ •ํ•  ํ•„์š”๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค. ์ด๋Š” `num_beams`๊ฐ€ 1๋กœ ์„ค์ •๋˜๊ณ  `do_sample=False`๋กœ ๋˜์–ด ์žˆ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค." + +```python +>>> from transformers import AutoModelForCausalLM, AutoTokenizer + +>>> prompt = "I look forward to" +>>> checkpoint = "distilbert/distilgpt2" + +>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint) +>>> inputs = tokenizer(prompt, return_tensors="pt") + +>>> model = AutoModelForCausalLM.from_pretrained(checkpoint) +>>> outputs = model.generate(**inputs) +>>> tokenizer.batch_decode(outputs, skip_special_tokens=True) +['I look forward to seeing you all again!\n\n\n\n\n\n\n\n\n\n\n'] +``` + +### ๋Œ€์กฐ ํƒ์ƒ‰(Contrastive search)[[contrastive-search]] + +2022๋…„ ๋…ผ๋ฌธ [A Contrastive Framework for Neural Text Generation](https://arxiv.org/abs/2202.06417)์—์„œ ์ œ์•ˆ๋œ ๋Œ€์กฐ ํƒ์ƒ‰ ๋””์ฝ”๋”ฉ ์ „๋žต์€ ๋ฐ˜๋ณต๋˜์ง€ ์•Š์œผ๋ฉด์„œ๋„ ์ผ๊ด€๋œ ๊ธด ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐ ์žˆ์–ด ์šฐ์ˆ˜ํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. ๋Œ€์กฐ ํƒ์ƒ‰์ด ์ž‘๋™ํ•˜๋Š” ๋ฐฉ์‹์„ ์•Œ์•„๋ณด๋ ค๋ฉด [์ด ๋ธ”๋กœ๊ทธ ํฌ์ŠคํŠธ](https://huggingface.co/blog/introducing-csearch)๋ฅผ ํ™•์ธํ•˜์„ธ์š”. ๋Œ€์กฐ ํƒ์ƒ‰์˜ ๋™์ž‘์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๊ณ  ์ œ์–ดํ•˜๋Š” ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ๋งค๊ฐœ๋ณ€์ˆ˜๋Š” `penalty_alpha`์™€ `top_k`์ž…๋‹ˆ๋‹ค: + +```python +>>> from transformers import AutoTokenizer, AutoModelForCausalLM + +>>> checkpoint = "openai-community/gpt2-large" +>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint) +>>> model = AutoModelForCausalLM.from_pretrained(checkpoint) + +>>> prompt = "Hugging Face Company is" +>>> inputs = tokenizer(prompt, return_tensors="pt") + +>>> outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=100) +>>> tokenizer.batch_decode(outputs, skip_special_tokens=True) +['Hugging Face Company is a family owned and operated business. 
### Multinomial sampling[[multinomial-sampling]]

As opposed to greedy search, which always chooses the token with the highest probability as the next token, multinomial sampling (also called ancestral sampling) randomly selects the next token based on the probability distribution over the entire vocabulary given by the model. Every token with a non-zero probability has a chance of being selected, which reduces the risk of repetition.

To enable multinomial sampling, set `do_sample=True` and `num_beams=1`.

```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed
>>> set_seed(0)  # for reproducibility

>>> checkpoint = "openai-community/gpt2-large"
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)

>>> prompt = "Today was an amazing day because"
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_new_tokens=100)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Today was an amazing day because when you go to the World Cup and you don\'t, or when you don\'t get invited,
that\'s a terrible feeling."']
```

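The `top_k` and `top_p` parameters mentioned earlier pair naturally with multinomial sampling: they truncate the vocabulary distribution before each draw. A minimal sketch, reusing the model, tokenizer, and inputs from the example above (the output varies from run to run):

```python
>>> # Keep only the 50 most likely tokens, then the smallest subset of those
>>> # whose cumulative probability exceeds 0.9, and sample from that subset
>>> outputs = model.generate(**inputs, do_sample=True, top_k=50, top_p=0.9, max_new_tokens=100)  # doctest: +SKIP
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)  # doctest: +SKIP
```
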
### Beam-search decoding[[beam-search-decoding]]

Unlike greedy search, beam-search decoding keeps several hypotheses at each time step and eventually chooses the hypothesis with the overall highest probability for the entire sequence. This has the advantage of identifying high-probability sequences that start with lower-probability initial tokens and would have been ignored by greedy search.

To enable this decoding strategy, specify `num_beams` (i.e. the number of hypotheses to keep track of) to be greater than 1.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> prompt = "It is astonishing how one can"
>>> checkpoint = "openai-community/gpt2-medium"

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)

>>> outputs = model.generate(**inputs, num_beams=5, max_new_tokens=50)
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['It is astonishing how one can have such a profound impact on the lives of so many people in such a short period of
time."\n\nHe added: "I am very proud of the work I have been able to do in the last few years.\n\n"I have']
```

### Beam-search multinomial sampling[[beam-search-multinomial-sampling]]

As the name implies, this decoding strategy combines beam search with multinomial sampling. To use it, specify `num_beams` greater than 1 and set `do_sample=True`.

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, set_seed
>>> set_seed(0)  # for reproducibility

>>> prompt = "translate English to German: The house is wonderful."
>>> checkpoint = "google-t5/t5-small"

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

>>> outputs = model.generate(**inputs, num_beams=5, do_sample=True)
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Das Haus ist wunderbar.'
```

### Diverse beam search decoding[[diverse-beam-search-decoding]]

The diverse beam search decoding strategy is an extension of beam search that allows for generating a more diverse set of beam sequences to choose from. To learn how it works, refer to [Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models](https://arxiv.org/pdf/1610.02424.pdf). This approach has three main parameters: `num_beams`, `num_beam_groups`, and `diversity_penalty`. The diversity penalty ensures the outputs are distinct across groups, and beam search is used within each group.

```python
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

>>> checkpoint = "google/pegasus-xsum"
>>> prompt = (
...     "The Permaculture Design Principles are a set of universal design principles "
...     "that can be applied to any location, climate and culture, and they allow us to design "
...     "the most efficient and sustainable human habitation and food production systems. "
...     "Permaculture is a design system that encompasses a wide variety of disciplines, such "
...     "as ecology, landscape design, environmental science and energy conservation, and the "
...     "Permaculture design principles are drawn from these various disciplines. Each individual "
...     "design principle itself embodies a complete conceptual framework based on sound "
...     "scientific principles. When we bring all these separate principles together, we can "
...     "create a design system that both looks at whole systems, the parts that these systems "
...     "consist of, and how those parts interact with each other to create a complex, dynamic, "
...     "living system. Each design principle serves as a tool that allows us to integrate all "
...     "the separate parts of a design, referred to as elements, into a functional, synergistic, "
...     "whole system, where the elements harmoniously interact and work together in the most "
...     "efficient way possible."
... )

>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> inputs = tokenizer(prompt, return_tensors="pt")

>>> model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

>>> outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30, diversity_penalty=1.0)
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'The Design Principles are a set of universal design principles that can be applied to any location, climate and
culture, and they allow us to design the'
```

This guide illustrates the main parameters that enable the various decoding strategies. More advanced parameters exist for the [`generate`] method, which give you even further control over its behavior. For the complete list of available parameters, refer to the [API documentation](./main_classes/text_generation.md).

+ +### ์ถ”๋ก  ๋””์ฝ”๋”ฉ(Speculative Decoding)[[speculative-decoding]] + +์ถ”๋ก  ๋””์ฝ”๋”ฉ(๋ณด์กฐ ๋””์ฝ”๋”ฉ(assisted decoding)์œผ๋กœ๋„ ์•Œ๋ ค์ง)์€ ๋™์ผํ•œ ํ† ํฌ๋‚˜์ด์ €๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ›จ์”ฌ ์ž‘์€ ๋ณด์กฐ ๋ชจ๋ธ์„ ํ™œ์šฉํ•˜์—ฌ ๋ช‡ ๊ฐ€์ง€ ํ›„๋ณด ํ† ํฐ์„ ์ƒ์„ฑํ•˜๋Š” ์ƒ์œ„ ๋ชจ๋ธ์˜ ๋””์ฝ”๋”ฉ ์ „๋žต์„ ์ˆ˜์ •ํ•œ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ฃผ ๋ชจ๋ธ์€ ๋‹จ์ผ ์ „๋ฐฉ ํ†ต๊ณผ๋กœ ํ›„๋ณด ํ† ํฐ์„ ๊ฒ€์ฆํ•จ์œผ๋กœ์จ ๋””์ฝ”๋”ฉ ๊ณผ์ •์„ ๊ฐ€์†ํ™”ํ•ฉ๋‹ˆ๋‹ค. `do_sample=True`์ผ ๊ฒฝ์šฐ, [์ถ”๋ก  ๋””์ฝ”๋”ฉ ๋…ผ๋ฌธ](https://arxiv.org/pdf/2211.17192.pdf)์— ์†Œ๊ฐœ๋œ ํ† ํฐ ๊ฒ€์ฆ๊ณผ ์žฌ์ƒ˜ํ”Œ๋ง ๋ฐฉ์‹์ด ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. + +ํ˜„์žฌ, ํƒ์š• ๊ฒ€์ƒ‰(greedy search)๊ณผ ์ƒ˜ํ”Œ๋ง๋งŒ์ด ์ง€์›๋˜๋Š” ๋ณด์กฐ ๋””์ฝ”๋”ฉ(assisted decoding) ๊ธฐ๋Šฅ์„ ํ†ตํ•ด, ๋ณด์กฐ ๋””์ฝ”๋”ฉ์€ ๋ฐฐ์น˜ ์ž…๋ ฅ์„ ์ง€์›ํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ๋ณด์กฐ ๋””์ฝ”๋”ฉ์— ๋Œ€ํ•ด ๋” ์•Œ๊ณ  ์‹ถ๋‹ค๋ฉด, [์ด ๋ธ”๋กœ๊ทธ ํฌ์ŠคํŠธ](https://huggingface.co/blog/assisted-generation)๋ฅผ ํ™•์ธํ•ด ์ฃผ์„ธ์š”. + +๋ณด์กฐ ๋””์ฝ”๋”ฉ์„ ํ™œ์„ฑํ™”ํ•˜๋ ค๋ฉด ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ `assistant_model` ์ธ์ˆ˜๋ฅผ ์„ค์ •ํ•˜์„ธ์š”. + +```python +>>> from transformers import AutoModelForCausalLM, AutoTokenizer + +>>> prompt = "Alice and Bob" +>>> checkpoint = "EleutherAI/pythia-1.4b-deduped" +>>> assistant_checkpoint = "EleutherAI/pythia-160m-deduped" + +>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint) +>>> inputs = tokenizer(prompt, return_tensors="pt") + +>>> model = AutoModelForCausalLM.from_pretrained(checkpoint) +>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint) +>>> outputs = model.generate(**inputs, assistant_model=assistant_model) +>>> tokenizer.batch_decode(outputs, skip_special_tokens=True) +['Alice and Bob are sitting in a bar. Alice is drinking a beer and Bob is drinking a'] +``` + +์ƒ˜ํ”Œ๋ง ๋ฐฉ๋ฒ•๊ณผ ํ•จ๊ป˜ ๋ณด์กฐ ๋””์ฝ”๋”ฉ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ๋‹คํ•ญ ์ƒ˜ํ”Œ๋ง๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ `temperature` ์ธ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌด์ž‘์œ„์„ฑ์„ ์ œ์–ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ณด์กฐ ๋””์ฝ”๋”ฉ์—์„œ๋Š” `temperature`๋ฅผ ๋‚ฎ์ถ”๋ฉด ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. + +```python +>>> from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed +>>> set_seed(42) # ์žฌํ˜„์„ฑ์„ ์œ„ํ•ด + +>>> prompt = "Alice and Bob" +>>> checkpoint = "EleutherAI/pythia-1.4b-deduped" +>>> assistant_checkpoint = "EleutherAI/pythia-160m-deduped" + +>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint) +>>> inputs = tokenizer(prompt, return_tensors="pt") + +>>> model = AutoModelForCausalLM.from_pretrained(checkpoint) +>>> assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint) +>>> outputs = model.generate(**inputs, assistant_model=assistant_model, do_sample=True, temperature=0.5) +>>> tokenizer.batch_decode(outputs, skip_special_tokens=True) +['Alice and Bob are going to the same party. It is a small party, in a small'] +```