🤗 Transformers provides thousands of pretrained models to perform tasks such as classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. Our goal is to make cutting-edge NLP easy for everyone to use.
🤗 Transformers provides APIs to quickly download these pretrained models, use them on a given text, fine-tune them on your own data, and share them with the community on our model hub. In addition, each Python module that defines a model architecture is fully standalone, so it can be modified easily for research experiments.
🤗 Transformers supports the three most popular deep learning libraries (Jax, PyTorch, and TensorFlow), with seamless integration between them. You can train a model with one of these libraries and then load it for inference with another.
You can test most of our models directly on their pages on the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.
Here are a few examples:
- Masked word completion with BERT
- Named entity recognition with Electra
- Text generation with GPT-2
- Natural language inference with RoBERTa
- Summarization with BART
- Question answering with DistilBERT
- Translation with T5
Write With Transformer, built by the Hugging Face team, is the official demo of this repository's text generation capabilities.
To use a model on a given text right away, we provide the pipeline API. A pipeline groups together a pretrained model with the preprocessing that was applied when that model was trained. Here is a quick example that uses a pipeline to classify positive versus negative texts:
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
The second line of code downloads and caches the pretrained model used by the pipeline, and the third line evaluates it on the given text. Here the model judges the text to be positive with a confidence of 99.97%.
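As a small illustration of reading that output, the sketch below takes the result list printed above (copied here as a literal, so no model download is needed) and formats the score as a percentage. The variable names are only for illustration.

```python
# Result copied from the sentiment-analysis example above; in practice
# this list would come from calling the classifier on a text.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]

# The pipeline returns one dictionary per input text.
top = result[0]

# Format the score as a percentage with two decimals.
summary = f"{top['label']}: {top['score']:.2%}"
print(summary)  # POSITIVE: 99.97%
```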
Many NLP tasks can be run directly with a pipeline. For example, given a question and a context, we can easily extract the answer:
>>> from transformers import pipeline
# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
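In addition to the answer, the pretrained model used here returns its confidence score, along with the start and end positions of the answer (in the output above, these are character offsets into the context). You can learn more about the tasks supported by the pipeline API in this tutorial.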
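In the sample output above, start and end line up with character positions in the context string, which can be checked without running a model. The sketch below reuses the literal values from the example and assumes nothing beyond that printed output.

```python
# Context and result copied from the question-answering example above.
context = 'Pipeline has been included in the huggingface/transformers repository'
result = {
    'score': 0.30970096588134766,
    'start': 34,
    'end': 58,
    'answer': 'huggingface/transformers',
}

# Slice the context with the returned offsets to recover the answer span.
span = context[result['start']:result['end']]
print(span)  # huggingface/transformers
```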
Downloading and using a pretrained model for your task takes only three lines of code. Here is the PyTorch version:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
And here is the equivalent code for TensorFlow:
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
The tokenizer is responsible for all the preprocessing the pretrained model expects, and it can be called on a single string (as in the examples above) or on a list. It returns a dictionary that you can use in downstream code or pass directly to your model with the ** argument unpacking operator.
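The ** unpacking step can be illustrated without loading a model. In the sketch below, toy_model is a hypothetical stand-in for a model call, and the inputs dictionary imitates a tokenizer output using plain lists with illustrative values; neither is part of the library.

```python
# Hypothetical stand-in for a tokenizer output: a plain dict keyed by the
# argument names the model expects. The values are illustrative only.
inputs = {
    "input_ids": [[101, 7592, 2088, 999, 102]],
    "attention_mask": [[1, 1, 1, 1, 1]],
}


def toy_model(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model call.

    Returns the number of attended tokens per sequence.
    """
    if attention_mask is None:
        attention_mask = [[1] * len(row) for row in input_ids]
    return [sum(mask) for mask in attention_mask]


# Passing **inputs is equivalent to spelling out each keyword argument.
unpacked = toy_model(**inputs)
explicit = toy_model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(unpacked == explicit)  # True
```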
The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model, which you can use as usual. This tutorial explains how to integrate such a model into a standard PyTorch or TensorFlow training loop, or how to use the Trainer API to fine-tune it on new data.
- Easy-to-use state-of-the-art models:
  - High performance on NLU and NLG tasks.
  - Low barrier to entry for educators and practitioners.
  - Only three classes to learn before you can get started.
  - A unified API for using all our pretrained models.
- Lower compute costs, smaller carbon footprint:
  - Researchers can share trained models instead of always retraining them.
  - Practitioners can save the time and cost needed for training.
  - Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
- The right framework for every part of a model's lifetime:
  - Train state-of-the-art models in three lines of code.
  - Move a model between the TF2.0 and PyTorch frameworks at will.
  - Pick the right framework for each stage, such as training, evaluation, and production.
- Customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately kept at a modest level of abstraction, so that researchers can work with each model directly without digging through additional files.
- The training API is not intended to work on any model; it is optimized for the models provided by the library. For generic machine learning loops, use another library.
- We want to show as many use cases as possible, so we provide scripts in the examples folder. These scripts may not work out of the box on your specific problem, and you may need to change some of the code to fit your needs.
This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+, and TensorFlow 2.3+.
You should install 🤗 Transformers in a virtual environment. If you are unfamiliar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you are going to use and activate it.
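One possible way to create the environment from Python itself is the standard-library venv module, sketched below. The create_env helper and the .env directory name are illustrative assumptions, and activation still happens in your shell afterwards.

```python
import venv
from pathlib import Path


def create_env(path, with_pip=True):
    """Create a virtual environment at `path` and return its location.

    Illustrative helper: `with_pip=True` asks venv to bootstrap pip
    inside the new environment.
    """
    env_dir = Path(path)
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir
```

Calling create_env(".env") creates the environment in the current directory. On Linux and macOS it is then typically activated with source .env/bin/activate.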
Next, you need to install at least one of Flax, PyTorch, or TensorFlow. Refer to the TensorFlow installation page, the PyTorch installation page, and the Flax installation page for the installation command specific to your platform.
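To see which of these backends is already present in the active environment, a check along the following lines may help. It relies only on the standard library; flax, torch, and tensorflow are the usual top-level import names for the three backends, and the helper function is illustrative.

```python
import importlib.util

# Display name -> top-level import name for each supported backend.
BACKENDS = {"Flax": "flax", "PyTorch": "torch", "TensorFlow": "tensorflow"}


def installed_backends():
    """Return the display names of the backends that can be imported."""
    return [
        name
        for name, module in BACKENDS.items()
        if importlib.util.find_spec(module) is not None
    ]


found = installed_backends()
print("Installed backends:", ", ".join(found) if found else "none")
```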
Once at least one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
pip install transformers
If you would like to play with the examples, need the bleeding-edge code, or cannot wait for a new release, you must install the library from source.
Since Transformers version v4.0.0, we have a conda channel: huggingface.
🤗 Transformers can be installed using conda as follows:
conda install -c huggingface transformers
Follow the installation pages of Flax, PyTorch, or TensorFlow to see how to install them with conda.
All the model checkpoints provided by 🤗 Transformers are seamlessly integrated with the huggingface.co model hub, where individuals and organizations can upload them directly.
🤗 Transformers currently provides the following models (see here for a summary of each of them):
- ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
- BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
- BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
- BARTpho (from VinAI Research) released with the paper BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
- BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
- BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
- BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- BERTweet (from VinAI Research) released with the paper BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
- BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
- Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
- BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
- ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
- CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
- CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
- CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
- ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
- CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
- CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
- DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
- DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
- DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
- DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
- DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
- DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
- ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
- EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
- FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
- FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
- Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
- GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
- GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
- GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
- GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
- Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
- I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
- ImageGPT (from OpenAI) released with the paper Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
- LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
- LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
- LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
- LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
- LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
- LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
- M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
- MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
- MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
- MBart-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
- Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
- mLUKE (from Studio Ousia) released with the paper mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
- MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
- MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
- Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
- Perceiver IO (from Deepmind) released with the paper Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
- PhoBERT (from VinAI Research) released with the paper PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
- ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- QDQBert (from NVIDIA) released with the paper Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
- Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
- RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Fรฉvry, Henry Tsai, M. Johnson, Sebastian Ruder.
- RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
- RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
- SegFormer (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
- SEW (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SEW-D (from ASAPP) released with the paper Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
- SpeechToTextTransformer (from Facebook), released together with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
- SpeechToTextTransformer2 (from Facebook), released together with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
- Splinter (from Tel Aviv University), released together with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
- SqueezeBert (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
- T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
- TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
- Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- TrOCR (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
- UniSpeech (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
- UniSpeechSat (from Microsoft Research) released with the paper UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
- Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
- VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
- Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
- Wav2Vec2Phoneme (from Facebook AI) released with the paper Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli.
- WavLM (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
- XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
- XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
- XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
- XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- XLS-R (from Facebook AI) released with the paper XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
- XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
- Want to contribute a new model? We have a detailed guide and templates to help you through the process of adding a new model. You can find them in the templates folder of this repository. Be sure to check the contributing guidelines, and contact the maintainers or open an issue to collect feedback before opening your PR.
To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.
Section | Description |
---|---|
Documentation | Full API documentation and tutorials |
Task summary | Tasks supported by 🤗 Transformers |
Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the Trainer API |
Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks |
Model sharing and uploading | Upload and share your fine-tuned models with the community |
Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert |
If you would like to cite the 🤗 Transformers library, please cite this paper:
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}