Commit: Expand tutorial 2

alanakbik committed Dec 7, 2024
1 parent fa07cb3 commit 22cc158
Showing 5 changed files with 32 additions and 4 deletions.
3 changes: 3 additions & 0 deletions docs/tutorial/tutorial-training/how-to-load-custom-dataset.md
@@ -159,3 +159,6 @@ example we chose `label_type='topic'` to denote that we are loading a corpus wi
 
 
 
+## Next
+
+Next, learn [how to train a sequence tagger](how-to-train-sequence-tagger.md).
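
The hunk above references the custom-dataset loading step of this tutorial. As a refresher, a minimal sketch of loading such a corpus with `label_type='topic'` (the data folder and file names are placeholders):

```python
from flair.data import Corpus
from flair.datasets import ClassificationCorpus

# placeholder folder containing train.txt, dev.txt and test.txt in FastText format
data_folder = '/path/to/data/folder'

# label_type='topic' denotes that this corpus carries topic labels
corpus: Corpus = ClassificationCorpus(data_folder,
                                      train_file='train.txt',
                                      dev_file='dev.txt',
                                      test_file='test.txt',
                                      label_type='topic')

# shows the train/dev/test split sizes
print(corpus)
```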
4 changes: 4 additions & 0 deletions docs/tutorial/tutorial-training/how-to-load-prepared-dataset.md
@@ -193,3 +193,7 @@ The following datasets are supported:
 | Universal Dependency Treebanks | [flair.datasets.treebanks](#flair.datasets.treebanks) |
 | OCR-Layout-NER | [flair.datasets.ocr](#flair.datasets.ocr) |
 
+
+## Next
+
+Next, learn how to load a [custom dataset](how-to-load-custom-dataset.md).
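
Any dataset from the table above loads in one line. A sketch using one of the Universal Dependency treebanks (the specific dataset is an example choice):

```python
import flair.datasets

# downloads and caches the English Universal Dependencies treebank on first use
corpus = flair.datasets.UD_ENGLISH()

# shows the train/dev/test split sizes
print(corpus)
```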
3 changes: 3 additions & 0 deletions docs/tutorial/tutorial-training/how-to-train-sequence-tagger.md
@@ -223,3 +223,6 @@ trainer.train('resources/taggers/example-universal-pos',
 This gives you a multilingual model. Try experimenting with more languages!
 
 
+## Next
+
+Next, learn [how to train a text classifier](how-to-train-text-classifier.md).
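
After training completes, the tagger saved under the path above can be reloaded and applied to sentences in any of the training languages. A sketch, assuming Flair's default `final-model.pt` save name:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load the multilingual POS tagger trained in the tutorial
tagger = SequenceTagger.load('resources/taggers/example-universal-pos/final-model.pt')

# predict POS tags for a German sentence
sentence = Sentence('Ich liebe Berlin.')
tagger.predict(sentence)
print(sentence)
```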
4 changes: 4 additions & 0 deletions docs/tutorial/tutorial-training/how-to-train-text-classifier.md
@@ -58,3 +58,7 @@ classifier.predict(sentence)
 print(sentence.labels)
 ```
 
+
+## Next
+
+Next, learn [how to train an entity linker](how-to-train-span-classifier.md).
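
For context around the `classifier.predict(sentence)` fragment above, a minimal end-to-end sketch (the model path and example sentence are placeholders):

```python
from flair.data import Sentence
from flair.models import TextClassifier

# load a trained text classifier from its save path (placeholder)
classifier = TextClassifier.load('resources/taggers/example-classifier/final-model.pt')

# predict the label of a new sentence and print it
sentence = Sentence('Who built the Eiffel Tower?')
classifier.predict(sentence)
print(sentence.labels)
```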
22 changes: 18 additions & 4 deletions docs/tutorial/tutorial-training/train-vs-fine-tune.md
@@ -14,23 +14,37 @@ Since in this case, the vast majority of parameters in the model is already trai
 model. This means: Very small learning rate (LR) and just a few epochs. You are essentially just minimally modifying
 the model to adapt it to the task you want to solve.
 
-Most models in Flair are trained using fine-tuning. So this is likely the approach you'll want to use.
+Use this method by calling [`ModelTrainer.fine_tune()`](#flair.trainers.ModelTrainer.fine_tune).
+Since most models in Flair were trained this way, this is likely the approach you'll want to use.
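
To make this concrete, a minimal fine-tuning sketch with `ModelTrainer.fine_tune()` (the corpus, transformer model, and output path are example choices):

```python
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# load a classification corpus and build its label dictionary
corpus = TREC_6()
label_dict = corpus.make_label_dictionary(label_type='question_class')

# fine_tune=True makes the transformer weights trainable
embeddings = TransformerDocumentEmbeddings('distilbert-base-uncased', fine_tune=True)
classifier = TextClassifier(embeddings, label_dictionary=label_dict, label_type='question_class')

# fine_tune() defaults to a very small learning rate and few epochs
trainer = ModelTrainer(classifier, corpus)
trainer.fine_tune('resources/taggers/example-classifier', learning_rate=5.0e-5, max_epochs=10)
```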


## Training

 On the other hand, you should use the classic training approach if the majority of the trainable parameters in your
-model is randomly initialized. This is essentially the "old way", before fine-tuning of transformers.
-
+model is randomly initialized. This can happen, for instance, if you freeze the weights of the pre-trained language
+model, leaving only the randomly initialized prediction head as trainable parameters. This training approach is also
+referred to as "feature-based" or "probing" in some papers.

Since the majority of parameters is randomly initialized, you need to fully train the model. This means: high learning
rate and many epochs.

Use this method by calling [`ModelTrainer.train()`](#flair.trainers.ModelTrainer.train).

+```{note}
+Another application of classic training is linear probing of pre-trained language models. In this scenario, you
+"freeze" the weights of the language model (meaning that they cannot be changed) and add a prediction head that is
+trained from scratch. So, even though a language model is involved, its parameters are not trainable. This means that
+all trainable parameters in this scenario are randomly initialized, therefore necessitating the use of the classic
+training approach.
+```
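
A sketch of this frozen-transformer ("probing") setup with `ModelTrainer.train()` (the task, model name, and hyperparameters are example choices):

```python
from flair.datasets import UD_ENGLISH
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load a corpus and build the label dictionary for universal POS tags
corpus = UD_ENGLISH()
label_dict = corpus.make_label_dictionary(label_type='upos')

# fine_tune=False freezes the transformer, so only the randomly
# initialized prediction head has trainable parameters
embeddings = TransformerWordEmbeddings('xlm-roberta-base', fine_tune=False)
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=label_dict,
                        tag_type='upos')

# classic training: high learning rate and many epochs
trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/example-probing', learning_rate=0.1, max_epochs=150)
```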


## Paper

-Our paper
+If you are interested in an experimental comparison of the two above-mentioned approaches, check out [our paper](https://arxiv.org/pdf/2011.06993)
 that compares fine-tuning to the feature-based approach.


+## Next
+
+Next, learn how to load a [training dataset](how-to-load-prepared-dataset.md).
