Describe the bug
Model training quits after epoch 1 with a "learning rate too small - quitting training!" error message even though the "patience" parameter is set to 10.
To Reproduce
In Google Colab:

!pip install flair -qq

import os
from os import mkdir, listdir
from os.path import join, exists
import re
from torch.optim.adam import Adam
from flair.datasets import CSVClassificationCorpus
from flair.data import Corpus, Sentence
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

for embedding in ["distilbert-base-uncased"]:
    print("Training on", embedding)
    # 1a. define the column format indicating which columns contain the text and labels
    column_name_map = {1: "text", 2: "label"}
    # 1b. load the preprocessed training, development, and test sets
    corpus: Corpus = CSVClassificationCorpus(processed_dir,
                                             column_name_map,
                                             label_type="label",
                                             skip_header=True,
                                             delimiter='\t')
    # 2. create the label dictionary
    label_dict = corpus.make_label_dictionary(label_type="label")
    # 3. initialize the transformer document embeddings
    document_embeddings = TransformerDocumentEmbeddings(embedding,
                                                        fine_tune=True,
                                                        layers="all")
    # document_embeddings.tokenizer.pad_token = document_embeddings.tokenizer.eos_token
    # 4. create the text classifier
    classifier = TextClassifier(document_embeddings,
                                label_dictionary=label_dict,
                                label_type="label")
    # 5. initialize the trainer
    trainer = ModelTrainer(classifier, corpus)
    # 6. start the training
    trainer.train('model/' + embedding,
                  learning_rate=1e-5,
                  mini_batch_size=8,
                  max_epochs=3,
                  patience=10,
                  optimizer=Adam,
                  train_with_dev=False,
                  save_final_model=False)
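Note that processed_dir is not defined in the snippet above. A hypothetical definition is sketched below, under the assumption that the corpus folder holds tab-separated train/dev/test files whose columns match column_name_map:

# Hypothetical setup (not part of the original snippet): the folder is assumed to
# contain tab-separated files with a header row and columns id, text, label,
# matching column_name_map = {1: "text", 2: "label"}.
processed_dir = "processed"  # e.g. processed/train.tsv, processed/dev.tsv, processed/test.tsv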
Expected behavior
In this case, the model should be trained for 3 epochs without reducing the learning rate. In prior cases, even when a learning rate of 1e-5 was reduced by an anneal factor of 0.5, I did not receive a "learning rate too small - quitting training!" error message.
Logs and Stack traces
2024-03-18 14:11:51,783 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,786 Model: "TextClassifier(
(embeddings): TransformerDocumentEmbeddings(
(model): DistilBertModel(
(embeddings): Embeddings(
(word_embeddings): Embedding(30523, 768)
(position_embeddings): Embedding(512, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(transformer): Transformer(
(layer): ModuleList(
(0-5): 6 x TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
(activation): GELUActivation()
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
)
)
)
)
(decoder): Linear(in_features=5376, out_features=2, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(locked_dropout): LockedDropout(p=0.0)
(word_dropout): WordDropout(p=0.0)
(loss_function): CrossEntropyLoss()
(weights): None
(weight_tensor) None
)"
2024-03-18 14:11:51,787 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,789 Corpus: 8800 train + 2200 dev + 2200 test sentences
2024-03-18 14:11:51,793 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,794 Train: 8800 sentences
2024-03-18 14:11:51,795 (train_with_dev=False, train_with_test=False)
2024-03-18 14:11:51,799 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,802 Training Params:
2024-03-18 14:11:51,804 - learning_rate: "1e-05"
2024-03-18 14:11:51,806 - mini_batch_size: "8"
2024-03-18 14:11:51,807 - max_epochs: "3"
2024-03-18 14:11:51,812 - shuffle: "True"
2024-03-18 14:11:51,813 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,814 Plugins:
2024-03-18 14:11:51,816 - AnnealOnPlateau | patience: '10', anneal_factor: '0.5', min_learning_rate: '0.0001'
2024-03-18 14:11:51,817 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,818 Final evaluation on model from best epoch (best-model.pt)
2024-03-18 14:11:51,820 - metric: "('micro avg', 'f1-score')"
2024-03-18 14:11:51,821 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,823 Computation:
2024-03-18 14:11:51,825 - compute on device: cuda:0
2024-03-18 14:11:51,835 - embedding storage: cpu
2024-03-18 14:11:51,836 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,837 Model training base path: "model/distilbert-base-uncased"
2024-03-18 14:11:51,840 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:51,846 ----------------------------------------------------------------------------------------------------
2024-03-18 14:11:55,845 epoch 1 - iter 110/1100 - loss 0.57600509 - time (sec): 4.00 - samples/sec: 220.19 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:11:58,978 epoch 1 - iter 220/1100 - loss 0.50393908 - time (sec): 7.13 - samples/sec: 246.84 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:01,876 epoch 1 - iter 330/1100 - loss 0.46954644 - time (sec): 10.03 - samples/sec: 263.27 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:05,276 epoch 1 - iter 440/1100 - loss 0.44181235 - time (sec): 13.43 - samples/sec: 262.14 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:08,456 epoch 1 - iter 550/1100 - loss 0.41807515 - time (sec): 16.61 - samples/sec: 264.93 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:11,447 epoch 1 - iter 660/1100 - loss 0.40403758 - time (sec): 19.60 - samples/sec: 269.41 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:14,420 epoch 1 - iter 770/1100 - loss 0.38948912 - time (sec): 22.57 - samples/sec: 272.91 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:17,914 epoch 1 - iter 880/1100 - loss 0.38118810 - time (sec): 26.07 - samples/sec: 270.09 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:21,085 epoch 1 - iter 990/1100 - loss 0.37110791 - time (sec): 29.24 - samples/sec: 270.89 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:24,027 epoch 1 - iter 1100/1100 - loss 0.36139164 - time (sec): 32.18 - samples/sec: 273.47 - lr: 0.000010 - momentum: 0.000000
2024-03-18 14:12:24,030 ----------------------------------------------------------------------------------------------------
2024-03-18 14:12:24,032 EPOCH 1 done: loss 0.3614 - lr: 0.000010
2024-03-18 14:12:28,158 DEV : loss 0.28874295949935913 - f1-score (micro avg) 0.9095
2024-03-18 14:12:29,719 - 0 epochs without improvement
2024-03-18 14:12:29,721 ----------------------------------------------------------------------------------------------------
2024-03-18 14:12:29,723 learning rate too small - quitting training!
2024-03-18 14:12:29,725 ----------------------------------------------------------------------------------------------------
2024-03-18 14:12:29,727 Done.
2024-03-18 14:12:29,729 ----------------------------------------------------------------------------------------------------
2024-03-18 14:12:29,733 Testing using last state of model ...
2024-03-18 14:12:33,651
Results:
- F-score (micro) 0.9132
- F-score (macro) 0.9029
- Accuracy 0.9132
By class:
precision recall f1-score support
0 0.9184 0.9511 0.9345 1432
1 0.9024 0.8424 0.8714 768
accuracy 0.9132 2200
macro avg 0.9104 0.8968 0.9029 2200
weighted avg 0.9128 0.9132 0.9125 2200
2024-03-18 14:12:33,653 ----------------------------------------------------------------------------------------------------
Screenshots
No response
Additional Context
No response
Environment
Versions:
- Flair: 0.13.1
- PyTorch: 2.2.1+cu121
- Transformers: 4.38.2
- GPU: True
I figured out what the issue was. It looks like a "min_learning_rate" parameter was added as a default since I last used Flair, and its default value (0.0001) was greater than my learning rate (0.00001).
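A minimal sketch of the workaround, assuming trainer.train() still forwards min_learning_rate to the AnnealOnPlateau plugin as the 0.13.1 log above suggests (the plugin reports min_learning_rate: '0.0001' by default):

# Same call as in the reproduction snippet, but with the annealing floor pushed
# below the initial learning rate so the "learning rate too small" check no
# longer fires after the first epoch.
trainer.train('model/' + embedding,
              learning_rate=1e-5,
              min_learning_rate=1e-7,  # must be smaller than learning_rate
              mini_batch_size=8,
              max_epochs=3,
              patience=10,
              optimizer=Adam,
              train_with_dev=False,
              save_final_model=False)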