After training on a separate machine, we got some promising results and are now looking to move our model into production. However, we encountered an issue. Downloading missing files and verifying the model like this:
# First we download any missing files and verify the pipeline
import trankit

# Download any missing files
trankit.download_missing_files(
    category='customized-mwt-ner',
    save_dir='./trankit_model',
    embedding_name='xlm-roberta-base',
    language='dutch'
)

# Verify the pipeline
trankit.verify_customized_pipeline(
    category='customized-mwt-ner',  # pipeline category
    save_dir='./trankit_model',  # directory used for saving models in previous steps
    embedding_name='xlm-roberta-base'  # embedding version that we use for training our customized pipeline, by default, it is `xlm-roberta-base`
)
Leads to the following output and error:
Missing ./trankit_model/xlm-roberta-base/customized-mwt-ner/customized-mwt-ner_mwt_expander.pt
Missing ./trankit_model/xlm-roberta-base/customized-mwt-ner/customized-mwt-ner_lemmatizer.pt
Missing ./trankit_model/xlm-roberta-base/customized-mwt-ner/customized-mwt-ner.ner.mdl
Missing ./trankit_model/xlm-roberta-base/customized-mwt-ner/customized-mwt-ner.ner-vocab.json
http://nlp.uoregon.edu/download/trankit/v1.0.0/xlm-roberta-base/dutch.zip
Downloading: 100%|██████████| 46.3M/46.3M [01:07<00:00, 682kiB/s]
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[7], line 6
3 import trankit
5 # Download any missing files
----> 6 trankit.download_missing_files(
7 category='customized-mwt-ner',
8 save_dir='./trankit_model',
9 embedding_name='xlm-roberta-base',
10 language='dutch'
11 )
13 # Verify the pipeline
14 trankit.verify_customized_pipeline(
15 category='customized-mwt-ner', # pipeline category
16 save_dir='./trankit_model', # directory used for saving models in previous steps
17 embedding_name='xlm-roberta-base' # embedding version that we use for training our customized pipeline, by default, it is `xlm-roberta-base`
18 )
File ~/Projects/UDParserEvaluation/venv/lib/python3.10/site-packages/trankit/__init__.py:71, in download_missing_files(category, save_dir, embedding_name, language)
69 tgt_dir = os.path.join(save_dir, embedding_name, category)
70 for fname in missing_filenamess:
---> 71 copyfile(os.path.join(src_dir, fname.format(language)), os.path.join(tgt_dir, fname.format(category)))
72 print('Copying {} to {}'.format(
73 os.path.join(src_dir, fname.format(language)),
74 os.path.join(tgt_dir, fname.format(category))
75 ))
76 remove_with_path(src_dir)
File /usr/lib/python3.10/shutil.py:254, in copyfile(src, dst, follow_symlinks)
252 os.symlink(os.readlink(src), dst)
253 else:
--> 254 with open(src, 'rb') as fsrc:
255 try:
256 with open(dst, 'wb') as fdst:
257 # macOS
FileNotFoundError: [Errno 2] No such file or directory: './trankit_model/xlm-roberta-base/dutch/dutch_mwt_expander.pt'
No file named *_mwt_expander.pt seems to be present.
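As a rough way to see which source files are actually absent before any copy is attempted, the path construction visible in the traceback can be reproduced with the standard library alone. This is only a sketch: `plan_copies` is a hypothetical helper, not part of Trankit, and the filename templates are inferred from the "Missing ..." lines in the output above, so they may not match Trankit's internal list.

```python
import os

# Filename templates inferred from the "Missing ..." lines in the output above.
# This is an assumption, not Trankit's authoritative list.
TEMPLATES = [
    "{}_mwt_expander.pt",
    "{}_lemmatizer.pt",
    "{}.ner.mdl",
    "{}.ner-vocab.json",
]


def plan_copies(save_dir, embedding_name, language, category, templates=TEMPLATES):
    """Split the intended copies into those whose source file exists
    and those whose source file is absent, without copying anything."""
    # Same layout as in the traceback: <save_dir>/<embedding_name>/<language or category>
    src_dir = os.path.join(save_dir, embedding_name, language)
    tgt_dir = os.path.join(save_dir, embedding_name, category)
    copyable, absent = [], []
    for template in templates:
        src = os.path.join(src_dir, template.format(language))
        tgt = os.path.join(tgt_dir, template.format(category))
        if os.path.isfile(src):
            copyable.append((src, tgt))
        else:
            absent.append(src)
    return copyable, absent


if __name__ == "__main__":
    copyable, absent = plan_copies(
        "./trankit_model", "xlm-roberta-base", "dutch", "customized-mwt-ner"
    )
    for src in absent:
        print("Source not found:", src)
```

Because `download_missing_files` removes the extracted language directory when it finishes, a check like this is only informative while the extracted files are still on disk, for example after unpacking the downloaded archive by hand.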
The model was trained like this:
import trankit

# initialize a trainer for the task
trainer = trankit.TPipeline(
    training_config={
        'category': 'customized-mwt-ner',  # pipeline category
        'task': 'posdep',  # task name
        'save_dir': './trankit_model',  # directory for saving trained model
        'train_conllu_fpath': './corpus/split-conllu/train.conllu',  # annotations file in CONLLU format for training
        'dev_conllu_fpath': './corpus/split-conllu/dev.conllu'  # annotations file in CONLLU format for development
    }
)

# start training
trainer.train()
For now, we have chosen to run a model of the "customized" type instead of the "customized-mwt-ner" type. For "customized" all missing files seem to be downloaded correctly.
I tried downloading a few zips from http://nlp.uoregon.edu/download/trankit/ and its subfolders, but had no luck finding any mwt_expander file.
Am I missing something?
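For anyone wanting to repeat that search, a downloaded archive can be inspected without extracting it. The sketch below uses only Python's `zipfile` module; `find_members` is a hypothetical helper, and the local archive path in the usage example is an assumption about where the zip was saved.

```python
import zipfile


def find_members(zip_path, needle):
    """Return the names of archive members whose path contains `needle`."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist() if needle in name]


if __name__ == "__main__":
    # "dutch.zip" is an assumed local path to the archive downloaded from the URL above.
    print(find_members("dutch.zip", "mwt_expander"))
```

An empty list for a given archive would indicate that it ships no file with `mwt_expander` in its name, which would be consistent with the `FileNotFoundError` above.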
The model was trained with the TPipeline configuration shown above.