Commit

Update sentencepiece IO test
- Switch to tiny testing model to reduce memory usage
- Use slow tokenizer to test sentencepiece requirement
- Add sentencepiece extra to dev requirements
adrianeboyd committed Oct 11, 2023
1 parent f0b475d commit 2bf4a9b
Showing 2 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,6 +1,6 @@
 spacy>=3.5.0,<4.0.0
 numpy>=1.15.0
-transformers>=3.4.0,<4.35.0
+transformers[sentencepiece]>=3.4.0,<4.35.0
 torch>=1.8.0
 srsly>=2.4.0,<3.0.0
 dataclasses>=0.6,<1.0; python_version < "3.7"
3 changes: 2 additions & 1 deletion spacy_transformers/tests/test_pipeline_component.py
@@ -238,7 +238,8 @@ def test_transformer_pipeline_tagger_senter_listener():
 def test_transformer_sentencepiece_IO():
     """Test that a transformer using sentencepiece trains + IO goes OK"""
     orig_config = Config().from_str(cfg_string)
-    orig_config["components"]["transformer"]["model"]["name"] = "camembert-base"
+    orig_config["components"]["transformer"]["model"]["name"] = "hf-internal-testing/tiny-xlm-roberta"
+    orig_config["components"]["transformer"]["model"]["tokenizer_config"] = {"use_fast": False}
     nlp = util.load_model_from_config(orig_config, auto_fill=True, validate=True)
     tagger = nlp.get_pipe("tagger")
     tagger_trf = tagger.model.get_ref("tok2vec").layers[0]
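
For readers who want to reproduce the settings outside the test suite, here is a minimal sketch of the same idea. It assumes the slow XLM-RoBERTa tokenizer is the code path that needs the `sentencepiece` package, as the commit message suggests. The helper names `sentencepiece_available`, `transformer_overrides`, and `load_slow_tokenizer` are hypothetical and not part of spacy-transformers. Only the model name and the `tokenizer_config` value come from the diff.

```python
import importlib.util

# Model name taken from the diff; a tiny checkpoint intended for testing.
MODEL_NAME = "hf-internal-testing/tiny-xlm-roberta"


def sentencepiece_available() -> bool:
    """Return True if the sentencepiece package can be found (hypothetical helper)."""
    return importlib.util.find_spec("sentencepiece") is not None


def transformer_overrides(name: str = MODEL_NAME) -> dict:
    """Build the nested settings the test writes into its config (hypothetical helper)."""
    return {
        "components": {
            "transformer": {
                "model": {
                    "name": name,
                    # Request the slow tokenizer, as the updated test does.
                    "tokenizer_config": {"use_fast": False},
                }
            }
        }
    }


def load_slow_tokenizer(name: str = MODEL_NAME):
    """Load the slow tokenizer directly with transformers (needs network access)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(name, use_fast=False)
```

Calling `load_slow_tokenizer()` downloads the checkpoint, so it is best kept out of import-time code. Checking `sentencepiece_available()` first gives a clearer message than the import error raised while loading.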