Both detokenize(tokenize(...)) and correct_spaces() mangle the spacing around a standalone en dash or hyphen:
>>> from tokenizer import split_into_sentences, detokenize, tokenize, correct_spaces
# En dash and detokenize
>>> sent = 'Hamarinn dugir – og meira en það.'
>>> detokenize(tokenize(sent))
# Expected output: 'Hamarinn dugir – og meira en það.'
# Actual output: 'Hamarinn dugir–og meira en það.'
# En dash and correct_spaces
>>> s = list(split_into_sentences(sent))[0]
>>> correct_spaces(s)
# Expected output: 'Hamarinn dugir – og meira en það.'
# Actual output: 'Hamarinn dugir–og meira en það.'
# Hyphen and detokenize
>>> sent = 'Hamarinn dugir - og meira en það.'
>>> detokenize(tokenize(sent))
# Expected output: 'Hamarinn dugir - og meira en það.'
# Actual output: 'Hamarinn dugir-og meira en það.'
# Hyphen and correct_spaces
>>> s = list(split_into_sentences(sent))[0]
>>> correct_spaces(s)
# Expected output: 'Hamarinn dugir - og meira en það.'
# Actual output: 'Hamarinn dugir- og meira en það.'
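Until this is fixed, a string-level patch can restore the en-dash case. This is a minimal sketch (fix_en_dash is an illustrative name, not part of Tokenizer); it deliberately leaves digit ranges such as '10–20' untouched, and it cannot cover the hyphen case, since a glued hyphen is indistinguishable from an intra-word compound at the string level:
>>> import re
>>> def fix_en_dash(text):
...     # Re-insert spaces around an en dash glued between two letters;
...     # [^\W\d_] matches a letter only, so '10–20' stays as-is.
...     return re.sub(r'(?<=[^\W\d_])–(?=[^\W\d_])', ' – ', text)
...
>>> fix_en_dash(detokenize(tokenize('Hamarinn dugir – og meira en það.')))
# Output: 'Hamarinn dugir – og meira en það.'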
The above was run with the newest version of Tokenizer, 3.4.5.
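The installed version can be confirmed from the package metadata (this assumes the distribution is installed under the name 'tokenizer'):
>>> from importlib.metadata import version
>>> version('tokenizer')
# Output: '3.4.5'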