If the alignments are invalid (say, because you concatenated the alignments with an untokenized version of the input corpus), Thrax never really tells you; it just barfs at some point down the road.
Thrax should do a validation pass to make sure all the alignments are sensible.
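A per-sentence sanity check of this kind could be sketched roughly as below. This is illustrative Python, not Thrax code: the name `check_alignment` is hypothetical, and it assumes whitespace tokenization and 0-based `i-j` alignment points (source index, then target index).

```python
import re

# One alignment point: two non-negative integers joined by a hyphen.
POINT = re.compile(r"([0-9]+)-([0-9]+)")

def check_alignment(source, target, alignment):
    """Return a list of problems for one sentence pair (empty if none).

    Hypothetical helper; assumes whitespace-separated tokens and
    0-based "i-j" points where i indexes source and j indexes target.
    """
    n_src = len(source.split())
    n_tgt = len(target.split())
    problems = []
    for point in alignment.split():
        match = POINT.fullmatch(point)
        if match is None:
            problems.append(f"malformed alignment point {point!r}")
            continue
        i, j = int(match.group(1)), int(match.group(2))
        if i >= n_src:
            problems.append(f"{point}: source index {i} out of range ({n_src} tokens)")
        if j >= n_tgt:
            problems.append(f"{point}: target index {j} out of range ({n_tgt} tokens)")
    return problems
```

With an untokenized source side such as `hola.` (one token) and the alignment `0-0 1-1`, the point `1-1` would be flagged because source index 1 no longer exists.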
More generally, the input should be validated. You can pass a totally bogus file in as the Thrax input file and only learn about it deep in the pipeline. For example, the following input file will only cause cryptic errors:
¿ aló ? hello ? 1-0 0-1 2-1
hola . hello . 0-0 1-1
¿ con quién hablo ? with whom am i speaking ? 1-0 2-1 3-2 3-3 3-4 0-5 4-5
eh , silvia . sí , ¿ cómo se llama ? eh , silvia , yes . what is your name ? 0-0 1-1 2-2 3-3 4-4 5-5 6-6 7-6 8-7 9-8 9-9 10-10
hola , silvia . eh , yo me llamo nicole . hello silvia , eh , my name is nicole . 0-0 2-1 3-2 4-3 5-4 6-5 7-6 8-6 8-7 9-8 10-9
ah , mucho gusto . ah , nice to meet you . 0-0 1-1 2-2 3-3 3-4 3-5 4-6
mucho gusto . em , ¿ y dónde está usted ? nice to meet you . em , and where are you from ? 0-0 1-0 1-1 1-2 1-3 2-4 3-5 4-6 6-7 7-8 8-9 9-10 9-11 5-12 10-12
n- eh , yo estoy en filadelfia . eh , i 'm in philadelphia . 0-0 1-0 2-1 3-2 4-3 5-4 6-5 7-6
(Note that there should be |||s instead of tabs separating the fields.)
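An up-front format check for this case might look something like the sketch below. Again this is Python for illustration only, not how Thrax reads its input: `validate_line` is a hypothetical name, and the sketch assumes each line should hold exactly three `|||`-separated fields (source, target, alignment).

```python
def validate_line(line, lineno):
    """Return an error message for a badly formatted line, or None if it looks OK.

    Hypothetical helper; assumes the expected layout is
    "source ||| target ||| alignment" with exactly three fields.
    """
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) != 3:
        return (f"line {lineno}: expected 3 '|||'-separated fields, "
                f"found {len(fields)}")
    source, target, _alignment = fields
    if not source or not target:
        return f"line {lineno}: empty source or target field"
    return None
```

Run over the tab-separated file above, every line would be rejected with a "found 1" message, which points at the real problem right away instead of failing later in the pipeline.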