
Validate alignments #7

Open
mjpost opened this issue Mar 13, 2015 · 1 comment

Comments

mjpost (Member) commented Mar 13, 2015

If the alignments are invalid (say, because you concatenated the alignments with an untokenized version of the input corpus), Thrax never really tells you; it just fails cryptically at some point down the road.

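Thrax should do a validation pass to make sure all the alignments are sensible, e.g., that every alignment point refers to a token that actually exists in the source and target sentences.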
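A minimal sketch of such a check, written in Python rather than as part of Thrax itself. The function name validate_alignment is hypothetical, and the sketch assumes space-tokenized sentences and whitespace-separated, 0-based `i-j` alignment points (source index, target index), which is the format used in the example file later in this thread.

```python
def validate_alignment(source: str, target: str, alignment: str) -> list[str]:
    """Return the problems found in one aligned sentence pair (empty list if valid).

    Hypothetical helper, not part of Thrax. Assumes space-separated tokens and
    whitespace-separated "i-j" points with 0-based source index i and target index j.
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    problems = []
    for point in alignment.split():
        parts = point.split("-")
        # Each point must be exactly two non-negative integers joined by "-".
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            problems.append(f"malformed alignment point {point!r}")
            continue
        i, j = int(parts[0]), int(parts[1])
        # Indices must point at tokens that exist in the tokenized sentences.
        if i >= src_len:
            problems.append(f"source index {i} out of range (source length {src_len})")
        if j >= tgt_len:
            problems.append(f"target index {j} out of range (target length {tgt_len})")
    return problems
```

For instance, pairing the alignment `1-0 0-1 2-1` with the untokenized target `hello?` instead of `hello ?` reports two out-of-range target indices, which is exactly the kind of mismatch described above, caught before any rules are extracted.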

mjpost (Member, Author) commented Jan 19, 2016

More generally, all of the input should be validated. You can pass a completely bogus file in as the Thrax input file and only learn about it deep in the pipeline. For example, the following input file causes only cryptic errors:

¿ aló ? hello ? 1-0 0-1 2-1
hola .  hello . 0-0 1-1
¿ con quién hablo ?     with whom am i speaking ?       1-0 2-1 3-2 3-3 3-4 0-5 4-5
eh , silvia . sí , ¿ cómo se llama ?    eh , silvia , yes . what is your name ? 0-0 1-1 2-2 3-3 4-4 5-5 6-6 7-6 8-7 9-8 9-9 10-10
hola , silvia . eh , yo me llamo nicole .       hello silvia , eh , my name is nicole . 0-0 2-1 3-2 4-3 5-4 6-5 7-6 8-6 8-7 9-8 10-9
ah , mucho gusto .      ah , nice to meet you . 0-0 1-1 2-2 3-3 3-4 3-5 4-6
mucho gusto . em , ¿ y dónde está usted ?       nice to meet you . em , and where are you from ?    0-0 1-0 1-1 1-2 1-3 2-4 3-5 4-6 6-7 7-8 8-9 9-10 9-11 5-12 10-12
n- eh , yo estoy en filadelfia .        eh , i 'm in philadelphia .     0-0 1-0 2-1 3-2 4-3 5-4 6-5 7-6

(Note that the fields should be separated by ||| rather than by tabs.)
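A line-level check would catch a file like this at its first line. The sketch below is hypothetical (the names validate_line and validate_file are not Thrax APIs) and assumes each line has the three-field `source ||| target ||| alignment` layout implied by the note about ||| separators, with the same 0-based `i-j` alignment points as in the example.

```python
def validate_line(line: str, line_no: int) -> list[str]:
    """Check one input line, assumed to be 'source ||| target ||| alignment'."""
    fields = [f.strip() for f in line.rstrip("\n").split("|||")]
    if len(fields) != 3:
        # A tab-separated line splits into a single field; say so explicitly.
        hint = " (tab-separated?)" if "\t" in line else ""
        return [f"line {line_no}: expected 3 fields separated by '|||', found {len(fields)}{hint}"]
    source, target, alignment = fields
    src_len, tgt_len = len(source.split()), len(target.split())
    problems = []
    for point in alignment.split():
        i_str, sep, j_str = point.partition("-")
        if not (sep and i_str.isdigit() and j_str.isdigit()):
            problems.append(f"line {line_no}: malformed alignment point {point!r}")
        elif int(i_str) >= src_len or int(j_str) >= tgt_len:
            problems.append(
                f"line {line_no}: alignment point {point!r} out of range ({src_len}x{tgt_len} tokens)"
            )
    return problems


def validate_file(path: str) -> list[str]:
    """Collect problems from every line of a UTF-8 input file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            problems.extend(validate_line(line, line_no))
    return problems
```

On a tab-separated file like the one in this comment, every line fails the field-count check, so the user gets a specific message up front instead of an error deep in the pipeline.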
