Hello,
I was trying to train the syntactic HMM on my data. My training data contains 10050
parallel sentences with parsed target trees.
wc output of my training data
-------------------------------
10050 284765 1599230 corpus.en
10050 804959 4284275 corpus.entrees
10050 228873 5058993 corpus.ta
30150 1318597 10942498 total
When I run the alignment, the logfile indicates that only 9811 sentences are
read instead of 10050. After training, I also see alignments for only 9811
sentences. The relevant part of the logfile is included below.
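To narrow down which 239 sentence pairs go missing, a quick check along these lines might help. It is only a sketch under assumptions: I do not know the aligner's actual filtering rules, so the three conditions below (an empty sentence, a missing or empty tree such as "(())", and a sentence longer than a cutoff) are guesses, and the function name find_suspect_pairs and the max_len value are hypothetical, not taken from the aligner.

```python
def read_lines(path):
    """Read a corpus file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def find_suspect_pairs(en_lines, tree_lines, ta_lines, max_len=200):
    """Return (line_number, reason) for sentence pairs that look problematic.

    The conditions are guesses at why a pair could be skipped; they are
    NOT taken from the aligner's source. max_len is a hypothetical cutoff.
    """
    suspects = []
    triples = zip(en_lines, tree_lines, ta_lines)
    for i, (en, tree, ta) in enumerate(triples, start=1):
        en_toks, ta_toks = en.split(), ta.split()
        if not en_toks or not ta_toks:
            # One side of the pair has no tokens at all.
            suspects.append((i, "empty sentence"))
        elif tree.strip() in ("", "()", "(())"):
            # The parser produced no usable tree for this sentence.
            suspects.append((i, "missing or empty tree"))
        elif len(en_toks) > max_len or len(ta_toks) > max_len:
            # Very long sentences are a common candidate for filtering.
            suspects.append((i, "longer than max_len"))
    return suspects
```

Calling find_suspect_pairs(read_lines("corpus.en"), read_lines("corpus.entrees"), read_lines("corpus.ta")) and comparing the number of results against 239 would show whether any of these guesses accounts for the gap.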
PS: I don't have any testing data. My test data directories are empty. I have
attached my config file too.
main() {
Execution directory: en-ta/alignment_models/berkeley/lc_tok_10000_S
Preparing Training Data
Unknown number of training, 0 test
Training models: 2 stages {
Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
Initializing forward model [7.9s, cum. 7.9s]
Initializing reverse model [5.2s, cum. 13s]
Joint Train: 9811 sentences, jointly {
Iteration 1/5 {
Sentence 1/9811
Sentence 2/9811
Sentence 3/9811
Sentence 169/9811
Sentence 3304/9811
Sentence 7650/9811
Log-likelihood 1 = -1337616.882
Log-likelihood 2 = -1336443.902
... 9805 lines omitted ...
} [20s, cum. 20s]
Please let me know if I am missing something.
Original issue reported on code.google.com by [email protected] on 2 Aug 2013 at 10:10