
Vocab can't be loaded from SentencePiece model #12

Closed

alvations opened this issue Mar 4, 2019 · 1 comment
After training as in https://github.com/marian-nmt/marian-examples/tree/master/training-basics-sentencepiece, marian-decoder throws an error when decoding:

~/marian/build/marian-decoder -c model.npz.decoder.yml 
[2019-03-04 09:22:36] [config] alignment: 0
[2019-03-04 09:22:36] [config] allow-unk: false
[2019-03-04 09:22:36] [config] beam-size: 6
[2019-03-04 09:22:36] [config] best-deep: false
[2019-03-04 09:22:36] [config] clip-gemm: 0
[2019-03-04 09:22:36] [config] cpu-threads: 0
[2019-03-04 09:22:36] [config] dec-cell: gru
[2019-03-04 09:22:36] [config] dec-cell-base-depth: 2
[2019-03-04 09:22:36] [config] dec-cell-high-depth: 1
[2019-03-04 09:22:36] [config] dec-depth: 6
[2019-03-04 09:22:36] [config] devices:
[2019-03-04 09:22:36] [config]   - 0
[2019-03-04 09:22:36] [config] dim-emb: 1024
[2019-03-04 09:22:36] [config] dim-rnn: 1024
[2019-03-04 09:22:36] [config] dim-vocabs:
[2019-03-04 09:22:36] [config]   - 32000
[2019-03-04 09:22:36] [config]   - 32000
[2019-03-04 09:22:36] [config] enc-cell: gru
[2019-03-04 09:22:36] [config] enc-cell-depth: 1
[2019-03-04 09:22:36] [config] enc-depth: 6
[2019-03-04 09:22:36] [config] enc-type: bidirectional
[2019-03-04 09:22:36] [config] ignore-model-config: false
[2019-03-04 09:22:36] [config] input:
[2019-03-04 09:22:36] [config]   - stdin
[2019-03-04 09:22:36] [config] interpolate-env-vars: false
[2019-03-04 09:22:36] [config] layer-normalization: false
[2019-03-04 09:22:36] [config] log-level: info
[2019-03-04 09:22:36] [config] max-length: 1000
[2019-03-04 09:22:36] [config] max-length-crop: false
[2019-03-04 09:22:36] [config] max-length-factor: 3
[2019-03-04 09:22:36] [config] maxi-batch: 100
[2019-03-04 09:22:36] [config] maxi-batch-sort: src
[2019-03-04 09:22:36] [config] mini-batch: 16
[2019-03-04 09:22:36] [config] mini-batch-words: 0
[2019-03-04 09:22:36] [config] models:
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/model.npz
[2019-03-04 09:22:36] [config] n-best: false
[2019-03-04 09:22:36] [config] normalize: 0.6
[2019-03-04 09:22:36] [config] optimize: false
[2019-03-04 09:22:36] [config] port: 8080
[2019-03-04 09:22:36] [config] quiet: false
[2019-03-04 09:22:36] [config] quiet-translation: false
[2019-03-04 09:22:36] [config] relative-paths: false
[2019-03-04 09:22:36] [config] right-left: false
[2019-03-04 09:22:36] [config] seed: 0
[2019-03-04 09:22:36] [config] skip: false
[2019-03-04 09:22:36] [config] skip-cost: false
[2019-03-04 09:22:36] [config] tied-embeddings: false
[2019-03-04 09:22:36] [config] tied-embeddings-all: true
[2019-03-04 09:22:36] [config] tied-embeddings-src: false
[2019-03-04 09:22:36] [config] transformer-aan-activation: swish
[2019-03-04 09:22:36] [config] transformer-aan-depth: 2
[2019-03-04 09:22:36] [config] transformer-aan-nogate: false
[2019-03-04 09:22:36] [config] transformer-decoder-autoreg: self-attention
[2019-03-04 09:22:36] [config] transformer-dim-aan: 2048
[2019-03-04 09:22:36] [config] transformer-dim-ffn: 4096
[2019-03-04 09:22:36] [config] transformer-ffn-activation: swish
[2019-03-04 09:22:36] [config] transformer-ffn-depth: 2
[2019-03-04 09:22:36] [config] transformer-guided-alignment-layer: last
[2019-03-04 09:22:36] [config] transformer-heads: 8
[2019-03-04 09:22:36] [config] transformer-no-projection: false
[2019-03-04 09:22:36] [config] transformer-postprocess: da
[2019-03-04 09:22:36] [config] transformer-postprocess-emb: d
[2019-03-04 09:22:36] [config] transformer-preprocess: n
[2019-03-04 09:22:36] [config] transformer-tied-layers:
[2019-03-04 09:22:36] [config]   []
[2019-03-04 09:22:36] [config] type: transformer
[2019-03-04 09:22:36] [config] version: v1.7.6 9cc5b176 2018-12-14 15:11:34 -0800
[2019-03-04 09:22:36] [config] vocabs:
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/vocab.src.spm
[2019-03-04 09:22:36] [config]   - /disk2/models/ja-en/vocab.trg.spm
[2019-03-04 09:22:36] [config] word-penalty: 0
[2019-03-04 09:22:36] [config] workspace: 512
[2019-03-04 09:22:36] [config] Model created with Marian v1.7.6 9cc5b176 2018-12-14 15:11:34 -0800
[2019-03-04 09:22:36] [data] Loading vocabulary from text file /disk2/models/ja-en/vocab.src.spm
[2019-03-04 09:22:36] Vocabulary file /disk2/models/ja-en/vocab.src.spm must not contain empty lines
Aborted from int marian::Vocab::load(const string&, int) in /home/ltan/marian/src/marian/src/data/vocab.cpp: 117
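The log lines "Loading vocabulary from text file" and "must not contain empty lines" suggest Marian did not recognize the `.spm` files as binary SentencePiece models and fell back to its plain-text vocabulary loader, which chokes on the NUL bytes in a protobuf. As a minimal sketch (hypothetical helper and file names, not part of Marian), you can check which kind of file you actually have:

```python
import os
import tempfile

def looks_like_sentencepiece_model(path):
    # Heuristic: a SentencePiece .spm model is a binary protobuf and
    # almost always contains NUL bytes; a Marian plain-text vocab
    # (one token per line) should not.
    with open(path, "rb") as f:
        head = f.read(4096)
    return b"\x00" in head

# Synthetic stand-ins for the real files on disk:
tmpdir = tempfile.mkdtemp()
binary_path = os.path.join(tmpdir, "vocab.src.spm")
text_path = os.path.join(tmpdir, "vocab.txt")
with open(binary_path, "wb") as f:
    f.write(b"\x0a\x10\x0a\x05<unk>\x00\x15\x00\x00\x80\xbf")  # protobuf-like bytes
with open(text_path, "w") as f:
    f.write("<unk>\n<s>\n</s>\n")

print(looks_like_sentencepiece_model(binary_path))  # True
print(looks_like_sentencepiece_model(text_path))    # False
```

If the file is indeed binary but Marian still treats it as text, the decoder binary itself likely lacks SentencePiece support.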

My config file looks like this:

$ cat model.npz.decoder.yml 
models:
  - /disk2/models/ja-en/model.npz
vocabs:
  - /disk2/models/ja-en/vocab.src.spm
  - /disk2/models/ja-en/vocab.trg.spm
beam-size: 6
normalize: 0.6
word-penalty: 0
mini-batch: 16
maxi-batch: 100
maxi-batch-sort: src
relative-paths: false

Is there a special argument needed when using SentencePiece as the tokenizer during decoding?

alvations (Author) commented:

It's strange: I recompiled the binary and now it works, even though the recompiled binary reports the same version (v1.7.6 9cc5b176). At least it works now =)
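A likely explanation, though the thread does not confirm it, is that the original binary was built without SentencePiece support, so it fell back to the plain-text vocabulary loader. Marian enables SentencePiece via a CMake option; a sketch of the rebuild, assuming the build directory from the error message:

```
# Rebuild Marian with SentencePiece support compiled in.
# USE_SENTENCEPIECE is the CMake flag used in the
# training-basics-sentencepiece example linked above.
cd ~/marian/build
cmake .. -DUSE_SENTENCEPIECE=on
make -j$(nproc)

# Retry decoding with the same config:
./marian-decoder -c model.npz.decoder.yml
```

With SentencePiece compiled in, the decoder should load the `.spm` files as models rather than as text vocabularies, so no extra decoding argument should be needed.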
