At first, I encountered this error:
(base) [[email protected] slurm_stderr]$ cat slurm-12831381.out
thread '<unnamed>' panicked at /home/runner/work/tokenizers/tokenizers/tokenizers/src/models/unigram/trainer.rs:228:53:
called `Result::unwrap()` on an `Err` value: Internal
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/shfa523g/Chi_Internship/tokenizers_scripts/generate_tokenizers_parallel.py", line 90, in <module>
    tokenizer.train_from_iterator(iterator=all_seqs, trainer=trainer)
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: Internal
Then I found a similar issue and followed the suggestion given there: #821 (comment)
After applying that suggestion, I get a new error:
(base) [[email protected] slurm_stderr]$ cat slurm-12861633.out
thread '<unnamed>' panicked at /home/shfa523g/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esaxx-rs-0.1.10/src/esa.rs:70:50:
called `Result::unwrap()` on an `Err` value: TryFromIntError(())
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/shfa523g/Chi_Internship/tokenizers_scripts/generate_tokenizers_parallel.py", line 90, in <module>
    tokenizer.train_from_iterator(iterator=all_seqs, trainer=trainer)
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: TryFromIntError(())
I asked ChatGPT, which said the sequences passed in might be too long, so I reduced the length to a very small number, but the same error keeps happening.
Please take a look, thank you very much!
Update:
What I reduced was the chunk size I pass in; the total length of the data did not change.
I mention this because when I cut the dataset down from 24 chromosomes to a single chromosome, training works and generates correct output.
So the question now is: how can I train the Unigram tokenizer on a large-scale dataset (all 24 chromosomes)? Please help me.
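One possible reading of the second panic, offered as an assumption rather than a confirmed diagnosis: `TryFromIntError` is the error Rust returns when an integer does not fit into the target integer type. The failure may therefore depend on the total size of the input rather than on the chunk size, which would match one chromosome working while all 24 do not. The sketch below assumes a limit of 2**31 - 1 characters (the signed 32-bit maximum; the real threshold inside esaxx-rs is not confirmed here). It shows one way to measure the total length and randomly subsample whole chunks to stay under a budget before calling `train_from_iterator`. The helper names `total_length` and `subsample_to_budget` are made up for illustration.

```python
import random
from typing import List, Sequence

# Assumed limit: the largest signed 32-bit integer. Whether esaxx-rs
# actually fails at this threshold is an assumption, not a confirmed fact.
ASSUMED_INDEX_LIMIT = 2**31 - 1


def total_length(chunks: Sequence[str]) -> int:
    """Return the combined number of characters across all chunks."""
    return sum(len(chunk) for chunk in chunks)


def subsample_to_budget(chunks: Sequence[str], budget: int, seed: int = 0) -> List[str]:
    """Visit chunks in random order and keep each whole chunk that still fits in `budget`."""
    order = list(range(len(chunks)))
    random.Random(seed).shuffle(order)  # seeded so the selection is reproducible
    picked: List[str] = []
    used = 0
    for i in order:
        size = len(chunks[i])
        if used + size > budget:
            continue  # this chunk does not fit; a smaller one later still might
        picked.append(chunks[i])
        used += size
    return picked


if __name__ == "__main__":
    # Toy stand-in for the real chromosome chunks.
    toy = ["ACGT" * 5, "GGCC" * 3, "TTAA" * 10]
    print("total characters:", total_length(toy))
    print("over assumed limit:", total_length(toy) > ASSUMED_INDEX_LIMIT)
    subset = subsample_to_budget(toy, budget=40)
    print("kept characters:", total_length(subset))
```

With the real data, `all_seqs` from the traceback could be passed through `subsample_to_budget` with a budget comfortably below the assumed limit, and the result handed to `tokenizer.train_from_iterator`. Whether a random subsample is representative enough for this DNA data is a separate question that the sketch does not address.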
Hi, I was trying to train a Unigram tokenizer on DNA sequence data. The training call in my script is tokenizer.train_from_iterator(iterator=all_seqs, trainer=trainer).