This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

AssertionError: Input not finite #99

Open
dikkeaap opened this issue Oct 9, 2020 · 2 comments

Comments

@dikkeaap

dikkeaap commented Oct 9, 2020

Hi,

We recently trained a model successfully for a plant species sequenced on the MinION using an R9.4 flowcell. We also sequenced the same plant species on the MinION with an R10.3 flowcell and successfully trained a model with those data.

We have now sequenced the same plant again on a PromethION R10.4 flowcell, but we run into an error when attempting to train a model:

* Taiyaki version 5.1.0
* Platform is Linux-4.15.0-38-generic-x86_64-with-debian-buster-sid
* PyTorch version 1.2.0
* CUDA version 10.0.130 on device GeForce GTX 1080 Ti
* Command line:
* "/opt/kgapps/taiyaki/bin/train_flipflop.py resume2/model_checkpoint_00018.checkpoint mapped_reads_2.hdf5 --min_sub_batch_size 48 --outdir resume3 --lr_max 0.00160 --niteration 40000 --lr_cosine_iters 30000 --overwrite --device 0
* Started on 2020-09-25 08:40:06.741154
* Loading data from mapped_reads_2.hdf5
* Per read file MD5 62e8f6baab6b7ca1d1c046bdaed7e933
* Reads not filtered by id
* Using alphabet definition: canonical alphabet ACGT and no modified bases
* Loaded 14191 reads.
* Reading network from resume2/model_checkpoint_00018.checkpoint
* Network has 10683280 parameters.
* Loaded standard (canonical bases-only) model.
* Dumping initial model
* Sampled 100000 chunks: median(mean_dwell)=9.20, mad(mean_dwell)=0.89
* Learning rate goes like cosine from lr_max to lr_min over 30000.0 iterations.
* At start, train for 200 batches at warm-up learning rate 0.0001
* Standard loss reporting from 141 validation reads held out of training. 
* Standard loss report: chunk length = 5500 & sub-batch size = 48 for 10 sub-batches. 
* Gradient L2 norm cap will be upper 0.05 quantile of the last 100 norms.
* Training
..................................................     1 0.10118 0.10477  116.30s (164.95 ksample/s 18.39 kbase/s) lr=1.00e-04  22.8% chunks filtered
..................................................     2 0.10152 0.10432  116.79s (164.35 ksample/s 18.31 kbase/s) lr=1.00e-04  23.1% chunks filtered
..................................................     3 0.10123 0.10401  110.55s (173.69 ksample/s 19.35 kbase/s) lr=1.00e-04  22.4% chunks filtered
..................................................     4 0.10249 0.10369  117.33s (163.65 ksample/s 18.22 kbase/s) lr=1.00e-04  22.5% chunks filtered
............................Traceback (most recent call last):
  File "/opt/kgapps/taiyaki/bin/train_flipflop.py", line 4, in <module>
    __import__('pkg_resources').run_script('taiyaki==5.1.0', 'train_flipflop.py')
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1446, in run_script
    exec(code, namespace, namespace)
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 624, in <module>
    main()
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 541, in main
    mod_factor_t, calc_grads = True )
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 247, in calculate_loss
    outputs, seqs, seqlens, sharpen)
  File "taiyaki/ctc/ctc.pyx", line 88, in taiyaki.ctc.ctc.FlipFlopCRF.forward
  File "taiyaki/ctc/ctc.pyx", line 62, in taiyaki.ctc.ctc.crf_flipflop_grad
AssertionError: Input not finite

If we resume from the checkpoint, we run into the same error sometime later:

* Taiyaki version 5.1.0
* Platform is Linux-4.15.0-38-generic-x86_64-with-debian-buster-sid
* PyTorch version 1.2.0
* CUDA version 10.0.130 on device GeForce GTX 1080 Ti
* Command line:
* "/opt/kgapps/taiyaki/bin/train_flipflop.py resume2/model_checkpoint_00018.checkpoint mapped_reads_2.hdf5 --min_sub_batch_size 48 --outdir resume3 --lr_max 0.00160 --niteration 40000 --lr_cosine_iters 30000 --overwrite --device 0
* Started on 2020-09-29 08:43:32.712279
* Loading data from mapped_reads_2.hdf5
* Per read file MD5 62e8f6baab6b7ca1d1c046bdaed7e933
* Reads not filtered by id
* Using alphabet definition: canonical alphabet ACGT and no modified bases
* Loaded 14191 reads.
* Reading network from resume2/model_checkpoint_00018.checkpoint
* Network has 10683280 parameters.
* Loaded standard (canonical bases-only) model.
* Dumping initial model
* Sampled 100000 chunks: median(mean_dwell)=9.20, mad(mean_dwell)=0.89
* Learning rate goes like cosine from lr_max to lr_min over 30000.0 iterations.
* At start, train for 200 batches at warm-up learning rate 0.0001
* Standard loss reporting from 141 validation reads held out of training. 
* Standard loss report: chunk length = 5500 & sub-batch size = 48 for 10 sub-batches. 
* Gradient L2 norm cap will be upper 0.05 quantile of the last 100 norms.
* Training
..................................................     1 0.10303 0.09424  114.06s (168.34 ksample/s 18.76 kbase/s) lr=1.00e-04  23.7% chunks filtered
..................................................     2 0.10134 0.09381  115.19s (166.80 ksample/s 18.57 kbase/s) lr=1.00e-04  23.0% chunks filtered
..................................................     3 0.10066 0.09348  121.96s (157.53 ksample/s 17.53 kbase/s) lr=1.00e-04  23.4% chunks filtered
..................................................     4 0.10001 0.09326  115.78s (165.98 ksample/s 18.45 kbase/s) lr=1.00e-04  23.1% chunks filtered
..................................................     5 0.10257 0.09576  112.31s (170.96 ksample/s 19.06 kbase/s) lr=1.60e-03  22.9% chunks filtered
..................................................     6 0.10423 0.09548  112.72s (170.15 ksample/s 18.97 kbase/s) lr=1.60e-03  22.6% chunks filtered
..................................................     7 0.10399 0.09554  116.29s (165.05 ksample/s 18.44 kbase/s) lr=1.60e-03  22.5% chunks filtered
..................................................     8 0.10446 0.09625  115.58s (166.09 ksample/s 18.52 kbase/s) lr=1.60e-03  22.6% chunks filtered
..................................................     9 0.10420 0.09789  115.14s (166.58 ksample/s 18.57 kbase/s) lr=1.60e-03  22.7% chunks filtered
..................................................    10 0.10316 0.09563  111.75s (171.81 ksample/s 19.07 kbase/s) lr=1.60e-03  22.6% chunks filtered
..................................................    11 0.10285 0.09823  113.14s (169.61 ksample/s 18.91 kbase/s) lr=1.60e-03  22.6% chunks filtered
..................................................    12 0.10350 0.09618  113.01s (169.96 ksample/s 18.88 kbase/s) lr=1.60e-03  22.5% chunks filtered
..................................................    13 0.10436 0.09668  118.44s (162.04 ksample/s 18.04 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    14 0.10136 0.09609  120.23s (159.74 ksample/s 17.71 kbase/s) lr=1.60e-03  22.5% chunks filtered
..................................................    15 0.10160 0.09668  114.07s (168.49 ksample/s 18.78 kbase/s) lr=1.60e-03  22.3% chunks filtered
..................................................    16 0.10144 0.09606  121.89s (157.41 ksample/s 17.53 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    17 0.10180 0.09654  113.69s (168.87 ksample/s 18.75 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    18 0.10381 0.09578  114.91s (166.92 ksample/s 18.63 kbase/s) lr=1.60e-03  22.3% chunks filtered
..................................................    19 0.10369 0.09621  119.26s (161.01 ksample/s 17.88 kbase/s) lr=1.60e-03  22.4% chunks filtered
.................................................C    20 0.10542 0.09585  116.87s (164.28 ksample/s 18.32 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    21 0.10229 0.09613  112.86s (170.12 ksample/s 18.89 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    22 0.10319 0.09585  119.18s (161.01 ksample/s 17.88 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    23 0.10428 0.09592  112.53s (170.74 ksample/s 19.03 kbase/s) lr=1.60e-03  22.3% chunks filtered
..................................................    24 0.10430 0.09659  118.36s (162.13 ksample/s 18.06 kbase/s) lr=1.60e-03  22.3% chunks filtered
..................................................    25 0.10297 0.09839  114.13s (168.21 ksample/s 18.72 kbase/s) lr=1.60e-03  22.3% chunks filtered
..................................................    26 0.10191 0.09591  119.56s (160.64 ksample/s 17.88 kbase/s) lr=1.60e-03  22.4% chunks filtered
..................................................    27 0.10201 0.09587  119.00s (161.43 ksample/s 18.00 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    28 0.10112 0.09673  115.11s (166.81 ksample/s 18.54 kbase/s) lr=1.59e-03  22.3% chunks filtered
..................................................    29 0.10262 0.09598  116.90s (164.09 ksample/s 18.27 kbase/s) lr=1.59e-03  22.3% chunks filtered
..................................................    30 0.10112 0.09629  119.62s (160.46 ksample/s 17.91 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    31 0.10133 0.09634  113.04s (169.94 ksample/s 18.97 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    32 0.10289 0.09638  114.80s (167.16 ksample/s 18.62 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    33 0.10126 0.09713  118.53s (162.10 ksample/s 18.09 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    34 0.10132 0.09618  114.32s (167.75 ksample/s 18.63 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    35 0.10256 0.09735  114.83s (167.14 ksample/s 18.61 kbase/s) lr=1.59e-03  22.4% chunks filtered
..................................................    36 0.10273 0.09627  118.25s (162.45 ksample/s 18.04 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    37 0.10070 0.09690  119.18s (160.95 ksample/s 17.92 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    38 0.10240 0.09587  122.12s (157.21 ksample/s 17.53 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    39 0.10228 0.09652  121.98s (157.32 ksample/s 17.50 kbase/s) lr=1.59e-03  22.5% chunks filtered
.................................................C    40 0.10154 0.09753  119.72s (160.40 ksample/s 17.86 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    41 0.10145 0.09766  116.83s (164.52 ksample/s 18.34 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    42 0.10305 0.09752  116.33s (165.06 ksample/s 18.37 kbase/s) lr=1.59e-03  22.5% chunks filtered
..................................................    43 0.10309 0.09718  113.95s (168.32 ksample/s 18.77 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    44 0.10519 0.09719  117.84s (163.13 ksample/s 18.14 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    45 0.10280 0.09720  114.28s (167.86 ksample/s 18.66 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    46 0.10338 0.09686  117.43s (163.33 ksample/s 18.20 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    47 0.10158 0.09750  117.49s (163.25 ksample/s 18.12 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    48 0.10328 0.09947  117.06s (163.95 ksample/s 18.26 kbase/s) lr=1.58e-03  22.5% chunks filtered
..................................................    49 0.10487 0.09910  115.14s (166.81 ksample/s 18.65 kbase/s) lr=1.58e-03  22.6% chunks filtered
..................................................    50 0.10118 0.09751  120.68s (159.18 ksample/s 17.63 kbase/s) lr=1.58e-03  22.6% chunks filtered
..................................................    51 0.10375 0.09986  115.22s (166.70 ksample/s 18.58 kbase/s) lr=1.58e-03  22.6% chunks filtered
..................................................    52 0.10603 0.09855  119.27s (161.08 ksample/s 17.95 kbase/s) lr=1.58e-03  22.6% chunks filtered
..................................................    53 0.10155 0.09740  116.16s (165.29 ksample/s 18.38 kbase/s) lr=1.58e-03  22.6% chunks filtered
..................................................    54 0.10260 0.09768  112.45s (170.72 ksample/s 18.98 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    55 0.10155 0.09779  116.01s (165.47 ksample/s 18.41 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    56 0.10258 0.09721  117.70s (163.23 ksample/s 18.10 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    57 0.10468 0.09874  118.00s (162.63 ksample/s 18.08 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    58 0.10292 0.09673  120.48s (159.47 ksample/s 17.73 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    59 0.10120 0.09683  116.46s (164.98 ksample/s 18.34 kbase/s) lr=1.57e-03  22.6% chunks filtered
.................................................C    60 0.10291 0.09715  111.07s (172.95 ksample/s 19.24 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    61 0.10265 0.09764  117.70s (163.35 ksample/s 18.17 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    62 0.10124 0.09724  118.02s (162.72 ksample/s 18.12 kbase/s) lr=1.57e-03  22.6% chunks filtered
..................................................    63 0.10444 0.09788  116.20s (165.41 ksample/s 18.37 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    64 0.10290 0.09740  115.10s (166.85 ksample/s 18.55 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    65 0.10396 0.09741  116.88s (164.25 ksample/s 18.32 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    66 0.10418 0.09737  120.62s (159.13 ksample/s 17.71 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    67 0.10352 0.09803  113.83s (168.62 ksample/s 18.76 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    68 0.10091 0.09806  114.76s (167.36 ksample/s 18.64 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    69 0.10277 0.09752  109.89s (174.39 ksample/s 19.44 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    70 0.10152 0.09761  121.69s (157.95 ksample/s 17.54 kbase/s) lr=1.56e-03  22.6% chunks filtered
..................................................    71 0.10293 0.09910  112.92s (169.99 ksample/s 18.91 kbase/s) lr=1.55e-03  22.7% chunks filtered
..................................................    72 0.10342 0.09704  115.96s (165.39 ksample/s 18.41 kbase/s) lr=1.55e-03  22.7% chunks filtered
..................................................    73 0.10179 0.09838  112.01s (171.42 ksample/s 19.05 kbase/s) lr=1.55e-03  22.6% chunks filtered
..................................................    74 0.10373 0.09874  111.87s (171.61 ksample/s 19.09 kbase/s) lr=1.55e-03  22.6% chunks filtered
..................................................    75 0.10389 0.09729  110.51s (173.63 ksample/s 19.31 kbase/s) lr=1.55e-03  22.6% chunks filtered
..................................................    76 0.10115 0.09802  116.14s (165.44 ksample/s 18.41 kbase/s) lr=1.55e-03  22.6% chunks filtered
..................................................    77 0.10189 0.09826  121.15s (158.53 ksample/s 17.68 kbase/s) lr=1.55e-03  22.6% chunks filtered
..................................................    78 0.10195 0.09802  115.58s (165.89 ksample/s 18.42 kbase/s) lr=1.54e-03  22.6% chunks filtered
..................................................    79 0.10193 0.09749  120.47s (159.53 ksample/s 17.80 kbase/s) lr=1.54e-03  22.6% chunks filtered
.................................................C    80 0.10151 0.09734  121.18s (158.53 ksample/s 17.65 kbase/s) lr=1.54e-03  22.6% chunks filtered
..................................................    81 0.10123 0.09833  114.94s (166.94 ksample/s 18.59 kbase/s) lr=1.54e-03  22.6% chunks filtered
..................................................    82 0.10002 0.09774  122.28s (157.09 ksample/s 17.51 kbase/s) lr=1.54e-03  22.6% chunks filtered
..................................................    83 0.10148 0.09776  120.81s (158.98 ksample/s 17.69 kbase/s) lr=1.54e-03  22.6% chunks filtered
..................................................    84 0.10369 0.09914  118.36s (162.27 ksample/s 18.08 kbase/s) lr=1.54e-03  22.7% chunks filtered
..................................................    85 0.10256 0.09831  118.93s (161.22 ksample/s 17.93 kbase/s) lr=1.53e-03  22.7% chunks filtered
..................................................    86 0.10094 0.09856  117.98s (162.84 ksample/s 18.06 kbase/s) lr=1.53e-03  22.7% chunks filtered
..................................................    87 0.10262 0.09856  115.67s (165.98 ksample/s 18.48 kbase/s) lr=1.53e-03  22.7% chunks filtered
..................................................    88 0.10268 0.09754  114.59s (167.54 ksample/s 18.67 kbase/s) lr=1.53e-03  22.6% chunks filtered
..................................................    89 0.10122 0.09729  115.43s (166.36 ksample/s 18.49 kbase/s) lr=1.53e-03  22.6% chunks filtered
..................................................    90 0.10119 0.09826  117.85s (162.79 ksample/s 18.10 kbase/s) lr=1.53e-03  22.6% chunks filtered
..................................................    91 0.10304 0.09846  113.75s (168.66 ksample/s 18.74 kbase/s) lr=1.52e-03  22.6% chunks filtered
..................................................    92 0.10238 0.09824  112.37s (171.00 ksample/s 19.04 kbase/s) lr=1.52e-03  22.6% chunks filtered
..................................................    93 0.10246 0.09978  113.78s (168.81 ksample/s 18.80 kbase/s) lr=1.52e-03  22.6% chunks filtered
..................................................    94 0.10248 0.09794  116.51s (164.81 ksample/s 18.42 kbase/s) lr=1.52e-03  22.6% chunks filtered
..................................................    95 0.10194 0.09909  116.15s (165.32 ksample/s 18.43 kbase/s) lr=1.52e-03  22.6% chunks filtered
..................................................    96 0.10665 0.10063  112.00s (171.39 ksample/s 19.09 kbase/s) lr=1.51e-03  22.6% chunks filtered
..................................................    97 0.10250 0.09883  119.84s (160.29 ksample/s 17.83 kbase/s) lr=1.51e-03  22.6% chunks filtered
..................................................    98 0.10183 0.09980  121.37s (158.07 ksample/s 17.59 kbase/s) lr=1.51e-03  22.6% chunks filtered
..................................................    99 0.10106 0.09903  123.88s (155.04 ksample/s 17.33 kbase/s) lr=1.51e-03  22.6% chunks filtered
.................................................C   100 0.10287 0.09871  116.61s (164.78 ksample/s 18.32 kbase/s) lr=1.51e-03  22.6% chunks filtered
..................................................   101 0.09959 0.09830  115.93s (165.56 ksample/s 18.41 kbase/s) lr=1.51e-03  22.6% chunks filtered
..................................................   102 0.10232 0.10040  116.30s (165.17 ksample/s 18.37 kbase/s) lr=1.50e-03  22.6% chunks filtered
..................................................   103 0.10204 0.09922  113.61s (168.76 ksample/s 18.80 kbase/s) lr=1.50e-03  22.6% chunks filtered
..................................................   104 0.10118 0.10029  115.30s (166.43 ksample/s 18.55 kbase/s) lr=1.50e-03  22.6% chunks filtered
..................................................   105 0.10316 0.09928  119.12s (161.12 ksample/s 17.96 kbase/s) lr=1.50e-03  22.7% chunks filtered
..................................................   106 0.10159 0.09964  114.67s (167.49 ksample/s 18.65 kbase/s) lr=1.50e-03  22.6% chunks filtered
..................................................   107 0.10175 0.09931  115.64s (165.94 ksample/s 18.48 kbase/s) lr=1.49e-03  22.6% chunks filtered
..................................................   108 0.10043 0.09866  125.75s (152.72 ksample/s 17.02 kbase/s) lr=1.49e-03  22.7% chunks filtered
..................................................   109 0.10020 0.09867  118.72s (161.77 ksample/s 18.02 kbase/s) lr=1.49e-03  22.7% chunks filtered
..................................................   110 0.10340 0.09991  116.44s (164.87 ksample/s 18.40 kbase/s) lr=1.49e-03  22.7% chunks filtered
..................................................   111 0.10187 0.09888  120.00s (160.10 ksample/s 17.83 kbase/s) lr=1.49e-03  22.7% chunks filtered
..................................................   112 0.10131 0.09886  116.21s (165.18 ksample/s 18.35 kbase/s) lr=1.48e-03  22.7% chunks filtered
..................................................   113 0.10127 0.09878  119.95s (159.88 ksample/s 17.79 kbase/s) lr=1.48e-03  22.7% chunks filtered
..................................................   114 0.10092 0.09867  116.38s (164.89 ksample/s 18.39 kbase/s) lr=1.48e-03  22.7% chunks filtered
..................................................   115 0.10235 0.09870  117.45s (163.42 ksample/s 18.19 kbase/s) lr=1.48e-03  22.7% chunks filtered
..................................................   116 0.10141 0.09837  118.39s (162.21 ksample/s 18.07 kbase/s) lr=1.47e-03  22.7% chunks filtered
..................................................   117 0.10130 0.09781  121.45s (158.15 ksample/s 17.60 kbase/s) lr=1.47e-03  22.7% chunks filtered
..................................................   118 0.09917 0.09845  121.79s (157.63 ksample/s 17.54 kbase/s) lr=1.47e-03  22.7% chunks filtered
..................................................   119 0.10108 0.09849  118.65s (161.84 ksample/s 17.97 kbase/s) lr=1.47e-03  22.7% chunks filtered
.................................................C   120 0.09964 0.09825  121.11s (158.32 ksample/s 17.58 kbase/s) lr=1.47e-03  22.7% chunks filtered
..................................................   121 0.10007 0.09785  115.92s (165.46 ksample/s 18.38 kbase/s) lr=1.46e-03  22.7% chunks filtered
..................................................   122 0.09985 0.09845  114.33s (168.07 ksample/s 18.72 kbase/s) lr=1.46e-03  22.7% chunks filtered
..................................................   123 0.09904 0.09813  120.89s (158.90 ksample/s 17.63 kbase/s) lr=1.46e-03  22.7% chunks filtered
..................................................   124 0.10266 0.09908  117.09s (163.89 ksample/s 18.23 kbase/s) lr=1.46e-03  22.7% chunks filtered
..................................................   125 0.10281 0.09849  118.34s (162.25 ksample/s 18.06 kbase/s) lr=1.45e-03  22.7% chunks filtered
..................................................   126 0.10141 0.09834  116.42s (164.98 ksample/s 18.35 kbase/s) lr=1.45e-03  22.7% chunks filtered
..................................................   127 0.09954 0.09840  117.82s (162.89 ksample/s 18.10 kbase/s) lr=1.45e-03  22.7% chunks filtered
..................................................   128 0.09923 0.09771  115.27s (166.61 ksample/s 18.57 kbase/s) lr=1.45e-03  22.7% chunks filtered
..................................................   129 0.10268 0.09790  110.59s (173.61 ksample/s 19.28 kbase/s) lr=1.45e-03  22.7% chunks filtered
..................................................   130 0.09932 0.09760  115.02s (166.87 ksample/s 18.58 kbase/s) lr=1.44e-03  22.7% chunks filtered
..................................................   131 0.10253 0.09847  113.70s (168.84 ksample/s 18.83 kbase/s) lr=1.44e-03  22.7% chunks filtered
..................................................   132 0.09926 0.09810  114.94s (166.99 ksample/s 18.55 kbase/s) lr=1.44e-03  22.7% chunks filtered
..................................................   133 0.10006 0.09842  120.04s (160.03 ksample/s 17.81 kbase/s) lr=1.44e-03  22.7% chunks filtered
..................................................   134 0.10196 0.09928  115.63s (166.08 ksample/s 18.52 kbase/s) lr=1.43e-03  22.7% chunks filtered
..................................................   135 0.09793 0.09739  119.16s (161.16 ksample/s 17.89 kbase/s) lr=1.43e-03  22.7% chunks filtered
..................................................   136 0.10120 0.09817  113.37s (169.33 ksample/s 18.89 kbase/s) lr=1.43e-03  22.7% chunks filtered
..................................................   137 0.10041 0.09765  110.80s (173.22 ksample/s 19.23 kbase/s) lr=1.43e-03  22.7% chunks filtered
......Traceback (most recent call last):
  File "/opt/kgapps/taiyaki/bin/train_flipflop.py", line 4, in <module>
    __import__('pkg_resources').run_script('taiyaki==5.1.0', 'train_flipflop.py')
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1446, in run_script
    exec(code, namespace, namespace)
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 624, in <module>
    main()
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 541, in main
    mod_factor_t, calc_grads = True )
  File "/opt/kgapps/taiyaki/lib/python3.7/site-packages/taiyaki-5.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 247, in calculate_loss
    outputs, seqs, seqlens, sharpen)
  File "taiyaki/ctc/ctc.pyx", line 88, in taiyaki.ctc.ctc.FlipFlopCRF.forward
  File "taiyaki/ctc/ctc.pyx", line 62, in taiyaki.ctc.ctc.crf_flipflop_grad
AssertionError: Input not finite

Do you have any idea what is going on here, and what we are doing wrong?

@marcus1487
Contributor

This is an area of active research internally. Currently the best workaround is to decrease --lr_max and increase --niteration (and possibly --lr_cosine_iters).
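
To get a feel for what that workaround changes, here is a minimal sketch of a warm-up plus cosine learning-rate schedule. It is not Taiyaki's code. It assumes the standard half-cosine formula, assumes the cosine phase starts right after the 200 warm-up batches at 0.0001 reported in the logs above, and assumes `lr_min = 1e-4`, since the logs do not print that value. The function name `cosine_lr` and the "gentler" settings (`lr_max = 8e-4` over 60000 iterations) are hypothetical and chosen only for illustration.

```python
import math


def cosine_lr(iteration, lr_max, lr_min, cosine_iters,
              warmup_batches=200, warmup_lr=1e-4):
    """Assumed schedule: constant warm-up, then half-cosine decay.

    After the warm-up the rate starts at lr_max and decays to lr_min over
    cosine_iters iterations, then stays at lr_min.
    """
    if iteration < warmup_batches:
        return warmup_lr
    # Position within the cosine phase, clamped to its end.
    t = min(iteration - warmup_batches, cosine_iters)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / cosine_iters))


# Settings from the issue's command line (lr_min is an assumption).
original = dict(lr_max=1.6e-3, lr_min=1e-4, cosine_iters=30000)
# Hypothetical gentler settings: lower peak rate, longer decay.
gentler = dict(lr_max=8e-4, lr_min=1e-4, cosine_iters=60000)

for iteration in (0, 200, 5000, 10000, 20000):
    a = cosine_lr(iteration, **original)
    b = cosine_lr(iteration, **gentler)
    print(f"iteration {iteration:6d}: original lr={a:.2e}  gentler lr={b:.2e}")
```

Under these assumptions the gentler settings keep the step size well below the original peak for the whole run, which is the usual motivation for lowering the learning rate when a loss becomes non-finite. Because the decay is slower, --niteration would need to grow accordingly.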

@SCDealy

SCDealy commented Nov 2, 2020

I just generated the same exception using the example training set:

  https://s3-eu-west-1.amazonaws.com/ont-research/taiyaki_walkthrough.tar.gz

while following the Taiyaki walkthrough instructions:

  https://github.com/nanoporetech/taiyaki/blob/master/docs/walkthrough.rst

though it took 556 iterations before it failed.

FWIW.
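
For anyone trying to narrow this down: the message "Input not finite" presumably means that the values handed to the flip-flop loss contained NaN or infinity. The sketch below shows what such a check amounts to, using only the Python standard library. The helper names `all_finite` and `count_nonfinite` and the sample numbers are made up for illustration and are not part of Taiyaki. In a PyTorch training loop the equivalent test on a tensor would be `torch.isfinite(x).all()`.

```python
import math


def all_finite(values):
    """Return True if every value is a finite float (no NaN, no +/-inf)."""
    return all(math.isfinite(v) for v in values)


def count_nonfinite(values):
    """Count NaN, +inf and -inf entries in a flat iterable of floats."""
    counts = {"nan": 0, "+inf": 0, "-inf": 0}
    for v in values:
        if math.isnan(v):
            counts["nan"] += 1
        elif math.isinf(v):
            counts["+inf" if v > 0 else "-inf"] += 1
    return counts


# Made-up numbers, for illustration only.
healthy = [0.12, -3.4, 7.0]
diverged = [0.12, float("nan"), float("inf"), -float("inf")]

for name, values in (("healthy", healthy), ("diverged", diverged)):
    print(name, "all finite:", all_finite(values), count_nonfinite(values))
```

Logging a count like this just before the loss call would show whether the bad values are NaN or infinities and in which batch they first appear.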


3 participants