
some question about train #8

Open · FFY0207 opened this issue Jul 6, 2024 · 6 comments

FFY0207 commented Jul 6, 2024

Epoch 1, Batch 3, Loss: 7.225614070892334
Train step: 2it [00:05, 2.95s/it]
Traceback (most recent call last):
  File "/mnt/e/code/silent_speech/transduction_model.py", line 365, in <module>
    main()
  File "/mnt/e/code/silent_speech/transduction_model.py", line 361, in main
    model = train_model(trainset, devset, device, save_sound_outputs=save_sound_outputs)
  File "/mnt/e/code/silent_speech/transduction_model.py", line 260, in train_model
    loss.backward()  # backpropagation
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/ffy/anaconda3/envs/ffy112/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: unknown error
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What is causing this error? I lowered the batch size, but that didn't help and the error still occurs.
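A debugging sketch (not from the repository): because CUDA kernels launch asynchronously, the line the traceback points at may not be where the failure actually happened. Forcing synchronous launches before torch is imported usually gives a more precise location:

```python
import os

# Must be set before torch is imported for it to take effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# ...then run the training code as usual; the traceback should now point
# at the CUDA operation that actually failed.
```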

FFY0207 (Author) commented Jul 9, 2024

[screenshot of training log]

This is my training log. Why do the loss and accuracy suddenly become much worse starting at epoch 21? How should I handle this?

dgaddy (Owner) commented Jul 10, 2024

The first error sounds like some sort of hardware, driver, or PyTorch error. It is probably unrelated to the code in this repository - maybe check your CUDA and PyTorch installations.

About the loss and accuracy suddenly getting worse: are you using the same batch size as the original code, or is this with a smaller batch? A batch size that is too small is the most likely cause.
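A minimal sanity check of the CUDA/PyTorch installation, independent of this repository, would be something like:

```python
import torch

# Report the installed PyTorch build and whether it can see a GPU.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())

# Tiny forward/backward pass on the GPU. If this also fails with
# "CUDA error: unknown error", the problem is with the driver / CUDA
# toolkit / PyTorch install rather than with this repository's code.
x = torch.randn(64, 64, device="cuda", requires_grad=True)
loss = (x @ x).sum()
loss.backward()
print("backward OK, grad norm:", x.grad.norm().item())
```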

FFY0207 (Author) commented Jul 11, 2024

[screenshot of error message]
Why does evaluation.py run normally with the transduction_model.pt you provided, but the model I trained myself produces the error shown in the screenshot? Can you help me?

Gray-ly commented Aug 30, 2024

It seems you loaded the wrong model; the output dimension should be 80, which matches num_speech_features.
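A sketch of one way to check this, assuming the checkpoint file stores a state_dict (the path below is only an example):

```python
import torch

# Load the checkpoint on the CPU and list parameter shapes.
state_dict = torch.load("models/transduction_model.pt", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))

# The final output projection should map to 80 features (num_speech_features);
# a different output size suggests the wrong checkpoint was loaded.
```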

Gray-ly commented Aug 30, 2024

> [screenshot of training log]
> This is my training log. Why do the loss and accuracy suddenly become much worse starting at epoch 21? How should I handle this?

I encountered this problem when I reproduced normalizers.pkl by running make_normalizers() in read_emg.py. Doing so obviously produced a pkl that differs from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy

dgaddy (Owner) commented Sep 10, 2024

> I encountered this problem when I reproduced normalizers.pkl by running make_normalizers() in read_emg.py. Doing so obviously produced a pkl that differs from the original file in the repository. Do you know why this is? Thanks for your contribution! @dgaddy

It's been quite a while so I don't really remember, but it's possible I manually adjusted the normalizers to scale down the size of the inputs or outputs. Sometimes larger input or output values can make training less stable. You could try adjusting them and see if that helps. (Adjusting the inputs seems more likely to help. You would want to increase the normalizer's feature_stddevs values to decrease the feature scales; multiplying by something like 2 or 5 seems reasonable. It might also help to compare the values in your normalizers file against the one in the repository.)
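A hedged sketch of that adjustment, assuming normalizers.pkl unpickles to a pair of normalizer objects that each expose a feature_stddevs array (run from the repository root so the class used when pickling can be imported):

```python
import pickle

# Assumption: the pickle holds a pair of normalizer objects (audio and EMG),
# each with a feature_stddevs array, as the comment above suggests.
with open("normalizers.pkl", "rb") as f:
    mfcc_norm, emg_norm = pickle.load(f)

print("EMG feature stddevs:", emg_norm.feature_stddevs)

# Increase the input (EMG) stddevs by a factor of 2-5 so the normalized
# features become smaller, then save the adjusted normalizers to a new file.
emg_norm.feature_stddevs = emg_norm.feature_stddevs * 2
with open("normalizers_scaled.pkl", "wb") as f:
    pickle.dump((mfcc_norm, emg_norm), f)
```

Printing the stddevs from both your regenerated file and the one shipped in the repository is also an easy way to see whether the original values were rescaled by hand.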
