PESQ value of my reproduced model(3 epoch) is only 2.0438 instead of 2.818 #6

Open
Xu-Kaibo opened this issue Oct 12, 2022 · 1 comment


Xu-Kaibo commented Oct 12, 2022

The PESQ of my reproduced model (3 epochs) is only 2.0438 instead of the 2.818 from Mr. Filippov's experiment.
I followed the steps of the provided code, but the function that calculates PESQ would not run, so I modified it a little.
The PESQ of the noisy test set itself is 1.9306, and after denoising with the model trained for 3 epochs, the PESQ is only 2.0438.
I'm confused about what is going wrong. My PESQ calculation code is pasted below:
"
import torch
import pesq
from tqdm import tqdm

# DEVICE, N_FFT, HOP_LENGTH and down_sample() are assumed to be defined elsewhere.

def metrics_score(mode, net, test_loader):
    # mode: "Testset" or "TestModel"
    #   "Testset":   calculate the metric on the noisy samples of the test set
    #                (net is unused in this mode, so pass anything you want)
    #   "TestModel": evaluate the performance of the model
    # Considered metrics: PESQ
    if mode not in ("Testset", "TestModel"):
        raise ValueError("mode must be 'Testset' or 'TestModel'")
    print("Measuring Metrics for", mode)
    if mode == "TestModel":
        net.eval()
    test_pesq = 0.
    counter = 0.

    for noisy_x, clean_x in tqdm(test_loader):
        noisy_x = noisy_x.to(DEVICE)

        # convert the clean spectrogram back to a waveform
        clean_x = torch.squeeze(clean_x, 1)
        clean_x = torch.istft(clean_x, n_fft=N_FFT, hop_length=HOP_LENGTH, normalized=True)

        pesq_a = 0.
        if mode == "Testset":
            # score the noisy input itself against the clean reference
            noisy_x = torch.squeeze(noisy_x, 1)
            noisy_x = torch.istft(noisy_x, n_fft=N_FFT, hop_length=HOP_LENGTH, normalized=True)
            for i in range(len(clean_x)):  # speech may be in the form of [d,n] instead of [1,n]
                clean_x_16 = down_sample(clean_x[i, :].view(1, -1), 48000, 16000)
                noisy_x_16 = down_sample(noisy_x[i, :].view(1, -1), 48000, 16000)
                clean_x_16 = clean_x_16.cpu().numpy().flatten()
                noisy_x_16 = noisy_x_16.detach().cpu().numpy().flatten()

                pesq_a += pesq.pesq(16000, clean_x_16, noisy_x_16, 'wb')

        elif mode == "TestModel":
            # get the output from the model and score it against the clean reference
            with torch.no_grad():
                pred_x = net(noisy_x)
            for i in range(len(clean_x)):
                clean_x_16 = down_sample(clean_x[i, :].view(1, -1), 48000, 16000)
                pred_x_16 = down_sample(pred_x[i, :].view(1, -1), 48000, 16000)
                # I cannot run the Resample function below
                # clean_x_16 = torchaudio.transforms.Resample(48000, 16000)(clean_x[i, :].view(1, -1))
                # pred_x_16 = torchaudio.transforms.Resample(48000, 16000)(pred_x[i, :].view(1, -1))
                clean_x_16 = clean_x_16.cpu().numpy().flatten()
                pred_x_16 = pred_x_16.detach().cpu().numpy().flatten()

                pesq_a += pesq.pesq(16000, clean_x_16, pred_x_16, 'wb')

        # mean PESQ of this batch
        pesq_a /= len(clean_x)
        test_pesq += pesq_a
        counter += 1

    # mean of the per-batch means
    test_pesq /= counter
    return test_pesq

"
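
A side note on the averaging in the function above: it divides each batch's PESQ sum by the batch size and then averages those batch means. If the last batch is smaller than the others, this differs slightly from the mean over all utterances. The effect is usually small, so it is unlikely to explain the gap reported here on its own. The sketch below uses made-up scores (not real measurements) and hypothetical helper names to show the difference:

```python
def mean_of_batch_means(batches):
    # Average each batch first, then average the batch means
    # (this mirrors the structure of the function above).
    return sum(sum(b) / len(b) for b in batches) / len(batches)


def mean_over_utterances(batches):
    # Average over every utterance, regardless of batch boundaries.
    total = sum(sum(b) for b in batches)
    count = sum(len(b) for b in batches)
    return total / count


# Made-up scores: a full batch of four and a final batch with one utterance.
batches = [[2.0, 2.0, 2.0, 2.0], [3.0]]
print(mean_of_batch_means(batches))   # 2.5
print(mean_over_utterances(batches))  # 2.2
```

With equal batch sizes the two numbers coincide, so this only matters when the test set size is not a multiple of the batch size.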

@JalaJalera

Hey, I'm working with this model right now. Nothing is ready to be published yet, but I found some problems in this implementation. The thing that will fix the bad PESQ is to use a proper STFT window: here they use a rectangular window.
You can pass a better-suited window like this:

clean_stft = torch.stft(input=clean_sample, n_fft=self.n_fft, window=torch.hann_window(self.n_fft, True),
                                  hop_length=self.hop_length, normalized=True, return_complex=False)

In this case I used a Hann window.
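
If the window is changed in the forward transform, the inverse transform generally needs the same window, otherwise the reconstructed waveform will not match. A minimal round-trip sketch, assuming `n_fft = 1024` and `hop_length = 256` as placeholder values (the repository's actual settings are not shown in this thread) and a random tensor standing in for audio. It uses `return_complex=True` because newer PyTorch releases expect complex input for `torch.istft`:

```python
import torch

n_fft, hop_length = 1024, 256  # placeholder values, not taken from the repository
window = torch.hann_window(n_fft, periodic=True)

torch.manual_seed(0)
wave = torch.randn(1, 16000)  # random signal standing in for a short audio clip

# Forward transform with an explicit Hann window.
spec = torch.stft(wave, n_fft=n_fft, hop_length=hop_length, window=window,
                  normalized=True, return_complex=True)

# Inverse transform with the same window and the same normalization flag.
recon = torch.istft(spec, n_fft=n_fft, hop_length=hop_length, window=window,
                    normalized=True, length=wave.shape[-1])

max_err = (recon - wave).abs().max().item()
print(max_err)  # should be close to zero if the two transforms are consistent
```

The same check can be run on a real clean sample before computing PESQ, to confirm that the STFT/iSTFT pair alone does not degrade the signal.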
Some other flaws:

  1. The complex batch norm is not wrong, but not really right either. You can find a better implementation here (https://github.com/wavefrontshaping/complexPyTorch)
  2. All waveforms are restricted to 3.5 seconds, although some of the samples are longer, so not all of the data is used
  3. The model is trained on 48 kHz samples, although PESQ uses only 16 kHz
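
For the second point, one possible way to keep the tail of longer recordings is to split each waveform into fixed-length segments instead of truncating it. The sketch below is pure Python and only computes segment boundaries. The function name, the idea of padding the last segment, and the 8-second example recording are illustrative assumptions, not the repository's code:

```python
def segment_bounds(num_samples, segment_len):
    """Return (start, end) index pairs that cover the whole waveform.

    The last segment may be shorter than segment_len and would need
    zero-padding before being fed to a fixed-size model.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [(start, min(start + segment_len, num_samples))
            for start in range(0, num_samples, segment_len)]


sample_rate = 48000
segment_len = int(3.5 * sample_rate)  # 168000 samples per 3.5 s segment
num_samples = 8 * sample_rate         # hypothetical 8-second recording

print(segment_bounds(num_samples, segment_len))
# [(0, 168000), (168000, 336000), (336000, 384000)]
```

Truncation would keep only the first pair here. Segmenting keeps all three, at the cost of one padded segment.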

Hope this helps. I will share my code when it's "presentable".
