Adversarial-Attack-on-Chinese-ASR-systems

To generate adversarial examples for your own files, ensure that each file is sampled at 16 kHz and stores its samples as signed 16-bit integers. Our method is based on a multi-objective evolutionary algorithm that evaluates three objectives: CTC loss, speech similarity, and speech signal-to-noise ratio.
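The input requirement above can be checked before running an attack. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav_format` is hypothetical and is not part of this repository.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_width_bytes=2):
    """Return a list of format problems; an empty list means the file looks usable.

    Hypothetical helper, not part of the repository. PCM WAV files with
    2-byte samples store signed 16-bit integers, so checking the sample
    width covers the data-type requirement.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width_bytes:
        problems.append(
            f"sample width is {8 * width} bits, expected {8 * expected_width_bytes} bits"
        )
    return problems
```

Calling `check_wav_format("chinese_audio.wav")` on a conforming file should return an empty list; any returned message indicates that the file needs to be resampled or converted first.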
Datasets: We use THCHS-30 and AISHELL-1. From each dataset, we randomly select 100 audio samples in WAV format as the targets of our adversarial attack.
Chinese ASR system: We attack DeepSpeech2, developed by Baidu.
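Of the three objectives listed above, the signal-to-noise ratio is the simplest to illustrate. The sketch below uses the standard definition, 10·log10(signal power / noise power), and treats the perturbation (adversarial minus original) as the noise. This is an assumption for illustration: the function name `snr_db` is hypothetical, and the exact formula used in nsga3based.py may differ.

```python
import math


def snr_db(original, adversarial):
    """Signal-to-noise ratio in dB, treating the perturbation as noise.

    Hypothetical helper using the textbook definition; the repository's
    own objective may be computed differently.
    """
    if len(original) != len(adversarial):
        raise ValueError("both signals must have the same number of samples")
    signal_power = sum(float(x) ** 2 for x in original)
    noise_power = sum(
        (float(a) - float(x)) ** 2 for x, a in zip(original, adversarial)
    )
    if noise_power == 0.0:
        return math.inf  # identical signals: no perturbation at all
    if signal_power == 0.0:
        return -math.inf  # silent original: any perturbation dominates
    return 10.0 * math.log10(signal_power / noise_power)
```

A higher value means a quieter perturbation relative to the original speech, so an attack would aim to keep this objective large while changing the transcription.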

Attack on Chinese ASR System: DeepSpeech2(PaddleSpeech)

Install the DeepSpeech2 system first. One implementation of DeepSpeech2 can be found here. This project builds on the PaddlePaddle-based DeepSpeech2 project. The DeepSpeech2 paper is "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin". The project supports training and prediction on Windows and Linux, as well as inference on development boards such as NVIDIA Jetson.
Copy the files sadversarial_tools.py, nsga3based.py, and adversarial_model.py into the DeepSpeech2 project directory, then create and run an attack:
```
python nsga3based.py
```
We use this script to recognize the audio with DeepSpeech2 in order to verify that the attack succeeded:
```
python recognization.py
```
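Once both transcriptions are available, the comparison itself can be scripted. The following is a minimal sketch, not the logic of recognization.py: the helper names `edit_distance` and `attack_succeeded` are hypothetical, and it assumes success means the recognized text exactly matches the intended target text.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute (or keep) a character
                )
            )
        previous = current
    return previous[-1]


def attack_succeeded(recognized, target):
    """Assumed success criterion: exact match after trimming whitespace."""
    return recognized.strip() == target.strip()
```

For the sentence example in this README, `edit_distance("想听歌曲父亲", "想听歌曲母亲")` is 1, since the original and adversarial transcriptions differ in a single character.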

Chinese Adversarial Samples

We encourage readers to listen to our Chinese audio adversarial examples and the corresponding originals in the attacking_samples directory.

Revise Chinese Words in a Sentence

The file chinese_audio.wav is recognized as "想听歌曲父亲" ("I want to hear the song Father") and adversarial_audio.wav is recognized as "想听歌曲母亲" ("I want to hear the song Mother") by the DeepSpeech2 system.

成功加载了预训练模型：models/step_final
ctcloss: 0.94573337 
final_text decoded as:  想听歌曲母亲
Audio similarity to input: 0.9966

Adversarial Attack on Chinese Phrase

The file chinese_audio_phrase.wav is recognized as "可怜好哦" (roughly "pitiful, alright") and adversarial_audio_phrase.wav is recognized as "取款机" ("ATM") by the DeepSpeech2 system.

成功加载了预训练模型：models/step_final
ctcloss: 11.0739765
final_text decoded as: 取款机
Audio similarity to input: 0.8445
