- This is a record of my study of classic speech recognition and speaker recognition algorithms.
- Welcome to my personal tech blog: https://www.cnblogs.com/zy230530/
- If you have any questions, please contact me.
- CLDNN: CONVOLUTIONAL, LONG SHORT-TERM MEMORY, FULLY CONNECTED DEEP NEURAL NETWORKS (Google): https://www.cnblogs.com/zy230530/p/13658385.html
- CTC: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cnblogs.com/zy230530/p/13661661.html
- Listen, Attend and Spell (Google): https://www.cnblogs.com/zy230530/p/13661785.html
- RNN-T: SPEECH RECOGNITION WITH DEEP RECURRENT NEURAL NETWORKS (2013): https://www.cnblogs.com/zy230530/p/13675993.html
- EXPLORING ARCHITECTURES, DATA AND UNITS FOR STREAMING END-TO-END SPEECH RECOGNITION WITH RNN-TRANSDUCER (2018): https://www.cnblogs.com/zy230530/p/13676052.html
- EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING: https://www.cnblogs.com/zy230530/p/13676238.html
- IMPROVING LATENCY-CONTROLLED BLSTM ACOUSTIC MODELS FOR ONLINE SPEECH RECOGNITION: https://www.cnblogs.com/zy230530/p/13677580.html
- Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency: https://www.cnblogs.com/zy230530/p/13677721.html
- Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition: https://www.cnblogs.com/zy230530/p/13677789.html
- Deep-FSMN for Large Vocabulary Continuous Speech Recognition: https://www.cnblogs.com/zy230530/p/13681669.html
- A NOVEL PYRAMIDAL-FSMN ARCHITECTURE WITH LATTICE-FREE MMI FOR SPEECH RECOGNITION: https://www.cnblogs.com/zy230530/p/13681722.html
- SPEECH-TRANSFORMER: A NO-RECURRENCE SEQUENCE-TO-SEQUENCE MODEL FOR SPEECH RECOGNITION: https://www.cnblogs.com/zy230530/p/13681774.html
- THE SPEECHTRANSFORMER FOR LARGE-SCALE MANDARIN CHINESE SPEECH RECOGNITION: https://www.cnblogs.com/zy230530/p/13681892.html
- TRANSFORMER TRANSDUCER: A STREAMABLE SPEECH RECOGNITION MODEL WITH TRANSFORMER ENCODERS AND RNN-T LOSS: https://www.cnblogs.com/zy230530/p/13681954.html
- TRANSFORMER-TRANSDUCER: END-TO-END SPEECH RECOGNITION WITH SELF-ATTENTION: https://www.cnblogs.com/zy230530/p/13682010.html
- A time delay neural network architecture for efficient modeling of long temporal contexts: https://www.cnblogs.com/zy230530/p/13682126.html
- VoxCeleb: a large-scale speaker identification dataset: https://www.cnblogs.com/zy230530/p/13657435.html
- VoxCeleb2: Deep Speaker Recognition: https://www.cnblogs.com/zy230530/p/13657462.html
- End-to-End Text-Dependent Speaker Verification: https://www.cnblogs.com/zy230530/p/13657502.html
- ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION: https://www.cnblogs.com/zy230530/p/13657568.html
- GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION: https://www.cnblogs.com/zy230530/p/13657678.html
- Deep Speaker: an End-to-End Neural Speaker Embedding System: https://www.cnblogs.com/zy230530/p/13657717.html
- TEXT-INDEPENDENT SPEAKER VERIFICATION USING 3D CONVOLUTIONAL NEURAL NETWORKS: https://www.cnblogs.com/zy230530/p/13657771.html
- X-VECTORS: ROBUST DNN EMBEDDINGS FOR SPEAKER RECOGNITION: https://www.cnblogs.com/zy230530/p/13657793.html
- Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification: https://www.cnblogs.com/zy230530/p/13657823.html
- CN-Celeb: A CHALLENGING CHINESE SPEAKER RECOGNITION DATASET: https://www.cnblogs.com/zy230530/p/13715298.html
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition: https://www.cnblogs.com/zy230530/p/13682080.html