[1] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, “State-of-the-art speech recognition with sequence-to-sequence models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[2] S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and K. Rao, “Multilingual speech recognition with a single end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[3] B. Li, T. N. Sainath, K. C. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu, and K. Rao, “Multi-Dialect Speech Recognition With a Single Sequence-to-Sequence Model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[4] T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, and C.-C. Chiu, “No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[5] D. Lawson, C.-C. Chiu, G. Tucker, C. Raffel, K. Swersky, and N. Jaitly, “Learning hard alignments with variational inference,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[6] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[7] R. Prabhavalkar, T. N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C.-C. Chiu, and A. Kannan, “Minimum Word Error Rate Training for Attention-based Sequence-to-sequence Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[8] T. N. Sainath, C.-C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen, and Z. Chen, “Improving the Performance of Online Neural Transducer Models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.
[9] C.-C. Chiu and C. Raffel, “Monotonic Chunkwise Attention,” in Proc. International Conference on Learning Representations (ICLR), 2018.
[10] I. Williams, A. Kannan, P. Aleksic, D. Rybach, and T. N. Sainath, “Contextual Speech Recognition in End-to-End Neural Network Systems using Beam Search,” in Proc. Interspeech, 2018.
[11] C.-C. Chiu, A. Tripathi, K. Chou, C. Co, N. Jaitly, D. Jaunzeikare, A. Kannan, P. Nguyen, H. Sak, A. Sankar, J. Tansuwan, N. Wan, Y. Wu, and X. Zhang, “Speech recognition for medical conversations,” in Proc. Interspeech, 2018.
[12] R. Pang, T. N. Sainath, R. Prabhavalkar, S. Gupta, Y. Wu, S. Zhang, and C.-C. Chiu, “Compression of End-to-End Models,” in Proc. Interspeech, 2018.
[13] S. Toshniwal, A. Kannan, C.-C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, “A comparison of techniques for language model integration in encoder-decoder speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018.
[14] G. Pundak, T. N. Sainath, R. Prabhavalkar, A. Kannan, and D. Zhao, “Deep context: End-to-end contextual speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018.
[15] B. Li, Y. Zhang, T. N. Sainath, Y. Wu, and W. Chan, “Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[16] J. Guo, T. N. Sainath, and R. J. Weiss, “A spelling correction model for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[17] U. Alon, G. Pundak, and T. N. Sainath, “Contextual speech recognition with difficult negative training examples,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[18] Y. Qin, N. Carlini, I. Goodfellow, G. Cottrell, and C. Raffel, “Imperceptible, robust, and targeted adversarial examples for automatic speech recognition,” in Proc. International Conference on Machine Learning (ICML), 2019.
[19] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” arXiv e-prints, 2019.
[20] B. Li, T. N. Sainath, R. Pang, and Z. Wu, “Semi-supervised training for end-to-end models via weak distillation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[21] S.-Y. Chang, R. Prabhavalkar, Y. He, T. N. Sainath, and G. Simko, “Joint endpointing and decoding with end-to-end models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[22] J. Heymann, K. C. Sim, and B. Li, “Improving CTC using stimulated learning for sequence modeling,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[23] A. Bruguier, R. Prabhavalkar, G. Pundak, and T. N. Sainath, “Phoebe: Pronunciation-aware contextualization for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[24] Y. He, T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. C. Sim, T. Bagby, S.-Y. Chang, K. Rao, and A. Gruenstein, “Streaming end-to-end speech recognition for mobile devices,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
[25] K. Irie, R. Prabhavalkar, A. Kannan, A. Bruguier, D. Rybach, and P. Nguyen, “Model unit exploration for sequence-to-sequence speech recognition,” arXiv e-prints, 2019.
[26] C. Peyser, H. Zhang, T. N. Sainath, and Z. Wu, “Improving Performance of End-to-End ASR on Numeric Sequences,” in Proc. Interspeech, 2019.
[27] D. Zhao, T. N. Sainath, D. Rybach, D. Bhatia, B. Li, and R. Pang, “Shallow-fusion end-to-end contextual biasing,” in Proc. Interspeech, 2019.
[28] T. N. Sainath, R. Pang, D. Rybach, Y. He, R. Prabhavalkar, W. Li, M. Visontai, Q. Liang, T. Strohman, Y. Wu, I. McGraw, and C.-C. Chiu, “Two-pass end-to-end speech recognition,” in Proc. Interspeech, 2019.
[29] C.-C. Chiu, W. Han, Y. Zhang, R. Pang, S. Kishchenko, P. Nguyen, A. Narayanan, H. Liao, S. Zhang, A. Kannan, R. Prabhavalkar, Z. Chen, T. N. Sainath, and Y. Wu, “A comparison of end-to-end models for long-form speech recognition,” 2019.
[30] A. Narayanan, R. Prabhavalkar, C.-C. Chiu, D. Rybach, T. N. Sainath, and T. Strohman, “Recognizing long-form speech using streaming end-to-end models,” 2019.
[31] T. N. Sainath, R. Pang, R. Weiss, Y. He, C.-C. Chiu, and T. Strohman, “An attention-based joint acoustic and text on-device end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[32] Z. Lu, L. Cao, Y. Zhang, C.-C. Chiu, and J. Fan, “Speech sentiment analysis via pre-trained features from end-to-end ASR models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[33] D. S. Park, Y. Zhang, C.-C. Chiu, Y. Chen, B. Li, W. Chan, Q. V. Le, and Y. Wu, “SpecAugment on large scale datasets,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.
[34] T. N. Sainath, Y. He, B. Li, A. Narayanan, R. Pang, A. Bruguier, S.-Y. Chang, W. Li, R. Alvarez, Z. Chen, C.-C. Chiu, D. Garcia, A. Gruenstein, K. Hu, M. Jin, A. Kannan, Q. Liang, I. McGraw, C. Peyser, R. Prabhavalkar, G. Pundak, D. Rybach, Y. Shangguan, Y. Sheth, T. Strohman, M. Visontai, Y. Wu, Y. Zhang, and D. Zhao, “A streaming on-device end-to-end model surpassing server-side conventional model quality and latency,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.