<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="generator" content="Hugo 0.88.1" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<link rel="stylesheet" href="css/custom.css">
<link rel="stylesheet" href="css/normalize.css">
<title>SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow</title>
<link href="css/bootstrap.min.css" rel="stylesheet">
</head>
<!--
<body rightmargin="100" leftmargin="100" topmargin="50" bottommargin="100" line-height:160%>
<font size="5"> -->
<body>
<div class="container">
<header role="banner">
</header>
<main role="main">
<article itemscope itemtype="https://schema.org/BlogPosting">
<h1 class="entry-title" itemprop="headline" align="center" ><font color="145d93" ><b>SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow</b></font></h1>
<section itemprop="entry-text">
<br>
<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
<h2 id="abstract" style="text-align:center;"><font color="145d93">Abstract</font></h2>
<style>
.centered-link {
text-align: center;
}
</style>
<!-- <div class="centered-link">
<a href="https://github.com/">CODE LINK</a>
</div> -->
<p style="text-align: justify;"><font color="061E61"> Recently, flow-matching-based speech synthesis has significantly improved the quality of synthesized speech while reducing the number of inference steps. In this paper, we introduce SlimSpeech, a lightweight and efficient speech synthesis system based on rectified flow. We build on an existing rectified-flow speech synthesis method, modifying its structure to reduce the parameter count, and use the resulting model as a teacher. By refining the reflow operation, we directly derive a smaller model with a straighter sampling trajectory from the larger model, and we apply distillation to further improve its performance. Experimental results show that our method, with significantly fewer parameters, achieves performance comparable to larger models with one-step sampling.</font></p>
<!-- <br></br> -->
<!-- <br></br> -->
</div>
<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
<h2 id="TTS" style="text-align:center;"><font color="145d93">Text-to-Speech results</font></h2>
<p style="text-align: justify;"><font color="061E61"> The audio samples below compare ground truth (GT), FastSpeech 2, Grad-TTS, Matcha-TTS, Reflow-TTS, 2-Reflow-TTS, SlimFlow-TTS, and <b>our SlimSpeech (1 step and 4 steps)</b>. Samples are selected from the LJSpeech test set.</font></p>
<!-- <br></br>
<p><b><font color="145d93">LJSpeech: A dataset contains the recordings from a female speaker at a sampling rate of 22.05kHz.</font></b></p> -->
<br><br>
<p><em><font color="061E62">Sample 1: Charles Thomas White, awaiting execution for arson, made a desperate effort to escape from Newgate in 1827.</font></em></p>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">GT</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/vocoder/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS (RK45 Solver)<br> (179 steps, 27.09M)</th>
<th style="font-weight: normal;text-align: center">SlimFlow-TTS (RK45 Solver) <br> (164 steps, 17.61M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (4 steps, 14.86M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-rk45/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimflowtts/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-4step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (4 steps, 18.22M)</th>
<th style="font-weight: normal;text-align: center">2-Reflow-TTS <br> (4 steps, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (4 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-4/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-4step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-4/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">FastSpeech 2 <br> (1 step, 28.83M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (1 step, 14.86M)</th>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (1 step, 18.22M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/fastspeech/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-1step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-1/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS <br> (1 step, 27.09M)</th>
<th style=" font-weight: normal;text-align: center"> 2-Reflow-TTS <br> (1 step, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-1step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-1step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<br><br>
<p><em><font color="061E62">Sample 2: Another helpful feature of unemployment insurance is the incentive it will give to employers to plan more carefully.</font></em></p>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">GT</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/vocoder/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS (RK45 Solver)<br> (179 steps, 27.09M)</th>
<th style="font-weight: normal;text-align: center">SlimFlow-TTS (RK45 Solver) <br> (164 steps, 17.61M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (4 steps, 14.86M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-rk45/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimflowtts/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-4step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (4 steps, 18.22M)</th>
<th style="font-weight: normal;text-align: center">2-Reflow-TTS <br> (4 steps, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (4 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-4/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-4step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-4/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">FastSpeech 2 <br> (1 step, 28.83M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (1 step, 14.86M)</th>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (1 step, 18.22M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/fastspeech/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-1step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-1/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS <br> (1 step, 27.09M)</th>
<th style=" font-weight: normal;text-align: center"> 2-Reflow-TTS <br> (1 step, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-1step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-1step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<br><br>
<p><em><font color="061E62">Sample 3: While it was apparently intended for publication in the United States, and is in many respects critical of certain aspects of life in the Soviet Union,</font></em></p>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">GT</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/vocoder/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS (RK45 Solver)<br> (179 steps, 27.09M)</th>
<th style="font-weight: normal;text-align: center">SlimFlow-TTS (RK45 Solver) <br> (164 steps, 17.61M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (4 steps, 14.86M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-rk45/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimflowtts/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-4step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (4 steps, 18.22M)</th>
<th style="font-weight: normal;text-align: center">2-Reflow-TTS <br> (4 steps, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (4 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-4/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-4step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-4/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">FastSpeech 2 <br> (1 step, 28.83M)</th>
<th style="font-weight: normal;text-align: center">Grad-TTS <br> (1 step, 14.86M)</th>
<th style="font-weight: normal;text-align: center">Matcha-TTS <br> (1 step, 18.22M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/fastspeech/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/gradtts-1step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/matchatts-1/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="font-weight: normal;text-align: center">Reflow-TTS <br> (1 step, 27.09M)</th>
<th style=" font-weight: normal;text-align: center"> 2-Reflow-TTS <br> (1 step, 27.09M)</th>
<th style="text-align: center">SlimSpeech <br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/1-reflowtts-1step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/2-reflowtts-1step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style=" text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
</div>
<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
<h2 id="TTS" style="text-align:center;"><font color="145d93">Ablation study</font></h2>
<p style="text-align: center;"><font color="061E61"> In this section, we conduct ablation experiments to demonstrate the effectiveness of annealing reflow and flow-guided distillation.</font>
<br><br>
<p><em><font color="061E61"> Sample 1: Charles Thomas White, awaiting execution for arson, made a desperate effort to escape from Newgate in 1827.</font></em></p>
<table align="center">
<thead>
<tr>
<th style="text-align: center">student model (RK45 Solver)<br> (96 steps, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o annealing reflow (RK45 Solver)<br> (96 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/annealingflowtts/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/ablation_annealingflowtts/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="text-align: center">SlimSpeech <br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o $\mathcal{L}_{2\text{-step}}$ <br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o $\mathcal{L}_{2\text{-step}} + \mathcal{L}_{Distill}$ <br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/no2step/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/nodistill/LJ016-0008.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<br><br>
<p><em><font color="061E61"> Sample 2: Another helpful feature of unemployment insurance is the incentive it will give to employers to plan more carefully.</font></em></p>
<table align="center">
<thead>
<tr>
<th style="text-align: center">student model (RK45 Solver)<br> (96 steps, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o annealing reflow (RK45 Solver)<br> (96 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/annealingflowtts/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/ablation_annealingflowtts/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="text-align: center">SlimSpeech<br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o $\mathcal{L}_{2\text{-step}}$<br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o $\mathcal{L}_{2\text{-step}} + \mathcal{L}_{Distill}$ <br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/no2step/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/nodistill/LJ022-0068.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<br><br>
<p><em><font color="061E61"> Sample 3: While it was apparently intended for publication in the United States, and is in many respects critical of certain aspects of life in the Soviet Union,</font></em></p>
<table align="center">
<thead>
<tr>
<th style="text-align: center">student model (RK45 Solver)<br> (96 steps, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o annealing reflow (RK45 Solver)<br> (96 steps, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/annealingflowtts/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/ablation_annealingflowtts/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
<table align="center">
<thead>
<tr>
<th style="text-align: center">SlimSpeech<br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center">w/o $\mathcal{L}_{2\text{-step}}$<br> (1 step, 5.48M)</th>
<th style="font-weight: normal;text-align: center"> w/o $\mathcal{L}_{2\text{-step}} + \mathcal{L}_{Distill}$<br> (1 step, 5.48M)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/slimspeech-1/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/no2step/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
<td style="text-align: center"><audio controls="controls" ><source src="SlimSpeech/nodistill/LJ042-0154.wav" autoplay/>Your browser does not support the audio element.</audio></td>
</tr>
</tbody>
</table>
</p>
</div>
</section>
</article>
</main>
</div>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-139981676-1', 'auto');
ga('send', 'pageview');
</script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
HTML: ["input/TeX","output/HTML-CSS"],
TeX: {
Macros: {
bm: ["\\boldsymbol{#1}", 1],
argmax: ["\\mathop{\\rm arg\\,max}\\limits"],
argmin: ["\\mathop{\\rm arg\\,min}\\limits"]},
extensions: ["AMSmath.js","AMSsymbols.js"],
equationNumbers: { autoNumber: "AMS" } },
extensions: ["tex2jax.js"],
jax: ["input/TeX","output/HTML-CSS"],
tex2jax: { inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
processEscapes: true },
"HTML-CSS": { availableFonts: ["TeX"],
linebreaks: { automatic: true } }
});
</script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
}
});
</script>
<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
</body>
</html>