<pre class='metadata'>
Group: AOM
Status: WGD
Title: Immersive Audio Model and Formats
Editor: SungHee Hwang, Samsung, [email protected]
Editor: Felicia Lim, Google, [email protected]
Repository: AOMediaCodec/iamf
Shortname: iamf
URL: https://aomediacodec.github.io/iamf/
Date: 2023-01-09
Abstract: This document specifies an immersive audio (IA) architecture and model, a standalone IA sequence format and an [[!ISOBMFF]]-based IA container format.
</pre>
<pre class="anchors">
url: https://www.iso.org/standard/68960.html#; spec: ISOBMFF; type: dfn;
text: AudioSampleEntry
text: boxtype
text: grouping_type
text: SampleGroupDescriptionEntry
text: channelcount
text: samplerate
text: AudioPreRollEntry
url: https://www.iso.org/standard/68960.html#; spec: ISOBMFF; type: property;
text: iso6
text: sgpd
text: stsd
text: sbgp
text: edts
text: stts
text: prol
url: https://aomedia.org/av1/specification/conventions/; spec: AV1-Convention; type: dfn;
text: leb128()
text: Clip3
url: https://www.iso.org/standard/43345.html#; spec: AAC; type: dfn;
text: raw_data_block()
text: ADTS
text: Low Complexity Profile
url: https://opus-codec.org/docs/opus_in_isobmff.html#; spec: OPUS-IN-ISOBMFF; type: dfn;
text: OpusSpecificBox
text: OutputChannelCount
text: OutputGain
text: ChannelMappingFamily
text: PreSkip
text: InputSampleRate
url: https://opus-codec.org/docs/opus_in_isobmff.html#; spec: OPUS-IN-ISOBMFF; type: property;
text: opus
text: dOps
url: https://www.iso.org/standard/55688.html#; spec: MP4-Systems; type: dfn;
text: objectTypeIndication
text: streamType
text: upstream
text: decSpecificInfo()
text: DecoderConfigDescriptor()
text: Syntatic Description Language
url: https://www.iso.org/standard/76383.html#; spec: MP4-Audio; type: dfn;
text: AudioSpecificConfig()
text: audioObjectType
text: channelConfiguration
text: GASpecificConfig()
text: frameLengthFlag
text: dependsOnCoreCoder
text: extensionFlag
url: https://www.iso.org/standard/79110.html#; spec: MP4; type: dfn;
text: ESDBox
url: https://www.iso.org/standard/79110.html#; spec: MP4; type: property;
text: mp4a
text: esds
url: https://tools.ietf.org/html/rfc6381#; spec: RFC6381; type: property;
text: codecs
url: https://tools.ietf.org/html/rfc8486#; spec: RFC8486; type: dfn;
text: channel count
url: https://tools.ietf.org/html/rfc7845#; spec: RFC7845; type: dfn;
text: ID Header
text: Output Gain
url: https://tools.ietf.org/html/rfc6716#; spec: RFC6716; type: dfn;
text: opus packet
url: https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf#; spec: ITU1770-4; type: dfn;
text: LKFS
url: https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2051-3-202205-I!!PDF-E.pdf#; spec: ITU2051-3; type: dfn;
text: Loudspeaker configuration for Sound System A (0+2+0)
text: Loudspeaker configuration for Sound System B (0+5+0)
text: Loudspeaker configuration for Sound System C (2+5+0)
text: Loudspeaker configuration for Sound System D (4+5+0)
text: Loudspeaker configuration for Sound System E (4+5+1)
text: Loudspeaker configuration for Sound System F (3+7+0)
text: Loudspeaker configuration for Sound System G (4+9+0)
text: Loudspeaker configuration for Sound System H (9+10+3)
text: Loudspeaker configuration for Sound System I (0+7+0)
text: Loudspeaker configuration for Sound System J (4+7+0)
text: SP Label
url: https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2127-0-201906-I!!PDF-E.pdf#; spec: ITU2127-0; type: dfn;
text:
url: https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2076-2-201910-I!!PDF-E.pdf#; spec: ITU2076-2; type: dfn;
text:
url: https://en.wikipedia.org/wiki/Q_(number_format); spec: Q-Format; type: dfn;
text:
url: https://xiph.org/flac/format.html; spec: FLAC; type: dfn;
text: METADATA_BLOCK
text: FRAME
text: FRAME_HEADER
text: SUBFRAME
text: FRAME_FOOTER
url: https://xiph.org/flac/format.html; spec: FLAC; type: property;
text: fLaC
</pre>
<pre class='biblio'>
{
"AI-CAD-Mixing": {
"title": "AI 3D immersive audio codec based on content-adaptive dynamic down-mixing and up-mixing framework",
"status": "Paper",
"publisher": "AES",
"href": "https://www.aes.org/e-lib/browse.cfm?elib=21489"
},
"AAC": {
"title": "Information technology — Generic coding of moving pictures and associated audio information — Part 7: Advanced Audio Coding (AAC)",
"status": "Standard",
"publisher": "ISO/IEC",
"href": "https://www.iso.org/standard/43345.html"
},
"MP4-Audio": {
"title": "Information technology — Coding of audio-visual objects — Part 3: Audio",
"status": "Standard",
"publisher": "ISO/IEC",
"href": "https://www.iso.org/standard/76383.html"
},
"MP4-Systems": {
"title": "Information technology — Coding of audio-visual objects — Part 1: Systems",
"status": "Standard",
"publisher": "ISO/IEC",
"href": "https://www.iso.org/standard/55688.html"
},
"OPUS-IN-ISOBMFF": {
"title": "Encapsulation of Opus in ISO Base Media File Format",
"status": "Best Practice",
"publisher": "IETF",
"href": "https://opus-codec.org/docs/opus_in_isobmff.html"
},
"ITU1770-4": {
"title": "Algorithms to measure audio programme loudness and true-peak audio level",
"status": "Standard",
"publisher": "ITU",
"href": "https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf"
},
"ITU2051-3": {
"title": "Advanced sound system for programme production",
"status": "Standard",
"publisher": "ITU",
"href": "https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2051-3-202205-I!!PDF-E.pdf"
},
"Q-Format": {
"title": "Q (number format)",
"status": "Best Practice",
"publisher": "Wikipedia",
"href": "https://en.wikipedia.org/wiki/Q_(number_format)"
},
"BCP47": {
"title": "BCP 47",
"status": "Best Practice",
"publisher": "IETF",
"href": "https://www.rfc-editor.org/info/bcp47"
},
"FLAC": {
"title": "Free Lossless Audio Codec",
"status": "Best Practice",
"publisher": "xiph.org",
"href": "https://xiph.org/flac/format.html"
},
"AV1-Convention": {
"title": "Conventions",
"status": "Spec",
"publisher": "aomedia.org",
"href": "https://aomedia.org/av1/specification/conventions/"
},
"ITU2076-2": {
"title": "Audio Definition Model",
"status": "Standard",
"publisher": "ITU",
"href": "https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2076-2-201910-I!!PDF-E.pdf"
},
"ITU2127-0": {
"title": "Audio Definition Model renderer for advanced sound systems",
"status": "Standard",
"publisher": "ITU",
"href": "https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2127-0-201906-I!!PDF-E.pdf"
}
}
</pre>
# Convention # {#convention}
## Syntax Description ## {#convention-syntaxstructure}
All syntax elements shall conform to the [=Syntatic Description Language=] specified in [[!MP4-Systems]], unless explicitly described otherwise in this specification.
### Data types ### {#convention-data-types}
<b>leb128()</b> <b>syntaxName</b>
<b>leb128()</b> indicates the type of an unsigned integer. It indicates that the following unsigned integer <b>syntaxName</b> shall be encoded by [=leb128()=] specified in [[!AV1-Convention]].
<b>syntaxName</b> is an unsigned integer which is encoded by [=leb128()=] specified in [[!AV1-Convention]].
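As a non-normative illustration, the variable-length encoding can be sketched in Python as follows. The function names `encode_leb128` and `decode_leb128` are hypothetical. The 8-byte limit and the 32-bit value bound are taken from the leb128() description in [[!AV1-Convention]], and raising an error when they are exceeded is a choice made for this sketch.

```python
def encode_leb128(value):
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if not 0 <= value < (1 << 32):
        raise ValueError("value out of range for leb128()")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # MSB set: more bytes follow
        else:
            out.append(byte)         # MSB clear: last byte
            return bytes(out)


def decode_leb128(data, pos=0):
    """Return (value, number of bytes consumed), reading from data[pos]."""
    value = 0
    for i in range(8):
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("leb128() longer than 8 bytes")
```

For instance, the value 128 does not fit in 7 bits, so it occupies two bytes, with the continuation bit set on the first one.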
<b>sleb128()</b> <b>syntaxName</b>
<b>sleb128()</b> indicates the type of a signed integer. It indicates that the following signed integer <b>syntaxName</b> shall be encoded by [=leb128()=] specified in [[!AV1-Convention]].
<b>syntaxName</b> is a signed integer which is encoded by [=leb128()=] specified in [[!AV1-Convention]].
<b>string</b> <b>syntaxName</b>
<b>string</b> indicates the type of a string that is terminated by a single null byte (i.e. 0x00).
<b>syntaxName</b> is a human readable label whose byte representation shall consist of <b>two-letter primary language subtags</b> and <b>two-letter region subtags</b>, connected by a hyphen ("-"), followed by the byte representation of [=UTF-8_Enc(label)=],
where the <b>two-letter primary language subtags</b> and <b>two-letter region subtags</b> shall conform to [[!BCP47]].
## Arithmetic Operators ## {#convention-arithmetic-operators}
<table class="def">
<tr>
<td>+</td><td>Addition.</td>
</tr>
<tr>
<td>-</td><td>Subtraction.</td>
</tr>
<tr>
<td>*</td><td>Multiplication.</td>
</tr>
<tr>
<td>floor(x)</td><td>The largest integer that is smaller than or equal to x.</td>
</tr>
<tr>
<td>sqrt(x)</td><td>The square root of x.</td>
</tr>
</table>
## Function ## {#convention-function}
### Function templates ### {#convention-function-templates}
When the <b>template</b> keyword is used to decorate the <b>class</b> declaration, it indicates that the code is a template with a placeholder type that can be reused by other classes. Only classes that use the template shall be present in the bitstream; the template itself shall not be present in the bitstream. Classes that use a function template shall pass a data type that is specified in either [[!MP4-Systems]] or [[#convention-data-types]].
<b>Example</b>
```
template <class T>
class Foo {
  T t;
}
class Bar {
  Foo<int> f;
}
```
### Mathematical functions ### {#convention-function-mathematical}
<b>Clip3(x, y, z)</b>
It shall conform to [=Clip3=] specified in [[!AV1-Convention]].
### Function UTF-8 Encoding ### {#convention-function-utf8}
<b>UTF-8_Enc(label)</b>
<dfn values noexport>UTF-8_Enc(label)</dfn> is the byte representation of the encoded <b>label</b>, which is a null-terminated UTF-8 string as defined in [[!RFC3629]].
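A minimal, non-normative sketch of this function in Python is given below. The name `utf8_enc` is hypothetical, and rejecting labels with an embedded null character is an assumption of this sketch, made so that the terminator stays unambiguous.

```python
def utf8_enc(label):
    """Return the UTF-8 bytes of label followed by a single null terminator."""
    encoded = label.encode("utf-8")
    if b"\x00" in encoded:
        # Assumption of this sketch: an embedded null would end the string early.
        raise ValueError("label must not contain a null character")
    return encoded + b"\x00"
```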
# Introduction # {#introduction}
The <dfn noexport>IA sequence</dfn> is a bitstream to represent immersive audio for presentation on a wide range of devices in both dynamic streaming and offline applications. These applications include internet audio streaming, multicasting/broadcasting services, file download, gaming, communication, virtual and augmented reality, and others. In these applications, audio may be played back on a wide range of devices, e.g. headsets, mobile phones, tablets, TVs, sound bars, home theater systems and big screens.
The bitstream comprises a number of coded audio substreams and the metadata that describes how to decode, render and mix the substreams to generate an audio signal for playback. The bitstream format itself is codec-agnostic; any supported audio codec may be used to code the audio substreams.
The immersive audio container (<dfn noexport>IAC</dfn>) is the storage format for an immersive audio (IA) sequence in a single [[!ISOBMFF]] track.
The figure below shows the conceptual IAC architecture.
<center><img src="images/Conceptual IAC Architecture.png"></center>
<center><figcaption>Conceptual IAC Architecture</figcaption></center>
For a given input 3D audio,
- The Pre-Processor generates the Pre-Processed Audio and the Codec Agnostic Metadata for immersive audio (IA).
- The Audio Codec Enc generates the Codec-Dependent Bitstream, which consists of the coded streams obtained by coding the Pre-Processed Audio.
- The File Packager generates the IAC File by encapsulating the IA sequence, which consists of the Codec-Dependent Bitstream and the Codec Agnostic Metadata, into [[!ISOBMFF]] tracks.
- The File Parser reconstructs the IA sequence by decapsulating the IAC File.
- The Audio Codec Dec outputs the decoded Pre-Processed Audio after decoding the Codec-Dependent Bitstream.
- The Post-Processor outputs the Immersive 3D Audio by using the decoded Pre-Processed Audio and the Codec Agnostic Metadata.
The rest of this specification is organized as follows:
- [[#overview]] describes the high level IA sequence architecture and introduces its components.
- [[#obu-syntax]] specifies the syntax and semantics of the top level IA components and detailed IA components.
- [[#profiles]] specifies the profiles for IA sequences and IA decoders.
- [[#standalone]] specifies the representation of a standalone IA sequence.
- [[#isobmff]] specifies the encapsulation of an IA sequence into [[!ISOBMFF]] tracks.
- [[#processing]] specifies how the IA sequence should be decoded to generate the output immersive 3D audio.
- [[#iacgeneration]] provides a guideline for generating the IA sequence.
- [[#iacconsumption]] provides a guideline for consuming the IA sequence, for different use-cases.
# Overview # {#overview}
## IA sequence Components ## {#iab-components}
The IA sequence includes one or more audio elements, each of which consists of one or more audio substreams. The IA sequence further includes mix presentations and parameters.
- <dfn noexport>Audio substream</dfn> is the actual audio signal, which may be encoded with any compatible audio codec.
- <dfn noexport>Audio element</dfn> is the 3D representation of the audio signals, and is constructed from one or more audio substreams and the metadata describing them. The audio substreams associated with one audio element use the same audio codec.
- <dfn noexport>Mix presentations</dfn> contain metadata that describe how the audio elements are rendered and mixed together for playback through physical loudspeakers or headphones. At any given time, only one mix presentation is used for playback. However, multiple mix presentations can be defined as alternatives to each other within the same IA sequence. Furthermore, the choice of which mix presentation to use at playback is left to the user. For example, multi-language support is implemented by defining different mix presentations, where the first mix describes the use of the audio element with English dialogue, and the second mix describes the use of the audio element with French dialogue.
- <dfn noexport>Parameters</dfn> are the values that are associated with the algorithms used for decoding, reconstructing, rendering and mixing. Parameters may change their values over time and may further be animated; for example, any changes in values may be smoothed over some time interval. The rate of change of a parameter is specific to its respective algorithm, and is independent of other algorithms and of the frame rates associated with the audio substreams. As such, a parameter may be viewed as a 1D signal that has different metadata specified for different time intervals.
The figure below shows the relationship between the audio substreams, audio elements and mix presentations and the processing flow to obtain the immersive audio playback.
<center><img src="images/decoding_flow_cropped.png" style="width:100%; height:auto;"></center>
<center><figcaption>Processing flow to decode, reconstruct, render and mix the audio signals for immersive audio playback.</figcaption></center>
## Use of OBU Syntax ## {#use-of-obu}
### Descriptors ### {#descriptors}
The descriptor OBUs contain all the information that is required to set up and configure the decoders, reconstruction algorithms, renderers and mixers.
- <dfn noexport>Magic Code OBU</dfn> indicates the start of a full IA sequence description, and carries its version and profile version.
- <dfn noexport>Codec Config OBU</dfn> describes information to set up a decoder for an audio substream.
- <dfn noexport>Audio Element OBU</dfn> describes information to combine one or more audio substreams to reconstruct an audio element.
- <dfn noexport>Mix Presentation OBU</dfn> describes information to render and mix one or more audio elements to generate the final audio output.
### Data ### {#data}
The data OBUs contain the actual time-varying data that is required in the generation of the final audio output.
The IA sequence supports the description of multiple audio substreams and algorithms, which may have metadata update rates that differ from each other. The update rate for the audio substreams and audio elements is governed by the frame rates of the audio codec used. Since a single bitstream may support multiple codecs, this may lead to multiple different frame rates. The algorithms for rendering and mixing may have parameters whose update rates differ from each other and from the audio frame rates. Therefore, the IA sequence contains information to facilitate the synchronization of the different audio frames and parameters.
- <dfn noexport>Audio Frame OBU</dfn> provides the raw coded audio frame for an audio substream.
- <dfn noexport>Parameter Block OBU</dfn> provides the time-varying parameter values for an algorithm used in any of the decoding, reconstruction, rendering or mixing steps.
- <dfn noexport>Sync OBU</dfn> provides relative timestamp offsets to synchronize audio frames and parameter blocks.
- <dfn noexport>Temporal Delimiter OBU</dfn> identifies the temporal units.
The figure below shows the linking scheme between the [=obu_id=]s in the obu_header and the IDs in the OBU payload.
<center><img src="images/ID Linking Example.png" style="width:100%; height:auto;"></center>
<center><figcaption>ID Linking Scheme</figcaption></center>
In the above figure,
- The Codec Config OBU indicates that there are two audio elements (audio_element_id = 11 and 12) which are coded by using the codec_config() in the OBU.
- The audio element having audio_element_id = 11 is linked to the Audio Element OBU having audio_element_id = 11.
- The Audio Element OBU indicates that there are two substreams (substream_id = 31 and 32) which compose this audio element.
- The audio substream having substream_id = 31 is linked to the Audio Frame OBUs having id = 31.
- The audio substream having substream_id = 32 is linked to the Audio Frame OBUs having id = 32.
- The Audio Element OBU indicates that there is one parameter block (parameter_id = 71) for demixing_info_parameter_data() which is applied to the audio element.
- The parameter block having parameter_id = 71 is linked to the Parameter Block OBU having parameter_id = 71.
- IAC decoders apply the parameter block to the audio substreams after they are decoded by the substream decoders.
- The audio element having audio_element_id = 12 is linked to the Audio Element OBU having obu_id = 12.
- The Audio Element OBU indicates that there is one substream (substream_id = 33) which composes this audio element.
- The audio substream having substream_id = 33 is linked to the Audio Frame OBUs having id = 33.
- The substream decoder decodes the substream.
- The Mix Presentation OBU indicates that there are two audio elements (audio_element_id = 11 and 12) which need to be mixed.
- The audio element having audio_element_id = 11 and the audio element having audio_element_id = 12 are mixed after each of them is decoded.
- Then, IAC decoders may process loudness and DRC controls by using mix_loudness_info() and drc_config().
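The linking scheme described above can be sketched, non-normatively, as a set of lookups in Python. The dictionary layout, the function names and the value `codec_config_id = 1` are hypothetical and only serve the illustration; the other ID values are the ones quoted in the list above.

```python
# Descriptor contents, reduced to the IDs used for linking.
codec_config = {"codec_config_id": 1, "audio_element_ids": [11, 12]}  # id 1 is made up

audio_elements = {
    11: {"substream_ids": [31, 32], "parameter_ids": [71]},
    12: {"substream_ids": [33], "parameter_ids": []},
}

mix_presentation = {"audio_element_ids": [11, 12]}


def substreams_for_mix(mix, elements):
    """Collect, in order, the substream IDs needed to render a mix presentation."""
    ids = []
    for element_id in mix["audio_element_ids"]:
        ids.extend(elements[element_id]["substream_ids"])
    return ids


def parameters_for_mix(mix, elements):
    """Collect the parameter IDs referenced by the audio elements of a mix."""
    ids = []
    for element_id in mix["audio_element_ids"]:
        ids.extend(elements[element_id]["parameter_ids"])
    return ids
```

A decoder following this sketch would route Audio Frame OBUs whose ID is in `substreams_for_mix(...)` to the substream decoders, and Parameter Block OBUs whose ID is in `parameters_for_mix(...)` to the corresponding algorithms.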
# Open Bitstream Unit (OBU) Syntax and Semantics # {#obu-syntax}
## Top Level OBU Syntax and Semantics ## {#top-level-syntax}
The IA sequence uses the OBU syntax.
This section specifies the top-level OBU syntax elements and their semantics.
### Audio OBU Syntax and Semantics ### {#audio-obu}
<b>Syntax</b>
```
class audio_open_bitstream_unit() {
  obu_header();
  if (obu_type == OBU_IA_Magic_Code)
    magic_code_obu();
  else if (obu_type == OBU_IA_Codec_Config)
    codec_config_obu();
  else if (obu_type == OBU_IA_Audio_Element)
    audio_element_obu();
  else if (obu_type == OBU_IA_Mix_Presentation)
    mix_presentation_obu();
  else if (obu_type == OBU_IA_Parameter_Block)
    parameter_block_obu();
  else if (obu_type == OBU_IA_Temporal_Delimiter)
    temporal_delimiter_obu();
  else if (obu_type == OBU_IA_Sync)
    sync_obu();
  else if (obu_type == OBU_IA_Audio_Frame)
    audio_frame_obu_with_no_id();
  else if (obu_type >= 9 && obu_type <= 30)
    audio_frame_obu(obu_type - 9);
  else if (obu_type == 6 || obu_type == 7)
    reserved_obu();
  byte_alignment();
}
```
<b>Semantics</b>
If the syntax element obu_type is equal to OBU_IA_Magic_Code, an ordered series of OBUs is presented to the decoding process as a string of bytes.
OBU data shall start on the first (most significant) bit and shall end on the last bit of the given bytes. The payload of an OBU shall lie between the first bit of the given bytes and the last bit before the first zero bit of the byte_alignment().
### OBU Header Syntax and Semantics ### {#obu-header}
<b>Syntax</b>
```
class obu_header() {
  unsigned int (5) obu_type;
  unsigned int (1) obu_redundant_copy;
  unsigned int (1) obu_trimming_status_flag;
  unsigned int (1) obu_extension_flag;
  leb128() obu_size;
  if (obu_trimming_status_flag) {
    leb128() num_samples_to_trim_at_end;
    leb128() num_samples_to_trim_at_start;
  }
  if (obu_extension_flag == 1)
    leb128() extension_header_size;
}
```
<b>Semantics</b>
OBUs are structured with a header and a payload.
<dfn noexport>obu_type</dfn> specifies the type of data structure contained in the OBU payload.
<pre class = "def">
obu_type: Name of obu_type
0 : OBU_IA_Codec_Config
1 : OBU_IA_Audio_Element
2 : OBU_IA_Mix_Presentation
3 : OBU_IA_Parameter_Block
4 : OBU_IA_Temporal_Delimiter
5 : OBU_IA_Sync
6~7 : Reserved
8 : OBU_IA_Audio_Frame
9~30 : OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID21
31 : OBU_IA_Magic_Code
</pre>
<dfn noexport>obu_redundant_copy</dfn> indicates whether this OBU is a redundant copy of the previous OBU in the IA sequence with the same obu_type. A value of 1 shall indicate that it is a redundant copy, while a value of 0 shall indicate that it is not.
It shall always be set to 0 for the following obu_type values:
- OBU_IA_Temporal_Delimiter
- OBU_IA_Sync
- OBU_IA_Audio_Frame
- OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID21
<dfn noexport>obu_trimming_status_flag</dfn> indicates whether this OBU has audio samples to be trimmed or not. If it is set to 1, the [=num_samples_to_trim_at_start=] and [=num_samples_to_trim_at_end=] fields shall be present.
<dfn noexport>obu_extension_flag</dfn> indicates whether the [=extension_header_size=] field shall be present. If it is set to 0, the [=extension_header_size=] field shall not be present. Otherwise, the [=extension_header_size=] field shall be present.
This flag shall be set to 0 for the current version of the specification (i.e. [=version=] = 0). An IAC-OBU parser which is conformant with the current version of the specification shall be able to parse this flag and [=extension_header_size=].
NOTE: A future version of the specification may use this flag to specify an extension header field by setting [=obu_extension_flag=] = 1 and setting the size of the extended header in [=extension_header_size=].
<dfn noexport>obu_size</dfn> shall indicate the size in bytes of the OBU, not including the bytes of the preceding obu_header fields, i.e. obu_type, obu_redundant_copy, obu_trimming_status_flag and obu_extension_flag.
<dfn noexport>num_samples_to_trim_at_start</dfn> shall indicate the number of samples that need to be trimmed from the start of the samples in this Audio Frame OBU.
<dfn noexport>num_samples_to_trim_at_end</dfn> shall indicate the number of samples that need to be trimmed from the end of the samples in this Audio Frame OBU.
<dfn noexport>extension_header_size</dfn> shall indicate the size in bytes of the extension header including this field.
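The header layout above can be read as in the following non-normative Python sketch. The names `read_leb128` and `parse_obu_header` are hypothetical, and the sketch assumes that the fixed-width fields are packed most significant bit first into the first byte.

```python
def read_leb128(data, pos):
    """Decode one leb128() value at data[pos]; return (value, next position)."""
    value = 0
    for i in range(8):
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("leb128() longer than 8 bytes")


def parse_obu_header(data):
    """Return (header fields as a dict, number of header bytes consumed)."""
    first = data[0]
    header = {
        "obu_type": first >> 3,                    # upper 5 bits
        "obu_redundant_copy": (first >> 2) & 1,
        "obu_trimming_status_flag": (first >> 1) & 1,
        "obu_extension_flag": first & 1,
    }
    pos = 1
    header["obu_size"], pos = read_leb128(data, pos)
    if header["obu_trimming_status_flag"]:
        # The syntax lists the end-trim field before the start-trim field.
        header["num_samples_to_trim_at_end"], pos = read_leb128(data, pos)
        header["num_samples_to_trim_at_start"], pos = read_leb128(data, pos)
    if header["obu_extension_flag"]:
        header["extension_header_size"], pos = read_leb128(data, pos)
    return header, pos
```

For example, the bytes `0x42 0x0A 0x00 0x05` would be read as an OBU_IA_Audio_Frame header with the trimming flag set, an obu_size of 10, no samples trimmed at the end and 5 samples trimmed at the start.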
### Byte Alignment Syntax and Semantics ### {#obu-bytealignment}
<b>Syntax</b>
```
class byte_alignment() {
  while (get_position() & 7)
    unsigned int (1) zero_bit;
}
```
<b>Semantics</b>
<dfn noexport>zero_bit</dfn> shall be equal to 0 and shall be inserted into the bitstream to align the bit position to a multiple of 8 bits.
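As a non-normative illustration, the number of zero bits inserted by byte_alignment() depends only on the current bit position. The function name `num_alignment_bits` is hypothetical, and the sketch assumes that get_position() returns a position counted in bits.

```python
def num_alignment_bits(bit_position):
    """Number of zero_bit fields needed to reach the next multiple of 8 bits."""
    if bit_position < 0:
        raise ValueError("bit position must be non-negative")
    return (-bit_position) % 8
```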
### Reserved OBU Syntax and Semantics ### {#obu-reserved}
The reserved OBU allows the extension of this specification with additional OBU types in a way that allows IAC-OBU parsers compliant with this version of the specification to ignore them.
### Magic Code OBU Syntax and Semantics ### {#obu-magiccode}
This section specifies the OBU payload of OBU_IA_Magic_Code.
For this OBU, the OBU header (2 bytes) shall be set to 0xF806, i.e. obu_type = 31, all three flags set to 0 and obu_size = 6.
<b>Syntax</b>
```
class magic_code_obu() {
  unsigned int (32) ia_code;
  unsigned int (8) version;
  unsigned int (8) profile_version;
}
```
<b>Semantics</b>
<dfn noexport>ia_code</dfn> shall be a ‘four-character code’ (4CC) to identify the start of the IA sequence. It shall be 'iamf'.
<dfn noexport>version</dfn> shall indicate the version of an IA sequence. It shall be set to 0 for this version of the specification. Implementations should treat IA sequences where the MSB four bits of the version number match those of a recognized specification as backwards compatible with that specification. That is, the version number can be split into "major" and "minor" version sub-fields, with changes to the minor sub-field (in the LSB four bits) signaling compatible changes. For example, an implementation of this specification should accept any stream with a version number of '15' or less, and should assume that any stream with a version number of '16' or greater is incompatible.
<dfn noexport>profile_version</dfn> shall indicate the profile of an IA sequence. The MSB four bits shall indicate the profile of an IA sequence. Implementations should treat IA sequences where the MSB four bits of the profile version number match those of a recognized profile as backwards compatible with that profile. That is, the profile version number can be split into "profile major" and "profile minor" version sub-fields, with changes to the minor sub-field (in the LSB four bits) signaling changes that are compatible with the profile major version. The semantics of this field shall be valid only when the MSB four bits of [=version=] = 0.
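The major/minor split described for the version field can be sketched, non-normatively, in Python. The names `split_version` and `is_compatible_version` are hypothetical; the sketch assumes a decoder that recognizes only major version 0, as in this version of the specification.

```python
def split_version(version):
    """Split an 8-bit version into (major, minor): MSB four bits, LSB four bits."""
    if not 0 <= version <= 0xFF:
        raise ValueError("version must fit in 8 bits")
    return version >> 4, version & 0x0F


def is_compatible_version(version, recognized_major=0):
    """True if the major sub-field matches the major version the decoder recognizes."""
    major, _minor = split_version(version)
    return major == recognized_major
```

With `recognized_major = 0`, every version from 0 to 15 is accepted and every version from 16 upwards is rejected, which matches the example given for the version field.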
### Codec Config OBU Syntax and Semantics ### {#obu-codecconfig}
This section specifies the OBU payload of OBU_IA_Codec_Config.
<b>Syntax</b>
```
class codec_config_obu() {
  leb128() codec_config_id;
  leb128() num_audio_elements;
  for (i = 0; i < num_audio_elements; i++) {
    leb128() audio_element_id;
  }
  codec_config();
}
class codec_config() {
  unsigned int (32) codec_id;
  leb128() num_samples_per_frame;
  signed int (16) roll_distance;
  decoder_config(codec_id);
}
```
<b>Semantics</b>
<dfn noexport>codec_config_id</dfn> shall indicate a unique ID in an IA sequence for a given codec config.
<dfn value noexport for="codec_config_obu()">num_audio_elements</dfn> shall specify the number of audio elements that refer to this codec config.
<dfn value noexport for="codec_config_obu()">audio_element_id</dfn> shall specify the unique ID associated with the specific audio element that refers to this codec config.
<dfn noexport>codec_id</dfn> shall be a ‘four-character code’ (4CC) to identify the codec used to generate the audio substreams. It shall be 'opus' for IAC-OPUS, 'mp4a' for IAC-AAC-LC, 'fLaC' for IAC-FLAC and 'lpcm' for IAC-LPCM.
For ISOBMFF encapsulation, it shall be the same as the [=boxtype=] of its AudioSampleEntry, if it exists.
<dfn noexport>num_samples_per_frame</dfn> shall indicate the frame length, in samples, of the raw coded audio provided by audio_frame_obu().
<dfn noexport>roll_distance</dfn> is a signed integer that gives the number of frames that need to be decoded in order for a frame to be decoded correctly. A negative value indicates the number of frames, preceding the frame to be decoded, that need to be decoded first.
- It shall be set to -1 for IAC-AAC-LC and to -R for IAC-OPUS, where R is the smallest integer greater than or equal to 3840 divided by the frame size (e.g. R = 4 when the frame size = 960). IAC-FLAC may ignore this field.
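The computation of R for IAC-OPUS can be sketched, non-normatively, in Python. The function name `opus_roll_distance` is hypothetical; the constant 3840 and the ceiling operation are the ones stated in the rule above.

```python
def opus_roll_distance(frame_size):
    """Return -R, where R is the smallest integer >= 3840 / frame_size."""
    if frame_size <= 0:
        raise ValueError("frame size must be positive")
    r = -(-3840 // frame_size)  # integer ceiling division
    return -r
```

For a frame size of 960 samples this gives R = 4, i.e. a roll_distance of -4, in line with the example in the rule.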
<dfn noexport>decoder_config()</dfn> specifies the set of codec parameters required to decode an audio substream for the given codec_id. It shall be byte aligned.
- The codec_id and decoder_config() for IAC-OPUS shall conform to [=Codec_Specific_Info=] of [[#iac-opus-specific]].
- The codec_id and decoder_config() for IAC-AAC-LC shall conform to [=Codec_Specific_Info=] of [[#iac-aac-lc-specific]].
- The codec_id and decoder_config() for IAC-FLAC shall conform to [=Codec_Specific_Info=] of [[#iac-flac-specific]].
- The codec_id and decoder_config() for IAC-LPCM shall conform to [=Codec_Specific_Info=] of [[#iac-lpcm-specific]].
### Audio Element OBU Syntax and Semantics ### {#obu-audioelement}
This section specifies the OBU payload of OBU_IA_Audio_Element.
<b>Syntax</b>
```
class audio_element_obu() {
  leb128() audio_element_id;
  unsigned int (3) audio_element_type;
  unsigned int (5) reserved;
  leb128() num_substreams;
  for (i = 0; i < num_substreams; i++) {
    leb128() audio_substream_id;
  }
  leb128() num_parameters;
  for (i = 0; i < num_parameters; i++) {
    leb128() param_definition_type;
    if (param_definition_type == PARAMETER_DEFINITION_DEMIXING) {
      DemixingParamDefinition demixing_info;
    }
    if (param_definition_type == PARAMETER_DEFINITION_RECON_GAIN) {
      ReconGainParamDefinition recon_gain_info;
    }
  }
  if (audio_element_type == CHANNEL_BASED) {
    scalable_channel_layout_config();
  } else if (audio_element_type == SCENE_BASED) {
    ambisonics_config();
  }
}
```
```
class DemixingParamDefinition() extends ParamDefinition() {
}
```
```
class ReconGainParamDefinition() extends ParamDefinition() {
}
```
<b>Semantics</b>
<dfn value noexport for="audio_element_obu()">audio_element_id</dfn> shall indicate a unique ID in an IA sequence for a given audio element. A Codec Config OBU that refers to that audio element shall use the same value for its [=audio_element_id=] field.
<dfn noexport>audio_element_type</dfn> shall specify the audio representation of this audio element which is constructed from one or more audio substreams.
<pre class = "def">
audio_element_type: The type of audio representation.
0 : CHANNEL_BASED
1 : SCENE_BASED
2~7 : Reserved
</pre>
<dfn noexport>num_substreams</dfn> shall specify the number of audio substreams that are used to reconstruct this audio element.
<dfn noexport>audio_substream_id</dfn> shall specify the unique ID associated with the audio substream that is used to reconstruct this audio element.
Let a particular ChannelGroup's substream be indexed as [<dfn noexport>c</dfn>, <dfn noexport>n_c</dfn>], where
- [=c=] = [1, ..., C] is the ChannelGroup index and C is the number of ChannelGroups.
- [=n_c=] = [1, ..., N_c] is the substream index in the c-th ChannelGroup and N_c is the number of substreams in the c-th ChannelGroup.
- The i-th audio_substream_id maps to a ChannelGroup's substream as follows, where i is the index of the array:
```
[[1, 1], [1, 2], ..., [1, N_1], [2, 1], [2, 2], ..., [2, N_2], ..., [C, 1], [C, 2], ..., [C, N_C]]
```
A ChannelGroup is defined in [[#iacgeneration]]. The order of the substreams in each ChannelGroup, i.e., the semantics of n_c, is specified in [[#syntax-scalable-channel-layout-config]].
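As an informative illustration, the index mapping described above can be sketched as follows. The function name `map_substreams` and the argument `substreams_per_group` (the list [N_1, ..., N_C]) are hypothetical; the sketch assumes the per-ChannelGroup substream counts are already known to the parser.

```python
def map_substreams(substream_ids, substreams_per_group):
    """Map each audio_substream_id to its 1-based (c, n_c) index pair.

    substream_ids        -- audio_substream_id values in the order they are signalled
    substreams_per_group -- hypothetical list [N_1, ..., N_C] of substream counts
    """
    if sum(substreams_per_group) != len(substream_ids):
        raise ValueError("substream counts do not add up to num_substreams")
    mapping = {}
    i = 0
    for c, n_in_group in enumerate(substreams_per_group, start=1):
        for n_c in range(1, n_in_group + 1):
            # The i-th ID belongs to the n_c-th substream of the c-th ChannelGroup.
            mapping[substream_ids[i]] = (c, n_c)
            i += 1
    return mapping
```

For example, four substream IDs with per-group counts [2, 1, 1] map to the index pairs [1, 1], [1, 2], [2, 1] and [3, 1], in that order.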
<dfn noexport>num_parameters</dfn> shall specify the number of parameters that are used by the algorithms specified in this audio element.
<dfn noexport>param_definition_type</dfn> specifies the type of the parameter definition. All parameter definition types described in this version of the specification are listed in the table below, along with their associated parameter definitions.
<table class = "def">
<tr>
<th>param_definition_type</th><th>Parameter definition type</th><th>Parameter definition</th>
</tr>
<tr>
<td>0</td><td>PARAMETER_DEFINITION_MIX_GAIN</td><td>MixGainParamDefinition</td>
</tr>
<tr>
<td>1</td><td>PARAMETER_DEFINITION_DEMIXING</td><td>DemixingParamDefinition</td>
</tr>
<tr>
<td>2</td><td>PARAMETER_DEFINITION_RECON_GAIN</td><td>ReconGainParamDefinition</td>
</tr>
</table>
<dfn noexport>demixing_info</dfn> provides the parameter definition for the demixing information used to reconstruct channel audio according to [=loudspeaker_layout=] from scalable channel audio. The parameter definition is provided by DemixingParamDefinition() and the corresponding parameter data to be provided in parameter blocks is specified in demixing_info_parameter_data().
<dfn noexport>recon_gain_info</dfn> provides the parameter definition for the gain value used to reconstruct channel audio according to [=loudspeaker_layout=] from scalable channel audio. The parameter definition is provided by ReconGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks is specified in recon_gain_info_parameter_data().
<dfn noexport>scalable_channel_layout_config()</dfn> is a class that provides the metadata required for combining the substreams identified here in order to reconstruct a scalable channel layout.
<dfn noexport>ambisonics_config()</dfn> is a class that provides the metadata required for combining the substreams identified here in order to reconstruct an Ambisonics layout.
### Mix Presentation OBU Syntax and Semantics ### {#obu-mixpresentation}
This section specifies the OBU payload of OBU_IA_Mix_Presentation.
The metadata in mix_presentation_obu() specifies how to render, process and mix one or more audio elements, with details provided in [[#processing-mixpresentation]].
An IA sequence may have one or more mix presentations specified. The IA parser shall select the appropriate mix presentation to process according to the rules specified in [[#processing-mixpresentation-selection]].
A mix presentation may contain one or more sub-mixes. Common use-cases may specify only one sub-mix, which includes all rendered and processed audio elements used in the mix presentation. The use-case for specifying more than one sub-mix arises if an IA multiplexer is merging two or more IA sequences. In this case, it may choose to capture the loudness information from the original IA sequences in multiple sub-mixes, instead of recomputing the loudness information for the final mix.
<b>Syntax</b>
```
class mix_presentation_obu() {
leb128() mix_presentation_id;
mix_presentation_annotations();
leb128() num_sub_mixes;
for (i = 0; i < num_sub_mixes; i++) {
leb128() num_audio_elements;
for (j = 0; j < num_audio_elements; j++) {
leb128() audio_element_id;
mix_presentation_element_annotations();
rendering_config();
element_mix_config();
}
output_mix_config();
leb128() num_layouts;
for (j = 0; j < num_layouts; j++) {
layout loudness_layout;
loudness_info loudness;
}
}
}
```
<b>Semantics</b>
<dfn noexport>mix_presentation_id</dfn> shall indicate a unique ID in an IA sequence for a given mix presentation.
<dfn noexport>mix_presentation_annotations()</dfn> is a class that provides informational metadata that an IA parser should refer to when selecting the mix presentation to use. The metadata may also be used by the playback system to display information to the user, but is not used in the rendering or mixing process to generate the final output audio signal.
<dfn noexport>num_sub_mixes</dfn> specifies the number of sub-mixes.
<dfn value noexport for ="mix_presentation_obu()">num_audio_elements</dfn> shall specify the number of audio elements that are used in this mix presentation to generate the final output audio signal for playback.
<dfn noexport>audio_element_id</dfn> shall indicate the unique ID associated with a specific audio element that is used in this mix presentation.
<dfn noexport>rendering_config()</dfn> is a class that provides the metadata required for rendering the referenced audio element.
<dfn noexport>element_mix_config()</dfn> is a class that provides the metadata required for applying any processing to the referenced and rendered audio element before being summed with other processed audio elements.
<dfn noexport>output_mix_config()</dfn> is a class that provides the metadata required for post-processing the mixed audio signal to generate the audio signal for playback.
<dfn noexport>num_layouts</dfn> specifies the number of layouts on which the loudness information for this sub-mix was measured.
<dfn noexport>loudness_layout</dfn> identifies the layout that was used to measure the loudness information provided in this sub-mix.
<dfn noexport>loudness</dfn> provides the loudness information for the audio elements mixed by this sub-mix, measured on [=loudness_layout=].
The layout specified in [=loudness_layout=] should not be higher than the highest layout among layouts provided by the audio elements. In other words, rendering from an audio element with the highest layout to the [=loudness_layout=] should not require an upmix.
If a sub-mix of a Mix Presentation OBU includes only one scalable channel audio, then it shall comply with the following:
- [=num_layouts=] shall be greater than or equal to [=num_layers=] specified in [=scalable_channel_layout_config()=] of Audio Element OBU for the [=audio_element_id=].
- The set of [=loudness_layout=]s shall include all of the [=loudspeaker_layout=]s specified in the [=channel_audio_layer_config()=]s of Audio Element OBU for the [=audio_element_id=].
The highest [=loudness_layout=] specified in one sub-mix is the layout which was used for authoring the sub-mix.
ISSUE: Loudness_info in scalable_channel_audio_layer is removed instead.
#### Mix Presentation Annotations Syntax and Semantics #### {#obu-mixpresentation-annotation}
<b>Syntax</b>
```
class mix_presentation_annotations() {
string mix_presentation_friendly_label;
}
```
<b>Semantics</b>
<dfn noexport>mix_presentation_friendly_label</dfn> shall specify a human-friendly label to describe this mix presentation.
#### Mix Presentation Element Annotations Syntax and Semantics #### {#obu-mixpresentation-elementannotation}
<b>Syntax</b>
```
class mix_presentation_element_annotations() {
string audio_element_friendly_label;
}
```
<b>Semantics</b>
<dfn noexport>audio_element_friendly_label</dfn> shall specify a human-friendly label to describe the referenced audio element.
#### Output Mix Config Syntax and Semantics #### {#obu-mixpresentation-outputmix}
output_mix_config() provides a gain value to be applied to the mixed audio signal.
<b>Syntax</b>
```
class output_mix_config() {
MixGainParamDefinition output_mix_gain;
}
```
<b>Semantics</b>
<dfn noexport>output_mix_gain</dfn> provides the parameter definition for the gain value that is applied to all channels of the mixed audio signal. The parameter definition is provided by MixGainParamDefinition() and the corresponding parameter data to be provided in parameter blocks is specified in mix_gain_parameter_data().
#### Loudness Info Syntax and Semantics #### {#obu-mixpresentation-loudness}
loudness_info() provides loudness information for a given audio signal.
All signed values are stored as signed Q7.8 fixed-point values (in [[!Q-Format]]).
<b>Syntax</b>
```
class loudness_info() {
unsigned int (8) info_type;
signed int (16) integrated_loudness;
signed int (16) digital_peak;
if (info_type & 1) {
signed int (16) true_peak;
}
}
```
<b>Semantics</b>
<dfn noexport>info_type</dfn> is a bitmask that specifies the type of optional loudness information provided. The bits are set as follows, where the first bit is the LSB:
<pre class = "def">
Bit : Type of information provided
0 : True peak
1~7 : Reserved
</pre>
<dfn noexport>integrated_loudness</dfn> provides the integrated loudness information, specified in [=LKFS=] as defined in [[!ITU1770-4]], and measured according to [[!ITU1770-4]].
<dfn noexport>digital_peak</dfn> specifies the digital (sampled) peak value of the audio signal, specified in dBFS.
<dfn noexport>true_peak</dfn> specifies the true peak of the audio signal, specified in dBFS and measured according to [[!ITU1770-4]].
NOTE: [[!ITU1770-4]] adopts the convention of using the dBov unit for dBFS, where the RMS value of a full-scale square wave is 0 dBov. The same convention is adopted here.
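The following informative sketch shows one way to read a loudness_info() structure and convert its Q7.8 fields. It assumes that multi-byte fields are stored most significant byte first and that signed fields use two's complement; neither assumption is stated in this section. The function name `parse_loudness_info` is hypothetical.

```python
import struct

def parse_loudness_info(payload: bytes) -> dict:
    """Parse loudness_info() from the start of payload.

    Assumes big-endian, two's complement 16-bit fields (an assumption of
    this sketch). Q7.8 values are converted by dividing by 2**8.
    """
    info_type, integrated, digital_peak = struct.unpack_from(">Bhh", payload, 0)
    result = {
        "info_type": info_type,
        "integrated_loudness": integrated / 256.0,  # LKFS
        "digital_peak": digital_peak / 256.0,       # dBFS
    }
    if info_type & 1:
        # Bit 0 (the LSB) signals that true_peak follows.
        (true_peak,) = struct.unpack_from(">h", payload, 5)
        result["true_peak"] = true_peak / 256.0     # dBFS
    return result
```

With this conversion, a raw integrated_loudness value of -6144 corresponds to -24.0 LKFS.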
### Parameter Block OBU Syntax and Semantics ### {#obu-parameterblock}
This section specifies the OBU payload of OBU_IA_Parameter_Block.
The metadata specified in this OBU defines the parameter values for an algorithm for an indicated duration, including any animation of the parameter values over this duration. The metadata shall be used in conjunction with a corresponding parameter definition and parameter data specification. The parameter definition shall be specified based on [=ParamDefinition()=]. The parameter data shall provide the values to apply in each parameter block. These shall be specified using the [=AnimatedParameterData()=] function template if parameter animation is supported.
<b>Syntax</b>
```
class parameter_block_obu() {
leb128() parameter_id;
leb128() duration;
leb128() num_segments;
leb128() constant_segment_interval;
param_definition_type = get_param_definition_type(parameter_id);
for (i = 0; i < num_segments; i++) {
if (constant_segment_interval == 0) {
leb128() segment_interval;
}
if (param_definition_type == PARAMETER_DEFINITION_MIX_GAIN) {
leb128() animation_type;
mix_gain_parameter_data(animation_type);
}
if (param_definition_type == PARAMETER_DEFINITION_DEMIXING) {
demixing_info_parameter_data();
}
if (param_definition_type == PARAMETER_DEFINITION_RECON_GAIN) {
recon_gain_info_parameter_data();
}
}
}
```
<b>Semantics</b>
<dfn value noexport for="parameter_block_obu()">parameter_id</dfn> shall indicate the unique ID that is associated with a specific parameter definition. All parameter blocks that provide data for that parameter definition shall have the same parameter_id.
<dfn noexport>duration</dfn> shall specify the duration for which this parameter block is valid and applicable.
<dfn noexport>num_segments</dfn> shall specify the number of different sets of parameter values specified in this parameter block, where each set describes a different segment of the timeline, contiguously.
<dfn noexport>constant_segment_interval</dfn> shall specify the interval of each segment, in the case where all segments except the last segment have equal intervals. Otherwise, the value of constant_segment_interval shall be set to 0.
<dfn noexport>get_param_definition_type()</dfn> is a run-time function to get the parameter definition type mapped to the parameter_id.
The Audio Element OBU and/or Mix Presentation OBU maps a parameter_id to its parameter definition type, so IA decoders can determine the definition type associated with a given parameter_id.
<dfn noexport>segment_interval</dfn> shall specify the interval for the given segment.
Each value of [=duration=], [=constant_segment_interval=] and [=segment_interval=] shall be expressed as the number of ticks at the rate indicated by the time base specified in the corresponding parameter definition.
- Let <dfn noexport>D</dfn> = the value of [=duration=], <dfn noexport>NS</dfn> = the value of [=num_segments=], <dfn noexport>CSI</dfn> = the value of [=constant_segment_interval=] and <dfn noexport>SI</dfn> = the value of [=segment_interval=].
- When [=CSI=] != 0, [=NS=] x [=CSI=] shall be equal to or greater than [=D=].
- If [=NS=] x [=CSI=] > [=D=], the actual interval of the last segment shall be [=D=] - ([=NS=] - 1) x [=CSI=].
- When [=CSI=] = 0, the summation of all [=SI=]s in this parameter block shall be equal to [=D=].
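The constraints above can be checked with a sketch along the following lines. This is informative only; the function name `segment_intervals` and its argument names are hypothetical, and the checks on the segment count and on a non-positive last interval are extra sanity checks that the text above does not state.

```python
def segment_intervals(duration, num_segments, constant_segment_interval,
                      intervals=None):
    """Return the actual interval of each segment in ticks.

    intervals holds the segment_interval values and is only used when
    constant_segment_interval is 0. Raises ValueError on a violation.
    """
    d, ns, csi = duration, num_segments, constant_segment_interval
    if ns < 1:
        raise ValueError("expected at least one segment (sanity check)")
    if csi != 0:
        if ns * csi < d:
            raise ValueError("NS x CSI shall be equal to or greater than D")
        # The last segment absorbs the remainder: D - (NS - 1) x CSI.
        last = d - (ns - 1) * csi
        if last <= 0:
            raise ValueError("last segment would be empty (sanity check)")
        return [csi] * (ns - 1) + [last]
    if intervals is None or len(intervals) != ns:
        raise ValueError("one segment_interval is required per segment")
    if sum(intervals) != d:
        raise ValueError("the sum of all SI values shall be equal to D")
    return list(intervals)
```

For instance, with D = 100, NS = 4 and CSI = 30, the intervals are 30, 30, 30 and 10 ticks.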
<dfn noexport>animation_type</dfn> specifies the type of animation applied to the parameter values in this parameter block.
<pre class = "def">
animation_type : Animation Type
0 : STEP
1 : LINEAR
2 : BEZIER
</pre>
Classes that take [=animation_type=] as an input argument must use the <dfn noexport>AnimatedParameterData()</dfn> function template. The method of applying the animation is described in [[#processing-animated-params]].
```
template <class T>
class AnimatedParameterData(animation_type) {
if (animation_type == STEP) {
T start_point_value;
}
if (animation_type == LINEAR) {
T start_point_value;
T end_point_value;
}
if (animation_type == BEZIER) {
T start_point_value;
T end_point_value;
T control_point_value;
unsigned int (8) control_point_relative_time;
}
}
```
<dfn noexport>start_point_value</dfn> shall specify the parameter value that is applied at the start of the segment.
<dfn noexport>end_point_value</dfn> shall specify the parameter value that is applied at the end of the segment.
<dfn noexport>control_point_value</dfn> shall specify the parameter value of the middle control point of a quadratic Bezier curve, i.e. its y-axis value.
<dfn noexport>control_point_relative_time</dfn> shall specify the time of the middle control point of a quadratic Bezier curve, i.e. its x-axis value. This value is expressed as a fraction of the parameter segment interval with valid values in the range of 0 to 1, inclusive. A value equal to 0 or 1 shall indicate that this animation implements a linear Bezier curve, in which case control_point_value shall be ignored by the IA parser. It is stored as an 8-bit, unsigned, fixed-point value with 8 fractional bits (i.e. Q0.8 in [[!Q-Format]]).
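The following informative sketch decodes the Q0.8 field and evaluates a quadratic Bezier curve using the standard parametrization, with control points (0, start_point_value), (control_point_relative_time, control_point_value) and (1, end_point_value) on a time axis normalized to the segment interval. It is only an illustration under those assumptions; the method of applying the animation is the one described in [[#processing-animated-params]]. The function names are hypothetical.

```python
import math

def q0_8_to_float(raw: int) -> float:
    """Convert an 8-bit unsigned Q0.8 value to a fraction."""
    if not 0 <= raw <= 255:
        raise ValueError("Q0.8 values occupy exactly 8 bits")
    return raw / 256.0

def bezier_value(start, end, control, control_time, x):
    """Evaluate the curve at normalized time x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie within the segment")
    if control_time in (0.0, 1.0):
        # Linear case: the control point value is ignored.
        return start + (end - start) * x
    # Time along the curve: x(a) = (1 - 2*tc) * a^2 + 2*tc * a.
    alpha = 1.0 - 2.0 * control_time
    if abs(alpha) < 1e-12:
        a = x  # tc = 0.5 makes x(a) = a
    else:
        # Root of alpha*a^2 + 2*tc*a - x = 0 that lies in [0, 1].
        a = (-control_time + math.sqrt(control_time ** 2 + alpha * x)) / alpha
    return (1 - a) ** 2 * start + 2 * (1 - a) * a * control + a ** 2 * end
```

Solving for the curve parameter first is needed because the curve is defined parametrically, so time and value are both functions of the same parameter.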
#### Parameter Definition Syntax and Semantics #### {#parameter-definition}
Parameter definition classes shall inherit from the abstract <dfn noexport>ParamDefinition()</dfn> class. They may optionally further provide default parameter values, which are applied when there are no parameter blocks available.
<b>Syntax</b>
```
abstract class ParamDefinition() {
leb128() parameter_id;
leb128() time_base;
}
```
<b>Semantics</b>
<dfn value noexport for="ParamDefinition()">parameter_id</dfn> shall indicate the unique ID in an IA sequence for a given parameter.
<dfn value noexport for="ParamDefinition()">time_base</dfn> shall specify the time base used by this parameter, expressed as seconds per tick. Time-related fields associated with this parameter, such as durations and intervals, shall be expressed in the number of ticks.
### Audio Frame OBU Syntax and Semantics ### {#obu-audioframe}
This section specifies the OBU payloads of OBU_IA_Audio_Frame and OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID21.
The first 22 audio substreams in an IA sequence may use the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID21, which have predefined audio substream IDs associated with them. This avoids the need to manually specify an audio_substream_id.
<b>Syntax</b>
```
class audio_frame_obu_with_no_id() {
leb128() audio_substream_id;
audio_frame_obu(audio_substream_id);
}
```
```
class audio_frame_obu(audio_substream_id) {
unsigned int (8*coded_frame_size) audio_frame();
}
```
<b>Semantics</b>
<dfn value noexport for="audio_frame_obu_with_no_id()">audio_substream_id</dfn> shall indicate a unique ID in an IA sequence for a given substream. All Audio Frame OBUs of the same substream shall have the same audio_substream_id.
This value must be greater than or equal to 22, in order to avoid collision with the reserved IDs for the OBU types OBU_IA_Audio_Frame_ID0 to OBU_IA_Audio_Frame_ID21.
<dfn noexport>coded_frame_size</dfn> is the size of [=audio_frame()=] in bytes.
<dfn noexport>audio_frame()</dfn> is the raw coded audio data for the frame. It shall be an [=opus packet=] of [[!RFC6716]] for IAC-OPUS, a [=raw_data_block()=] of [[!AAC]] for IAC-AAC-LC and a [=FRAME=] of [[!FLAC]] for IAC-FLAC.
For IAC-LPCM, [=audio_frame()=] shall be LPCM samples. When more than one byte is used to represent an LPCM sample, the byte order shall be little-endian.
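As an informative illustration of the byte order for IAC-LPCM, the sketch below unpacks little-endian samples from an audio_frame() payload. It assumes signed, two's complement samples and a sample size known from the codec configuration; both are assumptions of this sketch, as are the names `decode_lpcm_samples` and `bytes_per_sample`. Channel ordering is not addressed.

```python
def decode_lpcm_samples(audio_frame: bytes, bytes_per_sample: int) -> list:
    """Return the integer sample values stored in audio_frame.

    Samples are read least significant byte first and are assumed to be
    signed (two's complement).
    """
    if bytes_per_sample < 1 or len(audio_frame) % bytes_per_sample != 0:
        raise ValueError("frame size is not a whole number of samples")
    return [
        int.from_bytes(audio_frame[i:i + bytes_per_sample], "little", signed=True)
        for i in range(0, len(audio_frame), bytes_per_sample)
    ]
```

For example, the 16-bit byte pairs 01 00, FF FF and 00 80 decode to 1, -1 and -32768.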