Even though SAMv1.pdf §4.2 limits (via bounds on l_text) the textual size of headers in BAM files to 2 GB, traditionally samtools has treated this field as a uint32_t and been able to read and write BAM files with headers up to 4 GB.
See e.g. samtools/samtools#67 which noted that 0.1.19 had introduced an error when trying to write out such files whereas samtools up to 0.1.18 completed without error. HTSlib-based samtools of the time fixed this problem in 0.2.0-rc4 and samtools versions from 1.0 have been able to read and write BAM files with headers up to 4 GB. See also samtools/samtools#1613 and samtools/hts-specs#460 (comment) which describes this as a “quality-of-implementation” feature of htslib/samtools. To be sure, people working with tens of millions of reference sequences (who thus have >2GB of text headers) are pushing the envelope, but samtools has historically supported them in their endeavours.
(In analysing this history now, I have discovered that, at least as compiled with today's compilers, samtools 0.1.18 and earlier completed without error when writing such BAM files but also did not produce valid output files! It is hard to believe that samtools/samtools#67's reporters did not notice this…)
This changed with the new header API in HTSlib 1.10 due to 62f9909, which adds a 2 GB check when writing BAM files out. (This check is “artificial” — there is no related internal implementation limit, and reading in BAM files with larger headers continues to work.)
This removal of existing samtools functionality was not mentioned in either HTSlib's or samtools's NEWS files. (Probably it was not realised that the existing behaviour for 2 GB … 4 GB existed.)
While this BAM spec extension potentially causes interoperability issues with other BAM implementations, it also enables these samtools users to do their work with rougher assemblies.
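To illustrate the kind of interoperability issue meant here, the following is a minimal sketch of what a hypothetical reader might see if it stored `l_text` in a signed 32-bit variable. The helper names are invented for this example and are not taken from HTSlib or any other BAM implementation; whether a given implementation behaves this way is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: decode four little-endian bytes, as BAM stores l_text. */
static uint32_t le_bytes_to_u32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: the value a reader would obtain if it interpreted the
 * same 32 bits as a two's-complement signed integer. Computed arithmetically
 * to avoid relying on implementation-defined conversions. */
static int64_t u32_as_signed_value(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int64_t)v;
    return (int64_t)v - 4294967296LL;  /* subtract 2^32 */
}
```

For a 3 GiB header, `l_text` is stored as the bytes `00 00 00 C0`. Read as unsigned this is 3221225472, but a signed interpretation yields a negative length, which such a reader would presumably reject or mishandle.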
IMHO either the previous functionality should be restored or the new limitation needs to be called out in NEWS.
This isn't permitted by the BAM specification, but was accepted by
earlier htslib releases. 62f9909 added code to check the maximum
length. This now has a warning at 2GB and a hard failure at 4GB.
Fixes samtools#1420. Fixes samtools/samtools#1613
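
The policy described in this commit message can be sketched roughly as follows. This is not the actual HTSlib code: the function name, the enum, and the exact boundary values (`INT32_MAX` for the warning, `UINT32_MAX` for the failure) are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical result codes; not part of HTSlib's API. */
enum hdr_len_status {
    HDR_LEN_OK   = 0,   /* within the BAM specification's bound */
    HDR_LEN_WARN = 1,   /* beyond the spec, but still fits in 32 bits */
    HDR_LEN_FAIL = -1   /* cannot be represented in the l_text field */
};

/* Classify a header text length before writing it to a BAM file.
 * Assumed thresholds: warn above 2^31 - 1, fail above 2^32 - 1. */
static enum hdr_len_status check_bam_header_length(uint64_t l_text)
{
    if (l_text > UINT32_MAX)
        return HDR_LEN_FAIL;
    if (l_text > (uint64_t)INT32_MAX)
        return HDR_LEN_WARN;
    return HDR_LEN_OK;
}
```

A writer following this sketch would emit a warning and continue on `HDR_LEN_WARN`, and abort the write on `HDR_LEN_FAIL`.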