v0.7.0

@Shelnutt2 released this 30 Dec 20:42

Notable

This release introduces a new schema (v4) for storing VCF data with TileDB. Variants are now stored in a 3D sparse array, indexed by contig, start_pos, and sample. The contig and sample dimensions are now of type TILEDB_STRING_ASCII and leverage optimizations in TileDB core for string coordinates.

Note: Arrays created using previous schemas are still fully supported.

Complete documentation about the new schema is available here.
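To make the v4 indexing concrete, here is a minimal pure-Python sketch of a sparse array keyed by (contig, start_pos, sample). It is only a toy model of the idea and not TileDB-VCF code: the `Coord` type, the `cells` data, and the `query` helper are all hypothetical names invented for illustration, and the real schema's attributes and tiling are not shown.

```python
from bisect import bisect_left
from typing import NamedTuple


class Coord(NamedTuple):
    contig: str     # string dimension
    start_pos: int  # integer dimension
    sample: str     # string dimension


# Hypothetical data: only coordinates that hold a variant are stored (sparse).
cells = {
    Coord("chr1", 100, "sampleA"): "A>T",
    Coord("chr1", 100, "sampleB"): "A>G",
    Coord("chr1", 250, "sampleA"): "C>T",
    Coord("chr2", 100, "sampleA"): "G>C",
}

# Keeping coordinates sorted turns a (contig, position range) lookup into a slice.
sorted_coords = sorted(cells)


def query(contig, lo, hi, samples=None):
    """Return cells on `contig` with lo <= start_pos <= hi, optionally filtered by sample."""
    # "" sorts before every sample name, so these bounds bracket the position range.
    left = bisect_left(sorted_coords, (contig, lo, ""))
    right = bisect_left(sorted_coords, (contig, hi + 1, ""))
    hits = sorted_coords[left:right]
    if samples is not None:
        hits = [c for c in hits if c.sample in samples]
    return [(c, cells[c]) for c in hits]
```

For example, `query("chr1", 50, 150)` returns the two cells at position 100, one per sample, while passing `samples={"sampleA"}` restricts the result along the third dimension.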

A number of other noteworthy improvements are included in this release:

  • New tiledbvcf utils subcommand for consolidating and vacuuming fragment metadata
  • Removal of the registration phase, which is no longer needed 🎉

Core

Changed

  • Switch to 3D array and row-major layout (#194)
  • Use prebuilt artifacts by default for TileDB (#201)
  • Create versioned methods for init_for_reads/next_read_batch (#206)
  • Only fetch vcf headers on a new read sample batch (#207)
  • Sort v4 regions in lexicographical order for writes (#208)
  • Remove contig check from process_query_results_v4() (#210)
  • Update htslib to 1.10 (#230)
  • Support TileDB 2.2 C++ result estimate API changes (#209)
  • Handle all sample export for v4 with optimal range (#221)
  • Don't batch v4 reads by samples (#215)
  • V4 writes should split fragments by contig (#216)
  • Update TileDB to v2.1.4 (#225)
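
Two of the items above, sorting v4 regions lexicographically (#208) and splitting written fragments by contig (#216), can be illustrated with a short sketch. This is not the TileDB-VCF implementation; the `regions` tuples and the `batches` mapping are hypothetical. It only shows that string ordering places "chr10" before "chr2", and that sorted regions fall into one contiguous group per contig.

```python
from itertools import groupby

# Hypothetical regions as (contig, start, end) tuples.
regions = [
    ("chr2", 500, 900),
    ("chr10", 1, 400),
    ("chr1", 300, 700),
    ("chr1", 1, 200),
]

# Tuple comparison orders by contig string first, then start, then end.
sorted_regions = sorted(regions)

# After sorting, each contig's regions are adjacent, so groupby yields
# exactly one batch per contig.
batches = {
    contig: list(group)
    for contig, group in groupby(sorted_regions, key=lambda r: r[0])
}
```

Note that the order is lexicographic rather than numeric: the contigs come out as chr1, chr10, chr2.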

Fixed

  • Add support for ASAN and fix leak in htslib plugin (#181)
  • Fix leaking of hfile* (#197)
  • Reduce the number of times queries open an array (#204)
  • Optimize v2/v3 read queries (#205)
  • Set the contig for ingestion when seeking in a VCF file, so that records are not read beyond a contig's boundaries (#218)

Added

  • Add a memory budget parameter for setting the maximum size of the TileDB query buffer for writes (#217)
  • Add option to disable including vcf header stats (#213)
  • The number of writes performed during each ingestion batch is now included in the verbose output (#219)

Python

Changed

  • Format with black (#199)
  • Add lint check to CI (#202)

Fixed

  • Sort pandas dataframes for unordered comparison in unit tests (#223)