Releases: TileDB-Inc/TileDB-VCF

0.8.3

01 Mar 18:25

Changes:

  • 6e6691d Add python api parameter for skip_check_samples
  • 0d879c7 Make check samples default to on
  • 1032437 Use fixed buffer budget for VCF headers
  • 4c90c0d Adjust info/fmt field mapping to be lazily loaded
  • e2a9d4f Lazily fetch all sample names for v4 arrays
  • 292abca Update Java/Spark versions for 0.8.3 release
  • 13f73ef Merge pull request #283 from TileDB-Inc/ihn/disable-libtiledb-tests
  • 38bbcd9 Don't build libtiledb tests by default

This list of changes was auto generated.

0.8.2

23 Feb 19:16

Changes:

  • 0339c43 Update TileDB to 2.2.4
  • 694f344 Merge pull request #280 from TileDB-Inc/ss/update-spark-java-0.8.2
  • 7d34103 Update Java/Spark versions for 0.8.2 release
  • b897935 Merge pull request #269 from TileDB-Inc/ss/allow-user-setting-tile-cache-buffer-percentages
  • bc1309a Merge pull request #271 from TileDB-Inc/ss/add-java-gc-hint
  • d72ee69 Merge pull request #278 from TileDB-Inc/sethshelnutt/ch5387/optimize-range-merging-for-exit-early
  • 4592851 Add tile cache/buffer size percentage setters
  • 290198d Merge pull request #279 from TileDB-Inc/sethshelnutt/ch5449/502-error-when-reading-from-vcf-array-via
  • 7ff0683 Always reset the TileDB Query buffers
  • 13e1ff0 Merge pull request #275 from TileDB-Inc/ss/recompute-buffer-after-memory-each-setter-call
  • 3a848ca Merge pull request #274 from TileDB-Inc/ss/add-verbose-details-on-double-buffering
  • 58b2fd2 Merge pull request #273 from TileDB-Inc/ss/set-query-config-after-reset
  • f2b4928 Merge pull request #270 from TileDB-Inc/ss/add-percentage-cells-complete-verbose
  • cfe5e50 Merge pull request #272 from TileDB-Inc/ss/clear-c++-buffer-on-read-complete
  • 81364e8 Merge pull request #276 from TileDB-Inc/ss/tiledb-query-timing-verbose
  • 72cb826 Add % of query processed in verbose output
  • a21a593 Limit range merge comparisons
  • 4b9acfd Add query timing to verbose output
  • e3ac72a Always recompute the memory sizes after budget set
  • 52a80d7 Add verbose message on double buffering
  • 7efbf32 Set config on query after reset
  • 0b0fe47 Preemptively clear buffers when read is complete
  • 0855d5d Add GC hint that there is items to garbage collect
  • 8f39bc0 Merge pull request #267 from TileDB-Inc/ss/add-checking-around-map-lookups
  • a9c75ae Merge pull request #268 from TileDB-Inc/ss/reader-memory-budget-verbose
  • 7c6f264 Add verbose output to memory budget calculations
  • 51ac85f Add proper checking for map lookups

This list of changes was auto generated.

0.8.1

09 Feb 18:50

Notable

This release includes several important changes to the way TileDB-VCF allocates the configured memory:

  1. Tile cache size is now allotted 10% of the total memory budget.
  2. The remainder of the budget is then split 75/25 between TileDB Embedded and TileDB-VCF itself. Previously this split was 50/50.
  3. For the APIs, memory is now split 33/66 between their own buffers and the TileDB-VCF C++ code, which increases the memory available for TileDB queries.
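
As a rough illustration of the arithmetic described above, the following Python sketch applies the three rules to a total budget. The function names and dictionary keys are hypothetical and are not part of the TileDB-VCF API. It also assumes that "33/66" means one third versus two thirds, and that the API-level split is computed independently of the first two steps; the release notes do not spell out how the splits compose.

```python
def split_memory_budget(total_mb: float) -> dict:
    """Apply rules 1 and 2 from the notes to a total budget in MB.

    Hypothetical helper for illustration only; not a TileDB-VCF function.
    """
    # Rule 1: the tile cache is allotted 10% of the total budget.
    tile_cache = total_mb * 10 / 100
    remainder = total_mb - tile_cache
    # Rule 2: the remainder is split 75/25 between TileDB Embedded
    # and TileDB-VCF itself (previously 50/50).
    return {
        "tile_cache": tile_cache,
        "tiledb_embedded": remainder * 75 / 100,
        "tiledb_vcf": remainder * 25 / 100,
    }


def split_api_budget(total_mb: float) -> dict:
    """Apply rule 3: API buffers versus the TileDB-VCF C++ code.

    Assumption: "33/66" is read as one third / two thirds.
    """
    api_buffers = total_mb / 3
    return {
        "api_buffers": api_buffers,
        "cpp_library": total_mb - api_buffers,
    }


if __name__ == "__main__":
    # With a 1000 MB budget: 100 MB tile cache, 675 MB for TileDB Embedded,
    # and 225 MB for TileDB-VCF.
    print(split_memory_budget(1000))
    print(split_api_budget(1000))
```

Under these assumptions, most of the configured memory ends up available to TileDB queries, which is the stated goal of the change.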

Core

Changed

  • Update TileDB Embedded to 2.2.3 (#254)
  • TileDB memory budget parameters are now attached directly to the query (#255)
  • Forward CMake variable TileDB_DIR to the TileDB EP so it's used (#259)
  • Skew memory budget towards core library (#261)
  • APIs allocate more of their memory budget to the C++ library (#262)
  • htslib now directly manages counts of info/fmt fields (#263)

Python

Changed

  • Build Chunked Arrow Table in _read_partition() (#226)

0.8.0

26 Jan 21:04

Notable

This updates the version of TileDB Embedded to 2.2.2.

0.7.2

25 Jan 20:03
40228d5

Notable

This release includes several updates to the Python API related to TileDB-VCF dataset creation and sample ingestion. See the Python API reference for complete documentation about these new options.

Core

Changed

  • Double buffering is now only used for queries larger than 200Mb (#250)

Python

Added

  • Gained option to list only materialized attributes (#249)
  • New create_dataset() method that exposes arguments for customizing the creation of new TileDB-VCF datasets (#243)
  • New unit test added for validating larger exports (#220)

Changed

  • The ingest_samples() method gains several new arguments for customizing the number of threads used, thread task size, memory budget, and max buffer size during sample ingestion (#243)

Spark

Added

  • Gained option to list only materialized attributes (#249)
  • Add verbose output and logging of buffer sizes (#247)

0.7.1

06 Jan 17:39

Notable

This release adds the ability to directly ingest VCF/BCF files from remote object stores so it's no longer necessary to allocate temporary scratch space and download them first.

Core

Added

  • Added support for reading samples directly from remote storage (#229)

Changed

  • TileDB was updated to 2.1.6 (#239)
  • CI native library builds are no longer conditioned on the success of the build stage (#238)

Docker

Added

  • The CLI Dockerfile now utilizes multi-stage builds to significantly reduce the image size (#136)

v0.7.0

30 Dec 20:42

Notable

This release introduces a new schema (v4) for storing VCF data with TileDB. Variants are now stored in a 3D sparse array, indexed by contig, start_pos, and sample. The contig and sample dimensions are now of type TILEDB_STRING_ASCII and leverage optimizations in TileDB core for string coordinates.

Note: Arrays created using previous schemas are still fully supported.

Complete documentation about the new schema is available here.
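
To make the new indexing concrete, here is a minimal Python sketch of how cells keyed by (contig, start_pos, sample) order under a row-major, lexicographic comparison. The `Coord` type and the sample data are hypothetical, and this only illustrates coordinate ordering; it does not reproduce how TileDB tiles or stores the array internally.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    """Hypothetical stand-in for a v4 cell coordinate."""
    contig: str      # string dimension
    start_pos: int   # integer dimension
    sample: str      # string dimension


cells = [
    Coord("chr2", 100, "sampleA"),
    Coord("chr1", 500, "sampleB"),
    Coord("chr1", 500, "sampleA"),
    Coord("chr1", 42, "sampleC"),
]

# Tuples compare field by field, so sorting yields row-major order:
# first by contig, then by start_pos, then by sample.
ordered = sorted(cells)

for cell in ordered:
    print(cell.contig, cell.start_pos, cell.sample)
# chr1 42 sampleC
# chr1 500 sampleA
# chr1 500 sampleB
# chr2 100 sampleA

# String dimensions compare lexicographically, so "chr10" sorts before "chr2".
print(sorted(["chr2", "chr10"]))  # ['chr10', 'chr2']
```

Because contig is the leading dimension, all records for one contig are adjacent in this ordering, which fits the per-contig querying and per-contig fragment splitting listed in the changes below.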

A number of other noteworthy improvements are included in this release:

  • New tiledbvcf utils subcommand for consolidating and vacuuming fragment metadata
  • Removal of the registration phase, which is no longer needed 🎉

Core

Changed

  • Switch to 3D array and row-major layout (#194)
  • Use prebuilt artifacts by default for TileDB (#201)
  • Create versioned methods for init_for_reads/next_read_batch (#206)
  • Only fetch vcf headers on a new read sample batch (#207)
  • Sort v4 regions in lexicographical order for writes (#208)
  • Remove contig check from process_query_results_v4() (#210)
  • htslib was updated to 1.10 (#230)
  • Support TileDB 2.2 C++ result estimate API changes (#209)
  • Handle all sample export for v4 with optimal range (#221)
  • Don't batch v4 reads by samples (#215)
  • V4 writes should split fragments by contig (#216)
  • Update TileDB to v2.1.4 (#225)

Fixed

  • Add support for ASAN and fix leak in htslib plugin (#181)
  • Fix leaking of hfile* (#197)
  • Reduce the number of times queries open an array (#204)
  • Optimize v2/v3 read queries (#205)
  • Set the contig for ingestion on seek of VCF file to avoid records seeking beyond a contig's boundaries (#218)

Added

  • Added a new memory budget parameter for setting the max size of the TileDB query buffer for write (#217)
  • Add option to disable including vcf header stats (#213)
  • The number of writes performed during each ingestion batch is now included in the verbose output (#219)

Python

Changed

  • Format with black (#199)
  • Add lint check to CI (#202)

Fixed

  • Sort pandas dataframes for unordered comparison in unit tests (#223)

0.7.0 Beta 3

30 Dec 20:42
a7f86da
Pre-release

Changes:

  • a7f86da Merge pull request #225 from TileDB-Inc/ss/tiledb-2.1.4
  • ec05524 Update to TileDB 2.1.4
  • 3bec8c9 Merge pull request #216 from TileDB-Inc/sethshelnutt/ch4458/ingestion-in-fragments-split-by-contig
  • 4f05c04 V4 writes should split fragments by contig
  • ce113d2 Merge pull request #215 from TileDB-Inc/sethshelnutt/ch4459/remove-sample-batching-from-reads
  • 48e1e0a Merge pull request #217 from TileDB-Inc/ss/add-ingestion-query-buffer-option
  • 1d6d160 Merge pull request #213 from TileDB-Inc/ss/option-disable-vcf-header-stats
  • c28816d Add CLI option for max TileDB buffer size
  • 4378baa Merge pull request #219 from TileDB-Inc/ss/add-verbose-message-ingestion
  • 2a40ed2 Merge pull request #221 from TileDB-Inc/ss/handle-special-case-all-sample-export
  • f1df5f3 Handle all sample export for v4 with optimal range
  • 8de6315 Reset pandas index for python unordered tests
  • a674f7e Merge pull request #223 from TileDB-Inc/ss/python-test-handle-unordered-results
  • 60a775b Merge pull request #218 from TileDB-Inc/ss/store-contig-vcf-ingestion
  • 06fd841 Sort pandas df's for unordered comparison
  • a93c63c Merge pull request #222 from TileDB-Inc/revert-203-ss/workaround-osx-ci-homebrew-openssl
  • 2ed3f3b Revert "Workaround homebrew openssl error for CI"
  • 05570a6 Fix writer init with parameters
  • 00aee70 Add option for enable/disable of vcf header stats
  • 528f0e3 Merge pull request #210 from TileDB-Inc/ss/optimize-v4-processing-with-single-contig-query
  • 8a4ae80 Add write message for verbose ingestion
  • b6e9585 Set the contig for ingestion on seek of VCF file
  • 0b63c64 Don't batch v4 reads by samples
  • c5ae2ea Remove contig check from process_query_results_v4
  • 93ed1d3 Merge pull request #209 from TileDB-Inc/ss/support-tiledb-2.2-est-result-api-change
  • c384d5d Support TileDB 2.2 C++ result estimate API changes
  • e3d50b6 Merge pull request #212 from TileDB-Inc/revert-198-vn/build-chunked-arrow-table
  • 0e163ab Revert "Build Chunked Arrow Table in _read_partition()"
  • f75d34b Merge pull request #198 from TileDB-Inc/vn/build-chunked-arrow-table
  • d75b35d Merge pull request #208 from TileDB-Inc/sethshelnutt/ch4407/3d-array-ingestion-need-to-write-in-lexigraphical
  • 63ad03b Sort regions in lexicographical order for v4 writes
  • 975dad3 Merge pull request #207 from TileDB-Inc/ss/only-fetch-header-on-sample-batch-change
  • b199c0f Only fetch vcf headers on a new read sample batch
  • 9c077d5 Merge pull request #206 from TileDB-Inc/ss/split-read-path-v4
  • 6983823 Split init_for_reads/next_read_batch into versions
  • c16499a Merge pull request #204 from TileDB-Inc/ss/reduce-array-opens
  • 71314c5 Merge pull request #202 from TileDB-Inc/ss/add-python-format-linting
  • 5117f69 Merge pull request #197 from TileDB-Inc/ss/hfile-destroy
  • bfa68ce Merge pull request #205 from TileDB-Inc/ss/v2-v3-query-optimizations
  • 31dd235 Optimize v2/v3 read queries
  • c448dd8 Merge pull request #201 from TileDB-Inc/ss/prebuilt-tiledb
  • 3e80dd4 Merge branch 'master' into vn/build-chunked-arrow-table
  • 323b04e Fix leaking of hfile* on file open errors
  • e16baf2 Modify read() and continue_read() To Take return_type Arg
  • f9ddd66 Add build matrix to test building tiledb from source
  • da55340 Use prebuilt artifacts by default for TileDB
  • 90695d6 Format additional python scripts
  • bc2daea Add python format and linting check
  • 075188c General cleanup of unneeded parameters in Dataset
  • 17e264f Share open data array w/reader and dataset classes
  • 14f4eef Don't reopen the array on every read query
  • defe57e Add data/vcf array pointers to Dataset class
  • b74a06e Merge pull request #203 from TileDB-Inc/ss/workaround-osx-ci-homebrew-openssl
  • f8c2107 Workaround homebrew openssl error for CI
  • 99e85c6 Build Chunked Arrow Table in _read_partition()
  • 2afe1dd Merge pull request #199 from TileDB-Inc/ss/format-python
  • b1422d0 Format with black
  • dc64ce9 ASAN fixes
  • 09bb52c Adjust writer init to allow setting config
  • c549054 Fix region sorting for reads
  • ef3631d Fix edge case in region overflow
  • dc4cf3f Only register samples for v2/v3 datasets in python
  • 4747b4d Switch writer to own a dataset unique_ptr
  • e071db3 Remove registration for v4 arrays
  • 2ea9a69 New v4 test arrays
  • b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
  • b3b4f84 Swich v4 record writting to avoid shared_ptr copy
  • febceb5 General cleanup
  • 9b75294 Cleanup
  • 20c2311 Update V4 unit test arrays
  • 783a9ea Switch dimension ordering to contig,start_pos,sample
  • f81abc3 Switch order of contig and start_pos dimensions
  • e1c7df3 Query ranges by contig
  • 6a732b8 Fix coalescing ranges in v4
  • f0bbbc1 Optimize contig map lookup for processing
  • 7888678 Update python test ordering
  • fdfd77f Fix ordering in cli tests
  • e86c0f9 Workaround est results returning 0 for offests
  • 5895cd9 Update unit-vcf-export tests based on new ordering of partitions
  • 1f47a55 Fix report_cell from rebase
  • 021a37e Test updates for more order issues, pt.3
  • 0c78652 Test updates for more order issues, pt:2
  • 01e0214 Test updates for more order issues
  • 083c0cc Improve run-cli test with context output
  • 5067a8e Switch sample to a string dimension
  • 3239a74 Set java/spark version for 0.7.0-SNAPSHOT
  • 7b50da8 Cleanup
  • 53eda54 Binary search over contig regions and exit early
  • 282ef97 Fix warning for different sign comparison
  • 3028d95 Don't use contig index hash, might be too expensive
  • 44df482 Switch to using point ranges for all sample ids
  • 6fb7cf9 Switch to do not report logic for intersection
  • c3c5951 Add new v4 dataset schema with 3D array.
  • d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
  • 66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
  • 28dbb0e Fix context leak in htslib plugin setup
  • 81c2b21 Add support for ASAN

This list of changes was auto generated.

v0.6.2

17 Dec 16:17

This is a minor release that includes a fix for the performance regression introduced in v0.6.0 and present in v0.6.1. Please note, this is the last version of TileDB-VCF that will create datasets using the current v3 array schema. The next version (v0.7) will introduce a new schema (v4). If you are planning to perform a large ingestion in the near future, we recommend postponing until this new version is released in the coming weeks.

Core

Fixed

  • Optimize v2/v3 array reads by exiting intersection check early when moving past record end position (#205)

0.7.0 Beta 2

25 Nov 21:36
Pre-release

Changes:

  • dc64ce9 ASAN fixes
  • 09bb52c Adjust writer init to allow setting config
  • c549054 Fix region sorting for reads
  • ef3631d Fix edge case in region overflow
  • dc4cf3f Only register samples for v2/v3 datasets in python
  • 4747b4d Switch writer to own a dataset unique_ptr
  • e071db3 Remove registration for v4 arrays
  • 2ea9a69 New v4 test arrays
  • b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
  • b3b4f84 Swich v4 record writting to avoid shared_ptr copy
  • febceb5 General cleanup
  • 9b75294 Cleanup
  • 20c2311 Update V4 unit test arrays
  • 783a9ea Switch dimension ordering to contig,start_pos,sample
  • f81abc3 Switch order of contig and start_pos dimensions
  • e1c7df3 Query ranges by contig
  • 6a732b8 Fix coalescing ranges in v4
  • f0bbbc1 Optimize contig map lookup for processing
  • 7888678 Update python test ordering
  • fdfd77f Fix ordering in cli tests
  • e86c0f9 Workaround est results returning 0 for offests
  • 5895cd9 Update unit-vcf-export tests based on new ordering of partitions
  • 1f47a55 Fix report_cell from rebase
  • 021a37e Test updates for more order issues, pt.3
  • 0c78652 Test updates for more order issues, pt:2
  • 01e0214 Test updates for more order issues
  • 083c0cc Improve run-cli test with context output
  • 5067a8e Switch sample to a string dimension
  • 3239a74 Set java/spark version for 0.7.0-SNAPSHOT
  • 7b50da8 Cleanup
  • 53eda54 Binary search over contig regions and exit early
  • 282ef97 Fix warning for different sign comparison
  • 3028d95 Don't use contig index hash, might be too expensive
  • 44df482 Switch to using point ranges for all sample ids
  • 6fb7cf9 Switch to do not report logic for intersection
  • c3c5951 Add new v4 dataset schema with 3D array.
  • d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
  • 66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
  • 28dbb0e Fix context leak in htslib plugin setup
  • 81c2b21 Add support for ASAN

This list of changes was auto generated.