Releases: TileDB-Inc/TileDB-VCF
0.8.3
Changes:
- 6e6691d Add python api parameter for skip_check_samples
- 0d879c7 Make check samples default to on
- 1032437 Use fixed buffer budget for VCF headers
- 4c90c0d Adjust info/fmt field mapping to be lazily loaded
- e2a9d4f Lazily fetch all sample names for v4 arrays
- 292abca Update Java/Spark versions for 0.8.3 release
- 13f73ef Merge pull request #283 from TileDB-Inc/ihn/disable-libtiledb-tests
- 38bbcd9 Don't build libtiledb tests by default
This list of changes was auto generated.
0.8.2
Changes:
- 0339c43 Update TileDB to 2.2.4
- 694f344 Merge pull request #280 from TileDB-Inc/ss/update-spark-java-0.8.2
- 7d34103 Update Java/Spark versions for 0.8.2 release
- b897935 Merge pull request #269 from TileDB-Inc/ss/allow-user-setting-tile-cache-buffer-percentages
- bc1309a Merge pull request #271 from TileDB-Inc/ss/add-java-gc-hint
- d72ee69 Merge pull request #278 from TileDB-Inc/sethshelnutt/ch5387/optimize-range-merging-for-exit-early
- 4592851 Add tile cache/buffer size percentage setters
- 290198d Merge pull request #279 from TileDB-Inc/sethshelnutt/ch5449/502-error-when-reading-from-vcf-array-via
- 7ff0683 Always reset the TileDB Query buffers
- 13e1ff0 Merge pull request #275 from TileDB-Inc/ss/recompute-buffer-after-memory-each-setter-call
- 3a848ca Merge pull request #274 from TileDB-Inc/ss/add-verbose-details-on-double-buffering
- 58b2fd2 Merge pull request #273 from TileDB-Inc/ss/set-query-config-after-reset
- f2b4928 Merge pull request #270 from TileDB-Inc/ss/add-percentage-cells-complete-verbose
- cfe5e50 Merge pull request #272 from TileDB-Inc/ss/clear-c++-buffer-on-read-complete
- 81364e8 Merge pull request #276 from TileDB-Inc/ss/tiledb-query-timing-verbose
- 72cb826 Add % of query processed in verbose output
- a21a593 Limit range merge comparisons
- 4b9acfd Add query timing to verbose output
- e3ac72a Always recompute the memory sizes after budget set
- 52a80d7 Add verbose message on double buffering
- 7efbf32 Set config on query after reset
- 0b0fe47 Preemptively clear buffers when read is complete
- 0855d5d Add GC hint that there is items to garbage collect
- 8f39bc0 Merge pull request #267 from TileDB-Inc/ss/add-checking-around-map-lookups
- a9c75ae Merge pull request #268 from TileDB-Inc/ss/reader-memory-budget-verbose
- 7c6f264 Add verbose output to memory budget calculations
- 51ac85f Add proper checking for map lookups
This list of changes was auto generated.
0.8.1
Notable
This release includes several important changes to the way TileDB-VCF allocates the configured memory:
- Tile cache size is now allotted 10% of the total memory budget.
- The remainder of the budget is then split 75/25 between TileDB Embedded and TileDB-VCF itself. Previously this split was 50/50.
- For the APIs, memory is now split 33/66 between their own buffers and the TileDB-VCF C++ code, which increases the memory available for TileDB queries.
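The arithmetic behind the split described above can be sketched as follows. This is only an illustration of the percentages listed in these notes, not the library's actual code: the function names, the use of whole megabytes, and the integer rounding are all assumptions, and how the API-level 33/66 split composes with the other two steps is not stated here, so it is shown as a separate helper.

```python
def split_memory_budget(total_mb: int) -> dict:
    """Split a total budget (in MB) using the 0.8.1 percentages.

    Hypothetical helper for illustration; rounding is an assumption.
    """
    # 10% of the total budget is allotted to the tile cache.
    tile_cache_mb = total_mb * 10 // 100
    remainder_mb = total_mb - tile_cache_mb
    # The remainder is split 75/25 between TileDB Embedded and TileDB-VCF.
    tiledb_mb = remainder_mb * 75 // 100
    tiledb_vcf_mb = remainder_mb - tiledb_mb
    return {
        "tile_cache_mb": tile_cache_mb,
        "tiledb_mb": tiledb_mb,
        "tiledb_vcf_mb": tiledb_vcf_mb,
    }


def split_api_budget(total_mb: int) -> dict:
    """Split an API-level budget roughly 33/66 (one third / two thirds).

    Hypothetical helper; treating 33/66 as exact thirds is an assumption.
    """
    api_buffers_mb = total_mb // 3
    cpp_mb = total_mb - api_buffers_mb
    return {"api_buffers_mb": api_buffers_mb, "cpp_mb": cpp_mb}


if __name__ == "__main__":
    print(split_memory_budget(2000))
    print(split_api_budget(3000))
```

With a 2000 MB budget this gives 200 MB to the tile cache, 1350 MB to TileDB Embedded, and 450 MB to TileDB-VCF. Under the previous 50/50 split, the same 1800 MB remainder would have been divided 900/900.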
Core
Changed
- Update TileDB Embedded to 2.2.3 (#254)
- TileDB memory budget parameters are now attached directly to the query (#255)
- Forward CMake variable TileDB_DIR to the TileDB EP so it's used (#259)
- Skew memory budget towards core library (#261)
- APIs allocate more memory to the C++ code (#262)
- htslib now directly manages counts of info/fmt fields (#263)
Python
Changed
- Build Chunked Arrow Table in _read_partition() (#226)
0.8.0
0.7.2
Notable
This release includes several updates to the Python API related to TileDB-VCF dataset creation and sample ingestion. See the Python API reference for complete documentation about these new options.
Core
Changed
- Double buffering is now only used for queries larger than 200 MB (#250)
Python
Added
- Gained option to list only materialized attributes (#249)
- New create_dataset() method that exposes arguments for customizing the creation of new TileDB-VCF datasets (#243)
- New unit test added for validating larger exports (#220)
Changed
- The ingest_samples() method gains several new arguments for customizing the number of threads used, thread task size, memory budget, and max buffer size during sample ingestion (#243)
Spark
Added
0.7.1
Notable
This release adds the ability to directly ingest VCF/BCF files from remote object stores so it's no longer necessary to allocate temporary scratch space and download them first.
Core
Added
- Added support for reading samples directly from remote storage (#229)
Changed
- TileDB was updated to 2.1.6 (#239)
- CI native library builds are no longer conditioned on the success of the build stage (#238)
Docker
Added
- The CLI Dockerfile now utilizes multi-stage builds to significantly reduce the image size (#136)
v0.7.0
Notable
This release introduces a new schema (v4) for storing VCF data with TileDB. Variants are now stored in a 3D sparse array, indexed by contig, start_pos, and sample. The contig and sample dimensions are now of type TILEDB_STRING_ASCII and leverage optimizations in TileDB core for string coordinates.
Note: Arrays created using previous schemas are still fully supported.
Complete documentation about the new schema is available here.
A number of other noteworthy improvements are included in this release:
- New tiledbvcf utils subcommand for consolidating and vacuuming fragment metadata
- Removal of the registration phase, which is no longer needed 🎉
Core
Changed
- Switch to 3D array and row-major layout (#194)
- Use prebuilt artifacts by default for TileDB (#201)
- Create versioned methods for init_for_reads/next_read_batch (#206)
- Only fetch vcf headers on a new read sample batch (#207)
- Sort v4 regions in lexicographical order for writes (#208)
- Remove contig check from process_query_results_v4() (#210)
- htslib was updated to 1.10 (#230)
- Support TileDB 2.2 C++ result estimate API changes (#209)
- Handle all sample export for v4 with optimal range (#221)
- Don't batch v4 reads by samples (#215)
- V4 writes should split fragments by contig (#216)
- Update TileDB to v2.1.4 (#225)
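To illustrate the lexicographical ordering mentioned in the list above (#208): with contig stored as a string dimension, contig names compare as strings, so the order differs from the numeric chromosome order many users expect. The sketch below is plain Python and not TileDB-VCF code; the region tuples and the function name are hypothetical.

```python
def sort_regions_lexicographically(regions):
    """Sort (contig, start, end) tuples by contig string, then by position.

    Hypothetical helper for illustration only.
    """
    return sorted(regions, key=lambda r: (r[0], r[1], r[2]))


if __name__ == "__main__":
    regions = [
        ("chr2", 100, 200),
        ("chr10", 50, 80),
        ("chr1", 500, 900),
        ("chr1", 10, 20),
    ]
    for region in sort_regions_lexicographically(regions):
        print(region)
```

Note that "chr10" sorts before "chr2" because the comparison is character by character ('1' < '2'), not numeric.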
Fixed
- Add support for ASAN and fix leak in htslib plugin (#181)
- Fix leaking of hfile* (#197)
- Reduce the number of times queries open an array (#204)
- Optimize v2/v3 read queries (#205)
- Set the contig for ingestion on seek of VCF file to avoid records seeking beyond a contig's boundaries (#218)
Added
- Added a new memory budget parameter for setting the max size of the TileDB query buffer for write (#217)
- Add option to disable including vcf header stats (#213)
- The number of writes performed during each ingestion batch is now included in the verbose output (#219)
Python
Changed
Fixed
- Sort pandas dataframes for unordered comparison in unit tests (#223)
0.7.0 Beta 3
Changes:
- a7f86da Merge pull request #225 from TileDB-Inc/ss/tiledb-2.1.4
- ec05524 Update to TileDB 2.1.4
- 3bec8c9 Merge pull request #216 from TileDB-Inc/sethshelnutt/ch4458/ingestion-in-fragments-split-by-contig
- 4f05c04 V4 writes should split fragments by contig
- ce113d2 Merge pull request #215 from TileDB-Inc/sethshelnutt/ch4459/remove-sample-batching-from-reads
- 48e1e0a Merge pull request #217 from TileDB-Inc/ss/add-ingestion-query-buffer-option
- 1d6d160 Merge pull request #213 from TileDB-Inc/ss/option-disable-vcf-header-stats
- c28816d Add CLI option for max TileDB buffer size
- 4378baa Merge pull request #219 from TileDB-Inc/ss/add-verbose-message-ingestion
- 2a40ed2 Merge pull request #221 from TileDB-Inc/ss/handle-special-case-all-sample-export
- f1df5f3 Handle all sample export for v4 with optimal range
- 8de6315 Reset pandas index for python unordered tests
- a674f7e Merge pull request #223 from TileDB-Inc/ss/python-test-handle-unordered-results
- 60a775b Merge pull request #218 from TileDB-Inc/ss/store-contig-vcf-ingestion
- 06fd841 Sort pandas df's for unordered comparison
- a93c63c Merge pull request #222 from TileDB-Inc/revert-203-ss/workaround-osx-ci-homebrew-openssl
- 2ed3f3b Revert "Workaround homebrew openssl error for CI"
- 05570a6 Fix writer init with parameters
- 00aee70 Add option for enable/disable of vcf header stats
- 528f0e3 Merge pull request #210 from TileDB-Inc/ss/optimize-v4-processing-with-single-contig-query
- 8a4ae80 Add write message for verbose ingestion
- b6e9585 Set the contig for ingestion on seek of VCF file
- 0b63c64 Don't batch v4 reads by samples
- c5ae2ea Remove contig check from process_query_results_v4
- 93ed1d3 Merge pull request #209 from TileDB-Inc/ss/support-tiledb-2.2-est-result-api-change
- c384d5d Support TileDB 2.2 C++ result estimate API changes
- e3d50b6 Merge pull request #212 from TileDB-Inc/revert-198-vn/build-chunked-arrow-table
- 0e163ab Revert "Build Chunked Arrow Table in _read_partition()"
- f75d34b Merge pull request #198 from TileDB-Inc/vn/build-chunked-arrow-table
- d75b35d Merge pull request #208 from TileDB-Inc/sethshelnutt/ch4407/3d-array-ingestion-need-to-write-in-lexigraphical
- 63ad03b Sort regions in lexicographical order for v4 writes
- 975dad3 Merge pull request #207 from TileDB-Inc/ss/only-fetch-header-on-sample-batch-change
- b199c0f Only fetch vcf headers on a new read sample batch
- 9c077d5 Merge pull request #206 from TileDB-Inc/ss/split-read-path-v4
- 6983823 Split init_for_reads/next_read_batch into versions
- c16499a Merge pull request #204 from TileDB-Inc/ss/reduce-array-opens
- 71314c5 Merge pull request #202 from TileDB-Inc/ss/add-python-format-linting
- 5117f69 Merge pull request #197 from TileDB-Inc/ss/hfile-destroy
- bfa68ce Merge pull request #205 from TileDB-Inc/ss/v2-v3-query-optimizations
- 31dd235 Optimize v2/v3 read queries
- c448dd8 Merge pull request #201 from TileDB-Inc/ss/prebuilt-tiledb
- 3e80dd4 Merge branch 'master' into vn/build-chunked-arrow-table
- 323b04e Fix leaking of hfile* on file open errors
- e16baf2 Modify read() and continue_read() To Take return_type Arg
- f9ddd66 Add build matrix to test building tiledb from source
- da55340 Use prebuilt artifacts by default for TileDB
- 90695d6 Format additional python scripts
- bc2daea Add python format and linting check
- 075188c General cleanup of unneeded parameters in Dataset
- 17e264f Share open data array w/reader and dataset classes
- 14f4eef Don't reopen the array on every read query
- defe57e Add data/vcf array pointers to Dataset class
- b74a06e Merge pull request #203 from TileDB-Inc/ss/workaround-osx-ci-homebrew-openssl
- f8c2107 Workaround homebrew openssl error for CI
- 99e85c6 Build Chunked Arrow Table in _read_partition()
- 2afe1dd Merge pull request #199 from TileDB-Inc/ss/format-python
- b1422d0 Format with black
- dc64ce9 ASAN fixes
- 09bb52c Adjust writer init to allow setting config
- c549054 Fix region sorting for reads
- ef3631d Fix edge case in region overflow
- dc4cf3f Only register samples for v2/v3 datasets in python
- 4747b4d Switch writer to own a dataset unique_ptr
- e071db3 Remove registration for v4 arrays
- 2ea9a69 New v4 test arrays
- b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
- b3b4f84 Swich v4 record writting to avoid shared_ptr copy
- febceb5 General cleanup
- 9b75294 Cleanup
- 20c2311 Update V4 unit test arrays
- 783a9ea Switch dimension ordering to contig,start_pos,sample
- f81abc3 Switch order of contig and start_pos dimensions
- e1c7df3 Query ranges by contig
- 6a732b8 Fix coalescing ranges in v4
- f0bbbc1 Optimize contig map lookup for processing
- 7888678 Update python test ordering
- fdfd77f Fix ordering in cli tests
- e86c0f9 Workaround est results returning 0 for offests
- 5895cd9 Update unit-vcf-export tests based on new ordering of partitions
- 1f47a55 Fix report_cell from rebase
- 021a37e Test updates for more order issues, pt.3
- 0c78652 Test updates for more order issues, pt:2
- 01e0214 Test updates for more order issues
- 083c0cc Improve run-cli test with context output
- 5067a8e Switch sample to a string dimension
- 3239a74 Set java/spark version for 0.7.0-SNAPSHOT
- 7b50da8 Cleanup
- 53eda54 Binary search over contig regions and exit early
- 282ef97 Fix warning for different sign comparison
- 3028d95 Don't use contig index hash, might be too expensive
- 44df482 Switch to using point ranges for all sample ids
- 6fb7cf9 Switch to do not report logic for intersection
- c3c5951 Add new v4 dataset schema with 3D array.
- d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
- 66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
- 28dbb0e Fix context leak in htslib plugin setup
- 81c2b21 Add support for ASAN
This list of changes was auto generated.
v0.6.2
This is a minor release that includes a fix for the performance regression introduced in v0.6.0 and present in v0.6.1. Please note, this is the last version of TileDB-VCF that will create datasets using the current v3 array schema. The next version (v0.7) will introduce a new schema (v4). If you are planning to perform a large ingestion in the near future, we recommend postponing until this new version is released in the coming weeks.
Core
Fixed
- Optimize v2/v3 array reads by exiting intersection check early when moving past record end position (#205)
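The general idea of exiting an intersection check early can be sketched as below. This is a simplified Python illustration, not the C++ implementation from #205: it assumes the query regions are sorted by start position and use inclusive coordinates, and the function name is hypothetical.

```python
def find_overlapping_regions(record_start, record_end, regions):
    """Return the regions that overlap the record [record_start, record_end].

    `regions` is a list of (start, end) tuples sorted by start, inclusive.
    Hypothetical helper for illustration only.
    """
    hits = []
    for region_start, region_end in regions:
        if region_start > record_end:
            # Regions are sorted by start, so every later region also
            # begins past the record's end: stop checking.
            break
        if region_end >= record_start:
            hits.append((region_start, region_end))
    return hits


if __name__ == "__main__":
    regions = [(1, 10), (5, 20), (30, 40), (50, 60)]
    print(find_overlapping_regions(8, 32, regions))
    print(find_overlapping_regions(21, 29, regions))
```

Without the early exit, every record would be compared against every remaining region, even though none of them can match once a region starts beyond the record's end.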
0.7.0 Beta 2
Changes:
- dc64ce9 ASAN fixes
- 09bb52c Adjust writer init to allow setting config
- c549054 Fix region sorting for reads
- ef3631d Fix edge case in region overflow
- dc4cf3f Only register samples for v2/v3 datasets in python
- 4747b4d Switch writer to own a dataset unique_ptr
- e071db3 Remove registration for v4 arrays
- 2ea9a69 New v4 test arrays
- b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
- b3b4f84 Swich v4 record writting to avoid shared_ptr copy
- febceb5 General cleanup
- 9b75294 Cleanup
- 20c2311 Update V4 unit test arrays
- 783a9ea Switch dimension ordering to contig,start_pos,sample
- f81abc3 Switch order of contig and start_pos dimensions
- e1c7df3 Query ranges by contig
- 6a732b8 Fix coalescing ranges in v4
- f0bbbc1 Optimize contig map lookup for processing
- 7888678 Update python test ordering
- fdfd77f Fix ordering in cli tests
- e86c0f9 Workaround est results returning 0 for offests
- 5895cd9 Update unit-vcf-export tests based on new ordering of partitions
- 1f47a55 Fix report_cell from rebase
- 021a37e Test updates for more order issues, pt.3
- 0c78652 Test updates for more order issues, pt:2
- 01e0214 Test updates for more order issues
- 083c0cc Improve run-cli test with context output
- 5067a8e Switch sample to a string dimension
- 3239a74 Set java/spark version for 0.7.0-SNAPSHOT
- 7b50da8 Cleanup
- 53eda54 Binary search over contig regions and exit early
- 282ef97 Fix warning for different sign comparison
- 3028d95 Don't use contig index hash, might be too expensive
- 44df482 Switch to using point ranges for all sample ids
- 6fb7cf9 Switch to do not report logic for intersection
- c3c5951 Add new v4 dataset schema with 3D array.
- d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
- 66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
- 28dbb0e Fix context leak in htslib plugin setup
- 81c2b21 Add support for ASAN
This list of changes was auto generated.