Releases · TileDB-Inc/TileDB-VCF

0.8.3

Released 01 Mar 18:25 by aaronwolen (tag 0.8.3, commit 6e6691d)

Changes:

6e6691d Add python api parameter for skip_check_samples
0d879c7 Make check samples default to on
1032437 Use fixed buffer budget for VCF headers
4c90c0d Adjust info/fmt field mapping to be lazily loaded
e2a9d4f Lazily fetch all sample names for v4 arrays
292abca Update Java/Spark versions for 0.8.3 release
13f73ef Merge pull request #283 from TileDB-Inc/ihn/disable-libtiledb-tests
38bbcd9 Don't build libtiledb tests by default

This list of changes was auto generated.

0.8.2

Released 23 Feb 19:16 by aaronwolen (tag 0.8.2, commit 0339c43)

Changes:

0339c43 Update TileDB to 2.2.4
694f344 Merge pull request #280 from TileDB-Inc/ss/update-spark-java-0.8.2
7d34103 Update Java/Spark versions for 0.8.2 release
b897935 Merge pull request #269 from TileDB-Inc/ss/allow-user-setting-tile-cache-buffer-percentages
bc1309a Merge pull request #271 from TileDB-Inc/ss/add-java-gc-hint
d72ee69 Merge pull request #278 from TileDB-Inc/sethshelnutt/ch5387/optimize-range-merging-for-exit-early
4592851 Add tile cache/buffer size percentage setters
290198d Merge pull request #279 from TileDB-Inc/sethshelnutt/ch5449/502-error-when-reading-from-vcf-array-via
7ff0683 Always reset the TileDB Query buffers
13e1ff0 Merge pull request #275 from TileDB-Inc/ss/recompute-buffer-after-memory-each-setter-call
3a848ca Merge pull request #274 from TileDB-Inc/ss/add-verbose-details-on-double-buffering
58b2fd2 Merge pull request #273 from TileDB-Inc/ss/set-query-config-after-reset
f2b4928 Merge pull request #270 from TileDB-Inc/ss/add-percentage-cells-complete-verbose
cfe5e50 Merge pull request #272 from TileDB-Inc/ss/clear-c++-buffer-on-read-complete
81364e8 Merge pull request #276 from TileDB-Inc/ss/tiledb-query-timing-verbose
72cb826 Add % of query processed in verbose output
a21a593 Limit range merge comparisons
4b9acfd Add query timing to verbose output
e3ac72a Always recompute the memory sizes after budget set
52a80d7 Add verbose message on double buffering
7efbf32 Set config on query after reset
0b0fe47 Preemptively clear buffers when read is complete
0855d5d Add GC hint that there is items to garbage collect
8f39bc0 Merge pull request #267 from TileDB-Inc/ss/add-checking-around-map-lookups
a9c75ae Merge pull request #268 from TileDB-Inc/ss/reader-memory-budget-verbose
7c6f264 Add verbose output to memory budget calculations
51ac85f Add proper checking for map lookups

This list of changes was auto generated.

0.8.1

Released 09 Feb 18:50 by aaronwolen (tag 0.8.1, commit 2410182)

Notable

This release includes several important changes to the way TileDB-VCF allocates the configured memory:

Tile cache size is now allotted 10% of the total memory budget.
The remainder of the budget is then split 75/25 between TileDB Embedded and TileDB-VCF itself. Previously this split was 50/50.
For the APIs, memory is now split 33/66 between their own buffers and the TileDB-VCF C++ code, increasing the memory available for TileDB queries.
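The allocation rules above are plain percentage arithmetic, sketched below in Python. The helpers `split_core_budget` and `split_api_budget` are hypothetical and are not part of the TileDB-VCF API; they only restate the percentages from these notes. The "33/66" API split is read as roughly one third / two thirds, and how the API split composes with the core split is an assumption left open, since the notes do not say.

```python
# Hypothetical helpers illustrating the 0.8.1 memory-split arithmetic.
# Not TileDB-VCF code: the percentages are taken from the release notes.


def split_core_budget(total_mb: float) -> dict:
    """Split a total memory budget per the 0.8.1 core rules."""
    tile_cache = 0.10 * total_mb        # 10% of the total goes to the tile cache
    remainder = total_mb - tile_cache   # the remaining 90% is shared
    return {
        "tile_cache": tile_cache,
        "tiledb_embedded": 0.75 * remainder,  # 75% of the remainder (was 50%)
        "tiledb_vcf": 0.25 * remainder,       # 25% of the remainder (was 50%)
    }


def split_api_budget(api_mb: float) -> dict:
    """Split an API-level budget one third to API buffers, two thirds to C++.

    Assumption: "33/66" is interpreted as 1/3 and 2/3 of the budget.
    """
    api_buffers = api_mb / 3.0
    return {"api_buffers": api_buffers, "cpp": api_mb - api_buffers}
```

For example, a 1,000 MB budget yields 100 MB of tile cache, 675 MB for TileDB Embedded and 225 MB for TileDB-VCF, whereas the previous 50/50 split would have given each side 450 MB.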

Core

Changed

Update TileDB Embedded to 2.2.3 (#254)
TileDB memory budget parameters are now attached directly to the query (#255)
Forward the CMake variable TileDB_DIR to the TileDB external project (EP) so it is used (#259)
Skew memory budget towards core library (#261)
APIs allocate more memory to the C++ memory (#262)
htslib now directly manages counts of info/fmt fields (#263)

Python

Changed

Build Chunked Arrow Table in _read_partition() (#226)

0.8.0

Released 26 Jan 21:04 by aaronwolen (tag 0.8.0, commit 31a179b)

Notable

This release updates TileDB Embedded to 2.2.2.

0.7.2

Released 25 Jan 20:03 by aaronwolen (tag 0.7.2, commit 40228d5)

Notable

This release includes several updates to the Python API related to TileDB-VCF dataset creation and sample ingestion. See the Python API reference for complete documentation about these new options.

Core

Changed

Double buffering is now only used for queries larger than 200 MB (#250)
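The threshold rule stated above can be expressed as a one-line predicate. This is a hypothetical sketch, not the TileDB-VCF implementation: the constant and function names are invented, and interpreting "200 MB" as 200 × 1024 × 1024 bytes is an assumption, since the notes do not say whether decimal or binary megabytes are meant.

```python
# Hypothetical sketch of the 0.7.2 rule: double buffering only for large queries.
# Assumption: 200 MB means 200 * 1024 * 1024 bytes.
DOUBLE_BUFFER_THRESHOLD_BYTES = 200 * 1024 * 1024


def use_double_buffering(query_size_bytes: int) -> bool:
    """Return True if a query of this size would be double buffered."""
    # Strictly greater than the threshold, matching "larger than 200 MB".
    return query_size_bytes > DOUBLE_BUFFER_THRESHOLD_BYTES
```

Double buffering keeps a second buffer filling while the first is consumed, which roughly doubles buffer memory; gating it on query size presumably avoids paying that cost for small queries that finish in a single pass.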

Python

Added

Gained option to list only materialized attributes (#249)
New create_dataset() method that exposes arguments for customizing the creation of new TileDB-VCF datasets (#243)
New unit test added for validating larger exports (#220)

Changed

The ingest_samples() method gains several new arguments for customizing the number of threads used, thread task size, memory budget, and max buffer size during sample ingestion (#243)

Spark

Added

Gained option to list materialized attributes (#249)
Add verbose output and logging of buffer sizes (#247)

0.7.1

Released 06 Jan 17:39 by aaronwolen (tag 0.7.1, commit 7abd794)

Notable

This release adds the ability to directly ingest VCF/BCF files from remote object stores so it's no longer necessary to allocate temporary scratch space and download them first.

Core

Added

Added support for reading remote samples directly (#229)

Changed

TileDB was updated to 2.1.6 (#239)
CI native library builds are no longer conditioned on the success of the build stage (#238)

Docker

Added

The CLI Dockerfile now utilizes multi-stage builds to significantly reduce the image size (#136)

v0.7.0

Released 30 Dec 20:42 by Shelnutt2 (tag 0.7.0, commit e3c1ff2)

Notable

This release introduces a new schema (v4) for storing VCF data with TileDB. Variants are now stored in a 3D sparse array, indexed by contig, start_pos, and sample. The contig and sample dimensions are now of type TILEDB_STRING_ASCII and leverage optimizations in TileDB core for string coordinates.
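The cell order implied by the v4 schema can be illustrated in plain Python. This is a sketch under stated assumptions, not TileDB code: `VariantCell` and `row_major_order` are hypothetical names, and the sort simply assumes cells are laid out row-major over (contig, start_pos, sample) with the ASCII string dimensions compared lexicographically, as described for the 3D layout (#194) and the lexicographic write ordering (#208).

```python
from typing import Iterable, List, NamedTuple


class VariantCell(NamedTuple):
    """Hypothetical stand-in for one cell's coordinates in a v4 array."""
    contig: str     # TILEDB_STRING_ASCII dimension
    start_pos: int  # integer position dimension
    sample: str     # TILEDB_STRING_ASCII dimension


def row_major_order(cells: Iterable[VariantCell]) -> List[VariantCell]:
    """Sort cells by contig, then start_pos, then sample.

    Python compares ASCII strings by code point, which matches byte-wise
    lexicographic order for the string dimensions.
    """
    return sorted(cells, key=lambda c: (c.contig, c.start_pos, c.sample))


cells = [
    VariantCell("chr2", 100, "s1"),
    VariantCell("chr10", 50, "s2"),
    VariantCell("chr1", 200, "s1"),
    VariantCell("chr1", 200, "HG001"),
]
ordered = row_major_order(cells)
```

One consequence of lexicographic ordering is that contigs are not in numeric order: `chr10` sorts before `chr2`, because the comparison stops at the first differing character.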

Note: Arrays created using previous schemas are still fully supported.

Complete documentation about the new schema is available here.

A number of other noteworthy improvements are included in this release:

New tiledbvcf utils subcommand for consolidating and vacuuming fragment metadata
Removal of the sample registration phase for v4 datasets, which is no longer needed 🎉

Core

Changed

Switch to 3D array and row-major layout (#194)
Use prebuilt artifacts by default for TileDB (#201)
Create versioned methods for init_for_reads/next_read_batch (#206)
Only fetch vcf headers on a new read sample batch (#207)
Sort v4 regions in lexicographical order for writes (#208)
Remove contig check from process_query_results_v4() (#210)
htslib was updated to 1.10 (#230)
Support TileDB 2.2 C++ result estimate API changes (#209)
Handle all sample export for v4 with optimal range (#221)
Don't batch v4 reads by samples (#215)
V4 writes should split fragments by contig (#216)
Update TileDB to v2.1.4 (#225)

Fixed

Add support for ASAN and fix a leak in the htslib plugin (#181)
Fix leaking of hfile* (#197)
Reduce the number of times queries open an array (#204)
Optimize v2/v3 read queries (#205)
Set the contig for ingestion on seek of VCF file to avoid records seeking beyond a contig's boundaries (#218)

Added

Added a new memory budget parameter for setting the max size of the TileDB query buffer for write (#217)
Add option to disable including vcf header stats (#213)
The number of writes performed during each ingestion batch is now included in the verbose output (#219)

Python

Changed

Format with black (#199)
Add lint check to CI (#202)

Fixed

Sort pandas dataframes for unordered comparison in unit tests (#223)

0.7.0 Beta 3 (pre-release)

Released 30 Dec 20:42 by gsvic (tag 0.7.0-beta3, commit a7f86da)

Changes:

a7f86da Merge pull request #225 from TileDB-Inc/ss/tiledb-2.1.4
ec05524 Update to TileDB 2.1.4
3bec8c9 Merge pull request #216 from TileDB-Inc/sethshelnutt/ch4458/ingestion-in-fragments-split-by-contig
4f05c04 V4 writes should split fragments by contig
ce113d2 Merge pull request #215 from TileDB-Inc/sethshelnutt/ch4459/remove-sample-batching-from-reads
48e1e0a Merge pull request #217 from TileDB-Inc/ss/add-ingestion-query-buffer-option
1d6d160 Merge pull request #213 from TileDB-Inc/ss/option-disable-vcf-header-stats
c28816d Add CLI option for max TileDB buffer size
4378baa Merge pull request #219 from TileDB-Inc/ss/add-verbose-message-ingestion
2a40ed2 Merge pull request #221 from TileDB-Inc/ss/handle-special-case-all-sample-export
f1df5f3 Handle all sample export for v4 with optimal range
8de6315 Reset pandas index for python unordered tests
a674f7e Merge pull request #223 from TileDB-Inc/ss/python-test-handle-unordered-results
60a775b Merge pull request #218 from TileDB-Inc/ss/store-contig-vcf-ingestion
06fd841 Sort pandas df's for unordered comparison
a93c63c Merge pull request #222 from TileDB-Inc/revert-203-ss/workaround-osx-ci-homebrew-openssl
2ed3f3b Revert "Workaround homebrew openssl error for CI"
05570a6 Fix writer init with parameters
00aee70 Add option for enable/disable of vcf header stats
528f0e3 Merge pull request #210 from TileDB-Inc/ss/optimize-v4-processing-with-single-contig-query
8a4ae80 Add write message for verbose ingestion
b6e9585 Set the contig for ingestion on seek of VCF file
0b63c64 Don't batch v4 reads by samples
c5ae2ea Remove contig check from process_query_results_v4
93ed1d3 Merge pull request #209 from TileDB-Inc/ss/support-tiledb-2.2-est-result-api-change
c384d5d Support TileDB 2.2 C++ result estimate API changes
e3d50b6 Merge pull request #212 from TileDB-Inc/revert-198-vn/build-chunked-arrow-table
0e163ab Revert "Build Chunked Arrow Table in _read_partition()"
f75d34b Merge pull request #198 from TileDB-Inc/vn/build-chunked-arrow-table
d75b35d Merge pull request #208 from TileDB-Inc/sethshelnutt/ch4407/3d-array-ingestion-need-to-write-in-lexigraphical
63ad03b Sort regions in lexicographical order for v4 writes
975dad3 Merge pull request #207 from TileDB-Inc/ss/only-fetch-header-on-sample-batch-change
b199c0f Only fetch vcf headers on a new read sample batch
9c077d5 Merge pull request #206 from TileDB-Inc/ss/split-read-path-v4
6983823 Split init_for_reads/next_read_batch into versions
c16499a Merge pull request #204 from TileDB-Inc/ss/reduce-array-opens
71314c5 Merge pull request #202 from TileDB-Inc/ss/add-python-format-linting
5117f69 Merge pull request #197 from TileDB-Inc/ss/hfile-destroy
bfa68ce Merge pull request #205 from TileDB-Inc/ss/v2-v3-query-optimizations
31dd235 Optimize v2/v3 read queries
c448dd8 Merge pull request #201 from TileDB-Inc/ss/prebuilt-tiledb
3e80dd4 Merge branch 'master' into vn/build-chunked-arrow-table
323b04e Fix leaking of hfile* on file open errors
e16baf2 Modify read() and continue_read() To Take return_type Arg
f9ddd66 Add build matrix to test building tiledb from source
da55340 Use prebuilt artifacts by default for TileDB
90695d6 Format additional python scripts
bc2daea Add python format and linting check
075188c General cleanup of unneeded parameters in Dataset
17e264f Share open data array w/reader and dataset classes
14f4eef Don't reopen the array on every read query
defe57e Add data/vcf array pointers to Dataset class
b74a06e Merge pull request #203 from TileDB-Inc/ss/workaround-osx-ci-homebrew-openssl
f8c2107 Workaround homebrew openssl error for CI
99e85c6 Build Chunked Arrow Table in _read_partition()
2afe1dd Merge pull request #199 from TileDB-Inc/ss/format-python
b1422d0 Format with black
dc64ce9 ASAN fixes
09bb52c Adjust writer init to allow setting config
c549054 Fix region sorting for reads
ef3631d Fix edge case in region overflow
dc4cf3f Only register samples for v2/v3 datasets in python
4747b4d Switch writer to own a dataset unique_ptr
e071db3 Remove registration for v4 arrays
2ea9a69 New v4 test arrays
b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
b3b4f84 Swich v4 record writting to avoid shared_ptr copy
febceb5 General cleanup
9b75294 Cleanup
20c2311 Update V4 unit test arrays
783a9ea Switch dimension ordering to contig,start_pos,sample
f81abc3 Switch order of contig and start_pos dimensions
e1c7df3 Query ranges by contig
6a732b8 Fix coalescing ranges in v4
f0bbbc1 Optimize contig map lookup for processing
7888678 Update python test ordering
fdfd77f Fix ordering in cli tests
e86c0f9 Workaround est results returning 0 for offests
5895cd9 Update unit-vcf-export tests based on new ordering of partitions
1f47a55 Fix report_cell from rebase
021a37e Test updates for more order issues, pt.3
0c78652 Test updates for more order issues, pt:2
01e0214 Test updates for more order issues
083c0cc Improve run-cli test with context output
5067a8e Switch sample to a string dimension
3239a74 Set java/spark version for 0.7.0-SNAPSHOT
7b50da8 Cleanup
53eda54 Binary search over contig regions and exit early
282ef97 Fix warning for different sign comparison
3028d95 Don't use contig index hash, might be too expensive
44df482 Switch to using point ranges for all sample ids
6fb7cf9 Switch to do not report logic for intersection
c3c5951 Add new v4 dataset schema with 3D array.
d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
28dbb0e Fix context leak in htslib plugin setup
81c2b21 Add support for ASAN

This list of changes was auto generated.

v0.6.2

Released 17 Dec 16:17 by Shelnutt2 (tag 0.6.2, commit 4529721)

This is a minor release that includes a fix for the performance regression introduced in v0.6.0 and present in v0.6.1. Please note that this is the last version of TileDB-VCF that will create datasets using the current v3 array schema. The next version (v0.7) will introduce a new schema (v4). If you are planning a large ingestion in the near future, we recommend postponing it until the new version is released in the coming weeks.

Core

Fixed

Optimize v2/v3 array reads by exiting the intersection check early once past the record's end position (#205)
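The early-exit idea behind this fix can be sketched as follows. This is an illustration, not the TileDB-VCF reader: `overlapping_regions` is a hypothetical helper, and it assumes inclusive (start, end) intervals on a single contig with the query regions already sorted by start position.

```python
from typing import List, Tuple

# Hypothetical representation: inclusive (start, end) on one contig.
Interval = Tuple[int, int]


def overlapping_regions(record: Interval, regions: List[Interval]) -> List[int]:
    """Return indices of regions overlapping `record`.

    Assumes `regions` is sorted by start position.
    """
    rec_start, rec_end = record
    hits = []
    for i, (reg_start, reg_end) in enumerate(regions):
        if reg_start > rec_end:
            # Regions are sorted by start, so every later region also begins
            # after the record ends: stop scanning instead of checking the rest.
            break
        if reg_end >= rec_start:
            hits.append(i)
    return hits


regions = [(1, 10), (5, 20), (30, 40), (50, 60)]
```

Without the `break`, every record is compared against every region; with it, the scan for each record stops at the first region that starts past the record's end, which matters when a query carries many regions.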

0.7.0 Beta 2 (pre-release)

Released 25 Nov 21:36 by gsvic (tag 0.7.0-beta2, commit dc64ce9)

Changes:

dc64ce9 ASAN fixes
09bb52c Adjust writer init to allow setting config
c549054 Fix region sorting for reads
ef3631d Fix edge case in region overflow
dc4cf3f Only register samples for v2/v3 datasets in python
4747b4d Switch writer to own a dataset unique_ptr
e071db3 Remove registration for v4 arrays
2ea9a69 New v4 test arrays
b390173 Merge pull request #194 from TileDB-Inc/ss/string-sample-id-query-by-contig
b3b4f84 Swich v4 record writting to avoid shared_ptr copy
febceb5 General cleanup
9b75294 Cleanup
20c2311 Update V4 unit test arrays
783a9ea Switch dimension ordering to contig,start_pos,sample
f81abc3 Switch order of contig and start_pos dimensions
e1c7df3 Query ranges by contig
6a732b8 Fix coalescing ranges in v4
f0bbbc1 Optimize contig map lookup for processing
7888678 Update python test ordering
fdfd77f Fix ordering in cli tests
e86c0f9 Workaround est results returning 0 for offests
5895cd9 Update unit-vcf-export tests based on new ordering of partitions
1f47a55 Fix report_cell from rebase
021a37e Test updates for more order issues, pt.3
0c78652 Test updates for more order issues, pt:2
01e0214 Test updates for more order issues
083c0cc Improve run-cli test with context output
5067a8e Switch sample to a string dimension
3239a74 Set java/spark version for 0.7.0-SNAPSHOT
7b50da8 Cleanup
53eda54 Binary search over contig regions and exit early
282ef97 Fix warning for different sign comparison
3028d95 Don't use contig index hash, might be too expensive
44df482 Switch to using point ranges for all sample ids
6fb7cf9 Switch to do not report logic for intersection
c3c5951 Add new v4 dataset schema with 3D array.
d1189e7 Merge pull request #181 from TileDB-Inc/ss/asan
66110c9 Switch htslib plugin to not use unique_ptrs for ctx/cfg
28dbb0e Fix context leak in htslib plugin setup
81c2b21 Add support for ASAN

This list of changes was auto generated.
