Releases · TileDB-Inc/TileDB-VCF

20 Nov 18:14

0.6.1

2520fb4

v0.6.1

This is a minor release that includes two export-related bug fixes. Please note, this is the last version of TileDB-VCF that will create datasets using the current v3 array schema. The next version (v0.7) will introduce a new schema (v4). If you are planning to perform a large ingestion in the near future, we recommend postponing until this new version is released in the coming weeks.

Core

Added

Added support for printing array creation statistics from TileDB (#191)
```
tiledbvcf create -u my-array --stats
```

Changed

Updated TileDB to 2.1.3 (#192)

Fixed

Check both the upper and lower bounds of contigs when exporting from v2/v3 arrays (#188)
Verify query region and records share a contig (#189)
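
Both fixes above concern deciding whether an exported record really belongs to a query region. The following is a minimal sketch of that kind of check, not TileDB-VCF's internal code: the `Interval` type, the `overlaps` function, and the 1-based inclusive coordinates are all assumptions made for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval (1-based, inclusive coordinates assumed)."""
    contig: str
    start: int
    end: int


def overlaps(region: Interval, record: Interval) -> bool:
    """Return True only if both intervals share a contig and overlap."""
    # A record on a different contig can never match the region,
    # even if the numeric positions happen to overlap.
    if region.contig != record.contig:
        return False
    # Check both bounds: the record must start at or before the region's end
    # and end at or after the region's start.
    return record.start <= region.end and record.end >= region.start


region = Interval("chr1", 100, 200)
print(overlaps(region, Interval("chr1", 150, 250)))  # True
print(overlaps(region, Interval("chr2", 150, 250)))  # False
print(overlaps(region, Interval("chr1", 201, 300)))  # False
```

Testing only one bound, or skipping the contig comparison, would let the second and third records through.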

03 Nov 23:38

aaronwolen

0.6.0

9c24dad

v0.6.0

Core

Changed

Updated TileDB to v2.1.2 (#187)
Point ranges are now used for each sample ID to prevent specific queries from retrieving the entire sample dimension (#175)
Refactor C++ unit tests using CATCH matcher so export results can be checked, regardless of their order (#180)
CLI unit tests will now report differences where they exist (#177)
Remove calls to deprecated max_buffer_elements() function (#179)
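
To illustrate why the point ranges introduced in #175 matter, here is a small sketch comparing a single bounding range with one point range per sample ID. The function names and the integer sample IDs are hypothetical, and the code does not call TileDB; it only counts how many IDs each strategy would touch.

```python
def bounding_range(sample_ids):
    """One inclusive range spanning the smallest to the largest requested ID."""
    return [(min(sample_ids), max(sample_ids))]


def point_ranges(sample_ids):
    """One single-value inclusive range per requested ID."""
    return [(i, i) for i in sorted(set(sample_ids))]


def ids_covered(ranges):
    """Count how many sample IDs a list of inclusive ranges touches."""
    return sum(end - start + 1 for start, end in ranges)


requested = [2, 900]
print(ids_covered(bounding_range(requested)))  # 899
print(ids_covered(point_ranges(requested)))    # 2
```

With two widely separated IDs, the bounding range touches every ID in between, whereas point ranges touch only the two that were requested.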

Fixed

Updated the record intersection algorithm to ensure we only report a single record for each query region and VCF record (#172)
Remote files are now downloaded to sample-specific temporary locations (#182)
Every record now independently checks regions for intersections to avoid the possibility that region filtering might exclude a result (#171)

Python

Added

Add C/Python APIs for retrieving schema version and sample names (#164)

26 Sep 14:43

aaronwolen

0.5.3

9efbfd5

v0.5.3

Core

Changed

END tags are added to the INFO field of exported VCF/BCF files only if END was defined in the original file's header (#173).
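
The END rule above can be pictured with a short sketch. It is an illustration only: `export_info`, its arguments, and the dictionary representation of INFO fields are assumptions, not the exporter's actual code.

```python
def export_info(info, end_pos, header_info_ids):
    """Return the INFO fields to write for one exported record.

    info            -- dict of INFO key/value pairs stored for the record
    end_pos         -- the record's end position
    header_info_ids -- set of INFO IDs declared in the original file's header
    """
    out = dict(info)
    if "END" in header_info_ids:
        # The original header declared END, so it is safe to emit it.
        out["END"] = end_pos
    else:
        # Emitting an undeclared tag would make the output inconsistent
        # with its header, so END is left out.
        out.pop("END", None)
    return out


print(export_info({"DP": 10}, 1500, {"DP", "END"}))  # {'DP': 10, 'END': 1500}
print(export_info({"DP": 10}, 1500, {"DP"}))         # {'DP': 10}
```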

Fixed

A missing FILTER tag was added to several test files and arrays (#168).

Docker Images

Changed

Pinned htslib's minimum version to 1.8 (#170).

14 Sep 21:24

aaronwolen

0.5.2

e323cc6

v0.5.2

Notable

This release introduces significant performance enhancements for exports using large (indexed) BED files.

Core

Added

Support for reading indexed BED files in parallel (#162).
BED file parsing times are now included in export's verbose output (#160).

Fixed

Sort internal index by start position when processing v3 arrays (#163).
Use htslib's default read capacity size (#161).

All Changes

5f870dd Update spark/java versions to 0.5.2
38043fd External CI script for collecting native libs
76c0735 Don't build native libs if build stage failed
cfa8cab Switch github releases to drafts
95f7892 Use boolean vars in CI
c554bc0 Add support for reading BED file in parallel.
b7153c3 V3 Arrays should move regions based on start_pos
8e22e04 Use the default HTSLIB read capacity size
13cff2a Add timing for parsing and sorting bed file

01 Sep 19:28

gsvic

0.5.1

1e37e36

0.5.1

Notable

Compressed BED files are now supported for exports.

Core

Added

htslib is now used for parsing BED files (#130).
CI now builds cross-platform JAR files (#148).

Changed

Docker images are now built in a separate CI stage (#158).

CLI

Added

export gains a --sorted flag to skip BED file sorting when the file is already known to be sorted (#147)
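
The effect of a flag like --sorted can be sketched as follows. This is not the CLI's implementation: `prepare_regions`, the `(contig, start, end)` tuples, and the sort key (contig name, then start position) are assumptions chosen for illustration.

```python
def prepare_regions(regions, assume_sorted=False):
    """Return regions in the order they will be processed.

    regions       -- iterable of (contig, start, end) tuples
    assume_sorted -- when True, trust the caller and skip the sort entirely
    """
    if assume_sorted:
        # No sorting cost, but correctness now depends on the input order.
        return list(regions)
    return sorted(regions, key=lambda r: (r[0], r[1]))


bed = [("chr2", 5, 10), ("chr1", 50, 60), ("chr1", 1, 20)]
print(prepare_regions(bed))
# [('chr1', 1, 20), ('chr1', 50, 60), ('chr2', 5, 10)]
print(prepare_regions(bed, assume_sorted=True))
# [('chr2', 5, 10), ('chr1', 50, 60), ('chr1', 1, 20)]
```

The second call shows the trade-off: skipping the sort saves time on large files, but unsorted input is passed through unchanged.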

Python

Fixed

Fix buffer memory leak (#152)

Java/Spark

Added

Additional tests for decoding attributes from info/fmt byte blobs (#140)

Changed

getRanges()'s option for specifying genomic regions has been renamed from ranges to regions in order to be consistent with the other APIs (#151).

All Changes

4bb70cd Add stage for building docker images
df78833 Merge pull request #157 from TileDB-Inc/ss/log-on-error-java-native-loading
ce46126 Log all errors for java native lib loading
cc4b8a1 Merge pull request #156 from TileDB-Inc/aw/ch3038/fix-var-len-filters
9994517 Merge pull request #155 from TileDB-Inc/sethshelnutt/ch3038/incorrect-values-in-filter-field
f5bdb58 fix pytest for varable length filter
36b09a3 Filter/Allele index should be based on list_offset
5fd6fe9 Add pytest for querying variable length filters
6eebc7d Use correct post-condition for limit_partitions
e1022c7 Add docstring and test for read_dask(limit_partitions=) kwarg
d58a9a3 Add limit_partitions kwarg option for read_dask/map_dask
9a92107 Fix buffer memory leak
7c949d9 Add support for regions spark option
04f4437 Merge pull request #149 from TileDB-Inc/aw/ch2942/v3-example-array
0c0275e Merge pull request #150 from TileDB-Inc/sethshelnutt/ch2994/setting-memory-budget-and-tbb-config-causes
b9930e4 Delay creating contexts for read
f4011c8 Pin badge to master branch
6269ee8 Don't use deprecated python class in README
0409ac7 Use URI for v3 of vcf-samples-20 array
ad59d23 Added CI step that builds a cross-platform jar
c401bc5 Make clang-format happy
dab62cb Update CLI description for no-duplicates arg
68803e5 Add CLI flag to skip region sorting
7b93018 Update to Gradle 6.6
a12a147 Reader validity test
2398f30 Add parsing of bedfile via htslib
e504249 Report diffs in run-cli-tests.sh
efef9e2 Enable trace logging in cli tests
b31d17f Add htslib plugin for reading files via VFS
a3d0569 Merge pull request #145 from TileDB-Inc/ss/spark-java-version-0.5.1-snapshot
c70da3e Update spark/java version to 0.5.1-SNAPSHOT
ebd7905 Add unit tests which compares VCFInfoFmtDecoder [ch2819]

12 Aug 21:04

aaronwolen

0.5.0

3ef1772

0.5.0

Notable

This release includes a new version of TileDB-VCF's schema for representing genomic data, which now indexes variants by start position. Together with numerous improvements made to the ingestion algorithm, TileDB-VCF now supports overlapping variants.

Note: Arrays created using previous schemas are still fully supported.

Core

Added

C API methods for querying and counting fmt/info attributes (#115).
Install option to ignore system installs of TileDB and force build the pinned version of TileDB as an external project (#138).

Changed

Updated TileDB to 2.0.8 (#126, #139, #141).
Updated schema to v3 to store variants by their start position (#105, #114).
Data types for commonly used genotype fields are now correctly defined (#112).
VCF records are now accessed internally using htslib's iterator during ingestion (#118).

Fixed

Don't access nodes in the record heap that have already been released (#120).
Fix segfault from reading an extra info value from the query result (#128).
Java API is now built and tested on CI (#135).
Don't take reference to query in reader futures (#137).
Fixed bug retrieving fixed-length attributes containing null values (#142, #143).
Fixed CI clang-format task (#126).

Spark

Added

New verbose option .option("verbose", true) for providing additional information when querying an array (#121).
Add spark task stage/ID to partition reader logs (#129).

Changed

Schema is now dynamic based on materialized attributes and available fmt/info fields (#115). Use select * to pull in all available attributes and df.schema() will describe and show all possible fields.
Add pos as alias for the pos_start queryable attribute (#124).
The samples and sampleFile options are now mutually exclusive; an error is thrown if both options are passed (#132).

Python

Added

New attributes() method to retrieve all queryable attributes available in a dataset (#127)
ingest_samples() gains arguments for setting the location and amount of scratch space to use when ingesting samples from S3 (#119, #122).
Dataset class gained a verbose option that provides additional information when writing to or reading from an array (#121).

Changed

Renamed TileDBVCFDataset class to Dataset (#116).
Dataset.read()'s sample arguments (samples and samples_file) are now mutually exclusive; an error is thrown if both are defined (#134).
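
A mutual-exclusion check of this kind might look like the sketch below. `resolve_samples` is a hypothetical helper, not part of the tiledbvcf package, and both the one-name-per-line file format and the use of `ValueError` are assumptions.

```python
def resolve_samples(samples=None, samples_file=None):
    """Return the list of requested sample names, or None if none were given."""
    if samples is not None and samples_file is not None:
        # Refuse ambiguous input instead of silently preferring one argument.
        raise ValueError("samples and samples_file are mutually exclusive")
    if samples_file is not None:
        # Assumed format: one sample name per line, blank lines ignored.
        with open(samples_file) as fh:
            return [line.strip() for line in fh if line.strip()]
    return samples


print(resolve_samples(samples=["HG00096", "HG00097"]))
try:
    resolve_samples(samples=["HG00096"], samples_file="samples.txt")
except ValueError as err:
    print(err)
```

Raising early keeps a caller from assuming that one argument quietly overrides the other.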

Fixed

Buffers are now refreshed when performing multiple reads with the same Dataset object (#133).

CLI

Changed

The --sample-names and --samples-file arguments are now optional. When omitted, all samples are exported by default. Previously one or the other was required.

Docker Images

Added

Improved documentation for the tiledbvcf-cli and tiledbvcf-py Docker images (#110).

Changed

All images now use /data as their working directory rather than /tmp.
tiledbvcf-py can be used to execute a script or launch an interactive Python session.

Fixed

The environment variable AWS_EC2_METADATA_DISABLED is now set to avoid slowdowns when querying S3 arrays outside of EC2.

16 Jun 12:09

Shelnutt2

0.4.3

b6e2a94

0.4.3

Changes include:

Fix duplicate cell reporting #104
Fix over allocation of buffers #106
Fix selective sample export for cli #107
Enable macOS CI builds #109
Add getDefaultRecordByteCount for properly sized arrow buffers #108

01 Jun 14:28

Shelnutt2

0.4.2

9c37d99

0.4.2

Changes include:

Revert gradle git plugin for automatic versioning because some users had problems with it #93
Fix cmake overrides for python setup #94
Build shadowJar without classifier #95
Don't include spark libraries in shadowJar #96
Allow publishing spark shadowJar to maven #97
Update python dockerfile #98
Overhaul README #101
Update TileDB to 2.0.3 in superbuild #102

11 May 18:12

Shelnutt2

0.4.1

57e7587

0.4.1

This release adds support for duplicates in the dataset. There is a new CLI flag, --no-duplicates, to disable this new behavior. There is also a corresponding Python dataset option.

Additional changes include:

Automatically build docker images #86 #87 #88
Add duplicate support #82 #90
Add option for dumping new TileDB statistics #91

04 May 16:31

Shelnutt2

0.4.0

e723c43

0.4.0

This release updates to TileDB 2.0.0, which includes performance optimizations and memory improvements.

Changes include:

Add export unit tests for non-contiguous samples #80
Split coordinates internally for TileDB 2.0.0 #81
Update versions for python, java and spark APIs automatically based on git tags #83 #84 #85
