Releases: TileDB-Inc/TileDB-VCF

v0.6.1

20 Nov 18:14
2520fb4

This is a minor release that includes two export-related bug fixes. Please note, this is the last version of TileDB-VCF that will create datasets using the current v3 array schema. The next version (v0.7) will introduce a new schema (v4). If you are planning to perform a large ingestion in the near future, we recommend postponing until this new version is released in the coming weeks.

Core

Added

  • Added support for printing array creation statistics from TileDB (#191)

    tiledbvcf create -u my-array --stats

Changed

  • Updated TileDB to 2.1.3 (#192)

Fixed

  • Check both the upper- and lower-bounds of contigs when exporting from v2/v3 arrays (#188)
  • Verify query region and records share a contig (#189)
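
Both fixes above concern deciding whether a stored record belongs to a query region. What follows is a minimal sketch of that kind of check, assuming 1-based inclusive coordinates. The `Region` and `Record` types and the `intersects` helper are hypothetical illustrations, not TileDB-VCF's C++ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    contig: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


@dataclass(frozen=True)
class Record:
    contig: str
    start: int
    end: int


def intersects(region: Region, record: Record) -> bool:
    """Return True only if both share a contig and their intervals overlap."""
    # A record on a different contig can never match, whatever its positions.
    if region.contig != record.contig:
        return False
    # Check both bounds: the record must not end before the region starts,
    # and must not start after the region ends.
    return record.end >= region.start and record.start <= region.end
```

Checking only one bound would accept records that lie entirely past the other end of the region, which is why both comparisons are needed.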

v0.6.0

03 Nov 23:38

Core

Changed

  • Updated TileDB to v2.1.2 (#187)
  • Point ranges are now used for each sample ID to prevent specific queries from retrieving the entire sample dimension (#175)
  • Refactor C++ unit tests using CATCH matcher so export results can be checked, regardless of their order (#180)
  • CLI unit tests will now report differences where they exist (#177)
  • Remove calls to deprecated max_buffer_elements() function (#179)

Fixed

  • Updated the record intersection algorithm to ensure we only report a single record for each query region and VCF record (#172)
  • Remote files are now downloaded to sample-specific temporary locations (#182)
  • Every record now independently checks regions for intersections to avoid the possibility that region filtering might exclude a result (#171)
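
The intersection fixes above can be pictured as a per-record loop that reports each (region, record) pair at most once. The sketch below is purely illustrative: the tuple layouts, the record IDs, and the `match_records` helper are assumptions made for this example and do not reflect the actual C++ code.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]       # (contig, start, end), inclusive (assumed)
Record = Tuple[int, str, int, int]  # (record_id, contig, start, end) (assumed)


def match_records(records: Iterable[Record],
                  regions: List[Region]) -> List[Tuple[Region, int]]:
    """Pair each record ID with every region it overlaps, once per pair."""
    seen = set()
    out = []
    for rec_id, contig, start, end in records:
        # Every record is tested against all regions on its own, so no
        # earlier filtering step can hide a valid match.
        for region in regions:
            same_contig = contig == region[0]
            overlaps = end >= region[1] and start <= region[2]
            pair = (region, rec_id)
            # Skip pairs already reported, e.g. if a record is seen twice.
            if same_contig and overlaps and pair not in seen:
                seen.add(pair)
                out.append(pair)
    return out
```

A record that overlaps two query regions is reported once for each region, while seeing the same record twice does not produce a second entry for the same region.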

Python

Added

  • Add C/Python APIs for retrieving schema version and sample names (#164)

v0.5.3

26 Sep 14:43
9efbfd5

Core

Changed

  • END tags are added to the INFO field of exported VCF/BCF files only if END was defined in the original file's header (#173).

Fixed

  • A missing FILTER tag was added to several test files and arrays (#168).

Docker Images

Changed

  • Pinned htslib's minimum version to 1.8 (#170).

v0.5.2

14 Sep 21:24
e323cc6

Notable

This release introduces significant performance enhancements for exports using large (indexed) BED files.

Core

Added

  • Support for reading indexed BED files in parallel (#162).
  • BED file parsing times are now included in export's verbose output (#160).

Fixed

  • Sort internal index by start position when processing v3 arrays (#163).
  • Use htslib's default read capacity size (#161).

All Changes

  • 5f870dd Update spark/java versions to 0.5.2
  • 38043fd External CI script for collecting native libs
  • 76c0735 Don't build native libs if build stage failed
  • cfa8cab Switch github releases to drafts
  • 95f7892 Use boolean vars in CI
  • c554bc0 Add support for reading BED file in parallel.
  • b7153c3 V3 Arrays should move regions based on start_pos
  • 8e22e04 Use the default HTSLIB read capacity size
  • 13cff2a Add timing for parsing and sorting bed file

0.5.1

01 Sep 19:28
1e37e36

Notable

Compressed BED files are now supported for exports.

Core

Added

  • htslib is now used for parsing BED files (#130).
  • CI now builds cross-platform JAR files (#148).

Changed

  • Docker images are now built in a separate CI stage (#158).

CLI

Added

  • export gains a --sorted flag to skip BED file sorting when the file is already known to be sorted (#147)

Python

Fixed

  • Fix buffer memory leak (#152)

Java/Spark

Added

  • Additional tests for decoding attributes from info/fmt byte blobs (#140)

Changed

  • getRanges()'s option for specifying genomic regions has been renamed from ranges to regions in order to be consistent with the other APIs (#151).

All Changes

  • 4bb70cd Add stage for building docker images
  • df78833 Merge pull request #157 from TileDB-Inc/ss/log-on-error-java-native-loading
  • ce46126 Log all errors for java native lib loading
  • cc4b8a1 Merge pull request #156 from TileDB-Inc/aw/ch3038/fix-var-len-filters
  • 9994517 Merge pull request #155 from TileDB-Inc/sethshelnutt/ch3038/incorrect-values-in-filter-field
  • f5bdb58 fix pytest for varable length filter
  • 36b09a3 Filter/Allele index should be based on list_offset
  • 5fd6fe9 Add pytest for querying variable length filters
  • 6eebc7d Use correct post-condition for limit_partitions
  • e1022c7 Add docstring and test for read_dask(limit_partitions=) kwarg
  • d58a9a3 Add limit_partitions kwarg option for read_dask/map_dask
  • 9a92107 Fix buffer memory leak
  • 7c949d9 Add support for regions spark option
  • 04f4437 Merge pull request #149 from TileDB-Inc/aw/ch2942/v3-example-array
  • 0c0275e Merge pull request #150 from TileDB-Inc/sethshelnutt/ch2994/setting-memory-budget-and-tbb-config-causes
  • b9930e4 Delay creating contexts for read
  • f4011c8 Pin badge to master branch
  • 6269ee8 Don't use deprecated python class in README
  • 0409ac7 Use URI for v3 of vcf-samples-20 array
  • ad59d23 Added CI step that builds a cross-platform jar
  • c401bc5 Make clang-format happy
  • dab62cb Update CLI description for no-duplicates arg
  • 68803e5 Add CLI flag to skip region sorting
  • 7b93018 Update to Gradle 6.6
  • a12a147 Reader validity test
  • 2398f30 Add parsing of bedfile via htslib
  • e504249 Report diffs in run-cli-tests.sh
  • efef9e2 Enable trace logging in cli tests
  • b31d17f Add htslib plugin for reading files via VFS
  • a3d0569 Merge pull request #145 from TileDB-Inc/ss/spark-java-version-0.5.1-snapshot
  • c70da3e Update spark/java version to 0.5.1-SNAPSHOT
  • ebd7905 Add unit tests which compares VCFInfoFmtDecoder [ch2819]

0.5.0

12 Aug 21:04
3ef1772

Notable

This release includes a new version of TileDB-VCF's schema for representing genomic data, which now indexes variants by start position. Together with numerous improvements made to the ingestion algorithm, TileDB-VCF now supports overlapping variants.

Note: Arrays created using previous schemas are still fully supported.

Core

Added

  • C API methods for querying and counting fmt/info attributes (#115).
  • Install option to ignore system installs of TileDB and force build the pinned version of TileDB as an external project (#138).

Changed

  • Updated TileDB to 2.0.8 (#126, #139, #141).
  • Updated schema (v3) to store variants by their start position (#105, #114).
  • Data types for commonly used genotype fields are now correctly defined (#112).
  • VCF records are now accessed internally using htslib's iterator during ingestion (#118).

Fixed

  • Don't access nodes in the record heap that have already been released (#120).
  • Fix segfault from reading an extra info value from the query result (#128).
  • Java API is now built and tested on CI (#135).
  • Don't take reference to query in reader futures (#137).
  • Fixed bug retrieving fixed-length attributes containing null values (#142, #143).
  • Fixed CI clang-format task (#126).

Spark

Added

  • New verbose option .option("verbose", true) for providing additional information when querying array (#121).
  • Add spark task stage/ID to partition reader logs (#129).

Changed

  • Schema is now dynamic based on materialized attributes and available fmt/info fields (#115). Use select * to pull in all available attributes and df.schema() will describe and show all possible fields.
  • Add pos as alias for the pos_start queryable attribute (#124).
  • The samples and sampleFile options are now mutually exclusive; an error is thrown if both options are passed (#132).

Python

Added

  • New attributes() method to retrieve all queryable attributes available in a dataset (#127)
  • ingest_samples() gains arguments for setting the location and amount of scratch space to use when ingesting samples from S3 (#119, #122).
  • Dataset class gained a verbose option that provides additional information when writing to or reading from an array (#121).

Changed

  • Renamed TileDBVCFDataset class to Dataset (#116).
  • Dataset.read()'s sample arguments (samples and samples_file) are now mutually exclusive; an error is thrown if both are defined (#134).
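
The mutual-exclusion rule above amounts to a simple argument check. Here is a minimal sketch using a hypothetical `resolve_samples` helper; it mirrors the described behavior and is not the actual `Dataset.read()` code. The one-name-per-line file layout is also an assumption.

```python
from typing import List, Optional


def resolve_samples(samples: Optional[List[str]] = None,
                    samples_file: Optional[str] = None) -> Optional[List[str]]:
    """Hypothetical helper: return requested sample names, or None for all."""
    # Passing both arguments is ambiguous, so reject it up front.
    if samples is not None and samples_file is not None:
        raise ValueError("samples and samples_file are mutually exclusive")
    if samples_file is not None:
        # Assumes one sample name per line; blank lines are ignored.
        with open(samples_file) as fh:
            return [line.strip() for line in fh if line.strip()]
    return samples
```

Failing early like this avoids silently preferring one argument over the other.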

Fixed

  • Buffers are now refreshed when performing multiple reads with the same Dataset object (#133).

CLI

Changed

  • The --sample-names and --samples-file arguments are now optional. When omitted, all samples are exported by default. Previously one or the other was required.

Docker Images

Added

Changed

  • All images now use /data as their working directory rather than /tmp.
  • tiledbvcf-py can be used to execute a script or launch an interactive Python session.

Fixed

  • The environment variable AWS_EC2_METADATA_DISABLED is now set to avoid slowdowns when querying S3 arrays outside of EC2.

0.4.3

16 Jun 12:09
b6e2a94

Changes include:

  • Fix duplicate cell reporting #104
  • Fix over allocation of buffers #106
  • Fix selective sample export for cli #107
  • Enable MacOS CI builds #109
  • Add getDefaultRecordByteCount for properly sized arrow buffers #108

0.4.2

01 Jun 14:28
9c37d99

Changes include:

  • Revert gradle git plugin for automatic versioning; some users had problems with it #93
  • Fix cmake overrides for python setup #94
  • Build shadowJar without classifier #95
  • Don't include spark libraries in shadowJar #96
  • Allow publishing spark shadowJar to maven #97
  • Update python dockerfile #98
  • Overhaul README #101
  • Update TileDB to 2.0.3 in superbuild #102

0.4.1

11 May 18:12
57e7587

This release adds support for duplicates in the dataset. There is a new CLI flag, --no-duplicates, to disable this new behavior. There is also a corresponding Python dataset option.

Additional changes include:

  • Automatically build docker images #86 #87 #88
  • Add duplicate support #82 #90
  • Add option for dumping new TileDB statistics #91

0.4.0

04 May 16:31
e723c43

This release updates to TileDB 2.0.0, which includes performance optimizations and memory improvements.

Changes include:

  • Add export unit tests for non-contiguous samples #80
  • Split coordinates internally for TileDB 2.0.0 #81
  • Update versions for python, java and spark APIs automatically based on git tags #83 #84 #85