tiledbsc 0.1.3

@aaronwolen released this 29 Jun 14:32
· 167 commits to main since this release
dacdcfc

Migration to SOMA-based names

This release renames the two top-level classes in the tiledbsc package to follow the new nomenclature adopted by the single-cell data model specification, which was implemented here. You can read more about the rationale for this change here.

Additionally, the misc slot has been renamed to uns. See below for details.

New class names

  • SCGroup is replaced by SOMA (stack of matrices, annotated)
  • SCDataset is replaced by SOMACollection

There are no functional changes to either class. SOMA is a drop-in replacement for SCGroup and SOMACollection is a drop-in replacement for SCDataset. However, to match the new names, one field and one method of SOMACollection have been renamed:

  • the scgroups field is now somas
  • scgroup_uris() is now soma_uris()

To ease the transition, the SCDataset and SCGroup classes are still available as aliases for SOMACollection and SOMA, respectively. However, they have been deprecated and will be removed in the future.
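As a rough illustration of the rename, the sketch below writes a small dataset and then uses the new names. It assumes the pbmc_small object shipped with SeuratObject is available, that the constructor accepts a uri argument, and that from_seurat() is the ingestion method; the URI itself is made up for the example.

```r
library(tiledbsc)

# Example data: pbmc_small from SeuratObject (assumed to be installed)
data("pbmc_small", package = "SeuratObject")

# Hypothetical location for the example collection
uri <- file.path(tempdir(), "pbmc_small_soco")

# Before 0.1.3 (still works through the deprecated aliases):
#   scdataset <- SCDataset$new(uri = uri)
#   scdataset$scgroups
#   scdataset$scgroup_uris()

# From 0.1.3 onwards:
soco <- SOMACollection$new(uri = uri)
soco$from_seurat(pbmc_small)   # assumed ingestion method

soma_names <- names(soco$somas)   # field formerly called scgroups
uris <- soco$soma_uris()          # method formerly called scgroup_uris()
```

Existing scripts that still reference SCDataset or SCGroup should keep running for now, but switching to the new names avoids breakage once the aliases are removed.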

New location for miscellaneous/unstructured data

Previously, the SCDataset and SCGroup classes included a TileDB group called misc that was intended for miscellaneous/unstructured data. To better align with the SOMA matrix-api specification, this group has been renamed to uns. In practice, new SOMAs and SOMACollections will create TileDB groups named uns rather than misc. These groups are accessed through the uns field of either class, for example SOMA$uns.

For backwards compatibility:

  • if a misc group exists within a SOMACollection or SOMA on disk, it will be accessible via the uns field of the parent class
  • the deprecated SCDataset and SCGroup will continue to provide a misc field (actually an active binding that aliases the uns slot) so users can continue to use the old name
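The aliasing idea behind the misc field can be sketched in base R with makeActiveBinding(). This is only an analogy, not the package's implementation: a plain environment named soma_like (a made-up name) stands in for a SOMA object, and misc forwards reads and writes to uns.

```r
# A plain environment stands in for a SOMA object (illustration only)
soma_like <- new.env()
assign("uns", list(note = "unstructured data"), envir = soma_like)

# Define `misc` as an active binding that forwards to `uns`
makeActiveBinding(
  "misc",
  function(value) {
    if (missing(value)) {
      # Read access: return whatever is currently stored in `uns`
      get("uns", envir = soma_like)
    } else {
      # Write access: store the new value in `uns`
      assign("uns", value, envir = soma_like)
    }
  },
  env = soma_like
)

# Reading through the old name returns the contents of `uns`
old_name_value <- soma_like$misc$note

# Writing through the old name updates `uns`
soma_like$misc <- list(note = "updated")
new_name_value <- soma_like$uns$note
```

Because both names resolve to the same underlying value, code written against the old name keeps working while new code uses uns.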

Dimension slicing and attribute filtering

It's now possible to read only a specific subset of data into memory.

The following classes now have a set_query() method:

  • TileDBArray and its subclasses
  • AnnotationGroup and its subclasses
  • SOMA
  • SOMACollection

With set_query() you can specify:

  • the ranges of the indexed dimensions to slice
  • attribute filter conditions

See the new Filtering vignette for details.
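A minimal sketch of both kinds of query on a SOMA might look like the following. The argument names obs_ids and obs_attr_filter, the to_dataframe() reader, and the use of pbmc_small with its groups metadata column are assumptions made for illustration; the Filtering vignette is the authoritative source for the exact signature and for how successive set_query() calls interact.

```r
library(tiledbsc)

# Example data: pbmc_small from SeuratObject (assumed to be installed)
data("pbmc_small", package = "SeuratObject")

# Hypothetical location for the example collection
soco <- SOMACollection$new(uri = file.path(tempdir(), "pbmc_small_query"))
soco$from_seurat(pbmc_small)
soma <- soco$somas$RNA   # SOMA created from the Seurat "RNA" assay

# Dimension slicing: restrict reads to the first 10 cell identifiers
# (argument name `obs_ids` is an assumption)
soma$set_query(obs_ids = head(colnames(pbmc_small), 10))
obs_slice <- soma$obs$to_dataframe()

# Attribute filtering: keep only cells whose `groups` value is "g1"
# (argument name `obs_attr_filter` is an assumption)
soma$set_query(obs_attr_filter = groups == "g1")
obs_filtered <- soma$obs$to_dataframe()
```

Setting the query before reading means only the selected cells are loaded, which matters most for datasets too large to fit in memory.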

Additional changes

  • Added TileDBObject base class to provide fields and methods common to both TileDBArray- and TileDBGroup-based classes
  • The array_exists() and group_exists() methods have been deprecated in favor of the more general exists()
  • Similar to the TileDBGroup class, TileDBArray now maintains a reference to the underlying array pointer
  • All classes gain an objects field to provide direct access to the underlying TileDB objects
  • Added missing config/ctx fields to AnnotationGroup
  • AnnotationDataframe gains ids() to retrieve all values from the array's dimension
  • soma_object_type and soma_encoding_version metadata are written to groups/arrays at write time
  • Minimum required version of tiledb-r is now 0.14.0, which also updates TileDB to version 2.10
  • AnnotationDataframe$from_dataframe() no longer coerces logical columns to integers, as TileDB 2.10 provides support for BOOL data types
  • Messages about updating existing arrays are only printed in verbose mode
  • Duplicates are now disabled for AnnotationArrays, so updates overwrite existing cells