- Set minimum version of Matrix to 1.5.3 to avoid a `CsparseMatrix` validation issue present in 1.5.2
- In CI, `r-lib/actions/setup-r-dependencies@v2` is now used to install dependencies
- Sparse matrix conversions are now performed via virtual classes to comply with the changes noted in the Matrix 1.5.0 release notes
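
The virtual-class conversion mentioned above can be illustrated with a minimal sketch. It assumes only the Matrix package; the example matrix and variable names are made up for illustration and are not taken from tiledbsc's internals. The idea is to coerce to a virtual class such as `TsparseMatrix` rather than naming a concrete class like `dgTMatrix`, a style of coercion that Matrix 1.5.0 deprecated.

```r
library(Matrix)

# Small sparse matrix; sparseMatrix() returns compressed-column storage by default
m <- sparseMatrix(
  i = c(1, 3, 2),
  j = c(1, 1, 2),
  x = c(5, 7, 9),
  dims = c(3, 2)
)

# Coerce through the virtual class instead of the concrete "dgTMatrix" class
m_triplet <- as(m, "TsparseMatrix")

# The result is still a triplet-form matrix holding the same values
is(m_triplet, "TsparseMatrix")
all(as.matrix(m) == as.matrix(m_triplet))
```

Coercing to the virtual class lets Matrix pick the appropriate concrete class for the input, so the same call works for numeric, logical, or pattern matrices.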
- A new metadata tag, `soma_legacy_validity`, is now attached to all arrays created by `SOMA` objects. By default this value is `"false"` unless the TileDB-R legacy validity mode was enabled at creation time (i.e., `r.legacy_validity_mode`). When reading arrays from disk, the `AnnotationDataFrame` class checks for this tag on initialization and when performing reads or writes. If the tag is present and set to `"true"`, legacy validity mode is enabled globally (as it's not possible to set it on a per-array basis). Legacy validity mode is also enabled when reading `AnnotationDataFrame` arrays that lack the tag, as this indicates the array was created with an older version of the package. These checks are limited to `AnnotationDataFrame` arrays because the incorrect validity map values only affect nullable string attributes. See TileDB-R's release notes for more information.
- Group member cache is now updated when a member is removed (#102)
- The `AnnotationMatrix`'s `to_matrix()` method now supports batched reads via the `batch_mode` argument. This functionality can also be leveraged from `SOMA`'s `get_seurat_dimreductions_list()` and `get_seurat_dimreduction()` methods. (#86)
- The `SOMACollection`'s `to_seurat()` method gains a `somas` argument that makes it possible to select a subset of `SOMA`s and `X` layers to be retrieved. (#89)
- Updated `setup-r` GitHub Action to v2 (#90)
- Added `batch_mode` option to methods that read `X` layers (i.e., `AssayMatrix` objects) into memory. When enabled, batch mode leverages the family of `Batched` classes added to tiledb-r in version 0.14.0 to detect partial query results and resubmit until all results are retrieved. This feature is currently disabled by default and only applies to `X` layers (which are typically the largest arrays). You can enable batch mode from the following methods:
  - `SOMACollection$to_seurat()`
  - `SOMA$to_seurat_assay()`
  - `SOMA$to_summarized_experiment()`
  - `SOMA$to_single_cell_experiment()`
  - `AssayMatrix$to_dataframe()`
  - `AssayMatrix$to_matrix()`
- Members can now be removed from `TileDBGroup`s with `remove_member()`
- New `vignette("quickstart")`, which provides new users with a high-level overview of the package
- New function `dataset_seurat_pbmc3k()` to download the pbmc 3k dataset from 10X and import it as a `Seurat` object without requiring any extra dependencies. This dataset is used in the new vignette
- Updated bundled `Makefile` to add targets for generating pre-computed vignettes and performing common dev operations
- Added `CONTRIBUTING.md` to reference TileDB's CoC and document the `Makefile`
- Removed vestigial code for merging non-layerable COO data.frames, which was previously used to ingest dense `scaled.data` from a Seurat `Assay` as an attribute of the `X` array, along with `counts`/`data`. This is no longer necessary as each layer is now ingested into a separate array within the `X` group (#73).
- The internal utility `dgtmatrix_to_dataframe()` was replaced with `matrix_to_coo()`, which converts Matrix-like objects to COO data frames much more efficiently (#75).
- The internal utility `pad_matrix()` can now pad a matrix by adding empty rows (#79).
- The internal assertion `has_dimnames()` was replaced with `is_labeled_matrix()` for clarity (#79).
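
For readers unfamiliar with the term, a COO (coordinate) data frame stores one row per non-empty cell of a matrix. The sketch below shows one possible way to build such a data frame with the Matrix package. It is not the package's `matrix_to_coo()` implementation: the helper name `to_coo_sketch()` and the column names `i`, `j`, and `value` are assumptions made for this illustration.

```r
library(Matrix)

# Hypothetical helper for illustration only; not tiledbsc's matrix_to_coo()
to_coo_sketch <- function(x) {
  # Row and column labels are needed to identify each cell
  stopifnot(!is.null(rownames(x)), !is.null(colnames(x)))

  # Triplet form exposes zero-based row (i) and column (j) indices directly
  trip <- as(x, "TsparseMatrix")

  data.frame(
    i = rownames(trip)[trip@i + 1L],
    j = colnames(trip)[trip@j + 1L],
    value = trip@x
  )
}

m <- sparseMatrix(
  i = c(1, 2),
  j = c(1, 2),
  x = c(3, 4),
  dims = c(2, 2),
  dimnames = list(c("g1", "g2"), c("c1", "c2"))
)

coo <- to_coo_sketch(m)
coo
```

Only the stored entries appear in the result, so the data frame grows with the number of non-zero cells rather than with the full matrix dimensions.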
- Matrix conversion message from `AssayMatrix` now respects the `verbose` option
- Upon initialization `SOMA` now looks for a `raw` group and warns the user it will be ignored. Currently tiledbsc-py creates a `raw` group when converting anndata objects where `.raw` is populated. However, Seurat/BioC objects do not have an obvious place to store this data, so ignoring it improves compatibility.
- Fixed a non-user-facing issue with the internal `dgtmatrix_to_dataframe()` function used to convert unordered `dgTMatrix` objects to COO data frames (#73).
- Pretty printing of classes that inherit from `TileDBObject` has been improved so that the class name is displayed first (#79).
- Don't use default assay name when recreating a `Seurat` object (#80, thanks @dan11mcguire)
- Added `with_allocation_size_preference()` helper to temporarily set the allocation size preference for testing.
- Tests were added to verify the internal `dgtmatrix_to_dataframe()` will error out if an input list contains non-layerable matrices.
This release changes the names of the 2 top-level classes in the tiledbsc package to follow new nomenclature adopted by the single-cell data model specification, which was implemented here. You can read more about the rationale for this change here.
Additionally, the `misc` slot has been renamed to `uns`. See below for details.
New class names
- `SCGroup` is replaced by `SOMA` (stack of matrices, annotated)
- `SCDataset` is replaced by `SOMACollection`
There are no functional changes to either class. `SOMA` is a drop-in replacement for `SCGroup` and `SOMACollection` is a drop-in replacement for `SCDataset`. However, with the new names two of `SOMACollection`'s methods have changed accordingly:
- the `scgroups` field is now `somas`
- `scgroup_uris()` is now `soma_uris()`
To ease the transition, the `SCDataset` and `SCGroup` classes are still available as aliases for `SOMACollection` and `SOMA`, respectively. However, they have been deprecated and will be removed in the future.
Previously, the `SCDataset` and `SCGroup` classes included a TileDB group called `misc` that was intended for miscellaneous/unstructured data. To better align with the SOMA specification this group has been renamed to `uns`. Practically, this means new `SOMA`s and `SOMACollection`s will create TileDB groups named `uns`, rather than `misc`. These groups can be accessed with the `SOMA` and `SOMACollection` classes using `SOMA$uns`.
For backwards compatibility:
- if a `misc` group exists within a `SOMACollection` or `SOMA` on disk, it will be accessible via the `uns` field of the parent class
- the deprecated `SCDataset` and `SCGroup` will continue to provide a `misc` field (actually an active binding that aliases the `uns` slot) so users can continue to use the old name
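
The aliasing behaviour described in the last bullet can be sketched in base R. This is an illustration of the active-binding technique only, not the package's class definition: the environment `obj` and its contents are hypothetical stand-ins for an object with a `uns` field.

```r
# Hypothetical stand-in for an object that stores data under `uns`
obj <- new.env()
assign("uns", list(note = "unstructured data"), envir = obj)

# Define `misc` as an active binding: reads and writes are forwarded to `uns`
makeActiveBinding(
  "misc",
  function(value) {
    if (missing(value)) {
      get("uns", envir = obj)            # read through to the new name
    } else {
      assign("uns", value, envir = obj)  # write through to the new name
    }
  },
  env = obj
)

# Both names now refer to the same underlying data
identical(obj$misc, obj$uns)

# Assigning through the old name updates the new one
obj$misc <- list(note = "updated via old name")
obj$uns$note
```

Because no second copy of the data is kept, the old and new names cannot drift out of sync during the deprecation period.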
It's now possible to read only a specific subset of data into memory.
The following classes now have a `set_query()` method:
- `TileDBArray` and its subclasses
- `AnnotationGroup` and its subclasses
- `SOMA`
- `SOMACollection`

With `set_query()` you can specify:
- the ranges of the indexed dimensions to slice
- attribute filter conditions
See the new Filtering vignette for details.
- Added `TileDBObject` base class to provide fields and methods common to both `TileDBArray`- and `TileDBGroup`-based classes
- The `array_exists()` and `group_exists()` methods have been deprecated in favor of the more general `exists()`
- Similar to the `TileDBGroup` class, `TileDBArray` now maintains a reference to the underlying array pointer
- All classes gain an `objects` field to provide direct access to the underlying TileDB objects
- Added missing `config`/`ctx` fields to `AnnotationGroup`
- `AnnotationDataframe` gains `ids()` to retrieve all values from the array's dimension
- `soma_object_type` and `soma_encoding_version` metadata are written to groups/arrays at write time
- Minimum required version of tiledb-r is now 0.14.0, which also updates TileDB to version 2.10
- `AnnotationDataframe$from_dataframe()` no longer coerces `logical` columns to `integer`s, as TileDB 2.10 provides support for `BOOL` data types
- Messages about updating existing arrays are only printed in verbose mode
- Disable duplicates for `AnnotationArray`s so updates will overwrite existing cells
Improve handling of Seurat objects with empty cell identities (#58).
tiledbsc now uses the enhanced Group APIs introduced in TileDB v2.8 and TileDB-R 0.12.0.
Note: The next version of tiledbsc will migrate to the new SOMA-based naming scheme described here.
Group-level metadata is now natively supported by TileDB, so `TileDBGroup`-based classes no longer create nested `__tiledb_group_metadata` arrays for the purpose of storing group-level metadata.
See TileDB 2.8 release notes for additional changes.
- the `arrays` field has been replaced with `members`, which includes both TileDB arrays and groups
- `get_array()` has been replaced with `get_member()`, which adds a `type` argument to filter by object type
- gain the following methods: `count_members()`, `list_members()`, `list_member_uris()`, and `add_member()`
- the `scgroup_uris` argument has been dropped from `SCDataset`'s initialize method (`add_member()` should now be used instead to add additional `SCGroup`s)
- `SCDataset`'s `scgroups` field is now an active binding that filters `members` for `SCGroup` objects
- added a `NEWS.md` file to track changes to the package
- the fs package is now a dependency
- `SCGroup`'s `from_seurat_assay()` method gained two new arguments: `layers`, to specify which Seurat `Assay` slots should be ingested, and `var`, to control whether feature-level metadata is ingested
- `SCGroup`'s `from_seurat_assay()` method will no longer ingest the `data` slot if it is identical to `counts`
- Internally group members are now added with names
- New internal `TileDBURI` class for handling various URI formats
- The `uri` field for all TileDB(Array|Group)-based classes is now an active binding that retrieves the URI from the private `tiledb_uri` field
- Several default parameters have been changed to store the `X`, `obs`, and `var` arrays more efficiently on disk (#50)
- Seurat cell identities are now stored in the `active_ident` attribute of the `obs` array (#56)
- Require at least version 0.13.0 of tiledb-r to support retrieval of group names