Releases: TileDB-Inc/tiledbsc
tiledbsc 0.1.5
Features
- The `AnnotationMatrix`'s `to_matrix()` method now supports batched reads via the `batch_mode` argument. This functionality can also be leveraged from `SOMA`'s `get_seurat_dimreductions_list()` and `get_seurat_dimreduction()` methods. (#86)
- The `SOMACollection`'s `to_seurat()` method gains a `somas` argument that makes it possible to select a subset of `SOMA`s and `X` layers to be retrieved. (#89)
Changes
- Updated `setup-r` GitHub Action to v2 (#90)
tiledbsc 0.1.4
Features
- Added `batch_mode` option to methods that read `X` layers (i.e., `AssayMatrix` objects) into memory. When enabled, batch mode leverages the family of `Batched` classes added to tiledb-r in version 0.14.0 to detect partial query results and resubmit until all results are retrieved. This feature is currently disabled by default and only applies to `X` layers (which are typically the largest arrays). You can enable batch mode from the following methods:
  - `SOMACollection$to_seurat()`
  - `SOMA$to_seurat_assay()`
  - `SOMA$to_summarized_experiment()`
  - `SOMA$to_single_cell_experiment()`
  - `AssayMatrix$to_dataframe()`
  - `AssayMatrix$to_matrix()`
- Members can now be removed from `TileDBGroup`s with `remove_member()`
- New `vignette("quickstart")`, which provides new users with a high-level overview of the package
- New function `dataset_seurat_pbmc3k()` to download the pbmc 3k dataset from 10X and import it as a `Seurat` object without requiring any extra dependencies. This dataset is used in the new vignette
- Updated bundled `Makefile` to add targets for generating pre-computed vignettes and performing common dev operations
- Added `CONTRIBUTING.md` to reference TileDB's CoC and document the `Makefile`
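The batch mode entry above describes detecting partial query results and resubmitting until everything has been retrieved. The following is a minimal base-R sketch of that general pattern only; it does not use tiledb-r's `Batched` classes, and `read_chunk()`, `read_batched()`, and `buffer_size` are hypothetical names invented for illustration.

```r
# Hypothetical stand-in for a storage query: returns at most `buffer_size`
# rows starting at `offset`, plus a flag saying whether the read is complete.
read_chunk <- function(data, offset, buffer_size) {
  end <- min(offset + buffer_size - 1, nrow(data))
  list(
    rows = data[seq.int(offset, end), , drop = FALSE],
    complete = end >= nrow(data)
  )
}

# Resubmit the (simulated) query until it reports completion, then combine
# the partial results into a single data frame.
read_batched <- function(data, buffer_size) {
  stopifnot(buffer_size >= 1)
  if (nrow(data) == 0L) {
    return(data)
  }
  chunks <- list()
  offset <- 1L
  repeat {
    res <- read_chunk(data, offset, buffer_size)
    chunks[[length(chunks) + 1L]] <- res$rows
    if (res$complete) break
    offset <- offset + nrow(res$rows)
  }
  do.call(rbind, chunks)
}

# Ten rows read through a three-row buffer take four submissions.
cells <- data.frame(obs_id = paste0("cell", 1:10), value = 1:10)
result <- read_batched(cells, buffer_size = 3)
nrow(result)
```

In the package itself this loop is handled internally once `batch_mode` is enabled on one of the methods listed above.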
Changes
- Removed vestigial code for merging non-layerable COO data.frames, which was previously used to ingest dense `scaled.data` from a Seurat `Assay` as an attribute of the `X` array, along with `counts`/`data`. This is no longer necessary as each layer is now ingested into a separate array within the `X` group (#73).
- The internal utility `dgtmatrix_to_dataframe()` was replaced with `matrix_to_coo()`, which converts Matrix-like objects to COO data frames much more efficiently (#75).
- The internal utility `pad_matrix()` can now pad a matrix by adding empty rows (#79).
- The internal assertion `has_dimnames()` was replaced with `is_labeled_matrix()` for clarity (#79).
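For readers unfamiliar with the COO (coordinate) layout mentioned above, here is a minimal base-R sketch of turning a labeled matrix into a COO data frame. It is not the package's internal `matrix_to_coo()`; the helper name `dense_to_coo()` and the column names `i`, `j`, and `value` are assumptions made for illustration.

```r
# Convert a labeled dense matrix to a COO data frame holding one row per
# non-zero cell: row label, column label, and the cell's value.
dense_to_coo <- function(mat) {
  stopifnot(
    is.matrix(mat),
    !is.null(rownames(mat)),
    !is.null(colnames(mat))
  )
  # Two-column matrix of (row, col) positions of the non-zero entries
  idx <- which(mat != 0, arr.ind = TRUE)
  data.frame(
    i = rownames(mat)[idx[, "row"]],
    j = colnames(mat)[idx[, "col"]],
    value = mat[idx],
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

m <- matrix(
  c(1, 0, 0, 2, 0, 3),
  nrow = 2,
  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
)
coo <- dense_to_coo(m)
coo
```

Only the three non-zero cells are kept, which is why this layout suits sparse single-cell matrices.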
Fixes
- Matrix conversion message from `AssayMatrix` now respects the `verbose` option
- Upon initialization, `SOMA` now looks for a `raw` group and warns the user it will be ignored. Currently tiledbsc-py creates a `raw` group when converting anndata objects where `.raw` is populated. However, Seurat/BioC objects do not have an obvious place to store this data, so ignoring it improves compatibility.
- Fixed a non-user-facing issue with the internal `dgtmatrix_to_dataframe()` function used to convert unordered `dgTMatrix` objects to COO data frames (#73).
- Pretty printing of classes that inherit from `TileDBObject` has been improved so that the class name is displayed first (#79).
- Don't use default assay name when recreating a `Seurat` object (#80, thanks @dan11mcguire)
Build and Test Systems
- Added `with_allocation_size_preference()` helper to temporarily set the allocation size preference for testing.
- Tests were added to verify the internal `dgtmatrix_to_dataframe()` will error out if an input list contains non-layerable matrices.
tiledbsc 0.1.3
Migration to SOMA-based names
This release changes the names of the 2 top-level classes in the tiledbsc package to follow new nomenclature adopted by the single-cell data model specification, which was implemented here. You can read more about the rationale for this change here.
Additionally, the `misc` slot has been renamed to `uns`. See below for details.
New class names
- `SCGroup` is replaced by `SOMA` (stack of matrices, annotated)
- `SCDataset` is replaced by `SOMACollection`
There are no functional changes to either class. `SOMA` is a drop-in replacement for `SCGroup` and `SOMACollection` is a drop-in replacement for `SCDataset`. However, with the new names, two of `SOMACollection`'s methods have changed accordingly:
- the `scgroups` field is now `somas`
- `scgroup_uris()` is now `soma_uris()`
To ease the transition, the `SCDataset` and `SCGroup` classes are still available as aliases for `SOMACollection` and `SOMA`, respectively. However, they have been deprecated and will be removed in the future.
New location for miscellaneous/unstructured data
Previously, the `SCDataset` and `SCGroup` classes included a TileDB group called `misc` that was intended for miscellaneous/unstructured data. To better align with the SOMA matrix-api specification, this group has been renamed to `uns`. Practically, this means new `SOMA`s and `SOMACollection`s will create TileDB groups named `uns`, rather than `misc`. These groups can be accessed with the `SOMA` and `SOMACollection` classes using `SOMA$uns`.
For backwards compatibility:
- if a `misc` group exists within a `SOMACollection` or `SOMA` on disk, it will be accessible via the `uns` field of the parent class
- the deprecated `SCDataset` and `SCGroup` will continue to provide a `misc` field (actually an active binding that aliases the `uns` slot) so users can continue to use the old name
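The aliasing described in the last item can be illustrated with base R's `makeActiveBinding()`. This is a sketch of the general technique rather than the package's code: `soma` below is a hypothetical plain environment standing in for a class instance, and its contents are made up.

```r
# Hypothetical stand-in for an object with an `uns` slot
soma <- new.env()
soma$uns <- list(note = "unstructured data")

# `misc` stores nothing itself: reads and writes are forwarded to `uns`.
makeActiveBinding(
  "misc",
  function(value) {
    if (missing(value)) {
      get("uns", envir = soma)             # read through the alias
    } else {
      assign("uns", value, envir = soma)   # write through the alias
    }
  },
  soma
)

soma$misc$note                        # reads the value held in `uns`
soma$misc <- list(note = "updated")   # assignment lands in `uns`
soma$uns$note
```

Because the old name never holds its own copy of the data, the two fields cannot drift apart.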
Dimension slicing and attribute filtering
It's now possible to read only a specific subset of data into memory.
The following classes now have a `set_query()` method:
- `TileDBArray` and its subclasses
- `AnnotationGroup` and its subclasses
- `SOMA`
- `SOMACollection`
With `set_query()` you can specify:
- the ranges of the indexed dimensions to slice
- attribute filter conditions
See the new Filtering vignette for details.
Additional changes
- Added `TileDBObject` base class to provide fields and methods common to both `TileDBArray`- and `TileDBGroup`-based classes
- The `array_exists()` and `group_exists()` methods have been deprecated in favor of the more general `exists()`
- Similar to the `TileDBGroup` class, `TileDBArray` now maintains a reference to the underlying array pointer
- All classes gain an `objects` field to provide direct access to the underlying TileDB objects
- Added missing `config`/`ctx` fields to `AnnotationGroup`
- `AnnotationDataframe` gains `ids()` to retrieve all values from the array's dimension
- `soma_object_type` and `soma_encoding_version` metadata are written to groups/arrays at write time
- Minimum required version of tiledb-r is now 0.14.0, which also updates TileDB to version 2.10
- `AnnotationDataframe$from_dataframe()` no longer coerces `logical` columns to `integer`s, as TileDB 2.10 provides support for `BOOL` data types
- Messages about updating existing arrays are only printed in verbose mode
- Disable duplicates for `AnnotationArray`s so updates will overwrite existing cells
0.1.2
0.1.1
tiledbsc now uses the enhanced Group APIs introduced in TileDB v2.8 and TileDB-R 0.12.0.
Note: The next version of tiledbsc will migrate to the new SOMA-based naming scheme described here.
On-disk changes
Group-level metadata is now natively supported by TileDB, so `TileDBGroup`-based classes no longer create nested `__tiledb_group_metadata` arrays for the purpose of storing group-level metadata.
See TileDB 2.8 release notes for additional changes.
API changes
For `TileDBGroup` and its child classes:
- the `arrays` field has been replaced with `members`, which includes both TileDB arrays and groups
- `get_array()` has been replaced with `get_member()`, which adds a `type` argument to filter by object type
- gain the following methods: `count_members()`, `list_members()`, `list_member_uris()`, and `add_member()`
SCGroup
- the `scgroup_uris` argument has been dropped from `SCDataset`'s initialize method (`add_member()` should now be used instead to add additional `SCGroup`s)
SCDataset
- `SCDataset`'s `scgroups` field is now an active binding that filters `members` for `SCGroup` objects
Other changes
- added a `NEWS.md` file to track changes to the package
- the fs package is now a dependency
- `SCGroup`'s `from_seurat_assay()` method gained two new arguments: `layers`, to specify which Seurat `Assay` slots should be ingested, and `var`, to control whether feature-level metadata is ingested
- `SCGroup`'s `from_seurat_assay()` method will no longer ingest the `data` slot if it is identical to `counts`
- Internally group members are now added with names
- New internal `TileDBURI` class for handling various URI formats
- The `uri` field for all TileDB(Array|Group)-based classes is now an active binding that retrieves the URI from the private `tiledb_uri` field
- Several default parameters have been changed to store the `X`, `obs`, and `var` arrays more efficiently on disk (#50)
- Seurat cell identities are now stored in the `active_ident` attribute of the `obs` array (#56)
- Require at least version 0.13.0 of tiledb-r to support retrieval of group names
0.1.0
Initial pre-release version.
This pre-release should be used only for testing. The API is still under active development and is not yet covered by TileDB compatibility or stability guarantees.