Enhance code with BIDS abstractions #446
-
Look into https://github.com/bids-standard/pybids
-
Comparison of active BIDS implementations
Source: https://bids.neuroimaging.io/benefits.html
Possible validation against bids-examples and bids-langloc public datasets.
-
Here are the main usages of BIDS within Clinica that I have been able to identify from my recent audit:
Converters
Pipelines
-
Here is my personal summary of the BIDS spec with Clinica's usage in mind. I have highlighted a few concepts and pieces of domain knowledge which may be worth abstracting over.
-
Typesafe construction of BIDS paths
BIDS-like paths
This is an idea for a BIDS abstraction for safer handling of BIDS-like paths. I am making a clear distinction between BIDS-compliant and BIDS-like paths. The latter correspond to the general formalism of what a BIDS filename should look like. The former are validated against a specific modality, which mandates additional constraints such as which entities and extensions are valid and how they are ordered in the naming scheme. For instance:
Proposal: define a BIDS path abstraction which would capture these constraints.
path = Subject("P01") / Session("M00") / Suffix("T1w") / Extension(".nii.gz")
print(path.directory)
>>> 'sub-P01/ses-M00/anat'
print(path.filename)
>>> 'sub-P01_ses-M00_T1w.nii.gz'
# Raises an exception, or returns an invalid BIDS-like type.
path = Subject("P01") / Suffix("T1w") / Acquisition("lowres") / Extension(".nii.gz")
# Raises an exception, or returns an invalid BIDS-like type.
path = Subject("P01") / Acquisition("lowres") / Acquisition("PIB") / Suffix("pet") / Extension(".nii.gz")
Here I am re-using the
An alternative notation could be
BIDS-compliant paths
BIDS-compliance basically links a BIDS-like path to a specific version of a BIDS specification (raw or derivative). As the BIDS specification progresses, new modalities are introduced whilst ambiguous definitions are deprecated. For instance,
Proposal: make this notion of compliance explicit to transform a BIDS-like path into a compliant one.
bids_like_path = Subject("P01") / Modality("pet") / Extension(".nii.gz")
# Returns a BIDS-compliant path.
maybe_bids_path = BIDSSpec(version="1.6.0").validate(bids_like_path)
# Raises an exception, or returns a BIDS-like path.
maybe_bids_path = BIDSSpec(version="1.4.0").validate(bids_like_path)
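To make the `/` composition idea more concrete, here is a minimal sketch of how the ordering and uniqueness constraints could be enforced at construction time. Everything in it is an assumption for illustration: the `Component` and `BIDSLikePath` classes and the `append` method are hypothetical names, and only the filename is built (deriving the directory from the suffix is left out). It also does not check the relative order of entities that the specification mandates.

```python
from __future__ import annotations


class Component:
    """A single piece of a BIDS-like path (hypothetical base class)."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __truediv__(self, other: Component) -> BIDSLikePath:
        # component / component starts a new path holding both parts.
        return BIDSLikePath().append(self).append(other)


class Entity(Component):
    key = ""  # short key used in filenames, set by subclasses


class Subject(Entity):
    key = "sub"


class Session(Entity):
    key = "ses"


class Acquisition(Entity):
    key = "acq"


class Suffix(Component):
    pass


class Extension(Component):
    pass


class BIDSLikePath:
    """Immutable accumulator that rejects ill-formed compositions."""

    def __init__(self, entities=(), suffix=None, extension=None) -> None:
        self.entities = tuple(entities)
        self.suffix = suffix
        self.extension = extension

    def __truediv__(self, other: Component) -> BIDSLikePath:
        return self.append(other)

    def append(self, part: Component) -> BIDSLikePath:
        if self.extension is not None:
            raise ValueError("nothing may follow the extension")
        if isinstance(part, Entity):
            if self.suffix is not None:
                raise ValueError("entities must precede the suffix")
            if any(e.key == part.key for e in self.entities):
                raise ValueError(f"duplicate entity {part.key!r}")
            return BIDSLikePath(self.entities + (part,), None, None)
        if isinstance(part, Suffix):
            if self.suffix is not None:
                raise ValueError("only one suffix is allowed")
            return BIDSLikePath(self.entities, part, None)
        if isinstance(part, Extension):
            if self.suffix is None:
                raise ValueError("the extension must follow a suffix")
            return BIDSLikePath(self.entities, self.suffix, part)
        raise TypeError(f"unsupported component: {part!r}")

    @property
    def filename(self) -> str:
        parts = [f"{e.key}-{e.value}" for e in self.entities]
        if self.suffix is not None:
            parts.append(self.suffix.value)
        stem = "_".join(parts)
        return stem + (self.extension.value if self.extension else "")
```

With this sketch, the first example above builds a filename, while the two invalid examples raise `ValueError` (entity after the suffix, and duplicated `acq` entity respectively). Returning an "invalid BIDS-like" value instead of raising would be an equally reasonable design.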
-
Operations on BIDS filenames
Construction
path = Subject("P01") / Suffix("T1w") / Extension(".nii.gz")
t1w = BIDSFilename(path)
t1w.directory
>>> 'sub-P01/anat'
t1w.name
>>> 'sub-P01_T1w.nii.gz'
t1w.path
>>> 'sub-P01/anat/sub-P01_T1w.nii.gz'
sidecar = t1w.with_extension(Extension(".json"))
sidecar.name
>>> 'sub-P01_T1w.json'
pet = t1w.with_suffix(Suffix("pet")).path
>>> 'sub-P01/pet/sub-P01_pet.nii.gz'
other_subject = t1w.with_entity(Subject('P02')).path
>>> 'sub-P02/anat/sub-P02_T1w.nii.gz'
Querying
path = Subject("P01") / Run(1) / Suffix("T1w") / Extension(".nii.gz")
filename = BIDSFilename(path)
filename.entity("sub")
>>> "P01"
filename.entity("run")
>>> 1
filename.entity("ses")
>>> None
filename.suffix
>>> "T1w"
filename.extension
>>> ".nii.gz"
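As a rough illustration of the querying and `with_*` operations above, here is a minimal sketch built on a frozen dataclass. It is not the proposed API: it assumes well-formed names, works on plain strings rather than typed `Suffix` / `Extension` objects, keeps every entity value as a string (so `run` is `"1"`, not `1`), and the `parse` constructor is a hypothetical addition.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BIDSFilename:
    entities: tuple  # ordered (key, value) pairs, e.g. (("sub", "P01"),)
    suffix: str
    extension: str

    @classmethod
    def parse(cls, name: str) -> BIDSFilename:
        # Everything after the first dot is the extension (".nii.gz").
        stem, dot, ext = name.partition(".")
        # The last underscore-separated token is the suffix.
        *pairs, suffix = stem.split("_")
        entities = tuple(tuple(p.split("-", 1)) for p in pairs)
        return cls(entities, suffix, dot + ext)

    def entity(self, key: str):
        # Returns None when the entity is absent.
        return dict(self.entities).get(key)

    def with_entity(self, key: str, value: str) -> BIDSFilename:
        # Existing keys keep their position; new keys are appended.
        updated = dict(self.entities)
        updated[key] = value
        return replace(self, entities=tuple(updated.items()))

    def with_suffix(self, suffix: str) -> BIDSFilename:
        return replace(self, suffix=suffix)

    def with_extension(self, extension: str) -> BIDSFilename:
        return replace(self, extension=extension)

    @property
    def name(self) -> str:
        parts = [f"{k}-{v}" for k, v in self.entities] + [self.suffix]
        return "_".join(parts) + self.extension
```

Because the dataclass is frozen, every `with_*` call returns a new object, which matches the value-like behaviour of the examples above.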
-
BIDS model and schema
There is a partial definition of the BIDS specification in a machine-readable format. It is used to generate the template sections in the web version of the specification. Other terms of the BIDS taxonomy are also defined, such as modalities, metadata, associated data, templates or top-level files.
-
Domain analysis
BIDS dataset
BIDS dataset is the highest-level concept of the BIDS specification. It is the first term introduced in the common principles and is used consistently within the specification to refer to the whole set of neuroimaging data and metadata. There are two types of BIDS dataset: raw and derivative. Raw datasets must strictly comply with a certain version of the BIDS specification, whilst derivative datasets should follow their own subset with more relaxed constraints. The nominal set of attributes defining a dataset are its name, version of the BIDS specification and dataset type, all of which must be persisted within the dataset.
BIDS data and files
A BIDS dataset aggregates imaging data and metadata as a tree-like hierarchy of modality-agnostic files (MAF) and modality-specific files (MSF). MAF may be stored in tabular or key-value formats, and MSF are stored in their native recording format with additional metadata stored as a separate key-value mapping. The prescribed file formats are TSV for tabular data, JSON for key-value data and compressed NIfTI for imaging data. Modality-specific data are always located at the bottom of the tree, whilst modality-agnostic data may appear anywhere above in the hierarchy, depending on which level of metadata their corresponding data are associated with. Modality-specific metadata may appear anywhere in the tree and be composed and redefined following the BIDS inheritance principle.
BIDS specification
Given a BIDS dataset, the list of valid data (both MAF and MSF), the file naming scheme, and the optional and required metadata are defined by a specific version of the BIDS specification. The latter uses concepts such as entity, modality, datatype, suffix and extension to model the list of possible file collections available within a BIDS hierarchy. The BIDS specification has been improved through BIDS extension proposals introducing new modalities, refinement of the required and optional modality-specific metadata, deprecation of ambiguous suffixes, etc.
As far as BIDS derivatives are concerned, there were only recommendations up until version 1.6.0 of the BIDS specification. Version 1.7.0 introduced additional concepts and datatypes which may be used to define a custom specification for pipeline outputs. Official rules for custom specifications include:
Storage convention
A BIDS file (MAF or MSF) is composed of a suffix and an extension, and may be prefixed by an ordered set of entities. The suffix identifies the nature of the recording and the extension its file type. Entities are key-value pairs. For example:
All MSF are associated with a datatype (for instance
Inheritance principle
MSF metadata can be factored out and recomposed following the inheritance principle. Common metadata belonging to a specific acquisition may be defined anywhere in the metadata hierarchy (subject, session or scan) and redefined multiple times, provided redefinition occurs at most once per metadata level. Composition is done by aggregating attributes from top to bottom, overriding already defined values. Multiple (re)definitions of MSF metadata are not recommended but are made possible by this principle. In the most extreme case, the following files:
can all co-exist.
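The top-to-bottom composition described above can be sketched as a simple ordered merge. This is only an illustration under stated assumptions: the sidecars are taken as already loaded into dictionaries, ordered from the highest level of the hierarchy down to the file closest to the data, and `resolve_metadata` is a hypothetical helper name. The metadata values are made up.

```python
def resolve_metadata(levels):
    """Merge sidecar metadata from top (dataset root) to bottom (scan).

    Later (deeper) levels override keys defined by earlier (higher) ones.
    """
    merged = {}
    for level in levels:
        merged.update(level)
    return merged


# Hypothetical sidecar contents at three levels of the hierarchy.
dataset_level = {"Manufacturer": "ExampleVendor", "RepetitionTime": 2.0}
subject_level = {"RepetitionTime": 1.5}
scan_level = {"EchoTime": 0.03}

metadata = resolve_metadata([dataset_level, subject_level, scan_level])
```

Here `RepetitionTime` ends up with the subject-level value, while `Manufacturer` is inherited unchanged from the dataset level.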
-
Domain modeling
BIDS dataset
The main abstraction for querying a BIDS dataset. Successful instantiation of a
# bids/dataset.py
from os import PathLike
from typing import Literal, Optional

class DatasetDescription:
    name: str
    bids_version: str
    dataset_type: Literal["raw", "derivative"]

class Dataset:
    ...

# Returns a Dataset, or None on failure (note: shadows the builtin open).
def open(path: PathLike) -> Optional[Dataset]:
    ...

dataset = open(path)
BIDS specification
This abstraction is required to handle both BIDS raw and derivative datasets. In the case of raw datasets, the BIDS version is enough to identify the specification supported by the dataset. For derivatives, however, it needs to be specified manually.
# bids/specification.py
class Specification:
    ...

# bids/dataset.py
class Dataset:
    specification: Specification

    def with_specification(self, spec: Specification) -> "Dataset":
        ...

spec = Specification(...)  # Define the BIDS specification for the derivative
dataset = open(path).with_specification(spec)
We could also use the proposed convention that the root BIDS path be named with a prefix identifying the derivative. Regardless, since we would have to manage several variations of the BIDS specification, i.e. one per version and one per derivative at first, there is a need for a registry abstraction in which we would declare and query instances of BIDS specifications by version or derivative name.
# bids/specification_registry.py
def register(spec: Specification, version: str, derivative: str | None):
    ...

def get(version: str, derivative: str | None) -> Specification | None:
    ...
Registration will likely be done with a decorator applied to the specification definition. The two approaches are functionally equivalent.
BIDS template
A BIDS specification models the space of valid BIDS hierarchies and associated metadata. A BIDS template materializes one such hierarchy in this space, onto which search queries may be applied to compute valid BIDS paths for MAF or MSF collections.
# bids/template.py
class Template:
    def apply(self, query) -> Iterable[Path]:
        ...

# bids/specification.py
class Specification:
    def get_template(self) -> Template:
        ...
BIDS file collections
File collections model entity-linked data corresponding to a specific suffix. A collection would aggregate files sharing a common suffix, plus metadata gathered in the tree as per the inheritance rules. This is where the distinction between modality-agnostic and modality-specific models would happen.
# bids/file_collection.py
class ModalityAgnosticFileCollection:
    datatype: Optional[str]

class ModalitySpecificFileCollection:
    modality: str
    datatype: str
BIDS paths
Models a path to a valid BIDS file within a file collection.
# bids/path.py
class Path:
    entities: Optional[List[Entity]]
    suffix: Suffix
    extension: Extension

    def __str__(self) -> str:
        # As [<entities>]_<suffix>.<extension>
        ...
BIDS components
Those are the low-level components composing a BIDS path, i.e. entities, suffixes and extensions. An entity is modeled with a name, description, prefix and value type (label or index at first, maybe extended later to enum and flag if needed). The prefix must be alphanumeric. A suffix is an alphanumeric value. An extension is composed of alphanumeric values joined with dots.
# bids/path.py
from typing import Literal

class Entity:
    name: str
    description: str
    prefix: str
    value_type: Literal["label", "index"]
    value: str

class Suffix:
    name: str
    description: str
    value: str

class Extension:
    name: str
    description: str
    value: str
There is likely a pattern to be factored out here, but I would have to progress further in the implementation to find the best way. So far, this is close to what's being modeled in the schema of the BIDS specification.
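Coming back to the specification registry discussed earlier in this comment, here is a minimal sketch of how the function-based and decorator-based registrations could coexist. All names are assumptions made for illustration: the module-level `_REGISTRY` dictionary, the `specification` decorator, and the two example specification classes are hypothetical, and `Specification` is an empty stand-in for the class outlined above.

```python
from typing import Dict, Optional, Tuple


class Specification:
    """Empty stand-in for the Specification abstraction sketched above."""


# Registry keyed by (BIDS version, derivative name or None for raw).
_REGISTRY: Dict[Tuple[str, Optional[str]], Specification] = {}


def register(spec: Specification, version: str,
             derivative: Optional[str] = None) -> None:
    _REGISTRY[(version, derivative)] = spec


def get(version: str, derivative: Optional[str] = None) -> Optional[Specification]:
    # Returns None when no matching specification was registered.
    return _REGISTRY.get((version, derivative))


def specification(version: str, derivative: Optional[str] = None):
    """Decorator form: instantiate the decorated class and register it."""
    def decorator(cls):
        register(cls(), version, derivative)
        return cls
    return decorator


@specification(version="1.7.0")
class RawSpecification(Specification):  # hypothetical raw specification
    pass


@specification(version="1.7.0", derivative="my-pipeline")
class MyPipelineSpecification(Specification):  # hypothetical derivative
    pass
```

The decorator only wraps `register`, which is why the two forms are functionally equivalent: `get("1.7.0")` returns the raw specification and `get("1.7.0", "my-pipeline")` the derivative one.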
-
Code that handles BIDS datasets is scattered across the code base. It would be useful to have abstract classes that centralize the manipulation of BIDS (and derivative?) directories.
Following a discussion with @ghisvail and @MatthieuJoulot, the following abstractions could be introduced:
class BIDSReader(): Centralizes methods for reading BIDS datasets and extracting information from the data structure
class BIDSWriter(): Centralizes methods for writing BIDS datasets and enforcing coherence with specifications
class BIDSModality(datatype: str, suffix: str, entities: list[str, str], participant_id: str, session_id: str): abstract imaging data for easier querying and manipulation in above classes
class BIDS[json metadata](): enforces schema on required json files (cf PR430 for an example with the required dataset_description.json)
class BIDS[phenotype](): enforces schema on phenotype files
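As an example of what enforcing a schema on a required JSON file could look like, here is a minimal sketch for dataset_description.json. It is not the implementation from the PR mentioned above. The field names (`Name` and `BIDSVersion` as required, `DatasetType` defaulting to "raw") reflect my reading of the BIDS specification and should be checked against the targeted version; the `DatasetDescription.from_json` constructor is a hypothetical name.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescription:
    name: str
    bids_version: str
    dataset_type: str = "raw"

    @classmethod
    def from_json(cls, text: str) -> DatasetDescription:
        data = json.loads(text)
        # Assumed required keys; adjust to the targeted BIDS version.
        missing = [key for key in ("Name", "BIDSVersion") if key not in data]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        dataset_type = data.get("DatasetType", "raw")
        if dataset_type not in ("raw", "derivative"):
            raise ValueError(f"invalid DatasetType: {dataset_type!r}")
        return cls(data["Name"], data["BIDSVersion"], dataset_type)


description = DatasetDescription.from_json(
    '{"Name": "Example dataset", "BIDSVersion": "1.7.0"}'
)
```

A reader class could call such a constructor when opening a dataset, and a writer class could serialize the same object back, so that both share a single definition of the schema.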