[ENH] Skip visits for which processed images exist #1399

@NicolasGensollen NicolasGensollen commented Nov 27, 2024

Closes #1079

This PR proposes to improve how pipelines determine which subject-sessions to skip.

Context

The basic idea is to analyze the provided CAPS folder before running the processing pipeline and look for the expected output files. If those files are present for a given subject-session, it won't be processed by the pipeline. This can save a lot of time when re-running a pipeline.

This idea is definitely not new, and some pipelines are already doing it (T1Linear and some FreeSurfer longitudinal pipelines). Concretely, it is done in the following method, where the files are searched for in the CAPS output folder:

def get_processed_images(
    caps_directory: Path, subjects: List[str], sessions: List[str]
) -> List[str]:
    from clinica.utils.filemanip import extract_image_ids
    from clinica.utils.input_files import T1W_LINEAR_CROPPED
    from clinica.utils.inputs import clinica_file_reader

    image_ids: List[str] = []
    if caps_directory.is_dir():
        cropped_files, _ = clinica_file_reader(
            subjects,
            sessions,
            caps_directory,
            T1W_LINEAR_CROPPED,
        )
        image_ids = extract_image_ids(cropped_files)
    return image_ids

The initial feature request of #1079 was to implement this logic for PETLinear too.

The method above skips a subject-session as soon as the cropped image is found, whatever the user asked for. In other words, if the user asks for the un-cropped image, or is interested in the transformation matrix rather than the image, the pipeline will still skip the subject-session, which isn't great.

Instead, the method should look for both the transformation and the image corresponding to the user-provided parameters, and only skip the subject-session if both are present.

For PETLinear the idea is the same except that we need to take more parameters into account (tracer, SUVR...).
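
As a rough illustration of the "skip only if everything is there" rule (this is not the code of the PR), the sketch below checks a visit folder against a list of expected file-name suffixes. The helper name `is_visit_processed` and the suffixes used in the usage example are assumptions made for the example, not Clinica's actual patterns.

```python
from pathlib import Path
from typing import Iterable


def is_visit_processed(visit_dir: Path, expected_suffixes: Iterable[str]) -> bool:
    """Return True only if every expected output is present in visit_dir.

    An output counts as present when at least one file name in the folder
    ends with the corresponding suffix (hypothetical matching rule).
    """
    if not visit_dir.is_dir():
        return False
    names = [p.name for p in visit_dir.iterdir() if p.is_file()]
    # The visit is skipped only when ALL requested outputs are found,
    # e.g. both the image and the transformation matrix.
    return all(
        any(name.endswith(suffix) for name in names)
        for suffix in expected_suffixes
    )
```

With this rule, a folder that contains the cropped image but not the transformation matrix is still processed; the visit is skipped only once both files are present.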

Implementation

The diff is a bit long (sorry about that...), so here are the main ideas and implementation decisions:

There is a new dataclass called Visit, which is simply a convenient way to represent a subject-session. I could have used a NamedTuple but went for a frozen dataclass instead. Collections of Visit objects can be turned into sets (because instances are frozen, hence hashable) and sorted (because the class implements ordering): for example ("sub-02", "ses-M006") > ("sub-01", "ses-M100") and ("sub-01", "ses-M006") > ("sub-01", "ses-M000"). This class lives in clinica.utils.bids but might fit better somewhere else?

The previous method get_processed_images was renamed to get_processed_visits and returns a list of Visit objects for which processing is assumed to have been done.

The skipping logic was moved from the concrete pipeline class to the base class (i.e. the base Pipeline class in engine.py). It is done in the determine_subject_and_session_to_process method, which calls the get_processed_visits abstract method to know which subjects and sessions to remove. get_processed_visits has to be implemented in child classes as it is pipeline-specific.
The PR proposes an implementation of this method for T1Linear and PETLinear. Some "pattern builders" had to be implemented in clinica.utils.input_files in order to query new types of files (for example the transformation matrices in the pet-linear outputs, see the function pet_linear_transformation_matrix).
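
The design described above can be sketched as follows. This is a simplified stand-in, not the code of the PR: the field names of Visit, the constructor of the base class, the return value of determine_subject_and_session_to_process, and DummyPipeline are all assumptions made for the example. Only the names Visit, Pipeline, get_processed_visits and determine_subject_and_session_to_process come from the description.

```python
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List

logger = logging.getLogger(__name__)


@dataclass(frozen=True, order=True)
class Visit:
    """A subject-session pair; frozen makes it hashable, order makes it sortable."""

    subject: str  # assumed field name
    session: str  # assumed field name


class Pipeline(ABC):
    """Simplified stand-in for the base pipeline class."""

    def __init__(self, visits: Iterable[Visit]):
        # Deduplicate with a set, then sort by (subject, session).
        self.visits: List[Visit] = sorted(set(visits))

    @abstractmethod
    def get_processed_visits(self) -> List[Visit]:
        """Pipeline-specific: visits whose expected outputs already exist."""

    def determine_subject_and_session_to_process(self) -> List[Visit]:
        processed = set(self.get_processed_visits())
        skipped = [v for v in self.visits if v in processed]
        if skipped:
            logger.warning(
                "Found already processed images for %d visit(s); they will be ignored.",
                len(skipped),
            )
        self.visits = [v for v in self.visits if v not in processed]
        return self.visits


class DummyPipeline(Pipeline):
    """Hypothetical child class pretending that one visit is already processed."""

    def get_processed_visits(self) -> List[Visit]:
        return [Visit("sub-01", "ses-M000")]
```

Here a child class only has to say which visits are already done; the filtering and the warning live in one place in the base class. Note that with `order=True` the comparison is a plain string comparison on the fields, in declaration order.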

Most of the other changes are due to the implementation of the unit tests.

I had to modify the CAPS generator a bit to:

  • generate pet-linear folders
  • generate more than the NIfTI images (transformation matrices)
  • accept more parameters to write the appropriate files (tracer, SUVR, cropped vs. uncropped, ...)

Because of the last point, the API of the CAPS generator had to change a bit: the pipelines to generate are no longer given as a list of strings, but as a dictionary mapping pipeline names to a dictionary of parameters. A lot of test files were changed only to use the new API.
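
To make the shape of the change concrete, here is a small sketch of the two styles. The helper `normalize_pipelines` and the parameter keys in the usage example are hypothetical and only illustrate the list-versus-dictionary difference; they are not the actual generator API.

```python
from typing import Dict, List, Union

PipelineConfig = Dict[str, Dict[str, object]]


def normalize_pipelines(pipelines: Union[List[str], PipelineConfig]) -> PipelineConfig:
    """Turn an old-style list of pipeline names into the new-style mapping.

    Hypothetical helper: names given as a plain list get an empty
    parameter dictionary, while a mapping is copied as-is.
    """
    if isinstance(pipelines, dict):
        return {name: dict(params) for name, params in pipelines.items()}
    return {name: {} for name in pipelines}


# Old style: pipeline names only.
old_style = ["t1-linear", "pet-linear"]

# New style: each pipeline name maps to its own parameters
# (the keys below are made up for the example).
new_style = {
    "t1-linear": {"uncropped_image": False},
    "pet-linear": {"tracer": "18FFDG", "suvr_reference_region": "pons"},
}
```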

Usage

Basically, through the CLI, when passing a CAPS folder with existing files, the user should get a warning displaying the visits that will be skipped:

$ clinica run pet-linear ./in/bids ./ref/caps 18FFDG pons
2024-11-27 15:42:31,431:INFO:Found installation of ants with version 2.3.5, satisfying >=2.2.0.
2024-11-27 15:42:31,462:WARNING:In the provided CAPS folder /Users/nicolas.gensollen/GitRepos/clinica_data_ci/data_ci/PETLinear/ref/caps, Clinica found already processed images for 3 visit(s):
- sub-ADNI029S1384 ses-M000
- sub-ADNI029S1384 ses-M006
- sub-ADNI037S4015 ses-M000
Those visits will be ignored by Clinica.

@AliceJoubert AliceJoubert left a comment

Thanks for the work @NicolasGensollen ! Just a few questions/remarks

clinica/pipelines/pet/linear/pipeline.py (resolved)
clinica/pipelines/engine.py (resolved)
@NicolasGensollen NicolasGensollen force-pushed the implement-get_processed_images-for-pet-linear branch from 10bce0a to 0492f90 December 4, 2024 11:55
@NicolasGensollen NicolasGensollen merged commit 7457f03 into aramis-lab:dev Dec 4, 2024
12 of 14 checks passed
@NicolasGensollen NicolasGensollen deleted the implement-get_processed_images-for-pet-linear branch December 4, 2024 13:08
Successfully merging this pull request may close these issues.

Pet Pipeline: Non-Selective Reprocessing of Eligible Bids Entries