Ophys ROI segmentation
The Ophys ROI Segmentation pipeline employs methods to improve the recall and precision of the suite2p sparsery segmentation algorithm. A denoising step is first added that improves recall but leads to poor precision. An ROI cell classifier, trained on ROIs labeled by humans as cell or not cell, then significantly improves precision with a marginal loss of recall.
An example processed dataset can be found on Isilon at /allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/
DeepInterpolation is an initial denoising step that improves the recall of suite2p segmentation
(Figure legend: green = true positives, red = false positives, cyan = false negatives)
This deep learning based denoiser requires a two-step training process. First, the model is trained on a large repository of videos (a pretrained model is available). Next, the model is fine-tuned on the video to be denoised. Finally, the denoised video is produced by model inference.
This module takes a pre-trained model and fine-tunes it on the video to be denoised. A previous model trained on an ensemble of SSF datasets can be found on Isilon at /allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5
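Before fine-tuning, it can be useful to load the pretrained checkpoint and confirm its expected input shape. A minimal sketch, assuming the h5 file is a standard Keras model (compile=False sidesteps any custom loss objects):

from tensorflow.keras.models import load_model

pretrained_path = (
    "/allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/"
    "ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5"
)
# compile=False avoids needing custom loss/metric objects at load time
# (assumption: the checkpoint is a standard Keras h5 model file)
model = load_model(pretrained_path, compile=False)
model.summary()  # confirm the expected input size (pre/post frame stack)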
Run module
python -m ophys_etl.modules.denoising.fine_tuning --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.denoising.fine_tuning --help
Example Input JSON
{
"output_full_args": true,
"test_generator_params": {
"pre_post_omission": 0,
"cache_data": false,
"batch_size": 5,
"name": "MovieJSONGenerator",
"total_samples": -1,
"post_frame": 30,
"end_frame": -1,
"randomize": true,
"pre_frame": 30,
"start_frame": 0,
"gpu_cache_full": false,
"movie_statistics_nbframes": -100,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/val.json",
"normalize_cache": true,
"seed": 1234,
"steps_per_epoch": -1
},
"input_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_input.json",
"run_uid": "1307046775",
"log_level": "INFO",
"finetuning_params": {
"model_string": "",
"cache_data": true,
"multi_gpus": false,
"nb_workers": 1,
"output_dir": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878",
"model_source": {
"local_path": "/allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5"
},
"name": "transfer_trainer",
"caching_validation": false,
"use_multiprocessing": false,
"steps_per_epoch": 20,
"loss": "mean_squared_error",
"measure_baseline_loss": false,
"nb_times_through_data": 1,
"learning_rate": 0.0001,
"verbose": 2,
"period_save": 1,
"apply_learning_decay": false,
"epochs_drop": 5
},
"generator_params": {
"pre_post_omission": 0,
"cache_data": false,
"batch_size": 5,
"name": "MovieJSONGenerator",
"total_samples": -1,
"post_frame": 30,
"end_frame": -1,
"randomize": true,
"pre_frame": 30,
"start_frame": 0,
"gpu_cache_full": false,
"movie_statistics_nbframes": -100,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/train.json",
"normalize_cache": true,
"seed": 1234,
"steps_per_epoch": 20
},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_output.json"
}
Module outputs
- fine-tuned model h5 with the best validation loss
- fine-tuned model h5 files for epochs where the validation loss improved
- epoch vs. train/val loss plot
Run module
python -m ophys_etl.modules.denoising.inference --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.denoising.inference --help
Example Input JSON
{
"generator_params": {
"batch_size": 5,
"name": "InferenceOphysGenerator",
"start_frame": 0,
"cache_data": true,
"normalize_cache": false,
"gpu_cache_full": false,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/MOTION_CORRECTION/2023-10-28_08-10-39-046840/1307046775_suite2p_motion_output.h5",
"seed": 1234
},
"inference_params": {
"model_source": {
"local_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/1307046775_mean_squared_error_transfer_model.h5"
},
"rescale": true,
"save_raw": false,
"output_padding": true,
"output_file": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"steps_per_epoch": 0
},
"run_uid": "1307046775"
}
Module outputs
- denoised video h5
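A quick sanity check on the denoised output is to open the h5 and compare a frame against the motion-corrected input. A minimal sketch, assuming both files store the movie under a "data" dataset (placeholder paths):

import h5py
import numpy as np

# Placeholder paths; see the motion correction and denoising inference outputs above.
raw_path = "/path/to/1307046775_suite2p_motion_output.h5"
denoised_path = "/path/to/1307046775_denoised_video.h5"

frame_idx = 1000
with h5py.File(raw_path, "r") as raw, h5py.File(denoised_path, "r") as den:
    # Assumption: both movies are stored under "data" as (frames, rows, cols)
    raw_frame = raw["data"][frame_idx]
    den_frame = den["data"][frame_idx]
print("raw frame std:", np.std(raw_frame), "denoised frame std:", np.std(den_frame))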
We forked the main repository to make several improvements to deepinterpolation with regard to logging and optimizations.
Change log
- Performance optimizations
- Cache input video with MovieJSONGenerator
- The data cache for the input video is shared between the train and test generator objects instead of caching twice
- 2D slicing with MovieJSONGenerator when generating a batch input instead of running through an inefficient for loop
- Separate logging of train and validation by Keras. This helps determine performance bottlenecks during training
- Bugfix with marshmallow schema validator
- Unpin argschema in requirements
Segmentation is performed with suite2p's sparsery module
The threshold_scaling parameter was optimized ad hoc for the SSF datasets by lowering the threshold so that recall was maximized, but not so far as to produce excessive false positives. This value may need to be re-optimized ad hoc for new datasets (such as mFISH learning).
The defaults for all other parameters were used.
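For reference, a minimal sketch of how threshold_scaling maps onto a direct suite2p run; the value shown is illustrative, not the tuned SSF value, and the production pipeline sets it through the segmentation module input json:

import suite2p

ops = suite2p.default_ops()
ops["sparse_mode"] = True          # sparsery-based detection
ops["threshold_scaling"] = 0.75    # illustrative value; lower -> more candidate ROIs (higher recall)
db = {
    "h5py": "/path/to/denoised_video.h5",  # placeholder path
    "h5py_key": "data",
    "data_path": [],
}
# suite2p.run_s2p(ops=ops, db=db)  # uncomment to run segmentation directly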
The resultant ROIs are further postprocessed with the following:
- filter by aspect ratio (postprocess_args.aspect_ratio_threshold)
- binarize masks from suite2p weights if they cross an absolute threshold
  - postprocess_args.abs_threshold (optional); defaults to using the quantile defined by binary_quantile
  - postprocess_args.binary_quantile (optional, default=0.1)
- reduce pixelation by performing binary closing followed by binary opening (see the sketch after this list)
- format suite2p output to LIMS schema
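A minimal sketch of the pixelation-reduction step using scipy.ndimage on a toy boolean mask:

import numpy as np
from scipy.ndimage import binary_closing, binary_opening

mask = np.zeros((16, 16), dtype=bool)
mask[4:10, 4:10] = True
mask[6, 6] = False    # small hole inside the ROI
mask[12, 12] = True   # isolated stray pixel

# closing fills the small hole, opening removes the stray pixel
cleaned = binary_opening(binary_closing(mask))
print(int(cleaned.sum()), "pixels remain after closing + opening")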
Run module
python -m ophys_etl.modules.segment_postprocess --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.segment_postprocess --help
Example Input JSON
{
"suite2p_args": {
"h5py": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"movie_frame_rate_hz": 9.48
},
"postprocess_args": {},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/SEGMENTATION/2023-10-28_13-10-13-009716/SEGMENTATION_output.json"
}
Module outputs
- ROI json file with a list of ROIs stored with the following schema (an illustrative record follows this list):
  - x: leftmost coordinate of bounding box
  - y: topmost coordinate of bounding box
  - width: width of bounding box
  - height: height of bounding box
  - mask_matrix: 2D boolean array of cropped ROI
  - valid_roi: boolean value (this may change in further downstream steps)
  - exclusion_labels: list of reasons the ROI may be invalidated (examples include, but are not limited to, intersects motion border, empty neuropil mask, is a decrosstalk ghost)
  - mask_image_plane
  - id: unique identifier for each ROI
  - max_correction_up: maximum upward shift of motion correction
  - max_correction_down: maximum downward shift of motion correction
  - max_correction_left: maximum leftward shift of motion correction
  - max_correction_right: maximum rightward shift of motion correction
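An illustrative ROI record using the field names above (all values are made up):

example_roi = {
    "id": 1307046775001,          # unique identifier
    "x": 112,
    "y": 58,
    "width": 14,
    "height": 12,
    "mask_matrix": [[False, True, True],
                    [True, True, True]],   # cropped boolean mask (truncated for brevity)
    "valid_roi": True,
    "exclusion_labels": [],
    "mask_image_plane": 0,
    "max_correction_up": 5.0,
    "max_correction_down": 4.0,
    "max_correction_left": 3.0,
    "max_correction_right": 6.0,
}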
This step filters out false positives from the suite2p segmentation step. Since suite2p's segmentation algorithm is tuned for high recall, it produces an abundance of false positives.
(Figure: example original suite2p predictions (left) and classified predictions (right) for a selected set of regions in a FOV)
The classifier uses an ImageNet CNN to classify ROIs as cell or not cell.
The input to the CNN is a 128x128x3 image representing each ROI cropped in three representations: correlation projection, max projection, and mask. These are represented by the channels dimension of the 2D CNN, and are generated with two modules, generate_correlation_projection_graph and generate_thumbnails.
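A minimal sketch of how the three representations could be stacked into the 128x128x3 classifier input; the thumbnail file names here are hypothetical:

import numpy as np
from PIL import Image

def load_gray(path):
    """Load a thumbnail png as a 128x128 float array."""
    return np.array(Image.open(path).convert("L"), dtype=np.float32)

# Hypothetical per-ROI thumbnail file names
corr = load_gray("roi_0001_correlation_projection.png")
maxp = load_gray("roi_0001_max_projection.png")
mask = load_gray("roi_0001_mask.png")

cnn_input = np.stack([corr, maxp, mask], axis=-1)  # shape (128, 128, 3): channels = representations
print(cnn_input.shape)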
Training data is generated by randomly sampling ophys experiments to get representative FOVs. Each FOV is cropped in select regions. The ROIs from suite2p are labeled by human labelers as cell or not cell.
Run module
python -m ophys_etl.modules.segmentation.calculate_edges --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.segmentation.calculate_edges --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}
Module output
- correlation projection graph pkl file
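A minimal sketch for inspecting the output, assuming the pkl is a pickled networkx graph whose edges carry the attribute named in the input json (filtered_hnc_Gaussian):

import pickle

with open("1307046775_correlation_graph.pkl", "rb") as f:
    graph = pickle.load(f)   # expected to be a networkx graph over pixel coordinates

print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
u, v, attrs = next(iter(graph.edges(data=True)))
print("example edge:", u, "-", v, "weight:", attrs.get("filtered_hnc_Gaussian"))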
Run module
python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}
Module output
- png images of 128x128 cropped thumbnails around each ROI with three representations
- correlation projection
- max projection
- suite2p segmentation mask
https://github.com/AllenInstitute/cell_labeling_app
Backend
The backend uses Flask. The primary endpoints are all in src/server/cell_labeling_app/endpoints/endpoints.py. The database schemas are in src/server/cell_labeling_app/database/schemas.py.
The database currently lives at cell-labeling-app-db-instance-1.cdxknown3r0n.us-west-2.rds.amazonaws.com (credentials can be obtained through AWS RDS).
The steps for creating a labeling job are as follows:
- Generate inputs to the app for labeling. The script to do that is ophys_etl.modules.roi_cell_classifier.compute_labeler_artifacts. It takes in several inputs required for labeling and puts them in a single H5 file; the video is downsampled to load quicker. Artifacts are at s3://prod.cell-labeling-app.alleninstitute.org/learning_mfish/, and have been pulled down to the EC2 instance in an EBS volume.
- Populate a labeling job in the database using cell_labeling_app.database.populate_labeling_job.
- Register users if needed with scripts/register_users.py.
- Launch the app using cell_labeling_app.main. This will launch a user-configured number of workers for the web server (uses gunicorn).
Frontend
The frontend code is in the client dir and the main app code is in app.js.
A currently running app can be found at http://ec2-34-211-120-165.us-west-2.compute.amazonaws.com/
Training is performed with DeepCell: https://github.com/AllenInstitute/DeepCell
An example notebook to train/evaluate the model is available here.
Labels can be obtained via deepcell.cli.modules.create_dataset.construct_dataset, where the cell labeling app host is ec2-34-211-120-165.us-west-2.compute.amazonaws.com.
Run module
The preferred way to train in production is the cloud API, since it efficiently trains folds in parallel and logs everything to SageMaker as well as MLFlow.
python -m deepcell.cli.modules.cloud.train --input_json <path_to_input_json>
This will run training using SageMaker. It also makes use of MLFlow for logging; the URL for that can be provided on request. It performs k-fold CV and launches a job per fold in parallel. All inputs/outputs are logged to SageMaker and stored on S3.
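For intuition only, a sketch of how a 5-fold split (n_folds=5 in the input json below) partitions the labeled ROIs; this is not the cloud trainer's internal code, which launches one SageMaker job per fold:

import numpy as np
from sklearn.model_selection import KFold

roi_ids = np.arange(100)  # stand-in for labeled ROI ids
for fold, (train_idx, val_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(roi_ids)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")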
View module input schema
python -m deepcell.cli.modules.cloud.train --help
Example Input JSON
{
"log_level": "INFO",
"s3_params": {
"bucket_name": "dev.deepcell.alleninstitute.org"
},
"train_params": {
"model_inputs_path": "/home/ec2-user/SageMaker/train_model_inputs_103023_only_learning_mfish.json",
"model_load_path": "/home/ec2-user/SageMaker/ssf_baseline_checkpoints",
"model_params": {
"truncate_to_layer": 22,
"freeze_to_layer": 8
},
"optimization_params": {
"learning_rate": 1e-4
},
"n_folds": 5,
"tracking_params": {
"mlflow_server_uri": "http://mlflo-mlflo-16mqjx084gpy-1597208273.us-west-2.elb.amazonaws.com/",
"mlflow_experiment_name": "learning_mfish_cell_classifier"
}
},
"instance_type": "ml.p3.2xlarge"
}
Module output
- Model weights stored in S3 and tracked by SageMaker
- Model performance metrics tracked by MLFlow
To access, find the training job in SageMaker, where the input data, model weights, and logs are recorded; performance metrics are tracked in MLFlow.
Run module
python -m deepcell.cli.modules.inference --input_json <path_to_input_json>
View module input schema
python -m deepcell.cli.modules.inference --help
Example Input JSON
{
"model_inputs_paths": <path to model inputs>,
"model_params": {
"use_pretrained_model": true,
"model_architecture": "vgg11_bn",
"truncate_to_layer": 22
},
"model_load_path": <path to model checkpoints>,
"save_path": <where to write outputs>
"mode": "production",
"experiment_id": "1234",
"classification_threshold": 0.5
}
Module output
- P(cell) csv with the following schema:
  - roi-id
  - experiment_id
  - y_score: probability of cell assigned by the classifier
  - y_pred: boolean classification (true where y_score > classification_threshold)
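A minimal sketch of consuming the output csv, using the columns listed above (file name is hypothetical):

import pandas as pd

preds = pd.read_csv("1234_inference.csv")   # hypothetical output file name
# y_pred is True where y_score exceeds the classification_threshold from the input json
cells = preds[preds["y_pred"]]
print(len(cells), "of", len(preds), "ROIs classified as cells")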
Code for matching cells across experiments in a container is found in a separate repo: https://github.com/AllenInstitute/ophys_nway_matching
A summary of the algorithm and a breakdown of the input JSON for the module can be found here: https://github.com/AllenInstitute/ophys_nway_matching/wiki/Schema-overview.