
Ophys ROI segmentation


Overview

The Ophys ROI Segmentation pipeline employs methods to improve the recall and precision of the suite2p sparsery segmentation algorithm. A denoising step is first added, which improves recall but leads to poor precision. An ROI cell classifier, trained on ROIs labeled by humans as cell or not cell, then significantly improves precision with only a marginal loss in recall.

An example processed dataset can be found on Isilon at /allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/

[Figure: ophys segmentation pipeline]

Denoising - Deep Interpolation

Deep Interpolation is an initial denoising step that improves recall with suite2p segmentation.

[Figure: segmentation comparison. Green: true positives, Red: false positives, Cyan: false negatives]

This deep learning based denoiser requires a two-step training process. First, the model is trained on a large repository of videos (a pretrained model is available). Next, the model is fine-tuned on the video to be denoised. Finally, the denoised video is produced by model inference.

Finetuning

This module takes a pre-trained model and fine-tunes it on the video to be denoised. A previous model trained on an ensemble of SSF datasets can be found on Isilon at /allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5

Run module

python -m ophys_etl.modules.denoising.fine_tuning --input_json <path_to_input_json>

View module input schema

python -m ophys_etl.modules.denoising.fine_tuning --help
Example Input JSON
{
"output_full_args": true,
"test_generator_params": {
  "pre_post_omission": 0,
  "cache_data": false,
  "batch_size": 5,
  "name": "MovieJSONGenerator",
  "total_samples": -1,
  "post_frame": 30,
  "end_frame": -1,
  "randomize": true,
  "pre_frame": 30,
  "start_frame": 0,
  "gpu_cache_full": false,
  "movie_statistics_nbframes": -100,
  "data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/val.json",
  "normalize_cache": true,
  "seed": 1234,
  "steps_per_epoch": -1
},
"input_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_input.json",
"run_uid": "1307046775",
"log_level": "INFO",
"finetuning_params": {
  "model_string": "",
  "cache_data": true,
  "multi_gpus": false,
  "nb_workers": 1,
  "output_dir": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878",
  "model_source": {
    "local_path": "/allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5"
  },
  "name": "transfer_trainer",
  "caching_validation": false,
  "use_multiprocessing": false,
  "steps_per_epoch": 20,
  "loss": "mean_squared_error",
  "measure_baseline_loss": false,
  "nb_times_through_data": 1,
  "learning_rate": 0.0001,
  "verbose": 2,
  "period_save": 1,
  "apply_learning_decay": false,
  "epochs_drop": 5
},
"generator_params": {
  "pre_post_omission": 0,
  "cache_data": false,
  "batch_size": 5,
  "name": "MovieJSONGenerator",
  "total_samples": -1,
  "post_frame": 30,
  "end_frame": -1,
  "randomize": true,
  "pre_frame": 30,
  "start_frame": 0,
  "gpu_cache_full": false,
  "movie_statistics_nbframes": -100,
  "data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/train.json",
  "normalize_cache": true,
  "seed": 1234,
  "steps_per_epoch": 20
},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_output.json"
}

Module outputs

  1. fine-tuned model h5 - checkpoint with the best validation loss
  2. fine-tuned model h5 files - checkpoints from epochs where validation loss improved
  3. epoch vs. train/validation loss plot
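
The checkpoints are standard Keras .h5 files. A minimal sketch for loading one for inspection (assuming a TensorFlow/Keras environment; compile=False skips restoring the loss/optimizer, which may reference custom objects):

# Minimal sketch: load a fine-tuned checkpoint and inspect it.
from tensorflow.keras.models import load_model

model = load_model(
    "/path/to/1307046775_mean_squared_error_transfer_model.h5",
    compile=False,
)
model.summary()  # layer stack and expected input shape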

Inference

Run module

python -m ophys_etl.modules.denoising.inference --input_json <path_to_input_json>

View module input schema

python -m ophys_etl.modules.denoising.inference --help
Example Input JSON
{
"generator_params": {
  "batch_size": 5,
  "name": "InferenceOphysGenerator",
  "start_frame": 0,
  "cache_data": true,
  "normalize_cache": false,
  "gpu_cache_full": false,
  "data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/MOTION_CORRECTION/2023-10-28_08-10-39-046840/1307046775_suite2p_motion_output.h5",
  "seed": 1234
},
"inference_params": {
  "model_source": {
    "local_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/1307046775_mean_squared_error_transfer_model.h5"
  },
  "rescale": true,
  "save_raw": false,
  "output_padding": true,
  "output_file": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
  "steps_per_epoch": 0
},
"run_uid": "1307046775"
}

Module outputs

  1. denoised video h5
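
A quick sanity check of the output (a sketch; it assumes the movie is stored under the "data" dataset, matching the motion-corrected input's layout):

# Sketch: inspect the denoised movie.
import h5py
import numpy as np

with h5py.File("/path/to/1307046775_denoised_video.h5", "r") as f:
    denoised = f["data"][:100]  # first 100 frames

print(denoised.shape, denoised.dtype)
print("frame 0 mean:", np.mean(denoised[0]))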

Addendum

We forked the main repository to make several improvements to deepinterpolation, mainly around logging and performance.

Forked repo

Main repo

Change log

  • Performance optimizations
    • Cache input video with MovieJSONGenerator
    • The data cache for the input video is shared between the train and test generator objects instead of caching twice
    • 2D slicing with MovieJSONGenerator when generating a batch input, instead of an inefficient per-sample for loop (see the sketch after this list)
  • Separate logging of train and validation by Keras. This helps determine performance bottlenecks during training.
  • Bugfix with marshmallow schema validator
  • Unpin argschema in requirements
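
The slicing optimization, as a minimal numpy sketch (names are illustrative, not the actual MovieJSONGenerator internals):

# Gather all pre/post context frames for a batch in one vectorized
# indexing call rather than a Python loop over samples.
import numpy as np

movie = np.random.rand(1000, 512, 512).astype(np.float32)  # frames x H x W
centers = np.array([100, 250, 400, 550, 700])               # batch of 5 samples
offsets = np.concatenate([np.arange(-30, 0), np.arange(1, 31)])  # 30 pre + 30 post

batch = movie[centers[:, None] + offsets[None, :]]
print(batch.shape)  # (5, 60, 512, 512)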

Suite2P Segmentation

Segmentation

Segmentation is performed with suite2p's sparsery module.

The threshold_scaling parameter was tuned ad hoc for the SSF datasets: the threshold was lowered enough to maximize recall, but not so much as to produce excessive false positives. This value may need to be re-tuned ad hoc for new datasets (such as mFISH learning).

The defaults for all other parameters were used.
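
A sketch of how threshold_scaling would be overridden while keeping the other defaults (the value and input keys are illustrative; the module constructs these from its suite2p_args, and exact db keys depend on the suite2p version):

# Sketch: lower threshold_scaling on top of suite2p defaults.
import suite2p

ops = suite2p.default_ops()
ops["sparse_mode"] = True        # sparsery detection path
ops["threshold_scaling"] = 0.75  # illustrative: lowered from 1.0 to raise recall
db = {"h5py": "/path/to/denoised_video.h5", "h5py_key": "data"}
output_ops = suite2p.run_s2p(ops=ops, db=db)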

The resultant ROIs are further postprocessed with the following:

  1. filter by aspect ratio
    • postprocess_args.aspect_ratio_threshold
  2. binarize masks from suite2p weights where they cross an absolute threshold
    • postprocess_args.abs_threshold (optional); by default the quantile defined by binary_quantile is used instead
    • postprocess_args.binary_quantile (optional) (default=0.1)
  3. reduce pixelation by performing binary closing followed by binary opening (see the sketch after this list)
  4. format suite2p output to LIMS schema
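
A minimal sketch of the cleanup in step 3, using scipy's default 3x3 cross structuring element (the module may use a different element):

# Binary closing fills small holes; binary opening then removes
# stray pixels and smooths jagged corners.
import numpy as np
from scipy.ndimage import binary_closing, binary_opening

mask = np.zeros((10, 10), dtype=bool)
mask[2:7, 2:7] = True   # 5x5 blob
mask[4, 4] = False      # one-pixel hole
mask[1, 8] = True       # stray pixel

cleaned = binary_opening(binary_closing(mask))
print(cleaned.sum(), "pixels remain")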

Usage

Run module

python -m ophys_etl.modules.segment_postprocess --input_json <path_to_input_json>

View module input schema

python -m ophys_etl.modules.segment_postprocess --help
Example Input JSON
{
"suite2p_args": {
  "h5py": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
  "movie_frame_rate_hz": 9.48
},
"postprocess_args": {},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/SEGMENTATION/2023-10-28_13-10-13-009716/SEGMENTATION_output.json"
}

Module outputs

  1. ROI json file with a list of ROIs stored with the following schema
  • x: leftmost coordinate of bounding box
  • y: topmost coordinate of bounding box
  • width: width of bounding box
  • height: height of bounding box
  • mask_matrix: 2D boolean array of cropped ROI
  • valid_roi: boolean value (this may change in further downstream steps)
  • exclusion_labels: list of reasons the ROI may be invalidated (examples include but are not limited to intersects motion border, empty neuropil mask, is a decrosstalk ghost)
  • mask_image_plane
  • id: unique identifier for each ROI
  • max_correction_up: maximum upwards shift of motion correction
  • max_correction_down: maximum downwards shift of motion correction
  • max_correction_left: maximum left shift of motion correction
  • max_correction_right: maximum right shift of motion correction
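
A sketch showing how these fields compose back into a full-FOV mask (the 512x512 FOV shape and file path are illustrative):

# Place each cropped ROI mask back into the full field of view.
import json
import numpy as np

with open("/path/to/rois.json") as f:
    rois = json.load(f)

fov = np.zeros((512, 512), dtype=bool)
for roi in rois:
    mask = np.array(roi["mask_matrix"], dtype=bool)
    fov[roi["y"]:roi["y"] + roi["height"],
        roi["x"]:roi["x"] + roi["width"]] |= mask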

ROI Cell Classifier

This step filters out false positives from the suite2p segmentation step. Because suite2p's segmentation algorithm is tuned to be sensitive (high recall), it produces an abundance of false positives.

Example of original suite2p predictions (left) and classified predictions (right) for a selected set of regions in a FOV.

The classifier uses a CNN pretrained on ImageNet to classify ROIs as cell or not cell.

The input to the CNN is a 128x128x3 image of each cropped ROI in three representations (correlation projection, max projection, and mask), stacked along the channel dimension of the 2D CNN. These are generated with two modules: generate_correlation_projection_graph and generate_thumbnails.
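
A sketch of assembling one classifier input from the three thumbnails (file names are illustrative):

# Stack the three 128x128 thumbnails as channels of one image.
import numpy as np
from PIL import Image

corr = np.array(Image.open("corr_roi_0001.png").convert("L"))
maxp = np.array(Image.open("max_roi_0001.png").convert("L"))
mask = np.array(Image.open("mask_roi_0001.png").convert("L"))

x = np.stack([corr, maxp, mask], axis=-1)  # shape (128, 128, 3)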

Training data is generated by randomly sampling ophys experiments to get representative FOVs. Each FOV is cropped in select regions. The ROIs from suite2p are labeled by human labelers as cell or not cell.

Generate correlation projection graph

Run module

python -m ophys_etl.modules.segmentation.calculate_edges --input_json <path_to_input_json>

View module input schema

python -m ophys_etl.modules.segmentation.calculate_edges --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}

Module output

  1. correlation projection graph pkl file
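
The pickle can be inspected directly. A sketch, assuming a networkx graph whose nodes are pixel coordinates and whose edges carry the filtered_hnc_Gaussian weight named in the input above:

# Collapse the graph to a per-pixel image by averaging each
# pixel's incident edge weights.
import pickle
import numpy as np

with open("/path/to/1307046775_correlation_graph.pkl", "rb") as f:
    graph = pickle.load(f)

shape = tuple(np.max(np.array(list(graph.nodes)), axis=0) + 1)
img = np.zeros(shape)
for node in graph.nodes:
    weights = [attrs["filtered_hnc_Gaussian"]
               for _, _, attrs in graph.edges(node, data=True)]
    if weights:
        img[node] = np.mean(weights)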

Generate thumbnails

Run module

python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --input_json <path_to_input_json>

View module input schema

python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}

Module output

  1. PNG images of 128x128 thumbnails cropped around each ROI, in three representations
  • correlation projection
  • max projection
  • suite2p segmentation mask

Labeling App

https://github.com/AllenInstitute/cell_labeling_app

Backend

The backend uses Flask. The primary endpoints are all in src/server/cell_labeling_app/endpoints/endpoints.py. The database schemas are in src/server/cell_labeling_app/database/schemas.py.

The database currently lives at cell-labeling-app-db-instance-1.cdxknown3r0n.us-west-2.rds.amazonaws.com (credentials can be obtained through AWS RDS).

The steps for creating a labeling job are as follows:

  1. generate inputs to the app for labeling. The script to do this is ophys_etl.modules.roi_cell_classifier.compute_labeler_artifacts. It takes the several inputs required for labeling and puts them in a single H5 file; the video is downsampled so it loads quicker. Artifacts are at s3://prod.cell-labeling-app.alleninstitute.org/learning_mfish/ and have been pulled down to an EBS volume on the EC2 instance.
  2. populate a labeling job in the database. Use cell_labeling_app.database.populate_labeling_job.
  3. register users if needed: scripts/register_users.py
  4. Launch app using cell_labeling_app.main.

This will launch a user-configured number of gunicorn workers for the web server.
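
For reference, a typical gunicorn invocation would look like the following; the WSGI callable name (app) and worker count are illustrative assumptions, since the app normally configures these itself:

gunicorn --workers 4 --bind 0.0.0.0:5000 "cell_labeling_app.main:app"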

Frontend

The frontend code is in the client directory, and the main app code is in app.js.

A currently running app can be found at http://ec2-34-211-120-165.us-west-2.compute.amazonaws.com/

Training

Training is performed with DeepCell: https://github.com/AllenInstitute/DeepCell

An example notebook to train/evaluate the model can be found here.

Labels can be obtained via deepcell.cli.modules.create_dataset.construct_dataset, where the cell labeling app host is ec2-34-211-120-165.us-west-2.compute.amazonaws.com.

Run module

The preferred way to train in production is the cloud API, since it efficiently trains folds in parallel and logs everything to SageMaker and MLflow.

python -m deepcell.cli.modules.cloud.train --input_json <path_to_input_json>

This will run training using SageMaker and makes use of MLflow for logging (the MLflow URL can be provided on request). It performs k-fold cross-validation, launching a job per fold in parallel. All inputs/outputs are logged to SageMaker and stored on S3.

View module input schema

python -m deepcell.cli.modules.cloud.train --help
Example Input JSON
{
"log_level": "INFO",
"s3_params": {
  "bucket_name": "dev.deepcell.alleninstitute.org"
},
"train_params": {
  "model_inputs_path": "/home/ec2-user/SageMaker/train_model_inputs_103023_only_learning_mfish.json",
  "model_load_path": "/home/ec2-user/SageMaker/ssf_baseline_checkpoints",
  "model_params": {
    "truncate_to_layer": 22,
    "freeze_to_layer": 8
  },
  "optimization_params": {
    "learning_rate": 1e-4
  },
  "n_folds": 5,
  "tracking_params": {
    "mlflow_server_uri": "http://mlflo-mlflo-16mqjx084gpy-1597208273.us-west-2.elb.amazonaws.com/",
    "mlflow_experiment_name": "learning_mfish_cell_classifier"
  }
},
"instance_type": "ml.p3.2xlarge"
}

Module output

Model weights are stored in S3 and tracked by SageMaker.

Model performance metrics are tracked by MLflow.

To access them, find the training job in SageMaker, which tracks:

  • input data logged for the job
  • model weights
  • logs

Performance metrics are tracked in MLflow.

Inference

Run module

python -m deepcell.cli.modules.inference --input_json <path_to_input_json>

View module input schema

python -m deepcell.cli.modules.inference --help
Example Input JSON
{
  "model_inputs_paths": <path to model inputs>,
  "model_params": {
      "use_pretrained_model": true,
      "model_architecture": "vgg11_bn",
      "truncate_to_layer": 22
  },
  "model_load_path": <path to model checkpoints>,
  "save_path": <where to write outputs>
  "mode": "production",
  "experiment_id": "1234",
  "classification_threshold": 0.5
}

Module output

  1. P(cell) csv with schema:
  • roi-id
  • experiment_id
  • y_score: probability that the ROI is a cell, from the classifier
  • y_pred: boolean classification (y_score > classification_threshold)
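
A sketch of consuming the output (the file name is illustrative; column names follow the schema above):

# Keep only ROIs the classifier predicts to be cells.
import pandas as pd

preds = pd.read_csv("/path/to/1307046775_inference.csv")
cells = preds[preds["y_pred"]]
print(len(cells), "of", len(preds), "ROIs classified as cells")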

NWAY Cell Matching

Code for matching cells across experiments in a container is found in a separate repo: https://github.com/AllenInstitute/ophys_nway_matching

A summary of the algorithm and a breakdown of the input JSON for the module can be found here: https://github.com/AllenInstitute/ophys_nway_matching/wiki/Schema-overview.