Ophys ROI segmentation
The Ophys ROI Segmentation pipeline employs methods to improve the recall and precision of the suite2p sparsery segmentation algorithm. A denoising step is first added that improves recall but leads to poor precision. An ROI cell classifier, trained on ROIs labeled by humans as cell or not cell, then significantly improves precision with a marginal loss of recall.
An example processed dataset can be found on Isilon at /allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/
DeepInterpolation is an initial denoising step that improves the recall of suite2p segmentation
(Figure legend: green = true positives, red = false positives, cyan = false negatives)
This deep learning based denoiser requires a two-step training process. First, the model is trained on a large repository of videos (a pretrained model is available). Next, the model is fine-tuned on the video to be denoised. Finally, the denoised video is produced by model inference.
This module takes a pre-trained model and fine-tunes it on the video to be denoised. A previous model trained on an ensemble of SSF datasets can be found on Isilon at /allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5
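Before fine-tuning, it can be useful to load the pretrained checkpoint and confirm its expected input shape. A minimal sketch, assuming the h5 file is a standard Keras model (compile=False sidesteps any custom loss objects):

from tensorflow.keras.models import load_model

pretrained_path = (
    "/allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/"
    "ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5"
)
# compile=False avoids needing custom loss/metric objects at load time
# (assumption: the checkpoint is a standard Keras h5 model file)
model = load_model(pretrained_path, compile=False)
model.summary()  # confirm the expected input size (pre/post frame stack)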
Run module
python -m ophys_etl.modules.denoising.fine_tuning --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.denoising.fine_tuning --help
Example Input JSON
{
"output_full_args": true,
"test_generator_params": {
"pre_post_omission": 0,
"cache_data": false,
"batch_size": 5,
"name": "MovieJSONGenerator",
"total_samples": -1,
"post_frame": 30,
"end_frame": -1,
"randomize": true,
"pre_frame": 30,
"start_frame": 0,
"gpu_cache_full": false,
"movie_statistics_nbframes": -100,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/val.json",
"normalize_cache": true,
"seed": 1234,
"steps_per_epoch": -1
},
"input_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_input.json",
"run_uid": "1307046775",
"log_level": "INFO",
"finetuning_params": {
"model_string": "",
"cache_data": true,
"multi_gpus": false,
"nb_workers": 1,
"output_dir": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878",
"model_source": {
"local_path": "/allen/programs/mindscope/workgroups/surround/denoising_labeling_2022/ensemble_output/ensemble_ssf_model_even_smaller_validation_mean_squared_error-0120-0.9622.h5"
},
"name": "transfer_trainer",
"caching_validation": false,
"use_multiprocessing": false,
"steps_per_epoch": 20,
"loss": "mean_squared_error",
"measure_baseline_loss": false,
"nb_times_through_data": 1,
"learning_rate": 0.0001,
"verbose": 2,
"period_save": 1,
"apply_learning_decay": false,
"epochs_drop": 5
},
"generator_params": {
"pre_post_omission": 0,
"cache_data": false,
"batch_size": 5,
"name": "MovieJSONGenerator",
"total_samples": -1,
"post_frame": 30,
"end_frame": -1,
"randomize": true,
"pre_frame": 30,
"start_frame": 0,
"gpu_cache_full": false,
"movie_statistics_nbframes": -100,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/train.json",
"normalize_cache": true,
"seed": 1234,
"steps_per_epoch": 20
},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/DENOISING_FINETUNING_output.json"
}
Module outputs
- fine-tuned model h5 with the best validation loss
- fine-tuned model h5 files for epochs where the validation loss improved
- epoch vs. train/val loss plot
Run module
python -m ophys_etl.modules.denoising.inference --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.denoising.inference --help
Example Input JSON
{
"generator_params": {
"batch_size": 5,
"name": "InferenceOphysGenerator",
"start_frame": 0,
"cache_data": true,
"normalize_cache": false,
"gpu_cache_full": false,
"data_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/MOTION_CORRECTION/2023-10-28_08-10-39-046840/1307046775_suite2p_motion_output.h5",
"seed": 1234
},
"inference_params": {
"model_source": {
"local_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_FINETUNING/2023-10-28_09-10-20-599878/1307046775_mean_squared_error_transfer_model.h5"
},
"rescale": true,
"save_raw": false,
"output_padding": true,
"output_file": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"steps_per_epoch": 0
},
"run_uid": "1307046775"
}
Module outputs
- denoised video h5
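A quick sanity check on the denoised output is to open the h5 and compare a frame against the motion-corrected input. A minimal sketch, assuming both files store the movie under a "data" dataset (placeholder paths):

import h5py
import numpy as np

# Placeholder paths; see the motion correction and denoising inference outputs above.
raw_path = "/path/to/1307046775_suite2p_motion_output.h5"
denoised_path = "/path/to/1307046775_denoised_video.h5"

frame_idx = 1000
with h5py.File(raw_path, "r") as raw, h5py.File(denoised_path, "r") as den:
    # Assumption: both movies are stored under "data" as (frames, rows, cols)
    raw_frame = raw["data"][frame_idx]
    den_frame = den["data"][frame_idx]
print("raw frame std:", np.std(raw_frame), "denoised frame std:", np.std(den_frame))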
We forked the main repository to make several improvements to deepinterpolation with regard to logging and optimizations.
Change log
- Performance optimizations
- Cache input video with MovieJSONGenerator
- The data cache for the input video is shared between the train and test generator objects instead of caching twice
- 2D slicing with MovieJSONGenerator when generating a batch input instead of running through an inefficient for loop
- Separate logging of train and validation by Keras. This helps determine performance bottlenecks during training
- Bugfix with marshmallow schema validator
- Unpin argschema in requirements
Segmentation is performed with suite2p's sparsery module
The threshold_scaling parameter was optimized ad hoc for the SSF datasets by lowering the threshold so that recall was maximized, but not so far as to produce excessive false positives. This value may need to be re-optimized ad hoc for new datasets (such as mFISH learning).
The defaults for all other parameters were used.
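For reference, a minimal sketch of how threshold_scaling maps onto a direct suite2p run; the value shown is illustrative, not the tuned SSF value, and the production pipeline sets it through the segmentation module input json:

import suite2p

ops = suite2p.default_ops()
ops["sparse_mode"] = True          # sparsery-based detection
ops["threshold_scaling"] = 0.75    # illustrative value; lower -> more candidate ROIs (higher recall)
db = {
    "h5py": "/path/to/denoised_video.h5",  # placeholder path
    "h5py_key": "data",
    "data_path": [],
}
# suite2p.run_s2p(ops=ops, db=db)  # uncomment to run segmentation directly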
The resultant ROIs are further postprocessed with the following:
- filter by aspect ratio (postprocess_args.aspect_ratio_threshold)
- binarize masks from suite2p weights if they cross an absolute threshold
  - postprocess_args.abs_threshold (optional); defaults to using the quantile defined by binary_quantile
  - postprocess_args.binary_quantile (optional, default=0.1)
- reduce pixelation by performing binary closing followed by binary opening (see the sketch after this list)
- format suite2p output to LIMS schema
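A minimal sketch of the pixelation-reduction step using scipy.ndimage on a toy boolean mask:

import numpy as np
from scipy.ndimage import binary_closing, binary_opening

mask = np.zeros((16, 16), dtype=bool)
mask[4:10, 4:10] = True
mask[6, 6] = False    # small hole inside the ROI
mask[12, 12] = True   # isolated stray pixel

# closing fills the small hole, opening removes the stray pixel
cleaned = binary_opening(binary_closing(mask))
print(int(cleaned.sum()), "pixels remain after closing + opening")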
Run module
python -m ophys_etl.modules.segment_postprocess --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.segment_postprocess --help
Example Input JSON
{
"suite2p_args": {
"h5py": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"movie_frame_rate_hz": 9.48
},
"postprocess_args": {},
"output_json": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/SEGMENTATION/2023-10-28_13-10-13-009716/SEGMENTATION_output.json"
}
Module outputs
- ROI json file with a list of ROIs stored with the following schema (an illustrative record follows this list):
  - x: leftmost coordinate of bounding box
  - y: topmost coordinate of bounding box
  - width: width of bounding box
  - height: height of bounding box
  - mask_matrix: 2D boolean array of cropped ROI
  - valid_roi: boolean value (this may change in further downstream steps)
  - exclusion_labels: list of reasons the ROI may be invalidated (examples include, but are not limited to, intersects motion border, empty neuropil mask, is a decrosstalk ghost)
  - mask_image_plane
  - id: unique identifier for each ROI
  - max_correction_up: maximum upward shift of motion correction
  - max_correction_down: maximum downward shift of motion correction
  - max_correction_left: maximum leftward shift of motion correction
  - max_correction_right: maximum rightward shift of motion correction
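An illustrative ROI record using the field names above (all values are made up):

example_roi = {
    "id": 1307046775001,          # unique identifier
    "x": 112,
    "y": 58,
    "width": 14,
    "height": 12,
    "mask_matrix": [[False, True, True],
                    [True, True, True]],   # cropped boolean mask (truncated for brevity)
    "valid_roi": True,
    "exclusion_labels": [],
    "mask_image_plane": 0,
    "max_correction_up": 5.0,
    "max_correction_down": 4.0,
    "max_correction_left": 3.0,
    "max_correction_right": 6.0,
}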
This step filters out false positives from the suite2p segmentation step. Since suite2p's segmentation algorithm is tuned for high recall, it produces an abundance of false positives.
(Figure: example original suite2p predictions (left) and classified predictions (right) for a selected set of regions in a FOV)
The classifier uses an ImageNet CNN to classify ROIs as cell or not cell.
The input to the CNN is a 128x128x3 image representing each ROI cropped in three representations: correlation projection, max projection, and mask. These are represented by the channels dimension of the 2D CNN, and are generated with two modules, generate_correlation_projection_graph and generate_thumbnails.
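A minimal sketch of how the three representations could be stacked into the 128x128x3 classifier input; the thumbnail file names here are hypothetical:

import numpy as np
from PIL import Image

def load_gray(path):
    """Load a thumbnail png as a 128x128 float array."""
    return np.array(Image.open(path).convert("L"), dtype=np.float32)

# Hypothetical per-ROI thumbnail file names
corr = load_gray("roi_0001_correlation_projection.png")
maxp = load_gray("roi_0001_max_projection.png")
mask = load_gray("roi_0001_mask.png")

cnn_input = np.stack([corr, maxp, mask], axis=-1)  # shape (128, 128, 3): channels = representations
print(cnn_input.shape)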
Training data is generated by randomly sampling ophys experiments to get representative FOVs. Each FOV is cropped in select regions. The ROIs from suite2p are labeled by human labelers as cell or not cell.
Run module
python -m ophys_etl.modules.segmentation.calculate_edges --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.segmentation.calculate_edges --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}
Module output
- correlation projection graph pkl file
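A minimal sketch for inspecting the output, assuming the pkl is a pickled networkx graph whose edges carry the attribute named in the input json (filtered_hnc_Gaussian):

import pickle

with open("1307046775_correlation_graph.pkl", "rb") as f:
    graph = pickle.load(f)   # expected to be a networkx graph over pixel coordinates

print(graph.number_of_nodes(), "nodes,", graph.number_of_edges(), "edges")
u, v, attrs = next(iter(graph.edges(data=True)))
print("example edge:", u, "-", v, "weight:", attrs.get("filtered_hnc_Gaussian"))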
Run module
python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --input_json <path_to_input_json>
View module input schema
python -m ophys_etl.modules.roi_cell_classifier.compute_classifier_artifacts --help
Example Input JSON
{
"video_path": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/DENOISING_INFERENCE/2023-10-28_11-10-16-834331/1307046775_denoised_video.h5",
"graph_output": "/allen/programs/mindscope/production/informatics/ophys_processing/specimen_1286077606/session_1306855381/experiment_1307046775/ROI_CLASSIFICATION_GENERATE_CORRELATION_PROJECTION_GRAPH/2023-10-28_16-10-13-144253/1307046775_correlation_graph.pkl",
"attribute_name": "filtered_hnc_Gaussian",
"neighborhood_radius": 7,
"n_parallel_workers": 32
}
Module output
- png images of 128x128 cropped thumbnails around each ROI with three representations
- correlation projection
- max projection
- suite2p segmentation mask
https://github.com/AllenInstitute/cell_labeling_app
Backend
The backend uses Flask. The primary endpoints are all in src/server/cell_labeling_app/endpoints/endpoints.py. The database schemas are in src/server/cell_labeling_app/database/schemas.py.
The database currently lives at cell-labeling-app-db-instance-1.cdxknown3r0n.us-west-2.rds.amazonaws.com (credentials can be obtained through AWS RDS).
The steps for creating a labeling job are as follows:
- Generate inputs to the app for labeling. The script to do that is ophys_etl.modules.roi_cell_classifier.compute_labeler_artifacts. It takes in several inputs required for labeling and puts them in a single H5 file; the video is downsampled to load quicker. Artifacts are at s3://prod.cell-labeling-app.alleninstitute.org/learning_mfish/, and have been pulled down to the EC2 instance in an EBS volume.
- Populate a labeling job in the database using cell_labeling_app.database.populate_labeling_job.
- Register users if needed with scripts/register_users.py.
- Launch the app using cell_labeling_app.main. This will launch a user-configured number of workers for the web server (uses gunicorn).
Frontend
The frontend code is in the client dir and the main app code is in app.js.
A currently running app can be found at http://ec2-34-211-120-165.us-west-2.compute.amazonaws.com/
Training is performed with DeepCell: https://github.com/AllenInstitute/DeepCell
An example notebook to train/evaluate the model is available here.
Labels can be obtained via deepcell.cli.modules.create_dataset.construct_dataset, where the cell labeling app host is ec2-34-211-120-165.us-west-2.compute.amazonaws.com.
Run module
The preferred way to train in production is the cloud API, since it efficiently trains folds in parallel and logs everything to SageMaker as well as MLFlow.
python -m deepcell.cli.modules.cloud.train --input_json <path_to_input_json>
This will run training using SageMaker. It also makes use of MLFlow for logging; the URL for that can be provided on request. It performs k-fold CV and launches a job per fold in parallel. All inputs/outputs are logged to SageMaker and stored on S3.
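For intuition only, a sketch of how a 5-fold split (n_folds=5 in the input json below) partitions the labeled ROIs; this is not the cloud trainer's internal code, which launches one SageMaker job per fold:

import numpy as np
from sklearn.model_selection import KFold

roi_ids = np.arange(100)  # stand-in for labeled ROI ids
for fold, (train_idx, val_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(roi_ids)):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val")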
View module input schema
python -m deepcell.cli.modules.cloud.train --help
Example Input JSON
{
"log_level": "INFO",
"s3_params": {
"bucket_name": "dev.deepcell.alleninstitute.org"
},
"train_params": {
"model_inputs_path": "/home/ec2-user/SageMaker/train_model_inputs_103023_only_learning_mfish.json",
"model_load_path": "/home/ec2-user/SageMaker/ssf_baseline_checkpoints",
"model_params": {
"truncate_to_layer": 22,
"freeze_to_layer": 8
},
"optimization_params": {
"learning_rate": 1e-4
},
"n_folds": 5,
"tracking_params": {
"mlflow_server_uri": "http://mlflo-mlflo-16mqjx084gpy-1597208273.us-west-2.elb.amazonaws.com/",
"mlflow_experiment_name": "learning_mfish_cell_classifier"
}
},
"instance_type": "ml.p3.2xlarge"
}
Module output
- Model weights stored in S3 and tracked by SageMaker
- Model performance metrics tracked by MLFlow
To access, find the training job in SageMaker, where the input data, model weights, and logs are recorded; performance metrics are tracked in MLFlow.
Run module
python -m deepcell.cli.modules.inference --input_json <path_to_input_json>
View module input schema
python -m deepcell.cli.modules.inference --help
Example Input JSON
{
"model_inputs_paths": <path to model inputs>,
"model_params": {
"use_pretrained_model": true,
"model_architecture": "vgg11_bn",
"truncate_to_layer": 22
},
"model_load_path": <path to model checkpoints>,
"save_path": <where to write outputs>
"mode": "production",
"experiment_id": "1234",
"classification_threshold": 0.5
}
Module output
- P(cell) csv with the following schema:
  - roi-id
  - experiment_id
  - y_score: probability of cell assigned by the classifier
  - y_pred: boolean classification (true where y_score > classification_threshold)
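A minimal sketch of consuming the output csv, using the columns listed above (file name is hypothetical):

import pandas as pd

preds = pd.read_csv("1234_inference.csv")   # hypothetical output file name
# y_pred is True where y_score exceeds the classification_threshold from the input json
cells = preds[preds["y_pred"]]
print(len(cells), "of", len(preds), "ROIs classified as cells")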
Code for matching cells across experiments in a container is found in a separate repo: https://github.com/AllenInstitute/ophys_nway_matching
A summary of the algorithm and a breakdown of the input JSON for the module can be found here: https://github.com/AllenInstitute/ophys_nway_matching/wiki/Schema-overview.