The scripts are divided into two modules: one in Python and one in R.

The Python script `analyze_fcs_flowcal.py` and the Jupyter notebook `scripts_archive/flowcal_pipeline_report.py/ipynb` are wrappers for (semi-)automated processing of flow cytometry data in a standard workflow, using the FlowCal submodule from the Tabor lab: https://github.com/taborlab/FlowCal/

Castillo-Hair, Sebastian M., et al. "FlowCal: a user-friendly, open source software tool for automatically converting flow cytometry data from arbitrary to calibrated units." ACS Synthetic Biology 5.7 (2016): 774-780.

Briefly, this is what the Python wrapper does:
- Opens all `.fcs` files within the directory and identifies the file for the calibration beads
- Prepares the beads data for calibration into molecules of equivalent fluorescein (MEFL)
- Uses FlowCal functions to clean up the data and convert arbitrary fluorescence units into MEFL. The cleanup includes:
  - Gating out saturated events (low and high end)
  - Density gating for cells, to remove debris. Retains 50% of the events, taken from the highest-density region. This parameter can be changed by the user; it is worth testing fractions of 0.3, 0.5 and 0.8 before running all the data
  - Retaining the singlet population: the top 90% of the FSC-A vs FSC-H plot. This excludes any clumps of cells
- Saves the plots showing the cleanup steps, for all samples or 5 random ones, as `.html` from the Jupyter notebook
- Outputs summary statistics (mean, median, mode) to a `.csv` file
- Saves the cleaned-up `.fcs` files to the `processed_data` directory. These can be analyzed by any tool of the user's choice

To interact with each of these steps individually and test different parameters, such as the fraction retained for density gating, use the Jupyter notebook `scripts_archive/adhoc_flowcal_analysis.py/ipynb`
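
To get a feel for what the density gating fraction does before trying 0.3, 0.5 and 0.8 on real data, here is a minimal sketch of the idea. It is not FlowCal's implementation (which, as far as I know, also smooths the 2D histogram); it only bins events on a grid and keeps the densest bins until the requested fraction is reached. The function name `density_gate`, the `bins` parameter and the synthetic events are all assumptions made for illustration.

```python
import random
from collections import Counter


def density_gate(events, gate_fraction=0.5, bins=20):
    """Keep at least `gate_fraction` of the events, taken from the densest 2D bins.

    `events` is a list of (x, y) pairs, e.g. forward and side scatter.
    Simplified illustration only: the histogram is not smoothed.
    """
    xs = [x for x, _ in events]
    ys = [y for _, y in events]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a width of 1.0 if all values on an axis are identical
    x_step = (max(xs) - x_min) / bins or 1.0
    y_step = (max(ys) - y_min) / bins or 1.0

    def bin_of(event):
        x, y = event
        # Clamp the largest value into the last bin
        return (min(int((x - x_min) / x_step), bins - 1),
                min(int((y - y_min) / y_step), bins - 1))

    counts = Counter(bin_of(e) for e in events)
    target = gate_fraction * len(events)
    kept_bins, accumulated = set(), 0
    # Walk through bins from most to least populated until the target is met
    for bin_id, n in counts.most_common():
        if accumulated >= target:
            break
        kept_bins.add(bin_id)
        accumulated += n
    return [e for e in events if bin_of(e) in kept_bins]


# Synthetic data: a tight cluster of "cells" plus scattered "debris"
rng = random.Random(0)
cells = [(rng.gauss(100, 5), rng.gauss(50, 5)) for _ in range(900)]
debris = [(rng.uniform(0, 200), rng.uniform(0, 200)) for _ in range(100)]
events = cells + debris

for fraction in (0.3, 0.5, 0.8):
    kept = density_gate(events, gate_fraction=fraction)
    print(f"fraction={fraction}: kept {len(kept)} of {len(events)} events")
```

Because whole bins are kept, the number of retained events can overshoot the requested fraction; a lower fraction is stricter about debris but discards more real cells.
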
Note: I save the data from FlowCal for later analysis in R; users can use any other tool they wish. I made this choice because I wasn't satisfied with the analysis and plotting capabilities provided by FlowCal, and I prefer ggplot to Python's plots. R also has a very good general-purpose flow cytometry ecosystem, with many packages built upon the `flowCore` package; these work on `.fcs` files without keeping them in RAM!

The R section is not fully automated yet, but it should work pretty well once you get the hang of the R commands in an hour or two. Do reach out to me through the issues section on GitHub if you have questions.
- The R section of the pipeline uses the processed data saved by FlowCal. _If you wish, you can skip the cleanup in Python and look at the raw data with the same R scripts as well._
- It attaches the sample names to wells from a 96-well layout in a Google Sheet or `.csv` file.
- After this, R provides commands for gating based on a single representative `.fcs` file, and broadcasts the gate to all other data, using the `openCyto` package.
  - Currently I use the function `openCyto::mindensity(..)`, which draws a gate threshold at the minimum density region in 1D, so it is applicable when the sample has a bimodal distribution with two populations
  - Look at the documentation on `openCyto`'s autogating for other gating schemes in 1D and 2D, and at `flowCore` for `rectangleGate()` and `quadGate()`
- Calculates population statistics for all the data using the `flowWorkspace` package and saves them to a `.csv` file
- Plots distributions of the data as highly customizable ggplots, both with and without gating. The plots can be made with one line of code using the powerful `ggcyto` package. Note: replicate wells with the same name are merged. Example figure with lots of customizations (no gating here): ![[FACS_analysis/plots/S043_28-3-22-processed-ridge density-processed-red.png]]
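
For readers unfamiliar with minimum-density gating, the idea behind a 1D threshold on a bimodal sample can be sketched in a few lines. This is a conceptual illustration in Python, not the `openCyto` implementation: it estimates the density with a Gaussian kernel, finds the two tallest peaks and places the threshold at the lowest point between them. The function name `min_density_threshold`, the bandwidth rule and the synthetic values are assumptions made for this example.

```python
import math
import random
import statistics


def min_density_threshold(values, grid_points=200):
    """Return the position of the lowest-density point between the two main peaks."""
    n = len(values)
    # Rule-of-thumb bandwidth for a Gaussian kernel (an assumption of this sketch)
    bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    # Unnormalized kernel density estimate; only its shape matters here
    density = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    # Interior grid points that are higher than their left neighbour
    # and not lower than their right neighbour
    peaks = [i for i in range(1, grid_points - 1)
             if density[i - 1] < density[i] >= density[i + 1]]
    if len(peaks) < 2:
        raise ValueError("fewer than two peaks found; a minimum-density gate does not apply")
    # Take the two tallest peaks and search for the valley between them
    left, right = sorted(sorted(peaks, key=lambda i: density[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: density[i])
    return grid[valley]


# Synthetic log10 fluorescence: a "negative" and a "positive" population
rng = random.Random(1)
values = ([rng.gauss(2.0, 0.3) for _ in range(500)]
          + [rng.gauss(4.0, 0.3) for _ in range(500)])

threshold = min_density_threshold(values)
fraction_positive = sum(v > threshold for v in values) / len(values)
print(f"threshold = {threshold:.2f}, fraction above gate = {fraction_positive:.2f}")
```

When the sample is not clearly bimodal, a threshold like this is poorly defined, which is when the other gating schemes mentioned in the list become relevant.
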
- Set up git on your computer if you haven't already - git helper
- Please clone this R-Python hybrid code onto your computer with the command `git clone https://github.com/ppreshant/flow_cytometry.git`, or the `ssh` version `git clone git@github.com:ppreshant/flow_cytometry.git` (which is more secure and takes a couple of minutes extra to set up, but I would recommend it - here's some help).
  - The same folder will hold your flow cytometry data and the outputs, so it can get large. Choose the folder location accordingly.
- The first time, run these steps in R to install all the required packages
  - `install.packages('tidyverse')`; and do the same for
    - reticulate
    - BiocManager
  - Use BiocManager to install the Bioconductor packages - `BiocManager::install("flowCore")`; and others
    - ggcyto
    - openCyto
- Use conda to set up the Python requirements: mostly the standard `pandas`, `matplotlib`, `numpy`, etc.
  - Install miniconda: a minimal version of the package and environment manager `conda`. Use the instructions from the documentation page
  - Use the command `conda env create -f flowcal_wrappers_environment.yaml`. This will create an environment named `flowcal` and install all the Python dependencies listed in the file into it
- Put your data into the `flowcyt_data` directory.
- Update the user input files for both Python and R:
  - `./0.5-user_inputs.R`: for the R steps
    - base_directory <- 'flowcyt_data' or 'processed_data'
    - folder_name <- '..' : the folder your individual `.fcs` files are in, within the base_directory
    - file.name_input <- '..' : use this option if you have a single `.fcs` file holding multiple data (such as from Guava machines). _After unpacking these data, you will use the same name for the `folder_name` option_
    - template_source <- 'googlesheet' # use 'googlesheet' or 'excel', depending on where you provide the plate layout to name the wells
  - `scripts_general_fns/g10_user_config.py`: for the Python steps
    - fcs_experiment_folder = '..' : the folder your individual `.fcs` files are in, within the base_directory
    - density_gating_fraction = .5 ; might need to adjust
- Put sample names into the Excel file `flowcyt_data/plate_layoyts.xlsx` or a Google Sheet. Each well with a sample will have the format `plasmid1_positive`. The value after the '_' is the `sample_category`, used to colour plots; the value before it is the `assay_variable`, which will be on the x/y-axis of the plots.
  - The `excel` option is easier, but if you would prefer to use the `googlesheet` option for naming the samples, duplicate the `Flow cytometry layouts` tab from this sheet into your own Google Sheet, and put its URL in `0-general_functions_fcs.R/sheeturls` for the `plate_layouts_pk` option.
- If you have a single `.fcs` file with multiple data and you want to run the FlowCal workflow, run the `# prelims` and `# load data` sections in the code `analyze_fcs.R`. This will unpack each individual well into a separate `.fcs` file in a folder. For subsequent steps, change the `folder_name` option to the name of the new folder and set `file.name_input` to be empty: `''`. Now you can go ahead with the Python module and come back to the R module.
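
The `plasmid1_positive` naming convention for wells, described above, can be checked before running the pipeline. The sketch below is a standalone illustration, not code from this repository: it splits a well name at the first underscore into `assay_variable` and `sample_category`. How the pipeline treats names with more than one underscore is not stated here, so splitting at the first one is an assumption, as are the helper name `parse_sample_name` and every example well name other than `plasmid1_positive`.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    assay_variable: str    # shown on the x/y-axis of the plots
    sample_category: str   # used to colour the plots


def parse_sample_name(name: str) -> SampleLabel:
    """Split a well name such as 'plasmid1_positive' at the first underscore."""
    assay_variable, separator, sample_category = name.strip().partition("_")
    if not separator or not assay_variable or not sample_category:
        raise ValueError(
            f"{name!r} does not look like '<assay_variable>_<sample_category>'"
        )
    return SampleLabel(assay_variable, sample_category)


# Hypothetical fragment of a plate layout: well -> sample name
layout = {
    "A1": "plasmid1_positive",
    "A2": "plasmid1_negative",
    "B1": "plasmid2_positive",
}

for well, name in layout.items():
    label = parse_sample_name(name)
    print(well, label.assay_variable, label.sample_category)
```

Running a check like this over the layout catches wells with a missing underscore before they turn into confusing plot labels.
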
- Open a suitable terminal that works for `conda` and activate the `flowcal` environment that you created above with `conda activate flowcal`
- Launch your favorite IDE to access Python. `jupyter-lab` should be installed in this environment, so type its name in the same terminal and a browser window will open
- Follow the instructions in [[#Data, and config]] above and add your directory name etc. to the config file `scripts_general_fns/g10_user_config.py`
- Open the Jupyter notebook `flowcal_pipeline_report.ipynb` and execute the two cells; your data should be ready in about 3 min! .. to be elaborated
- Ensure that the data is in the folder and that the config file specific to `R`, `./0.5-user_inputs.R`, is updated
- Run `source('./analyze_fcs.R')` to load the data into R
- Run `7-exploratory_data_view.R` to save an overview of all data
- Run `11-manual_gating_workflow.R` for gating and saving counts of populations above the gated thresholds
If you have any questions about running this, do contact me by creating an issue here.

wrappers for automated processing and plotting of bacterial flow cytometry data
Copyright (C) 2023 Prashant Kalvapalle
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.