notes on flow cytometry Prashant Kalvapalle started - 5/March/2022 tags: #notes
Read papers that does bacterial flow cytometry to see what kind of analysis they do : Read a bunch of Tabor papers : Castillo hair etc.
- Sexton et al, MSB, 2020 "Multiplexing cell‐cell communication." Molecular systems biology 16.7 (2020): e9618.
- 60% SSC threshold ; 20,000–30,000 E. coli-sized events were recorded; 85% events density gated
- Sebastian, 2019 Schmidl, Sebastian R., et al. "Rewiring bacterial two-component systems by modular DNA-binding domain swapping." Nature Chemical Biology 15.7 (2019): 690-698.
- 35% SSC threshold ; 20,000 events ; 30% density retained : Nice image showing gating workflow (S15)
- Combined histograms of 3 replicates after gating
- Tried B. subtilis, S oneidensis; used geometric mean?; combines separate day replicates for histograms; Took avg (arithmetic?) of 3 replicates' geometric mean for
- Klümper, Uli, et al. 2015 "Broad host range plasmids can invade an unexpectedly diverse fraction of a soil bacterial community." The ISME journal 9.4 (2015): 934-945.
- Sorting: 0.1% to 82% enrichment fast sorting; second 100% sorting, 10,000 cells : 2,000 eps speed
Full tutorial (flowcore, ggcyto): https://jchellmuth.com/posts/FACS-with-R/ openCyto automatic gating schemes ggcyto plotting help, adding gates, axis limits
- Loading files
- Attaching names from google sheet :: R done, python can try
- Processing :: FlowCal ahead,
FlopR working but too slow. Need to automate FlowCal, ~~and use beads named for FlopR~~- FlowCal lets you pre-trim data before automatic clustering and MEFLing, FlopR does not currently
- Gating :: Useful for determining % of population that is red etc. R/flopR workflow is more familiar right now
- (verify) Neither have a nice gating heirarchy, but R's flowworkset has good ones for gating and stats
- Plotting distribution : Who has the better violins automated? FlowCal : violins, but without sample name and matplotlib is not compositional like ggplot; flopr -- check
- Retrieving summary stats ::
FlowCal.stats
module Python,flowWorkspace::gs_pop_get_count_fast()
in R
- Run-time
- FlowCal : Run time for single .fcs ~ 9-10 sec with plotting and 1 sec without plotting. using
%timeit -n 1 -r 1 _process_single_fcs(..)
. For full workflow of 4 files + beads calibration = 23.4 sec (without plotting calibration details) - FlopR : ~ 1 min per sample ; maybe turn plotting off? Which step takes the most time?
- FlowCal : Run time for single .fcs ~ 9-10 sec with plotting and 1 sec without plotting. using
- What kind of stats do we want from data?
- (S048 mainly) % and numbers of cells in red high, red low and gr ..
- Other expts : distributions of fcs data plotted
- Can we skip MEFL calibration for certain experiments? - Especially if only population fraction in a gate is desired (S048)? Could stay within R if doing this -
- 10/6/22 : Use adhoc R openCyto, flowWorkspace - for S048 data -- flowcal is discarding useful data. Re-calibrate FlowCal to retain more events (~ >80%) in the future?
- 14/5/22 : Let's focus on trim-mefl processing with FlowCal since it is faster
- 30/4/22 : ~~FlopR, since R is good for the other steps -
Script in rmd
- R script loads google sheet names
- python script for calibration w flow cal
- Pass variable of which well(s) are beads; Save the calibrated data to .fcs again
- Can't proceed to next step .. Figure out a way to beautify the existing plot; Ask John Sexton for custom code
- Re-read processed fcs, attach names and condense replicates
- plot using ggcyto + geom_violin/ggridges
- Check with Harsha: Issues with clogs/air from Vibrio? Running vibrio through the machine seems to clog the SEA Sony in BRC. Even beads have air issues with these sample sets (S0045, S045b) and the data looks like this with SSC around 0 ; but beads run on the same day in S046 are fine..
- How old are the cells? vibrio are known to aggregate- Swetha says ; Especially true if they are in stationary phase right?
![[Pasted image 20220530125021.png]]
Advantage of flowcal
- Nice looking plots to show transformations, ex: gated vs full samples, MEFLing
- Built-in normalization
- Density based gating : ~automatic but needs a user based selection of the % of cells that should be retained
- Check if this can be done in R too?
-
to_mef = FlowCal.mef.get_transform_fxn
ing.15..
: fails when only red channels are present in fluorescence_channels :: uses the mGreenLantern mefl vals for it :( - Code: input() function called from within a module in jupyterlab does not work. See
analyze_fcs_flowcal
/Line 121 for beads with low events - code stops in the pipeline but runs adhoc :
to_mef()
step : on gfp and mcherry2 dataS066x_Ara dose-1
--fluorescence_channels
is empty, not being recognized -- due to PBS - _(fixed now) Looks like singlet gating is using the wrong y axis- should be FSC-H?
-
Investigate the bimodality around 0 for non fluorescent cells. Ex: S055/S063. is this an artifact of the Sony flow cyt software doing background subtraction?
-
Verdict : Don't use MEFLing till I figure out why the beads are giving shitty slope
m
(~ 0.8) even after strong thresholding (low = 5,000 / 10,000) like Sebastian did -
Check on the github issue I made on FlowCal. Sebastian confirms that m needs to be closer to 1. Beads have too much debris (probably data too?). Need to clear more debris with
gate.high_low(beads_sample, channels=['FSC-A'], low=(5000))
- First, make sure that your beads processing is correct.
m=0.68
is highly atypical and suspicious. Ideally you would have it between 0.9 - 1.1, and these should be extremely consistent across your runs (i.e. if you got 0.92 once, you should get +/- 0.1 at most every time in the same instrument) - The other option is to force
m=1
. the beads curve won't fit well in this case.. compare S055 : Raw ![[S055_51 in 8 organisms-raw-ridge density-raw.png|250]] Processed ![[S055_51 in 8 organisms-processed-ridge density-processed.png|250]]
- First, make sure that your beads processing is correct.
-
Take the negative control sample from this dataset and trace the intermediate distributions in the flowCal process - Is this because of subsetting the data or due to MEFL transformation? -- Looks like it is. How is it that S061 sensory only doesn't have the bimodal
-
trying on negative control of S063c/D01 (SS marine) : Raw ![[S063c_Ds_empty-raw.png|250]] Processed : ![[S063c_Ds_empty-processed.png|250]] MEFLing : ![[S063c_Ds_empty-MEFLing.png|500]] ; Histogram at 200 bins : ![[S063c_Ds_empty-processed_200bins.png|250]]
- Follow up to trace steps in the processing to the origination of bimodality ;
- Looks fine after
gate_and_reduce()
: Plotted with thisFlowCal.plot.hist1d(processed_single_fcs.gated_data, channel = fluorescence_channels[0], xlim = (-100, 100), bins = 1000)
- Seeing bimodal around 0 after MEFLing
- Looks fine after
- Follow up to trace steps in the processing to the origination of bimodality ;
-
Effect of the FlowCal processing on a positive sample : Ex: 'S063a_B02_gr_raw..' Raw : ![[S063a_B02_gr_raw.png|250]] Processed : ![[S063a_B02_gr_processed.png|250]]
-
check some sample that is negative with the above method intermediate steps in flowcal. ex: S067b1/B09; which is 143/So - d-1_U which loos clearly bimodal without zoom in ![[S067b1_143_B9_processed_bimodal.png | 200]] Plotting density 2d is more informative than 1d. Here's
FlowCal.plot.density2d(.., mode = 'scatter')
Zoomed in here with smooth and withsmooth = False
![[S067b1_143_B9_raw_density.png | 300]] ![[S067b1_143_B9_processed_density.png| 300]] -
Scatter lot on linearscale (this one should do no approximations, plotting exact data) : Shows that MEFL is producing values away from 0 -- need to inspect the MEFLing vector?
![[S067b1_143_B9_raw_scatter.png|300]] ![[S067b1_143_B9_processed_scatter.png|300]]
-
MEFLing is producing a gap at 0 - check formula if 0 is not an output of MEFL transformation, this could cause the effect. How to get around it? Running
to_mef.fitting()
withfull_output = True
gives these details about the calibration model 'beads_params': array(6.84108707e-01, 4.13784424e+00, 3.43116109e+03),'beads_model_str': 'm*log(fl_rfi) + b = log(fl_mef_auto + fl_mef)'
where m (slope) = 0.684 b (intercept) = 4.137 fl_mef_auto (autofluor.) = 3.43e3 --- this seems too high, is something weird going on? -
Post a reply with the density plot - say that bin edges is not an issue, maybe the model parameters..
Checking the bead calibration of S055_B2 (MG1655 control) which looks clean (not bimodal) in a mGreenLantern plot ; it is still bimodal in mScarlet! fluorphore order : green (good), red (bimodal) ``'beads_params': [array([ 0.89199098, 2.5864598 , 792.15587603]), array([7.93298660e-01, 2.76010590e+00, 1.09402666e+03])],`
the m (slope) close to 1 seems to be less bimodal
The values are not that different but see the scatterplots ![[S055_B2_green_good.png|300]] ![[S055_B2_red_bad.png|300]]
Here's the calibration parameters for green : 0.88250639, 2.46310895, 675.87577154]
the m is closer to 1, but the bimodality is also more subtle here : example data D01 - same as above
![[S063c_Ds_empty-processed_200bins.png|250]]
-
Make a list of samples with the # of events before filtering to check if something is wrong?
-
Practicing the flowcal tutorial
-
Read in a bunch of
.fcs
files from given directory into a vectorizedFlowCal.io.FCSRead
-
(working) Vectorize the flowcal processing script by putting it in a function / or vectorizing each step with list comprehensions
-
(connecting through R better) : Bring in plate layout from the google sheets, Convert plate layout to columns like in R -- using pandas? or a dplyr for python, Attach the names (with some kind of
regex
matching)
- Plot the initial and final event count (x-axis) by well name (short, y-axis) for easy identification of outliers and mis-processed wells. Or do a side by side scatter plot, labelling only the outliers (outside 2 SD..?). requires deep dive into matplotlib :(
- (_~ not urgent_) figure out the best channel to plot during
flowcal
processing - (ignore, can do in R) Add sample names to the plot using
plt.legend(list of names in the same order, loc = 'best')
- Or figure out what variable in the .fcs file is being made the title of the plots?
- Figure out how to compose multiple data into a matplotlib by colour etc. -- Don't know if it will work as good as ggplot; and if FlowCal does it automatically as flowworkspace
- Add sample names to the median violin plots? docs matplotlib
-
Is pData saved with
write.FCS()
? _NO; needs to me remade -
(mutliple blocks analysis) How to load multiple folders, join metadata and analyze together? Useful when samples are split among multiple blocks, ex: S062, S063 or multi-day S050 like experiments. join file loading and pData replacement into a function and run map2 with plate number/day do add a feature. How to cat multiple cytosets? try
rbind2 and flowset_to_cytoset()
-
(Guava data) To expand single .fcs file into multiple .fcs : use
subprocess.run
module to open an R function use case; documentation -
(not doing this) Getting plate layout google sheet : Can use the same approach to call the existing R function to do this for us
-
(doing this in R, automatic) Merge data for replicates : as simple as ndarray.concatenate? do before plotting violins/ distributions
All fluorescence histograms shown are composed of three histograms taken from samples on three separate days and combined into 128 logarithmically spaced bins between 101–106 MEFL units (Supplementary Fig. 15c) unless otherwise stated. Source: Schmidl, Sebastian R., et al. "Rewiring bacterial two-component systems by modular DNA-binding domain swapping." Nature Chemical Biology 15.7 (2019): 690-698.
-
Saving output .fcs: Idea - from
FCSData/numpy ndarray
use fcswrite to convert to .fcs 3.0 file. Can write any numpy array. For metadata etc, can convert to the data format used by fcsparser or flowcytometry tools which are alluded to in fcswrite documentation.- solved : ~~
Error:
ValueError -If odd number of keys + values detected (indicating an unpaired key or value).
due to duplicates in HEADER and TEXT
- solved : ~~
- Could have a user input if beads data gating looks acceptable before proceeding to MEFLing
- Make a variable for density gating percentage,
- (fancy) get the density gating percentage from user input after showing a plot of 50%, interactive analysis.?
- Show the parameters of fitting in the output ; + have user review before proceeding with MEFLing?
- How do we get volume information to get cell density data (Cells/ul) from the .fcs file?
- Is the flow rate recorded in the .FCS file so we can use the time units (assume seconds?) to do:
$\Large \frac{/sec}{flow rate = (ul/ sec)}$
- Is the flow rate recorded in the .FCS file so we can use the time units (assume seconds?) to do:
- Write an output log file as
.txt
in the output folder - include the initial event count, density fraction gated, final event fraction, calibrated or not, what beads file was used - Add initial event count to the summary data from python workflow (Hint: Line 209 in
analyze_fcs.py
) - (Added to output .csv data) Show the number of (final gated?) events in all files loaded
- (will do graph instead) Look for anomalies in event count with any automated metric and add a warning.
- (too much information, ignore for now) Implement a QC data output by counting events in each gating step (for every .fcs) and save as csv file -- and/Or make a plot to help look for anomalies?
- convert to Quarto document with jupytext : Advantages
- Make the saved html file after the folder name
- Run through R studio and integrate with the later plotting code..?
- Also convert to pluto and see if we can slowly inject julia for practice?
- Add a separator between output of individual well plots '----' and print the well name maybe?
- Separate the beads from the data files
- split into functions that can be run independantly too --
- option to Save plots along the gating procedure for the first 3 (or 3 random)
.fcs
files? better than showing for all right? - Would it be worthwhile to save all plots along the gating process for each file in the dataset for manual QC purposes -- a RMD style html output would be good
- How do we dump the plots from each .fcs file from within the loop into an RMD output? using jupyter-notebook
- (can't use since subplots are already made by
FlowCal.plots
) Can squeeze into a pdf by doing subplots - source - Would a pluto notebook (or jupyter notebook) work for calling the python script and collecting all the plots generated?
Literature
- read the flowcal introduction paper to understand the data storage format and theme etc.
- wondering how extendable the formats are compared to the R/bioconductor ones that are building on the original FlowCore so more future proof?
Data display
Nice plot from tabor lab, 4c : Schmidl, Sebastian R., et al. "Rewiring bacterial two-component systems by modular DNA-binding domain swapping." Nature Chemical Biology 15.7 (2019): 690-698.
![[Pasted image 20220526020446.png]]
- flowCore : core data format (flowFrame etc.)
- flowWorkspace : gating and operations on facs :
samples, groups, transformations, compensation matrices, gates, and population statistics in the gating tree, which is represented as a
GatingSet
object inR
. - Also includes the more efficient formats :cytosets
for holding .fcs data
- openCyto - Automated gating functions and workflows for serial gating
- ggcyto - plotting with ggplot semantics
(note) : When installing these packages, specify the location of install for BiocManager with the command : BiocManager::install('ggcyto', lib = .libPaths()[1])
; 1 being the path in the local "My Documents" folder instead of program files which needs write permissions.
Need to understand the data format a bit flowframe/cytoframe = single files ; set = set of files. cyto - stores data in C for efficiency
-
The cytoset for
S032
dataset has13
observables, and26
cells : I assume each observable before and after gain adjustment/filtering and such? -
What is TIME in
fl.set %>% colnames()
? Time is recorded as each event passes through.. Since the first 4 in S032 have fewer, it was because I terminated the PBS samples as they were taking too long.. -
Need to figure out how to get the
wellid
from .fcs file headers. used it to name the individual files when saving them.- Could record these
wellid
s and put them into a column that I could use to merge metadata
- Could record these
-
Learn how to explore a
flowworkspace::cytoset
-
Make exploratory_plots.rmd run a loop and plot each
colname
modality of data as scatter and histogram -- the file will be pretty bulky already :( -
How to incorporate sample names into the cytoset (would be great to show up on the automatic plots)
- Looks like you can make a data.frame of the annotations and use
pData(gating_set) <- annotations.data.frame
(Cytometry on air, Ryan Duggan, youtube, 2014) - Or
sampleNames(cytoset) <- new_values
as a vector cytoset documentation - How to ensure that the order of samples is matched by the replacement file name? I guess the data.frame workflow might be easier, make a new column called well
- Looks like you can make a data.frame of the annotations and use
- Move the multifile .fcs to
Archive
when expanded into a folder already - Plan a workflow for analysing a subset of files from multiple sub-directories for S050 - multiday expt)
- Can use simple regex? Gets only wells, not easy to get replicates + induced, uninduced etc. Loading with regex 'A01' - works, and adds numbers to the
sampleNames
in the cytoset to make them unique . Will be hard to match to proper day ; not generalizable.. - Implement a function to load processed data, change filenames to the metadata and save to another folder ; also copy the logfile if present and delete the original folder. Start from
merge_cytosets_and_save.R
script.-
- include a key for the dataset captured using regex such as - letters :
S0xx[:alpha:]
/ day key :d[:digit:]
from the folder name ; with user input, skip this when details are present in the dataset already ; how does this change for outer folder vs S050 which has inner folders -- this is rare/onetime only
- include a key for the dataset captured using regex such as - letters :
- Show the first filename example and ask user to proceed before doing the batch renaming
-
- Load these renamed samples using regex as needed ; set gates with a designated sample and save them somehow
- How to save gates?
- Can use simple regex? Gets only wells, not easy to get replicates + induced, uninduced etc. Loading with regex 'A01' - works, and adds numbers to the
- How to solve the problem of non unique names of wells within different runs inside S050? _This is solved in
combine_data_workflow.R
- (solved, by replacing
pData
andsampleNames
) Attach the names from the template to cytoset?cf_rename_channel(x, old, new)
. there is non uniqueness here too for replicates -- need to merge before attaching names - (done) Another application : to join data from different experiments to plot on the same graph (48 + 51 + 78, 79 etc.). temporary fix would be to just copy the (processed) files/renamed with expt and sample name into a temp folder ; since it is just a few files not too much space consumed // Can copy the logfiles and delete the processed folders -- these can always be remade with FlowCal given the logfile parameters
Directory checking in 1-reading_multidata_fcs
-
Check for empty directory to future proof when a directory exists but has no files in it
-
Change sampleNames of the fcs set when reading files by passing a vector(?) to
name.keyword: An optional character vector that specifies which FCS keyword to use as the sample names. If this is not set, the GUID of the FCS file is used for sampleNames, and if that is not present (or not unique), then the file names are used.
-
Remove biological replicate counts from cytometry template layouts (metadata) - justification Could attach the numbering in R, will save some effort while making template?, Unless you want to mark biological replicates from technical replicates, which will complicate the analysis by making more columns .. etc. Could also think about this when multiple dilutions are read and need to be analyzed?
-
Equalize processing for Guava vs Sony :: use alias feature to harmonize names to green/red or fluorophores..
#manually supply the alias vs channel options mapping as a data.frame
in read.FCS
-
Scaling: Do we need to do any non default transformation (such as Pn6 scaling - power transform for parameters stroed on a log scale) :
- keyword(single_fcs) and element P1D shows "Logarithmic,6,1"
- does that mean the data needs to be log transformed and is this done by the default linearize option when loading the file?
- Ref documentation for
load_cytoframe_from_fcs
-
(related to scaling) Why does R/flowWorkspace's
load_cytoset_from_fcs()
give different values than python/FlowCal's..
. The summary stats differ between the software. The difference in graphs could be in their logicle transformations, but difference in summary stats needs explanation- Start with the
transformation
parameter inload_cytoframe..
:linearize
(default),linearize-with-PnG-scaling
, orscale
.linearize
transformation applies the appropriate power transform to the data.. Also, when the transformation keyword of the FCS header is set to "custom" or "applied", no transformation will be used. FlowCal
:An FCS file normally stores raw numbers as they are are reported by the instrument sensors. These are referred to as “channel numbers”. The FCS file also contains enough information to transform these numbers back to proper fluorescence units, called Relative Fluorescence Intensities (RFI), or more commonly, arbitrary fluorescence units (a.u.). Depending on the instrument used, this conversion sometimes involves a simple scaling factor, but other times requires a non-straigthforward exponential transformation. The latter is our case.
- Look at
.fcs
header : using FlowCal/R for both raw data and processed data. Look for the transformation keyword to figure out if R is doing the right thing by ignoring the linearize transform - I'm passingtrasformation = FALSE
by default. Don't know why I did that. Maybe need to use the ignoredFlowCal.transform.to_rfi()
command and redo all the FlowCal processing?
- Start with the
-
Why are there negative values in the fluorescence? ex: S063_Marine promoters-2-green.png. Do the FSC, SSC also have negatives? Take one sample and test it ; check other expts
- is this an artifact of the
logicle
scaling? : Re-look at the Github issues on FlowCal about negative values - Negative values could result from : baseline subtraction ; or compensation
- FSC and SSC also have negative values : Ex: 'S063_B02_FSC_raw' : ![[S063a_B02_FSC-flowcal.png|350]]
- is this an artifact of the
- How to merge replicates of data before taking
summary()
stats, ~median -
R/flowWorkspace
andpython/FlowCal
's data encoding seems different. Seen in S040 graphs - ridgeline/R vs points/Python- why does the
R/flowWorkspace
's calculation have positive medians vs negative medians forpython/FlowCal
's median calculation but with exact same magnitude. This happens only for a few samples on the lower end -- but needs to be sorted if using data from both sources to label the plot ![[Pasted image 20221006162045.png |1000]]
- why does the
- (use ggcyto for plotting, no need to get raw data out) How to get the raw-data from the cytoset to just plot mean/median similar to how Lauren Gambill's script does with flow..python
- Break the processing modules into functions that can be called interactively ex: while figuring out the correct density fraction etc.
- (ggcyto is already merging them) Merge data from biological replicates (as mentioned in paper) : ideas CytoTree ; post issue in flowWorkspace or flowCore?
CombineFCS; This function can be used to combine 2 FCS files having a set of shared markers and return one FCS file (matrix) with the total number of cells is equal to the summation of cells in both FCS files, with each cell has an extended number of measured markers. CyTOFmerge
- Tried using the flowAI's
flow_auto_qc
but it does not work due to number of cells incompatibility - Try the interactive method
flowiQC
- Go into the functions of the flowAI package and fish out the one that is making the flow rate plot and use it by itself
- Calculate an avg flow rate (either all stuff or only gated cells) once you understand what the $Time variable is storing.. and its units
-
Is there a way to gate a
mindensity
to be in the middle of two samples (one positive and one negative)? Start looking at openCyto and then flowWorkspace documentations -
How to plot only the gated data points?
fs <- getData(tbdata, "CD3")
- ggcyto -
Is there a
flowWorkspace
way to gate (quad gate..?) on a single sample and use the same gating parameters across all samples?- (use biexp):
mindensity
gate value causes convergence problems with logicle transform
- (use biexp):
-
figure out how to add filterID with mindensity
-
RemoveMargins
: Remove margin events in flow cyt data. PeacoQC documentation; github;-
The PeacoQC package provides quality control functions that will check for monotonic increasing channels and that will remove outliers and unstable events introduced due to e.g. clogs, speed changes etc. during the measurement of your sample. It also provides the functionality of visualising the quality control result of only one sample and the visualisation of the results of multiple samples in one experiment.
-
- (was due to error in
geom_hex()
; solved by updating toggplot 3.2.4
) Explain the ugly plots of FSC, SSC in S063 processed and raw data.Could this be related to the bin edge [issue](flowcal issue) that also causes bimodality of negative samples around 0?- Something changed after Oct 6th commit 620f4e5 made a good plot for fluors for S040; aes_string(x = as.name(fluor_chnls[['red']])) format - which also produces a colourkey/legend.
- Bad plot first seen on 31 Jan for S062a?
- Steps to try
- Try reproducing the old S044 plots with current scripts?
- Read documentations and try
geom_hex()
->stat_bin_hex()
alternatives - Replace ggcyto call with ggplot call directly?
- Ex: S063a_B02: ![[S063a_B02_raw.png|200]] vs flowcal ![[S063a_B02_raw-flowcal.png|300]]
Also check S067b1 : plotting with geom_hex()
and geom_point()
: hex is messing up representing the density bigtime - was there any change in the function behavior from the past?
![[S067b1_143_ww-raw-sctr_hex.png | 300]]
![[S067b1_143_ww-raw-sctr_point.png | 300]]
- extra plot : Show the density of highly fluorescent events in FSC-SSC plot : Would be useful to see if any gating/density filtration by FlowCal is distorting the data
- Not straightforward, has wierd errors ~
- (too hard to automate) De-clutter ridges : Remove labels and medians for overlapping sample names only.
- Or (Currently doing with user input) remove all medians and labels when > 2 categories (colours) occur for > 5 sample names for exploratory plotting
- For high density data (> 1 or 2 colours/
sample_category
), need to remove median highlighting / control it with a switch -- good first application when you make the ridge plotting into a function
- Plotting order : Currently using red as the default I assume, but can have a user key for primary fluorophore of interest 'red/2' or 'green/1'? Need to pass it to
.fluor_colour
inarrange_in_order_of_fluorophore()
- Re-arrange the ridgeline plots in descending order of fluorescence. Currently it orders in ascending order so messes up for multiple coloured plots (check S063c)
- Overlay accurate median value labels on a ridgeline/density plot -- difficulty is in calculating this since the replicates are merged while plotting but not before-hand. Currently the labels and values of "mean_medians" are shifted to lower than the lines plotted; visible in
scale_x_log10()
- (archive) figure out the mystery :
scale_x_log10()
is plotting different values compared toscale_x_flowjo_biexp()
. The biexp shows negative values too, and MG1655 is centered around 0 almost which is what we expect from the machine's current calibration. The mean_median labels match the data better in this case too - Note: The
name
column of thepData
appears as facets ; and samples with same name are merged before plotting (verified for histograms) - (fixed using
aes_string(as.name(ch))
) Pass channel names stored in a variable (by reference) to the ggcyto aes call does not work easily -- something about (quasi)quotation? - Error:
xlim(c(-100, 1e3))
not working on ggcyto + geomhex + geomdensity2d plot - changing title of the plots : do it manually for now
- Replicates merge: ggcyto automatically merges replicates!
For manual scripts with possible changes across experiments, should they be in a separate branch for each dataset used vs copying over into different files? until of course, the script is eventually broken into functions that don't change across expts.
11-manual_gating.R -> S048..
- Branch : new universal changes can be ported by merging main into branch (moving the branch along; but manually rejecting experiment specific changes) ; Static branch should be in a working state ; retains history
- Copy : (universal) Changes in everything other than the current file will be retained but could break the code. breaking has not happened so far for qPCR files where I apply this philosophy using fucntions from general code
- branch + copy single file: doing both could enable mergeability, cause confusion too? -- thinking too much.. go do some actual work now
- Log10 transform, trim zero values (-Inf after log transf)
- Density gating
- Singlet gating ; May not be as useful if density gating already makes it clean enough.; beads seem to matter -- compare with FlowCal for S044 beads
- Autofluorescence (geometric mean) subtraction ; straightforward task
- MEFL calculation
- Is too slow. 5 files, with calibration takes >5 mins even failing in the beads step.
- flowClust of the beads is the slowest step, approx 1 min ; ggplot saving is also long -- too many points?
- Put a profiler to figure out where it is being slowed down, and how to improve it..
user | system | elapsed |
---|---|---|
155.77 | 12.20 | 291.44 |
-
Some error when processing beads --
-
Get_calibration not selecting the correct bead population. Our sample has too much junk, but the
process_fcs
that was getting bacteria and then singlets was doing ok -- so maybe short-circuit the program to do this for the beads too..
-
Fixed : issue on github! : showing error in
process_fcs
from theget_bacteria
function.[1] "2 clusters found" [1] "only debris found" 'x' must be an object of class flowframe
- this was due to a hardcoded 1e4 threshold for calling debris. I guess our FSC/SSC voltages are on the lower side, and increases them might separate out debris more easily (especially for vibrio)
-
Fix the hardcoded beads file :
pattern = utils::glob2rx("*beads*.fcs")
. Can make a variable to fill with the appropriate well; with this pattern as the default -
Feed in the calibration data as list of lists, with proper names
-
Channel names : Generalize with an if loop later, Currently hard-coded to Sony
-
Test a small dir with 5 files + beads and autofluorescence subtractor (if even relevant for Sony). _Running into errors :
x is empty.., negative values retained in log-transform
_ -
For now, I'm moving onto using flowcal; since flopR is too slow and failing with the density plot of beads. Decision was initially reconsidered after issue with cells not found was fixed
FlopR looks to be better than flowcal, does doublet discrimination and background subtraction as well (though it is trivial)
FlowAI does some QC truncation of data based on flow rate anomalies (OUP, 2016)
- Is also useful to get the flowrates and volume estimations for our case on the sorter data
- Need this to do CFU or cells/ml volume normalization
table 1 | table 2 |
---|---|
God knows | What does |
Who is | God really? |
Notes to reproduce running python scripts inside Pluto notebooks as a way to capture output plots in a nice html format - 21/5/22
- Load julia with
julia
in the gitbash terminal - Create new julia environment using
]
thenactivate .
- Add packages with
add ..
: AddedPluto
,Plots
,PyCall
- load package with
using PyCall
. following PyCall documentation - Add python of the desired conda environment to path variable
ENV["PYTHON"] = "C:/Users/new/.conda/envs/flowcal/python.exe"
- Load Pkg with
using Pkg
and build PyCall (need to do everytime the path changes or any major changes are made) withPkg.build("PyCall")
Error
from scripts_general_fns.g3_python_utils_facs import *
causesModuleNotFoundError("No module named 'scripts_general_fns'")
- Same error in jupyter-lab
- Looks like you need empty
__init__.py
files within subdirectories where the modules are located (or all dirs starting from the closest path insys.path
)
- Idea: have the base python code as a standalone runnable script. and call the
..py
from jupyter or pluto when you need the plots to be saved.
Goal : Want to transition the jupyter notebook into a quarto (or Rmd) workflow so I can run it from within Rstudio.
adhoc_flowcal_analysis.qmd
works when run in Rstudio chunk by chunk :) Should inspire the full workflow flowcal_html_output.qmd
now!
- current error :
pandoc .. openFile: does not exist (No such file or directory)
git issue
- (convenience) Need to read in the filename directly from R script
-.5-user_inputs.R
instead of the python fileg10.user_config.py
to prevent duplication
- Quarto has a different way of managing attachments. It is misplacing the attachments in two different places : 1. head directory - images ; 2.
scripts_archive/
- libs : contains formatting etc. - Moving them both tohtml_outputs
enables a cleaner.html
output : CheckS071a_d-1,0.html
- [ ] Read on why this is different from.Rmd
and if there are options to make them same - Modules loaded through R (using reculate) don't update when the
.py
file is updated. Requires to restart R - Show
IPython.core.display.Markdown object
in quarto document like jupyter notebook does it
Need to figure out how to use the correct conda env 'flowcal'. is this set by reticulate or by quarto or Rstudio itself?. Works with R's reticulate
library if Sys.setenv(RETICULATE_PYTHON = "C:/Users/new/.conda/envs/flowcal/python.exe")
is used before library(reticulate)
.
- Using python with Rstudio IDE : documentation
- Alternate way : create a quarto project : docs
- (useless) Rstudio - conda help : https://community.rstudio.com/t/using-rstudio-within-a-conda-environment/128780
- Might be better to rewrite -- directory changes and modules loading might differ need to convert onetime only and not link notebooks
- Use
quarto convert
- from jupytext docs below - Check jupytext docs
how to do this? Need to get the quarto
library says quarto docs
From the R console using the quarto R package:
library(quarto)
quarto_render("document.qmd") # defaults to html
quarto_render("document.qmd", output_format = "pdf")
quarto produces some extraneous folder that holds random files, is this going to be overwritten when running multiple datasets or needs to be saved for actual html to be visible?
Seems to take longer but not sure.
The markdown calls from python are displayed as output <IPython.core.display.Markdown object>
. Maybe quarto will do better?
- fixed by resaving data? Noticed that name of fluophores: gfpmut3, mcherry2 names are not showing up in channel names on d0/d1/d8 data. Did we forget to unmix or something else gone wrong?
- Is the fluor data completely missing? Confirmed in the Sony software: Happens when the spectrum does not show up, implying that compensation removes the fluorescenc channels from the data
- Check sizes for comparable number of samples : data for d-1 has gfpmut3 and mcherry2 channels
data | # samples | size | Notes |
---|---|---|---|
S048 | 35 | 164 MB | |
S050.d8 | 78 | 80 MB | definitely missing data :( |
S052 | 33 | 53 MB | mGL, mSc present; Maybe less off target events due to thresholding on SSC instead of FSC? |
- S052 PBS wells were missing the fl channels and preventing making of cytoset.. Could this property have been carried over from the S050 dataset?
> colnames(fl.set) # vector [1] "FSC-A" "FSC-H" "FSC-W" "SSC-A" "SSC-H" "SSC-W" "TIME"
in comparison, S048 gives
[1] "FSC-A" "FSC-H" "FSC-W" "SSC-A" "SSC-H" "SSC-W"
[7] "mGreenLantern cor-A" "mGreenLantern cor-H" "mGreenLantern cor-W" "mScarlet-I-A" "mScarlet-I-H" "mScarlet-I-W"
[13] "TIME"
- S050 issue when combining data workflow : 8 files are missing
- Fixed by loading the proper files now
missing files presumably because they were not in the plate grid ; d-1/E12, G12, H12 - probably PBS and beads?
Use case : Currently doing this only for showing full distribution figs
- (gating) Check for S070/S050 data if with and without MEFLing changes gated fractions significantly (gate above C-)
Memory paper
- S050 : Need for gating (if it makes a difference?) ; (Fig S only) Can use it to show trend of decay in suppl.
- (?) S071 : memory WW visual glance
- (check) S070 Ara dose response
Marine
- S063a, b, c : Redo processing without MEFLing
Here's some notes on analysis used for memory paper
- Fig 2c : S070 - gated fractions
- (with MEFLing)
- gated on sample : B02 ; quantile 99, sets at 647.82
- Fig 3 : S050 - gated fractions
- No MEFLing
- Gating :
- green : 'A07_d-1' ; 99 quantile gate : sets at 172.39
- red : 'A06_d-1' ; 300 manual gate -- since tail is long, eyeballing 300 based on full distribution
- Fig S11? :
S061_WW-AHL production
- No MEFLing
- Gate : C08 ; 99 quantile gate : sets at 2668.32
Processed data are getting too bulky and can be archived for old and published stuff onto box.com
- Eventually will need a plan if these intermediate data need to be kept and where once I leave Rice
- Discuss with Lauren the plan for data archival incl. big data such as flow cyt..
Here's the folders using the most space. Screenshot from WizTree! ![[flowcyt_disk_usage.png]]
**Recombinase memory
- S071_combined and MEFL. Expt: Memory in WW II
- S050 individual, S050_combined, and MEFLs for both. Expt: Long duration memory in E. coli
SS_marine work
- S062_a and b. Marine promoter library I
- S063_a, b and c_2 & combined. iteration II. Re-conjugation