update to docs, examples, and smooth function

IlyaLab · Feb 9, 2024 · cd3b90f
1 parent 1a85476

Showing 11 changed files with 292 additions and 182 deletions.
diff --git a/README.md b/README.md
@@ -9,12 +9,13 @@ This package works with Scanpy AnnData objects stored as h5ad files.
 
   * **Notebook using Decoupler/Omnipath style API ===>>>** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb)
 
+  * **Notebook for smoothing counts (splits the AnnData object into groups)**
+
   * **See the paper ===>>>** [gssnng](https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false)
 
   * and finally, [Read the Docs!](https://gssnng.readthedocs.io/en/latest/)
 
 
-
 The GSSNNG method is based on using the nearest neighbor graph of cells for data smoothing. This essentially creates 
 mini-pseudobulk expression profiles for each cell, which can be scored by using single sample gene set scoring 
 methods often associated with bulk RNA-seq. 
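
The smoothing idea described above can be sketched in plain Python. This is an illustrative sketch only, not gssnng's implementation. It assumes a small dense expression matrix (one list of values per cell) and finds neighbors by brute-force Euclidean distance, whereas gssnng works from the nearest neighbor graph of the AnnData object. The helper name `knn_pseudobulk` is hypothetical.

```python
import math


def knn_pseudobulk(expr, k):
    """Return one smoothed profile per cell: the mean of the cell and its k nearest cells.

    expr: list of per-cell expression vectors (lists of floats, all the same length).
    Neighbors are found by brute-force Euclidean distance, an illustrative
    stand-in for the nearest neighbor graph that gssnng uses.
    """
    n = len(expr)
    smoothed = []
    for i in range(n):
        # Rank every other cell by its distance to cell i and keep the k closest.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(expr[i], expr[j]),
        )
        members = [i] + others[:k]
        # Average each gene over the cell and its neighbors (a mini-pseudobulk profile).
        smoothed.append([
            sum(expr[m][g] for m in members) / len(members)
            for g in range(len(expr[i]))
        ])
    return smoothed


cells = [[0.0, 2.0], [1.0, 2.0], [10.0, 0.0], [11.0, 0.0]]
print(knn_pseudobulk(cells, k=1))
# [[0.5, 2.0], [0.5, 2.0], [10.5, 0.0], [10.5, 0.0]]
```

Each smoothed profile can then be passed to a single-sample scoring method, just as a bulk RNA-seq sample would be.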

diff --git a/docs/decoupler_api.rst b/docs/decoupler_api_doc.rst
@@ -27,6 +27,8 @@ Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq
 
 `**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_
 
+`**Notebook for creating smoothed count matrices** <https://www.google.com>`_
+
 `**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_
 
 This package works with AnnData objects stored as h5ad files. Expression values are taken from adata.X.
@@ -45,92 +47,93 @@ Installation
 
 Install the package using the following commands::
 
-   pip3 install gssnng
-
-
-
-Installation from GitHub
-========================
-
-   git clone https://github.com/IlyaLab/gssnng
-
-   pip install -e gssnng
-
-
-
-Scoring Functions
-=================
-
-The list of scoring functions::
-
-    geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.
-
-    singscore:      Normalised mean (median centered) ranks (requires ranked data)
-
-    ssGSEA:         Single sample GSEA based on ranked data.
-
-    rank_biased_overlap:  RBO, Weighted average of agreement between sorted ranks and gene set.
-
-    robust_std:     Med(x-med / mad), median of robust standardized values (recommend unranked).
-
-    mean_z:         Mean( (x - mean)/stddv ), average z score. (recommend unranked).
-
-    average_score:  Mean ranks or counts
-
-    median_score:   Median of counts or ranks
-
-    summed_up:      Sum up the ranks or counts.
+    python3 -m pip install gssnng
 
+    # or install from GitHub
+    python3 -m pip install git+https://github.com/IlyaLab/gssnng
 
 
 
 Example script
 ==============
 
-Copy the script out from the cloned repo and run, check the paths if you get an error.::
+Copy the script out of the cloned repo and run it; check the paths if you get an error.
+
+::
 
  cp gssnng/gssnng/test/example_decoupler_omnipath_api.py  .
 
  python3.10 example_decoupler_omnipath_api.py
 
-
 Usage
 ======
 
 See gssnng/notebooks for examples on all methods.
 
 1. Read in an AnnData object using scanpy (an h5ad file).
 
-2. Get the model from omnipath via the decoupler API.
+2. Get the model from OmniPath via the decoupler API. You may want to filter out genes negatively associated with the pathway; see the example.
 
-3. Score cells, each gene set will show up as a column in adata.obs.
+3. Score cells, each gene set will show up as a column in adata.obsm['gssnng_estimate'].
 
-.. code-block::
+::
 
     from gssnng import score_cells
 
     adata = sc.datasets.pbmc3k_processed()
 
+    # OmniPath (PROGENy) model, keeping only positively weighted genes
     model = dc.get_progeny().query('weight>0')
 
     score_cells.run_gssnng(
         adata, model,
         source='source',target='target', weight='weight',
-        groupby="louvain", # None
+        groupby="louvain",
         smooth_mode='connectivity',
         recompute_neighbors=32,
         score_method="mean_z",
-        method_params={}, # 'normalization':'standard'
+        method_params={},
         ranked=False,
         cores=6
     )
 
-    #Extracts activities as AnnData object.
+    # Extracts activities as AnnData object.
     acts_gss = dc.get_acts(adata, obsm_key='gssnng_estimate')
 
+    # Now we can plot the gene set scores
     sc.pl.umap(acts_gss, color=sorted(acts_gss.var_names), cmap='coolwarm')
 
 
+
+
+Scoring Functions
+=================
+
+The list of scoring functions::
+
+    geneset_overlap: For each gene set, the number (or fraction) of genes expressed above a given threshold.
+
+    singscore:      Normalised mean (median-centered) ranks (requires ranked data).
+
+    ssGSEA:         Single-sample GSEA based on ranked data.
+
+    rank_biased_overlap:  RBO, weighted average of agreement between sorted ranks and the gene set.
+
+    robust_std:     Med( (x - med)/mad ), median of robust standardized values (unranked data recommended).
+
+    mean_z:         Mean( (x - mean)/stddev ), average z-score (unranked data recommended).
+
+    average_score:  Mean of ranks or counts.
+
+    median_score:   Median of ranks or counts.
+
+    summed_up:      Sum of ranks or counts.
+
+
+
+
+
+
 Parameters
 ==========
 
@@ -142,6 +145,11 @@ These parameters are used with the "scores_cells.with_gene_sets" function.::
     model: pandas.DataFrame
     The decoupler gene set model. See Omnipath Wrappers (https://saezlab.github.io/decoupleR/reference/index.html#omnipath-wrappers).
 
+    source: str
+    weight: str
+    target: str
+    Column names in the model. Each pathway in OmniPath is a collection of *target* genes from a *source* (i.e., the pathway), each with an interaction *weight*.
+
     groupby: [str, list, dict]
     either a column label in adata.obs, and all categories taken, or a dict specifies one group.
     SEE DESCRIPTION BELOW
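
The `source`, `target` and `weight` parameters name columns in a long-format model table such as the one returned by `dc.get_progeny()`. The rough sketch below uses plain Python instead of pandas, and the helper `model_to_gene_sets` and the toy rows with made-up weights are hypothetical. It shows how such a table can be read as one weighted gene set per source. The `positive_only` flag plays the role of the `.query('weight>0')` filter in the usage example.

```python
from collections import defaultdict


def model_to_gene_sets(rows, source="source", target="target", weight="weight",
                       positive_only=True):
    """Group long-format (source, target, weight) rows into one gene set per source.

    rows: list of dicts, one per interaction, keyed by the column names passed in.
    positive_only: drop rows with weight <= 0, like .query('weight>0').
    """
    gene_sets = defaultdict(dict)
    for row in rows:
        if positive_only and row[weight] <= 0:
            continue  # skip genes negatively associated with the pathway
        gene_sets[row[source]][row[target]] = row[weight]
    return dict(gene_sets)


# Toy rows with made-up weights (not real PROGENy values).
net = [
    {"source": "TNFa", "target": "NFKBIA", "weight": 2.1},
    {"source": "TNFa", "target": "GENE_X", "weight": -0.7},
    {"source": "Hypoxia", "target": "VEGFA", "weight": 1.5},
]
print(model_to_gene_sets(net))
# {'TNFa': {'NFKBIA': 2.1}, 'Hypoxia': {'VEGFA': 1.5}}
```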

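The `mean_z` and `robust_std` formulas in the scoring-function list can be illustrated for a single cell. This sketch assumes that the mean, standard deviation, median and MAD are computed over all genes of one (smoothed) cell. It also uses the population standard deviation and an unscaled MAD, then summarizes over the genes in the set. gssnng's own implementation may make different choices.

```python
import statistics


def mean_z(cell, gene_set):
    """Mean( (x - mean)/stddev ) over the genes of gene_set found in the cell.

    cell: dict mapping gene -> expression value for ONE (smoothed) cell.
    Assumption: mean and population standard deviation are taken over all genes.
    """
    values = list(cell.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return statistics.mean((cell[g] - mu) / sd for g in gene_set if g in cell)


def robust_std(cell, gene_set):
    """Med( (x - med)/mad ) over the genes of gene_set found in the cell.

    Assumption: median and unscaled MAD are taken over all genes in the cell.
    """
    values = list(cell.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return statistics.median((cell[g] - med) / mad for g in gene_set if g in cell)


cell = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 10.0}
print(robust_std(cell, ["D", "E"]))          # 4.0
print(round(mean_z(cell, ["D", "E"]), 4))    # 0.9487
```

The median-based version is less affected by the single outlying gene "E" in the cell-wide statistics. This fits the description of `robust_std` as a robust alternative to `mean_z`.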
diff --git a/docs/gmt_files_doc.rst b/docs/gmt_files_doc.rst
@@ -27,6 +27,8 @@ Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq
 
 `**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_
 
+`**Notebook for creating smoothed count matrices** <https://www.google.com>`_
+
 `**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_
 
 
@@ -46,53 +48,23 @@ Installation
 
 Install the package using the following commands::
 
-   pip3 install gssnng
-
-
-
-Installation from GitHub
-========================
-
-   git clone https://github.com/IlyaLab/gssnng
-
-   pip install -e gssnng
-
-
-
-Scoring Functions
-=================
-
-The list of scoring functions::
-
-    geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.
-
-    singscore:      Normalised mean (median centered) ranks (requires ranked data)
-
-    ssGSEA:         Single sample GSEA based on ranked data.
-
-    rank_biased_overlap:  RBO, Weighted average of agreement between sorted ranks and gene set.
-
-    robust_std:     Med(x-med / mad), median of robust standardized values (recommend unranked).
-
-    mean_z:         Mean( (x - mean)/stddv ), average z score. (recommend unranked).
-
-    average_score:  Mean ranks or counts
-
-    median_score:   Median of counts or ranks
-
-    summed_up:      Sum up the ranks or counts.
+    python3 -m pip install gssnng
 
+    # or install from GitHub
+    python3 -m pip install git+https://github.com/IlyaLab/gssnng
 
 
 
 Example script
 ==============
 
-Copy the script out from the cloned repo and run, check the paths if you get an error.::
+Copy the script out of the cloned repo and run it; check the paths if you get an error.
+
+::
 
- cp gssnng/gssnng/test/example_script.py  .
+ cp gssnng/gssnng/test/example_gmt_input.py  .
 
- python3.10 example_script.py
+ python3.10 example_gmt_input.py
 
 
 Usage
@@ -102,11 +74,11 @@ See gssnng/notebooks for examples on all methods.
 
 1. Read in an AnnData object using scanpy (an h5ad file).
 
-2. Get gene sets formatted as a .gmt file. (default is UP, also uses _UP,  _DN, and split gene sets _UP+_DN)
+2. Get gene sets formatted as a .gmt file (the default is UP; _UP, _DN, and split _UP+_DN gene sets are also supported); see below for more details.
 
 3. Score cells, each gene set will show up as a column in adata.obs.
 
-.. code-block::
+::
 
    from gssnng import score_cells
 
@@ -124,6 +96,29 @@ See gssnng/notebooks for examples on all methods.
 
     sc.pl.umap(q, color=['louvain','T.cells.CD8.up'], wspace=0.35)
 
+Scoring Functions
+=================
+
+The list of scoring functions::
+
+    geneset_overlap: For each gene set, the number (or fraction) of genes expressed above a given threshold.
+
+    singscore:      Normalised mean (median-centered) ranks (requires ranked data).
+
+    ssGSEA:         Single-sample GSEA based on ranked data.
+
+    rank_biased_overlap:  RBO, weighted average of agreement between sorted ranks and the gene set.
+
+    robust_std:     Med( (x - med)/mad ), median of robust standardized values (unranked data recommended).
+
+    mean_z:         Mean( (x - mean)/stddev ), average z-score (unranked data recommended).
+
+    average_score:  Mean of ranks or counts.
+
+    median_score:   Median of ranks or counts.
+
+    summed_up:      Sum of ranks or counts.
+
 
 Parameters
 ==========

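The .gmt input in the usage steps uses the tab-separated GMT format. Each line holds one gene set: the set name, a description, then the member genes. Below is a minimal parsing sketch. It includes a hypothetical `split_up_dn` helper that groups UP and DN sets by base name. Sets without a suffix are treated as UP, following the documented default. The suffix handling is an assumption, including the `.up` form seen in `T.cells.CD8.up`. How gssnng combines the UP and DN scores is not shown.

```python
def parse_gmt(text):
    """Parse GMT text: one gene set per line, tab-separated name, description, genes."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        gene_sets[name] = [g for g in genes if g]  # ignore empty trailing fields
    return gene_sets


def split_up_dn(gene_sets):
    """Group gene sets by base name into 'up'/'dn' parts (hypothetical helper).

    Names ending in _UP/.up or _DN/.dn (any case) are split on the suffix;
    any other name is treated as an UP set.
    """
    paired = {}
    for name, genes in gene_sets.items():
        suffix = name[-3:].upper()
        if suffix in ("_UP", ".UP"):
            base, direction = name[:-3], "up"
        elif suffix in ("_DN", ".DN"):
            base, direction = name[:-3], "dn"
        else:
            base, direction = name, "up"
        paired.setdefault(base, {})[direction] = genes
    return paired


# Toy GMT content; the gene memberships are illustrative only.
gmt_text = (
    "T.cells.CD8.up\tna\tCD8A\tCD8B\n"
    "IFN_UP\tna\tISG15\tMX1\n"
    "IFN_DN\tna\tCD74\n"
    "CUSTOM\tna\tGENE1\tGENE2\n"
)
print(split_up_dn(parse_gmt(gmt_text)))
# {'T.cells.CD8': {'up': ['CD8A', 'CD8B']}, 'IFN': {'up': ['ISG15', 'MX1'], 'dn': ['CD74']}, 'CUSTOM': {'up': ['GENE1', 'GENE2']}}
```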
diff --git a/docs/smoothing_adatas.rst b/docs/smoothing_adatas.rst
@@ -0,0 +1,85 @@
+.. GSSNNG documentation master file, created by
+   sphinx-quickstart on Wed Apr 27 09:20:15 2022.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+gssnng to make smoothed count matrices
+======================================
+
+Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq (scRNA-seq).
+
+..
+    .. toctree::
+       :caption: Table of Contents
+       :maxdepth: 2
+
+       Installation
+       Scoring Functions
+       Example script
+       Usage
+       Parameters
+       Groupby
+       Gene sets
+       References
+
+
+`**Notebook using gmt files**  <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/gssnng_quick_start.ipynb>`_
+
+`**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_
+
+`**Notebook for creating smoothed count matrices** <https://www.google.com>`_
+
+`**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_
+
+
+This package works with AnnData objects stored as h5ad files. Expression values are taken from adata.X.
+For creating groups, up to four categorical variables can be used, which are found in the adata.obs table.
+
+
+Installation
+============
+
+Install the package using the following commands::
+
+    python3 -m pip install gssnng
+
+    # or install from GitHub
+    python3 -m pip install git+https://github.com/IlyaLab/gssnng
+
+
+
+Example script
+==============
+
+Copy the script out of the cloned repo and run it; check the paths if you get an error.
+
+::
+
+ cp gssnng/gssnng/test/example_smoothing_counts.py  .
+
+ python3.10 example_smoothing_counts.py
+
+
+Usage
+======
+
+See gssnng/notebooks for examples on all methods.
+
+1. Read in an AnnData object using scanpy (an h5ad file).
+
+2. Choose how cells are grouped (groupby, one or more categorical columns in adata.obs) and how counts are smoothed (smooth_mode).
+
+3. Smooth the counts within each group; the result (q_list in the example below) holds the smoothed data, split into the groups.
+
+::
+
+    import scanpy as sc
+    from gssnng import nnsmooth
+
+    q = sc.datasets.pbmc3k_processed()
+
+    q_list = nnsmooth.smooth_adata(adata=q,                    # AnnData object
+                                   groupby='louvain',          # Sample neighbors within each group; can also take a list
+                                   smooth_mode='connectivity', # Smooth using distance weights from the NN graph
+                                   recompute_neighbors=32,     # Rebuild the NN graph within each group (0 turns this off)
+                                   cores=4)                    # Smooth the groups in parallel
+
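
`smooth_mode='connectivity'` weights neighbors using the nearest neighbor graph, and `groupby` keeps smoothing inside each group. The sketch below is a simplified stand-in, not gssnng's code. It takes a toy weighted neighbor graph as nested dicts; in scanpy the real graph is stored in `adata.obsp['connectivities']`. It also drops edges that cross groups instead of rebuilding the graph per group as `recompute_neighbors` does, and gives each cell a self-weight of 1.0. The helper name `smooth_connectivity` is hypothetical.

```python
def smooth_connectivity(expr, groups, weights):
    """Connectivity-weighted average of each cell with its same-group neighbors.

    expr:    dict cell -> list of gene values
    groups:  dict cell -> group label (e.g. a 'louvain' cluster)
    weights: dict cell -> dict neighbor -> edge weight (toy connectivity graph)
    Returns: dict group -> dict cell -> smoothed gene values.
    Assumptions: self-weight 1.0; edges that cross groups are ignored.
    """
    smoothed = {}
    for cell, values in expr.items():
        # Only neighbors in the same group contribute, mirroring groupby.
        nbrs = {n: w for n, w in weights.get(cell, {}).items()
                if groups[n] == groups[cell]}
        nbrs[cell] = 1.0
        total = sum(nbrs.values())
        smoothed.setdefault(groups[cell], {})[cell] = [
            sum(w * expr[n][g] for n, w in nbrs.items()) / total
            for g in range(len(values))
        ]
    return smoothed


expr = {"c1": [0.0, 4.0], "c2": [2.0, 0.0], "c3": [9.0, 9.0]}
groups = {"c1": "0", "c2": "0", "c3": "1"}
weights = {"c1": {"c2": 1.0, "c3": 1.0}, "c2": {"c1": 1.0}, "c3": {"c1": 0.5}}
print(smooth_connectivity(expr, groups, weights))
# {'0': {'c1': [1.0, 2.0], 'c2': [1.0, 2.0]}, '1': {'c3': [9.0, 9.0]}}
```

Returning one mapping per group matches the idea of breaking the AnnData object into groups. Cell c3 keeps its own profile because its only edge crosses into another group.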
diff --git a/gssnng/smooth_anndatas.py b/gssnng/nnsmooth.py
@@ -3,12 +3,11 @@
 from gssnng.util import error_checking
 from typing import Union
 
-def smooth_anndata(
+def smooth_adata(
         adata: anndata.AnnData,
         groupby: Union[str, list, dict],
         smooth_mode: str,
         recompute_neighbors: int,
-        method_params: dict,
         cores: int
     ) -> anndata.AnnData:
 
@@ -45,7 +44,7 @@ def smooth_anndata(
 
     # score each cell with the list of gene sets
     data_list = _proc_data(adata, None, groupby, smooth_mode, recompute_neighbors,
-                                  None, method_params, samp_neighbors,
+                                  None, None, samp_neighbors,
                                   noise_trials, None, cores, return_data)
 
     print("**done**")
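
The `groupby` argument of `smooth_adata` accepts a string, a list or a dict. The sketch below shows how a single column name or a list of columns could split cells into groups. It uses a dict of per-cell annotations in place of `adata.obs`. The helper `split_by_groups` is hypothetical, and the four-column cap mirrors the documented limit. The dict form, which selects a single group, is not covered.

```python
def split_by_groups(obs, groupby):
    """Map each group key (a tuple of category values) to its list of cell ids.

    obs:     dict cell_id -> dict of categorical annotations (stand-in for adata.obs)
    groupby: one column name, or a list of up to four column names
    """
    cols = [groupby] if isinstance(groupby, str) else list(groupby)
    if not 1 <= len(cols) <= 4:
        raise ValueError("groupby takes one to four categorical columns")
    groups = {}
    for cell, row in obs.items():
        # Cells that share the same combination of categories form one group.
        key = tuple(row[col] for col in cols)
        groups.setdefault(key, []).append(cell)
    return groups


obs = {
    "c1": {"louvain": "B", "sample": "s1"},
    "c2": {"louvain": "B", "sample": "s2"},
    "c3": {"louvain": "NK", "sample": "s1"},
}
print(split_by_groups(obs, "louvain"))
# {('B',): ['c1', 'c2'], ('NK',): ['c3']}
print(split_by_groups(obs, ["louvain", "sample"]))
# {('B', 's1'): ['c1'], ('B', 's2'): ['c2'], ('NK', 's1'): ['c3']}
```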