update to docs, examples, and smooth function
Gibbsdavidl committed Feb 9, 2024
1 parent 1a85476 commit cd3b90f
Showing 11 changed files with 292 additions and 182 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -9,12 +9,13 @@ This package works with Scanpy AnnData objects stored as h5ad files.

* **Notebook using Decoupler/Omnipath style API ===>>>** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb)

* **Notebook for smoothing counts**, breaks AnnData into groups.

* **See the paper ===>>>** [gssnng](https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false)

* and finally, [Read the Docs!](https://gssnng.readthedocs.io/en/latest/)



The GSSNNG method is based on using the nearest neighbor graph of cells for data smoothing. This essentially creates
mini-pseudobulk expression profiles for each cell, which can be scored by using single sample gene set scoring
methods often associated with bulk RNA-seq.
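
The smoothing idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the package's implementation: `smooth_profiles` is a hypothetical helper, the neighbor weights are passed as a dense list of lists, and each cell is given a self-weight of 1 before averaging.

```python
def smooth_profiles(expr, weights):
    """Return one neighbor-averaged expression profile per cell.

    expr:    per-cell expression vectors (cells x genes)
    weights: weights[i][j] is the neighbor-graph weight between cells i and j
             (0 when j is not a neighbor of i)
    """
    n_cells = len(expr)
    n_genes = len(expr[0])
    smoothed = []
    for i, row in enumerate(weights):
        w = list(row)
        w[i] = 1.0  # assumption of this sketch: the cell counts itself with weight 1
        total = sum(w)
        # Weighted average over the cell and its neighbors, gene by gene.
        profile = [
            sum(w[j] * expr[j][g] for j in range(n_cells)) / total
            for g in range(n_genes)
        ]
        smoothed.append(profile)
    return smoothed


# Three cells, two genes; cells 0 and 1 are neighbors, cell 2 has no neighbors.
expr = [[4.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = smooth_profiles(expr, weights)
print(smoothed)
```

Each row of `smoothed` plays the role of a mini-pseudobulk profile: cells 0 and 1 end up with the same averaged values, while the isolated cell 2 keeps its own.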
94 changes: 51 additions & 43 deletions docs/decoupler_api.rst → docs/decoupler_api_doc.rst
@@ -27,6 +27,8 @@ Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq

`**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_

`**Notebook for creating smoothed count matrices** <https://www.google.com>`_

`**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_

This package works with AnnData objects stored as h5ad files. Expression values are taken from adata.X.
@@ -45,92 +47,93 @@ Installation

Install the package using the following commands::

pip3 install gssnng



Installation from GitHub
========================

git clone https://github.com/IlyaLab/gssnng

pip install -e gssnng



Scoring Functions
=================

The list of scoring functions::

geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.

singscore: Normalised mean (median centered) ranks (requires ranked data)

ssGSEA: Single sample GSEA based on ranked data.

rank_biased_overlap: RBO, Weighted average of agreement between sorted ranks and gene set.

robust_std: Med(x-med / mad), median of robust standardized values (recommend unranked).

mean_z: Mean( (x - mean)/stddv ), average z score. (recommend unranked).

average_score: Mean ranks or counts

median_score: Median of counts or ranks

summed_up: Sum up the ranks or counts.
python3 -m pip install gssnng

# or install from GitHub
python3 -m pip install git+https://github.com/IlyaLab/gssnng



Example script
==============

Copy the script out from the cloned repo and run, check the paths if you get an error.::
Copy the script out from the cloned repo and run, check the paths if you get an error.

::

cp gssnng/gssnng/test/example_decoupler_omnipath_api.py .

python3.10 example_decoupler_omnipath_api.py


Usage
======

See gssnng/notebooks for examples on all methods.

1. Read in an AnnData object using scanpy (an h5ad file).

2. Get the model from omnipath via the decoupler API.
2. Get the model from omnipath via the decoupler API. You may want to filter out genes negatively associated with the pathway, see the example.

3. Score cells, each gene set will show up as a column in adata.obs.
3. Score cells, each gene set will show up as a column in adata.obsm['gssnng_estimate'].

.. code-block::
::

import scanpy as sc
import decoupler as dc
from gssnng import score_cells

adata = sc.datasets.pbmc3k_processed()

# OmniPath Model #
model = dc.get_progeny().query('weight>0')

score_cells.run_gssnng(
adata, model,
source='source',target='target', weight='weight',
groupby="louvain", # None
groupby="louvain",
smooth_mode='connectivity',
recompute_neighbors=32,
score_method="mean_z",
method_params={}, # 'normalization':'standard'
method_params={},
ranked=False,
cores=6
)

#Extracts activities as AnnData object.
# Extracts activities as AnnData object.
acts_gss = dc.get_acts(adata, obsm_key='gssnng_estimate')

# Now we can plot the gene set scores
sc.pl.umap(acts_gss, color=sorted(acts_gss.var_names), cmap='coolwarm')
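
Step 2 above filters the model to positively weighted genes with `query('weight>0')`. As a rough sketch of what that filter amounts to, the snippet below groups a long-format source/target/weight table into gene sets using plain Python. The helper `positive_gene_sets` and the rows are hypothetical and made up for illustration; real models come from OmniPath through decoupler.

```python
from collections import defaultdict


def positive_gene_sets(rows, source="source", target="target", weight="weight"):
    """Group target genes by source, keeping only positively weighted pairs."""
    gene_sets = defaultdict(list)
    for row in rows:
        if row[weight] > 0:
            gene_sets[row[source]].append(row[target])
    return dict(gene_sets)


# Made-up rows for illustration only; these are not real pathway weights.
rows = [
    {"source": "PathwayA", "target": "GENE1", "weight": 2.5},
    {"source": "PathwayA", "target": "GENE2", "weight": -1.0},
    {"source": "PathwayB", "target": "GENE3", "weight": 0.7},
    {"source": "PathwayB", "target": "GENE1", "weight": 1.2},
]
gene_sets = positive_gene_sets(rows)
print(gene_sets)
```

The `source`, `target`, and `weight` keyword arguments mirror the column names passed to `run_gssnng` in the example, so a table with different column labels could be handled the same way.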




Scoring Functions
=================

The list of scoring functions::

geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.

singscore: Normalised mean (median centered) ranks (requires ranked data)

ssGSEA: Single sample GSEA based on ranked data.

rank_biased_overlap: RBO, Weighted average of agreement between sorted ranks and gene set.

robust_std: Med(x-med / mad), median of robust standardized values (recommend unranked).

mean_z: Mean( (x - mean)/stddv ), average z score. (recommend unranked).

average_score: Mean ranks or counts

median_score: Median of counts or ranks

summed_up: Sum up the ranks or counts.
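
To make two of the formulas in the list concrete, here is a minimal sketch of `mean_z` and `robust_std` on a single profile. It assumes the mean, standard deviation, median, and MAD are taken over all genes in the (smoothed) profile and the result is then summarized over the gene set; the package's own implementation may differ in those details, and the gene names and values are made up.

```python
from statistics import mean, median, pstdev


def mean_z(profile, gene_set):
    """Mean((x - mean) / stddev) over the gene set; profile maps gene -> value."""
    values = list(profile.values())
    mu = mean(values)
    sd = pstdev(values)  # would be 0 for a constant profile; not handled here
    return mean((profile[g] - mu) / sd for g in gene_set if g in profile)


def robust_std(profile, gene_set):
    """Med((x - med) / mad) over the gene set."""
    values = list(profile.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    return median((profile[g] - med) / mad for g in gene_set if g in profile)


# Made-up profile for one cell: eight genes with unranked values.
profile = {"g1": 2.0, "g2": 4.0, "g3": 4.0, "g4": 4.0,
           "g5": 5.0, "g6": 5.0, "g7": 7.0, "g8": 9.0}
gene_set = ["g7", "g8"]

z_score = mean_z(profile, gene_set)
robust_score = robust_std(profile, gene_set)
print(z_score, robust_score)
```

Both scores are positive here because the two gene-set members sit above the centre of the profile; the robust version depends on the median and MAD, so it is less affected by a few extreme values.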






Parameters
==========

@@ -142,6 +145,11 @@ These parameters are used with the "scores_cells.with_gene_sets" function.::
model: str
The decoupler gene set model. See Omnipath Wrappers (https://saezlab.github.io/decoupleR/reference/index.html#omnipath-wrappers).

source: str
weight: str
target: str
Each pathway in OmniPath is a collection of *target* genes from a *source* (i.e. pathway), where each has an interaction *weight*.

groupby: [str, list, dict]
either a column label in adata.obs, and all categories taken, or a dict specifies one group.
SEE DESCRIPTION BELOW
75 changes: 35 additions & 40 deletions docs/gmt_files_doc.rst
@@ -27,6 +27,8 @@ Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq

`**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_

`**Notebook for creating smoothed count matrices** <https://www.google.com>`_

`**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_


@@ -46,53 +48,23 @@ Installation

Install the package using the following commands::

pip3 install gssnng



Installation from GitHub
========================

git clone https://github.com/IlyaLab/gssnng

pip install -e gssnng



Scoring Functions
=================

The list of scoring functions::

geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.

singscore: Normalised mean (median centered) ranks (requires ranked data)

ssGSEA: Single sample GSEA based on ranked data.

rank_biased_overlap: RBO, Weighted average of agreement between sorted ranks and gene set.

robust_std: Med(x-med / mad), median of robust standardized values (recommend unranked).

mean_z: Mean( (x - mean)/stddv ), average z score. (recommend unranked).

average_score: Mean ranks or counts

median_score: Median of counts or ranks

summed_up: Sum up the ranks or counts.
python3 -m pip install gssnng

# or install from GitHub
python3 -m pip install git+https://github.com/IlyaLab/gssnng



Example script
==============

Copy the script out from the cloned repo and run, check the paths if you get an error.::
Copy the script out from the cloned repo and run, check the paths if you get an error.

::

cp gssnng/gssnng/test/example_script.py .
cp gssnng/gssnng/test/example_gmt_input.py .

python3.10 example_script.py
python3.10 example_gmt_input.py


Usage
@@ -102,11 +74,11 @@ See gssnng/notebooks for examples on all methods.

1. Read in an AnnData object using scanpy (an h5ad file).

2. Get gene sets formatted as a .gmt file. (default is UP, also uses _UP, _DN, and split gene sets _UP+_DN)
2. Get gene sets formatted as a .gmt file. (default is UP, also uses _UP, _DN, and split gene sets _UP+_DN), see below for more details.

3. Score cells, each gene set will show up as a column in adata.obs.

.. code-block::
::

from gssnng import score_cells

@@ -124,6 +96,29 @@ See gssnng/notebooks for examples on all methods.

sc.pl.umap(q, color=['louvain','T.cells.CD8.up'], wspace=0.35)
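
For readers unfamiliar with the .gmt input mentioned in step 2, the format is one gene set per line with tab-separated fields: the set name, a description, then the member genes. The following is a minimal parsing sketch, not the reader used inside gssnng; `parse_gmt` is a hypothetical helper and the example line is made up.

```python
def parse_gmt(lines):
    """Map gene-set name -> list of genes from GMT-formatted lines."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


# Made-up example; with a real file, pass an open file handle instead of a list.
example_lines = [
    "T.cells.CD8.up\tna\tCD8A\tCD8B\tGZMK\n",
    "\n",
]
parsed = parse_gmt(example_lines)
print(parsed)
```

Any direction suffix such as `_UP` or `_DN` is simply part of the set name in the file, so it survives parsing unchanged and can be inspected afterwards.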

Scoring Functions
=================

The list of scoring functions:

geneset_overlap: For each geneset, number (or fraction) of genes expressed past a given threshold.

singscore: Normalised mean (median centered) ranks (requires ranked data)

ssGSEA: Single sample GSEA based on ranked data.

rank_biased_overlap: RBO, Weighted average of agreement between sorted ranks and gene set.

robust_std: Med(x-med / mad), median of robust standardized values (recommend unranked).

mean_z: Mean( (x - mean)/stddv ), average z score. (recommend unranked).

average_score: Mean ranks or counts

median_score: Median of counts or ranks

summed_up: Sum up the ranks or counts.


Parameters
==========
85 changes: 85 additions & 0 deletions docs/smoothing_adatas.rst
@@ -0,0 +1,85 @@
.. GSSNNG documentation master file, created by
sphinx-quickstart on Wed Apr 27 09:20:15 2022.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

gssnng to make smoothed count matrices
======================================

Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq (scRNA-seq).

..
.. toctree::
:caption: Table of Contents
:maxdepth: 2
Installation
Scoring Functions
Example script
Usage
Parameters
Groupby
Gene sets
References


`**Notebook using gmt files** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/gssnng_quick_start.ipynb>`_

`**Notebook using Decoupler/Omnipath style API** <https://colab.research.google.com/github/IlyaLab/gssnng/blob/main/notebooks/Scoring_PBMC_data_with_the_GSSNNG_decoupleR_API.ipynb>`_

`**Notebook for creating smoothed count matrices** <https://www.google.com>`_

`**See the paper** <https://academic.oup.com/bioinformaticsadvances/article/3/1/vbad150/7321111?login=false>`_


This package works with AnnData objects stored as h5ad files. Expression values are taken from adata.X.
For creating groups, up to four categorical variables can be used, which are found in the adata.obs table.


Installation
============

Install the package using the following commands::

python3 -m pip install gssnng

# or install from GitHub
python3 -m pip install git+https://github.com/IlyaLab/gssnng



Example script
==============

Copy the script out from the cloned repo and run, check the paths if you get an error.

::

cp gssnng/gssnng/test/example_smoothing_counts.py .

python3.10 example_smoothing_counts.py


Usage
======

See gssnng/notebooks for examples on all methods.

1. Read in an AnnData object using scanpy (an h5ad file).

2. Choose the grouping, a column label in adata.obs (for example 'louvain'); neighbors are sampled within each group.

3. Call nnsmooth.smooth_adata to smooth the counts over the nearest neighbor graph.

::

import scanpy as sc
from gssnng import nnsmooth

q = sc.datasets.pbmc3k_processed()

q_list = nnsmooth.smooth_adata(adata=q, # AnnData object
groupby='louvain', # Will sample neighbors within this group, can take a list
smooth_mode='connectivity', # Smooths matrix using distance weights from NN graph.
recompute_neighbors=32, # Rebuild nearest neighbor graph with groups, 0 turns off function
cores=4) # Smoothed in parallel.

5 changes: 2 additions & 3 deletions gssnng/smooth_anndatas.py → gssnng/nnsmooth.py
@@ -3,12 +3,11 @@
from gssnng.util import error_checking
from typing import Union

def smooth_anndata(
def smooth_adata(
adata: anndata.AnnData,
groupby: Union[str, list, dict],
smooth_mode: str,
recompute_neighbors: int,
method_params: dict,
cores: int
) -> anndata.AnnData:

@@ -45,7 +44,7 @@ def smooth_anndata(

# score each cell with the list of gene sets
data_list = _proc_data(adata, None, groupby, smooth_mode, recompute_neighbors,
None, method_params, samp_neighbors,
None, None, samp_neighbors,
noise_trials, None, cores, return_data)

print("**done**")