From 8af9cec41848b7ceed4dc221a1af922b141c97e5 Mon Sep 17 00:00:00 2001
From: skjandu <106275737+skjandu@users.noreply.github.com>
Date: Thu, 8 Feb 2024 11:57:41 -0800
Subject: [PATCH] Update index.md

---
 docs/v1/index.md | 101 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 30 deletions(-)

diff --git a/docs/v1/index.md b/docs/v1/index.md
index d8a6cd3..81df665 100644
--- a/docs/v1/index.md
+++ b/docs/v1/index.md
@@ -13,58 +13,99 @@
 ## Introduction
-tfsites.DifferentialBindingAnalysis compares the e-scores between two PBM datasets.
+differentialBindingAnalysis plots the enrichment scores (e-scores) from two PBM datasets against each other. This allows us to assess whether differential binding occurs between the two transcription factors.

-## Functionality
-
-TBD
 ## Methodology
-TBD
+The raw PBM datasets for two transcription factors are downloaded from uniPROBE. For each file, the user indicates the columns of the forward k-mer sequences and e-scores. For each k-mer sequence, its e-score from the first PBM file is plotted against its e-score from the second PBM file. Therefore, each data point in the plot is a k-mer with the ordered pair: (PBM 1 e-score, PBM 2 e-score). To indicate whether differential binding occurs, the resulting scatterplot can have either a trendline of the data points or a line with a slope of 1.
 ## Parameters
-* indicates required parameter
+### Inputs and Outputs

-- **pbm data***
-  - This is a [ state what the format and content is supposed to be] list of SNVs to be analyzed.
-- **header.present***
-  - TRUE/FALSE, genomic coordinates are 0-indexed
-- **out filename***
-  - Out file name for the annotated PBM data
-- **header.sequence.present**
-  - TRUE/FALSE, Is there a header sequence in the raw PBM file?.
-- **column.forward**
-  - Column of the forward DNA sequence in the pbm file (1-indexed).
-- **column.MFI**
-  - Column of the MFI in the pbm file (1-indexed).
-- **sequence**
-  - Sequence to be scanned.
-- **plot.resolution**
-  - Plot resolution in DPI.
-- **zoom**
-  - Zoom into the plot by the number of base pairs.
+* indicates required parameter
+- ***Raw PBM Input for First TF (.tsv)**
+  - Input file containing the raw PBM dataset for the first transcription factor of interest obtained from uniPROBE.
+- ***Raw PBM Input for Second TF (.tsv)**
+  - Input file containing the raw PBM dataset for the second transcription factor of interest obtained from uniPROBE.
+- ***Scatterplot of Enrichment Scores (.png)**
+  - Name of the output file containing a scatterplot of the enrichment scores (e-scores) from the first PBM dataset plotted against the e-scores from the second PBM dataset.
+
+
+### Other Parameters
+- ***Header Present in First PBM File (boolean)**
+  - If `True`, a header exists in the first PBM data file. If `False`, no header exists.
+- ***Column Index of K-mers in First PBM File (integer)**
+  - Number of the column containing the forward DNA sequence in the first PBM file. (1-indexed, 1 is the first column)
+- ***Column Index of E-Scores in First PBM File (integer)**
+  - Number of the column containing the e-score in the first PBM file. (1-indexed, 1 is the first column)
+- ***Header Present in Second PBM File (boolean)**
+  - If `True`, a header exists in the second PBM data file. If `False`, no header exists.
+- ***Column Index of K-mers in Second PBM File (integer)**
+  - Number of the column containing the forward DNA sequence in the second PBM file. (1-indexed, 1 is the first column)
+- ***Column Index of E-Scores in Second PBM File (integer)**
+  - Number of the column containing the e-score in the second PBM file. (1-indexed, 1 is the first column)
+- **Label K-mers (comma-separated string)**
+  - `Default = None`
+  - List of k-mers to be labeled on the plot.
+- **Scatter Alpha Threshold (float)**
+  - `Default = 1`
+  - Alpha threshold that sets the transparency for data points, to show where most data points are concentrated.
+- **Trendline (boolean)**
+  - Default = `False`
+  - If `True`, plot a line of regression through the data points. If `False`, plot a line through (0,0) with a slope of 1.
 ## Input Files
-1. pbm data. [ define format and contents in detail ]
-
-
+1. Raw PBM Input For First TF (.tsv)
+- Columns
+  - `8-mer:` every possible forward k-mer sequence with length k
+  - `8-mer:` the reverse complement of the forward k-mer
+  - `E-score:` the enrichment score of the k-mer
+  - `Median:` the median fluorescence intensity of the k-mer
+  - `Z-score:` the z-score of the k-mer
+
+```
+8-mer 8-mer E-score Median Z-score
+AAAAAAAA TTTTTTTT 0.29130 2871.60 3.5965
+AAAAAAAC TTTTTTTG 0.10748 2086.00 0.3958
+AAAAAAAG TTTTTTTC 0.23656 2539.91 2.3673
+AAAAAAAT TTTTTTTA 0.21760 2434.82 1.9442
+AAAAAACA TTTTTTGT 0.19839 2407.46 1.8310
+```
+
+2. Raw PBM Input For Second TF (.tsv)
+- Columns
+  - `8-mer:` every possible forward k-mer sequence with length k
+  - `8-mer:` the reverse complement of the forward k-mer
+  - `E-score:` the enrichment score of the k-mer
+  - `Median:` the median fluorescence intensity of the k-mer
+  - `Z-score:` the z-score of the k-mer
+
+```
+8-mer 8-mer E-score Median Z-score
+AAAAAAAA TTTTTTTT 0.04621 1378.79 0.0023
+AAAAAAAC TTTTTTTG 0.05236 1595.93 1.2232
+AAAAAAAG TTTTTTTC 0.11724 1515.64 0.7923
+AAAAAAAT TTTTTTTA 0.04593 1390.77 0.0745
+AAAAAACA TTTTTTGT 0.11884 1477.50 0.5795
+```
 ## Output Files
- 1.line plot: .png. [ describe the plot contennts here ]
+ 1. Scatterplot of Enrichment Scores (.png)
+
+
 ## Example Data
 [Example input data is available on github](https://github.com/genepattern/tfsites.annotateTfSites/data)
-## References
-
 ## Version Comments
 - **1.0.0** (2023-11-28): Initial draft of document scaffold.
+- **1.0.1** (2024-02-02): Draft completed.
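
Note on the Methodology text added by this patch: the pairing step it describes can be sketched as follows. This is a minimal illustration, not the module's actual code. The function names (`read_escores`, `pair_escores`, `fit_line`) are hypothetical, the inputs are assumed to be tab-separated, and the "line of regression" is assumed to mean an ordinary least-squares fit. The sample rows are the first two rows of each example table in the patch. Plotting is left out so the sketch stays standard-library only.

```python
import csv
import io


def read_escores(text, kmer_col, escore_col, header=True):
    """Map each forward k-mer to its e-score from tab-separated PBM text.

    Column numbers are 1-indexed, matching the parameter descriptions.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    if header:
        next(rows, None)  # skip the header row
    scores = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        scores[row[kmer_col - 1]] = float(row[escore_col - 1])
    return scores


def pair_escores(scores_1, scores_2):
    """Return (k-mer, PBM 1 e-score, PBM 2 e-score) for k-mers in both files."""
    shared = sorted(scores_1.keys() & scores_2.keys())
    return [(kmer, scores_1[kmer], scores_2[kmer]) for kmer in shared]


def fit_line(points, trendline=False):
    """Return (slope, intercept) of the reference line for the scatterplot.

    trendline=False: the line through (0, 0) with a slope of 1.
    trendline=True: an ordinary least-squares fit (an assumption here).
    """
    if not trendline:
        return 1.0, 0.0
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# First two data rows of each example table, written with tab separators.
PBM_1 = (
    "8-mer\t8-mer\tE-score\tMedian\tZ-score\n"
    "AAAAAAAA\tTTTTTTTT\t0.29130\t2871.60\t3.5965\n"
    "AAAAAAAC\tTTTTTTTG\t0.10748\t2086.00\t0.3958\n"
)
PBM_2 = (
    "8-mer\t8-mer\tE-score\tMedian\tZ-score\n"
    "AAAAAAAA\tTTTTTTTT\t0.04621\t1378.79\t0.0023\n"
    "AAAAAAAC\tTTTTTTTG\t0.05236\t1595.93\t1.2232\n"
)

points = pair_escores(
    read_escores(PBM_1, kmer_col=1, escore_col=3),
    read_escores(PBM_2, kmer_col=1, escore_col=3),
)
```

Each entry of `points` is one dot on the scatterplot. Comparing the fitted line from `fit_line(points, trendline=True)` with the default slope-1 line gives a rough visual cue for differential binding.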