From 8af9cec41848b7ceed4dc221a1af922b141c97e5 Mon Sep 17 00:00:00 2001
From: skjandu <106275737+skjandu@users.noreply.github.com>
Date: Thu, 8 Feb 2024 11:57:41 -0800
Subject: [PATCH] Update index.md
---
docs/v1/index.md | 101 +++++++++++++++++++++++++++++++++--------------
1 file changed, 71 insertions(+), 30 deletions(-)
diff --git a/docs/v1/index.md b/docs/v1/index.md
index d8a6cd3..81df665 100644
--- a/docs/v1/index.md
+++ b/docs/v1/index.md
@@ -13,58 +13,99 @@
## Introduction
-tfsites.DifferentialBindingAnalysis compares the e-scores between two PBM datasets.
+differentialBindingAnalysis plots the enrichment scores (e-scores) from two protein-binding microarray (PBM) datasets against each other, which makes it possible to assess whether the two transcription factors bind differentially.
-## Functionality
-
-TBD
## Methodology
-TBD
+The raw PBM datasets for two transcription factors are downloaded from UniPROBE. For each file, the user indicates which columns contain the forward k-mer sequences and the e-scores. Each k-mer's e-score from the first PBM file is then plotted against its e-score from the second PBM file, so each data point in the plot is a k-mer with the ordered pair (PBM 1 e-score, PBM 2 e-score). To help indicate whether differential binding occurs, the scatterplot includes either a regression trendline through the data points or a reference line through (0,0) with a slope of 1.
## Parameters
-* indicates required parameter
+### Inputs and Outputs
-- **pbm data***
- - This is a [ state what the format and content is supposed to be] list of SNVs to be analyzed.
-- **header.present***
- - TRUE/FALSE, genomic coordinates are 0-indexed
-- **out filename***
- - Out file name for the annotated PBM data
-- **header.sequence.present**
- - TRUE/FALSE, Is there a header sequence in the raw PBM file?.
-- **column.forward**
- - Column of the forward DNA sequence in the pbm file (1-indexed).
-- **column.MFI**
- - Column of the MFI in the pbm file (1-indexed).
-- **sequence**
- - Sequence to be scanned.
-- **plot.resolution**
- - Plot resolution in DPI.
-- **zoom**
- - Zoom into the plot by the number of base pairs.
+\* indicates a required parameter
+- ***Raw PBM Input for First TF (.tsv)**
+ - Input file containing the raw PBM dataset for the first transcription factor of interest, obtained from UniPROBE.
+- ***Raw PBM Input for Second TF (.tsv)**
+ - Input file containing the raw PBM dataset for the second transcription factor of interest, obtained from UniPROBE.
+- ***Scatterplot of Enrichment Scores (.png)**
+ - Name of the output file containing a scatterplot of the enrichment scores (e-scores) from the first PBM dataset plotted against the e-scores from the second PBM dataset.
+
+
+### Other Parameters
+- ***Header Present in First PBM File (boolean)**
+ - If `True`, a header exists in the first PBM data file. If `False`, no header exists.
+- ***Column Index of K-mers in First PBM File (integer)**
+ - Number of the column containing the forward DNA sequence in the first PBM file (1-indexed; 1 is the first column).
+- ***Column Index of E-Scores in First PBM File (integer)**
+ - Number of the column containing the e-score in the first PBM file (1-indexed; 1 is the first column).
+- ***Header Present in Second PBM File (boolean)**
+ - If `True`, a header exists in the second PBM data file. If `False`, no header exists.
+- ***Column Index of K-mers in Second PBM File (integer)**
+ - Number of the column containing the forward DNA sequence in the second PBM file (1-indexed; 1 is the first column).
+- ***Column Index of E-Scores in Second PBM File (integer)**
+ - Number of the column containing the e-score in the second PBM file (1-indexed; 1 is the first column).
+- **Label K-mers (comma-separated string)**
+ - `Default = None`
+ - List of k-mers to be labeled on the plot.
+- **Scatter Alpha Threshold (float)**
+ - `Default = 1`
+ - Alpha value that sets the transparency of the data points. Lower values make points more transparent, which helps show where most data points are concentrated.
+- **Trendline (boolean)**
+ - `Default = False`
+ - If `True`, plot a regression line through the data points. If `False`, plot a line through (0,0) with a slope of 1.
## Input Files
-1. pbm data. [ define format and contents in detail ]
-
-
+1. Raw PBM Input for First TF (.tsv)
+- Columns
+ - `8-mer`: every possible forward k-mer sequence of length k (here, k = 8)
+ - `8-mer`: the reverse complement of the forward k-mer
+ - `E-score`: the enrichment score of the k-mer
+ - `Median`: the median fluorescence intensity of the k-mer
+ - `Z-score`: the z-score of the k-mer
+
+```
+8-mer 8-mer E-score Median Z-score
+AAAAAAAA TTTTTTTT 0.29130 2871.60 3.5965
+AAAAAAAC TTTTTTTG 0.10748 2086.00 0.3958
+AAAAAAAG TTTTTTTC 0.23656 2539.91 2.3673
+AAAAAAAT TTTTTTTA 0.21760 2434.82 1.9442
+AAAAAACA TTTTTTGT 0.19839 2407.46 1.8310
+```
+
+2. Raw PBM Input for Second TF (.tsv)
+- Columns
+ - `8-mer`: every possible forward k-mer sequence of length k (here, k = 8)
+ - `8-mer`: the reverse complement of the forward k-mer
+ - `E-score`: the enrichment score of the k-mer
+ - `Median`: the median fluorescence intensity of the k-mer
+ - `Z-score`: the z-score of the k-mer
+
+```
+8-mer 8-mer E-score Median Z-score
+AAAAAAAA TTTTTTTT 0.04621 1378.79 0.0023
+AAAAAAAC TTTTTTTG 0.05236 1595.93 1.2232
+AAAAAAAG TTTTTTTC 0.11724 1515.64 0.7923
+AAAAAAAT TTTTTTTA 0.04593 1390.77 0.0745
+AAAAAACA TTTTTTGT 0.11884 1477.50 0.5795
+```
## Output Files
- 1.line plot: