Marginal footprinting
The marginal footprinting algorithm can be described in the following five steps: (1) Insert the given motif into each background region so that the motif's center is aligned with the region's center, producing synthetic sequences. (2) Compute profile probability predictions for the synthetic sequences. (3) Compute profile probability predictions for the reverse complement of each synthetic sequence, then reverse those predictions. (4) Average the predictions from (2) and (3) to get the footprint for each synthetic sequence. (5) Average the footprints across all synthetic sequences to get the marginal footprint for the motif. Use the chrombpnet_nobias.h5 model to obtain bias-corrected footprints.
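The five steps can be sketched in a few lines of plain Python. This is an illustrative sketch, not the chrombpnet implementation: the function names are hypothetical, `predict_profile` stands in for the model's profile prediction, the motif is assumed to replace the central bases so that sequence length is unchanged, and `toy_predict` is a made-up predictor used only to make the example runnable.

```python
from typing import Callable, List, Sequence

# Translation table used to complement DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_motif(background: str, motif: str) -> str:
    """Step 1: overwrite the center of the background with the motif.

    Assumption: the motif replaces bases (length stays the same) and the
    motif center is aligned with the background center.
    """
    start = len(background) // 2 - len(motif) // 2
    return background[:start] + motif + background[start + len(motif):]


def marginal_footprint(
    backgrounds: Sequence[str],
    motif: str,
    predict_profile: Callable[[str], List[float]],
) -> List[float]:
    """Steps 2-5: strand-averaged footprint, averaged over all backgrounds."""
    footprints = []
    for background in backgrounds:
        seq = insert_motif(background, motif)
        forward = predict_profile(seq)                            # step 2
        reverse = predict_profile(reverse_complement(seq))[::-1]  # step 3
        footprints.append(
            [(f + r) / 2 for f, r in zip(forward, reverse)]       # step 4
        )
    count = len(footprints)
    return [sum(column) / count for column in zip(*footprints)]   # step 5


def toy_predict(seq: str) -> List[float]:
    """Hypothetical stand-in for a model: signal of 1.0 wherever the base is G."""
    return [1.0 if base == "G" else 0.0 for base in seq]


footprint = marginal_footprint(["AAAAAAAA", "TTTTTTTT"], "CG", toy_predict)
print(footprint)
```

In a real run `predict_profile` would wrap the loaded model and the backgrounds would be the non-peak regions expanded to the model's input length.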
chrombpnet footprints [-h] -m MODEL_H5 -r REGIONS -g GENOME -fl CHR_FOLD_PATH -op OUTPUT_PREFIX -pwm_f MOTIFS_TO_PWM
[-bs BATCH_SIZE] [--ylim YLIM]
required arguments:
-m MODEL_H5, --model-h5 MODEL_H5
Path model .h5 file
-r REGIONS, --regions REGIONS
10 column bed file of non-peak regions
-g GENOME, --genome GENOME
Genome fasta
-fl CHR_FOLD_PATH, --chr-fold-path CHR_FOLD_PATH
Fold information - dictionary with test,valid and train keys and values with corresponding chromosomes
-op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
Output prefix for bigwig files
-pwm_f MOTIFS_TO_PWM, --motifs-to-pwm MOTIFS_TO_PWM
Path to a TSV file containing motifs in first column and motif string to use for footprinting in second column
optional arguments:
-bs BATCH_SIZE, --batch-size BATCH_SIZE
batch size to use for prediction
--ylim YLIM lower and upper y-limits for plotting the motif footprint, in the form of a tuple i.e. (0,0.8). If this is set
to None, ylim will be autodetermined.
- The argument -pwm_f is a path to a TSV file containing motif names in the first column (e.g. Tn5) and the motif string to use for footprinting in the second column (e.g. GCACAGTACAGAGCTG). A default file is provided in the data folder for reference (https://github.com/kundajelab/chrombpnet/blob/master/chrombpnet/data/motif_to_pwm.TF.tsv)
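As a rough illustration of the two-column layout, the snippet below reads such a file into a dictionary. It is a sketch under assumptions: `load_motifs` is a hypothetical helper (not part of chrombpnet), the file is assumed to have no header row, and an in-memory string stands in for a real file.

```python
import csv
import io


def load_motifs(handle):
    """Read motif name -> motif string pairs from a tab-separated file handle."""
    motifs = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or incomplete lines instead of failing on them.
        if len(row) < 2 or not row[0].strip():
            continue
        motifs[row[0].strip()] = row[1].strip()
    return motifs


# In-memory stand-in for a motifs file with a single entry.
example = io.StringIO("Tn5\tGCACAGTACAGAGCTG\n")
motifs = load_motifs(example)
print(motifs)
```

With a real file, pass an open handle instead, e.g. `open(path, newline="")`.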
Note: The output prefix can include a directory path followed by a file-name prefix. Make sure that the directory in output_prefix exists. Make sure that every region in the input bed file can be expanded to inputlen (default 2114) bp without running past the chromosome ends. If this condition is not satisfied, the program exits with an error.
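The expansion condition can be checked ahead of time with a small helper. The sketch below is an assumption-laden illustration: `region_fits` is a hypothetical function, the tenth bed column is assumed to hold the summit offset from the start coordinate (the narrowPeak convention), and the window is assumed to be centered on start + offset. How chrombpnet performs its own check is not shown here.

```python
def region_fits(start, summit_offset, chrom_length, inputlen=2114):
    """Return True if an inputlen-wide window around the summit stays on the chromosome.

    Assumption: the window is centered on start + summit_offset.
    """
    center = start + summit_offset
    left = center - inputlen // 2
    right = left + inputlen
    return left >= 0 and right <= chrom_length


# A region whose window would start before position 0 is rejected.
print(region_fits(start=100, summit_offset=50, chrom_length=50_000))
# A region well inside the chromosome is accepted.
print(region_fits(start=20_000, summit_offset=500, chrom_length=50_000))
```

Filtering the bed file with a check like this before running the command avoids the error described in the note.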
The following two kinds of files are created using output_prefix as the prefix for the output.
- output_prefix.footprints.h5: An h5-formatted file containing a dictionary with the motifs as key names and the average model_h5 predictions as values.
- output_prefix.motif_name.footprints.png: A set of png images, one per motif, each named output_prefix.motif_name.footprints.png, where motif_name is a value from the list of motifs in the input. Each image shows the center 200 bp of the marginal footprint for that motif.