
Marginal footprinting

Anusri Pampari edited this page Jul 5, 2023 · 6 revisions

The marginal footprinting algorithm consists of the following five steps:
(1) Insert the given motif into background regions so that the motif's center sits at the center of each region, creating synthetic sequences.
(2) Predict profile probabilities for the synthetic sequences created for the motif.
(3) Predict profile probabilities for the reverse complement of each synthetic sequence, then reverse these predictions so that they line up with the forward strand.
(4) Average the predictions from (2) and (3) to get the footprint of each synthetic sequence.
(5) Average the footprints across all synthetic sequences to get the marginal footprint of the motif.

Use the chrombpnet_nobias.h5 model to obtain bias-corrected footprints.
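The five steps above can be sketched in NumPy as follows. This is a minimal illustration, not the ChromBPNet implementation: `predict_profile` is a hypothetical stand-in for the model that is assumed to return per-position profile probabilities (for example, a softmax over the profile output), and the helper names are made up for this example. Reversing the reverse-complement predictions assumes the model's output window is centered on its input.

```python
import numpy as np

# One-hot columns are ordered A, C, G, T, so the complement of base j is
# base 3 - j and a reverse complement is a flip along both axes.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as an (L, 4) float32 array; N and other bases stay all-zero."""
    arr = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            arr[i, j] = 1.0
    return arr


def insert_motif_at_center(background, motif):
    """Overwrite the middle of `background` so the motif center sits at its center."""
    start = len(background) // 2 - len(motif) // 2
    return background[:start] + motif + background[start + len(motif):]


def marginal_footprint(predict_profile, backgrounds, motif):
    """Steps (1)-(5): average strand-symmetrized predictions over all backgrounds.

    `predict_profile` is a hypothetical callable mapping one-hot sequences of
    shape (N, L, 4) to profile probabilities of shape (N, out_len).
    """
    # (1) synthetic sequences: motif centered in each background region
    x = np.stack([one_hot(insert_motif_at_center(b, motif)) for b in backgrounds])
    # (2) forward-strand predictions
    fwd = predict_profile(x)
    # (3) predict on the reverse complement, then reverse along positions
    x_rc = np.ascontiguousarray(x[:, ::-1, ::-1])
    rev = predict_profile(x_rc)[:, ::-1]
    # (4) per-sequence footprint: average of both strands
    per_seq = (fwd + rev) / 2.0
    # (5) marginal footprint: average over all synthetic sequences
    return per_seq.mean(axis=0)
```

Averaging the two strands makes the footprint symmetric with respect to strand, so a motif inserted in either orientation contributes comparably.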

Usage

chrombpnet footprints [-h] -m MODEL_H5 -r REGIONS -g GENOME -fl CHR_FOLD_PATH -op OUTPUT_PREFIX -pwm_f MOTIFS_TO_PWM
                             [-bs BATCH_SIZE] [--ylim YLIM]

Input Format

required arguments:
  -m MODEL_H5, --model-h5 MODEL_H5
                        Path to model .h5 file
  -r REGIONS, --regions REGIONS
                        10-column BED file of non-peak (background) regions
  -g GENOME, --genome GENOME
                        Genome FASTA file
  -fl CHR_FOLD_PATH, --chr-fold-path CHR_FOLD_PATH
                        Fold information: a dictionary with test, valid and train keys, each mapped to its list of chromosomes
  -op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
                        Output prefix for the output files
  -pwm_f MOTIFS_TO_PWM, --motifs-to-pwm MOTIFS_TO_PWM
                        Path to a TSV file containing motif names in the first column and the motif string to use for footprinting in the second column

optional arguments:
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        batch size to use for prediction
  --ylim YLIM           lower and upper y-limits for plotting the motif footprint, given as a tuple, e.g. (0,0.8). If this is set
                        to None, the y-limits are determined automatically.
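The `-fl` and `-pwm_f` inputs could be prepared as sketched below. This assumes the fold dictionary is stored as JSON and the motif TSV has no header row; the chromosome lists and the motif names and strings are hypothetical placeholders, not a recommended split or curated motifs.

```python
import csv
import json

# Hypothetical split: these chromosome lists are placeholders, not a
# recommended fold assignment.
FOLD = {
    "test": ["chr1", "chr3", "chr6"],
    "valid": ["chr8", "chr20"],
    "train": ["chr2", "chr4", "chr5", "chr7"],
}

# Hypothetical motifs: name in the first column, motif string in the second.
MOTIFS = [("CTCF", "CCACCAGGGGGCGC"), ("AP1", "TGACTCA")]


def write_inputs(fold_path, motifs_path, fold=FOLD, motifs=MOTIFS):
    """Write the fold dictionary as JSON and the motifs as a header-less TSV."""
    with open(fold_path, "w") as fh:
        json.dump(fold, fh, indent=2)
    with open(motifs_path, "w", newline="") as fh:
        # Plain "\n" line endings so simple line-splitting parsers also work.
        csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(motifs)


def read_motifs(motifs_path):
    """Parse the two-column TSV back into {motif_name: motif_string}."""
    with open(motifs_path, newline="") as fh:
        return {row[0]: row[1] for row in csv.reader(fh, delimiter="\t") if row}
```

Each motif name read from this file becomes the motif_name used in the output file names.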

Output Format

Note: The prefix can include a directory path as well as a file-name prefix; make sure the directory in output_prefix exists. Make sure that every region in the input BED file can be expanded to inputlen (default 2114) without overflowing the chromosome boundaries. If this condition is not satisfied, the program exits with an error.
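Before running, the overflow condition in the note can be checked with a sketch like the one below. It assumes narrowPeak-style rows in which column 10 is the summit offset from the start coordinate, so each region is expanded around start + summit; `find_overflowing_regions` is a hypothetical helper, not part of the chrombpnet CLI.

```python
def find_overflowing_regions(bed_lines, chrom_sizes, inputlen=2114):
    """Return BED lines whose inputlen-wide window would fall off a chromosome.

    Assumes 10-column (narrowPeak-style) rows where column 10 is the summit
    offset from the start coordinate, so the window is centered on
    start + summit. `chrom_sizes` maps chromosome name -> length in bp.
    """
    half = inputlen // 2
    bad = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        chrom, start, summit = fields[0], int(fields[1]), int(fields[9])
        center = start + summit
        # Unknown chromosomes get length 0, so they are always reported.
        chrom_len = chrom_sizes.get(chrom, 0)
        if center - half < 0 or center + (inputlen - half) > chrom_len:
            bad.append(line)
    return bad
```

For example, pass an open BED file handle as `bed_lines` and build `chrom_sizes` from the first two columns (name and length) of the genome's `.fai` index.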

The following files are created, using output_prefix as the prefix:

  • output_prefix.footprints.h5: An HDF5 file containing a dictionary with motif names as keys and the averaged model_h5 predictions (the marginal footprints) as values.
  • output_prefix.motif_name.footprints.png: One PNG image per motif, where motif_name is a motif from the input list. Each image shows the central 200 bp of the marginal footprint for that motif.
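To work with the footprints directly instead of the PNGs, the HDF5 file could be read as sketched here. This assumes each motif is stored as a top-level dataset holding a 1D profile, and that h5py is installed; `load_footprints` and `center_window` are hypothetical helpers, with `center_window` mirroring the central 200 bp shown in the images.

```python
import numpy as np


def center_window(profile, width=200):
    """Return the central `width` positions of a 1D profile."""
    profile = np.asarray(profile)
    if width > len(profile):
        raise ValueError(f"width {width} exceeds profile length {len(profile)}")
    start = (len(profile) - width) // 2
    return profile[start:start + width]


def load_footprints(h5_path):
    """Read {motif_name: footprint array} from output_prefix.footprints.h5.

    Assumes each motif is a top-level dataset (requires h5py).
    """
    import h5py  # imported here so center_window works without h5py installed

    with h5py.File(h5_path, "r") as fh:
        return {name: fh[name][()] for name in fh.keys()}
```

For example, `center_window(load_footprints(path)[motif_name])` returns the same window that the corresponding PNG plots, where `path` and `motif_name` are placeholders for your own output file and motif.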