tfNet is a computational tool that identifies putative regulatory regions and genomic signal interactions on a genome-wide scale.

komorowskilab/tfNet

tfNet

tfNet is a computational tool implemented in C# for genome-wide identification of putative regulatory regions and genomic signal interactions. Its input is a set of ChIP-seq peak signals (or any bed files). It computes putative regulatory regions under the assumption that ChIP-seq signals tend to bind in close proximity to each other due to their synergistic nature.
tfNet is available as a 32-bit (x86) and a 64-bit (x64) build. It runs on Windows under the .NET 4.5 framework and on Linux/OSX under mono. The region detection algorithm is parallelized for better performance. In addition to region detection, tfNet can generate maps of ChIP-seq signal interaction networks as described in (Diamanti et al., 2016. "Maps of context-dependent putative regulatory regions and genomic signal interactions". Nucleic Acids Research). This functionality requires R and the R packages qvalue, igraph and gplots.
tfNet offers many functions for filtering the resulting set of putative regulatory regions and exposes extensive parameters so that it can be adapted to the needs of researchers. In total there are 5 major functions. The first 4 perform individual steps of the algorithm, such as filtering of putative regulatory regions and extraction of ChIP-seq signal interactions. The fifth function runs the previous 4 in a unified pipeline.
Here you can download the source code and modify it according to your needs. If you just need to run tfNet then you might prefer to download the executable tfNet_x64.exe or tfNet_x86.exe together with the R script for the network generation. Here you may also find all the data generated for the publication (Diamanti et al., 2016) for various species, cell lines and cell types.

Run tfNet

[mono] tfNet_xBB.exe [verb] [options]

(BB is the build 64 or 86)
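
As an illustration of the invocation pattern above, the following Python sketch assembles the argument list for one run. The helper name build_tfnet_command, the use_mono switch and the input path are hypothetical and only serve the example; the executable names, the mono prefix and the verb come from this README.

```python
import shlex


def build_tfnet_command(verb, options, build="64", use_mono=False):
    """Return the argument list for one tfNet invocation.

    build is "64" or "86" (the BB part of tfNet_xBB.exe).
    use_mono prepends "mono", as needed on Linux/OSX.
    """
    if build not in ("64", "86"):
        raise ValueError("build must be '64' or '86'")
    command = ["mono"] if use_mono else []
    command.append(f"tfNet_x{build}.exe")
    command.append(verb)
    command.extend(options)
    return command


# Hypothetical input directory, used only for illustration.
cmd = build_tfnet_command("peaks", ["-i", "/data/chipseq_beds"], use_mono=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the executable is on the working path.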

Verbs

  • peaks
  • regions
  • filter
  • network
  • tfNet
  • help

peaks

Combines a given set of bed files into a single bed file to be used in the next steps of tfNet.

Input: A set of bed files. Each file should represent one ChIP-seq signal and the file name should contain the ChIP-seq signal name.
Output: A single file in bed format that contains all the input ChIP-seq signal peaks (check the Appendix for more details about the file format).

tfNet_xBB.exe peaks [-i path] [-o path] [--mem integer] [--cols integer] [--tfList path] [--chrInfo path] [-a/--ignoreChrInfo flag] [-v/--noValue flag] [-u/--noSummit flag] [--score integer] [--sValue double] [--pValue double] [--qValue double] [--sort character] [--acc flag] [-t/--fNameTfName flag] [-n/--nPeak flag] [--win integer]

Required parameters

-i The directory where the set of files representing ChIP-seq signals are available. There should be one file for each ChIP-seq signal and the name of the corresponding ChIP-seq signal should be included in the file name. The file format should be bed.

Optional parameters

-o The full path of the output directory where the file(s) created will be placed. The default value is “tfNet_default”. In case the default value is not changed there will be a new directory created under the parent directory of the -i option provided. In case a different output directory is needed it needs to be of the format: /full/path/to/the/output/directory/name.
--mem Memory percentage to be used by tfNet. The default value is 30%. If you are unsure whether your input data is too large, or if the machine running the software does not have enough memory, use this option to be notified in cases of high memory usage. You can combine this option with -l or --lowMemory (when available) to run tfNet in single-thread mode and prevent it from crashing.
--cols The total number of tab-separated columns you prefer to be in your output files in bed format. The default value is 10 (narrowPeak format). For more information please refer to https://genome.ucsc.edu/FAQ/FAQformat.html.
--tfList A file containing all the names of the ChIP-seq signals that the input files represent. The default list contains the 631 ChIP-seq signals listed in the appendix (Default ChIP-seq signal list). The ChIP-seq signal names in the file should be in a semicolon-separated format. This option is mutually exclusive with -t or --fNameTfName in case the ChIP-seq signal name is the full file name.
--chrInfo A file containing information about the chromosome names of the species and the chromosome length. The default species is human and the chromosome coordinates are from http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml. The file format should be of the following format: chr1,249250621;chr2,243199373;. This option is mutually exclusive with -a or --ignoreChrInfo in case there is no information about the chromosome length or if this is of no interest.
-a, --ignoreChrInfo Ignore chromosome name and length correctness for ChIP-seq signals and the detected regions. Mutually exclusive with --chrInfo.
-v, --noValue Discard any input peaks (records) with a pValue or qValue equal to -1. The default value is false.
-u, --noSummit Discard any input peaks (records) with a summit value equal to -1. The default value is false.
--score Discard any input peaks (records) with a score value lower than the given one. The default value is -1 so that no peak is discarded.
--sValue Discard any input peaks (records) with a signal value lower than the given one. The default value is -1 so that no peak is discarded.
--pValue Discard any input peaks (records) with a p value higher than the given one. The default value is -1 so that no peak is discarded. The input may be also of scientific format (e.g. 5e-10).
--qValue Discard any input peaks (records) with a q value higher than the given one. The default value is -1 so that no peak is discarded. The input may be also of scientific format (e.g. 5e-10).
--sort Sort the ChIP-seq signals after merging them all together. The default option is N, meaning that no sort is required. Other options are sort peaks by start only (S), sort peaks by start+summit (M) and sort peaks by start+middle (P).
--acc The accepted file extensions to be considered for the input. The default value is “narrowPeak”. The list provided should be comma separated.
-t, --fNameTfName Use the file name as ChIP-seq signal name (extension excluded). This option is mutually exclusive with the --tfList option, which provides the ChIP-seq signal names.
-n, --nPeak By default, peaks are narrowed to a window of a few base pairs around the summit or the middle point. The window size is given by the --win option. When this option is activated the peak is not narrowed and the input peak size is used.
--win The size of the window around the summit to which the peak should be narrowed. The default value is 10bp. This option is mutually exclusive with -n or --nPeak.
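
The --chrInfo file format described above (chr1,249250621;chr2,243199373;) can be read with a few lines of Python. This is a minimal sketch, not tfNet's own parser: the function names parse_chr_info and peak_is_valid are hypothetical, and the exact validity rule tfNet applies to peak coordinates is an assumption.

```python
def parse_chr_info(text):
    """Parse 'chr1,249250621;chr2,243199373;' into {name: length}."""
    lengths = {}
    for record in text.strip().split(";"):
        record = record.strip()
        if not record:
            continue  # tolerate the trailing ';'
        name, length = record.split(",")
        lengths[name.strip()] = int(length)
    return lengths


def peak_is_valid(chrom, start, end, lengths):
    """Assumed check: known chromosome and coordinates within its length."""
    return chrom in lengths and 0 <= start < end <= lengths[chrom]


chr_info = parse_chr_info("chr1,249250621;chr2,243199373;")
print(chr_info)
print(peak_is_valid("chr2", 243199370, 243199380, chr_info))
```

The second call returns False because the peak ends beyond the listed length of chr2, which is the kind of record that -a/--ignoreChrInfo would let through.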

regions

Detects clusters of ChIP-seq signals that constitute putative regulatory regions based on (Diamanti et al., 2016).

Input: A single file in bed format that contains all the input ChIP-seq signal peaks. Mind that the file should be of the same file format as the output of the peaks verb above.
Output: A collection of files regarding the detected regulatory regions:

  • *_regions.narrowPeak the set of putative regulatory regions in bed format (check the Appendix for more details about the file format).
  • *_regions.xml the set of putative regulatory regions in xml format (check the Appendix for more details about the file format).
  • *_regions_peaks.narrowPeak the set of ChIP-seq signal used for the detected putative regulatory regions (the file format is the same as the output from the peaks verb).
  • *_statistics.csv some basic statistics about the detected putative regulatory regions such as the total number of regions, mean length, regions per chromosome etc. (check the Appendix for more details about the file format).

tfNet_xBB.exe regions [-i path] [-o path] [--mem integer] [--cols integer] [--tfList path] [--chrInfo path] [-a/--ignoreChrInfo flag] [-n/--nPeak flag] [--win integer] [-e/--startEnd flag] [--distance integer] [--chr string] [--tfName string] [--pkName string] [--start integer] [--end integer ] [--lScore integer] [--hScore integer] [-s/--strand flag] [-t/--statistics flag] [-x/--xml flag] [-p/--peaks flag] [-l/--lowMemory flag] [--topX integer] [--topXP integer] [--sort character]

Required parameters

-i The file containing the set of ChIP-seq signals in bed format. There should be one file containing all the available ChIP-seq signals, preferably the output from the peaks verb. The name field of this file should contain the ChIP-seq signal name and the peak name in the following format: “ChIPseqName_PeakName”.
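
The “ChIPseqName_PeakName” convention for the name field can be checked before running the regions verb. The sketch below is illustrative only: split_name_field is a hypothetical helper, and splitting at the first underscore is an assumption, since this README does not say how signal names that themselves contain underscores are handled.

```python
def split_name_field(name):
    """Split 'ChIPseqName_PeakName' into (signal_name, peak_name).

    Assumption: the first underscore separates the two parts.
    """
    signal, sep, peak = name.partition("_")
    if not sep or not signal or not peak:
        raise ValueError(f"name field not in ChIPseqName_PeakName format: {name!r}")
    return signal, peak


# Example name field with placeholder signal and peak names.
print(split_name_field("CTCF_peak12"))
```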

Optional parameters

-o The full path of the output directory where the file(s) created will be placed. The default value is “tfNet_default”. In case the default value is not changed there will be a new directory created under the parent directory of the -i option provided. In case a different output directory is needed it needs to be of the format: /full/path/to/the/output/directory/name.
--mem Memory percentage to be used by tfNet. The default value is 30%. If you are unsure whether your input data is too large, or if the machine running the software does not have enough memory, use this option to be notified in cases of high memory usage. You can combine this option with -l or --lowMemory (when available) to run tfNet in single-thread mode and prevent it from crashing.
--cols The total number of tab-separated columns you prefer to be in your output files in bed format. The default value is 10 (narrowPeak format). For more information please refer to https://genome.ucsc.edu/FAQ/FAQformat.html.
--tfList A file containing all the names of the ChIP-seq signals that the input files represent. The default list contains the 631 ChIP-seq signals listed in the appendix (Default ChIP-seq signal list). The ChIP-seq signal names in the file should be in a semicolon-separated format. This option is mutually exclusive with -t or --fNameTfName in case the ChIP-seq signal name is the full file name.
--chrInfo A file containing information about the chromosome names of the species and the chromosome length. The default species is human and the chromosome coordinates are from http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml. The file format should be of the following format: chr1,249250621;chr2,243199373;. This option is mutually exclusive with -a or --ignoreChrInfo in case there is no information about the chromosome length or if this is of no interest.
-a, --ignoreChrInfo Ignore chromosome name and length correctness for ChIP-seq signals and the detected regions. Mutually exclusive with --chrInfo.
-n, --nPeak By default, peaks are narrowed to a window of a few base pairs around the summit or the middle point. The window size is given by the --win option. When this option is activated the peak is not narrowed and the input peak size is used.
--win The size of the window around the summit to which the peak should be narrowed. The default value is 10bp. This option is mutually exclusive with -n or --nPeak.
-e, --startEnd Create ChIP-seq signal-clusters based on middle-point distances. By default the tool considers summit distances.
--distance The distance threshold between peaks in order to cluster them in the same region. The default value is 300bp.
--chr Discard regions that are not in the given chromosomes. For multiple chromosomes use a comma-separated string. By default all chromosomes are considered.
--tfName Discard regions that do not contain the given list of transcription factors (comma separated). By default all transcription factors are considered.
--pkName Discard regions that do not contain the given list of peaks (comma separated). You need to provide the input peak names. By default all peaks are considered.
--start Discard regions that have a starting position lower than the provided coordinate. By default all regions are considered.
--end Discard regions that have an ending position larger than the provided coordinate. By default all regions are considered.
--lScore Discard regions with a score lower than the provided one. By default all regions are considered.
--hScore Discard regions with a score larger than the provided one. By default all regions are considered.
-s, --strand Cluster ChIP-seq signals in a strand specific manner. By default the strand specificity is not forced for the regions detection.
-t, --statistics Do not print the I/O statistics file. By default the file is printed.
-x, --xml Do not print the detailed (information rich) xml file. By default the file is printed.
-p, --peaks Do not print peaks that were clustered together to constitute the regions in a file. By default the file is printed.
-l, --lowMemory Cluster ChIP-seq signals into regions with low memory consumption. This option skips the parallel detection of regions and runs it in a slower, sequential and memory-cheap manner. We suggest applying this option in case the --mem option fires and prints a warning message.
--topX Sort regions by score and keep the top of them. The top regions are selected based on the provided threshold.
--topXP Sort regions by score and keep the top percentage of them. The top regions are selected based on the provided percentage threshold.
--sort Sort the regions by score. The default option is N, meaning that no sort is required. Other options are sort regions by ascending score (A) and sort regions by descending score (D).
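
To make the role of --distance concrete, here is a simplified Python sketch that groups peak summits into regions whenever consecutive summits on the same chromosome lie within the threshold (300bp by default). It is not the algorithm from (Diamanti et al., 2016): the chaining rule, the data layout and the function name cluster_summits are assumptions, and it ignores strand, scores and the --win narrowing.

```python
def cluster_summits(peaks, distance=300):
    """Group peaks into regions by summit proximity.

    peaks: list of (chrom, summit_position, name) tuples.
    Consecutive summits on one chromosome that are at most
    `distance` bp apart are placed in the same region.
    """
    regions = []
    for chrom, pos, name in sorted(peaks):
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and pos - last["end"] <= distance:
            last["end"] = pos
            last["peaks"].append(name)
        else:
            regions.append({"chrom": chrom, "start": pos, "end": pos, "peaks": [name]})
    return regions


# Placeholder peaks in the ChIPseqName_PeakName naming style.
example = [
    ("chr1", 1000, "A_p1"),
    ("chr1", 1200, "B_p1"),
    ("chr1", 2000, "C_p1"),
    ("chr2", 1100, "A_p2"),
]
for region in cluster_summits(example):
    print(region)
```

In this example the first two chr1 peaks fall into one region, while the peak at 2000 starts a new one because it is 800bp away from the previous summit.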

filter

Filters a provided set of putative regulatory regions according to the given set of arguments.

Input: A single file in bed or xml format that contains a set of detected putative regulatory regions.
Output: A collection of files regarding the filtered regulatory regions:

  • *_regions.narrowPeak the set of putative regulatory regions in bed format (check the Appendix for more details about the file format).
  • *_regions.xml the set of putative regulatory regions in xml format (check the Appendix for more details about the file format).
  • *_regions_peaks.narrowPeak the set of ChIP-seq signal used for the detected putative regulatory regions (the file format is the same as the output from the peaks verb).
  • *_statistics.csv some basic statistics about the detected putative regulatory regions such as the total number of regions, mean length, regions per chromosome etc. (check the Appendix for more details about the file format).

tfNet_xBB.exe filter [-i path] [-o path] [--mem integer] [--cols integer] [--tfList path] [--chrInfo path] [-a/--ignoreChrInfo flag] [-e/--startEnd flag] [--distance integer] [--chr string] [--tfName string] [--pkName string] [--start integer] [--end integer ] [--lScore integer] [--hScore integer] [-s/--strand flag] [-t/--statistics flag] [-x/--xml flag] [-p/--peaks flag] [-l/--lowMemory flag] [--topX integer] [--topXP integer] [--sort character] [--reg string] [--regFile path]

Required parameters

-i The file containing the set of putative regulatory regions to filter, in bed or xml format, preferably the output from the regions verb.

Optional parameters

-o The full path of the output directory where the file(s) created will be placed. The default value is “tfNet_default”. In case the default value is not changed there will be a new directory created under the parent directory of the -i option provided. In case a different output directory is needed it needs to be of the format: /full/path/to/the/output/directory/name.
--mem Memory percentage to be used by tfNet. The default value is 30%. If you are unsure whether your input data is too large, or if the machine running the software does not have enough memory, use this option to be notified in cases of high memory usage. You can combine this option with -l or --lowMemory (when available) to run tfNet in single-thread mode and prevent it from crashing.
--cols The total number of tab-separated columns you prefer to be in your output files in bed format. The default value is 10 (narrowPeak format). For more information please refer to https://genome.ucsc.edu/FAQ/FAQformat.html.
--tfList A file containing all the names of the ChIP-seq signals that the input files represent. The default list contains the 631 ChIP-seq signals listed in the appendix (Default ChIP-seq signal list). The ChIP-seq signal names in the file should be in a semicolon-separated format. This option is mutually exclusive with -t or --fNameTfName in case the ChIP-seq signal name is the full file name.
--chrInfo A file containing information about the chromosome names of the species and the chromosome length. The default species is human and the chromosome coordinates are from http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml. The file format should be of the following format: chr1,249250621;chr2,243199373;. This option is mutually exclusive with -a or --ignoreChrInfo in case there is no information about the chromosome length or if this is of no interest.
-a, --ignoreChrInfo Ignore chromosome name and length correctness for ChIP-seq signals and the detected regions. Mutually exclusive with --chrInfo.
--chr Discard regions that are not in the given chromosomes. For multiple chromosomes use a comma-separated string. By default all chromosomes are considered.
--tfName Discard regions that do not contain the given list of transcription factors (comma separated). By default all transcription factors are considered.
--pkName Discard regions that do not contain the given list of peaks (comma separated). You need to provide the input peak names. By default all peaks are considered.
--start Discard regions that have a starting position lower than the provided coordinate. By default all regions are considered.
--end Discard regions that have an ending position larger than the provided coordinate. By default all regions are considered.
--lScore Discard regions with a score lower than the provided one. By default all regions are considered.
--hScore Discard regions with a score larger than the provided one. By default all regions are considered.
-s, --strand Cluster ChIP-seq signals in a strand specific manner. By default the strand specificity is not forced for the regions detection.
-t, --statistics Do not print the I/O statistics file. By default the file is printed.
-x, --xml Do not print the detailed (information rich) xml file. By default the file is printed.
-p, --peaks Do not print peaks that were clustered together to constitute the regions in a file. By default the file is printed.
-l, --lowMemory Cluster ChIP-seq signals into regions with low memory consumption. This option skips the parallel detection of regions and runs it in a slower, sequential and memory-cheap manner. We suggest applying this option in case the --mem option fires and prints a warning message.
--topX Sort regions by score and keep the top of them. The top regions are selected based on the provided threshold.
--topXP Sort regions by score and keep the top percentage of them. The top regions are selected based on the provided percentage threshold.
--sort Sort the regions by score. The default option is N, meaning that no sort is required. Other options are sort regions by ascending score (A) and sort regions by descending score (D).
--reg Discard all regions except the given ones. For multiple regions use a comma-separated string. For very long lists of region names please use the option --regFile.
--regFile Discard all regions except the ones listed in the given file. For multiple regions use a comma-separated string. For more than 20 regions please use this option instead of --reg.
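
The score-based options (--lScore, --hScore, --topX, --topXP) can be pictured with the following Python sketch. It is an illustration under assumptions rather than tfNet's implementation: the function name filter_regions, the dictionary layout, the order in which the options are applied and the rounding up of the percentage are all hypothetical.

```python
import math


def filter_regions(regions, l_score=None, h_score=None, top_x=None, top_xp=None):
    """Keep regions within [l_score, h_score], then the top ones by score.

    regions: list of dicts with a numeric "score" key.
    top_x: keep this many highest-scoring regions.
    top_xp: keep this percentage of highest-scoring regions (rounded up).
    """
    kept = [
        r for r in regions
        if (l_score is None or r["score"] >= l_score)
        and (h_score is None or r["score"] <= h_score)
    ]
    if top_x is not None or top_xp is not None:
        kept.sort(key=lambda r: r["score"], reverse=True)
        if top_xp is not None:
            kept = kept[:math.ceil(len(kept) * top_xp / 100)]
        if top_x is not None:
            kept = kept[:top_x]
    return kept


# Placeholder regions with made-up ids and scores.
example = [
    {"id": "reg1", "score": 10},
    {"id": "reg2", "score": 50},
    {"id": "reg3", "score": 90},
    {"id": "reg4", "score": 70},
]
print([r["id"] for r in filter_regions(example, l_score=20, top_x=2)])
```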

network

Calculates the frequency of each ChIP-seq signal interaction-pair and calls the provided Rscript in order to create the interaction maps.

Input: A single file in bed or xml format that contains a set of detected putative regulatory regions.
Output: A collection of files regarding the interactions of ChIP-seq signals detected in the provided set of regulatory regions:

  • *_cooccuring.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions co-occurring in the regulatory regions.
  • *_cooccuring.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions co-occurring in regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the binomial distribution.
  • *_neighboring.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions neighboring each other in the regulatory regions.
  • *_neighboring.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions neighboring each other in the regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the hypergeometric distribution.
  • *_overlapping.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions overlapping each other in the regulatory regions.
  • *_overlapping.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions overlapping each other in the regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the hypergeometric distribution.
Note: the calculation of the ChIP-seq signal interactions is explained in detail in the publication (Diamanti et al., 2016).

tfNet_xBB.exe network [-i path] [-o path] [--mem integer] [--cols integer] [--tfList path] [--chrInfo path] [-a/--ignoreChrInfo flag] [--fopt character] [--fval double] [--neigh string] [--overlap integer] [--title string] [--Rscript string] [--scr string] [-c/--noR flag]

Required parameters

-i The file containing the set of regions. There should be one file containing all the available regions, preferably the output from the regions verb. If the file is in bed format then the name field of this file should contain the region id, the ChIP-seq signal names and the peak names in the following format: “regID-ChIPseqName1_PeakName,ChIPseqName2_PeakName”. If the file is in xml format then it should be in the format that the regions verb provides. Note: The xml file is suggested.
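
The bed name field format “regID-ChIPseqName1_PeakName,ChIPseqName2_PeakName” can be unpacked into the signal pairs that a region contributes. This Python sketch is only illustrative: the helper names are hypothetical, and it assumes that the region id contains no hyphen and that the first underscore separates signal name from peak name.

```python
from itertools import combinations


def parse_region_name(name):
    """Split 'regID-Sig1_Peak,Sig2_Peak' into (region_id, [(signal, peak), ...])."""
    region_id, sep, members = name.partition("-")
    if not sep or not members:
        raise ValueError(f"unexpected region name: {name!r}")
    parsed = []
    for member in members.split(","):
        signal, _, peak = member.partition("_")
        parsed.append((signal, peak))
    return region_id, parsed


def signal_pairs(name):
    """Return the unordered pairs of distinct signals found in one region."""
    _, members = parse_region_name(name)
    signals = sorted({signal for signal, _ in members})
    return list(combinations(signals, 2))


# Placeholder region name with made-up peak ids.
print(parse_region_name("reg1-CTCF_p1,RAD21_p7,SMC3_p2"))
print(signal_pairs("reg1-CTCF_p1,RAD21_p7,SMC3_p2"))
```

Counting such pairs over all regions gives the kind of pair frequencies that the network verb reports in its csv outputs.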

Optional parameters

-o The full path of the output directory where the file(s) created will be placed. The default value is “tfNet_default”. In case the default value is not changed there will be a new directory created under the parent directory of the -i option provided. In case a different output directory is needed it needs to be of the format: /full/path/to/the/output/directory/name.
--mem Memory percentage to be used by tfNet. The default value is 30%. If you are unsure whether your input data is too large, or if the machine running the software does not have enough memory, use this option to be notified in cases of high memory usage. You can combine this option with -l or --lowMemory (when available) to run tfNet in single-thread mode and prevent it from crashing.
--cols The total number of tab-separated columns you prefer to be in your output files in bed format. The default value is 10 (narrowPeak format). For more information please refer to https://genome.ucsc.edu/FAQ/FAQformat.html.
--tfList A file containing all the names of the ChIP-seq signals that the input files represent. The default list contains the 631 ChIP-seq signals listed in the appendix (Default ChIP-seq signal list). The ChIP-seq signal names in the file should be in a semicolon-separated format. This option is mutually exclusive with -t or --fNameTfName in case the ChIP-seq signal name is the full file name.
--chrInfo A file containing information about the chromosome names of the species and the chromosome length. The default species is human and the chromosome coordinates are from http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml. The file format should be of the following format: chr1,249250621;chr2,243199373;. This option is mutually exclusive with -a or --ignoreChrInfo in case there is no information about the chromosome length or if this is of no interest.
-a, --ignoreChrInfo Ignore chromosome name and length correctness for ChIP-seq signals and the detected regions. Mutually exclusive with --chrInfo.
--fopt The filter method of the ChIP-seq pair interaction. This is based on p values, q values or Bonferroni corrected p values. The default option is b, hence Bonferroni corrected p values. You may also use p for p values and q for q values.
--fval The cutoff threshold for the statistically significant ChIP-signal interactions. It is combined with the --fopt option. The default value is 0.05.
--neigh The distance thresholds so that two ChIP-seq peaks are considered to be neighboring. The distance is based on the -e/--startEnd option. The string should consist of two integers separated by a comma. The first value represents the lower bound value and the second one the upper bound value. The default values are 20 and 60 bp (20,60).
--overlap The distance threshold so that two ChIP-seq peaks are considered to be overlapping. The distance is based on the -e/--startEnd option. The default value is 0bp. In case the value is >0 then all the distances between 0 and the given one are considered as overlapping.
--title The prefix that should be used for the network title in the heatmap pdf files.
--Rscript If Rscript is not installed on your machine as a global variable then you should use this option to provide the full path of where Rscript is located. By default tfNet assumes that Rscript is installed as a global variable.
--scr The full path of the provided R script that generates the heatmap pdf networks. tfNet assumes that the script is located under the same path as the tool. If not, you should provide the full path of where it is located.
-c, --noR Do not run Rscript. Enable this flag in case you are not interested in generating the heatmap pdf files. By default this flag is disabled.
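
The interplay of --overlap, --neigh, --fopt and --fval can be sketched in Python as follows. This is a hedged illustration, not the statistics used by tfNet (those are described in the publication): the function names are hypothetical, the bounds are assumed to be inclusive, and the Bonferroni factor is assumed to be the number of pairs tested.

```python
def classify_distance(distance, overlap=0, neigh=(20, 60)):
    """Label a non-negative peak-to-peak distance (bp).

    Distances up to `overlap` count as overlapping; distances inside
    the `neigh` bounds count as neighboring; anything else is unlabeled.
    """
    if distance <= overlap:
        return "overlapping"
    if neigh[0] <= distance <= neigh[1]:
        return "neighboring"
    return None


def bonferroni_filter(p_values, fval=0.05):
    """Keep pairs whose Bonferroni corrected p value is at most fval.

    p_values: {(signal_a, signal_b): raw p value}.
    """
    n_tests = len(p_values)
    adjusted = {pair: min(1.0, p * n_tests) for pair, p in p_values.items()}
    return {pair: p for pair, p in adjusted.items() if p <= fval}


print(classify_distance(0), classify_distance(35), classify_distance(100))

# Made-up raw p values for three placeholder signal pairs.
raw = {("A", "B"): 0.001, ("A", "C"): 0.02, ("B", "C"): 0.5}
print(bonferroni_filter(raw))
```

With three tested pairs, the raw p value 0.02 becomes 0.06 after correction and is dropped at the default cutoff, so only the pair ("A", "B") remains.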

tfNet

This verb implements the whole pipeline described above. It first runs “peaks”, then “regions” and “filter” on the fly, and finally it runs “network”.

Input: A set of bed files. Each file should represent one ChIP-seq signal and the file name should contain the ChIP-seq signal name.
Output: A collection of files regarding the detected regulatory regions and the interactions of ChIP-seq signals in these regulatory regions:

  • *_peaks.narrowPeak a single file in bed format that contains all the input ChIP-seq signal peaks (check the Appendix for more details about the file format).
  • *_regions.narrowPeak the set of putative regulatory regions in bed format (check the Appendix for more details about the file format).
  • *_regions.xml the set of putative regulatory regions in xml format (check the Appendix for more details about the file format).
  • *_regions_peaks.narrowPeak the set of ChIP-seq signal used for the detected putative regulatory regions (the file format is the same as the output from the peaks verb).
  • *_statistics.csv some basic statistics about the detected putative regulatory regions such as the total number of regions, mean length, regions per chromosome etc. (check the Appendix for more details about the file format).
  • *_cooccuring.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions co-occurring in the regulatory regions.
  • *_cooccuring.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions co-occurring in regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the binomial distribution.
  • *_neighboring.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions neighboring each other in the regulatory regions.
  • *_neighboring.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions neighboring each other in the regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the hypergeometric distribution.
  • *_overlapping.csv_filtered.csv contains the number of pairs of ChIP-seq signal interactions overlapping each other in the regulatory regions.
  • *_overlapping.pdf_filtered.csv the network of pairs of ChIP-seq signal interactions overlapping each other in the regulatory regions. The network is represented by a heatmap. The color intensity of the heatmap tiles is calculated from the Bonferroni corrected p-value of the hypergeometric distribution.

tfNet_xBB.exe tfNet [-i path] [-o path] [--mem integer] [--cols integer] [--tfList path] [--chrInfo path] [-a/--ignoreChrInfo flag] [-v/--noValue flag] [-u/--noSummit flag] [--score integer] [--sValue double] [--pValue double] [--qValue double] [--sort character] [--acc flag] [-t/--fNameTfName flag] [-n/--nPeak flag] [--win integer] [-e/--startEnd flag] [--distance integer] [--chr string] [--tfName string] [--pkName string] [--start integer] [--end integer] [--lScore integer] [--hScore integer] [-s/--strand flag] [-t/--statistics flag] [-x/--xml flag] [-p/--peaks flag] [-l/--lowMemory flag] [--topX integer] [--topXP integer] [--sortR character] [--reg string] [--regFile path] [--fopt character] [--fval double] [--neigh string] [--overlap integer] [--title string] [--Rscript string] [--scr string] [-c/--noR flag]

Required parameters

-i The directory where the set of files representing ChIP-seq signals are available. There should be one file for each ChIP-seq signal and the name of the corresponding ChIP-seq signal should be included in the file name. The file format should be bed.

Optional parameters

-o The full path of the output directory where the file(s) created will be placed. The default value is “tfNet_default”. In case the default value is not changed there will be a new directory created under the parent directory of the -i option provided. In case a different output directory is needed it needs to be of the format: /full/path/to/the/output/directory/name.
--mem Memory percentage to be used by tfNet. The default value is 30%. If you are unsure whether your input data is too large, or if the machine running the software does not have enough memory, use this option to be notified in cases of high memory usage. You can combine this option with -l or --lowMemory (when available) to run tfNet in single-thread mode and prevent it from crashing.
--cols The total number of tab-separated columns you prefer to be in your output files in bed format. The default value is 10 (narrowPeak format). For more information please refer to https://genome.ucsc.edu/FAQ/FAQformat.html.
--tfList A file containing all the names of the ChIP-seq signals that the input files represent. The default list contains the 631 ChIP-seq signals listed in the appendix (Default ChIP-seq signal list). The ChIP-seq signal names in the file should be in a semicolon-separated format. This option is mutually exclusive with -t or --fNameTfName in case the ChIP-seq signal name is the full file name.
--chrInfo A file containing information about the chromosome names of the species and the chromosome length. The default species is human and the chromosome coordinates are from http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/index.shtml. The file should have the following format: chr1,249250621;chr2,243199373;. This option is mutually exclusive with -a or --ignoreChrInfo, which applies when there is no information about the chromosome length or when it is of no interest.
-a, --ignoreChrInfo Ignore chromosome name and length correctness for ChIP-seq signals and the detected regions. Mutually exclusive with --chrInfo.
-v, --noValue Discard any input peaks (records) with a pValue or qValue equal to -1. The default value is false.
-u, --noSummit Discard any input peaks (records) with a summit value equal to -1. The default value is false.
--score Discard any input peaks (records) with a score value lower than the given one. The default value is -1 so that no peak is discarded.
--sValue Discard any input peaks (records) with a signal value lower than the given one. The default value is -1 so that no peak is discarded.
--pValue Discard any input peaks (records) with a p value higher than the given one. The default value is -1 so that no peak is discarded. The input may be also of scientific format (e.g. 5e-10).
--qValue Discard any input peaks (records) with a q value higher than the given one. The default value is -1 so that no peak is discarded. The input may be also of scientific format (e.g. 5e-10).
--sort Sort the ChIP-seq signals after merging them all together. The default option is N, meaning that no sort is required. Other options are sort peaks by start only (S), sort peaks by start+summit (M) and sort peaks by start+middle (P).
--acc The accepted file extensions to be considered for the input. The default value is “narrowPeak”. The list provided should be comma separated.
-t, --fNameTfName Use the file name as the ChIP-seq signal name (extension excluded). This option is mutually exclusive with --tfList, which provides the ChIP-seq signal names in a file.
-n, --nPeak Do not narrow the peaks; use the input peak size instead. By default each peak is narrowed to a window of a few base pairs around its summit or middle point, with the window size given by the --win option.
--win The size of the window around the summit to which each peak is narrowed. The default value is 10bp. This option is mutually exclusive with -n or --nPeak.
-e, --startEnd Create ChIP-seq signal-clusters based on middle-point distances. By default the tool considers summit distances.
--distance The distance threshold between peaks in order to cluster them in the same region. The default value is 300bp.
--chr Discard regions that are not in the given chromosomes. For multiple chromosomes use a comma-separated string. By default all chromosomes are considered.
--tfName Discard regions that do not contain the given list of transcription factors (comma separated). By default all transcription factors are considered.
--pkName Discard regions that do not contain the given list of peaks (comma separated). You need to provide the input peak names. By default all peaks are considered.
--start Discard regions that have a starting position lower than the provided coordinate. By default all regions are considered.
--end Discard regions that have an ending position larger than the provided coordinate. By default all regions are considered.
--lScore Discard regions with a score lower than the provided one. By default all regions are considered.
--hScore Discard regions with a score larger than the provided one. By default all regions are considered.
-s, --strand Cluster ChIP-seq signals in a strand specific manner. By default the strand specificity is not forced for the regions detection.
-t, --statistics Do not print the I/O statistics file. By default the file is printed.
-x, --xml Do not print the detailed (information rich) xml file. By default the file is printed.
-p, --peaks Do not print peaks that were clustered together to constitute the regions in a file. By default the file is printed.
-l, --lowMemory Cluster ChIP-seq signals into regions with low memory consumption. This option skips the parallel detection of regions and instead runs a slower, single-threaded, memory-cheap procedure. We suggest applying this option if the --mem option triggers a warning message.
--topX Sort regions by score and keep the top of them. The top regions are selected based on the provided threshold.
--topXP Sort regions by score and keep the top percentage of them. The top regions are selected based on the provided percentage threshold.
--sortR Sort the regions by score. The default option is N, meaning that no sort is required. Other options are sort regions by ascending score (A) and sort regions by descending score (D).
--reg Discard all regions except the given ones. For multiple regions use a comma-separated string. For very long lists of region names please use the option --regFile.
--regFile Discard all regions except the ones listed in the given file. Multiple regions should be comma separated. For more than 20 regions please use this option instead of --reg.
--fopt The filter method of the ChIP-seq pair interaction. This is based on p values, q values or Bonferroni corrected p values. The default option is b, hence Bonferroni corrected p values. You may also use p for p values and q for q values.
--fval The cutoff threshold for the statistically significant ChIP-seq signal interactions. It is combined with the --fopt option. The default value is 0.05.
--neigh The distance thresholds so that two ChIP-seq peaks are considered to be neighboring. The distance is based on the -e/--startEnd option. The string should consist of two integers separated by a comma. The first value represents the lower bound value and the second one the upper bound value. The default values are 20 and 60 bp (20,60).
--overlap The distance threshold so that two ChIP-seq peaks are considered to be overlapping. The distance is based on the -e/--startEnd option. The default value is 0bp. In case the value is >0 then all the distances between 0 and the given one are considered as overlapping.
--title The prefix that should be used for the network title in the heatmap pdf files.
--Rscript If Rscript is not installed on your machine as a global variable, use this option to provide the full path to where Rscript is located. By default tfNet assumes that Rscript is installed as a global variable.
--scr The full path to the provided R script that generates the heatmap pdf networks. tfNet assumes that the script is located in the same directory as the tool; if it is not, provide the full path to its location.
-c, --noR Do not run Rscript. Enable this flag in case you are not interested in generating the heatmap pdf files. By default this flag is disabled.
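
The --chrInfo file format described above (chr1,249250621;chr2,243199373;) is easy to produce or check with a few lines of Python. The following is only an illustrative sketch; the helper names `parse_chr_info` and `format_chr_info` are hypothetical and not part of tfNet.

```python
def parse_chr_info(text):
    """Parse 'chr1,249250621;chr2,243199373;' into a name -> length dict."""
    lengths = {}
    for entry in text.strip().split(";"):
        entry = entry.strip()
        if not entry:  # skip the empty piece after the trailing ';'
            continue
        name, length = entry.split(",")
        lengths[name.strip()] = int(length)
    return lengths

def format_chr_info(lengths):
    """Write a name -> length dict back in the 'name,length;' format."""
    return "".join(f"{name},{length};" for name, length in lengths.items())

info = parse_chr_info("chr1,249250621;chr2,243199373;")
print(info)  # {'chr1': 249250621, 'chr2': 243199373}
```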

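How the --overlap and --neigh thresholds partition peak pairs can be sketched in Python as follows. This is not tfNet's implementation: the function name `classify_pair` and the label "other" are hypothetical, the bounds are assumed to be inclusive, and the two positions stand for summits by default or middle points when -e/--startEnd is used.

```python
def classify_pair(pos_a, pos_b, overlap=0, neigh=(20, 60)):
    """Label a peak pair by the distance between two reference positions.

    Assumptions: distances from 0 up to `overlap` count as overlapping,
    and distances within the inclusive `neigh` bounds count as neighboring.
    """
    distance = abs(pos_a - pos_b)
    low, high = neigh
    if distance <= overlap:
        return "overlapping"
    if low <= distance <= high:
        return "neighboring"
    return "other"
```

With the default values, two peaks whose summits coincide are labelled as overlapping, summits 30bp apart as neighboring, and summits 10bp apart fall into neither class unless --overlap is raised to at least 10.
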
Citation

Klev Diamanti, Husen M Umer, Marcin Kruczyk, Michał J Dąbrowski, Marco Cavalli, Claes Wadelius, Jan Komorowski (2016). "Maps of context-dependent putative regulatory regions and genomic signal interactions". Nucleic Acids Research 44(19):9110-9120. [link]
