bankers2

A set of tools for creating plotting files from subset counts given in banker's sequence order, i.e. helper scripts for visualising count data across all possible subsets of a sample set.
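For readers unfamiliar with banker's sequence order, here is a minimal Python sketch of the ordering: subsets are listed by increasing size, and subsets of the same size are listed in lexicographic order of sample position. The function name `bankers_order` and the `include_empty` switch are hypothetical and not part of these tools. Whether the tools expect exactly this within-size order, or a line for the empty set, is an assumption that should be checked against your input generator.

```python
from itertools import combinations


def bankers_order(samples, include_empty=False):
    """Yield subsets of `samples` by increasing size.

    Within one size, subsets follow the lexicographic order of the
    sample positions (assumed to match the tools' expectation).
    """
    start = 0 if include_empty else 1
    for size in range(start, len(samples) + 1):
        # combinations() emits tuples in lexicographic order of input positions
        yield from combinations(samples, size)


if __name__ == "__main__":
    # Three hypothetical sample names
    for subset in bankers_order(["A", "B", "C"]):
        print(",".join(subset))
```

For three samples this lists the three single-sample subsets first, then the three pairs, then the full set, so the n-th count line would belong to the n-th subset printed.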

Install

  1. clone the repository
  2. change into the cloned directory
  3. run make
  4. make the resulting binaries of the tools described below available in your $PATH

bankers2circos

This tool generates all the files needed for a circos visualisation of the counts of all possible subsets of a sample set, given an input file in banker's sequence order. bankers2circos assigns colors from one of the Brewer palettes, arranges the subset connection ribbons to minimize overlap, and generates tick marks and tick mark labels at useful intervals. Darker shades of green indicate that more samples share the respective instances; the lightest green, which has no ribbons, marks instances unique to that sample. Optionally, counts of missing instances can be included as well (e.g. missing genotypes when the input was generated by bcftools gtisec, currently only a pull request to bcftools). An example follows the usage information.

Usage / Help message

About:   Create files for a circos plot representing all subsets/intersections
         contained in a banker's sequence order file.

Usage:   bankers2circos [options] <banker's-seq-subsets-file>

Options:
    -p, --prefix <prefix>    prefix for circos file names [default: prefix of input without path]
    -m, --missing            if set, include count of missing genotypes per sample in
                             output, including the circos files, if they are requested
    -h, --help               this help message

Input:
    1) Header line specifying samples: '@SMPS SMP1,SMP2,...' (required)
         This header line needs to start with @SMPS, followed by a tab or a whitespace. 
         Then comes a list of sample names in the order they appear, separated by comma, tab or whitespace.
    2) Count lines just contain one number per line. (required)
         They should appear in banker's sequence order with regard to the @SMPS sample order.
         If one missing count per sample is included, these values are in the first #samples lines.
    3) Comment lines starting with '#'. (optional)
         These lines are meant to document details of how the counts were generated and are ignored.
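The input rules above can be sketched as a small Python reader. This is an illustration only, not the tools' actual parser: the function name `parse_bankers_file` and the `has_missing` flag are hypothetical, and skipping blank lines is an assumption the help text does not state.

```python
import re


def parse_bankers_file(text, has_missing=False):
    """Split input text into (samples, missing_counts, subset_counts)."""
    samples = None
    counts = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment lines are ignored (blank lines: assumption)
        if line.startswith("@SMPS"):
            # sample names may be separated by commas, tabs or spaces
            names = re.split(r"[,\s]+", line[len("@SMPS"):])
            samples = [name for name in names if name]
            continue
        counts.append(int(line))  # count lines hold one number each
    if samples is None:
        raise ValueError("missing @SMPS header line")
    # optional per-sample missing counts occupy the first #samples count lines
    n_missing = len(samples) if has_missing else 0
    return samples, counts[:n_missing], counts[n_missing:]
```

The remaining `subset_counts` are then expected in banker's sequence order with regard to the order of `samples`.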

Example

With circos installed, running the following commands on the file test/view.gtisec.out creates the figure test/view.gtisec.png (displayed below):

# create all the necessary circos files with prefix view.gtisec.
bankers2circos view.gtisec.out
# plot using circos
circos -conf view.gtisec.circos.conf -file view.gtisec

[Figure: test/view.gtisec.png]

bankers2VennDiagram

Given an input file containing all possible subset counts of three to five samples in banker's sequence order, bankers2VennDiagram generates an R file that can be used to create a Venn diagram. It relies on the R packages VennDiagram and RColorBrewer. Colors are assigned automatically from a qualitative ColorBrewer palette, and the default values for ellipse alignment and distances for three to five samples follow the VennDiagram documentation. An example follows the usage information.

Usage / Help message

About:   Create file for a plot representing all subsets/intersections contained in
         a banker's sequence order file, using the R package VennDiagram.

Usage:   bankers2VennDiagram [options] <banker's-seq-subsets-file>

Options:
    -p, --prefix <prefix>    prefix for circos file names [default: prefix of input without path]
    -h, --help               this help message

Input:
    1) Header line specifying samples: '@SMPS SMP1,SMP2,...' (required)
         This header line needs to start with @SMPS, followed by a tab or a whitespace. 
         Then comes a list of sample names in the order they appear, separated by comma, tab or whitespace.
    2) Count lines just contain one number per line. (required)
         They should appear in banker's sequence order with regard to the @SMPS sample order.
         If one missing count per sample is included, these values are in the first #samples lines.
    3) Comment lines starting with '#'. (optional)
         These lines are meant to document details of how the counts were generated and are ignored.
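Because bankers2VennDiagram is limited to three to five samples, a quick sanity check on the number of count lines can catch truncated input early. The Python sketch below assumes that the file lists one count per non-empty subset (2^n - 1 lines for n samples), plus n leading lines if missing counts are included. That assumption is not stated in the help message, and the function name `expected_count_lines` is hypothetical.

```python
def expected_count_lines(n_samples, with_missing=False):
    """Return the number of count lines expected for `n_samples` samples."""
    if not 3 <= n_samples <= 5:
        raise ValueError("bankers2VennDiagram handles three to five samples")
    # assumption: one count per non-empty subset, no line for the empty set
    subsets = 2 ** n_samples - 1
    # optional missing counts add one line per sample
    return subsets + (n_samples if with_missing else 0)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, expected_count_lines(n))
```

Under these assumptions, a five-sample file would carry 31 count lines, or 36 with per-sample missing counts.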

Example

With R and the packages VennDiagram and RColorBrewer installed, running the following commands on test/vd.test.5.gtisec produces test/vd.test.5.pdf:

# generate the plotting file vd.test.5.R, ready for Rscript
bankers2VennDiagram vd.test.5.gtisec
# generate the plot, using Rscript
Rscript vd.test.5.R