diff --git a/README.md b/README.md
index 9f76eb5..028fd4e 100644
--- a/README.md
+++ b/README.md
@@ -2,13 +2,7 @@
 
 <!-- vscode-markdown-toc -->
 * [Introduction](#Introduction)
-* [System Requirements](#SystemRequirements)
-    * [Hardware](#Hardware)
-    * [Operating systems](#Operatingsystems)
-    * [Third-party libraries](#ThirdPartyLibraries)
 * [Installation](#Installation)
-    * [Native build](#NativeBuild)
-    * [From Docker image](#FromDockerImage)
 * [Run Paragraph from VCF](#RunParagraphFromVCF)
     * [Example](#Example)
     * [Input requirements](#InputRequirements)
@@ -38,115 +32,9 @@ Please reference Paragraph using:
 
 Genotyping calls in this paper can be found at [paper-data/download-instructions.txt](paper-data/download-instructions.txt)
 
-## <a name='SystemRequirements'></a>System Requirements
-
-### <a name='Hardware'></a>Hardware
-
-A standard workstation with at least 8GB of RAM should be sufficient for compilation and testing of the program.
-
-### <a name='Operatingsystems'></a>Operating systems
-
-Paragrpah is supported on the following systems:
-
-- Ubuntu 16.04 and CentOS 5-7,
-- macOS 10.11+,
-
-Python 3.4+ is required.
-
-We recommend using g++ (6.0+), or a recent version of Clang.
-
-We use the C++11 standard, any Posix compliant compiler supporting this standard
-should be usable.
-
-### <a name='ThirdPartyLibraries'></a>Third-party libraries
-
-Please check [requirements.txt](requirements) for required python modules.
-
-[Boost libraries](http://www.boost.org) version >= 1.5 is required.
-- We prefer to statically link Boost libraries to Paragraph executables:
-
-  ```bash
-  cd ~
-  wget http://downloads.sourceforge.net/project/boost/boost/1.65.0/boost_1_65_0.tar.bz2
-  tar xf boost_1_65_0.tar.bz2
-  cd boost_1_65_0
-  ./bootstrap.sh
-  ./b2 --prefix=$HOME/boost_1_65_0_install link=static install
-  ```
-
-- To point Cmake to your version of Boost use the `BOOST_ROOT` environment variable:
-
-  ```bash
-  export BOOST_ROOT=$HOME/boost_1_65_0_install
-  ```
-
-We have included copies of other dependent libraries in external/. They are:
-- Google Test and Google Mock (v1.8.0)
-- Htslib (v1.9)
-- Spdlog
-
 ## <a name='Installation'></a>Installation
 
-### <a name='NativeBuild'></a>Native buid
-First, checkout the repository like so:
-
-  ```bash
-  git clone https://github.com/Illumina/paragraph.git
-  cd paragraph-tools
-  ```
-
-  Then create a new directory for the program and compile it there:
-
-  ```bash
-  # Create a separate build folder.
-  cd ..
-  mkdir paragraph-tools-build
-  cd paragraph-tools-build
-
-  # Configure
-  # optional:
-  # export BOOST_ROOT=<path-to-boost-installation>
-  cmake ../paragraph-tools
-  # if this doesn't work, run this instead:
-  # cmake ../paragraph-tools -DCMAKE_CXX_COMPILER=`which g++` -DCMAKE_C_COMPILER=`which gcc` -DBOOST_ROOT=$BOOST_ROOT
-
-  # Make, use -j <n> to use n parallel jobs to build, e.g. make -j4
-  make
-  ```
-
-### <a name='FromDockerImage'></a>From Docker Image
-We also provide a [Dockerfile](Dockerfile). To build a Docker image, run the following command inside the source
- checkout folder:
-
-  ```bash
-  docker build .
-  ```
-
-  Once the image is built you can find out its ID like this:
-
-  ```bash
-  docker images
-  ```
-  ```
-  REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
-  <none>              <none>              54c7d4015330        16 seconds ago      1.76 GB
-  ```
-
-  Check the below section for how to run Paragraph, and execute this before running:
-
-  ```bash
-  sudo docker run -v `pwd`:/data 54c7d4015330
-  ```
-
-  The current directory can be accessed as `/data` inside the Docker container.
-
-  The default entry point is `multigrmpy.py`.
-
-  To override the default entrypoint and get an interactive shell, run:
-
-  ```bash
-  sudo docker run --entrypoint /bin/bash -it 54c7d4015330
-  ```
+Please check [doc/Installation.md](doc/Installation.md) for system requirements and installation instructions.
 
 ## <a name='RunParagraphFromVCF'></a>Run Paragraph from VCF
 ### <a name='Example'></a>Example
@@ -161,7 +49,7 @@ python3 bin/multigrmpy.py -i share/test-data/round-trip-genotyping/candidates.vc
 This runs a simple genotyping example for two test samples.
 
 * **candidates.vcf**: this specifies candidate SV events in a vcf format.
-* **samples.txt**: Manifest that specifies some test BAM files. Tab delimited.
+* **samples.txt**: Manifest that specifies some test BAM files. Tab or comma delimited.
 * **dummy.fa** a short dummy reference which only contains `chr1`
 
 The output folder `test` then contains gzipped json for final genotypes:
diff --git a/RELEASES.md b/RELEASES.md
index 242024a..55549cb 100644
--- a/RELEASES.md
+++ b/RELEASES.md
@@ -1,20 +1,22 @@
 # Paragraph Release Notes / Change Log
 
+# Version 2.2b
+
 | Date Y-m-d | Ticket | Description |
 |------------|---------|----------------------------------------------------------------------|
+| 2019-06-14 | GT-804 | Simplify README and add static build |
+
+# Version 2.2a
+
 | 2019-05-27 | GT-802 | Update license to Apache and fix docker entry |
 
-#Version 2.2
+# Version 2.2
 
-| Date Y-m-d | Ticket | Description |
-|------------|---------|----------------------------------------------------------------------|
 | 2019-05-11 | GT-743 | Update interface and error handling |
 | 2018-12-11 | GT-696 | Fix newlines in validation scripts (public repo already fixed) |
 
 # Version 2.1
 
-| Date Y-m-d | Ticket | Description |
-|------------|---------|----------------------------------------------------------------------|
 | 2018-12-06 | GT-675 | Fix filters and alignment stats. Change depth test threshold on lower end |
 | 2018-11-08 | GT-660 | Optimize GQ for variant genotypes |
 | 2018-11-02 | GT-656 | Improvement for simple SV genotyping |
@@ -24,8 +26,6 @@
 
 # Version 2.0
 
-| Date Y-m-d | Ticket | Description |
-|------------|---------|----------------------------------------------------------------------|
 | 2018-06-27 | GT-490 | Paragraph 2.0 release; disable Poisson depth test by default |
 | 2018-06-27 | GT-495 | Improved output of phasing information and paths |
 | 2018-06-26 | GT-402 | support genotyping on male chrX |
@@ -59,8 +59,6 @@
 
 # Version 1.2
 
-| Date Y-m-d | Ticket | Description |
-|------------|---------|----------------------------------------------------------------------|
 | 2018-04-05 | GT-429 | option to turn off exact and graph aligners in grmpy |
 | 2018-04-05 | GT-428 | upgrade htslib to version 1.8 |
 | 2018-04-04 | GT-427 | GT-427 multigrmpy to generate graph ID if vc2toparagraph does not provide it|
@@ -81,8 +79,6 @@
 
 # Version 1.1
 
-| Date Y-m-d | Ticket | Description |
-|------------|---------|----------------------------------------------------------------------|
 | 2018-02-21 | GT-374 | support for read-level validation |
 | 2018-02-19 | GT-379 | configure tool for installation |
 | 2018-02-15 | GT-373 | Speedup bam processing by keeping the file open between the graphs |
diff --git a/data/download-instructions.txt b/data/download-instructions.txt
deleted file mode 100644
index da42ca2..0000000
--- a/data/download-instructions.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-Please use the following S3 link to download the output VCF from Paragraph manuscript:
-
-Genotypes of HG002 Long-read ground truth (LRGT) SVs on the Illumina HiSeq X 34.5x bam (VCF format):
-https://s3-us-west-1.amazonaws.com/paragraph-paper-data/hg002_sniffles_ccs.paragraph.vcf.gz
-
-
-HG002 Long-read ground truth (LRGT) SVs on 100 individuals from Polaris (JSON format):
-Site only:
-https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sniffles_ccs_polaris.filtered.autosome.del_ins.json.gz
-
-Genotypes included:
-https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sniffles_ccs_polaris.json.gz
-
-Sample name map (S3 ID to regular ID):
-https://s3-us-west-1.amazonaws.com/paragraph-paper-data/sample_map.txt
-
diff --git a/doc/Installation.md b/doc/Installation.md
new file mode 100644
index 0000000..225d9a7
--- /dev/null
+++ b/doc/Installation.md
@@ -0,0 +1,127 @@
+# Installation of Paragraph
+
+* [System Requirements](#SystemRequirements)
+    * [Hardware](#Hardware)
+    * [Operating systems](#Operatingsystems)
+    * [Third-party libraries](#ThirdPartyLibraries)
+* [Static Build](#StaticBuild)
+* [Installation](#Installation)
+    * [Native build](#NativeBuild)
+    * [From Docker image](#FromDockerImage)
+
+## <a name='SystemRequirements'></a>System Requirements
+
+### <a name='Hardware'></a>Hardware
+
+A standard workstation with at least 8GB of RAM should be sufficient for compilation and testing of the program.
+
+### <a name='Operatingsystems'></a>Operating systems
+
+Paragraph is supported on the following systems:
+
+- Ubuntu 16.04 and CentOS 5-7
+- macOS 10.11+
+
+Python 3.6+ is required.
+
+We recommend using g++ (6.0+) or a recent version of Clang.
+
+We use the C++11 standard; any POSIX-compliant compiler supporting this standard
+should be usable.
+
+### <a name='ThirdPartyLibraries'></a>Third-party libraries
+
+See [requirements.txt](../requirements.txt) for the required Python modules.
+
+We have included copies of other dependent libraries in external/. They are:
+- Google Test and Google Mock (v1.8.0)
+- Htslib (v1.9)
+- Spdlog
+
+## <a name='StaticBuild'></a>Static Build
+
+We provide a static build that works on Linux systems with GCC 5.2+. No installation is required for the static build.
+
+Download the static build from the Releases page of the GitHub repository.
+
+## <a name='Installation'></a>Installation
+
+### <a name='NativeBuild'></a>Native build
+
+[Boost libraries](http://www.boost.org) version >= 1.5 is required.
+- We prefer to statically link Boost libraries to Paragraph executables:
+
+  ```bash
+  cd ~
+  wget http://downloads.sourceforge.net/project/boost/boost/1.65.0/boost_1_65_0.tar.bz2
+  tar xf boost_1_65_0.tar.bz2
+  cd boost_1_65_0
+  ./bootstrap.sh
+  ./b2 --prefix=$HOME/boost_1_65_0_install link=static install
+  ```
+
+- To point CMake to your version of Boost, use the `BOOST_ROOT` environment variable:
+
+  ```bash
+  export BOOST_ROOT=$HOME/boost_1_65_0_install
+  ```
+
+Once Boost is installed, check out the repository into a `paragraph-tools` folder:
+
+  ```bash
+  git clone https://github.com/Illumina/paragraph.git paragraph-tools
+  cd paragraph-tools
+  ```
+
+  Then create a new directory for the program and compile it there:
+
+  ```bash
+  # Create a separate build folder.
+  cd ..
+  mkdir paragraph-tools-build
+  cd paragraph-tools-build
+
+  # Configure
+  # optional:
+  # export BOOST_ROOT=<path-to-boost-installation>
+  cmake ../paragraph-tools
+  # if this doesn't work, run this instead:
+  # cmake ../paragraph-tools -DCMAKE_CXX_COMPILER=`which g++` -DCMAKE_C_COMPILER=`which gcc` -DBOOST_ROOT=$BOOST_ROOT
+
+  # Make, use -j <n> to use n parallel jobs to build, e.g. make -j4
+  make
+  ```
+
+### <a name='FromDockerImage'></a>From Docker Image
+We also provide a [Dockerfile](../Dockerfile). To build a Docker image, run the following command inside the source
+ checkout folder:
+
+  ```bash
+  docker build .
+  ```
+
+  Once the image is built you can find out its ID like this:
+
+  ```bash
+  docker images
+  ```
+  ```
+  REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+  <none>              <none>              54c7d4015330        16 seconds ago      1.76 GB
+  ```
+
+  See [Run Paragraph from VCF](../README.md#RunParagraphFromVCF) in the README for how to run Paragraph, and start the container with:
+
+  ```bash
+  sudo docker run -v `pwd`:/data 54c7d4015330
+  ```
+
+  The current directory can be accessed as `/data` inside the Docker container.
+
+  The default entry point is `multigrmpy.py`.
+
+  To override the default entrypoint and get an interactive shell, run:
+
+  ```bash
+  sudo docker run --entrypoint /bin/bash -it 54c7d4015330
+  ```
\ No newline at end of file
diff --git a/src/python/bin/multigrmpy.py b/src/python/bin/multigrmpy.py
index 96891c6..dd4a8df 100644
--- a/src/python/bin/multigrmpy.py
+++ b/src/python/bin/multigrmpy.py
@@ -326,11 +326,11 @@ def run(args):
             line = line.rstrip()
             if line.startswith('#'):
                 line = line[1:]
-            f = line.split('\t')
+            fields = re.split('\t|,', line)
             if id_index == -1:
-                id_index = f.index("id")
+                id_index = fields.index("id")
                 continue
-            sample_names.append(f[id_index])
+            sample_names.append(fields[id_index])
     if args.input.endswith("vcf") or args.input.endswith("vcf.gz"):
         grmpyOutput = vcfupdate.read_grmpy(result_json_path)
         result_vcf_path = os.path.join(args.output, "genotypes.vcf.gz")
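
The `multigrmpy.py` hunk switches manifest parsing from `line.split('\t')` to `re.split('\t|,', line)`, which is what lets `samples.txt` be tab or comma delimited. Below is a minimal standalone sketch of that header-then-rows loop, assuming the manifest's first line is a header containing an `id` column (optionally prefixed with `#`). The function name `read_sample_names` is hypothetical and not part of Paragraph.

```python
import io
import re


def read_sample_names(manifest):
    """Hypothetical helper mirroring the patched loop in multigrmpy.py.

    The first line is treated as the header (a leading '#' is stripped);
    every later line contributes the value in its "id" column.
    """
    sample_names = []
    id_index = -1
    for line in manifest:
        line = line.rstrip()
        if line.startswith('#'):
            line = line[1:]
        # Split on either a tab or a comma, as the patched code does.
        fields = re.split('\t|,', line)
        if id_index == -1:
            # Header row: remember which column holds "id", then move on.
            id_index = fields.index("id")
            continue
        sample_names.append(fields[id_index])
    return sample_names


tab_manifest = io.StringIO("id\tpath\nsample1\ts1.bam\nsample2\ts2.bam\n")
comma_manifest = io.StringIO("#id,path\nsample1,s1.bam\nsample2,s2.bam\n")
print(read_sample_names(tab_manifest))    # ['sample1', 'sample2']
print(read_sample_names(comma_manifest))  # ['sample1', 'sample2']
```

Because both characters act as separators, a field value such as a BAM path can no longer contain a literal comma or tab without being split.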