diff --git a/docs/pgs/faq.md b/docs/pgs/faq.md
new file mode 100644
index 00000000..ba6310c9
--- /dev/null
+++ b/docs/pgs/faq.md
@@ -0,0 +1,16 @@
+# Frequently Asked Questions
+
+## Can I use the Polygenic Score Calculation extension without an email address?
+Yes, the extension can also be used with a username only. However, without an email address, notifications are not sent and access to genotyped data may be limited.
+
+## Extending the expiration date or resetting the download counter
+Your data is available for 7 days. In case you need an extension, please let [us](/contact) know.
+
+## How can I improve the download speed?
+[aria2](https://aria2.github.io/) tries to utilize your maximum download bandwidth. Please remember to raise the minimum split size significantly (`-k, --min-split-size=SIZE`); otherwise you will hit the Michigan Imputation Server download limit for each file (thanks to Anthony Marcketta for pointing this out).
+
+## Can I download all results at once?
+We provide wget commands for all results. Please open the results tab: the last column in each row includes direct links to all files.
+
+## Can I perform PGS calculation locally?
+Imputation Server uses a standalone tool called pgs-calc. It reads the imputed dosages from VCF files and uses them to calculate scores. It supports imputed genotypes from Michigan Imputation Server or TOPMed Imputation Server out of the box, as well as score files from PGS Catalog or PRSWeb instances. In addition, custom score files containing chromosomal positions, both alleles and the effect size can be used easily. pgs-calc uses the chromosomal positions and alleles to find the corresponding dosages in the genotype files, but also provides tools to resolve rsIDs in score files using dbSNP. Therefore, it can be applied to genotype files with variants that were not annotated with rsIDs. Moreover, the standalone version provides options to improve coverage by using the provided proxy mapping file for Europeans or a custom population-specific mapping file. pgs-calc is available at https://github.com/lukfor/pgs-calc.
\ No newline at end of file
diff --git a/docs/pgs/getting-started.md b/docs/pgs/getting-started.md
new file mode 100644
index 00000000..27c93cf9
--- /dev/null
+++ b/docs/pgs/getting-started.md
@@ -0,0 +1,109 @@
+# Polygenic Score Calculation
+
+We provide a user-friendly web interface to apply thousands of published polygenic risk scores to imputed genotypes in an efficient way.
+By extending the popular Michigan Imputation Server, the module integrates seamlessly into the existing imputation workflow and enables users without prior knowledge in this field to take advantage of the method.
+The graphical report includes all metadata about the scores in a single place and helps users to understand and screen thousands of scores in an intuitive way.
+
+![pipeline.png](images/pipeline.png)
+
+An extensive quality control pipeline is executed automatically to detect and fix possible strand flips and to filter out missing SNPs in order to prevent systematic errors (e.g. lower scores for individuals with missing or wrongly aligned genetic data).
+
+## Getting started
+
+To utilize the Polygenic Score Calculation extension on ImputationServer, you must first [register](https://imputationserver.sph.umich.edu/index.html#!pages/register) for an account.
+An activation email will be sent to the provided address. Once your email address is verified, you can access the service at no cost.
+
+**Please note that the extension can also be used with a username without an email. However, without an email, notifications are not sent, and access to genotyped data may be limited.**
+
+No dataset at hand? No problem, download our example dataset to test the PGS extension: [50-samples.zip](https://imputationserver.sph.umich.edu/downloads/50-samples.zip).
+
+
+When incorporating the Polygenic Score Calculation extension in your research, please cite the following papers:
+
+> Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze S, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. [Next-generation genotype imputation service and methods](https://www.ncbi.nlm.nih.gov/pubmed/27571263). Nature Genetics 48, 1284–1287 (2016).
+
+> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021).
+
+## Setting up your first Polygenic Score Calculation job
+
+1. [Log in](https://imputationserver.sph.umich.edu/index.html#!pages/login) with your credentials and navigate to the **Run** tab to initiate a new Polygenic Score Calculation job.
+2. Click on **"Polygenic Score Calculation"**; the submission dialog appears.
+3. The submission dialog allows you to specify job properties.
+
+![](images/submit-job01.png)
+
+The following options are available:
+
+
+### Reference Panel
+
+Our PGS extension offers genotype imputation from different reference panels. The most accurate and largest panel is **HRC (Version r1.1 2016)**. Please select one that fulfills your needs and supports the population of your input data:
+
+- HRC (Version r1.1 2016)
+- 1000 Genomes Phase 3 (Version 5)
+- 1000 Genomes Phase 1 (Version 3)
+- HapMap 2
+
+More details about all available reference panels can be found [here](/pgs/reference-panels/).
+
+### Upload VCF files from your computer
+
+When using the file upload, data is uploaded from your local file system to Michigan Imputation Server. By clicking on **Select Files**, a file dialog appears where you can select your VCF files:
+
+![](images/upload-data01.png)
+
+Multiple files can be selected using the `ctrl`, `cmd` or `shift` keys, depending on your operating system.
+After you have confirmed your choice, all selected files are listed in the submission dialog:
+
+![](images/upload-data02.png)
+
+Please make sure that all files fulfill the [requirements](/prepare-your-data).
+
+
+!!! important
+    Since version 1.7.2, URL-based uploads (sftp and http) are no longer supported. Please use direct file uploads instead.
+
+### Build
+Please select the build of your data. Currently, the options **hg19** and **hg38** are supported. Michigan Imputation Server automatically updates the genome positions (liftOver) of your data. All reference panels are based on hg19 coordinates.
+
+### Scores and Trait Category
+
+Choose the precomputed Polygenic Score repository relevant to your study from the available options. Based on the selected repository, different trait categories appear and can be selected (e.g. Cancer scores):
+
+ ![](images/pgs-repository.png)
+
+More details about all available PGS repositories can be found [here](/pgs/scores/).
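For readers curious how the selected trait category is applied, this pull request translates the selection into an include-list of score IDs before imputation starts (see the `FilterMetaCommand` usage added to `Imputation.java`). The sketch below mirrors that call in isolation; the category label, the meta file name and the output path are placeholders, and the meta file is the JSON score metadata shipped with the selected PGS repository.

```java
import genepi.riskscore.commands.FilterMetaCommand;

public class TraitCategoryFilterExample {

	public static void main(String[] args) throws Exception {
		// Filter the score meta file down to a single trait category and write
		// the resulting include-list of score IDs, mirroring Imputation.java.
		// "Cancer", the meta file and the output path are placeholders.
		FilterMetaCommand filter = new FilterMetaCommand();
		filter.setCategory("Cancer");
		filter.setMeta("pgs-catalog.meta.json");
		filter.setOut("include-scores.txt");

		int result = filter.call();
		System.out.println("FilterMetaCommand finished with exit code " + result);
	}
}
```

In the server workflow this include-list is then uploaded to HDFS and passed to the per-chromosome imputation jobs, which restrict the PGS calculation to the selected scores.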
+
+### Ancestry Estimation
+
+You can enable ancestry estimation by selecting a reference population that is used to classify your uploaded samples. Currently, we support a worldwide panel based on HGDP.
+
+## Start Polygenic Score Calculation
+
+After agreeing to the *Terms of Service*, initiate the calculation by clicking on **Submit job**. The system will perform Input Validation and Quality Control immediately. If your data passes these steps, the job is added to the queue for processing.
+
+ ![](images/queue01.png)
+
+## Monitoring and Retrieving Results
+
+- **Input Validation**: Verify the validity of your uploaded files and review basic statistics.
+
+ ![](images/input-validation01.png)
+
+- **Quality Control**: Examine the QC report and download statistics after the system filters variants based on various criteria.
+
+ ![](images/quality-control02.png)
+
+- **Polygenic Score Calculation**: Monitor the progress of the imputation and polygenic score calculation in real time for each chromosome.
+
+ ![](images/imputation01.png)
+
+## Downloading Results
+
+Upon completion, you will be notified by email if you entered your address during registration. A zip archive containing the results can be downloaded directly from the server.
+
+ ![](images/job-results.png)
+
+Click on the filename to download results directly via a web browser. For command line downloads, use the **share** symbol to obtain private links.
+
+**Important**: All data is automatically deleted after 7 days. Please download the data you need within this timeframe. A reminder is sent 48 hours before data deletion.
diff --git a/docs/pgs/images/imputation01.png b/docs/pgs/images/imputation01.png
new file mode 100644
index 00000000..e7e52ad1
Binary files /dev/null and b/docs/pgs/images/imputation01.png differ
diff --git a/docs/pgs/images/input-validation01.png b/docs/pgs/images/input-validation01.png
new file mode 100644
index 00000000..379673e8
Binary files /dev/null and b/docs/pgs/images/input-validation01.png differ
diff --git a/docs/pgs/images/pgs-repository.png b/docs/pgs/images/pgs-repository.png
new file mode 100644
index 00000000..76c12f7e
Binary files /dev/null and b/docs/pgs/images/pgs-repository.png differ
diff --git a/docs/pgs/images/pipeline.png b/docs/pgs/images/pipeline.png
new file mode 100644
index 00000000..0a675747
Binary files /dev/null and b/docs/pgs/images/pipeline.png differ
diff --git a/docs/pgs/images/quality-control02.png b/docs/pgs/images/quality-control02.png
new file mode 100644
index 00000000..5495fb23
Binary files /dev/null and b/docs/pgs/images/quality-control02.png differ
diff --git a/docs/pgs/images/report-01.png b/docs/pgs/images/report-01.png
new file mode 100644
index 00000000..4f3ee91d
Binary files /dev/null and b/docs/pgs/images/report-01.png differ
diff --git a/docs/pgs/images/report-02.png b/docs/pgs/images/report-02.png
new file mode 100644
index 00000000..53a41ca3
Binary files /dev/null and b/docs/pgs/images/report-02.png differ
diff --git a/docs/pgs/images/submit-job01.png b/docs/pgs/images/submit-job01.png
new file mode 100644
index 00000000..9a49433c
Binary files /dev/null and b/docs/pgs/images/submit-job01.png differ
diff --git a/docs/pgs/images/upload-data01.png b/docs/pgs/images/upload-data01.png
new file mode 100644
index 00000000..5c7d7a8f
Binary files /dev/null and b/docs/pgs/images/upload-data01.png differ
diff --git a/docs/pgs/images/upload-data02.png b/docs/pgs/images/upload-data02.png
new file mode 100644
index 00000000..0f9fa025
Binary files /dev/null and b/docs/pgs/images/upload-data02.png
differ
diff --git a/docs/pgs/output-files.md b/docs/pgs/output-files.md
new file mode 100644
index 00000000..2b362c45
--- /dev/null
+++ b/docs/pgs/output-files.md
@@ -0,0 +1,38 @@
+# Output Files
+
+The Polygenic Score Calculation results CSV file provides Polygenic Score (PGS) values for the uploaded samples and the associated score identifiers.
+Users can leverage this CSV file to analyze and compare Polygenic Score values across different samples. The data facilitates the investigation of genetic associations and their impact on specific traits or conditions.
+
+## CSV Format
+
+The CSV file consists of a header row and data rows:
+
+### Header Row
+
+- **sample**: Represents the identifier for each sample.
+- **PGS000001, PGS000002, PGS000003, ...**: Columns representing the Polygenic Score values associated with the respective score identifiers.
+
+### Data Rows
+
+- Each row corresponds to a sample and provides the following information:
+    - **sample**: Identifier for the sample.
+    - **PGS000001, PGS000002, PGS000003, ...**: Polygenic Score values associated with the respective score identifiers for the given sample.
+
+### Example
+
+Here is an example (header plus one data row):
+
+```csv
+sample, PGS000001, PGS000002, PGS000003, ...
+sample1, -4.485780284301654, 4.119604924228042, 0.0, -4.485780284301654
+```
+
+- **sample1**: Sample identifier.
+    - **-4.485780284301654**: Polygenic Score value for `PGS000001`.
+    - **4.119604924228042**: Polygenic Score value for `PGS000002`.
+    - **0.0**: Polygenic Score value for `PGS000003`.
+
+**Note:**
+
+- Polygenic Score values are provided as floating-point numbers.
+- A value of `0.0` indicates that no Polygenic Score information was available for that score in the given sample.
diff --git a/docs/pgs/pipeline.md b/docs/pgs/pipeline.md
new file mode 100644
index 00000000..133e6ea2
--- /dev/null
+++ b/docs/pgs/pipeline.md
@@ -0,0 +1,11 @@
+# Pipeline
+
+![pipeline.png](images/pipeline.png)
+
+
+
+
+
+
+## Ancestry estimation
+We use LASER to perform a principal component analysis (PCA) based on the genotypes of each sample and to place each sample into a reference PCA space that was constructed from a set of reference individuals [14]. We built the reference coordinates from 938 samples of the Human Genome Diversity Project (HGDP) [15] and labeled them with the ancestry categories proposed by the GWAS Catalog [16], which are also used in the PGS Catalog.
\ No newline at end of file
diff --git a/docs/pgs/reference-panels.md b/docs/pgs/reference-panels.md
new file mode 100644
index 00000000..e2dbd3d6
--- /dev/null
+++ b/docs/pgs/reference-panels.md
@@ -0,0 +1,45 @@
+# Reference Panels for PGS Calculation
+
+Our server offers PGS calculation from the following reference panels:
+
+
+## HRC (Version r1.1 2016)
+
+The HRC panel consists of 64,940 haplotypes of predominantly European ancestry.
+
+| | |
+|---|---|
+| Number of Samples | 32,470 |
+| Sites (chr1-22) | 39,635,008 |
+| Chromosomes | 1-22, X |
+| Website | [http://www.haplotype-reference-consortium.org](http://www.haplotype-reference-consortium.org); [HRC r1.1 Release Note](https://imputationserver.sph.umich.edu/start.html#!pages/hrc-r1.1) |
+
+## 1000 Genomes Phase 3 (Version 5)
+
+Phase 3 of the 1000 Genomes Project consists of 5,008 haplotypes from 26 populations across the world.
+
+| | |
+|---|---|
+| Number of Samples | 2,504 |
+| Sites (chr1-22) | 49,143,605 |
+| Chromosomes | 1-22, X |
+| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) |
+
+
+## 1000 Genomes Phase 1 (Version 3)
+
+| | |
+|---|---|
+| Number of Samples | 1,092 |
+| Sites (chr1-22) | 28,975,367 |
+| Chromosomes | 1-22, X |
+| Website | [http://www.internationalgenome.org](http://www.internationalgenome.org) |
+
+## HapMap 2
+
+| | |
+|---|---|
+| Number of Samples | 60 |
+| Sites (chr1-22) | 2,542,916 |
+| Chromosomes | 1-22 |
+| Website | [http://www.hapmap.org](http://www.hapmap.org) |
diff --git a/docs/pgs/report.md b/docs/pgs/report.md
new file mode 100644
index 00000000..c899222c
--- /dev/null
+++ b/docs/pgs/report.md
@@ -0,0 +1,14 @@
+# Interactive Report
+
+The generated report contains a list of all scores, where each score is colored according to its coverage. Green indicates that the coverage is very high and nearly all SNPs from the score were also found in the imputed dataset. Red indicates that very few SNPs were found and the coverage is therefore low.
+
+![report.png](images/report-01.png)
+
+In addition, the report includes detailed metadata for each score, such as the number of variants, the number of well-imputed genotypes and the population used to construct the score. A direct link to PGS Catalog, Cancer PRSWeb or ExPRSWeb is also available for further investigation (e.g. for information about the method that was used to construct the score). Further, the report displays the distribution of the scores of all uploaded samples and can be explored interactively. This allows users to immediately detect samples with either a high or a low risk.
+
+Moreover, the report gives an overview of all estimated ancestries from the uploaded genotypes and compares them with the populations of the GWAS that was used to create the score.
+
+![report.png](images/report-02.png)
+
+
+If an uploaded sample with an unsupported population is detected, a warning message is provided and the sample is excluded from the summary statistics.
diff --git a/docs/pgs/scores.md b/docs/pgs/scores.md
new file mode 100644
index 00000000..0493c05b
--- /dev/null
+++ b/docs/pgs/scores.md
@@ -0,0 +1,21 @@
+# Scores
+
+We currently support the following PGS repositories out of the box:
+
+## PGS-Catalog
+
+We use PGS Catalog as the source of scores for PGS Server (version 19 Jan 2023). The PGS Catalog is an online database that collects and annotates published scores and currently provides access to over 3,900 scores encompassing more than 580 traits.
+
+> Samuel A. Lambert, Laurent Gil, Simon Jupp, Scott C. Ritchie, Yu Xu, Annalisa Buniello, Aoife McMahon, Gad Abraham, Michael Chapman, Helen Parkinson, John Danesh, Jacqueline A. L. MacArthur and Michael Inouye. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. doi: 10.1038/s41588-021-00783-5 (2021).
+
+## Cancer-PRSweb
+
+Collection of scores for major cancer traits.
+
+> Fritsche LG, Patil S, Beesley LJ, VandeHaar P, Salvatore M, Ma Y, Peng RB, Taliun D, Zhou X, Mukherjee B: Cancer PRSweb: An Online Repository with Polygenic Risk Scores for Major Cancer Traits and Their Evaluation in Two Independent Biobanks. Am J Hum Genet 2020, 107(5):815-836.
+
+## ExPRSweb
+
+Collection of scores for common health-related exposures like body mass index or alcohol consumption.
+ +> Ma Y, Patil S, Zhou X, Mukherjee B, Fritsche LG: ExPRSweb: An online repository with polygenic risk scores for common health-related exposures. Am J Hum Genet 2022, 109(10):1742-1760. diff --git a/files/imputationserver-pgs.yaml b/files/imputationserver-pgs.yaml index 11dd8649..b3a865f8 100644 --- a/files/imputationserver-pgs.yaml +++ b/files/imputationserver-pgs.yaml @@ -1,8 +1,10 @@ id: imputationserver-pgs -name: Genotype Imputation (PGS Calc Integration) -description: This is the new Michigan Imputation Server Pipeline using Minimac4. Documentation can be found here.

If your input data is GRCh37/hg19 please ensure chromosomes are encoded without prefix (e.g. 20).
If your input data is GRCh38hg38 please ensure chromosomes are encoded with prefix 'chr' (e.g. chr20). +name: Polygenic Score Calculation +description: "You can upload genotyped data and the application imputes your genotypes, performs ancestry estimation and finally calculates Polygenic Risk Scores.

No dataset at hand? No problem, download our example dataset: 50-samples.zip

" + + version: 1.8.0 -website: https://imputationserver.readthedocs.io +website: https://imputationserver.readthedocs.io/en/latest/pgs/getting-started category: installation: @@ -53,11 +55,13 @@ workflow: generates: $local $outputimputation $logfile $hadooplogs binaries: ${app_hdfs_folder}/bin +#if( $reference != "disabled") - name: Ancestry Estimation jar: imputationserver.jar classname: genepi.imputationserver.steps.ancestry.TraceStep binaries: ${app_hdfs_folder}/bin references: ${app_hdfs_folder}/references +#end - name: Data Compression and Encryption jar: imputationserver.jar @@ -95,6 +99,7 @@ workflow: 0.1: 0.1 0.2: 0.2 0.3: 0.3 + visible: false - id: phasing description: Phasing @@ -103,14 +108,13 @@ workflow: values: eagle: Eagle v2.4 (phased output) no_phasing: No phasing + visible: false - id: population description: Population - type: list - values: - bind: refpanel - property: populations - category: RefPanel + value: mixed + type: text + visible: false - id: mode description: Mode @@ -120,6 +124,7 @@ workflow: qconly: Quality Control Only imputation: Quality Control & Imputation phasing: Quality Control & Phasing Only + visible: false - id: aesEncryption description: AES 256 encryption @@ -129,7 +134,7 @@ workflow: values: true: yes false: no - visible: true + visible: false - id: meta description: Generate Meta-imputation file @@ -138,7 +143,7 @@ workflow: values: true: yes false: no - visible: true + visible: false - id: myseparator0 type: separator @@ -154,14 +159,24 @@ workflow: required: true category: PGSPanel + - id: pgsCategory + description: Trait Category + type: list + values: + bind: pgsPanel + property: categories + category: PGSPanel + - id: reference - description: Reference Populations + description: Ancestry Estimation type: list required: true - value: HGDP_938_genotyped + value: disabled values: + disabled: "Disabled" HGDP_938_genotyped: Worldwide (HGDP) - HGDP_938_imputed: Worldwide (imputed HGDP) + #HGDP_938_imputed: Worldwide (imputed HGDP) + visible: true - id: dim description: Number of principal components to compute diff --git a/mkdocs.yml b/mkdocs.yml index 9f4f4b8e..40350353 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -3,12 +3,21 @@ theme: readthedocs nav: - Home: index.md -- Getting Started: getting-started.md -- Data Preparation: prepare-your-data.md -- Reference Panels: reference-panels.md -- Pipeline Overview: pipeline.md -- Security: data-sensitivity.md -- FAQ: faq.md +- Genotype Imputation: + - Getting Started: getting-started.md + - Data Preparation: prepare-your-data.md + - Reference Panels: reference-panels.md + - Pipeline Overview: pipeline.md + - Security: data-sensitivity.md + - FAQ: faq.md +- Polygenic Score Calculation: + - Getting Started: pgs/getting-started.md + - Interactive Report: pgs/report.md + - Output Files: pgs/output-files.md + - Reference Panels: pgs/reference-panels.md + - Available Scores: pgs/scores.md + - Pipeline Overview: pgs/pipeline.md + - FAQ: pgs/faq.md - Developer Documentation: - API: api.md - Docker: docker.md diff --git a/pages/home.stache b/pages/home.stache index ef17f0e2..07f7875a 100755 --- a/pages/home.stache +++ b/pages/home.stache @@ -2,15 +2,15 @@

Michigan Imputation Server

- Free Next-Generation Genotype Imputation Service + Free Next-Generation Genotype Imputation Platform

- <% if(!loggedIn) {%> + {{#is(loggedIn, false)}}


- Sign up now  - Login + Sign up now  + Login

- <% } %> + {{/is}}
@@ -22,35 +22,123 @@

- <%= counter.attr('complete.chromosomes') ? (Math.round(counter.attr('complete.chromosomes') / 22.0 /1000.0/1000.0 * 10) / 10).toLocaleString() : '0'%> M
Imputed Genomes + {{div(counter.complete.chromosomes, 22000000)}}M
Imputed Genomes

- <%= Math.round(counter.attr('users')).toLocaleString() %>
Registerd Users + {{counter.users}}
Registered Users

- > 100
Published GWAS + {{#counter.running.runs}}{{.}}{{else}}0{{/counter.running.runs}}
Running Jobs

+ + +
+
+ +
+
+
+

Genotype Imputation

+
+

+ You can upload genotyping data and the application imputes your genotypes against different reference panels. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+ +
+
+
+
+

HLA Imputation

+
+

+ Enables accurate prediction of human leukocyte antigen (HLA) genotypes from your uploaded genotyping data using multi-ancestry reference panels. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+
+
+
+
+

Polygenic Score Calculation

+
+

+ You can upload genotyping data and the application imputes your genotypes, performs ancestry estimation and finally calculates Polygenic Risk Scores. +

+

+ {{#is(loggedIn, false)}} + Run + {{ else }} + Run + {{/is}} +   Learn more +

+
+
+
+ +
+
+
+ +

Latest News

+
-
Latest News
-
+

+ 21 May 2021
+ We have increased the max sample size to 110k. +

+ 15 April 2021
+ Update to new framework completed! Currently, max sample size will be limited to 25k, but we expect to lift this limitation in the next few weeks. +

+

+ 18 March 2020
+ Due to coronavirus-related impacts support may be slower than usual. If you haven't heard back from us after a week or so, feel free to e-mail again to check on the status of things. Take care! +

+ 07 November 2019
+ Updated MIS to v1.2.4! Major improvements: Minimac4 for imputation, improved chrX support, QC check right after upload, better documentation. Check out our GitHub repository for further information.

+

+ 17 October 2019
+ Michigan Imputation Server at ASHG19. All information is available here. +

+ +

+ 27 November 2018
+ Redesigned user interface to improve user experience. +

+

27 June 2017
Updated pipeline to v1.0.2. Release notes can be found here.

@@ -58,17 +146,13 @@ 29 Aug 2016
Imputation server paper is out now: Das et al., Nature Genetics 2016

-

- 14 June 2016
- Supporting 23andMe input format. -

19 April 2016
Updated HRC Panel (r1.1) available.

12 January 2016
- New Reference Panel (CAAPA) available. + New Reference Panel (CAAPA) available.

24 April 2015
@@ -80,7 +164,9 @@

- <% !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); %> +
@@ -103,19 +189,19 @@
-

+

Upload your genotypes to our server located in Michigan.
All interactions with the server are secured.
-

+

Choose a reference panel. We will take care of pre-phasing and imputation.
-

+

Download the results.
All results are encrypted with a one-time password. After 7 days, all results are deleted from our server.
@@ -127,7 +213,7 @@
-

Up to date and accurate reference panels

+

Wide-range of reference panels supported

@@ -183,17 +269,17 @@

- +

- Imputation Server is open source and easy to install on your own Hadoop cluster. + Imputation Server is open source and easy to install on your own Hadoop cluster or use Docker.

- +

Host your own confidential reference panels in a secure and private environment. @@ -203,7 +289,7 @@

- +

You have full control about the service. Write us to get more information. diff --git a/pages/images/github.png b/pages/images/github.png new file mode 100644 index 00000000..32d55610 Binary files /dev/null and b/pages/images/github.png differ diff --git a/pom.xml b/pom.xml index 032c9684..fa62633c 100644 --- a/pom.xml +++ b/pom.xml @@ -296,7 +296,7 @@ lukfor pgs-calc - 1.5.5 + 1.6.1 diff --git a/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java b/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java index a5020b9d..b4c6ddce 100644 --- a/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java +++ b/src/main/java/genepi/imputationserver/steps/CompressionEncryption.java @@ -106,159 +106,160 @@ public boolean run(WorkflowContext context) { context.beginTask("Export data..."); - // get sorted directories - List folders = HdfsUtil.getDirectories(output); + if (pgsPanel == null) { - ImputationResults imputationResults = new ImputationResults(folders, phasingOnly); - Map imputedChromosomes = imputationResults.getChromosomes(); + // get sorted directories + List folders = HdfsUtil.getDirectories(output); - Set chromosomes = imputedChromosomes.keySet(); - boolean lastChromosome = false; - int index = 0; + ImputationResults imputationResults = new ImputationResults(folders, phasingOnly); + Map imputedChromosomes = imputationResults.getChromosomes(); - String checksumFilename = FileUtil.path(localOutput, "results.md5"); - LineWriter writer = new LineWriter(checksumFilename); + Set chromosomes = imputedChromosomes.keySet(); + boolean lastChromosome = false; + int index = 0; - for (String name : chromosomes) { + String checksumFilename = FileUtil.path(localOutput, "results.md5"); + LineWriter writer = new LineWriter(checksumFilename); - index++; + for (String name : chromosomes) { - if (index == chromosomes.size()) { - lastChromosome = true; - } + index++; - ImputedChromosome imputedChromosome = imputedChromosomes.get(name); + if (index == chromosomes.size()) { + lastChromosome = true; + } - context.println("Export and merge chromosome " + name); + ImputedChromosome imputedChromosome = imputedChromosomes.get(name); - // create temp dir - String temp = FileUtil.path(localOutput, "temp"); - FileUtil.createDirectory(temp); + context.println("Export and merge chromosome " + name); - // output files + // create temp dir + String temp = FileUtil.path(localOutput, "temp"); + FileUtil.createDirectory(temp); - ArrayList files = new ArrayList(); + // output files - // merge info files - if (!phasingOnly) { - String infoOutput = FileUtil.path(temp, "chr" + name + ".info.gz"); - FileMerger.mergeAndGzInfo(imputedChromosome.getInfoFiles(), infoOutput); - files.add(new File(infoOutput)); - } + ArrayList files = new ArrayList(); - // merge all dosage files + // merge info files + if (!phasingOnly) { + String infoOutput = FileUtil.path(temp, "chr" + name + ".info.gz"); + FileMerger.mergeAndGzInfo(imputedChromosome.getInfoFiles(), infoOutput); + files.add(new File(infoOutput)); + } - String dosageOutput; - if (phasingOnly) { - dosageOutput = FileUtil.path(temp, "chr" + name + ".phased.vcf.gz"); - } else { - dosageOutput = FileUtil.path(temp, "chr" + name + ".dose.vcf.gz"); - } - files.add(new File(dosageOutput)); + // merge all dosage files - MergedVcfFile vcfFile = new MergedVcfFile(dosageOutput); - vcfFile.addHeader(context, imputedChromosome.getHeaderFiles()); + String dosageOutput; + if (phasingOnly) { + dosageOutput = FileUtil.path(temp, "chr" + name + 
".phased.vcf.gz"); + } else { + dosageOutput = FileUtil.path(temp, "chr" + name + ".dose.vcf.gz"); + } + files.add(new File(dosageOutput)); - for (String file : imputedChromosome.getDataFiles()) { - context.println("Read file " + file); - vcfFile.addFile(HdfsUtil.open(file)); - HdfsUtil.delete(file); - } + MergedVcfFile vcfFile = new MergedVcfFile(dosageOutput); + vcfFile.addHeader(context, imputedChromosome.getHeaderFiles()); - vcfFile.close(); + for (String file : imputedChromosome.getDataFiles()) { + context.println("Read file " + file); + vcfFile.addFile(HdfsUtil.open(file)); + HdfsUtil.delete(file); + } - // merge all meta files - if (mergeMetaFiles) { + vcfFile.close(); - context.println("Merging meta files..."); + // merge all meta files + if (mergeMetaFiles) { - String dosageMetaOutput = FileUtil.path(temp, "chr" + name + ".empiricalDose.vcf.gz"); - MergedVcfFile vcfFileMeta = new MergedVcfFile(dosageMetaOutput); + context.println("Merging meta files..."); - String headerMetaFile = imputedChromosome.getHeaderMetaFiles().get(0); - context.println("Use header from file " + headerMetaFile); + String dosageMetaOutput = FileUtil.path(temp, "chr" + name + ".empiricalDose.vcf.gz"); + MergedVcfFile vcfFileMeta = new MergedVcfFile(dosageMetaOutput); - vcfFileMeta.addFile(HdfsUtil.open(headerMetaFile)); + String headerMetaFile = imputedChromosome.getHeaderMetaFiles().get(0); + context.println("Use header from file " + headerMetaFile); - for (String file : imputedChromosome.getDataMetaFiles()) { - context.println("Read file " + file); - vcfFileMeta.addFile(HdfsUtil.open(file)); - HdfsUtil.delete(file); - } - vcfFileMeta.close(); + vcfFileMeta.addFile(HdfsUtil.open(headerMetaFile)); - context.println("Meta files merged."); + for (String file : imputedChromosome.getDataMetaFiles()) { + context.println("Read file " + file); + vcfFileMeta.addFile(HdfsUtil.open(file)); + HdfsUtil.delete(file); + } + vcfFileMeta.close(); - files.add(new File(dosageMetaOutput)); - } + context.println("Meta files merged."); - if (sanityCheck.equals("yes") && lastChromosome) { - context.println("Run tabix on chromosome " + name + "..."); - Command tabix = new Command(FileUtil.path(workingDirectory, "bin", "tabix")); - tabix.setSilent(false); - tabix.setParams("-f", dosageOutput); - if (tabix.execute() != 0) { - context.endTask("Error during index creation: " + tabix.getStdOut(), WorkflowContext.ERROR); - return false; + files.add(new File(dosageMetaOutput)); } - context.println("Tabix done."); - } - // create zip file - String fileName = "chr_" + name + ".zip"; - String filePath = FileUtil.path(localOutput, fileName); - File file = new File(filePath); - createEncryptedZipFile(file, files, password, aesEncryption); + if (sanityCheck.equals("yes") && lastChromosome) { + context.println("Run tabix on chromosome " + name + "..."); + Command tabix = new Command(FileUtil.path(workingDirectory, "bin", "tabix")); + tabix.setSilent(false); + tabix.setParams("-f", dosageOutput); + if (tabix.execute() != 0) { + context.endTask("Error during index creation: " + tabix.getStdOut(), WorkflowContext.ERROR); + return false; + } + context.println("Tabix done."); + } - // add checksum to hash file - context.println("Creating file checksum for " + filePath); - long checksumStart = System.currentTimeMillis(); - String checksum = FileChecksum.HashFile(new File(filePath), FileChecksum.Algorithm.MD5); - writer.write(checksum + " " + fileName); - long checksumEnd = (System.currentTimeMillis() - checksumStart) / 1000; - context.println("File 
checksum for " + filePath + " created in " + checksumEnd + " seconds."); + // create zip file + String fileName = "chr_" + name + ".zip"; + String filePath = FileUtil.path(localOutput, fileName); + File file = new File(filePath); + createEncryptedZipFile(file, files, password, aesEncryption); - // delete temp dir - FileUtil.deleteDirectory(temp); + // add checksum to hash file + context.println("Creating file checksum for " + filePath); + long checksumStart = System.currentTimeMillis(); + String checksum = FileChecksum.HashFile(new File(filePath), FileChecksum.Algorithm.MD5); + writer.write(checksum + " " + fileName); + long checksumEnd = (System.currentTimeMillis() - checksumStart) / 1000; + context.println("File checksum for " + filePath + " created in " + checksumEnd + " seconds."); - IExternalWorkspace externalWorkspace = context.getExternalWorkspace(); + // delete temp dir + FileUtil.deleteDirectory(temp); - if (externalWorkspace != null) { + IExternalWorkspace externalWorkspace = context.getExternalWorkspace(); - long start = System.currentTimeMillis(); + if (externalWorkspace != null) { - context.println("External Workspace '" + externalWorkspace.getName() + "' found"); + long start = System.currentTimeMillis(); - context.println("Start file upload: " + filePath); + context.println("External Workspace '" + externalWorkspace.getName() + "' found"); - String url = externalWorkspace.upload("local", file); + context.println("Start file upload: " + filePath); - long end = (System.currentTimeMillis() - start) / 1000; + String url = externalWorkspace.upload("local", file); - context.println("Upload finished in " + end + " sec. File Location: " + url); + long end = (System.currentTimeMillis() - start) / 1000; - context.println("Add " + localOutput + " to custom download"); + context.println("Upload finished in " + end + " sec. 
File Location: " + url); - String size = FileUtils.byteCountToDisplaySize(file.length()); + context.println("Add " + localOutput + " to custom download"); - context.addDownload("local", fileName, size, url); + String size = FileUtils.byteCountToDisplaySize(file.length()); - FileUtil.deleteFile(filePath); + context.addDownload("local", fileName, size, url); - context.println("File deleted: " + filePath); + FileUtil.deleteFile(filePath); - } else { - context.println("No external Workspace set."); - } - } + context.println("File deleted: " + filePath); - writer.close(); + } else { + context.println("No external Workspace set."); + } + } - // delete temporary files - HdfsUtil.delete(output); + writer.close(); - // Export calculated risk scores - if (pgsPanel != null) { + // delete temporary files + HdfsUtil.delete(output); + } else { + // Export calculated risk scores context.println("Exporting PGS scores..."); @@ -310,7 +311,7 @@ public boolean run(WorkflowContext context) { String fileName = "scores.zip"; String filePath = FileUtil.path(pgsOutput, fileName); File file = new File(filePath); - createEncryptedZipFile(file, new File(outputFileScores), password, aesEncryption); + createZipFile(file, new File(outputFileScores)); context.println("Exported PGS scores to " + fileName + "."); @@ -354,7 +355,7 @@ public boolean run(WorkflowContext context) { String fileNameReport = "scores.report.zip"; File fileReport = new File(FileUtil.path(pgsOutput, fileNameReport)); - createEncryptedZipFileFromFolder(fileReport, new File(extendedHtmlFolder), password, aesEncryption); + createZipFile(fileReport, new File(extendedHtmlFolder)); context.println("Created reports " + outputFileHtml + " and " + fileReport.getPath() + "."); @@ -385,26 +386,41 @@ public boolean run(WorkflowContext context) { Object mail = context.getData("cloudgene.user.mail"); Object name = context.getData("cloudgene.user.name"); - if (mail != null) { + if (mail != null && !mail.toString().isEmpty()) { String subject = "Job " + context.getJobId() + " is complete."; - String message = "Dear " + name + ",\nthe password for the imputation results is: " + password - + "\n\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" - + context.getJobId() + "/results"; + String message = ""; + if (pgsPanel == null) { + message = "Dear " + name + ",\nthe password for the imputation results is: " + password + + "\n\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" + + context.getJobId() + "/results"; + } else { + message = "Dear " + name + ",\nThe results can be downloaded from " + serverUrl + "/start.html#!jobs/" + + context.getJobId() + "/results"; + } try { context.sendMail(subject, message); - context.ok("We have sent an email to " + mail + " with the password."); + if (pgsPanel == null) { + context.ok("We have sent an email to " + mail + " with the password."); + } else { + context.ok("We have sent a notification email to " + mail + "."); + } return true; } catch (Exception e) { - context.println("Data compression failed: " + ExceptionUtils.getStackTrace(e)); - context.error("Data compression failed: " + e.getMessage()); + context.println("Sending notification email failed: " + ExceptionUtils.getStackTrace(e)); + context.error("Sending notification email failed: " + e.getMessage()); return false; } } else { - context.error("No email address found. Please enter your email address (Account -> Profile)."); - return false; + if (pgsPanel == null) { + context.error("No email address found. 
Please enter your email address (Account -> Profile)."); + return false; + } else { + context.ok("PGS report created successfully."); + return true; + } } } else { @@ -477,4 +493,15 @@ public void createEncryptedZipFileFromFolder(File file, File folder, String pass zipFile.close(); } + public void createZipFile(File file, File folder) throws IOException { + ZipFile zipFile = new ZipFile(file); + if (folder.isFile()){ + zipFile.addFile(folder); + } else { + zipFile.addFolder(folder); + } + zipFile.close(); + } + + } diff --git a/src/main/java/genepi/imputationserver/steps/Imputation.java b/src/main/java/genepi/imputationserver/steps/Imputation.java index 173cbaa6..f5050c48 100644 --- a/src/main/java/genepi/imputationserver/steps/Imputation.java +++ b/src/main/java/genepi/imputationserver/steps/Imputation.java @@ -19,6 +19,7 @@ import genepi.imputationserver.util.RefPanelList; import genepi.io.FileUtil; import genepi.io.text.LineReader; +import genepi.riskscore.commands.FilterMetaCommand; public class Imputation extends ParallelHadoopJobStep { @@ -63,6 +64,7 @@ public boolean run(WorkflowContext context) { String mode = context.get("mode"); String phasing = context.get("phasing"); PgsPanel pgsPanel = PgsPanel.loadFromProperties(context.getData("pgsPanel")); + String pgsCategory = context.get("pgsCategory"); String r2Filter = context.get("r2Filter"); if (r2Filter == null) { @@ -123,10 +125,29 @@ public boolean run(WorkflowContext context) { context.println(" " + entry.getKey() + "/" + entry.getValue()); } } + + String includeScoreFilenameHdfs = null; if (pgsPanel != null) { - context.println(" PGS: " + pgsPanel.getScores().size() + " scores"); + context.println(" PGS: " + FileUtil.getFilename(pgsPanel.getScores())); + + if (pgsCategory != null && !pgsCategory.isEmpty() && !pgsCategory.equals("all")) { + String includeScoreFilename = FileUtil.path(context.getLocalTemp(), "include-scores.txt"); + FilterMetaCommand filter = new FilterMetaCommand(); + filter.setCategory(pgsCategory); + filter.setMeta(pgsPanel.getMeta()); + filter.setOut(includeScoreFilename); + int result = 0; + try { + result = filter.call(); + } catch (Exception e) { + throw new RuntimeException(e); + } + includeScoreFilenameHdfs = HdfsUtil.path(context.getHdfsTemp(), "include-scores.txt"); + HdfsUtil.put(includeScoreFilename, includeScoreFilenameHdfs); + } + } else { - context.println(" PGS: no scores selected"); + context.println(" PGS: no score file selected"); } // execute one job per chromosome @@ -229,6 +250,9 @@ protected void readConfigFile() { } if (pgsPanel != null) { + if (includeScoreFilenameHdfs != null) { + job.setIncludeScoreFilenameHDFS(includeScoreFilenameHdfs); + } job.setScores(pgsPanel.getScores()); } job.setRefPanel(reference); diff --git a/src/main/java/genepi/imputationserver/steps/InputValidation.java b/src/main/java/genepi/imputationserver/steps/InputValidation.java index 4c825c9a..2f54fe6d 100644 --- a/src/main/java/genepi/imputationserver/steps/InputValidation.java +++ b/src/main/java/genepi/imputationserver/steps/InputValidation.java @@ -232,7 +232,7 @@ private boolean checkVcfFiles(WorkflowContext context) { + (phased ? "phased" : "unphased") + "\n" + "Build: " + (build == null ? "hg19" : build) + "\n" + "Reference Panel: " + reference + " (" + panel.getBuild() + ")" + "\n" + "Population: " + population + "\n" + "Phasing: " + phasing + "\n" + "Mode: " + mode - + (pgsPanel != null ? "\n" + "PGS-Calculation: " + pgsPanel.getScores().size() + " scores" + + (pgsPanel != null ? 
"\n" + "PGS-Calculation: " + context.get("pgsPanel") + " (" + context.get("pgsCategory") + ")" : ""); if (r2Filter != null && !r2Filter.isEmpty() && !r2Filter.equals("0")) { diff --git a/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java b/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java index 0935dca3..c2158afc 100644 --- a/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java +++ b/src/main/java/genepi/imputationserver/steps/ancestry/TraceStep.java @@ -128,7 +128,7 @@ public boolean prepareTraceJobs(WorkflowContext context) { HdfsUtil.put(mergedFile, HdfsUtil.path(vcfHdfsDir, "study.merged.vcf.gz")); // read number of samples from first vcf file - VcfFile vcfFile = VcfFileUtil.load(mergedFile, 200000, false); + VcfFile vcfFile = VcfFileUtil.load(files[0], 200000, false); int nIndividuals = vcfFile.getNoSamples(); int batch = 0; @@ -168,8 +168,9 @@ public boolean prepareTraceJobs(WorkflowContext context) { return true; } catch (IOException e) { - context.error("An internal server error occured."); - e.printStackTrace(); + + context.error("An internal server error occurred.\n" + exceptionToString(e)); + } context.error("Execution failed. Please, contact administrator."); @@ -209,8 +210,7 @@ public boolean checkDataAndMerge(WorkflowContext context, String[] files, String return true; } catch (IOException e) { - context.error("Input Validation failed: " + e); - e.printStackTrace(); + context.error("Input Validation failed:\n" + exceptionToString(e)); return false; } } @@ -288,21 +288,11 @@ public boolean estimateAncestries(WorkflowContext context) { return true; - } catch (IOException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (InterruptedException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (ClassNotFoundException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); - } catch (URISyntaxException e) { - context.error("An internal server error occured while launching Hadoop job."); - e.printStackTrace(); + } catch (IOException | InterruptedException | ClassNotFoundException | URISyntaxException e) { + context.error("An internal server error occurred while launching Hadoop job.\n" + exceptionToString(e)); } - context.error("Execution failed. Please, contact administrator."); + context.error("Execution failed. 
Please, contact administrator."); return false; } @@ -326,37 +316,43 @@ public void progress(String message) { } return results; } catch (InterruptedException e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(e.getMessage(), WorkflowContext.ERROR); return result; } catch (Exception e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(task.getName() + " failed.", WorkflowContext.ERROR); return result; } catch (Error e) { - e.printStackTrace(); TaskResults result = new TaskResults(); result.setSuccess(false); result.setMessage(e.getMessage()); - StringWriter s = new StringWriter(); - e.printStackTrace(new PrintWriter(s)); - context.println("Task '" + task.getName() + "' failed.\nException:" + s.toString()); + context.println("Task '" + task.getName() + "' failed.\nException:" + exceptionToString(e)); context.endTask(task.getName() + " failed.", WorkflowContext.ERROR); return result; } } + private static String exceptionToString(Exception e) { + StringWriter sw = new StringWriter(); + PrintWriter pw = new PrintWriter(sw); + e.printStackTrace(pw); + return sw.toString(); + } + + private static String exceptionToString(Error e) { + StringWriter sw = new StringWriter(); + PrintWriter pw = new PrintWriter(sw); + e.printStackTrace(pw); + return sw.toString(); + } + } + diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java index 1c3ca88c..abbefb70 100644 --- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java +++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationJob.java @@ -2,7 +2,6 @@ import java.io.IOException; import java.util.List; -import java.util.stream.Collectors; import org.apache.commons.logging.Log; import org.apache.hadoop.io.Text; @@ -44,7 +43,9 @@ public class ImputationJob extends HadoopJob { public static final String PHASING_ENGINE = "PHASING_ENGINE"; - public static final String SCORES = "SCORES"; + public static final String SCORE_FILE = "SCORES"; + + public static final String INCLUDE_SCORE_FILE = "INCLUDE_SCORE_FILE"; private String refPanelHdfs; @@ -62,7 +63,9 @@ public class ImputationJob extends HadoopJob { private String binariesHDFS; - private List scores; + private String scores; + + private String includeScoreFilenameHDFS; public ImputationJob(String name, Log log) { super(name, log); @@ -168,20 +171,33 @@ protected void setupDistributedCache(CacheStore cache) throws IOException { } } - // add scores to cache3 + // add scores to cache if (scores != null) { - log.info("Add " + scores.size() + " scores to distributed cache..."); - for (String score : scores) { - if (HdfsUtil.exists(score)) { - cache.addFile(score); - if (HdfsUtil.exists(score + ".format")) { - cache.addFile(score + ".format"); - } + log.info("Add " + scores + " 
scores to distributed cache..."); + if (HdfsUtil.exists(scores)) { + cache.addFile(scores); + if (HdfsUtil.exists(scores + ".info")) { + log.info("Add " + scores + ".info to distributed cache..."); + cache.addFile(scores + ".info"); + } + if (HdfsUtil.exists(scores + ".tbi")) { + log.info("Add " + scores + ".tbi to distributed cache..."); + cache.addFile(scores + ".tbi"); + } + } else { + log.info("PGS score file '" + scores + "' not found."); + throw new IOException("PGS score file '" + scores + "' not found."); + } + + if (includeScoreFilenameHDFS != null){ + if (HdfsUtil.exists(includeScoreFilenameHDFS)) { + cache.addFile(includeScoreFilenameHDFS); } else { - log.info("PGS score file '" + score + "' not found."); - throw new IOException("PGS score file '" + score + "' not found."); + log.info("Include score file '" + scores + "' not found."); + throw new IOException("Include score file '" + scores + "' not found."); } } + log.info("All scores added to distributed cache."); } @@ -283,10 +299,8 @@ public void setPhasingEngine(String phasing) { set(PHASING_ENGINE, phasing); } - public void setScores(List scores) { - - String scoresNames = scores.stream().collect(Collectors.joining(",")); - set(SCORES, scoresNames); + public void setScores(String scores) { + set(SCORE_FILE, scores); this.scores = scores; } @@ -294,4 +308,8 @@ public void setBinariesHDFS(String binariesHDFS) { this.binariesHDFS = binariesHDFS; } + public void setIncludeScoreFilenameHDFS(String includeScoreFilenameHDFS) { + set(INCLUDE_SCORE_FILE, includeScoreFilenameHDFS); + this.includeScoreFilenameHDFS = includeScoreFilenameHDFS; + } } \ No newline at end of file diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java index 806de4cb..77a896be 100644 --- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java +++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationMapper.java @@ -33,7 +33,9 @@ public class ImputationMapper extends Mapper { private String outputScores; - private String[] scores; + private String scores; + + private String includeScoresFilename = null; private String refFilename = ""; @@ -168,28 +170,35 @@ protected void setup(Context context) throws IOException, InterruptedException { } // scores - String scoresFilenames = parameters.get(ImputationJob.SCORES); - if (scoresFilenames != null) { - String[] filenames = scoresFilenames.split(","); - scores = new String[filenames.length]; - for (int i = 0; i < scores.length; i++) { - String filename = filenames[i]; - String name = FileUtil.getFilename(filename); - String localFilename = cache.getFile(name); - scores[i] = localFilename; - // check if score file has format file - String formatFile = cache.getFile(name + ".format"); - if (formatFile != null) { - // create symbolic link to format file. 
they have to be in the same folder
-                Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name)), Paths.get(localFilename));
-                Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".format")), Paths.get(formatFile));
-                scores[i] = FileUtil.path(folder, name);
-            }
+            String scoresFilename = parameters.get(ImputationJob.SCORE_FILE);
+            if (scoresFilename != null) {
+                String name = FileUtil.getFilename(scoresFilename);
+                String localFilename = cache.getFile(name);
+                scores = localFilename;
+                // check if score file has info and tbi file
+                String infoFile = cache.getFile(name + ".info");
+                String tbiFile = cache.getFile(name + ".tbi");
+                if (infoFile != null && tbiFile != null) {
+                    // create symbolic link to format file. they have to be in the same folder
+                    Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name)), Paths.get(localFilename));
+                    Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".info")), Paths.get(infoFile));
+                    Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".tbi")), Paths.get(tbiFile));
+                    scores = FileUtil.path(folder, name);
+                } else {
+                    throw new IOException("*info or *tbi file not available");
+                }
+                System.out.println("Loaded " + FileUtil.getFilename(scoresFilename) + " from distributed cache");
+
+                String hdfsIncludeScoresFilename = parameters.get(ImputationJob.INCLUDE_SCORE_FILE);
+                if (hdfsIncludeScoresFilename != null){
+                    String includeScoresName = FileUtil.getFilename(hdfsIncludeScoresFilename);
+                    includeScoresFilename = cache.getFile(includeScoresName);
                 }
-            System.out.println("Loaded " + scores.length + " score files from distributed cache");
+
+
         } else {
-            System.out.println("No scores files et.");
+            System.out.println("No scores file set.");
         }
         // create symbolic link --> index file is in the same folder as data
@@ -264,6 +273,7 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio
         pipeline.setPhasingEngine(phasingEngine);
         pipeline.setPhasingOnly(phasingOnly);
         pipeline.setScores(scores);
+        pipeline.setIncludeScoreFilename(includeScoresFilename);
         boolean succesful = pipeline.execute(chunk, outputChunk);
         ImputationStatistic statistics = pipeline.getStatistic();
@@ -290,7 +300,10 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio
             statistics.setImportTime((end - start) / 1000);
-        } else {
+        }
+
+        // push results only if not in PGS mode
+        else if (scores == null) {
             HdfsUtil.put(outputChunk.getInfoFilename(), HdfsUtil.path(output, chunk + ".info"));
@@ -322,9 +335,7 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio
             System.out.println("Time filter and put: " + (end - start) + " ms");
-        }
-
-        if (scores != null) {
+        } else {
             HdfsUtil.put(outputChunk.getScoreFilename(), HdfsUtil.path(outputScores, chunk + ".scores.txt"));
             HdfsUtil.put(outputChunk.getScoreFilename() + ".json",
@@ -353,41 +364,4 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio
         }
     }
-    public void filterInfoFileByR2(String input, String output, double minR2) throws IOException {
-
-        LineReader readerInfo = new LineReader(input);
-        LineWriter writerInfo = new LineWriter(output);
-
-        readerInfo.next();
-        String header = readerInfo.get();
-
-        // find index for Rsq
-        String[] headerTiles = header.split("\t");
-        int index = -1;
-        for (int i = 0; i < headerTiles.length; i++) {
-            if (headerTiles[i].equals("Rsq")) {
-                index = i;
-            }
-        }
-
-        writerInfo.write(header);
-
-        while (readerInfo.next()) {
-            String line = readerInfo.get();
-            String[] tiles = line.split("\t");
-            String value = tiles[index];
-            try {
-                double r2 = Double.parseDouble(value);
-                if (r2 > minR2) {
-                    writerInfo.write(line);
-                }
-            } catch (NumberFormatException e) {
-                writerInfo.write(line);
-            }
-        }
-
-        readerInfo.close();
-        writerInfo.close();
-
-    }
 }
diff --git a/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java b/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java
index 716f478e..b887eabc 100644
--- a/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java
+++ b/src/main/java/genepi/imputationserver/steps/imputation/ImputationPipeline.java
@@ -24,7 +24,6 @@ public class ImputationPipeline {
-
     public static final String PIPELINE_VERSION = "michigan-imputationserver-1.8.0";
     public static final String IMPUTATION_VERSION = "minimac-v4.1.6";
@@ -67,13 +66,15 @@ public class ImputationPipeline {
     private String mapBeagleFilename = "";
+    private String includeScoreFilename = null;
+
     private String build = "hg19";
     private boolean phasingOnly;
     private String phasingEngine = "";
-    private String[] scores;
+    private String scores;
     private ImputationStatistic statistic = new ImputationStatistic();
@@ -177,7 +178,7 @@ public boolean execute(VcfChunk chunk, VcfChunkOutput output) throws Interrupted
             return false;
         }
-        if (scores != null && scores.length >= 0) {
+        if (scores != null) {
             System.out.println(" Starting PGS calculation '" + scores + "'...");
@@ -347,7 +348,7 @@ private boolean runPgsCalc(VcfChunkOutput output) {
         String cacheDir = new File(output.getScoreFilename()).getParent();
         PGSCatalog.CACHE_DIR = cacheDir;
-        if (scores == null || scores.length == 0) {
+        if (scores == null) {
             System.out.println("PGS calcuation failed. No score files set. ");
             return false;
         }
@@ -361,25 +362,20 @@ private boolean runPgsCalc(VcfChunkOutput output) {
         ApplyScoreTask task = new ApplyScoreTask();
         task.setVcfFilename(output.getImputedVcfFilename());
         task.setChunk(scoreChunk);
-        task.setRiskScoreFilenames(scores);
+        task.setRiskScoreFilenames(new String[] { scores });
+        if (includeScoreFilename != null && !includeScoreFilename.isEmpty()){
+            task.setIncludeScoreFilename(includeScoreFilename);
+        }
         // TODO: enable fix-strand-flips
         // task.setFixStrandFlips(true);
         // task.setRemoveAmbiguous(true);
-        for (String file : scores) {
-            String autoFormat = file + ".format";
-            if (new File(autoFormat).exists()) {
-                task.setRiskScoreFormat(file, RiskScoreFormat.MAPPING_FILE);
-            }
-        }
-
         task.setOutputReportFilename(output.getScoreFilename() + ".json");
         task.setOutput(output.getScoreFilename());
         TaskService.setAnsiSupport(false);
         List runningTasks = TaskService.run(task);
-
         for (Task runningTask : runningTasks) {
             if (!runningTask.getStatus().isSuccess()) {
                 System.out.println("PGS-Calc failed: " + runningTask.getStatus().getThrowable());
@@ -424,6 +420,10 @@ public void setRefBeagleFilename(String refBeagleFilename) {
         this.refBeagleFilename = refBeagleFilename;
     }
+    public void setIncludeScoreFilename(String includeScoreFilename) {
+        this.includeScoreFilename = includeScoreFilename;
+    }
+
     public void setMinimacCommand(String minimacCommand, String minimacParams) {
         this.minimacCommand = minimacCommand;
         this.minimacParams = minimacParams;
@@ -459,7 +459,7 @@ public void setPhasingOnly(boolean phasingOnly) {
         this.phasingOnly = phasingOnly;
     }
-    public void setScores(String[] scores) {
+    public void setScores(String scores) {
         this.scores = scores;
     }
diff --git a/src/main/java/genepi/imputationserver/util/PgsPanel.java b/src/main/java/genepi/imputationserver/util/PgsPanel.java
index 73757ae8..65175c25 100644
--- a/src/main/java/genepi/imputationserver/util/PgsPanel.java
+++ b/src/main/java/genepi/imputationserver/util/PgsPanel.java
@@ -1,8 +1,6 @@
 package genepi.imputationserver.util;
-import java.util.List;
 import java.util.Map;
-import java.util.Vector;
 import genepi.hadoop.HdfsUtil;
@@ -14,7 +12,7 @@ public class PgsPanel {
     private String meta = null;
-    private List scores = new Vector<>();
+    private String scores = null;
     private PgsPanel() {
@@ -35,8 +33,7 @@ public static PgsPanel loadFromProperties(Object properties) {
                 panel.meta = map.get("meta").toString();
             }
             if (map.containsKey("scores")) {
-                List list = (List) map.get("scores");
-                panel.scores = list;
+                panel.scores = map.get("scores").toString();
                 return panel;
             } else {
                 return null;
@@ -47,11 +44,8 @@ public static PgsPanel loadFromProperties(Object properties) {
     }
-    public List getScores() {
-        List scoresPath = new Vector();
-        for (String score : scores) {
-            scoresPath.add(HdfsUtil.path(location, score));
-        }
+    public String getScores() {
+        String scoresPath = HdfsUtil.path(scores);
         return scoresPath;
     }
diff --git a/src/test/java/genepi/imputationserver/steps/ImputationTest.java b/src/test/java/genepi/imputationserver/steps/ImputationTest.java
index 5bb7f60e..00b1d440 100644
--- a/src/test/java/genepi/imputationserver/steps/ImputationTest.java
+++ b/src/test/java/genepi/imputationserver/steps/ImputationTest.java
@@ -99,7 +99,7 @@ public void testPipelineWithPhased() throws IOException, ZipException {
         assertEquals(true, file.isPhased());
         assertEquals(TOTAL_REFPANEL_CHR20_B37 + ONLY_IN_INPUT, file.getNoSnps());
-        // FileUtil.deleteDirectory("test-data/tmp");
+        FileUtil.deleteDirectory("test-data/tmp");
     }
@@ -150,7 +150,7 @@ public void testPipelineWithPhasedAndMetaOption() throws IOException, ZipExcepti
         assertEquals(true, file.isPhased());
         assertEquals(TOTAL_REFPANEL_CHR20_B37 + ONLY_IN_INPUT, file.getNoSnps());
-        // FileUtil.deleteDirectory("test-data/tmp");
+        FileUtil.deleteDirectory("test-data/tmp");
     }
@@ -501,7 +501,7 @@ public void testValidatePanelPhasedInput() throws IOException, ZipException {
         assertEquals("n/a", header.getOtherHeaderLine("mis_phasing").getValue());
         assertEquals(ImputationPipeline.PIPELINE_VERSION, header.getOtherHeaderLine("mis_pipeline").getValue());
-        // FileUtil.deleteDirectory("test-data/tmp");
+        FileUtil.deleteDirectory("test-data/tmp");
     }
@@ -551,25 +551,22 @@ public void testPipelineWithEagleAndScores() throws IOException, ZipException {
         String inputFolder = "test-data/data/chr20-unphased";
         // import scores into hdfs
-        String score1 = PGSCatalog.getFilenameById("PGS000018");
-        String score2 = PGSCatalog.getFilenameById("PGS000027");
+        String targetScores = HdfsUtil.path("scores-hdfs", "scores.txt.gz");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz", targetScores);
-        String targetScore1 = HdfsUtil.path("scores-hdfs", "PGS000018.txt.gz");
-        HdfsUtil.put(score1, targetScore1);
+        String targetIndex = HdfsUtil.path("scores-hdfs", "scores.txt.gz.tbi");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.tbi", targetIndex);
-        String targetScore2 = HdfsUtil.path("scores-hdfs", "PGS000027.txt.gz");
-        HdfsUtil.put(score2, targetScore2);
+        String targetInfo = HdfsUtil.path("scores-hdfs", "scores.txt.gz.info");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.info", targetInfo);
         // create workflow context and set scores
         WorkflowTestContext context = buildContext(inputFolder, "hapmap2");
         context.setOutput("outputScores", "cloudgene2-hdfs");
         Map pgsPanel = new HashMap();
-        List scores = new Vector();
-        scores.add("PGS000018.txt.gz");
-        scores.add("PGS000027.txt.gz");
-        pgsPanel.put("location", "scores-hdfs");
-        pgsPanel.put("scores", scores);
+        pgsPanel.put("scores", targetScores);
+        pgsPanel.put("meta", "test-data/data/pgs/test-scores.chr20.json");
         pgsPanel.put("build", "hg19");
         context.setData("pgsPanel", pgsPanel);
@@ -601,27 +598,9 @@ public void testPipelineWithEagleAndScores() throws IOException, ZipException {
         result = run(context, export);
         assertTrue(result);
-        ZipFile zipFile = new ZipFile("test-data/tmp/local/chr_20.zip", PASSWORD.toCharArray());
+        ZipFile zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip");
         zipFile.extractAll("test-data/tmp");
-
-        VcfFile file = VcfFileUtil.load("test-data/tmp/chr20.dose.vcf.gz", 100000000, false);
-
-        assertEquals("20", file.getChromosome());
-        assertEquals(51, file.getNoSamples());
-        assertEquals(true, file.isPhased());
-        assertEquals(TOTAL_REFPANEL_CHR20_B37, file.getNoSnps());
-
-        int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz");
-        assertEquals(snpInInfo, file.getNoSnps());
-
-        String[] args = { "test-data/tmp/chr20.dose.vcf.gz", "--ref", "PGS000018,PGS000027", "--out",
-                "test-data/tmp/expected.txt" };
-        int resultScore = new CommandLine(new ApplyScoreCommand()).execute(args);
-        assertEquals(0, resultScore);
-
-        zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip", PASSWORD.toCharArray());
-        zipFile.extractAll("test-data/tmp");
-        CsvTableReader readerExpected = new CsvTableReader("test-data/tmp/expected.txt", ',');
+        CsvTableReader readerExpected = new CsvTableReader("test-data/data/pgs/expected.txt", ',');
         CsvTableReader readerActual = new CsvTableReader("test-data/tmp/scores.txt", ',');
         while (readerExpected.next() && readerActual.next()) {
@@ -635,37 +614,36 @@
         new File("test-data/tmp/local/scores.html").exists();
         FileUtil.deleteDirectory("test-data/tmp");
+        zipFile.close();
     }
     @Test
-    public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipException {
+    public void testPipelineWithEagleAndScoresAndCategory() throws IOException, ZipException {
         String configFolder = "test-data/configs/hapmap-chr20";
         String inputFolder = "test-data/data/chr20-unphased";
         // import scores into hdfs
-        String score1 = "test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt";
-        String format1 = "test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format";
+        String targetScores = HdfsUtil.path("scores-hdfs", "scores.txt.gz");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz", targetScores);
-        String targetScore1 = HdfsUtil.path("scores-hdfs", "PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt");
-        HdfsUtil.put(score1, targetScore1);
+        String targetIndex = HdfsUtil.path("scores-hdfs", "scores.txt.gz.tbi");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.tbi", targetIndex);
-        String targetFormat1 = HdfsUtil.path("scores-hdfs",
-                "PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format");
-        HdfsUtil.put(format1, targetFormat1);
+        String targetInfo = HdfsUtil.path("scores-hdfs", "scores.txt.gz.info");
+        HdfsUtil.put("test-data/data/pgs/test-scores.chr20.txt.gz.info", targetInfo);
         // create workflow context and set scores
         WorkflowTestContext context = buildContext(inputFolder, "hapmap2");
         context.setOutput("outputScores", "cloudgene2-hdfs");
         Map pgsPanel = new HashMap();
-        List scores = new Vector();
-        scores.add("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt");
-        pgsPanel.put("location", "scores-hdfs");
-        pgsPanel.put("scores", scores);
+        pgsPanel.put("scores", targetScores);
+        pgsPanel.put("meta", "test-data/data/pgs/test-scores.chr20.json");
         pgsPanel.put("build", "hg19");
         context.setData("pgsPanel", pgsPanel);
+        context.setInput("pgsCategory","Body measurement"); //only PGS000027
         // run qc to create chunkfile
@@ -678,6 +656,7 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc
         result = run(context, qcStats);
         assertTrue(result);
+        assertTrue(context.hasInMemory("Remaining sites in total: 7,735"));
         // add panel to hdfs
         importRefPanel(FileUtil.path(configFolder, "ref-panels"));
@@ -694,31 +673,14 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc
         result = run(context, export);
         assertTrue(result);
-        ZipFile zipFile = new ZipFile("test-data/tmp/local/chr_20.zip", PASSWORD.toCharArray());
-        zipFile.extractAll("test-data/tmp");
-
-        VcfFile file = VcfFileUtil.load("test-data/tmp/chr20.dose.vcf.gz", 100000000, false);
-
-        assertEquals("20", file.getChromosome());
-        assertEquals(51, file.getNoSamples());
-        assertEquals(true, file.isPhased());
-        assertEquals(TOTAL_REFPANEL_CHR20_B37, file.getNoSnps());
-
-        int snpInInfo = getLineCount("test-data/tmp/chr20.info.gz");
-        assertEquals(snpInInfo, file.getNoSnps());
-
-        String[] args = { "test-data/tmp/chr20.dose.vcf.gz", "--ref", score1, "--out", "test-data/tmp/expected.txt" };
-        int resultScore = new CommandLine(new ApplyScoreCommand()).execute(args);
-        assertEquals(0, resultScore);
-
-        zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip", PASSWORD.toCharArray());
+        ZipFile zipFile = new ZipFile("test-data/tmp/pgs_output/scores.zip");
         zipFile.extractAll("test-data/tmp");
-        CsvTableReader readerExpected = new CsvTableReader("test-data/tmp/expected.txt", ',');
+        CsvTableReader readerExpected = new CsvTableReader("test-data/data/pgs/expected.txt", ',');
         CsvTableReader readerActual = new CsvTableReader("test-data/tmp/scores.txt", ',');
+        assertEquals(2, readerActual.getColumns().length); //only sample and PGS000027
         while (readerExpected.next() && readerActual.next()) {
-            assertEquals(readerExpected.getDouble("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS"),
-                    readerActual.getDouble("PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS"), 0.00001);
+            assertEquals(readerExpected.getDouble("PGS000027"), readerActual.getDouble("PGS000027"), 0.00001);
         }
         readerExpected.close();
         readerActual.close();
@@ -727,6 +689,7 @@ public void testPipelineWithEagleAndScoresAndFormat() throws IOException, ZipExc
         new File("test-data/tmp/local/scores.html").exists();
         FileUtil.deleteDirectory("test-data/tmp");
+        zipFile.close();
     }
@@ -773,7 +736,7 @@ public void testPipelineWithEaglePhasingOnlyWithPhasedData() throws IOException,
         assertEquals(true, file.isPhased());
         assertEquals(TOTAL_SNPS_INPUT - SNPS_MONOMORPHIC, file.getNoSnps());
-        // FileUtil.deleteDirectory("test-data/tmp");
+        FileUtil.deleteDirectory("test-data/tmp");
     }
diff --git a/test-data/data/pgs/expected.txt b/test-data/data/pgs/expected.txt
new file mode 100644
index 00000000..588c2451
--- /dev/null
+++ b/test-data/data/pgs/expected.txt
@@ -0,0 +1,52 @@
+"sample","PGS000018","PGS000027"
+"FB00000","0.05313937446647911","-0.023816253458824904"
+"FB00001","0.045917222207399355","-0.019797123689728618"
+"FB00002","0.021708847564541695","-0.04673578016981638" +"FB00005","0.09648429952959627","-0.005867172391953264" +"FB00006","0.0026945615426237635","-0.028585821197501655" +"FB00012","0.11515414724643888","-0.016262550736242526" +"FB00015","-0.02863926274282929","-0.0058560594715711765" +"FB00016","0.09708058770183478","-0.006615503674890271" +"FB00017","0.0333183545495552","-0.09133852506614625" +"FB00021","0.02125234476039485","-0.03385883398967005" +"FB00022","0.08063914063225523","-0.023863179573550077" +"FB00023","0.08486816825786414","-0.019237921064168134" +"FB00025","0.007633970504953569","-0.00821490769121821" +"FB00027","0.06589368184603889","6.881747037498858E-4" +"FB00031","0.08159354733483709","-0.05865219358524136" +"FB00032","0.059559124423313986","-0.011932065906053195" +"FB00034","0.07329317227154561","-0.0314670162155752" +"FB00035","0.0331774804672276","-0.03781256556917445" +"FB00037","0.0016465439310346247","-0.019176108365452083" +"FB00038","0.0017311541871692232","-0.015071171783794392" +"FB00039","0.09066393193180469","0.015333069126141343" +"FB00040","0.015603500129263245","-0.018604292801794806" +"FB00041","0.0309823224616549","-0.016796346366294696" +"FB00044","0.05134259495424047","0.004675911477700524" +"FB00049","0.06968220638930554","-0.02991454296882224" +"FB00051","0.052175511361486654","-0.05707273500672397" +"FB00052","0.029502600308498023","-0.019905815607306344" +"FB00053","0.06161116407293038","-0.031026560813588313" +"FB00055","0.054493671491316106","0.00431830324515882" +"FB00056","0.02320194194946318","0.00614156436749441" +"FB00058","0.07582055523278575","-0.006918139126677136" +"FB00059","0.08255746970293422","-0.009353011329235092" +"FB00062","0.057664896380924285","-0.014677392669547516" +"FB00064","0.09473688564666033","-0.008888761239558821" +"FB00066","0.037741025828820024","-0.045262319349685054" +"FB00068","0.02146662718024195","-0.02550422945740293" +"FB00069","0.05478741439964831","-0.0026899084361405347" +"FB00071","0.062334100154450026","-0.013246084749458435" +"FB00072","0.06290342067593024","-0.016947750575916215" +"FB00074","0.05314067711554828","-0.007028060519271768" +"FB00075","-0.017286646355253094","0.009813629909409457" +"FB00077","0.004136677400486163","0.0038938051485948692" +"FB00078","0.08205304671458324","-0.012855452407862854" +"FB00082","0.026206583635414803","-0.0016588620378834643" +"FB00086","-0.026879636463039175","-0.002094128807407887" +"FB00089","0.04857095566430543","-0.03379098615229034" +"FB00090","-0.026723652883937743","-0.02661307087901709" +"FB00091","0.06103986214408777","-0.028456093825954442" +"FB00093","-0.0017740509916735758","-0.010945215908210024" +"FB00094","0.032863183164308585","-0.026344531584280163" +"FB00095","0.05236004026455726","-0.014738808207713519" diff --git a/test-data/data/pgs/test-scores.chr20.json b/test-data/data/pgs/test-scores.chr20.json new file mode 100644 index 00000000..dd843f15 --- /dev/null +++ b/test-data/data/pgs/test-scores.chr20.json @@ -0,0 +1,129 @@ +{ + "PGS000018": { + "id": "PGS000018", + "trait": "Coronary artery disease", + "efo": [ + { + "id": "EFO_0001645", + "label": "coronary artery disease", + "description": "Narrowing of the coronary arteries due to fatty deposits inside the arterial walls. 
The diagnostic criteria may include documented history of any of the following: documented coronary artery stenosis greater than or equal to 50% (by cardiac catheterization or other modality of direct imaging of the coronary arteries); previous coronary artery bypass surgery (CABG); previous percutaneous coronary intervention (PCI); previous myocardial infarction. (ACC) [NCIT: C26732]", + "url": "http://www.ebi.ac.uk/efo/EFO_0001645" + } + ], + "populations": { + "items": { + "MAE": { + "name": "MAE", + "count": -1, + "percentage": 0.509, + "color": "eeeeee", + "label": "Multi-Ancestry (including Europeans)" + }, + "EUR": { + "name": "EUR", + "count": -1, + "percentage": 0.37, + "color": "#0099E6", + "label": "European" + }, + "SAS": { + "name": "SAS", + "count": -1, + "percentage": 0.067, + "color": "#F90026", + "label": "South Asian" + }, + "AMR": { + "name": "AMR", + "count": -1, + "percentage": 0.011, + "color": "#800080", + "label": "Hispanic or Latin American" + }, + "EAS": { + "name": "EAS", + "count": -1, + "percentage": 0.03, + "color": "#FF99E6", + "label": "East Asian" + }, + "AFR": { + "name": "AFR", + "count": -1, + "percentage": 0.008, + "color": "#FF6600", + "label": "African" + }, + "GME": { + "name": "GME", + "count": -1, + "percentage": 0.006, + "color": "#DBEE06", + "label": "Greater Middle Eastern" + } + }, + "total": 382026 + }, + "publication": { + "date": "2018-10-01", + "journal": "J Am Coll Cardiol", + "firstauthor": "Inouye M", + "doi": "10.1016/j.jacc.2018.07.079" + }, + "categories": ["Cardiovascular disease"], + "variants": 1745179, + "repository": "PGS-Catalog", + "link": "https://www.pgscatalog.org/score/PGS000018", + "samples": 382026 + }, + + "PGS000027": { + "id": "PGS000027", + "trait": "Body Mass Index", + "efo": [ + { + "id": "EFO_0004340", + "label": "body mass index", + "description": "An indicator of body density as determined by the relationship of BODY WEIGHT to BODY HEIGHT. BMI\u003dweight (kg)/height squared (m2). BMI correlates with body fat (ADIPOSE TISSUE). Their relationship varies with age and gender. For adults, BMI falls into these categories: below 18.5 (underweight); 18.5-24.9 (normal); 25.0-29.9 (overweight); 30.0 and above (obese). 
(National Center for Health Statistics, Centers for Disease Control and Prevention)", + "url": "http://www.ebi.ac.uk/efo/EFO_0004340" + } + ], + "populations": { + "items": { + "EUR": { + "name": "EUR", + "count": -1, + "percentage": 0.991, + "color": "#0099E6", + "label": "European" + }, + "AMR": { + "name": "AMR", + "count": -1, + "percentage": 0.005, + "color": "#800080", + "label": "Hispanic or Latin American" + }, + "AFR": { + "name": "AFR", + "count": -1, + "percentage": 0.004, + "color": "#FF6600", + "label": "African" + } + }, + "total": 238944 + }, + "publication": { + "date": "2019-04-01", + "journal": "Cell", + "firstauthor": "Khera AV", + "doi": "10.1016/j.cell.2019.03.028" + }, + "categories": ["Body measurement"], + "variants": 2100302, + "repository": "PGS-Catalog", + "link": "https://www.pgscatalog.org/score/PGS000027", + "samples": 238944 + } +} diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz b/test-data/data/pgs/test-scores.chr20.txt.gz new file mode 100644 index 00000000..b59f6546 Binary files /dev/null and b/test-data/data/pgs/test-scores.chr20.txt.gz differ diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz.info b/test-data/data/pgs/test-scores.chr20.txt.gz.info new file mode 100644 index 00000000..fef2aaf4 --- /dev/null +++ b/test-data/data/pgs/test-scores.chr20.txt.gz.info @@ -0,0 +1,7 @@ +# PGS-Collection v1 +# Date=Fri Dec 08 08:38:43 CET 2023 +# Scores=2 +# Updated by pgs-calc 1.6.0 +score variants ignored +PGS000018 1745179 0 +PGS000027 2100302 0 diff --git a/test-data/data/pgs/test-scores.chr20.txt.gz.tbi b/test-data/data/pgs/test-scores.chr20.txt.gz.tbi new file mode 100644 index 00000000..a18baec6 Binary files /dev/null and b/test-data/data/pgs/test-scores.chr20.txt.gz.tbi differ diff --git a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt b/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt deleted file mode 100644 index 1a6ec0de..00000000 --- a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt +++ /dev/null @@ -1,100 +0,0 @@ -## PRSweb reference PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608 -## PRSweb LD reference MGI -## PRSweb date 20200608 -## GWAS source 30510241 -## GWAS reference PUBMED -## GWAS phenotype Colorectal cancer -## GWAS id CRC_Huyghe -## GWAS URL https://www.nature.com/articles/s41588-018-0286-6#Sec35 -## PRS method LD Clumping (MAF >= 1%, r^2 <= 0.1) & P-value thresholding (see tuning parameter) -## PRS tuning parameter 7.8e-06 -## PRS evaluation in UKB -## Genome build GRCh37/hg19 -CHROM POS REF ALT EA OA PVALUE WEIGHT -1 38455891 G C G C 3.8e-09 0.0523 -1 55246035 T C C T 3.3e-11 0.0665 -1 183002639 A G A G 2.4e-16 0.073 -1 222112634 A G G A 6.1e-16 0.0877 -2 159964552 T C C T 4.4e-08 0.0511 -2 199612407 T C C T 5e-09 0.0535 -2 199781586 T C T C 3.7e-11 0.0627 -2 219191256 T C T C 1.5e-11 0.0613 -3 40915239 A G G A 1.2e-16 0.0994 -3 66365163 G A A G 7.1e-08 0.0597 -3 112999560 G A G A 1.4e-08 0.1761 -3 133701119 G A A G 3.8e-09 0.0597 -3 169517436 C T C T 7.8e-06 0.0453 -4 94938618 C A A C 1.2e-08 0.052 -4 106128760 G A A G 1.6e-08 0.0522 -4 145659064 T C C T 2.9e-08 0.0842 -5 1240204 C T T C 5.1e-09 0.1119 -5 1296486 A G G A 1.4e-22 0.0865 -5 40102443 G A A G 4.2e-09 0.0545 -5 40280076 G A A G 9.3e-25 0.1013 -5 134467220 C T C T 4.8e-15 0.0693 -6 31449620 C T C T 1.8e-10 0.1118 -6 32593080 A G G A 4.9e-14 0.0889 -6 35569562 A G A G 3.6e-08 0.0778 -6 36623379 G A A G 8.6e-08 0.054 -6 55712124 C T C T 1.1e-11 0.0724 -7 45136423 T C T C 
4.7e-08 0.065 -8 117630683 A C C A 7.3e-28 0.2099 -8 128413305 G T G T 1.1e-15 0.1052 -8 128571855 G T G T 1.8e-09 0.0608 -9 22103183 G T G T 1.4e-08 0.0504 -9 101679752 T G T G 3.1e-08 0.0818 -9 113671403 T C C T 2.8e-09 0.0637 -10 8739580 T A T A 1.3e-25 0.1064 -10 52648454 C T C T 5e-10 0.073 -10 80819132 A G G A 1.8e-17 0.0765 -10 101351704 A G G A 1e-17 0.0889 -10 114288619 T C C T 1.3e-11 0.0975 -10 114722621 G A A G 7e-07 0.0527 -11 61549025 G A G A 1.2e-11 0.0636 -11 74280012 T G G T 8.9e-19 0.078 -11 74427921 C T C T 3.7e-16 0.1934 -11 101656397 T A T A 1.1e-09 0.0537 -11 111156836 T C T C 1.9e-31 0.1122 -12 4368607 T C C T 3.6e-14 0.089 -12 4388271 C T T C 1.6e-15 0.1181 -12 4400808 C T T C 2.4e-09 0.055 -12 6421174 A T T A 4.1e-09 0.0597 -12 43134191 A G G A 1.3e-09 0.053 -12 51171090 A G G A 1.9e-23 0.0896 -12 57533690 C A A C 9.4e-09 0.053 -12 111973358 A G G A 2.6e-16 0.0737 -12 115890922 T C C T 8.1e-14 0.066 -13 34092164 C T C T 3.4e-07 0.0468 -13 37462010 A G G A 6.3e-13 0.0758 -13 73791554 T C C T 2.6e-08 0.0982 -13 111075881 C T T C 1.8e-09 0.0549 -14 54419106 A C C A 2.1e-23 0.0912 -14 54445157 G A G A 3.1e-07 0.0465 -14 59189361 G A G A 9.9e-07 0.0691 -15 32992836 G A G A 1.1e-06 0.0464 -15 33010736 G A A G 2.3e-29 0.1248 -15 33156386 G A A G 1.5e-10 0.0705 -15 67402824 T C C T 2.4e-13 0.0689 -16 68743939 A C A C 3.1e-08 0.055 -16 80043258 C A C A 2.1e-08 0.0498 -16 86339315 T C T C 2.8e-08 0.0487 -16 86703949 C T T C 6.6e-06 0.0481 -17 809643 G A G A 6.8e-08 0.0514 -17 10707241 G A A G 6.6e-12 0.0748 -17 70413253 G A A G 5.6e-09 0.0595 -18 46453156 A T A T 3.8e-74 0.1606 -19 16417198 C T T C 4.2e-10 0.0868 -19 33519927 T G T G 3.7e-23 0.1939 -19 41871573 G A A G 9.5e-07 0.0441 -19 59079096 C T T C 4.2e-08 0.0632 -20 6376457 G C G C 1.1e-16 0.0795 -20 6603622 C T C T 6.9e-12 0.0627 -20 6699595 T G G T 2.3e-18 0.0819 -20 6762221 C T T C 3.3e-14 0.0714 -20 7740976 A G G A 3.4e-13 0.0874 -20 33213196 A C C A 3e-07 0.045 -20 42666475 C T T C 6.8e-09 0.0597 -20 47340117 A G A G 5.9e-15 0.0719 -20 49055318 C T C T 3.3e-09 0.0547 -20 60932414 T C C T 1.1e-26 0.1146 -20 62308612 T G T G 5.3e-08 0.0593 diff --git a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format b/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format deleted file mode 100644 index bef65f66..00000000 --- a/test-data/data/prsweb/PRSWEB_PHECODE153_CRC-Huyghe_PT_UKB_20200608_WEIGHTS.txt.format +++ /dev/null @@ -1,7 +0,0 @@ -{ - "chromosome": "CHROM", - "position": "POS", - "effect_weight": "WEIGHT", - "otherAllele": "OA", - "effectAllele": "EA" -}