Support reference panels in msav format and decay parameter (#128)
* Add new minimac4 binary and adapt execution command

* Add msav reference panels

* Adapt test data to include GT tag

* Fix input data

* Fix input data for hg38 test cases

* Ignore obsolete --myChromosome option

* Update run-tests.yml

* Update run-tests.yml

* Update to new parameter --sites for writing info files

* Add new minR2 parameter to minimac4 and remove R2 pipeline filter

* Add prefix mis_ to all MIS tags in VCF file

* Fix VCF header test case with new prefix

* Update to latest minimac4 binary

* Add min-r2 parameter to command only if minR2 > 0

* Update to pgs-calc version v0.9.10

* Prepare release v1.7.0-rc1

* Update to latest minimac4 (commit 9c653eb)

* Prepare release 1.7.0-rc2

* Update minimac4 to 8498d

* Add HTS tag to m4 format option

* Update minimac4 and rebuild msav files

* Update DefaultPreferenceStore.java

* Fix line number in new test case (from master) due to new M4

* Update to minimac v4.1.0

* Update to minimac v4.1.4

* Add minimac4 decay option

* Prepare release v1.8.0-beta1

* Update imputation engine log to v4.1.4

* Prepare release v1.8.0-beta2

* Update to minimac v4.1.5

* Prepare release v1.8.0-beta3

* Prepare release v1.8.0-beta4 (#126)

Provenance: from installer script
https://github.com/statgen/Minimac4/releases/tag/v4.1.6

---------

Co-authored-by: Andy Boughton
seppinho and abought authored Dec 6, 2023
1 parent 09870a0 commit f9a0270
Showing 41 changed files with 117 additions and 115 deletions.
Binary file removed files/bin/Minimac4
Binary file not shown.
Binary file added files/bin/minimac4
Binary file not shown.
2 changes: 1 addition & 1 deletion files/imputationserver-beagle.yaml
@@ -1,7 +1,7 @@
id: imputationserver-beagle
name: Genotype Imputation supporting Beagle (Minimac4)
description: This is the new Michigan Imputation Server Pipeline using <a href="https://github.com/statgen/Minimac4">Minimac4</a>. Documentation can be found <a href="http://imputationserver.readthedocs.io/en/latest/">here</a>.<br><br>If your input data is <b>GRCh37/hg19</b> please ensure chromosomes are encoded without prefix (e.g. <b>20</b>).<br>If your input data is <b>GRCh38hg38</b> please ensure chromosomes are encoded with prefix 'chr' (e.g. <b>chr20</b>).
version: 1.7.5
version: 1.8.0-beta4
website: https://imputationserver.readthedocs.io
category:

2 changes: 1 addition & 1 deletion files/imputationserver-hla.yaml
@@ -1,7 +1,7 @@
id: imputationserver-hla
name: Genotype Imputation HLA (Minimac4)
description: This is the new Michigan Imputation Server Pipeline using <a href="https://github.com/statgen/Minimac4">Minimac4</a>. Documentation can be found <a href="http://imputationserver.readthedocs.io/en/latest/">here</a>.<br><br>If your input data is <b>GRCh37/hg19</b> please ensure chromosomes are encoded without prefix (e.g. <b>20</b>).<br>If your input data is <b>GRCh38hg38</b> please ensure chromosomes are encoded with prefix 'chr' (e.g. <b>chr20</b>).
version: 1.7.5
version: 1.8.0-beta4
website: https://imputationserver.readthedocs.io
category:

2 changes: 1 addition & 1 deletion files/imputationserver-pgs.yaml
@@ -1,7 +1,7 @@
id: imputationserver-pgs
name: Genotype Imputation (PGS Calc Integration)
description: This is the new Michigan Imputation Server Pipeline using <a href="https://github.com/statgen/Minimac4">Minimac4</a>. Documentation can be found <a href="http://imputationserver.readthedocs.io/en/latest/">here</a>.<br><br>If your input data is <b>GRCh37/hg19</b> please ensure chromosomes are encoded without prefix (e.g. <b>20</b>).<br>If your input data is <b>GRCh38hg38</b> please ensure chromosomes are encoded with prefix 'chr' (e.g. <b>chr20</b>).
version: 1.7.5
version: 1.8.0-beta4
website: https://imputationserver.readthedocs.io
category:

4 changes: 2 additions & 2 deletions files/minimac4.yaml
@@ -1,7 +1,7 @@
id: imputationserver
name: Genotype Imputation (Minimac4)
description: This is the new Michigan Imputation Server Pipeline using <a href="https://github.com/statgen/Minimac4">Minimac4</a>. Documentation can be found <a href="http://imputationserver.readthedocs.io/en/latest/">here</a>.<br><br>If your input data is <b>GRCh37/hg19</b> please ensure chromosomes are encoded without prefix (e.g. <b>20</b>).<br>If your input data is <b>GRCh38hg38</b> please ensure chromosomes are encoded with prefix 'chr' (e.g. <b>chr20</b>).
version: 1.7.5
description: This is the new Michigan Imputation Server Pipeline using <a href="https://github.com/statgen/Minimac4">Minimac4</a>. Documentation can be found <a href="http://imputationserver.readthedocs.io/en/latest/">here</a>.<br><br>If your input data is <b>GRCh37/hg19</b> please ensure chromosomes are encoded without prefix (e.g. <b>20</b>).<br>If your input data is <b>GRCh38hg38</b> please ensure chromosomes are encoded with prefix 'chr' (e.g. <b>chr20</b>).
version: 1.8.0-beta4
website: https://imputationserver.readthedocs.io
category:

5 changes: 1 addition & 4 deletions pom.xml
@@ -5,11 +5,8 @@

<groupId>genepi</groupId>
<artifactId>imputationserver</artifactId>

<version>1.7.5</version>

<version>1.8.0-beta4</version>
<packaging>jar</packaging>

<name>University of Michigan Imputation Server</name>
<url>http://maven.apache.org</url>

@@ -117,7 +117,6 @@ protected void setup(Context context) throws IOException, InterruptedException {
String referenceName = parameters.get(ImputationJob.REF_PANEL);
imputationParameters.setPhasing(phasingEngine);
imputationParameters.setReferencePanelName(referenceName);
imputationParameters.setMinR2(minR2);
imputationParameters.setPhasingRequired(phasingRequired);

// get cached files
@@ -153,11 +152,11 @@ protected void setup(Context context) throws IOException, InterruptedException {
mapBeagleFilename = cache.getFile(mapBeagle);
}

String minimacCommand = cache.getFile("Minimac4");
String minimacCommand = cache.getFile("minimac4");
String eagleCommand = cache.getFile("eagle");
String beagleCommand = cache.getFile("beagle.jar");
String tabixCommand = cache.getFile("tabix");

// create temp directory
DefaultPreferenceStore store = new DefaultPreferenceStore(context.getConfiguration());
folder = store.getString("minimac.tmp");
@@ -182,9 +181,9 @@ protected void setup(Context context) throws IOException, InterruptedException {
String formatFile = cache.getFile(name + ".format");
if (formatFile != null) {
// create symbolic link to format file. they have to be in the same folder
Files.createSymbolicLink(Paths.get(FileUtil.path(folder,name)), Paths.get(localFilename));
Files.createSymbolicLink(Paths.get(FileUtil.path(folder,name+".format")), Paths.get(formatFile));
scores[i] = FileUtil.path(folder,name);
Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name)), Paths.get(localFilename));
Files.createSymbolicLink(Paths.get(FileUtil.path(folder, name + ".format")), Paths.get(formatFile));
scores[i] = FileUtil.path(folder, name);
}
}
System.out.println("Loaded " + scores.length + " score files from distributed cache");
@@ -212,6 +211,7 @@ protected void setup(Context context) throws IOException, InterruptedException {
int phasingWindow = Integer.parseInt(store.getString("phasing.window"));

int window = Integer.parseInt(store.getString("minimac.window"));
int decay = Integer.parseInt(store.getString("minimac.decay"));

String minimacParams = store.getString("minimac.command");
String eagleParams = store.getString("eagle.command");
@@ -226,6 +226,8 @@ protected void setup(Context context) throws IOException, InterruptedException {
pipeline.setPhasingWindow(phasingWindow);
pipeline.setBuild(build);
pipeline.setMinimacWindow(window);
pipeline.setMinR2(minR2);
pipeline.setDecay(decay);

}

@@ -289,16 +291,8 @@ public void map(LongWritable key, Text value, Context context) throws IOExceptio
statistics.setImportTime((end - start) / 1000);

} else {
if (imputationParameters.getMinR2() > 0) {
// filter by r2
String filteredInfoFilename = outputChunk.getInfoFilename() + "_filtered";
filterInfoFileByR2(outputChunk.getInfoFilename(), filteredInfoFilename,
imputationParameters.getMinR2());
HdfsUtil.put(filteredInfoFilename, HdfsUtil.path(output, chunk + ".info"));

} else {
HdfsUtil.put(outputChunk.getInfoFilename(), HdfsUtil.path(output, chunk + ".info"));
}

HdfsUtil.put(outputChunk.getInfoFilename(), HdfsUtil.path(output, chunk + ".info"));

long start = System.currentTimeMillis();

@@ -24,9 +24,10 @@

public class ImputationPipeline {

public static final String PIPELINE_VERSION = "michigan-imputationserver-1.7.5";

public static final String IMPUTATION_VERSION = "minimac4-1.0.2";
public static final String PIPELINE_VERSION = "michigan-imputationserver-1.8.0-beta4";

public static final String IMPUTATION_VERSION = "minimac-v4.1.6";

public static final String BEAGLE_VERSION = "beagle.18May20.d20.jar";

@@ -48,8 +49,12 @@ public class ImputationPipeline {

private int minimacWindow;

private int minimacDecay;

private int phasingWindow;

private double minR2;

private String refFilename;

private String mapMinimac;
@@ -288,6 +293,16 @@ public boolean phaseWithBeagle(VcfChunk input, VcfChunkOutput output, String ref
public boolean imputeVCF(VcfChunkOutput output)
throws InterruptedException, IOException, CompilationFailedException {

// create tabix index
Command tabix = new Command(tabixCommand);
tabix.setSilent(false);
tabix.setParams(output.getPhasedVcfFilename());
System.out.println("Command: " + tabix.getExecutedCommand());
if (tabix.execute() != 0) {
System.out.println("Error during index creation: " + tabix.getStdOut());
return false;
}

String chr = "";
if (build.equals("hg38")) {
chr = "chr" + output.getChromosome();
@@ -306,6 +321,8 @@ public boolean imputeVCF(VcfChunkOutput output)
binding.put("chr", chr);
binding.put("unphased", false);
binding.put("mapMinimac", mapMinimac);
binding.put("minR2", minR2);
binding.put("decay", minimacDecay);

String[] params = createParams(minimacParams, binding);

@@ -345,11 +362,11 @@ private boolean runPgsCalc(VcfChunkOutput output) {
task.setVcfFilename(output.getImputedVcfFilename());
task.setChunk(scoreChunk);
task.setRiskScoreFilenames(scores);
//TODO: enable fix-strand-flips
//task.setFixStrandFlips(true);
//task.setRemoveAmbiguous(true);

// TODO: enable fix-strand-flips
// task.setFixStrandFlips(true);
// task.setRemoveAmbiguous(true);

for (String file : scores) {
String autoFormat = file + ".format";
if (new File(autoFormat).exists()) {
@@ -474,4 +491,13 @@ public void setMapBeagleFilename(String mapBeagleFilename) {
this.mapBeagleFilename = mapBeagleFilename;
}

public void setMinR2(double minR2) {
this.minR2 = minR2;
}

public void setDecay(int decay) {
this.minimacDecay = decay;

}

}
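
The commit message describes adding `--min-r2` to the minimac4 command only when `minR2 > 0`. A minimal sketch of that conditional assembly follows. It assumes a hypothetical `buildArgs` helper and made-up file names; the project itself expands a command template through `createParams`, which is not reproduced here. The flag names `--region`, `--decay` and `--min-r2` are taken from the command template in this commit.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimacArgsSketch {

    // Hypothetical helper, not part of the project: assembles a reduced
    // minimac4 argument list and appends --min-r2 only for a positive threshold.
    public static List<String> buildArgs(String region, int decay, double minR2,
                                         String ref, String vcf) {
        List<String> args = new ArrayList<>();
        args.add("--region");
        args.add(region);
        args.add("--decay");
        args.add(String.valueOf(decay));
        if (minR2 > 0) {
            // Leave the flag out entirely when no filtering is requested.
            args.add("--min-r2");
            args.add(String.valueOf(minR2));
        }
        // Reference panel and target file are positional in the new template.
        args.add(ref);
        args.add(vcf);
        return args;
    }

    public static void main(String[] args) {
        // File names here are placeholders for illustration only.
        System.out.println(String.join(" ",
                buildArgs("chr20:1-20000000", 0, 0.3, "ref.msav", "target.vcf.gz")));
    }
}
```

Building the arguments as a list rather than a single string avoids quoting problems when paths contain spaces, at the cost of being less configurable than a template.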
@@ -71,11 +71,12 @@ public static Properties defaults() {
defaults.setProperty("chunksize", "20000000");
defaults.setProperty("phasing.window", "5000000");
defaults.setProperty("minimac.window", "500000");
defaults.setProperty("minimac.decay", "0");
defaults.setProperty("minimac.sendmail", "no");
defaults.setProperty("server.url", "https://imputationserver.sph.umich.edu");
defaults.setProperty("minimac.tmp", "/tmp");
defaults.setProperty("minimac.command",
"--refHaps ${ref} --haps ${vcf} --start ${start} --end ${end} --window ${window} --prefix ${prefix} --chr ${chr} --cpus 1 --noPhoneHome --format GT,DS,GP --allTypedSites --meta --minRatio 0.00001 ${chr =='MT' ? '--myChromosome ' + chr : ''} ${unphased ? '--unphasedOutput' : ''} ${mapMinimac != null ? '--referenceEstimates --map ' + mapMinimac : ''}");
"--region ${chr}:${start}-${end} --overlap ${window} --output ${prefix}.dose.vcf.gz --output-format vcf.gz --format GT,DS,GP,HDS --min-ratio 0.00001 --decay ${decay} --all-typed-sites --sites ${prefix}.info --empirical-output ${prefix}.empiricalDose.vcf.gz ${minR2 != 0 ? '--min-r2 ' + minR2 : ''} ${mapMinimac != null ? '--map ' + mapMinimac : ''} ${ref} ${vcf}");
defaults.setProperty("eagle.command",
"--vcfRef ${ref} --vcfTarget ${vcf} --geneticMapFile ${map} --outPrefix ${prefix} --bpStart ${start} --bpEnd ${end} --allowRefAltSwap --vcfOutFormat z --keepMissingPloidyX");
defaults.setProperty("beagle.command",
51 changes: 20 additions & 31 deletions src/main/java/genepi/imputationserver/util/FileMerger.java
@@ -24,35 +24,18 @@ public static void splitIntoHeaderAndData(String input, OutputStream outHeader,

while (reader.next()) {
String line = reader.get();

if (!line.startsWith("#")) {
if (parameters.getMinR2() > 0) {
// rsq set. parse line and check rsq
String info = parseInfo(line);
if (info != null) {
boolean keep = keepVcfLineByInfo(info, R2_FLAG, parameters.getMinR2());
if (keep) {
outData.write(line.getBytes());
outData.write("\n".getBytes());
}
} else {
// no valid vcf line. keep line
outData.write(line.getBytes());
outData.write("\n".getBytes());
}
} else {
// no rsq set. keep all lines without parsing
outData.write(line.getBytes());
outData.write("\n".getBytes());
}
outData.write(line.getBytes());
outData.write("\n".getBytes());
} else {

// write filter command before ID List starting with #CHROM
if (line.startsWith("#CHROM")) {
outHeader.write(("##pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes());
outHeader.write(("##imputation=" + ImputationPipeline.IMPUTATION_VERSION + "\n").getBytes());
outHeader.write(("##phasing=" + parameters.getPhasingMethod() + "\n").getBytes());
outHeader.write(("##panel=" + parameters.getReferencePanelName() + "\n").getBytes());
outHeader.write(("##r2Filter=" + parameters.getMinR2() + "\n").getBytes());
outHeader.write(("##mis_pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes());
outHeader.write(("##mis_imputation=" + ImputationPipeline.IMPUTATION_VERSION + "\n").getBytes());
outHeader.write(("##mis_phasing=" + parameters.getPhasingMethod() + "\n").getBytes());
outHeader.write(("##mis_panel=" + parameters.getReferencePanelName() + "\n").getBytes());
}

// write all headers except minimac4 command
@@ -85,9 +68,9 @@ public static void splitPhasedIntoHeaderAndData(String input, OutputStream outHe

// write filter command before ID List starting with #CHROM
if (line.startsWith("#CHROM")) {
outHeader.write(("##pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes());
outHeader.write(("##phasing=" + parameters.getPhasingMethod() + "\n").getBytes());
outHeader.write(("##panel=" + parameters.getReferencePanelName() + "\n").getBytes());
outHeader.write(("##mis_pipeline=" + ImputationPipeline.PIPELINE_VERSION + "\n").getBytes());
outHeader.write(("##mis_phasing=" + parameters.getPhasingMethod() + "\n").getBytes());
outHeader.write(("##mis_panel=" + parameters.getReferencePanelName() + "\n").getBytes());
}

// write all headers except eagle command
@@ -129,24 +112,30 @@ public static void mergeAndGzInfo(List<String> hdfs, String local) throws IOExce

LineReader reader = new LineReader(in);

boolean header = true;
boolean lineBreak = false;

while (reader.next()) {

String line = reader.get();

if (header) {
if (line.startsWith("#")) {

if (firstFile) {

if (lineBreak) {
out.write('\n');
}
out.write(line.toString().getBytes());
firstFile = false;
lineBreak = true;
}
header = false;
} else {
out.write('\n');
out.write(line.toString().getBytes());
}
}

firstFile = false;

in.close();

}
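
The `mergeAndGzInfo` change above appears to keep lines starting with `#` only from the first chunk while always keeping data lines. A minimal in-memory sketch of that rule follows. It assumes a hypothetical `merge` method working on lists of strings instead of HDFS streams, and it leaves out the gzip output and line-break handling of the real method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class InfoMergeSketch {

    // Hypothetical in-memory variant: each inner list holds the lines of one
    // per-chunk info file, in the order the chunks should be concatenated.
    public static List<String> merge(List<List<String>> files) {
        List<String> merged = new ArrayList<>();
        boolean firstFile = true;
        for (List<String> lines : files) {
            for (String line : lines) {
                if (line.startsWith("#")) {
                    // Header lines are copied from the first file only.
                    if (firstFile) {
                        merged.add(line);
                    }
                } else {
                    // Data lines are always copied.
                    merged.add(line);
                }
            }
            firstFile = false;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder content; real info files carry VCF-style records.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("##header", "#CHROM", "record-1"),
                Arrays.asList("##header", "#CHROM", "record-2"));
        merge(files).forEach(System.out::println);
    }
}
```

Testing for a leading `#` on every line, rather than assuming a single header line per file, matters once the info output is a sites-only VCF with a multi-line header.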
@@ -6,8 +6,6 @@ public class ImputationParameters {

private String referencePanelName;

private double minR2;

private String phasing;

private boolean phasingRequired;
@@ -20,14 +18,6 @@ public void setReferencePanelName(String referencePanelName) {
this.referencePanelName = referencePanelName;
}

public double getMinR2() {
return minR2;
}

public void setMinR2(double minR2) {
this.minR2 = minR2;
}

public String getPhasing() {
return phasing;
}