🚀 Add matched analyses #2

Closed
wants to merge 15 commits into from
Changes from 14 commits
11 changes: 3 additions & 8 deletions .github/workflows/ci.yml
@@ -20,13 +20,8 @@ jobs:
       - name: Pull Docker image and cache
         run: |
           docker pull papaemmelab/purple:v0.1.1
-      - name: Run unit tests of each process for Amber, Cobalt, Purple
+          docker pull quay.io/biocontainers/hmftools-sage:3.4.4--hdfd78af_0
+      - name: Run unit tests of each process for Amber, Cobalt, binCobalt, Sage, Purple
         run: |
-          nf-test test tests/main.runamber.nf.test
-          nf-test test tests/main.runcobalt.nf.test
-          nf-test test tests/main.bincobalt.nf.test
-          nf-test test tests/main.runpurple.nf.test
-      - name: Run pipeline end-to-end test
-        run: |
-          nf-test test tests/main.nf.test
+          nf-test test tests/main.*.nf.test

3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
# Nextflow run files
nextflow
nf-test
work
capsule
framework
@@ -10,5 +11,7 @@ tmp

# Tests
tests/outdir/*
tests/data/ref/ensembl_data_original
outdir
plugins
slurm*.out
25 changes: 24 additions & 1 deletion README.md
@@ -3,22 +3,45 @@
[![nf-purple CI](https://github.com/papaemmelab/nf-purple/actions/workflows/ci.yml/badge.svg)](https://github.com/papaemmelab/nf-purple/actions/workflows/ci.yml)
[![nf-test](https://img.shields.io/badge/tested_with-nf--test-337ab7.svg)](https://github.com/askimed/nf-test)

Nextflow Pipeline to run [Purple](https://github.com/hartwigmedical/hmftools/blob/master/purple/README.md#tumor-only-mode) in *Tumor-Only* mode, uses [Amber](https://github.com/hartwigmedical/hmftools/tree/master/amber#tumor-only-mode) and [Cobalt](https://github.com/hartwigmedical/hmftools/tree/master/cobalt#tumor-only-mode) from HMFTools suite, of the Hartwig Foundation.
Nextflow Pipeline to run [Purple](https://github.com/hartwigmedical/hmftools/blob/master/purple/README.md) in *Tumor-Only* mode, uses [Amber](https://github.com/hartwigmedical/hmftools/tree/master/amber) and [Cobalt](https://github.com/hartwigmedical/hmftools/tree/master/cobalt) from HMFTools suite, of the Hartwig Foundation.

## 🚀 Run Pipeline

You need Nextflow installed.

### Tumor-Normal matched:

```bash
module load java/jdk-11.0.11

# To run matched pipeline
nextflow run papaemmelab/nf-purple \
    --tumor $tumor \
    --tumor_bam $TUMOR_BAM \
    --normal $normal \
    --normal_bam $NORMAL_BAM \
    --outdir $OUTDIR \
    ...refargs
```

- See more info: [Purple](https://github.com/hartwigmedical/hmftools/blob/master/purple/README.md#arguments), [Amber](https://github.com/hartwigmedical/hmftools/tree/master/amber#paired-normaltumor-mode), [Cobalt](https://github.com/hartwigmedical/hmftools/tree/master/cobalt#mandatory-arguments)

### Tumor only mode:

```bash
module load java/jdk-11.0.11

# To run unmatched tumor-only
nextflow run papaemmelab/nf-purple \
    --tumor $tumor \
    --tumor_bam $TUMOR_BAM \
    --outdir $OUTDIR \
    ...refargs
```

- See more info: [Purple](https://github.com/hartwigmedical/hmftools/blob/master/purple/README.md#tumor-only-mode), [Amber](https://github.com/hartwigmedical/hmftools/tree/master/amber#tumor-only-mode), [Cobalt](https://github.com/hartwigmedical/hmftools/tree/master/cobalt#tumor-only-mode)


## 🧬 Get Reference Data

Downloaded from [Purple Ref Data](https://console.cloud.google.com/storage/browser/hmf-public/HMFtools-Resources/dna_pipeline) for genome version 37.
96 changes: 96 additions & 0 deletions bin/bin_cobalt.py
@@ -0,0 +1,96 @@
#!/usr/bin/env python3

import argparse
import shutil
import pandas as pd
import numpy as np

parser = argparse.ArgumentParser(
    description=(
        "Bin cobalt probes with similar LogR values "
        "together to decrease oversegmentation."
    )
)
parser.add_argument(
    "--in_pcf",
    type=str,
    required=True,
    help="Path to the input cobalt ratio .pcf file.",
)
parser.add_argument(
    "--bin_probes",
    type=int,
    required=True,
    help="Max probe bin size."
)
parser.add_argument(
    "--bin_log_r",
    type=float,
    required=True,
    help="Max probe logR difference to bin."
)
args = parser.parse_args()

cobalt_ratio_pcf = pd.read_csv(args.in_pcf, sep="\t")
cobalt_ratio_pcf_probes = pd.DataFrame(columns=cobalt_ratio_pcf.columns)

# First bin by probes
chrom_arm = None
last_idx = None
for idx, seg in cobalt_ratio_pcf.iterrows():
    if chrom_arm != "_".join(seg[["chrom", "arm"]].astype(str)):
        chrom_arm = "_".join(seg[["chrom", "arm"]].astype(str))
        cobalt_ratio_pcf_probes = pd.concat(
            [cobalt_ratio_pcf_probes, seg.to_frame().T], ignore_index=True
        )
        last_idx = cobalt_ratio_pcf_probes.index[-1]
        continue
    if (
        cobalt_ratio_pcf_probes.loc[last_idx, "n.probes"] <= args.bin_probes
        or seg["n.probes"] <= args.bin_probes
    ):
        means = [
            cobalt_ratio_pcf_probes.loc[last_idx, "mean"]
        ] * cobalt_ratio_pcf_probes.loc[last_idx, "n.probes"]
        means.extend([seg["mean"]] * seg["n.probes"])
        cobalt_ratio_pcf_probes.loc[last_idx, "mean"] = np.mean(means)
        cobalt_ratio_pcf_probes.loc[last_idx, "n.probes"] += seg["n.probes"]
        cobalt_ratio_pcf_probes.loc[last_idx, "end.pos"] = seg["end.pos"]
    else:
        cobalt_ratio_pcf_probes = pd.concat(
            [cobalt_ratio_pcf_probes, seg.to_frame().T], ignore_index=True
        )
        last_idx = cobalt_ratio_pcf_probes.index[-1]

# Then bin by logR mean
cobalt_ratio_pcf_probes = cobalt_ratio_pcf_probes.reset_index().drop(columns="index")
cobalt_ratio_pcf_probes_logR = pd.DataFrame(columns=cobalt_ratio_pcf_probes.columns)
chrom_arm = None
for idx, seg in cobalt_ratio_pcf_probes.iterrows():
    if chrom_arm != "_".join(seg[["chrom", "arm"]].astype(str)):
        chrom_arm = "_".join(seg[["chrom", "arm"]].astype(str))
        cobalt_ratio_pcf_probes_logR = pd.concat(
            [cobalt_ratio_pcf_probes_logR, seg.to_frame().T], ignore_index=True
        )
        last_idx = cobalt_ratio_pcf_probes_logR.index[-1]
        continue
    # last_idx indexes the logR-binned table, so compare against the last
    # merged segment there rather than a row of the probe-binned input.
    if (
        abs(cobalt_ratio_pcf_probes_logR.loc[last_idx, "mean"] - seg["mean"])
        <= args.bin_log_r
    ):
        means = [
            cobalt_ratio_pcf_probes_logR.loc[last_idx, "mean"]
        ] * cobalt_ratio_pcf_probes_logR.loc[last_idx, "n.probes"]
        means.extend([seg["mean"]] * seg["n.probes"])
        cobalt_ratio_pcf_probes_logR.loc[last_idx, "mean"] = np.mean(means)
        cobalt_ratio_pcf_probes_logR.loc[last_idx, "n.probes"] += seg["n.probes"]
        cobalt_ratio_pcf_probes_logR.loc[last_idx, "end.pos"] = seg["end.pos"]
    else:
        cobalt_ratio_pcf_probes_logR = pd.concat(
            [cobalt_ratio_pcf_probes_logR, seg.to_frame().T], ignore_index=True
        )
        last_idx = cobalt_ratio_pcf_probes_logR.index[-1]

# store input with another name to replace original
shutil.move(args.in_pcf, args.in_pcf.replace(".pcf", ".original.pcf"))
cobalt_ratio_pcf_probes_logR.to_csv(args.in_pcf, sep="\t", index=False)
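
Both passes in `bin_cobalt.py` merge a segment into the previous one by repeating each segment mean `n.probes` times and averaging the expanded list. The sketch below is only an illustration of that arithmetic, not part of the PR: `merge_segments` is a hypothetical helper, and plain dicts stand in for the DataFrame rows, reusing the `mean`, `n.probes` and `end.pos` column names from the script. It suggests that a closed-form probe-weighted mean gives the same value without building the list.

```python
import math


def merge_segments(prev, seg):
    """Merge `seg` into `prev` using a probe-weighted mean.

    Hypothetical helper (not in bin_cobalt.py). Both arguments are dicts
    with the keys 'mean', 'n.probes' and 'end.pos'.
    """
    total_probes = prev["n.probes"] + seg["n.probes"]
    weighted_mean = (
        prev["mean"] * prev["n.probes"] + seg["mean"] * seg["n.probes"]
    ) / total_probes
    return {
        "mean": weighted_mean,
        "n.probes": total_probes,
        "end.pos": seg["end.pos"],  # the merged segment ends where `seg` ends
    }


# Toy segments: 3 probes at logR 0.10 followed by 1 probe at logR 0.50.
prev = {"mean": 0.10, "n.probes": 3, "end.pos": 3000}
seg = {"mean": 0.50, "n.probes": 1, "end.pos": 4000}
merged = merge_segments(prev, seg)

# List-expansion average, as done in the script, for comparison.
expanded = [prev["mean"]] * prev["n.probes"] + [seg["mean"]] * seg["n.probes"]
expanded_mean = sum(expanded) / len(expanded)

print(merged, math.isclose(merged["mean"], expanded_mean))
```

The closed form avoids allocating a list whose length equals the total probe count, which may matter for long segments. Whether that is worth changing in the script is a judgment call for the authors.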