Threading groups of inputs #29

Open
drtconway opened this issue Jun 4, 2020 · 0 comments

@drtconway

Hi Janis,

There's a reasonably common bit of boilerplate that comes up when composing tools: declaring inputs for all the reference-like things that are the same as those in one or more of the invoked tools, and then threading them in.

For example:

...
        self.input(
            "snps_dbsnp",
            VcfTabix,
            doc=InputDocumentation(
                "From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``",
                quality=InputQualityType.static,
                example="HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/\n\n"
                "(WARNING: The file available from the genomics-public-data resource on Google Cloud Storage is NOT compressed and indexed. This will need to be completed prior to starting the pipeline.\n\n"
                "File: gs://genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.gz",
            ),
        )
        self.input(
            "snps_1000gp",
            VcfTabix,
            doc=InputDocumentation(
                "From the GATK resource bundle, passed to BaseRecalibrator as ``known_sites``",
                quality=InputQualityType.static,
                example="HG38: https://console.cloud.google.com/storage/browser/genomics-public-data/references/hg38/v0/\n\n"
                "File: gs://genomics-public-data/references/hg38/v0/1000G_phase1.snps.high_confidence.hg38.vcf.gz",
            ),
        )

...

        self.step(
            "vc_gatk",
            GatkSomaticVariantCaller_4_1_3(
                normal_bam=self.normal_bam,
                tumor_bam=self.tumor_bam,
                normal_name=self.normal_name,
                tumor_name=self.tumor_name,
                intervals=self.gatk_intervals,
                reference=self.reference,
                snps_dbsnp=self.snps_dbsnp,
                snps_1000gp=self.snps_1000gp,
                known_indels=self.known_indels,
                mills_indels=self.mills_indels,
            ),
            scatter="intervals",
        )

This leads to duplication and leaves room for error.

One possibility would be to use a static method to add a group of inputs. For example, you might have:

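class GatkSomaticVariantCaller_4_1_3(....):
    ...
    @staticmethod
    def reference_inputs(thing):
        # Declare the shared reference inputs on whichever workflow is passed in
        thing.input("snps_dbsnp", ...)
        thing.input("snps_1000gp", ...)
        # ...etc...

    def constructor(self):
        # A staticmethod still has to be reached through the class or an instance
        self.reference_inputs(self)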
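A minimal, framework-free sketch of the static-method idea, assuming the group is described once as a table of (name, type, doc) entries that both the tool and an outer workflow feed into `input()`. `MiniWorkflow`, `REFERENCE_INPUTS`, `KNOWN_SITES_DOC` and the simplified `GatkSomaticVariantCaller` are hypothetical stand-ins, not Janis API; the type names are placeholder strings where a real workflow would pass `VcfTabix` and an `InputDocumentation`.

```python
class MiniWorkflow:
    """Hypothetical stand-in for a Janis workflow: it only records declared inputs."""

    def __init__(self):
        self.inputs = {}

    def input(self, name, datatype, doc=None):
        # Refuse duplicates so the same group cannot be added twice by accident
        if name in self.inputs:
            raise ValueError(f"input {name!r} already declared")
        self.inputs[name] = (datatype, doc)


KNOWN_SITES_DOC = "From the GATK resource bundle, passed to BaseRecalibrator as known_sites"


class GatkSomaticVariantCaller(MiniWorkflow):
    # The shared group, described once (type names are placeholder strings)
    REFERENCE_INPUTS = [
        ("snps_dbsnp", "VcfTabix", KNOWN_SITES_DOC),
        ("snps_1000gp", "VcfTabix", KNOWN_SITES_DOC),
    ]

    @staticmethod
    def reference_inputs(thing):
        # Declare every input in the group on whichever workflow is passed in
        for name, datatype, doc in GatkSomaticVariantCaller.REFERENCE_INPUTS:
            thing.input(name, datatype, doc=doc)

    def constructor(self):
        # The tool declares its own copy of the group...
        self.reference_inputs(self)


tool = GatkSomaticVariantCaller()
tool.constructor()

# ...and an outer workflow reuses exactly the same declarations
outer = MiniWorkflow()
GatkSomaticVariantCaller.reference_inputs(outer)
```

Because the names, types and docs live in one table, the outer workflow's declarations cannot drift from the tool's; the duplicate check in the stand-in is just one way to catch a group being added twice.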

This doesn't help much with passing the inputs through, though. Half an idea for reducing that is to use Python's keyword-argument unpacking. It seems like you should be able to do something like:

    self.step("vc_gatk", GatkSomaticVariantCaller_4_1_3(..., **refs))
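A small sketch of how `**refs` behaves, using a hypothetical `make_caller` function with keyword-only parameters in place of the real `GatkSomaticVariantCaller_4_1_3` constructor, and placeholder strings in place of `self.snps_dbsnp` and friends. The point is that plain Python argument binding already catches some of the "room for error".

```python
def make_caller(*, normal_bam, tumor_bam, snps_dbsnp, snps_1000gp, known_indels, mills_indels):
    """Hypothetical stand-in for the GatkSomaticVariantCaller_4_1_3(...) constructor."""
    # Echo back what each input was wired to, so the result is easy to inspect
    return {
        "normal_bam": normal_bam,
        "tumor_bam": tumor_bam,
        "snps_dbsnp": snps_dbsnp,
        "snps_1000gp": snps_1000gp,
        "known_indels": known_indels,
        "mills_indels": mills_indels,
    }


# The shared reference inputs bundled into one dict; in a workflow the values
# would be self.snps_dbsnp, self.snps_1000gp, and so on (strings are placeholders)
refs = {
    "snps_dbsnp": "wf.snps_dbsnp",
    "snps_1000gp": "wf.snps_1000gp",
    "known_indels": "wf.known_indels",
    "mills_indels": "wf.mills_indels",
}

# Per-sample inputs stay explicit; the shared group is unpacked with **refs
wired = make_caller(normal_bam="wf.normal_bam", tumor_bam="wf.tumor_bam", **refs)

# Python rejects a key that is passed both explicitly and via **refs
try:
    make_caller(normal_bam="n", tumor_bam="t", snps_dbsnp="other", **refs)
except TypeError as err:
    print("rejected:", err)
```

The same binding rules raise `TypeError` if `refs` is missing one of the required names or contains a name the callee does not accept, so a typo in the dictionary fails loudly instead of leaving an input unwired. Whether the real Janis constructor gives the same guarantees depends on its signature.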

I don't quite have it figured out, but perhaps a static method on GatkSomaticVariantCaller could build and return the refs dictionary.
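A sketch of the "static method returns the dictionary" half-idea, assuming the tool keeps a single tuple of shared input names and builds the mapping with `getattr(thing, name)`, which in Python is the same lookup as writing `thing.snps_dbsnp`. `InputRef`, `MiniWorkflow`, `REFERENCE_INPUT_NAMES`, `reference_refs` and the simplified `GatkSomaticVariantCaller` are all hypothetical stand-ins; `MiniWorkflow.__getattr__` only imitates the way the issue's code reads declared inputs as `self.<name>`.

```python
class InputRef:
    """Hypothetical stand-in for what a workflow returns for self.<input_name>."""

    def __init__(self, name):
        self.name = name


class MiniWorkflow:
    """Hypothetical stand-in for a workflow builder; not Janis's actual API."""

    def __init__(self):
        self._inputs = {}
        self.steps = {}

    def input(self, name, datatype=None):
        self._inputs[name] = InputRef(name)

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose declared inputs as attributes
        inputs = self.__dict__.get("_inputs", {})
        if name in inputs:
            return inputs[name]
        raise AttributeError(name)

    def step(self, name, tool):
        self.steps[name] = tool


class GatkSomaticVariantCaller:
    # Names of the shared reference inputs, listed once
    REFERENCE_INPUT_NAMES = ("snps_dbsnp", "snps_1000gp", "known_indels", "mills_indels")

    def __init__(self, **inputs):
        # Stand-in constructor: just remember what each input was wired to
        self.wiring = inputs

    @staticmethod
    def reference_refs(thing):
        # Map each tool input name to the outer workflow's input of the same name
        return {name: getattr(thing, name) for name in GatkSomaticVariantCaller.REFERENCE_INPUT_NAMES}


wf = MiniWorkflow()
for name in ("normal_bam", "tumor_bam", *GatkSomaticVariantCaller.REFERENCE_INPUT_NAMES):
    wf.input(name)

wf.step(
    "vc_gatk",
    GatkSomaticVariantCaller(
        normal_bam=wf.normal_bam,
        tumor_bam=wf.tumor_bam,
        **GatkSomaticVariantCaller.reference_refs(wf)
    ),
)
```

With the names kept in one place, adding a reference file updates the wiring automatically, and an outer workflow that forgot to declare one of them fails with `AttributeError` when the dictionary is built rather than producing a step with an input silently missing.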

To quote Terry Pratchett, speaking of Ly Tin Wheedle: "at that point the bar closed."
