vanallenlab/SigProfilerAssignment-wrapper

SigProfilerAssignment wrapper

Please read the SigProfilerAssignment manuscript, review its GitHub repository and official documentation, and read COSMIC's mutational signatures page before using code in this repository.

This repository contains a wrapper around SigProfilerAssignment, an algorithm by the Alexandrov Lab at UCSD that enables the assignment of previously known mutational signatures to individual samples and individual somatic mutations. As of this writing, SigProfilerAssignment has not yet been published in a peer-reviewed journal and its GitHub repository continues to be developed. So, while the requirements file specifies a release, the code in this repository may eventually break as the authors make updates.

SigProfilerAssignment does not currently work in containers :(

Installation and set up

This repository uses Python 3.11. To use code in this repository, download this software from GitHub, install Python dependencies within a virtual environment, and install a reference genome.

Download this software from GitHub

This repository can be downloaded from the GitHub website or with git in a terminal. To download from the website, navigate to the top of this page, click the green Clone or download button, and select Download ZIP. This downloads the repository in a compressed format. To clone it from a terminal, run:

git clone https://github.com/vanallenlab/SigProfilerAssignment-wrapper.git
cd SigProfilerAssignment-wrapper

Install Python dependencies

This repository uses Python 3.11. We recommend using a virtual environment and running Python with either Anaconda or Miniconda.

To create a virtual environment and install dependencies with Anaconda or Miniconda, run the following from this repository's directory:

conda create -y -n SigProfilerAssignment python=3.11
conda activate SigProfilerAssignment
pip install -r requirements.txt

If you are using base Python, you can create a virtual environment and install dependencies by running:

virtualenv venv_SigProfilerAssignment
source venv_SigProfilerAssignment/bin/activate
pip install -r requirements.txt

Install reference genome

SigProfilerAssignment uses SigProfilerMatrixGenerator for matrix generation, which requires a reference genome to be installed in the virtual environment. The script install_reference_genome.py will install a reference genome for use.

This script uses SigProfilerMatrixGenerator, which will warn that this step takes 40+ minutes. In practice it has never taken more than 5-10 minutes using either my home or Dana-Farber internet.

Required arguments:

    --reference     <string>    reference genome to install; default=GRCh37; choices=GRCh38, GRCh37, mm9, mm10, rn6

Example:

python install_reference_genome.py --reference GRCh37
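For orientation, a script like this can be a thin argparse layer over SigProfilerMatrixGenerator's installer. The sketch below is an assumption about its general shape, not the contents of install_reference_genome.py: `build_parser` and `install_reference` are hypothetical names, and the only library call is the `install.install` function that SigProfilerMatrixGenerator documents for installing a genome.

```python
import argparse

# Choices mirror the ones listed for --reference above.
GENOMES = ["GRCh38", "GRCh37", "mm9", "mm10", "rn6"]


def build_parser():
    """Build a parser with a single --reference option (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Install a reference genome")
    parser.add_argument("--reference", default="GRCh37", choices=GENOMES,
                        help="reference genome to install")
    return parser


def install_reference(genome):
    """Install `genome` through SigProfilerMatrixGenerator (hypothetical helper)."""
    # Imported lazily so the parser works even if the package is not installed.
    from SigProfilerMatrixGenerator import install as genInstall
    genInstall.install(genome)
```

A script entry point would then call `install_reference(build_parser().parse_args().reference)`.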

Running SigProfilerAssignment

SigProfiler tools require passing a folder containing input files, rather than an individual file. Additionally, their expected input for Mutation Annotation Format (MAF) files does not follow either NCI or TCGA MAF specifications. Thus, to use this repository, we recommend:

  1. Placing input files within an input directory
  2. Trimming the MAF files using trim_maf.py
  3. Running SigProfilerAssignment using sig_profiler_assignment.py

Trimming MAF files

trim_maf.py will trim either a single MAF file or a folder containing MAF files to the specification set by SigProfilerMatrixGenerator.

Required arguments:

    --mode, -m      <string>    specify if input is a file or folder, choices: file, folder
    --input, -i     <string>    input file or folder

Optional arguments:

    --output-folder, -o     <string>    path to output folder, will be created if it does not exist
    --output-suffix, -s     <string>    string of new output suffix, if you want to strip file names

Example for file input:

python trim_maf.py \
  --mode file \
  --input example.oncotated.validated.annotated.final.maf \
  --output-folder trimmed-mafs \
  --output-suffix "maf"
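As a rough illustration of what trimming involves, the sketch below keeps a caller-supplied subset of columns from a tab-delimited MAF and drops `#` comment lines. It is not the logic of trim_maf.py: the function name `trim_maf` and its signature are hypothetical, and the columns (and column order) that SigProfilerMatrixGenerator actually expects should be taken from its documentation.

```python
import csv
import io


def trim_maf(in_handle, out_handle, keep):
    """Write only the `keep` columns of a tab-delimited MAF to out_handle."""
    # MAF files may begin with '#' comment lines (for example a version line).
    rows = (line for line in in_handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    missing = [col for col in keep if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"MAF is missing columns: {missing}")
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(out_handle, fieldnames=keep, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)


# Usage with in-memory text; the column names here are only illustrative.
source = io.StringIO("#version 2.4\nHugo_Symbol\tChromosome\tExtra\nTP53\t17\tx\n")
trimmed = io.StringIO()
trim_maf(source, trimmed, keep=["Hugo_Symbol", "Chromosome"])
```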

Running SigProfilerAssignment

The script sig_profiler_assignment.py is a wrapper around SigProfilerAssignment's cosmic_fit function. Additionally, the script will compute the contribution (or weight) per SBS signature per sample. Input MAFs should be trimmed and formatted beforehand.

This wrapper has three required arguments and otherwise largely mimics SigProfilerAssignment's additional parameters. The exceptions are --do-not-export-probabilities and --disable-plotting, which wrap export_probabilities and make_plots, respectively. Both of those parameters default to True, so pass the corresponding flag only if you want to disable that functionality.
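One common way to express this kind of inverted flag with argparse is `action="store_false"`, sketched below. This is an assumption about how such flags can be wired, not a copy of the wrapper's parser, and `build_flag_parser` is a hypothetical name.

```python
import argparse


def build_flag_parser():
    """Parser with two 'disable' flags that map onto True-by-default values."""
    parser = argparse.ArgumentParser()
    # store_false: omitting the flag leaves the value True, passing it stores False.
    parser.add_argument("--do-not-export-probabilities",
                        dest="export_probabilities", action="store_false")
    parser.add_argument("--disable-plotting",
                        dest="make_plots", action="store_false")
    return parser


# Passing only --disable-plotting turns plots off and leaves probabilities on.
args = build_flag_parser().parse_args(["--disable-plotting"])
```

The parsed `export_probabilities` and `make_plots` values can then be passed straight through to the underlying parameters of the same names.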

Required arguments:

    --input-folder, -i          <string>    folder containing input MAF files, after processing with `trim_maf.py`
    --output-folder, -o         <string>    folder to write outputs to
    --write-results-per-sample  <boolean>   if separate output files should be created for each sample

Optional arguments (see the official SigProfilerAssignment documentation):

    --input-type
    --context-type
    --version
    --exome
    --genome-build
    --signature-database
    --exclude-signature-subgroups
    --do-not-export-probabilities
    --export-probabilities-per-mutation
    --disable-plotting
    --sample-reconstruction-plots-format
    --verbose

Example:

python sig_profiler_assignment.py -i trimmed-mafs -o outputs --write-results-per-sample --verbose

The flow of this script is a bit odd. It performs the following sequence:

  1. Copies input MAF files to the output directory and sets the output directory as the input directory
  2. Runs SigProfilerAssignment
  3. Removes copies of input files from the output directory
  4. Calculates contributions per sample for all samples and writes to {output-folder}/SBS_contributions.txt
  5. Writes a table per sample to {output-folder}/SBS_sample_contributions/, if --write-results-per-sample is passed

Inputs are copied into and then removed from the output directory because the current version of SigProfilerMatrixGenerator writes outputs to the input directory. This keeps all outputs from SigProfilerAssignment in the specified output folder and leaves the input folder untouched. This definitely was not the case in prior versions of the tool, but I cannot find the change in their release notes. Maybe I will open an Issue on their GitHub repository to find out whether the change was intentional.
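The copy, run, and clean-up steps can be sketched as follows. This is a minimal illustration rather than the code in sig_profiler_assignment.py: `run_with_staged_inputs` is a hypothetical helper, and `run` is a stand-in callable for whatever invokes SigProfilerAssignment on the staged folder.

```python
import shutil
from pathlib import Path


def run_with_staged_inputs(input_folder, output_folder, run):
    """Copy inputs into output_folder, call run(output_folder), remove the copies."""
    src, dst = Path(input_folder), Path(output_folder)
    dst.mkdir(parents=True, exist_ok=True)
    staged = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            staged.append(Path(shutil.copy2(path, dst / path.name)))
    try:
        # `run` is a hypothetical hook standing in for the SigProfilerAssignment call.
        run(str(dst))
    finally:
        # Remove only the files copied above; tool outputs stay in place.
        for copy in staged:
            copy.unlink(missing_ok=True)
```

Cleaning up in a `finally` block means the staged copies are removed even if the run fails, so the output folder never keeps stray inputs from an aborted run.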

Outputs

There are outputs generated from both SigProfilerMatrixGenerator and SigProfilerAssignment. Detailed descriptions of outputs can be found within the official documentation for SigProfilerMatrixGenerator and SigProfilerAssignment.

Outputs found in {output-folder}/ are as follows:

  • Assignment_Solution/, outputs from SigProfilerAssignment
  • input/, a copy of the inputs used
  • Matrix_Generator_output/, outputs from SigProfilerMatrixGenerator
  • SBS_sample_contributions/, SBS contributions by sample, generated if --write-results-per-sample is passed
  • JOB_METADATA_SPA.TXT, log file from SigProfilerAssignment
  • SBS_contributions.txt, calculated contributions per sample for each SBS signature
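To make the contribution calculation concrete, the sketch below assumes that a signature's contribution is its assigned activity divided by the sample's total activity. That formula is an assumption, since it is not spelled out here, and `signature_contributions` is a hypothetical function, not one from this repository.

```python
def signature_contributions(activities):
    """Convert per-sample signature activities (mutation counts) into fractions."""
    contributions = {}
    for sample, counts in activities.items():
        total = sum(counts.values())
        contributions[sample] = {
            # A sample with no assigned mutations gets 0.0 for every signature.
            sig: (count / total if total else 0.0)
            for sig, count in counts.items()
        }
    return contributions


# Illustrative activities for one made-up sample.
example = signature_contributions({"sample_1": {"SBS1": 30, "SBS5": 70}})
```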
