This repository contains code that is a wrapper around SigProfilerAssignment, an algorithm from the Alexandrov Lab at UCSD that enables the assignment of previously known mutational signatures to individual samples and individual somatic mutations. As of this writing, SigProfilerAssignment has not yet been published in a peer-reviewed journal and its GitHub repository continues to be developed. So, while the requirements file specifies a release, the code in this repository may eventually break as the authors make updates.
Please read their manuscript, review their GitHub repository, official documentation, and COSMIC's mutational signatures page before using code in this repository.
SigProfilerAssignment does not currently work in containers :(
This repository uses Python 3.11. To use code in this repository, download this software from GitHub, install Python dependencies within a virtual environment, and install a reference genome.
This repository can be downloaded through GitHub on the website or by using a terminal. To download on the website, navigate to the top of this page, click the green Clone or download button, and select Download ZIP. This will download this repository in a compressed format. To install using GitHub on a terminal, type:
git clone https://github.com/vanallenlab/SigProfilerAssignment.git
cd SigProfilerAssignment
This repository uses Python 3.11. We recommend using a virtual environment and running Python with either Anaconda or Miniconda.
To create a virtual environment and install dependencies with Anaconda or Miniconda, run the following from this repository's directory:
conda create -y -n SigProfilerAssignment python=3.11
conda activate SigProfilerAssignment
pip install -r requirements.txt
If you are using base Python, you can create a virtual environment and install dependencies by running:
virtualenv venv_SigProfilerAssignment
source venv_SigProfilerAssignment/bin/activate
pip install -r requirements.txt
SigProfilerAssignment uses SigProfilerMatrixGenerator for matrix generation, which requires a reference genome to be installed in the virtual environment. The script install_reference_genome.py will install a reference genome for use.
This script uses SigProfilerMatrixGenerator and it will produce a warning that this step takes 40+ minutes, but it has never taken more than 5-10 minutes using either my home or Dana-Farber internet.
Required arguments:
--reference <string> reference genome to install; default=GRCh37; choices=GRCh38, GRCh37, mm9, mm10, rn6
Example:
python install_reference_genome.py --reference GRCh37
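For orientation, a script of this kind can be sketched as a thin argparse layer over SigProfilerMatrixGenerator's install function. This is a minimal sketch and not necessarily how install_reference_genome.py is written: the names build_parser and main are hypothetical, and the choices and default simply mirror the usage text above.

```python
import argparse

# Choices and default mirror the usage text above.
GENOMES = ["GRCh38", "GRCh37", "mm9", "mm10", "rn6"]


def build_parser():
    """Hypothetical helper: command-line interface with a single option."""
    parser = argparse.ArgumentParser(description="Install a reference genome")
    parser.add_argument("--reference", default="GRCh37", choices=GENOMES,
                        help="reference genome to install")
    return parser


def main(argv=None):
    """Hypothetical entry point: parse arguments, then install the genome."""
    args = build_parser().parse_args(argv)
    # Imported here so the parser can be built without the package installed.
    from SigProfilerMatrixGenerator import install as genInstall
    genInstall.install(args.reference)
```

Calling main(["--reference", "GRCh37"]) inside the activated environment would start the download; build_parser() on its own has no side effects.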
SigProfiler tools require passing a folder containing input files, rather than an individual file itself. Additionally, their expected input for Mutation Annotation Format (MAF) files does not follow either NCI or TCGA MAF specifications. Thus, to use this repository, we recommend:
- Placing input files within an input directory
- Trimming the MAF files using trim_maf.py
- Running SigProfilerAssignment using sig_profiler_assignment.py
trim_maf.py will trim either a single MAF file or a folder containing MAF files to the specification set by SigProfilerMatrixGenerator.
Required arguments:
--mode, -m <string> specify if input is a file or folder, choices: file, folder
--input, -i <string> input file or folder
Optional arguments:
--output-folder, -o <string> path to output folder, will be created if it does not exist
--output-suffix, -s <string> string of new output suffix, if you want to strip file names
Example for file input:
python trim_maf.py \
--mode file \
--input example.oncotated.validated.annotated.final.maf \
--output-folder trimmed-mafs \
--output-suffix "maf"
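The trimming step can be pictured as selecting a fixed set of columns from each MAF and dropping everything else. The sketch below is illustrative only and uses the standard library rather than whatever trim_maf.py uses internally. The function name trim_maf and the constant ASSUMED_COLUMNS are hypothetical, and the column list (the first 16 columns of a standard MAF) is an assumption; check trim_maf.py and the SigProfilerMatrixGenerator documentation for the actual specification.

```python
import csv
import io

# ASSUMPTION: the first 16 columns of a standard MAF, in order. The columns
# that trim_maf.py actually keeps may differ.
ASSUMED_COLUMNS = (
    "Hugo_Symbol", "Entrez_Gene_Id", "Center", "NCBI_Build", "Chromosome",
    "Start_Position", "End_Position", "Strand", "Variant_Classification",
    "Variant_Type", "Reference_Allele", "Tumor_Seq_Allele1",
    "Tumor_Seq_Allele2", "dbSNP_RS", "dbSNP_Val_Status",
    "Tumor_Sample_Barcode",
)


def trim_maf(text, columns=ASSUMED_COLUMNS):
    """Return tab-delimited MAF text keeping only `columns`, in that order."""
    # Drop comment lines such as "#version 2.4" before reading the header.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore",  # skip columns not requested
                            restval="",             # blank for absent columns
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Columns missing from the input are written as empty fields, so the output always has the same layout regardless of which annotator produced the MAF.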
The script sig_profiler_assignment.py is a wrapper around SigProfilerAssignment's cosmic_fit function. Additionally, the script will compute the contribution (or weight) per SBS signature per sample. Input MAFs should be trimmed and formatted beforehand.
This wrapper contains three required arguments and then largely mimics SigProfilerAssignment's additional parameters, with the exception of --do-not-export-probabilities and --disable-plotting, which wrap export_probabilities and make_plots, respectively. Both of these parameters default to True, so pass the corresponding wrapper flag only when you want to turn that behavior off.
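One common way to expose a parameter that defaults to True as an opt-out flag is argparse's store_false action. The sketch below illustrates that pattern for the two flags named above; it is an assumption about how such a wrapper could be written, not a copy of sig_profiler_assignment.py, and build_parser and cosmic_fit_kwargs are hypothetical names.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the two inverted flags."""
    parser = argparse.ArgumentParser()
    # store_false: the destination defaults to True; passing the flag sets it
    # to False.
    parser.add_argument("--do-not-export-probabilities",
                        dest="export_probabilities", action="store_false")
    parser.add_argument("--disable-plotting",
                        dest="make_plots", action="store_false")
    return parser


def cosmic_fit_kwargs(args):
    """Keyword arguments to forward to cosmic_fit for these two options."""
    return {"export_probabilities": args.export_probabilities,
            "make_plots": args.make_plots}
```

With no flags, both values stay True; passing --disable-plotting flips only make_plots to False.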
Required arguments:
--input-folder, -i <string> folder containing input MAF files, after processing with `trim_maf.py`
--output-folder, -o <string> folder to write outputs to
--write-results-per-sample <boolean> if separate output files should be created for each sample
Optional arguments, see their official documentation:
--input-type
--context-type
--version
--exome
--genome-build
--signature-database
--exclude-signature-subgroups
--do-not-export-probabilities
--export-probabilities-per-mutation
--disable-plotting
--sample-reconstruction-plots-format
--verbose
Example:
python sig_profiler_assignment.py -i trimmed-mafs -o outputs --write-results-per-sample --verbose
The flow of this script is a bit odd. It performs the following sequence:
- Copies input MAF files to the output directory and sets the output directory as the input directory
- Runs SigProfilerAssignment
- Removes copies of input files from the output directory
- Calculates contributions per sample for all samples and writes to {output-folder}/SBS_contributions.txt
- Writes a table per sample to {output-folder}/SBS_sample_contributions/, if --write-results-per-sample is passed
The copying and removing of inputs from the output directory is because the current version of SigProfilerMatrixGenerator writes outputs to the input directory. Thus, this is performed to keep all outputs from SigProfilerAssignment in the outputs folder specified, leaving the inputs folder untouched. This definitely was not the case in prior versions of the tool, but I cannot find the changes in their release notes. Maybe I will open an Issue on their GitHub repository to try to find out if it was an intentional change or not.
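The copy, run, and clean-up sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the script's actual code: run_with_staged_inputs is a hypothetical name, and run_tool is a placeholder callable standing in for the SigProfilerAssignment call.

```python
import shutil
from pathlib import Path


def run_with_staged_inputs(input_folder, output_folder, run_tool):
    """Copy inputs into output_folder, run the tool there, remove the copies."""
    input_folder, output_folder = Path(input_folder), Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Stage a copy of every input file in the output folder.
    staged = []
    for source in sorted(input_folder.iterdir()):
        if source.is_file():
            destination = output_folder / source.name
            shutil.copy2(source, destination)
            staged.append(destination)

    try:
        # The tool reads the staged copies and writes alongside them.
        run_tool(output_folder)
    finally:
        # Remove only the files copied above; tool outputs stay in place.
        for copy in staged:
            copy.unlink(missing_ok=True)
```

The try/finally means the staged copies are removed even if the tool raises, so a failed run does not leave duplicate inputs in the output folder.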
There are outputs generated from both SigProfilerMatrixGenerator and SigProfilerAssignment. Detailed descriptions of outputs can be found within the official documentation for SigProfilerMatrixGenerator and SigProfilerAssignment.
Outputs found in {output-folder}/ are as follows:
- Assignment_Solution/, outputs from SigProfilerAssignment
- input/, a copy of the inputs used
- Matrix_Generator_output/, outputs from SigProfilerMatrixGenerator
- SBS_sample_contributions/, SBS contributions by sample, generated if --write-results-per-sample is passed
- JOB_METADATA_SPA.TXT, log file from SigProfilerAssignment
- SBS_contributions.txt, calculated contributions per sample for each SBS signature
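As a rough illustration of what a per-sample contribution could look like, the sketch below divides each signature's assigned mutation count by the sample's total. Both the formula and the input layout are assumptions made for this example: it assumes a tab-delimited activities table with a "Samples" column followed by one column per signature, and contributions_from_activities is a hypothetical name. Consult sig_profiler_assignment.py for the calculation actually used.

```python
import csv


def contributions_from_activities(text):
    """Map sample -> {signature: fraction of that sample's assigned mutations}.

    ASSUMPTION: contribution = signature activity / total activity in the sample.
    """
    result = {}
    for row in csv.DictReader(text.splitlines(), delimiter="\t"):
        sample = row.pop("Samples")
        counts = {signature: float(value) for signature, value in row.items()}
        total = sum(counts.values())
        # A sample with no assigned mutations gets zero weights instead of a
        # division error.
        result[sample] = {signature: (count / total if total else 0.0)
                          for signature, count in counts.items()}
    return result
```

Under this assumption, each sample's weights sum to 1 whenever it has at least one assigned mutation, which makes samples with very different mutation burdens comparable.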