Read more about methodology and results here: ShiftSmooth: Locally Smoothed Attribution for Genomic CNNs.
Clone the repository.
git clone https://github.com/nielseni6/ShiftSmoothedAttributions.git
Make sure the following required libraries are installed:
Python 3.6
PyTorch 1.2.0
torchvision 0.4.0
matplotlib 3.2.0
numpy 1.16.3
Pillow 6.0.0
pyseqlogo
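To check your environment against the list above, a small script like the following may help. It is a sketch that assumes each library exposes a `__version__` attribute (Pillow is imported as `PIL`). pyseqlogo has no pinned version here, so it is only checked for importability. The names `REQUIRED`, `installed_version` and `check_requirements` are hypothetical helpers, not part of this repository.

```python
import importlib

# Versions listed in this README, keyed by import name (Pillow imports as PIL).
REQUIRED = {
    "torch": "1.2.0",
    "torchvision": "0.4.0",
    "matplotlib": "3.2.0",
    "numpy": "1.16.3",
    "PIL": "6.0.0",
    "pyseqlogo": None,  # no version pinned, only check that it imports
}


def installed_version(module_name):
    """Return __version__, 'unknown' if the module has none, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # pyseqlogo can fail with errors other than ImportError
        return None
    return str(getattr(module, "__version__", "unknown"))


def check_requirements(required=REQUIRED):
    """Return a list of (module, wanted, found, ok) tuples."""
    report = []
    for name, wanted in required.items():
        found = installed_version(name)
        if found is None:
            ok = False
        elif wanted is None:
            ok = True
        else:
            # Ignore local build tags such as "+cu92" in "1.2.0+cu92".
            ok = found.split("+")[0] == wanted
        report.append((name, wanted, found, ok))
    return report


if __name__ == "__main__":
    for name, wanted, found, ok in check_requirements():
        status = "OK" if ok else "MISMATCH"
        print("{:<12} wanted {:<8} found {:<12} {}".format(name, str(wanted), str(found), status))
```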
If you get errors with pyseqlogo, download the pyseqlogo files from its GitHub repository (https://github.com/saketkc/pyseqlogo) and place a copy of the pyseqlogo folder into both the ShiftSmoothedAttributions\Codon_Detection and ShiftSmoothedAttributions\Human_Goldfish_Classification folders.
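The copy step above can be scripted. This sketch assumes you have cloned the pyseqlogo repository and that the package itself is the inner `pyseqlogo` folder of that clone; both paths in the `__main__` block are hypothetical and should be adjusted. `copy_pyseqlogo` is an illustrative helper, not part of either repository.

```python
import shutil
from pathlib import Path

# Task folders named in this README that need their own copy of pyseqlogo.
TASK_DIRS = ("Codon_Detection", "Human_Goldfish_Classification")


def copy_pyseqlogo(package_dir, repo_root):
    """Copy the pyseqlogo package folder into each task folder and return the destinations."""
    package_dir = Path(package_dir)
    if not package_dir.is_dir():
        raise FileNotFoundError("pyseqlogo folder not found: {}".format(package_dir))
    destinations = []
    for task in TASK_DIRS:
        dest = Path(repo_root) / task / "pyseqlogo"
        if dest.exists():
            shutil.rmtree(str(dest))  # replace any stale copy
        shutil.copytree(str(package_dir), str(dest))  # also creates missing parent folders
        destinations.append(dest)
    return destinations


if __name__ == "__main__":
    # Hypothetical locations: a pyseqlogo clone next to the ShiftSmoothedAttributions clone.
    for path in copy_pyseqlogo("pyseqlogo/pyseqlogo", "ShiftSmoothedAttributions"):
        print("copied to", path)
```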
This documentation is split into two parts: Quickstart and Training. If you simply wish to recreate the results using the precalculated attribution maps, begin with Quickstart. If you would like to recreate all experiments from scratch, including formatting the dataset, training the model and generating attribution maps, begin with Training.
For the Codon Detection Task, move to the project folder.
cd ShiftSmoothedAttributions\Codon_Detection
To recreate Experiment 1 (Shifting Invariance), run the shift experiment script.
python shift_experiment.py
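This README does not describe how shift_experiment.py measures shift invariance. As a rough illustration of the idea only (an assumption, not the repository's code or the paper's exact metric), one can shift a one-hot encoded sequence, compute its attribution map, shift the map back, and correlate it with the map of the unshifted sequence. `np.roll` wraps bases around the sequence ends, which is a simplification, and the position-weighted linear model used for `toy_attribution` is purely hypothetical.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """One-hot encode a DNA string as a (len(seq), 4) float array."""
    idx = np.array([BASES.index(b) for b in seq])
    out = np.zeros((len(seq), len(BASES)))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def shift_invariance(attr_fn, x, shifts):
    """Pearson correlation between attr_fn(x) and the re-aligned attribution of each shifted x."""
    base = attr_fn(x).ravel()
    scores = {}
    for s in shifts:
        shifted_attr = attr_fn(np.roll(x, s, axis=0))        # attribution of the shifted input
        realigned = np.roll(shifted_attr, -s, axis=0).ravel()  # undo the shift before comparing
        scores[s] = float(np.corrcoef(base, realigned)[0, 1])
    return scores


if __name__ == "__main__":
    x = one_hot("ACGTTGCAACGTAGCTAGGA")
    weights = np.random.RandomState(0).randn(*x.shape)

    def toy_attribution(inp):
        # Gradient x input of the hypothetical linear score sum(weights * inp).
        return weights * inp

    print(shift_invariance(toy_attribution, x, shifts=[1, 2, 3]))
```

A perfectly shift-equivariant attribution method would score 1.0 for every shift; the position-specific toy model above will generally score lower.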
To recreate Experiment 2 (Are the Areas of Interest Being Highlighted?), run the display logo script to display the attribution maps included in this repo.
python disp_attr_motif_logo.py
For the Human/Goldfish Classification Task, move to the project folder.
cd ShiftSmoothedAttributions\Human_Goldfish_Classification
To recreate Experiment 1 (Shifting Invariance), run the shift experiment script.
python shift_experiment.py
To recreate Experiment 2 (Are the Areas of Interest Being Highlighted?), run the display logo script to display the attribution maps included in this repo. Note: this experiment cannot validate the method in the same way as the codon detection task, since the important features are not known for human/goldfish classification.
python disp_attr_logo.py
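If you want to run every Quickstart script in one go, a small driver such as the one below could work. It is a sketch that assumes the scripts need to run from inside their task folder (mirroring the cd steps above) and that the repository was cloned to `ShiftSmoothedAttributions`. `quickstart_commands` and `run_quickstart` are hypothetical helpers. The display scripts may open plot windows that must be closed before the next script starts.

```python
import subprocess
import sys
from pathlib import Path

# (task folder, Quickstart scripts) in the order listed in this README.
QUICKSTART = [
    ("Codon_Detection", ["shift_experiment.py", "disp_attr_motif_logo.py"]),
    ("Human_Goldfish_Classification", ["shift_experiment.py", "disp_attr_logo.py"]),
]


def quickstart_commands(repo_root):
    """Return (working_dir, argv) pairs for every Quickstart script."""
    commands = []
    for task, scripts in QUICKSTART:
        workdir = Path(repo_root) / task
        for script in scripts:
            commands.append((workdir, [sys.executable, script]))
    return commands


def run_quickstart(repo_root, dry_run=False):
    """Run (or, with dry_run=True, only print) each Quickstart script from its task folder."""
    for workdir, argv in quickstart_commands(repo_root):
        print("[{}] python {}".format(workdir, argv[1]))
        if not dry_run:
            subprocess.run(argv, cwd=str(workdir), check=True)  # stop on the first failure


if __name__ == "__main__":
    run_quickstart("ShiftSmoothedAttributions")
```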
If you would like to train the model for the Codon Detection Task yourself, follow these steps:
- Go to https://www.ncbi.nlm.nih.gov/nuccore/CM000663.2 and click on FASTA.
- From here click Send To -> File -> Create File, then Save File.
- Once the file has finished downloading, rename it to human_genome_c1.txt and place it in the Codon_Detection\raw_data folder.
- Now that the data is downloaded, move to the project folder.
cd ShiftSmoothedAttributions\Codon_Detection
- Run the dataset formatter until you are satisfied with the size of the dataset.
python generate_dataset.py
- Generate attribution maps using the trained model.
python getattributions_motif.py
- Follow the Quickstart steps for the Codon Detection Task to run the experiments on the newly generated attribution maps.
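As an alternative to clicking through the NCBI web page in the download steps above, the same record can be fetched with NCBI's E-utilities `efetch` endpoint. This sketch assumes the FASTA text returned by `efetch` (with `rettype=fasta`, `retmode=text`) is equivalent to the file produced by Send To -> File; that equivalence has not been checked byte-for-byte. `efetch_fasta_url` and `download_fasta` are hypothetical helpers, and chromosome 1 is a large download.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns a nucleotide record as FASTA text."""
    query = urlencode({"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"})
    return "{}?{}".format(EFETCH, query)


def download_fasta(accession, dest):
    """Stream the FASTA record for `accession` to `dest`, creating parent folders as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(efetch_fasta_url(accession)) as response, open(str(dest), "wb") as out:
        shutil.copyfileobj(response, out)  # avoids holding the whole chromosome in memory
    return dest


if __name__ == "__main__":
    # Human chromosome 1 (CM000663.2), saved under the file name this README expects.
    download_fasta("CM000663.2", "ShiftSmoothedAttributions/Codon_Detection/raw_data/human_genome_c1.txt")
```

For the Human/Goldfish task, the same helper would be called with CM000663.2 and CM010432.1, saving to human_genome_c1.txt and goldfish_genome_c1.txt in Human_Goldfish_Classification/raw_data.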
If you would like to train the model for the Human/Goldfish Classification Task yourself, follow these steps:
- Go to https://www.ncbi.nlm.nih.gov/nuccore/CM000663.2 and click on FASTA.
- From here click Send To -> File -> Create File, then Save File.
- Once the file has finished downloading, rename it to human_genome_c1.txt and place it in the Human_Goldfish_Classification\raw_data folder.
The next three steps repeat the first three, except that this time we download the goldfish genome rather than the human genome.
- Go to https://www.ncbi.nlm.nih.gov/nuccore/CM010432.1 and click on FASTA.
- From here click Send To -> File -> Create File, then Save File.
- Once the file has finished downloading, rename it to goldfish_genome_c1.txt and place it in the Human_Goldfish_Classification\raw_data folder.
- Now that the data is downloaded, move to the project folder.
cd ShiftSmoothedAttributions\Human_Goldfish_Classification
- Run the dataset formatter until you are satisfied with the size of the dataset.
python generate_dataset.py
- Generate attribution maps using the trained model.
python getattributions.py
- Follow the Quickstart steps for the Human/Goldfish Classification Task to run the experiments on the newly generated attribution maps.
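This README does not document what generate_dataset.py produces. To illustrate the general idea of turning the two downloaded FASTA files into a labelled dataset, here is a hypothetical sketch (not the repository's formatter): it reads each single-record FASTA file, samples fixed-length windows that contain only A, C, G and T (skipping runs of N), and labels them 0 for human and 1 for goldfish. The window length, sample count and label convention are all assumptions, and each chromosome is loaded fully into memory.

```python
import random
from pathlib import Path


def read_fasta_sequence(path):
    """Concatenate the sequence lines of a single-record FASTA file (headers start with '>')."""
    parts = []
    with open(str(path)) as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip().upper())
    return "".join(parts)


def sample_windows(sequence, length, count, seed=0):
    """Draw up to `count` random windows of `length` bases containing only A, C, G, T."""
    if len(sequence) < length:
        return []
    rng = random.Random(seed)
    windows = []
    attempts = 0
    while len(windows) < count and attempts < count * 100:
        attempts += 1
        start = rng.randrange(0, len(sequence) - length + 1)
        window = sequence[start:start + length]
        if set(window) <= set("ACGT"):  # reject windows with N or other ambiguity codes
            windows.append(window)
    return windows


def build_dataset(human_path, goldfish_path, length=200, per_class=1000, seed=0):
    """Return a shuffled list of (window, label) pairs: 0 = human, 1 = goldfish (hypothetical)."""
    data = []
    for label, path in enumerate((human_path, goldfish_path)):
        seq = read_fasta_sequence(path)
        data += [(w, label) for w in sample_windows(seq, length, per_class, seed + label)]
    random.Random(seed).shuffle(data)
    return data


if __name__ == "__main__":
    raw = Path("ShiftSmoothedAttributions/Human_Goldfish_Classification/raw_data")
    pairs = build_dataset(raw / "human_genome_c1.txt", raw / "goldfish_genome_c1.txt")
    print(len(pairs), pairs[0])
```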