Ian Nielsen Systems Devices and Algorithms in Bioinformatics Final Project Spring 2021

nielseni6/ShiftSmoothedAttributions

ShiftSmoothedAttributions

Read more about methodology and results here: ShiftSmooth: Locally Smoothed Attribution for Genomic CNNs.

Install

Clone the repository.

git clone https://github.com/nielseni6/ShiftSmoothedAttributions.git

Make sure the following required libraries are installed:

Python 3.6

PyTorch 1.2.0

torchvision 0.4.0

matplotlib 3.2.0

numpy 1.16.3

Pillow 6.0.0

pyseqlogo

If you get errors with pyseqlogo, download the pyseqlogo files from its GitHub repository (https://github.com/saketkc/pyseqlogo) and place a copy of the pyseqlogo folder into both the ShiftSmoothedAttributions\Codon_Detection and ShiftSmoothedAttributions\Human_Goldfish_Classification folders.
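Before running anything, it can help to confirm which of the listed libraries are importable and at what version. The snippet below is a minimal sketch, not part of the repository: `installed_version`, `report` and the `REQUIRED` table are hypothetical names, and it assumes each package exposes a `__version__` attribute (reported as "unknown" otherwise). Note that Pillow is imported as `PIL` and PyTorch as `torch`.

```python
import importlib

# Versions listed in this README, keyed by import name (not pip name).
# pyseqlogo has no pinned version in the README, hence None.
REQUIRED = {
    "torch": "1.2.0",
    "torchvision": "0.4.0",
    "matplotlib": "3.2.0",
    "numpy": "1.16.3",
    "PIL": "6.0.0",
    "pyseqlogo": None,
}


def installed_version(import_name):
    """Return the module's version string, "unknown" if it has no
    __version__ attribute, or None if it cannot be imported."""
    try:
        module = importlib.import_module(import_name)
    except Exception:
        # Any failure while importing is treated as "not usable".
        return None
    return str(getattr(module, "__version__", "unknown"))


def report(required=REQUIRED):
    """Build one status line per required library."""
    lines = []
    for name, wanted in required.items():
        found = installed_version(name)
        status = "MISSING" if found is None else "found " + found
        expected = wanted if wanted is not None else "any"
        lines.append("{:<12} expected {:<8} {}".format(name, expected, status))
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A line marked MISSING for pyseqlogo is the case the paragraph above addresses: copying the pyseqlogo folder into the two task folders.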

Getting Started

This documentation is split into two parts, Quickstart and Train Model. If you simply wish to recreate the results using the precalculated attribution maps, begin with Quickstart. If you would like to recreate all experiments from scratch, including formatting the dataset, training the model, and generating attribution maps, begin with Train Model.

Quickstart:

Codon Detection Task

Move to the Codon Detection directory of the project.

cd ShiftSmoothedAttributions\Codon_Detection

To recreate Experiment 1 (Shifting Invariance), run the shift experiment file.

python shift_experiment.py

To recreate Experiment 2 (Are the Areas of Interest Being Highlighted?), run the display logo file to display the attribution maps provided in this repo.

python disp_attr_motif_logo.py

Human/Goldfish Classification Task

Move to the Human/Goldfish Classification directory of the project.

cd ShiftSmoothedAttributions\Human_Goldfish_Classification

To recreate Experiment 1 (Shifting Invariance), run the shift experiment file.

python shift_experiment.py

To recreate Experiment 2 (Are the Areas of Interest Being Highlighted?), run the display logo file to display the attribution maps provided in this repo. Note: this experiment cannot validate the method in the same way as the codon detection task, because the important features are not known for human/goldfish classification.

python disp_attr_logo.py
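The `cd` commands above use Windows-style backslashes; on Linux or macOS the same directories are reached with forward slashes. As a sketch, the helpers below (hypothetical `experiment_command` and `run_experiment`, not part of the repository) build the working directory with `pathlib` so the same call works on either platform. The directory and script names are taken from the Quickstart steps; `repo_root` is assumed to be wherever you cloned the repository.

```python
import subprocess
import sys
from pathlib import Path

# Task directory -> Quickstart scripts, as named in this README.
QUICKSTART = {
    "Codon_Detection": ["shift_experiment.py", "disp_attr_motif_logo.py"],
    "Human_Goldfish_Classification": ["shift_experiment.py", "disp_attr_logo.py"],
}


def experiment_command(repo_root, task, script):
    """Return (working_directory, argv) for one Quickstart script."""
    if script not in QUICKSTART.get(task, []):
        raise ValueError("unknown task/script: {}/{}".format(task, script))
    cwd = Path(repo_root) / task
    # sys.executable runs the script with the current interpreter.
    return cwd, [sys.executable, script]


def run_experiment(repo_root, task, script):
    """Run one script from inside its task directory."""
    cwd, argv = experiment_command(repo_root, task, script)
    subprocess.run(argv, cwd=str(cwd), check=True)
```

For example, `run_experiment("ShiftSmoothedAttributions", "Codon_Detection", "shift_experiment.py")` mirrors the first Quickstart command. Running each script from inside its task directory follows the `cd` steps above.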

Train Model:

If you would like to train the model yourself, follow these steps.

Codon Detection Task

  1. Go to https://www.ncbi.nlm.nih.gov/nuccore/CM000663.2 and click on FASTA.

  2. From there, click Send To -> File -> Create File, then save the file.

  3. Once the file has finished downloading, rename it to human_genome_c1.txt and place it in the Codon_Detection\raw_data folder.

  4. Now that the data is downloaded, move to the Codon Detection directory of the project.

cd ShiftSmoothedAttributions\Codon_Detection

  5. Run the dataset formatter until you are satisfied with the size of the dataset.

python generate_dataset.py

  6. Generate attribution maps using the trained model.

python getattributions_motif.py

  7. Follow the Quickstart steps for the Codon Detection Task to run the experiments using the newly generated attribution maps.
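After the file is in raw_data, a quick sanity check can catch a truncated or mis-saved download before the dataset formatter runs. The sketch below assumes the file is a standard FASTA file (a header line starting with `>` followed by sequence lines); `summarize_fasta` is a hypothetical helper, not part of the repository, and it streams the file line by line because a whole chromosome is large.

```python
from collections import Counter


def summarize_fasta(path):
    """Return (header, base_counts) for the first record of a FASTA file."""
    header = None
    counts = Counter()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    break  # stop at the start of a second record
                header = line[1:]
            elif header is None:
                raise ValueError("not a FASTA file: no '>' header line")
            else:
                # Count each character of the sequence line, case-insensitively.
                counts.update(line.upper())
    if header is None:
        raise ValueError("not a FASTA file: no '>' header line")
    return header, counts
```

Run from the Codon_Detection directory, `summarize_fasta("raw_data/human_genome_c1.txt")` should return a header that mentions the accession and counts dominated by A, C, G, T and N. An error or an unexpectedly small total suggests the download should be repeated.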

Human/Goldfish Classification Task

  1. Go to https://www.ncbi.nlm.nih.gov/nuccore/CM000663.2 and click on FASTA.

  2. From there, click Send To -> File -> Create File, then save the file.

  3. Once the file has finished downloading, rename it to human_genome_c1.txt and place it in the Human_Goldfish_Classification\raw_data folder.

Steps 4 through 6 repeat steps 1 through 3, except that this time we download the goldfish genome rather than the human genome.

  4. Go to https://www.ncbi.nlm.nih.gov/nuccore/CM010432.1 and click on FASTA.

  5. From there, click Send To -> File -> Create File, then save the file.

  6. Once the file has finished downloading, rename it to goldfish_genome_c1.txt and place it in the Human_Goldfish_Classification\raw_data folder.

  7. Now that the data is downloaded, move to the Human/Goldfish Classification directory of the project.

cd ShiftSmoothedAttributions\Human_Goldfish_Classification

  8. Run the dataset formatter until you are satisfied with the size of the dataset.

python generate_dataset.py

  9. Generate attribution maps using the trained model.

python getattributions.py

  10. Follow the Quickstart steps for the Human/Goldfish Classification Task to run the experiments using the newly generated attribution maps.
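As an alternative to clicking through the NCBI pages, the two downloads could be scripted through the NCBI E-utilities `efetch` endpoint. This is a sketch under assumptions, not the procedure the repository was tested with: `efetch_url` and `download_genomes` are hypothetical helpers, and it is assumed (not verified here) that the FASTA text returned by `efetch` matches what the web page's Send To -> File option produces. The accessions and file names come from the steps above.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession -> file name expected in raw_data, both taken from the steps above.
GENOMES = {
    "CM000663.2": "human_genome_c1.txt",
    "CM010432.1": "goldfish_genome_c1.txt",
}


def efetch_url(accession):
    """Build the efetch URL that requests one nucleotide record as FASTA text."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return EFETCH + "?" + query


def download_genomes(raw_data_dir):
    """Stream each genome to raw_data_dir under the expected file name."""
    raw_data_dir = Path(raw_data_dir)
    raw_data_dir.mkdir(parents=True, exist_ok=True)
    for accession, filename in GENOMES.items():
        target = raw_data_dir / filename
        with urllib.request.urlopen(efetch_url(accession)) as response, \
                open(str(target), "wb") as out:
            # Copy in chunks so the whole chromosome is never held in memory.
            shutil.copyfileobj(response, out)
```

Calling `download_genomes("Human_Goldfish_Classification/raw_data")` from the repository root would place both files where steps 3 and 6 expect them. The files are large, so the download may take a while.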
