SkipGuide Analysis

This repository contains the code used to perform the analysis and produce the results described in the paper "Machine learning based CRISPR gRNA design for therapeutic exon skipping".

Dependencies

The analysis was performed using Python 3.7.5 and Jupyter. The dependencies are listed in environment.yml. We recommend using the conda package manager from Anaconda Python to create an environment for running the analysis:

conda env create -f environment.yml

Then activate the environment:

conda activate skipguide_data_processing

Data Files

The provided Jupyter notebooks (see the Usage section) can reproduce all the results starting from the raw sequencing data. However, the computations can take hours or days, depending on computational resources. The notebooks are configured to skip certain long computations when pre-computed files are available, so we recommend downloading the pre-computed files before running the notebooks.

Raw Data Files

If you opt not to use the pre-computed files, the raw sequencing data must be available. Download the data from here (raw/archive.tar.bz2), extract the archive, and place the *.fastq files in the data/reads directory before running the provided notebooks. Alternatively, the same *.fastq files are available on NCBI SRA under BioProject accession PRJNA647416. Running all the notebooks from the raw data may take hours or days, depending on computational resources.
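The extract-and-place step above can be sketched in Python. This is a minimal sketch, not part of the repository: the function name place_fastq_files is hypothetical, and because the archive's internal layout is not described here, the sketch searches the extracted tree recursively for *.fastq files.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def place_fastq_files(archive_path, reads_dir="data/reads"):
    """Extract a .tar.bz2 archive and copy every *.fastq file into reads_dir.

    Hypothetical helper for illustration; returns the list of copied paths.
    """
    reads_dir = Path(reads_dir)
    reads_dir.mkdir(parents=True, exist_ok=True)
    placed = []
    with tempfile.TemporaryDirectory() as tmp:
        # "r:bz2" opens a bzip2-compressed tar archive.
        with tarfile.open(str(archive_path), "r:bz2") as archive:
            archive.extractall(tmp)
        # The archive layout is an unknown here, so search recursively.
        for fastq in sorted(Path(tmp).rglob("*.fastq")):
            target = reads_dir / fastq.name
            shutil.copy2(str(fastq), str(target))
            placed.append(target)
    return placed
```

Calling place_fastq_files("archive.tar.bz2") from the repository root would, under these assumptions, leave the reads in data/reads. Files with the same name in different archive subdirectories would overwrite each other.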

Pre-Computed Files

If you opt to use the pre-computed files, the raw sequencing data is not necessary. Download the pre-computed files from here (precomputed/cache.tar.xz), extract, and replace the cache directory with the extracted cache directory. Running all the notebooks should then take less than half an hour.

Usage

You can open the provided Jupyter notebooks under src and view the outputs. This section details how you can run the notebooks from scratch.

First, obtain the necessary data files as described in the Data Files section.

If pre-computed files are not used, modify the NUM_PROCESSES variable in config.py to specify the number of cores for multiprocessing.
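A reasonable value for NUM_PROCESSES depends on the machine. The sketch below shows one way to pick it; the helper suggest_num_processes and its reserve parameter are hypothetical and are not part of config.py. It leaves one core free for the Jupyter server and other work, which is an assumption about typical use rather than a recommendation from the authors.

```python
import os


def suggest_num_processes(reserve=1):
    """Return a worker count that leaves `reserve` cores free, but at least 1."""
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, available - reserve)
```

Printing suggest_num_processes() and copying the result into config.py is enough; memory per worker may call for a smaller value on large inputs.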

Start a Jupyter notebook server, e.g.:

jupyter notebook --port=8888

Run the provided notebooks under src in the following order:

1. Sequence_Extraction.ipynb
2. Barcode_Sequence_Lookup_Tables.ipynb
3. datA_Characterize_Sequences_Indels.ipynb
4. inDelphi_Evaluation.ipynb
5. inDelphi_Check_Data_Leakage.ipynb
6. datB_Characterize_Skipping.ipynb
7. SpliceAI_Predict_Skipping.ipynb
8. MMSplice_Predict_Skipping.ipynb
9. MetaSplice_SkipGuide_Evaluation.ipynb

Inspect the comments and markdown in the notebooks for more context.
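As an alternative to running each notebook by hand, the order given above could be scripted with jupyter nbconvert. This is a sketch under assumptions, not a workflow the repository documents: the helpers build_commands and run_all are hypothetical, the notebooks are assumed to run non-interactively, and the per-cell timeout is disabled because some computations are long.

```python
import subprocess
from pathlib import Path

# Notebook order as listed in the Usage section.
NOTEBOOKS = [
    "Sequence_Extraction.ipynb",
    "Barcode_Sequence_Lookup_Tables.ipynb",
    "datA_Characterize_Sequences_Indels.ipynb",
    "inDelphi_Evaluation.ipynb",
    "inDelphi_Check_Data_Leakage.ipynb",
    "datB_Characterize_Skipping.ipynb",
    "SpliceAI_Predict_Skipping.ipynb",
    "MMSplice_Predict_Skipping.ipynb",
    "MetaSplice_SkipGuide_Evaluation.ipynb",
]


def build_commands(src_dir="src"):
    """Build one nbconvert command per notebook, preserving the order."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--inplace",  # write outputs back into the same notebook file
            "--ExecutePreprocessor.timeout=-1",  # no per-cell time limit
            str(Path(src_dir) / name),
        ]
        for name in NOTEBOOKS
    ]


def run_all(src_dir="src"):
    """Execute the notebooks in order, stopping at the first failure."""
    for command in build_commands(src_dir):
        subprocess.run(command, check=True)
```

Calling run_all() from the repository root, inside the activated conda environment, would execute the nine notebooks in sequence. Because check=True raises on a non-zero exit status, a failing notebook stops the run before later notebooks consume its outputs.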
