HDV Ribozyme Auto-Cleavage and Ligation Prediction Using Machine Learning

HDV / LIG - Lib14

The Lib14 dataset comprises 3 datasets of 16384 folded sequences (~107 GB) generated with the SPOT-RNA algorithm [1]. Its purpose is to support the development of machine learning algorithms that better predict the self-cleavage efficiency (HDV-Lib14) and the ligation efficiency (LIG-Lib14) of a given RNA sequence.

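Each Lib14 sequence contains 14 specific nucleotide positions that were experimentally modified. Both whole sequences below use the IUPAC nucleotide code to denote positions where more than one nucleotide is possible [2]. Every code used here (M, R, K, S, Y) stands for two possible nucleotides, so the 14 positions yield 2^14 = 16384 sequence variants.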

HDV-Lib14 (whole sequence)

GGACCATTCGAMTCCCATTAGRCTGGKCCGCCTCCTSGCGGCGGGAGTTGSGCKAGGGAGGAASAGYCTTYYCTAGRCTAASGMSCATCGATCCGGTTCGCCGGATCCAAATCGGGCTTCGGTCCGGTTC

LIG-Lib14 (whole sequence)

GGAMTCCCATTAGRCTGGKCCGCCTCCTSGCGGCGGGAGTTGSGCKAGGGAGGAASAGYCTTYYCTAGRCTAASGMSCATCGATCCGGTTCGCCGGATCCAAATCGGGCTTCGGTCCGGTTC

The 14 modified IUPAC nucleotides and their positions in the HDV and LIG whole sequences [2]. Positions are 0-based: the first nucleotide of each sequence is position 0.

nt modifications	M	R	K	S	S	K	S	Y	Y	Y	R	S	M	S
HDV nt positions	11	21	26	36	50	53	63	66	70	71	76	81	83	84
LIG nt positions	3	13	18	28	42	45	55	58	62	63	68	73	75	76
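
The table above can be cross-checked against the two whole sequences with a short script. The following is a minimal sketch, not part of the repository's code: the names `IUPAC_TWO_FOLD`, `degenerate_positions` and `expand_variants` are hypothetical and introduced only for illustration. It locates the degenerate positions (0-based) and enumerates every concrete sequence that a degenerate template encodes.

```python
from itertools import product

# Two-fold degenerate IUPAC codes that appear in the Lib14 sequences [2].
IUPAC_TWO_FOLD = {"M": "AC", "R": "AG", "K": "GT", "S": "CG", "Y": "CT"}

# Whole sequences copied from this README.
HDV_LIB14 = (
    "GGACCATTCGAMTCCCATTAGRCTGGKCCGCCTCCTSGCGGCGGGAGTTGSGCKAGGGAGGAASAGYCTT"
    "YYCTAGRCTAASGMSCATCGATCCGGTTCGCCGGATCCAAATCGGGCTTCGGTCCGGTTC"
)
LIG_LIB14 = (
    "GGAMTCCCATTAGRCTGGKCCGCCTCCTSGCGGCGGGAGTTGSGCKAGGGAGGAASAGYCTT"
    "YYCTAGRCTAASGMSCATCGATCCGGTTCGCCGGATCCAAATCGGGCTTCGGTCCGGTTC"
)


def degenerate_positions(template):
    """Return the 0-based indices of the degenerate positions in a template."""
    return [i for i, nt in enumerate(template) if nt in IUPAC_TWO_FOLD]


def expand_variants(template):
    """Yield every concrete sequence encoded by a degenerate template."""
    # Fixed nucleotides contribute a single choice, degenerate codes two.
    choices = [IUPAC_TWO_FOLD.get(nt, nt) for nt in template]
    for combination in product(*choices):
        yield "".join(combination)
```

Calling `degenerate_positions(HDV_LIB14)` returns the HDV row of the table, and counting the items yielded by `expand_variants` for either template gives 16384, matching the dataset size quoted above.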

Preprocessing Results (PCA)

HDV-Lib14 Machine Learning Outputs:

LIG-Lib14 Machine Learning Outputs:

The ML graphs are generated from each model's testing set, that is, the data that was not used to train the model. The X-axis shows the output estimated by the ML model, and the Y-axis shows the true output taken from experimental data. An exact prediction falls on the diagonal line Y = X, so the closer a point lies to the diagonal, the more accurate the prediction. The machine learning models are saved as "pickle" files in the "pkl" folder under their respective model names, "lig_nt_MachineLearning.pkl" and "hdv_nt_MachineLearning.pkl".
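
As a rough illustration of how a saved model could be reloaded and how closeness to the Y = X diagonal could be quantified, here is a minimal sketch using only the Python standard library. The helpers `load_model` and `diagonal_fit` are hypothetical names, and the choice of RMSE and R² as summary metrics is an assumption, not the scoring used to produce the graphs. The class and interface of the pickled models are not documented here, so the sketch does not call any prediction method.

```python
import math
import pickle


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def diagonal_fit(predicted, measured):
    """Summarise how far (predicted, measured) points lie from the line Y = X.

    predicted: model outputs (X-axis); measured: experimental values (Y-axis).
    Returns the RMSE and the R^2 computed against the identity line.
    """
    if len(predicted) != len(measured) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Vertical distance of each point from the diagonal.
    residuals = [y - x for x, y in zip(predicted, measured)]
    ss_res = sum(r * r for r in residuals)
    mean_y = sum(measured) / len(measured)
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    rmse = math.sqrt(ss_res / len(residuals))
    # R^2 is undefined when all measured values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "r2": r2}
```

A model would be reloaded with, for example, `load_model("pkl/hdv_nt_MachineLearning.pkl")`, assuming the script runs from the repository root. Points lying exactly on the diagonal give an RMSE of 0 and an R² of 1.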

Example of folded HDV RNA

Figure 3: Datasets/HDV/radiate/SEQUENCE_10343_radiate.png

REFERENCES:

[1] https://github.com/jaswindersingh2/SPOT-RNA
[2] https://www.bioinformatics.org/sms/iupac.html
