Developer Notes: This project is a work in progress, and some functionality is still WIP.
This Chemical SMILES Toolkit clusters chemical compounds based on their SMILES (Simplified Molecular Input Line Entry System) representation and provides a user-friendly interface: enter a SMILES string to obtain a cluster of similar chemicals along with their respective SMILES. You can also enter a SMILES string and receive a 2D structure in return.
You do not need to cluster the default data on first use; it has already been clustered into 100 clusters (an arbitrary number; an alternative is being worked on).
The project consists of the following main components:
- Webscraping: The project includes a webscraping module that fetches drug SMILES and names from reliable sources. This data serves as the basis for chemical compound clustering.
- Clustering: The clustering module uses agglomerative clustering with Levenshtein distance to cluster the chemical compounds based on their SMILES. It computes the similarity between compounds and assigns them to appropriate clusters.
- Chemical Identification: This module takes a SMILES string and outputs the predicted chemical.
- SMILE To Structure: This module takes a SMILES string and outputs a 2D structure of the chemical.
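To make the clustering component concrete, here is a minimal sketch of agglomerative clustering over SMILES strings using Levenshtein distance. It is written in plain Python for illustration and is not the toolkit's own code: the function names `levenshtein` and `agglomerative_clusters` are hypothetical, and average linkage is an assumption, since the README does not say which linkage the project uses.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def agglomerative_clusters(smiles: list[str], n_clusters: int) -> list[list[str]]:
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Average linkage is an assumption made for this sketch.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    n = len(smiles)
    # Pairwise distance matrix between all input strings.
    dist = [[levenshtein(smiles[i], smiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per string
    while len(clusters) > n_clusters:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[x] for j in clusters[y])
            linkage = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or linkage < best[0]:
                best = (linkage, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])
        del clusters[y]  # x < y, so index x is unaffected
    return [[smiles[i] for i in cluster] for cluster in clusters]


# SMILES strings taken from the example output in this README.
smiles = [
    "CC(CC1=CNC2=CC=CC=C21)N",     # Alpha-methyltryptamine
    "CCC(CC1=CNC2=CC=CC=C21)N",    # Alpha-ethyltryptamine
    "CC(CC1=CNC2=CC=CC=C21)NC",    # Alpha,N-DMT
    "CCCC(C1C2=CC=CC=C2OO1)N",     # Benzodioxolylbutanamine
    "CCCC(C)(C1C2=CC=CC=C2OO1)N",  # Methylbenzodioxolylbutanamine
]
for cluster in agglomerative_clusters(smiles, n_clusters=2):
    print(cluster)
```

This brute-force merge loop is only practical for small inputs. For a dataset the size of the bundled one, a library implementation that accepts a precomputed distance matrix would likely be a better fit.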
- Clone the repository:
git clone https://github.com/DanielFlockhart/Chemical-SMILES-toolkit.git
- Navigate to the project directory:
cd Chemical-SMILES-toolkit
- Install the required dependencies:
pip install -r requirements.txt
- To use your own dataset, supply a text file of chemical names in the form shown below. If you only wish to test the software, you can skip this step.
drugs.txt
["name1","name2","name3"...]
- Launch the program:
python main.py
- Follow the on-screen instructions:
---------- Welcome chemical SMILES toolkit ----------
The Github repository comes with a pre-clustered dataset of 1411 Substances with 100 clusters as an example.
Feel free to use this dataset or cluster your own dataset.
Please choose from the follow options to continue:
1. Get similar SMILE to a given SMILE with current clusters
2. Re-cluster data with a different number of clusters
3. Re-cluster data with a different dataset
4. Convert a SMILE to a 2D structure and display it
5. Get the name of a chemical from a SMILE
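For the custom dataset option, the drugs.txt format shown in the setup steps looks like a JSON array of strings. Assuming that is the case, a file could be checked before re-clustering with a sketch like the one below. The helper names `parse_drug_names` and `load_drug_names` are hypothetical and not part of the toolkit, and the chemical names are placeholders. Note that the trailing `...` in the format example is a placeholder and would not be valid in a real file.

```python
import json
from pathlib import Path


def parse_drug_names(text: str) -> list[str]:
    """Parse a JSON array of chemical names, dropping blank entries."""
    names = json.loads(text)  # raises ValueError on malformed input
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("expected a list of strings, e.g. [\"name1\", \"name2\"]")
    return [name.strip() for name in names if name.strip()]


def load_drug_names(path: str) -> list[str]:
    """Read a drugs.txt-style file from disk and parse it."""
    return parse_drug_names(Path(path).read_text(encoding="utf-8"))


# Placeholder names, used only to show the expected shape of the file contents.
print(parse_drug_names('["Aspirin", "Caffeine", "Ibuprofen"]'))
# prints ['Aspirin', 'Caffeine', 'Ibuprofen']
```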
- Getting Similar Chemicals
Enter a smile: CCC(CC1=CNC2=CC=CC=C21)N
Alpha-methyltryptamine CC(CC1=CNC2=CC=CC=C21)N
Alpha-ethyltryptamine CCC(CC1=CNC2=CC=CC=C21)N
Alpha,N-DMT CC(CC1=CNC2=CC=CC=C21)NC
5-MeO-AMT CC(CC1=CNC2=C1C=C(C=C2)OC)N
Alpha,N,O-TMS CC(CC1=CNC2=C1C=C(C=C2)OC)NC
5-Fluoro-AMT CC(CC1=CNC2=C1C=C(C=C2)F)N
6-fluoro-AMT CC(CC1=CNC2=C1C=CC(=C2)F)N
Methylbenzodioxolylbutanamine CCCC(C)(C1C2=CC=CC=C2OO1)N
Benzodioxolylbutanamine CCCC(C1C2=CC=CC=C2OO1)N
Naphthylaminopropane CC(CC1=CC2=CC=CC=C2C=C1)N
- Converting a SMILE to a 2D structure and displaying it
Enter a smile: CCC(CC1=CNC2=CC=CC=C21)
Displayed Image (the file name of the image is the name of the chemical)
- Getting the name of a chemical from a SMILE
Enter a smile: CCC(CC1=CNC2=CC=CC=C21)
The SMILE corresponds to the chemical -> 3-butyl-1H-indole
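The "Getting Similar Chemicals" lookup above can be pictured as finding the stored SMILES closest to the query and returning every member of its cluster. The sketch below illustrates that idea under stated assumptions: the `clusters` layout (cluster id mapped to a list of name and SMILES pairs), the cluster ids, and the function `similar_chemicals` are all hypothetical, and the toolkit may store and query its clusters differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Compact edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similar_chemicals(query: str, clusters: dict) -> list:
    """Return the (name, SMILES) members of the cluster holding the closest SMILES."""
    best_id, best_dist = None, None
    for cluster_id, members in clusters.items():
        for _name, smiles in members:
            d = levenshtein(query, smiles)
            if best_dist is None or d < best_dist:
                best_id, best_dist = cluster_id, d
    if best_id is None:
        raise ValueError("no clusters available")
    return clusters[best_id]


# Hypothetical cluster assignment, built from entries in the example output above.
clusters = {
    0: [("Alpha-methyltryptamine", "CC(CC1=CNC2=CC=CC=C21)N"),
        ("Alpha-ethyltryptamine", "CCC(CC1=CNC2=CC=CC=C21)N")],
    1: [("Benzodioxolylbutanamine", "CCCC(C1C2=CC=CC=C2OO1)N")],
}
for name, smiles in similar_chemicals("CCC(CC1=CNC2=CC=CC=C21)N", clusters):
    print(name, smiles)
```

Because the query here matches a stored SMILES exactly, the distance is zero and the two tryptamine entries are printed. A query that is not in the dataset would fall back to the nearest stored string.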
Contributions to this project are welcome! If you have any suggestions, improvements, or new features to propose, please submit a pull request. You can also report any issues or bugs by opening an issue on the project's GitHub repository.
When contributing, please follow the existing code style, write clear and concise commit messages, and provide appropriate documentation.
This project is licensed under the MIT License. Feel free to use, modify, and distribute it as per the terms of the license.
The project acknowledges the following resources for their contributions:
Thank you for using the Chemical SMILES toolkit project! We hope it proves to be useful for your chemical analysis and research.
Working on:
- Setting Up UI
- Creating Pages System
- Creating my own UI framework built on top of tkinter
- Creating 3D Conformer Generator