Chemical SMILES Toolkit

Developer Notes: This project is a work in progress, and some functionality is not yet complete.

The Chemical SMILES Toolkit clusters chemical compounds based on their SMILES (Simplified Molecular Input Line Entry System) representation. It provides a user-friendly interface where you can enter a SMILES string and obtain a cluster of similar chemicals along with their respective SMILES. You can also enter a SMILES string and receive a 2D structure in return.

You do not need to cluster the default data on first use. It has already been clustered into 100 clusters (an arbitrary number; an alternative is being worked on).

Features

The project consists of the following main components:

- Webscraping: A webscraping module fetches drug SMILES and names from PubChem. This data is the basis for chemical compound clustering.
- Clustering: The clustering module uses agglomerative clustering with Levenshtein distance to cluster the chemical compounds based on their SMILES. It computes the similarity between compounds and assigns them to appropriate clusters.
- Chemical Identification: This module takes a SMILES string and outputs the predicted chemical name.
- SMILE To Structure: This module takes a SMILES string and outputs the corresponding 2D structure.
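
The clustering idea described in this list (agglomerative clustering over Levenshtein distances between SMILES strings) can be sketched with the standard library alone. This is a minimal illustration, not the project's actual implementation: the function names `levenshtein` and `agglomerative_clusters` are hypothetical, average linkage is an assumed choice, and the naive merge loop is only practical for small inputs.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # deletion
                current[j - 1] + 1,                # insertion
                previous[j - 1] + (ch_a != ch_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def agglomerative_clusters(smiles: list[str], n_clusters: int) -> list[list[str]]:
    """Naive average-linkage agglomerative clustering (assumed linkage choice)."""
    clusters = [[s] for s in smiles]  # start with one cluster per SMILES
    cache: dict[tuple[str, str], int] = {}

    def distance(x: str, y: str) -> int:
        key = (x, y) if x <= y else (y, x)
        if key not in cache:
            cache[key] = levenshtein(x, y)
        return cache[key]

    def linkage(c1: list[str], c2: list[str]) -> float:
        # Average pairwise distance between the members of two clusters.
        return sum(distance(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

For a dataset of the size mentioned in this README, a library implementation that accepts a precomputed distance matrix would likely be a better fit than this quadratic-per-merge loop.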

Installation

Clone the repository:

git clone https://github.com/DanielFlockhart/Chemical-SMILES-toolkit.git

Navigate to the project directory:
```
cd Chemical-SMILES-toolkit
```
Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

To use your own dataset, provide a txt file of chemical names in the form shown below. If you only wish to test the software, please skip to step

drugs.txt
```
["name1","name2","name3"...]
```
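
The file format above looks like a JSON array of strings, so one way to read and sanity-check it is sketched below. This assumes the file is valid JSON, which the README does not state explicitly; `parse_names` and `load_names` are hypothetical helper names and not part of the project.

```python
import json
from pathlib import Path


def parse_names(text: str) -> list[str]:
    """Parse a JSON array of chemical names, dropping blanks and duplicates."""
    names = json.loads(text)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError('expected a list of strings, e.g. ["name1", "name2"]')
    seen: set[str] = set()
    cleaned: list[str] = []
    for name in (n.strip() for n in names):
        # Compare case-insensitively so "Aspirin" and "aspirin" count once.
        if name and name.lower() not in seen:
            seen.add(name.lower())
            cleaned.append(name)
    return cleaned


def load_names(path: str = "drugs.txt") -> list[str]:
    """Read the names file from disk (file name taken from the example above)."""
    return parse_names(Path(path).read_text(encoding="utf-8"))
```
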
Launch the program:
```
python main.py
```

Follow the on-screen instructions:

---------- Welcome chemical SMILES toolkit ----------

 The Github repository comes with a pre-clustered dataset of 1411 Substances with 100 clusters as an example.
 Feel free to use this dataset or cluster your own dataset.
 Please choose from the follow options to continue:

 1. Get similar SMILE to a given SMILE with current clusters
 2. Re-cluster data with a different number of clusters
 3. Re-cluster data with a different dataset
 4. Convert a SMILE to a 2D structure and display it
 5. Get the name of a chemical from a SMILE

Getting Similar Chemicals

Enter a smile: CCC(CC1=CNC2=CC=CC=C21)N

Alpha-methyltryptamine       CC(CC1=CNC2=CC=CC=C21)N
Alpha-ethyltryptamine        CCC(CC1=CNC2=CC=CC=C21)N
Alpha,N-DMT                  CC(CC1=CNC2=CC=CC=C21)NC
5-MeO-AMT                    CC(CC1=CNC2=C1C=C(C=C2)OC)N
Alpha,N,O-TMS                CC(CC1=CNC2=C1C=C(C=C2)OC)NC
5-Fluoro-AMT                 CC(CC1=CNC2=C1C=C(C=C2)F)N
6-fluoro-AMT                 CC(CC1=CNC2=C1C=CC(=C2)F)N
Methylbenzodioxolylbutanamine CCCC(C)(C1C2=CC=CC=C2OO1)N
Benzodioxolylbutanamine      CCCC(C1C2=CC=CC=C2OO1)N
Naphthylaminopropane         CC(CC1=CC2=CC=CC=C2C=C1)N
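
A lookup like the one shown above can be sketched as: find the cluster that contains the SMILES closest to the query, then print its members in aligned columns. The `clusters` layout (a dict mapping a cluster id to a list of `(name, smiles)` pairs) and the function names are assumptions for illustration only; the project may store and query its clusters differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def similar_chemicals(query: str, clusters: dict) -> list:
    """Return the members of the cluster holding the SMILES nearest to `query`.

    Assumes `clusters` is non-empty and every cluster has at least one member.
    """
    best_id = min(
        clusters,
        key=lambda cid: min(levenshtein(query, smi) for _, smi in clusters[cid]),
    )
    return clusters[best_id]


def format_rows(rows: list) -> str:
    """Align names and SMILES, sizing the name column to the longest name."""
    width = max(len(name) for name, _ in rows) + 2
    return "\n".join(f"{name:<{width}}{smi}" for name, smi in rows)
```

Sizing the name column from the longest name keeps long names from running into the SMILES column.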

Converting a SMILES string to a 2D structure and displaying it
```
Enter a smile: CCC(CC1=CNC2=CC=CC=C21)  
```
The 2D structure is then displayed as an image. The image file is named after the chemical.
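
Since the credits list RDKit for this step, the conversion might look roughly like the sketch below. `Chem.MolFromSmiles` and `Draw.MolToFile` are existing RDKit calls, but the helper names, the image size, and the file-naming rule are assumptions and may differ from what the project does.

```python
import re


def image_filename(name: str, ext: str = "png") -> str:
    """Build a filesystem-safe file name from a chemical name (hypothetical rule)."""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return f"{safe or 'structure'}.{ext}"


def smiles_to_image(smiles: str, name: str) -> str:
    """Render `smiles` as a 2D depiction and return the path of the written image."""
    # Imported lazily so image_filename() stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Draw

    mol = Chem.MolFromSmiles(smiles)  # returns None when the SMILES cannot be parsed
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    path = image_filename(name)
    Draw.MolToFile(mol, path, size=(300, 300))
    return path
```

For example, `smiles_to_image("CCC(CC1=CNC2=CC=CC=C21)", "3-butyl-1H-indole")` would write `3-butyl-1H-indole.png` to the current directory.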

Getting the name of a chemical from a SMILES string

Enter a smile: CCC(CC1=CNC2=CC=CC=C21)  
The SMILE corresponds to the chemical -> 3-butyl-1H-indole
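
One possible way to resolve a name from a SMILES string is PubChem's PUG REST service, sketched below with the standard library. This README does not say how the toolkit performs the lookup, so treat this as an assumption; the function names are hypothetical, and SMILES containing characters such as `/` or `#` may need to be submitted differently than in the URL path.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def iupac_name_url(smiles: str) -> str:
    """Build the PUG REST URL that requests the IUPAC name for a SMILES string."""
    # Percent-encode everything, since SMILES use characters like '(' and '='.
    return f"{PUG_REST}/compound/smiles/{quote(smiles, safe='')}/property/IUPACName/JSON"


def parse_iupac_name(payload: dict) -> Optional[str]:
    """Pull the first IUPAC name out of a PUG REST property response, if any."""
    properties = payload.get("PropertyTable", {}).get("Properties", [])
    return properties[0].get("IUPACName") if properties else None


def name_from_smiles(smiles: str, timeout: float = 10.0) -> Optional[str]:
    """Query PubChem over the network; raises urllib errors on failure."""
    with urlopen(iupac_name_url(smiles), timeout=timeout) as response:
        return parse_iupac_name(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline.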

Contribution

Contributions to this project are welcome! If you have any suggestions, improvements, or new features to propose, please submit a pull request. You can also report any issues or bugs by opening an issue on the project's GitHub repository.

When contributing, please follow the existing code style, write clear and concise commit messages, and provide appropriate documentation.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute it as per the terms of the license.

Credits

The project acknowledges the following resources for their contributions:

PubChem - Data source for drug SMILES and names
RDKit - Converting SMILEs to 2D Structures

Thank you for using the Chemical SMILES toolkit project! We hope it proves to be useful for your chemical analysis and research.

Working on:

- Setting up the UI
  - Creating a pages system
- Creating my own UI framework built on top of tkinter
- Creating a 3D conformer generator
