A toolkit for processing, understanding and using canonical smiles.

DanielFlockhart/Chemical-SMILES-toolkit

Chemical Smiles Toolkit

Developer note: this project is a work in progress, and some functionality is not yet complete.

The Chemical Smiles Toolkit offers several features. It clusters chemical compounds based on their SMILES (Simplified Molecular Input Line Entry System) representation and provides a user-friendly interface where you enter a SMILES string and obtain a cluster of similar chemicals along with their respective SMILES. You can also enter a SMILES string and receive a 2D structure in return.

You do not need to cluster the default data on first use: it ships pre-clustered into 100 clusters. The value of 100 is arbitrary, and an alternative is being worked on.

Features

The project consists of the following main components:

  1. Webscraping: The project includes a webscraping module that fetches drug names and SMILES strings from PubChem. This data serves as the basis for chemical compound clustering.

  2. Clustering: The clustering module uses agglomerative clustering with Levenshtein distance to cluster the chemical compounds based on their SMILES. It computes the pairwise distance between compounds and assigns them to appropriate clusters.

  3. Chemical Identification: This module takes a SMILES string and outputs the predicted chemical name.

  4. SMILE To Structure: This module takes a SMILES string and displays its 2D structure.
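
The clustering component described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the function names `levenshtein` and `agglomerative_clusters`, the choice of average linkage, and the two alcohol SMILES added for contrast are all assumptions made for illustration. The tryptamine and naphthyl SMILES are taken from the sample output in the Usage section.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def agglomerative_clusters(smiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Average linkage is an assumption; the README does not state the linkage.
    """
    # Precompute all pairwise distances between the input strings.
    dist = {frozenset((i, j)): levenshtein(smiles[i], smiles[j])
            for i, j in combinations(range(len(smiles)), 2)}
    clusters = [[i] for i in range(len(smiles))]

    def linkage(c1, c2):
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [[smiles[i] for i in c] for c in clusters]


example = [
    "CC(CC1=CNC2=CC=CC=C21)N",    # Alpha-methyltryptamine
    "CCC(CC1=CNC2=CC=CC=C21)N",   # Alpha-ethyltryptamine
    "CC(CC1=CNC2=CC=CC=C21)NC",   # Alpha,N-DMT
    "CC(CC1=CC2=CC=CC=C2C=C1)N",  # Naphthylaminopropane
    "CCO",                        # ethanol (added for contrast)
    "CCCO",                       # 1-propanol (added for contrast)
]

for cluster in agglomerative_clusters(example, n_clusters=2):
    print(cluster)
```

This naive loop recomputes linkages on every merge, which is fine for a handful of strings but slow for a dataset of the size the toolkit ships with; a library implementation that accepts a precomputed distance matrix would be the practical choice there.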

Installation

  1. Clone the repository:

    git clone https://github.com/DanielFlockhart/Chemical-SMILES-toolkit.git
  2. Navigate to the project directory:

    cd Chemical-SMILES-toolkit
  3. Install the required dependencies:

    pip install -r requirements.txt

Usage

  1. To use your own dataset, provide a text file of chemical names in the form shown below. If you only wish to test the software, skip this step.

    drugs.txt

    ["name1","name2","name3"...]
    
  2. Launch the program:

    python main.py
    
  3. Follow the on-screen instructions:

    ---------- Welcome chemical SMILES toolkit ----------
    
     The Github repository comes with a pre-clustered dataset of 1411 Substances with 100 clusters as an example.
     Feel free to use this dataset or cluster your own dataset.
     Please choose from the follow options to continue:
    
     1. Get similar SMILE to a given SMILE with current clusters
     2. Re-cluster data with a different number of clusters
     3. Re-cluster data with a different dataset
     4. Convert a SMILE to a 2D structure and display it
     5. Get the name of a chemical from a SMILE  
  4. Getting Similar Chemicals

    Enter a smile: CCC(CC1=CNC2=CC=CC=C21)N
    
    Alpha-methyltryptamine       CC(CC1=CNC2=CC=CC=C21)N
    Alpha-ethyltryptamine        CCC(CC1=CNC2=CC=CC=C21)N
    Alpha,N-DMT                  CC(CC1=CNC2=CC=CC=C21)NC
    5-MeO-AMT                    CC(CC1=CNC2=C1C=C(C=C2)OC)N
    Alpha,N,O-TMS                CC(CC1=CNC2=C1C=C(C=C2)OC)NC
    5-Fluoro-AMT                 CC(CC1=CNC2=C1C=C(C=C2)F)N
    6-fluoro-AMT                 CC(CC1=CNC2=C1C=CC(=C2)F)N
    MethylbenzodioxolylbutanamineCCCC(C)(C1C2=CC=CC=C2OO1)N
    Benzodioxolylbutanamine      CCCC(C1C2=CC=CC=C2OO1)N
    Naphthylaminopropane         CC(CC1=CC2=CC=CC=C2C=C1)N
    
  5. Converting a SMILES string to a 2D structure and displaying it

    Enter a smile: CCC(CC1=CNC2=CC=CC=C21)  

    The image is then displayed. Its file name is the name of the chemical.

  6. Getting the name of a chemical from a SMILES string

    Enter a smile: CCC(CC1=CNC2=CC=CC=C21)  
    The SMILE corresponds to the chemical -> 3-butyl-1H-indole
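
The README does not say how menu option 1 maps a query to a cluster. One plausible approach, sketched below under that assumption, is to find the stored SMILES with the smallest Levenshtein distance to the query and return every member of its cluster. The `CLUSTERS` layout and the `similar_chemicals` helper are hypothetical, the alcohol entries are added only to give a second cluster, and the column width of 29 is a guess that happens to match the sample output above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical in-memory layout: cluster id -> list of (name, SMILES) pairs.
CLUSTERS = {
    0: [
        ("Alpha-methyltryptamine", "CC(CC1=CNC2=CC=CC=C21)N"),
        ("Alpha-ethyltryptamine", "CCC(CC1=CNC2=CC=CC=C21)N"),
        ("Naphthylaminopropane", "CC(CC1=CC2=CC=CC=C2C=C1)N"),
    ],
    1: [
        ("Ethanol", "CCO"),
        ("1-Propanol", "CCCO"),
    ],
}


def similar_chemicals(query, clusters):
    """Return all members of the cluster containing the SMILES nearest to query."""
    best_id = min(
        clusters,
        key=lambda cid: min(levenshtein(query, smi) for _, smi in clusters[cid]),
    )
    return clusters[best_id]


for name, smi in similar_chemicals("CCC(CC1=CNC2=CC=CC=C21)N", CLUSTERS):
    # Left-pad names to a fixed width so the SMILES line up in a column.
    print(f"{name:<29}{smi}")
```

Because the nearest stored string decides the cluster, a query that is already in the dataset always returns its own cluster, which is consistent with the sample session where the query itself appears in the results.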
    

Contribution

Contributions to this project are welcome! If you have any suggestions, improvements, or new features to propose, please submit a pull request. You can also report any issues or bugs by opening an issue on the project's GitHub repository.

When contributing, please follow the existing code style, write clear and concise commit messages, and provide appropriate documentation.

License

This project is licensed under the MIT License. Feel free to use, modify, and distribute it as per the terms of the license.

Credits

The project acknowledges the following resources for their contributions:

  • PubChem - Data source for drug SMILES and names
  • RDKit - Conversion of SMILES strings to 2D structures

Thank you for using the Chemical SMILES toolkit project! We hope it proves to be useful for your chemical analysis and research.

Working on:

  • Setting up the UI
    • Creating a pages system
  • Creating my own UI framework built on top of tkinter
  • Creating a 3D conformer generator
