Web Scraping

Web scraping is the automated process of extracting large amounts of data from websites. In bioinformatics, it is especially valuable for gathering publicly available data from biological databases, such as gene sequences, protein structures, or clinical datasets. By streamlining data collection, web scraping enables researchers to efficiently access and compile vast amounts of information, which is critical for large-scale analyses like genome-wide association studies (GWAS) or drug discovery.

The importance of web scraping in bioinformatics lies in its ability to bypass manual data collection, saving time and reducing errors. It supports monitoring and updating datasets in real time, ensuring researchers have the latest information for their studies. Furthermore, it aids in building comprehensive data repositories that are essential for machine learning models, personalized medicine, and variant analysis. Overall, it boosts data accessibility, accelerating discoveries in the life sciences.


Main objective

This repository hosts a Python script designed to automate the retrieval of chemical compound details from the PubChem database. Given a list of compound names or IDs as input, the script fetches key information such as Compound Name, ID, Molecular Formula, Molecular Weight, and SMILES, along the lines of the sketch below.
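
As a minimal sketch of how such a retrieval can work, the example below queries PubChem's PUG REST API instead of scraping HTML. The endpoint and property names are PubChem's documented ones, but the fetch_compound helper, the example compound list, and the output filename are illustrative and need not match the script in this repository.

```python
import urllib.parse

import pandas as pd
import requests

# PubChem PUG REST endpoint for looking up compound properties by name.
BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name"
PROPERTIES = "MolecularFormula,MolecularWeight,CanonicalSMILES"


def fetch_compound(name):
    """Return a dict of basic properties for one compound name, or None on failure."""
    url = f"{BASE_URL}/{urllib.parse.quote(name)}/property/{PROPERTIES}/JSON"
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        return None
    props = response.json()["PropertyTable"]["Properties"][0]
    return {
        "Compound Name": name,
        "ID": props["CID"],  # PubChem's compound identifier (CID)
        "Molecular Formula": props["MolecularFormula"],
        "Molecular Weight": props["MolecularWeight"],
        "SMILES": props["CanonicalSMILES"],
    }


if __name__ == "__main__":
    compounds = ["aspirin", "caffeine", "glucose"]  # illustrative input list
    records = [r for r in (fetch_compound(c) for c in compounds) if r]
    df = pd.DataFrame(records)
    df.to_csv("compounds.csv", index=False)  # keep the results as a spreadsheet
    print(df)
```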

Requirements

pip install requests beautifulsoup4 lxml pandas

Features

Automatic Data Gathering: Instead of going through pages and copying things manually, web scraping tools do it for you quickly and efficiently.

Understanding Web Pages: These tools know how to read the structure of a website (like sections, headers, or tables) to find and collect the right information; a short parsing sketch follows this list.

Getting Exactly What You Need: You can set up rules for the scraper to focus only on the data that matters to you, ignoring the rest.

Working with Dynamic Websites: Some scrapers are smart enough to interact with websites that change content based on user actions (like sites using JavaScript or needing you to click buttons).

Bypassing Blocks: Advanced scrapers can deal with security features like CAPTCHAs or use different networks (proxies) so they don't get blocked for gathering too much data too quickly.

Saving Data in Useful Formats: After collecting the data, scrapers can save it in formats like spreadsheets, text files, or databases, making it easy for you to analyze or use later.

Resilience: If something goes wrong (like the site going down), the scraper can try again without crashing, making the process more reliable; see the retry sketch after the parsing example below.
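
To illustrate the page-parsing point above, here is a minimal sketch of reading an HTML table with BeautifulSoup and the lxml parser (both in the requirements). The HTML fragment and its tag and attribute names are invented for the example.

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment standing in for a fetched results page.
html = """
<table id="results">
  <tr><th>Compound</th><th>Formula</th></tr>
  <tr><td>Aspirin</td><td>C9H8O4</td></tr>
  <tr><td>Caffeine</td><td>C8H10N4O2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "lxml")
table = soup.find("table", id="results")
for row in table.find_all("tr")[1:]:  # skip the header row
    compound, formula = (td.get_text(strip=True) for td in row.find_all("td"))
    print(compound, formula)
```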

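And for the resilience point, a small sketch of a retry wrapper around requests with exponential backoff; the function name and its default values are illustrative, not part of this repository's script.

```python
import time

import requests


def get_with_retries(url, retries=3, backoff=2.0):
    """GET a URL, retrying with exponential backoff on transient failures."""
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()  # turn HTTP error codes into exceptions
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(backoff * 2 ** attempt)  # wait longer before each retry
```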

Please contact me at [email protected] with any questions or comments.

About

Retrieving data from the PubChem database.
