This repository provides educational material for learning web scraping with Python. It starts with the basics using requests and BeautifulSoup, then progresses to more advanced techniques with Scrapy.
- Introduction to Web Scraping
- Basic Web Scraping with requests and BeautifulSoup
- Setting Up the Environment
- Introduction to Scrapy
- Advanced Scrapy Techniques
- Handling JavaScript-Heavy Sites with Selenium
- Responsible Web Scraping: Best Practices and Ethics
- Challenges and Solutions in Web Scraping
- Practical Projects for Skill Application
- Handling Data Post-Scraping
- Sample Project: Scraping Patek.com
- Sample Project: Scraping Blancpain.com
- Basic knowledge of Python
- Python environment set up on your local machine
Practical application reinforces learning, so this repository includes sample projects that apply the concepts and techniques discussed in this tutorial. The projects demonstrate three popular Python libraries for web scraping: BeautifulSoup, Scrapy, and Selenium.
The first project showcases Scrapy, a comprehensive web scraping framework for Python. It targets Patek.com and extracts detailed information about various watch models. The project's directory contains everything needed to understand and run this real-world example: the script, sample output data, and a detailed walkthrough of the code.
The second project takes a different approach, using BeautifulSoup and Selenium to scrape data from Blancpain's website. BeautifulSoup on its own suits simpler scraping tasks where the content is already present in the HTML the server returns; combining it with Selenium, which drives a real browser, makes it possible to handle JavaScript-heavy websites. The project's folder contains the script, sample data, and an extensive guide to each step of the process.
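The division of labor between the two libraries can be sketched as follows: Selenium loads the page and waits for JavaScript to render it, then BeautifulSoup parses the resulting HTML. This is an illustrative sketch rather than the project's script. The function names, the `div.watch` and `h2` selectors, and the choice of headless Chrome are all assumptions.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load a page in headless Chrome and return the rendered HTML."""
    # Selenium is imported here so the parsing code below works without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one (hypothetical) product card is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.watch"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_watches(html):
    """Extract watch names from HTML using hypothetical selectors."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select("div.watch"):
        name = card.select_one("h2")
        results.append({"name": name.get_text(strip=True) if name else ""})
    return results


# Demonstrate the parsing step on a static snippet (no browser needed).
sample = '<div class="watch"><h2> Model A </h2></div>'
print(parse_watches(sample))
```

Keeping the browser step and the parsing step in separate functions means the parsing logic can be tried out on saved HTML without launching a browser each time; for a live page, the two would be chained as `parse_watches(fetch_rendered_html(url))`.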