This Python script scrapes product information from Amazon's search results page for "formal shoes for men". It extracts product titles, prices, and URLs, and saves the data to a CSV file.
The script performs the following tasks:
Fetches the search results page from Amazon for "formal shoes for men".
Uses BeautifulSoup to parse the HTML content of the page.
Finds and constructs full URLs for product pages from the search results.
For each product page, extracts the title and price.
Compiles the data into a DataFrame and saves it to a CSV file named amazon_data.csv
.
get_title(soup)
: Extracts the product title from the HTML content.get_price(soup)
: Extracts the product price from the HTML content.
- Set Headers: Configures the HTTP headers to mimic a real browser.
- Fetch Search Results: Requests the search results page from Amazon.
- Extract Product Links: Finds and builds links to individual product pages.
- Scrape Product Pages: Requests each product page and extracts the title and price.
- Handle Exceptions: Handles HTTP errors and continues processing.
- Save Data: Writes the data to a CSV file.
- Python 3.x
- Requests library
- BeautifulSoup4 library
- pandas library
- numpy library
You can install the required libraries using pip:
pip install requests beautifulsoup4 pandas numpy