This project is a web scraping tool designed to extract data from the Interaction24 website. The data is then written to a CSV file and a JSON file, and uploaded to a Google Spreadsheet.
- `BeautifulSoup`: A Python library for pulling data out of HTML.
- `gspread`: A Python client library for the Google Sheets API.
- `oauth2client`: An OAuth 2.0 library for Python, used to make authenticated requests to the Google Sheets API.
- `requests`: A simple HTTP library for Python, built for human beings.
- The script first sends a GET request to the Interaction24 website.
- It then parses the HTML content of the site using BeautifulSoup.
- The data of each person is extracted and stored in a `Person` object.
- These `Person` objects are then written to a CSV file and a JSON file.
- Finally, the data is uploaded to a Google Spreadsheet using the `gspread` library and the Google Sheets API.
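The steps above can be sketched end-to-end. This is a minimal sketch, not the project's actual code: the HTML snippet, the CSS classes (`div.person`, `p.role`), and the `Person` fields are illustrative assumptions about what the Interaction24 markup might look like, but the BeautifulSoup parsing and CSV/JSON writing follow the flow described above:

```python
import csv
import json
from dataclasses import asdict, dataclass

from bs4 import BeautifulSoup

# Hypothetical markup — the real Interaction24 page structure will differ.
HTML = """
<div class="team">
  <div class="person"><h3>Ada Lovelace</h3><p class="role">Speaker</p></div>
  <div class="person"><h3>Alan Turing</h3><p class="role">Organizer</p></div>
</div>
"""

@dataclass
class Person:
    name: str
    role: str

def parse_people(html: str) -> list[Person]:
    """Extract one Person per profile card in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        Person(
            name=card.find("h3").get_text(strip=True),
            role=card.find("p", class_="role").get_text(strip=True),
        )
        for card in soup.select("div.person")
    ]

def write_outputs(people: list[Person]) -> None:
    """Write the same records to both team.csv and team.json."""
    with open("team.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "role"])
        writer.writeheader()
        writer.writerows(asdict(p) for p in people)
    with open("team.json", "w", encoding="utf-8") as f:
        json.dump([asdict(p) for p in people], f, indent=2)

people = parse_people(HTML)
write_outputs(people)
```

In the real script the HTML would come from `requests.get(...)` on the live site rather than a string literal.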
It is recommended to use `poetry` to manage the dependencies of this project, but it is not required. The `requirements.txt` file contains all the dependencies needed to run the project.
- **Clone the Repository:**

  ```
  git clone https://github.com/AnyoneClown/InteractionScrapy.git
  ```

- **Install Dependencies:**

  ```
  poetry install --no-root
  # or use pip, if you can't use poetry
  pip install -r requirements.txt
  ```

- **Add Credentials:** Open the `scraping` folder and add your service account key file, renamed to `credentials.json`.

- **Set the Spreadsheet Name:** Replace `Interaction24 Team` in the `upload_to_spreadsheet` function with the name of your Google Spreadsheet.

- **Run the Script:**

  ```
  python scraping/scrapy.py
  ```
This will start the scraping process, and the data will be written to `team.csv`, `team.json`, and the specified Google Spreadsheet.
Please ensure that the Google Spreadsheet is shared with the `client_email` found in your service account JSON key file, and that the service account has the necessary permissions to access and modify the spreadsheet.
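The upload side can be sketched as follows. This is a hedged illustration, not the project's actual `upload_to_spreadsheet`: `rows_from_csv` is a hypothetical helper, and the credentials path, default sheet name, and use of `append_rows` are assumptions about one reasonable way to write it with `gspread`:

```python
import csv

def rows_from_csv(path: str) -> list[list[str]]:
    """Read a CSV file back as a 2-D list (header row + data rows)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]

def upload_to_spreadsheet(rows: list[list[str]],
                          sheet_name: str = "Interaction24 Team") -> None:
    """Replace the first worksheet's contents with `rows` (sketch)."""
    # Imported locally so the rest of this sketch runs without gspread installed.
    import gspread

    # service_account() reads the service-account key file; the spreadsheet
    # must be shared with that key's client_email, as noted above.
    gc = gspread.service_account(filename="scraping/credentials.json")
    worksheet = gc.open(sheet_name).sheet1
    worksheet.clear()
    worksheet.append_rows(rows)  # one API call for all rows
```

Sending all rows in a single `append_rows` call keeps the script well under the Sheets API's per-minute request quota, which per-cell updates can easily exceed.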