This project shows how to deploy a distributed web scraper that collects financial data efficiently, uses a relational database for storage, and includes comprehensive monitoring.
- Distributed Systems: Develop systems using RabbitMQ and Celery for scalable web scraping.
- Docker Deployment: Use Docker for streamlined setup and deployment, monitored with Portainer.
- Database: Efficiently store and manage data using MySQL.
- Monitoring: Implement Prometheus and Grafana for large-scale data monitoring.
- Dashboard: Build Grafana dashboards for data status monitoring and anomaly detection.
Follow these steps to set up and run the distributed web scraper:
Clone the repo:
```bash
git clone https://github.com/whchien/financial-data-engine.git
```
Install the necessary dependencies:
```bash
make install-package
```
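The Makefile defines what this target actually runs; if you would rather install by hand, it presumably reduces to a standard dependency install. A minimal sketch, assuming a `requirements.txt` at the repo root:

```bash
# Assumption: the make target wraps a plain pip install of pinned dependencies.
cd financial-data-engine
pip install -r requirements.txt
```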
Initialize Docker Swarm:
```bash
make init-swarm
```
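The target name suggests it wraps Docker's built-in Swarm bootstrap, which you can also run directly:

```bash
# Puts the current Docker engine into Swarm mode as a manager node.
docker swarm init
```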
Create the Docker network for service communication:
```bash
make create-network
```
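Swarm services talk to each other over an overlay network. A hedged equivalent of this step (the network name `scraper-net` is an assumption, not necessarily what the Makefile uses):

```bash
# Assumption: network name. The overlay driver lets services on
# different Swarm nodes reach one another by service name.
docker network create --driver overlay scraper-net
```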
Deploy RabbitMQ to handle message queuing:
```bash
make deploy-rabbitmq
```
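For reference, a RabbitMQ Swarm service of this kind typically looks like the sketch below, assuming the official `rabbitmq:3-management` image and its default ports (5672 for AMQP, 15672 for the management UI); the service and network names are illustrative:

```bash
# Assumption: service name, image tag, and network are illustrative.
docker service create --name rabbitmq \
  --network scraper-net \
  -p 5672:5672 -p 15672:15672 \
  rabbitmq:3-management
```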
Deploy the MySQL service for data storage:
```bash
make deploy-mysql
```
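Likewise, the MySQL service presumably resembles the following; the credentials, database name, and image tag are placeholders to replace with your own:

```bash
# Assumption: all names and credentials below are illustrative placeholders.
docker service create --name mysql \
  --network scraper-net \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -e MYSQL_DATABASE=financial_data \
  -p 3306:3306 \
  mysql:8
```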
Set up the MySQL volume for data persistence:
```bash
make create-mysql-volume
```
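Since the volume is created after the MySQL service here, this step presumably creates a named volume and attaches it to the running service; a sketch with an assumed volume name:

```bash
# Assumption: volume name "mysql-data".
docker volume create mysql-data
# Attach it at MySQL's data directory so data survives container restarts.
docker service update \
  --mount-add type=volume,source=mysql-data,target=/var/lib/mysql \
  mysql
```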
Deploy a Celery worker, for example for TWSE tasks:
```bash
make run-worker-twse
```
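Under the hood this starts a Celery worker consuming from a TWSE-specific queue. A hedged equivalent via Celery's CLI, where the application module path (`src.worker`) and queue name (`twse`) are assumptions to verify against the Makefile:

```bash
# Assumption: app module and queue name. Starts a worker that consumes
# TWSE scraping tasks from RabbitMQ.
celery -A src.worker worker --queues twse --loglevel info --concurrency 4
```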
Send a task to fetch Taiwan futures daily data:
```bash
make send-taiwan-futures-daily-task
```
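You can also enqueue a task without make by calling it by name, which publishes a message to RabbitMQ for a worker to pick up; the task name below is hypothetical:

```bash
# Assumption: registered task name. "celery call" only enqueues the task;
# a running worker (previous step) actually executes it.
celery -A src.worker call tasks.fetch_taiwan_futures_daily
```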
By following these steps, you will have a distributed scraping system that collects financial data efficiently, with RabbitMQ handling task queuing, Celery executing the tasks, and MySQL storing the results.
This project is inspired by this repo.