Demo video: UpCrawlerApplication.mp4
Crawler Bot is a C# .NET project designed to scrape product information from an e-commerce website and store it in a MySQL database. It utilizes a C# background worker to navigate the target website and extract details such as product names, prices, discount availability, and image URLs. The project follows the Clean Architecture and CQRS design patterns, ensuring a well-structured and maintainable codebase.
Web scraping of product details: The Crawler Bot navigates to the e-commerce website and gathers product information, including regular and discounted prices, image URLs, and product names.
Clean Architecture and CQRS: The project is structured according to the Clean Architecture principles, ensuring separation of concerns and maintainability. The CQRS pattern is implemented to segregate read and write operations effectively.
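For illustration, the write/read split can look like the following (a minimal sketch assuming MediatR as the CQRS dispatcher; the command name, fields, and handler body are hypothetical, not the project's actual types):

```csharp
using MediatR;

// Write side: a command that describes a new scraping order.
// "CreateOrderCommand" and its fields are illustrative names only.
public record CreateOrderCommand(int ProductCount, string ProductType) : IRequest<int>;

public class CreateOrderCommandHandler : IRequestHandler<CreateOrderCommand, int>
{
    public Task<int> Handle(CreateOrderCommand request, CancellationToken cancellationToken)
    {
        // Persist the order and return its id (database access omitted for brevity).
        return Task.FromResult(1);
    }
}

// Read side: a query that only fetches data and never mutates state.
public record GetOrderByIdQuery(int OrderId) : IRequest<OrderDto>;
public record OrderDto(int Id, int ProductCount, string ProductType);
```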
User Management: Microsoft Identity is employed for user authentication, allowing users to register and log in with traditional credentials or via Google. JWTs are issued for secure login and logout.
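A minimal sketch of the JWT bearer setup in Program.cs (the "Jwt:Issuer", "Jwt:Audience", and "Jwt:Key" configuration keys are assumed names, not necessarily the project's):

```csharp
using System.Text;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;

var builder = WebApplication.CreateBuilder(args);

// Validate incoming JWTs; ValidateLifetime = true is what makes expired
// tokens fail, triggering the automatic logout on the front-end.
builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ValidateIssuerSigningKey = true,
            ValidIssuer = builder.Configuration["Jwt:Issuer"],
            ValidAudience = builder.Configuration["Jwt:Audience"],
            IssuerSigningKey = new SymmetricSecurityKey(
                Encoding.UTF8.GetBytes(builder.Configuration["Jwt:Key"]!))
        };
    });

var app = builder.Build();
app.UseAuthentication();
app.UseAuthorization();
app.Run();
```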
Email Notifications: The application sends email notifications to users upon registration and when specific product details are scraped.
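A bare-bones notification sender could look like this (a sketch using System.Net.Mail; the SMTP host, port, credentials, and message wording are placeholders for the project's actual email configuration):

```csharp
using System.Net;
using System.Net.Mail;

// Sketch of a registration-notification sender; all connection details
// are placeholders, not the project's real settings.
public class EmailNotifier
{
    public void SendWelcome(string recipient)
    {
        using var client = new SmtpClient("smtp.example.com", 587)
        {
            Credentials = new NetworkCredential("user", "password"),
            EnableSsl = true
        };
        client.Send(new MailMessage(
            "noreply@example.com", recipient,
            "Welcome to Crawler Bot",
            "Your account has been created."));
    }
}
```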
Global Exception Handling: A GlobalException filter is implemented to handle and manage exceptions gracefully throughout the application.
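In ASP.NET Core terms, such a filter can be as small as the sketch below (the real GlobalException filter's response shape and status-code mapping may differ):

```csharp
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.Filters;

// Turns any unhandled exception into a consistent 500 response.
public class GlobalExceptionFilter : IExceptionFilter
{
    public void OnException(ExceptionContext context)
    {
        context.Result = new ObjectResult(new { error = context.Exception.Message })
        {
            StatusCode = StatusCodes.Status500InternalServerError
        };
        context.ExceptionHandled = true;
    }
}

// Registered globally, e.g. in Program.cs:
// builder.Services.AddControllers(o => o.Filters.Add<GlobalExceptionFilter>());
```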
Front-end with React and TypeScript: The front-end of the application is built using React with TypeScript, providing a responsive and user-friendly interface.
Tailwind CSS for Styling: Tailwind CSS is used for designing the UI, ensuring a clean and modern appearance.
C# .NET
Clean Architecture
CQRS (Command Query Responsibility Segregation)
Microsoft Identity for User Management
JWT (JSON Web Tokens) for Authentication
MySQL for Database Management
Background Worker for Web Scraping
SignalR for Real-Time Communication
Selenium WebDriver for bot browser automation
React with TypeScript
Tailwind CSS for Styling
User Authentication: Users can register and log in with traditional credentials or through Google login. JWTs are issued for secure authentication.
Homepage: Upon successful login, users land on the home page, from which they can go straight to creating an order.
Creating Orders: Users can create new orders, specifying details such as the number of products to scrape and the type of products (all, discounted, non-discounted).
SignalR Communication: When an order is created, its details are sent to the back-end worker through a SignalR hub for web scraping.
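A hub for this hand-off might look like the following sketch (the hub name, method name, and payload are assumptions, not the project's actual contract):

```csharp
using Microsoft.AspNetCore.SignalR;

// "CrawlerHub" and its method/event names are assumed for illustration.
public class CrawlerHub : Hub
{
    // Invoked by the front-end when an order is created; relays the
    // order details to listening worker clients.
    public Task SendOrder(int orderId, int productCount, string productType) =>
        Clients.All.SendAsync("ReceiveOrder", orderId, productCount, productType);
}

// Exposed in Program.cs with something like:
// app.MapHub<CrawlerHub>("/crawlerHub");
```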
Web Scraping: The background worker, driving a Selenium WebDriver, navigates to the e-commerce website and scrapes products according to the user's order details.
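The scraping step itself boils down to driving a browser and reading elements, roughly as below (the URL and CSS selectors are placeholders; the bot's real selectors depend on the structure of the target site):

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Launch a browser and walk the product listing page.
using var driver = new ChromeDriver();
driver.Navigate().GoToUrl("https://example-shop.com/products");

foreach (var card in driver.FindElements(By.CssSelector(".product-card")))
{
    string name = card.FindElement(By.CssSelector(".product-name")).Text;
    string price = card.FindElement(By.CssSelector(".price")).Text;
    string imageUrl = card.FindElement(By.CssSelector("img")).GetAttribute("src");
    Console.WriteLine($"{name} | {price} | {imageUrl}");
}
```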
Data Storage: The scraped product information is stored in the "Product" table, and order-related information is stored in the "Order" table. Bot status and order completion details are saved in the "Order Event" table.
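The three tables map naturally onto entity classes along these lines (illustrative shapes only; the actual columns may differ):

```csharp
// Illustrative entity shapes for the three tables.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
    public decimal? DiscountedPrice { get; set; } // null when not discounted
    public string ImageUrl { get; set; } = "";
    public int OrderId { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public int ProductCount { get; set; }
    public string ProductType { get; set; } = ""; // all / discounted / non-discounted
}

public class OrderEvent
{
    public int Id { get; set; }
    public int OrderId { get; set; }
    public string BotStatus { get; set; } = "";
    public DateTime? CompletedAt { get; set; }
}
```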
Live Tracking: Users can track the bot's progress and the scraped products in real-time using the "Live Track" page. SignalR facilitates the transfer of logs from the back-end to the front-end for live updates.
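On the back-end side, the worker can push log lines through an injected IHubContext, reusing the hypothetical CrawlerHub and an assumed "ReceiveLog" event name from the hub sketch above:

```csharp
using Microsoft.AspNetCore.SignalR;

// Streams worker log lines to connected Live Track clients.
public class ScrapeLogger
{
    private readonly IHubContext<CrawlerHub> _hub;

    public ScrapeLogger(IHubContext<CrawlerHub> hub) => _hub = hub;

    public Task LogAsync(string message) =>
        _hub.Clients.All.SendAsync("ReceiveLog", message);
}
```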
Protected Routes: The application uses protected routes to ensure user session security. If a token expires, the user is automatically logged out for enhanced security.
Users can export their order details to an Excel table directly from the product page and send the crawled products via email, making it easy to analyze, store, and share the scraped data.
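A minimal export routine might look like this (a sketch assuming the ClosedXML package and the illustrative Product class from the data-storage sketch above; the project may use a different Excel library and column layout):

```csharp
using ClosedXML.Excel;

// Writes the scraped products of an order into a simple three-column sheet.
public static class OrderExporter
{
    public static void ExportProducts(IEnumerable<Product> products, string path)
    {
        using var workbook = new XLWorkbook();
        var sheet = workbook.Worksheets.Add("Products");
        sheet.Cell(1, 1).Value = "Name";
        sheet.Cell(1, 2).Value = "Price";
        sheet.Cell(1, 3).Value = "Image URL";

        var row = 2;
        foreach (var p in products)
        {
            sheet.Cell(row, 1).Value = p.Name;
            sheet.Cell(row, 2).Value = p.Price;
            sheet.Cell(row, 3).Value = p.ImageUrl;
            row++;
        }
        workbook.SaveAs(path);
    }
}
```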
Clone the repository from GitHub.
Set up the necessary environment for C# .NET and React with TypeScript.
Install the required C# and JavaScript dependencies.
Configure the MySQL database connection settings (see the configuration sketch after these steps).
Build and run the C# .NET back-end.
Start the React front-end to access the Crawler Bot application.
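For the database-configuration step, wiring the MySQL connection could look like the following sketch (it assumes the Pomelo.EntityFrameworkCore.MySql provider and a connection string named "Default"; adjust to the project's actual settings):

```csharp
using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

// Reads a connection string named "Default" from appsettings.json, e.g.:
// "ConnectionStrings": { "Default": "server=localhost;database=crawler;user=root;password=..." }
var connectionString = builder.Configuration.GetConnectionString("Default")!;

// Registers the DbContext against MySQL via the Pomelo provider (an
// assumption; the project may use a different MySQL provider).
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseMySql(connectionString, ServerVersion.AutoDetect(connectionString)));

var app = builder.Build();
app.Run();

// Minimal DbContext so the sketch is self-contained.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}
```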