This is a Python Scrapy project that scrapes some information from https://arxiv.org. It was submitted as part of the first-semester coursework for the M.E. in Big Data & Data Analytics at MSIS, MAHE, Manipal.
To run the project:
- Mount the entire project folder to Google Drive.
- Open `PDV | Scrapy | Assignment.ipynb` in Google Colab.
- Execute the commands in Colab.
- Run the project locally
  - Ensure Scrapy is installed, using either `pip` or `conda`.
  - From a terminal, switch to the `scrapy_arxiv_org` folder and run `scrapy crawl basic`.
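
The local steps above could also be scripted. The following is a minimal sketch, not part of the project itself: the helper names (`scrapy_installed`, `build_crawl_command`, `run_crawl`) are hypothetical, and it assumes the script is started from the directory that contains the `scrapy_arxiv_org` folder. It invokes Scrapy as `python -m scrapy`, which should be equivalent to the `scrapy` command when Scrapy is installed in the active environment.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path

# Folder and spider names taken from the steps above.
PROJECT_DIR = Path("scrapy_arxiv_org")
SPIDER_NAME = "basic"


def scrapy_installed():
    """Return True if the scrapy package can be found in this environment."""
    return importlib.util.find_spec("scrapy") is not None


def build_crawl_command(spider=SPIDER_NAME):
    """Build the argument list equivalent to `scrapy crawl <spider>`."""
    return [sys.executable, "-m", "scrapy", "crawl", spider]


def run_crawl(project_dir=PROJECT_DIR, spider=SPIDER_NAME):
    """Run the spider from inside the project folder and return the exit code."""
    if not scrapy_installed():
        raise RuntimeError("Scrapy is not installed; install it with pip or conda first.")
    if not project_dir.is_dir():
        raise FileNotFoundError(f"Project folder not found: {project_dir}")
    # cwd plays the role of "switch to the scrapy_arxiv_org folder".
    return subprocess.run(build_crawl_command(spider), cwd=project_dir).returncode
```

Calling `run_crawl()` starts the crawl and returns the process exit code; a return value of `0` normally means the `scrapy` command finished without error.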