opencodeiiita/News_Scraping

News_Scraping

Please read instructions.txt carefully before attempting the tasks.

A good data scientist not only has extensive knowledge of machine learning and deep learning, but can also extract and gather data from various sources and store it in a usable format. This task introduces you to the first step of every data science project: data collection. One method of data collection is web scraping, which is what you will work on in this task.
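As a rough illustration of what web scraping with Beautiful Soup looks like, here is a minimal sketch. The HTML snippet, the `headline` class, and the `extract_headlines` helper are all hypothetical and only stand in for a real news page; an actual site will need its own selectors, and the page would first have to be downloaded.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded news page.
# Real pages use different tags and class names.
SAMPLE_HTML = """
<html><body>
  <div class="story">
    <h2 class="headline"><a href="/news/1">Acme Corp shares rise after earnings</a></h2>
  </div>
  <div class="story">
    <h2 class="headline"><a href="/news/2">Globex announces stock split</a></h2>
  </div>
  <div class="ad"><h2>Sponsored content</h2></div>
</body></html>
"""


def extract_headlines(html):
    """Return a list of {"title", "url"} dicts, one per headline link."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    # Select only links inside <h2 class="headline">, which skips the ad block.
    for link in soup.select("h2.headline a"):
        headlines.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
        })
    return headlines


if __name__ == "__main__":
    for item in extract_headlines(SAMPLE_HTML):
        print(item["title"], item["url"])
```

Parsing an inline string keeps the sketch independent of any particular website; the same function can be applied to HTML fetched from a real source.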

Problem Statement

This project involves collecting data from various online sources. In the first part, you are asked to collect relevant news data on different stocks, in particular financial news headlines. The second part is data cleaning and preprocessing: you are asked to present a clean and usable dataset.
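The cleaning and preprocessing part could look something like the sketch below. It uses only the Python standard library. The row layout (`ticker` and `title` keys), the helper names, and the specific cleaning rules (unescape HTML entities, strip leftover tags, collapse whitespace, drop empty and duplicate headlines) are assumptions chosen for illustration, not requirements of the task.

```python
import html
import re


def clean_headline(text):
    """Normalize one raw headline string."""
    text = html.unescape(text)                # "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", "", text)       # drop leftover HTML tags
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text


def clean_dataset(rows):
    """Clean rows and drop empty or duplicate headlines.

    Each row is assumed to be a dict with "ticker" and "title" keys.
    Duplicates are detected on the cleaned title, ignoring case.
    """
    seen = set()
    cleaned = []
    for row in rows:
        title = clean_headline(row.get("title") or "")
        if not title:
            continue  # nothing usable left after cleaning
        key = title.casefold()
        if key in seen:
            continue  # same headline already kept
        seen.add(key)
        ticker = (row.get("ticker") or "").strip().upper()
        cleaned.append({"ticker": ticker, "title": title})
    return cleaned


# Hypothetical raw rows, as they might come out of a scraper.
SAMPLE_ROWS = [
    {"ticker": " aapl", "title": "Apple &amp; partners  announce\n deal"},
    {"ticker": "AAPL", "title": "apple & partners announce deal"},
    {"ticker": "MSFT", "title": "   "},
    {"ticker": "msft", "title": "<b>Microsoft</b> beats estimates"},
]

if __name__ == "__main__":
    for row in clean_dataset(SAMPLE_ROWS):
        print(row)
```

Deduplicating on the title alone is a simplification; if the same headline can legitimately appear under several tickers, the key could include the ticker as well.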

Instructions

  • If you run into a problem, refer to Beautiful Soup's online documentation or YouTube videos instead of using ChatGPT
  • Do not alter any prewritten code or comments
  • Add comments to make your code readable and to let the mentors understand the approach you have taken
  • Use only Google Colab to run the code

Procedure

  1. Fork this repository and clone your fork onto your local device
  2. Open the .ipynb file in Google Colab
  3. Once you are done with the task, download the notebook as an .ipynb file and store it in a folder along with any required files
  4. Name the file after your enrollment number
  5. Push this file to your forked repo and then open a pull request (PR)
  6. Your code will be reviewed by the mentors. Points will be granted once the PR is accepted and merged

Help

For any queries, feel free to contact [email protected]. You can also interact with the mentors and the GeekHaven community on Discord.