This dataset comprises news articles collected over the past few months using the NewsAPI. The primary motivation behind curating this dataset was to develop and experiment with various natural language processing (NLP) models. The dataset aims to support the creation of text summarization models, sentiment analysis models, and other NLP applications.
The data is sourced from the NewsAPI, a comprehensive and up-to-date news aggregation service. The API provides access to a wide range of news articles from various reputable sources, making it a valuable resource for constructing a diverse and informative dataset.
The data for this dataset was collected using a custom Python script, dailyWorker.py. The script uses the NewsAPI to gather news articles published over a specified period.
Feel free to explore and modify the script to suit your data collection needs. If you have any questions or suggestions for improvement, please don't hesitate to reach out.
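For readers who want a starting point before opening the script, here is a minimal sketch of a date-bounded NewsAPI query. It is not the contents of dailyWorker.py: the function names `build_params` and `fetch_articles`, the choice of the `/v2/everything` endpoint, and the `language` and `sortBy` values are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request
from datetime import date

# Assumption: the "everything" endpoint is used; dailyWorker.py may use another one.
NEWSAPI_URL = "https://newsapi.org/v2/everything"


def build_params(query, start, end, page_size=100):
    """Build query parameters for articles published between start and end."""
    return {
        "q": query,
        "from": start.isoformat(),   # ISO 8601 date, e.g. 2024-01-01
        "to": end.isoformat(),
        "language": "en",            # assumption: English-only collection
        "sortBy": "publishedAt",
        "pageSize": page_size,
    }


def fetch_articles(api_key, params):
    """Send one request and return the list of article dicts from the response."""
    url = f"{NEWSAPI_URL}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers={"X-Api-Key": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "NewsAPI request failed"))
    return payload.get("articles", [])
```

To collect one day of articles, you would call `fetch_articles(your_key, build_params("technology", date(2024, 1, 1), date(2024, 1, 2)))` with your own API key; the query term and dates here are placeholders.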
The inspiration behind collecting this dataset stems from the growing interest in NLP applications and the need for high-quality, real-world data to train and evaluate these models effectively. By leveraging the NewsAPI, we aim to contribute to the development of robust text summarization and sentiment analysis models that can better understand and process news content.
The dataset includes:
- Text of news articles
- Publication date and time
- Source information
- Any additional metadata available through the NewsAPI
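The fields listed above map fairly directly onto the article objects that the NewsAPI returns. The sketch below flattens one such object into a single record. The output key names (`text`, `published_at`, `source_name`, and so on) are hypothetical and may not match the column names actually used in this dataset; the input keys (`source`, `title`, `content`, `publishedAt`, `url`, `author`, `description`) follow the NewsAPI response format.

```python
def flatten_article(article):
    """Turn one NewsAPI article dict into a flat record.

    Output key names are illustrative assumptions, not the dataset's schema.
    """
    source = article.get("source") or {}  # "source" is a nested object with id and name
    return {
        "title": article.get("title"),
        "text": article.get("content"),
        "description": article.get("description"),
        "published_at": article.get("publishedAt"),  # kept as the original timestamp string
        "source_name": source.get("name"),
        "author": article.get("author"),
        "url": article.get("url"),
    }
```

Keeping missing values as `None` rather than dropping the key gives every record the same shape, which makes it easier to load the results into a table later.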
Potential use cases:
- Text Summarization: Develop models to generate concise and informative summaries of news articles.
- Sentiment Analysis: Analyze the sentiment expressed in news articles to understand public opinion.
- Topic Modeling: Explore trends and topics within the news data.
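As a rough illustration of the summarization use case, the sketch below implements a simple frequency-based extractive baseline: it scores each sentence by the average corpus frequency of its non-stopword tokens and keeps the top-scoring sentences in their original order. This is an assumed baseline for experimentation, not a method used to build the dataset, and the tiny stopword list and regex sentence splitter are deliberately simplistic.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real experiment would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
    "are", "was", "were", "it", "that", "this", "with", "as", "by", "at",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return up to max_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    frequencies = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(frequencies[t] for t in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in keep]
```

A baseline like this is mainly useful as a reference point when evaluating trained summarization models on the article text.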
Note:
Please refer to the NewsAPI documentation for terms of use and ensure compliance with their policies when using this dataset.