This project aims to predict stock prices by creating a hybrid model that combines numerical analysis of historical stock data and sentiment analysis of news headlines. The project involves several steps, including data collection, preprocessing, feature engineering, and model building.
- Stock Data: Download historical stock data from Yahoo Finance using the
yfinance
library. - News Headlines: Collect news headlines related to the stock from a suitable source. Note: The dataset for news headlines can be downloaded from this link: Headlines Dataset.
- Stock Data: Clean the stock data, handle missing values, and format the data into a DataFrame.
- News Headlines:
- Tokenization: Tokenize the headlines to break them into individual words.
- Sentiment Analysis: Perform sentiment analysis on the headlines to quantify sentiment.
- Preprocessing: Apply necessary preprocessing steps, such as removing stop words and special characters.
- Stock Data: Calculate technical indicators such as Simple Moving Average (SMA), Exponential Moving Average (EMA), and Relative Strength Index (RSI).
- Combined Data: Merge the preprocessed news headlines with the stock data based on the common date.
- Exploratory Data Analysis (EDA): Explore the merged data to understand the relationships between different features.
- Correlation Analysis: Examine the correlation between numerical features and the target variable ('Close').
- Hybrid Model: Use a machine learning algorithm, such as Random Forest, to build a hybrid model that incorporates both numerical features from stock data and sentiment features from news headlines.
- Metrics: Evaluate the model using metrics like Mean Squared Error (MSE), R-squared (R2), and other relevant performance metrics.
- Visualization: Create visualizations, such as plots of stock prices, technical indicators, and model predictions.
- Parameter Tuning: Fine-tune the model parameters to improve performance.
- Save DataFrame: Save the merged DataFrame, which contains both stock data and preprocessed headlines, as a new CSV file for future use.
This project demonstrates the application of a hybrid model for stock price prediction, combining numerical analysis and sentiment analysis. It provides insights into the relationship between news sentiment and stock performance.
Feel free to customize and expand upon each section based on the specifics of your project.