Commit
Merge pull request #588 from mariam7084/main
Google Playstore Analysis and Rating Prediction
Showing 16 changed files with 14,089 additions and 0 deletions.
10,841 changes: 10,841 additions & 0 deletions
Google Playstore Analysis And Rating Predictor/Dataset/googleplaystore.csv
Large diffs are not rendered by default.
Binary file added
BIN
+61.5 KB
...Playstore Analysis And Rating Predictor/Images/App distribution by category.png
Binary file added
BIN
+114 KB
...sis And Rating Predictor/Images/App distribution sunburst chart by category.png
Binary file added
BIN
+114 KB
...tore Analysis And Rating Predictor/Images/App size distribution by category.png
Binary file added
BIN
+40.2 KB
...le Playstore Analysis And Rating Predictor/Images/Average Price By category.png
Binary file added
BIN
+42.2 KB
...e Playstore Analysis And Rating Predictor/Images/Category by Content Rating.png
Binary file added
BIN
+23.3 KB
...le Playstore Analysis And Rating Predictor/Images/Distribution of App Sizes.png
Binary file added
BIN
+12 KB
...alysis And Rating Predictor/Images/Number of applications by content rating.png
Binary file added
BIN
+7.64 KB
Google Playstore Analysis And Rating Predictor/Images/Number of apps by type.png
Binary file added
BIN
+25.6 KB
...re Analysis And Rating Predictor/Images/Scatter plot of reviews vs installs.png
Binary file added
BIN
+41.1 KB
...e Playstore Analysis And Rating Predictor/Images/Total installs by category.png
Binary file added
BIN
+266 KB
... Playstore Analysis And Rating Predictor/Images/Word Cloud for Genre Column.png
Binary file added
BIN
+220 KB
...laystore Analysis And Rating Predictor/Images/wordcloud for category column.png
3,151 changes: 3,151 additions & 0 deletions
...Analysis And Rating Predictor/Model/Google_Playstore_Analysis_and_Rating_Prediction.ipynb
Large diffs are not rendered by default.
90 changes: 90 additions & 0 deletions
Google Playstore Analysis And Rating Predictor/Model/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
<h1>Google PlayStore Analysis and Rating Predictor</h1> | ||
|
||
**GOAL** | ||
|
||
To analyze the 'Google Playstore Dataset' using exploratory data analysis (EDA) and build a regression model that predicts app ratings. | ||
|
||
**DATASET** | ||
|
||
https://www.kaggle.com/datasets/madhav000/playstore-analysis | ||
|
||
**DESCRIPTION** | ||
|
||
The problem is to identify the apps that are worth promoting on Google Play. App ratings, which are provided by users, are a strong indicator of an app's quality. The problem therefore reduces to predicting which apps will have high ratings. | ||
|
||
The dataset contains the following columns: | ||
- App: Application name | ||
- Category: Category to which the app belongs | ||
- Rating: Overall user rating of the app | ||
- Reviews: Number of user reviews for the app | ||
- Size: Size of the app | ||
- Installs: Number of user downloads/installs for the app | ||
- Type: Paid or Free | ||
- Price: Price of the app | ||
- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult | ||
- Genres: An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to Music, Game, Family genres. | ||
- Last Updated: Date when the app was last updated on Play Store | ||
- Current Ver: Current version of the app available on Play Store | ||
- Android Ver: Minimum required Android version | ||
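
Several of the columns above usually need to be converted to numbers before they can be plotted or fed to a model. The following is a minimal sketch, assuming the raw file stores Size as strings such as `19M`, Installs as `10,000+`, and Price as `$4.99`. Those formats, the toy rows, and the helper `size_to_mb` are assumptions for illustration and are not taken from the notebook.

```python
import pandas as pd

# Toy rows standing in for googleplaystore.csv. The string formats used
# here ("19M", "10,000+", "$4.99") are assumptions about the raw file.
raw = pd.DataFrame({
    "Size": ["19M", "512k", "Varies with device"],
    "Installs": ["10,000+", "1,000,000+", "500+"],
    "Price": ["0", "$4.99", "0"],
    "Type": ["Free", "Paid", "Free"],
})


def size_to_mb(value: str) -> float:
    """Hypothetical helper: convert '19M' or '512k' to megabytes, else NaN."""
    if value.endswith("M"):
        return float(value[:-1])
    if value.endswith("k"):
        return float(value[:-1]) / 1024
    return float("nan")


clean = pd.DataFrame({
    # Sizes in megabytes; unparseable entries become NaN.
    "Size_MB": raw["Size"].map(size_to_mb),
    # Drop thousands separators and the trailing "+" before casting.
    "Installs": raw["Installs"].str.replace(r"[,+]", "", regex=True).astype(int),
    # Strip the leading currency symbol.
    "Price": raw["Price"].str.lstrip("$").astype(float),
    # Encode Type as 1 for paid apps and 0 for free apps.
    "Is_Paid": (raw["Type"] == "Paid").astype(int),
})
print(clean)
```

Rows whose size cannot be parsed are left as NaN here, so they can be dropped or imputed in a later step.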
|
||
**WHAT I HAVE DONE** | ||
|
||
* Checked for missing values and cleaned the data accordingly. | ||
* Analyzed the data, extracted insights, and visualized them. | ||
* Explored the relationships between columns using plotting libraries. | ||
* Trained four regression models (Linear Regression, Support Vector Regression, Decision Tree Regression, and Random Forest Regression) to predict the rating. | ||
* Used RMSE (Root Mean Square Error) to evaluate the performance of the models. | ||
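
The modelling steps above can be sketched as follows. This is an illustration on synthetic data, not the notebook's code: the feature distributions, the 80/20 split, and the default hyperparameters are all assumptions, so the printed RMSE values will not match the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-ins for the four features (assumed, not the real data).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.lognormal(8, 2, n),    # Reviews
    rng.integers(0, 2, n),     # Type (0 = free, 1 = paid)
    rng.lognormal(10, 2, n),   # Installs
    rng.uniform(1, 100, n),    # Size in MB
])
# Synthetic ratings clipped to the 1-5 range.
y = np.clip(rng.normal(4.2, 0.5, n), 1.0, 5.0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Support Vector Regression": SVR(kernel="rbf"),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(random_state=0),
}

# Fit each model and record its RMSE on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

for name, score in rmse_by_model.items():
    print(f"{name}: {score:.4f}")
```

Support Vector Regression is sensitive to feature scale, so standardizing the inputs before fitting may change the ranking of the models.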
|
||
|
||
**LIBRARIES NEEDED** | ||
|
||
1. Pandas | ||
2. Matplotlib | ||
3. Seaborn | ||
4. Plotly | ||
5. NumPy | ||
6. WordCloud | ||
7. scikit-learn (imported as `sklearn`) | ||
|
||
**VISUALIZATION** | ||
![App distribution sunburst chart by category](<../Images/App distribution sunburst chart by category.png>) | ||
![App size distribution by category](<../Images/App size distribution by category.png>) | ||
![App distribution by category](<../Images/App distribution by category.png>) | ||
![Average Price By category](<../Images/Average Price By category.png>) | ||
![Category by Content Rating](<../Images/Category by Content Rating.png>) | ||
![wordcloud for category column](<../Images/wordcloud for category column.png>) | ||
![Total installs by category](<../Images/Total installs by category.png>) | ||
![Number of apps by type](<../Images/Number of apps by type.png>) | ||
![Word Cloud for Genre Column](<../Images/Word Cloud for Genre Column.png>) | ||
![Number of applications by content rating](<../Images/Number of applications by content rating.png>) | ||
|
||
For more visualizations, refer to the .ipynb file :) | ||
|
||
**MODEL PERFORMANCE** | ||
|
||
|Model | RMSE | | ||
| ------------------------- | -------------------| | ||
|Linear Regression | 0.5474395094809866 | | ||
|Support Vector Regression | 0.545564956206932 | | ||
|Decision Tree Regression | 0.7552190124670595 | | ||
|Random Forest Regression | 0.7552190124670595 | | ||
|
||
- The Reviews, Type, Installs, and Size columns were used as features for the regression models, with the Rating column as the target. | ||
- The relatively high RMSE suggests wide variation in the data that these four features do not capture. | ||
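
RMSE is the square root of the mean squared difference between predicted and actual ratings, so it is expressed in rating points. One way to judge whether a value is high is to compare it with a baseline that always predicts the mean rating. The sketch below uses made-up ratings, and the `rmse` helper is hypothetical rather than taken from the notebook.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical ratings, not taken from the dataset.
ratings = np.array([4.5, 3.0, 4.0, 5.0, 3.5])

# Baseline: predict the mean rating for every app.
baseline = np.full_like(ratings, ratings.mean())
baseline_rmse = rmse(ratings, baseline)

print(f"Baseline RMSE: {baseline_rmse:.4f}")
```

The baseline RMSE equals the population standard deviation of the ratings, so a model is only useful if its RMSE falls clearly below that value.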
|
||
**CONCLUSION** | ||
- App ratings vary across categories. | ||
- The most-installed apps belong to the 'Game' category. | ||
- Most apps on the Play Store are rated for all age groups. | ||
- More than three-quarters of the apps are free to install. | ||
- Among paid apps, Finance apps were the most expensive with an average price of $8, followed by Lifestyle apps at $6 and Medical apps at $3. | ||
- All apps were smaller than 100 MB. | ||
- The 'Family' category contained the largest number of apps. | ||
- Support Vector Regression (with an RBF kernel) had the lowest RMSE among the algorithms used, making it the best of the four models. | ||
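
Category-level findings such as the average price of paid apps or the share of free apps can be computed with a pandas filter and group-by. The sketch below uses a small made-up table: the rows and prices are hypothetical and do not reproduce the figures above.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real dataset.
apps = pd.DataFrame({
    "Category": ["FINANCE", "FINANCE", "LIFESTYLE", "MEDICAL", "MEDICAL", "GAME"],
    "Type": ["Paid", "Free", "Paid", "Paid", "Paid", "Free"],
    "Price": [10.0, 0.0, 5.0, 2.0, 4.0, 0.0],
})

# Average price per category, restricted to paid apps, highest first.
paid = apps[apps["Type"] == "Paid"]
avg_price = paid.groupby("Category")["Price"].mean().sort_values(ascending=False)

# Fraction of all apps that are free.
free_share = (apps["Type"] == "Free").mean()

print(avg_price)
print(f"Share of free apps: {free_share:.2%}")
```

Filtering to paid apps before averaging matters, because including free apps at a price of 0 would pull every category's average down.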
|
||
**AUTHOR** | ||
|
||
- Code contributed by *Mariam* @ #JWoC_2024 | ||
|
||
[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/mariam-m7084) | ||
[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/mariam7084/) |
7 changes: 7 additions & 0 deletions
Google Playstore Analysis And Rating Predictor/Requirements.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Numpy == 1.25.2 | ||
Matplotlib == 3.7.1 | ||
Pandas == 1.5.3 | ||
Seaborn == 0.13.1 | ||
Plotly == 5.15.0 | ||
Wordcloud == 1.9.3 | ||
scikit-learn == 1.2.2 |