This is a brief introduction to the contents of the repository.
The data folder contains a URL to the Google Drive folder that holds the original review/business data chinesereview.csv and the simplified version chineseAllReview.csv, along with the three transformed datasets. The code for the transformation can be found at: code >> goal_1 >> text_processing.
The code folder contains all the code we used for analysis and prediction.
Readers interested in our analysis can find our final code for Goal 1 in the subfolder goal_1, and our final code for the best Kaggle prediction in the subfolder goal_2.
The image folder contains all the plots used in our slides and summary. The file name or git commit comment of each image indicates its source.
The summary folder contains an executive summary as a Jupyter Notebook, Module2_final_report.ipynb, and a PDF version, Module2_final_report.pdf. The images used in this report are also stored there, and the same images are available in the image folder.
The slides folder contains the PDF of the slides for our presentation.
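
If you want to confirm that a local clone has the layout described above, a minimal sketch along the following lines may help. It only assumes the five top-level folder names listed in this file (data, code, image, summary, slides); the function name missing_folders is hypothetical and is not part of the repository's code.

```python
from pathlib import Path

# Top-level folders named in this README (assumed to sit directly under the repo root).
EXPECTED_FOLDERS = ["data", "code", "image", "summary", "slides"]


def missing_folders(repo_root):
    """Return the expected folder names that are not directories under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; adjust the path if you run it elsewhere.
    missing = missing_folders(".")
    print("Missing folders:", ", ".join(missing) if missing else "none")
```

Run it from the repository root. An empty result means all five folders are present; it does not check the contents of each folder or download the data from Google Drive.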