
Pullman Sea Temple Port Douglas: Online Comments Analysis


General description

Reputation analysis using Natural Language Processing tools (text analytics), semi-supervised classification and time series analysis.

Analyse the text content of TripAdvisor reviews of the Pullman resort in Port Douglas, using a variety of methods.

Check the full Jupyter notebook here (Report).

TLDR:

I explore current trends in customer experience through online comments on TripAdvisor for Pullman Sea Temple (PPD) in Port Douglas, Queensland, Australia.

By analysing the scores, I discovered:

[Figure: users_score]

However, when applying time series analysis, I realised that:

  1. The monthly average number of comments has increased over the years.
  2. However, the score shows a declining trend in recent years.
  3. Looking at the proportion of comments, most are still positive, but negative comments are becoming more prevalent.
  4. Looking at absolute values, the number of negative comments stays about the same, while there are fewer new positive comments.
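The README does not show how the monthly series were built. A minimal standard-library sketch of the three views (average score, proportion of negatives and absolute counts per month) follows, assuming each review is a `(date, score)` pair and that scores of 3 or below count as negative. Both the sample records and that threshold are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical review records: (review date, TripAdvisor score 1-5).
# The real notebook works on the scraped CSV; these values are placeholders.
reviews = [
    (date(2019, 1, 5), 5),
    (date(2019, 1, 20), 2),
    (date(2019, 2, 3), 4),
    (date(2019, 2, 17), 1),
    (date(2019, 2, 25), 5),
]


def monthly_summary(records, negative_max=3):
    """Group reviews by (year, month) and summarise each month.

    Scores <= negative_max are counted as negative (an assumed threshold).
    Returns {(year, month): {count, avg_score, n_pos, n_neg, prop_neg}},
    ordered chronologically.
    """
    buckets = defaultdict(list)
    for day, score in records:
        buckets[(day.year, day.month)].append(score)

    summary = {}
    for month in sorted(buckets):
        scores = buckets[month]
        n_neg = sum(1 for s in scores if s <= negative_max)
        summary[month] = {
            "count": len(scores),                    # number of comments that month
            "avg_score": sum(scores) / len(scores),  # monthly average score
            "n_pos": len(scores) - n_neg,            # absolute positive comments
            "n_neg": n_neg,                          # absolute negative comments
            "prop_neg": n_neg / len(scores),         # proportion of negatives
        }
    return summary


for month, stats in monthly_summary(reviews).items():
    print(month, stats)
```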

[Figures: avg_month_score, Prop_avg_score, Abs_avg_score]

To understand the customer experience and why the score is declining, I performed several text analyses of the comments themselves, and discovered the following:

  1. Using multiple strategies, I extracted the most common phrases to see which factors matter most to customers, such as the swimming pool, the distance to town and the staff.
  2. Applying vector similarity, I built a semi-supervised sentence classifier that groups the text by content into 5 categories: Housekeeping, Infrastructure, Restaurant, Front Desk and Others. I then checked whether their prevalence changed over time, which was not the case: all 5 topics remain relevant throughout.
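The README does not list the "multiple strategies" used for phrase extraction. One common approach is counting n-grams after stop-word removal. A minimal sketch, where the stop-word list and sample comments are made up for illustration (the notebook more likely used NLTK or spaCy tooling):

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; real work would use NLTK's or spaCy's lists.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "in",
              "we", "it", "very", "were", "from"}


def top_phrases(texts, n=2, k=3):
    """Return the k most frequent n-grams (space-joined) with their counts."""
    counts = Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"[a-z']+", text.lower())
                  if t not in STOP_WORDS]
        # Slide a window of length n over the filtered tokens
        counts.update(" ".join(tokens[i:i + n])
                      for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


comments = [
    "The swimming pool was amazing and the staff were lovely.",
    "Great swimming pool, but a long walk to town.",
    "Friendly staff, lovely swimming pool, long walk to town.",
]
print(top_phrases(comments))
```

Bigrams (`n=2`) tend to surface phrases such as "swimming pool" or "long walk" that single-word counts would split apart.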

[Figure: semi-supervised topics]
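The README does not detail how the classifier was seeded. A minimal sketch of the idea follows: each category gets a short seed description, each sentence is assigned to the most similar seed by cosine similarity, and sentences below a similarity threshold fall into "Others". The seed texts and the threshold are hypothetical, and plain bag-of-words counts stand in for the word-embedding vectors (e.g. spaCy) the notebook may actually use.

```python
import math
import re
from collections import Counter

# Hypothetical seed descriptions per labelled category.
SEEDS = {
    "Housekeeping": "room clean cleaning towels sheets dirty housekeeping maid",
    "Infrastructure": "pool building gym lift air conditioning garden apartment",
    "Restaurant": "restaurant food breakfast dinner menu meal bar room service",
    "Front Desk": "reception check in check out front desk staff welcome booking",
}


def vectorize(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


SEED_VECTORS = {label: vectorize(text) for label, text in SEEDS.items()}


def classify(sentence, threshold=0.1):
    """Most similar category, or 'Others' if no seed is similar enough."""
    vec = vectorize(sentence)
    label, score = max(
        ((lab, cosine(vec, seed)) for lab, seed in SEED_VECTORS.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= threshold else "Others"


print(classify("The breakfast menu was excellent"))  # a Restaurant sentence
```

Counting the labels per month then shows whether a category's prevalence changes over time.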

  3. I also used a fully unsupervised topic modelling technique to explore relevant topics I might have missed in the first analysis. This analysis again showed that the distance to town, the swimming pool and the staff, especially at the front desk, were the most important factors, but also:
    • The restaurant and room service
    • Most rooms are clean, spacious, fully equipped apartments with a kitchen and laundry
    • The latter is important for families with kids, who are likely the main type of customer
    • The hotel configuration and the different types of buildings
    • The atmosphere: luxury and tropical
    • Nearby attractions such as the Daintree and the Coral Reef

[Figure: unsupervised_topics]
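The README does not name the topic model, but Gensim and pyLDAvis in the tech stack suggest Latent Dirichlet Allocation (Gensim's `LdaModel`). To show what LDA does without those dependencies, here is a compact collapsed Gibbs sampler in plain Python. The hyperparameters and toy documents are illustrative only.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists.
    Returns (topic_word, doc_topic): per-topic word counts (Counters) and
    per-document topic counts (lists).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # n_{d,k}
    topic_word = [Counter() for _ in range(num_topics)]   # n_{k,w}
    topic_total = [0] * num_topics                        # n_k
    assignments = []

    # Give every token a random initial topic
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the current token from all counts
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) ∝ (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


docs = [
    ["pool", "swim", "water", "pool"],
    ["staff", "reception", "friendly"],
    ["pool", "water"],
    ["staff", "friendly", "reception", "staff"],
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2, iterations=50, seed=1)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

The top words of each topic are read with `topic_word[k].most_common(n)`; in Gensim the counterpart is `LdaModel.print_topics()`, and pyLDAvis visualises the same topic-word distributions.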

  4. Then I applied sentiment analysis to score how positive or negative each comment is based on its content, and found that Housekeeping has the least positive sentiment, while the Front Desk is mostly positive.

[Figure: Sentiment_score]
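The README does not say which sentiment tool was used. The sketch below only illustrates the aggregation step: each sentence is scored in [-1, 1], and the scores are averaged per category. The toy polarity lexicon and sample sentences are hypothetical stand-ins for a real sentiment model or lexicon.

```python
import re
from collections import defaultdict

# Toy polarity lexicon (hypothetical); a real analysis would use a proper model.
LEXICON = {"great": 1, "lovely": 1, "friendly": 1, "clean": 1, "amazing": 1,
           "dirty": -1, "rude": -1, "slow": -1, "broken": -1, "smelly": -1}


def sentiment(text):
    """Mean polarity of the lexicon words in text, in [-1, 1]; 0.0 if none match."""
    hits = [LEXICON[t] for t in re.findall(r"[a-z]+", text.lower()) if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def mean_sentiment_by_category(labelled_sentences):
    """labelled_sentences: iterable of (category, sentence) pairs."""
    scores = defaultdict(list)
    for category, sentence in labelled_sentences:
        scores[category].append(sentiment(sentence))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}


sample = [
    ("Housekeeping", "The room was dirty and the towels smelly"),
    ("Housekeeping", "Clean apartment but slow cleaning"),
    ("Front Desk", "Friendly and lovely staff at reception"),
]
print(mean_sentiment_by_category(sample))
```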

  5. Finally, I applied signal decomposition to the sentiment score over time:
    • Seasonality puts pressure on both the Food and Beverage and the Housekeeping areas.
    • Environmental and infrastructure factors may need renewal as their novelty wears off over time, as shown by their declining trend.
    • Because rooms are functional apartments with independent access, some are privately owned and rented through other channels such as Airbnb. These may not include hotel management services and may have separate housekeeping and other services. This can affect the review score as more and more rooms are sold to private owners.

[Figures: Sentiment_score (two panels)]
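The README does not name the decomposition method. A classical additive decomposition is one standard option and is similar in spirit to `statsmodels.tsa.seasonal.seasonal_decompose`: the trend comes from a centred moving average, and seasonality comes from averaging the detrended values at each position in the cycle. A standard-library sketch, where the input would be a hypothetical monthly sentiment series with `period=12`:

```python
def decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual.

    The trend is a centred moving average (a 2 x period MA for even periods).
    Points within half a period of either end have no trend value (None).
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x MA: the two end points get half weight
            trend[t] = (sum(window[1:-1]) + 0.5 * (window[0] + window[-1])) / period
        else:
            trend[t] = sum(window) / period

    # Average detrended value for each position in the seasonal cycle
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    seasonal_means = [sum(b) / len(b) if b else 0.0 for b in buckets]

    # Centre the seasonal component so it sums to zero over one cycle
    offset = sum(seasonal_means) / period
    seasonal = [seasonal_means[t % period] - offset for t in range(n)]

    residual = [series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                for t in range(n)]
    return trend, seasonal, residual


# Synthetic check: linear trend plus a period-4 pattern
pattern = [1.0, -1.0, 2.0, -2.0]
series = [0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, residual = decompose(series, period=4)
```

A declining trend component matches the "novelty wears off" reading, while a recurring seasonal dip points to peak-season pressure.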

Author

Business Problem

Pullman Sea Temple Resort constantly checks online reviews to improve its service. However, it is difficult to get a systematic view of the text content of these reviews, especially to compare their evolution and trends.

Tech Stack

  • Python 3.8 (Anaconda)
  • Jupyter Notebook
  • spaCy
  • Gensim 3.8.3
  • pyLDAvis
  • pandas
  • NumPy
  • Matplotlib
  • NLTK
  • scikit-learn
  • Scrapy

Run locally

Repository structure

Install requirements

pip install -r requirements.txt

Run Scrapy script

To run the Scrapy spider, go to /tripullman and run:

scrapy crawl pull -o test.csv

Run Report (Jupyter Notebook)

Open the file in VS Code with the Jupyter extension, or

alternatively, open a terminal in the main folder and run:

jupyter notebook