
This Starbucks project shows how I used the ETL process with Python pandas and Matplotlib.


hej6853/Starbucks_People_Analytics


With nearly 350,000 employees across 30,000 retail locations, Starbucks is one of the largest multinational coffeehouse chains in the world. One of its key resources is its employees, whom Starbucks calls "partners" because they share in the company's success. Part of the Starbucks experience is walking into a store and being greeted by employees who know your name and your favorite drink. The longer a partner works at Starbucks, the more relationships they build and the more experiences they contribute to, which translates into better customer experiences, a stronger competitive advantage, and greater customer lifetime value.


Motivation


The question this project aims to answer is: "How can Starbucks predict when high-value employees are at risk of leaving, so that steps can be taken to minimize turnover?"

Starbucks has a relatively high turnover rate of 65 percent for full-time partners, and replacing a worker can cost as much as 33% of that worker's annual salary. If this figure holds true for Starbucks, employee turnover could be costing the company approximately $2 billion per year, so reducing that cost by just 0.1% would save about $2 million per year.
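The estimate above is simple arithmetic: number of leavers times replacement cost per leaver, then a fraction of that total. A minimal sketch under stated assumptions: the $2 billion figure is taken directly from the text rather than recomputed, the average salary is not given in the source so `avg_salary` below is a hypothetical placeholder, and applying the full-time turnover rate to total headcount is a simplification.

```python
def annual_turnover_cost(headcount: int, turnover_rate: float,
                         replacement_fraction: float, avg_salary: float) -> float:
    """Estimated yearly cost of replacing employees who leave."""
    leavers = headcount * turnover_rate
    return leavers * replacement_fraction * avg_salary


def savings(total_cost: float, reduction: float) -> float:
    """Savings from cutting total_cost by the fraction `reduction` (0.001 = 0.1%)."""
    return total_cost * reduction


# Figure quoted in the text: roughly $2 billion per year in turnover cost.
quoted_cost = 2_000_000_000
print(f"0.1% of ${quoted_cost:,} = ${savings(quoted_cost, 0.001):,.0f}")

# Illustration only: headcount, turnover rate, and replacement fraction come
# from the text; avg_salary is an assumed placeholder, not a Starbucks figure.
estimate = annual_turnover_cost(350_000, 0.65, 0.33, avg_salary=26_000)
print(f"Illustrative turnover cost: ${estimate:,.0f}")
```

Changing `avg_salary` shows how sensitive the headline estimate is to the one input the source does not state.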

Initial Dataset


Data Preprocessing

The dataset I received was in time-series format. Time-series analysis has several weaknesses, including problems generalizing from a single study, difficulty obtaining appropriate measures, and difficulty identifying the correct model to represent the data. I therefore used an ETL process in Python (pandas) to transform the time-series data, over 100 million rows, into a data frame of independent observations, and extracted the data for 6,100 talented employees.
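A minimal pandas sketch of that reshaping, assuming the raw data holds one row per employee per snapshot date with hypothetical columns `employee_id`, `date`, `position`, and `status` (the real schema is under NDA and is not shown here). Tenure is measured from an employee's first snapshot, which is itself an assumption; a hire-date column would be more accurate if one exists. At 100M+ rows the same logic would likely need to run in chunks or inside a database rather than in one in-memory DataFrame.

```python
import pandas as pd


def to_employee_level(snapshots: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-date snapshot rows into one independent row per employee.

    Expects hypothetical columns: employee_id, date (datetime64), position, status.
    """
    df = snapshots.sort_values(["employee_id", "date"]).copy()

    # A new position "streak" starts whenever the position differs from the
    # employee's previous snapshot (the first snapshot always starts one).
    changed = df["position"] != df.groupby("employee_id")["position"].shift()
    df["streak"] = changed.astype(int).groupby(df["employee_id"]).cumsum()

    g = df.groupby("employee_id")
    first_seen = g["date"].min()
    last_seen = g["date"].max()

    # Start date of the position held at the employee's last snapshot.
    current = df[df["streak"] == g["streak"].transform("max")]
    position_start = current.groupby("employee_id")["date"].min()

    # Status at the last snapshot becomes the label (assumed: "terminated" = left).
    last_status = g["status"].last()

    return pd.DataFrame({
        "tenure_years": (last_seen - first_seen).dt.days / 365.25,
        "years_in_position": (last_seen - position_start).dt.days / 365.25,
        "left": last_status.eq("terminated").astype(int),
    })


# Toy snapshots with made-up values.
snapshots = pd.DataFrame({
    "employee_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2020-01-01", "2020-07-01", "2021-01-01",
                            "2020-01-01", "2020-04-01"]),
    "position": ["barista", "barista", "shift supervisor", "barista", "barista"],
    "status": ["active", "active", "active", "active", "terminated"],
})
employees = to_employee_level(snapshots)
print(employees)
```

Each output row no longer depends on its neighbors, which is what lets a standard classifier treat employees as independent observations.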


Data Exploration

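A sketch of one kind of Matplotlib chart that suits this exploration step: overlaid tenure histograms for employees who stayed versus left. The column names `tenure_years` and `left` are hypothetical, the data is made up, and this is not a recreation of the original figures.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_tenure_by_outcome(employees: pd.DataFrame, bins: int = 20):
    """Overlay tenure histograms for employees who stayed (left == 0) and left (left == 1)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for value, label in [(0, "stayed"), (1, "left")]:
        tenure = employees.loc[employees["left"] == value, "tenure_years"]
        ax.hist(tenure, bins=bins, alpha=0.5, label=label)
    ax.set_xlabel("Tenure (years)")
    ax.set_ylabel("Number of employees")
    ax.set_title("Tenure of employees who stayed vs. left")
    ax.legend()
    fig.tight_layout()
    return fig, ax


# Toy example with made-up values.
demo = pd.DataFrame({
    "tenure_years": [0.2, 0.5, 0.8, 1.5, 2.0, 2.7, 3.1],
    "left": [1, 1, 1, 0, 0, 0, 0],
})
fig, ax = plot_tenure_by_outcome(demo, bins=5)
```

Calling `fig.savefig("tenure_by_outcome.png")` writes the chart to disk; in a notebook, an interactive backend would display it inline instead.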

Built with

  • Python Pandas
  • Matplotlib (data exploration and visualization)

Key Skills Learned

  • Machine Learning - Logistic Regression
  • Data extraction, transformation, and loading (ETL)
  • Matplotlib data exploration and visualization

Supervised Learning: Logistic Regression

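The repository does not include the model code, so the following is a from-scratch sketch of binary logistic regression trained with batch gradient descent in NumPy, not the author's implementation. It assumes employee-level features such as the hypothetical `tenure_years` and `years_in_position` and a 0/1 "left" label; the data is synthetic, so its accuracy says nothing about the 98% reported in the Conclusion. In practice a library implementation such as scikit-learn's `LogisticRegression` would be the usual choice.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr: float = 0.1, n_iter: int = 5000):
    """Fit weights and bias by minimizing mean log-loss with batch gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize features so a single learning rate suits every column.
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0
    Xs = (X - mean) / std
    w, b = np.zeros(Xs.shape[1]), 0.0
    for _ in range(n_iter):
        error = sigmoid(Xs @ w + b) - y  # gradient of log-loss w.r.t. the logits
        w -= lr * (Xs.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b, mean, std


def predict_proba(X, w, b, mean, std) -> np.ndarray:
    """Probability that each employee leaves (label 1)."""
    Xs = (np.asarray(X, dtype=float) - mean) / std
    return sigmoid(Xs @ w + b)


# Synthetic demo data (not the Starbucks data): short tenure means leaving.
rng = np.random.default_rng(0)
tenure = rng.uniform(0, 3, 200)
years_in_position = rng.uniform(0, 2, 200)
X = np.column_stack([tenure, years_in_position])
y = (tenure < 1.0).astype(int)

w, b, mean, std = fit_logistic_regression(X, y)
probs = predict_proba(X, w, b, mean, std)
accuracy = ((probs >= 0.5).astype(int) == y).mean()
print(f"training accuracy on synthetic data: {accuracy:.2f}")
```

Standardizing inside the fit and reusing the stored `mean` and `std` at prediction time keeps new employees on the same scale as the training data.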

Conclusion


  • Identified talented partners: those who have worked at Starbucks for more than 1.09 years and stayed in the same position for more than 0.83 years.
  • Developed a supervised machine learning model with 98% accuracy in predicting whether employees are about to leave or stay, and derived a cost analysis showing potential savings of $1,220 per employee.
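The first finding can be expressed as a simple pandas filter. A minimal sketch, assuming an employee-level DataFrame with hypothetical columns `tenure_years` and `years_in_position`; only the two thresholds come from the results above.

```python
import pandas as pd

# Thresholds from the results above; column names are hypothetical.
MIN_TENURE_YEARS = 1.09
MIN_YEARS_IN_POSITION = 0.83


def select_talented(employees: pd.DataFrame) -> pd.DataFrame:
    """Keep employees above both the tenure and time-in-position thresholds."""
    mask = (employees["tenure_years"] > MIN_TENURE_YEARS) & (
        employees["years_in_position"] > MIN_YEARS_IN_POSITION
    )
    return employees[mask]


# Toy example with made-up values.
demo = pd.DataFrame(
    {"tenure_years": [0.5, 1.5, 2.0], "years_in_position": [0.4, 0.5, 1.2]},
    index=["a", "b", "c"],
)
talented = select_talented(demo)
print(talented)
```

Employee "b" passes the tenure threshold but not the time-in-position one, so only "c" is kept.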

License

The original dataset is not included due to an NDA. © hej6853
