
Analyzing real statistics to answer business question points


shkhryr/data-science-tasks

Data-Science-Tasks

I had sales data for each month of 2019.

I started by cleaning the data. Tasks in this section include:

Dropping NaN values from the DataFrame
Removing rows based on a condition
Changing the type of columns (to_numeric, to_datetime, astype)
Adding necessary columns
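
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebook: the column names (`Quantity Ordered`, `Price Each`, `Order Date`), the date format, and the helper `clean_sales` are all assumptions made for the example.

```python
import pandas as pd


def clean_sales(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw sales table (column names are assumed)."""
    # Drop rows in which every value is NaN.
    df = raw.dropna(how="all")

    # Remove rows based on a condition: here, stray header rows that can
    # end up in the data when several CSV exports are joined together.
    df = df[df["Order Date"] != "Order Date"].copy()

    # Change the column types.
    df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"])
    df["Price Each"] = df["Price Each"].astype(float)
    # The date format is an assumption for this example.
    df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

    # Add the columns needed later in the analysis.
    df["Month"] = df["Order Date"].dt.month
    df["Sales"] = df["Quantity Ordered"] * df["Price Each"]
    return df.reset_index(drop=True)


# Made-up rows: one valid, one empty, one stray header, one valid.
raw = pd.DataFrame({
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "600.0"],
    "Order Date": ["04/19/19 08:46", None, "Order Date", "12/30/19 21:10"],
})
cleaned = clean_sales(raw)
print(cleaned)
```

Filtering before the type conversions matters here, because `pd.to_numeric` would raise on the stray header text.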
Once I had cleaned up the data a bit, I moved on to the data exploration section.

In this section I explore 5 high-level business questions related to the data:

1. What was the best month for sales? How much was earned that month?
2. What city sold the most product?
3. What time should we display advertisements to maximize the likelihood of customers buying a product?
4. What products are most often sold together?
5. What product sold the most? Why do you think it sold the most?
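
Of these questions, "most often sold together" is the least direct to compute, so here is one possible sketch. It assumes that the line items of one order share an `Order ID` and that there is a `Product` column. Both column names, the helper `top_product_pairs`, and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def top_product_pairs(df: pd.DataFrame, n: int = 3):
    """Count how often each pair of products appears in the same order."""
    # Keep only orders that contain more than one line item.
    multi = df[df["Order ID"].duplicated(keep=False)]

    counts = Counter()
    for _, products in multi.groupby("Order ID")["Product"]:
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        counts.update(combinations(sorted(set(products)), 2))
    return counts.most_common(n)


# Made-up orders for illustration only.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 4, 4, 4],
    "Product": ["Phone", "Charger", "Laptop", "Phone", "Charger",
                "Phone", "Charger", "Headphones"],
})
pairs = top_product_pairs(orders)
print(pairs[0])  # (('Charger', 'Phone'), 3)
```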

To answer these questions I walk through many different pandas & matplotlib methods. They include:

Concatenating multiple CSV files to create a new DataFrame (pd.concat)
Adding columns
Parsing cells as strings to make new columns (.str)
Using the .apply() method
Using groupby to perform aggregate analysis
Plotting bar charts and line graphs to visualize our results
Labeling our graphs
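
A compact sketch of how several of these methods fit together is shown below. It is an illustration under assumptions, not the notebook's code: the in-memory CSV text stands in for the monthly files, and the column names, the address layout ("street, city, state zip"), and the helper `city_with_state` are made up for the example.

```python
import io

import pandas as pd

# Two tiny in-memory "files" stand in for the monthly CSVs.
january = io.StringIO(
    "Product,Quantity Ordered,Price Each,Purchase Address\n"
    'USB-C Cable,2,11.95,"917 1st St, Dallas, TX 75001"\n'
    'Monitor,1,150.00,"682 Chestnut St, Boston, MA 02215"\n'
)
february = io.StringIO(
    "Product,Quantity Ordered,Price Each,Purchase Address\n"
    'Headphones,1,99.99,"333 8th St, Dallas, TX 75001"\n'
)

# Concatenate the per-month frames into one DataFrame.
frames = [pd.read_csv(f) for f in (january, february)]
sales = pd.concat(frames, ignore_index=True)

# Add a column.
sales["Sales"] = sales["Quantity Ordered"] * sales["Price Each"]

# Parse cells as strings to make a new column (.str).
sales["City"] = sales["Purchase Address"].str.split(",").str[1].str.strip()


# Use .apply() with a small function to keep the state next to the city.
def city_with_state(address: str) -> str:
    street, city, state_zip = [part.strip() for part in address.split(",")]
    return f"{city} ({state_zip.split()[0]})"


sales["City (State)"] = sales["Purchase Address"].apply(city_with_state)

# Use groupby to aggregate sales per city.
city_totals = (
    sales.groupby("City (State)")["Sales"].sum().sort_values(ascending=False)
)
print(city_totals)
```

Keeping the state next to the city name is a design choice for the example: it avoids merging two different cities that happen to share a name.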

Sample from the code with graph: [image: Sample from the code with graph]
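
For readers without the image, a labeled bar chart of this kind could be produced with something like the sketch below. The helper `plot_monthly_sales`, the axis labels, and the three values are placeholders for illustration. They are not results from the dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_monthly_sales(monthly: pd.Series):
    """Draw a labeled bar chart of sales per month and return the Axes."""
    fig, ax = plt.subplots()
    ax.bar(monthly.index, monthly.values)
    ax.set_xticks(list(monthly.index))
    ax.set_xlabel("Month number")
    ax.set_ylabel("Sales in USD")
    ax.set_title("Sales by month")
    return ax


# Placeholder values, not taken from the real data.
monthly = pd.Series([1200.0, 950.0, 1800.0], index=[1, 2, 3], name="Sales")
ax = plot_monthly_sales(monthly)
# plt.show()  # uncomment when running interactively
```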
