Overview

'80% of time in data science and analysis is spent on data cleaning'

This includes:
• Loading multiple sources of data
• Consolidating data for analysis
• Reshaping and joining datasets
• Dealing with missing values, duplicates and outliers
• Cleaning strings
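
As a rough illustration of the last two items, the sketch below handles missing values, duplicates and messy strings on a few made-up records. It is written in plain Python so it runs without extra packages; the project's own scripts use the R libraries listed under Required libraries. The column names, rows and missing-value markers are all hypothetical.

```python
# Hypothetical records; none of these values come from the project's datasets.
raw_rows = [
    {"name": "  Alice ", "score": "12"},
    {"name": "alice", "score": "12"},   # duplicate once the name is normalised
    {"name": "Bob", "score": "NA"},     # missing value encoded as text
    {"name": "BOB", "score": "7"},
]

# Assumed set of text markers that stand for "no value".
MISSING_MARKERS = {"", "na", "n/a", "null"}


def clean_string(value):
    """Trim surrounding whitespace and lower-case a string."""
    return value.strip().lower()


def parse_score(value):
    """Return the score as an int, or None when it is a missing-value marker."""
    if clean_string(value) in MISSING_MARKERS:
        return None
    return int(value)


def clean_rows(rows):
    """Normalise strings, convert missing markers to None, drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (clean_string(row["name"]), parse_score(row["score"]))
        if record in seen:
            continue  # skip rows identical to one already kept
        seen.add(record)
        cleaned.append({"name": record[0], "score": record[1]})
    return cleaned


cleaned_rows = clean_rows(raw_rows)
```

Note that the order of steps matters: the two "Alice" rows only count as duplicates after the strings have been cleaned.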

Why is data cleaning important?

• Poor-quality data cost the US economy an estimated $3 trillion in 2016 (IBM)
• 1 in 3 business leaders did not trust the data sources used in decision-making

'Garbage in leads to garbage out'

You get to know your data:

• Understanding data through initial cleaning and exploration
• Reduces the risk of incorrect assumptions
• Raises relevant questions
• Discovery of issues such as biases in data collection
• Opportunities to problem solve for unique datasets
• Sets you up to extract additional insight
• Sets you up to emphasise particular questions

Project overview

Tasks

This project centres around cleaning six dirty datasets [Folder]

• Task 1 - Decathlon Events [Analysis]
• Task 2 - Cake Ingredients [Analysis]
• Task 3 - Seabirds Spottings [Analysis]
• Task 4 - Sweeties Survey [Analysis]
• Task 5 - Right Wing Authoritarianism [Analysis]
• Task 6 - Dogs [Analysis]

Format

Each solution includes:
• Cleaning script
• Commentary, assumptions and process
• Answers to questions

Required libraries

• here
• janitor
• readr
• tidyverse

Folder structure

raw_data
data_cleaning_scripts
clean_data
documentation_and_analysis

ThisIsJohnnyLau/dirty_data_project
