In this project, we will carry out a data analysis of the 2013 and 2014 American Community Survey (ACS). Data files can be downloaded from this Dropbox folder.
Kaggle, the data science competition website, hosts a kernel challenge on the ACS 2013 data, which motivated Project 1 of ADS last semester. Both the Kaggle challenge and our Project 1 aim to encourage reproducible data analysis.
For this project, I invite you to browse the scripts on Kaggle and the analyses done by the teams from Spring 2016, and to find inspiration for an R Notebook of your own on the ACS 2013 and 2014 data.
For the presentation, each team should present the story and results from its R Notebook, uploaded to GitHub.
- Download the data and read the data documentation (e.g., "What is an allocation flag?").
- Understand how to use survey weights.
- Team members browse scripts and old projects independently.
- Members share their favorite examples and explain why they like them, such as an interesting topic or clever code.
- Based on a subset of scripts that members regard as top scripts, members brainstorm about what they would like to do.
- "Which script makes you wonder about something not shown in the data?"
- "Do you have thoughts on expanding a certain analysis?"
- It is OK to have 2-3 leads to explore, but it is better to converge on a single topic.
- Set up a GitHub project folder with everyone listed as a contributor. Everyone clones the project locally and creates a local branch.
- The team can split into subgroups of 2-3 members who work together more frequently than the entire team. However, everyone should check in regularly on the online group discussion and on changes in the GitHub folder.
- R dplyr package
- R readr package
- R DT package
- R data.table package
- R testthat package
- A brief guide to git.
- Putting your project on GitHub.
- rCharts, quick interactive plots.
- htmlwidgets, JavaScript library adaptations for R.
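
As a starting point for the survey-weights step above, here is a minimal sketch of why weights matter. It is written in plain Python only to keep the arithmetic visible; in an R Notebook the same weighted average can be computed with base R's `weighted.mean(x, w)`. The column names `PWGTP` (person weight) and `WAGP` (wages) are assumed to follow the ACS PUMS naming, so please confirm them in the data documentation. The three records are invented for illustration and are not real ACS values.

```python
# Toy person-level records. The values are made up for illustration only.
# Assumption: PWGTP is the person weight and WAGP is wage income,
# following ACS PUMS naming; check the data dictionary before relying on this.
records = [
    {"WAGP": 20000, "PWGTP": 10},
    {"WAGP": 50000, "PWGTP": 30},
    {"WAGP": 80000, "PWGTP": 60},
]


def weighted_mean(rows, value_col, weight_col):
    """Return the mean of value_col, with each row counted weight_col times."""
    total_weight = sum(row[weight_col] for row in rows)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    weighted_sum = sum(row[value_col] * row[weight_col] for row in rows)
    return weighted_sum / total_weight


# Treats every sampled record as one person.
unweighted = sum(row["WAGP"] for row in records) / len(records)

# Treats every sampled record as standing in for PWGTP people.
weighted = weighted_mean(records, "WAGP", "PWGTP")

# The sum of the weights estimates how many people the sample represents.
estimated_population = sum(row["PWGTP"] for row in records)

print(unweighted, weighted, estimated_population)
```

In this toy sample the unweighted and weighted means differ because the high-wage record stands in for more people. This sketch covers point estimates only; standard errors need the replicate weights described in the ACS documentation.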