STAT GU4243/GR5243 Fall 2016 Applied Data Science

Project 1 Analysis of American Community Surveys

In this project, we will carry out a data analysis using the 2013 and 2014 American Community Survey (ACS). Data files can be downloaded from this Dropbox folder.

There is a Kaggle kernel challenge for the ACS 2013 data, hosted on the data science competition website Kaggle, which motivated Project 1 of ADS last semester. Both the Kaggle challenge and our Project 1 are intended to encourage reproducible data analysis.

Challenge

For this project, I invite you to browse scripts on Kaggle and the analysis done by the teams from Spring 2016 and find inspiration to prepare an R Notebook on the data from ACS 2013 and 2014.

For the presentation, each team should present the story and results from its R Notebook, uploaded to GitHub.

Suggested team workflow
  1. Download the data and read the data documentation (e.g., "What is an allocation flag?").
  2. Understand how to use survey weights.
  3. Team members browse scripts and old projects independently.
  4. Members share their favorite examples and explain why they like them, such as interesting topics or clever code.
  5. Based on a subset of scripts that members regard as top scripts, members brainstorm about what they would like to do.
    • "Which script makes you wonder about something not shown in the data?"
    • "Do you have thoughts on expanding a certain analysis?"
  6. It is OK to have 2-3 leads to explore, but it is better to converge on a single topic.
  7. Set up a GitHub project folder with everyone listed as a contributor. Everyone clones the project locally and creates a local branch.
  8. The team can split into subgroups of 2-3 that work together more frequently than the entire team. However, everyone should check in regularly on the online group discussion and on changes in the GitHub folder.
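
On step 2 of the workflow, the sketch below illustrates why survey weights matter: each sampled person stands in for a number of people in the population, so estimates should count each row by its weight rather than once. It is written in plain Python for illustration only; the same arithmetic carries over to an R Notebook. The three records are made-up toy values, not ACS data. The column names assume the PUMS person file, where `PWGTP` is the person weight and `AGEP` is age; check the data documentation for the variables you actually use. The helper names `weighted_total` and `weighted_mean` are hypothetical.

```python
# Toy records standing in for ACS person rows (values invented for illustration).
# Assumed columns: AGEP = age, PWGTP = person weight.
records = [
    {"AGEP": 30, "PWGTP": 10},
    {"AGEP": 50, "PWGTP": 30},
    {"AGEP": 70, "PWGTP": 60},
]


def weighted_total(rows, weight_col="PWGTP"):
    """Estimated population count: the sum of the survey weights."""
    return sum(r[weight_col] for r in rows)


def weighted_mean(rows, value_col, weight_col="PWGTP"):
    """Mean of value_col where each row counts in proportion to its weight."""
    total_weight = weighted_total(rows, weight_col)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(r[value_col] * r[weight_col] for r in rows) / total_weight


# Unweighted mean treats every sampled row equally.
unweighted = sum(r["AGEP"] for r in records) / len(records)
# Weighted mean reflects how many people each row represents.
weighted = weighted_mean(records, "AGEP")

print("estimated population:", weighted_total(records))
print("unweighted mean age:", unweighted)
print("weighted mean age:", weighted)
```

In this toy example the unweighted mean age is 50.0 while the weighted mean is 60.0, because the oldest record represents the most people. This sketch covers point estimates only; standard errors need additional information described in the data documentation.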
Useful resources