-
Notifications
You must be signed in to change notification settings - Fork 0
/
notes.Rmd
73 lines (57 loc) · 2.77 KB
/
notes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
---
title: "Notes"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
```
## Data distribution plans
### Regular/novice users
Novice versions of data can go on data.world in wide table formats. These are similar to the formatted tables in Excel files that we gave to the designers. data.world takes datasets which can be made up of multiple files. The best way to do this might be to have datasets by source or type, so one dataset of CWS data, then maybe one dataset for each chapter or set of topics. DW can take Excel-type spreadsheets, so each table's data could actually be a sheet in a workbook, just like the formatted tables.
#### Example structure
```
DataHaven on data.world
.
|-- CWS 2018 (by group and town within region)
| |-- Greater New Haven CWS
| |-- Fairfield County CWS
| |-- Greater Hartford CWS
|-- Demographics 2017 (by town and/or demographics within region)
| |-- Greater New Haven demographics (workbook)
| | |-- Population & growth (table 2A) (sheet)
| | |-- Race & immigration (table 2B) (sheet)
|-- |-- Greater Hartford demographics (workbook)
| | |-- Population & growth (table 2A) (sheet)
| | |-- Race & immigration (table 2B) (sheet)
```
For survey data, novice version for Greater New Haven would look like:
```{r}
gnh_wide <- readr::read_csv("output_data/cws/wide/gnh_pilot_profile.csv")
knitr::kable(gnh_wide[1:10, 1:6])
```
Data dictionary gets written in a separate text file and uploaded as part of dataset metadata, so that will help make sure all CWS data has the same descriptions for each indicator.
### Advanced users
Advanced use data will be in long-shaped CSV files that are more appropriate for doing data analysis. These are similar to the types of files in the index output_data folders. For now, these will just be on github, although maybe some other storage, like a cloud database or S3 bucket would make sense as well. For ACS, etc. it's probably easiest for these to have a similar folder structure to the 2019index repo, e.g.:
```
2019indexpub repo
.
|-- demographics
|-- |-- population_by_age_2017.csv
|-- |-- population_by_race_2017.csv
|-- |-- poverty_and_low_income_by_age_2017.csv
|-- |-- household_structure_2017.csv
|-- housing
|-- |-- median_housing_value_2017.csv
|-- |-- homeownership_total_and_by_race_2017.csv
|-- |-- housing_cost_burden_by_tenure_2017.csv
```
For CWS data, these will just be dumps of all survey data for a location, plus its weights in a separate file, then an overall lookup table.
Advanced version of CWS data for Greater New Haven looks like:
```{r}
gnh_long <- readr::read_csv("output_data/cws/long/greater_new_haven_2018_cws_data.csv")
knitr::kable(dplyr::filter(gnh_long[,1:5], code == "LIFECC", category == "Gender"))
```