Adding New Datasets

Overview

Data USA is built around four major types of data: geographies, occupations, industries, and educational courses. To integrate new data with existing data in the platform, the new data must be linkable to these types. Below are details on how each of the four major data types is structured and how it should be structured in any new data source.

Geographies

Geographic identifiers are special strings that denote different levels of the US geographic hierarchy. Here are the currently supported identifiers for geographies in Data USA:

Description	Code Format	Notes
Nation (United States)	01000US
State	04000USXX	Where XX is a 2-digit FIPS state code
County	05000USXXAAA	Where XX is a 2-digit FIPS state code and AAA is a 3-digit county code
Place	16000USXXBBBBB	Where XX is a 2-digit FIPS state code and BBBBB is a 5-digit place code
Metropolitan Statistical Area	31000USCCCCC	Where CCCCC is a 5-digit MSA code
Tract	14000USXXAAADDDDDD	Where XX is a 2-digit FIPS state code, AAA is a 3-digit county code, and DDDDDD is a 6-digit tract code

By convention, all geography columns should be text fields named geo.
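
As a rough illustration of these formats, the sketch below checks whether a string matches one of the supported identifier shapes and splits it into its parts. It is not part of Data USA: the function name parse_geo, the level labels, and the group names are hypothetical, and the patterns assume the Census convention of a five-character summary-level prefix before US.

```python
import re

# Hypothetical level labels; each pattern encodes one supported identifier shape,
# assuming a five-character summary-level prefix followed by "US".
GEO_PATTERNS = {
    "nation": re.compile(r"01000US"),
    "state": re.compile(r"04000US(?P<state>\d{2})"),
    "county": re.compile(r"05000US(?P<state>\d{2})(?P<county>\d{3})"),
    "place": re.compile(r"16000US(?P<state>\d{2})(?P<place>\d{5})"),
    "msa": re.compile(r"31000US(?P<msa>\d{5})"),
    "tract": re.compile(
        r"14000US(?P<state>\d{2})(?P<county>\d{3})(?P<tract>\d{6})"
    ),
}


def parse_geo(geo_id):
    """Return (level, parts) for a recognized identifier, or None otherwise."""
    for level, pattern in GEO_PATTERNS.items():
        # fullmatch rejects identifiers with missing or extra characters.
        match = pattern.fullmatch(geo_id)
        if match:
            return level, match.groupdict()
    return None
```

A check like this could run over the geo column of a new dataset before loading it; any value for which parse_geo returns None would not link to an existing geography.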

Industries (NAICS)

Data USA is primarily built around the PUMS NAICS codes. For a full listing of all PUMS NAICS codes, see the attribute list at https://api.datausa.io/attrs/naics/. As a secondary option, datasets may instead use BLS NAICS codes, which are provided as an attribute list at https://api.datausa.io/attrs/bls_naics/.

For a dataset to work correctly, its codes should be drawn entirely from the list of PUMS NAICS codes or entirely from the list of BLS NAICS codes; mixing the two lists is invalid. Every row of data in a new source must correspond to a valid NAICS code from the chosen list.
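
The containment rule can be sketched as a simple set check. This is only an illustration under stated assumptions: the function name classify_naics_source is hypothetical, and the two reference lists are passed in as plain collections of code strings, since how they are fetched and parsed from the attribute lists is not covered here.

```python
def classify_naics_source(codes, pums_naics, bls_naics):
    """Return "pums" or "bls" if every code belongs to that single list.

    Raises ValueError when the codes are not fully contained in either list,
    which covers both unknown codes and a mix of the two lists.
    """
    codes = set(codes)
    pums_naics = set(pums_naics)
    bls_naics = set(bls_naics)

    # PUMS is checked first because it is the primary option.
    if codes <= pums_naics:
        return "pums"
    if codes <= bls_naics:
        return "bls"

    # Report which codes are missing from each list to help fix the source.
    raise ValueError(
        "codes are not contained in a single NAICS list; "
        f"missing from PUMS: {sorted(codes - pums_naics)}, "
        f"missing from BLS: {sorted(codes - bls_naics)}"
    )
```

Codes are compared as strings, so leading zeros and letter suffixes survive; a dataset that stores codes as integers would need converting first.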
