Overview

Demographic data by ZIP code can be useful in getting a better understanding of your members and/or user base. Are your members from urban or rural areas? How much money do they likely make? How educated are they? Where do they fall as far as demographics go?

The Census Bureau American FactFinder is a free source of comprehensive and detailed data. Unfortunately, it’s not particularly easy to use in its raw format.

We’ve pulled, cleaned, consolidated, and reformatted the Census Bureau data. The enclosed tables (.txt format) should be easily uploadable into any database. These tables are:

social.txt (Census Bureau Tables DP02 and DP02PR)
econ.txt (Census Bureau Table DP03)
housing.txt (Census Bureau Table DP04)
demo.txt (Census Bureau Table DP05)
rural_urban.txt (Census Bureau Table P2)

What is a ZCTA5?

None of these tables actually have a field called “ZIP code.” Rather, they have a field called “ZCTA5”. A ZCTA5 (or ZCTA) can be thought of as interchangeable with a zip code given following caveats:

There are no ZCTAs for PO Box ZIP codes - this means that for 42,000 US ZIP Codes there are 32,000 ZCTAs.
ZCTAs, which stand for Zip Code Tabulation Areas, are based on zip codes but don’t necessarily follow exact zip code boundaries. If you would like to read more about ZCTAs, please refer to this link. The Census Bureau also provides an animation that shows how ZCTAs are formed.

How can I load this data into tables on AWS Redshift?

To load this data into tables on AWS Redshift, we first stored it in an Amazon S3 bucket, and then used a COPY command to load the data.

COPY 'AWS Redshift destination'
FROM 'S3 Bucket source'
ACCESS_KEY_ID ''
SECRET_ACCESS_KEY ''
DELIMITER '\t'
IGNOREHEADER 1;

How can I load this data into Pandas?

import pandas as pd
pd.read_csv('filename', sep = '\t')

How current is this data?

We used the 5-year estimates as opposed to the 1-year estimates because although they are less current, they are the most reliable and precise, containing data for all ZCTA's. If you would like to get a better understanding of the differences between the 1-year, 3-year, and 5-year estimates, please refer to this link. For more general information on the ACS estimate files, please refer to this handbook.

The rural/urban summary calculation is only done once every decade, so that data is only as recent as 2010.

General column naming rules

The Census Bureau column names aren't particularly easy to use for data analysis and don't necessarily maintain all the information for that field (such as what denominator is used for percent fields), so we reformatted them while attempting to keep the column names as detailed as possible.

Format:
For percent fields: category-percent-numerator-of-denominator
For estimates fields: category-unit-numerator_denominator

Examples:
Estimate; SEX AND AGE - 20 to 24 years -> sex_and_age-population-20_to_24_years_total_population
Percent; ROOMS - Total housing units - 1 room -> rooms-percent-1_room-of-total_housing_units

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo.txt		demo.txt
econ.txt		econ.txt
housing.txt		housing.txt
rural_urban.txt		rural_urban.txt
social.txt		social.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

What is a ZCTA5?

How can I load this data into tables on AWS Redshift?

How can I load this data into Pandas?

How current is this data?

General column naming rules

About

Releases

Packages

License

Ro-Data/Ro-Census-Summaries-By-Zipcode

Folders and files

Latest commit

History

Repository files navigation

Overview

What is a ZCTA5?

How can I load this data into tables on AWS Redshift?

How can I load this data into Pandas?

How current is this data?

General column naming rules

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages