Skip to content
This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

Join API

Dave Landry edited this page May 10, 2017 · 17 revisions

Contents

Let's breakdown an example join API data call:

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo

In its most basic form, join API calls required two parameters: a show variable and a required variable. In this example we are showing "geo" (aka geographic locations) across all sumlevels (in the join API by default if you do not specify sumlevels it will default to "all"). Note that in the results not all variables are available at all sumlevels across tables.

To see only county-level data we could do:

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&sumlevel=county
parameter accepted values description
limit integer Limits the number of rows returned by the query. (Note that the default is 100 rows)
order any available column name Column name to use for ordering the resulting data array.
show (required) any available attribute A comma-separated list of attributes to show in the query.
sort desc or asc Changes the sort order of the returned data array.
sumlevel any available sumlevel for the given attribute This restricts the data fetched to only display the specified sumlevel(s). If more than one "show" attribute is specified, sumlevel must be a comma-separated list with a value for each attribute. (default is "all" levels)
required (required) any available column name A comma-separated list of column names to be returned in the query.
where see documentation Advanced filtering of columns, similar to the WHERE clause on SQL.
year latest, oldest, all, 4-digit year Filters the returned data to the given year.
display_names 0 or 1 Show attribute names in response (default 0, disabled)
auto_crosswalk 0 or 1 Perform an advanced auto-crosswalk join (default 0, disabled)

Previously we showed an example Join API call.

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo

You can also filter the results to only show data from the latest year from each table:

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&year=latest

This will pull the latest data available for each variable.

If you wanted to filter based on associated information (say, you didn't only want the latest year) you can also use the .: syntax as a filter in the where clause. For example:

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&where=income.year:2013,adult_obesity.year:2016

This query selects the data looking at where income in 2013 and adult obesity in 2016. For a full list of table variables see the variables endpoint.

By default, joins between tables (with the exception of years columns) look for exact matches. So if one dataset has data only at the county level and another has data only the place level, you would not be able to join counties and places. There are instances where you may want to do something like this. For these instances you may use the "auto_crosswalk=1" flag. Here is are examples of the auto-crosswalking API.

https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&auto_crosswalk=1&where=income.sumlevel:place,adult_obesity.sumlevel:county

To break this down: auto_crosswalk=1 enables the auto crosswalking feature. By default when crosswalking geographies, the crosswalk will produce a cross product join for every related geographic level. To filter the sumlevels of particular variables, we use the special <variable>.sumlevel:<level> syntax. In this case we restrict the income sumlevel to places and adult_obesity to counties.