-
Notifications
You must be signed in to change notification settings - Fork 42
Join API
Let's breakdown an example join API data call:
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo
In its most basic form, join API calls required two parameters: a show variable and a required variable. In this example we are showing "geo" (aka geographic locations) across all sumlevels (in the join API by default if you do not specify sumlevels it will default to "all"). Note that in the results not all variables are available at all sumlevels across tables.
To see only county-level data we could do:
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&sumlevel=county
parameter | accepted values | description |
---|---|---|
limit | integer | Limits the number of rows returned by the query. (Note that the default is 100 rows) |
order | any available column name | Column name to use for ordering the resulting data array. |
show (required) | any available [attribute](Attribute API) | A comma-separated list of attributes to show in the query. |
sort |
desc or asc
|
Changes the sort order of the returned data array. |
sumlevel | any available sumlevel for the given attribute | This restricts the data fetched to only display the specified sumlevel(s). If more than one "show" attribute is specified, sumlevel must be a comma-separated list with a value for each attribute. (default is "all" levels) |
required (required) | any available column name | A comma-separated list of column names to be returned in the query. |
where | see documentation | Advanced filtering of columns, similar to the WHERE clause on SQL. |
year |
latest , oldest , all , 4-digit year |
Filters the returned data to the given year. |
display_names | 0 or 1 | Show attribute names in response |
auto_crosswalk | 0 or 1 | Perform an advanced auto-crosswalk join |
Previously we showed an example Join API call.
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo
You can also filter the results to only show data from the latest year from each table:
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&year=latest
This will pull the latest data available for each variable.
If you wanted to filter based on associated information (say, you didn't only want the latest year) you can also use the .: syntax as a filter in the where clause. For example:
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&where=income.year:2013,adult_obesity.year:2016
This query selects the data looking at where income in 2013 and adult obesity in 2016. For a full list of table variables see the variables endpoint.
By default, joins between tables (with the exception of years columns) look for exact matches. So if one dataset has data only at the county level and another has data only the place level, you would not be able to join counties and places. There are instances where you may want to do something like this. For these instances you may use the "auto_crosswalk=1" flag. Here is are examples of the auto-crosswalking API.
https://api.datausa.io/api/join?required=income,adult_obesity&show=geo&auto_crosswalk=1&where=income.sumlevel:place,adult_obesity.sumlevel:county
To break this down: auto_crosswalk=1 enables the auto crosswalking feature. By default when crosswalking geographies, the crosswalk will produce a cross product join for every related geographic level. To filter the sumlevels of particular variables, we use the special <variable>.sumlevel:<level> syntax. In this case we restrict the income sumlevel to places and adult_obesity to counties.