In Situ Data API Details (Outdated: see note at top)
NOTE: The links on this page are now outdated as new docs have been released. See here: https://docs.openaq.org/about/about
According to the challenge details, the in situ air quality measurement data is available from ECMWF, or via the OpenAQ API for additional locations.
OpenAQ is an aggregator of many sources of air quality data. Docs for using their APIs can be found here; however, they lack a lot of detail about what parameters represent and which formats are required.
Within the v2 APIs there are various GET requests that provide different data. The most useful for our purposes may be Get measurements, as it accepts a date range and we need to track the data over a 5-day period. The other v2 APIs appear to provide more information about the possible parameter values.
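Since the requirement is a 5-day window, the date_from / date_to values can be computed rather than typed by hand. A minimal sketch using only the standard library, assuming the window ends at a chosen UTC time and that the API accepts the trailing `Z` used in the sample request on this page (the function name is illustrative):

```python
from datetime import datetime, timedelta, timezone


def five_day_window(end):
    """Return (date_from, date_to) strings for the 5 days ending at `end`."""
    if end.tzinfo is None:
        raise ValueError("end must be timezone-aware")
    end = end.astimezone(timezone.utc)
    start = end - timedelta(days=5)
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # same shape as the dates in the sample request
    return start.strftime(fmt), end.strftime(fmt)


date_from, date_to = five_day_window(datetime(2024, 4, 20, tzinfo=timezone.utc))
print(date_from, date_to)  # 2024-04-15T00:00:00Z 2024-04-20T00:00:00Z
```

These strings are unencoded; the HTTP library (or urlencode) turns each ':' into %3A when building the query string.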
Many of the parameters listed here state that they can take more than one value; not all of these permutations have been tested to see what they do. From experimenting with date_from and date_to, it could be that repeating them returns distinct groups of data, but this needs more exploration. The Swagger documents a lot more, including default values.
Parameters with a clear relevant use case for our work:
- country_id - Limit results by a certain country using a two-digit country ID; it is unclear how to get a list of these
- date_from
  - takes the URL-encoded format YYYY-MM-DDThh%3Amm%3Ass
  - e.g. for a date of 2024-04-18T00:00:00 use 2024-04-18T00%3A00%3A00
  - adding the parameter more than once may let you specify other date ranges to return as well
- date_to - see date_from
- limit - “Change the number of results returned. e.g. limit=1000 will return up to 1000 results”
- page - “Paginate through results. e.g. page=1 will return first page of results”
- sort - takes values: desc / asc
- parameter_id - can be retrieved using the Get parameters request
- parameter - “(optional) A parameter name or ID by which to filter measurement results. e.g. parameter=pm25 or parameter=pm25&parameter=pm10”
- coordinates - “Coordinate pair in form lat,lng. Up to 8 decimal points of precision e.g. 38.907,-77.037”
- radius - “Search radius from coordinates as center in meters. Maximum of 25,000 (25km) defaults to 1000 (1km) e.g. radius=10000”
- country - Limit results by a certain country using a two-letter country code, e.g. ?country=US or ?country=US&country=MX
  - a list of countries with their codes is available via the Get countries request
- city - Limit results by a certain city or cities (e.g. ?city=Chicago or ?city=Chicago&city=Boston)
  - can be retrieved using the Get locations request; appears to be the ‘city’ field
- location_id - can be retrieved using the Get locations request; appears to be the ‘id’ field
- location - Limit results by the location key name (can be retrieved using the Get locations request; appears to be the ‘name’ field)
- order_by - takes values: city / country / location / datetime
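The encoding rules above (the %3A date format, repeated parameters, lat,lng pairs) do not have to be applied by hand. A minimal sketch, assuming the v2 measurements endpoint used in the sample request on this page: passing a list of (name, value) pairs to the standard library's urllib.parse.urlencode lets a parameter repeat and percent-encodes ':' as %3A and ',' as %2C. The helper name is illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openaq.org/v2/measurements"


def build_measurements_url(pairs):
    """Build a measurements URL from (name, value) pairs.

    A list of pairs (rather than a dict) allows a name to repeat,
    e.g. parameter=pm25&parameter=pm10.
    """
    return BASE_URL + "?" + urlencode(pairs)


url = build_measurements_url([
    ("date_from", "2024-04-18T00:00:00"),  # plain value; ':' is encoded for us
    ("date_to", "2024-04-19T00:00:00"),
    ("parameter", "pm25"),
    ("parameter", "pm10"),                 # repeated parameter
    ("coordinates", "38.907,-77.037"),     # lat,lng; ',' becomes %2C
    ("radius", 10000),                     # non-string values are converted with str()
])
print(url)
```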
Parameters with an unclear use case for our work at this time:
- format, offset, unit, project, value_from, value_to - unclear which options these take
- has_geo - takes options blank, true or false (unclear what these represent)
- is_mobile - Location is mobile e.g. ?isMobile=true (takes values: true / false)
- is_analysis - Data is the product of a previous analysis/aggregation and not raw measurements e.g. ?isAnalysis=false (takes values: true / false)
- entity - takes values: government / community / research
- sensor_type - takes values: reference grade (reference%20grade) / low-cost sensor (low-cost%20sensor)
- include_fields - Additional fields to include in response e.g. ?include_fields=sourceName
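Two details are easy to get wrong when building these filters from Python: str(False) gives "False" rather than the lower-case form shown in the Swagger, and urlencode's default encodes spaces as '+' rather than the %20 shown for sensor_type. A small sketch that produces the documented forms; the helper names are illustrative, and whether the API would also accept "False" or "+" has not been tested.

```python
from urllib.parse import quote, urlencode


def to_query_value(value):
    """Convert a Python value into the string form shown in the Swagger."""
    if isinstance(value, bool):
        # str(True) would give "True"; the docs show lower-case true / false
        return "true" if value else "false"
    return str(value)


def encode_filters(filters):
    """Encode a dict of filters; quote_via=quote encodes spaces as %20, not '+'."""
    pairs = [(name, to_query_value(value)) for name, value in filters.items()]
    return urlencode(pairs, quote_via=quote)


query = encode_filters({
    "is_mobile": False,
    "is_analysis": False,
    "entity": "government",
    "sensor_type": "reference grade",
})
print(query)
# is_mobile=false&is_analysis=false&entity=government&sensor_type=reference%20grade
```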
The Swagger shows a successful response as a 200 returning JSON data; a validation error should instead return a 422 with a JSON body.
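Client code therefore needs to branch on the status code before using the data. A minimal sketch operating on an already-parsed JSON body, assuming (not confirmed by the Swagger) that a 200 body holds its records in a "results" list and that a 422 body describes the problem under "detail". The function name and sample bodies are illustrative, not real API responses.

```python
def extract_results(status_code, body):
    """Return the measurement records from a parsed JSON response body."""
    if status_code == 200:
        return body.get("results", [])
    if status_code == 422:
        # Validation error, e.g. a badly formatted date_from
        raise ValueError(f"422 validation error: {body.get('detail')}")
    raise RuntimeError(f"Unexpected status code: {status_code}")


# Illustrative bodies only
print(extract_results(200, {"meta": {}, "results": [{"parameter": "pm25"}]}))
try:
    extract_results(422, {"detail": "invalid date_from"})
except ValueError as err:
    print(err)
```

Raising on a 422 rather than returning an empty list keeps a malformed request from being mistaken for "no measurements in this area".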
Sample Python code to make a GET request with parameters to the Get measurements endpoint and print the response to the terminal:
import json

import requests

endpoint = "https://api.openaq.org/v2/measurements"

# Plain (unencoded) values: requests URL-encodes the query string itself,
# so ':' becomes %3A and ',' becomes %2C
params = {
    "date_from": "2024-04-20T00:00:00Z",
    "date_to": "2024-04-21T00:00:00Z",
    "limit": 3000,
    "page": 1,
    "offset": 0,
    "sort": "desc",
    "coordinates": "51.522,-0.0421",  # lat,lng
    "order_by": "datetime",
    "radius": 5000,  # metres, i.e. within 5 km of the coordinates
}
headers = {"accept": "application/json"}


def test_get_measurements_within_5km_of_coordinates_and_print_response():
    try:
        response = requests.get(endpoint, params=params, headers=headers, timeout=30)
        # 200 on success; 422 with a JSON body on validation errors (per the Swagger)
        print("Status:", response.status_code)
        print(json.dumps(response.json(), indent=4))
    except (requests.RequestException, ValueError) as e:
        print("An error occurred:", e)
The documentation is mixed in its depth of detail: as listed above, there are several parameters whose purpose is unclear, and we should request more information about these.
Localization
Some “location” / “city” values are in non-Latin scripts (e.g. "location": "金門縣 - 金門").
- It will be necessary to filter the results of the Get measurements API by location; this can be done in several ways, including with the city parameter.
- However, you need to know the precise value to return the correct locations:
  - filtering by London returns 19 locations at the time of writing
  - the town of “Southend-on-Sea” is within this grouping, yet is 42 miles from the Westminster coordinates
  - filtering by City of London returns 6 more locations
  - there could be other values for city that return more; how can we be confident that we know which locations are in London?
- This will be more difficult for locations we are not familiar with:
  - “PARIS 2E ARRONDISSEMENT” is listed as a city (its coordinates correlate with Paris), but you won’t find it by filtering by Paris
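If the city parameter has to be used, one hypothetical client-side workaround is to collect the candidate city values (e.g. from the Get locations request) and match them loosely rather than exactly. The sketch below does a case-insensitive substring match over an illustrative list of values taken from this page; it would still miss non-Latin values such as "金門縣 - 金門" and can over-match (any name containing "london").

```python
def match_city(cities, query):
    """Return the city values that contain `query`, ignoring case."""
    needle = query.casefold()
    return [city for city in cities if needle in city.casefold()]


cities = ["London", "City of London", "PARIS 2E ARRONDISSEMENT", "Paris"]
print(match_city(cities, "paris"))   # ['PARIS 2E ARRONDISSEMENT', 'Paris']
print(match_city(cities, "London"))  # ['London', 'City of London']
```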
It may be more effective for our purposes to use the coordinates and radius parameters to get data from an area.
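The radius filter presumably keeps locations within a great-circle distance of the centre point. A minimal client-side equivalent, useful for double-checking which returned locations actually fall inside the area, is sketched below with the haversine formula. The `coordinates` → `latitude` / `longitude` shape of each location record is an assumption about the Get locations response, and the second sample location is made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(locations, lat, lng, radius_m):
    """Keep the locations whose coordinates lie within radius_m of (lat, lng)."""
    return [
        loc for loc in locations
        if haversine_m(lat, lng, loc["coordinates"]["latitude"],
                       loc["coordinates"]["longitude"]) <= radius_m
    ]


# Centre taken from the sample request; one location at the centre, one ~111 km north
locations = [
    {"name": "centre", "coordinates": {"latitude": 51.522, "longitude": -0.0421}},
    {"name": "far", "coordinates": {"latitude": 52.522, "longitude": -0.0421}},
]
print([loc["name"] for loc in within_radius(locations, 51.522, -0.0421, 5000)])  # ['centre']
```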
- In the ECMWF AQI documentation it says: “The overall hourly European Air Quality index is simply defined as the highest value of the five individual pollutants indexes computed for the same hour... The overall daily European Air Quality index is the highest value of the overall hourly European Air Quality index in the corresponding day”
- Using the Get locations and Get parameters APIs, it is clear that different locations report on different numbers of pollutants:
  - London Westminster (locationId 159) reports on pm25, o3 and no2
  - City of London - Guildhall (locationId 304892) reports on only o3
  - We may need to consider how we select data to compute the in situ AQI, to avoid under-representing it
  - e.g. a location that only reports o3 might yield an AQI of 2, while the actual AQI is much higher because of a pollutant the location cannot detect
  - To track the 5-day forecast accuracy for all pollutants, we need to ensure that any subset of locations we use still captures all the required pollutants.
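The ECMWF definition quoted above reduces to two maxima, and the coverage concern follows directly from it: a maximum over an incomplete set of pollutants can only be too low. A minimal sketch, assuming each pollutant's individual index has already been computed (the banding from concentrations is not shown) and assuming the five pollutants are pm25, pm10, no2, o3 and so2 in OpenAQ's parameter names; check that list against the ECMWF documentation. Index values are hypothetical; the location parameters are the two London locations listed above.

```python
# Assumption: the five pollutants behind the European AQI, as OpenAQ parameter names
REQUIRED_POLLUTANTS = {"pm25", "pm10", "no2", "o3", "so2"}


def overall_hourly_index(pollutant_indexes):
    """Overall hourly index = highest individual pollutant index for that hour.

    Also returns the required pollutants with no index this hour, because a
    maximum over incomplete data can under-represent the true AQI.
    """
    missing = REQUIRED_POLLUTANTS - set(pollutant_indexes)
    return max(pollutant_indexes.values()), missing


def overall_daily_index(hourly_indexes):
    """Overall daily index = highest overall hourly index in the day."""
    return max(hourly_indexes)


def missing_pollutants(location_parameters, selected_ids):
    """Required pollutants that none of the selected locations report."""
    covered = set()
    for location_id in selected_ids:
        covered |= location_parameters.get(location_id, set())
    return REQUIRED_POLLUTANTS - covered


# Hypothetical index values for two hours at one location
hour_index_1, missing = overall_hourly_index({"pm25": 2, "no2": 3, "o3": 2})
hour_index_2, _ = overall_hourly_index({"pm25": 4, "no2": 1, "o3": 1})
print(overall_daily_index([hour_index_1, hour_index_2]), sorted(missing))  # 4 ['pm10', 'so2']

location_parameters = {159: {"pm25", "o3", "no2"}, 304892: {"o3"}}
print(sorted(missing_pollutants(location_parameters, [159, 304892])))  # ['pm10', 'so2']
```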