
In Situ Data API Details (Outdated: see note at top)

Benjamin Ell-Jones edited this page Oct 8, 2024 · 1 revision

NOTE: The links on this page are now outdated as new docs have been released. See here: https://docs.openaq.org/about/about

According to the challenge details, the in situ air quality measurement data is available from the ECMWF, or via the OpenAQ API for additional locations.

OpenAQ API

OpenAQ aggregates air quality data from many sources. Documentation for its APIs can be found here; however, it lacks detail on what many parameters represent and which formats they require.

Within the v2 APIs there are various GET requests that provide different data. The most useful for our purposes may be get measurements, as it accepts a date range and we are required to track the data over a 5 day period. The other v2 APIs appear to provide more information about the possible parameter values.

Get measurements

Request Parameters

Many of the parameters listed here state that they can take more than one option, but not all of these permutations have been tested to see what they do. From experimenting with date_from and date_to, supplying a parameter more than once may return distinct groups of data, but this needs more exploration. The Swagger contains a lot of further information, including default values.

Parameters with a clear relevant use case for our work:

  • country_id - Limit results by a certain country using two digit country ID (unclear how to get a list of these)

  • date_from - takes the URL encoded format YYYY-MM-DDThh%3Amm%3Ass
      • e.g. for a date of 2024-04-18T00:00:00 use 2024-04-18T00%3A00%3A00
      • adding the parameter more than once appears to let you specify other date ranges to return as well

  • date_to - see date_from

  • limit - “Change the number of results returned. e.g. limit=1000 will return up to 1000 results”

  • page - “Paginate through results. e.g. page=1 will return first page of results”

  • sort - takes values: desc / asc

  • parameter_id - parameter_id can be retrieved using get parameters request

  • parameter - “(optional) A parameter name or ID by which to filter measurement results. e.g. parameter=pm25 or parameter=pm25&parameter=pm10”

  • coordinates - “Coordinate pair in form lat,lng. Up to 8 decimal points of precision e.g. 38.907,-77.037”

  • radius - “Search radius from coordinates as center in meters. Maximum of 25,000 (25km) defaults to 1000 (1km) e.g. radius=10000”

  • country - Limit results by a certain country using two letter country code, e.g. ?country=US or ?country=US&country=MX. A list of countries with their codes is available via the get countries request.

  • city - Limit results by a certain city or cities, e.g. ?city=Chicago or ?city=Chicago&city=Boston. Values can be retrieved using the get locations request (appears to be the ‘city’ field).

  • location_id - can be retrieved using the get locations request (appears to be the ‘id’ field)

  • location - Limit results by the location key name. Values can be retrieved using the get locations request (appears to be the ‘name’ field).

  • order_by - takes values: city / country / location / datetime
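The URL encoding of dates and the repetition of a parameter described in the list above do not have to be done by hand. The following is a minimal sketch using only the Python standard library. The helper name build_measurements_query is hypothetical, and it assumes the endpoint accepts repeated keys in the way the quoted documentation suggests.

```python
from urllib.parse import urlencode


def build_measurements_query(date_from, date_to, parameters=(), **extra):
    """Return a URL-encoded query string for the get measurements request.

    Hypothetical helper: a list of (key, value) pairs is used instead of a
    dict so that a key such as `parameter` can appear more than once.
    """
    pairs = [("date_from", date_from), ("date_to", date_to)]
    pairs += [("parameter", name) for name in parameters]
    pairs += sorted(extra.items())  # sorted only to keep the output stable
    return urlencode(pairs)  # encodes ":" as %3A and "," as %2C


query = build_measurements_query(
    "2024-04-18T00:00:00",
    "2024-04-23T00:00:00",
    parameters=["pm25", "pm10"],
    coordinates="51.522,-0.0421",
    radius=5000,
)
print(query)
```

The resulting string can be appended to the endpoint URL after a "?". Whether the API treats repeated date_from and date_to values as separate ranges is still untested, so the sketch only repeats parameter.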

Parameters with an unclear use case for our work at this time:

  • format, offset, unit, project, value_from, value_to - Unclear which options these take

  • has_geo - takes options blank, true or false (unclear what these represent)

  • is_mobile - Location is mobile e.g. ?isMobile=true (takes values: true / false)

  • is_analysis - Data is the product of a previous analysis/aggregation and not raw measurements e.g. ?isAnalysis=false (takes values: true / false)

  • entity - takes values: government / community / research

  • sensor_type - takes values: reference grade (reference%20grade)/ low-cost sensor (low-cost%20sensor)

  • include_fields - Additional fields to include in response e.g. ?include_fields=sourceName

Response

According to the Swagger, a successful request returns a 200 response with a JSON body containing the data, and a validation error returns a 422 response with a JSON body.

The sample Python code below makes a GET request with parameters to the get measurements endpoint and prints the response to the terminal.
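The two documented outcomes can be handled explicitly rather than relying on a generic exception. This is a minimal sketch, not the project's implementation. The function name summarise_response is hypothetical, and the body keys "results" (for 200) and "detail" (for 422) are assumptions that should be checked against the Swagger.

```python
def summarise_response(status_code, body):
    """Classify a get measurements response by its status code.

    Assumption: a 200 body holds its rows under "results" and a 422 body
    holds its validation messages under "detail".
    """
    if status_code == 200:
        return {"ok": True, "rows": body.get("results", []), "errors": []}
    if status_code == 422:
        return {"ok": False, "rows": [], "errors": body.get("detail", [])}
    # Anything else is outside what the Swagger describes.
    raise RuntimeError(f"Unexpected status code: {status_code}")


# Illustrative bodies only; these are not real API output.
print(summarise_response(200, {"results": [{"value": 12.0}]}))
print(summarise_response(422, {"detail": [{"msg": "invalid date"}]}))
```

With the requests library this would typically be called as summarise_response(response.status_code, response.json()).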

import json
import requests

endpoint = "https://api.openaq.org/v2/measurements"
date_from = "2024-04-20T00%3A00%3A00Z"
date_to = "2024-04-21T00%3A00%3A00Z"
limit = "3000"
page = "1"
offset = "0"
sort = "desc"
coordinates = "51.522%2C-0.0421"
order_by = "datetime"
radius = "5000"

def test_get_measurements_within_5km_of_coordinates_and_print_response():
    # The values above are already URL-encoded, so the query string is built by hand.
    url = ("{}?date_from={}&date_to={}&limit={}&page={}&offset={}&sort={}&coordinates={}&order_by={}&radius={}"
           .format(endpoint, date_from, date_to, limit, page, offset, sort, coordinates, order_by, radius))

    headers = {"accept": "application/json"}

    try:
        response = requests.get(url, headers=headers, timeout=30)
        # Pretty-print the JSON body returned by the API.
        print(json.dumps(response.json(), indent=4))
    except Exception as e:
        print("An error occurred:", e)


if __name__ == "__main__":
    test_get_measurements_within_5km_of_coordinates_and_print_response()
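A single request returns at most limit results, so covering a 5 day period is likely to need the page parameter as well. The sketch below shows one way to loop over pages. fetch_page is a hypothetical callable standing in for the real request, and stopping on a short page is an assumption about how the API signals the last page.

```python
def fetch_all_pages(fetch_page, limit=1000, max_pages=50):
    """Collect results page by page until a short page is returned.

    `fetch_page(page, limit)` is a hypothetical callable that returns the
    list of results for one page, e.g. a thin wrapper around requests.get.
    `max_pages` is a safety cap so a misbehaving source cannot loop forever.
    """
    rows = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        rows.extend(batch)
        if len(batch) < limit:  # fewer rows than requested: last page
            break
    return rows


# Stand-in for the real API: 2,500 fake rows served in pages.
fake_rows = list(range(2500))


def fake_fetch(page, limit):
    start = (page - 1) * limit
    return fake_rows[start:start + limit]


all_rows = fetch_all_pages(fake_fetch, limit=1000)
print(len(all_rows))  # prints 2500
```

Passing the fetch function in as an argument keeps the loop testable without network access.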

Of Note

API documentation

The depth of detail is mixed. As listed above, there are several parameters whose purpose is not clear, and we should request more information about these.

Localization

Some “location” / “city” values are in non-Latin scripts (e.g. "location": "金門縣 - 金門")

Filtering by area

  • It will be necessary to filter the results of the Get measurements API by location. This can be done in several ways, including by using the city parameter.
  • However, you need to know the precise value to return the correct locations
  • using London returns, at time of writing, 19 locations
  • The town of “Southend-on-Sea” is within this grouping, yet is 42 miles from the Westminster coordinates
  • using City of London returns 6 more locations
  • there could be other values for city that return more - how do we have confidence that we know what locations are in London?
  • This will be more difficult with locations we are not familiar with
  • “PARIS 2E ARRONDISSEMENT” is listed as a city (the coordinates correlate with Paris), but you won’t find this by filtering by Paris

It may be more effective for our purposes to use the coordinates and radius functionality to get data from an area
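Whichever filter is used, the coordinates returned for each location can be checked against a centre point on the client side. The sketch below uses the haversine great-circle formula with a spherical Earth approximation. The function names are hypothetical, and the centre point is the one used in the sample request above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat,lng points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(centre, point, radius_m):
    """True if `point` lies within `radius_m` metres of `centre`."""
    return haversine_m(*centre, *point) <= radius_m


centre = (51.522, -0.0421)  # coordinates from the sample request
print(within_radius(centre, (51.53, -0.05), 5000))
print(within_radius(centre, (51.522, 0.7), 25000))
```

A check like this could flag locations that a city filter groups together even though they lie well outside the area of interest.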

Measurements

  • In the ECMWF AQI documentation it says “The overall hourly European Air Quality index is simply defined as the highest value of the five individual pollutants indexes computed for the same hour...The overall daily European Air Quality index is the highest value of the overall hourly European Air Quality index in the corresponding day”

  • Using the Get Locations and Get Parameters APIs it is clear that there is variability in the number of pollutants that different locations report on

  • London Westminster (locationId 159) reports on pm25, o3 and no2
  • City of London - Guildhall (locationId 304892) reports on only o3
  • We may need to consider how we select data to compute in situ AQI in order to avoid under-representing the AQI
  • e.g. if a location only reports o3 levels, an AQI of 2 may be calculated from the data while the true AQI is much higher, because the location lacks the means to detect the other pollutants
  • In order to track the 5 day accuracy of the forecasting for all pollutants, we will need to ensure that if we use a subset of locations we are able to capture all pollutants required.
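The quoted definition and the coverage concern above can be expressed in a few lines. This is a minimal sketch under stated assumptions: the per-pollutant index values are taken as already computed, the set of five pollutants (pm25, pm10, no2, o3, so2) is an assumption to confirm against the ECMWF documentation, and all function names and sample values are illustrative.

```python
# Assumed set of five pollutants, written with OpenAQ-style names.
REQUIRED_POLLUTANTS = {"pm25", "pm10", "no2", "o3", "so2"}


def overall_hourly_index(pollutant_indexes):
    """Highest individual pollutant index for one hour."""
    return max(pollutant_indexes.values())


def overall_daily_index(hourly_pollutant_indexes):
    """Highest overall hourly index within one day."""
    return max(overall_hourly_index(hour) for hour in hourly_pollutant_indexes)


def missing_pollutants(pollutant_indexes):
    """Pollutants a location does not report, which may hide a higher index."""
    return REQUIRED_POLLUTANTS - set(pollutant_indexes)


# Made-up index values for two hours at a single location.
hours = [{"o3": 2, "no2": 1}, {"o3": 3, "pm25": 4}]
print(overall_hourly_index(hours[0]))  # prints 2
print(overall_daily_index(hours))  # prints 4
print(sorted(missing_pollutants({"o3": 2})))  # prints ['no2', 'pm10', 'pm25', 'so2']
```

Reporting the missing pollutants alongside the index makes it visible when a low value may simply reflect limited sensor coverage.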
