This document gives you a general idea of how to import data such as places, equipment/facility information, and disruptions into the accessibility.cloud format.
If you know how to import data already and need a reference of the whole exchange format's schema, go here.
Nobody should have to change their data format to import it as an accessibility.cloud source. To make the import flow as flexible and efficient as possible, we separate the process into small, customizable processing units that you can link together in a data flow.
Here is an exemplary import flow definition that downloads and processes GeoJSON data from a web API. Currently you have to write import processes in JSON, but we plan to add a UI for this.
You can start by copying & pasting this definition into the text area in the 'Format' tab of your source and clicking the 'Start an import' button to download and process the data. Each object in the array corresponds to one stream processing unit (see below for an explanation of what that means):
```json
[
  {
    "type": "HTTPDownload",
    "comment": "Downloads public toilet data from Vienna's open data website as JSON.",
    "parameters": {
      "sourceUrl": "http://data.wien.gv.at/daten/geo?service=WFS&request=GetFeature&version=1.1.0&typeName=ogdwien:WCANLAGEOGD&srsName=EPSG:4326&outputFormat=json"
    }
  },
  {
    "type": "ParseJSONStream",
    "comment": "Extract all GeoJSON features from the JSON.",
    "parameters": {
      "path": "features.*",
      "lengthPath": "totalFeatures"
    }
  },
  {
    "type": "DebugLog",
    "comment": "Output the extracted data for inspection."
  },
  {
    "type": "TransformData",
    "comment": "Transform the data into accessibility.cloud format.",
    "parameters": {
      "mappings": {
        "properties-originalId": "d.id",
        "properties-category": "'toilets'",
        "properties-name": "'Public Toilet'",
        "geometry": "d.geometry",
        "properties-address": "d.properties['STRASSE'] + ', Bezirk ' + d.properties['BEZIRK'] + ', Vienna, Austria'",
        "properties-accessibility-accessibleWith-wheelchair": "d.properties['KATEGORIE'].includes('Behindertenkabine')"
      }
    }
  },
  {
    "type": "DebugLog",
    "skip": true
  },
  {
    "type": "UpsertPlace",
    "comment": "Save each transformed data record in the database."
  }
]
```
Stream processing units have inputs and outputs. They transform chunks of binary data or JavaScript objects into new chunks of binary data or JavaScript objects.
Each unit's JSON object definition has the following properties:
- `type`: Stream unit type, as `String`, for instance `"DebugLog"` (all unit types are explained below).
- `comment` (optional): Shown in the import results. This is useful for explaining to others how your stream chain works.
- `parameters` (optional): Allows you to supply parameters to the unit.
- `skip` (optional): Connects the previous unit's output directly to the following unit, skipping this unit. Helpful for debugging or drafting stream chains.
We support the following stream processing units:
Gets arrays from the input, takes the array elements and outputs them as single objects.
- `path` (optional): Set this to a key path (in dot-notation) to process JavaScript objects that wrap arrays instead of processing arrays directly. The key path specifies where to find the array in each processed object. If you omit this parameter, input chunks must be arrays; if you supply it, input chunks must be objects.
Converts downloaded data into UTF-8 (the encoding used by the database). Input encodings can be specified as strings like `ASCII`, `utf16`, or `ISO-8859-1`.
- `fromCharset`: Name of the character set used in the input.
Displays the first and the last chunk read from input in the UI.
Filters incoming JavaScript objects by truthiness of a property, or by equality of a property to a given value. Outputs matching objects, discards all other objects.
- `path`: Key path (in dot-notation) of the property to match in incoming objects.
- `negate` (optional): Inverts the filtering: discards matching objects, and outputs only objects that do not match the filter.
- `expectedValue` (optional): If given, the stream compares this value with the property value at the given `path` in each input object. The filter counts the object as a match if both values are equal.
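The matching semantics described above can be sketched as follows. This is a simplified illustration, not the actual implementation; `getAtPath` and `matchesFilter` are hypothetical helper names:

```javascript
// Sketch of the filter semantics: match by truthiness, or by equality
// when expectedValue is given; negate inverts the result.
function getAtPath(object, path) {
  // Resolve a dot-notation key path like 'properties.category'.
  return path
    .split('.')
    .reduce((value, key) => (value == null ? undefined : value[key]), object);
}

function matchesFilter(object, { path, negate = false, expectedValue }) {
  const value = getAtPath(object, path);
  const isMatch =
    expectedValue !== undefined ? value === expectedValue : Boolean(value);
  return negate ? !isMatch : isMatch;
}

// Example: keep only objects whose properties.category equals 'toilets'.
const keep = matchesFilter(
  { properties: { category: 'toilets' } },
  { path: 'properties.category', expectedValue: 'toilets' }
);
```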
Starts a single HTTP GET request to download data from a given URL.
- `sourceUrl`: URL from which the data should be downloaded. Should start with `http` or `https`.
- `gzip`: Boolean to switch gzip/deflate support on or off. `true` by default.
- `header`: Headers to send with the request, as a plain JS object.
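A minimal download unit definition using these parameters could look like this (the URL and header values are placeholders):

```json
{
  "type": "HTTPDownload",
  "parameters": {
    "sourceUrl": "https://example.com/places.json",
    "gzip": true,
    "header": {
      "Accept": "application/json"
    }
  }
}
```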
Reads chunks and outputs them without changing them. Stops output after given number of chunks.
- `limit`: Maximal number of output chunks (default: `3`).
Pipe a stream of multiple strings into this unit to generate parallel HTTP requests. The unit starts a parallel HTTP GET request for each input chunk and generates each request's URL from the input data chunk.
- `sourceUrl`: URL to download the data from. Should start with `http` or `https`. If you insert the string `{{inputData}}` into the URL, the unit replaces it with what it reads from the input.
- `maximalErrorRatio`: If you make a lot of requests to an API, it can be normal that some of them fail without invalidating the whole result, for example if the source's underlying database contains invalid records. Use this parameter to specify the ratio of erroneous to valid responses above which the whole stream's result is regarded as erroneous. When reaching the threshold, the stream stops processing. Note that if the ratio is less than 1.0 and the first response contains an error, the stream fails. Default is `0.25` (= 25%).
- `gzip`: Boolean to switch gzip/deflate support on or off. `true` by default.
- `header`: Additional headers for the request, as an object.
- `allowedStatusCodes`: Array of HTTP status code numbers that you regard as successful. Default is `[200]`.
- `maximalConcurrency`: Maximal number of parallel requests at the same time. When reaching this concurrency level, the stream pauses reading from the previous stream unit. Default is `3`.
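The `{{inputData}}` substitution can be pictured like this. This is a sketch of the described behavior, not the actual implementation; `buildRequestUrl` is a hypothetical helper name:

```javascript
// Sketch: build one request URL per input chunk by replacing {{inputData}}.
const sourceUrl = 'https://example.com/api/places/{{inputData}}.json';

function buildRequestUrl(template, inputChunk) {
  return template.replace('{{inputData}}', String(inputChunk));
}

// One URL is generated for each string read from the input stream.
const urls = ['1', '2', '3'].map((id) => buildRequestUrl(sourceUrl, id));
// urls[0] is 'https://example.com/api/places/1.json'
```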
Parses a CSV stream and outputs JSON strings for each line, optionally reading the header and using it for the property names. We use the FastCSV module for this. Note that you have to use `ParseJSONChunks` after this unit to convert the JSON strings into actual JavaScript objects before further processing.
An example block looks like this:
```json
{
  "type": "ParseCSVStream",
  "parameters": {
    "headers": true,
    "objectMode": true
  }
},
```
- `objectMode`: If `true`, data events emit an object rather than its stringified version; set this to `false` to emit stringified buffers (default: `true`).
- `headers`: Set to `true` if you expect the first line of your CSV to contain headers; alternatively, you can specify an array of headers to use. You can also specify a sparse array to omit some of the columns (default: `false`).
- `ignoreEmpty`: Set to `true` if you wish to ignore empty rows (default: `false`).
- `delimiter`: Set this if your data uses an alternate delimiter such as `";"` or `"\t"` (default: `","`).
Reads each incoming chunk as a JSON string and converts it to a JavaScript object. This can be useful for the common case that the input data consists of multiple JSON strings delimited by newlines (or other characters).
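For the newline-delimited case, the behavior corresponds to parsing each line as JSON, as in this simplified sketch (not the actual streaming implementation):

```javascript
// Sketch: newline-delimited JSON input, one JavaScript object per line.
const input = '{"id":1,"name":"A"}\n{"id":2,"name":"B"}';

const objects = input
  .split('\n')
  .filter((line) => line.trim().length > 0) // skip blank lines
  .map((line) => JSON.parse(line));
// objects[1].name is 'B'
```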
Reads one single JSON string as a stream, scraping all JSON objects or values matching a given `path` parameter and returning them as JavaScript objects. To parse the JSON, we use the JSONStream library.
- `path`: Path of the resulting objects, given in JSONPath format, for instance `features.*.properties`.
- `lengthPath`: Path to a `Number` value that determines the expected number of stream objects, used for showing progress in accessibility.cloud's IDE.
Skips a given number of input chunks, then outputs the rest without changes.
- `skip`: Number of chunks to skip (default: `0`).
Splits the incoming string into chunks that can be processed as objects, using a given delimiter.
Transforms given JSON objects into the accessibility.cloud format using mappings and JavaScript.
- `ignoreSkippedPlaces`: If `true`, places without an original ID are not regarded as errors. If `false` (default), the import is marked as erroneous when a place without an original ID is encountered.
- `mappings`: A JSON object that contains the mappings. Keys are target property names according to the accessibility.cloud specification; values are JavaScript expression strings. Inside an expression, you can access the input data record using the predefined JavaScript variable `d`. We support ES2015 here.
You can use JavaScript functions to convert your original data into the final format. Note that each POI you import should have at least these properties:
- `geometry` — so accessibility.cloud returns your POIs in location-based search queries. This property follows the GeoJSON standard specification. It's best to supply a `Point` here, but we support other geometry types as well. If your API supplies a WGS84 longitude and latitude as single fields, you have to convert the coordinates to a GeoJSON point structure (see below for an example).
- `properties-name` — The name of the imported place.
- `properties-originalId` — to overwrite existing POIs on re-import. This should be an ID string based on your own API's unique POI identifier attribute.
- `properties-accessibility-accessibleWith-wheelchair` — a Boolean value to mark the POI as accessible with a wheelchair in queries.
- `properties-accessibility-accessibleWith-guideDog` — a Boolean value to mark the POI as accessible with a guide dog in queries.
- `properties-accessibility-accessibleWith-limitedSight` — a Boolean value to mark the POI as accessible with limited sight in queries.
Note that for properties, you can use key paths to generate nested objects. Path portions are separated with `-` dashes. If a part of the path doesn't exist, it's created. A path portion can be numeric; in this case, we regard the child as an array. The transform step creates arrays for missing index properties and objects for all other missing properties.
Here is an example:
```json
{
  "mappings": {
    "geometry": "{ type: 'Point', coordinates: [Number(d['lon']), Number(d['lat'])] }",
    "properties-originalId": "''+d.id",
    "properties-name": "helpers.OSM.fetchNameFromTags(d.tags)",
    "properties-accessibility-accessibleWith-wheelchair": "d.tags['wheelchair'] == 'yes'"
  }
}
```
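The expansion of dash-separated key paths into nested objects can be sketched like this. This is an illustration of the described behavior, not the actual implementation; `setAtKeyPath` is a hypothetical helper name:

```javascript
// Sketch: expand a dash-separated key path into nested objects,
// creating arrays for numeric path portions as described above.
function setAtKeyPath(target, keyPath, value) {
  const portions = keyPath.split('-');
  let node = target;
  portions.forEach((portion, index) => {
    if (index === portions.length - 1) {
      node[portion] = value; // last portion: assign the value
      return;
    }
    if (node[portion] == null) {
      // Create an array if the NEXT portion is numeric, an object otherwise.
      node[portion] = /^\d+$/.test(portions[index + 1]) ? [] : {};
    }
    node = node[portion];
  });
  return target;
}

const place = {};
setAtKeyPath(place, 'properties-accessibility-accessibleWith-wheelchair', true);
// place.properties.accessibility.accessibleWith.wheelchair is true
```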
Transforms each input chunk or object into something else, using a JavaScript expression.
- `javascript`: A JavaScript expression like `Array.from({length: d}, (v, k) => k + 1)`. Supports ES2015. `d` is predefined as the input chunk/object. The unit evaluates the expression and writes its result as a new chunk/object to the output.
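Evaluating such an expression string with `d` bound to the input chunk can be pictured like this. This is a sketch only; the real unit runs expressions in its own environment, while this illustration uses a `Function` constructor:

```javascript
// Sketch: evaluate a JavaScript expression string with `d` bound
// to the current input chunk/object.
function evaluateExpression(expressionSource, d) {
  const expression = new Function('d', `return (${expressionSource});`);
  return expression(d);
}

const result = evaluateExpression('Array.from({length: d}, (v, k) => k + 1)', 3);
// result is [1, 2, 3]
```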
For each given input object, stream a file from a remote URL to our Amazon S3 storage. An admin must enable this feature for the data source before you can use this stream unit.
- `remotePathProperty`: Key path (in dot-notation) where to find the remote path on our S3 in each input object.
- `mimeTypeProperty`: Key path (in dot-notation) where to find the MIME type to set on S3 in each input object.
- `isUploadedFlagProperty`: Key path (in dot-notation) where to find a flag indicating that the object is already uploaded to S3. If the value of this property is `true`, the server skips the upload for the input object.
- `sourceUrlTemplateDataProperty`: Key path (in dot-notation) where to find the property to use as input for string-template-based URLs.
- `sourceUrlProperty`: Key path (in dot-notation) where to find the URL of the source file to stream.
Inserts an `Image` document into the database. Upserting will link the image with another target document, e.g. a `Place`.
- `refetchAfterUpsert` (optional): Set this to `true` to output a refetched document after each upsert. This document will contain all properties in the DB, even those that you have not included in the upsert modifier.
Inserts a `PlaceInfo` document into the database. Upserting will link the place with the source you have created in the accessibility.cloud web UI, and will make it available to API users when…
- you have accepted the terms and conditions for the data source's organization
- your data source is not in draft state anymore (see the source's 'Settings' tab in the web UI)
- the app token used to query the API belongs to an app by an organization that is allowed to read your source's data (you can either allow everyone or specific organizations to use your data source in the 'Settings' tab).
`UpsertPlace` is usually the last unit in the stream chain, but it outputs status data for each processed chunk that you can use for debugging.
- `refetchAfterUpsert` (optional): Set this to `true` to output a refetched document after each upsert. This document will contain all properties in the DB, even those that you have not included in the upsert modifier.
Inserts a record into the database that describes an equipment or facility of a place, for example an elevator or escalator.
Note that equipment and facilities have a differing set of accessibility attributes.
This works like `UpsertPlace`, but using it marks the source as a data source for equipment/facilities, which enables special features:
- Equipment/facility POIs can belong to `PlaceInfo` POIs from another data source. This way, places can have a list of (for example) elevators and escalators.
- If a transformed imported POI has `properties.originalPlaceInfoId` and `properties.placeSourceId` properties, importing will associate the equipment/facility with the place. For this association, accessibility.cloud uses the data provider's original place ID.
- If the equipment data source uses a different ID space than the associated places, you can use the `properties.originalPlaceInfoIdField` property.
- For this to work, `properties.placeSourceId` must refer to the ID of a place data source on accessibility.cloud that belongs to the same organization.
- accessibility.cloud will show the imported equipment/facilities on the overview page of the associated place data source.
- The accessibility.cloud `/placeInfos` API will include equipment/facility data as a `properties.equipmentInfos` property.
- `refetchAfterUpsert` (optional): Set this to `true` to output a refetched document after each upsert. This document will contain all properties in the DB, even those that you have not included in the upsert modifier.
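As an illustration of the place association described above, a TransformData fragment in an equipment source's stream chain might set the association properties like this. The input field `d.stationId` and the source ID string are placeholders; the ID must refer to a place data source of your own organization:

```json
{
  "type": "TransformData",
  "parameters": {
    "mappings": {
      "properties-originalId": "''+d.id",
      "properties-name": "'Elevator ' + d.id",
      "properties-originalPlaceInfoId": "''+d.stationId",
      "properties-placeSourceId": "'YOUR_PLACE_SOURCE_ID'"
    }
  }
}
```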
Inserts a record into the database that describes a (possibly timed) disruption of an equipment object.
This allows you to record whether a disruption is a planned event or scheduled maintenance, and lets users of the equipment find out when a disruption will end.
accessibility.cloud supports two models for storing disruptions:
- A `PlaceInfo` has 0-* `EquipmentInfo`s. `EquipmentInfo`s have many `Disruption`s.
- A `PlaceInfo` has 0-* `Disruption`s, without a model for equipment/facility info.
Some data sources have one record per existing disruption and do not supply disruptions that have happened in the past or will happen in the future. For this case, you can set the `removeMissingRecords` parameter of the `UpsertDisruption` stream unit to `true`. Setting the flag will delete all disruption records that have not been part of an import after the import finishes.
Note that disruptions have a specific set of attributes.
This works like `UpsertPlace`, but using it marks the source as a data source for disruptions, which enables special features:
- Add an `equipmentSourceId` parameter to this stream unit to refer to an equipment source that should be watched for status changes.
- A disruption can belong to a `PlaceInfo` or `EquipmentInfo` from another data source. This way, a place or equipment/facility can have a list of disruptions in the past, present and future. Depending on the imported data model, it's your choice whether you associate a disruption with a place or with an equipment/facility.
- If a transformed imported POI has `properties.originalPlaceInfoId` and `properties.placeSourceId` properties, importing will associate the disruption with the place.
- If a transformed imported POI has `properties.originalEquipmentInfoId` and `properties.equipmentSourceId` properties, importing will associate the disruption with the equipment/facility.
- For this to work, the equipment/facility/place data sources must belong to the same organization. Use the `properties.equipmentSourceId` and `properties.placeSourceId` properties to refer to the respective data sources.
- If you associate a disruption with equipment, you can use the `isEquipmentWorking` property of the disruption and set the stream unit's `takeOverEquipmentWorkingFlag` parameter to `true`. This will update the equipment's `properties.isWorking` flag on import. If you want to take the date properties of the `Disruption` into account, set the `properties.equipmentIsWorking` flag accordingly in the disruption data source's transform stream.
- If you set `setUnreferencedEquipmentToWorking` to `true`, accessibility.cloud will interpret missing disruption information as working equipment and set all equipment infos not referenced in the last import to `isWorking: true` after the import. The server can reset the `isWorking` flag on a specific subset of equipment if you add an `equipmentSelectorForImport` selector (as an escaped JSON string) that selects the equipment that should be reset. An example where this is useful: an equipment source has elevators from different operators, and you want the import to affect a specific operator's elevators only.
- accessibility.cloud will show the imported disruptions on the overview page of the associated place/equipment/facility data sources.
- `/placeInfos` API responses will include disruption data if you supply an `includeRelated=equipmentInfos.disruptions` query parameter.
- `/equipmentInfos` API responses will include disruption data if you supply an `includeRelated=disruptions` query parameter.
- `refetchAfterUpsert` (optional): Set this to `true` to output a refetched document after each upsert. This document will contain all properties in the DB, even those that you have not included in the upsert modifier.
You can use the following popular libraries in your transform code:
- lodash, an extensive convenience function library for JavaScript, via the global variable `_`
- geodesy, a library of geodesy functions (for example conversions between coordinate systems), via the global variable `geodesy`
… and the following helpers:
- `helpers.OSM.fetchCategoryFromTags`: Finds a valid accessibility.cloud category identifier from a given array of OpenStreetMap tags. Returns a `String`.
- `helpers.OSM.fetchNameFromTags`: Generates a generic name for a place from a given array of OpenStreetMap tags. Returns a `String`.
- `helpers.extractNumber`: Finds and extracts a number from a given string. Returns a `Number`.
- `helpers.convertOSGB36ToWGS84Point(x, y)`: Converts OSGB36 X/Y geocoordinates to a GeoJSON Point geometry structure. Returns an object, for example `{ type: 'Point', coordinates: [ -1.609014719024711, 54.97022522800059 ] }`.
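As an illustration of the kind of behavior `helpers.extractNumber` provides, here is a hypothetical reimplementation sketch (not the actual helper, which may handle more cases):

```javascript
// Hypothetical sketch of extractNumber-like behavior: find the first
// number (optionally signed, optionally with decimals) in a string.
function extractNumberSketch(text) {
  const match = String(text).match(/-?\d+(\.\d+)?/);
  return match ? Number(match[0]) : undefined;
}

const level = extractNumberSketch('Level 2, near the elevator');
// level is 2
```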