Analyze the data from the APC pilot and gather insight from it.
Download the data onto a single computing node and analyze it there.
This repository has been created as part of the Waltti APC project.
- Install `poetry`.
- `poetry install`
- Create an `.env` file with the appropriate configuration. See below for the environment variables.
- `poetry run python src/waltti_apc_pilot_analysis/main.py`

| Environment variable | Required? | Default value | Description |
| --- | --- | --- | --- |
| `COUNTING_SYSTEM_MAP` | ✅ Yes | | A map from the counting system IDs to vehicles and vendor names. The map is given in the form of a stringified JSON array of strings in the shape `[[systemId1, [uniqueVehicleId1, vendorA]], [systemId2, [uniqueVehicleId2, vendorB]], ...]` like the output of `Map.prototype.entries()` in JavaScript. An example could be `[[\"c5e96843-e820-4837-8eef-6176be4b4c4e\",[\"fi:jyvaskyla:6714_503\",\"Acme\"]],[\"6dd41f2e-841f-44a0-b5f8-a108847dc4a2\",[\"fi:jyvaskyla:6714_529\",\"Corpcorp\"]]]`. |
| `DATA_ROOT_PATH` | ✅ Yes | | The path of the directory where to download the Pulsar messages. |
| `PULSAR_GTFSRTVP_TOPIC_JSON_ARRAY` | ✅ Yes | | A JSON array of the Pulsar topics for the GTFS Realtime VehiclePosition messages to download. Each of the topics will be downloaded starting from the earliest message that is still locally missing. Topic regex pattern is not used here so that each topic can be downloaded separately. An example value could be `["persistent://tenant/source/gtfs-realtime-vp-fi-kuopio","persistent://tenant/source/gtfs-realtime-vp-fi-jyvaskyla"]`. |
| `PULSAR_OAUTH2_AUDIENCE` | ✅ Yes | | The OAuth 2.0 audience. |
| `PULSAR_OAUTH2_ISSUER_URL` | ✅ Yes | | The OAuth 2.0 issuer URL. |
| `PULSAR_OAUTH2_KEY_PATH` | ✅ Yes | | The path to the OAuth 2.0 private key JSON file. |
| `PULSAR_ONBOARD_APC_TOPIC` | ✅ Yes | | The topic for the onboard APC messages to download. The download will start from the earliest message that is still locally missing. An example value could be `"persistent://tenant/source/mqtt-apc-from-vehicle"`. |
| `PULSAR_SERVICE_URL` | ✅ Yes | | The service URL. |
| `PULSAR_TLS_VALIDATE_HOSTNAME` | ✅ Yes | | Whether to validate the hostname on its TLS certificate. This option exists because some Apache Pulsar hosting providers cannot handle Apache Pulsar clients setting this to `true`. |
| `RUN_PHASES_JSON_ARRAY` | ✅ Yes | | A JSON array of the phases of the program to run. The array for running all of the phases is `["download","analysis"]`. After the downloads are recent enough, it's easier to iterate by using just the value `["analysis"]`. |
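
The JSON-valued variables above can be decoded with the standard library alone. The following is only a minimal sketch of how that might look, not the code used in this repository: the function names `parse_counting_system_map`, `parse_json_string_array` and `parse_run_phases` are hypothetical, and it assumes the value reaching the program is plain JSON once any `.env` quoting and escaping has been resolved.

```python
import json

# Phase names taken from the RUN_PHASES_JSON_ARRAY description above.
KNOWN_PHASES = ("download", "analysis")


def parse_counting_system_map(raw: str) -> dict[str, tuple[str, str]]:
    """Convert the JSON entries array into a dict keyed by counting system ID.

    Each entry has the shape [systemId, [uniqueVehicleId, vendor]].
    """
    entries = json.loads(raw)
    return {
        system_id: (vehicle_id, vendor)
        for system_id, (vehicle_id, vendor) in entries
    }


def parse_json_string_array(raw: str) -> list[str]:
    """Parse a JSON array and check that every element is a string."""
    values = json.loads(raw)
    if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
        raise ValueError("expected a JSON array of strings")
    return values


def parse_run_phases(raw: str) -> list[str]:
    """Parse RUN_PHASES_JSON_ARRAY and reject phase names not listed above."""
    phases = parse_json_string_array(raw)
    unknown = [phase for phase in phases if phase not in KNOWN_PHASES]
    if unknown:
        raise ValueError(f"unknown phases: {unknown}")
    return phases


if __name__ == "__main__":
    # Example values copied from the table above, with the escaping removed.
    raw_map = (
        '[["c5e96843-e820-4837-8eef-6176be4b4c4e",'
        '["fi:jyvaskyla:6714_503","Acme"]],'
        '["6dd41f2e-841f-44a0-b5f8-a108847dc4a2",'
        '["fi:jyvaskyla:6714_529","Corpcorp"]]]'
    )
    print(parse_counting_system_map(raw_map))
    print(parse_run_phases('["download","analysis"]'))
```

The same `parse_json_string_array` helper would also fit `PULSAR_GTFSRTVP_TOPIC_JSON_ARRAY`, since that variable is described as a JSON array of topic strings.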