Version: 1.1
Date: 12.04.2022
Authors: Mario Scrocca (@marioscrock), Milan Straka (@bioticek)
The trias-extractor offer parser is a module of the Ride2Rail Offer Categorizer responsible for parsing offers from Trias and for converting them to the offer cache schema enabling the categorization.
The parsed input format is mainly based on the Trias specification for a TripResponse message but also takes into account custom extensions (available in the folder extensions) developed by Shift2Rail IP4 projects, i.e., the Coactive extensions and the extensions defined for Ride2Rail.
The procedure implemented by the trias-extractor is composed of two main phases.
Parsing of the required data from the provided Trias file into an intermediate representation using in-memory objects. The procedures to parse the data are implemented in the extractor.py module. The intermediate object model used to represent the parsed data is defined in the model.py module.
The defined model reflects the offer cache schema:
- Request: id, start_time, end_time, start_point, end_point, cycling_dist_to_stop, walking_dist_to_stop, walking_speed, cycling_speed, driving_speed, max_transfers, expected_duration, via_locations, offers (dictionary of associated Offer objects)
- Offer: id, trip, bookable_total, complete_total, offer_items (dictionary of associated OfferItem objects)
- Trip: id, duration, start_time, end_time, num_interchanges, length, legs (dictionary of associated TripLeg objects)
- OfferItem: id, name, fares_authority_ref, fares_authority_text, price, leg_ids (list of ids of TripLeg objects covered by the OfferItem object)
- TripLeg: id, start_time, end_time, duration, leg_track, length, leg_stops, transportation_mode, travel_expert, attributes (dictionary of key-value pairs)
- TimedLeg(TripLeg): line, journey
- ContinuousLeg(TripLeg)
- RideSharingLeg(ContinuousLeg): driver, vehicle
Location and its subclasses (StopPoint, Address) are used to support the processing but are not serialized in the offer cache.
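To illustrate how the classes above reference each other, here is a minimal sketch using dataclasses. It covers only a few of the listed fields. The use of dataclasses, the default values and the sample ids are assumptions made for illustration, not the actual content of model.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TripLeg:
    id: str
    transportation_mode: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class TimedLeg(TripLeg):
    line: Optional[str] = None
    journey: Optional[str] = None


@dataclass
class ContinuousLeg(TripLeg):
    pass


@dataclass
class RideSharingLeg(ContinuousLeg):
    driver: Optional[str] = None
    vehicle: Optional[str] = None


@dataclass
class Trip:
    id: str
    legs: Dict[str, TripLeg] = field(default_factory=dict)


@dataclass
class OfferItem:
    id: str
    leg_ids: List[str] = field(default_factory=list)


@dataclass
class Offer:
    id: str
    trip: Trip
    offer_items: Dict[str, OfferItem] = field(default_factory=dict)


@dataclass
class Request:
    id: str
    offers: Dict[str, Offer] = field(default_factory=dict)


# Hypothetical ids, for illustration only.
leg = TimedLeg(id="leg_1", transportation_mode="rail", line="R1")
trip = Trip(id="trip_1", legs={leg.id: leg})
item = OfferItem(id="item_1", leg_ids=[leg.id])
offer = Offer(id="offer_1", trip=trip, offer_items={item.id: item})
request = Request(id="example_1_1", offers={offer.id: offer})
```

A Request owns its Offer objects, each Offer points to one Trip, and each OfferItem refers to the legs it covers by id rather than by object reference.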
The parsing procedure is implemented through the following steps:
1. Parse the TripRequest data associated with the offers described in the Trias TripResponse obtaining a model.Request object
2. Parse the TripResponseContext associated with the offers described in the Trias TripResponse obtaining a list of model.Location objects
3. Parse all the Trias Trips and the associated TripLegs obtaining a set of model.Trip objects referencing an ordered list of model.TripLegs
4. Parse the Trias Meta-Ticket associated with the different Trias Trips obtaining a list of model.Offer objects referencing the associated model.Trip and bound to the model.Request object
5. Parse the Trias Ticket associated with each Meta-Ticket obtaining a list of model.OfferItem associated with a model.Offer and with the model.TripLegs covered by the offer item.
6. Parse the OfferItemContext for each Trias Ticket obtaining a dictionary of key-value pairs bound to specific model.TripLegs associated to the model.OfferItem
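As a rough illustration of the third step (parsing Trips and their TripLegs), the sketch below walks a heavily simplified, hand-written Trias-like fragment with the standard library. The namespace URI, the reduced element nesting and the function name parse_trips are assumptions for this example. It is not the code in extractor.py, and a real TripResponse contains many more elements.

```python
import xml.etree.ElementTree as ET

# Assumption: the Trias namespace URI used by the input files.
NS = {"t": "http://www.vdv.de/trias"}

# Hand-written, simplified fragment (not a complete Trias message).
SAMPLE = """<Trias xmlns="http://www.vdv.de/trias">
  <TripResponse>
    <TripResult>
      <Trip>
        <TripId>trip_1</TripId>
        <TripLeg><LegId>leg_1</LegId><TimedLeg/></TripLeg>
        <TripLeg><LegId>leg_2</LegId><ContinuousLeg/></TripLeg>
      </Trip>
    </TripResult>
  </TripResponse>
</Trias>"""


def parse_trips(xml_text):
    """Return {trip_id: [(leg_id, leg_kind), ...]}, keeping document order."""
    root = ET.fromstring(xml_text)
    trips = {}
    for trip in root.iterfind(".//t:Trip", NS):
        trip_id = trip.findtext("t:TripId", namespaces=NS)
        legs = []
        for leg in trip.iterfind("t:TripLeg", NS):
            leg_id = leg.findtext("t:LegId", namespaces=NS)
            # Empty elements are falsy, so compare against None explicitly.
            kind = next(
                (name for name in ("TimedLeg", "ContinuousLeg")
                 if leg.find(f"t:{name}", NS) is not None),
                "unknown",
            )
            legs.append((leg_id, kind))
        trips[trip_id] = legs
    return trips


trips = parse_trips(SAMPLE)
```

The list of legs preserves the order in which the TripLeg elements appear, which mirrors the "ordered list" requirement of this step.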
Notes:
- Step 1: if not provided in a parameter, a UUID is automatically assigned to each request received by the trias-extractor and used as id for the model.Request object
- Step 5: a model.Offer can be associated with no model.OfferItem if a purchase is not needed to perform the trip
- Step 6: If the OfferItemContext contains a composite key, the assumption is that it is composed as oic_key:leg_id and the parsed value should be associated only with the model.TripLeg having the provided leg_id. In all the other cases the value parsed is associated to all the model.TripLegs associated with the model.OfferItem. The information extracted from the OfferItemContext is merged with the Attributes parsed for each model.TripLeg.
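The composite-key rule of Step 6 can be sketched as follows. The function name bind_context_to_legs and the sample keys are hypothetical. Skipping a composite key whose leg_id is not covered by the offer item is an assumption of this sketch, since the behaviour in that case is not described here.

```python
def bind_context_to_legs(offer_item_context, leg_ids):
    """Map each leg id to the OfferItemContext pairs that apply to it."""
    per_leg = {leg_id: {} for leg_id in leg_ids}
    for key, value in offer_item_context.items():
        oic_key, sep, leg_id = key.partition(":")
        if not sep:
            # Plain key: the value applies to every leg of the offer item.
            for attrs in per_leg.values():
                attrs[key] = value
        elif leg_id in per_leg:
            # Composite key oic_key:leg_id: the value applies to one leg only.
            per_leg[leg_id][oic_key] = value
        # Assumption: composite keys naming an unknown leg are skipped.
    return per_leg


# Hypothetical keys and values, for illustration only.
result = bind_context_to_legs(
    {"seats": "2", "wifi:leg_2": "yes"},
    ["leg_1", "leg_2"],
)
```

Here "seats" ends up on both legs, while "wifi" is attached to leg_2 only.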
Storing of the data parsed by the trias-extractor to the offer cache. A dedicated procedure is defined for this in the writer.py module. The complete serialization is composed of commands queued in a pipeline that is executed as a single write to the offer cache.
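The idea of queuing commands and sending them in one write can be sketched as below, assuming the offer cache is the Redis instance mentioned in the deployment notes. The key layout, the function name queue_request and the RecordingPipeline stand-in are hypothetical; they are not the schema or code used by writer.py. The stand-in only mimics the hset, rpush and execute calls of a redis-py pipeline so that the sketch runs without a server.

```python
def queue_request(pipe, request_id, fields, offer_ids):
    """Queue all writes for one request; nothing is sent until execute()."""
    # Hypothetical key layout, not the actual offer cache schema.
    pipe.hset(f"{request_id}:request", mapping=fields)
    if offer_ids:
        pipe.rpush(f"{request_id}:offers", *offer_ids)
    return pipe


class RecordingPipeline:
    """Stand-in for a redis-py pipeline that records the queued commands."""

    def __init__(self):
        self.queued = []
        self.executed = []

    def hset(self, name, mapping):
        self.queued.append(("hset", name, dict(mapping)))

    def rpush(self, name, *values):
        self.queued.append(("rpush", name, list(values)))

    def execute(self):
        # All queued commands are flushed together.
        self.executed, self.queued = self.queued, []
        return self.executed


pipe = queue_request(
    RecordingPipeline(),
    "example_1_1",
    {"max_transfers": "2"},
    ["offer_1", "offer_2"],
)
results = pipe.execute()
```

With redis-py installed and a server running, the object returned by redis.Redis(host=..., port=...).pipeline() could be passed in place of the stand-in.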
The trias-extractor component is implemented as a Python application using the Flask framework to expose the described procedure as a service. Each Trias file processed by the trias-extractor component is mapped to a Request object and then serialized in the offer cache.
Example request running the trias-extractor locally.
$ curl --header 'Content-Type: application/xml' \
       --request POST \
       --data-binary "@trias/$FILE_NAME" \
       'http://localhost:5000/extract/?request_id=example_1_1'
The request_id parameter in the URL is intended for testing purposes: it sets the request_id to an exact value.
If omitted, a random request_id is generated.
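For callers that prefer Python over curl, an equivalent request can be built with the standard library. This is a sketch: the function name build_extract_request and the placeholder payload are hypothetical, while the URL, header and parameter follow the curl example above.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(xml_bytes, request_id=None,
                          base_url="http://localhost:5000/extract/"):
    """Build (but do not send) a POST request for the extract endpoint."""
    url = base_url
    if request_id is not None:
        # Optional: force a specific request_id, as in the curl example.
        url += "?" + urlencode({"request_id": request_id})
    return Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


# Placeholder payload; a real call would read the bytes of a Trias file.
req = build_extract_request(b"<Trias/>", request_id="example_1_1")

# Sending requires the service to be running:
# with urlopen(req) as response:
#     print(response.read().decode())
```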
Once Trias requests are added to a trias folder in the repository root, the load.sh script can be used to automatically launch the trias-extractor service and the offer cache, and to process the files. The offer cache data are persisted in the ./data folder.
The request_id (the key to access the parsed data in the offer cache) is returned in the response as a field in a JSON body, together with the number of offers parsed. Example output:
{
"request_id": "581ec560-251e-4dbe-9e52-8f824bda5eb0",
"num_offers": "15"
}
Error code 400 is returned if there is an error in the parsing procedure, code 500 if the request fails for any other reason.
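On the client side, the response could be handled along these lines. This is a sketch: the names ExtractionError and interpret_response are hypothetical, and treating 200 as the only success status is an assumption. Note that num_offers is a string in the example output above, so it is converted to an integer here.

```python
import json


class ExtractionError(Exception):
    """Raised when the service reports a failure."""


def interpret_response(status, body):
    """Return (request_id, num_offers) or raise ExtractionError."""
    if status == 400:
        raise ExtractionError("error in the parsing procedure")
    if status == 500:
        raise ExtractionError("request failed for another reason")
    if status != 200:
        # Assumption: any other status is unexpected.
        raise ExtractionError(f"unexpected status {status}")
    payload = json.loads(body)
    return payload["request_id"], int(payload["num_offers"])


request_id, num_offers = interpret_response(
    200,
    '{"request_id": "581ec560-251e-4dbe-9e52-8f824bda5eb0", "num_offers": "15"}',
)
```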
The following parameters can be set in the configuration file trias_extractor_service.conf.
Section cache:
- host - host address of the cache service that should be accessed
- port - port number of the cache service that should be accessed
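A minimal sketch of reading these two parameters with the standard configparser module follows. It assumes the file uses INI syntax with a [cache] section; the sample values match the local Redis address mentioned in the deployment notes, and the function name read_cache_settings is hypothetical.

```python
import configparser

# Assumed INI layout of trias_extractor_service.conf.
SAMPLE_CONF = """
[cache]
host = localhost
port = 6379
"""


def read_cache_settings(text):
    """Return (host, port) from the cache section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.get("cache", "host"), parser.getint("cache", "port")


host, port = read_cache_settings(SAMPLE_CONF)
```

To read the file from disk instead of a string, parser.read("trias_extractor_service.conf") can replace the read_string call.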
The trias_extractor/config/codes.csv can be modified to configure the parsing procedure of the Attributes associated with the different TripLeg nodes and the offer item context associated with the different Ticket nodes (offer items). The file defines the admissible keys (key column), the expected range of the values (value_min and value_max columns for numeric datatypes) and the datatype (type column, admissible values are string, int, float, date) to execute a preliminary validation of the value parsed.
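The kind of preliminary validation described above can be sketched as follows. The sample rows, the function names load_codes and validate, and the choice of ISO 8601 for the date type are assumptions for illustration. The real keys and ranges are those in codes.csv, and the actual validation code may differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical rows using the column names described above.
SAMPLE_CODES = """key,value_min,value_max,type
seats,1,8,int
rating,0,5,float
driver_name,,,string
valid_until,,,date
"""

CASTS = {
    "string": str,
    "int": int,
    "float": float,
    # Assumption: dates are given as ISO 8601 strings.
    "date": datetime.fromisoformat,
}


def load_codes(text):
    """Index the rows of the codes file by their key column."""
    return {row["key"]: row for row in csv.DictReader(io.StringIO(text))}


def validate(codes, key, raw):
    """Return the typed value, or None if the key or value is not admissible."""
    rule = codes.get(key)
    if rule is None:
        return None
    try:
        value = CASTS[rule["type"]](raw)
    except ValueError:
        return None
    if rule["type"] in ("int", "float"):
        # Range checks apply only to numeric datatypes.
        if rule["value_min"] and value < float(rule["value_min"]):
            return None
        if rule["value_max"] and value > float(rule["value_max"]):
            return None
    return value


codes = load_codes(SAMPLE_CODES)
```

For instance, validate(codes, "seats", "4") yields the integer 4, while an out-of-range or non-numeric value for the same key yields None.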
Different alternatives are provided to deploy the trias-extractor service.
Running it locally (assuming Redis is running at localhost:6379)
$ python3 trias_extractor_service.py
* Serving Flask app "trias_extractor_service" (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: 441-842-797
Running on Docker (executes both the trias-extractor service and a Redis container)
$ docker-compose build
$ docker-compose up
For a production deployment, change the build section in the docker-compose file to use the Dockerfile.production configuration, which runs the Flask app on gunicorn, and remove the environment section.
$ docker-compose build
$ docker-compose up
Edit the Dockerfile.production file to set a different gunicorn configuration.
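One possible way to keep such settings together is a gunicorn configuration file, which gunicorn can load with its -c option. The sketch below is not part of the repository: the file name gunicorn.conf.py and all values are assumptions, and the gunicorn command in Dockerfile.production would have to be adapted to load it.

```python
# gunicorn.conf.py (hypothetical file name and values)
import multiprocessing

# Listen on all interfaces inside the container, on the port used above.
bind = "0.0.0.0:5000"

# Common rule of thumb for the number of worker processes.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds before an unresponsive worker is restarted.
timeout = 60
```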