This is a guide outlining the process of conducting a toy time-series prediction using Flink and ML.
To begin the time-series prediction process in Flink, the first step involves sending simulated sensor data to RabbitMQ in JSON format. The JSON structure resembles: { "machine_id": 1, "time": 1, "sensor_id": 1, "sensor_val": 1, "state": "good", "sensor_id": 2, ... }.
Upon receiving the sensor data stream from RabbitMQ, the next step is to read and process the data. This is achieved by converting the JSON data into POJO format. Additionally, some simple feature transformations such as addition, mean calculation, variance computation, and quartile calculations (upper and lower quartiles) are performed on the data.
involved operators: addSource/flatMap/watermark/filter/window/keyBy/process
The data is further transformed into the required format for LightGBM (LGBM). The transformed data is then processed using a map operator, and LGBM's PMML (Predictive Model Markup Language) file is employed to make predictions.
involved operators: keyBy/process/map
In the final stage, the predicted results are compared with the original data stream, and the two streams are merged. The merged data is written to both Redis and InfluxDB databases for storage and analysis.
involved operators: coGroup/where/equalTo/window/apply/addSink