Gathers data science and machine learning problem-solving examples using PyFlink.
- Simple Word Count to HDFS, 1.tableapi-word-count-hdfs.ipynb.
- Simple Table API to do Word Count and sink into Parquet format in HDFS.
- Simple Word Count to PostgreSQL, 2.tableapi-word-count-postgres.ipynb.
- Simple Table API to do Word Count and sink into PostgreSQL using JDBC.
- Simple Word Count to Kafka, 3.tableapi-word-count-kafka.ipynb.
- Simple Table API to do Word Count and sink into Kafka.
- Simple text classification to HDFS, 4.tableapi-malay-sentiment-classifier-hdfs.ipynb.
- Load trained text classification model using UDF to classify sentiment and sink into Parquet format in HDFS.
- Simple text classification to PostgreSQL, tableapi-malay-sentiment-classifier-postgres.ipynb.
- Load trained text classification model using UDF to classify sentiment and sink into PostgreSQL.
- Simple text classification to Kafka, tableapi-malay-sentiment-classifier-kafka.ipynb.
- Load trained text classification model using UDF to classify sentiment and sink into Kafka.
- Simple real-time text classification upsert to PostgreSQL, tableapi-malay-sentiment-classifier-kafka-upsert-postgres.ipynb.
- Simple real-time text classification from Debezium CDC, upserted into PostgreSQL.
- Simple real-time text classification upsert to Kafka, tableapi-malay-sentiment-classifier-kafka-upsert-kafka.ipynb.
- Simple real-time text classification from Debezium CDC, upserted into Kafka using the upsert-kafka connector.
- Simple Word Count to Apache Hudi, tableapi-word-count-hudi.ipynb.
- Simple Table API to do Word Count and sink into Apache Hudi in HDFS.
- Simple text classification to Apache Hudi, tableapi-malay-sentiment-classifer-hudi.ipynb.
- Load trained text classification model using UDF to classify sentiment and sink into Apache Hudi in HDFS.
- Simple real-time text classification upsert to Apache Hudi, tableapi-malay-sentiment-classifer-kafka-upsert-hudi.ipynb.
- Simple real-time text classification from Debezium CDC, upserted into Apache Hudi in HDFS.
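The Table API notebooks above all share the same shape: declare a source, declare a sink, and run an INSERT with an optional Python UDF in between. A minimal sketch of the word-count-to-Parquet pipeline, assuming pyflink is installed; the connector options, paths, and HDFS address below are illustrative, not the notebooks' exact configuration:

```python
# Sketch only: table names and paths are illustrative.
SOURCE_DDL = """
CREATE TABLE lines (line STRING) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://hdfs:9000/user/input',
  'format' = 'raw'
)
"""

SINK_DDL = """
CREATE TABLE word_count (word STRING, cnt BIGINT) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs://hdfs:9000/user/output',
  'format' = 'parquet'
)
"""


def run():
    # Imported lazily so the sketch is readable without a Flink install.
    from pyflink.common import Row
    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col, lit
    from pyflink.table.udf import udtf

    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
    t_env.execute_sql(SOURCE_DDL)
    t_env.execute_sql(SINK_DDL)

    # The classifier notebooks swap this tokenizer for a UDF that loads
    # the trained model and returns a sentiment label instead.
    @udtf(result_types=[DataTypes.STRING()])
    def split(row):
        for word in row[0].split():
            yield Row(word)

    (t_env.from_path('lines')
          .flat_map(split).alias('word')
          .group_by(col('word'))
          .select(col('word'), lit(1).count.alias('cnt'))
          .execute_insert('word_count')
          .wait())


# run()  # requires pyflink plus a reachable Flink cluster and HDFS
```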
- Run HDFS and PostgreSQL for Hive Metastore,

```bash
docker container rm -f hdfs postgres
docker-compose -f misc.yaml build
docker-compose -f misc.yaml up
```
- Create the Hive metastore schema in PostgreSQL,

```bash
PGPASSWORD=postgres docker exec -it postgres psql -U postgres -d postgres -c "$(cat hive-schema-3.1.0.postgres.sql)"
```
- Build the image,

```bash
docker build -t flink flink
```
- Run docker compose,

```bash
docker-compose up
```

Feel free to scale up the workers,

```bash
docker-compose scale taskmanager=2
```

To access the Flink SQL CLI,

```bash
docker exec -it flink /opt/flink/bin/sql-client.sh
```
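From inside the CLI, tables the notebooks registered in the Hive metastore can be queried with plain Flink SQL. A hypothetical session; the catalog and table names depend on what you created:

```sql
-- Catalog and table names are illustrative.
SHOW CATALOGS;
USE CATALOG myhive;
SHOW TABLES;
SELECT word, cnt FROM word_count ORDER BY cnt DESC LIMIT 10;
```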
- Run Kafka and Debezium for PostgreSQL CDC,

```bash
docker container rm -f debezium broker
docker-compose -f kafka.yaml up
docker exec postgresql bash -c \
'PGPASSWORD=postgres psql -d postgres -U postgres -c "$(cat /bitnami/postgresql/conf/table.sql)"'
```
- Register the PostgreSQL CDC connector with Debezium,

```bash
curl --location --request POST http://localhost:8083/connectors/ \
--header 'Content-Type: application/json' \
--data-raw '{
  "name": "employee-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "plugin.name": "pgoutput",
    "database.hostname": "postgresql",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "postgres",
    "database.server.name": "employee",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "table.whitelist": "public.employee,public.salary",
    "tombstones.on.delete": "false",
    "decimal.handling.mode": "double"
  }
}'
```
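Once registered, the connector can be checked through the Kafka Connect REST API, and the change events inspected on the topic Debezium creates. A sketch only: the console-consumer script name and broker address depend on the Kafka image in kafka.yaml,

```bash
# The connector and its task should both report state RUNNING.
curl -s http://localhost:8083/connectors/employee-connector/status

# Debezium names topics <database.server.name>.<schema>.<table>, so
# change events for public.employee land in employee.public.employee.
docker exec broker kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic employee.public.employee \
  --from-beginning --max-messages 5
```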