Introduce Arctic on Flink Lookup Join #1412
YesOrNo828
started this conversation in
Ideas
Replies: 1 comment
-
google docs link: https://docs.google.com/document/d/1euYt7rbNGjrr2JFw2vK_epuHIjZ9AJV99FXC75yXSfw/edit?usp=sharing |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Arctic on Flink Lookup Join
Introduction:
This proposal aims to outline the implementation of an Arctic on Flink Lookup Table Join that will enable fast and efficient retrieval of data from the Arctic Lakehouse Mixed format table. The current system uses Flink event time temporal join technology, leading to a slow and inefficient data initialization process. With the implementation of the Arctic on Flink Lookup Table Join, we will be able to improve the speed and efficiency of data retrieval, improving the system’s performance.
Background & Motivation:
The user encountered some problems while using the current lookup join scheme:
There are several advantages to using Lookup Join Syntax:
GOALS
NON-GOALS
Proposal:
We propose the implementation of the Arctic on Flink Lookup Table as a replacement for the existing indexing method. The new system will be implemented using Apache Flink, which is a distributed computing framework that can process large volumes of data in real-time. The Arctic on Flink Lookup Table will store data in the form of key-value pairs through JVM or RocksDB.
Reduce Initialization Time
The proposed methods for reducing initialization time are as follows:
Support the non-primary keys as join keys
We propose to support non-primary keys as join keys in the Arctic on Flink Lookup Table Join. This will increase the flexibility of querying data, allowing for a partial primary key or non-primary key comparisons. With this improvement, users will no longer be restricted to using only primary keys as join keys, expanding the use cases for the system.
KVTable
UniqueIndexTable
SecondaryIndexTable
RocksDBCacheState:
RocksDBRecordState:
RocksDBSetMemoryState:
BinaryRowDataSerializerWrapper:
MixedIncrementalLoader:
ArcticLookupFunction:
Test Plan
Each new feature would be covered by unit tests.
Reference:
[1]: Support MOR with Flink Engine in runtime batch mode
[2]: FLIP-204: Introduce Hash Lookup Join
[3]: optimizing-bulk-load-in-rocksdb
[4]: FLINK-29138
Beta Was this translation helpful? Give feedback.
All reactions