An Open Lakehouse Format for Big Data Analytics, ML & AI
The TrinityLake format defines the objects in a Lakehouse and provides a consistent and efficient way to access and manipulate these objects. It offers the following key features:
- Multi-object multi-statement transactions with standard SQL BEGIN and COMMIT semantics
- Consistent time travel and snapshot export across all objects in the Lakehouse
- Storage only as a Lakehouse solution that works exactly the same way locally, on premise and in the cloud
- Compatibility with open table formats like Apache Iceberg, supporting both standard SQL MANAGED and EXTERNAL as well as federation-based access patterns
- Compatibility with open catalog standards like Apache Iceberg REST Catalog specification, serving as a highly scalable yet extremely lightweight backend implementation
For more details about the format specification, and how to get started and use it with various open engines such as Apache Spark, please visit trinitylake.io.
This project is still at an early stage of development. If you are interested in developing it with us, we mainly use Slack (click for invite link) for communication. We also use GitHub Issues and GitHub Discussions for discussion.
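The project website is built using the mkdocs-material framework with a few other plugins.
For first-time setup, create a virtual environment and install the dependencies:
python3 -m venv env
source env/bin/activate
pip install mkdocs-material
pip install mkdocs-awesome-pages-plugin
To serve the website locally, activate the environment and start the development server:
source env/bin/activate
mkdocs serve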
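If you rebuild the website often, the commands above can be wrapped in two small shell functions. This is only a convenience sketch, not part of the project: the function names `setup_docs_env` and `serve_docs` are hypothetical, and it assumes you run them from the directory that holds the `env` virtual environment and the mkdocs configuration.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions (not shipped with the project).

setup_docs_env() {
  # Create the virtual environment only if it does not exist yet
  [ -d env ] || python3 -m venv env
  # Activate it and install the site generator plus the plugin
  . env/bin/activate
  pip install mkdocs-material mkdocs-awesome-pages-plugin
}

serve_docs() {
  # Activate the existing environment and start the local preview server
  . env/bin/activate
  mkdocs serve
}
```

After sourcing the file, run `setup_docs_env` once, then `serve_docs` whenever you want a local preview.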