This repository has been archived by the owner on Jun 6, 2024. It is now read-only.
Merge pull request #5 from cmu-db/zilongz
goal & architecture
Showing 3 changed files with 19 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,10 +5,26 @@ | |
* Chien-Yu Liu ([email protected]) | ||
|
||
## Overview | ||
>What is the goal of this project? What will this component achieve? | ||
### Goal | ||
The goal of this project is to design and implement a **Catalog Service** for an OLAP database system. The catalog manages metadata and provides a centralized repository for information about the structure and organization of data within the OLAP database. This project aims to produce a functional catalog that adheres to [the Iceberg catalog specification](https://iceberg.apache.org/spec/) and is exposed through the [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml). | ||
## Architectural Design | ||
>Explain the input and output of the component, describe interactions, and break down the smaller components, if any. Include diagrams if appropriate. | ||
We follow the logical model shown below. Input to our service comes from the execution engine and the I/O service, and the service provides metadata to the planner and the scheduler. | ||
![system architecture](./assets/system-architecture.png) | ||
### Data Model | ||
We adhere to the Iceberg data model: tables are organized into namespaces, and each table is uniquely identified by its name within its namespace. Our goal is to support multi-versioning, so that a query can target either a point in time or a specific historical version of a table. | ||
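The multi-versioning goal above can be illustrated with a minimal sketch. It assumes each table keeps an append-only list of versions ordered by commit timestamp; the names `TableVersion`, `Table`, `commit`, `as_of`, and `at_version` are hypothetical and are not taken from the Iceberg specification or from this project's code.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class TableVersion:
    version_id: int    # hypothetical: 1-based, assigned in commit order
    timestamp_ms: int  # commit time in milliseconds
    schema: dict       # stand-in for the table metadata at this version


@dataclass
class Table:
    name: str
    versions: list = field(default_factory=list)  # ordered by timestamp_ms

    def commit(self, timestamp_ms, schema):
        """Append a new version; commits must arrive in timestamp order."""
        if self.versions and timestamp_ms < self.versions[-1].timestamp_ms:
            raise ValueError("versions must be committed in timestamp order")
        version = TableVersion(len(self.versions) + 1, timestamp_ms, schema)
        self.versions.append(version)
        return version

    def as_of(self, timestamp_ms):
        """Point-in-time lookup: latest version committed at or before timestamp_ms."""
        stamps = [v.timestamp_ms for v in self.versions]
        idx = bisect_right(stamps, timestamp_ms)
        return self.versions[idx - 1] if idx else None

    def at_version(self, version_id):
        """Lookup by explicit historical version id, or None if it does not exist."""
        if 1 <= version_id <= len(self.versions):
            return self.versions[version_id - 1]
        return None


# Assumed layout: namespace -> table name -> Table
catalog = {"sales": {"orders": Table("orders")}}
orders = catalog["sales"]["orders"]
orders.commit(1_000, {"id": "long"})
orders.commit(2_000, {"id": "long", "amount": "double"})
```

Keeping versions sorted by timestamp lets a point-in-time query use binary search, while a query for a specific version is a direct index into the list.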
|
||
For every table in the catalog, there is an associated metadata file. This file contains a collection of manifests, each of which references the table's state at a different point in time. The manifest file is an in-memory, non-persistent component that is rebuilt from on-disk files when the service restarts. (If it is updated infrequently, we could instead dump it to disk on every update.) | ||
|
||
To shorten startup and recovery times, we periodically save the in-memory index to disk. On restart, the service can restore from the dumped index data instead of rebuilding the index from scratch. | ||
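One way to make such a dump safe against crashes is to write to a temporary file and rename it into place. The sketch below assumes the index is JSON-serializable; `dump_index` and `load_index` are hypothetical helper names, and the scheduling of the periodic dump (for example a background timer) is left out.

```python
import json
import os
import tempfile


def dump_index(index, path):
    """Write the index to `path` atomically so a crash never leaves a partial dump."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(index, f)
            f.flush()
            os.fsync(f.fileno())    # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic rename over any previous dump
    except BaseException:
        os.unlink(tmp_path)         # clean up the temporary file on failure
        raise


def load_index(path):
    """Return the dumped index, or None if no dump exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

On startup, the service could call `load_index` first and fall back to rebuilding from the on-disk table files only when it returns `None`.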
![Catalog Data Model](./assets/iceberg-metadata.png) | ||
|
||
### Use Cases | ||
#### Namespace | ||
Create, update, or delete a namespace. | ||
#### Table | ||
Create, update, or delete a table. | ||
#### Query Table’s Metadata | ||
Get metadata by {namespace}/{table}. | ||
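The use cases above can be sketched as a small in-memory catalog. This is an illustration only: the `Catalog` class and its method names are hypothetical, they do not mirror the Iceberg REST API, and refusing to delete a non-empty namespace is an assumption rather than a stated requirement.

```python
class Catalog:
    """Hypothetical in-memory catalog covering the namespace and table use cases."""

    def __init__(self):
        # namespace -> {"properties": dict, "tables": {table name -> metadata dict}}
        self._namespaces = {}

    def create_namespace(self, namespace, properties=None):
        if namespace in self._namespaces:
            raise KeyError(f"namespace already exists: {namespace}")
        self._namespaces[namespace] = {
            "properties": dict(properties or {}),
            "tables": {},
        }

    def update_namespace(self, namespace, properties):
        self._namespaces[namespace]["properties"].update(properties)

    def delete_namespace(self, namespace):
        # Assumption: a namespace must be empty before it can be deleted.
        if self._namespaces[namespace]["tables"]:
            raise ValueError(f"namespace is not empty: {namespace}")
        del self._namespaces[namespace]

    def create_table(self, namespace, table, metadata):
        tables = self._namespaces[namespace]["tables"]
        if table in tables:
            raise KeyError(f"table already exists: {namespace}/{table}")
        tables[table] = dict(metadata)

    def update_table(self, namespace, table, metadata):
        self._namespaces[namespace]["tables"][table].update(metadata)

    def delete_table(self, namespace, table):
        del self._namespaces[namespace]["tables"][table]

    def get_metadata(self, path):
        """Look up table metadata by a "{namespace}/{table}" path."""
        namespace, table = path.split("/", 1)
        return self._namespaces[namespace]["tables"][table]


catalog = Catalog()
catalog.create_namespace("sales", {"owner": "analytics"})
catalog.create_table("sales", "orders", {"format-version": 2})
catalog.update_table("sales", "orders", {"location": "/warehouse/sales/orders"})
metadata = catalog.get_metadata("sales/orders")
```

A missing namespace or table surfaces as a `KeyError` here; a real service would translate that into an appropriate error response.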
|
||
## Design Rationale | ||
* Correctness: | ||
|