This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

goal & architecture #5

Merged 1 commit on Jan 31, 2024.
Binary file added doc/assets/iceberg-metadata.png
Binary file added doc/assets/system-architecture.png
22 changes: 19 additions & 3 deletions doc/design_doc.md
@@ -5,10 +5,26 @@
* Chien-Yu Liu ([email protected])

## Overview
>What is the goal of this project? What will this component achieve?

### Goal
The goal of this project is to design and implement a **Catalog Service** for an OLAP database system. The catalog manages metadata and provides a centralized repository for information about the structure and organization of data within the OLAP database. The project aims to produce a functional catalog that adheres to [the Iceberg catalog specification](https://iceberg.apache.org/spec/) and is exposed through a [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).

## Architectural Design
>Explain the input and output of the component, describe interactions and breakdown the smaller components if any. Include diagrams if appropriate.

We follow the logical model shown below. The service takes its input from the execution engine and the I/O service, and it provides metadata to the planner and the scheduler.

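
In Iceberg-style catalogs, a writer does not overwrite table metadata in place: it writes a new metadata file and then asks the catalog to atomically swap the table's pointer to it, and the swap fails if another writer committed first. The sketch below assumes that the inputs from the execution engine and the I/O service arrive as such commits; all names (`InMemoryCatalog`, `commit`, `load`, `CommitConflict`) are hypothetical and only illustrate the two sides of the interface.

```python
import threading
from typing import Dict, Optional, Tuple

# Hypothetical table identifier: (namespace, table name).
TableIdent = Tuple[str, str]


class CommitConflict(Exception):
    """Raised when a writer's view of the table's current metadata is stale."""


class InMemoryCatalog:
    """Tracks, per table, the location of its current metadata file."""

    def __init__(self) -> None:
        self._pointers: Dict[TableIdent, str] = {}
        self._lock = threading.Lock()

    def commit(self, ident: TableIdent, new_location: str,
               expected_location: Optional[str]) -> None:
        """Writer side (execution engine / I/O service): swap the pointer to
        new_location only if it still equals expected_location (None = new table)."""
        with self._lock:
            current = self._pointers.get(ident)
            if current != expected_location:
                raise CommitConflict(
                    f"{ident}: expected {expected_location!r}, found {current!r}")
            self._pointers[ident] = new_location

    def load(self, ident: TableIdent) -> str:
        """Reader side (planner / scheduler): current metadata location.
        Raises KeyError if the table does not exist."""
        with self._lock:
            return self._pointers[ident]
```

A writer that loses the race would reload the current metadata, reapply its change, and retry the commit; readers never observe a half-written update because the pointer only moves once the new file is complete.
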
![system architecture](./assets/system-architecture.png)
### Data Model
We adhere to the Iceberg data model: tables are organized into namespaces, and each table is uniquely identified by its name within its namespace. We also aim to support multi-versioning, so that a query can read a table as it was at a given point in time or at a specific historical version.
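
The two kinds of historical reads can be served from a table's snapshot list. A minimal sketch, assuming each snapshot carries an id, a commit timestamp, and a manifest-list location (as Iceberg snapshots do through `snapshot-id`, `timestamp-ms`, and `manifest-list`); the classes `Snapshot` and `TableVersions` are hypothetical. A version query is an exact lookup by id, and a point-in-time query is a binary search for the latest snapshot committed at or before the requested time.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int   # commit time, milliseconds since epoch
    manifest_list: str  # location of this snapshot's manifest list


class TableVersions:
    """Snapshots of one table, appended in commit order."""

    def __init__(self) -> None:
        self._snapshots: List[Snapshot] = []
        self._timestamps: List[int] = []  # kept sorted, parallel to _snapshots
        self._by_id: Dict[int, Snapshot] = {}

    def add(self, snap: Snapshot) -> None:
        # Commits arrive in time order, so the timestamp list stays sorted.
        if self._timestamps and snap.timestamp_ms < self._timestamps[-1]:
            raise ValueError("snapshots must be added in timestamp order")
        self._snapshots.append(snap)
        self._timestamps.append(snap.timestamp_ms)
        self._by_id[snap.snapshot_id] = snap

    def by_id(self, snapshot_id: int) -> Optional[Snapshot]:
        # Version query: exact lookup by snapshot id.
        return self._by_id.get(snapshot_id)

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        # Point-in-time query: latest snapshot committed at or before timestamp_ms.
        i = bisect.bisect_right(self._timestamps, timestamp_ms)
        return self._snapshots[i - 1] if i > 0 else None
```
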

Every table in the catalog has an associated metadata file. This file tracks the table's snapshots, and each snapshot references (through a manifest list) the manifests that describe the table's contents at that point in time. In our design, the manifest file is an in-memory, non-persistent structure that is recreated from the on-disk files when the service restarts. (If it is updated infrequently, we could instead write it to disk on every update.)
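
The restart path can be sketched as a scan over the on-disk metadata files that rebuilds a table-to-current-metadata index. This assumes a hypothetical directory layout `<root>/<namespace>/<table>/metadata/v<N>.metadata.json`, where a higher `N` is a newer version; the layout and the function name `rebuild_index` are illustrative, not part of the design above.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical layout: <root>/<namespace>/<table>/metadata/v<N>.metadata.json
_METADATA_NAME = re.compile(r"^v(\d+)\.metadata\.json$")


def rebuild_index(root: Path) -> Dict[Tuple[str, str], Path]:
    """Scan on-disk metadata files and map each table to its newest version."""
    best: Dict[Tuple[str, str], Tuple[int, Path]] = {}
    for path in root.glob("*/*/metadata/*.metadata.json"):
        m = _METADATA_NAME.match(path.name)
        if m is None:
            continue  # ignore files that don't follow the naming scheme
        version = int(m.group(1))  # numeric, so v10 sorts after v2
        table_dir = path.parent.parent
        ident = (table_dir.parent.name, table_dir.name)
        if ident not in best or version > best[ident][0]:
            best[ident] = (version, path)
    return {ident: p for ident, (_, p) in best.items()}
```
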

To speed up startup and recovery, we periodically save the in-memory index to disk, so that a restarting service can restore from the dumped index instead of rebuilding it from scratch.
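
A minimal sketch of the periodic dump and the restore path, assuming the index maps `(namespace, table)` to a metadata location and is checkpointed as JSON; `dump_index` and `load_index` are hypothetical names. The dump writes to a temporary file and then renames it with `os.replace`, so a crash mid-write leaves the previous checkpoint intact; the restore falls back to a full rebuild when the checkpoint is missing or unreadable.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, Dict, Tuple

Index = Dict[Tuple[str, str], str]  # (namespace, table) -> metadata location


def dump_index(index: Index, checkpoint: Path) -> None:
    """Write the index to a temp file, then atomically rename it over the old
    checkpoint so readers never see a partially written file."""
    entries = [[ns, tbl, loc] for (ns, tbl), loc in sorted(index.items())]
    fd, tmp = tempfile.mkstemp(dir=checkpoint.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, checkpoint)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed
        raise


def load_index(checkpoint: Path, rebuild: Callable[[], Index]) -> Index:
    """Fast path: restore from the checkpoint. Slow path: rebuild from scratch."""
    try:
        with open(checkpoint) as f:
            return {(ns, tbl): loc for ns, tbl, loc in json.load(f)}
    except (FileNotFoundError, ValueError, TypeError):
        return rebuild()
```

Because the dump is periodic, a checkpoint can lag behind the most recent commits; a real service would still need to reconcile it against newer metadata files after restoring, which this sketch omits.
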
![Catalog Data Model](./assets/iceberg-metadata.png)

### Use Cases
#### Namespace
Create, update, and delete namespaces.
#### Table
Create, update, and delete tables.
#### Query Table’s Metadata
Get a table's metadata by `{namespace}/{table}`.
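
Since the catalog is exposed through the Iceberg REST catalog API linked in the Goal section, each use case corresponds to an endpoint of that spec. The sketch below lists that mapping (omitting the spec's optional `/v1/{prefix}/...` segment) together with a deliberately simplified, hypothetical router `resolve`; the use-case keys are illustrative names. Per the spec, the levels of a multi-level namespace are joined in the path with the `0x1F` unit separator, which the router splits back into a tuple.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import unquote

# Use case -> (HTTP method, path template) in the Iceberg REST catalog spec.
ROUTES = {
    "create_namespace": ("POST", "/v1/namespaces"),
    "update_namespace": ("POST", "/v1/namespaces/{namespace}/properties"),
    "delete_namespace": ("DELETE", "/v1/namespaces/{namespace}"),
    "create_table": ("POST", "/v1/namespaces/{namespace}/tables"),
    "update_table": ("POST", "/v1/namespaces/{namespace}/tables/{table}"),
    "delete_table": ("DELETE", "/v1/namespaces/{namespace}/tables/{table}"),
    "get_table_metadata": ("GET", "/v1/namespaces/{namespace}/tables/{table}"),
}


def resolve(method: str, path: str) -> Optional[Tuple[str, Dict[str, object]]]:
    """Match a request to a use case and extract its path parameters."""
    parts = path.strip("/").split("/")
    for use_case, (m, template) in ROUTES.items():
        tparts = template.strip("/").split("/")
        if m != method or len(tparts) != len(parts):
            continue
        params: Dict[str, object] = {}
        for t, p in zip(tparts, parts):
            if t.startswith("{") and t.endswith("}"):
                name = t[1:-1]
                value = unquote(p)
                # Multi-level namespaces are joined by the 0x1F unit separator.
                params[name] = tuple(value.split("\x1f")) if name == "namespace" else value
            elif t != p:
                break  # literal segment mismatch: try the next route
        else:
            return use_case, params
    return None
```
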

## Design Rationale
>Explain the goals of this design and how the design achieves these goals. Present alternatives considered and document why they are not chosen.