This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Merge pull request #5 from cmu-db/zilongz
goal & architecture
Angela-CMU authored Jan 31, 2024
2 parents 6d1dcba + 0ef0d1a commit b6778f8
Showing 3 changed files with 19 additions and 3 deletions.
Binary file added doc/assets/iceberg-metadata.png
Binary file added doc/assets/system-architecture.png
22 changes: 19 additions & 3 deletions doc/design_doc.md
@@ -5,10 +5,26 @@
* Chien-Yu Liu ([email protected])

## Overview
>What is the goal of this project? What will this component achieve?
### Goal
The goal of this project is to design and implement a **Catalog Service** for an OLAP database system. The catalog manages metadata and provides a centralized repository for information about the structure and organization of data within the OLAP database. This project aims to produce a functional catalog that adheres to [the Iceberg catalog specification](https://iceberg.apache.org/spec/) and is exposed through a [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml).
## Architectural Design
>Explain the input and output of the component, describe interactions and break down the smaller components if any. Include diagrams if appropriate.

We follow the logical model shown below. The service receives input from the execution engine and the I/O service, and provides metadata to the planner and the scheduler.
![system architecture](./assets/system-architecture.png)
### Data Model
We adhere to the Iceberg data model: tables are organized into namespaces, and each table is uniquely identified by its name within its namespace. Our goal is to support multi-versioning, so that a query can read a table as of a point in time or at a specific historical version.

For every table in the catalog, there is an associated metadata file. This file contains a collection of manifests, each of which references the table's information at a different point in time. The manifest file is an in-memory, non-persistent component that is rebuilt from on-disk files when the service restarts. (If it is updated infrequently, we could instead dump it to disk on every update.)

To improve startup and recovery times, we periodically save the in-memory index to disk. On restart, the service can then restore from the dumped index data, which speeds up recovery.
![Catalog Data Model](./assets/iceberg-metadata.png)
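
The multi-versioning idea above can be illustrated with a minimal sketch. It is not the Iceberg metadata schema and not this project's implementation: `Snapshot`, `TableMetadata`, `commit`, and `as_of` are hypothetical names, and the sketch assumes each version is stamped with a strictly increasing millisecond timestamp.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One version of a table (hypothetical shape, not the Iceberg schema)."""
    timestamp_ms: int
    manifest_paths: Tuple[str, ...]


@dataclass
class TableMetadata:
    """Per-table metadata: an ordered history of snapshots."""
    name: str
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, timestamp_ms: int, manifest_paths: List[str]) -> Snapshot:
        # Assumption: commits arrive in strictly increasing timestamp order.
        if self.snapshots and timestamp_ms <= self.snapshots[-1].timestamp_ms:
            raise ValueError("snapshot timestamps must be strictly increasing")
        snapshot = Snapshot(timestamp_ms, tuple(manifest_paths))
        self.snapshots.append(snapshot)
        return snapshot

    def as_of(self, timestamp_ms: int) -> Optional[Snapshot]:
        """Return the latest snapshot at or before timestamp_ms, if any."""
        times = [s.timestamp_ms for s in self.snapshots]
        idx = bisect_right(times, timestamp_ms)
        return self.snapshots[idx - 1] if idx else None


orders = TableMetadata("orders")
orders.commit(100, ["manifest-1"])
orders.commit(200, ["manifest-1", "manifest-2"])
```

With this history, `orders.as_of(150)` resolves to the first snapshot and `orders.as_of(50)` returns `None`, because no version existed yet. Keeping the snapshots sorted by timestamp lets a point-in-time lookup use binary search instead of a linear scan.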

### Use Cases
#### Namespace
Create, update, or delete a namespace.
#### Table
Create, update, or delete a table.
#### Query Table’s Metadata
Get metadata by `{namespace}/{table}`.
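
The use cases above can be sketched as a small in-memory catalog. This is an illustration under stated assumptions, not the service's actual interface or the Iceberg REST API: the class `InMemoryCatalog` and all of its method names are hypothetical, table metadata is modeled as a plain dictionary, and the lookup key is assumed to be a `{namespace}/{table}` string whose last segment is the table name.

```python
from typing import Dict, Optional, Tuple


class InMemoryCatalog:
    """Hypothetical in-memory catalog; every name here is illustrative."""

    def __init__(self) -> None:
        # namespace -> namespace properties
        self._namespaces: Dict[str, Dict[str, str]] = {}
        # (namespace, table) -> table metadata
        self._tables: Dict[Tuple[str, str], dict] = {}

    def create_namespace(self, namespace: str,
                         properties: Optional[Dict[str, str]] = None) -> None:
        if namespace in self._namespaces:
            raise ValueError(f"namespace already exists: {namespace}")
        self._namespaces[namespace] = dict(properties or {})

    def update_namespace(self, namespace: str, properties: Dict[str, str]) -> None:
        self._require_namespace(namespace)
        self._namespaces[namespace].update(properties)

    def delete_namespace(self, namespace: str) -> None:
        self._require_namespace(namespace)
        # Assumption: a namespace must be empty before it can be deleted.
        if any(ns == namespace for ns, _ in self._tables):
            raise ValueError(f"namespace is not empty: {namespace}")
        del self._namespaces[namespace]

    def create_table(self, namespace: str, table: str, metadata: dict) -> None:
        self._require_namespace(namespace)
        if (namespace, table) in self._tables:
            raise ValueError(f"table already exists: {namespace}/{table}")
        self._tables[(namespace, table)] = dict(metadata)

    def update_table(self, namespace: str, table: str, changes: dict) -> None:
        self._require_table(namespace, table)
        self._tables[(namespace, table)].update(changes)

    def delete_table(self, namespace: str, table: str) -> None:
        self._require_table(namespace, table)
        del self._tables[(namespace, table)]

    def get_metadata(self, path: str) -> dict:
        """Look up table metadata by a '{namespace}/{table}' path."""
        namespace, sep, table = path.rpartition("/")
        if not sep or not namespace or not table:
            raise ValueError(f"expected '{{namespace}}/{{table}}', got: {path}")
        self._require_table(namespace, table)
        # Return a copy so callers cannot mutate catalog state directly.
        return dict(self._tables[(namespace, table)])

    def _require_namespace(self, namespace: str) -> None:
        if namespace not in self._namespaces:
            raise KeyError(namespace)

    def _require_table(self, namespace: str, table: str) -> None:
        if (namespace, table) not in self._tables:
            raise KeyError(f"{namespace}/{table}")


catalog = InMemoryCatalog()
catalog.create_namespace("sales")
catalog.create_table("sales", "orders", {"schema_version": 1})
catalog.update_table("sales", "orders", {"schema_version": 2})
metadata = catalog.get_metadata("sales/orders")
```

A real service would expose these operations over HTTP as described by the REST specification linked in the Goal section; the sketch only shows which state each use case reads or changes.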

## Design Rationale
* Correctness:
