Update README.md
Dtenwolde authored Oct 6, 2023
1 parent c15fcf0 commit cd18f81
Showing 1 changed file with 70 additions and 87 deletions.
157 changes: 70 additions & 87 deletions README.md
@@ -1,15 +1,19 @@
# WIP Disclaimer
This template is currently work-in-progress. Feel free to play around with it and give us feedback. Note also that this template depends on a development version of DuckDB. Follow https://duckdb.org/news for more information on official launch.
This repository is currently a research project and a work in progress. Feel free to play around with it and give us feedback. NOTE: Currently, any query containing SQL/PGQ syntax must start with a `-`; otherwise it triggers a segmentation fault.

# DuckDB Extension Template
The main goal of this template is to allow users to easily develop, test and distribute their own DuckDB extension.
# DuckPGQ
A DuckDB extension for graph workloads that supports the SQL/PGQ standard.

## Getting started
First step to getting started is to create your own repo from this template by clicking `Use this template`. Then clone your new repository using
To get started, first clone this repository:
```sh
git clone --recurse-submodules https://github.com/<you>/<your-new-extension-repo>.git
git clone --recurse-submodules https://github.com/cwida/duckpgq-extension.git
```
Note that `--recurse-submodules` will ensure the correct version of duckdb is pulled allowing you to get started right away.
Note that `--recurse-submodules` ensures that the correct version of DuckDB is pulled, allowing you to get started right away.

This repository uses a modified version of DuckDB (currently a modified v0.8.1) and cannot yet be easily installed on top of a standard DuckDB installation (e.g. one from `pip install duckdb`).
If you want to use the SQL/PGQ syntax, you will have to build this repository from source.
In the future, we aim to provide an extension that is easy to install and load.

## Building
To build the extension:
@@ -24,110 +28,88 @@ The main binaries that will be built are:
```
- `duckdb` is the binary for the duckdb shell with the extension code automatically loaded.
- `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary.
- `<extension_name>.duckdb_extension` is the loadable binary as it would be distributed.
- `duckpgq.duckdb_extension` is the loadable binary as it would be distributed.

## Running the extension
To run the extension code, simply start the shell with `./build/release/duckdb`.

Now we can use the features from the extension directly in DuckDB. The template contains a single scalar function `duckpgq()` that takes a string argument and returns a string:
Now we can use the features from the extension directly in DuckDB. To verify that the extension works correctly you can run the following query:
```
D select duckpgq('Jane') as result;
┌───────────────┐
│    result     │
│    varchar    │
├───────────────┤
│ Quack Jane 🐥 │
└───────────────┘
D select duckpgq('Daniel');
┌───────────────────┐
│  duckpgq('Jane')  │
│      varchar      │
├───────────────────┤
│  Duckpgq Jane 🐥  │
└───────────────────┘
```

## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```
## SQL/PGQ
SQL/PGQ is a graph query language built on top of SQL, bringing graph pattern matching capabilities to existing SQL users as well as to new users who are interested in graph technology but who do not have an SQL background.

## Getting started with your own extension
After creating a repository from this template, the first step is to name your extension. To rename the extension, run:
```
python3 ./scripts/set_extension_name.py <extension_name_you_want>
```
Feel free to delete the script after this step.
SQL/PGQ is standardized by the International Organization for Standardization (ISO) and is supported by Oracle Database. It provides a declarative language for querying property graphs, a graph data model in which both nodes and edges can carry properties.
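
To make the property graph model concrete, here is a minimal Python sketch of vertices and edges that both carry a label and a set of properties. It only illustrates the data model: in DuckPGQ the vertices and edges live in ordinary tables, and the `Vertex` and `Edge` classes, the `outgoing` helper, and the sample values below are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int        # id of the vertex the edge starts at
    destination: int   # id of the vertex the edge points to
    label: str
    properties: dict = field(default_factory=dict)


# Hypothetical sample graph: two Person vertices and one Knows edge.
vertices = [
    Vertex(1, "Person", {"firstname": "Jan"}),
    Vertex(2, "Person", {"firstname": "Ali"}),
]
edges = [Edge(1, 2, "Knows", {"since": 2020})]


def outgoing(vertex_id, label):
    """Return the edges with the given label that start at vertex_id."""
    return [e for e in edges if e.source == vertex_id and e.label == label]


for edge in outgoing(1, "Knows"):
    print(edge.destination, edge.properties)
```

Both the vertex and the edge hold their own properties, which is what distinguishes a property graph from a plain adjacency list.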

Now you're good to go! After a (re)build, you should now be able to use your duckdb extension:
```
./build/release/duckdb
D select <extension_name_you_chose>('Jane') as result;
┌─────────────────────────────────────┐
│ result │
│ varchar │
├─────────────────────────────────────┤
│ <extension_name_you_chose> Jane 🐥 │
└─────────────────────────────────────┘
```

For inspiration/examples on how to extend DuckDB in a more meaningful way, check out the [test extensions](https://github.com/duckdb/duckdb/blob/master/test/extension),
the [in-tree extensions](https://github.com/duckdb/duckdb/tree/master/extension), and the [out-of-tree extensions](https://github.com/duckdblabs).
SQL/PGQ queries use a visual graph syntax that is similar to that of Cypher, another popular graph query language. However, SQL/PGQ also supports traditional SQL syntax, which makes it easy for SQL users to get started with graph querying.

## Distributing your extension
Easy distribution of extensions built with this template is facilitated using a similar process used by DuckDB itself.
Binaries are generated for various versions/platforms allowing duckdb to automatically install the correct binary.
SQL/PGQ can be used to query property graphs for a variety of purposes, including:

This step requires that you pass the following 4 parameters to your GitHub repo as action secrets:
- Finding paths between nodes
- Finding nodes that match certain criteria
- Finding the shortest path between two nodes
- Finding the most connected nodes in a graph
- Finding the most influential nodes in a graph

SQL/PGQ is a powerful tool for querying graph data, and it is becoming increasingly popular as graph databases become more widely adopted.

| secret name | description |
| ------------- | ----------------------------------- |
| S3_REGION | s3 region holding your bucket |
| S3_BUCKET | the name of the bucket to deploy to |
| S3_DEPLOY_ID | the S3 key id |
| S3_DEPLOY_KEY | the S3 key secret |

After setting these variables, all pushes to master will trigger a new (dev) release. Note that your AWS token should
have full permissions to the bucket, and you will need to have ACLs enabled.

### Installing the deployed binaries
To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the
`allow_unsigned_extensions` option set to true. How to set this will depend on the client you're using. Some examples:

CLI:
```shell
duckdb -unsigned
## Example
First, load some data into DuckDB. The following statement loads the LDBC Social Network Benchmark dataset:
```sql
import database 'duckdb-pgq/data/SNB0.003';
```

Python:
```python
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})
The first step of SQL/PGQ is to register a property graph:
```sql
-CREATE PROPERTY GRAPH snb
VERTEX TABLES (
    Person LABEL Person
)
EDGE TABLES (
    person_knows_person SOURCE KEY ( person1id ) REFERENCES Person ( id )
                        DESTINATION KEY ( person2id ) REFERENCES Person ( id )
                        LABEL Knows
);
```

Here is a simple example of a SQL/PGQ query:
```sql
-SELECT *
FROM GRAPH_TABLE (snb
    MATCH (p:Person)-[k:knows]->(f:Person)
    WHERE p.firstname = 'Jan'
    COLUMNS(f.firstname)
) x;
```

NodeJS:
```js
db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
```
This query will find all of Jan's friends and return their first names.
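
For readers coming from plain SQL, the pattern above can be read as a join over the tables registered in the property graph: the edge table links a source `Person` row to a destination `Person` row. The following is a minimal sketch of that correspondence, not a description of how DuckPGQ evaluates the query internally. It uses Python's built-in `sqlite3` module so that it runs without building the extension; the sample rows and the `friends_of` helper are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical sample data; the real tables come from the LDBC SNB dataset.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Person (id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE person_knows_person (person1id INTEGER, person2id INTEGER);
    INSERT INTO Person VALUES (1, 'Jan'), (2, 'Ali'), (3, 'Eva');
    INSERT INTO person_knows_person VALUES (1, 2), (1, 3), (2, 3);
""")


def friends_of(firstname):
    """Relational reading of (p:Person)-[k:knows]->(f:Person)."""
    rows = con.execute(
        """
        SELECT f.firstname
        FROM Person p
        JOIN person_knows_person k ON k.person1id = p.id  -- SOURCE KEY
        JOIN Person f ON k.person2id = f.id               -- DESTINATION KEY
        WHERE p.firstname = ?
        ORDER BY f.firstname  -- added only to make the output deterministic
        """,
        (firstname,),
    ).fetchall()
    return [row[0] for row in rows]


print(friends_of("Jan"))
```

The `MATCH` form states the same relationship as a single path pattern, which tends to stay readable as patterns grow longer, whereas the join form needs one extra pair of joins per additional hop.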

Secondly, you will need to set the repository endpoint in DuckDB to the HTTP url of your bucket + version of the extension
you want to install. To do this run the following SQL query in DuckDB:
```sql
SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com/<your_extension_name>/latest';
```
Note that the `/latest` path will allow you to install the latest extension version available for your current version of
DuckDB. To specify a specific version, you can pass the version instead.
SQL/PGQ is a complex language, but there are resources available to help users get started.
- [DuckPGQ: Efficient Property Graph Queries in an analytical RDBMS](https://www.cidrdb.org/cidr2023/papers/p66-wolde.pdf)

After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:
```sql
INSTALL <your_extension_name>
LOAD <your_extension_name>
```
## Contributing
Want to contribute to the project? Great! Please refer to DuckDB's own [development](https://github.com/duckdb/duckdb#development) and [contribution](https://github.com/duckdb/duckdb/blob/main/CONTRIBUTING.md) guides, which we tend to follow, to see how you can help us out. If you are unsure, do not hesitate to reach out.

### Versioning of your extension
Extension binaries will only work for the specific DuckDB version they were built for. Since you may want to support multiple
versions of DuckDB for a release of your extension, you can specify which versions to build for in the CI of this template.
By default, the CI will build your extension against the version of the DuckDB submodule, which should generally be the most
recent version of DuckDB. To build for multiple versions of DuckDB, simply add the version to the matrix variable, e.g.:
For development, you generally want to build using debug mode:
```sh
make debug
```
strategy:
matrix:
duckdb_version: [ '<submodule_version>', 'v0.7.0']

## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```


## Setting up CLion

### Opening project
@@ -139,3 +121,4 @@ Now to fix your project path go to `tools->CMake->Change Project Root`([docs](ht
To set up debugging in CLion, there are two simple steps required. Firstly, in `CLion -> Settings / Preferences -> Build, Execution, Deployment -> CMake` you will need to add the desired builds (e.g. Debug, Release, RelDebug, etc). There are different ways to configure this, but the easiest is to leave all fields empty, except the `build path`, which needs to be set to `../build/{build type}`. Now on a clean repository you will first need to run `make {build type}` to initialize the CMake build directory. After running make, you will be able to (re)build from CLion by using the build target we just created.

The second step is to configure the unittest runner as a run/debug configuration. To do this, go to `Run -> Edit Configurations` and click `+ -> CMake Application`. The target and executable should be `unittest`. This will run all the DuckDB tests. To run only the extension-specific tests, add `--test-dir ../../.. [sql]` to the `Program Arguments`. Note that it is recommended to use the `unittest` executable for testing/development within CLion. The actual DuckDB CLI currently does not reliably work as a run target in CLion.
