Docs/v51 #6007

Merged
merged 23 commits into from
Nov 27, 2024
12 changes: 12 additions & 0 deletions docs/docs/build/connect/connect.md
@@ -78,6 +78,18 @@ To add a remote source using the UI, click "+" by Sources in the left hand navig

After import, you can reimport your data whenever you want by clicking the "refresh source" button in the Rill UI.

:::note Have a firewall set up?
You need to whitelist the following IP addresses to allow connections between Rill Cloud and your service behind the firewall.
```
35.196.245.100
34.74.117.37
35.196.153.31
34.75.22.143
34.148.167.51
35.237.60.193
```
:::

### Using code
When you add a source using the UI or CLI, a code definition will automatically be created as a `.yaml` file in your Rill project in the `sources` directory.

2 changes: 1 addition & 1 deletion docs/docs/build/credentials/credentials.md
@@ -14,7 +14,7 @@ At a high level, configuring credentials and credentials management in Rill can
## Setting credentials for Rill Developer

When reading from a source (or using a different OLAP engine), Rill will attempt to use existing credentials that have been configured on your machine.
1. Credentials that have been configured in your local environment via the CLI (for [AWS](../../reference/connectors/s3.md#local-credentials) / [Azure](../../reference/connectors/azure.md#local-credentials) / [Google Cloud](../../reference/connectors/gcs.md#local-credentials))
1. Credentials that have been configured in your local environment via the CLI (for [AWS](../../reference/connectors/s3.md#local-credentials) / [Azure](../../reference/connectors/azure.md#local-credentials) / [Google Cloud](../../reference/connectors/gcs#rill-developer-local-credentials))
2. Credentials that have been passed in directly through the connection string or DSN (typically for databases - see [Source YAML](../../reference/project-files/sources.md) and [Connector YAML](../../reference/project-files/connectors.md) for more details)
3. Credentials that have been passed in as a [variable](../../deploy/templating.md) when starting Rill Developer via `rill start --env key=value`
4. Credentials that have been specified in your *`<RILL_PROJECT_HOME>/.env`* file
2 changes: 1 addition & 1 deletion docs/docs/build/dashboards/_category_.yml
@@ -1,4 +1,4 @@
position: 40
label: Create Dashboards
label: Create Explore Dashboards
collapsible: true
collapsed: true
50 changes: 39 additions & 11 deletions docs/docs/build/dashboards/dashboards.md
@@ -1,11 +1,16 @@
---
title: Create Dashboards
title: Create Explore Dashboards
description: Create dashboards using source data and models with time, dimensions, and measures
sidebar_label: Create Dashboards
sidebar_label: Create Explore Dashboards
sidebar_position: 00
---
:::tip
Starting in version 0.50, metrics views have been separated from explore dashboards. This allows for a cleaner, more accessible metrics layer and the ability to build various dashboards and components on top of a single metrics view. For more information on what a metrics view is, see [What is a Metrics View?](/concepts/metrics-layer).

For migration steps, see [Migrations](/latest-changes/v50-dashboard-changes#how-to-migrate-your-current-dashboards).
:::

In Rill, dashboards are one of many components that access the metrics layer. Currently, it is only possible to create an explore dashboard but more features are on the way!
In Rill, explore dashboards are used to visually understand your data with real-time filtering based on your defined dimensions and measures in your metrics view. In the explore dashboard YAML, you can define which measures and dimensions are visible as well as define the default view when a user sees your dashboard.

![img](/img/build/dashboard/explore-dashboard.png)

@@ -17,18 +22,28 @@ When including dimensions and measures only the named resources will be included
Rill also supports the ability to exclude a set of named dimensions and measures.

```yaml
metrics_view: my_metrics_view

dimensions: [country, region, product_category] # Only these three dimensions will be included
measures:
  exclude: [profit] # All measures except profit will be included
type: explore

title: Title of your Explore Dashboard
description: a description for your explore dashboard
metrics_view: my_metricsview

dimensions: '*' #can use regex
measures: '*' #can use regex

time_ranges: #was available_time_ranges, the list of time ranges that can be selected in your dashboard
time_zones: #was available_time_zones, the list of time zones that can be selected in your dashboard

defaults: #define all the defaults here; was default_* in the previous dashboard YAML
  dimensions:
  measures:
  ...
security:
  access: #only dashboard access can be defined here; other security policies must be set on the metrics view
```
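For comparison, here is a minimal sketch that combines the new `type: explore` header with the include and exclude syntax described above. The title and `my_metrics_view` are placeholders, and the sketch assumes the list and `exclude` forms are still accepted in the explore YAML, as the surrounding text states.

```yaml
type: explore

title: Sales Overview # placeholder title
metrics_view: my_metrics_view # placeholder metrics view name

dimensions: [country, region, product_category] # only these three dimensions are shown
measures:
  exclude: [profit] # every measure except profit is shown
```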

:::tip
Starting in version 0.50, metrics view has been separated from dashboard. This allows for a cleaner, more accessible metrics layer and the ability to build various dashboards and components on top of a single metrics layer. For more information on why we decided to do this, please refer to the following: [Why separate the dashboard and metrics layer](/concepts/metrics-layer)

For migration steps, see [Migrations](/latest-changes/v50-dashboard-changes#how-to-migrate-your-current-dashboards).
:::


:::note Dashboard Properties
@@ -41,6 +56,19 @@ Once a dashboard is ready to preview, before [deploying to Rill Cloud](/deploy/d
![preview](/img/build/dashboard/preview-dashboard.png)


### Clickable Dimension Links
Adding the `uri` parameter to a dimension in your [metrics view](/build/metrics-view/) makes that dimension's values clickable links in the dashboard.

```yaml
dimensions:
  - label: Company Url
    column: Company URL
    uri: true #use true if the column already contains the URL; also accepts SQL expressions
```

![url-click](/img/build/dashboard/clickable-dimension.png)
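Because `uri` also accepts a SQL expression, a column that holds only part of a URL can still become a link by building the full URL in SQL. This is a sketch only: the `domain` column is hypothetical, and the `||` string concatenation assumes a DuckDB-style SQL dialect in your OLAP engine.

```yaml
dimensions:
  - label: Company Domain
    column: domain # hypothetical column holding values like "example.com"
    uri: "'https://' || domain" # hypothetical SQL expression that builds the full URL
```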


### Multi-editor and external IDE support

Rill Developer is meant to be developer friendly and has been built around the idea of keystroke-by-keystroke feedback when modeling your data, allowing live interactivity and a real-time feedback loop to iterate quickly (or make adjustments as necessary) with your models and dashboards. Additionally, Rill Developer has support for the concept of "hot reloading", which means that you can keep two windows of Rill open at the same time and/or use a preferred editor of choice, such as VSCode, side-by-side with the dashboard that you're actively developing!
4 changes: 4 additions & 0 deletions docs/docs/build/incremental-models/_category_.yml
@@ -0,0 +1,4 @@
position: 33
label: Create Advanced Models
collapsible: true
collapsed: true
227 changes: 227 additions & 0 deletions docs/docs/build/incremental-models/incremental-models.md
@@ -0,0 +1,227 @@
---
title: Create Advanced Models
description: C
sidebar_label: Create Advanced Models
sidebar_position: 00
---

Unlike SQL models, YAML models let you fine-tune a model with additional capabilities such as partitions and incremental modeling. This matters because it lets you refresh or load new data in increments, reducing downtime and the cost of ingestion.

:::note Take a look at the Reference!
If you are unsure which parameters are required, please review the [reference page for Advanced Models](/reference/project-files/advanced-models).
:::

## Types of Advanced Models
There are three types of advanced models: incremental models, partitioned models, and staging models.

1. [Incremental Models](#what-is-an-incremental-model)

2. [Partitioned Models](#what-are-partitions)

3. [Staging Models](staging.md)


## What is an Incremental Model?

Unlike [regular models](../models/models.md) that are created via a SQL file, incremental models are defined in a YAML file and are useful to:
- decrease cost of ingestion,
- decrease loading time of new data,
- *with partitions* allow the ability to refresh specific portions of data,
- and more!

Whether your data exists in cloud storage or in a data warehouse, Rill can ingest it incrementally based on the settings you define in your model file.

:::tip
Incremental modeling is under active development. We currently support the following; please reach out to us if you have specific requirements.

Snowflake --> ClickHouse via [Staging Model](staging.md)

S3 --> ClickHouse

Snowflake/Athena/Redshift/BigQuery --> DuckDB

S3/GCS/Azure --> DuckDB

:::

### Creating an Incremental Model

To enable incremental modeling, set `incremental: true` in your model YAML.
```yaml
type: model

sql: #some sql query from source_table
incremental: true
```
:::tip
Incremental models with neither `state` nor `partitions` defined will append data from the source table on every incremental refresh. This results in duplicate data and is not recommended.
:::
### Incremental Models with State defined

If your data is not [partitioned](#what-are-partitions), you can define the incremental model with a predefined `state` parameter.

```yaml
type: model
incremental: true

state:
  sql: SELECT MAX(date) as date FROM TABLE

sql: |
  SELECT * FROM TABLE
  {{ if incremental }} WHERE COL_DATE = TO_DATE( '{{ .state.date }}', 'YYYY-MM-DD') + INTERVAL '1 day' {{ end }}
```

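Once `state` is defined in an incremental model, its value can be used as a variable in your SQL statement. In the above example, the state returns the most recent `date` value from `TABLE`, and on an incremental run the WHERE clause selects the rows whose `COL_DATE` equals that date plus one day. When the model is not running incrementally, the `{{ if incremental }}` block is omitted and the full table is selected.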

:::tip
You can verify the current value of your state in the left hand panel under Incremental Processing.
:::
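If you want each incremental refresh to pick up every row newer than the stored state, rather than exactly one additional day, the same pattern can compare against the state value directly. This is a sketch only: `TABLE`, `updated_at`, and `max_updated_at` are hypothetical names, and it assumes `updated_at` only increases as new rows arrive.

```yaml
type: model
incremental: true

state:
  # hypothetical column; the alias is what becomes available as .state.max_updated_at
  sql: SELECT MAX(updated_at) AS max_updated_at FROM TABLE

sql: |
  SELECT * FROM TABLE
  {{ if incremental }} WHERE updated_at > '{{ .state.max_updated_at }}' {{ end }}
```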


For an example that uses partitions defined in DuckDB to select a range of days for a Snowflake query, writes the data to a temp-data folder in S3, and then loads it into ClickHouse (clearing temp-data once complete), see [Staging Models](staging.md).

### Refreshing an Incremental Model

When you are testing with incremental models in Rill Developer, you will notice a change in the refresh functionality. Instead of a full refresh, you are given the option for `incremental refresh`.

![img](/img/tutorials/302/now-incremental.png)

:::tip What's the difference?
Once increments are enabled on a model, you can refresh the model in increments instead of loading the full data each time. This is handy when your data is massive and re-ingesting it may take time. For a project in production, this means less downtime when you need to update your dashboards after the source data changes.

There are times when a full refresh may be required. In these cases, running a full refresh is equivalent to running a normal refresh with incremental disabled.
:::

Selecting an incremental refresh runs the following CLI command:

```bash
rill project refresh --local --model <model_name>
```

Keep in mind that selecting `Full refresh` restarts the ingestion of **all of your data** from scratch. Only use it when absolutely required. The CLI command for a full refresh is:

```bash
rill project refresh --local --model <model_name> --full
```

## What are Partitions?

In Rill, partitions are a special type of state that lets you explicitly split the model into parts. Depending on whether your data is in cloud storage or a data warehouse, you can use the `glob` or `sql` parameter.

You can manage partitions via the CLI using the `rill project partitions` command.
```bash
rill project partitions
List partitions for a model

Usage:
  rill project partitions [<project>] <model> [flags]

Flags:
      --project string     Project Name
      --path string        Project directory (default ".")
      --model string       Model Name
      --pending            Only fetch pending partitions
      --errored            Only fetch errored partitions
      --local              Target locally running Rill
      --page-size uint32   Number of partitions to return
```


### Defining a Partition in a Model
Under the `partitions:` parameter, you will define the pattern in which your data is stored.

### SQL
When defining your SQL, it is important to understand the data you are querying and to create a partition that makes sense. For example, you might select a distinct `customer_name` per partition, or partition chronologically, such as by month.

```yaml
partitions:
  sql: SELECT range AS num FROM range(0,10) #num is the partition variable and can be referenced as {{ .partition.num }}
  #sql: SELECT DISTINCT customer_name as cust_name from table #results in {{ .partition.cust_name }}
```

:::tip Using the SQL partition in the YAML
Depending on the column name of the partition, you can reference the partition using `{{ .partition.<column_name> }}` in the model's SQL query.
```yaml
partitions:
  sql: SELECT range AS num FROM range(0,10)
sql: SELECT {{ .partition.num }} AS num, now() AS inserted_on
```
:::
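Following the chronological idea above, a sketch that partitions by month might look like this. The `orders` table and `order_date` column are hypothetical, and the `strftime` calls assume DuckDB syntax; they format each month as a `YYYY-MM` string so the partition value can be compared as plain text.

```yaml
type: model
incremental: true

partitions:
  # hypothetical table/column; each distinct month becomes one partition, referenced as {{ .partition.month }}
  sql: SELECT DISTINCT strftime(order_date, '%Y-%m') AS month FROM orders

sql: SELECT * FROM orders WHERE strftime(order_date, '%Y-%m') = '{{ .partition.month }}'
```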

### glob

When defining the glob pattern, you will need to consider whether you'd partition the data by folder or file.
In the first example, we are partitioning by each file with the suffix `data.csv`.
```yaml
partitions:
  glob: gs://rendo-test/**/*data.csv
```

If you'd prefer to partition by folder, you can add the `partition` parameter and set it to `directory`.
```yaml
partitions:
  glob:
    path: gs://rendo-test/**/*data.csv
    partition: directory #hive
```
:::tip Using the glob partition in the YAML
The glob partition has a predefined `{{ .partition.uri }}` reference to use in the model's SQL query.
```yaml
partitions:
  glob:
    connector: gcs
    path: gs://path/to/file/**/*.parquet
sql: SELECT * FROM read_parquet('{{ .partition.uri }}')
```
:::

### Viewing Partitions in Rill Developer

Once `partitions:` is defined in your model, a new button, `View Partitions`, appears in the right-hand panel. Selecting it opens a UI that lists all of your partitions with more information on each.

![img](/img/tutorials/302/partitions-refresh-ui.png)

You can filter the view to `all partitions`, `pending partitions`, or `error partitions`. For any of these partitions, you can select 'Refresh Partition' to refresh it. (This is only available for incremental partitioned models.)
- all partitions will show all the available partitions in the model.
- pending partitions will show the partitions that are waiting to be processed.
- error partitions will display any partitions that errored during ingestion.


### Viewing Partitions in the CLI
As in the UI, you can view the partitions of a model from the CLI.

```
rill project partitions
List partitions for a model

Usage:
  rill project partitions [<project>] <model> [flags]

Flags:
      --project string      Project Name
      --path string         Project directory (default ".")
      --model string        Model Name
      --pending             Only fetch pending partitions
      --errored             Only fetch errored partitions
      --local               Target locally running Rill
      --page-size uint32    Number of partitions to return per page (default 50)
      --page-token string   Pagination token
```

If running locally, you will need to add the `--local` flag to the command.
```bash
rill project partitions model_name [--local]
KEY (10) DATA EXECUTED ON ELAPSED ERROR
---------------------------------- ----------- ---------------------- --------- -------
ff7416f774dfb086006d0b4696c214e1 {"num":0} 2024-11-12T22:48:49Z 95ms
...
```

:::note Incremental not enabled
If you try to refresh a partition with the following command on a model that is partitioned but not incremental, you will see the following error:
```
rill project refresh --model <model_name> [--local] --partition ff7416f774dfb086006d0b4696c214e1
Error: can't refresh partitions on model "model_name" because it is not incremental
```
:::
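To refresh individual partitions, the model needs to be both partitioned and incremental. A minimal sketch that reuses the glob example from earlier, with the bucket path as a placeholder:

```yaml
type: model
incremental: true # enables incremental processing so individual partitions can be refreshed

partitions:
  glob:
    connector: gcs
    path: gs://path/to/file/**/*.parquet # placeholder path

sql: SELECT * FROM read_parquet('{{ .partition.uri }}')
```

With `incremental: true` set, the `--partition` refresh shown above should no longer fail with the "not incremental" error.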
