Skip to content

Commit

Permalink
docs: BigQuery data plane description (#182)
Browse files Browse the repository at this point in the history
* docs: BigQuery data plan description

* chore: DEPENDENCIES

* doc: added explanation about BigQuery named parameters and about use of single Part in the sink

* chore: DEPENDENCIES
  • Loading branch information
man8pr authored Jun 27, 2024
1 parent 7781326 commit 5fc8297
Show file tree
Hide file tree
Showing 2 changed files with 63 additions and 13 deletions.
29 changes: 16 additions & 13 deletions DEPENDENCIES
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ maven/mavencentral/com.apicatalog/copper-multibase/0.5.0, Apache-2.0, approved,
maven/mavencentral/com.apicatalog/copper-multicodec/0.1.1, Apache-2.0, approved, #14500
maven/mavencentral/com.apicatalog/iron-verifiable-credentials/0.14.0, Apache-2.0, approved, clearlydefined
maven/mavencentral/com.apicatalog/titanium-json-ld/1.0.0, Apache-2.0, approved, clearlydefined
maven/mavencentral/com.apicatalog/titanium-json-ld/1.4.0, Apache-2.0, approved, #13683
maven/mavencentral/com.apicatalog/titanium-json-ld/1.4.0, Apache-2.0, approved, #15200
maven/mavencentral/com.fasterxml.jackson.core/jackson-annotations/2.10.3, Apache-2.0, approved, CQ21280
maven/mavencentral/com.fasterxml.jackson.core/jackson-annotations/2.17.0, Apache-2.0, approved, #13672
maven/mavencentral/com.fasterxml.jackson.core/jackson-annotations/2.17.1, Apache-2.0, approved, #13672
Expand All @@ -17,7 +17,7 @@ maven/mavencentral/com.fasterxml.jackson.module/jackson-module-jakarta-xmlbind-a
maven/mavencentral/com.fasterxml.jackson.module/jackson-module-jakarta-xmlbind-annotations/2.17.1, Apache-2.0, approved, #13668
maven/mavencentral/com.fasterxml.jackson/jackson-bom/2.17.1, Apache-2.0, approved, #14162
maven/mavencentral/com.github.docker-java/docker-java-api/3.3.6, Apache-2.0, approved, #10346
maven/mavencentral/com.github.docker-java/docker-java-transport-zerodep/3.3.6, Apache-2.0 AND (Apache-2.0 AND BSD-3-Clause), approved, #7946
maven/mavencentral/com.github.docker-java/docker-java-transport-zerodep/3.3.6, Apache-2.0 AND (Apache-2.0 AND BSD-3-Clause), approved, #15251
maven/mavencentral/com.github.docker-java/docker-java-transport/3.3.6, Apache-2.0, approved, #7942
maven/mavencentral/com.google.android/annotations/4.1.1.4, Apache-2.0, approved, clearlydefined
maven/mavencentral/com.google.api-client/google-api-client/2.2.0, Apache-2.0, approved, clearlydefined
Expand Down Expand Up @@ -62,7 +62,7 @@ maven/mavencentral/com.google.cloud/google-cloud-secretmanager/2.42.0, Apache-2.
maven/mavencentral/com.google.cloud/google-cloud-storage/2.37.0, Apache-2.0, approved, clearlydefined
maven/mavencentral/com.google.cloud/google-iam-admin/3.37.0, Apache-2.0, approved, clearlydefined
maven/mavencentral/com.google.cloud/libraries-bom/26.38.0, , restricted, clearlydefined
maven/mavencentral/com.google.code.findbugs/jsr305/3.0.2, Apache-2.0, approved, #20
maven/mavencentral/com.google.code.findbugs/jsr305/3.0.2, CC-BY-2.5, approved, #15220
maven/mavencentral/com.google.code.gson/gson/2.10.1, Apache-2.0, approved, #6159
maven/mavencentral/com.google.code.gson/gson/2.8.9, Apache-2.0, approved, CQ23496
maven/mavencentral/com.google.crypto.tink/tink/1.13.0, Apache-2.0, approved, #14502
Expand Down Expand Up @@ -95,20 +95,20 @@ maven/mavencentral/com.google.protobuf/protobuf-java-util/3.25.3, BSD-3-Clause,
maven/mavencentral/com.google.protobuf/protobuf-java/3.25.1, BSD-3-Clause, approved, clearlydefined
maven/mavencentral/com.google.protobuf/protobuf-java/3.25.3, BSD-3-Clause, approved, clearlydefined
maven/mavencentral/com.google.re2j/re2j/1.7, BSD-3-Clause, approved, clearlydefined
maven/mavencentral/com.nimbusds/nimbus-jose-jwt/9.39.3, Apache-2.0, approved, #14830
maven/mavencentral/com.nimbusds/nimbus-jose-jwt/9.40, Apache-2.0, approved, #15156
maven/mavencentral/com.puppycrawl.tools/checkstyle/10.17.0, LGPL-2.1-or-later AND (Apache-2.0 AND LGPL-2.1-or-later) AND Apache-2.0, approved, #15077
maven/mavencentral/com.squareup.okhttp3/okhttp-dnsoverhttps/4.12.0, Apache-2.0, approved, #11159
maven/mavencentral/com.squareup.okhttp3/okhttp/4.12.0, Apache-2.0, approved, #11156
maven/mavencentral/com.squareup.okhttp3/okhttp/4.12.0, Apache-2.0, approved, #15227
maven/mavencentral/com.squareup.okhttp3/okhttp/4.9.3, Apache-2.0 AND MPL-2.0, approved, #3225
maven/mavencentral/com.squareup.okio/okio-jvm/3.6.0, Apache-2.0, approved, #11158
maven/mavencentral/com.squareup.okio/okio/3.6.0, Apache-2.0, approved, #11155
maven/mavencentral/commons-beanutils/commons-beanutils/1.9.4, Apache-2.0, approved, CQ12654
maven/mavencentral/commons-codec/commons-codec/1.11, Apache-2.0 AND BSD-3-Clause, approved, CQ15971
maven/mavencentral/commons-codec/commons-codec/1.15, Apache-2.0 AND BSD-3-Clause AND LicenseRef-Public-Domain, approved, CQ22641
maven/mavencentral/commons-codec/commons-codec/1.16.1, Apache-2.0 AND (Apache-2.0 AND BSD-3-Clause), approved, #9157
maven/mavencentral/commons-collections/commons-collections/3.2.2, Apache-2.0, approved, CQ10385
maven/mavencentral/commons-collections/commons-collections/3.2.2, Apache-2.0, approved, #15185
maven/mavencentral/commons-logging/commons-logging/1.2, Apache-2.0, approved, CQ10162
maven/mavencentral/dev.failsafe/failsafe-okhttp/3.3.2, Apache-2.0, approved, #9178
maven/mavencentral/dev.failsafe/failsafe-okhttp/3.3.2, Apache-2.0, approved, #15208
maven/mavencentral/dev.failsafe/failsafe/3.3.1, Apache-2.0, approved, #9268
maven/mavencentral/dev.failsafe/failsafe/3.3.2, Apache-2.0, approved, #9268
maven/mavencentral/info.picocli/picocli/4.7.6, Apache-2.0, approved, #4365
Expand Down Expand Up @@ -144,6 +144,7 @@ maven/mavencentral/io.setl/rdf-urdna/1.1, Apache-2.0, approved, clearlydefined
maven/mavencentral/jakarta.activation/jakarta.activation-api/2.1.3, EPL-2.0 OR BSD-3-Clause OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.jaf
maven/mavencentral/jakarta.annotation/jakarta.annotation-api/2.1.1, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.ca
maven/mavencentral/jakarta.inject/jakarta.inject-api/2.0.1, Apache-2.0, approved, ee4j.cdi
maven/mavencentral/jakarta.json/jakarta.json-api/2.1.3, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.jsonp
maven/mavencentral/jakarta.transaction/jakarta.transaction-api/2.0.0, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.jta
maven/mavencentral/jakarta.validation/jakarta.validation-api/3.0.2, Apache-2.0, approved, ee4j.validation
maven/mavencentral/jakarta.ws.rs/jakarta.ws.rs-api/3.1.0, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.rest
Expand All @@ -155,7 +156,7 @@ maven/mavencentral/net.bytebuddy/byte-buddy-agent/1.14.15, Apache-2.0, approved,
maven/mavencentral/net.bytebuddy/byte-buddy/1.14.15, Apache-2.0 AND BSD-3-Clause, approved, #7163
maven/mavencentral/net.bytebuddy/byte-buddy/1.14.16, Apache-2.0 AND BSD-3-Clause, approved, #7163
maven/mavencentral/net.bytebuddy/byte-buddy/1.14.9, Apache-2.0 AND BSD-3-Clause, approved, #7163
maven/mavencentral/net.java.dev.jna/jna/5.13.0, Apache-2.0 AND LGPL-2.1-or-later, approved, #6709
maven/mavencentral/net.java.dev.jna/jna/5.13.0, Apache-2.0 AND LGPL-2.1-or-later, approved, #15196
maven/mavencentral/net.sf.saxon/Saxon-HE/12.4, MPL-2.0 AND (MPL-2.0 AND Apache-2.0) AND (MPL-2.0 AND LicenseRef-X11-style) AND MPL-1.0 AND W3C, approved, #12716
maven/mavencentral/org.antlr/antlr4-runtime/4.13.1, BSD-3-Clause, approved, #10767
maven/mavencentral/org.apache.arrow/arrow-format/15.0.2, Apache-2.0, approved, #12854
Expand All @@ -169,8 +170,8 @@ maven/mavencentral/org.apache.commons/commons-text/1.3, Apache-2.0, approved, cl
maven/mavencentral/org.apache.httpcomponents.client5/httpclient5/5.1.3, Apache-2.0, approved, #6276
maven/mavencentral/org.apache.httpcomponents.core5/httpcore5-h2/5.1.3, Apache-2.0, approved, clearlydefined
maven/mavencentral/org.apache.httpcomponents.core5/httpcore5/5.1.3, Apache-2.0, approved, clearlydefined
maven/mavencentral/org.apache.httpcomponents/httpclient/4.5.13, Apache-2.0 AND LicenseRef-Public-Domain, approved, CQ23527
maven/mavencentral/org.apache.httpcomponents/httpclient/4.5.14, Apache-2.0 AND LicenseRef-Public-Domain, approved, CQ23527
maven/mavencentral/org.apache.httpcomponents/httpclient/4.5.13, Apache-2.0, approved, #15248
maven/mavencentral/org.apache.httpcomponents/httpclient/4.5.14, Apache-2.0, approved, #15248
maven/mavencentral/org.apache.httpcomponents/httpcore/4.4.13, Apache-2.0, approved, CQ23528
maven/mavencentral/org.apache.httpcomponents/httpcore/4.4.14, Apache-2.0, approved, CQ23528
maven/mavencentral/org.apache.httpcomponents/httpcore/4.4.16, Apache-2.0, approved, CQ23528
Expand All @@ -197,6 +198,7 @@ maven/mavencentral/org.conscrypt/conscrypt-openjdk-uber/2.5.2, Apache-2.0, appro
maven/mavencentral/org.eclipse.collections/eclipse-collections-api/11.1.0, EPL-1.0 OR BSD-3-Clause, approved, technology.collections
maven/mavencentral/org.eclipse.collections/eclipse-collections/11.1.0, EPL-1.0 OR BSD-3-Clause, approved, technology.collections
maven/mavencentral/org.eclipse.edc/asset-spi/0.7.2-SNAPSHOT, Apache-2.0, approved, technology.edc
maven/mavencentral/org.eclipse.edc/auth-spi/0.7.2-SNAPSHOT, Apache-2.0, approved, technology.edc
maven/mavencentral/org.eclipse.edc/autodoc-processor/0.7.2-SNAPSHOT, Apache-2.0, approved, technology.edc
maven/mavencentral/org.eclipse.edc/boot-lib/0.7.2-SNAPSHOT, Apache-2.0, approved, technology.edc
maven/mavencentral/org.eclipse.edc/boot-spi/0.7.2-SNAPSHOT, Apache-2.0, approved, technology.edc
Expand Down Expand Up @@ -256,6 +258,7 @@ maven/mavencentral/org.eclipse.jetty/jetty-servlet/11.0.21, EPL-2.0 OR Apache-2.
maven/mavencentral/org.eclipse.jetty/jetty-util/11.0.21, EPL-2.0 OR Apache-2.0, approved, rt.jetty
maven/mavencentral/org.eclipse.jetty/jetty-webapp/11.0.21, EPL-2.0 OR Apache-2.0, approved, rt.jetty
maven/mavencentral/org.eclipse.jetty/jetty-xml/11.0.21, EPL-2.0 OR Apache-2.0, approved, rt.jetty
maven/mavencentral/org.eclipse.parsson/parsson/1.1.6, EPL-2.0, approved, ee4j.parsson
maven/mavencentral/org.glassfish.hk2.external/aopalliance-repackaged/3.0.6, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.glassfish
maven/mavencentral/org.glassfish.hk2/hk2-api/3.0.6, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.glassfish
maven/mavencentral/org.glassfish.hk2/hk2-locator/3.0.6, EPL-2.0 OR GPL-2.0-only with Classpath-exception-2.0, approved, ee4j.glassfish
Expand Down Expand Up @@ -292,13 +295,13 @@ maven/mavencentral/org.junit.jupiter/junit-jupiter-api/5.10.1, EPL-2.0, approved
maven/mavencentral/org.junit.jupiter/junit-jupiter-api/5.10.2, EPL-2.0, approved, #9714
maven/mavencentral/org.junit.jupiter/junit-jupiter-engine/5.10.1, EPL-2.0, approved, #9711
maven/mavencentral/org.junit.jupiter/junit-jupiter-engine/5.10.2, EPL-2.0, approved, #9711
maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.1, EPL-2.0, approved, #9708
maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.2, EPL-2.0, approved, #9708
maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.1, EPL-2.0, approved, #15304
maven/mavencentral/org.junit.jupiter/junit-jupiter-params/5.10.2, EPL-2.0, approved, #15250
maven/mavencentral/org.junit.platform/junit-platform-commons/1.10.1, EPL-2.0, approved, #9715
maven/mavencentral/org.junit.platform/junit-platform-commons/1.10.2, EPL-2.0, approved, #9715
maven/mavencentral/org.junit.platform/junit-platform-engine/1.10.1, EPL-2.0, approved, #9709
maven/mavencentral/org.junit.platform/junit-platform-engine/1.10.2, EPL-2.0, approved, #9709
maven/mavencentral/org.junit.platform/junit-platform-launcher/1.10.2, EPL-2.0, approved, #9704
maven/mavencentral/org.junit.platform/junit-platform-launcher/1.10.2, EPL-2.0, approved, #15216
maven/mavencentral/org.junit/junit-bom/5.10.1, EPL-2.0, approved, #9844
maven/mavencentral/org.junit/junit-bom/5.10.2, EPL-2.0, approved, #9844
maven/mavencentral/org.junit/junit-bom/5.9.2, EPL-2.0, approved, #4711
Expand Down
47 changes: 47 additions & 0 deletions docs/developer/bigquery/bq-data-plane.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# BigQuery Data Plane

This document describes the first implementation of the BigQuery data plane.

The BigQuery data plane is implemented as a pipeline by `BigQueryDataSource` and ` BigQueryDataSink` classes.
The data source, identified by a `BigQueryDataAddress`, expects an asset defined by a query statement that, executed on a BigQuery table, returns the data to be transferred.
The data sink, also identified by a `BigQueryDataAddress`, receives the data from the source and transfers it without changes on a BigQuery destination table.

## Data source

The data source supports queries with [named parameters](https://cloud.google.com/bigquery/docs/parameterized-queries), that are similar to SQL query parameters. To set the values of the named parameters, the `BigQueryRequestParams` generated by `BigQueryRequestParamsProvider.provideSourceParams` (invoked from `BigQueryDataSourceFactory.createSource`) includes also the sink address passed when the transfer process is started by the consumer. The sink address must provide the values of the parameters in the query in the form:
```
"dataDestination": {
"type": "BigQueryData",
"project": "consumerProject",
"dataset": "destinationDataset",
"table": "destinationTable",
"@TYPE_parameterName": "parameterValue"
}
```
The TYPE is separated by the parameter name found in the query by underscore and can take values found in:
[StandardSQLTypeName](https://cloud.google.com/java/docs/reference/google-cloud-bigquery/latest/com.google.cloud.bigquery.StandardSQLTypeName)

When `BigQueryDataSource.openPartStream` is invoked, the data source creates the job to execute the query, a `PipedOutputStream` and a connected `PipedInputStream`:
the output stream is passed to a newly created thread, and used to write the fetched rows as JSON entries: the rows are grouped in pages (paginated results) and sent as a JSON array
the input stream instead is passed to a single `BigQueryPart` that is immediately returned
The `openPartStream` method returns immediately, while the created thread fetches the results and streams the results as JSON data to the sink.
The use of a single Part is driven by simplicity: using multiple parts would require to get the total number of returned rows / pages, creating the parts each with an output, and connecting each output to an input before returning.
On top of that, the thread would need to maintain the list of output stream objects and then using each one page at time.

## Data source error handling

The data source is handling exceptions in the main body of `openPartStream` method and within the started thread:
if an exception occurs in the `openPartStream` , the method returns `StreamResult.error`
if an exception occurs in the fetching of the rows in the thread, then the output stream is closed, and the `BigQueryPart` object handed over to the sink is given the occurred exception via `BigQueryPart.setException`

## Data sink

The data sink receives the single `BigQueryPart` returned by the source, and starts reading the JSON entries representing the results rows, serialized by the thread started by the source itself. Once a results page is parsed, a `JSONArray` object is created with it and passed to a `JsonStreamWriter` object, using the BigQuery Storage API:
https://cloud.google.com/bigquery/docs/write-api-streaming

The reading continues till the thread created by the source closes the corresponding output stream. When the pipe is closed, the sink checks the `BigQueryPart`, to verify whether the stream was closed due to an error, by retrieving the stored exception with `BigQueryPart.getException`.

## Data sink error handling

If an error occurs while appending the data to the destination table, or an exception is found in the transferred `BigQueryPart`, the `BigQueryDataSink.transferParts` returns `StreamResult.failure`.

0 comments on commit 5fc8297

Please sign in to comment.