
Commit

feat(firestore-bigquery-export): Upgrade lifecycle onInstall event for importing documents

dackers86 authored Oct 17, 2023
2 parents 4719fc5 + 3c460b1 commit 7dcb382
Showing 12 changed files with 538 additions and 370 deletions.
2 changes: 2 additions & 0 deletions firestore-bigquery-export/CHANGELOG.md
@@ -70,6 +70,8 @@ feature - add oldData to the record

fixed - updating table metadata too often

feature - add lifecycle event to export existing documents to Bigquery

## Version 0.1.26

docs - correct service account name
4 changes: 3 additions & 1 deletion firestore-bigquery-export/POSTINSTALL.md
@@ -64,7 +64,9 @@ When defining a specific BigQuery project, a manual step to set up permissions i
### _(Optional)_ Import existing documents
This extension only sends the content of documents that have been changed -- it does not export your full dataset of existing documents into BigQuery. So, to backfill your BigQuery dataset with all the documents in your collection, you can run the import script provided by this extension.
If you chose not to automatically import existing documents when you installed this extension, you can backfill your BigQuery dataset with all the documents in your collection using the import script.
If you neither enable automatic import nor run the import script, the extension exports only the content of documents created or changed after installation.
The import script can read all existing documents in a Cloud Firestore collection and insert them into the raw changelog table created by this extension. The script adds a special changelog for each document with the operation of `IMPORT` and the timestamp of epoch. This is to ensure that any operation on an imported document supersedes the `IMPORT`.
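The epoch timestamp is what makes the supersede guarantee work: when changelog rows for the same document are compared by timestamp, an `IMPORT` row dated at the Unix epoch always loses to any later write. A minimal sketch of that comparison (the field names here are illustrative, not the extension's actual changelog schema):

```python
from datetime import datetime, timezone

# An imported document's changelog row is stamped with the Unix epoch,
# so any real operation (CREATE/UPDATE/DELETE) carries a later timestamp.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def latest_state(rows):
    """Return the changelog row representing the document's current state."""
    return max(rows, key=lambda r: r["timestamp"])

rows = [
    {"operation": "IMPORT", "timestamp": EPOCH, "data": {"title": "imported"}},
    {"operation": "UPDATE",
     "timestamp": datetime(2023, 10, 17, tzinfo=timezone.utc),
     "data": {"title": "edited"}},
]

# The UPDATE supersedes the IMPORT because epoch sorts before everything.
assert latest_state(rows)["operation"] == "UPDATE"
```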
8 changes: 6 additions & 2 deletions firestore-bigquery-export/PREINSTALL.md
@@ -76,9 +76,13 @@ If you follow these steps, your changelog table should be created using your cus

#### Backfill your BigQuery dataset

This extension only sends the content of documents that have been changed -- it does not export your full dataset of existing documents into BigQuery. So, to backfill your BigQuery dataset with all the documents in your collection, you can run the [import script](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md) provided by this extension.
To import documents that already exist at installation time into BigQuery, answer **Yes** when the installer asks "Import existing Firestore documents into BigQuery?" The extension will export existing documents as part of the installation and update processes.

**Important:** Run the import script over the entire collection _after_ installing this extension, otherwise all writes to your database during the import might be lost.
Alternatively, you can run the external [import script](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md) to backfill existing documents. If you plan to use this script, answer **No** when prompted to import existing documents.

**Important:** Run the external import script over the entire collection _after_ installing this extension, otherwise all writes to your database during the import might be lost.

If you neither enable automatic import nor run the import script, the extension exports only the content of documents created or changed after installation.
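Both import paths work through the collection in batches rather than one large read (this commit adds a `DOCS_PER_BACKFILL` parameter, default 200, for the built-in import). A generic sketch of batching in that style, assuming nothing about the real implementation:

```python
def chunk(doc_ids, docs_per_backfill=200):
    """Split document IDs into batches of at most `docs_per_backfill`."""
    return [doc_ids[i:i + docs_per_backfill]
            for i in range(0, len(doc_ids), docs_per_backfill)]

# 450 documents become three batches: 200, 200, and 50.
batches = chunk([f"doc{i}" for i in range(450)])
assert [len(b) for b in batches] == [200, 200, 50]
```

Lowering the batch size, as the parameter's description suggests when the lifecycle function times out, simply produces more, smaller batches.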

#### Generate schema views

8 changes: 6 additions & 2 deletions firestore-bigquery-export/README.md
@@ -84,9 +84,13 @@ If you follow these steps, your changelog table should be created using your cus

#### Backfill your BigQuery dataset

This extension only sends the content of documents that have been changed -- it does not export your full dataset of existing documents into BigQuery. So, to backfill your BigQuery dataset with all the documents in your collection, you can run the [import script](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md) provided by this extension.
To import documents that already exist at installation time into BigQuery, answer **Yes** when the installer asks "Import existing Firestore documents into BigQuery?" The extension will export existing documents as part of the installation and update processes.

**Important:** Run the import script over the entire collection _after_ installing this extension, otherwise all writes to your database during the import might be lost.
Alternatively, you can run the external [import script](https://github.com/firebase/extensions/blob/master/firestore-bigquery-export/guides/IMPORT_EXISTING_DOCUMENTS.md) to backfill existing documents. If you plan to use this script, answer **No** when prompted to import existing documents.

**Important:** Run the external import script over the entire collection _after_ installing this extension, otherwise all writes to your database during the import might be lost.

If you neither enable automatic import nor run the import script, the extension exports only the content of documents created or changed after installation.

#### Generate schema views

76 changes: 74 additions & 2 deletions firestore-bigquery-export/extension.yaml
@@ -60,6 +60,15 @@ resources:
eventType: providers/cloud.firestore/eventTypes/document.write
resource: projects/${param:PROJECT_ID}/databases/(default)/documents/${param:COLLECTION_PATH}/{documentId}

- name: fsimportexistingdocs
type: firebaseextensions.v1beta.function
description:
Imports existing documents from the specified collection into BigQuery. Imported documents will have
a special changelog with the operation of `IMPORT` and the timestamp of epoch.
properties:
runtime: nodejs14
taskQueueTrigger: {}

- name: syncBigQuery
type: firebaseextensions.v1beta.function
description: >-
@@ -68,6 +77,14 @@ resources:
runtime: nodejs18
taskQueueTrigger: {}

- name: initBigQuerySync
type: firebaseextensions.v1beta.function
description: >-
Runs configuration for syncing with BigQuery
properties:
runtime: nodejs18
taskQueueTrigger: {}

- name: setupBigQuerySync
type: firebaseextensions.v1beta.function
description: >-
@@ -321,6 +338,61 @@ params:
value: no
default: no
required: true

- param: DO_BACKFILL
label: Import existing Firestore documents into BigQuery?
description: >-
Do you want to import existing documents from your Firestore collection into BigQuery? These documents
will each have a special changelog with the operation of `IMPORT` and the timestamp of epoch.
This ensures that any operation on an imported document supersedes the import record.
type: select
required: true
options:
- label: Yes
value: yes
- label: No
value: no

- param: IMPORT_COLLECTION_PATH
label: Existing documents collection
description: >-
What is the path of the Cloud Firestore collection you would like to import from?
(This may or may not be the same collection whose changes you plan to mirror.)
If you want to use a collectionGroup query, provide the collection name value here,
and set 'Use Collection Group query' to true.
type: string
validationRegex: "^[^/]+(/[^/]+/[^/]+)*$"
validationErrorMessage: Firestore collection paths must be an odd number of segments separated by slashes, e.g. "path/to/collection".
example: posts
required: false

- param: USE_COLLECTION_GROUP_QUERY
label: Use Collection Group query
description: >-
Do you want to use a [collection group](https://firebase.google.com/docs/firestore/query-data/queries#collection-group-query) query for importing existing documents?
Warning: A collectionGroup query will target every collection in your Firestore project that matches the 'Existing documents collection'.
For example, if you have 10,000 documents that each have a sub-collection named `landmarks`, this will query every document in all 10,000 `landmarks` collections.
type: select
default: false
options:
- label: Yes
value: true
- label: No
value: false

- param: DOCS_PER_BACKFILL
label: Docs per backfill
description: >-
When importing existing documents, how many should be imported at once?
The default value of 200 should be ok for most users.
If you are using a transform function or have very large documents, you may need to set this to a lower number.
If the lifecycle event function times out, lower this value.
type: string
example: 200
validationRegex: "^[1-9][0-9]*$"
validationErrorMessage: Must be a positive integer.
default: 200
required: true


- param: KMS_KEY_NAME
label: Cloud KMS key name
@@ -338,8 +410,8 @@ events:

lifecycleEvents:
onInstall:
function: setupBigQuerySync
processingMessage: Configuring BigQuery Sync
function: initBigQuerySync
processingMessage: Configuring BigQuery Sync and running import if configured.
onUpdate:
function: setupBigQuerySync
processingMessage: Configuring BigQuery Sync
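The lifecycle change above means `onInstall` now runs `initBigQuerySync`, which performs the usual sync configuration and, when `DO_BACKFILL` is yes, also starts the import. A rough sketch of that control flow (the callables and config keys here are illustrative; only the function and parameter names from the yaml above come from the commit):

```python
def on_install(config, setup, enqueue_import):
    """Sketch of the onInstall dispatch: always configure the BigQuery
    sync, and additionally kick off the import when backfill is enabled."""
    setup()  # configure tables/views for the BigQuery sync
    if config.get("doBackfill"):
        # start the import of existing documents for the chosen collection
        enqueue_import(config.get("importCollectionPath"))
        return "setup + import started"
    return "setup only"

calls = []
result = on_install(
    {"doBackfill": True, "importCollectionPath": "posts"},
    setup=lambda: calls.append("setup"),
    enqueue_import=lambda path: calls.append(f"import:{path}"),
)
assert calls == ["setup", "import:posts"]
```

With `doBackfill` false, only the setup step runs, matching the previous `setupBigQuerySync` behavior.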
@@ -11,6 +11,9 @@ Object {
"collectionPath": undefined,
"datasetId": "my_dataset",
"datasetLocation": undefined,
"doBackfill": false,
"docsPerBackfill": 200,
"importCollectionPath": undefined,
"initialized": false,
"instanceId": undefined,
"kmsKeyName": "test",
@@ -22,6 +25,7 @@ Object {
"timePartitioningFieldType": undefined,
"timePartitioningFirestoreField": undefined,
"transformFunction": "",
"useCollectionGroupQuery": false,
"useNewSnapshotQuerySyntax": false,
"wildcardIds": false,
}
