spanner-deleter is a tool to delete many rows from the tables in a Cloud Spanner database without deleting tables themselves.
Please feel free to report issues and send pull requests, but note that this application is not officially supported as part of the Cloud Spanner product.
- Delete rows from the database while keeping the underlying splits, which are typically for database pre-warming before the launch.
- Delete rows from the database without requiring strong IAM permissions for deleting tables or databases.
- Delete rows from the database using WHERE IN, >=, and <= filters for a specific database column name.
At a glance deleting many rows from the database looks an easy task, but there are several issues we could encounter when we want to delete rows from the real-world databases.
- If the table size is huge, a simple DELETE statement like
DELETE FROM table WHERE true
could easily exceed the transaction mutation limit. - Rows in interleaved tables which have
PARENT ON DELETE NO ACTION
must be deleted first before deleting the rows from the parent table, otherwise it will cause a constraint violation error. - Rows in the tables which reference other tables with
FOREIGN KEY
constraints must be deleted first before deleting rows in the referenced tables, otherwise it will cause a constraint violation error. - It would take a lot of time if we delete rows from the tables one by one.
- Running a high priority delete may stall other high priority operations such as application writes and reads.
To solve the preceding issues, this tool works as follows.
- Use Partitioned DML to delete many rows from the table to overcome the single transaction mutation limit.
- Delete rows from multiple tables in parallel to minimize the total time for deletion.
- Automatically discover the constraints between tables and delete rows from the tables in proper order without violating database constraints.
- Be able to specify the deletion Request Priority.
- This tool does not guarantee the atomicity of deletion. If you access the rows that are being deleted, you will get the inconsistent view of the database.
- This tool does not delete rows which were inserted while the tool was running.
- This tool is not aware of indexes. If specifying a column and column criteria for deletion, please make sure you have an index on that column so the tool does not try to do a COUNT by scanning the entire database table.
GO111MODULE=on go get github.com/standard-ai/spanner-deleter
Usage:
spanner-deleter [OPTIONS]
Application Options:
-p, --project= (required) GCP Project ID.
-i, --instance= (required) Cloud Spanner Instance ID.
-d, --database= (required) Cloud Spanner Database ID.
-q, --quiet Disable all interactive prompts.
-t, --tables= Comma separated table names to be truncated. Default to truncate all tables if not specified.
-o, --timeout= Number of days the command should run before it times out, defaults to 1 day if not specified.
--priority= Spanner priority, as described here https://pkg.go.dev/google.golang.org/genproto/googleapis/spanner/v1#RequestOptions_Priority . Recommended to be set to priority = 2 (RequestOptions_PRIORITY_MEDIUM). Defaults to High priority, which may overtake normal database operations
-c, --column= Column used to perform equality, greater than, and less than filter
-l, --lower= Lower bound comparison for column
-u, --upper= Upper bound comparison for column
-v, --values= Comma separated values for column to match on
Help Options:
-h, --help Show this help message
Example:
$ spanner-deleter --project=myproject --instance=myinstance --database=mydb --tables=Albums,Concerts,Singers,Songs --timeout=7 --priority=2 --column=label_id --values=first_label_id,second_label_id,third_label_id
Running with priority 2
Creating context with timeout 168h0m0s
Fetching table information from projects/myproject/instances/myinstance/databases/mydb
Albums
Concerts
Singers
Songs
Rows in these tables will be deleted. Do you want to continue? [Y/n] Y
Concerts: completed 13s [============================================>] 100% (1,200 / 1,200)
Singers: completed 13s [============================================>] 100% (6,000 / 6,000)
Albums: completed 12s [============================================>] 100% (1,800 / 1,800)
Songs: completed 11s [============================================>] 100% (3,600 / 3,600)
Executing: DELETE FROM `Concerts` WHERE label_id IN ('first_label_id','second_label_id','third_label_id')
Executing: DELETE FROM `Singers` WHERE label_id IN ('first_label_id','second_label_id','third_label_id')
Executing: DELETE FROM `Albums` WHERE label_id IN ('first_label_id','second_label_id','third_label_id')
Executing: DELETE FROM `Songs` WHERE label_id IN ('first_label_id','second_label_id','third_label_id')
Done! All rows have been deleted successfully.