-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Storage write API in BigQuery connector #18897
Conversation
509ea04
to
e92d7ed
Compare
} | ||
|
||
@Override | ||
public CompletableFuture<?> appendPage(Page page) | ||
{ | ||
InsertAllRequest.Builder batch = InsertAllRequest.newBuilder(tableId); | ||
JSONArray batch = new JSONArray(); | ||
for (int position = 0; position < page.getPositionCount(); position++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the data be batched based on a request size limit (config)? https://cloud.google.com/bigquery/quotas#write-api-limits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to do but pre-existing issue.
6d1e190
to
3b10a3f
Compare
return NOT_BLOCKED; | ||
} | ||
|
||
private void insertWithCommitted(JSONArray batch) | ||
{ | ||
WriteStream stream = WriteStream.newBuilder().setType(COMMITTED).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are creating a new stream per Page.
BigQuery has a limit of 1k streams open at a time and 30k/4 hours (7.5k creations per hour).
Let's either document this if we think it's shouldn't be a problem or let's consider creating a single stream per page-sink/tablewriter task.
Also do we consider to use "pending mode" long term to provide proper isolation? With current mode ("committed") if a single stream fails then writes from other streams will still succeed and be visible. i.e. it's not ACID and there's no way to rollback. Note that this is same behaviour that we already had so it's not a regression in that sense.
3b10a3f
to
f38227b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
.orElseGet(remoteTableName::toTableName); | ||
// TODO: Consider using PENDING mode | ||
WriteStream stream = WriteStream.newBuilder().setType(COMMITTED).build(); | ||
CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder().setParent(tableName.toString()).setWriteStream(stream).build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: multi-line
CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder().setParent(tableName.toString()).setWriteStream(stream).build(); | |
CreateWriteStreamRequest createWriteStreamRequest = CreateWriteStreamRequest.newBuilder() | |
.setParent(tableName.toString()) | |
.setWriteStream(stream) | |
.build(); |
f38227b
to
89d0540
Compare
@ebyhr Does the docs need updating about any new IAM permissions which are needed. |
Release notes
(x) Release notes are required, with the following suggested text: