Releases: OTA-Insight/bqwriter
v0.6.1
Bug Fixes:
- Remove the duplicate deadline for the InsertAll client; this setting is no longer required, as the inner InsertAll BQ client already has a max deadline which cannot be configured and which we can use as-is;
- Add retry logic to the InsertAll client to support retrying temporary/internal BQ errors.
These are by design not retried by the BigQuery Google Go API itself,
as can be read on googleapis/google-cloud-go#3792,
but we do want to treat these as retryable, as that is usually what you want to do:
  - Permanent issues will fail after ~32s and will nonetheless end up in the user's logs;
- Always reset the flush ticker in the Streamer, even in case of a put-flush error; this is fine by convention;
Documentation:
- Add a CONTRIBUTORS file to list anyone who contributed to this project, and is not listed as an AUTHOR;
Other changes:
- Change DefaultMaxBatchDelay from 5s to 10s;
- update golang.org/x/net, golang.org/x/sys and google.golang.org/genproto to latest version (no semver);
v0.6.0
Documentation improvements:
- Rewrite the batch-driven Streamer README example to stay true to its purpose and
more closely reflect the real-world scenarios it is meant to serve (TODO);
- Refactor internal benchmark package as a stand-alone binary,
and rename it to integration:
  - The latter because it was from the beginning used
as dev-triggered integration tests rather than a true benchmark;
  - And the first to allow more flexibility and differentiation in how we test which streamer configuration;
- Add developer instructions to the README;
Bug Fixes:
- Storage Streamer Client couldn't be closed without hanging, now it can be closed;
- Storage Streamer Client can now be used with BigQuery.Schema, previously it would result in a schema name error;
- Storage Streamer Client now demotes Canceled/Unavailable code errors for append rows to debug logs, as these are related
to underlying connections being reset or a similar kind of EOF event;
- improve & clean up integration tests;
v0.5.1
Update storage API documentation & end-to-end tests:
- storage writer API expects proto2 semantics, proto3 shouldn't be used (yet);
- the normalizeDescriptor should be used to get a descriptor with nested types in order to have it work nicely with nested types;
- The proto well-known types aren't yet properly supported,
and Timestamp is among them. The public docs have a section on wire format conversions:
https://cloud.google.com/bigquery/docs/write-api#data_type_conversions.
  - Short answer: use an int64 with epoch-micros for fields that have the Timestamp type...
and this instead of import "google/protobuf/timestamp.proto";
- Batch client supports auto-detection of the BigQuery schema for CSV and JSON source formats;
with JSON you do have to make sure in that case that field names match the casing exactly,
as otherwise it will complain about duplicate case-insensitive fields, go figure...
Bug Fixes:
- a couple of error logs wrongfully used the directive %w for errors instead of %v;
this has now been corrected and should result in cleaner logs;
v0.5.0
- add batch upload support (https://cloud.google.com/bigquery/docs/batch-loading-data),
this is a third client next to the already supported InsertAll (legacy) and storage API clients,
fixes issue #2 with PR #5;
- add benchmark code (mostly as end-to-end tests) with production-ready Google Cloud infrastructure, load and data;
- remove forked managedwriter code (fixes issue #4):
  - managedwriter is still in active development and having to maintain our own copy would be almost a project in itself;
  - author seems to be willing to fix our issues where appropriate;
  - author also is willing to promote this package to the official bigquery Golang API.
There is no time promise here, the only condition seems to be that the author has to be happy with the API signatures;
- there are a couple of open issues tracked for the official Google managedwriter which remain unresolved:
  - make GRPC/ManagedWriter stats tracking opt-in or configurable: googleapis/google-cloud-go#5100;
  - document and support correctly nested proto types for the storage API client: googleapis/google-cloud-go#5097;
  - be able to configure and globally limit the gax.Retryer used for the storage API client: googleapis/google-cloud-go#5094;
Bug Fixes:
- storage API:
  - EOF-related errors are fixed by switching over to the latest version of the BQ Storage managedwriter;
- logger:
  - std logger (STDERR) was logging without the use of newlines to separate log statements;
v0.4.1
v0.4.0
- refactor code:
  - all internal code is now found in one of the internal packages,
as to have a cleaner codebase and keep all its definitions explicitly internal;
  - added internal BQ Storage client, which is a heavily modified fork from
https://github.com/googleapis/google-cloud-go/tree/a2af4de215a42848368ec3081263d34782032caa/bigquery/storage/managedwriter;
    - the fork is only meant to exist as long as required, it is desired to switch to the upstream
managed writer as soon as possible;
    - Only using the default stream is supported. CommittedStream and/or PendingStream
can be supported upon request;
  - constants are now moved to the constant package of this module as to make it very clear
within the code that these are constants as well as to allow both the internal
as well as the public root package to make use of it;
  - the Logger interface is moved to its own log package for the same reasons as the
introduction of the constant package;
- adds initial StorageAPI Support:
  - Only using the default stream is supported. CommittedStream and/or PendingStream
can be supported upon request;
- bump min Go version supported to Go 1.15, as we make use of the time.Ticker.Reset functionality
which is only available since Go 1.15:
  - Note this feature isn't critical so if ever required for a good reason,
we can probably work around it and downgrade the min Go version once again;
- updated dependencies to latest:
  - google.golang.org/grpc: v1.42.0;
Other updates made to the repository:
- enforce issue templates in the OTA-Insight/bqwriter GitHub project;
- add a pull request template in the OTA-Insight/bqwriter GitHub project;
- rename this file to CHANGELOG.md (was CHANGES.md) in order to better reflect the usual conventions;
- add other conventional special files: AUTHORS and CODEOWNERS;
v0.3.1
v0.3.0
v0.2.0
- remove exponential back off logic from insertAll driven streamer client,
as this logic is already built into the std BQ client used internally:
  - we do still keep the max deadline on top of that by using a deadline context;
- remove the builder-pattern approach used to build a streamer,
and instead use a clean Config approach, as to keep it as simple as possible,
while at the same time being more Go idiomatic;
- upgrade google.golang.org/grpc to v1.41.0, was on v1.40.0;
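The Config approach mentioned above is a common Go pattern: a plain struct whose zero values mean "use the default". The sketch below is a generic illustration only; the `Config` type, its fields, the `sanitize` helper and the default values are hypothetical and do not reproduce bqwriter's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// Config is a hypothetical configuration struct; zero values are
// replaced by defaults, so callers only set what they care about.
type Config struct {
	WorkerCount   int
	MaxBatchDelay time.Duration
}

// Illustrative defaults, not the library's real values.
const (
	defaultWorkerCount   = 2
	defaultMaxBatchDelay = 5 * time.Second
)

// sanitize returns a copy of cfg with unset fields filled in.
// A nil cfg yields a Config made entirely of defaults.
func sanitize(cfg *Config) Config {
	out := Config{
		WorkerCount:   defaultWorkerCount,
		MaxBatchDelay: defaultMaxBatchDelay,
	}
	if cfg == nil {
		return out
	}
	if cfg.WorkerCount > 0 {
		out.WorkerCount = cfg.WorkerCount
	}
	if cfg.MaxBatchDelay > 0 {
		out.MaxBatchDelay = cfg.MaxBatchDelay
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", sanitize(nil))
	fmt.Printf("%+v\n", sanitize(&Config{WorkerCount: 8}))
}
```

Compared with a builder, this keeps construction down to a single call with one optional struct argument.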
v0.1.0
Initial pre-release version.
Not yet ready for production-use.
This version is already used for internal projects
at OTA Insight, mostly for testing purposes.
- provide a small API (Streamer) to write rows concurrently to a specific BQ table;
- the client within this API can be built (StreamerBuilder) using a builder with sane defaults;
- most settings can optionally be configured where desired;
- dependencies are kept to the bare minimum google cloud dependencies,
with no other third party dependencies required;