
First cut at bigtable support #15

Merged · 13 commits · Mar 21, 2023
Conversation

@jneem (Collaborator) commented Nov 15, 2022

I've been poking at this for a while, and I think it's gotten to the point where it's worth asking whether I'm on the right track. Apart from wrapping the bigtable api, it does just enough of the bigtable table admin api to be able to run examples on the bigtable emulator.

The main missing things I know of are

  • any testing whatsoever (including support for conveniently spinning up an emulator)
  • retry handling for reading (but it's complicated enough that I think testing support is needed first)

@rnarubin self-requested a review November 16, 2022 17:33
@rnarubin (Collaborator) left a comment

This is really great work, thanks @jneem

Some broad thoughts:

  • I like the direction of the API, especially compared to the predecessor solana-bigtable-connection used in pixelstore. Streaming rows/bytes feels like the right abstraction rather than a whole vec at a time
  • I'd prefer breaking changes like prost/tonic upgrades in separate PRs. It would help from a maintenance perspective to understand whether adding bigtable is a major or minor version change (ideally minor, but not a deal breaker). For the doc-comment build problem, prost-build 0.9 has a disable_comments(proto_paths) option. This isn't a great solution, but it's at least a reasonable workaround for those types that fail to build, and a later PR can then re-introduce the comments along with the version bumps (see the sketch after this list).
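For reference, a minimal build.rs sketch of that workaround. This is only an illustration: the proto paths and the tonic-build API here are assumptions, not taken from this repository.

// build.rs -- hedged sketch; the proto paths below are hypothetical.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    // Skip the doc-comments that fail to build, instead of upgrading
    // prost/tonic in this PR.
    config.disable_comments([".google.bigtable.v2", ".google.bigtable.admin.v2"]);
    tonic_build::configure()
        .build_server(false)
        .compile_with_config(config, &["proto/google/bigtable/v2/bigtable.proto"], &["proto"])?;
    Ok(())
}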

As an implementor of a new ya-gcp module, I'd also like to hear your opinions on the structure/patterns that the library has used. There are lots of improvements I'd like to make eventually, notably I think the Connector/DefaultConnector was the wrong layer to abstract; it would probably be better to abstract over the whole Service impl passed to tonic. Then users could do their own auth/load balancing/whatever, while the lib would provide a reasonable but replaceable default.

impl BigtableTableAdminConfig {
    /// If the BIGTABLE_EMULATOR_HOST environment variable is set, use that variable to
    /// override the endpoint.
    pub fn with_emulator_env_var(mut self) -> Self {
Collaborator:

The signature with_emulator_env_var(mut self) -> Self is tricky, because the easiest mistake is to forget to set the env var and call this function. It will silently do nothing, and then the calls will later fail with perhaps unclear messages (or worse yet, succeed by modifying real tables in the cloud!)

-> Option<Self> or -> Result<Self, ...> might be ok, but in general the emulator typically needs other config like disabled auth. I'd suggest removing this method until the emulator support shapes up more (like the emulator client lib) and perhaps have an emulator.get_config() method or something like that
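For concreteness, a sketch of the Option-returning shape discussed above; the endpoint field is an assumption about the config's internals, not the library's actual API:

pub fn with_emulator_env_var(mut self) -> Option<Self> {
    // Returning None makes the unset-variable case impossible to ignore
    // silently; the caller must decide what a missing emulator means.
    let host = std::env::var("BIGTABLE_EMULATOR_HOST").ok()?;
    self.endpoint = host;
    Some(self)
}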

Collaborator Author:

Fair enough, I've removed these pending better handling for emulators. My original motivation, though, was the idea that applications should basically always honor BIGTABLE_EMULATOR_HOST if it's set (AFAICT the go client library always does). So I was expecting to basically always call this function, and it's completely normal for the env var to be unset.

@jneem (Collaborator Author) commented Nov 24, 2022

Thanks for the comments and sorry for the slow response! I've had a busy week, but I think I can get another version (without the breaking version bumps) ready soon after the Thanksgiving break.

@jneem (Collaborator Author) commented Nov 29, 2022

Ok, so the updated PR reverts the various version bumps, using disable_comments to work around the failing doctests.

Regarding changes to the connection abstraction, it would be cool to decouple auth/load-balancing/etc. From the point of view of authoring a new module I don't think it makes much difference either way: all I did was copy a few where clauses from another module and everything else just worked. I guess that means it should be easy to adapt to whatever changes are made to the connection abstraction.

@jneem force-pushed the jneem/bigtable branch 3 times, most recently from 31e4e15 to d674b08 on December 23, 2022 22:26
@jneem marked this pull request as ready for review December 23, 2022 22:27
@jneem (Collaborator Author) commented Dec 23, 2022

Ok, the testing isn't as comprehensive as it should be (especially regarding retries), but at least there's something there, so I'll mark it as non-draft.

If you know of any easy way of injecting (retryable) errors into the gRPC stream, I'd very much like to know :)

@k1nkreet commented

Hey @rnarubin, @jneem. I wonder if I can help somehow to move this PR forward?

I see there are two CI failures. One could probably be fixed by installing the bigtable emulator component alongside the pubsub-emulator in .github/workflows/ci.yaml. The other is because the code uses Vec::retain_mut, which has been available since 1.61.0, whereas the CI configuration specifically checks that tests pass with 1.59.0. Does that mean this part should be rewritten without retain_mut, or could the minimal Rust version on CI be updated?

@rnarubin (Collaborator) commented

Hi @k1nkreet. Yes, fixing these tests is a great step.

  1. Fixing emulators in CI would be a big help. Even the pubsub emulator doesn't work (not sure why); there are tests skipped in the workflow because they interact with pubsub and something about that interaction was broken.
  2. Upgrading to 1.61 should be fine; it's several releases old by now.

@jneem (Collaborator Author) commented Feb 15, 2023

Thanks for the ping! I made a separate PR for the MSRV bump. I'm less sure what to do about the emulator thing. I guess I'll try just updating the CI to include it, but if that doesn't work I might need some help...

@rnarubin (Collaborator) left a comment

I remember reading (from another bigtable rust lib) about a conformance data set that google provides for testing edge cases in reads: https://github.com/googleapis/conformance-tests/blob/main/bigtable/v2/readrows.json liufuyang/bigtable_rs#7 (comment)

Adding these tests would help a lot with confidence; the state logic in read_rows is tricky. That doesn't necessarily have to be in this PR (maybe simple reads are good enough for users in feature store/pixel store?), but it should be written up as a task.

// TODO: it would be nice to take types more general than Bytes, but the
// RangeBounds trait makes that annoying: it doesn't allow extracting the
// start/end by value.
pub fn with_range(mut self, range: impl RangeBounds<Bytes>) -> Self {
Collaborator:

From a user perspective, ranges over strings would be really nice if possible (without conversion to Bytes).

After some thought, the by-value problem is perhaps solvable without much user-facing trouble -- though it's non-trivial work in the library. Suppose that instead of using std's RangeBounds<T>, we define our own trait OwnedRangeBounds<T> and implement it (individually, no blanket impl) for the existing RangeBounds implementors like Range/RangeFrom/(Bound, Bound)/etc.

OwnedRangeBounds<T> would provide fn into_bounds(self) -> (Bound<T>, Bound<T>), which gets us the owned values we need. By implementing it over the existing std range types, users can still call with the ../..= sugar, so it's essentially transparent to them.

The only exception is that arbitrary/non-stdlib implementors of std::ops::RangeBounds will not be compatible out of the box. That's a price worth paying IMO; I'd expect this to be rare, and they can implement ya_gcp::OwnedRangeBounds if they really want it.
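A sketch of the proposed trait, implemented for a few of the std range types. The trait name comes from the suggestion above; none of this is the library's current API:

use std::ops::{Bound, Range, RangeFrom, RangeFull};

pub trait OwnedRangeBounds<T> {
    /// Extract the bounds by value, which std's RangeBounds cannot do.
    fn into_bounds(self) -> (Bound<T>, Bound<T>);
}

impl<T> OwnedRangeBounds<T> for Range<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        (Bound::Included(self.start), Bound::Excluded(self.end))
    }
}

impl<T> OwnedRangeBounds<T> for RangeFrom<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        (Bound::Included(self.start), Bound::Unbounded)
    }
}

impl<T> OwnedRangeBounds<T> for RangeFull {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        (Bound::Unbounded, Bound::Unbounded)
    }
}

impl<T> OwnedRangeBounds<T> for (Bound<T>, Bound<T>) {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        self
    }
}

With impls like these, with_range(a..b) and with_range(a..) keep working unchanged, since the range sugar produces exactly these std types.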

Collaborator:

FWIW this is lower priority than MVP correctness. It doesn't have to happen in this PR, just an idea

let row = self.finish_row();
if matches!(
    row_status,
    v2::read_rows_response::cell_chunk::RowStatus::CommitRow(_),
Collaborator:

There isn't a handler for the other RowStatus variant, ResetRow, which requires the data accumulated for the row key to be discarded.

Given that functionality, it might be better to accumulate a Vec<Bytes> (or better yet SmallVec<[Bytes; 4]>) instead of copying bytes into a BytesMut. Then a reset doesn't incur wasted copying, and as a bonus the single-chunk case gets zero copies
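To illustrate, a sketch of that accumulation strategy; the type and method names here are made up for the example:

use bytes::{Bytes, BytesMut};
use smallvec::SmallVec;

/// Accumulates the chunks of one value without copying.
#[derive(Default)]
struct ChunkAccumulator {
    chunks: SmallVec<[Bytes; 4]>,
}

impl ChunkAccumulator {
    fn push(&mut self, chunk: Bytes) {
        // Bytes is reference-counted, so pushing clones no data.
        self.chunks.push(chunk);
    }

    fn reset(&mut self) {
        // A ResetRow discards state without having wasted any copies.
        self.chunks.clear();
    }

    fn finish(mut self) -> Bytes {
        match self.chunks.len() {
            0 => Bytes::new(),
            // The common single-chunk case stays zero-copy.
            1 => self.chunks.remove(0),
            // Only multi-chunk values pay for one concatenating copy.
            _ => {
                let mut buf =
                    BytesMut::with_capacity(self.chunks.iter().map(Bytes::len).sum());
                for chunk in &self.chunks {
                    buf.extend_from_slice(chunk);
                }
                buf.freeze()
            }
        }
    }
}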

Collaborator Author:

I think this is actually correct, although I agree that it isn't clear, and that SmallVec<Bytes> makes sense: self.finish_row() clears the current row, so if we get the ResetRow, we clear the current row from our state and return None.

// If we get to this point, there was a retry-able error. We'll loop back and retry the
// request again, but first let's modify the request to avoid re-requesting previously-returned
// data.
if let Some(key) = last_key {
Collaborator:

There isn't a loop around the inner.read_rows call, so this will exit the stream without retrying. There need to be two loops, I think: one around inner.read_rows and one around response.next.

As a matter of convention, I prefer using explicit continues instead of "fall-throughs" in these kinds of non-trivial repetitions (and in the case of nesting, labelled loops with labelled continues). That can more easily catch a case like this where it's hard to tell what portion of the logic needs to repeat. Here's an example from pubsub streaming

match should_retry {
    // if the retry policy determines a retry is possible, sleep for the
    // given backoff and then try reconnecting
    Some(backoff_sleep) => {
        let span = debug_span!("backoff_sleep");
        let _guard = span.enter();
        backoff_sleep.await;
        continue 'reconnect;
    }
    // if the policy does not provide a sleep, then it determined that the
    // operation is terminal (or the retries have been exhausted). Yield
    // the error, and then exit
    None => {
        error.metadata_mut().insert("subscription", subscription_meta);
        yield Err(error);
        break 'reconnect;
    }
}

Collaborator Author:

Oh, good catch! As you can see, I haven't figured out a way to test the retry path. I thought about making a tonic::transport::Channel that injects errors somehow, but it didn't seem super easy. Do you know of a good way?

Collaborator:

I don't know an easy way. The most robust way I can think of is to use the proto service schema to generate a stub bigtable server, and then mock messages + retryable errors. That's of course a lot of setup, but it provides good abstraction without touching the client.

There could be easier solutions, I'm not sure

@jneem (Collaborator Author) commented Feb 17, 2023

Thanks for the comments! I agree that the flushing is too permissive. If the bigtable server returns responses that violate the invariants, I guess the least bad option is to treat it as an unrecoverable error, right?

I will try to deal with the low-hanging fruit, and then I think maybe @k1nkreet will have some time to work on this? In any case I'll give him write access to the PR since I'll be off on Monday.

- make the `ResetRow` control flow clearer
- add a loop for retry, and avoid fallthrough
- update emulator docs
- document `restrict_to_after` better
@xlambein (Contributor) commented Mar 8, 2023

Using this PR yesterday, I was in a situation where I had to call BigtableClient::mutate_rows directly, rather than go through e.g. set_rows_data. Doing this, I got a bigtable error saying the table name didn't exist, which I eventually found out was because methods like set_rows_data prepend a required prefix to the name:

let table_name = format!("{}{}", self.table_prefix, table_name);

The same is also true for the read_rows method.

IMO either this should be documented in mutate_row(s) and read_rows, or these lower-level methods should also prepend the prefix. In the case of the former, BigtableClient should have a method to construct a fully-qualified table name, because currently the prefix is private.

@matthew-healy (Collaborator) commented Mar 10, 2023

@xlambein - Perhaps @jneem can correct me here, but it seems like the design intention was for mutate_rows to be 1:1 with the MutateRows rpc (which you can find documented here). As such it directly uses the proto-generated MutateRowsRequest type as its argument. That type has a table_name field which is expected to include the fully-qualified table name.

My initial feeling is that it'd be incongruous to still take that type as an argument but then opaquely change the table_name. Equally, the current state of affairs will clearly lead to mistakes.

I think the best way forward is to change the mutate_rows and mutate_row methods to no longer directly take the proto types as arguments, and then build the proto requests privately so that any call to the BigtableClient is always building the table_name from the prefix & supplied name.

That said, I'm interested to hear what @jneem and @rnarubin think of this.

(Edit: it's also likely this could be done in a follow-up PR so as not to keep this review open significantly longer.)

@matthew-healy (Collaborator) commented

Also @rnarubin - I think I've resolved all of the blocking comments, so I've re-requested your review.

@matthew-healy requested a review from rnarubin March 10, 2023 13:37
@rnarubin (Collaborator) commented

Re: mutate_rows vs set_row*

I agree with having a low-level function that matches the rpc API as closely as possible. Because there's a lot of fine-grained behavior defined in the message request, replicating that control in-library and then constructing the request internally would be difficult to maintain (probably lots of boilerplate and subtle behaviors). So I would advise keeping the low-level one with just the generated proto message request as an argument, for users who need that control.

To address @xlambein's problem, adding a method to construct the fully specified table name is a good idea. It's a shame that the bigtable schema doesn't document the name format on the field in MutateRowsRequest; they do in MutateRowRequest (note the singular):

/// Required. The unique name of the table to which the mutations should be applied.

vs

/// Required. The unique name of the table to which the mutation should be applied.
/// Values are of the form
/// `projects/<project>/instances/<instance>/tables/<table>`.
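For illustration, the suggested helper could be as simple as the following hypothetical method (table_prefix is the private field mentioned above; the method name is made up):

impl BigtableClient {
    /// Returns the fully qualified table name,
    /// `projects/<project>/instances/<instance>/tables/<table>`,
    /// by prepending the client's configured prefix.
    pub fn fully_qualified_table_name(&self, table_name: &str) -> String {
        format!("{}{}", self.table_prefix, table_name)
    }
}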

@jneem (Collaborator Author) commented Mar 10, 2023

I realized that I was wrong about the truncation of open intervals, so I pushed a fix.

@rnarubin (Collaborator) left a comment

A question for @matthew-healy / @jneem / @xlambein -- how valuable is landing this PR as an MVP, compared to (what sounds like the existing workflow) importing this code as a git reference?

The code here is good work, and I think it's shaping up well. However, there's a lot of nuanced behavior in the BigTable service, and I would not consider this production-ready without more testing code:

  • the conformance testing i mentioned above
  • some kind of testing of retries (resuming at the right keys is a big area, I'd like a test for that truncation bug!)

If this is valuable as an unblocker, I am open to landing this PR and deferring tests given a few additions:

  1. a fix on the constructed RowSet on retry mentioned below
  2. a note in the top-level readme that BigTable support is still alpha, similar to GCS
  3. links to filed tasks that include the testing initiatives

Comment on lines +298 to +302
if !chunk.row_key.is_empty() {
    // We don't need to check if there's an existing row to store, because
    // RowStatus tells us when to do that.
    self.row.key = chunk.row_key;
}
Collaborator:

        // We don't need to check if there's an existing row to store, because
        // RowStatus tells us when to do that.

I don't think this is true. The chunk's row key, if it exists, should be checked to match the in-progress key or else return an error. As written, this would instead overwrite the in-progress key and essentially merge two rows' data. Such a case is included in the conformance tests https://github.com/googleapis/conformance-tests/blob/main/bigtable/v2/readrows.json#L604-L654

Given this is a pathological error case, I don't consider it blocking an MVP merge. However this should still be handled as part of more robust error testing
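For concreteness, a sketch of the missing check; the error variant is hypothetical:

if !chunk.row_key.is_empty() {
    // A non-empty key on a mid-row chunk must match the in-progress row,
    // otherwise two rows' data would be silently merged.
    if !self.row.key.is_empty() && chunk.row_key != self.row.key {
        return Err(ReadRowsError::RowKeyMismatch);
    }
    self.row.key = chunk.row_key;
}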

// first modify the request to avoid re-requesting
// previously-returned data.
if let Some(key) = last_key {
    if let Some(rows) = request.rows.as_mut() {
Collaborator:

If the request specified no rows (i.e. read all) but there is a last key for the retry attempt, a RowSet should be created for the retry to limit the read. As currently written, I believe the retry will read the whole set from the start again.
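A sketch of the suggested fix, reusing the restrict_to_after helper mentioned earlier in this PR (exact field and helper names may differ):

if let Some(key) = last_key {
    // If the original request had no RowSet (i.e. read everything), create
    // an empty one so the retry can still be limited to keys after the
    // last key already returned.
    let rows = request.rows.get_or_insert_with(Default::default);
    restrict_to_after(rows, key);
}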

@matthew-healy (Collaborator) commented

@rnarubin It's definitely valuable to have this merged, but as it's not currently a blocker for us I'm not sure it's necessary to rush it. I'm going to try and implement the conformance tests this week, and then I'll take a look at setting up a stubbed Bigtable server for the other test cases.

If, for whatever reason, this becomes a blocker, then we can reconsider.

@matthew-healy (Collaborator) commented

@rnarubin Hey! So I spent a bit of time last week working on the conformance tests. It took a little longer than I was hoping, but I managed to get them running. Unfortunately - as you predicted - they don't pass. In fact, the very first one fails because we seem to ignore the commitRow field in the CellChunk message. (I'm also having a bit of a hard time getting everything to implement UnwindSafe so I can catch the panics, so I don't yet have a good picture of how many of the tests fail, but I do know it's at least not all of them 😅 ).

That being said, I think the issues we have are mainly around poorly-behaved servers, as we're successfully using the branch without issues right now. Given the investment required to get this past alpha-maturity, I think it might be best if we do what you suggested & clean up what we have so we can merge this PR, and I'll open tickets for the remainder of the testing work.

I'll get started on the tasks you mentioned now and ping you for another review when they're done. I'll continue working on the conformance tests in-between other tasks and hopefully follow up with a PR in a few weeks.

@matthew-healy (Collaborator) commented

I've added Bigtable support (& its alpha status) to the README, and fixed the retry issue identified above. I also created #18 and #19 to cover the requested test cases.

I'm going to tidy up my (very rough) WIP branch & open a draft PR with the conformance tests to increase the bus factor here a little too. 🚌

@matthew-healy requested a review from rnarubin March 21, 2023 15:36
@rnarubin (Collaborator) left a comment

Sounds like a plan 👍
