First cut at bigtable support #15
Conversation
This is really great work, thanks @jneem
Some broad thoughts:
- I like the direction of the API, especially compared to the predecessor solana-bigtable-connection used in pixelstore. Streaming rows/bytes feels like the right abstraction rather than a whole vec at a time.
- I'd prefer breaking changes like prost/tonic upgrades in separate PRs. It would help from a maintenance perspective to understand whether adding bigtable is a major or minor version change (ideally minor, but not a deal breaker). For the doc-comment build problem, prost-build 0.9 has a `disable_comments(proto_paths)` option. This isn't a great solution, but it's at least a reasonable workaround for those types that fail to build, and then a later PR can re-introduce the comments along with the version bumps.
As an implementor of a new ya-gcp module, I'd also like to hear your opinions on the structure/patterns that the library has used. There are lots of improvements I'd like to make eventually, notably I think the Connector/DefaultConnector was the wrong layer to abstract; it would probably be better to abstract over the whole Service impl passed to tonic. Then users could do their own auth/load balancing/whatever, while the lib would provide a reasonable but replaceable default.
src/bigtable/admin.rs
Outdated
```rust
impl BigtableTableAdminConfig {
    /// If the BIGTABLE_EMULATOR_HOST environment variable is set, use that variable to
    /// override the endpoint.
    pub fn with_emulator_env_var(mut self) -> Self {
```
The signature `with_emulator_env_var(mut self) -> Self` is tricky, because the easiest mistake is to forget to set the env var and then call this function. It will silently do nothing, and the calls will later fail with perhaps unclear messages (or worse yet, succeed by modifying real tables in the cloud!)

`-> Option<Self>` or `-> Result<Self, ...>` might be ok, but in general the emulator typically needs other config, like disabled auth. I'd suggest removing this method until the emulator support shapes up more (like the emulator client lib) and perhaps having an `emulator.get_config()` method or something like that.
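As a minimal sketch of the `Option`-returning shape discussed here (the function name and handling are illustrative, not ya-gcp's actual API): returning `Option` forces the caller to notice when the variable is unset instead of silently proceeding.

```rust
/// BIGTABLE_EMULATOR_HOST is the conventional variable honored by
/// Google's client libraries; this hypothetical helper surfaces its
/// absence to the caller rather than silently doing nothing.
fn endpoint_from_emulator_env() -> Option<String> {
    std::env::var("BIGTABLE_EMULATOR_HOST").ok()
}

fn main() {
    std::env::set_var("BIGTABLE_EMULATOR_HOST", "localhost:8086");
    assert_eq!(endpoint_from_emulator_env().as_deref(), Some("localhost:8086"));

    std::env::remove_var("BIGTABLE_EMULATOR_HOST");
    // the caller must now decide what "no emulator configured" means
    assert!(endpoint_from_emulator_env().is_none());
}
```

A caller could then `match` on the result and either fall back to the real endpoint or refuse to run, rather than accidentally targeting production.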
Fair enough, I've removed these pending better handling for emulators. My original motivation, though, was the idea that applications should basically always honor `BIGTABLE_EMULATOR_HOST` if it's set (AFAICT the go client library always does). So I was expecting to basically always call this function, and it's completely normal for the env var to be unset.
Thanks for the comments and sorry for the slow response! I've had a busy week, but I think I can get another version (without the breaking version bumps) ready soon after the Thanksgiving break.
Ok, so the updated PR reverts the various version bumps.

Regarding changes to the connection abstraction, it would be cool to decouple auth/load-balancing/etc. From the point of view of authoring a new module I don't think it makes much difference either way: all I did was copy a few
Force-pushed from 31e4e15 to d674b08
Ok, the testing isn't as comprehensive as it should be (especially regarding retries), but at least there's something there, so I'll mark it as non-draft. If you know of any easy way of injecting (retryable) errors into the gRPC stream, I'd very much like to know :)
Hey @rnarubin, @jneem. I wonder if I can help somehow to move this PR forward? I see there are two CI failures; one could probably be fixed by adding
Hi @k1nkreet. Yes, fixing these tests is a great step.
Thanks for the ping! I made a separate PR for the MSRV bump. I'm less sure what to do about the emulator thing. I guess I'll try just updating the CI to include it, but if that doesn't work I might need some help...
I remember reading (from another bigtable rust lib) about a conformance data set that google provides for testing edge cases in reads: https://github.com/googleapis/conformance-tests/blob/main/bigtable/v2/readrows.json (see liufuyang/bigtable_rs#7 (comment)).

Adding these tests would help a lot with confidence; the state logic in read_rows is tricky. That doesn't necessarily have to be in this PR (maybe simple reads are good enough for users in feature store/pixel store?), but it should be written up as a task.
```rust
// TODO: it would be nice to take types more general than Bytes, but the
// RangeBounds trait makes that annoying: it doesn't allow extracting the
// start/end by value.
pub fn with_range(mut self, range: impl RangeBounds<Bytes>) -> Self {
```
From a user perspective, ranges over strings would be really nice if possible (without conversion to Bytes).

After some thought, the by-value problem is perhaps solvable without much user-facing trouble -- but it's non-trivial work in the library. Suppose instead of using std's `RangeBounds<T>`, we define our own trait `OwnedRangeBounds<T>` and implement it (individually, no blanket) over the existing RangeBounds implementors like `Range`/`RangeFrom`/`(Bound, Bound)`/etc. `OwnedRangeBounds<T>` would provide `fn into_bounds(self) -> (Bound<T>, Bound<T>)`, which gets us the owned values we need. By implementing over existing std range types, users can still call with the `..`/`..=` sugar; it's essentially transparent to them.

The only exception is that arbitrary/non-stdlib implementors of `std::ops::RangeBounds` will not be compatible out-of-the-box. That's a price worth paying IMO; I'd expect this to be rare, and they can implement `ya_gcp::OwnedRangeBounds` if they really want it.
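A hypothetical sketch of the proposed trait, assuming the `OwnedRangeBounds`/`into_bounds` names from the comment above; only a few of the std range types are covered here:

```rust
use std::ops::{Bound, Range, RangeFrom, RangeInclusive};

/// Like std's `RangeBounds`, but yields the bounds by value, so the
/// library can take ownership of the start/end keys.
pub trait OwnedRangeBounds<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>);
}

// Individual (non-blanket) impls over the std range types, so `..`/`..=`
// sugar keeps working at call sites.
impl<T> OwnedRangeBounds<T> for Range<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        (Bound::Included(self.start), Bound::Excluded(self.end))
    }
}

impl<T> OwnedRangeBounds<T> for RangeFrom<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        (Bound::Included(self.start), Bound::Unbounded)
    }
}

impl<T> OwnedRangeBounds<T> for RangeInclusive<T> {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        let (start, end) = self.into_inner();
        (Bound::Included(start), Bound::Included(end))
    }
}

impl<T> OwnedRangeBounds<T> for (Bound<T>, Bound<T>) {
    fn into_bounds(self) -> (Bound<T>, Bound<T>) {
        self
    }
}

fn main() {
    assert_eq!((1..5).into_bounds(), (Bound::Included(1), Bound::Excluded(5)));
    assert_eq!((3..).into_bounds(), (Bound::Included(3), Bound::Unbounded));
    assert_eq!((1..=2).into_bounds(), (Bound::Included(1), Bound::Included(2)));
}
```

A `with_range(impl OwnedRangeBounds<K>)` signature could then accept `"a".to_string().."z".to_string()` or `Bytes` ranges alike, without the by-reference limitation of `std::ops::RangeBounds`.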
FWIW this is lower priority than MVP correctness. It doesn't have to happen in this PR, just an idea
src/bigtable/mod.rs
Outdated
```rust
let row = self.finish_row();
if matches!(
    row_status,
    v2::read_rows_response::cell_chunk::RowStatus::CommitRow(_),
```
There isn't a handler for the other RowStatus variant, `ResetRow`, which requires the data for the row key to be discarded.

Given that functionality, it might be better to accumulate a `Vec<Bytes>` (or better yet `SmallVec<[Bytes; 4]>`) instead of copying bytes into a `BytesMut`. Then a reset doesn't incur wasted copying, and as a bonus the single-chunk case gets zero copies.
I think this is actually correct, although I agree that it isn't clear, and that `SmallVec<Bytes>` makes sense: `self.finish_row()` clears the current row, so if we get the `ResetRow`, we clear the current row from our state and return `None`.
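To illustrate the accumulate-then-reset idea, here is a hedged, self-contained sketch (plain `Vec<u8>` stands in for `bytes::Bytes`, and all the type and method names are illustrative rather than the PR's actual code):

```rust
#[derive(Debug, PartialEq)]
enum RowStatus {
    None,   // chunk continues the current row
    Commit, // row is complete
    Reset,  // discard everything accumulated for this row key
}

#[derive(Default)]
struct RowBuilder {
    key: Vec<u8>,
    // one entry per cell chunk; nothing is copied into a single buffer,
    // so a reset just drops the entries
    chunks: Vec<Vec<u8>>,
}

impl RowBuilder {
    /// Returns the finished (key, value) pair on commit, `None` otherwise.
    fn push_chunk(&mut self, value: Vec<u8>, status: RowStatus) -> Option<(Vec<u8>, Vec<u8>)> {
        match status {
            RowStatus::Reset => {
                // a reset costs nothing beyond dropping the accumulated chunks
                self.chunks.clear();
                self.key.clear();
                None
            }
            RowStatus::Commit => {
                self.chunks.push(value);
                // the demo concatenates here; a Bytes-based version could
                // hand the chunk list to the caller with zero copies
                let row = self.chunks.concat();
                self.chunks.clear();
                Some((std::mem::take(&mut self.key), row))
            }
            RowStatus::None => {
                self.chunks.push(value);
                None
            }
        }
    }
}

fn main() {
    let mut b = RowBuilder::default();
    b.key = b"row1".to_vec();
    assert!(b.push_chunk(b"aa".to_vec(), RowStatus::None).is_none());
    // a reset discards the "aa" chunk without it ever being copied
    assert!(b.push_chunk(b"bb".to_vec(), RowStatus::Reset).is_none());
    b.key = b"row1".to_vec();
    assert_eq!(
        b.push_chunk(b"cc".to_vec(), RowStatus::Commit),
        Some((b"row1".to_vec(), b"cc".to_vec()))
    );
}
```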
src/bigtable/mod.rs
Outdated
```rust
// If we get to this point, there was a retry-able error. We'll loop back and retry the
// request again, but first let's modify the request to avoid re-requesting previously-returned
// data.
if let Some(key) = last_key {
```
There isn't a loop around the `inner.read_rows` call, so this will exit the stream without retrying. There need to be 2 loops I think: 1 around inner.read_rows and 1 around response.next.

As a matter of convention, I prefer using explicit `continue`s instead of "fall-throughs" in these kinds of non-trivial repetitions (and in the case of nesting, labelled loops with labelled continues). That can more easily catch a case like this where it's hard to tell what portion of the logic needs to repeat. Here's an example from pubsub streaming:
ya-gcp/src/pubsub/streaming_subscription.rs
Lines 523 to 538 in d257f56

```rust
match should_retry {
    // if the retry policy determines a retry is possible, sleep for the
    // given backoff and then try reconnecting
    Some(backoff_sleep) => {
        let span = debug_span!("backoff_sleep");
        let _guard = span.enter();
        backoff_sleep.await;
        continue 'reconnect;
    }
    // if the policy does not provide a sleep, then it determined that the
    // operation is terminal (or the retries have been exhausted). Yield
    // the error, and then exit
    None => {
        error.metadata_mut().insert("subscription", subscription_meta);
        yield Err(error);
        break 'reconnect;
```
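The same labelled-loop shape can be shown in a tiny self-contained sketch (all names here are illustrative; `fetch` stands in for re-issuing `inner.read_rows` with a truncated request):

```rust
// Stand-in for the request/stream: the first attempt fails mid-stream,
// the retry succeeds.
fn fetch(attempt: u32) -> Vec<Result<u32, &'static str>> {
    if attempt == 0 {
        vec![Ok(1), Err("transient")]
    } else {
        vec![Ok(2), Ok(3)]
    }
}

fn read_with_retry(max_retries: u32) -> Result<Vec<u32>, &'static str> {
    let mut collected = Vec::new();
    let mut attempt = 0;
    // outer loop: re-issues the whole request
    'request: loop {
        let responses = fetch(attempt);
        // inner loop: consumes one response stream
        for item in responses {
            match item {
                Ok(v) => collected.push(v),
                Err(e) => {
                    if attempt < max_retries {
                        attempt += 1;
                        // retryable: explicitly restart the *request*,
                        // not just the inner stream loop
                        continue 'request;
                    }
                    return Err(e);
                }
            }
        }
        // stream ended cleanly
        return Ok(collected);
    }
}

fn main() {
    assert_eq!(read_with_retry(3), Ok(vec![1, 2, 3]));
    assert!(read_with_retry(0).is_err());
}
```

The labelled `continue 'request` makes it unambiguous which portion of the logic repeats, which is exactly the property the fall-through version lacked.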
Oh, good catch! As you can see, I haven't figured out a way to test the retry path. I thought about making a `tonic::transport::Channel` that injects errors somehow, but it didn't seem super easy. Do you know of a good way?
I don't know an easy way. The most robust way I can think of is to use the proto service schema to generate a stub bigtable server, and then mock messages + retriable errors. That's of course a lot of setup, but it provides good abstraction without touching the client.

There could be easier solutions, I'm not sure.
This fixes a spurious dead-code warning that was causing tests to fail. See rust-lang/rust#102217
Thanks for the comments! I agree that the flushing is too permissive. If the bigtable server returns responses that violate the invariants, I guess the least bad option is to treat it as an unrecoverable error, right? I will try to deal with the low-hanging fruit, and then I think maybe @k1nkreet will have some time to work on this? In any case I'll give him write access to the PR since I'll be off on Monday.
- make the `ResetRow` control flow clearer
- add a loop for retry, and avoid fallthrough
- update emulator docs
- document `restrict_to_after` better
Force-pushed from 9b95db1 to b2e3ea7
Using this PR yesterday, I was in a situation where I had to call Line 554 in b2e3ea7
The same is also true for the

IMO either this should be documented in
@xlambein - Perhaps @jneem can correct me here, but it seems like the design intention was for

My initial feeling is that it'd be incongruous to still take that type as an argument but then opaquely change the

I think the best way forward is to change the

That said, I'm interested to hear what @jneem and @rnarubin think of this. (Edit: it's also likely this could be done in a follow-up PR so as not to keep this review open significantly longer.)
Also @rnarubin - I think I've resolved all of the blocking comments, so I have re-requested your review.
Re: mutate_rows vs set_row*

I agree with having a low-level function that matches the rpc API as closely as possible. Because there's a lot of fine-grained behavior defined in the message request, replicating that control in-library and then constructing the request internally would be difficult to maintain (probably lots of boilerplate, subtle behaviors). So I would advise keeping the low-level one with just the generated proto message request as an argument, for users who need that control.

To address @xlambein's problem, adding a method to construct the fully specified table name is a good idea. It's a shame that the bigtable schema doesn't document the name format on the field.
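The table-name helper suggested above could be as simple as the following sketch; the function name is hypothetical, but `projects/{project}/instances/{instance}/tables/{table}` is Bigtable's documented resource-name format:

```rust
/// Hypothetical helper: build the fully qualified table name that the
/// low-level RPC request messages expect.
fn fully_qualified_table_name(project: &str, instance: &str, table: &str) -> String {
    format!("projects/{}/instances/{}/tables/{}", project, instance, table)
}

fn main() {
    assert_eq!(
        fully_qualified_table_name("my-project", "my-instance", "my-table"),
        "projects/my-project/instances/my-instance/tables/my-table"
    );
}
```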
I realized that I was wrong about the truncation of open intervals, so I pushed a fix.
A question @matthew-healy / @jneem / @xlambein -- how valuable is landing this PR as an MVP compared to importing this code as a git reference (which sounds like the existing workflow)?

The code here is good work, and I think it's shaping up well. However, there's a lot of nuanced behavior in the BigTable service, and I would not consider this production-ready without more testing code:
- the conformance testing I mentioned above
- some kind of testing of retries (resuming at the right keys is a big area; I'd like a test for that truncation bug!)

If this is valuable as an unblocker, I am open to landing this PR and deferring tests given a few additions:
- a fix on the constructed RowSet on retry mentioned below
- a note in the top-level readme that BigTable support is still alpha, similar to GCS
- links to filed tasks that include the testing initiatives
```rust
if !chunk.row_key.is_empty() {
    // We don't need to check if there's an existing row to store, because
    // RowStatus tells us when to do that.
    self.row.key = chunk.row_key;
}
```
> // We don't need to check if there's an existing row to store, because RowStatus tells us when to do that.

I don't think this is true. The chunk's row key, if it exists, should be checked to match the in-progress key, or else return an error. As written, this would instead overwrite the in-progress key and essentially merge two rows' data. Such a case is included in the conformance tests: https://github.com/googleapis/conformance-tests/blob/main/bigtable/v2/readrows.json#L604-L654

Given this is a pathological error case, I don't consider it blocking an MVP merge. However, this should still be handled as part of more robust error testing.
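A hedged sketch of the key check being described (illustrative names, not the PR's code): a chunk carrying a row key must either start a new row or match the in-progress key, and a mismatch is a server error rather than a key change.

```rust
/// Apply a chunk's row key to the builder state. `row_in_progress`
/// indicates whether earlier chunks for a row are already buffered.
fn apply_chunk_key(
    current: &mut Vec<u8>,
    chunk_key: &[u8],
    row_in_progress: bool,
) -> Result<(), String> {
    if chunk_key.is_empty() {
        // continuation chunk: keeps the current key
        return Ok(());
    }
    if row_in_progress && current.as_slice() != chunk_key {
        // silently overwriting here would merge two rows' data
        return Err(format!(
            "row key changed mid-row: {:?} -> {:?}",
            current, chunk_key
        ));
    }
    *current = chunk_key.to_vec();
    Ok(())
}

fn main() {
    let mut key = Vec::new();
    assert!(apply_chunk_key(&mut key, b"r1", false).is_ok()); // new row
    assert!(apply_chunk_key(&mut key, b"", true).is_ok());    // continuation
    assert!(apply_chunk_key(&mut key, b"r2", true).is_err()); // mid-row key change
}
```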
src/bigtable/mod.rs
Outdated
```rust
// first modify the request to avoid re-requesting
// previously-returned data.
if let Some(key) = last_key {
    if let Some(rows) = request.rows.as_mut() {
```
If the request specified no rows (i.e. read all) but there is a last key for the retry attempt, a RowSet should be created for the retry to limit the read. As currently written, I believe the retry will read the whole set from the start again.
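As a sketch of the fix being requested (simplified stand-in types, not the generated protos; `start_key_open` mirrors the proto's exclusive-start field name):

```rust
#[derive(Debug, Default, PartialEq)]
struct RowRange {
    start_key_open: Option<Vec<u8>>, // exclusive lower bound on row keys
}

#[derive(Debug, Default, PartialEq)]
struct RowSet {
    row_ranges: Vec<RowRange>,
}

/// If the original request had no RowSet ("read everything"), the retry
/// must still build one restricted to keys after `last_key`; otherwise
/// the retry re-reads the whole table from the start.
fn rows_for_retry(rows: Option<RowSet>, last_key: Vec<u8>) -> RowSet {
    match rows {
        // real code would also truncate the existing ranges to after last_key
        Some(rows) => rows,
        None => RowSet {
            row_ranges: vec![RowRange {
                start_key_open: Some(last_key),
            }],
        },
    }
}

fn main() {
    let restricted = rows_for_retry(None, b"k9".to_vec());
    assert_eq!(restricted.row_ranges.len(), 1);
    assert_eq!(
        restricted.row_ranges[0].start_key_open.as_deref(),
        Some(&b"k9"[..])
    );
}
```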
@rnarubin It's definitely valuable to have this merged, but as it's not currently a blocker for us I'm not sure it's necessary to rush it. I'm going to try to implement the conformance tests this week, and then I'll take a look at setting up a stubbed Bigtable server for the other test cases. If, for whatever reason, this becomes a blocker, then we can reconsider.
@rnarubin Hey! So I spent a bit of time last week working on the conformance tests. It took a little longer than I was hoping, but I managed to get them running. Unfortunately - as you predicted - they don't pass. In fact, the very first one fails because we seem to ignore the

That being said, I think the issues we have are mainly around poorly-behaved servers, as we're successfully using the branch without issues right now. Given the investment required to get this past alpha-maturity, I think it might be best if we do what you suggested & clean up what we have so we can merge this PR, and I'll open tickets for the remainder of the testing work. I'll get started on the tasks you mentioned now and ping you for another review when they're done. I'll continue working on the conformance tests in-between other tasks and hopefully follow up with a PR in a few weeks.
I've added Bigtable support (& its alpha status) to the README, and fixed the retry issue identified above. I also created #18 and #19 to cover the requested test cases. I'm going to tidy up my (very rough) WIP branch & open a draft PR with the conformance tests to increase the bus factor here a little too. 🚌
Sounds like a plan 👍
I've been poking at this for a while, and I think it's gotten to the point where it's worth asking whether I'm on the right track. Apart from wrapping the bigtable api, it does just enough of the bigtable table admin api to be able to run examples on the bigtable emulator.
The main missing things I know of are