-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(batchGet): synchronize batchGet() behavior with get() #233
Conversation
return RestliUtils.toTask(() -> { | ||
final Map<URN, KEY> urnMap = | ||
ids.stream().collect(Collectors.toMap(id -> toUrn(id), Function.identity())); | ||
ids.stream().collect(Collectors.toMap(this::toUrn, Function.identity())); | ||
urnMap.keySet().removeIf(urn -> !getLocalDAO().exists(urn)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line will be expensive when batchGet 100 urns. Assuming each existence check takes just 10ms, then it's still 1 second in total. Batch get 100 urns is very realistic in my opinion.
I think there could be two options.
-
Create a new exists method that can take in a set of keys such as
exists(Set<KEY> keys)
, then in SQL we can doselect urn from table where urn=a or urn=b or urn=c or ...
in which a,b,c ... are the keys. This will give us all the valid urns. -
Just remvoe this existence check and directly pass it to DAO. It could generate a
select statement UNION select statement ...
with a bunch invalid urns and nothing returned. But I imagine this case is rare.
@@ -164,10 +164,13 @@ public Task<VALUE> get(@Nonnull KEY id, | |||
public Task<Map<KEY, VALUE>> batchGet( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something similar the following?
public Task<BatchResult<KEY, VALUE>> batchGetV2(
@Nonnull Set<KEY> ids,
@QueryParam(PARAM_ASPECTS) @Optional @Nullable String[] aspectNames) {
Map<KEY, VALUE> batch = new HashMap<KEY, VALUE>();
Map<KEY, RestLiServiceException> errors = new HashMap<KEY, RestLiServiceException>();
for (KEY id : ids)
{
VALUE g = getLocalDAO().get(id);
if (g != null)
{
batch.put(id, g);
}
else
{
errors.put(id, new RestLiServiceException(HttpStatus.S_404_NOT_FOUND));
}
}
return RestliUtils.toTask(() -> new BatchResult<KEY, VALUE>(batch, errors));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with get
and gives more granular view on the batch get result. In my opinion, it's better than throwing a 404 when all not found but no error at all when one record not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is better. However, this would not be a simple change since it will affect batchGet's downstream users as we will be changing the return type from Task<Map<KEY, VALUE>> to Task<BatchGetResult<Key, VALUE>>. I would argue that we can avoid this hassle by using a cleaner, but maybe not fully "correct" approach. The core thing is that we will not omit 404 under any circumstances so that we don't give potentially misleading informatino. I see two options like:
- For keys that do not exist in the db, we will just not add them to the map (I think that's what you suggested in the other comment - second suggestion). So if all keys don't exist, we will just return empty map.
- For keys that do not exist in the db, we can add
(key, null)
to the map. So if all keys don't exist, we will return a non-empty map but all values will be null.
Either way, we will need to form some SLA contract with users on this behavior so they know what to expect with this API. I think 1) is probably better since it's easier to check how many results were returned by the call. WDYT about removing 404s altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The upside is this approach is easier to implement and should not affect any existing downstream users. The downside is it will not be consistent with get
behavior since we will not throw any 404s. And users need to be aware of the behavior we decide on when they use this API.
errors.put(urnMap.get(entry.getKey()), new RestLiServiceException(HttpStatus.S_404_NOT_FOUND)); | ||
statuses.put(urnMap.get(entry.getKey()), HttpStatus.S_404_NOT_FOUND); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need both status and errors? Both seems tell user same thing: 404 not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out. Ideally I think users can just use statuses, but BatchResult only allows constructors with either just data + errors or data + statuses + errors. I added logic to add 200s for successful results to the statuses map.
To help address issue #136 .
Before:
batchGet would still return results for urns which do not exist in the db.
Now:
batchGet will throw a 404 if all urns passed in do not exist in the db.
If some or all of the urns exist, it will only return results for those that do exist in the db.
Checklist