Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(batchGet): synchronize batchGet() behavior with get() #233

Merged
merged 1 commit into from
Feb 1, 2023

Conversation

jsdonn
Copy link
Contributor

@jsdonn jsdonn commented Jan 27, 2023

To help address issue #136 .

Before:
batchGet would still return results for urns which do not exist in the db.

Now:
batchGet will throw a 404 if all urns passed in do not exist in the db.
If some or all of the urns exist, it will only return results for those that do exist in the db.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

return RestliUtils.toTask(() -> {
final Map<URN, KEY> urnMap =
ids.stream().collect(Collectors.toMap(id -> toUrn(id), Function.identity()));
ids.stream().collect(Collectors.toMap(this::toUrn, Function.identity()));
urnMap.keySet().removeIf(urn -> !getLocalDAO().exists(urn));
Copy link
Contributor

@zhixuanjia zhixuanjia Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will be expensive when batchGet 100 urns. Assuming each existence check takes just 10ms, then it's still 1 second in total. Batch get 100 urns is very realistic in my opinion.

I think there could be two options.

  1. Create a new exists method that can take in a set of keys such as exists(Set<KEY> keys), then in SQL we can do select urn from table where urn=a or urn=b or urn=c or ... in which a,b,c ... are the keys. This will give us all the valid urns.

  2. Just remvoe this existence check and directly pass it to DAO. It could generate a select statement UNION select statement ... with a bunch invalid urns and nothing returned. But I imagine this case is rare.

@@ -164,10 +164,13 @@ public Task<VALUE> get(@Nonnull KEY id,
public Task<Map<KEY, VALUE>> batchGet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something similar the following?

  public Task<BatchResult<KEY, VALUE>> batchGetV2(
      @Nonnull Set<KEY> ids,
      @QueryParam(PARAM_ASPECTS) @Optional @Nullable String[] aspectNames) {
    Map<KEY, VALUE> batch = new HashMap<KEY, VALUE>();
    Map<KEY, RestLiServiceException> errors = new HashMap<KEY, RestLiServiceException>();
    for (KEY id : ids)
    {
      VALUE g = getLocalDAO().get(id);
      if (g != null)
      {
        batch.put(id, g);
      }
      else
      {
        errors.put(id, new RestLiServiceException(HttpStatus.S_404_NOT_FOUND));
      }
    }
    return RestliUtils.toTask(() -> new BatchResult<KEY, VALUE>(batch, errors));
  }

Copy link
Contributor

@zhixuanjia zhixuanjia Jan 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with get and gives more granular view on the batch get result. In my opinion, it's better than throwing a 404 when all not found but no error at all when one record not found.

Copy link
Contributor Author

@jsdonn jsdonn Jan 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is better. However, this would not be a simple change since it will affect batchGet's downstream users as we will be changing the return type from Task<Map<KEY, VALUE>> to Task<BatchGetResult<Key, VALUE>>. I would argue that we can avoid this hassle by using a cleaner, but maybe not fully "correct" approach. The core thing is that we will not omit 404 under any circumstances so that we don't give potentially misleading informatino. I see two options like:

  1. For keys that do not exist in the db, we will just not add them to the map (I think that's what you suggested in the other comment - second suggestion). So if all keys don't exist, we will just return empty map.
  2. For keys that do not exist in the db, we can add (key, null) to the map. So if all keys don't exist, we will return a non-empty map but all values will be null.

Either way, we will need to form some SLA contract with users on this behavior so they know what to expect with this API. I think 1) is probably better since it's easier to check how many results were returned by the call. WDYT about removing 404s altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upside is this approach is easier to implement and should not affect any existing downstream users. The downside is it will not be consistent with get behavior since we will not throw any 404s. And users need to be aware of the behavior we decide on when they use this API.

Comment on lines +208 to +210
errors.put(urnMap.get(entry.getKey()), new RestLiServiceException(HttpStatus.S_404_NOT_FOUND));
statuses.put(urnMap.get(entry.getKey()), HttpStatus.S_404_NOT_FOUND);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need both status and errors? Both seems tell user same thing: 404 not found.

Copy link
Contributor Author

@jsdonn jsdonn Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out. Ideally I think users can just use statuses, but BatchResult only allows constructors with either just data + errors or data + statuses + errors. I added logic to add 200s for successful results to the statuses map.

@zhixuanjia zhixuanjia merged commit 2d613e6 into linkedin:master Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants