[DOP-21732] Fix Oracle reading with partitioning_mode=hash #319

dolfinus · 2024-11-27T11:07:30Z

Change Summary

ORA_HASH(col, N) returns values from 0 to N inclusive (N+1 distinct values in total). Spark creates exactly N partitions, so the last partition receives twice as much data as each of the others.

Fixed by calling ORA_HASH(col, N-1). Other JDBC sources don't have this issue, because they use modulo, which always returns values from 0 to N-1.
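
The skew can be illustrated with a small simulation. This is only a sketch under stated assumptions: `ora_hash_like` is a hypothetical stand-in that mimics the output range of ORA_HASH (0 to max_bucket inclusive) and not its actual hash function, and the rule that the last partition absorbs every bucket at or above N-1 is assumed here to match the described behavior. It is not taken from the patched code.

```python
from collections import Counter


def ora_hash_like(value: int, max_bucket: int) -> int:
    # Hypothetical stand-in: only reproduces the output range of
    # ORA_HASH(expr, max_bucket), i.e. 0..max_bucket inclusive.
    return value % (max_bucket + 1)


def partition_sizes(values, num_partitions: int, max_bucket: int) -> list:
    # Assumption: the last partition is open-ended, so any bucket
    # >= num_partitions - 1 lands in it.
    counts = Counter(
        min(ora_hash_like(v, max_bucket), num_partitions - 1) for v in values
    )
    return [counts.get(p, 0) for p in range(num_partitions)]


N = 4
values = range(100_000)

before = partition_sizes(values, N, max_bucket=N)      # ORA_HASH(col, N)
after = partition_sizes(values, N, max_bucket=N - 1)   # ORA_HASH(col, N-1)

print(before)  # [20000, 20000, 20000, 40000]
print(after)   # [25000, 25000, 25000, 25000]
```

With N+1 buckets spread over N partitions, two buckets end up in the last partition; with N buckets, every partition gets exactly one.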

Related issue number

Checklist

Commit message and PR title are comprehensive
Keep the change as small as possible
Unit and integration tests for the changes exist
Tests pass on CI and coverage does not decrease
Documentation reflects the changes where applicable
docs/changelog/next_release/<pull request or issue id>.<change type>.rst file added describing change
(see CONTRIBUTING.rst for details.)
My PR is ready to review.

codecov · 2024-11-27T11:19:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.84%. Comparing base (8c39d1d) to head (e974d34).
Report is 1 commit behind head on develop.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop     #319      +/-   ##
===========================================
+ Coverage    91.67%   91.84%   +0.16%     
===========================================
  Files          225      225              
  Lines         9649     9649              
  Branches       987      987              
===========================================
+ Hits          8846     8862      +16     
+ Misses         608      593      -15     
+ Partials       195      194       -1

dolfinus self-assigned this Nov 27, 2024

dolfinus force-pushed the bugfix/DOP-21732 branch from f57defd to 02095e6 Compare November 27, 2024 11:09

dolfinus temporarily deployed to test-pypi November 27, 2024 11:09 — with GitHub Actions Inactive

[DOP-21732] Fix Oracle reading with partitioning_mode=hash

e974d34

dolfinus force-pushed the bugfix/DOP-21732 branch from 02095e6 to e974d34 Compare November 27, 2024 11:11

dolfinus temporarily deployed to test-pypi November 27, 2024 11:11 — with GitHub Actions Inactive

dolfinus requested review from TiGrib, maxim-lixakov and IlyasDevelopment November 27, 2024 11:12

dolfinus marked this pull request as ready for review November 27, 2024 11:12

IlyasDevelopment approved these changes Nov 27, 2024

dolfinus merged commit 83e6c80 into develop Nov 27, 2024
35 checks passed

dolfinus deleted the bugfix/DOP-21732 branch November 27, 2024 11:23
