[Enhancement] Open or close interval need to be considered when using runtime filter to process zonemap. #54687

Merged: 2 commits, Jan 4, 2025

@trueeyu trueeyu commented Jan 3, 2025

Why I'm doing:

When the column used by a TopN runtime filter has low cardinality, the filter sometimes prunes poorly, so we need to take into account whether each bound of the filter's interval is open or closed.

What I'm doing:

Take open and closed interval bounds into account when a runtime filter is evaluated against the zonemap.

mysql> select count(*) from t_lineorder;
+----------+
| count(*) |
+----------+
| 20000000 |
+----------+
1 row in set (0.07 sec)

mysql> select count(distinct lo_linenumber) from t_lineorder;
+-------------------------------+
| count(DISTINCT lo_linenumber) |
+-------------------------------+
|                             1 |
+-------------------------------+
1 row in set (0.26 sec)

Before the pr: 3.02s

mysql> select * from t_lineorder order by lo_linenumber asc limit 5;
...
5 rows in set (3.02 sec)

After the pr: 0.69s

mysql> select * from t_lineorder order by lo_linenumber asc limit 5;
...
5 rows in set (0.69 sec)
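
The speedup above can be illustrated with a minimal sketch. It assumes that the TopN filter ends up with an open upper bound at the worst value it still keeps; that is an assumption about the filter's state, not something shown in this thread. `RangeFilterSketch`, `can_skip_zone`, and the concrete value `1` are hypothetical names and data chosen for illustration, not StarRocks code. When a column holds a single distinct value, every zone has min == max, so a closed bound can never prune a zone while an open bound can.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a min/max runtime filter. The flags mirror the
// idea in the PR, but this is not the StarRocks class.
struct RangeFilterSketch {
    int32_t min;
    int32_t max;
    bool left_closed;   // true: accepts x >= min, false: accepts x > min
    bool right_closed;  // true: accepts x <= max, false: accepts x < max
};

// Returns true when no value in [zone_min, zone_max] can pass the filter,
// so the zone can be skipped.
inline bool can_skip_zone(const RangeFilterSketch& f, int32_t zone_min, int32_t zone_max) {
    // Zone lies entirely below the filter's lower bound.
    if (f.left_closed ? zone_max < f.min : zone_max <= f.min) return true;
    // Zone lies entirely above the filter's upper bound.
    if (f.right_closed ? zone_min > f.max : zone_min >= f.max) return true;
    return false;
}

inline void run_low_cardinality_example() {
    // Assumed data: every row holds the value 1, so each zone has min == max == 1.
    const int32_t v = 1;
    RangeFilterSketch closed{INT32_MIN, v, true, true};  // accepts x <= 1
    RangeFilterSketch open{INT32_MIN, v, true, false};   // accepts x <  1
    assert(!can_skip_zone(closed, v, v));  // closed bound: the zone must be read
    assert(can_skip_zone(open, v, v));     // open bound: the zone is pruned
}
```

Calling `run_low_cardinality_example()` exercises both cases; the only difference between them is the `right_closed` flag.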

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@trueeyu trueeyu requested a review from a team as a code owner January 3, 2025 09:19
@mergify mergify bot assigned trueeyu Jan 3, 2025
@trueeyu trueeyu changed the title [Enhancment] Open or close interval need to be considered when using runtimeFilter to process zonemap. [Enhancment] Open or close interval need to be considered when using runtime filter to process zonemap. Jan 3, 2025
if (*min_value > _max) return true;
} else {
if (*min_value >= _max) return true;
}
return false;
}

The most risky bug in this code is:
Potential dereference of null pointer(s) min_value or max_value.

You can modify the code like this:

bool filter_zonemap_with_min_max(const CppType* min_value, const CppType* max_value) const {
    if (min_value == nullptr || max_value == nullptr) return false;
    if (_left_close_interval) {
        if (*max_value < _min) return true;
    } else {
        if (*max_value <= _min) return true;
    }
    if (_right_close_interval) {
        if (*min_value > _max) return true;
    } else {
        if (*min_value >= _max) return true;
    }
    return false;
}

Note: The modification guards against null pointers. Since null values are already checked at the start of the function, this is more a logical safeguard than a fix for an actual oversight. If any calling context not shown here can pass a null pointer, add or keep a null check before dereferencing these pointers there as well.
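
To make the suggested function easier to try in isolation, here is a self-contained sketch. `MinMaxFilterSketch` and its constructor are hypothetical; only the member names and the body of `filter_zonemap_with_min_max` follow the snippet above. It assumes that a `true` return means the zone can be skipped, and that a zone with missing statistics must be kept.

```cpp
#include <cassert>

// Hypothetical, simplified holder for the filter state. The member names
// follow the suggestion above; everything else is an assumption.
template <typename CppType>
class MinMaxFilterSketch {
public:
    MinMaxFilterSketch(CppType min, CppType max, bool left_close, bool right_close)
            : _min(min), _max(max),
              _left_close_interval(left_close),
              _right_close_interval(right_close) {}

    // true  -> the zone cannot match the filter and may be skipped
    // false -> the zone may match, or its min/max is unknown, so it is kept
    bool filter_zonemap_with_min_max(const CppType* min_value, const CppType* max_value) const {
        if (min_value == nullptr || max_value == nullptr) return false;
        if (_left_close_interval) {
            if (*max_value < _min) return true;
        } else {
            if (*max_value <= _min) return true;
        }
        if (_right_close_interval) {
            if (*min_value > _max) return true;
        } else {
            if (*min_value >= _max) return true;
        }
        return false;
    }

private:
    CppType _min;
    CppType _max;
    bool _left_close_interval;
    bool _right_close_interval;
};

inline void run_null_guard_example() {
    MinMaxFilterSketch<int> filter(10, 20, true, false);  // accepts 10 <= x < 20
    int lo = 20, hi = 30;
    assert(filter.filter_zonemap_with_min_max(&lo, &hi));       // zone starts at the open bound
    assert(!filter.filter_zonemap_with_min_max(nullptr, &hi));  // unknown min: keep the zone
    int lo2 = 5, hi2 = 10;
    assert(!filter.filter_zonemap_with_min_max(&lo2, &hi2));    // 10 passes the closed left bound
}
```

Returning `false` for a null pointer is the conservative choice here: the zone is read rather than skipped on incomplete statistics.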

Contributor Author

I have already added the null-pointer checks before the dereference.

@trueeyu trueeyu changed the title [Enhancment] Open or close interval need to be considered when using runtime filter to process zonemap. [Enhancement] Open or close interval need to be considered when using runtime filter to process zonemap. Jan 3, 2025
Signed-off-by: trueeyu <[email protected]>
stdpain commented Jan 3, 2025

Need to add a benchmark case.

@satanson satanson enabled auto-merge (squash) January 3, 2025 12:56

github-actions bot commented Jan 4, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Jan 4, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Jan 4, 2025

[BE Incremental Coverage Report]

fail : 4 / 6 (66.67%)

file detail

path                              covered_line  new_line  coverage  not_covered_line_detail
🔵 be/src/exprs/runtime_filter.h  4             6         66.67%    [780, 785]

@andyziye andyziye disabled auto-merge January 4, 2025 15:53
@andyziye andyziye merged commit defc9c3 into StarRocks:main Jan 4, 2025
52 of 53 checks passed