[Enhancement] Only enable query context cache when there are more than one related mvs #54594

LiShuMing · 2025-01-02T09:32:56Z

Why I'm doing:

What I'm doing:

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
This is a backport pr

Bugfix cherry-pick branch check:

sonarqubecloud · 2025-01-02T09:39:20Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

starrocks-cr · 2025-01-02T09:40:00Z

fe/fe-core/src/main/java/com/starrocks/sql/optimizer/QueryMaterializationContext.java

-            return mvTimelinessInfos.get(mv);
-        }
+        return mvTimelinessInfos.computeIfAbsent(mv,
+                ignored -> MvRefreshArbiter.getMVTimelinessUpdateInfo(mv, true));
    }

    /**


The most risky bug in this code is:
The getQueryCacheStats() method potentially returns a null reference when the cache should have statistics.

You can modify the code like this:

public QueryCacheStats getQueryCacheStats() { return queryCacheStats; }

This change ensures that getQueryCacheStats() always returns an instance of QueryCacheStats, regardless of whether mvQueryContextCache is initialized or not.

starrocks-cr · 2025-01-02T09:40:06Z

fe/fe-core/src/main/java/com/starrocks/sql/optimizer/MvRewritePreprocessor.java

+
+                if (queryMaterializationContext.getValidCandidateMVs().size() > 1) {
+                    queryMaterializationContext.setEnableQueryContextCache(true);
+                }
            } catch (Exception e) {
                List<String> tableNames = queryTables.stream().map(Table::getName).collect(Collectors.toList());
                logMVPrepare(connectContext, "Prepare query tables {} for mv failed:{}", tableNames, e.getMessage());


The most risky bug in this code is:
There might be an overlooked scenario where the selectedRelatedMVs set remains empty after each filtering step, causing the method to return prematurely without executing subsequent processing steps. However, assuming no logical changes after empty checks, the overall flow could be unintended. There's potentially a redundant check of selectedRelatedMVs after filtering and before conversion, which might incorrectly lead to skips.

You can modify the code like this:

public void prepare(OptExpression queryOptExpression) { try { // 1. get related mvs for all input tables Set<MaterializedView> relatedMVs = getRelatedMVs(queryTables, context.getOptimizerConfig().isRuleBased()); if (relatedMVs.isEmpty()) { return; } // filter mvs which is set by config: including/excluding mvs Set<MaterializedView> selectedRelatedMVs = getRelatedMVsByConfig(relatedMVs); logMVPrepare(connectContext, "Choose {}/{} mvs after user config", selectedRelatedMVs.size(), relatedMVs.size()); // add into queryMaterializationContext for later use this.queryMaterializationContext.addRelatedMVs(selectedRelatedMVs); if (selectedRelatedMVs.isEmpty()) { return; } // Ensure selectedRelatedMVs is checked/handled only once for emptiness after being fully processed Set<MvWithPlanContext> mvWithPlanContexts; try (Timer t1 = Tracers.watchScope("MVChooseCandidates")) { selectedRelatedMVs = chooseBestRelatedMVs(queryTables, selectedRelatedMVs, queryOptExpression); if (selectedRelatedMVs.isEmpty()) { return; } // convert to mv with planContext try (Timer t2 = Tracers.watchScope("MVGenerateMvPlan")) { mvWithPlanContexts = getMvWithPlanContext(selectedRelatedMVs); } if (mvWithPlanContexts.isEmpty()) { return; } } // 4. process related mvs to candidates try (Timer t3 = Tracers.watchScope("MVPrepareRelatedMVs")) { prepareRelatedMVs(queryTables, mvWithPlanContexts); } if (queryOptExpression == null) { return; } // 5. apply view-based optimization on candidates try (Timer t4 = Tracers.watchScope("MVRewritePlan")) { processPlanWithView(queryMaterializationContext, connectContext, queryOptExpression, queryColumnRefFactory, requiredColumns); } if (queryMaterializationContext.getValidCandidateMVs().size() > 1) { queryMaterializationContext.setEnableQueryContextCache(true); } } catch (Exception e) { List<String> tableNames = queryTables.stream().map(Table::getName).collect(Collectors.toList()); logMVPrepare(connectContext, "Prepare query tables {} for mv failed:{}", tableNames, e.getMessage()); } }

starrocks-cr · 2025-01-02T09:40:09Z

fe/fe-core/src/main/java/com/starrocks/sql/common/ListPartitionDiffer.java

-            dstAtomMaps.putAll(srcDistinctAtoms.stream().collect(Collectors.toMap(x -> x, x -> srcItem)));
-            PListCell newValue =
-                    new PListCell(srcDistinctAtoms.stream().map(x -> x.getPartitionItem()).collect(Collectors.toList()));
-            result.put(key, newValue);
        }
        return result;
    }


The most risky bug in this code is:
There is a potential ClassCastException when casting PCell to PListCell without checking the type.

You can modify the code like this:

public static Map<String, PCell> diffList(Map<String, PCell> srcListMap, Map<String, PCell> dstListMap) { Map<String, PCell> result = Maps.newTreeMap(); boolean isDstContainMultiValues = dstListMap.values() .stream() .anyMatch(x -> (x instanceof PListCell) && ((PListCell) x).getPartitionItems().size() > 1); if (isDstContainMultiValues) { Map<PListAtom, PListCell> dstAtomMaps = Maps.newHashMap(); for (PCell cell : dstListMap.values()) { Preconditions.checkArgument(cell instanceof PListCell, "PListCell expected"); PListCell listCell = (PListCell) cell; listCell.toAtoms().forEach(x -> dstAtomMaps.put(x, listCell)); } for (Map.Entry<String, PCell> srcEntry : srcListMap.entrySet()) { String key = srcEntry.getKey(); PCell value = srcEntry.getValue(); if (!(value instanceof PListCell)) { continue; // Skip non-PListCell entries safely } PListCell srcItem = (PListCell) value; if (srcItem.equals(dstListMap.get(key))) { continue; } Set<PListAtom> srcAtoms = srcItem.toAtoms(); List<PListAtom> srcDistinctAtoms = srcAtoms.stream() .filter(x -> !dstAtomMaps.containsKey(x)) .collect(Collectors.toList()); if (srcDistinctAtoms.isEmpty()) { continue; } dstAtomMaps.putAll(srcDistinctAtoms.stream().collect(Collectors.toMap(x -> x, x -> srcItem))); PListCell newValue = new PListCell(srcDistinctAtoms.stream().map(x -> x.getPartitionItem()).collect(Collectors.toList())); result.put(key, newValue); } } else { for (Map.Entry<String, PCell> srcEntry : srcListMap.entrySet()) { String key = srcEntry.getKey(); PCell value = srcEntry.getValue(); if (!(value instanceof PListCell)) { continue; // Skip non-PListCell entries safely } PListCell srcItem = (PListCell) value; if (srcItem.equals(dstListMap.get(key))) { continue; } result.put(key, srcItem); } } return result; }

In this revised version, I've added checks to ensure that values are instances of PListCell before casting them. This prevents ClassCastException from occurring.

github-actions · 2025-01-02T10:14:41Z

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2025-01-02T10:18:03Z

[BE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

LiShuMing requested review from a team as code owners January 2, 2025 09:32

mergify bot assigned LiShuMing Jan 2, 2025

github-actions bot added the 3.4 label Jan 2, 2025

starrocks-cr bot reviewed Jan 2, 2025

View reviewed changes

LiShuMing closed this Jan 2, 2025

LiShuMing force-pushed the fix_mv_performance_v1 branch from f8be87b to 2bce91f Compare January 2, 2025 11:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Only enable query context cache when there are more than one related mvs #54594

[Enhancement] Only enable query context cache when there are more than one related mvs #54594

LiShuMing commented Jan 2, 2025 •

edited

Loading

sonarqubecloud bot commented Jan 2, 2025

starrocks-cr bot Jan 2, 2025

starrocks-cr bot Jan 2, 2025

starrocks-cr bot Jan 2, 2025

github-actions bot commented Jan 2, 2025

github-actions bot commented Jan 2, 2025

[Enhancement] Only enable query context cache when there are more than one related mvs #54594

[Enhancement] Only enable query context cache when there are more than one related mvs #54594

Conversation

LiShuMing commented Jan 2, 2025 • edited Loading

Why I'm doing:

What I'm doing:

What type of PR is this:

Checklist:

Bugfix cherry-pick branch check:

sonarqubecloud bot commented Jan 2, 2025

Quality Gate passed

starrocks-cr bot Jan 2, 2025

Choose a reason for hiding this comment

starrocks-cr bot Jan 2, 2025

Choose a reason for hiding this comment

starrocks-cr bot Jan 2, 2025

Choose a reason for hiding this comment

github-actions bot commented Jan 2, 2025

[Java-Extensions Incremental Coverage Report]

github-actions bot commented Jan 2, 2025

[BE Incremental Coverage Report]

LiShuMing commented Jan 2, 2025 •

edited

Loading