Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-6920][CORE][VL] New APIs and refactors to allow different backends / components to be registered and used #8143

Merged
merged 29 commits into from
Dec 6, 2024

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Dec 4, 2024

This is the major API layer change for #6920.

The PR introduces a new concept Component int Gluten and adjusts the role of current concept Backend.

After the change, Backend API will become a special variant of Component API. Details are as below:

  1. Dependency
    • Component should have at least one parent. Each parent can either be another Component or Backend.
    • Backend doesn't have any parents.
  2. Component and Backend instances are used as exactly the same manner in Gluten. Gluten treats them all as regular entities that inject custom code into Gluten.
  3. Having Component's parents defined, Gluten will figure out a fixed order / priority of using the components. For exmaple:
    • A component's parent's loading APIs, namely onDriverStart / onExecutorStart, will be guaranteed to call EARLIER than the same APIs of the child.
    • A component's parent's rule injection API, namely injectRules, will be guaranteed to call LATER than the same API of the child.
  4. For legacy reason, only one SubstraitBackend will be allowed to register at the same time. Which means, CH backend and Velox backend are still not allowed to load at the same time. When they are both loaded, error will be thrown. See code.

In subsequent PRs of #6920, we will:

  1. Make iceberg / hudi / delta implement component API, remove the previous rule injection points used by their code.
  2. Make uniffle / celeborn implement component API, remove the previous rule injection points by used their code.
  3. Add an example backend that can work together with Velox backend.

Non-goals:

  1. This doesn't tend to make Velox backend and CH backend work together.

@github-actions github-actions bot added CORE works for Gluten Core VELOX RSS CLICKHOUSE labels Dec 4, 2024
Copy link

github-actions bot commented Dec 4, 2024

#6920

Copy link

github-actions bot commented Dec 4, 2024

Run Gluten Clickhouse CI on x86

1 similar comment
Copy link

github-actions bot commented Dec 4, 2024

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer changed the title [GLUTEN-6920][CORE][VL] Allow different backends / components to be registered and used [GLUTEN-6920][CORE][VL] APIs to allow different backends / components to be registered and used Dec 4, 2024
@zhztheplayer zhztheplayer changed the title [GLUTEN-6920][CORE][VL] APIs to allow different backends / components to be registered and used [GLUTEN-6920][CORE][VL] New APIs and refactors to allow different backends / components to be registered and used Dec 4, 2024
Copy link

github-actions bot commented Dec 4, 2024

Run Gluten Clickhouse CI on x86

3 similar comments
Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer marked this pull request as ready for review December 5, 2024 01:59
Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

3 similar comments
Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 5, 2024

Run Gluten Clickhouse CI on x86

@PHILO-HE
Copy link
Contributor

PHILO-HE commented Dec 5, 2024

Non-goals:
This doesn't tend to make Velox backend and CH backend work together.

This is for clarifying the non-goal of this pr or mixed backend work? According to #6920, it seems making different backends co-work at runtime is the long-term goal. Just a bit confused.

@zhztheplayer
Copy link
Member Author

Non-goals:
This doesn't tend to make Velox backend and CH backend work together.

This is for clarifying the non-goal of this pr or mixed backend work. According to #6920, it seems making different backends co-work at runtime is the long-term goal. Just a bit confused.

Good question, thanks.

Velox / CH may be possible to work together but it may require for considerable effort on refactoring. I am not sure whether any user is interested with this kind of combination now so I didn't intentionally put it as a goal for this PR (will make a note in issue #6920 too). I think we can choose to have another individual thread for VL+CH working together when it's proved to be worth to have our investment.

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some trivial comments. Thanks!


/**
* The base API to inject user-defined logic to Gluten. To register a component, its implementations
* should be placed to Gluten's classpath with a Java service file. Gluten will discover all the
Copy link
Contributor

@PHILO-HE PHILO-HE Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the implementation of external component may depend on this trait in its codebase? Maybe, better to place such interface code in a new module, then it can be introduced into external project as a dependency, instead of unnecessarily depending on gluten-core.

If it makes sense, it's ok to do this in the future, considering we are still in the early stage to support external component.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, better to place such interface code in a new module

If you think the APIs can be isolated from concrete implementations, this is actually what I have been doing for long time on the module gluten-core. You can refer to gluten-core's latest code structure here to see if we are aligned already.

gluten-core will become the thin bridge between Gluten backends and Spark's plugin API, with all the predefined arch level features including the enhanced query optimizer and other runtime essentials.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhztheplayer, I see. We are on the same page.

* requirement from component A to component B.
*/
// format: on
def sorted(): Seq[Component] = synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does cyclic dependency possibly happen? It would be better to leave some comments to clarify it, e.g., cyclic dependency is not allowed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just note error will be reported for cyclic case. Please document it at above.

Copy link
Member Author

@zhztheplayer zhztheplayer Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks and I've moved/added documents to the public API Component.sorted. This one is internal.

private class Graph {
import Graph._
private val registry: Registry = new Registry()
private val requirements: mutable.Buffer[(Int, Class[_ <: Component])] = mutable.Buffer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use depend/dependency to describe the requirement concept? Not quite sure which one is logically clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do parent / requirement sound really unclear? If so I'd make them dependency which is also OK to me.

Copy link

github-actions bot commented Dec 6, 2024

Run Gluten Clickhouse CI on x86

Copy link

github-actions bot commented Dec 6, 2024

Run Gluten Clickhouse CI on x86

*
* 1. [component-B, component-A, component-C]
* 2. [component-C, component-B, component-A]
* 3. [component-B, component-C, component-A]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions need clarification from my superficial understanding.

  1. The class loading order potentially decides the occurrence of any one of the above ordered legal combination?

  2. If A depends on both B & C and there is no dependency between B and C, I assume the above 2nd/3rd combinations are acceptable. Is it possible that they produce different behaviors? For example, both B and C override some rule in A. The order of B and C will determine which one actually overrides that rule?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class loading order potentially decides the occurrence of any one of the above ordered legal combination?

That's possible so far. We could introduce some kind of configurations to determine on a fixed load order but that's messing things up so we should only do that when really necessary.

For example, both B and C override some rule in A. The order of B and C will determine which one actually overrides that rule?

If A depends on B & C, then it means A could override rules in B or C, not the opposite. Say if rule in A tends to create AScanExec while rules in B / C tend to create BScanExec / CScanExec, then A's rule will be applied in higher precedence so AScanExec will be eventually used.

This is the case in heuristic query planner. If RAS planner is used, the rules are loaded with exactly the same importance. Using which one is subject to the cost model.

Copy link
Contributor

@PHILO-HE PHILO-HE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

import java.util.concurrent.atomic.AtomicBoolean;

import scala.runtime.BoxedUnit;

// Initialize native backend before calling any native methods from Java side.
public final class NativeBackendInitializer {
private static final Logger LOG = LoggerFactory.getLogger(NativeBackendInitializer.class);
private static final AtomicBoolean initialized = new AtomicBoolean(false);
private static final Map<String, NativeBackendInitializer> instances = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since This doesn't tend to make Velox backend and CH backend work together, why we need a backend map here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not tended actually. It's just because the module gluten-arrow can no longer get a precise backend name because it's decoupled with gluten-substrait so it may see multiple backends loaded. There is another option to move this class to backends-velox so no such change is needed.

@zhztheplayer
Copy link
Member Author

I am merging this. Let me know if any other questions @jackylee-ch

Thanks for you guys' review!

@zhztheplayer zhztheplayer merged commit 232ce55 into apache:main Dec 6, 2024
47 of 48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants