[GLUTEN-7028][CH][Part-5] Refactor: add NativeOutputWriter to unify CHDatasourceJniWrapper #7395

Merged
merged 11 commits into from
Oct 9, 2024

@baibaichen baibaichen commented Sep 30, 2024

What changes were proposed in this pull request?

(Fixes: #7028) This is the last refactor PR. It unifies how information is passed in the Spark 3.3 and 3.5 code paths.

  1. Add NativeOutputWriter so that we can unify CHDatasourceJniWrapper:
NativeOutputWriter
   | - NormalFileWriter           --> for file-based parquet and orc
   | - SparkMergeTreeWriter  --> for mergetree, based on ClickHouse storage
  2. Use Configuration to pass config from the driver to the workers. This is the standard way Spark does it, and it lets us use the same CHMergeTreeWriterInjects::createOutputWriter definition.
  3. Use WriteRel to pass info from the JVM to C++; see the data structure below. The optimization field is of type Any, and the message Write is added in write_optimization.proto.
WriteRel
   |- tableSchema
   |- namedTable
   |--- advancedExtension
   |----- optimization : Any => Write


message Write {
  message Common {
    string format = 1;
  }
  message ParquetWrite{}
  message OrcWrite{}
  message MergeTreeWrite{
   // ...
  }

  Common common = 1;
  oneof file_format {
    ParquetWrite parquet = 2;
    OrcWrite orc = 3;
    MergeTreeWrite mergetree = 4;
  }
}
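
To illustrate how the file_format oneof could select one of the two writers, here is a minimal C++ sketch. It is not the code from this PR: the structs are hand-written stand-ins for the generated protobuf classes, the name() method and the createWriter factory are hypothetical, and only the names NativeOutputWriter, NormalFileWriter and SparkMergeTreeWriter come from the description above.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for the messages in write_optimization.proto;
// the real generated protobuf classes are not used here.
struct ParquetWrite {};
struct OrcWrite {};
struct MergeTreeWrite {};

struct Write
{
    std::string format;  // mirrors Write.Common.format
    // Mirrors `oneof file_format`; std::monostate means "not set".
    std::variant<std::monostate, ParquetWrite, OrcWrite, MergeTreeWrite> file_format;
};

// Common interface named after the PR's NativeOutputWriter.
// The method set is an assumption made for illustration.
class NativeOutputWriter
{
public:
    virtual ~NativeOutputWriter() = default;
    virtual std::string name() const = 0;
};

// File-based writer, shared by parquet and orc.
class NormalFileWriter : public NativeOutputWriter
{
public:
    explicit NormalFileWriter(std::string format) : format_(std::move(format)) {}
    std::string name() const override { return "NormalFileWriter(" + format_ + ")"; }

private:
    std::string format_;
};

// Writer for mergetree output.
class SparkMergeTreeWriter : public NativeOutputWriter
{
public:
    std::string name() const override { return "SparkMergeTreeWriter"; }
};

// Hypothetical factory: picks a writer from whichever oneof case is set.
std::unique_ptr<NativeOutputWriter> createWriter(const Write & write)
{
    if (std::holds_alternative<ParquetWrite>(write.file_format))
        return std::make_unique<NormalFileWriter>("parquet");
    if (std::holds_alternative<OrcWrite>(write.file_format))
        return std::make_unique<NormalFileWriter>("orc");
    if (std::holds_alternative<MergeTreeWrite>(write.file_format))
        return std::make_unique<SparkMergeTreeWriter>();
    return nullptr;  // file_format not set
}
```

With this shape, callers only hold a NativeOutputWriter pointer, so the JNI layer does not need separate entry points per format.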

How was this patch tested?

UTs

#7028

Run Gluten Clickhouse CI

github-actions bot commented Oct 4, 2024

Run Gluten Clickhouse CI

@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from 9a13f5f to def1e47 Compare October 8, 2024 01:53
@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from def1e47 to 552e819 Compare October 8, 2024 09:20
@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from 552e819 to c139b8d Compare October 8, 2024 16:51
@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from c139b8d to ef03755 Compare October 9, 2024 01:20
@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from ef03755 to 17bb049 Compare October 9, 2024 03:35
@baibaichen baibaichen marked this pull request as ready for review October 9, 2024 03:35
@baibaichen baibaichen force-pushed the feature/one-pipeline-native_out branch from 17bb049 to 898498e Compare October 9, 2024 06:46
@baibaichen baibaichen changed the title from "[GLUTEN-7028][CH][Part-5]" to "[GLUTEN-7028][CH][Part-5] Refactor: add NativeOutputWriter to unify CHDatasourceJniWrapper" Oct 9, 2024
@baibaichen baibaichen merged commit 5d28de6 into apache:main Oct 9, 2024
11 checks passed
@baibaichen baibaichen deleted the feature/one-pipeline-native_out branch October 9, 2024 09:38
baibaichen added a commit to Kyligence/gluten that referenced this pull request Oct 15, 2024
(cherry picked from commit 94e1837a922d5a092226b195d6c3079d320878cb)
baibaichen added a commit that referenced this pull request Oct 15, 2024
* [GLUTEN-1632][CH]Daily Update Clickhouse Version (20241015)

* Fix Build due to ClickHouse/ClickHouse#70135

* Resolve conflict with #7322

* gtest skip since plan is changed due to #7395

(cherry picked from commit 94e1837a922d5a092226b195d6c3079d320878cb)

---------

Co-authored-by: kyligence-git <[email protected]>
Co-authored-by: Chang Chen <[email protected]>
sharkdtu pushed a commit to sharkdtu/gluten that referenced this pull request Nov 11, 2024
…HDatasourceJniWrapper (apache#7395)

* Add NativeOutputWriter

* refactor CHDatasourceJniWrapper

* WriteConfiguration

* using hadoop Configuration to pass parameter

* Implement CHMergeTreeWriterInjects::createNativeWrite

* Rename datasources.clickhouse.ClickhouseMetaSerializer => datasources.mergetree.MetaSerializer

* delete MergeTreeDeltaUtil and move its functionality to StorageMeta

* WriteConfiguration => StorageConfigProvider

* fix prefixof

* WriteConfiguration => StorageConfigProvider 2

* withStorageID
Development

Successfully merging this pull request may close these issues.

[CH] Fully Support writing parquet and mergetree in spark 3.5.x with delta protocol