-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Use more clear type description in ParquetField #52575
[Enhancement] Use more clear type description in ParquetField #52575
Conversation
Signed-off-by: Smith Cruise <[email protected]>
#if defined(ADDRESS_SANITIZER) || defined(LEAK_SANITIZER) || defined(THREAD_SANITIZER) | ||
LOG(INFO) << "Memory tracking is not available with address sanitizer builds."; | ||
#else | ||
size_t value = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix asan compile warning: unused values
Signed-off-by: Smith Cruise <[email protected]>
Signed-off-by: Smith Cruise <[email protected]>
@@ -262,13 +259,7 @@ Status SchemaDescriptor::group_to_struct_field(const std::vector<tparquet::Schem | |||
field->name = group_schema->name; | |||
field->is_nullable = is_optional(group_schema); | |||
field->level_info = cur_level_info; | |||
field->type.type = TYPE_STRUCT; | |||
for (size_t i = 0; i < num_children; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never used children
be/src/formats/parquet/schema.h
Outdated
@@ -89,12 +89,29 @@ struct LevelInfo { | |||
std::string debug_string() const; | |||
}; | |||
|
|||
enum ColumnType { SCALAR = 0, ARRAY, MAP, STRUCT }; | |||
|
|||
inline std::string column_type_to_string(const ColumnType& column_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because this function is not necessary to be inlined for performance. It will be good for compile performance if you move it into cpp files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
be/src/formats/parquet/schema.h
Outdated
@@ -118,6 +135,19 @@ struct ParquetField { | |||
int16_t max_def_level() const { return level_info.max_def_level; } | |||
int16_t max_rep_level() const { return level_info.max_rep_level; } | |||
std::string debug_string() const; | |||
bool is_complex_type() const { return type == ARRAY || type == MAP || type == STRUCT; } | |||
bool has_same_complex_type(const TypeDescriptor& type_descriptor) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Smith Cruise <[email protected]>
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 42 / 49 (85.71%) file detail
|
@Mergifyio backport branch-3.3 |
@Mergifyio backport branch-3.2 |
✅ Backports have been created
|
✅ Backports have been created
|
Signed-off-by: Smith Cruise <[email protected]> (cherry picked from commit 1028b6a) # Conflicts: # be/src/formats/parquet/column_reader.cpp # be/src/formats/parquet/group_reader.cpp # be/src/formats/parquet/meta_helper.cpp # be/src/formats/parquet/schema.cpp # be/test/formats/parquet/group_reader_test.cpp
Signed-off-by: Smith Cruise <[email protected]> (cherry picked from commit 1028b6a) # Conflicts: # be/src/formats/parquet/column_reader.cpp # be/src/formats/parquet/group_reader.cpp # be/src/formats/parquet/schema.cpp # be/test/formats/parquet/group_reader_test.cpp
…n in ParquetField (backport #52575) (#52635) Signed-off-by: Smith Cruise <[email protected]>
…n in ParquetField (backport #52575) (#52642) Signed-off-by: Smith Cruise <[email protected]>
…cks#52575) Signed-off-by: Smith Cruise <[email protected]> Signed-off-by: zhiminr.ren <[email protected]>
Why I'm doing:
In the previous code, we used TypeDescriptor to represent type in ParquetField.
But actually, we will only set TypeDescriptor when ParquetField is a complex type, for scalar type, we will set a random value for it. it will easily mislead programmers.
What I'm doing:
Use a more strict type
ColumnType
to replace TypeDescriptor.What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: