[CH-436] support columns with different nullable type when split for union #437

Open
wants to merge 765 commits into
base: clickhouse_backend

shuai-xu

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in official stable or prestable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR supports splitting blocks whose columns have different nullable types, since the blocks may come from two different streams of a union.
It fixes #436.

Felixoid and others added 30 commits April 4, 2022 14:15
…ash_v2

One more try to resurrect build hash
Backport ClickHouse#35733 to 22.3: Added settings for insert of invalid IPv6, IPv4 values
Backport ClickHouse#35820 to 22.3: Avoid processing per-column TTL multiple times
- Allow define version as file
- Add inline cache
- Fix auto_release_type function
exmy and others added 24 commits March 21, 2023 10:35
…ence#354)

ShuffleSplitter improvement: support multiple subdirs
Support full join with join condition

Co-authored-by: shuai.li <[email protected]>
Support Decimal type in Gluten 
Co-authored-by: shuai.li <[email protected]>
…like Column 'deviceid' is not presented in input data (Kyligence#388)
@kyligence-git
Collaborator

Can one of the admins verify this patch?

{
// For a union, the column types may differ between the two streams: one is nullable, the other is not.
std::string l_name = typeid(*accumulated_columns[i]).name();
std::string r_name = typeid(*block.getByPosition(i).column).name();


It is better to use checkAndGetColumn instead of typeid().name().
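
To make the suggestion concrete, here is a minimal, self-contained sketch of a typed check replacing a comparison of `typeid(...).name()` strings, whose contents are implementation-defined. It does not use the ClickHouse headers: `IColumn`, `ColumnInt32`, `ColumnNullable`, `checkAndGet` and `isNullable` below are simplified, hypothetical stand-ins that only mimic the shape of a typed downcast helper.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-ins for a column hierarchy; not the real ClickHouse classes.
struct IColumn
{
    virtual ~IColumn() = default;
};

struct ColumnInt32 : IColumn
{
    std::vector<int> data;
};

struct ColumnNullable : IColumn
{
    std::shared_ptr<IColumn> nested;
    std::vector<bool> null_map;
};

// Returns a typed pointer if `column` is a T (or derived from T), otherwise nullptr.
// This stand-in uses dynamic_cast; it is only an illustration of the idea.
template <typename T>
const T * checkAndGet(const IColumn * column)
{
    return dynamic_cast<const T *>(column);
}

// A nullability check expressed through the typed helper instead of type-name strings.
inline bool isNullable(const IColumn & column)
{
    return checkAndGet<ColumnNullable>(&column) != nullptr;
}
```

With a helper of this shape, the branch condition becomes a pointer test such as `isNullable(*accumulated_columns[i])`, and the typed pointer is available for the work inside the branch.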


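auto l_type = accumulated_columns[i]->getDataType();
auto r_type = block.getByPosition(i).column->getDataType();

if (l_type == r_type)
{
    // both sides have the same type: ...
}
else if (l_type == TypeIndex::Nullable)
{
    // only the accumulated column is nullable: ...
}
else if (r_type == TypeIndex::Nullable)
{
    // only the incoming column is nullable: ...
}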
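
One way the three branches could be filled in is sketched below. This is an assumption for illustration, not the code in this PR: `SimpleColumn`, `makeNullable` and `appendColumn` are hypothetical names, and the column is reduced to a vector of values plus an optional null map so the example stays self-contained.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical simplified column: values plus an optional null map.
// The column counts as nullable exactly when null_map holds a value.
struct SimpleColumn
{
    std::vector<int> values;
    std::optional<std::vector<bool>> null_map;

    bool isNullable() const { return null_map.has_value(); }
};

// Promote a non-nullable column to nullable: every existing row is marked non-null.
void makeNullable(SimpleColumn & column)
{
    if (!column.isNullable())
        column.null_map = std::vector<bool>(column.values.size(), false);
}

// Append `incoming` to `accumulated`, reconciling nullability between the two sides.
void appendColumn(SimpleColumn & accumulated, const SimpleColumn & incoming)
{
    const std::size_t incoming_rows = incoming.values.size();

    // Only the incoming side is nullable: promote the accumulated column first,
    // while its null map can still be sized from the rows already stored.
    if (!accumulated.isNullable() && incoming.isNullable())
        makeNullable(accumulated);

    accumulated.values.insert(accumulated.values.end(), incoming.values.begin(), incoming.values.end());

    if (accumulated.isNullable())
    {
        if (incoming.isNullable())
            // Both nullable: copy the incoming null map as-is.
            accumulated.null_map->insert(
                accumulated.null_map->end(), incoming.null_map->begin(), incoming.null_map->end());
        else
            // Only the accumulated side is nullable: incoming rows are all non-null.
            accumulated.null_map->insert(accumulated.null_map->end(), incoming_rows, false);
    }
}
```

In this sketch the accumulated column only ever moves from non-nullable to nullable, never back, so the result type is the wider of the two inputs.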

@lgbo-ustc

lgbo-ustc commented Apr 17, 2023

The main problem here is that the first block that comes into ColumnBuffer could be non-nullable while later blocks are nullable, and the real final schema should be nullable.

You could assume that once we meet a nullable column, all results should be nullable. But we may have already spilled some blocks with non-nullable columns to the next stage before we meet the first block with a nullable column.
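
The ordering problem described above can be reproduced with a small, self-contained sketch. Everything in it is hypothetical: `BlockInfo` and `SpillingBuffer` are illustrative names, a block is reduced to a row count and a nullability flag, and the spill threshold is an assumed parameter rather than anything taken from the actual ColumnBuffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical block: a row count and whether its single column is nullable.
struct BlockInfo
{
    std::size_t rows;
    bool nullable;
};

// Hypothetical buffer that becomes nullable once any nullable block is seen
// and spills whenever at least `spill_threshold` rows have accumulated.
struct SpillingBuffer
{
    explicit SpillingBuffer(std::size_t threshold) : spill_threshold(threshold) {}

    std::size_t spill_threshold;
    std::size_t buffered_rows = 0;
    bool seen_nullable = false;
    std::vector<bool> spilled_nullability; // nullability of each spilled batch, in order

    void add(const BlockInfo & block)
    {
        seen_nullable = seen_nullable || block.nullable;
        buffered_rows += block.rows;
        if (buffered_rows >= spill_threshold)
        {
            // The batch is written with whatever nullability is known so far.
            spilled_nullability.push_back(seen_nullable);
            buffered_rows = 0;
        }
    }
};
```

Feeding this buffer a non-nullable block large enough to trigger a spill, followed by a nullable block, leaves the first spilled batch non-nullable and the later ones nullable, which is the schema mismatch the next stage would see.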

@lwz9103 lwz9103 force-pushed the clickhouse_backend branch 2 times, most recently from dc60d55 to 8066113 Compare May 26, 2023 03:43
Development

Successfully merging this pull request may close these issues.

Columns of union may be diffferent in type