Setting dataset_table_separator to _ throws an error. #1998

Open
ahnazary opened this issue Oct 28, 2024 · 0 comments

ahnazary commented Oct 28, 2024

dlt version

1.2.0

Describe the problem

I am trying to build a pipeline that moves data from Postgres to ClickHouse. Setting dataset_table_separator to a single underscore (_) throws an exception. I set dataset_table_separator as follows:

dlt.secrets["destination.clickhouse.dataset_table_separator"] = "_"

Here is the full error message:

dlt.pipeline.exceptions.PipelineStepFailed: Pipeline execution failed at stage load when processing package 1729763674.409475 with exception:

<class 'dlt.destinations.exceptions.DatabaseTerminalException'>
Code: 57.
DB::Exception: Table dev.dlt_dlt_sentinel_table already exists. Stack trace:

0. Poco::Exception::Exception(String const&, int)
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool)
2. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&)
3. DB::InterpreterCreateQuery::doCreateTable(DB::ASTCreateQuery&, DB::InterpreterCreateQuery::TableProperties const&, std::unique_ptr<DB::DDLGuard, std::default_delete<DB::DDLGuard>>&)
4. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&)
5. DB::InterpreterCreateQuery::execute()
6. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
7. DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, std::shared_ptr<DB::Context>, std::function<void (DB::QueryResultDetails const&)>, DB::QueryFlags, std::optional<DB::FormatSettings> const&, std::function<void (DB::IOutputFormat&)>)
8. DB::DDLWorker::tryExecuteQuery(DB::DDLTaskBase&, std::shared_ptr<zkutil::ZooKeeper> const&)
9. DB::DDLWorker::processTask(DB::DDLTaskBase&, std::shared_ptr<zkutil::ZooKeeper> const&)
10. DB::DatabaseReplicatedDDLWorker::tryEnqueueAndExecuteEntry(DB::DDLLogEntry&, std::shared_ptr<DB::Context const>)
11. DB::DatabaseReplicated::tryEnqueueReplicatedDDL(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const>, DB::QueryFlags)
12. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&)
13. DB::InterpreterCreateQuery::execute()
14. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*)
15. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum)
16. DB::TCPHandler::runImpl()
17. DB::TCPHandler::run()
18. Poco::Net::TCPServerConnection::start()
19. Poco::Net::TCPServerDispatcher::run()
20. Poco::PooledThread::run()
21. Poco::ThreadImpl::runnableEntry(void*)
22. ?
23. ?

From what I understand, dlt tries to create the dlt_dlt_sentinel_table table twice during pipeline.run(). Using different combinations of values for dataset_sentinel_table_name and dataset_name did not work either.
Is this intended behaviour?

Setting the dataset_table_separator in the secrets.toml file results in the same error too.
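
To make the table name in the error easier to follow, here is a minimal sketch of how it relates to the configured values. It assumes the physical table name is simply dataset_name + separator + table_name. compose_table_name is a hypothetical helper written for illustration, not part of the dlt API, and the "___" value is what I understand the default separator to be.

```python
def compose_table_name(dataset_name: str, separator: str, table_name: str) -> str:
    """Hypothetical helper: join dataset and table name with the separator."""
    return f"{dataset_name}{separator}{table_name}"


# Values from this report: dataset "dlt", separator "_", sentinel "dlt_sentinel_table".
with_underscore = compose_table_name("dlt", "_", "dlt_sentinel_table")

# Same names with a triple-underscore separator (assumed default), for comparison.
with_default = compose_table_name("dlt", "___", "dlt_sentinel_table")

print(with_underscore)  # dlt_dlt_sentinel_table
print(with_default)     # dlt___dlt_sentinel_table
```

The first value matches the table named in the DB::Exception message (dev.dlt_dlt_sentinel_table). The sketch only shows where the name comes from; it does not explain why the table is created twice.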

Expected behavior

The pipeline should run and move data from Postgres into a ClickHouse table, with _ as the separator between dataset_name and table_name.

Steps to reproduce

Here is a code snippet to recreate the issue:

import dlt
from dlt.sources.sql_database import sql_table  # only sql_table is used below

dlt.secrets["destination.clickhouse.dataset_table_separator"] = "_"
dlt.secrets["destination.clickhouse.table_engine_type"] = "merge_tree"
dlt.secrets["destination.clickhouse.dataset_sentinel_table_name"] = "dlt_sentinel_table"

pipeline = dlt.pipeline(
    pipeline_name="dummy_pipeline_name",
    destination="clickhouse",
    dataset_name="dlt",
)
table = sql_table(table="dummy_table_name")

info = pipeline.run(table)

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

Postgres

dlt destination

Clickhouse

Other deployment details

No response

Additional information

No response

@ahnazary ahnazary changed the title Settong dataset_table_separator to _ throws an error. Setting dataset_table_separator to _ throws an error. Oct 28, 2024
@rudolfix rudolfix added the bug Something isn't working label Nov 23, 2024
@rudolfix rudolfix moved this from Todo to Planned in dlt core library Nov 23, 2024
@rudolfix rudolfix moved this from Planned to In Progress in dlt core library Nov 29, 2024
@rudolfix rudolfix self-assigned this Nov 29, 2024