```python
import duckdb

from splink import DuckDBAPI, Linker, SettingsCreator, splink_datasets

con = duckdb.connect()
db_api = DuckDBAPI(connection=con)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

linker = Linker(df, settings, db_api, input_table_aliases=["mytable"])

# Constructing a second Linker with the same alias raises the error below,
# because the first Linker already registered df under that name
linker = Linker(df, settings, db_api, input_table_aliases=["mytable"])

# List the tables in the duckdb database
# con.sql("show tables").show()
```
Results in:

```
ValueError: Table(s): mytable already exists in database. Please remove or rename before retrying
```
Because `df` is registered as a table called `mytable`.
Probably we need to divorce the aliases (which control the values in `source_dataset`) from the registered name.
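For illustration, one way to decouple the two would be to register each input frame under a generated internal name and keep the user-facing alias purely as metadata. This is just a sketch of the idea using plain DuckDB; the naming scheme and metadata dict below are hypothetical, not Splink's actual internals:

```python
import uuid

import duckdb
import pandas as pd

con = duckdb.connect()
df = pd.DataFrame({"unique_id": [1, 2], "first_name": ["a", "b"]})

# Register the frame under a generated internal name so it can never collide
# with an existing table, while the alias stays metadata-only
# (e.g. used to populate the source_dataset column)
alias = "mytable"
internal_name = f"__splink__input_{uuid.uuid4().hex[:8]}"  # illustrative naming only
con.register(internal_name, df)

print({"alias": alias, "registered_as": internal_name})
con.sql("show tables").show()
```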
Possible workaround:
```python
import duckdb

from splink import DuckDBAPI, Linker, SettingsCreator, splink_datasets

con = duckdb.connect()
db_api = DuckDBAPI(connection=con)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

# Register the table explicitly and pass its name to the Linker,
# so the alias no longer doubles as the registered table name
con.register("mytable", df)

linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
```
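With this workaround you can confirm that only the explicitly registered table exists on the connection (before the linker starts creating its own intermediate tables):

```python
# List the tables currently registered on the DuckDB connection
con.sql("show tables").show()
```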
I modified the above workaround for PySpark configurations; this should work for Spark installations:
```python
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

from splink import SparkAPI, Linker, SettingsCreator, splink_datasets

conf = SparkConf()
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
db_api = SparkAPI(spark_session=spark)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

db_api.register_table("mytable", df)

linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
```
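A similar sanity check for the Spark variant, assuming `register_table` registers the data as a temporary view in the current catalog:

```python
# List registered tables/views to confirm "mytable" is present exactly once
for table in spark.catalog.listTables():
    print(table.name, table.tableType)
```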