```python
import duckdb

from splink import DuckDBAPI, Linker, SettingsCreator, splink_datasets

con = duckdb.connect()
db_api = DuckDBAPI(connection=con)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

linker = Linker(df, settings, db_api, input_table_aliases=["mytable"])

# Constructing a second Linker with the same alias raises the error below,
# because the first Linker already registered df under that name
linker = Linker(df, settings, db_api, input_table_aliases=["mytable"])

# List the tables in the duckdb database
# con.sql("show tables").show()
```
Results in:

```
ValueError: Table(s): mytable already exists in database. Please remove or rename before retrying
```
Because `df` is registered as a table called `mytable`.
Probably we need to divorce the aliases (which control the values in `source_dataset`) from the registered name.
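For illustration, one way to decouple the two would be to register each input frame under a generated internal name and keep the user-facing alias purely as metadata. This is just a sketch of the idea using plain DuckDB; the naming scheme and metadata dict below are hypothetical, not Splink's actual internals:

```python
import uuid

import duckdb
import pandas as pd

con = duckdb.connect()
df = pd.DataFrame({"unique_id": [1, 2], "first_name": ["a", "b"]})

# Register the frame under a generated internal name so it can never collide
# with an existing table, while the alias stays metadata-only
# (e.g. used to populate the source_dataset column)
alias = "mytable"
internal_name = f"__splink__input_{uuid.uuid4().hex[:8]}"  # illustrative naming only
con.register(internal_name, df)

print({"alias": alias, "registered_as": internal_name})
con.sql("show tables").show()
```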
Possible workaround:
```python
import duckdb

from splink import DuckDBAPI, Linker, SettingsCreator, splink_datasets

con = duckdb.connect()
db_api = DuckDBAPI(connection=con)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

# Register the table explicitly and pass its name to the Linker,
# so the alias no longer doubles as the registered table name
con.register("mytable", df)

linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
```
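With this workaround you can confirm that only the explicitly registered table exists on the connection (before the linker starts creating its own intermediate tables):

```python
# List the tables currently registered on the DuckDB connection
con.sql("show tables").show()
```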
I modified the above workaround for PySpark configurations; this should work for Spark installations:
```python
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

from splink import SparkAPI, Linker, SettingsCreator, splink_datasets

conf = SparkConf()
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
db_api = SparkAPI(spark_session=spark)

df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
)

db_api.register_table("mytable", df)

linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
linker = Linker("mytable", settings, db_api, input_table_aliases=["m"])
```
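A similar sanity check for the Spark variant, assuming `register_table` registers the data as a temporary view in the current catalog:

```python
# List registered tables/views to confirm "mytable" is present exactly once
for table in spark.catalog.listTables():
    print(table.name, table.tableType)
```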