
Elasticsearch: Cannot extract source URI #827

Closed
wajda opened this issue Aug 30, 2024 · 1 comment · Fixed by #829
Labels: bug (Something isn't working)

wajda (Contributor) commented Aug 30, 2024

Hi team,

I am unable to get lineage for a Spark job that loads data from a CSV file into Elasticsearch.

Spark Job

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WriteToElasticsearch").getOrCreate()
df = spark.read.option("header", "true").csv("/Users/ts-mohini.tripathi/Documents/DO/POC/databricks/people_data.csv")

# Write the CSV rows to Elasticsearch via the elasticsearch-hadoop connector
df.write \
    .format("org.elasticsearch.spark.sql") \
    .mode("overwrite") \
    .option("es.resource", "<index>") \
    .option("es.nodes", "<ip address>") \
    .option("es.port", "<port>") \
    .option("es.client.node.only", "true") \
    .option("es.nodes.wan.only", "true") \
    .option("es.net.ssl", "false") \
    .option("es.net.http.auth.user", "<username>") \
    .option("es.net.http.auth.pass", "<password>") \
    .option("es.spark.dataframe.write.null", "true") \
    .save()
```

Error

```
24/08/30 12:58:28 ERROR SplineAgent: Unexpected error occurred during lineage processing for application: WriteToElasticsearch #local-1725002895271
Caused by: java.lang.RuntimeException: Cannot extract source URI from the options: es.nodes.wan.only,es.net.http.auth.user,es.net.http.auth.pass,es.client.node.only,es.port,es.resource,es.nodes,es.spark.dataframe.write.null,es.net.ssl
```
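The option keys listed in the error are exactly the nine `.option(...)` calls from the job, and none of them is `path`. The short check below (`option_keys` is a hypothetical helper; the message is copied verbatim) just makes that visible. It rests on an assumption not verified against the agent source: that agent 2.1.0 falls back to a `path` option, which `DataFrameWriter.save(path)` would set, when it cannot otherwise work out the destination URI. That would be consistent with the `.save("<index>")` workaround suggested later in this thread.

```python
from typing import List

# Message text copied verbatim from the SplineAgent error above.
ERROR_MESSAGE = (
    "Cannot extract source URI from the options: "
    "es.nodes.wan.only,es.net.http.auth.user,es.net.http.auth.pass,"
    "es.client.node.only,es.port,es.resource,es.nodes,"
    "es.spark.dataframe.write.null,es.net.ssl"
)


def option_keys(message: str) -> List[str]:
    """Return the comma-separated option keys listed after 'options: '."""
    _, _, tail = message.partition("options: ")
    return [key.strip() for key in tail.split(",") if key.strip()]


keys = option_keys(ERROR_MESSAGE)
print(len(keys))              # 9: one per .option(...) call in the job
print("es.resource" in keys)  # True: the index was supplied as an option...
print("path" in keys)         # False: ...but save() got no argument, so no path option
```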

Run command

```shell
spark-submit \
  --packages org.elasticsearch:elasticsearch-spark-30_2.12:8.15.0 \
  --jars /Users/ts-mohini.tripathi/Documents/DO/POC/spline-sandbox/jars/spark-3.1-spline-agent-bundle_2.12-2.1.0.jar \
  --conf spark.sql.queryExecutionListeners=za.co.absa.spline.harvester.listener.SplineQueryExecutionListener \
  --conf spark.spline.producer.url=http://localhost:8080/producer \
  elksearch.py
```

Please let me know whether Elasticsearch is supported. Thank you.

Best Regards,
Mohini Tripathi

Originally posted by @mohini-tripathi in AbsaOSS/spline#1364

wajda transferred this issue from AbsaOSS/spline Aug 30, 2024
wajda added this to the 2.2.0 milestone Aug 30, 2024
wajda added the bug (Something isn't working) label Aug 30, 2024
wajda added the help wanted (Extra attention is needed) label Sep 8, 2024
wajda modified the milestone: 2.2.0 Sep 8, 2024
wajda removed the help wanted (Extra attention is needed) label Sep 8, 2024
wajda self-assigned this Sep 8, 2024
wajda moved this from New to In Progress in Spline Sep 8, 2024
wajda (Contributor, Author) commented Sep 8, 2024

A proper fix will be available in the upcoming agent version 2.2.0.

For earlier agent versions, there is a workaround:

  1. Replace `format("org.elasticsearch.spark.sql")` with `format("es")`.
  2. Instead of `.option("es.resource", "<index>")`, pass the index to `.save("<index>")`.
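A minimal PySpark sketch of the two workaround steps applied to the job from the original report. The helper names `es_write_options` and `write_to_es` are hypothetical, `df` is assumed to be a PySpark DataFrame, and the option values mirror the original job unchanged; only the format name and the place where the index is given differ.

```python
from typing import Dict


def es_write_options(nodes: str, port: str, user: str, password: str) -> Dict[str, str]:
    """Connector options from the original job, minus "es.resource".

    The target index is deliberately left out: with the workaround it is
    passed to save() instead.
    """
    return {
        "es.nodes": nodes,
        "es.port": port,
        "es.client.node.only": "true",  # kept exactly as in the original job
        "es.nodes.wan.only": "true",
        "es.net.ssl": "false",
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
        "es.spark.dataframe.write.null": "true",
    }


def write_to_es(df, index: str, options: Dict[str, str]) -> None:
    """Write df to Elasticsearch using the workaround for agent versions before 2.2.0."""
    (
        df.write
        .format("es")          # step 1: short alias instead of "org.elasticsearch.spark.sql"
        .mode("overwrite")
        .options(**options)    # dotted keys are allowed when unpacked from a dict
        .save(index)           # step 2: index passed to save() instead of option("es.resource", ...)
    )
```

Usage with the placeholders from the report: `write_to_es(df, "<index>", es_write_options("<ip address>", "<port>", "<username>", "<password>"))`. Passing the index to `save()` makes Spark hand it to the connector as the `path` option, which elasticsearch-hadoop treats as the target resource, so the write itself should behave the same as with `es.resource`.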

wajda moved this from In Progress to Review / PR Created in Spline Sep 8, 2024
wajda added two commits that referenced this issue Sep 8, 2024
wajda added a commit that referenced this issue Sep 9, 2024
* issue #827 Elasticsearch: Cannot extract source URI
* issue #827 Scala 2.11 compatibility
* issue #827 Fix for Spark 2.2
wajda closed this as completed in #829 Sep 9, 2024
github-project-automation bot moved this from Review / PR Created to Done in Spline Sep 9, 2024