
Cannot use V2 for streaming read #734

Open
1 task done
james-miles-ccy opened this issue Apr 22, 2023 · 2 comments
Comments

@james-miles-ccy

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am trying to read via V2 in a streaming way, with no success. Is there anything I can do to get this working?

The code is below:

df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path))

display(df)

The exception is given below:

java.lang.UnsupportedOperationException: ExcelFileFormat as fallback format for V2 supports writing only
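Since the error says the V2 ExcelFileFormat only supports writing when used as a fallback format, one possible workaround is to stream file arrivals with Auto Loader's binaryFile format and parse each micro-batch with the batch (non-streaming) Excel reader inside foreachBatch. This is a sketch under assumptions, not something the maintainers have confirmed; the helper name parse_excel_batch and the hard-coded options are illustrative only:

```python
# Hypothetical workaround sketch (an assumption, not a confirmed fix):
# let Auto Loader's "binaryFile" format detect new files, then re-read
# each micro-batch's file paths with the batch Excel reader, which
# avoids the streaming-read path that raises the exception above.

def parse_excel_batch(spark, batch_df, schema):
    """Collect the file paths from a binaryFile micro-batch and read
    them with the batch Excel reader using the caller-supplied schema."""
    paths = [row.path for row in batch_df.select("path").collect()]
    return (spark.read.format("excel")
                 .option("maxRowsInMemory", 20)
                 .schema(schema)
                 .load(paths))
```

This would be wired up with readStream on cloudFiles.format "binaryFile" followed by writeStream.foreachBatch(...), at the cost of touching each Excel file twice (once as binary content, once in the batch read).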

Expected Behavior

I was hoping it would generate a streaming DataFrame.

Steps To Reproduce

df = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "excel")
    .option("maxRowsInMemory", 20)
    .schema(schema)
    .load(file_path))

display(df)

Environment

- Spark version: 3.3.1
- Spark-Excel version: 2.12:3.3.1_0.18.7
- OS: Windows
- Cluster environment: Databricks

Anything else?

No response

@james-miles-ccy james-miles-ccy changed the title Cannot use V2 for streaming Cannot use V2 for streaming read Apr 22, 2023
@nightscape
Owner

The documentation reads as if this is only supported for a few specific file formats:
https://docs.databricks.com/ingestion/auto-loader/options.html#file-format-options
I'm not sure whether those formats are hard-coded somewhere, or whether one would need to implement a special API.
I don't have time to look into this, but if you're willing to give it a try yourself, I can give you some guidance.

@arcaputo3

We have gotten this to work for other custom file formats with a fixed schema. I wonder whether a similar approach could be applied here while still supporting provided or inferred schemas.
