Incorrect Data Frame creation #788

Open
1 task done
akshitarora4259 opened this issue Sep 20, 2023 · 1 comment

akshitarora4259 commented Sep 20, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am working with an Excel file that contains data in the following format:

  • The first cell of the first row contains a note describing the data in the file.
  • The second row is empty.
  • From the third row onwards, the file contains the actual data, which has 5 columns.

I create a data frame from this Excel file with the following code snippet:

df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")

incorrect dataframe excel.xlsx

Although my data contains 5 columns, the resulting data frame contains only 1 column.

Expected Behavior

The generated data frame should include all the columns of data in the file, not only the single column from the first row.
The final data frame should also contain the columns of data present from the 3rd row onwards.

Steps To Reproduce

df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")

incorrect dataframe excel.xlsx

Environment

- Spark version:
- Spark-Excel version:
- OS:
- Cluster environment

Anything else?

No response

@williamdphillips
Collaborator

Hi @akshitarora4259,

When you specify the dataAddress, please point it to the location of the data within the sheet. For example, if the data starts on the third row, your dataAddress would be something like dataAddress="'Sheet3'!A3:E9999999". Let me know if this works for you.
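
A minimal sketch of that suggestion in PySpark, assuming an existing SparkSession with the spark-excel package available and data that occupies columns A to E starting at row 3. The helper names build_data_address and read_excel_range are hypothetical and only illustrate how the address string is put together; they are not part of spark-excel.

```python
def build_data_address(sheet, start_cell, end_cell=None):
    """Build a dataAddress string such as 'Sheet3'!A3:E9999999.

    Hypothetical helper: it only formats the string that is passed
    to the "dataAddress" option.
    """
    address = f"'{sheet}'!{start_cell}"
    if end_cell:
        address += f":{end_cell}"
    return address


def read_excel_range(spark, path, data_address):
    """Read a cell range of an Excel file into a DataFrame.

    Assumes `spark` is an existing SparkSession and that the
    spark-excel package is on the classpath.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", False)  # the range starts directly with data rows
        .option("dataAddress", data_address)
        .load(path)
    )
```

For example, read_excel_range(spark, "file.xlsx", build_data_address("Sheet3", "A3", "E9999999")) would pass the address from the comment above, so the note in row 1 and the empty row 2 fall outside the range being read.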
