I am working with an Excel file that contains data in the following format:
The first column of the first row contains a note describing the data in the file.
The second row is empty.
From the third row onwards, the file contains the actual data, which has 5 columns.
I create a data frame from this Excel file using the following code snippet:
df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")
Although my data contains 5 columns, the final data frame has only 1 column.
Expected Behavior
The generated data frame should include all the columns of data in the file, not just the single column from the first row.
The final data frame should also contain the columns of data present from the 3rd row onwards.
When you specify the dataAddress, point it to the location of the data within the sheet. For example, if the data starts on the third row, your dataAddress would be something like dataAddress="'Sheet3'!A3:E9999999". Let me know if this works for you.
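To make the suggestion above concrete, here is a minimal sketch of how the read might look with an explicit range. The helper names build_data_address and read_excel_range are hypothetical and only for illustration; the format string and options are the ones already used in this thread, and the range A3:E9999999 assumes the data starts at A3 and spans 5 columns (A to E).

```python
def build_data_address(sheet, first_cell, last_cell=None):
    """Hypothetical helper: build a dataAddress string such as 'Sheet3'!A3:E9999999."""
    # Quote the sheet name, matching the format used in this thread.
    address = f"'{sheet}'!{first_cell}"
    if last_cell is not None:
        address += f":{last_cell}"
    return address


def read_excel_range(spark, path, data_address):
    """Hypothetical helper: read one cell range of an Excel file into a DataFrame.

    Assumes the spark-excel package is installed on the cluster.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", False)  # same setting as in the original snippet
        .option("dataAddress", data_address)
        .load(path)
    )


address = build_data_address("Sheet3", "A3", "E9999999")
print(address)  # 'Sheet3'!A3:E9999999

# Requires an active SparkSession named `spark`, so it is left commented out here:
# df = read_excel_range(spark, "file.xlsx", address)
```

If the last row or column is not known in advance, a start cell on its own (for example "'Sheet3'!A3") may also work; check the spark-excel README for the dataAddress forms supported by your version.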
Steps To Reproduce
df = spark.read.format("com.crealytics.spark.excel").option("header", False).option("dataAddress", "'Sheet3'!").load("file.xlsx")
incorrect dataframe excel.xlsx
Environment
Anything else?
No response