Examples: Resource Usage and How Big Spark Excel Can Handle?
Purpose: Find out the resource usage and limitations of spark-excel.
Under the hood, spark-excel relies on Apache POI for everything Excel. Here are the limitations of a single Excel file, copied from SpreadsheetVersion in the Apache POI documentation (see the references below):
Excel 97-2003 (.xls, EXCEL97):
- The total number of available rows is 64k (2^16)
- The total number of available columns is 256 (2^8)
- The maximum number of arguments to a function is 30
- Number of conditional format conditions on a cell is 3
- Number of cell styles is 4000
- Length of text cell contents is 32767
Excel 2007+ (.xlsx, EXCEL2007):
- The total number of available rows is 1M (2^20)
- The total number of available columns is 16K (2^14)
- The maximum number of arguments to a function is 255
- Number of conditional format conditions on a cell is unlimited (actually limited by available memory in Excel)
- Number of cell styles is 64000
- Length of text cell contents is 32767
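Given these per-file limits, the practical bottleneck when reading one large .xlsx is usually memory on the Spark side rather than the row count itself. Below is a minimal read sketch, assuming spark-excel's "com.crealytics.spark.excel" source and its maxRowsInMemory option (which switches to POI's streaming reader); the input path is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object ReadLargeExcel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-large-excel")
      .getOrCreate()

    val df = spark.read
      .format("com.crealytics.spark.excel")
      .option("header", "true")           // first row contains column names
      .option("maxRowsInMemory", "1000")  // stream rows instead of loading the whole sheet
      .load("/data/big-report.xlsx")      // hypothetical path

    println(s"Row count: ${df.count()}")
    spark.stop()
  }
}
```

Without a streaming option the whole workbook is materialized in memory, which is in line with the OutOfMemoryError / GC-overhead issues listed in the references below.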
Spark-excel supports reading and writing multiple Excel files, so the total number of rows in a DataFrame, for both reading and writing, depends only on the available resources and on how the data is partitioned when writing (see the write sketch below).
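A minimal write sketch along those lines, assuming a spark-excel release that registers the V2 "excel" data source and writes one part file per partition like other Spark file sources; the paths, row counts, and partition count are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object WritePartitionedExcel {
  def writeInChunks(df: DataFrame, outputDir: String, numFiles: Int): Unit = {
    df.repartition(numFiles)   // each partition becomes one output file
      .write
      .format("excel")
      .option("header", "true")
      .mode(SaveMode.Overwrite)
      .save(outputDir)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-partitioned-excel").getOrCreate()
    val df = spark.range(5000000).toDF("id")  // 5M rows: too many for a single sheet
    writeInChunks(df, "/data/excel-output", 10)
    spark.stop()
  }
}
```

Repartitioning before the write keeps each output file below the 1M-row per-sheet limit listed above.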
TBD
References
- Apache POI - HSSF and XSSF Limitations
- Enum SpreadsheetVersion
- #79 Writing a large Dataset into an Excel file causes java.lang.OutOfMemoryError: GC overhead limit exceeded
- #142 read quite big excel error, size=300M
- #322 [Read an Excel File]: GC overhead limit exceeded
- #388 Error Reading files in Excel Worksheet 97-2003 File - xls format