Functions to detect use of formulas, macros and cell merging #944
-
We are using Excel spreadsheets to capture data from multiple research groups who conduct field experiments in agriculture. I developed a tool using openxlsx to identify problems such as inconsistent naming of variables or factor levels. A remaining issue is to check the sheets for calculations using formulas, presence of merged cells, and inclusion of macros. We strongly prefer that data submitters avoid anything that might corrupt their data or lead to processing errors later on. |
Beta Was this translation helpful? Give feedback.
Replies: 1 comment 3 replies
-
Hi @Jeff-White-AZ , thanks for the kind words! If I understand you correctly you want to identify cells with formulas and merged cells. Though here I have a code snippet for you to which might give you a hint. Since this is not an issue, I convert it to a discussion. library(openxlsx2)
fl <- system.file("extdata", "loadExample.xlsx", package = "openxlsx")
wb <- wb_load(fl)
sheet <- wb$validate_sheet("testing")
mergeCells <- wb$worksheets[[sheet]]$mergeCells
mergeCells <- do.call("rbind", xml_attr(mergeCells, "mergeCell"))
formulas <- wb$worksheets[[sheet]]$sheet_data$cc
formulas <- formulas[formulas$f != "" | formulas$f_t != "" | formulas$f_ref != "" | formulas$f_ca != "" | formulas$f_si != "", c("row_r", "c_r")]
dimension <- wb$worksheets[[sheet]]$dimension
dimension <- xml_attr(dimension, "dimension")[[1]][["ref"]]
dd <- dims_to_dataframe(dims = dimension)
dd[rownames(dd) %in% formulas$row_r, colnames(dd) %in% formulas$c_r] <- TRUE
for (i in seq_len(nrow(mergeCells))) {
mC <- dims_to_dataframe(dims = mergeCells[i])
dd[rownames(dd) %in% rownames(mC), colnames(dd) %in% colnames(mC)] <- TRUE
}
dd
#> B C D E F G H I J K L M N O P
#> 2 TRUE TRUE TRUE TRUE TRUE <NA> <NA> TRUE TRUE TRUE TRUE TRUE TRUE <NA> <NA>
#> 3 TRUE TRUE TRUE TRUE TRUE <NA> <NA> TRUE TRUE TRUE TRUE TRUE TRUE <NA> <NA>
#> 4 TRUE TRUE TRUE TRUE TRUE <NA> <NA> TRUE TRUE TRUE TRUE TRUE TRUE <NA> <NA>
#> 5 TRUE TRUE TRUE TRUE TRUE <NA> <NA> TRUE TRUE TRUE TRUE TRUE TRUE <NA> <NA>
#> 6 TRUE TRUE TRUE TRUE TRUE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 7 TRUE TRUE TRUE TRUE TRUE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 9 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 10 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 11 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 12 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 13 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 14 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 15 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 16 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 18 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 19 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 20 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 21 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 22 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 23 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 24 TRUE TRUE TRUE TRUE TRUE TRUE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 25 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 26 <NA> <NA> TRUE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 27 <NA> <NA> TRUE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 28 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 29 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 30 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 31 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 32 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 33 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 34 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 35 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 36 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 37 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> 38 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
#> Q
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 <NA>
#> 6 <NA>
#> 7 <NA>
#> 8 <NA>
#> 9 <NA>
#> 10 <NA>
#> 11 <NA>
#> 12 <NA>
#> 13 <NA>
#> 14 <NA>
#> 15 <NA>
#> 16 <NA>
#> 17 <NA>
#> 18 <NA>
#> 19 <NA>
#> 20 <NA>
#> 21 <NA>
#> 22 <NA>
#> 23 <NA>
#> 24 <NA>
#> 25 <NA>
#> 26 <NA>
#> 27 <NA>
#> 28 <NA>
#> 29 <NA>
#> 30 <NA>
#> 31 <NA>
#> 32 <NA>
#> 33 <NA>
#> 34 <NA>
#> 35 <NA>
#> 36 <NA>
#> 37 <NA>
#> 38 <NA> |
Beta Was this translation helpful? Give feedback.
Checking for merged cells is simply comparing if the worksheet contains a mergedCells xml tag. In the code above I use this to identify the merged cells from the selected sheet.
Macros as in Excel macros that are run able from xlsm files are binary blobs that live in
wb$bin
or something like this, not really sure atm (but these are only those that are loadable with spreadsheet macros). Since xlsx files are zip files, you could embed anything inside them. And then there are excel addins and probably hundreds of other things I haven’t yet stumbled over.For formulas, you can use the formulas part from the snipped above to check if there are any inside of the spreadsheet. That way your cases…