format_col and nested 1-column frames #6592

Open
r2evans opened this issue Oct 26, 2024 · 2 comments · May be fixed by #6593

r2evans commented Oct 26, 2024

(This may be related to #5948, since nested frames are the common link.)

library(data.table)
mt <- as.data.table(mtcars)[1:3,]
mt$frm <- list(data.frame(a=1), data.frame(a=1), data.frame(a=1))
mt
# Error in vapply(X = x, FUN = fun, ..., FUN.VALUE = NA_character_, USE.NAMES = use.names) : 
#   values must be type 'character',
#  but FUN(X[[1]]) result is type 'list'
mt[,-12]
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
# 1:  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4
# 2:  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4
# 3:  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1

Stepping through format_col.default in the debugger shows where it fails:

debug(data.table:::format_col.default)
mt
#### press 'c' to continue through the first 11 columns, then inspect x
x
# [[1]]
#   a
# 1 1
# [[2]]
#   a
# 1 1
# [[3]]
#   a
# 1 1
vapply_1c(x, format_list_item, ...)
# Error in vapply(X = x, FUN = fun, ..., FUN.VALUE = NA_character_, USE.NAMES = use.names) : 
#   values must be type 'character',
#  but FUN(X[[1]]) result is type 'list'
format_list_item(x[[1]])
#   a
# 1 1

vapply_1c assumes that every value returned by format_list_item is a single string, but for a one-column frame it gets back a data.frame (type 'list') instead.
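The type contract can be reproduced without data.table. This is a minimal sketch using base `vapply()` in place of the internal `vapply_1c`, on the assumption (taken from the error message above) that `vapply_1c` wraps `vapply(..., FUN.VALUE = NA_character_)`; the variable names are illustrative only.

```r
# Base-R sketch; vapply() stands in for data.table's internal vapply_1c.
frames <- list(data.frame(a = 1), data.frame(a = 1))

# format() on a data.frame returns another data.frame, i.e. a list, not a string.
formatted <- format(frames[[1]])
is.data.frame(formatted)  # TRUE
typeof(formatted)         # "list"

# vapply() with FUN.VALUE = NA_character_ rejects any non-character result.
msg <- tryCatch(
  vapply(frames, format, FUN.VALUE = NA_character_),
  error = function(e) conditionMessage(e)
)
msg
```

Because the formatted frame has length 1, it passes `vapply()`'s length check and fails only on the type check, which matches the error reported above.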

Note that this does not fail when the nested frames have more than one column (or none), since format_list_item.default, perhaps naively, tests length(format(..)) == 1, and the length of a data.frame is its number of columns.

mt$frm <- list(data.frame(a=1,b=1), data.frame(), data.frame(a=1,b=1))
mt
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb               frm
#    <num> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>            <list>
# 1:  21.0     6   160   110  3.90 2.620 16.46     0     1     4     4 <data.frame[1x2]>
# 2:  21.0     6   160   110  3.90 2.875 17.02     0     1     4     4 <data.frame[0x0]>
# 3:  22.8     4   108    93  3.85 2.320 18.61     1     1     4     1 <data.frame[1x2]>
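Why only one-column frames trip the check can be seen in base R alone. The helper `n_formatted` below is a hypothetical name introduced for this sketch; it is not part of data.table.

```r
# Hypothetical helper: the length that a `length(format(x)) == 1L` test would see.
n_formatted <- function(df) length(format(df))

one_col  <- n_formatted(data.frame(a = 1))         # 1: passes the length == 1 test
two_col  <- n_formatted(data.frame(a = 1, b = 1))  # 2: falls through to "<class[dims]>"
zero_col <- n_formatted(data.frame())              # 0: falls through as well

c(one_col = one_col, two_col = two_col, zero_col = zero_col)
```

So a one-column frame is the only shape whose formatted result looks like "a single formatted value" to a length test while actually being a list.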

I don't know whether it makes sense to define a format_list_item.data.frame method to prevent this, or whether there is a better approach:

format_list_item.data.frame <- function(x, ...) "<multi-column>"
vapply_1c(x, format_list_item.data.frame, ...)
# [1] "<multi-column>" "<multi-column>" "<multi-column>"

This test was done with data.table_1.16.2, but it also fails with data.table_1.15.2, so it's not a recent breakage.

packageVersion("data.table")
# [1] ‘1.15.2’
mt <- as.data.table(mtcars)[1:3,]
mt$frm <- list(data.frame(a=1), data.frame(a=1), data.frame(a=1))
mt
# Error in vapply(X = x, FUN = fun, ..., FUN.VALUE = NA_character_, USE.NAMES = use.names) :
#   values must be type 'character',
#  but FUN(X[[1]]) result is type 'list'
sessionInfo()
# R version 4.3.3 (2024-02-29)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 24.04.1 LTS
# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
# LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
# locale:
#  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
# time zone: America/New_York
# tzcode source: system (glibc)
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# other attached packages:
# [1] data.table_1.16.2 r2_0.11.0        
# loaded via a namespace (and not attached):
#  [1] compiler_4.3.3    clipr_0.8.0       fastmap_1.2.0     cli_3.6.2         tools_4.3.3       htmltools_0.5.8.1 rmarkdown_2.26    knitr_1.45       
#  [9] xfun_0.42         digest_0.6.34     rlang_1.1.3       evaluate_0.23    
@r2evans r2evans changed the title format_col and nested frames format_col and nested 1-column frames Oct 26, 2024

r2evans commented Oct 26, 2024

If we change the current internal function to the following, it works:

format_list_item2 <- function(x, ...) {
    if (is.null(x)) 
        "[NULL]"
    else if (is.atomic(x) || inherits(x, "formula")) 
        paste(c(format(head(x, 6L), ...), if (length(x) > 6L) "..."), 
            collapse = ",")
    else if (!inherits(x, "data.frame") && has_format_method(x) && length(formatted <- format(x, ...)) == 1L) {
        formatted
    }
    else {
        paste0("<", class(x)[1L], paste_dims(x), ">")
    }
}
vapply_1c(x, format_list_item2, ...)
# [1] "<data.frame[1x1]>" "<data.frame[1x1]>" "<data.frame[1x1]>"

Is that simple enough? I'm happy to submit a PR to that effect. (I'd change the original format_list_item.default, not add the above renamed function. The only change is the addition of !inherits(..).)

MichaelChirico (Member) commented

I think registering format_list_item.data.frame is the right approach here.
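A rough sketch of what a dedicated method could look like, using a stand-in generic so that it runs outside the package. `format_item` and its methods are hypothetical names for illustration; they are not data.table's internal `format_list_item`, and the dimension string is built by hand rather than with the internal `paste_dims`.

```r
# Hypothetical stand-in generic; not data.table's internal format_list_item.
format_item <- function(x, ...) UseMethod("format_item")

# Simplified default: use format() when it yields one value, else a class tag.
format_item.default <- function(x, ...) {
  formatted <- format(x, ...)
  if (length(formatted) == 1L) formatted else paste0("<", class(x)[1L], ">")
}

# Data frames get their own method, so the length test is never reached.
format_item.data.frame <- function(x, ...) {
  paste0("<", class(x)[1L], "[", nrow(x), "x", ncol(x), "]>")
}

frames <- list(data.frame(a = 1), data.frame(), data.frame(a = 1, b = 1))
out <- vapply(frames, format_item, FUN.VALUE = NA_character_)
out
```

With S3 dispatch handling data frames, every nested frame is summarised the same way regardless of its column count, and the default method does not need a special case.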

r2evans pushed a commit to r2evans/data.table that referenced this issue Oct 28, 2024
@r2evans r2evans linked a pull request Oct 28, 2024 that will close this issue