Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Add "hms" to xportr.numeric_types #271

Closed
bundfussr opened this issue Jul 2, 2024 · 16 comments · Fixed by #277
Closed

Feature Request: Add "hms" to xportr.numeric_types #271

bundfussr opened this issue Jul 2, 2024 · 16 comments · Fixed by #277
Assignees
Labels
enhancement New feature or request programming

Comments

@bundfussr
Copy link
Collaborator

Feature Idea

I would suggest to add "hms" to the default value of xportr.numeric_types. Time variables which are created by admiral::derive_vars_dtm_to_tm() are of class "hms". Without adding "hms" xportr_type() considers time variable as converted.

Relevant Input

No response

Relevant Output

No response

Reproducible Example/Pseudo Code

library(xportr)
library(magrittr)
library(admiral)
metadata <- data.frame(
  dataset = "test",
  variable = c("ADTM", "ATM"),
  type = "numeric"
)

.df <- data.frame(
  ADTM = convert_dtc_to_dtm("2024-03-24T12:34:56")
) %>% derive_vars_dtm_to_tm(source_vars = exprs(ADTM))

df2 <- xportr_type(.df, metadata, "test", verbose = "warn")

produces


── Variable type mismatches found. ──

✔ 1 variables coerced
Warning message:
Variable type(s) in dataframe don't match metadata: `ATM`
@bundfussr bundfussr added enhancement New feature or request programming labels Jul 2, 2024
@bundfussr
Copy link
Collaborator Author

By the way, the message could be improved by stating which type was found in metadata and which in data. I had to read the code to find out what caused the warning.

@bms63
Copy link
Collaborator

bms63 commented Jul 5, 2024

@atorus-research/xportr-development-team anyone on the team want to make this update?

@bundfussr bundfussr self-assigned this Nov 13, 2024
bundfussr added a commit that referenced this issue Nov 14, 2024
bundfussr added a commit that referenced this issue Nov 14, 2024
bundfussr added a commit that referenced this issue Nov 14, 2024
@bundfussr
Copy link
Collaborator Author

@bms63 , I wonder if the documentation of xportr_type() should be updated. The options xportr.character_metadata_types and xportr.numeric_metadata_types are not mentioned but they are important to understand what the function does. From the current documentation I would expect that all variable whose class is included in xportr.numeric_types are coerced to numeric. However, this is only true if their type in the metadata is not included in xportr.numeric_metadata_types.

Also on the "Get started" page it says

Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL set to be either numeric or character.

But in the example no variable was coerced and there are still variables which are not numeric or character, e.g., dates.

What do you think?

@bms63
Copy link
Collaborator

bms63 commented Nov 14, 2024

So - I was using xport_type() a few weeks ago and could of sworn it used to coerce dates classes to numeric as this allows you to apply the format using xportr_format(), but something happened preventing me from doing this.

But yes agree to updates. Just a bit confused on the behavior of this function and xportr_format().

Here I have an example where the formats were not applied nor the Date class coerced. https://github.com/pharmaverse/rpharma-2024-ADaM-workshop/blob/main/solutions/adsl.R

@elimillera and @cpiraux any ideas?

@cpiraux
Copy link
Collaborator

cpiraux commented Nov 15, 2024

In the latest update of the xport_type function, the format specified in the specs is normally not considered when determining whether a variable should be coerced.

Dates, datetimes, and times should, in theory, not be coerced to numeric; otherwise, the haven package will not recognize these variables correctly as date, datetime, or time, which could lead to incorrect date values in the XPT file.
I agree that it's not entirely clear how the function handles type conversions, and we currently have an open issue to improve the documentation #166 .

Here’s an explanation of how it works:
The R class of a variable is categorized as either numeric or character. Similarly, the type specified in the specs is also categorized as numeric or character. If the category of the R class matches the category of the spec's type, no coercion occurs. Otherwise, the variable is coerced.

For instance, a numeric date variable is categorized as numeric, and if the specs also categorize it as numeric, the R class remains unchanged (class(variable) = "Date").

I’ve drafted a figure to explain this. It might not be entirely clear yet, but it provides some logical structure.

Datatype

@bms63
Copy link
Collaborator

bms63 commented Nov 15, 2024

Lovely summary Celine appreciate it.

So when a variable has a Date or Datetime class I can not associate the format with it. I would think the attributes could be associated with a Date class, but it looks like the current version of xportr_format can't do that. Maybe it is something to do with admiral functions.

Can you take a look at the code that is available above please

@bundfussr
Copy link
Collaborator Author

So when a variable has a Date or Datetime class I can not associate the format with it. I would think the attributes could be associated with a Date class, but it looks like the current version of xportr_format can't do that. Maybe it is something to do with admiral functions.

@bms63 , I think it is possible. I've just tried it with the admiralroche ADEX script. For example, ASTDTM is not converted to numeric and a format is associated:

> attributes(adex_xpt$ASTDTM)
$tzone
[1] "UTC"

$class
[1] "POSIXct" "POSIXt" 

$label
[1] "Analysis Start Datetime"

$width
[1] 8

$format.sas
[1] "DATETIME20."

@bms63
Copy link
Collaborator

bms63 commented Nov 15, 2024

@bundfussr is the spec a metacore object that is applying the format using xportr_Format()?

@bundfussr
Copy link
Collaborator Author

@bundfussr is the spec a metacore object that is applying the format using xportr_Format()?

Yes

@bms63
Copy link
Collaborator

bms63 commented Nov 15, 2024

I feel like there is something wrong with my code then - as I can only get the format to apply to TRTSTM but not to any of the other ---DT. If you some time @bundfussr could I solicit your expertise in looking at my ADaM code.

I have used xportr_format() in the past with success but never used a metacore object until now. Wondering if there is just some weird variation in my spec that is causing the issue.

@bundfussr
Copy link
Collaborator Author

@bms63 , I needed to update the xportr_df_label() call to xportr_df_label(metacore, domain = "ADSL"). Then it worked. For example:

> attributes(adsl$TRTSDTM)
$tzone
[1] "UTC"

$class
[1] "POSIXct" "POSIXt" 

$width
[1] 8

$label
[1] "Datetime of First Exposure to Treatment"

$format.sas
[1] "DATETIME20."

@bms63
Copy link
Collaborator

bms63 commented Nov 15, 2024

@bundfussr yay!!!!!! You are too amazing. @Fanny-Gautier check this out!!

@bms63
Copy link
Collaborator

bms63 commented Nov 15, 2024

Well - still not working for me. @bundfussr what is your session info

image

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xportr_0.4.1          metatools_0.1.6       metacore_0.1.3        stringr_1.5.1        
[5] lubridate_1.9.3       dplyr_1.1.4           pharmaversesdtm_1.1.0 admiral_1.1.1        

loaded via a namespace (and not attached):
 [1] compiler_4.4.1   tidyselect_1.2.1 xml2_1.3.6       tidyr_1.3.1      readxl_1.4.3    
 [6] readr_2.1.5      R6_2.5.1         generics_0.1.3   admiraldev_1.1.0 backports_1.5.0 
[11] forcats_1.0.0    checkmate_2.3.2  tibble_3.2.1     pillar_1.9.0     tzdb_0.4.0      
[16] rlang_1.1.4      utf8_1.2.4       stringi_1.8.4    timechange_0.3.0 cli_3.6.3       
[21] withr_3.0.2      magrittr_2.0.3   haven_2.5.4      hms_1.1.3        lifecycle_1.0.4 
[26] vctrs_0.6.5      glue_1.8.0       cellranger_1.1.0 fansi_1.0.6      purrr_1.0.2     
[31] tools_4.4.1      pkgconfig_2.0.3 

@bundfussr
Copy link
Collaborator Author

@bms63 , my session info is:

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xportr_0.4.1          metatools_0.1.6       metacore_0.1.3        stringr_1.5.1         lubridate_1.9.3       dplyr_1.1.4          
[7] pharmaversesdtm_1.0.0 admiral_1.1.1.9015   

loaded via a namespace (and not attached):
 [1] compiler_4.3.1        tidyselect_1.2.1      xml2_1.3.6            tidyr_1.3.1           readxl_1.4.3          readr_2.1.5          
 [7] R6_2.5.1              generics_0.1.3        admiraldev_1.1.0.9006 backports_1.4.1       forcats_1.0.0         checkmate_2.3.1      
[13] tibble_3.2.1          tzdb_0.4.0            pillar_1.9.0          rlang_1.1.3           utf8_1.2.4            stringi_1.8.3        
[19] timechange_0.2.0      cli_3.6.3             withr_3.0.0           magrittr_2.0.3        rstudioapi_0.15.0     haven_2.5.4          
[25] hms_1.1.3             lifecycle_1.0.4       vctrs_0.6.5           glue_1.8.0            cellranger_1.1.0      fansi_1.0.6          
[31] purrr_1.0.2           tools_4.3.1           pkgconfig_2.0.3

However, I think it's just an issue in RStudio. When I look at the "Environment" tab in RStudio, I see the same as you. But when I print the attributes in the console it looks OK.

@Fanny-Gautier
Copy link

Well - still not working for me. @bundfussr what is your session info

image

> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] xportr_0.4.1          metatools_0.1.6       metacore_0.1.3        stringr_1.5.1        
[5] lubridate_1.9.3       dplyr_1.1.4           pharmaversesdtm_1.1.0 admiral_1.1.1        

loaded via a namespace (and not attached):
 [1] compiler_4.4.1   tidyselect_1.2.1 xml2_1.3.6       tidyr_1.3.1      readxl_1.4.3    
 [6] readr_2.1.5      R6_2.5.1         generics_0.1.3   admiraldev_1.1.0 backports_1.5.0 
[11] forcats_1.0.0    checkmate_2.3.2  tibble_3.2.1     pillar_1.9.0     tzdb_0.4.0      
[16] rlang_1.1.4      utf8_1.2.4       stringi_1.8.4    timechange_0.3.0 cli_3.6.3       
[21] withr_3.0.2      magrittr_2.0.3   haven_2.5.4      hms_1.1.3        lifecycle_1.0.4 
[26] vctrs_0.6.5      glue_1.8.0       cellranger_1.1.0 fansi_1.0.6      purrr_1.0.2     
[31] tools_4.4.1      pkgconfig_2.0.3 

Hi @bms63 ,
I tried from my end and get the right results I think both for ADVS (which was already coded as , domain = "ADVS"),
image
and ADSL: I do not face any issue.
image

From the sessionInfo(), only withr version differs
image

@bms63
Copy link
Collaborator

bms63 commented Nov 18, 2024

Alright sanity restored
image

elimillera added a commit that referenced this issue Nov 26, 2024
Closes #271 add_hms: add hms to xportr.numeric_types and add details to xpor…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request programming
Projects
None yet
4 participants