Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate coerce_to_numeric() into to_numeric() via a new argument #206

Open
IndrajeetPatil opened this issue Jul 22, 2022 · 14 comments
Open
Assignees

Comments

@IndrajeetPatil
Copy link
Member

No description provided.

@strengejacke
Copy link
Member

I think the currently existing argument preserve_levels is actually most appropriate to trigger coerce_to_numeric(), but then we have to find a new name for preserve_levels.

@strengejacke
Copy link
Member

@bwiernik @vincentarelbundock what do you think? Do you have suggestions for argument names?

Currently, the preserve_labels argument only works for factors with numeric levels. If preserve_levels = FALSE, the returned numeric vector starts with 1. If preserve_levels = TRUE, the returned numeric vector contains the factor levels, coerced to numeric.

If a vector has character or string levels, preserve_levels has no effects and always a numeric vector, starting at 1, is returned. However, sometimes you want to convert only factors with numeric levels into numeric vectors, and factors with character levels should be left unchanged. This is what coerce_to_numeric() does. However, it would be nice to integrate that function into to_numeric(), and just have a new argument.

I think the currently existing argument preserve_levels is actually most appropriate to trigger coerce_to_numeric(), but then we have to find a new name for preserve_levels.

Suggestions?

library(datawizard)
f1 <- factor(c(1, 2, 3, 4))
f2 <- factor(c(5, 6, 7 , 8))

to_numeric(f1, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f2, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f2, dummy_factors = FALSE, preserve_levels = TRUE)
#> [1] 5 6 7 8


f1 <- factor(c("a", "b", "c", "d"))
to_numeric(f1, dummy_factors = FALSE)
#> [1] 1 2 3 4
to_numeric(f1, dummy_factors = FALSE, preserve_levels = TRUE)
#> [1] 1 2 3 4

# which argument for `to_numeric()` could behave like `coerce_to_numeric()`?
coerce_to_numeric(f1)
#> [1] a b c d
#> Levels: a b c d

Created on 2022-08-05 by the reprex package (v2.0.1)

@vincentarelbundock
Copy link

Sorry, I'm probably not the right person to ask. I'm a grumpy old man who likes base R as.numeric

😣

@strengejacke
Copy link
Member

I'm not asking you about the functionality ;-) I'd like to know from a native speaker which argument name sounds best for this task :-) And you did your Bsc in 2007, you can't be older than me. 😝

@vincentarelbundock
Copy link

Honestly, I think I just want to challenge the premise of the question: I think a function called to_numeric() should always return a numeric, and it feels like not great design for the output class to depend on the input class and/or arguments. If users want a character, they should call to_character()

@strengejacke
Copy link
Member

The reason for this discussion is that we had several internal .to_numeric() functions that should be replaced by the datawizard function, to avoid spread of code and increase maintainability. However, the internal functions require different handlings. In some cases, to_numeric(factor(c("a", "b", "c", "d"))) should return a numeric, in other cases the original factor, if levels cannot be coerced to numeric.

So we need an additional argument, unless you want to maintain all our internal functions. :-)

@vincentarelbundock
Copy link

I would argue that in those cases clarity and maintainability is best achieved by a standard as.numeric(as.character(x)) or somesuch.

Being more verbose and sticking with base R is often preferable, IMHO.

But obviously, I haven't seen all the use cases, so you know better.

And to be clear, this isn't a big deal to me; I just have difficulty to not give opinions.

@bwiernik
Copy link
Contributor

bwiernik commented Aug 5, 2022

I generally agree with Vincent that to_numeric() should consistently

If an argument is used to trigger this conditional behavior, it should be very explicit that this is what the argument does, and it should be independent of the preserve_levels behavior. Something like .only_numberlike (defaulting to false).

I would prefer a separate function like maybe_to_numeric(), but would be okay with an argument like .only_numberlike.

@strengejacke
Copy link
Member

maybe_to_numeric() is the current coerce_to_numeric() (if I understood correctly). So you would opt to not consolidate the two present functions, like suggested in this issue?

@bwiernik
Copy link
Contributor

bwiernik commented Aug 5, 2022

I really don't understand the context where to_numeric(factor(c("a", "b", "c", "d"))) should be expected to return a, b, c, d? Why are we trying to convert to numeric and not expecting to always have numeric?

@strengejacke
Copy link
Member

I don't know, but it's used in modelbased for example:
https://github.com/search?p=2&q=org%3Aeasystats+to_numeric&type=Code

@IndrajeetPatil
Copy link
Member Author

So coerce_to_numeric() is used only in one place in {easystats} ecosystem, which is in {datawizard} itself:

data[["_Row"]] <- coerce_to_numeric(row.names(data))

If we change this to something like

data[["_Row"]] <- as.numeric(row.names(data)) 

we can remove this function. WDYT?

@IndrajeetPatil
Copy link
Member Author

Its usage in {modelbased} is covered in a separate issue: easystats/modelbased#206

@strengejacke
Copy link
Member

we can remove this function. WDYT?

No, make to internal instead. Else reshape_longer() no longer works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants