Implement data_replicate()
#488
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #488      +/-   ##
==========================================
+ Coverage   90.66%   90.73%   +0.06%
==========================================
  Files          74       75       +1
  Lines        5765     5805      +40
==========================================
+ Hits         5227     5267      +40
  Misses        538      538
☔ View full report in Codecov by Sentry.
Alternatively, we could name that function |
Thanks, I'll try to review this more in depth tomorrow, but I can already say that I'm not a big fan of |
I'm fine with both |
The function really just replicates rows:
library(datawizard)
d <- data.frame(
a = c("a", "b", "c"),
b = 1:3,
rep = c(3, 2, 4)
)
data_tabulate(d$a)
#> d$a <character>
#> # total N=3 valid N=3
#>
#> Value | N | Raw % | Valid % | Cumulative %
#> ------+---+-------+---------+-------------
#> a | 1 | 33.33 | 33.33 | 33.33
#> b | 1 | 33.33 | 33.33 | 66.67
#> c | 1 | 33.33 | 33.33 | 100.00
#> <NA> | 0 | 0.00 | <NA> | <NA>
data_expand(d, "rep")
#> a b
#> 1 a 1
#> 2 a 1
#> 3 a 1
#> 4 b 2
#> 5 b 2
#> 6 c 3
#> 7 c 3
#> 8 c 3
#> 9 c 3
data_tabulate(data_expand(d, "rep")$a)
#> data_expand(d, "rep")$a <character>
#> # total N=9 valid N=9
#>
#> Value | N | Raw % | Valid % | Cumulative %
#> ------+---+-------+---------+-------------
#> a | 3 | 33.33 | 33.33 | 33.33
#> b | 2 | 22.22 | 22.22 | 55.56
#> c | 4 | 44.44 | 44.44 | 100.00
#> <NA> | 0 | 0.00 | <NA> | <NA>
Created on 2024-03-20 with reprex v2.1.0
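For readers who want to see the idea without the package, here is a minimal base R sketch of the same row replication. The helper name `replicate_rows()` is hypothetical and this is not datawizard's actual implementation; it only assumes that the count column holds non-negative whole numbers.

```r
# Hypothetical helper for illustration only (not datawizard's code).
# Repeats each row as many times as the value in the `expand` column,
# then drops that column from the result.
replicate_rows <- function(data, expand) {
  times <- data[[expand]]
  # Build a vector of row indices, each index repeated `times` times
  idx <- rep(seq_len(nrow(data)), times = times)
  out <- data[idx, setdiff(names(data), expand), drop = FALSE]
  # Reset row names so they run 1..n instead of "1", "1.1", "1.2", ...
  rownames(out) <- NULL
  out
}

d <- data.frame(
  a = c("a", "b", "c"),
  b = 1:3,
  rep = c(3, 2, 4)
)
res <- replicate_rows(d, "rep")
```

The whole operation reduces to `rep()` over row indices, which is why the tabulated counts after expansion match the values in `rep`.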
Thanks, LGTM, just minor docs and code tweaks to make. For the name, I'm ok with data_replicate() instead of data_expand().
Co-authored-by: Etienne Bacher <[email protected]>
Actually, can you add some documentation and tests for cases where the "expand" column is not an integer? Here's what I have so far for floats (takes the floor value), character (dirty error), and factor (uses the underlying integer codes):
library(datawizard)
foo <- data.frame(
float = c(1.1, 1.8),
char = c("a", "b"),
factor = factor(c("a", "b"))
)
data_replicate(foo, "float")
#> char factor
#> 1 a 1
#> 2 b 2
data_replicate(foo, "char")
#> Warning in FUN(X[[i]], ...): NAs introduced by coercion
#> Error in FUN(X[[i]], ...): invalid 'times' value
data_replicate(foo, "factor")
#> float char
#> 1 1.1 a
#> 2 1.8 b
#> 3 1.8 b
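One way the cases above could be handled is to validate the count column before replicating. The following is a sketch under stated assumptions: `check_times()` is a hypothetical helper, not part of datawizard, and the choice to reject factors and characters and to warn on fractional values is just one possible design, not what the PR settled on.

```r
# Hypothetical validation helper for illustration only (not datawizard's code).
check_times <- function(x) {
  # Factors and characters are rejected outright: is.numeric() is FALSE for both
  if (!is.numeric(x)) {
    stop(
      "The replication column must be numeric, not ", class(x)[1], ".",
      call. = FALSE
    )
  }
  if (anyNA(x) || any(x < 0)) {
    stop("The replication column must not contain missing or negative values.",
      call. = FALSE
    )
  }
  # Fractional counts are floored, but the user is told about it
  if (any(x != floor(x))) {
    warning("Non-integer values were rounded down.", call. = FALSE)
    x <- floor(x)
  }
  as.integer(x)
}

foo <- data.frame(
  float = c(1.1, 1.8),
  char = c("a", "b"),
  factor = factor(c("a", "b"))
)
float_times <- suppressWarnings(check_times(foo$float))
char_result <- tryCatch(check_times(foo$char), error = function(e) conditionMessage(e))
factor_result <- tryCatch(check_times(foo$factor), error = function(e) conditionMessage(e))
```

With a check like this, the character case fails with a readable message instead of the "invalid 'times' value" error, and the factor case no longer silently uses the integer codes.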
Thanks, just a minor comment to add
Co-authored-by: Etienne Bacher <[email protected]>
One failure due to some setup issues in the CI, one because of a segfault that seems unrelated to this.
Today, I needed a function that repeats rows based on values of another column (similar to uncount()). Here's a quick implementation, wdyt?
Created on 2024-03-20 with reprex v2.1.0