Add support for log-normal distribution #99

Closed
mutlusun opened this issue Jun 18, 2021 · 5 comments
Labels
feature feature request or enhancement

Comments

@mutlusun

First of all, many thanks for all the work on this awesome package!

I would like to simulate data from a log-normal distribution, but the package does not currently support it. There are probably ways to work around this, but official support would be nice!

Thanks!

@assignUser
Collaborator

Thank you for the suggestion! I am currently thinking about restructuring the way we handle different distributions, both to make maintenance easier and to better support new and custom distributions (@kgoldfeld FYI :D).

Until then, the easiest workaround is to use the nonrandom distribution:

library(simstudy)

def <- defData(
    varname = "log_normal",
    formula = "rlnorm(..n, 3, 1)",
    dist = "nonrandom"
)

n <- 10
genData(n, def)
#>     id log_normal
#>  1:  1   47.49277
#>  2:  2   27.02211
#>  3:  3   29.59762
#>  4:  4  137.59722
#>  5:  5   20.31148
#>  6:  6   31.99283
#>  7:  7   11.32037
#>  8:  8   32.01089
#>  9:  9   19.46739
#> 10: 10   12.42206

Created on 2021-06-19 by the reprex package (v0.3.0.9001)

It is important to pass n as a double-dot variable (..n), because the formula does not have access to it otherwise. Just make sure n is defined before calling genData.
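For reference, the 3 and 1 in rlnorm(..n, 3, 1) are meanlog and sdlog: the mean and standard deviation of the logarithm of the values, not of the simulated values themselves. A minimal base-R sketch (no simstudy involved; the seed and sample sizes are arbitrary choices for illustration) comparing rlnorm with exponentiated normal draws:

```r
# rlnorm(n, meanlog, sdlog) draws X = exp(Z) with Z ~ Normal(meanlog, sdlog)
set.seed(2021)
x <- rlnorm(10, meanlog = 3, sdlog = 1)

# Same seed, same normal draws, exponentiated by hand
set.seed(2021)
y <- exp(rnorm(10, mean = 3, sd = 1))

all.equal(x, y)

# With many draws, the log of the sample recovers meanlog and sdlog
set.seed(2021)
big <- rlnorm(1e5, meanlog = 3, sdlog = 1)
c(mean = mean(log(big)), sd = sd(log(big)))
```

Because both calls consume the same normal draws from the same seed, the two vectors should agree, and the last line should show a log-scale mean and sd close to 3 and 1.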

You can, of course, also use other variables defined in def for the mean and sd. In that case, mapply runs rlnorm once per row, so set n to 1 so that each call returns a single draw:

library(simstudy)

def <- defData(varname = "mean", formula = 1, variance = .5, dist = "normal") 
def <- defData(def, varname = "sd", formula = 1, dist = "nonrandom") 
def <- defData(def,
    varname = "log_normal",
    formula = "mapply(..rlnorm, meanlog = mean, sdlog = sd, MoreArgs = list(n = 1))",
    dist = "nonrandom"
)

genData(10, def)
#>     id       mean sd log_normal
#>  1:  1 -0.2031097  1  0.1046788
#>  2:  2  0.7177785  1  3.2165236
#>  3:  3  1.4812141  1  0.8749564
#>  4:  4  1.4459571  1  1.1580688
#>  5:  5  0.8243017  1  3.4912104
#>  6:  6  0.7569261  1  0.2636364
#>  7:  7  1.1386356  1 10.2438622
#>  8:  8  2.1595046  1 46.3078970
#>  9:  9  2.2524009  1 41.5428683
#> 10: 10  2.3570354  1 55.5207466

Created on 2021-06-19 by the reprex package (v0.3.0.9001)
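Outside of simstudy, the per-row mapply call can be compared with a single rlnorm call. R's random-generation functions recycle their parameter arguments, so rlnorm(length(m), m, s) uses the i-th meanlog and sdlog for the i-th draw and consumes random numbers in the same order. A base-R sketch, where m and s are hypothetical stand-ins for the mean and sd columns:

```r
# Hypothetical stand-ins for the `mean` and `sd` columns of the generated data
m <- c(-0.2, 0.7, 1.5, 1.4, 0.8)
s <- rep(1, length(m))

# Row by row: one draw per (meanlog, sdlog) pair; n = 1 is passed via MoreArgs
set.seed(99)
per_row <- mapply(rlnorm, meanlog = m, sdlog = s, MoreArgs = list(n = 1))

# Single call: rlnorm recycles meanlog/sdlog across the n = length(m) draws
set.seed(99)
vectorized <- rlnorm(length(m), meanlog = m, sdlog = s)

all.equal(per_row, vectorized)
```

Whether simstudy evaluates a formula such as rlnorm(length(mean), mean, sd) the same way has not been checked here, so treat that as an assumption; the mapply formula is the form actually shown working in this thread.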

@assignUser assignUser added the feature feature request or enhancement label Jun 19, 2021
@mutlusun
Author

Thanks for your fast reply and the example! :)

I have now noticed that the vignette already describes custom functions 🤷

@assignUser
Collaborator

:) As this workaround seems to work for you, I will close this issue. If you have any more questions, don't hesitate to comment or reopen! Support for custom distributions will likely be tracked in #75 (or at least mentioned there) if you want to follow up.

@assignUser
Collaborator

@mutlusun You could, of course, also make it a bit easier to read by using Vectorize:

library(simstudy)

log_norm <- Vectorize(rlnorm, c("meanlog", "sdlog"))

def <- defData(varname = "mean", formula = 1, variance = .5, dist = "normal") 
def <- defData(def, varname = "sd", formula = 1, dist = "nonrandom") 
def <- defData(def,
    varname = "log_normal",
    formula = "log_norm(1, mean, sd)",
    dist = "nonrandom"
)

genData(10, def)
#>     id       mean sd log_normal
#>  1:  1  1.0137600  1  9.8496741
#>  2:  2  1.6038807  1  6.0124375
#>  3:  3 -0.6702917  1  1.4655686
#>  4:  4  0.2745950  1  1.6558383
#>  5:  5  0.5995254  1  3.6522213
#>  6:  6  1.5485464  1  6.2918779
#>  7:  7  1.3193654  1 12.4443602
#>  8:  8  0.8407032  1  0.7066864
#>  9:  9 -0.2637986  1  0.5728164
#> 10: 10  0.6323783  1  3.6803245

Created on 2021-06-22 by the reprex package (v2.0.0)
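Vectorize(rlnorm, c("meanlog", "sdlog")) returns a wrapper that loops over the listed arguments with mapply and passes the remaining ones (here n = 1) through unchanged, so it should draw the same values as the explicit mapply formula. A quick base-R check, with an arbitrary seed and parameter values:

```r
# Wrapper that maps rlnorm over meanlog and sdlog, keeping n fixed
log_norm <- Vectorize(rlnorm, c("meanlog", "sdlog"))

means <- c(1.0, 1.6, -0.7)

# Vectorized wrapper: one draw per element of `means`, sdlog recycled
set.seed(7)
a <- log_norm(1, means, 1)

# Equivalent explicit mapply call with the same seed
set.seed(7)
b <- mapply(rlnorm, meanlog = means, sdlog = 1, MoreArgs = list(n = 1))

all.equal(a, b)
```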

@mutlusun
Author

Thanks @assignUser for your detailed help! I will look into this!
