-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- Start of ToxicRmod commit #211
base: main
Are you sure you want to change the base?
Conversation
updated from meeting to use lapply instead of for loop, greatly improved speed. Tried to use mclapply but it hangs on the first processing of the model, am looking further into it |
ToxicR workinprogress.R
Outdated
} | ||
} | ||
|
||
result <- pbmclapply(bmdoutput[,2:ncol(bmdoutput)], function(col) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can speed this up much more by passing mc.cores (usually a good idea to use num CPUs - 1 - see some examples and discussion here: https://stackoverflow.com/questions/28954991/whether-to-use-the-detectcores-function-in-r-to-specify-the-number-of-cores-for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my apologies I thought I read that it automatically used the maximum amount of cores but you are correct including detectcores greatly improves speed!
ToxicR workinprogress.R
Outdated
dose_column <-as.numeric(bmdoutput[,1]) | ||
start.time <- Sys.time() | ||
Func <- function(resp_bygene_column){ | ||
trendtestres<- williamsTest(resp_bygene_column ~ dose_column, alternative = c("greater", "less")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raiduhs pointed out a downside to this, we can't set alpha with this implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He also gave this a test at his end and found that it might need to be run for "greater" and "less" as separate tests, can you confirm whether you get both alternative hypotheses output?
> williamsTest(-1*x ~ g, alternative = c("greater", "less"))
Williams trend test
data: -1 * x by g
alternative hypothesis: greater
H0
t'-value df t'-crit decision alpha
mu1 - ctr <= 0 -1.435 12 1.782 accept 0.05
mu2 - ctr <= 0 -1.435 12 1.873 accept 0.05
---
> williamsTest(-1*x ~ g, alternative = "less")
Williams trend test
data: -1 * x by g
alternative hypothesis: less
H0
t'-value df t'-crit decision alpha
mu1 - ctr >= 0 1.357 12 1.782 accept 0.05
mu2 - ctr >= 0 2.950 12 1.873 reject 0.05
---
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the code to test both greater and less, I am also check the ToxicR williamtrendtest and unfortunately it also does not allow for choosing an alpha value, I think we might be revamping the williamtrendtest in general anyways based on the BMDexpress source files I have sent you
ToxicR workinprogress.R
Outdated
trendestresdf <- data_frame("pass/fail value" = trendtestres$crit.value[,1] - trendtestres$statistic[,1]) | ||
#only generate model when significant based on williams test | ||
if (any(trendestresdf$`pass/fail value` <0)){ | ||
model<-single_continuous_fit(dose_column,resp_bygene_column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you're specifying any model; I guess the default is Hill?
The ideal way to approach this would be to wrap the whole function so that it can iterate over model types (ideally also in parallel)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the default was hill, I have implemented modelling of all 5 available models, and to save the best one based on The maximum value of the likelihod/posterior, I read that this is an assessment of the goodness of fit and a higher maximum would be considered the best model, please correct me if I am wrong. I have also implemented this in parallel processing (thank you for teaching me about it, it has changed my world), I used nested mclapplys but was worried about how it would be affected given the cores distributed to the workers (maybe there is an optimal split?), I also used foreach dopar, based on this discussion: https://stackoverflow.com/questions/51707443/should-mclapply-calls-be-nested. Indeed after testing, foreach dopar is slightly faster than nested mclapply with maximum cores used for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output file also now contains the gene the bmd and the model that was the best used
- calculate all model and choose best method (based on The maximum value of the likelihod/posterior) also in parellel processing, - input best model - perform williams trend test with alternative both greater and less - start of rmarkdown file using shiny (has not been updated for change in including best model type)
No description provided.