Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- Start of ToxicRmod commit #211

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

- Start of ToxicRmod commit #211

wants to merge 4 commits into from

Conversation

syeddans
Copy link
Collaborator

No description provided.

@syeddans syeddans requested a review from mattjmeier February 15, 2024 21:30
@syeddans syeddans self-assigned this Feb 15, 2024
@syeddans
Copy link
Collaborator Author

updated from meeting to use lapply instead of for loop, greatly improved speed. Tried to use mclapply but it hangs on the first processing of the model, am looking further into it

}
}

result <- pbmclapply(bmdoutput[,2:ncol(bmdoutput)], function(col) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can speed this up much more by passing mc.cores (usually a good idea to use num CPUs - 1 - see some examples and discussion here: https://stackoverflow.com/questions/28954991/whether-to-use-the-detectcores-function-in-r-to-specify-the-number-of-cores-for)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my apologies I thought I read that it automatically used the maximum amount of cores but you are correct including detectcores greatly improves speed!

dose_column <-as.numeric(bmdoutput[,1])
start.time <- Sys.time()
Func <- function(resp_bygene_column){
trendtestres<- williamsTest(resp_bygene_column ~ dose_column, alternative = c("greater", "less"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raiduhs pointed out a downside to this, we can't set alpha with this implementation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

He also gave this a test at his end and found that it might need to be run for "greater" and "less" as separate tests, can you confirm whether you get both alternative hypotheses output?

> williamsTest(-1*x ~ g, alternative = c("greater", "less"))
 
         Williams trend test
 
data:  -1 * x by g 
alternative hypothesis:  greater

H0
               t'-value df t'-crit decision alpha
mu1 - ctr <= 0   -1.435 12   1.782   accept  0.05
mu2 - ctr <= 0   -1.435 12   1.873   accept  0.05
---
> williamsTest(-1*x ~ g, alternative = "less")
 
         Williams trend test
 
data:  -1 * x by g 
alternative hypothesis:  less
 
H0

               t'-value df t'-crit decision alpha
mu1 - ctr >= 0    1.357 12   1.782   accept  0.05
mu2 - ctr >= 0    2.950 12   1.873   reject  0.05
---

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the code to test both greater and less, I am also check the ToxicR williamtrendtest and unfortunately it also does not allow for choosing an alpha value, I think we might be revamping the williamtrendtest in general anyways based on the BMDexpress source files I have sent you

trendestresdf <- data_frame("pass/fail value" = trendtestres$crit.value[,1] - trendtestres$statistic[,1])
#only generate model when significant based on williams test
if (any(trendestresdf$`pass/fail value` <0)){
model<-single_continuous_fit(dose_column,resp_bygene_column)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like you're specifying any model; I guess the default is Hill?

The ideal way to approach this would be to wrap the whole function so that it can iterate over model types (ideally also in parallel)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the default was hill, I have implemented modelling of all 5 available models, and to save the best one based on The maximum value of the likelihod/posterior, I read that this is an assessment of the goodness of fit and a higher maximum would be considered the best model, please correct me if I am wrong. I have also implemented this in parallel processing (thank you for teaching me about it, it has changed my world), I used nested mclapplys but was worried about how it would be affected given the cores distributed to the workers (maybe there is an optimal split?), I also used foreach dopar, based on this discussion: https://stackoverflow.com/questions/51707443/should-mclapply-calls-be-nested. Indeed after testing, foreach dopar is slightly faster than nested mclapply with maximum cores used for both.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output file also now contains the gene the bmd and the model that was the best used

- calculate all model and choose best method (based on The maximum value of the likelihod/posterior) also in parellel processing,
- input best model
- perform williams trend test with alternative both greater and less
- start of rmarkdown file using shiny (has not been updated for change in including best model type)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants