Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Muting errors that may arise from statistical tests #9

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vhorvath
Copy link

Came across this useful package and as I was playing around with cellularity parameters in the twosamplecompare function I ran into a few errors, where some segments would fail the t.test() or wilcox.test().

The error is that the values these tests are supposed to be ran on are technically identical and therefore the tests throw an error. However, this prevents the tests from completing, the function has no error handling for this scenario.

This PR provides a way to silence these errors. Since errors may arise from other sources than feeding improper data, and R does not provide a formal mechanism for testing for a specific error with these tests a toggle is provided so that the errors can be inspected for debugging purposes.

Note this is a quick fix and should be evaluated on more datasets, the man page has also been updated accordingly.

@jpoell
Copy link
Collaborator

jpoell commented May 30, 2023

Hi Viktor,

Thanks for reporting on this issue and suggesting a solution. To be honest, I have to brush up on my Git skills to see how and if I take over these changes. But I would also like to have a look at the errors before making any changes. So let me try to understand the problem first. Are you trying to do twosamplecompare on copynumber templates that have exactly identical values for all the bins in the segments, so that you get the "data are essentially constant" error? I suppose there is a use for twosamplecompare to just plot the two samples and not have associated tests. But if that is the case, it is perhaps more useful to add a function that is better equipped to plot multiple samples (I have some code ready for this, but did not add it to the package yet). Perhaps you can better inform me on your use case.

Cheers,

Jos

@vhorvath
Copy link
Author

Hello Jos,

This error only appears if the cellularity values have some specific values when it seems that some of the binned values become identical after scaling as it is assessed by the statistical test functions. I truncated the data I use to give a reproducible example, see at the bottom of my message.

After example_short <- readRDS("example-short-2023-05-30.rds") you could try the following two scenarios to investigate the error.

Scenario 1

It will run just fine if the following is done:

ACE::twosamplecompare(example_short,
                        index1 = 1,
                        index2 = 2,
                        cellularity1 = 1,
                        cellularity2 = 1,
                        quieterrors = FALSE)

Output 1

Corresponding statistical test data frame looks like this
image

Scenario 2

It fails when

ACE::twosamplecompare(example_short,
                        index1 = 1,
                        index2 = 2,
                        cellularity1 = 0.82,
                        cellularity2 = 1,
                        quieterrors = FALSE)

Output 2

With the statistical data frame looks like this if this patch is applied
image

It is entirely possible that this is not actually an error in the algorithm, but the data that goes into it and one should know better. For sample AL_10 in the data that I used the cellularity values did not have minimums, in fact the choice by ACE was the final maximum, so I thought it might be sensible to toy around with the cellularity parameter.

To me it was convenient to use ACE to do the statistical testing between segments, but it seems that certain failure modes were not handled, or encountered.

If this patch is not the proper way of addressing the issue that is okay with me, but it would be great if a higher-level user error could be generated.

As for merging, it should be as simple as accepting my PR. The only thing I don't know about your process if versioning and Bioconductor releases are tied to commit hashes, or if you need to deal with that manually, so I did not touch any files dealing with versions.

I hope this helps!

Kind regards,
Viktor

Example data

example-short-2023-05-30.rds.gz

@jpoell
Copy link
Collaborator

jpoell commented May 31, 2023

Thanks! That is a very peculiar example! I will try to get to the bottom of it. Either way, I like your error handling, so I think I might adopt it that way.

I think in terms of version bumps it might be okay since it is now applied to the devel version. Perhaps this is also a good opportunity for me to add some more recent functionality that I have been working on. Hopefully I will find the time in the next few weeks (but I have some time until the next Bioconductor release). Thanks again, and I will leave this open until I actually implement it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants