Muting errors that may arise from statistical tests #9

vhorvath · 2023-05-30T14:38:03Z

Came across this useful package and as I was playing around with cellularity parameters in the twosamplecompare function I ran into a few errors, where some segments would fail the t.test() or wilcox.test().

The error is that the values these tests are supposed to be ran on are technically identical and therefore the tests throw an error. However, this prevents the tests from completing, the function has no error handling for this scenario.

This PR provides a way to silence these errors. Since errors may arise from other sources than feeding improper data, and R does not provide a formal mechanism for testing for a specific error with these tests a toggle is provided so that the errors can be inspected for debugging purposes.

Note this is a quick fix and should be evaluated on more datasets, the man page has also been updated accordingly.

jpoell · 2023-05-30T15:19:06Z

Hi Viktor,

Thanks for reporting on this issue and suggesting a solution. To be honest, I have to brush up on my Git skills to see how and if I take over these changes. But I would also like to have a look at the errors before making any changes. So let me try to understand the problem first. Are you trying to do twosamplecompare on copynumber templates that have exactly identical values for all the bins in the segments, so that you get the "data are essentially constant" error? I suppose there is a use for twosamplecompare to just plot the two samples and not have associated tests. But if that is the case, it is perhaps more useful to add a function that is better equipped to plot multiple samples (I have some code ready for this, but did not add it to the package yet). Perhaps you can better inform me on your use case.

Cheers,

Jos

vhorvath · 2023-05-30T16:32:18Z

Hello Jos,

This error only appears if the cellularity values have some specific values when it seems that some of the binned values become identical after scaling as it is assessed by the statistical test functions. I truncated the data I use to give a reproducible example, see at the bottom of my message.

After example_short <- readRDS("example-short-2023-05-30.rds") you could try the following two scenarios to investigate the error.

Scenario 1

It will run just fine if the following is done:

ACE::twosamplecompare(example_short,
                        index1 = 1,
                        index2 = 2,
                        cellularity1 = 1,
                        cellularity2 = 1,
                        quieterrors = FALSE)

Output 1

Corresponding statistical test data frame looks like this

Scenario 2

It fails when

ACE::twosamplecompare(example_short,
                        index1 = 1,
                        index2 = 2,
                        cellularity1 = 0.82,
                        cellularity2 = 1,
                        quieterrors = FALSE)

Output 2

With the statistical data frame looks like this if this patch is applied

It is entirely possible that this is not actually an error in the algorithm, but the data that goes into it and one should know better. For sample AL_10 in the data that I used the cellularity values did not have minimums, in fact the choice by ACE was the final maximum, so I thought it might be sensible to toy around with the cellularity parameter.

To me it was convenient to use ACE to do the statistical testing between segments, but it seems that certain failure modes were not handled, or encountered.

If this patch is not the proper way of addressing the issue that is okay with me, but it would be great if a higher-level user error could be generated.

As for merging, it should be as simple as accepting my PR. The only thing I don't know about your process if versioning and Bioconductor releases are tied to commit hashes, or if you need to deal with that manually, so I did not touch any files dealing with versions.

I hope this helps!

Kind regards,
Viktor

Example data

example-short-2023-05-30.rds.gz

jpoell · 2023-05-31T11:27:21Z

Thanks! That is a very peculiar example! I will try to get to the bottom of it. Either way, I like your error handling, so I think I might adopt it that way.

I think in terms of version bumps it might be okay since it is now applied to the devel version. Perhaps this is also a good opportunity for me to add some more recent functionality that I have been working on. Hopefully I will find the time in the next few weeks (but I have some time until the next Bioconductor release). Thanks again, and I will leave this open until I actually implement it ;-)

Muting errors that may arise from tests

f54cc1b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Muting errors that may arise from statistical tests #9

Muting errors that may arise from statistical tests #9

vhorvath commented May 30, 2023

jpoell commented May 30, 2023

vhorvath commented May 30, 2023

jpoell commented May 31, 2023

Muting errors that may arise from statistical tests #9

Are you sure you want to change the base?

Muting errors that may arise from statistical tests #9

Conversation

vhorvath commented May 30, 2023

jpoell commented May 30, 2023

vhorvath commented May 30, 2023

Scenario 1

Output 1

Scenario 2

Output 2

Example data

jpoell commented May 31, 2023