regionFinder not sorting positions #27

Open
kperzel opened this issue Feb 6, 2020 · 0 comments

kperzel commented Feb 6, 2020

Hi,
I have been using regionFinder to find bumps in a statistic across large genomic data sets, and the output does not make sense when the input positions are unsorted. Here is an example:

library(bumphunter)
statistic = c(5.552229, 5.548916, 5.518587, 5.486775, 5.478647, 5.472431, 5.469969, 5.450930, 5.437189, 5.385867, 5.377567, 5.369858, 5.331892, 5.305152, 5.277105, 5.269921, 5.256093, 5.203086)
methPos = c(2225407, 2225187, 2236547, 2225535, 2279596, 2225023, 2225019, 2225580, 2224969, 2279661, 2237227, 2224879, 2224834, 2203900, 2202092, 2203918, 2224756, 2224708)
methChr = rep("chr12",18)
data = as.data.frame(cbind(statistic,methPos,methChr))
data$statistic = as.numeric(as.character(data$statistic))
data$methPos = as.numeric(as.character(data$methPos))
regionFinder(data$statistic, data$methChr, data$methPos, cutoff = 3.5)

The first couple of lines of the output look like this:

    chr   start     end    value      area cluster indexStart indexEnd  L clusterL
3 chr12 2224708 2225580 5.416306 59.579368       3          1       18 18       11
6 chr12 2279596 2279661 5.432257 10.864514       6          5       10  6        2

These numbers are inconsistent: L should never be larger than clusterL, and if you look at what is actually contained in the first range, it is only 9 sites, not 18. The problem goes away if I order my data manually, but since the default is assumeSorted = FALSE, it seems that I should not have to. My data come from multiple locations, so it is a pain to pull everything together to sort it rather than just passing the relevant vectors. I am using bumphunter_1.28.0.
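For anyone hitting the same thing, here is a minimal sketch of the manual ordering workaround mentioned above. It only reorders the three vectors with base R's order() before the call, so nothing has to be merged into one data frame first. The names ord, sortedStat, sortedChr and sortedPos are made up for this example, and the requireNamespace() guard is just there so the snippet also runs where bumphunter is not installed.

```r
# Same values as in the example above.
statistic <- c(5.552229, 5.548916, 5.518587, 5.486775, 5.478647, 5.472431,
               5.469969, 5.450930, 5.437189, 5.385867, 5.377567, 5.369858,
               5.331892, 5.305152, 5.277105, 5.269921, 5.256093, 5.203086)
methPos <- c(2225407, 2225187, 2236547, 2225535, 2279596, 2225023,
             2225019, 2225580, 2224969, 2279661, 2237227, 2224879,
             2224834, 2203900, 2202092, 2203918, 2224756, 2224708)
methChr <- rep("chr12", 18)

# One permutation that sorts by chromosome first, then by position.
ord <- order(methChr, methPos)

# Apply the same permutation to every vector so the rows stay aligned.
sortedStat <- statistic[ord]
sortedChr  <- methChr[ord]
sortedPos  <- methPos[ord]

# Run regionFinder on the sorted input (skipped if bumphunter is missing).
if (requireNamespace("bumphunter", quietly = TRUE)) {
  print(bumphunter::regionFinder(sortedStat, sortedChr, sortedPos, cutoff = 3.5))
}
```

Since ord[i] is the original row of the i-th sorted element, any row index into the sorted vectors can be mapped back to the unsorted data with ord. This only sidesteps the problem; it does not explain why assumeSorted = FALSE gives the output shown above.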
