Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vectorised DBSCAN outlier detection implementation #119

Open
sd2k opened this issue Oct 4, 2024 · 0 comments
Open

Add vectorised DBSCAN outlier detection implementation #119

sd2k opened this issue Oct 4, 2024 · 0 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@sd2k
Copy link
Collaborator

sd2k commented Oct 4, 2024

It should be possible to use SIMD to vectorise the outlier detector calculation. I haven't used SIMD directly before but the std::simd ('portable SIMD') module should handle what we need (albeit behind a nightly only flag).

I made some very rough notes on the dbscan_1d algorithm which might help implementers:

Original data:

9 6 3 4 1 15 0 10 5

sort it and keep reverse index for sorted data:

0 1 3 4 5 6 9 10 15

6 4 2 3 8 1 0 7 5


dbscan_1d:

Given sorted data:

0 1 3 4 5 6 9 10 15 <...0 padding until lane length>

Shift it right:

0 0 1 3 4 5 6 9 10 15 <... 0 padding until lane length>

First calculate diffs:

 1 2 1 1 1 3 1  5

Then determine if <= eps, e.g. for 2:

 true true true true true false true false

Find start/end index of longest chain of trues:

   true  true true true true false true  false <...false padding until lane length>

   false true true true true true  false true false <...false padding until lane length>

  xor'd:
    1     0    0    0    0    1     1     1
    |                         -     |     -

 0 5 6 7

Odds are cluster starts, evens are cluster ends.

Everything in the largest range is in the cluster.

Everything not included in this index range is an outlier:

  indexes 6 7 8
  values 9 10 15
@sd2k sd2k added enhancement New feature or request help wanted Extra attention is needed labels Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

1 participant