Add vectorised DBSCAN outlier detection implementation #119

sd2k · 2024-10-04T07:14:23Z

It should be possible to use SIMD to vectorise the DBSCAN outlier detector calculation. I haven't used SIMD directly before, but the std::simd ("portable SIMD") module should handle what we need (albeit behind a nightly-only feature flag).

I made some very rough notes on the dbscan_1d algorithm which might help implementers:

Original data:

9 6 3 4 1 15 0 10 5

Sort it, and keep a reverse index mapping each sorted value back to its position in the original data:

sorted:         0 1 3 4 5 6 9 10 15

reverse index:  6 4 2 3 8 1 0 7 5
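A minimal stable-Rust sketch of this step, assuming f64 input (the notes don't fix a type). `sort_with_reverse_index` is a hypothetical name. It sorts a list of indices by the values they point at, which yields the sorted data and the reverse index together. `f64::total_cmp` gives a total order, so the sort needs no `partial_cmp(...).unwrap()` that could panic on NaN.

```rust
/// Hypothetical helper: returns `(sorted, reverse_index)`, where
/// `reverse_index[i]` is the position in `data` that `sorted[i]` came from.
pub fn sort_with_reverse_index(data: &[f64]) -> (Vec<f64>, Vec<usize>) {
    // Start from the identity permutation 0..n.
    let mut reverse_index: Vec<usize> = (0..data.len()).collect();
    // Sort the indices by the values they point at (stable sort, total order on f64).
    reverse_index.sort_by(|&a, &b| data[a].total_cmp(&data[b]));
    // Gather the values in sorted order.
    let sorted: Vec<f64> = reverse_index.iter().map(|&i| data[i]).collect();
    (sorted, reverse_index)
}
```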


dbscan_1d:

Given sorted data:

0 1 3 4 5 6 9 10 15 <...0 padding until lane length>

Shift it right by one lane, inserting a 0 at the front:

0 0 1 3 4 5 6 9 10 15 <... 0 padding until lane length>
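A rough sketch of the padding and shift, using plain vectors on stable Rust rather than std::simd. `LANES = 8` is a hypothetical lane width and `pad_and_shift` a hypothetical helper. The buffer length is rounded up so that the shifted copy, which is one element longer than the data, still fits in whole lanes.

```rust
/// Hypothetical lane width; a std::simd version might use an 8-lane f64 vector.
pub const LANES: usize = 8;

/// Hypothetical helper: returns `(padded, shifted)`, both zero-padded to a
/// multiple of `LANES`. `shifted` is `sorted` moved right by one lane with a
/// 0 inserted at the front.
pub fn pad_and_shift(sorted: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let n = sorted.len();
    // Room for n + 1 values (the shifted copy), rounded up to whole lanes.
    let len = (n + 1 + LANES - 1) / LANES * LANES;
    let mut padded = vec![0.0_f64; len];
    padded[..n].copy_from_slice(sorted);
    let mut shifted = vec![0.0_f64; len];
    shifted[1..=n].copy_from_slice(sorted);
    (padded, shifted)
}
```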

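Then calculate the diffs (sorted minus shifted). The first lane only compares the first value with the inserted 0, so it is dropped, leaving one gap per pair of neighbours:

1 2 1 1 1 3 1 5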
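A sketch of the subtraction, again assuming a hypothetical `LANES = 8` and f64 data. `lanewise_sub` works one `LANES`-wide chunk at a time, the loop a std::simd version would replace with one vector subtraction per chunk; `gaps` is a scalar reference for the same result. In the example, lane 9 comes out as 0 - 15 = -15, because the first padding lane sits next to the last real value.

```rust
/// Hypothetical lane width.
const LANES: usize = 8;

/// Lane-wise `a - b`, one LANES-wide chunk at a time.
/// Both inputs must already be padded to a multiple of LANES.
pub fn lanewise_sub(a: &[f64], b: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len() % LANES, 0, "inputs must be padded to whole lanes");
    let mut out = vec![0.0_f64; a.len()];
    for ((o, x), y) in out
        .chunks_exact_mut(LANES)
        .zip(a.chunks_exact(LANES))
        .zip(b.chunks_exact(LANES))
    {
        // In a SIMD version this inner loop is a single vector subtraction.
        for (d, (p, q)) in o.iter_mut().zip(x.iter().zip(y)) {
            *d = p - q;
        }
    }
    out
}

/// Scalar reference: the gap between each pair of neighbouring sorted values,
/// i.e. lanes 1..n of `lanewise_sub(padded, shifted)`.
pub fn gaps(sorted: &[f64]) -> Vec<f64> {
    sorted.windows(2).map(|w| w[1] - w[0]).collect()
}
```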

Then check whether each diff is <= eps, e.g. for eps = 2:

 true true true true true false true false
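A sketch of the threshold step (`connected_mask` is a hypothetical name). The padding lanes need care: with zero padding, the lane after the last value holds 0 - 15 = -15, which would pass a plain `<= eps` test, so every lane past the real gaps is forced to false. In std::simd terms this would be a lane-wise comparison combined with a mask of valid lanes; the sketch does the same element by element.

```rust
/// Hypothetical helper: marks each gap as "connected" when `gap <= eps`.
/// Only the first `n_gaps` lanes hold real gaps; padding lanes are forced
/// to false, since e.g. 0 - 15 = -15 would otherwise pass `<= eps`.
pub fn connected_mask(diffs: &[f64], n_gaps: usize, eps: f64) -> Vec<bool> {
    diffs
        .iter()
        .enumerate()
        .map(|(i, &d)| i < n_gaps && d <= eps)
        .collect()
}
```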

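Find the start/end index of each chain of trues (and hence the longest one) by xoring the mask with a copy of itself shifted right by one lane, with false shifted in: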

mask:     true  true  true  true  true  false true  false <...false padding until lane length>

shifted:  false true  true  true  true  true  false true  false <...false padding until lane length>

xor'd:    1     0     0     0     0     1     1     1
index:    0     1     2     3     4     5     6     7
          |                             -     |     -

(| marks a chain start, - marks a chain end)

Indexes of the 1s: 0 5 6 7

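Counting from one, the odd-numbered entries in this list (1st, 3rd, ...) are chain starts and the even-numbered ones (2nd, 4th, ...) are chain ends (exclusive), giving chains [0, 5) and [6, 7) over the diffs. A chain [s, e) of diffs links the sorted values at indexes s through e, so [0, 5) covers 0 1 3 4 5 6 and [6, 7) covers 9 10.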
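The start/end detection as a scalar sketch (`transitions` is a hypothetical helper). It xors each mask value with the previous one, with false shifted in at the front, and appends one extra false so that a chain running to the end of the data still gets an end index. A SIMD version would do the same with a lane-wise xor of the mask and its shifted copy.

```rust
/// Hypothetical helper: indices where `mask` differs from itself shifted
/// right by one (false shifted in), i.e. where a run of trues starts or ends.
/// Entries alternate start, end, start, end, ...; ends are exclusive.
pub fn transitions(mask: &[bool]) -> Vec<usize> {
    let mut out = Vec::new();
    let mut prev = false; // the value shifted in at index 0
    // A trailing false closes any run that reaches the end of the mask.
    for (i, &cur) in mask.iter().chain(std::iter::once(&false)).enumerate() {
        if cur ^ prev {
            out.push(i);
        }
        prev = cur;
    }
    out
}
```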

Everything in the longest chain is in the cluster: here sorted indexes 0 to 5 (values 0 1 3 4 5 6).
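Picking the longest chain could look like the following (`longest_run` is a hypothetical name). Chains are half-open [start, end) ranges over the diffs, paired from the transition list in order. On ties it keeps the first chain found; that tie-break is an assumption, since the notes don't say.

```rust
/// Hypothetical helper: pairs transition indices into half-open runs
/// [start, end) and returns the longest. Ties keep the earliest run
/// (an assumption). Returns None if there are no runs.
pub fn longest_run(transitions: &[usize]) -> Option<(usize, usize)> {
    transitions
        .chunks_exact(2)
        .map(|p| (p[0], p[1]))
        .fold(None, |best, run| match best {
            // Keep the current best unless the new run is strictly longer.
            Some((s, e)) if e - s >= run.1 - run.0 => Some((s, e)),
            _ => Some(run),
        })
}
```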

Everything outside this index range is an outlier:

  sorted indexes 6 7 8 (original indexes 0 7 5, via the reverse index)
  values 9 10 15
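Finally, a sketch of mapping the result back to the original data (`outliers` is a hypothetical helper). A chain [s, e) of diffs links sorted positions s through e inclusive; every other sorted position is an outlier, and the reverse index converts sorted positions to original ones. With the example data it returns original indexes 0, 7 and 5 (values 9, 10 and 15). If no diff is within eps at all, this sketch treats every point as an outlier, which is an assumption rather than something the notes specify.

```rust
/// Hypothetical helper: original indexes of every point outside the longest
/// run `[s, e)` of connected diffs. With `run == None`, all points are
/// treated as outliers (an assumption).
pub fn outliers(run: Option<(usize, usize)>, reverse_index: &[usize]) -> Vec<usize> {
    // Sorted position i is in the cluster if the run links it: s <= i <= e.
    let in_cluster = |i: usize| matches!(run, Some((s, e)) if s <= i && i <= e);
    (0..reverse_index.len())
        .filter(|&i| !in_cluster(i))
        .map(|i| reverse_index[i]) // sorted position -> original position
        .collect()
}
```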

sd2k added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Oct 4, 2024.