Feature/fold delta's: only insert and delete effective inserts and deletions. #7

ajuvercr · 2021-08-19T12:32:31Z

Analyse incoming quad changes to find the effective inserts and deletions.

An effective insert is a quad that would actually be inserted, because it is not yet present in the triplestore.
An effective deletion is a quad that would actually be deleted, because it is present in the triplestore.

Multiple quad change requests are folded together and flushed either with the first select query or when a flush timeout expires.
With this change, delta-notifier can also forward these effective changes in format "v0.0.2".
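
To illustrate the folding idea, here is a minimal Python sketch. It is not the code from this PR: the names `Quad` and `fold_changes` are hypothetical, and it assumes the set of quads currently present in the triplestore is already known, whereas the real implementation has to query for it.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    obj: str


def fold_changes(
    changes: Iterable[Tuple[str, Quad]], present: Set[Quad]
) -> Tuple[Set[Quad], Set[Quad]]:
    """Fold ordered ("insert" | "delete", quad) requests into effective changes.

    `present` is the (assumed known) set of quads already in the store.
    Returns (effective_inserts, effective_deletions).
    """
    state = set(present)  # simulated store contents after each request
    for kind, quad in changes:
        if kind == "insert":
            state.add(quad)
        elif kind == "delete":
            state.discard(quad)
        else:
            raise ValueError(f"unknown change kind: {kind}")
    # Only the net difference with the original store is effective.
    return state - present, present - state
```

With this sketch, deleting a present quad and re-inserting it folds to no change at all, which matches the delete-everything-then-recreate pattern described in the benchmark comment.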

To determine which changes are effective, a single CONSTRUCT query is created as follows:

CONSTRUCT {
  ?s ?p ?o
} WHERE {
  VALUES (?s ?p ?o) {
    (<some/subject> <some/predicate> <some/object>)
    (<some/subject2> <some/predicate2> <some/object2>)
  }
  ?s ?p ?o.
}
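
A query of this shape could be generated from a list of triples roughly as follows. This is a hedged Python sketch, not the PR's code: `build_presence_construct` is a hypothetical name, and it assumes every term is an IRI that needs no escaping (literals and blank nodes are not handled).

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), all assumed to be IRIs


def build_presence_construct(triples: Sequence[Triple]) -> str:
    """Build one CONSTRUCT query that returns the listed triples that exist."""
    if not triples:
        raise ValueError("at least one triple is required")
    # One VALUES row per candidate triple.
    rows = "\n".join(f"    (<{s}> <{p}> <{o}>)" for s, p, o in triples)
    return (
        "CONSTRUCT {\n"
        "  ?s ?p ?o\n"
        "} WHERE {\n"
        "  VALUES (?s ?p ?o) {\n"
        f"{rows}\n"
        "  }\n"
        "  ?s ?p ?o.\n"
        "}"
    )
```

Every triple that comes back from the CONSTRUCT is already present, so an insert of it is not effective, while a delete of it is.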

The Virtuoso instance currently in use returned an error when GRAPH information was added to this query, although this is supported in SPARQL 1.1.

This creates a problem: a triple can be present in one graph but not in another. To handle this edge case, the presence of such quads is determined with ASK queries.
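
The per-quad fallback can be sketched like this in Python. The function names and the `run_ask` callable are hypothetical stand-ins for whatever executes the query against the triplestore, and all terms are assumed to be plain IRIs.

```python
from typing import Callable, Iterable, List, Tuple

Quad = Tuple[str, str, str, str]  # (graph, subject, predicate, object), all IRIs


def build_presence_ask(quad: Quad) -> str:
    """Build an ASK query that checks one quad inside its own graph."""
    g, s, p, o = quad
    return f"ASK {{ GRAPH <{g}> {{ <{s}> <{p}> <{o}> }} }}"


def effective_inserts(
    quads: Iterable[Quad], run_ask: Callable[[str], bool]
) -> List[Quad]:
    """Keep only the quads whose ASK query reports them as absent."""
    return [q for q in quads if not run_ask(build_presence_ask(q))]
```

This costs one round trip per quad, which is why the single CONSTRUCT query is preferred where it is possible.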

ajuvercr · 2021-08-19T12:40:53Z

TODO:

update README
add timer to flush
add ENV variable to make timeout duration variable

ajuvercr · 2021-08-23T11:12:05Z

TODO:

vigorous testing
benchmark performance differences between old and new implementation

ajuvercr · 2021-08-25T22:28:39Z

Benchmark results

Benchmarking this change is challenging because the use cases of mu-auth are very diverse.
In particular, the expected ratio between read and write queries is not fixed.

I tried a benchmark where a simple object with around 12 fields is created, then 2 properties are changed in a similar fashion to mu-cl-resources: all properties are deleted (not only the changed ones) and all new properties are created.
Next, the same object is read 4 times and then deleted.

The cache flushes deltas to the triplestore at only two notable moments: during a read query (all deltas have to be flushed to guarantee a correct read) or before a read query because of a timeout (set with the QUAD_CHANGE_CACHE_TIMEOUT env variable).
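
The two flush triggers can be sketched in Python as below. This is an illustration under assumptions, not the PR's implementation: the class and method names are hypothetical, the clock is injectable so the behaviour can be checked without sleeping, and both the unit (milliseconds) and the default value used for QUAD_CHANGE_CACHE_TIMEOUT are guesses.

```python
import os
import time
from typing import Callable, List, Optional


def timeout_from_env(default_ms: float = 500.0) -> float:
    """Read the timeout in seconds; unit (ms) and default are assumptions."""
    return float(os.environ.get("QUAD_CHANGE_CACHE_TIMEOUT", default_ms)) / 1000.0


class QuadChangeCache:
    def __init__(
        self,
        flush: Callable[[List[object]], None],
        timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._timeout_s = timeout_s
        self._clock = clock
        self._pending: List[object] = []
        self._deadline: Optional[float] = None

    def write(self, change: object) -> None:
        """Buffer a change; the first buffered change starts the timeout."""
        if not self._pending:
            self._deadline = self._clock() + self._timeout_s
        self._pending.append(change)

    def read(self) -> None:
        """Call before a read query: flush so the read sees every change."""
        self._flush_pending()

    def tick(self) -> None:
        """Call periodically: flush once the timeout has expired."""
        if self._deadline is not None and self._clock() >= self._deadline:
            self._flush_pending()

    def _flush_pending(self) -> None:
        if self._pending:
            self._flush(list(self._pending))
            self._pending.clear()
        self._deadline = None
```

A flush triggered by `read` adds its cost to that read query (the worst case measured below), while a flush triggered by `tick` happens between requests (the best case).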

Results

Single entity manipulation

First, the flush to the triplestore is executed during a read query (worst case).

New implementation:
              mean      median    std       min       max
total time    17.575    15.000    9.235     10.000    45.000
insert        14.800    16.000    3.187     11.000    18.000
update        16.267    16.000    2.516     12.000    20.000
delete        12.600    12.000    1.625     11.000    15.000
select        21.467    13.000    13.754    10.000    45.000

Old implementation:
              mean      median    std       min       max
total time    27.425    26.000    13.472    10.000    54.000
insert        27.400    26.000    3.826     25.000    35.000
update        42.467    41.000    5.162     35.000    54.000
delete        26.600    27.000    1.625     24.000    29.000
select        12.667    13.000    2.700     10.000    21.000

Next, the flush is executed because of a timeout, so not during any read query (best case).

New implementation:
              mean      median    std       min       max
total time    15.950    16.000    4.283     9.000     27.000
insert        13.200    16.000    2.993     9.000     17.000
update        18.867    19.000    3.364     11.000    25.000
delete        13.000    15.000    2.608     9.000     16.000
select        14.933    15.000    4.171     10.000    27.000

Old implementation:
              mean      median    std       min       max
total time    22.725    26.500    8.729     9.000     35.000
insert        29.000    29.000    1.897     26.000    31.000
update        30.333    30.000    3.300     25.000    35.000
delete        25.400    26.000    2.154     22.000    28.000
select        12.133    12.000    1.996     9.000     16.000

Multiple entity manipulations

The previous section only covered the manipulation of a single entity, which is probably not representative.
Here the same benchmark is started multiple times with a slight offset, each run manipulating a different entity.

The big flush happens with the first read query, but there are many more read queries to bring down the average.

New implementation:
              mean      median    std       min       max
total time    23.031    18.000    17.912    8.000     114.000
insert        11.350    12.500    2.056     8.000     16.000
update        17.817    18.000    3.806     11.000    26.000
delete        17.350    18.500    3.745     11.000    24.000
select        34.033    28.500    25.110    8.000     114.000

Old implementation:
              mean      median    std       min       max
total time    22.288    27.000    9.401     8.000     45.000
insert        24.500    24.500    3.486     18.000    31.000
update        31.183    31.000    4.209     24.000    45.000
delete        26.050    25.000    3.612     19.000    34.000
select        11.400    12.000    2.354     8.000     19.000

Conclusion

In all three scenarios the median total time is lower with the new implementation. The price is that the read query that triggers the flush takes considerably longer than normal, which shows up in the higher maximum and standard deviation for select. If the actual use case allows it, the timeout duration can be set fairly low, so that the deltas are executed while no request is open.

Trivia

You can change the cache behavior with a request header (an update to the README is coming). The best cache behavior is the select behavior, which is now the default. The other options are constructs, construct_and_asks and only_asks. These define how the mechanism determines whether a quad is present in the triplestore.
The measurements cover the entire SPARQL query as sent to mu-auth, including HTTP protocol overhead.
Each measurement uses 5 samples.
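
For reference, the five columns reported in the tables can be computed from a list of timings as in the Python sketch below. The sample values are made up for illustration, `summarize` is a hypothetical helper, and the use of the population standard deviation is an assumption about how the std column was produced.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Return the mean, median, std, min and max of a list of timings."""
    return {
        "mean": statistics.fmean(samples),
        "median": statistics.median(samples),
        # Population standard deviation (divides by n); an assumption.
        "std": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Hypothetical timings for one operation, 5 samples.
print(summarize([11, 11, 12, 14, 15]))
```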

ajuvercr added 10 commits August 9, 2021 14:56

remove linter warnings

3708cbc

add indexed field in deltanotifier body

8ea8593

coalesce delta messages try 1: with timeout, but just array of messages per mu_call_id_trail

72721ee

add configurable timeouts

feb34cf

ASK if updates are necessary before executing

8a69f62

move multiple ASK queries to a single CONSTRUCT query (where possible)

593c48e

create delta's cache with actual inserts and deletions (flush at read query + broken delta messages)

acfd107

cleanup + fix some issues

0e51a56

make backwards compatible with deltanotifier

de466da

revert unwanted changes

bc27fa6

ajuvercr added 2 commits August 19, 2021 17:01

add timeout support to deltas

7a605cc

update README

8c2427d

ajuvercr changed the title ~~WIP Feature/fold delta's: only insert and delete effective inserts and deletions.~~ Feature/fold delta's: only insert and delete effective inserts and deletions. Aug 19, 2021

ajuvercr mentioned this pull request Aug 19, 2021

V0.0.2: add effective inserts and deletions mu-semtech/delta-notifier#9

Open

ajuvercr and others added 6 commits August 23, 2021 14:09

remember all inserts and deletion indices

56911ab

bump

39640b4

bump2

408f468

Split CacheType from Deltas

f105c10

add other query methods

cdeb054

Revert Dockerfile

31f1632

make select default + remove debug statements

89464cf

ajuvercr mentioned this pull request Sep 17, 2021

WIP: Feature/indexed deltas, attach index to delta messages to indicate order #6

Closed
