Use array datatype for tags storage and searching #70
base: dev
Looks like we need to update the build system here
Where is it defined? :p
@CumpsD Sorry, can you resolve the merge conflict?
@Arkatufus I'll try to get around to it as fast as I can, somewhere next week
@CumpsD a couple notes on this:
I did a performance test using 7 tags and 2 million records on the latest (13.2) PostgreSQL Docker image, and the results are not encouraging.
Current implementation
==============================
Tag-array
If it helps to get an idea about scale, we have 200 million events :D
There are a few outliers, but if you look at the median of both tests, you can see that there is no significant improvement in performance between the two implementations.
@Arkatufus just wanted to let you know I have not lost sight of this, will do when I find some time in the coming week(s)
Interesting - is this still on the table or should we be looking at Linq2Db instead?
There are advantages and drawbacks to the array datatype. Obviously, it simplifies parts of the query pipeline, as you're able to write everything into one row and not interleave with tables. The drawback is that (AFAIK) indexes on arrays are GIN (Generalized Inverted) indexes. These can have different performance characteristics than B-trees on inserts and updates, BUT IIRC they can be more size-efficient than B-trees. (FWIW, this would be easy to add as an option to Persistence.Linq2Db. Yet another tagmode flag 😅.) Side note regarding query performance: it may be prudent to flush Postgres' (and other DBs') internal caches for an operation like this, at least for some scenarios. If rows are recently inserted and pages fit into memory, unless there are a -lot- of tags I wouldn't expect to see much difference in perf.
Sounds like we're better off sticking to Linq2Db :p
I never got around to testing it because I'm not using it anymore. But purely Postgres-wise, this made more sense :)
Apparently `tags` is a semicolon-separated list with a `LIKE %%` query against it, which is not that performant when you have millions of events. PostgreSQL can turn tags into an array type and efficiently index it with a GIN index.
This PR turns the tags column into an array type and changes how tags are inserted and searched for.
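
To make the difference between the two storage schemes concrete, here is a minimal sketch in Python. It is not the PR's code: the table name `journal`, the index name, the `;a;b;` delimiter convention, and all function names are assumptions made for illustration. The SQL strings use PostgreSQL's `LIKE`, the array containment operator `@>`, and `CREATE INDEX ... USING GIN`; the two `*_matches` helpers are in-memory stand-ins for what each query matches.

```python
# Hypothetical table, column, and index names; the PR's real schema may differ.
CREATE_GIN_INDEX = "CREATE INDEX ix_journal_tags ON journal USING GIN (tags)"
SEARCH_TEXT = "SELECT * FROM journal WHERE tags LIKE %(pattern)s"
SEARCH_ARRAY = "SELECT * FROM journal WHERE tags @> %(wanted)s"


def text_search_params(tag):
    # Assumed convention: tags are stored as ';a;b;', so the pattern is '%;a;%'.
    return {"pattern": "%;" + tag + ";%"}


def array_search_params(tag):
    # '@>' tests array containment; the parameter is a one-element list of tags.
    return {"wanted": [tag]}


def text_matches(stored, tag):
    # In-memory stand-in for LIKE '%;tag;%': a substring match on the whole string.
    return (";" + tag + ";") in stored


def array_matches(stored, tag):
    # In-memory stand-in for tags @> ARRAY[tag]: element membership.
    return tag in stored
```

With the text column, every search is a substring match with a leading wildcard, which a plain B-tree index generally cannot serve. With the array column, the search is an element lookup, which is the kind of query a GIN index is designed for.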