-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash opclass for spoint #102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic contribution! Thank you, @df7cb !
I think the only thing missing is the addition of the pgs_hash.sql.in to the upgrade script?
That's already in there:
|
Oh, sorry, I was expecting to see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will defer to @vitcpp when it comes to the implementation of the hash function, but everything else looks good to me.
The current implementation just xor-s the 4x4 bytes chunks together. I think that should yield a good distribution on the "random" coordinate values that should be seen in practise. If we think that's not good enough, we might just use the built-in hash functions on floats. ( |
I think, everythink is ok, but I would propose to improve the hash function. The longitude range is [0, 2*pi], the latitude range is [-pi/2,p/2], EPSILON = 1E-9. We have to calculate the hashes in these ranges. The smaller numbers are "equal" and should have the same hash. There are also some other special values like INF, NAN. I think that all NANs should produce the same hash value. Denormalized numbers are smaller than 1E-9, they are 0. I will think about some other implementation of hash function. |
I don't think we can do much about mapping values less than EPSILON apart to the same value - we don't know in which direction to round. The only thing we could do is map values close to 0 to 0, but if we don't do it for the rest, why bother with 0. Infinity is not a problem since it's a unique bit value, but along with NaN, can these really appear in spoint values? I even had trouble getting One possible hash implementation that takes care of
or
|
@df7cb May be to use hashfloat8? I have no strong arguments against your implementation except that it is not the best hash function, I think. I want to ask my junior colleagues to think how to improve hash function and benchmark it. |
Besides hash indexes on spoint, this enables "select distinct spoint from tbl" queries. Close postgrespro#101.
I used the |
Besides hash indexes on spoint, this enables "select distinct spoint from tbl" queries.
Close #101.