Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash opclass for spoint #102

Merged
merged 1 commit into from
Nov 7, 2023

Conversation

df7cb
Copy link
Contributor

@df7cb df7cb commented Nov 3, 2023

Besides hash indexes on spoint, this enables "select distinct spoint from tbl" queries.

Close #101.

Copy link
Contributor

@esabol esabol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic contribution! Thank you, @df7cb !

I think the only thing missing is the addition of the pgs_hash.sql.in to the upgrade script?

@df7cb
Copy link
Contributor Author

df7cb commented Nov 3, 2023

That's already in there:

pg_sphere--1.3.1--1.3.2.sql: pgs_circle_sel.sql.in pgs_hash.sql.in
	cat upgrade_scripts/[email protected] $^ > $@

@esabol
Copy link
Contributor

esabol commented Nov 4, 2023

That's already in there:

pg_sphere--1.3.1--1.3.2.sql: pgs_circle_sel.sql.in pgs_hash.sql.in
	cat upgrade_scripts/[email protected] $^ > $@

Oh, sorry, I was expecting to see pg_sphere--1.3.1--1.3.2.sql in the list of changed files in the commit and didn't look closely enough at the changes to the Makefile.

Copy link
Contributor

@esabol esabol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will defer to @vitcpp when it comes to the implementation of the hash function, but everything else looks good to me.

@df7cb
Copy link
Contributor Author

df7cb commented Nov 4, 2023

The current implementation just xor-s the 4x4 bytes chunks together. I think that should yield a good distribution on the "random" coordinate values that should be seen in practise. If we think that's not good enough, we might just use the built-in hash functions on floats. (hashfloat8, hashfloat8extended)

@vitcpp
Copy link
Contributor

vitcpp commented Nov 7, 2023

I think, everythink is ok, but I would propose to improve the hash function. The longitude range is [0, 2*pi], the latitude range is [-pi/2,p/2], EPSILON = 1E-9. We have to calculate the hashes in these ranges. The smaller numbers are "equal" and should have the same hash. There are also some other special values like INF, NAN. I think that all NANs should produce the same hash value. Denormalized numbers are smaller than 1E-9, they are 0. I will think about some other implementation of hash function.

@df7cb
Copy link
Contributor Author

df7cb commented Nov 7, 2023

I don't think we can do much about mapping values less than EPSILON apart to the same value - we don't know in which direction to round. The only thing we could do is map values close to 0 to 0, but if we don't do it for the rest, why bother with 0.

Infinity is not a problem since it's a unique bit value, but along with NaN, can these really appear in spoint values? I even had trouble getting -0 into an spoint - at one point I think I had one, but couldn't reproduce it later.

One possible hash implementation that takes care of -0 and NaN would be this:

PG_RETURN_INT32(hashfloat8(p1->lat + p1->lng));

or

PG_RETURN_INT32(hashfloat8(p1->lat) ^ hashfloat8(p1->lng));

@vitcpp
Copy link
Contributor

vitcpp commented Nov 7, 2023

@df7cb May be to use hashfloat8? I have no strong arguments against your implementation except that it is not the best hash function, I think. I want to ask my junior colleagues to think how to improve hash function and benchmark it.

Besides hash indexes on spoint, this enables "select distinct spoint
from tbl" queries.

Close postgrespro#101.
@df7cb
Copy link
Contributor Author

df7cb commented Nov 7, 2023

I used the hash1 ^ hash2 approach now, and added documentation.

@vitcpp vitcpp mentioned this pull request Nov 7, 2023
@vitcpp vitcpp merged commit 339ed93 into postgrespro:master Nov 7, 2023
14 checks passed
@df7cb df7cb deleted the hash-opclass branch November 8, 2023 08:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing equality operator for type spoint?
3 participants