Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

linear interpolation leads to some crazy estimates #14

Open
astolman opened this issue Aug 16, 2016 · 6 comments
Open

linear interpolation leads to some crazy estimates #14

astolman opened this issue Aug 16, 2016 · 6 comments

Comments

@astolman
Copy link

The linear interpolation you wrote for estimating the biases seems like a reasonable approach, but I found a case where it leads to absurd estimates. I had a counter of precision 5 with a raw estimate of 128 point 6 something and the two closest raw estimates libcount found in the table were like 128.3464 and 128.3462, and their respective biases are about 1 apart. The linear interpolation said that it's bias should be in the thousands since the straight line connecting those two closest points is so steep. As a result, I got back a negative cardinality estimation, which is definitely not right. The spec in the paper just says "interpolate" without specifying how, so I would instead of just looking for the two nearest neighbors, look for the interval which contains the raw estimate and find where the estimate lies on that line. That seems like another reasonable interpolation and avoids the bias estimate for a given value to be wildly larger than that of its nearest neighbors.

@dialtr
Copy link
Owner

dialtr commented Aug 16, 2016

That sounds very reasonable. I'll work on a fix.

Sent from my iPhone

On Aug 16, 2016, at 5:28 PM, astolman [email protected] wrote:

The linear interpolation you wrote for estimating the biases seems like a reasonable approach, but I found a case where it leads to absurd estimates. I had a counter of precision 5 with a raw estimate of 128 point 6 something and the two closest raw estimates libcount found in the table were like 128.3464 and 128.3462, and their respective biases are about 1 apart. The linear interpolation said that it's bias should be in the thousands since the straight line connecting those two closest points is so steep. As a result, I got back a negative cardinality estimation, which is definitely not right. The spec in the paper just says "interpolate" without specifying how, so I would instead of just looking for the two nearest neighbors, look for the interval which contains the raw estimate and find where the estimate lies on that line. That seems like another reasonable interpolation and avoids the bias estimate for a given value to be wildly larger than that of its nearest neighbors.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@dialtr
Copy link
Owner

dialtr commented Mar 6, 2017

Hey Andrew, I finally got around to a fix for this, would love to get your review on it. The branch is:

tdial-interpolation-fix

I am working on certification tests next.

@dialtr
Copy link
Owner

dialtr commented Mar 6, 2017

Actually, hold off. There's an issue with the data in the tables.

@astolman
Copy link
Author

astolman commented Mar 6, 2017 via email

@astolman
Copy link
Author

astolman commented Mar 7, 2017 via email

@dialtr
Copy link
Owner

dialtr commented Mar 7, 2017

Linear interpolation fix completed. Note that the empirical data is now sorted properly. Also added a simple certification test. Just run:

make certify

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants