[vet] make a list of weaknesses in sybil #50

Open
okayzed opened this issue May 24, 2018 · 0 comments

@okayzed
Collaborator

okayzed commented May 24, 2018

there are some places that i think could use improvement; this task is to list them and then improve them.

accuracy:

  • combining histograms from different leaf nodes can lose accuracy, because the leaf nodes do not necessarily use histograms with the same bucket sizes.
  • the "auto" histogram size is based on the extent (min and max) of all data in a column. if you use "auto" and filter to a subset with a smaller range, the histogram will be inaccurate. to remediate, set the histogram bucket size manually or use a log histogram.
  • a large group by might have intermediate results pruned during aggregation. the pruning limit is 1000 internal rows per group of block specs (typically 4-8 blocks).

safety:

  • writing a new block of data involves loading all data from the unfinished block and then re-saving all of it (instead of appending). this is easier and safer with gob, but possibly slower.

memory:

  • a large group by with log histograms can blow up memory usage
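
a rough sketch of why this happens, assuming each group keeps its own log histogram with one bucket per power of two. `logHist`, `groupBy`, and `pruneTopN` are hypothetical names, not sybil's code, and ranking groups by sample count when pruning is an assumption (the issue only gives the 1000-row limit, not the pruning rule). memory grows with groups × non-empty buckets; capping the intermediate groups bounds it, at the cost of dropping groups, which is the accuracy issue listed above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// logHist counts values per power-of-two bucket: bucket k holds [2^k, 2^(k+1)).
type logHist map[int]int

func (h logHist) add(v float64) {
	if v < 1 {
		v = 1 // clamp small values into bucket 0 for this sketch
	}
	h[int(math.Floor(math.Log2(v)))]++
}

// groupBy builds one log histogram per group key.
func groupBy(keys []string, vals []float64) map[string]logHist {
	groups := make(map[string]logHist)
	for i, k := range keys {
		h, ok := groups[k]
		if !ok {
			h = make(logHist)
			groups[k] = h
		}
		h.add(vals[i])
	}
	return groups
}

// totalBuckets is a memory proxy: every (group, bucket) pair is a map entry.
func totalBuckets(groups map[string]logHist) int {
	n := 0
	for _, h := range groups {
		n += len(h)
	}
	return n
}

// pruneTopN keeps the n groups with the most samples (ties broken by key).
func pruneTopN(groups map[string]logHist, n int) map[string]logHist {
	type keyCount struct {
		key   string
		count int
	}
	var all []keyCount
	for k, h := range groups {
		c := 0
		for _, v := range h {
			c += v
		}
		all = append(all, keyCount{k, c})
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].count != all[j].count {
			return all[i].count > all[j].count
		}
		return all[i].key < all[j].key
	})
	kept := make(map[string]logHist)
	for i := 0; i < n && i < len(all); i++ {
		kept[all[i].key] = groups[all[i].key]
	}
	return kept
}

func main() {
	var keys []string
	var vals []float64
	// 5000 distinct keys, each with values spread across 10 power-of-two buckets
	for g := 0; g < 5000; g++ {
		for k := 0; k < 10; k++ {
			keys = append(keys, fmt.Sprintf("user-%d", g))
			vals = append(vals, math.Pow(2, float64(k)))
		}
	}
	groups := groupBy(keys, vals)
	fmt.Println(len(groups), totalBuckets(groups)) // 5000 50000

	pruned := pruneTopN(groups, 1000)
	fmt.Println(len(pruned), totalBuckets(pruned)) // 1000 10000
}
```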