Quantile-based discretisation function #1197

Open
JosephBond opened this issue Dec 12, 2024 · 2 comments
Comments

JosephBond (Collaborator) commented Dec 12, 2024

Add a Fluid library function that bins a set of “observations” into a set of (potentially non-uniform) bins.

As a test case, use the new function to compute the 5-95% interpercentile range of the sample generated by the preprocessing step in explorable-viz/transparent-text#22. This will form part of the pipeline for the examples: it provides the data source for the whisker plots on the bar chart bars, and for computing the appropriate probabilities to use in text like “very likely”.
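
For reference, the target quantity can be sketched in Python using only the standard library. This is an illustration of what “5-95% interpercentile range” means here, not the Fluid implementation; the helper name `interpercentile_range`, the choice of the "inclusive" interpolation method, and the sample data are all assumptions.

```python
from statistics import quantiles

def interpercentile_range(sample, lower_pct=5, upper_pct=95):
    """Hypothetical helper: (lower, upper) percentile values of `sample`.

    Percentiles must be whole numbers between 1 and 99.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    # method="inclusive" treats the sample minimum and maximum as the
    # 0th and 100th percentiles (an assumption, not taken from the issue).
    cuts = quantiles(sample, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

# Illustrative sample only: the integers 0..100.
sample = list(range(101))
p5, p95 = interpercentile_range(sample)
print(p5, p95)
```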

Python libraries to consider:

  • pandas.cut (supports custom bins)
  • pandas.qcut (quantile-based bins)
  • numpy.digitize (similar but doesn’t require bin labels)
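
As a rough illustration of binning against explicit, non-uniform edges, here is a minimal pure-Python sketch in the style of numpy.digitize with its default `right=False` (bin `i` covers `edges[i-1] <= x < edges[i]`). The helper name `digitize`, the edges and the observations are illustrative assumptions, not part of the proposed Fluid API.

```python
from bisect import bisect_right
from collections import Counter

def digitize(xs, edges):
    """Hypothetical helper: index i with edges[i-1] <= x < edges[i].

    Index 0 means x is below the first edge; len(edges) means x is at or
    above the last edge. `edges` must be sorted in increasing order.
    """
    return [bisect_right(edges, x) for x in xs]

# Illustrative data only: non-uniform bin edges and a few observations.
edges = [0, 10, 50, 100]
observations = [-5, 0, 3, 10, 49.9, 50, 100, 120]

indices = digitize(observations, edges)
counts = Counter(indices)  # number of observations per bin index
print(indices)
```

Returning indices rather than labelled intervals keeps the sketch close to numpy.digitize; a pandas.cut-style function would additionally attach a label to each bin.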

Going forward, let’s use Python-inspired names for library functions, to leverage #1139.

See also:

  • explorable-viz/transparent-text#26

JosephBond added this to the transparent-text 0.1 milestone Dec 12, 2024
JosephBond self-assigned this Dec 12, 2024
JosephBond moved this to In Progress in Fluid Dec 12, 2024
JosephBond added this to Fluid Dec 12, 2024
rolyp moved this from In Progress to Planned in Fluid Dec 16, 2024

rolyp (Collaborator) commented Dec 16, 2024

@JosephBond Added some clarification to this task and renamed from “Calculate Interpercentile Range from empirical distribution”.

rolyp changed the title from “Calculate Interpercentile Range from empirical distribution” to “Bin data set using custom bin sizes” Dec 16, 2024

rolyp (Collaborator) commented Dec 16, 2024

@JosephBond It looks like qcut does take an argument that allows you to specify the target quantiles, so we could take a similar approach, e.g. something like pandas.qcut(xs, q=[0, 0.05, 0.95, 1.0]) but without the named argument syntax. Renamed task again.
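
To make the suggestion concrete, here is a minimal pure-Python sketch of a qcut-style function that takes the target quantiles as a plain positional list. It is not the Fluid implementation: the names `quantile` and `qcut`, the linear interpolation between order statistics, and the right-closed bins with the minimum included in the first bin are all assumptions, chosen to approximate pandas.qcut.

```python
from bisect import bisect_left

def quantile(sorted_xs, q):
    """Hypothetical helper: linearly interpolated q-quantile (0 <= q <= 1)."""
    pos = q * (len(sorted_xs) - 1)          # fractional index into the data
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac

def qcut(xs, qs):
    """Hypothetical helper: return (edges, bin index for each x).

    `qs` is an increasing list of quantiles starting at 0 and ending at 1.
    Bins are right-closed, and the first bin also includes the minimum.
    """
    s = sorted(xs)
    edges = [quantile(s, q) for q in qs]
    # Searching only the interior edges sends every value at or below the
    # first interior edge to bin 0 and everything above the last to the
    # final bin; bisect_left makes each bin closed on the right.
    interior = edges[1:-1]
    labels = [bisect_left(interior, x) for x in xs]
    return edges, labels

# Illustrative sample only: the integers 1..100.
xs = list(range(1, 101))
edges, labels = qcut(xs, [0, 0.05, 0.95, 1.0])
bin_sizes = [labels.count(i) for i in range(3)]
print(edges, bin_sizes)
```

With these quantiles the middle bin holds the central 90% of the sample, and the two outer bins hold the tails.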

rolyp changed the title from “Bin data set using custom bin sizes” to “Quantile-based discretisation function with custom quantiles” Dec 16, 2024
rolyp changed the title from “Quantile-based discretisation function with custom quantiles” to “Quantile-based discretisation function” Dec 16, 2024
rolyp moved this from Planned to In Progress in Fluid Dec 19, 2024
Projects
Status: In Progress
Development

No branches or pull requests

2 participants