Review binning logic for Grade Distribution to handle edge cases not caught by current logic #1419

Open
jennlove-um opened this issue Sep 1, 2022 · 0 comments

Describe your problem or feature you'd like added

Grade Distribution binning logic should be revisited to handle an edge case distribution with specific properties:

  1. The difference between the 5th lowest grade and each of the grades above it is less than 2
  2. The difference between each of the 4 lowest grades and the 5th lowest grade is more than 2

The current binning logic looks at the 5th lowest grade and the grades above to determine whether to bin all the lowest grades together. However, there may be cases where the lowest 4 grades should be binned for student privacy, but the 5th lowest fits with the higher grades in the histogram bins.

Example data:
(96.0, 96.0, 96.0, 96.0, 98.0, 98.0, 98.5, 98.5, 99, 99, 99.99)

The logic takes the 5th lowest grade (98) and checks whether it falls in the same bin as each subsequent grade by testing for a difference greater than 2. In this case, all of the remaining grades (98 through 99.99) fall in the same bin, so the logic decides to bin everything together for privacy. When the logic decides to bin all grades, we don't use the dotted line or group the lowest 4 grades together. Instead, the distribution looks like the one below:

image

This becomes more of a problem if the lowest 4 grades are significantly lower than the 5th and above.
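
The check described above can be sketched roughly as follows. This is only an illustration of the two properties, not the project's actual binning code: the function name `summarize_low_end`, its parameters, and the defaults (a threshold of 2 and 4 protected grades) are assumptions drawn from the description in this issue.

```python
def summarize_low_end(grades, threshold=2.0, protected=4):
    """Hypothetical helper: return (upper_in_one_bin, gap_below_fifth).

    upper_in_one_bin: True if no grade from the 5th lowest upward differs
                      from the 5th lowest by more than `threshold`.
    gap_below_fifth:  distance from the highest of the `protected` lowest
                      grades up to the 5th lowest grade.
    """
    ordered = sorted(grades)
    if len(ordered) <= protected:
        raise ValueError("need more grades than the protected count")

    fifth = ordered[protected]  # 5th lowest grade when protected == 4

    # Property 1: the 5th lowest and everything above it share one bin.
    upper_in_one_bin = all(g - fifth <= threshold for g in ordered[protected:])

    # Property 2 is judged from this gap: the lowest grades sit apart
    # from the rest when the gap exceeds the threshold.
    gap_below_fifth = fifth - ordered[protected - 1]

    return upper_in_one_bin, gap_below_fifth


# Example data from this issue.
example = [96.0, 96.0, 96.0, 96.0, 98.0, 98.0, 98.5, 98.5, 99, 99, 99.99]
print(summarize_low_end(example))  # (True, 2.0)
```

For the example data, the upper grades all land in one bin and the gap below the 5th lowest grade is exactly 2.0, so whether the lowest 4 count as separated depends on whether the comparison is strict. A larger gap would make the edge case more pronounced.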

Describe the solution you'd like

We need a way to protect the privacy of lower-performing students when this edge case occurs. This may require a significant rewrite of the privacy binning logic to use a statistical analysis formula rather than the approach we are currently using.
