Rounding error in sample_work.py #524

Open
siwhitehouse opened this issue Oct 19, 2019 · 0 comments
@siwhitehouse
Collaborator

The logic in https://github.com/pwyf/aid-transparency-tracker/blob/original-version/iatidq/sample_work/sample_work.py can cause a rounding error which leads to n-1 files being sampled, where n is the total number of activity files an organisation publishes and n<20.

The code in question is repeated below:


        total = int(sum([x.results_data / 100. * x.results_num
                         for x in ag_results]))
        if total <= num_samples:
            indexes = range(total)
        else:
            indexes = sorted(random.sample(range(total), num_samples))

The above code attempts to work out how many files to sample from the aggregate results table in the database. For each row it takes the fraction of results that pass the test (x.results_data / 100.) and multiplies it by the total number of results (x.results_num). It then sums these products over all rows in the table for that test and organisation, and finally casts the sum to an integer.
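To make the counting step concrete, here is a minimal Python 3 sketch of the same sum. AggResult is a hypothetical stand-in for a row of the aggregate results table that models only the two attributes the snippet uses, and passing_total is an illustrative name; it returns the float sum before the int() cast is applied.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of the aggregate results table;
# only the two attributes used by sample_work.py are modelled.
AggResult = namedtuple("AggResult", ["results_data", "results_num"])


def passing_total(ag_results):
    """Sum, over all rows, the pass percentage (as a fraction) times the
    number of results. Returns the float sum *before* any int() cast."""
    return sum(x.results_data / 100. * x.results_num for x in ag_results)


# 50% of 4 results plus 100% of 2 results -> 2.0 + 2.0
rows = [AggResult(50.0, 4), AggResult(100.0, 2)]
print(passing_total(rows))  # 4.0
```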

It's the last step that causes the problem. In Python 2.7, int(x) truncates toward zero, discarding everything after the decimal point. When x.results_data / 100. * x.results_num does not come out as a whole number (for values of x.results_num such as 3, 7, 11, 13, etc.), a sum that falls just short of an integer is truncated down by one, which is where the missing file comes from.
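A minimal illustration of the truncation, written for Python 3. The issue doesn't say how results_data is stored, so this assumes, purely for illustration, that 6 of 7 passing results were stored as a percentage rounded to two decimal places (85.71). The point is only that a product sitting just below a whole number loses a whole unit under int().

```python
# Hypothetical stored value: 6 of 7 results pass (85.714...%),
# assumed here to have been stored rounded to two decimal places.
results_data = 85.71
results_num = 7

passing = results_data / 100. * results_num  # roughly 5.9997, not 6.0

print(int(passing))    # 5: int() truncates toward zero
print(round(passing))  # 6: round() goes to the nearest integer
```

Under Python 2.7 the int() line behaves the same way, while round() would print 6.0 rather than 6.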

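This can be 'fixed' by changing 'int' to 'round' in the code: round() goes to the nearest integer, so a sum that falls just short of a whole number (for example 5.9997) is rounded up rather than truncated. Note that in Python 2.7 round() returns a float, so the result still needs wrapping as int(round(...)) before it is passed to range().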
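A sketch of the suggested change, written for Python 3 and again using a hypothetical AggResult row type. sample_indexes is an illustrative name, not a function in the repository; its body mirrors the snippet above with round() in place of int(), and the optional rng parameter is an addition for reproducible testing.

```python
import random
from collections import namedtuple

# Hypothetical stand-in for a row of the aggregate results table.
AggResult = namedtuple("AggResult", ["results_data", "results_num"])


def sample_indexes(ag_results, num_samples, rng=random):
    """Pick which passing results to sample, rounding instead of truncating.

    Python 3 sketch: round() with one argument returns an int here.
    Under Python 2.7 this would need int(round(...)), because round()
    returns a float there and range() expects an integer.
    """
    total = round(sum(x.results_data / 100. * x.results_num
                      for x in ag_results))
    if total <= num_samples:
        # Fewer passing results than samples wanted: take them all.
        return list(range(total))
    # Otherwise draw num_samples distinct indexes, returned in order.
    return sorted(rng.sample(range(total), num_samples))


rows = [AggResult(85.71, 7)]
print(sample_indexes(rows, 20))  # [0, 1, 2, 3, 4, 5]
```

In Python 3, round() breaks exact .5 ties toward the even integer, which doesn't matter for sums sitting just below a whole number.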
