    total = int(sum([x.results_data / 100. * x.results_num
                     for x in ag_results]))
    if total <= num_samples:
        indexes = range(total)
    else:
        indexes = sorted(random.sample(range(total), num_samples))
The code above attempts to work out how many files to sample from the aggregate results table in the database. For each row it takes the fraction of results that pass the test (x.results_data / 100.) and multiplies it by the total number of results (x.results_num). It then sums these products over all of the rows in the table for that test and organisation, and finally casts the sum to an integer.
The last step is what causes the problem. int(x) in Python 2.7 discards everything after the decimal point, so when x.results_data / 100. * x.results_num does not come out as an exact whole number (for values of x.results_num such as 3, 7, 11, 13 etc.), the cast truncates the total downwards and the count ends up one short.
This can be 'fixed' by changing 'int' to 'round' in the code: round goes to the nearest whole number, so it rounds up in these cases. Note that round returns a float in Python 2.7, so the result should still be wrapped in int() before it is passed to range.
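The difference between the two casts can be sketched as follows. This is a minimal illustration, not the project's code: `Row` is a hypothetical stand-in for a row of the aggregate results table, the two helper function names are made up for this example, and the sample values were chosen only because they expose the floating-point shortfall.

```python
from collections import namedtuple

# Hypothetical stand-in for one row of the aggregate results table;
# only the two attributes used by the original snippet are modelled.
Row = namedtuple("Row", ["results_data", "results_num"])


def passing_total_truncated(rows):
    """Mirror the original logic: sum the products, then truncate with int()."""
    return int(sum(r.results_data / 100. * r.results_num for r in rows))


def passing_total_rounded(rows):
    """Suggested fix: round to the nearest whole number before casting.

    int() is kept around round() because round() returns a float in
    Python 2.7, and range() needs an integer.
    """
    return int(round(sum(r.results_data / 100. * r.results_num for r in rows)))


# Assumed sample data: 57% of 100 results pass, so 57 is the intended total.
rows = [Row(results_data=57.0, results_num=100)]

print(passing_total_truncated(rows))
print(passing_total_rounded(rows))
```

With IEEE 754 doubles, 57.0 / 100. * 100 evaluates to 56.99999999999999, so the truncating version reports 56 while the rounding version reports 57. When the product is exact (for example 100% of 7 results), both versions agree.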
The logic in https://github.com/pwyf/aid-transparency-tracker/blob/original-version/iatidq/sample_work/sample_work.py can cause a rounding error that leads to n-1 files being sampled, where n is the total number of activity files an organisation publishes and n < 20.
The code in question is the snippet quoted at the top of this issue.