Add automatic checks that results don't change a lot when re-processing #32

cdeil · 2014-11-15T14:51:29Z

We need some diff functionality that checks that results didn't change significantly with re-processing.

Without that this is a maintenance nightmare ... every time we re-run the benchmarks new issues can creep in (different machine, different numpy, changes in the coordinates packages).

We could do asserts on the summary statistics like min / mean / max differences ... not sure how to best set this up so that it doesn't become much work, but scales to more coordinate systems (especially horizon with observers at different locations).

The text was updated successfully, but these errors were encountered:

astrofrog · 2014-11-15T15:20:22Z

Maybe we can just add a summary that is equivalent to summary but just compares the results to the old ones instead of writing it out?

cdeil · 2014-11-15T15:27:04Z

Or only store results as ascii for a sensible number of digits, so that on 64 bit computers there will be no differences, but any changes above ~ 1 milliarcsec would show up?

cdeil · 2014-11-15T15:29:23Z

If we make this change in printing of values we should also change to radians.

Anyone that wants speed has to use radians as input / output, so would be better because then we can use the same functions for performance testing. And as far as I see changing from deg to radians has no drawbacks?

astrofrog · 2014-11-15T15:32:51Z

Actually the summary table already kind of does this in the sense that when you commit it, you see genuine changes in accuracy. Right now if I look at the diff for the summary.txt file (not committed) I can clearly see regressions and the other values have all stayed the same.

cdeil · 2014-11-15T15:38:45Z

Then we probably need to reduce the number of digits, because now that I re-ran this I get differences in the generated results ascii files for all packages:
https://gist.github.com/cdeil/45d2a1403b5cd09bc4d8

astrofrog · 2014-11-15T15:44:55Z

@cdeil - for the coordinate files yes, but I think the important one is the summary file. Do you get differences on every line there?

astrofrog · 2014-11-15T15:52:09Z

@cdeil - see #34, many of the differences are real (at least in the astropy files). But it is valid to ask if we can reduce the precision in these files. I guess we should decide on what we consider to be a sufficient resolution threshold and that will dictate the precision. What about 7 decimal places/<1mas?

cdeil · 2014-11-15T16:21:36Z

We should make sure we use float64 and then there should be 16 significant digits:
http://en.wikipedia.org/wiki/Floating_point#IEEE_754%3a_floating_point_in_modern_computers

So how about using radians to store results in ascii and using 10 digits (to have a huge safety margin).
This will mean we have `0.02 milli-arcsec in the results we store:

In [6]: Angle(1e-10, 'rad').to('milliarcsecond')
Out[6]: <Angle 0.020626480624709634 marcsec>

And we should store the version numbers of the packages and a date / username / machine-ID in the generated output.

cdeil · 2014-11-15T16:25:52Z

I just re-ran the benchmarks and get a diff like this, i.e. differences wrt your machine are in the ~15th digit and we should just store a few digits less in the generated text files to get rid of this annoyance.

$ git diff tools/astropy/fk4_to_fk5.txt
diff --git a/tools/astropy/fk4_to_fk5.txt b/tools/astropy/fk4_to_fk5.txt
index ac52e94..16af90f 100644
--- a/tools/astropy/fk4_to_fk5.txt
+++ b/tools/astropy/fk4_to_fk5.txt
@@ -7,18 +7,18 @@
  347.896167331701804  -26.594751364243319
  235.776223837825427    1.994050429678648
  270.208465512178236    7.919214807598861
- 235.444383081373815   64.489126735053333
+ 235.444383081373815   64.489126735053347
  269.849252250025302   -6.381889485270722
  346.581287089212708   63.276052845763466
    3.662930167152712    8.648362365537350
   39.225938675446685   56.806244896018313
  108.110520614048738  -13.502360598743845
  237.449850028100940  -65.121330926412611
- 291.903792021189304   46.242056251464987
+ 291.903792021189304   46.242056251464980
  314.510247855621571   29.828476026107310
  347.907503891610986    6.216260647308128
  261.101250385486651   13.536903593041876
- 231.732568628918983   42.369241964526651
+ 231.732568628918983   42.369241964526658
  258.561399424691558   53.039607077732036
  168.970316782883827   -6.379417524632820
  117.526548668390063  -52.764257819077997
@@ -41,7 +41,7 @@
  294.754811206351690   62.856602201779424
  180.718565186907369   49.697910973274276
  292.535301795154965  -41.213092295517889
-  35.250985869096795   21.277989437311117
+  35.250985869096795   21.277989437311113
   78.742465707000150  -69.205893703245678
   93.977653064137399   35.497490314874241
  169.201475305137222   38.344427127929706

astrofrog · 2014-11-15T16:38:03Z

@cdeil - done in 6db4f02. I reduced the precision to 1.e-10 which is just under 1µas, which I think will be fine for our purposes!

astrofrog · 2014-11-15T16:38:17Z

Do you get any differences now?

astrofrog · 2014-11-15T16:39:21Z

The values are still stored in degrees by the way. I don't think that really matters vs radians since this is just the storage format, irrespective of what we use for the calculations?

astrofrog · 2014-11-15T16:42:02Z

With these changes, git diff may be enough to show us the changes. Let's leave this open for now to see if that's true.

cdeil · 2014-11-15T16:52:29Z

I have this one case where there's a difference in the 12th significant digit:

$ git diff tools/astropy/fk4_to_fk5.txt 
diff --git a/tools/astropy/fk4_to_fk5.txt b/tools/astropy/fk4_to_fk5.txt
index 25b421d..bb37794 100644
--- a/tools/astropy/fk4_to_fk5.txt
+++ b/tools/astropy/fk4_to_fk5.txt
@@ -272,7 +272,7 @@
   68.8512663356   -4.0404754576
    7.9637754065   67.0293856073
  253.8555162431    8.9369845909
- 125.5572124175  -23.9708421442
+ 125.5572124175  -23.9708421443
  151.6794083542    7.3038965433
  277.0596075441   18.6503099917
  347.2618608538   13.5452761667

Really this shouldn't happen with 64-bit calculations.

Should we investigate why the Astropy calculation gives ~ 1e4 machine EPS differences on your machine and mine?

Or just reduce the number of digits by one?

cdeil · 2014-11-15T16:54:12Z

And for astrolib all lines are different on my machine ... does it use 32 bit?

    modified:   tools/astrolib/ecliptic_to_fk4.txt
    modified:   tools/astrolib/ecliptic_to_fk5.txt
    modified:   tools/astrolib/ecliptic_to_galactic.txt
    modified:   tools/astrolib/fk4_to_ecliptic.txt
    modified:   tools/astrolib/fk5_to_ecliptic.txt
    modified:   tools/astrolib/galactic_to_ecliptic.txt

astrofrog · 2014-11-15T16:58:43Z

@cdeil - 64-bit calculations don't always preserve all significant digits especially with trigonometric functions, etc. so maybe this is not surprising. If it's just one line every so often, maybe it's not a big problem? For astrolib it does seem an issue though.

cdeil · 2014-11-15T17:00:28Z

Sure ... there should be differences at the EPS or 10 x EPS level, but not 1000 x EPS for simple trigonometry and matrix algebra.

cdeil · 2014-11-15T17:01:26Z

Let's try to debug the astrolib issue ... which version are you using? is there a way to check if it uses 32 bits internally?

astrofrog · 2014-11-15T17:05:57Z

I'm using the latest astrolib.coords from PyPI. Just to check, does this run for you?

>>> import astrolib.coords
>>> astrolib.coords._test()
.................S.........................S...S.........
----------------------------------------------------------------------
Ran 57 tests in 0.008s

OK (SKIP=3)

astrofrog · 2014-11-15T17:06:53Z

There is an .so file and it's 64-bit:

$ file _pytpm.so 
_pytpm.so: Mach-O 64-bit bundle x86_64

But I'm not sure if that means much. Are your values always offset in same direction, or random?

astrofrog · 2014-11-15T17:07:15Z

I have to run for now so won't have time to look into astrolib issue further, but let me know if there are any tests you want me to do here!

cdeil · 2014-11-15T17:32:23Z

Are you using this astrolib?
https://pypi.python.org/pypi/astrolib.coords/0.39.5

My astrolib was also compiled as 64 bit and the tests all pass:

$ python -c 'import astrolib.coords; astrolib.coords._test()'
.................S.........................S...S.........
----------------------------------------------------------------------
Ran 57 tests in 0.009s

OK (SKIP=3)
$ file ./env/lib/python2.7/site-packages/astrolib/coords/_pytpm.so
./env/lib/python2.7/site-packages/astrolib/coords/_pytpm.so: Mach-O 64-bit bundle x86_64

This is the diff I get with respect to your version is here:
https://gist.github.com/cdeil/7bf8071879a2566866af

Manually looking at output/summary_matrix.html and blinking with yours I see that only ecliptic results are different and that the differences are wrt. all other transforms and at the ~ 1 milli-arcsec level.

Maybe we should just list this as a "known issue" in the README for now ... I don't know a quick way to debug / fix this issue and it's not really important ... we mainly care for astropy.coordinates, right?

astrofrog · 2014-11-15T17:36:10Z

@cdeil - I'm using that too. I agree we don't really care in this case, at least not until we start implementing the ecliptic transformation.

cdeil · 2014-11-15T17:37:38Z

Maybe relevant: #11 (comment)

cdeil · 2015-01-22T08:02:45Z

I'm closing this old issue.

The conclusion is that we simply store results with a well-chosen number of digits in ascii tables and then use git diff to notice when something changes.

All that's left to be done in my opinion is #45

cdeil changed the title ~~Add diff summary when re-processing results~~ Add automatic checks that results don't change a lot when re-processing Nov 15, 2014

cdeil mentioned this issue Jan 18, 2015

astrolib celestial results are "random" #44

Closed

cdeil closed this as completed Jan 22, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add automatic checks that results don't change a lot when re-processing #32

Add automatic checks that results don't change a lot when re-processing #32

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Jan 22, 2015

Add automatic checks that results don't change a lot when re-processing #32

Add automatic checks that results don't change a lot when re-processing #32

Comments

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

astrofrog commented Nov 15, 2014

cdeil commented Nov 15, 2014

cdeil commented Jan 22, 2015