Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Help needed: 3 different performance timings with same code #9

Open
mfiano opened this issue Aug 8, 2021 · 2 comments
Open

Help needed: 3 different performance timings with same code #9

mfiano opened this issue Aug 8, 2021 · 2 comments

Comments

@mfiano
Copy link

mfiano commented Aug 8, 2021

I am experiencing 3 different performance results consistently with the same code, depending on how I load it. Here is the test code:

(in-package #:cl-user)

(defpackage #:origin2
  (:use #:static-dispatch-cl))

(in-package #:origin2)

(defstruct (aggregate
            (:predicate nil)
            (:copier nil))
  (storage (make-array 0 :element-type 'single-float :initial-element 0.0) :type (simple-array single-float (*))))

(defgeneric ref (sequence index)
  (:method ((sequence aggregate) index)
    (elt (aggregate-storage sequence) index)))

(defun test ()
  (declare (optimize (speed 3)))
  (let ((a (make-aggregate :storage (make-array 3 :element-type 'single-float :initial-contents '(1.0 2.0 3.0)))))
    (declare (type aggregate a))
    (dotimes (i (expt 10 9))
      (ref a 2))))

Clearing FASL's, loading this code into my image on SBCL 2.1.7, and issuing:

(progn
  (sb-ext:gc :full t)
  (time (test)))

I get these results consistently:

Evaluation took:
  7.146 seconds of real time
  7.130177 seconds of total run time (7.130177 user, 0.000000 system)
  99.78% CPU
  20 lambdas converted
  14,232,702,075 processor cycles
  1,080,336 bytes consed

If I now re-compile the #'test function in the same image, and re-run the test I consistently get:

Evaluation took:
  0.350 seconds of real time
  0.351234 seconds of total run time (0.351196 user, 0.000038 system)
  100.29% CPU
  701,597,093 processor cycles
  0 bytes consed

If I wrap the defgeneric in (eval-when (:compile-toplevel :load-toplevel :execute) ...), clear FASL's, loading the code, and re-run the test, I consistently get:

Evaluation took:
  0.583 seconds of real time
  0.579642 seconds of total run time (0.579642 user, 0.000000 system)
  99.49% CPU
  1,157,429,524 processor cycles
  0 bytes consed

I get no warnings from static-dispatch in all 3 cases, and everything compiles cleanly. I would like to know why these differences are occurring, and how I would get the results of the second test above from a clean image (instead of manually recompiling the function after loading). Thank you for your help.

@scymtym
Copy link

scymtym commented Aug 8, 2021

I think the slowness in the first case can be explained by noting the time at which the effects of the defgeneric expansion happen: the ref call in test can be optimized only during compilation of test so only compile-time effects of the defgeneric expansion have already happened when test is compiled (in the first case -- in the other cases, all effects of the defgeneric expansion have already happened when test is compiled).

The defgeneric expansion contains eval-when forms to address the issue, but those only work at the top-level. let and when make their sub-forms non-top-level. This effect can be seen by changing make-static-dispatch to emit

(eval-when (:compile-toplevel :load-toplevel :execute)
	   (print "should be at compile-time")
           (unless (sb-c::info :function :info ',name)
             (sb-c:defknown ,name * * nil :overwrite-fndb-silently t)))

Compiling and loading @mfiano 's example via (load (compile-file "/tmp/dispatch.lisp")) then results in

; compiling file "/tmp/dispatch.lisp" (written 08 AUG 2021 05:43:37 PM):
…
; processing (DEFSTRUCT (AGGREGATE # ...) ...)
; processing (DEFGENERIC REF ...)
; processing (DEFUN TEST ...)

; wrote /tmp/dispatch.fasl
; compilation finished in 0:00:00.036

"should be at compile-time" 
T

That is, the effect happens at load-time, not compile-time.

@alex-gutev
Copy link
Owner

alex-gutev commented Aug 8, 2021

To avoid these difference you should define the generic functions and their methods in a separate file from the code which uses them. As already mentioned by @scymtym, what's happening here is that since they are contained in the same file, the deftransform forms emitted by static-dispatch, which are responsible for the actual inlining, only take effect after the entire file is compiled, that is after the test function, which calls the generic function, is compiled. As a result no inlining takes place in test. On other implementations, where compiler-macros are used, you will get a warning however on SBCL, where deftransform is used there is no warning since the transform isn't even registered yet, and hence is never called.

I don't know what's causing the difference between the second and third case. There shouldn't be any difference unless SBCL is performing more optimizations when the test function is recompiled. I will look into that further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants