This repository has been archived by the owner on Apr 2, 2023. It is now read-only.

Why is MLton so slow, again? #437

Closed
favonia opened this issue Nov 11, 2017 · 16 comments · Fixed by #470

Comments

@favonia
Contributor

favonia commented Nov 11, 2017

During the development of immortal lines (#431), somehow MLton became unhappy again. I wonder whether it is due to the use of the new polymorphic datatypes.

@MatthewFluet
Contributor

Do you have a particular commit that triggers the slow compile?

@favonia
Contributor Author

favonia commented Nov 11, 2017

@MatthewFluet It seems the compile time increases gradually across the commits 2e0e13b, 4d46372, 578d12a, fce66db, and 63c0e98 (in chronological order). I'm not sure exactly what the cause is, or whether there is a clear trend.

@favonia
Contributor Author

favonia commented Nov 11, 2017

@MatthewFluet I tried to pin down the change that triggered the slow compile, but did not find anything conclusive.

@favonia
Contributor Author

favonia commented Nov 13, 2017

@MatthewFluet Is there something I can do to pin down the cause? I suspect it's because of the introduction of

datatype ('a, 'b) binder = DIM of 'a | TERM of ('a * 'b)

@MatthewFluet
Contributor

If the slowdown is all in a single pass, then we can usually work out a reason. But if it is just generally slower throughout the compilation, or only within the native codegen, then it is a lot harder to pin down.

I wouldn't expect the introduction of a datatype to have a significant effect; perhaps you changed the datatype from a unary tycon to a binary tycon and are worried that this has caused blowup in monomorphisation? But that would probably manifest as slowdown throughout the compilation, since monomorphisation happens very early.

What I saw previously, mentioned in MLton/mlton#196, was some huge case dispatches, which led to massive numbers of basic blocks in some functions; that seemed to be bad for both the native codegen and the C codegen.
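As a rough illustration (hypothetical code, not the actual RedPRL sources) of the kind of dispatch that caused those basic-block blowups:

datatype token = A | B | C | D  (* imagine hundreds of constructors *)

fun describe t =
  case t of
    A => "a"
  | B => "b"
  | C => "c"
  | D => "d"

(* With hundreds of constructors, each arm becomes its own basic block,
   and a handful of functions like this can swamp the codegen. *)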

@favonia
Contributor Author

favonia commented Nov 13, 2017

@MatthewFluet Thanks. Yes, I was worried about blowup in monomorphisation. Essentially, the types were changed from `something option list * abt` to `(something option list, abt) binder` and from `something option * abt` to `(something option, abt) binder` to account for more features.
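A minimal sketch of the change described above, with placeholder definitions for `something` and `abt` (the real project types differ):

type something = string
type abt = unit

datatype ('a, 'b) binder = DIM of 'a | TERM of ('a * 'b)

(* before: plain tuples *)
type old_multi  = something option list * abt
type old_single = something option * abt

(* after: the binder datatype, to account for more features *)
type new_multi  = (something option list, abt) binder
type new_single = (something option, abt) binder

Monomorphisation makes one copy of binder per distinct instantiation, so two instantiations like these become two monomorphic datatypes.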

@jonsterling
Contributor

For what it's worth, whatever configuration is triggered when we do the profiling build seems to go much faster than the ordinary build (try ./script/profile.sh). At least on my machine...

@favonia
Contributor Author

favonia commented Nov 20, 2017

Is it possible to skip almost all of the optimizations (on Travis)? I can't figure out how to do that from mlton's help message...

@MatthewFluet
Contributor

You can use -drop-pass <re> to skip optimization passes matching a regular expression. However, this doesn't always reduce compile times: skipping dead-code elimination and other simplifications can make the required passes that remain (e.g., codegen) run slower. Similarly, insufficient inlining means that every + operation becomes a full function call/return, and the resulting programs are unusably slow.

It would be a nice project to develop a good "-O0" collection of passes that strikes a good tradeoff between compile time and run time.
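For example, a hypothetical invocation that drops a single optional pass by name (the file name and the choice of pass here are only illustrations):

$ mlton -drop-pass 'polyvariance' foo.sml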

@MatthewFluet
Contributor

That's curious about profiling leading to faster builds --- usually the addition of profiling code into the program inhibits optimizations and slows things down.

@jonsterling
Contributor

> That's curious about profiling leading to faster builds --- usually the addition of profiling code into the program inhibits optimizations and slows things down.

It's very strange indeed! The build goes about 15x faster with profiling turned on. The built code itself is a bit slower of course...

@favonia
Contributor Author

favonia commented Nov 20, 2017

So... the conclusion right now is to enable profiling on travis to speed up the compilation?

@jonsterling
Contributor

BTW @MatthewFluet I tried running MLton with -drop-pass '.*' and I get a weird error in closure conversion:

         closureConvert starting
            flow analysis starting
            flow analysis raised in 0.00 + 0.00 (nan% GC)
            flow analysis raised: Fail
         closureConvert raised in 0.00 + 0.00 (nan% GC)
         closureConvert raised: Fail
      pre codegen raised in 6.78 + 3.17 (32% GC)
      pre codegen raised: Fail
   Compile SML raised in 6.78 + 3.17 (32% GC)
   Compile SML raised: Fail
MLton raised in 6.79 + 3.17 (32% GC)
MLton raised: Fail
ClosureConvert.loopDec: strange dec

Any idea what might cause that? Or maybe there is a better, more finely-tuned collection of passes I should drop.

@MatthewFluet
Contributor

-drop-pass '.*' won't quite work, because some non-optional passes are still under the control of the wrappers that drop passes. In this particular case, you've dropped the implementExceptions SXML pass, so there are unexpected exception <con> [of <ty>] declarations left in the program, which closureConvert is not prepared to deal with.
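For instance, the declarations meant by exception <con> [of <ty>] are ordinary SML exception declarations (the names below are hypothetical):

exception NotFound
exception Parse of string * int

implementExceptions normally compiles these away before closure conversion ever sees them.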

@jonsterling
Contributor

@MatthewFluet Ahh, that explains it 😄

@MatthewFluet
Contributor

Attached are the -drop-pass <name> options that disable all optional passes: drop.txt

And, here is the comparison on a simple hello-world program:

[matthew@shadow tmp]$ cat z.sml 
val () = print "Hello, world!\n"
[matthew@shadow tmp]$ mlton
MLton 20130715 (built Sat Feb 11 06:53:44 UTC 2017 on buildvm-29.phx2.fedoraproject.org)
[matthew@shadow tmp]$ /usr/bin/time mlton z.sml 
1.69user 0.36system 0:02.06elapsed 99%CPU (0avgtext+0avgdata 678168maxresident)k
0inputs+424outputs (0major+421459minor)pagefaults 0swaps
[matthew@shadow tmp]$ /usr/bin/time ./z 
Hello, world!
0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1896maxresident)k
0inputs+0outputs (0major+81minor)pagefaults 0swaps
[matthew@shadow tmp]$ /usr/bin/time mlton $(cat drop.txt) z.sml
9.84user 0.89system 0:10.77elapsed 99%CPU (0avgtext+0avgdata 1072052maxresident)k
0inputs+5584outputs (0major+963508minor)pagefaults 0swaps
[matthew@shadow tmp]$ /usr/bin/time ./z
^C840.92user 205.97system 47:32.22elapsed 36%CPU (0avgtext+0avgdata 31646504maxresident)k
393892352inputs+0outputs (11572217major+97157205minor)pagefaults 0swaps

Without any optimizations, the program takes 5X longer to compile, runs for more than 45 min (yes, 45 min!), and uses a 35G (yes, 35G!) heap, at which point we are GC thrashing and I killed it.

MatthewFluet added a commit to MatthewFluet/sml-redprl that referenced this issue Nov 21, 2017
Closes RedPRL#437

MLton's `polyvariance` optimization duplicates small higher-order
functions.

One (unresolved, though usually not significant) issue is that
polyvariance can duplicate code local to a function, even if it
doesn't depend on the higher-orderness (see
http://www.mlton.org/pipermail/mlton-devel/2002-January/021211.html).
This seems consistent with the discussion at MLton/mlton#196.
jonsterling pushed a commit that referenced this issue Nov 21, 2017
Closes #437

MLton's `polyvariance` optimization duplicates small higher-order
functions.

One (unresolved, though usually not significant) issue is that
polyvariance can duplicate code local to a function, even if it
doesn't depend on the higher-orderness (see
http://www.mlton.org/pipermail/mlton-devel/2002-January/021211.html).
This seems consistent with the discussion at MLton/mlton#196.
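A rough sketch (hypothetical code, not the actual RedPRL sources) of the kind of duplication polyvariance performs:

(* a small higher-order function ... *)
fun twice f x = f (f x)

(* ... applied to different function arguments at different call sites *)
val a = twice (fn n => n + 1) 0
val b = twice (fn s => s ^ "!") "hi"

(* Polyvariance may give each use its own copy of `twice`, so that each
   copy can be specialized to its particular argument. The unresolved issue
   mentioned in the commit message is that code local to `twice` is
   duplicated along with it, even when it does not depend on `f`. *)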
@favonia mentioned this issue Jan 13, 2018