Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent build steps #26

Open
jimafisk opened this issue May 15, 2020 · 13 comments
Open

Concurrent build steps #26

jimafisk opened this issue May 15, 2020 · 13 comments
Labels
enhancement New feature or request

Comments

@jimafisk
Copy link
Member

Build steps:

  • Eject temporary files
  • Direct copy .js files to build dir
  • Collect data sources from /content
  • Collect svelte templates from /layout
  • Execute the build.js script with NodeJS
  • Run Gopack to convert files to support ESM
  • Delete core files that weren't ejected by the user

These should be broken into goroutines so they run concurrently. If one step relies on a return value from another, we can use channels to block that thread until that data is available. This should speed up build times.

@jimafisk jimafisk added the enhancement New feature or request label Jul 12, 2020
@jimafisk jimafisk mentioned this issue Jan 28, 2021
@padraicbc
Copy link
Contributor

@jimafisk I have been benchmarking Build and unsurprisingly there is one standout bottleneck. runtime.cgocall from the v8go calls into C.RunScript etc... Some pprof output:

 flat  flat%   sum%        cum   cum%
 2.04s 48.92% 48.92%      2.04s 48.92%  [plenorig]
 1.37s 32.85% 81.77%      1.38s 33.09%  runtime.cgocall
 0.23s  5.52% 87.29%      0.27s  6.47%  regexp.(*machine).add
 0.17s  4.08% 91.37%      0.34s  8.15%  regexp.(*machine).step
 0.06s  1.44% 92.81%      0.06s  1.44%  [libc-2.32.so]
 0.06s  1.44% 94.24%      0.51s 12.23%  regexp.(*machine).match
 0.05s  1.20% 95.44%      0.05s  1.20%  runtime.memmove
 0.04s  0.96% 96.40%      0.04s  0.96%  runtime.usleep
 0.03s  0.72% 97.12%      0.03s  0.72%  [libstdc++.so.6.0.28]
 0.03s  0.72% 97.84%      0.03s  0.72%  syscall.Syscall
 0.01s  0.24% 96.84%      0.03s  0.73%  runtime.mallocgc
     0     0% 96.84%      2.35s 57.04%  main.main
  • If you do a list plenti/cmd/build.compileSvelte in pprof the most expensve call is ctx.RunScript("var { js, css } = svelte.compile(`"+componentStr+"`, {css: false, hydratable: true});", "compile_svelte") .

  • There are some small general speed ups:

    • Using go routines but there needs to be a lot of care making sure dependant steps finish in the correct order plus strange things can happen with CGo and go routines.
    • Calling regexp.MustCompile once as opposed to having them nested in functions repeatedly compiling each function call. compileSvelte has a good few examples.
    • Doing as much as possible in a single call to Runscript , maybe concatenating css.code and js.code using a unique delimiter then splitting in go.

I think by far the best long term approach is incremental builds #130 i.e only building exactly what is absolutely necessary.
Also seperating out concerns a bit maybe would speed things up and make local dev a nicer experience.
Currently Build behaves pretty much the same across the board whether local dev with plenti serve or plenti build to generate production files.
If parts can be omitted for local vs prod etc.. then the logic/code for those should maybe be separated out and opt require opt in using flags/config.
Lastly, anywhere a CGo call can be be avoided or at least not repeated it should be!

@jimafisk
Copy link
Member Author

Yeah compiling Svelte components is definitely the slowest part, thanks for benchmarking to verify. I used to do this concurrently (8c98790) back when I was executing node scripts. I tried it again later (c2b7b2a) but had some issues so I removed it (8a5693b). I think the performance enhancements you listed are spot on, I'll try to spend some time with that.

Incremental builds would be awesome, do you have a sense for how you would approach that? That would make the local dev experience a lot nicer, and I'm all in for anything that can speed up builds. Getting the full build faster will also be important so the feedback loop when running in CI is as quick as possible, but that's more of a prod issue like you said.

Good point on not repeating cgo calls. I was mainly focusing on just getting it working initially, but as you pointed out there are places where things are being run multiple times that can be optimized. Ultimately I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.

@padraicbc
Copy link
Contributor

padraicbc commented Jan 31, 2021

For incremental builds the simplest might be to store the last modified time or a hash of the content and rebuild anything that violated whichever/maybe both of those. Hashing is reasonably expensive but it could be done concurrently and due to the high cost of compiling it should still be worth it. Generally in production if you are generating static files you won't be changing a large percentage each build so it should be reasonably efficient? The trickiest part is keeping tabs on relationships, what needs to be rebuilt on content changing etc.

For local dev we get the actual file names in Watch on change so once we have the name we should be able to just compile a very small subset of files. Again I still need to know what subset of files need rebuilding.

For anything to work we would need to break the build process out into smaller parts. I have started moving logic into separate functions here. There are a few optimizations also like compiling regex once and using sync.Once here to speed things up a little for local dev by creating some reused logic just once.

I am open to any ideas. You have a far better understanding of how everything works.

@s-kris
Copy link

s-kris commented Apr 30, 2021

Yeah compiling Svelte components is definitely the slowest part, thanks for benchmarking to verify. I used to do this concurrently (8c98790) back when I was executing node scripts. I tried it again later (c2b7b2a) but had some issues so I removed it (8a5693b). I think the performance enhancements you listed are spot on, I'll try to spend some time with that.

Incremental builds would be awesome, do you have a sense for how you would approach that? That would make the local dev experience a lot nicer, and I'm all in for anything that can speed up builds. Getting the full build faster will also be important so the feedback loop when running in CI is as quick as possible, but that's more of a prod issue like you said.

Good point on not repeating cgo calls. I was mainly focusing on just getting it working initially, but as you pointed out there are places where things are being run multiple times that can be optimized. Ultimately I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.

Hey @jimafisk
You are doing great work and I've been observing plenti for sometime. I'm using elderjs right now and had done some benchmarks recently for Hugo, plenti, zola, and eleventy. Plenti didn't scale well (understandable) and I think concurrency can help vastly. I just love svelte! So, I'm resisting myself from rebuilding everything with zola + Tera.

I don't have any experience in go or rust. Can I know what issues were you facing with concurrent steps that you mentioned here?
Thank you.

@jimafisk
Copy link
Member Author

Thanks @s-kris! Are you able to share your benchmarks? I'd love to see them, even if Plenti falls flat a bit :). Concurrency would definitely speed things up, although there are some challenges to doing that with the way we're compiling things in v8. Basically we're loading everything into one giant context, which is not goroutine safe: rogchap/v8go#120

For your local dev, if you're on v0.4.13+ you should be able to take advantage of in-memory builds and live-reload to speed things up a bit when making changes after the initial build: plenti serve -ML

I'd love to avoid cgo altogether, but until someone builds a go-based svelte compiler it's probably something we'll just have to work with.

That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg

@s-kris
Copy link

s-kris commented Apr 30, 2021

Sure :)
All are on latest versions. Benchmarks ran on MacBook air(2017) with 8gb ram.

Here are the benchmarks: (Note: Plenti threw error: out of memory after 420 secs for 10k pages.)

_build time (sec)

Same graph without 10k pages data point:

_build time (sec) (1)

Numbers:

Capture

That won't happen overnight, but it's something we're thinking about: https://youtu.be/Ql6cZJ-Udkg

I was reading about your brainstorm discussion on #130. Thanks for the youtube link :)

Edit:

Some more thoughts here https://twitter.com/_skris/status/1388259901080621056?s=20

@jimafisk
Copy link
Member Author

jimafisk commented May 4, 2021

This is super interesting, thanks for sharing @s-kris! I was surprised by the elder/hugo crossover for large sites, might be something for us to look into there. Our bottleneck is probably happening when we render out the HTML fallbacks in V8. Any chance you can share the repo for the benchmark tests so I can debug a bit? Thank you!

@s-kris
Copy link

s-kris commented May 4, 2021

Thank you @jimafisk.
Here's the repo https://github.com/Elderjs/elderjs

@jimafisk
Copy link
Member Author

jimafisk commented May 4, 2021

Sorry I was clear, do you still have the plenti repo with all the variable number of pages @s-kris? Thanks!

@s-kris
Copy link

s-kris commented May 4, 2021

ah! It was nothing custom.

  1. I just setup a brand new plenti site.
  2. I wrote a small script to write some sample content to 10,000 files

haha!

@jimafisk
Copy link
Member Author

Tried building with different size sites (e.g. for 1,000 pages: tee about-{001..1000}.json < about.json >/dev/null)

Number of pages Build time
1,000 13.689803222s
2,000 52.887924446s
3,000 1m52.564471876s
4,000 3m15.451195416s
5,000 Timeout (see error below)
Error message
<--- Last few GCs --->

[470406:0x17ec00000000]   297579 ms: Mark-sweep 1361.2 (1447.8) -> 1356.1 (1444.4) MB, 7.0 / 0.1 ms  (average mu = 0.777, current mu = 0.821) allocation failure GC in old space requested
[470406:0x17ec00000000]   297675 ms: Mark-sweep 1371.0 (1448.1) -> 1361.9 (1448.6) MB, 17.2 / 0.1 ms  (average mu = 0.807, current mu = 0.821) allocation failure scavenge might not succeed


<--- JS stacktrace --->


#
# Fatal javascript OOM in Reached heap limit
#

SIGILL: illegal instruction
PC=0x13cff29 m=7 sigcode=2
instruction bytes: 0xf 0xb 0x55 0x48 0x89 0xe5 0xe8 0xdc 0xb9 0x3c 0xff 0x66 0x2e 0xf 0x1f 0x84

goroutine 0 [idle]:
runtime: unknown pc 0x13cff29
stack: frame={sp:0x7fd6e09adec8, fp:0x0} stack=[0x7fd6e01b3248,0x7fd6e09b2e48)
00007fd6e09addc8:  0000000000000001  00007fd6e09b2440 
00007fd6e09addd8:  000000000213b605  00007fd6e09adec0 
00007fd6e09adde8:  00000000013d057b  00007fd6e09b2440 
00007fd6e09addf8:  0000003000000018  00007fd6e09aded0 
00007fd6e09ade08:  00007fd6e09ade10  0000000000000000 
00007fd6e09ade18:  000000000210dc7e  000000000213b605 
00007fd6e09ade28:  0000000000000000  000000000000001a 
00007fd6e09ade38:  000000000000001a  00007fd6e09b3700 
00007fd6e09ade48:  0000000000000000  0000000000000192 
00007fd6e09ade58:  0000000000000192  0000000000000000 
00007fd6e09ade68:  0000000000000000  0000000000000000 
00007fd6e09ade78:  0000000000000000  0000000000000000 
00007fd6e09ade88:  0000000000000000  0000000000000000 
00007fd6e09ade98:  0000000000000000  0000000000000000 
00007fd6e09adea8:  0000000000000000  0000000000000000 
00007fd6e09adeb8:  0000000000000000  00007fd6e09adee0 
00007fd6e09adec8: <0000000000dbcc3e  000017ec00000000 
00007fd6e09aded8:  000017ec00000000  00007fd6e09b2890 
00007fd6e09adee8:  0000000000dbcf8f  00007fd6e09adfb8 
00007fd6e09adef8:  00007fd6e09b2650  0000000000000000 
00007fd6e09adf08:  ffffffffdecade00  000000000001c3b8 
00007fd6e09adf18:  000000000001ceec  0000000000000000 
00007fd6e09adf28:  0000000000f7bb00  00000000003d0704 
00007fd6e09adf38:  00000000003e8dac  000000000001b0a0 
00007fd6e09adf48:  000000000004a000  00000000000074b8 
00007fd6e09adf58:  000000000003edd8  00000000583d7000 
00007fd6e09adf68:  0000000000000000  0000000000003a00 
00007fd6e09adf78:  0000000000000000  0000000000000000 
00007fd6e09adf88:  0000000000000000  000000000000004b 
00007fd6e09adf98:  000000005a899000  000000005a899000 
00007fd6e09adfa8:  0000000000002000  00000000005d7e50 
00007fd6e09adfb8:  ffffffffdecade01  00007fd6e09adf08 
runtime: unknown pc 0x13cff29
stack: frame={sp:0x7fd6e09adec8, fp:0x0} stack=[0x7fd6e01b3248,0x7fd6e09b2e48)
00007fd6e09addc8:  0000000000000001  00007fd6e09b2440 
00007fd6e09addd8:  000000000213b605  00007fd6e09adec0 
00007fd6e09adde8:  00000000013d057b  00007fd6e09b2440 
00007fd6e09addf8:  0000003000000018  00007fd6e09aded0 
00007fd6e09ade08:  00007fd6e09ade10  0000000000000000 
00007fd6e09ade18:  000000000210dc7e  000000000213b605 
00007fd6e09ade28:  0000000000000000  000000000000001a 
00007fd6e09ade38:  000000000000001a  00007fd6e09b3700 
00007fd6e09ade48:  0000000000000000  0000000000000192 
00007fd6e09ade58:  0000000000000192  0000000000000000 
00007fd6e09ade68:  0000000000000000  0000000000000000 
00007fd6e09ade78:  0000000000000000  0000000000000000 
00007fd6e09ade88:  0000000000000000  0000000000000000 
00007fd6e09ade98:  0000000000000000  0000000000000000 
00007fd6e09adea8:  0000000000000000  0000000000000000 
00007fd6e09adeb8:  0000000000000000  00007fd6e09adee0 
00007fd6e09adec8: <0000000000dbcc3e  000017ec00000000 
00007fd6e09aded8:  000017ec00000000  00007fd6e09b2890 
00007fd6e09adee8:  0000000000dbcf8f  00007fd6e09adfb8 
00007fd6e09adef8:  00007fd6e09b2650  0000000000000000 
00007fd6e09adf08:  ffffffffdecade00  000000000001c3b8 
00007fd6e09adf18:  000000000001ceec  0000000000000000 
00007fd6e09adf28:  0000000000f7bb00  00000000003d0704 
00007fd6e09adf38:  00000000003e8dac  000000000001b0a0 
00007fd6e09adf48:  000000000004a000  00000000000074b8 
00007fd6e09adf58:  000000000003edd8  00000000583d7000 
00007fd6e09adf68:  0000000000000000  0000000000003a00 
00007fd6e09adf78:  0000000000000000  0000000000000000 
00007fd6e09adf88:  0000000000000000  000000000000004b 
00007fd6e09adf98:  000000005a899000  000000005a899000 
00007fd6e09adfa8:  0000000000002000  00000000005d7e50 
00007fd6e09adfb8:  ffffffffdecade01  00007fd6e09adf08 

goroutine 1 [syscall]:
runtime.cgocall(0xdaa750, 0xc00099b1e8, 0x7fd6c439ff80)
	/usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc00099b1b8 sp=0xc00099b180 pc=0x7a713b
rogchap.com/v8go._Cfunc_ValueToString(0x7fd6c439ff80, 0x0)
	_cgo_gotypes.go:1441 +0x45 fp=0xc00099b1e8 sp=0xc00099b1b8 pc=0x8e0e85
rogchap.com/v8go.(*Value).String.func1(0xc000060ec0, 0x0)
	/home/jimafisk/go/pkg/mod/rogchap.com/[email protected]/value.go:218 +0x55 fp=0xc00099b220 sp=0xc00099b1e8 pc=0x8e88b5
rogchap.com/v8go.(*Value).String(0xc000060ec0, 0x0, 0x0)
	/home/jimafisk/go/pkg/mod/rogchap.com/[email protected]/value.go:218 +0x48 fp=0xc00099b260 sp=0xc00099b220 pc=0x8e4b48
github.com/plentico/plenti/cmd/build.createHTML(0xc0009bf368, 0x5, 0xc0007f3f90, 0xa, 0xc0009bf5e0, 0x1c, 0xc00095f200, 0x2c2, 0xc0009bf36e, 0xf, ...)
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/build/data_source.go:335 +0x178 fp=0xc00099b390 sp=0xc00099b260 pc=0xb1f258
github.com/plentico/plenti/cmd/build.DataSource(0xc000031d60, 0x6, 0xc000031d36, 0x6, 0x26d3dd8, 0x1, 0x0, 0x0, 0x0, 0xbb8, ...)
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/build/data_source.go:141 +0x878 fp=0xc00099b7f8 sp=0xc00099b390 pc=0xb1c318
github.com/plentico/plenti/cmd.Build(0x0, 0x0)
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/build.go:139 +0x5fd fp=0xc00099bb00 sp=0xc00099b7f8 pc=0xd9e39d
github.com/plentico/plenti/cmd.glob..func1(0x26a5d60, 0x274af38, 0x0, 0x0)
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/build.go:44 +0x25 fp=0xc00099bb20 sp=0xc00099bb00 pc=0xda3c65
github.com/plentico/plenti/cmd.glob..func4(0x26a5d60, 0x274af38, 0x0, 0x0)
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/serve.go:75 +0xa3e fp=0xc00099bd78 sp=0xc00099bb20 pc=0xda53be
github.com/spf13/cobra.(*Command).execute(0x26a5d60, 0x274af38, 0x0, 0x0, 0x26a5d60, 0x274af38)
	/home/jimafisk/go/pkg/mod/github.com/spf13/[email protected]/command.go:856 +0x2c2 fp=0xc00099be38 sp=0xc00099bd78 pc=0xb83e22
github.com/spf13/cobra.(*Command).ExecuteC(0x26a5ae0, 0xc000000180, 0x200000003, 0xc000000180)
	/home/jimafisk/go/pkg/mod/github.com/spf13/[email protected]/command.go:960 +0x375 fp=0xc00099bf18 sp=0xc00099be38 pc=0xb84b55
github.com/spf13/cobra.(*Command).Execute(...)
	/home/jimafisk/go/pkg/mod/github.com/spf13/[email protected]/command.go:897
github.com/plentico/plenti/cmd.Execute()
	/home/jimafisk/Projects/jantcu/plentico/plenti/cmd/root.go:60 +0x31 fp=0xc00099bf78 sp=0xc00099bf18 pc=0xd9fbd1
main.main()
	/home/jimafisk/Projects/jantcu/plentico/plenti/main.go:7 +0x25 fp=0xc00099bf88 sp=0xc00099bf78 pc=0xda91e5
runtime.main()
	/usr/local/go/src/runtime/proc.go:225 +0x256 fp=0xc00099bfe0 sp=0xc00099bf88 pc=0x7dc756
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc00099bfe8 sp=0xc00099bfe0 pc=0x810101

goroutine 6 [sleep]:
time.Sleep(0x5f5e100)
	/usr/local/go/src/runtime/time.go:193 +0xd2
github.com/briandowns/spinner.(*Spinner).Start.func1(0xc000134630)
	/home/jimafisk/go/pkg/mod/github.com/briandowns/[email protected]/spinner.go:319 +0x53
created by github.com/briandowns/spinner.(*Spinner).Start
	/home/jimafisk/go/pkg/mod/github.com/briandowns/[email protected]/spinner.go:281 +0x87

rax    0x33
rbx    0x17ec00000000
rcx    0x0
rdx    0x7fd6e09b3700
rdi    0x7fd6e09ab710
rsi    0x0
rbp    0x7fd6e09adee0
rsp    0x7fd6e09adec8
r8     0x33
r9     0x33
r10    0x210dca8
r11    0x0
r12    0x17ec00008f50
r13    0x1
r14    0x7fd6e09b2440
r15    0x213b605
rip    0x13cff29
rflags 0x10202
cs     0x33
fs     0x0
gs     0x0

@jimafisk jimafisk mentioned this issue Dec 23, 2021
jimafisk added a commit that referenced this issue May 28, 2023
@jimafisk
Copy link
Member Author

jimafisk commented May 28, 2023

I've been playing with different build structures. Historically, I was operating under the assumption that v8 ctx was expensive, so we were creating one global SSRCtx with all SSR components loaded in, and then adding props and generating HTML from it for every page.

I've recently tried restructuring so each SSR component gets saved to the virtual filesystem (via afero) during the Client step, then for each content source, creating a new v8 context, loading in those components, then adding props and generating HTML, and once that specific page is complete, closing that specific context: d1ba128

As I expected, this seems to slow down the build in general (I don't have the exact numbers but I think generating 1,000 pages went from like 14 seconds to 43 seconds). However, it did allow me to build larger sites without throwing a timeout error. I was able to build over 5,000 pages without timing out, it just takes a very long time to complete:

  • Creating data_source took 10m35.433924064s
  • Total build took 11m14.689474485s

I figured I could speed this up some by adding goroutines, and for small sites it seemed to work. One project I'm working on that usually takes 8 seconds with the old build structure, took 10 seconds with the new build structure + goroutines. However, large sites once again started having issues once goroutines were introduced: rogchap/v8go#347 (comment)

I probably wouldn't trade slightly slower average size sites (low hundreds of pages) for the ability to complete large site builds (thousands of pages), especially if it takes 10 minutes to do so, which isn't practical for most folks. Maybe there is an option of using v8go for compiling the DOM and SSR components, but using Goja to actually render the template strings to HTML.

jimafisk added a commit that referenced this issue May 28, 2023
@jimafisk
Copy link
Member Author

Building the client components concurrently (f64a079) seems to have little effect on the overall build:

Project 1 results: Consecutive wins ✔️ (5 to 1)
Concurrent build (issue-26) Consecutive Build (master)
9.137822741s 8.120130183s
8.476566434s 8.240791791s
9.014339352s 9.008943863s
9.173432853s 8.852069174s
8.53087334s 8.269849894s
8.542125053s 8.812322216s
Project 2 results: Consecutive wins ✔️ (4 to 2)
Concurrent build (issue-26) Consecutive Build (master)
2.225579422s 2.247196698s
2.253613138s 2.38226407s
2.272823551s 2.212533565s
2.343795652s 2.243285895s
2.42717278s 2.366094546s
2.410467855s 2.235620616s

Given these results, I don't intend to convert the architecture in the short term unless we can figure out ways to make improvements.

I also tried doing a concurrent HTML build using goja, and although I never quite got it working (wasn't actually producing HTML), it increased the build time for a project I'm working on from about 8 seconds to 35 - 50 seconds. Not sure if Goja can be viable speed-wise unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants