-
-
Notifications
You must be signed in to change notification settings - Fork 53
Notes on building alternatives
The current implementation of RE2 bindings uses C++ addons technique. The reason is purely historical: when I started we had no other alternatives. At this moment (March 2024) we have two possible alternatives:
These notes were recorded to document thoughts on alternative implementations, experiments, and attempts made.
The current implementation uses custom C++ bindings written with NaN and compiled by node-gyp.
The problems with this approach in no particular order:
- Compiling by users:
- Users should install all necessary software (Python, a suitable C++ compiler).
- These dependencies are not described by
package.json
. They are "extraband". - If something goes wrong, it is hard to debug.
- The hardware requirements for compiling can exceed what is required to run the final software.
- These dependencies are not described by
- The compilation can take a lot of time making it unsuitable for building short-lived instances.
- Users should install all necessary software (Python, a suitable C++ compiler).
- Precompiled binaries:
- Requires to maintain a build infrastructure.
- A separate build is required for every combination of ABI (usually a major Node version), OS, and processor architecture.
- It takes some time to build all combinations.
- Not all permutations are supported by GitHub Actions runners.
- Some builds have to be produced manually.
- Requires storage to keep all rebuild binaries.
- The storage should be secured.
- Requires a way to retrieve a required binary.
- If the binary is missing or cannot function properly, it should be compiled locally.
- Dependencies:
- Updating dependencies can lead to updating the code of C++ bindings.
- New Node versions can trigger updates in
NaN
, which may require extensive changes in the code. - Debugging of C++ addons is still in the Stone Age.
- Support of alternative platforms:
- WASM
- Pros:
- An addon can be compiled once and shipped for all supported platforms.
- No special build infrastructure.
- No special retrieval of precompiled binaries.
- Supported by web browsers, Deno, and Bun.
- An addon can be compiled once and shipped for all supported platforms.
- Cons:
- The compilation target is interpreted and can be slow when running.
- Support varies. For example, browsers expect an asynchronous loading of WASM files.
- Pros:
- Node-API:
- Pros:
- It does not depend on ABI. Instead, it depends on a Node-API version, which is more conservative and changes less frequently. See Node-API version matrix.
- It used to be that higher versions supported older versions being strict supersets. It is changing now and rules of support become less clear.
- It does not depend on ABI. Instead, it depends on a Node-API version, which is more conservative and changes less frequently. See Node-API version matrix.
- Supported by Bun.
- Cons:
- It still requires
node-gyp
and the associated setup locally. - It still requires a build infrastructure.
- It still requires
- Pros:
There is an alternative RE2 WASM project based on an old version of node-re2
: re2-wasm. Theoretically, it can be used as a benchmarking target for a possible WASM-based implementation of RE2. As of now (March 2024) it has not been updated for 2 years.
I tried to benchmark it against node-re2
using nano-bench
. The results:
$ node bench.js
re: (\w+)+$ s: aaaaaaaaaaaaaaaaaaaaaa!
measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...
"re2": median 581.3ns +12.6ns -9.7ns for 50k iterations in 100 series
"rew": median 10.4μs +8.4μs -1.2μs for 500 iterations in 100 series
The difference is STATISTICALLY SIGNIFICANT!
re: (.*a){30} s: aaaaaaaaaaaaaaaaaaaaaaaaaaaX
measuring (confidence interval: 95%, series: 100, bootstrap samples: 1,000) ...
"re2": median 659ns +83.7ns -8ns for 50k iterations in 100 series
"rew": median 91μs +48μs -20μs for 1 iterations in 100 series
The difference is STATISTICALLY SIGNIFICANT!
As we can see the difference is more than an order of magnitude. For these particular cases, it is 20-150 times slower.
Attempting to increase measurement intervals unexpectedly breaks WASM:
$ node bench2.js
measuring (confidence interval: 95%, series: 1,000, bootstrap samples: 10,000) ...
"re2": median 574.1ns +12.9ns -5.9ns for 50k iterations in 1,000 series
"rew": median 1.94ms +1.42ms -0.67ms for 1 iterations in 1,000 series
The difference is STATISTICALLY SIGNIFICANT!
measuring (confidence interval: 95%, series: 1,000, bootstrap samples: 10,000) ...
Cannot enlarge memory arrays to size 16785408 bytes (OOM). Either (1) compile with -s INITIAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -s ALLOW_MEMORY_GROWTH=1 which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with -s ABORTING_MALLOC=0
/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1393
var e = new WebAssembly.RuntimeError(what);
^
RuntimeError: abort(Cannot enlarge memory arrays to size 16785408 bytes (OOM). Either (1) compile with -s INITIAL_MEMORY=X with X higher than the current value 16777216, (2) compile with -s ALLOW_MEMORY_GROWTH=1 which allows increasing the size at runtime, or (3) if you want malloc to return NULL (0) instead of this abort, compile with -s ABORTING_MALLOC=0 ) at Error
at jsStackTrace (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1675:19)
at stackTrace (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1692:16)
at abort (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1387:44)
at abortOnCannotGrowMemory (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3521:7)
at _emscripten_resize_heap (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3525:7)
at wasm://wasm/003466ba:wasm-function[5377]:0xbfea0
at wasm://wasm/003466ba:wasm-function[5372]:0xbdcde
at wasm://wasm/003466ba:wasm-function[5259]:0xbb288
at wasm://wasm/003466ba:wasm-function[301]:0x8719
at wasm://wasm/003466ba:wasm-function[3163]:0x7ec85
at abort (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:1393:11)
at abortOnCannotGrowMemory (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3521:7)
at _emscripten_resize_heap (/home/eugene/Open/temp-re2/node_modules/re2-wasm/build/wasm/re2.js:3525:7)
at wasm://wasm/003466ba:wasm-function[5377]:0xbfea0
at wasm://wasm/003466ba:wasm-function[5372]:0xbdcde
at wasm://wasm/003466ba:wasm-function[5259]:0xbb288
at wasm://wasm/003466ba:wasm-function[301]:0x8719
at wasm://wasm/003466ba:wasm-function[3163]:0x7ec85
at wasm://wasm/003466ba:wasm-function[3147]:0x7e6d1
at wasm://wasm/003466ba:wasm-function[3138]:0x7e268
Node.js v21.5.0
We can see that boosting the number of series from 100 to 1,000 significantly slows down re2-wasm
for some unknown reason and breaks it later. This alone makes WASM an unsuitable target for mission-critical applications for now (March 2024).