Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed wasm by encoding binary data as UTF-8 code points. #21478

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

juj
Copy link
Collaborator

@juj juj commented Mar 6, 2024

Quick and dirty test to see what the CI says.

Possible alternative to #21426.

@juj juj force-pushed the binary_encode branch from 6585ad0 to 07421f2 Compare March 6, 2024 09:43
@msqr1
Copy link

msqr1 commented Mar 6, 2024

Utf8? I like the idea, it is just that doesn't utf8 takes 2 bytes for cp > 127? If I have really bad luck, the file size will be big?

@juj
Copy link
Collaborator Author

juj commented Mar 6, 2024

Utf8? I like the idea, it is just that doesn't utf8 takes 2 bytes for cp > 127? If I have really bad luck, the file size will be big?

Since the contents of the wasm file will contain wasm opcodes, the byte distribution is going to be very determinstic. So I would expect a large Unity Boat Attack .wasm file to be as representative of a typical opcode distribution as any other Wasm file.

Also since the hypothesis was that gzip/brotli compress byte-aligned data better than non-byte aligned data, then presumably it won't matter as much that some of the input bytes expand out to two output bytes. (I think they could expand out to e.g. four output bytes as well, and gzip/brotli would likely pick up on such patterns)

Though like I mentioned in the other thread, if the file size on disk before the web server gz/br compression is important, then the expansion to two bytes will show up there.

@juj
Copy link
Collaborator Author

juj commented Mar 6, 2024

Looks like other tests pass, but CircleCI gets a failure in core2:

Call parameter type does not match function signature!
  %a.var = alloca ptr addrspace(10), align 1, addrspace(1)
 ptr  call void @llvm.lifetime.start.p0(i64 1, ptr addrspace(1) %a.var) #9
Call parameter type does not match function signature!
  %a.var = alloca ptr addrspace(10), align 1, addrspace(1)
 ptr  call void @llvm.lifetime.end.p0(i64 1, ptr addrspace(1) %a.var) #9
in function __original_main
fatal error: error in backend: Broken function found, compilation aborted!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/root/project/test/core/test_externref_emjs.c'.
4.	Running pass 'Module Verifier' on function '@__original_main'
 #0 0x00007fa15de63d38 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc0d38)
 #1 0x00007fa15de6189e llvm::sys::RunSignalHandlers() (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbe89e)
 #2 0x00007fa15de630bf llvm::sys::CleanupOnSignal(unsigned long) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dc00bf)
 #3 0x00007fa15dda0c07 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #4 0x00007fa15dda0b9f llvm::CrashRecoveryContext::HandleExit(int) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb9f)
 #5 0x00007fa15de5e5e7 llvm::sys::Process::Exit(int, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1dbb5e7)
 #6 0x0000555b23422473 (/root/emsdk/upstream/bin/clang+0x15473)
 #7 0x00007fa15ddb2efc llvm::report_fatal_error(llvm::Twine const&, bool) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fefc)
 #8 0x00007fa15ddb2de6 (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1d0fde6)
 #9 0x00007fa15e0bd210 void llvm::VerifierSupport::WriteTs<llvm::Instruction*, llvm::MDNode const*>(llvm::Instruction* const&, llvm::MDNode const* const&) Verifier.cpp:0:0
#10 0x00007fa15e0123b6 llvm::FPPassManager::runOnFunction(llvm::Function&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6f3b6)
#11 0x00007fa15e01a992 llvm::FPPassManager::runOnModule(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f77992)
#12 0x00007fa15e012f29 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1f6ff29)
#13 0x00007fa1639b40b5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__2::unique_ptr<llvm::raw_pwrite_stream, std::__2::default_delete<llvm::raw_pwrite_stream>>, clang::BackendConsumer*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x2e270b5)
#14 0x00007fa163de3356 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3256356)
#15 0x00007fa1625717d9 clang::ParseAST(clang::Sema&, bool, bool) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x19e47d9)
#16 0x00007fa1645ddc0f clang::FrontendAction::Execute() (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3a50c0f)
#17 0x00007fa164553dad clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x39c6dad)
#18 0x00007fa1646684af clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3adb4af)
#19 0x0000555b234211d4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/root/emsdk/upstream/bin/clang+0x141d4)
#20 0x0000555b2341ea08 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0
#21 0x00007fa1641f2b39 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const::$_0>(long) Job.cpp:0:0
#22 0x00007fa15dda0b36 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/root/emsdk/upstream/bin/../lib/libLLVM.so.19.0git+0x1cfdb36)
#23 0x00007fa1641f2392 clang::driver::CC1Command::Execute(llvm::ArrayRef<std::__2::optional<llvm::StringRef>>, std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>*, bool*) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3665392)
#24 0x00007fa1641b4671 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627671)
#25 0x00007fa1641b4c5e clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&, bool) const (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x3627c5e)
#26 0x00007fa1641d330d clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__2::pair<int, clang::driver::Command const*>>&) (/root/emsdk/upstream/bin/../lib/libclang-cpp.so.19.0git+0x364630d)
#27 0x0000555b2341de43 clang_main(int, char**, llvm::ToolContext const&) (/root/emsdk/upstream/bin/clang+0x10e43)
#28 0x0000555b2342c737 main (/root/emsdk/upstream/bin/clang+0x1f737)
#29 0x00007fa15b71db97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#30 0x0000555b2341ab2a _start (/root/emsdk/upstream/bin/clang+0xdb2a)
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 19.0.0git (https:/github.com/llvm/llvm-project 33e312137b065ba330b187f56ddd60df70927241)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /root/emsdk/upstream/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.c
clang: note: diagnostic msg: /tmp/test_externref_emjs-5d510e.sh
clang: note: diagnostic msg: 

********************
emcc: �[31merror: �[0m'/root/emsdk/upstream/bin/clang -target wasm32-unknown-emscripten -fignore-exceptions -fPIC -fvisibility=default -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr --sysroot=/root/cache/sysroot -DEMSCRIPTEN -Werror=implicit-function-declaration -Xclang -iwithsysroot/include/fakesdl -Xclang -iwithsysroot/include/compat -Werror -fsanitize=address -Wno-unused-command-line-argument -mreference-types /root/project/test/core/test_externref_emjs.c -c -o /tmp/emtest_w5r1ff8r/emscripten_temp_tftngyhr/test_externref_emjs_0.o' failed (returned 1)
None
None
test_externref_emjs_dynlink (test_core.asan) ... FAIL

======================================================================
FAIL [0.001s]: test_externref_emjs_dynlink (test_core.asan)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/root/project/test/common.py", line 630, in resulting_test
    return func(self, *args)
  File "/root/project/test/common.py", line 264, in decorated
    return func(self, *args, **kwargs)
  File "/root/project/test/common.py", line 129, in decorated
    func(self, *args, **kwargs)
  File "/root/project/test/test_core.py", line 9570, in test_externref_emjs
    self.do_core_test('test_externref_emjs.c')
  File "/root/project/test/test_core.py", line 367, in do_core_test
    self.do_run_in_out_file_test(Path('core', testname), **kwargs)
  File "/root/project/test/common.py", line 1609, in do_run_in_out_file_test
    output = self._build_and_run(srcfile, expected, **kwargs)
  File "/root/project/test/common.py", line 1631, in _build_and_run
    output_basename=output_basename)
  File "/root/project/test/common.py", line 1093, in build
    self.run_process(cmd, stderr=self.stderr_redirect if not DEBUG else None)
  File "/root/project/test/common.py", line 1430, in run_process
    self.fail(f'subprocess exited with non-zero return code({e.returncode}): `{shared.shlex_join(cmd)}`')
  File "/usr/lib/python3.6/unittest/case.py", line 670, in fail
    raise self.failureException(msg)
AssertionError: subprocess exited with non-zero return code(1): `/root/project/emcc /root/project/test/core/test_externref_emjs.c -o test_externref_emjs.js -sNO_DEFAULT_TO_CXX -sALLOW_MEMORY_GROWTH -sMAIN_MODULE=2 -Wclosure -Werror -Wno-limited-postlink-optimizations -fsanitize=address --profiling -Wno-unused-command-line-argument -mreference-types`

----------------------------------------------------------------------

I am unable to reproduce that failure locally on my Windows system.

At a first glance this failure looks like something that should not be caused by this PR(?), since this PR only affects later down the pipe, but I am not 100% sure yet.

In any case, since that is the only failure, this PR should now look good enough to have eyeballs to review as a first pass.

In particular everything is now validated to work. The notable thing here is that this scheme requires <meta charset='utf-8'> in the .html file, but that is something that our shells already have had for a long while, and I think all good behaving .html files also do (it would be extremely rare to want to do any other encoding than utf-8)

This PR still has the feature behind an extra setting. Would we want to remove that setting, and always have this on? Or maybe an extra setting might be good, so people have a way to opt out in case odd situations arise at first pass (such as sublimehq/sublime_text#6320 or google/closure-compiler#4159)

@curiousdannii
Copy link
Contributor

If you've found 2 bugs with tools/IDEs within the first day, then we should expect there to be more. It seems quite risky.

@juj
Copy link
Collaborator Author

juj commented Mar 7, 2024

Yeah, having an opt-out setting for a while would be good from that perspective.

src/preamble.js Show resolved Hide resolved
@@ -606,6 +606,8 @@ def closure_compiler(filename, advanced=True, extra_closure_args=None):
args += ['--language_out', 'NO_TRANSPILE']
# Tell closure never to inject the 'use strict' directive.
args += ['--emit_use_strict=false']
# Always output UTF-8 files, this helps generate UTF-8 code points instead of escaping code points with \uxxxx inside strings. https://github.com/google/closure-compiler/issues/4158
args += ['--charset=UTF8']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this land separately too if possible, along with a test that closure no longer produces \uxxx for UTF-8 strings in the input.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still love to see this change (along with the test it effects) land separately, but I won't force you to do it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait the change in file size you are referring to is due to SINGLE_FILE_BINARY_ENCODE, right?

I was wondering if this change would effect other tests.. In theory it should effect any JS code which includes a UTF-8 string, right? Regardless of SINGLE_FILE usage

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if this change would effect other tests.. In theory it should effect any JS code which includes a UTF-8 string, right? Regardless of SINGLE_FILE usage

Yeah, this would specify the encoding to be UTF-8 for all output when pushed through Closure.

@juj juj force-pushed the binary_encode branch from 94c3ec1 to d10c38c Compare March 7, 2024 19:18
@msqr1
Copy link

msqr1 commented Mar 15, 2024

Is this working. Can I try it?

@juj
Copy link
Collaborator Author

juj commented Mar 21, 2024

Is this working. Can I try it?

Yeah, looks like the CI now passes (there were some intermittent CI failures last week), so this should be good to land.

@@ -4851,7 +4853,7 @@ def test_single_file_locate_file(self):
# Tests that SINGLE_FILE works as intended in a Worker in JS output
def test_single_file_worker_js(self):
self.compile_btest('browser_test_hello_world.c', ['-o', 'test.js', '--proxy-to-worker', '-sSINGLE_FILE'])
create_file('test.html', '<script src="test.js"></script>')
create_file('test.html', '<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"></head><body><script src="test.js"></script></body></html>')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test break without this change?

Does the charset in the <head> somehow effect the script tag?

Does the mean that some/all users of SINGLE_FILE would ned to add an explicit charset to their html?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our existing shell files have already ~from the dawn of time had a <meta charset='utf-8'> directive. That is part of the general "best practices" boilerplate.

So we can expect quite safely that all the custom shell files that users have monkeyed off of any of the existing shell files we provide, will be UTF-8 (unless they have explicitly removed the meta charset directive, which is a bad idea in general)

About 98.2% of all websites utilize UTF-8, it is practically the only encoding used on the web. I don't really know of anyone who would use any other encoding on their web pages anymore.

It is just that our tests scripts have omitted the meta charset='utf-8' directive.

(If users did not want for some reason to specify that meta charset directive, they can also save the HTML file with a UTF-8 BOM marker, or they can send a Content-Type: application/html; charset=utf-8 HTTP header, but typically the meta charset directive is the simplest way)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, without this change, this test is broken?

I wonder if we should write a test to verify that -sSINGLE_FILE output is broken-by-default without UTF-8 encoding but that adding <meta charset='utf-8'> or disabling SINGLE_FILE_BINARY_ENCODE fixes it?

I guess we don't have any other non-ascii strings in our generated code otherwise <meta charset='utf-8'> would be required regardless of SINGLE_FILE and / or SINGLE_FILE_BINARY_ENCODE?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, without this change, this test is broken?

That's correct, the test would not work otherwise.

I wonder if we should write a test to verify that -sSINGLE_FILE output is broken-by-default without UTF-8 encoding but that adding <meta charset='utf-8'> or disabling SINGLE_FILE_BINARY_ENCODE fixes it?

I am not sure what would be the benefit of such test?

If we wanted to go the extra mile, I think we would want to either generate a UTF-8 BOM into the HTML file, or issue a link time warning if generated HTML file contains UTF-8 code points, but the HTML file does not contain a <meta charset='utf-8'> directive.

I guess we don't have any other non-ascii strings in our generated code otherwise <meta charset='utf-8'> would be required regardless of SINGLE_FILE and / or SINGLE_FILE_BINARY_ENCODE?

That is true. We (Emscripten) don't, but if users would have any UTF-8 chars they contain in JS libraries, then that would require them to specify the charset.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue a link time warning if generated HTML file contains UTF-8 code points, but the HTML file does not contain a <meta charset='utf-8'> directive.

This would a useful warning for users who generate HTML output but my understanding is that the vast majority of our users generate JavaScript and then write their own HTML separetely. I'm curious, does unity use emscripten to generate HTML? Is so I guess that would be big use case for HTML output in production. Otherwise I don't know of any.

src/proxyClient.js Outdated Show resolved Hide resolved
@@ -606,6 +606,8 @@ def closure_compiler(filename, advanced=True, extra_closure_args=None):
args += ['--language_out', 'NO_TRANSPILE']
# Tell closure never to inject the 'use strict' directive.
args += ['--emit_use_strict=false']
# Always output UTF-8 files, this helps generate UTF-8 code points instead of escaping code points with \uxxxx inside strings. https://github.com/google/closure-compiler/issues/4158
args += ['--charset=UTF8']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still love to see this change (along with the test it effects) land separately, but I won't force you to do it.

@@ -656,7 +658,8 @@ def move_to_safe_7bit_ascii_filename(filename):
# 7-bit ASCII range. Therefore make sure the command line we pass does not contain any such
# input files by passing all input filenames relative to the cwd. (user temp directory might
# be in user's home directory, and user's profile name might contain unicode characters)
proc = run_process(cmd, stderr=PIPE, check=False, env=env, cwd=tempfiles.tmpdir)
# https://github.com/google/closure-compiler/issues/4159: Closure outputs stdout/stderr in iso-8859-1 on Windows.
proc = run_process(cmd, stderr=PIPE, check=False, env=env, cwd=tempfiles.tmpdir, encoding='iso-8859-1' if WINDOWS else 'utf-8')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this and existing bug, or does it only show up with --charset=UTF8 is added above? (what is the default charset used by closure?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an Closure compiler bug. It occurs both with and without --charset=UTF8, so does not relate to that.

I don't know what is the default charset used by Closure.

tools/link.py Show resolved Hide resolved
data = base64.b64encode(utils.read_binary(path))
return 'data:application/octet-stream;base64,' + data.decode('ascii')
if settings.SINGLE_FILE_BINARY_ENCODE:
return binary_encode(utils.read_binary(path))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not need some kind of data:application/octet-stream; here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these binary embedded strings are not passed to the browser to decode as Data URIs, so specifying a MIME type would be meaningless.

// Closure may analyze through the WASM_BINARY_DATA placeholder string into this
// function, leading into incorrect results.
/** @noinline */
function binaryDecode(bin) { for(var i = 0, l = bin.length, o = new Uint8Array(l); i < l; ++i) o[i] = bin.charCodeAt(i) - 1; return o; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this to be all on one line is there?

Seems a little unfortunate to create an entirely new file just to hold this tiny function. Perhaps you can add it to runtime_shared.js instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file proxyClient.js does not include runtime_shared.js, so that would not quite work.

Expanded the function on multiple lines.

@@ -1817,6 +1817,9 @@ var WASMFS = false;
// [link]
var SINGLE_FILE = false;

// If true, does binary encoding of data instead of base64 encoding.
var SINGLE_FILE_BINARY_ENCODE = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is true by default why have a setting for this?

Is it because some folks can't handle UTF-8? If so, it might be worth mentioning there. e.g. "Disable this if you can't handle UTF-8 chars in the generated JS".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that it would allow developers to pass -sSINGLE_FILE_BINARY_ENCODE=0 to revert back to the previous base64 encoding, should issues arise.

Updated the text doc.

juj added 2 commits March 26, 2024 14:30
# Conflicts:
#	src/preamble.js
#	test/code_size/random_printf_wasm.json
#	test/test_browser.py
juj added 6 commits August 27, 2024 21:08
… Improvement from previous base64 encoding to binary encoding is:

size of a.html == 17586, expected 18839, delta=-1253 (-6.65%)
size of a.html.gz == 10152, expected 10973, delta=-821 (-7.48%)
Total output size=17586 bytes, expected total size=18839, delta=-1253 (-6.65%)
Total output size gzipped=10152 bytes, expected total size gzipped=10973, delta=-821 (-7.48%)
Hey amazing, overall generated code size was improved by 1253 bytes!
@juj
Copy link
Collaborator Author

juj commented Aug 27, 2024

Updated this PR to latest.

Added new -sSINGLE_FILE code size test in binary encoded mode. The improvement delta from base64-encoded SINGLE_FILE to this binary-encoded SINGLE_FILE in that code size test is

size of a.html == 17586, expected 18839, delta=-1253 (-6.65%)
size of a.html.gz == 10152, expected 10973, delta=-821 (-7.48%)
Total output size=17586 bytes, expected total size=18839, delta=-1253 (-6.65%)
Total output size gzipped=10152 bytes, expected total size gzipped=10973, delta=-821 (-7.48%)
Hey amazing, overall generated code size was improved by 1253 bytes!

so the new test shows that the size improvement carries through post-gzip.

It's been a while since we discussed this. I still think this would be a good feature to land, be enabled by default, and allow an opt-out that users can have to revert to old base64 form.

@juj
Copy link
Collaborator Author

juj commented Aug 27, 2024

CI looks good and green now. This is good for another round of review.

@@ -2857,6 +2857,18 @@ then you can safely ignore this warning.

Default value: false

.. _single_file_binary_encode:

SINGLE_FILE_BINARY_ENCODE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about calling this SUPPORT_UTF8_EMBEDDING to match the existing SUPPORT_BASE64_EMBEDDING?

I do wish we could make this change without adding yet another setting, but I guess its not reasonable to make it unconditional. Do you think we could make it unconditional at some point in the future?

#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE
#include "binaryDecode.js"
var workerURL = URL.createObjectURL(new Blob([binaryDecode(filename)], {type: 'application/javascript'}));
#else
var workerURL = filename;
if (SUPPORT_BASE64_EMBEDDING) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this change but this line looks very strange. Shouldn't this be #if SUPPORT_BASE64_EMBEDDING?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants