Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store data path hash instead of string in binary #5886

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

sffc
Copy link
Member

@sffc sffc commented Dec 10, 2024

Proof-of-concept for #4991

@sffc
Copy link
Member Author

sffc commented Dec 10, 2024

@Manishearth What do you think about this approach?

The actual path strings are used in two places:

  1. Error messages, including impl Debug
  2. FsDataProvider

This PR makes the Debug impl use the hash and FsDataProvider get the path from the registry.

One thing that puzzles me is the size of tutorials_buffer.wasm.

Commit Description Size
9e5a25b Main 250697
68fcc7e Add data marker hash 246851
f8d1505 Remove data marker path 251087
8f34fd5 Simplify Debug impl 251888

My expectation is that the size should go up on row 2 and down on rows 3 and 4, but it is basically the opposite of my expectation. I've inspected the assembly and I have no explanation for this.

@Manishearth
Copy link
Member

I like this, as long as datagen is able to speak in terms of the regular paths and debugging hashes is not too hard. Which I think will be fine with the registry.

My expectation is that the size should go up on row 2 and down on rows 3 and 4, but it is basically the opposite of my expectation. I've inspected the assembly and I have no explanation for this.

Is the path included in the binary elsewhere? I bet something else includes it so it gets reused.

@sffc
Copy link
Member Author

sffc commented Dec 12, 2024

Briefly discussed in the WG call. @Manishearth and @hsivonen are supportive of the change, and @sffc will continue to investigate the binary size issue.

@sffc sffc added this to the ICU4X 2.0 Stretch ⟨P2⟩ milestone Dec 12, 2024
@sffc
Copy link
Member Author

sffc commented Jan 7, 2025

@robertbastian I still have a lot on my plate for 2.0. Would you mind taking a look at this PR, reproducing and possibly fixing the binary size regression shown above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants