-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(wasmstan): use Stan's json_data
var context
#145
Conversation
It's tricky to get the data in the right format otherwise. I'll try and revisit this at some point when I have more time to fix the unfortunate issues this creates with types, but for now this will at least fix things.
WalkthroughThe pull request introduces significant modifications to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
json_data
var context
Newtypes and comments on JS bindings, extra log lines in tracing output, commented code, versions, and refresh in tests.
97011a9
to
20a5efd
Compare
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Outside diff range and nitpick comments (9)
components/cpp/prophet-wasmstan/wit/prophet-wasmstan.wit (3)
82-85
: Enhance documentation for thedata-json
type.The type definition is well-structured and aligns with the objective to use Stan's JSON data context. However, the documentation could be more comprehensive.
Consider adding:
- The expected JSON schema or structure
- A reference to Stan's
json_data
format documentation- A simple example of valid JSON data
/// JSON representation of the Prophet data to pass to Stan. /// - /// This should be a string containing a JSONified `Data`. + /// This should be a string containing a JSONified `Data` that follows Stan's + /// json_data format. The JSON structure should match the fields defined in + /// the `data` record type above. + /// + /// Example: + /// { + /// "n": 100, + /// "y": [1.0, 2.0, ...], + /// "t": [0.0, 1.0, ...], + /// ... + /// } type data-json = string;
173-183
: Document error cases and performance considerations.The interface documentation is good, but could be enhanced with more operational details.
Consider adding:
- Documentation for possible error cases in the
result
type- Performance implications of JSON parsing
- Memory considerations for large datasets
/// The Prophet optimizer interface. /// /// An optimizer is a type that can take some initial parameters and data /// and produce some optimal parameters for those parameters using maximum /// likelihood estimation. This corresponds to cmdstan's `optimize` -/// command. +/// command. +/// +/// Note: Using JSON data involves serialization/deserialization overhead. +/// Consider the performance impact when dealing with large datasets. interface optimizer { use types.{inits, data-json, optimize-opts, optimize-output}; /// Optimize the initial parameters given the data, returning the /// optimal values under maximum likelihood estimation. + /// + /// # Errors + /// Returns a string error message if: + /// - The JSON data is malformed + /// - The data fields don't match the expected schema + /// - The optimization process fails optimize: func(init: inits, data: data-json, opts: optimize-opts) -> result<optimize-output, string>;
Line range hint
82-183
: Consider future optimizations for the JSON-based approach.While using Stan's
json_data
context is a pragmatic solution, there are some architectural considerations for future improvements:
- Performance: Consider implementing a binary serialization format (like Protocol Buffers or FlatBuffers) for better performance in future iterations.
- Memory: For large datasets, streaming JSON parsing might be beneficial to reduce memory overhead.
- Validation: Consider adding a validation layer to catch data issues before they reach Stan.
Would you like assistance in exploring these optimization strategies for a future iteration?
components/cpp/prophet-wasmstan/prophet_wasmstan.h (3)
99-102
: Enhance documentation with JSON schema details.While the documentation clearly indicates this is for JSON data, it would be helpful to include:
- The expected JSON schema or a reference to it
- An example of valid JSON input
- Any specific formatting requirements or constraints
210-210
: Document performance implications of JSON parsing.As mentioned in the PR objectives, this JSON-based approach is a temporary fix with known performance implications due to JSON stringification and parsing overhead. Consider adding a TODO comment to track the future optimization work.
Add a comment like:
// TODO: This JSON-based approach is temporary. Future optimization needed to avoid // JSON stringification and parsing overhead.
Line range hint
99-257
: Consider documenting migration strategy from JSON approach.While this JSON-based implementation provides a working solution by leveraging Stan's existing parsing mechanisms, it would be beneficial to:
- Document the rationale for choosing this approach in a comment block
- Outline the planned migration path to a more performant solution
- Create tracking issues for the future optimization work
This will help future maintainers understand the context and planned improvements.
components/js/prophet-wasmstan/prophet-wasmstan.test.js (2)
Line range hint
1-4858
: Improve test readability and maintainability.The test data initialization could be improved in several ways:
- Add JSDoc comments explaining the purpose and structure of the test data
- Consider moving the large data arrays to separate fixture files
- Add descriptive constants for magic numbers like array sizes
Example structure:
/** * Test fixtures for prophet-wasmstan optimization * @typedef {Object} ProphetTestData * @property {number} T - Number of time points * @property {number} S - Number of seasons * @property {number} K - Number of components * @property {number[]} y - Target values * @property {number[]} t - Time points * @property {number[]} t_change - Change points * @property {number[][]} X - Feature matrix */ const CONSTANTS = { TIME_POINTS: 480, SEASONS: 25, COMPONENTS: 6 }; // Move large arrays to separate files import { timeSeriesData } from './fixtures/time-series-data'; import { featureMatrix } from './fixtures/feature-matrix';
Line range hint
4859-4960
: Address floating-point precision issues and improve test robustness.The test has commented out assertions for important parameters due to potential float precision issues with WebAssembly. This needs to be investigated and properly handled.
Consider these improvements:
- Add error handling for the optimize function:
test('smoke test Prophet model', async () => { try { const { params } = await optimize(inits, JSON.stringify(data), { seed: 1237861298, iter: 10000, refresh: 100 }); // assertions... } catch (error) { fail(`Optimization failed: ${error.message}`); } });
- Add tolerance configuration for float comparisons:
const FLOAT_TOLERANCE = 1e-4; expect.extend({ toAllBeCloseTo: (received, expected, tolerance = FLOAT_TOLERANCE) => { // ... comparison logic with configurable tolerance } });
- Consider splitting the test into smaller, focused test cases for each parameter type.
components/cpp/prophet-wasmstan/optimizer.cpp (1)
264-264
: Consider renaming parameterend
tosize
inmembuf
constructor for clarityIn the
membuf
constructor, the parameterend
represents the size of the buffer rather than an end pointer. Renaming it tosize
can improve code readability and prevent confusion.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (3)
components/js/prophet-wasmstan/package-lock.json
is excluded by!**/package-lock.json
components/package-lock.json
is excluded by!**/package-lock.json
crates/augurs-js/testpkg/package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (10)
- components/cpp/prophet-wasmstan/optimizer.cpp (5 hunks)
- components/cpp/prophet-wasmstan/prophet_wasmstan.cpp (4 hunks)
- components/cpp/prophet-wasmstan/prophet_wasmstan.h (5 hunks)
- components/cpp/prophet-wasmstan/wit/prophet-wasmstan.wit (2 hunks)
- components/js/prophet-wasmstan/package.json (1 hunks)
- components/js/prophet-wasmstan/prophet-wasmstan.test.js (4 hunks)
- components/justfile (1 hunks)
- crates/augurs-js/Cargo.toml (1 hunks)
- crates/augurs-js/src/prophet.rs (11 hunks)
- crates/augurs-js/testpkg/package.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- components/js/prophet-wasmstan/package.json
🔇 Additional comments (15)
crates/augurs-js/testpkg/package.json (1)
7-7
: LGTM! Version update aligns with requirements.The update to
^0.2.0
satisfies the previous review request. The caret (^
) allows updates to minor/patch versions while maintaining compatibility with 0.2.x, which is appropriate for this testing package.crates/augurs-js/Cargo.toml (1)
40-40
: LGTM: serde_json addition aligns with JSON data handling requirements.The addition of
serde_json
is appropriate for implementing the Stan'sjson_data
var context approach.components/cpp/prophet-wasmstan/prophet_wasmstan.h (2)
225-225
: LGTM! Function signature properly updated.The updated signature maintains proper error handling while transitioning to the JSON-based approach. The parameter order and types are logical and consistent with the rest of the codebase.
237-238
: Verify memory management in the implementation.The memory management functions are correctly declared for the new JSON type. Since these will handle JSON strings, ensure the implementation properly delegates to
prophet_wasmstan_string_free
to prevent memory leaks.Also applies to: 257-257
✅ Verification successful
Memory management implementation verified and correct
The implementation in
prophet_wasmstan.cpp
correctly delegates toprophet_wasmstan_string_free
as recommended, ensuring proper memory cleanup for the JSON data type.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify memory management implementation for JSON data types # Test: Check if the free functions properly delegate to string_free ast-grep --pattern $'void augurs_prophet_wasmstan_types_data_json_free(augurs_prophet_wasmstan_types_data_json_t *ptr) { $$$ prophet_wasmstan_string_free($$$); $$$ }'Length of output: 173
Script:
#!/bin/bash # Let's try a different approach to find the implementation # First, find all .c files that might contain the implementation fd -e c # Then use ripgrep to find the implementation of the free function rg -A 5 "void augurs_prophet_wasmstan_types_data_json_free"Length of output: 1264
components/cpp/prophet-wasmstan/optimizer.cpp (1)
276-276
: Verify that all calls toexports_augurs_prophet_wasmstan_optimizer_optimize
are updated to match the new signatureSince the function signature has changed to accept
data
asexports_augurs_prophet_wasmstan_optimizer_data_json_t*
, please ensure that all call sites have been updated accordingly to prevent potential compilation errors.Run the following script to verify:
✅ Verification successful
Based on the search results, I can see that there is only one call to
exports_augurs_prophet_wasmstan_optimizer_optimize
incomponents/cpp/prophet-wasmstan/prophet_wasmstan.cpp
at line 1024, and it's being called with the correct signature including thedata
parameter as&arg13
. The function declaration inprophet_wasmstan.h
also matches the implementation signature.All calls to
exports_augurs_prophet_wasmstan_optimizer_optimize
are correctly updated with the new signatureThe function is called with the correct parameters matching its declaration:
exports_augurs_prophet_wasmstan_optimizer_optimize(&arg12, &arg13, &arg14, &ok, &err)
. No issues were found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all calls to `exports_augurs_prophet_wasmstan_optimizer_optimize` and check their arguments. # Search for function calls in the codebase rg 'exports_augurs_prophet_wasmstan_optimizer_optimize\(' -A 5Length of output: 2531
components/cpp/prophet-wasmstan/prophet_wasmstan.cpp (2)
114-116
: Properly free JSON data structuresThe
augurs_prophet_wasmstan_types_data_json_free
function correctly frees the JSON data by callingprophet_wasmstan_string_free(ptr);
. This ensures that the allocated memory for the JSON data is properly released, preventing memory leaks.
386-386
: Verify correct casting and initialization of data structuresAt line 386, the assignment to
arg13
involves complex casting and pointer arithmetic:exports_augurs_prophet_wasmstan_optimizer_data_json_t arg13 = (prophet_wasmstan_string_t) { (uint8_t*)(*((uint8_t **) (arg + 40))), (*((size_t*) (arg + 44))) };Ensure that the casting accurately reflects the data's layout in memory and that the alignment requirements are met. Misalignment or incorrect casting can result in undefined behavior.
Suggested action:
Review the data structure definitions and confirm that the offsets (
arg + 40
andarg + 44
) correspond to the expected positions of the string pointer and its length. Consider usingmemcpy
for safer data extraction if alignment might be an issue.crates/augurs-js/src/prophet.rs (8)
80-80
: Ensure the JavaScript optimizer function handles the JSON data correctlyWith
data
now being passed as a JSON string (data_s
), verify that the JavaScript optimizer functionfunc
correctly parses and utilizes this data. Ensure that all instances wherefunc
is used are updated to accommodate the new data format.
141-148
: Addition ofopts
parameter enhances flexibility infit
methodThe updated signature of the
fit
method with the optionalopts
parameter allows for more flexible optimization configurations. This change makes the function more adaptable to different optimization scenarios.
238-256
: Conversion implementation appears correctThe
impl From<OptimizeOptions> for augurs_prophet::optimizer::OptimizeOpts
correctly maps all fields fromOptimizeOptions
toOptimizeOpts
. This ensures that optimization options are accurately translated between types.
Line range hint
300-331
: EnumAlgorithm
mapping is consistentThe definition and conversions for the
Algorithm
enum are properly set up, ensuring consistency between the JavaScript and Rust representations of the optimization algorithms.
356-393
: Verify compatibility ofData
struct changes with JavaScript bindingsChanging the
Data
struct fields fromFloat64Array
andInt32Array
toVec<f64>
andVec<i32>
simplifies the data structure in Rust. However, please verify that this change remains compatible with JavaScript bindings and that data serialization between Rust and JavaScript functions as intended.
400-406
: Refer to previous TODO about creating a newtype forProphetStanDataJSON
As previously noted, consider creating a newtype for the
data
argument to improve type safety and add documentation about required fields. This will enhance code clarity and maintainability.
407-412
: Refer to previous TODO regarding unused code commentsThere is commented-out code that is intended for future use. As mentioned earlier, adding comments to explain why this code is currently unused will help other developers understand the context and future plans.
583-586
: MakingOptimizedParams
public enables external accessExposing
OptimizedParams
as a public struct allows it to be accessed outside the module, which can be beneficial for users who need to interact with the optimized parameters directly.
@@ -37,6 +37,7 @@ console_error_panic_hook = "0.1.7" | |||
getrandom = { version = "0.2.10", features = ["js"] } | |||
js-sys = "0.3.64" | |||
serde.workspace = true | |||
serde_json = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Add documentation for JSON implementation and WASM flags
The codebase shows that:
- The
serde_json
dependency is used without any documentation about its temporary nature - The WASM optimization flags are unique to this package and lack proper documentation about their purpose and compatibility implications
Please make the following changes:
- Add a comment above the
serde_json
dependency explaining its temporary status - Document the WASM optimization flags as suggested in the original review
- Consider creating a dedicated section in the package documentation about the optimization strategy
🔗 Analysis chain
Document temporary nature of JSON implementation and verify WASM optimization flags.
The JSON-based implementation is noted as temporary in the PR description due to performance concerns. Consider:
- Adding a comment in the Cargo.toml explaining this is a temporary solution
- Verifying that the new WASM optimization flags (
--enable-bulk-memory
,--enable-threads
) are compatible with all target environments
Consider adding a comment above the wasm-opt configuration:
[package.metadata.wasm-pack.profile.release]
-# previously had just ['-O4']
+# TODO: This configuration includes additional optimization flags to help with JSON parsing
+# performance. This might need revision when moving away from the temporary JSON-based
+# solution.
wasm-opt = ['-O4', '--enable-bulk-memory', '--enable-threads']
Also applies to: 53-54
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if these optimization flags are used in related packages or if this is an isolated change
# Test: Look for other wasm-opt configurations in the codebase
rg -l "wasm-opt"
# Test: Check for any existing documentation about WASM optimization strategy
fd -g "*.md" -X rg -l "wasm-opt|bulk-memory|threads"
Length of output: 99
Script:
#!/bin/bash
# Let's check the PR description and related documentation for context about the JSON implementation
fd -g "*.md" -X rg -i "json.*temporary|temporary.*json"
# Check for any existing WASM configuration documentation
fd -g "*.md" -X rg -i "wasm.*configuration|wasm.*optimization"
# Look at the actual wasm-opt configuration in the Cargo.toml
rg "wasm-opt" -A 2 -B 2
# Check for any other JSON-related dependencies or configurations
rg "serde_json|json" crates/augurs-js/Cargo.toml
Length of output: 501
let data = | ||
serde_json::to_string(&data).map_err(augurs_prophet::optimizer::Error::custom)?; | ||
let data_s = JsValue::from_str(&data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider the performance impact of serializing data to JSON
Serializing data
to a JSON string using serde_json::to_string(&data)
may introduce performance overhead, especially with large datasets. This could affect the efficiency of the optimization process. In future iterations, consider using a more efficient data passing method that avoids the serialization step.
This is a hopefully temporary fix for the WASM-based optimizer which avoids manually creating an
array_var_context
. Instead the WASM component's API accepts a JSON string containing the data (similar to howcmdstan
accepts a JSON file) and reuses the parsing thatcmdstan
uses, because it was a bit of a nightmare getting the data in the right format.In future I'd like to revisit this and fix it since the string/JSON API kinda sucks and it's definitely less performant (needing to stringify/parse JSON isn't great), but this at least gets things working again.
Commits:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Chores