Add SimdJsonParser2 based on bitindex #60

Open, wants to merge 1 commit into main

@heykirby commented Oct 4, 2024

issue: #59

@heykirby force-pushed the feature_simdjson2 branch 5 times, most recently from 5c92d47 to 3139b2c on October 7, 2024 09:53
@heykirby (Author) commented Oct 19, 2024

@arouel Thanks very much, I have fixed the code based on your suggestions.
When the paths to be parsed are known in advance, SimdJsonParserWithFixPath performs better (about 1.8x the throughput of the SimdJson benchmark and about 5.9x that of Jackson in the results below) and supports returning map and list values compressed into strings. It can quickly skip paths that do not need to be parsed and avoids creating a JSON node instance for every node in the document.

Benchmark results, for reference (vector species: Species[byte, 32, S_256_BIT]):

Result "org.simdjson.AParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_Jackson":
693.528 ±(99.9%) 18.073 ops/s [Average]
(min, avg, max) = (687.806, 693.528, 699.113), stdev = 4.694
CI (99.9%): [675.455, 711.601] (assumes normal distribution)

Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJson":
2258.495 ±(99.9%) 41.596 ops/s [Average]
(min, avg, max) = (2242.400, 2258.495, 2269.942), stdev = 10.802
CI (99.9%): [2216.899, 2300.091] (assumes normal distribution)

Result "org.simdjson.ParseAndSelectFixPathBenchMark.parseMultiValuesForFixPaths_SimdJsonParserWithFixPath":
4075.984 ±(99.9%) 104.804 ops/s [Average]
(min, avg, max) = (4029.568, 4075.984, 4100.273), stdev = 27.217
CI (99.9%): [3971.180, 4180.789] (assumes normal distribution)

@piotrrzysko (Member)

How is this different from the On-Demand parsing available in the C++ version of simdjson?

I introduced a form of on-demand parsing in #51 (see: org.simdjson.OnDemandJsonIterator). The API requires specifying a target class to which the JSON will be parsed. However, it should be relatively easy to extend this to support a DOM-like API (JsonValue, JsonIterator, etc.), which I believe is more intuitive than introducing syntax for accessing fields and then returning an array of strings with the corresponding values.

@arouel commented Oct 21, 2024

@piotrrzysko I agree with you: a DOM-like API (JsonValue, JsonIterator, etc.) would be very helpful in use cases where only specific parts of the JSON are conditionally relevant, so that mapping to an object would cause allocations you want to avoid.

Can you guide us a bit, so that we can prepare a PR?

@arouel left a comment

@heykirby I just want to share some thoughts/questions:

With some minor API changes in simdjson-java, could we keep SimdJsonParserWithFixPath in another codebase, or could it live in a contrib module, since it is tailored to a very specific use case?

Wouldn't a JsonNode record be sufficient instead of using Lombok?

Review thread on src/main/java/org/simdjson/SimdJsonParser.java (outdated, resolved)
@heykirby (Author) commented Oct 22, 2024

@arouel Thanks, arouel. The unused imports have been removed.

@heykirby (Author)

Hello, piotrrzysko. I have used on-demand parsing: it is a very convenient and efficient way to deserialize JSON strings into Java classes, and it is also the approach offered by many mainstream JSON SDKs.
However, this approach requires defining Java classes before any field can be parsed, which is inconvenient for users, especially with deep paths. For example, to read the field at $.a.b.c.d, we first have to define nested classes (class a { class b { class c { class d } } }) and only then parse the value. Moreover, every time a JSON string is parsed, an instance has to be created for each node, which may hurt performance on large-scale data.
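
To make the $.a.b.c.d example concrete, a class-based target could look like the Java records below. This is only an illustration: the type names are hypothetical, and it assumes records as the target types. Only the leaf d is wanted, yet every intermediate level needs its own type, and, as described above, a schema-based parser would create one object per level for each parsed document.

```java
class NestedSchemaExample {
    // Hypothetical types: one per level of the path $.a.b.c.d
    record Root(A a) {}
    record A(B b) {}
    record B(C c) {}
    record C(String d) {}

    public static void main(String[] args) {
        // What a class-based parser would have to build for every document
        Root root = new Root(new A(new B(new C("value"))));
        System.out.println(root.a().b().c().d()); // prints value
    }
}
```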

With SimdJsonParserWithFixPath, to get the values for multiple paths such as [$.a.c, $.a, $.a.d, $.b], we only need to provide the JSON paths; the usage is similar to Hive's user-defined function json_tuple. It can also return the values of a container object's children while returning the compressed string value of the container object itself.
The path tree is created only once, during initialization, and the result array is reused each time a JSON string is parsed. With large amounts of data, this avoids repeatedly creating and discarding class instances, which gives a performance advantage.
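
A minimal sketch of the json_tuple-style flow described above, for illustration only. FixPathSketch and its methods are hypothetical names; this is not the PR's SimdJsonParserWithFixPath, and it uses plain character scanning instead of simdjson's bit index. It assumes well-formed input and simple dotted paths. The path tree is built once in the constructor. Each parse() call then descends only into objects that lie on a requested path, skips everything else without materializing it, and writes matches into the same result array.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical json_tuple-style extractor; NOT the PR's SimdJsonParserWithFixPath.
// Plain character scanning (no SIMD, no bit index); assumes well-formed JSON, no error handling.
class FixPathSketch {
    // Path-tree node: slot >= 0 means the value at this path is stored in result[slot].
    static final class PathNode {
        final Map<String, PathNode> children = new HashMap<>();
        int slot = -1;
    }

    private final PathNode root = new PathNode();
    private final String[] result; // allocated once, reused by every parse() call
    private String json;
    private int pos;

    // Builds the path tree once; only simple dotted paths such as "$.a.c" are supported.
    FixPathSketch(String... paths) {
        result = new String[paths.length];
        for (int i = 0; i < paths.length; i++) {
            PathNode node = root;
            for (String key : paths[i].substring(2).split("\\.")) {
                node = node.children.computeIfAbsent(key, k -> new PathNode());
            }
            node.slot = i;
        }
    }

    String[] parse(String json) {
        this.json = json;
        this.pos = 0;
        Arrays.fill(result, null); // paths absent from this document stay null
        value(root);
        return result;
    }

    // node == null means no requested path lies below this value, so it is only skipped.
    private void value(PathNode node) {
        skipWs();
        int start = pos;
        char c = json.charAt(pos);
        if (c == '{' && node != null && !node.children.isEmpty()) {
            object(node); // descend: some requested path continues below this object
        } else if (c == '{' || c == '[') {
            skipContainer();
        } else if (c == '"') {
            skipString();
        } else { // number, true, false or null
            while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) pos++;
        }
        if (node != null && node.slot >= 0) {
            String raw = json.substring(start, pos); // containers keep their raw text
            result[node.slot] = c == '"' ? raw.substring(1, raw.length() - 1) : raw;
        }
    }

    private void object(PathNode node) {
        pos++; // '{'
        skipWs();
        if (json.charAt(pos) == '}') { pos++; return; }
        while (true) {
            skipWs();
            int keyStart = pos + 1;
            skipString();
            String key = json.substring(keyStart, pos - 1); // escapes are not decoded
            skipWs();
            pos++; // ':'
            value(node.children.get(key)); // unrequested key -> null -> value is skipped
            skipWs();
            if (json.charAt(pos++) == '}') return; // otherwise the char consumed was ','
        }
    }

    private void skipContainer() {
        int depth = 0;
        do {
            char c = json.charAt(pos);
            if (c == '"') { skipString(); continue; } // brackets inside strings are ignored
            if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']') depth--;
            pos++;
        } while (depth > 0);
    }

    private void skipString() {
        pos++; // opening quote
        while (json.charAt(pos) != '"') pos += json.charAt(pos) == '\\' ? 2 : 1;
        pos++; // closing quote
    }

    private void skipWs() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        FixPathSketch parser = new FixPathSketch("$.a.c", "$.a", "$.a.d", "$.b");
        String doc = "{\"a\": {\"c\": \"x\", \"d\": [1, 2], \"e\": 3}, \"b\": 42, \"z\": {\"q\": 1}}";
        System.out.println(Arrays.toString(parser.parse(doc)));
    }
}
```

For the sample document, main prints [x, {"c": "x", "d": [1, 2], "e": 3}, [1, 2], 42]. $.a yields the raw text of the container (this sketch does not strip whitespace) while its children $.a.c and $.a.d are still extracted, and the unrequested key z is skipped. Because the result array is reused, callers must copy any values they want to keep before the next parse() call.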

@piotrrzysko (Member)

Hi, sorry for the delayed reply.

@heykirby
What I meant was that we can introduce on-demand parsing for a DOM-like API, which would significantly reduce the need for creating new objects. In fact, we could have a single instance of something like OnDemandJsonValue, which would be mutable and traverse a parsed JSON under the hood (likely leveraging org.simdjson.OnDemandJsonIterator).

The schema-based API you’re referring to is simply using logic that could potentially be utilized by the on-demand DOM API as well.
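
A minimal sketch of the "single mutable value" idea, assuming a forward-only cursor in the spirit of simdjson's On-Demand API. OnDemandCursor, reset, field, asLong and asString are hypothetical names, not simdjson-java's API, and the sketch scans raw text instead of building on org.simdjson.OnDemandJsonIterator. It supports only integer and string leaves and assumes well-formed input. Every navigation step mutates and returns the same instance, so no per-node objects are created.

```java
// Hypothetical on-demand cursor; NOT simdjson-java's API. One instance is reused for
// every document and every navigation step; it only moves forward through the text.
final class OnDemandCursor {
    private String json;
    private int pos;

    // Points this same instance at the root of a new document; nothing else is allocated.
    OnDemandCursor reset(String json) {
        this.json = json;
        this.pos = 0;
        skipWs();
        return this;
    }

    // Assumes the cursor is on an object; moves it to the value of `name`, skipping other fields.
    OnDemandCursor field(String name) {
        pos++; // '{'
        while (true) {
            skipWs();
            if (json.charAt(pos) == '}') throw new IllegalArgumentException("missing field: " + name);
            int keyStart = pos + 1;
            skipString();
            int keyLen = pos - 1 - keyStart;
            skipWs();
            pos++; // ':'
            skipWs();
            // Compare the key in place instead of materializing it as a String.
            if (keyLen == name.length() && json.regionMatches(keyStart, name, 0, keyLen)) return this;
            skipValue();
            skipWs();
            if (json.charAt(pos) == ',') pos++;
        }
    }

    long asLong() { // integers only in this sketch
        int start = pos;
        while (pos < json.length() && (json.charAt(pos) == '-' || Character.isDigit(json.charAt(pos)))) pos++;
        return Long.parseLong(json, start, pos, 10);
    }

    String asString() {
        int start = pos + 1;
        skipString();
        return json.substring(start, pos - 1); // escapes are not decoded in this sketch
    }

    private void skipValue() {
        char c = json.charAt(pos);
        if (c == '"') { skipString(); return; }
        if (c != '{' && c != '[') { // number, true, false or null
            while (pos < json.length() && ",}] \t\r\n".indexOf(json.charAt(pos)) < 0) pos++;
            return;
        }
        int depth = 0;
        do {
            char d = json.charAt(pos);
            if (d == '"') { skipString(); continue; }
            if (d == '{' || d == '[') depth++;
            else if (d == '}' || d == ']') depth--;
            pos++;
        } while (depth > 0);
    }

    private void skipString() {
        pos++; // opening quote
        while (json.charAt(pos) != '"') pos += json.charAt(pos) == '\\' ? 2 : 1;
        pos++; // closing quote
    }

    private void skipWs() {
        while (pos < json.length() && Character.isWhitespace(json.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        OnDemandCursor v = new OnDemandCursor();
        String doc = "{\"x\": [1, {\"y\": 2}], \"a\": {\"b\": {\"c\": {\"d\": 7}}}}";
        System.out.println(v.reset(doc).field("a").field("b").field("c").field("d").asLong()); // prints 7
    }
}
```

Compared with the path-list sketch, the caller navigates explicitly instead of declaring paths up front. Being forward-only, the cursor cannot revisit a field it has already passed without calling reset again.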

@arouel, in reply to "Can you guide us a bit, so that we can prepare a PR?":

I’d be happy to help. Perhaps I could start by creating a skeleton of the on-demand DOM API.

@heykirby (Author)

Thanks, piotrrzysko. This is a feature we have long been looking forward to.

@heykirby (Author)

@piotrrzysko I have submitted a new PR, #63. Could you give me some guidance?

3 participants