suggestions about its performance / addition to the benchmarks #8
Hi @sugawarayuuta, thanks for the discussion! For the benchmarks in "github.com/go-json-experiment/jsonbench", I was able to run your module on a 1st generation Macbook Air with an M1 chipset and got:
All JSONv2 numbers are normalized to 1, while "sonnet" is shown as relative to JSONv2. Higher number is slower, lower number is faster. Analysis:
Staged pooling only works if you know the expected size up-front or can accurately predict it. One of the most difficult cases is figuring out what size of buffer to use as the output for …
An early prototype of this module actually used bit-wise operations for processing JSON strings. In isolation, it is faster, but has several issues:
As much as possible, we'd like to avoid re-inventing the wheel. If a custom hashmap is better than the builtin Go map, …
Thank you for the serious consideration. I am honestly happy that it did not give bad results with concrete typed encoding/decoding, which was my top priority.
I think what you're saying applies only if you do the parsing in two stages, tokenizing and then the actual decoding, which requires iterating twice. In contrast, I only do it once (although I don't know whether this is a good idea; remember, this is still in heavy development). It's possible by first finding one of the double quotes, control characters, or backslashes (escape characters). In this way, you can read 8 bytes at once as long as the buffer doesn't end at the cursor, which it usually doesn't.
The perfect hash function (what I use) is possible because it assumes that the key/value pairs are known from the start (before the first Get/Put), as far as I know, and is not something that can easily be merged into the Go standard library.
Empirically, most JSON strings have no escape characters, so they can be "decoded" by simply truncating off the surrounding double-quotes. The JSONv2 implementation has a single bit to track whether a consumed JSON string contained any escape characters. If none are present, then the decode phase does the simple and fast O(1) operation.
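A minimal sketch of that fast path (illustrative only; consumeString and decodeSimple are made-up names here, not the actual JSONv2 code):

// consumeString scans a JSON string starting at src[0] == '"'. It returns
// the number of bytes consumed (including both quotes) and a single bit
// recording whether any backslash escape was seen. It returns n == 0 if
// the string is unterminated. (UTF-8 and control-character validation is
// elided to keep the sketch short.)
func consumeString(src []byte) (n int, hasEscapes bool) {
	if len(src) == 0 || src[0] != '"' {
		return 0, false
	}
	for i := 1; i < len(src); i++ {
		switch src[i] {
		case '"':
			return i + 1, hasEscapes
		case '\\':
			hasEscapes = true
			i++ // skip the escaped byte; real code validates \uXXXX etc.
		}
	}
	return 0, false
}

// If hasEscapes is false, "decoding" is an O(1) operation: trim the quotes.
// Only strings that actually contain escapes go through the slow unescaper.
func decodeSimple(raw []byte) []byte {
	return raw[1 : len(raw)-1]
}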
Performance is one of our goals, but not our main goal. Our primary goal is actually correctness and functionality, in which case maintainability becomes more important. Once the API and semantics have stabilized, we could consider custom hashmap implementations. We're more open to optimizations that buy a decent gain but have relatively little maintainability cost. As an aside, people care a lot about performance, but developer surveys have consistently shown that security and reliability rank as even more important. This is one of the reasons why we made the decision to forgo any use of "unsafe", even though it unfortunately leaves some performance on the table.
Thanks again for your answer. I hope I'm not wasting your time.
Then the situation is not that far from mine, and perhaps the SWAR approach can be implemented easily. When consuming strings, like you said you would, after consuming the first byte (always a double quote) you could instead search for the special characters (this can be implemented like the following):

import (
	"encoding/binary"
	"unicode/utf8"
)

const (
	// mapper has 1 mapped at every byte
	mapper uint64 = 0x0101010101010101
)

var (
	notSimpleTab = [256]bool{
		'\\': true,
		'"':  true,
	}
)

func init() {
	// Mark every byte outside the simple ASCII range: the loop starts at
	// utf8.RuneSelf (0x80), wraps around 0xFF to 0x00, and stops at ' '
	// (0x20), so it covers all non-ASCII bytes and all control characters.
	for char := byte(utf8.RuneSelf); char != ' '; char++ {
		notSimpleTab[char] = true
	}
}

// consumeSimpleString consumes the next JSON string per RFC 7159, section 7,
// but is limited to the grammar for an ASCII string without escape sequences.
// It returns 0 if the string is invalid or more complicated than a simple
// string, in which case consumeString should be called.
func consumeSimpleString(src []byte) int {
	var pos int
	if len(src) != 0 && src[0] == '"' {
		// starting quote
		pos++
		for {
			if len(src)-pos < 8 {
				// look for special characters OAAT (one at a time)
				for pos < len(src) && !notSimpleTab[src[pos]] {
					pos++
				}
				break
			}
			u64 := binary.LittleEndian.Uint64(src[pos:])
			// include u64 itself to find characters that are > unicode.MaxASCII
			u64 |= (u64 - (mapper * ' ')) | (u64 ^ (mapper * '"') - mapper) | (u64 ^ (mapper * '\\') - mapper)
			// algo: https://graphics.stanford.edu/~seander/bithacks.html#ValueInWord
			u64 &= mapper * 0x80
			if u64 != 0 {
				// index of the first special byte in this 8-byte chunk
				pos += int(((u64-1)&mapper*mapper)>>(64-8) - 1)
				break
			}
			pos += 8
		}
		if pos < len(src) && src[pos] == '"' {
			// ending quote
			pos++
			// first thing we found was a quote, valid and simple enough
			return pos
		}
	}
	// invalid or complicated string
	return 0
}

So I benchmarked this (before.txt is for the inlined approach and after.txt is for SWAR); the tested JSON files are here.
Sorry for the wall of text. BTW, is this considered a significant maintenance cost?
Ah yes, good point.
A ~2-8% gain is nice, but it also comes with a non-trivial amount of cost. My judgment is that we should hold off on adding this for now. The main focus at the present moment is figuring out the compatibility story between v1 and v2 (which is turning out to be a harder problem than everything done so far). In working on that, I've unfortunately needed to work around and/or temporarily undo some optimizations that were previously added. As mentioned before, the main focus is correctness and functionality. What are your thoughts on shelving this discussion until that work actually lands? I like performance gains, but I don't think now is the right time to focus on them.
Of course. Glad if it helped in any way. I am going to create a version that doesn't use unsafe.
That will be interesting for showing the performance difference of unsafe "reflect"-like operations vs using "reflect" itself.
Sure, no problem.
Hello. Sorry for being late. I burnt out after I made the version I posted months ago, but I finally made a version without unsafe/linkname(s). Although this is not a fair comparison against the previous version (I made some tweaks), it performed better than JSONv2 as well as the previous version. (Generally, "better" here means its runtime was shorter, for concrete types.)
Hi @sugawarayuuta. I hope you're doing well. I'm sorry to hear about the burnout; I've been in that situation myself before. I hope I'm not adding to your burden, but I was trying to run Sonnet through the latest iteration of the benchmarks and I'm getting test failures. If you patch go-json-experiment/jsonbench@b81ba2b with:

diff --git a/bench_test.go b/bench_test.go
index e9bf517..33dd43d 100644
--- a/bench_test.go
+++ b/bench_test.go
@@ -30,6 +30,7 @@ import (
gojson "github.com/goccy/go-json"
jsoniter "github.com/json-iterator/go"
segjson "github.com/segmentio/encoding/json"
+ sonnetjson "github.com/sugawarayuuta/sonnet"
"github.com/google/go-cmp/cmp"
"github.com/google/go-cmp/cmp/cmpopts"
@@ -118,6 +119,13 @@ var arshalers = []struct {
unmarshal: sonicjson.Unmarshal,
marshalWrite: func(w io.Writer, v any) error { return sonicenc.NewStreamEncoder(w).Encode(v) },
unmarshalRead: func(r io.Reader, v any) error { return sonicdec.NewStreamDecoder(r).Decode(v) },
+}, {
+ name: "SonnetJSON",
+ pkgPath: "github.com/sugawarayuuta/sonnet",
+ marshal: sonnetjson.Marshal,
+ unmarshal: sonnetjson.Unmarshal,
+ marshalWrite: func(w io.Writer, v any) error { return sonnetjson.NewEncoder(w).Encode(v) },
+ unmarshalRead: func(r io.Reader, v any) error { return sonnetjson.NewDecoder(r).Decode(v) },
}}
func TestRoundtrip(t *testing.T) {

I run into failures like:
Thank you very much for testing!
You don't need to test/benchmark this again if you have already done so, but I updated the repository to make the string validator for skipped strings more strict, resulting in …
Hey @sugawarayuuta, I apologize for the delay, but we've been focusing on getting the discussion for "encoding/json/v2" kickstarted. Great work with fixing your package to pass most of the tests! At the moment, we're focused on getting the API and correctness right, and then we'll come back to revisit performance. Unfortunately, performance is hard to nail down when we're making large-scale refactors to the entire package. For example, the change to split out "jsontext" caused a 5% performance regression across the board even though there was minimal logic change; just a lot of code movement, and I haven't had the chance to dig into the compiled code to understand exactly why. It is the right split from a code-maintenance perspective, even if it hurts performance. Given that the public discussion could result in yet more major changes to the API, I'd prefer to have the performance optimizations come after possible refactors. If you're up for it, it'd be good to revisit this after the dust has settled and see how we could incorporate some of your ideas. That said, we do prioritize readability and maintainability, so even something like SWAR would need to be carefully explained or written in a way that aids readability.
I came up with one particularly useful technique while writing my JSON parser. It was simple to implement, and importantly many optimizations followed naturally from it. I'm struggling to find any other JSON parser that does it, so I'm left in a confused state. The idea is simply that the String parser should always return a slice with cap >= 16:

if len(s) <= 16 {
	extended := s[:16:16]              // string parser guarantees that cap >= 16
	masks := masksForLens0to16[len(s)] // masks for zeroing the s[len:16] bytes
	var key uint64pair
	key.lo = binary.LittleEndian.Uint64(extended[0:]) & masks.lo
	key.hi = binary.LittleEndian.Uint64(extended[8:]) & masks.hi
	fieldParser, ok = fieldParsers16B.Get(key) // perfect hash table for fields with len(name) <= 16
}
if !ok { /* fallback lookup path */ }

This requires making the String parser start off by checking whether the remaining byte buffer has cap >= 16, and if not then it should … This idea of ensuring that the buffer has at least N accessible bytes also simplified handling "null", "infinity", "\uXXXX" and other multi-byte sequences. To wrap it up, this enabled a modest performance improvement while also simplifying parts of the parsing code.
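For reference, the two helpers the snippet relies on could look like this (a guess at a plausible definition; uint64pair and masksForLens0to16 are the names used above, everything else is assumed):

import "encoding/binary"

// uint64pair packs the first 16 bytes of a field name into two words so
// the pair can be used directly as a fixed-size hash-table key.
type uint64pair struct{ lo, hi uint64 }

// masksForLens0to16[n] keeps the first n bytes of a 16-byte chunk and
// zeroes the rest, so names shorter than 16 bytes compare equal even if
// the padding bytes behind them contain leftover buffer contents.
var masksForLens0to16 = func() (tab [17]uint64pair) {
	for n := 0; n <= 16; n++ {
		var buf [16]byte
		for i := 0; i < n; i++ {
			buf[i] = 0xFF
		}
		tab[n].lo = binary.LittleEndian.Uint64(buf[0:])
		tab[n].hi = binary.LittleEndian.Uint64(buf[8:])
	}
	return tab
}()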
Hello! This is my first issue ever, so go easy on me.
I have published a JSON library that I was thinking about during my high school breaks, and I hope to share the knowledge I have gained from it. If possible, I would also like to ask you to add it to your benchmarks.
The following articles describe the general idea:
The following is my repository:
There are three major optimizations I used, excluding unsafe things.

(Staged) sync.Pool
By setting up stages in the pool, slices etc. can be reused more reliably.
The idea is to create an array of pools and to specify the size of slices that each pool accepts.
My implementation is here.
This eliminates waste and at the same time filters out objects that are too large or too small.
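To make the staged-pool idea concrete, here is a minimal sketch (illustrative only; the power-of-two size classes and the names getBuf/putBuf are assumptions, not the actual sonnet code):

import "sync"

const maxClass = 20 // size classes hold buffers of 1<<0 .. 1<<(maxClass-1) bytes

// bufPools holds one sync.Pool per size class, so a buffer is only ever
// reused by requests of a similar size.
var bufPools [maxClass]sync.Pool

// class returns the smallest size class whose capacity (1<<c) fits n bytes.
func class(n int) int {
	c := 0
	for 1<<c < n && c < maxClass-1 {
		c++
	}
	return c
}

// getBuf returns an empty slice with a capacity of at least n.
func getBuf(n int) []byte {
	c := class(n)
	if 1<<c < n {
		return make([]byte, 0, n) // larger than the biggest class: don't pool
	}
	if v := bufPools[c].Get(); v != nil {
		return v.([]byte)[:0]
	}
	return make([]byte, 0, 1<<c)
}

// putBuf stores b in the largest class it can fully satisfy, so anything
// taken from that pool later is guaranteed to be big enough, and a small
// request never receives a wastefully large buffer.
func putBuf(b []byte) {
	if cap(b) == 0 {
		return
	}
	c := 0
	for c+1 < maxClass && 1<<(c+1) <= cap(b) {
		c++
	}
	bufPools[c].Put(b[:0])
}

The point is that each pool only hands out buffers that are guaranteed to fit its class, which is what "too large or too small" refers to above.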
Bit-wise operations (SWAR)
This can be used for multiple things, but I use it for byte searching. You can search multiple characters at once, so this can be efficient if assembly is not available.
My implementation is here.
This uses unsafe; however, you can still replace it with encoding/binary. The algorithms are described here.
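As an illustration of that point, here is the classic SWAR byte search written with encoding/binary instead of unsafe (a generic sketch, not the linked implementation; indexByteSWAR is a made-up name):

import (
	"encoding/binary"
	"math/bits"
)

const ones uint64 = 0x0101010101010101 // 0x01 repeated in every byte

// indexByteSWAR returns the index of the first occurrence of c in b,
// or -1 if c is not present. It inspects 8 bytes per iteration by
// loading them into a uint64 with encoding/binary.
func indexByteSWAR(b []byte, c byte) int {
	pos := 0
	for ; len(b)-pos >= 8; pos += 8 {
		w := binary.LittleEndian.Uint64(b[pos:])
		// XOR makes matching bytes zero; the classic "has zero byte"
		// trick then sets the high bit of every zero byte.
		m := w ^ (ones * uint64(c))
		zero := ((m - ones) &^ m) & (ones * 0x80)
		if zero != 0 {
			// The lowest set bit marks the first match; each byte is
			// 8 bits wide, so divide the bit position by 8.
			return pos + bits.TrailingZeros64(zero)/8
		}
	}
	for ; pos < len(b); pos++ { // tail: fewer than 8 bytes left
		if b[pos] == c {
			return pos
		}
	}
	return -1
}

For example, indexByteSWAR(data, '"') finds the next double quote eight bytes at a time, the same kind of scan that consumeSimpleString above performs for several special characters at once.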
Custom data structures
One thing I noticed while profiling the decoder is that lookups tend to be the bottleneck (in this case, checking whether a field with a given name exists). So I first created a hash map with the Robin Hood hashing algorithm, which was faster than the standard map because it's quite strong at handling keys that don't exist. I also tried cuckoo hashing, which has interesting insertion and O(1) lookup, but I ended up using a perfect hash function, since the key (field name) / value (field information) pairs are known from the start and don't change afterwards. I also used an OAAT FNV hash function to lowercase the field name and hash it at the same time. My implementation is (although it's probably too simple) here.
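A sketch of that fold-while-hashing idea, using the 64-bit FNV-1a constants (illustrative; the actual hash and table layout in sonnet may differ):

// hashFold hashes an ASCII-folded version of name in a single pass, so a
// case-insensitive field lookup needs neither a lowered copy of the key
// nor a second loop over its bytes.
func hashFold(name []byte) uint64 {
	const (
		offset64 uint64 = 14695981039346656037 // FNV-1a 64-bit offset basis
		prime64  uint64 = 1099511628211        // FNV-1a 64-bit prime
	)
	h := offset64
	for _, c := range name {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A' // fold to lowercase while hashing
		}
		h ^= uint64(c)
		h *= prime64
	}
	return h
}

Because the set of field names is fixed when the decoder for a struct type is built, the same function can be used both to construct the perfect hash table and to hash incoming keys during decoding.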
These are the optimizations I used.
Next, about the addition to the benchmarks: I created a repository for reference. The results of benchmarks taken on darwin/amd64 with an Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz are here; the code for the benchmark is in the same repository. Check it out if you're interested.
As you may already see, the library is quite strong at stream-decoding JSON thanks to the techniques I described above.
By the way, I made the library compatible with the standard encoding/json, so the usage is the same. Thank you for reading and have a good day.