Performance hog class ResourceSerializer needs refactoring #62

mtravnicek · 2019-12-24T14:06:34Z

When producing moderately small IIIF manfests (monograph with about 35 pages/images) using REST controller in Spring, added latency about 15ms can be noticed in process of serializing using Jackson. At first I thought that this was Jackson being slow... but slowdown can be traced down to custom serializer ResourceSerializer namely line 110 where replacing gen.writeObject with gen.writeString fixes most of the latency. This can be blatantly seen in Chrome response view as blue tail of time to download content.

iiif-apis/src/main/java/de/digitalcollections/iiif/model/jackson/serialization/ResourceSerializer.java

Line 110 in cd92524

gen.writeObject(value.getIdentifier().toString());

Rest of processing in this class namely ModelUtilities.getCompleteness and following decoding/if statements also should be rethinked. Because you are calling this code (reflection!) on every processed element in JSON serialization again and again - it is not cached anywhere. What is the purpose of this code? I reckon that you try to determine if elements should be unwrapped to list or not, but for example "on" #25 , do you ever want it written out in exploded form? Does it have to be defined as (anonymous?) resource in contrast to context represented as simple string (no resource) #36 ? So the whole model approach should be reanalyzed and these flags to unwrap element or not - set it in form or some static flags in Resource instance itself, do not reverse engineer it at runtime!
Perhaps even authors of Jackson can look into/share some opinion on this: @cowtowncoder

cowtowncoder · 2019-12-24T22:21:30Z

From quick look, wrt suggested change: absolutely yes one should just do:

gen.writeString(value.getIdentifier().toString());

instead of using writeObject() -- writeObject() will incur higher overhead as it will call ObjectMapper, locate serializer for String, check some of the settings but essentially in the end just calling JsonGenerator.writeString() ("Object" in writeObject() simply means that value can be arbitrary POJO that Jackson supports).
The only (?) possible but unlikely concern is if there is some use of filtering capabilities of ObjectMapper, overrides, or such, they would be by-passed. That seems unlikely given value is String.

While this makes sense to do and should be easy improvement, I would not have assumed it to have huge performance implications... always interesting to see where changes matter, where not. :)

I can't comment much about surrounding design as I do not really know the implementation.

mtravnicek · 2020-01-25T16:18:26Z

@jbaiter @bitzl @morpheus-87 @datazuul
Hello can you please submit fix for this, atleast that one line replacement of gen.writeObject with gen.writeString

jbaiter · 2020-01-27T08:46:28Z

Thanks for the excellent in-depth analysis @mtravnicek and @cowtowncoder!
I'll look into it, expect a new release in the next few days.

What is the purpose of this code? I reckon that you try to determine if elements should be unwrapped to list or not, but for example "on" #25 , do you ever want it written out in exploded form?

Yes, that's exactly the purpose. The exploded form is e.g. used in collections where you want the type, the identifier and the labels of a given resource.

Does it have to be defined as (anonymous?) resource in contrast to context represented as simple string (no resource) #36 ? So the whole model approach should be reanalyzed and these flags to unwrap element or not - set it in form or some static flags in Resource instance itself, do not reverse engineer it at runtime!

I agree that the code is an unholy mess and should be replaced with something better, but I'm not exactly sure how to go about it. The IIIF spec has a really large surface area, and for determining whether to unwrap a resource or not, we have to know if any of the allowed fields is set to a non-null or non-empty value. This is information that is only known at runtime, so this this means either:

Add a getCompleteness method to every type that checks those fields during runtime (advantage: no more reflection), but this is a lot of boilerplate code.
Manage a isComplex flag on all resources that gets updated with every setter (possible advantage: less runtime cost), also adds a ton of boilerplate code

I'll be doing some measurements on how much of this reflection-stuff is hurting performance and decide based on that whether to find a better way.

jbaiter · 2020-01-27T14:53:49Z

So I did some JMH benchmarking with a simple test case that creates a manifest with 128 canvases (and should hit the line 128 times), here are the results:

With .writeString():

Benchmark                              Mode  Cnt    Score   Error  Units
ManifestBench.testManifestGeneration  thrpt   25  53.258 ± 2.337  ops/s

With .writeObject():

Benchmark                              Mode  Cnt    Score    Error  Units
ManifestBench.testManifestGeneration  thrpt   25  51.479 ± 2.416  ops/s

There is no difference outside of the statistical margin with my benchmark :-/ Can you maybe provide one of your pathological samples so I can do more profiling?

As for the reflection stuff, that is a pretty huge performance hog indeed! I'll try to refactor it with one of the above solutions (or another one, if you have a good idea!).

edit: So I experimented a bit with both solutions, and neither of them is really satisfying as they both result in TONS of boilerplate code. I think the whole API is up for a more large-scale refactoring. Ideally we'd have something like a ResourceReference type that only has an id and optionally a label that you could pass in for any of the more specific types like Canvas, Annotation, etc. I think I'll work on that once I get around to implementing v3 of the IIIF APIs in the library (wich will probably be in Q2 or Q3 of 2020).

cowtowncoder · 2020-01-27T16:49:20Z

Ok. I am not entirely surprised by change not resulting in statistically meaningful improvement -- I know writeObject() does have additional overhead, but depending on relative sizes of content and so on it did not seem like a red flag. Profilers are known to sometimes point to red herrings, so measurements on real (or reasonable approximations) data are useful and necessary.

mtravnicek · 2020-03-02T20:08:23Z

Sorry for later response.... We are using your codebase (thanks) in this digital library app for generating iiif manifests: https://github.com/moravianlibrary/krameriusiiif
I have to admit that recorded slowdowns in milliseconds happen at my fairly slow notebook which has like i5-6200u cpu, so on decently fast desktop it may be on margin of error...
I have no benchmarks results only screenshots - i was running this spring boot app and Chrome with devtools in latency breakdown, then unusual blue "tail" representing Content Download latency in some ms was seen, so i was trying out everyting including Jackson Afterburner etc.. until i managed to pinpoint culprit.

(Here is some page explaining this tool)
https://blog.logrocket.com/visualizing-backend-performance-in-the-chrome-devtools-bb6fd232540/

When i tried to optimize related serializer class, then Content Download value went back to almost 0 which is normal on localhost and overall latency decreased, which was confirmed by in app measuring. So please replace this oneliner of gen.writeObject to gen.writeString it wont hurt your code (getIdenifier.toString()) and may help me :) and maybe others....
I presume that the latency is shifting into Content Download because the server response is already halfway prepared, but it stumbles on while converting/serializing...

jbaiter added a commit that referenced this issue Apr 24, 2020

Fix ResourceSerializer performance problem (fixes #62)

20556fd

jbaiter mentioned this issue Apr 24, 2020

Fix ResourceSerializer performance problem (fixes #62) #80

Merged

jbaiter closed this as completed in fb87875 Apr 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance hog class ResourceSerializer needs refactoring #62

Performance hog class ResourceSerializer needs refactoring #62

mtravnicek commented Dec 24, 2019

cowtowncoder commented Dec 24, 2019

mtravnicek commented Jan 25, 2020

jbaiter commented Jan 27, 2020 •

edited

Loading

jbaiter commented Jan 27, 2020 •

edited

Loading

cowtowncoder commented Jan 27, 2020

mtravnicek commented Mar 2, 2020

Performance hog class ResourceSerializer needs refactoring #62

Performance hog class ResourceSerializer needs refactoring #62

Comments

mtravnicek commented Dec 24, 2019

cowtowncoder commented Dec 24, 2019

mtravnicek commented Jan 25, 2020

jbaiter commented Jan 27, 2020 • edited Loading

jbaiter commented Jan 27, 2020 • edited Loading

cowtowncoder commented Jan 27, 2020

mtravnicek commented Mar 2, 2020

jbaiter commented Jan 27, 2020 •

edited

Loading

jbaiter commented Jan 27, 2020 •

edited

Loading