Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hog class ResourceSerializer needs refactoring #62

Closed
mtravnicek opened this issue Dec 24, 2019 · 6 comments
Closed

Performance hog class ResourceSerializer needs refactoring #62

mtravnicek opened this issue Dec 24, 2019 · 6 comments

Comments

@mtravnicek
Copy link

When producing moderately small IIIF manfests (monograph with about 35 pages/images) using REST controller in Spring, added latency about 15ms can be noticed in process of serializing using Jackson. At first I thought that this was Jackson being slow... but slowdown can be traced down to custom serializer ResourceSerializer namely line 110 where replacing gen.writeObject with gen.writeString fixes most of the latency. This can be blatantly seen in Chrome response view as blue tail of time to download content.


Rest of processing in this class namely ModelUtilities.getCompleteness and following decoding/if statements also should be rethinked. Because you are calling this code (reflection!) on every processed element in JSON serialization again and again - it is not cached anywhere. What is the purpose of this code? I reckon that you try to determine if elements should be unwrapped to list or not, but for example "on" #25 , do you ever want it written out in exploded form? Does it have to be defined as (anonymous?) resource in contrast to context represented as simple string (no resource) #36 ? So the whole model approach should be reanalyzed and these flags to unwrap element or not - set it in form or some static flags in Resource instance itself, do not reverse engineer it at runtime!
Perhaps even authors of Jackson can look into/share some opinion on this: @cowtowncoder

@cowtowncoder
Copy link

From quick look, wrt suggested change: absolutely yes one should just do:

gen.writeString(value.getIdentifier().toString());

instead of using writeObject() -- writeObject() will incur higher overhead as it will call ObjectMapper, locate serializer for String, check some of the settings but essentially in the end just calling JsonGenerator.writeString() ("Object" in writeObject() simply means that value can be arbitrary POJO that Jackson supports).
The only (?) possible but unlikely concern is if there is some use of filtering capabilities of ObjectMapper, overrides, or such, they would be by-passed. That seems unlikely given value is String.

While this makes sense to do and should be easy improvement, I would not have assumed it to have huge performance implications... always interesting to see where changes matter, where not. :)

I can't comment much about surrounding design as I do not really know the implementation.

@mtravnicek
Copy link
Author

@jbaiter @bitzl @morpheus-87 @datazuul
Hello can you please submit fix for this, atleast that one line replacement of gen.writeObject with gen.writeString

@jbaiter
Copy link
Member

jbaiter commented Jan 27, 2020

Thanks for the excellent in-depth analysis @mtravnicek and @cowtowncoder!
I'll look into it, expect a new release in the next few days.

What is the purpose of this code? I reckon that you try to determine if elements should be unwrapped to list or not, but for example "on" #25 , do you ever want it written out in exploded form?

Yes, that's exactly the purpose. The exploded form is e.g. used in collections where you want the type, the identifier and the labels of a given resource.

Does it have to be defined as (anonymous?) resource in contrast to context represented as simple string (no resource) #36 ? So the whole model approach should be reanalyzed and these flags to unwrap element or not - set it in form or some static flags in Resource instance itself, do not reverse engineer it at runtime!

I agree that the code is an unholy mess and should be replaced with something better, but I'm not exactly sure how to go about it. The IIIF spec has a really large surface area, and for determining whether to unwrap a resource or not, we have to know if any of the allowed fields is set to a non-null or non-empty value. This is information that is only known at runtime, so this this means either:

  • Add a getCompleteness method to every type that checks those fields during runtime (advantage: no more reflection), but this is a lot of boilerplate code.
  • Manage a isComplex flag on all resources that gets updated with every setter (possible advantage: less runtime cost), also adds a ton of boilerplate code

I'll be doing some measurements on how much of this reflection-stuff is hurting performance and decide based on that whether to find a better way.

@jbaiter
Copy link
Member

jbaiter commented Jan 27, 2020

So I did some JMH benchmarking with a simple test case that creates a manifest with 128 canvases (and should hit the line 128 times), here are the results:

With .writeString():

Benchmark                              Mode  Cnt    Score   Error  Units
ManifestBench.testManifestGeneration  thrpt   25  53.258 ± 2.337  ops/s

With .writeObject():

Benchmark                              Mode  Cnt    Score    Error  Units
ManifestBench.testManifestGeneration  thrpt   25  51.479 ± 2.416  ops/s

There is no difference outside of the statistical margin with my benchmark :-/ Can you maybe provide one of your pathological samples so I can do more profiling?

As for the reflection stuff, that is a pretty huge performance hog indeed! I'll try to refactor it with one of the above solutions (or another one, if you have a good idea!).


edit: So I experimented a bit with both solutions, and neither of them is really satisfying as they both result in TONS of boilerplate code. I think the whole API is up for a more large-scale refactoring. Ideally we'd have something like a ResourceReference type that only has an id and optionally a label that you could pass in for any of the more specific types like Canvas, Annotation, etc. I think I'll work on that once I get around to implementing v3 of the IIIF APIs in the library (wich will probably be in Q2 or Q3 of 2020).

@cowtowncoder
Copy link

Ok. I am not entirely surprised by change not resulting in statistically meaningful improvement -- I know writeObject() does have additional overhead, but depending on relative sizes of content and so on it did not seem like a red flag. Profilers are known to sometimes point to red herrings, so measurements on real (or reasonable approximations) data are useful and necessary.

@mtravnicek
Copy link
Author

Sorry for later response.... We are using your codebase (thanks) in this digital library app for generating iiif manifests: https://github.com/moravianlibrary/krameriusiiif
I have to admit that recorded slowdowns in milliseconds happen at my fairly slow notebook which has like i5-6200u cpu, so on decently fast desktop it may be on margin of error...
I have no benchmarks results only screenshots - i was running this spring boot app and Chrome with devtools in latency breakdown, then unusual blue "tail" representing Content Download latency in some ms was seen, so i was trying out everyting including Jackson Afterburner etc.. until i managed to pinpoint culprit.

(Here is some page explaining this tool)
https://blog.logrocket.com/visualizing-backend-performance-in-the-chrome-devtools-bb6fd232540/

When i tried to optimize related serializer class, then Content Download value went back to almost 0 which is normal on localhost and overall latency decreased, which was confirmed by in app measuring. So please replace this oneliner of gen.writeObject to gen.writeString it wont hurt your code (getIdenifier.toString()) and may help me :) and maybe others....
I presume that the latency is shifting into Content Download because the server response is already halfway prepared, but it stumbles on while converting/serializing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants