Performance issues with POST to large containers #1

Open
ThomasJejkal opened this issue Nov 19, 2020 · 0 comments

After evaluating the POST performance of the Web Annotation Protocol server, the performance issues presented in the final report could be reproduced. Furthermore, the reason for the relationship between performance decrease and container size could be identified; it is caused by the following Jena code:

https://github.com/apache/jena/blob/a7ba51f67e7af819178fea9a06a6dad0415877c3/jena-core/src/main/java/org/apache/jena/rdf/model/impl/ContainerImpl.java#L181

The size() method is used in org.apache.jena.rdf.model.impl.SeqImpl to determine the current size of the container before adding a new element. Because it iterates through all elements, POST operations slow down steadily as the container grows, as shown in the following table:

| Number of Elements in a Container | Time to add one new Element [ms] | Time to add next 10K Elements [hh:mm:ss] (approx.) |
|---|---|---|
| 10.000 | 40 | 00:06:40 |
| 20.000 | 56 | 00:09:20 |
| 30.000 | 69 | 00:11:30 |
| 40.000 | 83 | 00:13:50 |
| 50.000 | 107 | 00:17:50 |
| 60.000 | 117 | 00:19:30 |
| ... | ... | ... |
| 140.000 | 253 | 00:39:10 |
| ... | ... | ... |
| 500.000 | 768 (est.) | 02:09:40 |
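
The effect of counting all members before every append can be illustrated with a small simulation. This is a minimal sketch and not Jena code: `CountingSeq` and `CachedSeq` are hypothetical stand-ins, and the `visited` counter exists only to make the work visible. It assumes nothing beyond what is described above, namely that the size is determined by iterating over the existing elements.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a sequence container; this is NOT Jena code.
class SeqSizeSketch {

    /** Appends by first counting all existing members, like an iterating size(). */
    static final class CountingSeq {
        private final List<String> members = new ArrayList<>();
        long visited = 0; // number of members touched while counting

        int size() {
            int n = 0;
            for (String ignored : members) {
                n++;
                visited++;
            }
            return n;
        }

        void add(String value) {
            int index = size() + 1; // O(n) work on every single add
            members.add(index + ":" + value);
        }
    }

    /** Appends using a stored counter, so each add costs O(1). */
    static final class CachedSeq {
        private final List<String> members = new ArrayList<>();
        private int size = 0;

        int size() {
            return size;
        }

        void add(String value) {
            size++;
            members.add(size + ":" + value);
        }
    }

    public static void main(String[] args) {
        CountingSeq slow = new CountingSeq();
        for (int i = 0; i < 1000; i++) {
            slow.add("a" + i);
        }
        // 0 + 1 + ... + 999 members were visited in total
        System.out.println(slow.visited); // prints 499500
    }
}
```

Adding n elements to the counting variant touches n(n-1)/2 members in total, which is why the cost of a single add grows with the container size.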

Of course, these values will depend on the local hardware, but one should expect the time for posting a single annotation to increase by at least approx. 15 ms for every additional 10.000 elements in the container.
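
The batch times in the table follow directly from the per-element time, and the growth rule can be turned into a rough estimate. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the linear model (40 ms at 10.000 elements plus 15 ms per further 10.000) is only an approximation of the measured values, not a guaranteed behaviour.

```java
class PostTimeEstimate {

    /** Formats a duration given in milliseconds as hh:mm:ss. */
    static String formatHms(long millis) {
        long s = millis / 1000;
        return String.format("%02d:%02d:%02d", s / 3600, (s % 3600) / 60, s % 60);
    }

    /** Time to add `count` elements if every add costs `msPerAdd` milliseconds. */
    static String timeForBatch(long msPerAdd, long count) {
        return formatHms(msPerAdd * count);
    }

    /**
     * Rough linear model (an assumption): 40 ms at 10.000 elements,
     * plus 15 ms for every additional 10.000 elements.
     */
    static long estimateMsPerAdd(long containerSize) {
        return 40 + 15 * (containerSize - 10_000) / 10_000;
    }

    public static void main(String[] args) {
        System.out.println(timeForBatch(40, 10_000));  // prints 00:06:40
        System.out.println(timeForBatch(107, 10_000)); // prints 00:17:50
        System.out.println(estimateMsPerAdd(60_000));  // prints 115
    }
}
```

The model gives 115 ms at 60.000 elements, close to the measured 117 ms; it should be re-calibrated on the target hardware before being used for planning.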

Currently, there seem to be two solutions:

  1. Be aware of the described behaviour and prefer small containers or containers of containers.
  2. Change the implementation of the Jena repository to store sequence information elsewhere, e.g. in a relational database.
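
The "containers of containers" idea from the first option can be sketched as follows. This is a hypothetical in-memory illustration, not part of the server or of Jena: `NestedContainerSketch` and its `maxPerChild` cap are assumed names, and in a real deployment each child would be a separate annotation container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a parent that spreads elements over capped child containers.
class NestedContainerSketch {
    private final int maxPerChild; // assumed cap per child container
    private final List<List<String>> children = new ArrayList<>();

    NestedContainerSketch(int maxPerChild) {
        this.maxPerChild = maxPerChild;
    }

    /** Adds to the newest child, opening a new child once the cap is reached. */
    void add(String element) {
        if (children.isEmpty()
                || children.get(children.size() - 1).size() >= maxPerChild) {
            children.add(new ArrayList<>());
        }
        children.get(children.size() - 1).add(element);
    }

    int childCount() {
        return children.size();
    }

    int sizeOfChild(int index) {
        return children.get(index).size();
    }

    public static void main(String[] args) {
        NestedContainerSketch parent = new NestedContainerSketch(10);
        for (int i = 0; i < 25; i++) {
            parent.add("annotation-" + i);
        }
        System.out.println(parent.childCount());   // prints 3
        System.out.println(parent.sizeOfChild(2)); // prints 5
    }
}
```

With such a layout no child ever holds more than the cap, so the per-POST cost of counting members stays bounded by the cap rather than by the total number of annotations.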