-
-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libzim cache size should be programmatically mutable #311
Comments
I don't think that making this enhancement in a standalone manner makes sense (then it may be either overdesigned or implemented the wrong way). A better approach is to do it in the context of an optimization that will immediately utilize the new feature. |
@veloman-yunkan Actually this is needed now for zimcheck and might I believe be the next ticket to implement as there is quite a consensus about it. |
Context:Currently libzim has 2 caches that are heavly utilized by several libzim API calls:
The number of max elements for each cache is fixed and is taken from 2 places:
Problem: This approach is not very smart since:
Solution: The proposed solutions allow the application to adapt the size of the caches considering the necessity of the specific zimfile being used. Proposed solution 1: Give control to the application using libzim. Have in the libzim API some way to change the cache values. (this is already exposed by Proposed solution 2: Give control to libzim. Libzim can have some inteligence and apply an algorithm to setup better appropriate values. Example: The dirent cache max elements can be 20-50% of the total nr of dirents. We can also implement both solutions and give the option to the user to choose. In the following priority order:
Use case: zimcheck heavly use |
We have to be sure that the smart solution is smart enough to not take all the memory when opening big zim file on small devices (android, ios, embedded). In all case, the user should have the end word here. Even if they have a lot of memory, they may want to limit the cache to avoid for any reason (reserve memory for other process...) |
Yes we should care about the memory, but considering all the memory used by libzim, these caches doesn't seem to be to occupy a lot. (its basically a <primitive type, pointer type> pair for each entry. With the priority order that I specified the end user has the end word. |
To me this is important that:
So just make the two cache properly mutable for software using libzim. @veloman-yunkan Does that sounds good to you? |
@kelson42 From the implementation perspective it is trivial. However I am not sure that sticking with the Worse is better software development philosophy is optimal in this case. Even if we close this ticket in such a non-sophisticated manner, we should come up with a solution that is more client-friendly. |
@veloman-yunkan If you believe doing what I propose generates a new problem, please express it because this is not clear to me unfortunately. If you believe part of the problem is not fixed and it should, please open a new ticket. |
@kelson42 Your proposal doesn't generate a new problem per se. It just shifts the burden of struggling with a known problem to the clients. They either have to solve that problem (determining the optimal cache size) on their own, or expose that same API, enabling the end user to control that setting. The latter may be perceived either good or bad (depending on the user). On the other hand, if applications invent their own strategies for setting the cache size, those may become obsolete by later optimizations in libzim and turn into wasted effort. If we are aware of any good cache management strategies that may be common to multiple applications we better build those into libzim and update them as libzim evolves. |
@veloman-yunkan Thank you for clarifying your concern for me. What you have written makes sense to me but these questions are really not easy to answer now and I'm not sure about the importance of the problem... maybe the problems will stay theoritical. Who knows, at least I really don't know. What I know:
For all these reasons, I prefer that we don't focus on this now. I can only reiterate my invitation to open an other ticket if you believe this worth a tracking. |
Reiterating my earlier comment:
If there is a ticket for performance improvement in zimcheck/zimdump, may I be assigned to it? |
@veloman-yunkan As long you agree with @MiguelRocha about who does what and that this is clear in the tickets, this is good to me. |
I second @veloman-yunkan here. We don't know what we need. It is useless to add a method if it will fix anything at all. As @MiguelRocha points it in openzim/zim-tools#134, the dirent cache take a lot of times. But changing the cache size (increase or decrease it?) may not help. Maybe the solution is to fix the cache system itself (#385) And I think it make less sens to configure a cache system by the number of item than by the total memory used. It all software using caches (and providing configuration), I always seen a configuration option about the allowed memory to the cache. Never about the number of entries. And it make sens, I know how to limit the cache usage depending of my configuration, I have no idea of how many items cache. |
@veloman-yunkan What approach do you want to follow for this ticket? Do you want to finish openzim/zim-tools#69 (introducing multithreading) and see if this is readlly needed? |
It seems this is not needed, so will close this ticket. |
Currently this depends from an ENV varaible. I propose to remove this and in place create a method which can be called whenever we want to be able to to increase/decrease it size.
This idea comes after discussing #303
The text was updated successfully, but these errors were encountered: