Namespace config operator is consuming too much memory #96

hanzala1234 · 2021-04-05T16:04:31Z

Whenever we start the operator, memory consumption goes upto 20GB and our api server becomes unresponsive. API server starts consuming more than 15GB and then it gets killed and the master becomes unhealthy.

we have to scale down the namespace-config operator for making api server responsive again. what could be the reason that it consumes too much memory once it starts? could there be some memory leak? Is it possible that it reconciles resources in chunks rather than reconciling all together? how can we find the root cause?

raffaelespazzoli · 2021-04-05T17:05:09Z

hello, which version are you using? we used to have this kind issue in an old version. A "controller manager" is allocated per NamespaceConfig object. Each controller manager creates a new cache and different sets of watches to the master api. I'd expect memory consumption to be proportional to the number of NamespaceConfig objects, not the number of objects created as an effect of a NamespaceConfig, so that should allow you to scale easily. Please let me know if your experience is different.

…

On Mon, Apr 5, 2021 at 12:04 PM hanzala1234 ***@***.***> wrote: Whenever we start the operator, memory consumption goes upto 20GB and our api server becomes unresponsive. API server starts consuming more than 15GB and then it gets killed and the master becomes unhealthy. [image: ns-operator] <https://user-images.githubusercontent.com/42064189/113594760-928a8100-9651-11eb-8224-f10f6b0e55eb.png> we have to scale down the namespace-config operator for making api server responsive again. what could be the reason that it consumes too much memory once it starts? could there be some memory leak? Is it possible that it reconciles resources in chunks rather than reconciling all together? how can we find the root cause? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#96>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABPERXCFLUIQANVHKBNBFRLTHHNR3ANCNFSM42NBAB3A> .

-- ciao/bye Raffaele

rasheedamir · 2021-04-05T18:09:17Z

@raffaelespazzoli we have this version running currently "version":"1.0.3"

On this cluster where we are experiencing this issue we have 20 NamespaceConfig objects only.

We have been experience this issue for quite sometime now; and now it's a blocker! Last time it spiked to 20GB and then became stable at 6,5GB; but still 6,5GB is way too much for an operator

Currently it is scaled down to zero!

rasheedamir · 2021-04-05T18:23:15Z

Here is last 7 days usage

rasheedamir · 2021-04-06T15:17:20Z

@raffaelespazzoli any thoughts on how we can troubleshoot it?

raffaelespazzoli · 2021-04-06T16:00:52Z

In the past we had a memory leak, but this does not seem to be the case as the memory allocation is constant. Which types of objects are created by your namespaceconfigs? How big is your cluster and how big is the etcd database? Besides the namespace pod using a lot of memory did you see any other side effects? Looking at the api server metrics, you should be able to plot the number of watches that the namespace pods open against it. I think that would also be useful to see.

…

On Tue, Apr 6, 2021 at 11:17 AM Rasheed Amir ***@***.***> wrote: @raffaelespazzoli <https://github.com/raffaelespazzoli> any thoughts on how we can troubleshoot it? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#96 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABPERXGADZBULCNVTOY7LZTTHMQZBANCNFSM42NBAB3A> .

-- ciao/bye Raffaele

hanzala1234 · 2021-04-06T17:59:04Z

Which types of objects are created by your namespaceconfigs?

We are creating mostly secrets,role and rolebindings and tekton resources (trigger template,triggerbinding, pipelines,event listener)
How big is your cluster and how big is the etcd database?

we have 16 nodes with 3 master nodes
ETCD size right now is 818Mi average.

Besides the namespace pod using a lot of memory did you see any other side
effects?

API server is crashing. once we scale up the namespace config operator, the whole cluster gets affected.

raffaelespazzoli · 2021-04-07T11:49:42Z

how many namespace config objects do you have, how many namespaces do you have?

Can you run an experiment in which you create your namespace config object one every 5 minutes and monitor how the memory increases?

hanzala1234 · 2021-04-07T11:57:57Z

we have 20 namespace config objects. we have a total of 134 namespaces. but namespace config operator only applies to 30-40 namespaces. also, in our environment, we create namespaces dynamically as well for PR testings.

raffaelespazzoli · 2021-04-07T12:06:17Z

ok, so we can predict that the cache size should be 20x(object types created*number/size of objects of that type in etcd across the 134 namespaces). I'd like to see the memory progression when you add namespaceconfigs with the experiment described above. 20 namespace configs is a high number, have you put some thoughts in perhaps collapsing some of them?

…

On Wed, Apr 7, 2021 at 7:58 AM hanzala1234 ***@***.***> wrote: we have 20 namespace config objects. we have a total of 134 namespaces. but namespace config operator only applies to 30-40 namespaces. also, in our environment, we create namespaces dynamically as well for PR testings. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#96 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABPERXBYVHAFLXHXBTUHYUTTHRCFLANCNFSM42NBAB3A> .

-- ciao/bye Raffaele

rasheedamir · 2021-04-07T14:01:12Z

20 is high number :(

Why is a "controller manager" is allocated per NamespaceConfig object?

raffaelespazzoli · 2021-04-07T14:08:42Z

20 is high number :( by that I mean that I had never seen before a deployment where so many definitions were needed. And that perhaps there is a way to collapse some of them and optimize. I didn't mean to say that the operator should not support it.

Why is a "controller manager" is allocated per NamespaceConfig object? that's how the operator is designed. One can't dynamically add watchers to a running controller-manager. so each time a NamespaceConfig object is created, the needed watchers are grouped into a new controller-manager.

raffaelespazzoli · 2021-11-15T15:00:46Z

may I close this issue?

Florian-94 · 2022-02-22T09:02:11Z

Hello,
We are using the 1.2.0 version of the nsconfig operator on a 4.8.14 OCP cluster. We really appreciate it except for the RAM consumption ...
On one of our Openshift BUILD cluster, we have 125 namespaceconfigs objects. One for each namespace (and its rolebindings, netpol, resourcequotas, limitrange ... associated). And we plan to host new clients (ie namespaces) soon.
The limit for the nsconfig operator pod is 7Gb of RAM and it's not enough because the container restart every 15 minutes. We are going to set the limit to 10Gb of RAM which is huge and increase risk on scheduling this pod on ours workers.
Is there a way that you change the behaviour of the operator to limit this need of RAM ?
Thank you,

Florian

P.S : On another cluster, we have 35 namespaceconfigs objects for a RAM utilization stable with 1,15 Gb. It seems RAM consumption is not linear with number of namespaceconfigs objects.

raffaelespazzoli · 2022-02-22T13:31:40Z

I recommend upgrading but that will probably not solve your problem. @Florian-94
There is definitely a correlation between the number of namespace config and type of object being configured and the memory sued by this operator. This cannot be eliminated.
Having one NamespaceConfig object per namespace is technically possible, but it's not what was intended for this operator.
Can you share your use case? Maybe a couple of namespace config for different namespaces? I wonder if we can use the operator in a way that is more in line with what was intended.

Florian-94 · 2022-02-22T17:17:07Z

We have a web access portal where our customers can choose all specific parameters for limitrange and resourcequotas (the portal manages a process validation before creating namespaceconfigs CR on openshift cluster).
May be we could use the "tee-shirt size" system offers by nsconfig operator for this usage.
We also apply 2 network policies (in nsconfig CR) to be sure users can't modify / delete it (they are the same for all ns)

But, on this portal, our customers also manage users which will have an access on the namespace (kind: group in the namespaceconfigs objects with specifics userIDs in user field). So I can't see how to use a shared namespaceconfig template for this usage.
May be the nsconfig operator was not the right choice for our needs. May be we should just apply k8s objects and prevent namespace admin users to edit them (with gatekeeper for example). We didn't see the ram problem caused by too many namespaceconfig ressources when we did this choice.
Thanks for your help.

raffaelespazzoli · 2022-02-22T20:04:28Z

Both use cases should be addressable with a single namespace config. For the quotas create an annotation on the namespace with the needed values and then use the templating capability to apply them in a given namespace. Not sure about the network policy, I'd have to see it.

…

On Tue, Feb 22, 2022, 12:17 PM Florian-94 ***@***.***> wrote: We have a web access portal where our customers can choose all specific parameters for limitrange and resourcequotas (the portal manages a process validation before creating namespaceconfigs CR on openshift cluster). May be we could use the "tee-shirt size" system offers by nsconfig operator for this usage. We also apply 2 network policies (in nsconfig CR) to be sure users can't modify / delete it (they are the same for all ns) But, on this portal, our customers also manage users which will have an access on the namespace (kind: group in the namespaceconfigs objects with specifics userIDs in user field). So I can't see how to use a shared namespaceconfig template for this usage. May be the nsconfig operator was not the right choice for our needs. May be we should just apply k8s objects and prevent namespace admin users to edit them (with gatekeeper for example). We didn't see the ram problem caused by too many namespaceconfig ressources when we did this choice. Thanks for your help. — Reply to this email directly, view it on GitHub <#96 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABPERXHV73DHYJGDJR6MLCLU4PAKDANCNFSM42NBAB3A> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>. You are receiving this because you were mentioned.Message ID: ***@***.*** com>

raffaelespazzoli · 2024-01-08T12:07:35Z

May I close this issue?

Florian-94 · 2024-01-08T12:17:32Z

Yes, you can close the issue for me. Thank you. We are not using namespace-configuration-operator anymore. Maybe one day if we decide to use size templates to manage quotas/limitrange for our projects.

cuttingedge1109 mentioned this issue Apr 12, 2021

Remove watching for existing underline resources at init stage #97

Open

raffaelespazzoli closed this as completed Jan 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Namespace config operator is consuming too much memory #96

Namespace config operator is consuming too much memory #96

hanzala1234 commented Apr 5, 2021 •

edited

Loading

raffaelespazzoli commented Apr 5, 2021 via email

rasheedamir commented Apr 5, 2021 •

edited

Loading

rasheedamir commented Apr 5, 2021 •

edited

Loading

rasheedamir commented Apr 6, 2021

raffaelespazzoli commented Apr 6, 2021 via email

hanzala1234 commented Apr 6, 2021

raffaelespazzoli commented Apr 7, 2021

hanzala1234 commented Apr 7, 2021

raffaelespazzoli commented Apr 7, 2021 via email

rasheedamir commented Apr 7, 2021

raffaelespazzoli commented Apr 7, 2021

raffaelespazzoli commented Nov 15, 2021

Florian-94 commented Feb 22, 2022

raffaelespazzoli commented Feb 22, 2022

Florian-94 commented Feb 22, 2022

raffaelespazzoli commented Feb 22, 2022 via email

raffaelespazzoli commented Jan 8, 2024

Florian-94 commented Jan 8, 2024

Namespace config operator is consuming too much memory #96

Namespace config operator is consuming too much memory #96

Comments

hanzala1234 commented Apr 5, 2021 • edited Loading

raffaelespazzoli commented Apr 5, 2021 via email

rasheedamir commented Apr 5, 2021 • edited Loading

rasheedamir commented Apr 5, 2021 • edited Loading

rasheedamir commented Apr 6, 2021

raffaelespazzoli commented Apr 6, 2021 via email

hanzala1234 commented Apr 6, 2021

raffaelespazzoli commented Apr 7, 2021

hanzala1234 commented Apr 7, 2021

raffaelespazzoli commented Apr 7, 2021 via email

rasheedamir commented Apr 7, 2021

raffaelespazzoli commented Apr 7, 2021

raffaelespazzoli commented Nov 15, 2021

Florian-94 commented Feb 22, 2022

raffaelespazzoli commented Feb 22, 2022

Florian-94 commented Feb 22, 2022

raffaelespazzoli commented Feb 22, 2022 via email

raffaelespazzoli commented Jan 8, 2024

Florian-94 commented Jan 8, 2024

hanzala1234 commented Apr 5, 2021 •

edited

Loading

rasheedamir commented Apr 5, 2021 •

edited

Loading

rasheedamir commented Apr 5, 2021 •

edited

Loading