Namespace config operator is consuming too much memory #96
Hello, which version are you using? We used to have this kind of issue in an old version.
A "controller manager" is allocated per NamespaceConfig object. Each controller manager creates a new cache and a separate set of watches against the master API.
I'd expect memory consumption to be proportional to the number of NamespaceConfig objects, not to the number of objects created as an effect of a NamespaceConfig, so that should allow you to scale easily. Please let me know if your experience is different.
|
@raffaelespazzoli we currently have this version running. On the cluster where we are experiencing this issue we have only 20 NamespaceConfig objects. We have been experiencing this issue for quite some time now, and it is now a blocker! Last time memory spiked to 20 GB and then became stable at 6.5 GB, but 6.5 GB is still far too much for an operator. Currently it is scaled down to zero! |
@raffaelespazzoli any thoughts on how we can troubleshoot it? |
In the past we had a memory leak, but that does not seem to be the case here, as the memory allocation is constant.
Which types of objects are created by your NamespaceConfigs? How big is your cluster, and how big is the etcd database? Besides the operator pod using a lot of memory, did you see any other side effects?
Looking at the API server metrics, you should be able to plot the number of watches that the operator pods open against it. I think that would also be useful to see.
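One way to get that watch count, sketched here under the assumption that the API server exposes the `apiserver_registered_watchers` metric in Prometheus text exposition format (metric names vary across Kubernetes versions, so verify against your cluster's `/metrics` output, e.g. `kubectl get --raw /metrics`):

```python
# Sketch: sum per-resource watcher counts from the API server's /metrics
# output (Prometheus text exposition format). Here we parse a saved sample
# string; in practice you would feed in the output of `kubectl get --raw /metrics`.
import re

def watcher_counts(metrics_text, metric="apiserver_registered_watchers"):
    """Return {label-string: count} for every sample of `metric`."""
    counts = {}
    pattern = re.compile(r'^%s\{([^}]*)\}\s+([0-9.eE+]+)$' % re.escape(metric))
    for line in metrics_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            counts[m.group(1)] = float(m.group(2))
    return counts

# Hypothetical sample output, for illustration only.
sample = """\
# HELP apiserver_registered_watchers Number of currently registered watchers
apiserver_registered_watchers{group="",kind="Secret",version="v1"} 42
apiserver_registered_watchers{group="",kind="ConfigMap",version="v1"} 17
"""
per_resource = watcher_counts(sample)
print(sum(per_resource.values()))  # total watchers across all resources
```

Plotting that total while scaling the operator up and down would show directly how many watches it opens.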
|
> Which types of objects are created by your NamespaceConfigs?

We are creating mostly Secrets, Roles and RoleBindings, and Tekton resources (TriggerTemplates, TriggerBindings, Pipelines, EventListeners). We have 16 nodes, with 3 master nodes.

> Besides the operator pod using a lot of memory, did you see any other side effects?

The API server is crashing. Once we scale up the namespace-config operator, the whole cluster gets affected. |
How many NamespaceConfig objects do you have, and how many namespaces? Can you run an experiment in which you create your NamespaceConfig objects one every 5 minutes and monitor how the memory increases? |
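The incremental experiment could be scripted roughly like this. This is a sketch: the `nsconfigs/` manifest directory, the operator namespace, and the use of `kubectl top` (which needs metrics-server) are assumptions to adapt to your environment.

```shell
#!/usr/bin/env bash
# Sketch: apply NamespaceConfig manifests one at a time, waiting between
# each, and record the operator pod's memory usage after every step.
# KUBECTL, INTERVAL, and OPERATOR_NS are overridable for dry runs.
run_experiment() {
  local kubectl_cmd="${KUBECTL:-kubectl}"
  local interval="${INTERVAL:-300}"   # 5 minutes between objects
  local ns="${OPERATOR_NS:-namespace-configuration-operator}"
  local dir="${1:-nsconfigs}"
  for f in "$dir"/*.yaml; do
    echo "=== applying $f ==="
    "$kubectl_cmd" apply -f "$f"
    sleep "$interval"
    "$kubectl_cmd" top pod -n "$ns"   # requires metrics-server
  done
}
```

Invoking `run_experiment` and logging the `kubectl top` output per step gives the memory-vs-NamespaceConfig curve Raffaele is asking about.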
We have 20 NamespaceConfig objects and a total of 134 namespaces, but the namespace-config operator only applies to 30-40 of them. Also, in our environment we create namespaces dynamically for PR testing. |
OK, so we can predict that the cache size should be roughly 20 × (object types created × number/size of objects of each type in etcd across the 134 namespaces). I'd like to see the memory progression when you add NamespaceConfigs with the experiment described above.
20 NamespaceConfigs is a high number; have you put some thought into perhaps collapsing some of them?
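Raffaele's back-of-the-envelope formula can be written out as a small calculation. All the counts and object sizes below are made-up placeholders, not measurements from this cluster:

```python
# Sketch of the cache-size prediction: each NamespaceConfig gets its own
# controller manager, and each manager caches every object of every type
# it creates, across all namespaces. All inputs here are hypothetical.
def predicted_cache_bytes(num_nsconfigs, objects_per_type, avg_object_bytes):
    """objects_per_type: {type-name: count across all namespaces};
    avg_object_bytes: {type-name: average serialized size in bytes}."""
    per_manager = sum(count * avg_object_bytes[t]
                      for t, count in objects_per_type.items())
    return num_nsconfigs * per_manager

objects = {"Secret": 134, "RoleBinding": 134, "TriggerTemplate": 40}
sizes = {"Secret": 4096, "RoleBinding": 1024, "TriggerTemplate": 8192}
total = predicted_cache_bytes(20, objects, sizes)
print(f"{total / 2**20:.1f} MiB")  # a rough lower bound; real overhead is higher
```

Even this toy estimate shows the multiplier effect: the raw object data is counted once per NamespaceConfig, so collapsing configs shrinks the cache proportionally.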
|
20 is a high number :( Why is a "controller manager" allocated per NamespaceConfig object? |
|
may I close this issue? |
Hello, Florian. P.S.: On another cluster, we have 35 NamespaceConfig objects with RAM utilization stable at 1.15 GB. It seems RAM consumption is not linear in the number of NamespaceConfig objects. |
I recommend upgrading but that will probably not solve your problem. @Florian-94 |
We have a web access portal where our customers can choose all the specific parameters for LimitRanges and ResourceQuotas (the portal manages a validation process before creating NamespaceConfig CRs on the OpenShift cluster). Maybe we could use the "tee-shirt size" system offered by the nsconfig operator for this usage.
We also apply 2 network policies (in the nsconfig CR) to be sure users can't modify or delete them (they are the same for all namespaces).
But on this portal our customers also manage the users who will have access to the namespace (kind: Group in the NamespaceConfig objects, with specific user IDs in the users field). So I can't see how to use a shared NamespaceConfig template for this usage.
Maybe the nsconfig operator was not the right choice for our needs. Maybe we should just apply the k8s objects and prevent namespace admin users from editing them (with Gatekeeper, for example). We didn't see the RAM problem caused by too many NamespaceConfig resources when we made this choice.
Thanks for your help. |
Both use cases should be addressable with a single NamespaceConfig. For the quotas, create an annotation on the namespace with the needed values and then use the templating capability to apply them in a given namespace. Not sure about the network policy, I'd have to see it. |
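Raffaele's suggestion (driving quota values from namespace annotations via templating) might look roughly like the following. This is a hypothetical sketch: the `tenant-managed` label and the annotation names are made up, and the field names are recalled from the redhat-cop/namespace-configuration-operator CRD, so check the operator's README for the exact schema and template syntax.

```yaml
# Hypothetical sketch of a single shared NamespaceConfig that reads quota
# values from annotations on each matching namespace. Verify field names
# and template variables against the operator's documentation.
apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: tenant-quotas
spec:
  labelSelector:
    matchLabels:
      tenant-managed: "true"   # assumed label set by the portal
  templates:
  - objectTemplate: |
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: default-quota
        namespace: {{ .Name }}
      spec:
        hard:
          requests.cpu: {{ index .Annotations "example.com/quota-cpu" }}
          requests.memory: {{ index .Annotations "example.com/quota-memory" }}
```

With one NamespaceConfig covering all tenant namespaces, only a single controller manager and cache are created, instead of one per tenant.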
May I close this issue? |
Yes, you can close the issue for me, thank you. We are not using the namespace-configuration-operator anymore. Maybe we will again one day, if we decide to use size templates to manage quotas/limitranges for our projects. |
Whenever we start the operator, memory consumption goes up to 20 GB and our API server becomes unresponsive. The API server starts consuming more than 15 GB, then it gets killed and the master becomes unhealthy.
[ns-operator screenshot: https://user-images.githubusercontent.com/42064189/113594760-928a8100-9651-11eb-8224-f10f6b0e55eb.png]
We have to scale down the namespace-config operator to make the API server responsive again. What could be the reason that it consumes so much memory once it starts? Could there be a memory leak? Is it possible for it to reconcile resources in chunks rather than all at once? How can we find the root cause?