
Namespace config operator is consuming too much memory #96

Closed
hanzala1234 opened this issue Apr 5, 2021 · 18 comments · May be fixed by #97
Comments

@hanzala1234
Contributor

hanzala1234 commented Apr 5, 2021

Whenever we start the operator, memory consumption goes up to 20 GB and our API server becomes unresponsive. The API server starts consuming more than 15 GB, then gets killed, and the master becomes unhealthy.

We have to scale down the namespace-config operator to make the API server responsive again. What could cause it to consume so much memory as soon as it starts? Could there be a memory leak? Is it possible for it to reconcile resources in chunks rather than all at once? How can we find the root cause?

[Screenshot: operator pod memory usage (ns-operator-cropped)]

@raffaelespazzoli
Collaborator

raffaelespazzoli commented Apr 5, 2021 via email

@rasheedamir

rasheedamir commented Apr 5, 2021

@raffaelespazzoli we currently have this version running: "version":"1.0.3"

On the cluster where we are experiencing this issue we have only 20 NamespaceConfig objects.

We have been experiencing this issue for quite some time now, and it has become a blocker! Last time memory spiked to 20 GB and then stabilized at 6.5 GB; but even 6.5 GB is far too much for an operator.

Currently it is scaled down to zero!

@rasheedamir

rasheedamir commented Apr 5, 2021

Here is the memory usage over the last 7 days:

[Screenshot: Grafana kubernetes-compute-resources-workload dashboard, captured 2021-04-05]

@rasheedamir

@raffaelespazzoli any thoughts on how we can troubleshoot it?

@raffaelespazzoli
Collaborator

raffaelespazzoli commented Apr 6, 2021 via email

@hanzala1234
Contributor Author

> Which types of objects are created by your namespaceconfigs?

We are creating mostly Secrets, Roles, RoleBindings, and Tekton resources (TriggerTemplate, TriggerBinding, Pipeline, EventListener).

> How big is your cluster and how big is the etcd database?

We have 16 nodes, including 3 master nodes. The etcd database size right now is 818 MiB on average.

> Besides the namespace pod using a lot of memory, did you see any other side effects?

The API server is crashing. Once we scale up the namespace-config operator, the whole cluster gets affected.

@raffaelespazzoli
Collaborator

How many NamespaceConfig objects do you have, and how many namespaces?

Can you run an experiment in which you create your NamespaceConfig objects one every 5 minutes and monitor how the memory increases?

@hanzala1234
Contributor Author

We have 20 NamespaceConfig objects and a total of 134 namespaces, but the namespace-config operator only applies to 30-40 of them. Also, in our environment we create namespaces dynamically for PR testing.

@raffaelespazzoli
Collaborator

raffaelespazzoli commented Apr 7, 2021 via email

@rasheedamir

> 20 is high number :(

Why is a "controller manager" allocated per NamespaceConfig object?

@raffaelespazzoli
Collaborator

> 20 is high number :(

By that I mean that I had never before seen a deployment where so many definitions were needed, and that perhaps there is a way to collapse some of them and optimize. I didn't mean to say that the operator should not support it.

> Why is a "controller manager" allocated per NamespaceConfig object?

That's how the operator is designed. One can't dynamically add watchers to a running controller-manager, so each time a NamespaceConfig object is created, the needed watchers are grouped into a new controller-manager.
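This design also suggests why memory grows so quickly with the number of CRs: a controller-manager keeps its own informer caches for the kinds it watches, so with one manager per NamespaceConfig the same cluster objects can end up cached once per CR. A toy Python model of that duplication (this is not the operator's actual code, and the object counts are made up for illustration):

```python
# Toy model: each "manager" holds an independent cache of every object of
# each kind it watches; caches are NOT shared between managers.
class Manager:
    def __init__(self, watched_kinds, cluster):
        # Copy the cluster's objects of each watched kind into a private cache.
        self.cache = {kind: list(cluster.get(kind, [])) for kind in watched_kinds}

    def cached_objects(self):
        return sum(len(objs) for objs in self.cache.values())

# Hypothetical cluster contents: 1000 Secrets and 500 RoleBindings.
cluster = {"Secret": range(1000), "RoleBinding": range(500)}

# 20 NamespaceConfigs, each watching both kinds -> 20 separate caches.
managers = [Manager(["Secret", "RoleBinding"], cluster) for _ in range(20)]

total = sum(m.cached_objects() for m in managers)
print(total)  # 30000 cached objects: 20x duplication of the same 1500 objects
```

Under this model, memory scales roughly with (number of NamespaceConfigs) × (objects of the watched kinds), which would match the reporters' observation that 20 CRs over a busy cluster is enough to blow past several GB.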

@raffaelespazzoli
Collaborator

May I close this issue?

@Florian-94

Hello,
We are using version 1.2.0 of the nsconfig operator on a 4.8.14 OCP cluster. We really appreciate it, except for the RAM consumption...
On one of our OpenShift BUILD clusters, we have 125 NamespaceConfig objects, one for each namespace (with its associated RoleBindings, NetworkPolicies, ResourceQuotas, LimitRanges, ...). And we plan to host new clients (i.e. namespaces) soon.
The nsconfig operator pod's memory limit is 7 GB and it's not enough: the container restarts every 15 minutes. We are going to raise the limit to 10 GB, which is huge and increases the risk of scheduling this pod on our workers.
Is there a way you could change the behaviour of the operator to limit this need for RAM?
Thank you,

Florian

P.S.: On another cluster, we have 35 NamespaceConfig objects with RAM utilization stable at 1.15 GB. It seems RAM consumption is not linear with the number of NamespaceConfig objects.

@raffaelespazzoli
Collaborator

I recommend upgrading, but that will probably not solve your problem, @Florian-94.
There is definitely a correlation between the number of NamespaceConfig objects, the types of objects being configured, and the memory used by this operator. This cannot be eliminated.
Having one NamespaceConfig object per namespace is technically possible, but it's not what was intended for this operator.
Can you share your use case? Maybe a couple of NamespaceConfigs covering different sets of namespaces would suffice? I wonder if we can use the operator in a way that is more in line with what was intended.
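To illustrate the consolidation being suggested, a rough sketch of one NamespaceConfig selecting many namespaces by label instead of one CR per namespace. The field names follow the redhat-cop operator's published examples (`spec.labelSelector`, `spec.templates[].objectTemplate`, with the target namespace available to the Go template as `{{ .Name }}`), but verify them against the version you run; the label, quota values, and resource names here are hypothetical:

```yaml
apiVersion: redhatcop.redhat.io/v1alpha1
kind: NamespaceConfig
metadata:
  name: small-tier-config
spec:
  labelSelector:
    matchLabels:
      size: small            # every namespace labeled size=small gets this config
  templates:
  - objectTemplate: |
      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: standard-quota
        namespace: {{ .Name }}
      spec:
        hard:
          requests.cpu: "2"
          requests.memory: 2Gi
```

With a handful of such "tier" configs, adding a client becomes a matter of labeling its namespace rather than creating another CR (and another controller-manager).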

@Florian-94

We have a web access portal where our customers can choose all the specific parameters for LimitRanges and ResourceQuotas (the portal runs a validation process before creating the NamespaceConfig CRs on the OpenShift cluster).
Maybe we could use the "T-shirt size" system offered by the nsconfig operator for this.
We also apply 2 network policies (in the nsconfig CR) to be sure users can't modify or delete them (they are the same for all namespaces).

But on this portal our customers also manage the users who will have access to a namespace (kind: Group in the NamespaceConfig objects, with specific user IDs in the user field), so I can't see how to use a shared NamespaceConfig template for that.
Maybe the nsconfig operator was not the right choice for our needs. Maybe we should just apply the k8s objects and prevent namespace admin users from editing them (with Gatekeeper, for example). We didn't see the RAM problem caused by too many NamespaceConfig resources when we made this choice.
Thanks for your help.

@raffaelespazzoli
Collaborator

raffaelespazzoli commented Feb 22, 2022 via email

@raffaelespazzoli
Collaborator

May I close this issue?

@Florian-94
Copy link

Yes, you can close the issue for me. Thank you. We are no longer using the namespace-configuration-operator. Maybe we will come back to it one day if we decide to use size templates to manage quotas/limitranges for our projects.
