Replace Bitnami with Strimzi Kafka #1005
Conversation
kafkaResourceRequests["memory"] = "1Gi" | ||
kafkaResourceRequests["cpu"] = "200m" |
@bdunne Is there a reason why we don't use default values for cr.Spec.KafkaMemoryRequest etc. upstream? I was trying to use those values here, but they're not in the CR spec.
Upstream we usually only set requests and limits if cr.Spec.EnforceWorkerResourceConstraints is set, to allow running on lower-resource clusters.
If set, we should primarily pull the values from the CR (cr.Spec.KafkaMemoryLimit, cr.Spec.KafkaMemoryRequest, cr.Spec.KafkaCpuLimit, cr.Spec.KafkaCpuRequest), otherwise fall back to some default values.
Also, this should be done inside the mutate function in case the CR values are updated
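A minimal sketch of what that could look like, assuming the cr.Spec.Kafka* fields exist as named above (withDefault and the kafkaSpec stand-in are hypothetical, not the project's actual types):

package main

import "fmt"

// withDefault is a hypothetical helper: prefer the CR value when set, else the default.
func withDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// kafkaSpec stands in for the relevant cr.Spec fields discussed above.
type kafkaSpec struct {
	EnforceWorkerResourceConstraints bool
	KafkaMemoryRequest               string
	KafkaCpuRequest                  string
}

func main() {
	spec := kafkaSpec{EnforceWorkerResourceConstraints: true, KafkaMemoryRequest: "2Gi"}

	kafkaResourceRequests := map[string]string{}
	// Evaluated inside the mutate function, so edits to the CR are picked up on each reconcile.
	if spec.EnforceWorkerResourceConstraints {
		kafkaResourceRequests["memory"] = withDefault(spec.KafkaMemoryRequest, "1Gi")
		kafkaResourceRequests["cpu"] = withDefault(spec.KafkaCpuRequest, "200m")
	}
	fmt.Println(kafkaResourceRequests) // map[cpu:200m memory:2Gi]
}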
kafkaSubscriptionSpec := map[string]interface{}{
	"channel":         "strimzi-0.35.x",
	"name":            "strimzi-kafka-operator",
	"source":          "community-operators",
	"sourceNamespace": "openshift-marketplace",
}
Pinned the Strimzi release to 0.35.1 to match downstream, and to prevent breaking changes that could be introduced in newer releases.
We'll need to come up with an upgrade plan at some point. Do you just change the channel name?
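If I understand OLM correctly, yes: moving to a newer Strimzi is typically just editing the Subscription's channel, after which OLM rolls the operator forward per the Subscription's approval strategy. A sketch of the bump (the channel name "strimzi-0.37.x" is an assumption for illustration):

kafkaSubscriptionSpec := map[string]interface{}{
	"channel":         "strimzi-0.37.x", // bumped from "strimzi-0.35.x"; exact channel name is an assumption
	"name":            "strimzi-kafka-operator",
	"source":          "community-operators",
	"sourceNamespace": "openshift-marketplace",
}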
Force-pushed from 843154a to 3a51235
Force-pushed from f5cadb8 to 094322f
// If manual certificates are introduced after Strimzi has generated its own certificates,
// then the certificates must be "renewed" to replace the old. This process is outlined here:
// https://strimzi.io/docs/operators/in-development/deploying#proc-replacing-your-own-private-keys-str
func renewKafkaCASecret(cr *miqv1alpha1.ManageIQ, client client.Client, scheme *runtime.Scheme) error {
FYI, this is the key change that differentiates this from downstream. I added a comment in the code as I thought it would be beneficial, but I can remove it if it's not necessary.
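For reference, a sketch of the renewal mechanics, assuming the Strimzi-documented approach of replacing the cert data and bumping the CA secret's generation annotation (the helper name is made up; the annotation key is the one Strimzi tracks, but treat it as an assumption here):

import (
	"context"
	"strconv"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// bumpCAGeneration replaces the CA cert data and increments the generation
// annotation so Strimzi treats the CA as renewed and rolls the cluster onto it.
func bumpCAGeneration(ctx context.Context, c client.Client, secret *corev1.Secret, newCrt []byte) error {
	secret.Data["ca.crt"] = newCrt

	gen := 0
	if v, ok := secret.Annotations["strimzi.io/ca-cert-generation"]; ok {
		gen, _ = strconv.Atoi(v)
	}
	if secret.Annotations == nil {
		secret.Annotations = map[string]string{}
	}
	secret.Annotations["strimzi.io/ca-cert-generation"] = strconv.Itoa(gen + 1)

	return c.Update(ctx, secret)
}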
Close/reopen to kick tests
Force-pushed from 84432a9 to 68d0602
"strconv" | ||
) | ||
|
||
var caGen = 0 |
Do we need to store this somewhere? Every time the operator boots, we start at 0 again, right?
If the operator is redeployed then yes, we start at 0 again, which I think is fine because the certs will be cleaned up and redeployed. Let me know if that is okay. The intent behind this global var is to persist the value between reconciles, so if a cert needs to be renewed we can increment the previous value.
Well, the operator could be rescheduled for any reason (host going down for maintenance, resource usage, etc.), so every new instance will start at 0 again, and copying the certs (which causes Kafka to be restarted) could be a problem.
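One way to sidestep the reset, sketched under the assumption that the generation is already recorded as an annotation on the CA secret (as in the renewal sketch above): derive the next value from cluster state on each reconcile instead of from a process-local variable.

import (
	"strconv"

	corev1 "k8s.io/api/core/v1"
)

// nextCAGeneration (hypothetical helper) reads the current generation off the secret,
// so a rescheduled operator pod continues the count instead of restarting at 0.
func nextCAGeneration(secret *corev1.Secret) int {
	if v, ok := secret.Annotations["strimzi.io/ca-cert-generation"]; ok {
		if gen, err := strconv.Atoi(v); err == nil {
			return gen + 1
		}
	}
	return 0 // no annotation yet: first generation
}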
Force-pushed from 68d0602 to 66c7d7c
Applied the following feedback:
Force-pushed from 331642b to 39eeed7
Applied the following:
func updateKafka(cr *miqv1alpha1.ManageIQ, client client.Client, scheme *runtime.Scheme, update func(*unstructured.Unstructured) *unstructured.Unstructured) error {
	err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		kafka := miqutilsv1alpha1.FindKafka(client, scheme, cr.Namespace, cr.Spec.AppName)
		kafka = update(kafka)

		err := client.Update(context.TODO(), kafka)
		return err
	})
	if err != nil {
		return err
	}

	return nil
}
Introduced this function since updates are being made to the Kafka CR outside of reconciliation, which can cause conflicts (on metadata like creationTimestamp and uid), especially when dealing with unstructured objects, since they are considered "dynamic" types. The recommended solution is to simply retry until the update succeeds; in my testing, when there is a conflict the update usually succeeds on the second retry.
You can read more about this here:
- https://github.com/kubernetes/client-go/blob/master/examples/dynamic-create-update-delete-deployment/README.md#typed-vs-dynamic
- https://github.com/kubernetes/client-go/blob/master/examples/dynamic-create-update-delete-deployment/main.go#L117-L129
- https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
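For anyone following along, a hypothetical call site (the pause-reconciliation annotation is just an example mutation, not something this PR actually sets):

err := updateKafka(cr, client, scheme, func(kafka *unstructured.Unstructured) *unstructured.Unstructured {
	annotations := kafka.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations["strimzi.io/pause-reconciliation"] = "true" // example mutation only
	kafka.SetAnnotations(annotations)
	return kafka
})

Because FindKafka runs inside the retry closure, each attempt re-reads the Kafka CR before re-applying the mutation, so a stale resourceVersion only costs one extra round trip; it also means the update closure must be safe to apply more than once.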
olmv1 "github.com/operator-framework/api/pkg/operators/v1" | ||
olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1" |
For some reason olmv1 doesn't contain the kinds available in olmv1alpha1, but it does provide OperatorGroup, which is not available in olmv1alpha1, so unfortunately we need to import both packages 🤕
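A sketch showing why both imports are needed (field names follow operator-framework/api; the object names are illustrative and the spec values mirror the subscription in this PR):

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	olmv1 "github.com/operator-framework/api/pkg/operators/v1"
	olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
)

// OperatorGroup only exists in the v1 package, while Subscription lives in
// v1alpha1, hence the two imports.
func olmObjects(namespace string) (*olmv1.OperatorGroup, *olmv1alpha1.Subscription) {
	operatorGroup := &olmv1.OperatorGroup{
		ObjectMeta: metav1.ObjectMeta{Name: "miq-operator-group", Namespace: namespace},
		Spec:       olmv1.OperatorGroupSpec{TargetNamespaces: []string{namespace}},
	}
	subscription := &olmv1alpha1.Subscription{
		ObjectMeta: metav1.ObjectMeta{Name: "strimzi-kafka-operator", Namespace: namespace},
		Spec: &olmv1alpha1.SubscriptionSpec{
			CatalogSource:          "community-operators",
			CatalogSourceNamespace: "openshift-marketplace",
			Package:                "strimzi-kafka-operator",
			Channel:                "strimzi-0.35.x",
		},
	}
	return operatorGroup, subscription
}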
Force-pushed from 39eeed7 to 0f53d3b
@miq-bot add_label quinteros/yes?
Do any of the new objects need to be backed up? If so, we should add backup labels to them
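If they do, a sketch of what tagging them could look like (the label key/value here is a placeholder, not the project's actual backup label):

// Placeholder sketch: attach whatever label the backup tooling selects on.
labels := secret.GetLabels()
if labels == nil {
	labels = map[string]string{}
}
labels["manageiq.org/backup"] = "t" // placeholder key/value, not the real label
secret.SetLabels(labels)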
zookeeperResourceLimits["memory"] = "512Mi" | ||
zookeeperResourceLimits["cpu"] = "250m" | ||
|
||
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"}) |
That linked comment was about the platforms that our operator can run on. We only build for x86_64 (amd64) right now. This is determining placement for kafka and zookeeper, so it should match the platforms that they are built for.
Also, it's the same as 2 lines up
@@ -359,5 +362,10 @@ func addInternalRootCertificate(cr *miqv1alpha1.ManageIQ, d *appsv1.Deployment,
		d.Spec.Template.Spec.Containers[0].Env = addOrUpdateEnvVar(d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "MEMCACHED_ENABLE_SSL", Value: "true"})
		d.Spec.Template.Spec.Containers[0].Env = addOrUpdateEnvVar(d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "MEMCACHED_SSL_CA", Value: "/etc/pki/ca-trust/source/anchors/root.crt"})
	}
} else {
Should this be an else if checking that the manageiq-cluster-ca-cert secret exists, since we have the option to deploy without the messaging service in the CRD?
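A sketch of that guard, assuming controller-runtime's client (the secret name comes from the comment above; the helper name is made up):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// clusterCASecretExists reports whether Strimzi's cluster CA secret is present,
// so the else branch is only taken when the messaging service is deployed.
func clusterCASecretExists(ctx context.Context, c client.Client, namespace string) (bool, error) {
	secret := &corev1.Secret{}
	err := c.Get(ctx, types.NamespacedName{Name: "manageiq-cluster-ca-cert", Namespace: namespace}, secret)
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}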
Force-pushed from c023987 to 6b94a2f
CatalogSource: "community-operators", | ||
CatalogSourceNamespace: "openshift-marketplace", | ||
Package: "strimzi-kafka-operator", | ||
Channel: "strimzi-0.35.x", |
0.35 doesn't support ppc64le; 0.37.0 looks like the first version that does. Note the published 0.35.x tags only cover amd64, arm64, and s390x:
...
"0.35.0-rc1-amd64",
"0.35.0-rc1-arm64",
"0.35.0-rc1-s390x",
"0.35.0-rc1",
"0.35.0-0-amd64",
"0.35.0-0-arm64",
"0.35.0-0-s390x",
"0.35.0-0",
"0.35.0-amd64",
"0.35.0-arm64",
"0.35.0-s390x",
"0.35.0",
"0.35.1-rc1-amd64",
"0.35.1-rc1-arm64",
"0.35.1-rc1-s390x",
"0.35.1-rc1",
"0.35.1-0-amd64",
"0.35.1-0-arm64",
"0.35.1-0-s390x",
"0.35.1-0",
"0.35.1-1-amd64",
"0.35.1-1-arm64",
"0.35.1-1-s390x",
"0.35.1-1",
"0.35.1-amd64",
"0.35.1-arm64",
"0.35.1-s390x",
"0.35.1",
...
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})

kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
Suggested change:
- kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
- kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
+ kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64", "arm64", "ppc64le", "s390x"})
@bdunne arm? It can't hurt to add, but we don't need it, right?
🤷🏻♂️ It's a scheduling limitation. Since they build it, we may as well allow running there if the resources are available.
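For context, a hypothetical illustration of what an arch-based affinity helper sets (the real SetKafkaNodeAffinity lives in miqutilsv1alpha1 and may differ; the Strimzi pod-template path shown covers the kafka pods only):

// setNodeAffinity is a hypothetical stand-in for SetKafkaNodeAffinity: it pins the
// kafka pods to nodes whose kubernetes.io/arch label is in the given list, via the
// Strimzi pod template at spec.kafka.template.pod.affinity.
func setNodeAffinity(spec map[string]interface{}, archs []string) map[string]interface{} {
	values := make([]interface{}, len(archs))
	for i, a := range archs {
		values[i] = a
	}
	affinity := map[string]interface{}{
		"nodeAffinity": map[string]interface{}{
			"requiredDuringSchedulingIgnoredDuringExecution": map[string]interface{}{
				"nodeSelectorTerms": []interface{}{
					map[string]interface{}{
						"matchExpressions": []interface{}{
							map[string]interface{}{"key": "kubernetes.io/arch", "operator": "In", "values": values},
						},
					},
				},
			},
		},
	}
	kafka, _ := spec["kafka"].(map[string]interface{})
	if kafka == nil {
		kafka = map[string]interface{}{}
	}
	kafka["template"] = map[string]interface{}{"pod": map[string]interface{}{"affinity": affinity}}
	spec["kafka"] = kafka
	return spec
}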
Checked commits nasark/manageiq-pods@80790f6~...a1fb2fb with ruby 2.7.8, rubocop 1.56.3, haml-lint 0.51.0, and yamllint
Backported to
Replace Bitnami with Strimzi Kafka (cherry picked from commit 0ccfda1)
The Strimzi operator is installed from the community-operators OLM CatalogSource.
Moved miq-components/kafka.go to its own package miq-components/kafka/kafka.go; this will allow the CRs to be used elsewhere (i.e. downstream).

Depends on:

TODO:
- ca.crt in orchestrator/workers regardless of whether internal-certificate-secret is present or not

@miq-bot assign @bdunne
@miq-bot add_reviewer @Fryguy
@miq-bot add_label enhancement