Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Bitnami with Strimzi Kafka #1005

Merged
merged 2 commits into from
Jan 4, 2024
Merged

Conversation

nasark
Copy link
Member

@nasark nasark commented Oct 23, 2023

  • install Strimzi by creating a subscription from the community-operators OLM CatalogSource
  • move miq-components/kafka.go to its own package miq-components/kafka/kafka.go, this will allow the CRs to be used elsewhere (i.e. downstream)

Depends on:

TODO:

  • Since Strimzi requires SSL, need to mount the ca.crt in orchestrator/workers regardless of whether internal-certificate-secret is present or not
  • set node affinity in Kafka CR
  • drop bitnami references in CR (marked as deprecated in CR)
  • topic management to be handled by orchestrator? (not needed for now)

@miq-bot assign @bdunne
@miq-bot add_reviewer @Fryguy
@miq-bot add_label enhancement

Comment on lines 196 to 309
kafkaResourceRequests["memory"] = "1Gi"
kafkaResourceRequests["cpu"] = "200m"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdunne Is there a reason why we don't use default values for cr.Spec.KafkaMemoryRequest etc in upstream? I was trying to use those values here but its not in the CR spec

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upstream we usually only set requests and limits if cr.Spec.EnforceWorkerResourceConstraints is set to allow running on lower resource clusters

If set, we should primarily pull the values from the CR cr.Spec.KafkaMemoryLimit, cr.Spec.KafkaMemoryRequest, cr.Spec.KafkaCpuLimit, cr.Spec.KafkaCpuRequest otherwise have some default values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this should be done inside the mutate function in case the CR values are updated

Comment on lines 333 to 505
kafkaSubscriptionSpec := map[string]interface{}{
"channel": "strimzi-0.35.x",
"name": "strimzi-kafka-operator",
"source": "community-operators",
"sourceNamespace": "openshift-marketplace",
}
Copy link
Member Author

@nasark nasark Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the strimzi release to 0.35.1 to match downstream, also to prevent breaking changes that may be introduced in latest releases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to come up with an upgrade plan at some point. Do you just change the channel name?

@nasark nasark force-pushed the strimzi_kafka branch 2 times, most recently from 843154a to 3a51235 Compare October 23, 2023 21:34
@nasark nasark mentioned this pull request Oct 24, 2023
@nasark nasark force-pushed the strimzi_kafka branch 4 times, most recently from f5cadb8 to 094322f Compare November 2, 2023 16:25
Comment on lines +73 to +114
// If manual certificates are introduced after Strimzi has generated its own certificates,
// then the certificates must be "renewed" to replace the old. This process is outlined here:
// https://strimzi.io/docs/operators/in-development/deploying#proc-replacing-your-own-private-keys-str
func renewKafkaCASecret(cr *miqv1alpha1.ManageIQ, client client.Client, scheme *runtime.Scheme) error {
Copy link
Member Author

@nasark nasark Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI key change that differentiates this from downstream. Added a comment in the code as I thought it would be beneficial but I can remove if not necessary

Also see https://github.com/ManageIQ/manageiq-pods/pull/1005/files#diff-64db691596174379fdf471b9867e0698ac2c8a8fb0948b10766bf4e75688b054R312-R320

@nasark nasark changed the title [WIP] Replace Bitnami with Strimzi Kafka Replace Bitnami with Strimzi Kafka Nov 6, 2023
@miq-bot miq-bot removed the wip label Nov 6, 2023
@nasark
Copy link
Member Author

nasark commented Nov 16, 2023

@bdunne @Fryguy Please review

@bdunne bdunne closed this Nov 28, 2023
@bdunne bdunne reopened this Nov 28, 2023
@bdunne
Copy link
Member

bdunne commented Nov 28, 2023

Close/reopen to kick tests

@nasark nasark force-pushed the strimzi_kafka branch 2 times, most recently from 84432a9 to 68d0602 Compare November 28, 2023 14:39
"strconv"
)

var caGen = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store this somewhere? Every time the operator boots, we start at 0 again, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the operator is redeployed then yes, we start at 0 again which I think is fine because the certs will be cleaned up and redeployed again? Let me know if that is okay. The intent behind this global var is so that the value persists between reconciles, so if a cert needs to be renewed we can increment the previous value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the operator could be rescheduled for any reason (host going down for maintenance, resource usage, etc), so every time a new one starts, it will start at 0 and copying the certs causing kafka to be restarted could be a problem

@nasark
Copy link
Member Author

nasark commented Nov 30, 2023

Applied the following feedback:

  • Limit/requests with cr.Spec.EnforceWorkerResourceConstraints, move to mutate function
  • Use addLabel function for labels (dependent PR AddLabels util function #1020)
  • Productize names
  • Use subscription api instead of unstructured objects

@nasark nasark force-pushed the strimzi_kafka branch 2 times, most recently from 331642b to 39eeed7 Compare December 6, 2023 16:49
@nasark
Copy link
Member Author

nasark commented Dec 6, 2023

Applied the following:

  • reworked renewKafkaCASecret() and relevant code to use caGen stored in last-known-revision annotation
    • also a bug fix for updating unstructured kafka objects (explained below)
  • Check if openshift is being used by looking for openshift-marketplace namespace and relevant catalog source
  • Create operatorgroup in order to allow creation of strimzi subscription from catalogsource

Comment on lines +96 to +109
func updateKafka(cr *miqv1alpha1.ManageIQ, client client.Client, scheme *runtime.Scheme, update func(*unstructured.Unstructured) *unstructured.Unstructured) error {
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
kafka := miqutilsv1alpha1.FindKafka(client, scheme, cr.Namespace, cr.Spec.AppName)
kafka = update(kafka)

err := client.Update(context.TODO(), kafka)
return err
})
if err != nil {
return err
}

return nil
}
Copy link
Member Author

@nasark nasark Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced this function since updates are being made to the Kafka CR outside of reconciliation which can cause conflicts (metadata like creationTimestamp, uid), especially when dealing with unstructured objects since they are considered "dynamic" types. The recommended solution is to simply retry until the update succeeds. In my testing if there is a conflict the update usually succeeds on the 2nd retry.

You can read more about this here:

Comment on lines +9 to +10
olmv1 "github.com/operator-framework/api/pkg/operators/v1"
olmv1alpha1 "github.com/operator-framework/api/pkg/operators/v1alpha1"
Copy link
Member Author

@nasark nasark Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason olmv1 doesn't contain the kinds available in olmv1alpha1 but it provides OperatorGroup which is not available in olmv1alpha1, and so we need to import both packages unfortunately 🤕

@nasark nasark closed this Dec 19, 2023
@nasark nasark reopened this Dec 19, 2023
@nasark
Copy link
Member Author

nasark commented Dec 19, 2023

@miq-bot add_label quinteros/yes?

Copy link
Member

@bdunne bdunne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any of the new objects need to be backed up? If so, we should add backup labels to them

zookeeperResourceLimits["memory"] = "512Mi"
zookeeperResourceLimits["cpu"] = "250m"

kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
Copy link
Member

@bdunne bdunne Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That linked comment was about the platforms that our operator can run on. We only build for x86_64amd64 right now. This is determining placement for kafka and zookeeper, so it should match the platforms that they are built for.

zookeeperResourceLimits["memory"] = "512Mi"
zookeeperResourceLimits["cpu"] = "250m"

kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it's the same as 2 lines up

@@ -359,5 +362,10 @@ func addInternalRootCertificate(cr *miqv1alpha1.ManageIQ, d *appsv1.Deployment,
d.Spec.Template.Spec.Containers[0].Env = addOrUpdateEnvVar(d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "MEMCACHED_ENABLE_SSL", Value: "true"})
d.Spec.Template.Spec.Containers[0].Env = addOrUpdateEnvVar(d.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{Name: "MEMCACHED_SSL_CA", Value: "/etc/pki/ca-trust/source/anchors/root.crt"})
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an else if checking that the manageiq-cluster-ca-cert secret exists since we have the option to deploy without the messaging service in the CRD?

manageiq-operator/go.mod Outdated Show resolved Hide resolved
@nasark nasark force-pushed the strimzi_kafka branch 2 times, most recently from c023987 to 6b94a2f Compare January 4, 2024 16:44
CatalogSource: "community-operators",
CatalogSourceNamespace: "openshift-marketplace",
Package: "strimzi-kafka-operator",
Channel: "strimzi-0.35.x",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.35 doesn't support ppc64le, 0.37.0 looks like the first version that does:

...
        "0.35.0-rc1-amd64",
        "0.35.0-rc1-arm64",
        "0.35.0-rc1-s390x",
        "0.35.0-rc1",
        "0.35.0-0-amd64",
        "0.35.0-0-arm64",
        "0.35.0-0-s390x",
        "0.35.0-0",
        "0.35.0-amd64",
        "0.35.0-arm64",
        "0.35.0-s390x",
        "0.35.0",
        "0.35.1-rc1-amd64",
        "0.35.1-rc1-arm64",
        "0.35.1-rc1-s390x",
        "0.35.1-rc1",
        "0.35.1-0-amd64",
        "0.35.1-0-arm64",
        "0.35.1-0-s390x",
        "0.35.1-0",
        "0.35.1-1-amd64",
        "0.35.1-1-arm64",
        "0.35.1-1-s390x",
        "0.35.1-1",
        "0.35.1-amd64",
        "0.35.1-arm64",
        "0.35.1-s390x",
        "0.35.1",
...

Comment on lines 357 to 359
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})

kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64"})
kafkaCRSpec = miqutilsv1alpha1.SetKafkaNodeAffinity(kafkaCRSpec, []string{"amd64", "arm64", "ppc64le", "s390x"})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdunne arm? It can't hurt to add, but we don't need it right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷🏻‍♂️ It's a scheduling limitation. Since they build it, we may as well allow running there if the resources are available.

@miq-bot
Copy link
Member

miq-bot commented Jan 4, 2024

Checked commits nasark/manageiq-pods@80790f6~...a1fb2fb with ruby 2.7.8, rubocop 1.56.3, haml-lint 0.51.0, and yamllint
1 file checked, 1 offense detected

**

  • 💣 💥 🔥 🚒 - Linter/Yaml - missing config files

@bdunne bdunne merged commit 0ccfda1 into ManageIQ:master Jan 4, 2024
2 checks passed
@Fryguy
Copy link
Member

Fryguy commented Jan 5, 2024

Backported to quinteros in commit 22aa97a.

commit 22aa97ad00f6db6097eb1e65300aff2a894fddd5
Author: Brandon Dunne <[email protected]>
Date:   Thu Jan 4 17:18:11 2024 -0500

    Merge pull request #1005 from nasark/strimzi_kafka
    
    Replace Bitnami with Strimzi Kafka
    
    (cherry picked from commit 0ccfda19e605fa147218ce6ee3881c2f1934ca83)

Fryguy pushed a commit that referenced this pull request Jan 5, 2024
Replace Bitnami with Strimzi Kafka

(cherry picked from commit 0ccfda1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants