
Mismatch between Beyla and Loki pod logs on service.name, service.namespace, and service.instance.id #942

Open
cyrille-leclerc opened this issue Nov 22, 2024 · 11 comments

Comments

@cyrille-leclerc (Collaborator) commented Nov 22, 2024

Problem description

There is an inconsistency in critical resource attributes between the telemetry of applications instrumented with Beyla for metrics & traces and the logs emitted through k8s stdout by the Grafana K8s Helm Chart.
This inconsistency (different `service.name`, `service.namespace`, and `service.instance.id` values) breaks correlation capabilities in Grafana Cloud, particularly in Grafana Application Observability.

Example of inconsistencies

An empty cell means the attribute is not set by that source.

| Resource attribute | Trace by Beyla | Logs by Grafana K8s Monitoring Helm Chart |
| --- | --- | --- |
| host.id | 8b3ab3487d2d4b3d9ffa0b93accc6b78 | |
| host.name | authentication-deployment-7d77f7886b-6kwhn | |
| job | | dev/authentication-container |
| k8s.cluster.name | dev | dev |
| k8s.container.name | | authentication-container |
| k8s.deployment.name | authentication-deployment | |
| k8s.namespace.name | dev | dev |
| k8s.node.name | atlubun2devn2 | |
| k8s.pod.name | authentication-deployment-7d77f7886b-6kwhn | |
| k8s.pod.start_time | 2024-11-07 19:53:02 +0000 UTC | |
| k8s.pod.uid | 4caaca67-4386-4148-919c-1615e823cece | |
| k8s.replicaset.name | authentication-deployment-7d77f7886b | |
| otel.library.name | github.com/grafana/beyla | |
| pod | | authentication-deployment-7d77f7886b-6kwhn |
| 🔴 service.instance.id | beyla-ld92z-2310310 | |
| 🔴 service.name | authentication-deployment | authentication-container |
| 🔴 service.namespace | dev | |
| telemetry.sdk.language | go | |
| telemetry.sdk.name | beyla | |


Root Cause Analysis - Inconsistent naming strategies

| Resource Attribute | OTel Operator | Beyla K8s | Alloy K8s + Loki |
| --- | --- | --- | --- |
| service.name | first_non_null(<br>pod.annotation[resource.opentelemetry.io/service.name],<br>if (useLabelsForResourceAttributes) { pod.label[app.kubernetes.io/name] },<br>k8s.deployment.name,<br>k8s.replicaset.name,<br>k8s.statefulset.name,<br>k8s.daemonset.name,<br>k8s.cronjob.name,<br>k8s.job.name ) | first_non_null(<br>k8s.deployment.name,<br>?TODO? ) | first_non_null(<br>service, app, application, name,<br>pod.label[app.kubernetes.io/name],<br>container, container_name,<br>component, workload, job ) ℹ |
| service.namespace | first_non_null(<br>pod.annotation[resource.opentelemetry.io/service.namespace],<br>if (useLabelsForResourceAttributes) { pod.label[app.kubernetes.io/part-of] },<br>k8s.namespace.name ) | k8s.namespace.name | 0 |
| service.instance.id | pod.annotation[resource.opentelemetry.io/service.instance.id],<br>if (useLabelsForResourceAttributes) { pod.label[app.kubernetes.io/instance] },<br>join(k8s.namespace.name, k8s.pod.name, k8s.container.name, ".")<br>(don't use annotation or label - we want to remove it: https://github.com/open-telemetry/opentelemetry-operator/issues/3495) | TODO GENERATED? | 0 |
| service.version | first_non_null(<br>pod.annotation[resource.opentelemetry.io/service.version],<br>if (useLabelsForResourceAttributes) { pod.label[app.kubernetes.io/version] },<br>docker tag, except when it contains a `/` ) | ? | ? |
| deployment.environment.name | first_non_null(<br>pod.annotation[resource.opentelemetry.io/deployment.environment.name] ) | 0 | 0 |
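To make the OTel Operator column concrete, here is an illustrative Python sketch of its `first_non_null` fallback chain for `service.name`. The `pod` dict shape and helper names are hypothetical, invented for this sketch; they are not the operator's real API (the actual logic lives in the operator's `pkg/instrumentation` package):

```python
def first_non_null(*candidates):
    """Return the first candidate that is set (truthy), else None."""
    for c in candidates:
        if c:
            return c
    return None


def operator_service_name(pod: dict, use_labels_for_resource_attributes: bool = False):
    """Fallback chain for service.name, mirroring the OTel Operator
    column in the table above (annotation, optional label, then owner
    workload names)."""
    annotations = pod.get("annotations", {})
    labels = pod.get("labels", {})
    return first_non_null(
        annotations.get("resource.opentelemetry.io/service.name"),
        labels.get("app.kubernetes.io/name") if use_labels_for_resource_attributes else None,
        pod.get("deployment_name"),
        pod.get("replicaset_name"),
        pod.get("statefulset_name"),
        pod.get("daemonset_name"),
        pod.get("cronjob_name"),
        pod.get("job_name"),
    )


pod = {
    "labels": {"app.kubernetes.io/name": "auth"},
    "deployment_name": "authentication-deployment",
}
# useLabelsForResourceAttributes defaults to false, so the label is skipped:
print(operator_service_name(pod))        # → authentication-deployment
print(operator_service_name(pod, True))  # → auth
```

This also illustrates the root cause of the mismatch: each column in the table implements a *different* chain over different inputs, so the same pod can resolve to different `service.name` values.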

@petewall (Collaborator)

Trying to determine how to resolve this:

- `service.namespace` is obvious.
- `service.name` appears to be the deployment name, whereas the Loki source only sees pods and containers.
- `service.instance.id` appears to be the Beyla pod. Is that even the right value? What should it be?

@cyrille-leclerc (Collaborator, Author)

Thanks @petewall. I started brainstorming with @zeitlinger and plan to involve the Beyla team next week.

@mariomac (Contributor)

Since Beyla 1.9 (released Nov 25th), the instance id is `<pod id>:<container name>`.
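A minimal sketch of that format, assuming (per the comment above) that Beyla ≥ 1.9 simply joins the pod id and container name with a colon; the function name and example values are hypothetical:

```python
def beyla_instance_id(pod_id: str, container_name: str) -> str:
    """Compose the service.instance.id used by Beyla >= 1.9:
    '<pod id>:<container name>'."""
    return f"{pod_id}:{container_name}"


print(beyla_instance_id("authentication-deployment-7d77f7886b-6kwhn",
                        "authentication-container"))
# → authentication-deployment-7d77f7886b-6kwhn:authentication-container
```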

@zeitlinger (Member)

@cyrille-leclerc I just noticed that the operator uses the docker image tag to determine `service.version` - I think we should also add that to our spec: https://github.com/open-telemetry/opentelemetry-operator/blob/2389f9441912835fbd2af00d26dd76d6c1dae545/pkg/instrumentation/sdk.go#L461
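The idea behind the linked code, as summarized in the table ("docker tag, except when it contains a `/`"): take the segment after the last `:` of the container image reference, unless that segment contains a `/`, in which case the `:` separated a registry host from its port rather than marking a tag. A hedged Python sketch of just that rule (digest references like `image@sha256:...` are ignored here and may be handled differently by the real operator code):

```python
def service_version_from_image(image: str):
    """Derive a service.version candidate from a container image reference:
    the tag after the last ':', unless that segment contains '/'
    (then the ':' belonged to a registry port, not a tag)."""
    parts = image.rsplit(":", 1)
    if len(parts) < 2 or "/" in parts[1]:
        return None  # no tag present, or ':' was a registry port separator
    return parts[1]


print(service_version_from_image("ghcr.io/acme/auth:1.2.3"))  # → 1.2.3
print(service_version_from_image("registry:5000/acme/auth"))  # → None (port, not tag)
print(service_version_from_image("acme/auth"))                # → None (no tag)
```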

@zeitlinger (Member)

@cyrille-leclerc (Collaborator, Author)

Thanks, can you please update the doc?

@zeitlinger (Member)

> Thanks, can you please update the doc?

done

@mariomac (Contributor)

I guess I can assume that `useLabelsForResourceAttributes` defaults to false, right?

https://github.com/open-telemetry/opentelemetry-operator/blob/main/apis/v1alpha1/instrumentation_types.go#L149C1-L157C2

@zeitlinger (Member)

> I guess I can assume that `useLabelsForResourceAttributes` defaults to false, right?

correct

@zeitlinger (Member)

Here's the PR to disable the label for `service.instance.id`: open-telemetry/opentelemetry-operator#3497

@zeitlinger (Member)

@mariomac I'm happy to review a PR if you add the logic to beyla
