Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Migrate from OpenTracing to OpenTelemetry #524

Closed
wants to merge 10 commits into from

Conversation

HarryET
Copy link
Contributor

@HarryET HarryET commented Jun 30, 2022

What kind of change does this PR introduce?

This PR migrates from Open Tracing to Open Telemetry. This retains DataDog support through Open Telemetry GRPC and HTTP collectors while also adding support directly for Prometheus and indirectly for anything that support the Open Telemetry Collectors e.g. AWS CloudWatch and Jager

What is the current behavior?

Uses the deprecated Open Tracing and only supports Datadog

What is the new behavior?

Uses Open Telemetry and supports a variety of tracing and metrics providers. Also gives the opportunity of adding metrics to GoTrue.

Additional context

@hf
Copy link
Contributor

hf commented Jul 1, 2022

This is really cool. I'll try to review it in a few days!

@hf hf self-requested a review July 1, 2022 15:44
@HarryET
Copy link
Contributor Author

HarryET commented Jul 10, 2022

Hey @hf, any update on this?

Copy link
Contributor

@hf hf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @HarryET,

Great work here. I'm leaving some comments on the code. We would really appreciate it if you could update the README or write some other doc on how to turn on and have an end-to-end visualization of the traces.

Internally at this time we don't collect metrics or traces from GoTrue instances, and we've got no way to test this in a production setting, so that is something we need more focus on to accept the PR.

Let me know if you need any help doing this.

conf/tracing.go Outdated Show resolved Hide resolved

switch tc.Exporter {
case OtlpGrpcExporter:
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this have a timeout context? While below for the HTTP one there is only a background context? Does this handle SIGTERM / SIGINT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Http is a push model, while GRPC is a 2 way sustained connection so if there isn't a timeout it could block startup when it should just be throwing the error earlier

api/api.go Outdated
Comment on lines 52 to 84
// Open Tracing GRPC requires some extra deferrals!
if a.config.Tracing.OtlpMetricExporter != nil {
defer a.config.Tracing.ContextCancel() // Defer GRPC Context Cancel
d := a.config.Tracing.OtlpMetricExporter
baseContext := context.Background()
defer func() {
ctx, cancel := context.WithTimeout(baseContext, time.Second)
defer cancel()
if err := d.Shutdown(ctx); err != nil {
otel.Handle(err)
}
}()
pusher := otelglobal.MeterProvider().(*otelcontroller.Controller)
if err := pusher.Start(baseContext); err != nil {
log.Fatalf("could not start metric controller: %v", err)
}
defer func() {
ctx, cancel := context.WithTimeout(baseContext, time.Second)
defer cancel()
// pushes any last exports to the receiver
if err := pusher.Stop(ctx); err != nil {
otel.Handle(err)
}
}()
}
if a.config.Tracing.TracingShutdown != nil {
defer func() {
if err := a.config.Tracing.TracingShutdown(context.Background()); err != nil {
log.Fatal("failed to shutdown TracerProvider: %w", err)
}
}()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this whole section can be extracted into one function that calls multiple functions, as the readability of all the defers and contexts is quite low.

otlpMetrics,
),
controller.WithExporter(otlpMetrics),
controller.WithCollectPeriod(2*time.Second),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be made configurable?

conf/tracing.go Outdated Show resolved Hide resolved
customAttributes = append(customAttributes, semconv.ServiceNameKey.String(tc.ServiceName))
resourceAttributes := resource.WithAttributes(customAttributes...)

switch tc.Exporter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split this giant switch into one that calls into different functions that do the setup?

@HarryET
Copy link
Contributor Author

HarryET commented Aug 16, 2022

👋🏼 As requested in Slack here is a link to how we were using it in production with AWS, a similar approach can be used for testing; Code.

We plan to add it back in the future and then you could use the entire terraform config to spin up a working text example you will want to switch the gotrue and gotrue proxy containers out though. or just use this as a basis for a docker-compose file

@hf
Copy link
Contributor

hf commented Sep 12, 2022

Hey @HarryET we've decided to try and focus on this this week. I'll be taking over your changes and making them production ready if that's all right. Check back in a few days!

@HarryET
Copy link
Contributor Author

HarryET commented Sep 12, 2022

Hey @hf that sounds awesome, might be worth a few messages on slack with some known issues I have? I'll reach out today and thanks for working on this; excited to see it in production!

@hf
Copy link
Contributor

hf commented Sep 12, 2022

Closing this PR as the work will continue in #679. Thanks @HarryET!

@hf hf closed this Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants