I'm using OpenTelemetry in a Go project to trace requests across multiple microservices. My goal is to keep a consistent trace_id for debugging across an API server (a Gin server) and a database service, which communicate over gRPC. Despite following the OpenTelemetry and gRPC documentation for context propagation, the trace_ids differ between the two services.
I've simplified the code for brevity, focusing on the relevant OpenTelemetry and gRPC setup:
API Server Setup (Caller):
// Init router...
s.router.Use(otelgin.Middleware(os.Getenv("APP_NAME")))
// ... Set connection
func getDBConnection(ctx context.Context) (*grpc.ClientConn, error) {
    // Set up the connection with gRPC options, including the Datadog
    // client interceptor and the OpenTelemetry stats handler
    ddTraceInterceptor := grpctrace.UnaryClientInterceptor(grpctrace.WithServiceName("apiserver"))
    return grpc.DialContext(
        ctx, // Propagate context
        "db_service_endpoint",
        grpc.WithUnaryInterceptor(ddTraceInterceptor),
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithStatsHandler(otelgrpc.NewClientHandler()))
}
Database Service Setup (Callee):
func main() {
    // Simplified gRPC server initialization with OpenTelemetry instrumentation
    grpcServer := grpc.NewServer(
        grpc.StatsHandler(otelgrpc.NewServerHandler()))
    // Service registration omitted for brevity
}
TracerProvider initialization (both server and client side):
tp := sdkTrace.NewTracerProvider()
otel.SetTracerProvider(tp)
defer func() { _ = tp.Shutdown(ctx) }()
otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}))
Both the API server and the database service initialize OpenTelemetry with a TracerProvider and use the NewCompositeTextMapPropagator for context propagation.
Despite this setup, when tracing a request that flows from the API server to the database service, the trace_id logged in the database service does not match the trace_id from the API server.
I expected the trace_id to be consistent across these calls for end-to-end tracing. What might be causing this discrepancy, and how can I ensure the trace_id remains the same across microservices?
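For reference when comparing the two services: the TraceContext propagator carries the trace ID in a W3C `traceparent` header of the form `version-traceid-spanid-flags`, and with gRPC that header travels as request metadata. Below is a minimal, stdlib-only sketch of splitting such a value so the trace_id on each side can be compared by hand. `parseTraceParent` and the `traceParent` type are hypothetical helpers written for illustration, not part of OpenTelemetry, and the sample header value is the example from the W3C specification rather than output from my services.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// traceParent holds the fields of a W3C Trace Context "traceparent" header.
type traceParent struct {
	Version string
	TraceID string // 32 hex characters
	SpanID  string // 16 hex characters
	Sampled bool   // lowest bit of the flags field
}

// parseTraceParent is a hypothetical helper (not an OpenTelemetry API) that
// splits a traceparent value into its four dash-separated fields.
func parseTraceParent(header string) (traceParent, error) {
	parts := strings.Split(header, "-")
	if len(parts) != 4 {
		return traceParent{}, errors.New("traceparent: expected 4 dash-separated fields")
	}
	if len(parts[0]) != 2 || len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return traceParent{}, errors.New("traceparent: unexpected field length")
	}
	// Every field must be valid hexadecimal.
	for _, p := range parts {
		if _, err := hex.DecodeString(p); err != nil {
			return traceParent{}, fmt.Errorf("traceparent: field %q is not hex: %w", p, err)
		}
	}
	// All-zero trace and span IDs are defined as invalid by the specification.
	if parts[1] == strings.Repeat("0", 32) || parts[2] == strings.Repeat("0", 16) {
		return traceParent{}, errors.New("traceparent: all-zero trace or span ID")
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return traceParent{
		Version: parts[0],
		TraceID: parts[1],
		SpanID:  parts[2],
		Sampled: flags[0]&0x01 == 0x01,
	}, nil
}

func main() {
	// Example value taken from the W3C Trace Context specification.
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("trace_id=%s sampled=%t\n", tp.TraceID, tp.Sampled)
}
```

If the `traceparent` value that reaches the database service decodes to the same trace_id as the one logged in the API server, the header is propagating and the mismatch is introduced elsewhere; if it is missing or different, the problem is on the calling side.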
Additional Context:
- Both services are standalone Go applications.
- OpenTelemetry SDK and instrumentation versions are compatible across both services.
- No additional middleware or interceptors alter the context, apart from the Datadog interceptor shown above.