I have a basic OpenTelemetry setup, consisting of:

  1. A deployment running a Tomcat web server (with the OTel Java agent attached at runtime)
  2. Multiple deployments running different microservices (each with the OTel Java agent attached via Java options)
  3. A Jaeger collector backed by Elasticsearch storage, and a Jaeger query service for viewing the traces.

When I call the microservices directly (and they in turn call other microservices), I see the expected traces and spans: a long succession of calls, including MongoDB, Elasticsearch etc., as well as the calls to other microservices and their instrumentation. All of this appears under a single trace, which is exactly what I expected.

Now I want to analyze my entire system end-to-end, starting from the calls on the web server and ending with the microservice calls.

However, the Java agent on my web server only captures the incoming GET and POST calls (with the correct HTTP target etc.) and does not connect them to the microservice calls. I can see the HTTP request trace from the Tomcat server (with a duration that matches the ENTIRE call, e.g. 7s or 8s) and, INDEPENDENTLY, the microservice call trace at a similar timestamp, but not the two together as a single trace (I expected web server -> microservice -> microservice etc.).

No custom sampling is configured and almost all parameters are left at their defaults. (I also tried the "always_on" sampler instead of the default parent-based "always_on", but it made no difference.)

I would like to understand how I can fix this:

  1. What might be the reason the calls are not getting connected?
  2. How exactly does the OTel Java agent propagate context, and at what point does it inject its context information? (I ask because I noticed some filters being applied on my Tomcat server and wonder whether they could be the reason the context is lost.)

Please help me with this, and let me know if any more information is required.

There is 1 answer below

The issue you are describing is called context propagation. Since you are using auto-instrumentation, you have to make sure that the libraries and frameworks your Java application uses are supported by the instrumentation; otherwise you have to propagate the context manually. Here is the list of supported libraries and frameworks: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/docs/supported-libraries.md

Are your other services using a supported library? Are the other microservices instrumented with OTel in the same way?
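
To illustrate what context propagation looks like on the wire: with its default propagators (`tracecontext,baggage`), the agent's HTTP client instrumentation adds a W3C `traceparent` header to each outgoing request, and the server instrumentation on the receiving side reads that header to continue the same trace. The sketch below is not the agent's code; it is a plain-Java illustration of the header format, and the class and method names (`TraceParentDemo`, `inject`, `extractTraceId`) are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: hand-rolled handling of the W3C "traceparent" header.
// The agent does the equivalent internally; all names here are hypothetical.
class TraceParentDemo {

    // Format: version "00" - 32 hex trace id - 16 hex parent span id - 2 hex flags
    private static final Pattern TRACEPARENT =
            Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    /** Caller side: write the current trace and span IDs into the outgoing headers. */
    static void inject(Map<String, String> headers, String traceId, String spanId, boolean sampled) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00"));
    }

    /** Callee side: return the caller's trace ID, or null if no valid header arrived. */
    static String extractTraceId(Map<String, String> headers) {
        // Real HTTP header names are case-insensitive; this sketch assumes lowercase keys.
        String value = headers.get("traceparent");
        if (value == null) {
            return null;
        }
        Matcher m = TRACEPARENT.matcher(value);
        if (!m.matches()) {
            return null;
        }
        // All-zero trace or span IDs are invalid according to the W3C spec.
        if (m.group(1).matches("0+") || m.group(2).matches("0+")) {
            return null;
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        String spanId = "00f067aa0ba902b7";

        // Case 1: the header reaches the downstream service, so it joins the trace.
        Map<String, String> propagated = new HashMap<>();
        inject(propagated, traceId, spanId, true);
        System.out.println("header sent: " + propagated.get("traceparent"));
        System.out.println("downstream continues trace: " + extractTraceId(propagated));

        // Case 2: the header never arrives, so the downstream service has no parent
        // and would start a brand-new, disconnected trace.
        Map<String, String> lost = new HashMap<>();
        System.out.println("without header: " + extractTraceId(lost));
    }
}
```

If the request that reaches your microservice carries no `traceparent` header (for example because Tomcat calls it through an HTTP client the agent does not instrument, or because something in between drops the header), the microservice has no parent to attach to and starts a new trace, which would match what you are seeing. Logging the incoming request headers on the first microservice is a quick way to check this.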