We are using the OpenTelemetry SDK in our ASP.NET and WCF applications (.NET Framework).
On my test server, I run the OTel Collector agent, configured to send traces to Grafana Cloud, and the trace data shows up on the dashboard.
We want to deploy the OTel Collector agent to production. Our production environment has 4 servers in each region on AWS.
For now, we run the agent on all 4 servers in each region and have enabled public access on all of them to check the traces. Normally, however, we don't allow public access on every server; only a limited number of servers are configured with public access.
Scenario 1:
Consider that 3 servers do not have public access and 1 server does.
I tried to send the trace data from servers 1, 2, and 3 to server 4, and have server 4 forward the traces to Grafana Cloud, but servers 1, 2, and 3 log the error shown below.
Server 1, 2, 3 config:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318
exporters:
  debug:
    verbosity: normal
  otlp:
    endpoint: 172.31.10.36:4317
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
  telemetry:
    metrics:
      address: 0.0.0.0:9999
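One thing worth checking in the exporter above: the collector's `otlp` exporter uses TLS by default, while the server 4 receiver as configured has no TLS settings. A minimal sketch of the exporter with TLS disabled, assuming the traffic between the servers stays inside the private network:

```yaml
exporters:
  otlp:
    # Private address of server 4, taken from the config above.
    endpoint: 172.31.10.36:4317
    tls:
      # Assumption: server 4's OTLP receiver has no TLS configured,
      # so the exporter must not attempt a TLS handshake.
      insecure: true
```

A TLS mismatch typically surfaces as a handshake error rather than a timeout, so this is a separate check from basic network connectivity.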
Server 4 config:
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlp:
    endpoint: tempo-prod-us-east-0.grafana.net:443
    headers:
      authorization: Basic <API key base 64 string>
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
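Since the receiver on server 4 does not set an endpoint, its bind address depends on the collector's default, which in recent collector releases is `localhost` rather than all interfaces. A sketch of the server 4 receiver with explicit endpoints, assuming it should accept OTLP from the other servers over the private network:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        # Listen on all interfaces so servers 1, 2, 3 can connect.
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
```

Inbound TCP 4317 (and 4318 if HTTP is used) would also need to be allowed from servers 1, 2, and 3 in server 4's firewall or AWS security group. A dial timeout, as opposed to a refused connection, often points to such a rule dropping the traffic.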
Error logged on servers 1, 2, 3:
2024-03-06T12:46:06.857Z warn zapgrpc/zapgrpc.go:195 [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "xx.xx.xx.xx:4317", ServerName: "xx.xx.xx.xx:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp xx.xx.xx.xx:4317: i/o timeout" {"grpc_log": true}
Scenario 2:
All servers have public access, and the OTel Collector agent runs on every server.
Is there a suggested or recommended way to run the OTel Collector agents for the scenarios above?