I'm trying to get Grafana Loki & Promtail working in my environment. Our goal is to pull information from /var/logs/haproxy.log to track the traffic hitting each of our servers — specifically client IP addresses, so we can graph them over time. HAProxy has an exporter service that historically works with Prometheus; however, we're unable to set up the exporter due to specific security requirements on our end. Additionally, that would involve a reboot that we don't want to do at the moment. So we've discovered Grafana's Loki, which can pull the raw log, but it's up to us to design a proper regular-expression configuration that extracts the information we want.
Long story long, I've managed to get Loki set up without much of an issue, and the same goes for Promtail. However, I ran into a problem configuring Promtail to grab the information we want from our log files. I found a regular expression somebody else had written, along with some labels, but the labeling doesn't work properly in Grafana, so I'm kind of stuck. Below is the Promtail config file with two stages: one to parse the log data and one to label said data.
I'm not sure this is the best way to approach this, but I'm stuck and don't know what to do. Is there a better way to grab the specific information I want from the HAProxy logs? Or can anyone help me write a regular expression and labels for the information I want?
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  # Promtail records log offsets in this file; it is updated every time the file is scraped.
  # Even if the service goes down, the next restart resumes from the offsets recorded here.
  filename: /tmp/positions.yaml

clients:
  - url: http://10.10.140.53:3100/loki/api/v1/push

scrape_configs:
  # The following section is very similar to a Prometheus scrape config.
  - job_name: system
    static_configs:
      - targets:
          - 10.10.140.53
        labels:
          # Static labels: every log line under this job carries at least these two labels.
          job: haproxy
          # The HAProxy log lines don't carry a standard log level, so Loki can't work one out.
          # If the logs contained a standard level marker, this label would be unnecessary.
          level: info
          __path__: /var/log/*log.1
    pipeline_stages:
      - match:
          # This part processes the logs; the selector filters the matching log streams.
          selector: '{job="haproxy"}'
          stages:
            - regex:
                # RE2-format regular expression; ?P<XXX> captures the matched part into a variable.
                expression: '^[^[]+\s+(\w+)\[(\d+)\]:([^:]+):(\d+)\s+\[([^\]]+)\]\s+[^\s]+\s+(\w+)\/(\w+)\s+(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d*)\s+(\d+)\s+(\d+)\s+([^\s]+)\s+([^\s]+)\s+([^\s]+)\s(\d+)\/(\d+)\/(\d+)\/(\d+)\/(\d+)\s+(\d+)\/(\d+)\s+\{([^}]*)\}\s\{([^}]*)\}\s+\"([^"]+)\"$'
            # Filter out the timestamp and other unwanted records, emitting the output variable.
            - output:
                source: output
      - match:
          # The same selector as before, but the log stream has already been processed once,
          # so the unwanted information is gone.
          selector: '{job="haproxy"}'
          stages:
            - regex:
                expression: 'frontend:(?P<frontend>\S+) haproxy:(?P<haproxy>\S+) client:(?P<client>\S+) method:(?P<method>\S+) code:(?P<code>\S+) url:(?P<url>\S+) destination:(?P<destination>\S+)}?$'
            - labels:
                # Dynamic label generation
                frontend:
                method:
                code:
                destination:
Here's a log file example:
Dec 13 11:35:50 haproxy haproxy[8733]: frontend:app_https_frontend/haproxy/10.10.150.53:443 client:111.222.333.444:38034 GMT:13/Dec/2022:11:35:50 +0000 body:- request:GET /bower_components/modernizr/modernizr.js?v=3.30.0 HTTP/1.1
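For what it's worth, here's a quick sketch of how I've been sanity-checking regexes against that sample line outside of Promtail. The pattern and field names (frontend, client_ip, etc.) are my own guesses written against the one sample line above, not anything official; since Python's named-group syntax here is also valid RE2, a pattern that passes this test should drop into a Promtail regex stage unchanged.

```python
import re

# Candidate pattern written against the sample log line below.
# Field names are arbitrary; adjust to whatever labels you want in Loki.
HAPROXY_LINE = re.compile(
    r'frontend:(?P<frontend>\S+)\s+'
    r'client:(?P<client_ip>[\d.]+):(?P<client_port>\d+)\s+'
    r'GMT:(?P<timestamp>\S+ [+-]\d{4})\s+'
    r'body:(?P<body>\S+)\s+'
    r'request:(?P<method>\S+) (?P<url>\S+) (?P<protocol>\S+)'
)

sample = ('Dec 13 11:35:50 haproxy haproxy[8733]: '
          'frontend:app_https_frontend/haproxy/10.10.150.53:443 '
          'client:111.222.333.444:38034 GMT:13/Dec/2022:11:35:50 +0000 '
          'body:- request:GET /bower_components/modernizr/modernizr.js?v=3.30.0 HTTP/1.1')

m = HAPROXY_LINE.search(sample)
print(m.group('client_ip'))  # 111.222.333.444
print(m.group('method'), m.group('url'))
```

Note that the first-stage expression in the config above expects the standard HAProxy HTTP log format, which doesn't look like the sample line, so testing candidate patterns this way before restarting Promtail saves a lot of iteration.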