How to pipeline regex and JSON processors on log fields in Google Ops Agent?

/etc/google-cloud-ops-agent/config.yaml

logging:
  receivers:
    app:
      type: files
      include_paths: [/www/app-*.log]
  processors:
    monolog:
      type: parse_regex
      field: message
      regex: "^\[(?<time>[^\]]+)\]\s+(?<environment>\w+)\.(?<severity>\w+):\s+(?<msg>.*?)(?<context>{.*})?\s*$"
    context:
      type: parse_json
      field: context
  service:
    pipelines:
      default_pipeline:
        receivers: [app]
        processors: [monolog,context]

I am trying to configure the pipeline so that:

  • first, cut out the appropriate fragments in flat text
  • then format one of those fragments as JSON.
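To make the goal concrete, this is how the monolog regex is meant to split a line. The sample line below is a hypothetical Monolog-style entry, not one of my real logs, and the check uses Python's re module (which spells named groups (?P<name>...) instead of (?<name>...), but is otherwise compatible):

```python
import json
import re

# Same pattern as the "monolog" processor, with Python's (?P<name>...) syntax.
MONOLOG = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<environment>\w+)\.(?P<severity>\w+):"
    r"\s+(?P<msg>.*?)(?P<context>{.*})?\s*$"
)

# Hypothetical Monolog line with a trailing JSON context.
line = '[2024-05-01 12:00:00] production.ERROR: Payment failed {"order_id": 42}'

fields = MONOLOG.match(line).groupdict()
print(fields["severity"])             # ERROR
print(json.loads(fields["context"]))  # {'order_id': 42}
```

The intent is that the second processor should then parse the context group as JSON while the time, environment, severity, and msg fields survive.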

However, it doesn't work. The resulting log contains only the JSON parsed from that fragment; all the other fields are dropped. What am I doing wrong, and how can I fix it? The Fluent Bit documentation does not help, and Google's documentation gives minimal information.

There is 1 answer below


I have VMs running Docker Compose images on Google Cloud. The containers write their logs to /var/log/syslog in the following format, and that is how they arrive in Logs Explorer (formerly Stackdriver):

Mar  8 20:32:12 service-instance-beta-f14506b-1 docker[4020]: service-cron-beta-3    | {"message":"subscribing: TASK_CANCELLATION_BETA_CRON_SUB","severity":"INFO","timestamp":{"seconds":1709929932,"nanos":973725672}}

I then set out to build a regex that structures that log entry into JSON. I tested it on https://rubular.com and it works:

The regex: ^(?<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(?<host>[\w-]+)\s[\w-]+\[\d+\]:\s(?<service>[\w-]+)\s+\|\s(?<message>{.*})$
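A quick way to sanity-check the regex outside rubular.com is Python's re module, using the exact log line above. The only change needed is Python's (?P<name>...) named-group syntax:

```python
import json
import re

PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(?P<host>[\w-]+)"
    r"\s[\w-]+\[\d+\]:\s(?P<service>[\w-]+)\s+\|\s(?P<message>{.*})$"
)

# The syslog line quoted above, verbatim.
line = ('Mar  8 20:32:12 service-instance-beta-f14506b-1 docker[4020]: '
        'service-cron-beta-3    | {"message":"subscribing: TASK_CANCELLATION_BETA_CRON_SUB",'
        '"severity":"INFO","timestamp":{"seconds":1709929932,"nanos":973725672}}')

fields = PATTERN.match(line).groupdict()
print(fields["host"])     # service-instance-beta-f14506b-1
print(fields["service"])  # service-cron-beta-3
print(json.loads(fields["message"])["severity"])  # INFO
```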


How does it work in the config?

First, a processor of type parse_regex splits the flat log entry into structured named fields.

Then, a second processor of type parse_json takes the extracted message field, which still holds the JSON fragment as a string, and injects the parsed result into the jsonPayload object of the Stackdriver log entry.
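The net effect of the two processors can be sketched in Python. This is an illustration of the intent only, not the agent's actual implementation, and the JSON payload here is a shortened hypothetical one:

```python
import json
import re

# Step 1: parse_regex -- split the flat syslog line into named fields.
# (Python uses (?P<name>...) for the config's (?<name>...) groups.)
PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(?P<host>[\w-]+)"
    r"\s[\w-]+\[\d+\]:\s(?P<service>[\w-]+)\s+\|\s(?P<message>{.*})$"
)
line = ('Mar  8 20:32:12 service-instance-beta-f14506b-1 docker[4020]: '
        'service-cron-beta-3    | {"status":"ok"}')
record = PATTERN.match(line).groupdict()

# Step 2: parse_json -- replace the message field's JSON string with
# structured data, keeping the other extracted fields alongside it.
record["message"] = json.loads(record["message"])

print(record)
```

The key point is that after both steps the record still carries timestamp, host, and service next to the parsed message.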

This is the final working configuration:

logging:
  receivers:
    syslog:
      type: files
      include_paths:
      - /var/log/syslog
  processors:
    extract_json:
      type: parse_regex
      field: message
      regex: "/^(?<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s(?<host>[\w-]+)\s[\w-]+\[\d+\]:\s(?<service>[\w-]+)\s+\|\s(?<message>{.*})$/"
    parse_message:
      type: parse_json
      field: message
  service:
    pipelines:
      default_pipeline:
        receivers: [syslog]
        processors: [extract_json,parse_message]