Fluent Bit on OpenShift 4 - can't read from /var/log/

I am experiencing an issue with the installation of Fluent Bit as a DaemonSet on my OpenShift 4.13 cluster. The scenario is the following.

We are using the official Helm chart from Fluent Bit, with the values reported below:

# Default values for fluent-bit.

# kind -- DaemonSet or Deployment
kind: DaemonSet

# replicaCount -- Only applicable if kind=Deployment
replicaCount: 1

image:
  repository: cr.fluentbit.io/fluent/fluent-bit
  # Overrides the image tag whose default is {{ .Chart.AppVersion }}
  # Set to "-" to not use the default value
  tag: 2.2.0-debug
  digest:
  pullPolicy: Always

testFramework:
  enabled: true
  namespace:
  image:
    repository: busybox
    pullPolicy: Always
    tag: latest
    digest:

imagePullSecrets:
  - name: ecr-docker-secret
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  name: fluentbit-sa
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<account_id>:role/<role_name>"
    eks.amazonaws.com/audience: "sts.amazonaws.com"
    eks.amazonaws.com/sts-regional-endpoints: "true"
    eks.amazonaws.com/token-expiration: "86400"

rbac:
  create: true
  nodeAccess: false
  eventsAccess: false

# Configure podsecuritypolicy
# Ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
# from Kubernetes 1.25, PSP is deprecated
# See: https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes
# We automatically disable PSP if Kubernetes version is 1.25 or higher
podSecurityPolicy:
  create: false
  annotations: {}

# OpenShift-specific configuration
openShift:
  enabled: true
  securityContextConstraints:
    # Create an SCC for Fluent Bit and allow using it
    create: false
    name: ""
    annotations: {}
    # Use an existing SCC in the cluster, rather than creating a new one
    existingName: "fluentbit-scc"

podSecurityContext:
  privileged: true
  runAsUser: 0
  fsGroup: 0

hostNetwork: false
dnsPolicy: ClusterFirst

dnsConfig: {}
#   nameservers:
#     - 1.2.3.4
#   searches:
#     - ns1.svc.cluster-domain.example
#     - my.dns.search.suffix
#   options:
#     - name: ndots
#       value: "2"
#     - name: edns0

hostAliases: []
#   - ip: "1.2.3.4"
#     hostnames:
#     - "foo.local"
#     - "bar.local"

securityContext: {}
#   capabilities:
#     drop:
#     - ALL
#   readOnlyRootFilesystem: true
#   runAsNonRoot: true
#   runAsUser: 1000

service:
  type: ClusterIP
  port: 2020
  loadBalancerClass:
  loadBalancerSourceRanges: []
  labels: {}
  # nodePort: 30020
  # clusterIP: 172.16.10.1
  annotations: {}
#   prometheus.io/path: "/api/v1/metrics/prometheus"
#   prometheus.io/port: "2020"
#   prometheus.io/scrape: "true"

serviceMonitor:
  enabled: false
  #   namespace: monitoring
  #   interval: 10s
  #   scrapeTimeout: 10s
  #   selector:
  #    prometheus: my-prometheus
  #  ## metric relabel configs to apply to samples before ingestion.
  #  ##
  #  metricRelabelings:
  #    - sourceLabels: [__meta_kubernetes_service_label_cluster]
  #      targetLabel: cluster
  #      regex: (.*)
  #      replacement: ${1}
  #      action: replace
  #  ## relabel configs to apply to samples after ingestion.
  #  ##
  #  relabelings:
  #    - sourceLabels: [__meta_kubernetes_pod_node_name]
  #      separator: ;
  #      regex: ^(.*)$
  #      targetLabel: nodename
  #      replacement: $1
  #      action: replace
  #  scheme: ""
  #  tlsConfig: {}

  ## Bear in mind that if you want to collect metrics from a different port
  ## you will need to configure the new ports on the extraPorts property.
  additionalEndpoints: []
  # - port: metrics
  #   path: /metrics
  #   interval: 10s
  #   scrapeTimeout: 10s
  #   scheme: ""
  #   tlsConfig: {}
  #   # metric relabel configs to apply to samples before ingestion.
  #   #
  #   metricRelabelings:
  #     - sourceLabels: [__meta_kubernetes_service_label_cluster]
  #       targetLabel: cluster
  #       regex: (.*)
  #       replacement: ${1}
  #       action: replace
  #   # relabel configs to apply to samples after ingestion.
  #   #
  #   relabelings:
  #     - sourceLabels: [__meta_kubernetes_pod_node_name]
  #       separator: ;
  #       regex: ^(.*)$
  #       targetLabel: nodename
  #       replacement: $1
  #       action: replace

prometheusRule:
  enabled: false
#   namespace: ""
#   additionalLabels: {}
#   rules:
#   - alert: NoOutputBytesProcessed
#     expr: rate(fluentbit_output_proc_bytes_total[5m]) == 0
#     annotations:
#       message: |
#         Fluent Bit instance {{ $labels.instance }}'s output plugin {{ $labels.name }} has not processed any
#         bytes for at least 15 minutes.
#       summary: No Output Bytes Processed
#     for: 15m
#     labels:
#       severity: critical

dashboards:
  enabled: false
  labelKey: grafana_dashboard
  labelValue: 1
  annotations: {}
  namespace: ""

lifecycle: {}
#   preStop:
#     exec:
#       command: ["/bin/sh", "-c", "sleep 20"]

livenessProbe:
  httpGet:
    path: /
    port: http

readinessProbe:
  httpGet:
    path: /api/v1/health
    port: http

resources: {}
#   limits:
#     cpu: 100m
#     memory: 128Mi
#   requests:
#     cpu: 100m
#     memory: 128Mi

## only available if kind is Deployment
ingress:
  enabled: false
  ingressClassName: ""
  annotations: {}
  #  kubernetes.io/ingress.class: nginx
  #  kubernetes.io/tls-acme: "true"
  hosts: []
  # - host: fluent-bit.example.tld
  extraHosts: []
  # - host: fluent-bit-extra.example.tld
  ## specify extraPort number
  #   port: 5170
  tls: []
  #  - secretName: fluent-bit-example-tld
  #    hosts:
  #      - fluent-bit.example.tld

## only available if kind is Deployment
autoscaling:
  vpa:
    enabled: false

    annotations: {}

    # List of resources that the vertical pod autoscaler can control. Defaults to cpu and memory
    controlledResources: []

    # Define the max allowed resources for the pod
    maxAllowed: {}
    # cpu: 200m
    # memory: 100Mi
    # Define the min allowed resources for the pod
    minAllowed: {}
    # cpu: 200m
    # memory: 100Mi

    updatePolicy:
      # Specifies whether recommended updates are applied when a Pod is started and whether recommended updates
      # are applied during the life of a Pod. Possible values are "Off", "Initial", "Recreate", and "Auto".
      updateMode: Auto

  enabled: false
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 75
  #  targetMemoryUtilizationPercentage: 75
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics
  customRules: []
  #     - type: Pods
  #       pods:
  #         metric:
  #           name: packets-per-second
  #         target:
  #           type: AverageValue
  #           averageValue: 1k
  ## see https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-configurable-scaling-behavior
  behavior: {}
#      scaleDown:
#        policies:
#          - type: Pods
#            value: 4
#            periodSeconds: 60
#          - type: Percent
#            value: 10
#            periodSeconds: 60

## only available if kind is Deployment
podDisruptionBudget:
  enabled: false
  annotations: {}
  maxUnavailable: "30%"

nodeSelector: {}

tolerations: []

affinity: {}

labels: {}

annotations: {}

podAnnotations: {}

podLabels: {}

## How long (in seconds) a pod needs to be stable before progressing the deployment
##
minReadySeconds:

## How long (in seconds) a pod may take to exit (useful with lifecycle hooks to ensure lb deregistration is done)
##
terminationGracePeriodSeconds:

priorityClassName: ""

env: []
#  - name: FOO
#    value: "bar"

# The envWithTpl array below has the same usage as "env", but uses the tpl function to support templatable strings.
# This can be useful when you want to pass dynamic values to the Chart using the helm argument "--set <variable>=<value>"
# https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function
envWithTpl: []
#  - name: FOO_2
#    value: "{{ .Values.foo2 }}"
#
# foo2: bar2

envFrom: []

extraContainers: []
#   - name: do-something
#     image: busybox
#     command: ['do', 'something']

flush: 1

metricsPort: 2020

extraPorts: []
#   - port: 5170
#     containerPort: 5170
#     protocol: TCP
#     name: tcp
#     nodePort: 30517

extraVolumes: []

extraVolumeMounts: []

updateStrategy: {}
#   type: RollingUpdate
#   rollingUpdate:
#     maxUnavailable: 1

# Make use of a pre-defined configmap instead of the one templated here
existingConfigMap: ""

networkPolicy:
  enabled: false
#   ingress:
#     from: []

luaScripts: {}

## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Log_Level {{ .Values.logLevel }}
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Health_Check On

  ## https://docs.fluentbit.io/manual/pipeline/inputs
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/pods/default_counter_d02c44ba-4751-4da6-8ad8-ef585031f032/count/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On


  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    [OUTPUT]
        Name                s3
        Match               *
        bucket              <bucket_name>
        region              eu-south-1
        store_dir           /home/s3

  ## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/upstream-servers
  ## This configuration is deprecated, please use `extraFiles` instead.
  upstream: {}

  ## https://docs.fluentbit.io/manual/pipeline/parsers
  customParsers: |
    [PARSER]
        Name docker_no_time
        Format json
        Time_Keep Off
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L

  # This allows adding more files with arbitrary filenames to /fluent-bit/etc/conf by providing key/value pairs.
  # The key becomes the filename, the value becomes the file content.
  extraFiles: {}
#     upstream.conf: |
#       [UPSTREAM]
#           upstream1
#
#       [NODE]
#           name       node-1
#           host       127.0.0.1
#           port       43000
#     example.conf: |
#       [OUTPUT]
#           Name example
#           Match foo.*
#           Host bar

# The config volume is mounted by default, either to the existingConfigMap value, or the default of "fluent-bit.fullname"
volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc/conf

daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
      type: ""
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
  - name: etcmachineid
    hostPath:
      path: /etc/machine-id
      type: File

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: etcmachineid
    mountPath: /etc/machine-id
    readOnly: true

command:
  - /fluent-bit/bin/fluent-bit

args:
  - --workdir=/fluent-bit/etc
  - --config=/fluent-bit/etc/conf/fluent-bit.conf

# This supports either a structured array or a templatable string
initContainers: []

# Array mode
# initContainers:
#   - name: do-something
#     image: bitnami/kubectl:1.22
#     command: ['kubectl', 'version']

# String mode
# initContainers: |-
#   - name: do-something
#     image: bitnami/kubectl:{{ .Capabilities.KubeVersion.Major }}.{{ .Capabilities.KubeVersion.Minor }}
#     command: ['kubectl', 'version']

logLevel: info

hotReload:
  enabled: false
  image:
    repository: ghcr.io/jimmidyson/configmap-reload
    tag: v0.11.1
    digest:
    pullPolicy: IfNotPresent
  resources: {}
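Note that we pointed the tail input's Path at a single pod's log directory while investigating. For comparison, the chart's stock input section (as far as I recall the upstream default) tails all container logs instead:

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    multiline.parser docker, cri
    Tag kube.*
    Mem_Buf_Limit 5MB
    Skip_Long_Lines On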

We have configured the following SecurityContextConstraints for the pods (the one suggested by Fluent Bit itself):

kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: fluentbit-scc
allowPrivilegedContainer: true
allowHostNetwork: true
allowHostDirVolumePlugin: true
priority:
allowedCapabilities: []
allowHostPorts: true
allowHostPID: true
allowHostIPC: true
allowPrivilegeEscalation: true
readOnlyRootFilesystem: false
requiredDropCapabilities: []
defaultAddCapabilities: []
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - hostPath
  - persistentVolumeClaim
  - projected
  - secret
users:
  - system:serviceaccount:kube-system:builder
  - system:serviceaccount:kube-system:default
  - system:serviceaccount:kube-system:deployer
  - system:serviceaccount:kube-system:fluentbit-sa
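
Side note: granting the SCC through the users list assumes the fluentbit-sa service account lives in kube-system. An equivalent way to grant it is an RBAC rule with the use verb on the SCC, sketched below (the ClusterRole/ClusterRoleBinding names are mine). In either case, the openshift.io/scc annotation on the running pod shows which SCC was actually applied at admission.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluentbit-scc-use
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["fluentbit-scc"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluentbit-scc-use
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluentbit-scc-use
subjects:
  - kind: ServiceAccount
    name: fluentbit-sa
    namespace: kube-system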

The logging pod just writes some JSON text to stdout, and we can see the log file under /var/log/pods/ (see the input configuration above for the specific path).
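
For reference, the log producer is roughly this minimal pod (namespace default, pod counter, and container count are all taken from the /var/log/pods path above; the busybox image and the exact JSON payload are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: counter
  namespace: default
spec:
  containers:
    - name: count
      image: busybox
      # Emit one small JSON object per second to stdout
      command: ["/bin/sh", "-c"]
      args:
        - >
          i=0;
          while true; do
          echo "{\"count\": $i}";
          i=$((i+1));
          sleep 1;
          done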

When we launch the Fluent Bit DaemonSet, we get the following error:

[2023/11/30 11:51:19] [error] [input:tail:tail.0] read error, check permissions: /var/log/pods/default_counter_d02c44ba-4751-4da6-8ad8-ef585031f032/count/*.log
[2023/11/30 11:51:19] [ warn] [input:tail:tail.0] error scanning path: /var/log/pods/default_counter_d02c44ba-4751-4da6-8ad8-ef585031f032/count/*.log

Of course, we have mounted /var/log as a volume on the Fluent Bit pods.
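
Concretely, given the daemonSetVolumes / daemonSetVolumeMounts values above, the rendered DaemonSet pod spec should contain something like this excerpt (the container name follows the chart's naming):

volumes:
  - name: varlog
    hostPath:
      path: /var/log
containers:
  - name: fluent-bit
    volumeMounts:
      - name: varlog
        mountPath: /var/log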

Has anybody experienced something like this?

Thank you,

Luca

I have also tried to launch the container as privileged.
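
Concretely, that was a container-level override through the chart's securityContext value (left empty in the values above), along these lines:

securityContext:
  privileged: true
  runAsUser: 0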
