Fluentbit config parsing logic for isolated region compatibility #94

whoix · 2024-09-05T19:48:28Z

Issue #, if available:

Description of changes:

Isolated regions need the CloudWatch Logs endpoint specified in the Linux Fluent Bit configmap in order to create log groups properly. The endpoints for isolated (ADC) regions differ from the commercial ones and do not follow the conventional format. I have refactored the logic to detect which region the addon is being applied to and apply the matching Linux configmap. These changes already work in isolated regions and are in the AWS code base.

### Testing
Images/addon components are already onboarded to an internal ImageReplicationService. This automatically syncs CW images lowside and transfers them up to all supported EKS regions. We have worked with the EKS team to ensure you guys are already onboarded.

I was able to confirm that metrics and log groups are created, and that my metrics for Application Signals and GPU Container Insights show up in Container Insights. I am not able to attach specific screenshots or logs due to security implications.

The changes only affect ADC regions, where the endpoint line is added to the configmap. Commercial is unaffected, and I tested there too.

In commercial regions the configmap remains the same:

[INPUT]
  Name                tail
  Tag                 application.*
  Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
  Path                /var/log/containers/*.log
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_container.db
  Mem_Buf_Limit       50MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Rotate_Wait         30
  storage.type        filesystem
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/fluent-bit*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_log.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/cloudwatch-agent*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_cwagent.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
  Name                kubernetes
  Match               application.*
  Kube_URL            https://kubernetes.default.svc:443
  Kube_Tag_Prefix     application.var.log.containers.
  Merge_Log           On
  Merge_Log_Key       log_processed
  K8S-Logging.Parser  On
  K8S-Logging.Exclude Off
  Labels              Off
  Annotations         Off
  Use_Kubelet         On
  Kubelet_Port        10250
  Buffer_Size         0

[OUTPUT]
  Name                cloudwatch_logs
  Match               application.*
  region              ${AWS_REGION}
  log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
  log_stream_prefix   ${HOST_NAME}-
  auto_create_group   true
  extra_user_agent    container-insights

while isolated regions now have the newly added endpoint parameter:

[INPUT]
  Name                tail
  Tag                 application.*
  Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
  Path                /var/log/containers/*.log
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_container.db
  Mem_Buf_Limit       50MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Rotate_Wait         30
  storage.type        filesystem
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/fluent-bit*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_log.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/cloudwatch-agent*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_cwagent.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
  Name                kubernetes
  Match               application.*
  Kube_URL            https://kubernetes.default.svc:443
  Kube_Tag_Prefix     application.var.log.containers.
  Merge_Log           On
  Merge_Log_Key       log_processed
  K8S-Logging.Parser  On
  K8S-Logging.Exclude Off
  Labels              Off
  Annotations         Off
  Use_Kubelet         On
  Kubelet_Port        10250
  Buffer_Size         0

[OUTPUT]
  Name                cloudwatch_logs
  Match               application.*
  region              ${AWS_REGION}
  log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
  log_stream_prefix   ${HOST_NAME}-
  auto_create_group   true
  endpoint            logs.${AWS_REGION}.c2s.ic.gov
  extra_user_agent    container-insights

jefchien · 2024-11-22T21:03:48Z

charts/amazon-cloudwatch-observability/values.yaml

@@ -242,6 +242,318 @@ containerLogs:
            log_stream_prefix   ${HOST_NAME}.
            auto_create_group   true
            extra_user_agent    container-insights
+      adcIsoExtraFiles:


If the only difference is the endpoint, could we figure out a way to template it out instead of duplicating each of the configs?

The problem is that you would need to refactor the fluent-bit-configmap in your Linux Helm chart to parse only the [OUTPUT] sections, and only for ADC regions, which gets convoluted. It's about the same amount of refactoring as I have already done in the values. The endpoint only needs to be specified in the [OUTPUT] section; otherwise it can cause Helm chart formatting errors, which make the pod unstable.

Open to suggestions, but this was the safest way I could think of that doesn't impact your Helm charts too much.

Ideally, we would fix this in Fluent Bit so that the right endpoint is used without having to override it (https://github.com/fluent/fluent-bit/blob/master/src/aws/flb_aws_util.c#L75), but until that's done, I'm fine with doing it this way.
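
For context, here is a rough sketch of what templating only the endpoint line might look like. This is not what the PR does. It assumes a hypothetical `outputs` value that holds the [OUTPUT] section as a string and is rendered through Helm's `tpl` function so that the conditional inside it is evaluated. The key name and layout are illustrative only and are not taken from the chart.

```yaml
# values.yaml (hypothetical layout; `outputs` is an assumed key, not the chart's actual one)
containerLogs:
  fluentBit:
    config:
      outputs: |
        [OUTPUT]
          Name                cloudwatch_logs
          Match               application.*
          region              ${AWS_REGION}
          log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
          log_stream_prefix   ${HOST_NAME}-
          auto_create_group   true
          {{- if hasPrefix "us-iso-" .Values.region }}
          endpoint            logs.${AWS_REGION}.c2s.ic.gov
          {{- end }}
          extra_user_agent    container-insights

# In the configmap template (sketch), the string would be rendered as a template
# instead of being inserted verbatim:
#   {{- tpl .Values.containerLogs.fluentBit.config.outputs . | nindent 4 }}
```

The trade-off is that anything passed through `tpl` is parsed as a template, so a literal `{{` in user-supplied Fluent Bit config would need escaping. The `${...}` references are left untouched by Helm and are still resolved by Fluent Bit at runtime.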

jefchien · 2024-11-22T21:03:53Z

charts/amazon-cloudwatch-observability/templates/linux/fluent-bit-configmap.yaml

@@ -14,8 +14,20 @@ data:
    {{- end }}
  parsers.conf: |
    {{- .Values.containerLogs.fluentBit.config.customParsers  | nindent 4 }}
+{{- if contains "us-iso-" .Values.region }}


nit: Could use hasPrefix instead https://helm.sh/docs/chart_template_guide/function_list/#hasprefix-and-hassuffix

Ack. Updating.
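
A small sketch of the suggested change, assuming `.Values.region` holds the region name as in the diff above. Both functions take the substring first and the string to test second. The branch bodies are placeholders, not the chart's actual template content.

```yaml
{{- /* Before: true if "us-iso-" appears anywhere in the region string */}}
{{- if contains "us-iso-" .Values.region }}
# ADC-specific configmap entries go here
{{- end }}

{{- /* After: true only if the region string starts with "us-iso-" */}}
{{- if hasPrefix "us-iso-" .Values.region }}
# ADC-specific configmap entries go here
{{- end }}
```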

Wyatt Hicken added 2 commits September 5, 2024 13:40

Fluentbit config parsing logic for isolated region compatibility

25a7974

Fix nodejs value

3d77ca4

whoix marked this pull request as draft September 5, 2024 20:00

whoix marked this pull request as ready for review September 5, 2024 20:01

mitali-salvi reviewed Sep 9, 2024

charts/amazon-cloudwatch-observability/values.yaml (outdated)

charts/amazon-cloudwatch-observability/values.yaml (outdated)

mitali-salvi and others added 16 commits September 9, 2024 14:42

Updating values.yaml for CWAgent and CWAgent Operator version (#92)

8be708a

Adding NodeJS instrumentation SDK image to image-scanning GHA (#96)

9dedf74

Conform Helm naming conventions

64fe288

Merge branch 'aws-observability:main' into main

b2a6b5e

Remove unused privileges for leases, ingress, and openshift routes (#98)

c1ff1f4

Revert "Remove unused privileges for leases, ingress, and openshift routes (#98)" (#99)

69b8b5c

This reverts commit 4371d4a.

Release for 2.1.0 (#95)

b6acd76

Correcting version in RELEASE_NOTES (#101)

d17724d

release 2.1.1 (#103)

12a0d70

update chart version (#104)

d9bea57

Update .NET auto instrumentation to v1.3.2 (#108)

9235575

Updating values.yaml with new .NET auto instrumentation version v1.3.2

Release 2.1.2 (#109)

d473311

Fluentbit config parsing logic for isolated region compatibility

a64c793

Fix nodejs value

4c0acf3

Merge branch 'aws-observability:main' into main

0982e72

Merge branch 'aws-observability:main' into main

0a0637e

whoix marked this pull request as draft November 21, 2024 16:03

whoix marked this pull request as ready for review November 21, 2024 16:04

mitali-salvi approved these changes Nov 22, 2024

jefchien reviewed Nov 22, 2024

jefchien approved these changes Dec 3, 2024

jefchien merged commit ba92eff into aws-observability:main Dec 3, 2024
0 of 3 checks passed

whoix mentioned this pull request Dec 3, 2024

More Fluentbit config parsing logic for isolated regions with different domains #133

Merged
