Upgrade ME (for hdinsights OOM bug) (#877)
* Upgrade ME (Metrics Extension) for both Linux and Windows from
metricsext2-2.2024.328.1744 to metricsext2-2.2024.419.1535 (this fixes
the HDInsights OOM issue)
* Bump the version for the release
* Update release notes
* Update .trivyignore for new CVEs (these need to be fixed in the next
release)
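
Because the ME version is pinned separately in the Linux and Windows setup scripts, a quick way to confirm that every pin moved together is to list the distinct version strings they mention. The following is a minimal sketch, not part of this commit: `list_me_pins` is a hypothetical helper, and the `2.20YY.N.N` version pattern is an assumption inferred from the two versions named above.

```shell
# Hypothetical helper (not in the repository): print each distinct
# Metrics Extension version mentioned in the given files.
list_me_pins() {
  # Keep only lines that mention the ME package (any case), then pull out
  # version strings shaped like 2.2024.419.1535 (assumed pattern).
  grep -hi 'metricsext' "$@" \
    | grep -oE '2\.20[0-9]{2}\.[0-9]+\.[0-9]+' \
    | sort -u
}
```

Run from the repository root as `list_me_pins otelcollector/scripts/setup.sh otelcollector/scripts/ccpsetup.sh otelcollector/build/windows/scripts/setup.ps1`. More than one line of output would suggest that a pin was missed.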


There is no significant difference before vs. after the upgrade in
either CPU or memory usage for the ds (Linux and Windows) or the rs (see
below). Successful metric ingestion volume is also unchanged, with no
drops in ME --

ds (linux) --

<img width="1851" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/1adc92cb-5ea9-46cf-9a43-642f13e4cf5f">



ds (windows) --

<img width="1862" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/c991f778-0fb1-4cdc-89e4-56ef427e144b">



rs --

<img width="1861" alt="image"
src="https://github.com/Azure/prometheus-collector/assets/10353076/e99fbd95-2002-42b1-be6f-745b273bc895">



[comment]: # (The below checklist is for PRs adding new features. If a
box is not checked, add a reason why it's not needed.)
# New Feature Checklist

- [ ] List telemetry added about the feature.
- [ ] Link to the one-pager about the feature.
- [ ] List any tasks necessary for release (3P docs, AKS RP chart
changes, etc.) after merging the PR.
- [ ] Attach results of scale and perf testing.

[comment]: # (The below checklist is for code changes. Not all boxes
necessarily need to be checked. Build, doc, and template changes do not
need to fill out the checklist.)
# Tests Checklist

- [ ] Have end-to-end Ginkgo tests been run on your cluster and passed?
To bootstrap your cluster to run the tests, follow [these
instructions](/otelcollector/test/README.md#bootstrap-a-dev-cluster-to-run-ginkgo-tests).
  - Labels used when running the tests on your cluster:
    - [ ] `operator`
    - [ ] `windows`
    - [ ] `arm64`
    - [ ] `arc-extension`
- [ ] Have new tests been added? For features, have tests been added for
this feature? For fixes, is there a test that could have caught this
issue and could validate that the fix works?
  - [ ] Is a new scrape job needed?
    - [ ] The scrape job was added to the folder
      [test-cluster-yamls](/otelcollector/test/test-cluster-yamls/) in the
      correct configmap or as a CR.
  - [ ] Was a new test label added?
    - [ ] A string constant for the label was added to
      [constants.go](/otelcollector/test/utils/constants.go).
    - [ ] The label and description was added to the [test
      README](/otelcollector/test/README.md).
    - [ ] The label was added to this [PR
      checklist](/.github/pull_request_template).
    - [ ] The label was added as needed to
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
  - [ ] Are additional API server permissions needed for the new tests?
    - [ ] These permissions have been added to
      [api-server-permissions.yaml](/otelcollector/test/testkube/api-server-permissions.yaml).
  - [ ] Was a new test suite (a new folder under `/tests`) added?
    - [ ] The new test suite is included in
      [testkube-test-crs.yaml](/otelcollector/test/testkube/testkube-test-crs.yaml).
vishiy authored May 4, 2024
1 parent 9facd0f commit 07438b2
Showing 7 changed files with 34 additions and 20 deletions.
1 change: 0 additions & 1 deletion .pipelines/azure-pipeline-build.yml
@@ -2,7 +2,6 @@ trigger:
branches:
include:
- main

pr:
autoCancel: true
branches:
21 changes: 21 additions & 0 deletions .trivyignore
@@ -17,6 +17,11 @@ CVE-2023-45288
CVE-2023-48795
CVE-2024-24557
CVE-2020-8559
CVE-2023-45289
CVE-2023-45290
CVE-2024-24783
CVE-2024-24784
CVE-2024-24785
# MEDIUM - promconfigvalidator
CVE-2023-48795
CVE-2024-24786
@@ -26,7 +31,23 @@ CVE-2020-8559
# MEDIUM - go vulnerabilities
CVE-2023-3978
CVE-2023-44487
CVE-2023-45283
CVE-2023-45287
CVE-2023-39318
CVE-2023-39319
CVE-2023-39326
CVE-2023-45284
# MEDIUM - mariner
CVE-2023-5678
# MEDIUM - ruby
CVE-2024-27281
# MEDIUM - KSM
CVE-2023-29406
CVE-2023-29409
CVE-2023-39318
CVE-2023-39319
CVE-2023-39326
CVE-2023-45284
# HIGH - KSM
CVE-2023-45283
CVE-2023-29403
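
Several IDs in this ignore list now appear under more than one heading, for example CVE-2023-45283 under both `# MEDIUM - go vulnerabilities` and `# HIGH - KSM`. That is presumably harmless, but it makes later cleanup harder when the CVEs are fixed. A minimal sketch for spotting the repeats, assuming a plain one-ID-per-line file with `#` comments; `list_duplicate_cves` is a hypothetical helper, not something in this repository:

```shell
# Hypothetical helper: print every ID that appears more than once
# in a Trivy ignore file.
list_duplicate_cves() {
  local file="$1"
  # Drop comment and blank lines, strip stray whitespace and carriage
  # returns, then report only the IDs that occur on two or more lines.
  grep -Ev '^[[:space:]]*(#|$)' "$file" \
    | tr -d '[:blank:]\r' \
    | sort \
    | uniq -d
}
```

Calling `list_duplicate_cves .trivyignore` from the repository root prints one repeated ID per line, or nothing if every entry is unique.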
22 changes: 8 additions & 14 deletions RELEASENOTES.md
@@ -1,20 +1,11 @@
# Azure Monitor Metrics for AKS clusters

## Release 04-30-2024
## Release 05-03-2024

* Linux image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* Windows image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* TA image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* cfg sidecar image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* Change log -
* perf: add namespace selector to default jobs to improve perf - https://github.com/Azure/prometheus-collector/pull/867

## Release 04-25-2024

* Linux image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* Windows image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* TA image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* cfg sidecar image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:<tbd>`
* Linux image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.8.10-main-`
* Windows image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.8.10-main-`
* TA image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.8.10-main-`
* cfg sidecar image - `mcr.microsoft.com/azuremonitor/containerinsights/ciprod/prometheus-collector/images:6.8.10-main-`
* Change log -
* fix: update to use older proxy setup for mdsd in aks - https://github.com/Azure/prometheus-collector/pull/864
* add remaining sdl scans similar to onebranch default - https://github.com/Azure/prometheus-collector/pull/858
@@ -32,6 +23,9 @@
* fix: set hubble minimal ingestion profile - https://github.com/Azure/prometheus-collector/pull/829
* [fix] Minor fix in onboarding templates - https://github.com/Azure/prometheus-collector/pull/828
* Remove telegraf for telemetry and only use fluent-bit
* perf: add namespace selector to default jobs to improve perf - https://github.com/Azure/prometheus-collector/pull/867
* set hubble minimal ingestion profile - https://github.com/Azure/prometheus-collector/pull/860
* Upgrade Metrics Extension (Linux & windows) from metricsext2-2.2024.328.1744 --> metricsext2-2.2024.419.1535 (This fixes the HDInsights bug (OOM) on flint clusters)

## Release 04-08-2024

2 changes: 1 addition & 1 deletion otelcollector/VERSION
@@ -1 +1 @@
6.8.9
6.8.10
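
A bump from 6.8.9 to 6.8.10 is the kind of change that plain string comparison gets wrong, since "6.8.10" sorts before "6.8.9" lexically. Here is a small sketch of a version-aware check, assuming a `sort` that supports `-V` (GNU coreutils does); `is_version_bump` is a hypothetical name used only for illustration:

```shell
# Hypothetical helper: succeed only if $2 is a strictly newer version than $1.
is_version_bump() {
  local old="$1" new="$2"
  # Equal versions are not a bump.
  [ "$old" != "$new" ] || return 1
  # sort -V orders dotted versions numerically, so the newer one sorts last.
  [ "$(printf '%s\n%s\n' "$old" "$new" | sort -V | tail -n 1)" = "$new" ]
}
```

For this commit, `is_version_bump 6.8.9 6.8.10` succeeds, while swapping the arguments fails.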
4 changes: 2 additions & 2 deletions otelcollector/build/windows/scripts/setup.ps1
@@ -15,8 +15,8 @@ New-Item -Type Directory -Path /etc/genevamonitoringagent
############################################################################################
Write-Host ('Installing Metrics Extension');
try {
Invoke-WebRequest -Uri "https://github.com/Azure/prometheus-collector/releases/download/metricsext2-2.2024.328.1744/MdmMetricsExtension.2.2024.328.1744.nupkg" -OutFile /installation/ME/mdmmetricsextension.2.2024.328.1744.zip
Expand-Archive -Path /installation/ME/mdmmetricsextension.2.2024.328.1744.zip -Destination /installation/ME/
Invoke-WebRequest -Uri "https://github.com/Azure/prometheus-collector/releases/download/v6.8.9-main-05-02-2024-9facd0f8/MdmMetricsExtension.2.2024.419.1535.nupkg" -OutFile /installation/ME/mdmmetricsextension.2.2024.419.1535.zip
Expand-Archive -Path /installation/ME/mdmmetricsextension.2.2024.419.1535.zip -Destination /installation/ME/
Move-Item /installation/ME/MetricsExtension /opt/metricextension/
}
catch {
2 changes: 1 addition & 1 deletion otelcollector/scripts/ccpsetup.sh
@@ -36,7 +36,7 @@ mkdir /opt/microsoft/linuxmonagent

# Install ME
echo "Installing Metrics Extension..."
sudo tdnf install -y metricsext2-2.2024.328.1744
sudo tdnf install -y metricsext2-2.2024.419.1535
sudo tdnf list installed | grep metricsext2 | awk '{print $2}' > metricsextversion.txt

# Remove any RPMs downloaded not from Mariner
2 changes: 1 addition & 1 deletion otelcollector/scripts/setup.sh
@@ -59,7 +59,7 @@ cp /etc/cron.daily/logrotate /etc/cron.hourly/

# Install ME
echo "Installing Metrics Extension..."
sudo tdnf install -y metricsext2-2.2024.328.1744
sudo tdnf install -y metricsext2-2.2024.419.1535
sudo tdnf list installed | grep metricsext2 | awk '{print $2}' > metricsextversion.txt

# tdnf does not have an autoremove feature. Only necessary packages are copied over to distroless build. Below reduces the image size if using non-distroless
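
The install step records the installed version by taking the second column of the `metricsext2` row from `tdnf list installed`. The same extraction can be turned into a guard that fails when the installed package does not match the pin. This is a sketch under stated assumptions: `check_me_version` is a hypothetical helper, and the listing is assumed to have name, version, and repo columns, as the script's `awk '{print $2}'` implies.

```shell
# Hypothetical helper, not part of this commit.
# $1: expected ME version prefix, e.g. 2.2024.419.1535
# $2: text shaped like `tdnf list installed` output (column layout assumed)
check_me_version() {
  local expected="$1" listing="$2" installed
  [ -n "$expected" ] || return 2
  # Same extraction as the setup script: second column of the metricsext2 row.
  installed=$(printf '%s\n' "$listing" | awk '/metricsext2/ {print $2}')
  case "$installed" in
    "$expected"*) echo "ok: $installed" ;;
    *) echo "mismatch: wanted $expected, got '$installed'" >&2; return 1 ;;
  esac
}
```

In the image build this could be called as `check_me_version 2.2024.419.1535 "$(tdnf list installed)"`. The prefix match tolerates a release suffix after the version, if the package manager reports one.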