-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(eventgenerator): use PromQL API for metric types throughput
and responsetime
#2890
Conversation
} | ||
|
||
var _ EnvelopeProcessor = &Processor{} | ||
|
||
type Processor struct { | ||
logger lager.Logger | ||
collectionInterval time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed collectionInterval
because it was completely unused in the Processor
@@ -34,7 +34,7 @@ func NewMetricServerClient(logger lager.Logger, url string, httpClient *http.Cli | |||
} | |||
} | |||
func (c *MetricServerClient) GetMetrics(appId string, metricType string, startTime time.Time, endTime time.Time) ([]models.AppInstanceMetric, error) { | |||
c.logger.Debug("GetMetric") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: fixed debug message to be aligned with actual function name
@@ -65,7 +65,7 @@ func (m *MetricPoller) retrieveMetric(appMonitor *models.AppMonitor) error { | |||
metrics, err := m.metricClient.GetMetrics(appId, metricType, startTime, endTime) | |||
m.logger.Debug("received metrics from metricClient", lager.Data{"retrievedMetrics": metrics}) | |||
if err != nil { | |||
return fmt.Errorf("retriveMetric Failed: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: typo fix
func (p Processor) GetTimerMetrics(envelopes []*loggregator_v2.Envelope, appID string, currentTimestamp int64) []models.AppInstanceMetric { | ||
p.logger.Debug("GetTimerMetrics") | ||
p.logger.Debug("Compacted envelopes", lager.Data{"Envelopes": envelopes}) | ||
return GetHttpStartStopInstanceMetrics(envelopes, appID, currentTimestamp, p.collectionInterval) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need to process the timer metrics ourself since log-cache does it now for us after switching to a PromQL API call 👼
@@ -91,10 +83,10 @@ func GetHttpStartStopInstanceMetrics(envelopes []*loggregator_v2.Envelope, appID | |||
var metrics []models.AppInstanceMetric | |||
|
|||
numRequestsPerAppIdx := calcNumReqs(envelopes) | |||
sumReponseTimesPerAppIdx := calcSumResponseTimes(envelopes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: typo fix
|
||
throughputMetrics := getThroughputInstanceMetrics(envelopes, appID, numRequestsPerAppIdx, collectionInterval, currentTimestamp) | ||
responseTimeMetric := getResponsetimeInstanceMetrics(envelopes, appID, numRequestsPerAppIdx, sumReponseTimesPerAppIdx, currentTimestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated: typo fix
TLSConfig *tls.Config | ||
uaaCreds models.UAACreds | ||
url string | ||
collectionInterval time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added collectionInterval
to the client after removing it (https://github.com/cloudfoundry/app-autoscaler-release/pull/2890/files#r1579368032) since it's required for the PromQL API calls
Quality Gate failedFailed conditions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thank you!
…loggregator mode # Issue In #2890 the acceptance tests for the `throughput` and `responsetime` tests have been made more strict. Since then the acceptance tests testing the legacy scenario of integrating with `loggregator` have been failing in the CI. Note: In production environments the tests have not been failing. # Fix As using the `loggregator` is no longer recommended, we skip `throughput` and `responsetime` acceptance tests in the loggregator CI tests.
Problem
After the log-cache enablement with 3a, we've received a bug stating that
responsetime
andthroughput
are no longer working. It turned out, that the envelopes are being read via REST API which has a default of 100 maximum envelopes per response:app-autoscaler-release/src/autoscaler/eventgenerator/client/log_cache_client.go
Line 98 in d02b7b8
Using 100 envelopes to calculate the average
throughput
andresponsetime
for the last x seconds breaks the calculations. The following example shows that the limited set of envelopes causesthroughput
(a.k.a. requests per seconds) to not grow beyond 3:The reason why this wasn't discovered earlier is because the acceptance test for
throughput
only checks if the threshold grows above1
:app-autoscaler-release/src/acceptance/app/dynamic_policy_test.go
Line 180 in d02b7b8
Solution
Make use of the PromQL API of log-cache for
throughput
andresponsetime
: https://github.com/cloudfoundry/log-cache-release/tree/main/src#get-apiv1query.Using the PromQL API shifts the whole metric aggregation logic to log-cache and we can simply consume the results.