fix(infra): Bust cache for already-reported min/max metrics (#8026)

On staging only, the logs are flooded with errors like this:

```
{400, "{\n  \"error\": {\n    \"code\": 400,\n    \"message\": \"One or more TimeSeries could not be written: timeSeries[22]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/queue_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[11]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/max must be CUMULATIVE, but is GAUGE.; timeSeries[8]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/idle_time/max must be CUMULATIVE, but is GAUGE.; timeSeries[7]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/idle_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[10]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[14]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/max must be CUMULATIVE, but is GAUGE.; timeSeries[13]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[16]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[23]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/queue_time/max must be CUMULATIVE, but is GAUGE.; timeSeries[20]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/max must be CUMULATIVE, but is GAUGE.; timeSeries[19]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/min must be CUMULATIVE, but is GAUGE.; timeSeries[17]: Metric kind for metric custom.googleapis.com/elixir/domain/repo/query/query_time/max must be CUMULATIVE, but is GAUGE.\",\n    \"status\": \"INVALID_ARGUMENT\",\n    \"details\": [\n      {\n        \"@type\": \"type.googleapis.com/google.monitoring.v3.CreateTimeSeriesSummary\",\n        \"totalPointCount\": 36,\n        \"successPointCount\": 24,\n        \"errors\": [\n        
  {\n            \"status\": {\n              \"code\": 3\n            },\n            \"pointCount\": 12\n          }\n        ]\n      }\n    ]\n  }\n}\n"}
```

This does not happen on prod. As far as I can tell, GCP caches the
metric kind from the first write to a given metric ID, which appears
to have been `CUMULATIVE` here.

The correct metric kind for these is `GAUGE` since they're reporting a
min/max value.

Since GCP doesn't support resetting the auto-defined metric kind of a
particular metric ID, we need to alter the ID to "bust" the cache and
create a new definition.
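
To illustrate the rename, here is a minimal sketch of how the new metric IDs are put together. The module `MetricTypeSketch` and its functions are hypothetical and exist only for this example; the string format and the `GAUGE` kind come from the diff and test in this commit, while the exact map shape used by the reporter is an assumption.

```elixir
defmodule MetricTypeSketch do
  # Hypothetical helper, not part of the reporter. It builds the metric
  # type string the same way the diff does, using the renamed suffixes.
  @prefix "custom.googleapis.com/elixir"

  def metric_type(name, suffix) when suffix in [:min_val, :max_val] do
    "#{@prefix}/#{Enum.join(name, "/")}/#{suffix}"
  end

  # Assumed shape of one time series entry; min/max values are
  # point-in-time readings, so the kind is GAUGE.
  def gauge_series(name, suffix, labels) do
    %{
      metric: %{type: metric_type(name, suffix), labels: labels},
      metricKind: "GAUGE"
    }
  end
end

IO.puts(MetricTypeSketch.metric_type([:domain, :repo, :query, :queue_time], :min_val))
# prints custom.googleapis.com/elixir/domain/repo/query/queue_time/min_val
```

Because the new IDs have never been written before, GCP should auto-create their definitions from the first `GAUGE` write instead of reusing the stale `CUMULATIVE` ones.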
Jamil
2025-02-05 10:30:04 -08:00
committed by GitHub
parent 7a5354ba36
commit d1761e5a5d
2 changed files with 4 additions and 4 deletions

@@ -376,7 +376,7 @@ defmodule Domain.Telemetry.GoogleCloudMetricsReporter do
},
%{
metric: %{
-type: "custom.googleapis.com/elixir/#{Enum.join(name, "/")}/min",
+type: "custom.googleapis.com/elixir/#{Enum.join(name, "/")}/min_val",
labels: labels
},
resource: resource,
@@ -392,7 +392,7 @@ defmodule Domain.Telemetry.GoogleCloudMetricsReporter do
},
%{
metric: %{
-type: "custom.googleapis.com/elixir/#{Enum.join(name, "/")}/max",
+type: "custom.googleapis.com/elixir/#{Enum.join(name, "/")}/max_val",
labels: labels
},
resource: resource,

@@ -347,7 +347,7 @@ defmodule Domain.Telemetry.GoogleCloudMetricsReporterTest do
%{
"metric" => %{
"labels" => %{"app" => "myapp", "foo" => "bar"},
-"type" => "custom.googleapis.com/elixir/foo/min"
+"type" => "custom.googleapis.com/elixir/foo/min_val"
},
"metricKind" => "GAUGE",
"points" => [
@@ -365,7 +365,7 @@ defmodule Domain.Telemetry.GoogleCloudMetricsReporterTest do
%{
"metric" => %{
"labels" => %{"app" => "myapp", "foo" => "bar"},
-"type" => "custom.googleapis.com/elixir/foo/max"
+"type" => "custom.googleapis.com/elixir/foo/max_val"
},
"metricKind" => "GAUGE",
"points" => [