The code path for handling non-JSON output from etcd was broken:
- It did not skip over already parsed JSON output.
- It got stuck in the wrong for loop and repeatedly tried
parsing the same non-JSON output. This prevented test shutdown.
Not sure why yet, but in the branch with DRA v1 graduation the following error
started to show up for the first time (?!):
2025/07/18 19:24:48 WARNING: [core] [Server #3]grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
Previously, etcd wrote to stderr in JSON format:
{"level":"warn","ts":"2025-04-11T03:32:06.676527Z","caller":"embed/config.go:689","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2025-04-11T03:32:06.676707Z","caller":"embed/config.go:689","msg":"Running http and grpc server on single port. This is not recommended for production."}
{"level":"warn","ts":"2025-04-11T03:32:06.677056Z","caller":"etcdmain/etcd.go:146","msg":"failed to start etcd","error":"listen tcp 127.0.0.1:37803: bind: address already in use"}
{"level":"fatal","ts":"2025-04-11T03:32:06.677104Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"listen tcp 127.0.0.1:37803: bind: address already in use","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
This has several drawbacks:
- Not very readable.
- When used in tests which start etcd themselves (for example, scheduler_perf),
the output is not associated with the current test.
- Contains warnings that are confusing for developers who don't know that they
are harmless.
Intercepting output, parsing it and reformating makes the output nicer. Instead
of a mixture of JSON messages (see above) and normal test output, we now get
the etcd output embedded inside the test output. We can also filter out some
known harmless messages. Cleaning up more output or avoiding it in the first
place might be a good next step.
With `go test -v ./test/integration/scheduler_perf/dra`:
=== RUN TestSchedulerPerf
=== RUN TestSchedulerPerf/SchedulingWithResourceClaimTemplate
=== RUN TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast
I0411 13:21:03.353458 65212 feature_gate.go:385] feature gates: {map[SchedulerQueueingHints:false]}
...
I0411 13:21:10.552975 65212 cidrallocator.go:210] stopping ServiceCIDR Allocator Controller
I0411 13:21:10.567327 65212 etcd.go:210] "etcd output" logger="TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast" error="accept tcp 127.0.0.1:42245: use of closed network connection" level="warn" ts="2025-04-11T13:21:10.567045+0200" caller="embed/serve.go:160" msg="stopping insecure grpc server due to error"
I0411 13:21:10.567398 65212 etcd.go:210] "etcd output" logger="TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast" ts="2025-04-11T13:21:10.567198+0200" caller="embed/serve.go:162" msg="stopped insecure grpc server due to error" error="accept tcp 127.0.0.1:42245: use of closed network connection" level="warn"
I0411 13:21:15.567917 65212 etcd.go:227] "etcd didn't exit in 5 seconds, killing it" logger="TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast"
I0411 13:21:15.567964 65212 etcd.go:234] "etcd exited" logger="TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast" err="signal: terminated"
With per-test output `go test -v ./test/integration/scheduler_perf/dra -args -use-testing-log`:
=== RUN TestSchedulerPerf
=== RUN TestSchedulerPerf/SchedulingWithResourceClaimTemplate
=== RUN TestSchedulerPerf/SchedulingWithResourceClaimTemplate/fast
I0411 13:19:03.540497 28645 feature_gate.go:385] feature gates: {map[DynamicResourceAllocation:true]}
...
I0411 13:19:10.519994 28645 cidrallocator.go:210] stopping ServiceCIDR Allocator Controller
etcd.go:210: I0411 13:19:10.533131] etcd output msg="stopping insecure grpc server due to error" error="accept tcp 127.0.0.1:46637: use of closed network connection" level="warn" ts="2025-04-11T13:19:10.532900+0200" caller="embed/serve.go:160"
etcd.go:210: I0411 13:19:10.533274] etcd output msg="stopped insecure grpc server due to error" error="accept tcp 127.0.0.1:46637: use of closed network connection" level="warn" ts="2025-04-11T13:19:10.533022+0200" caller="embed/serve.go:162"
etcd.go:227: I0411 13:19:15.533715] etcd didn't exit in 5 seconds, killing it
etcd.go:234: I0411 13:19:15.533803] etcd exited err="signal: terminated"
Choosing a port in advance is racy. A better solution is to use a Unix Domain
socket in the per-etcd-instance data directory. Then the name can be determined
in advance and there's no risk of conflicts with other etcd instances.
With unix:// for the endpoint, we have to be a bit more careful about
passing a TLS config to the etcd client library because for unix://, in
contrast to http://, it tries to use an incomplete config which
then fails to establish the connection.
The main goroutine always waited for the background goroutine, which itself
only proceeded after the 5 second timeout. To tell the goroutine that it is
okay to stop immediately, the main goroutine must cancel the context as soon
as etcd has quit.
If the caller needs to reconfigure klog, then calling klog without proper
synchronizing while stopping causes a data race. We have to ensure that
the goroutine has terminated before stop returns.
Doing the initialization once was not good enough because it was not guaranteed
that RunCustomEtcd gets called early enough, before there are other goroutines
which use gRPC. The data race for
test/integration/apiserver.TestWatchCacheUpdatedByEtcd was:
WARNING: DATA RACE
Read at 0x00000cfffb90 by goroutine 140052:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/grpclog.go:41 +0x30
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.(*componentData).V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/component.go:103 +0x4e
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.(*http2Client).Close()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:955 +0xca
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.(*http2Client).reader()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:1619 +0xbfb
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.newHTTP2Client.func11()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:394 +0x47
Previous write at 0x00000cfffb90 by goroutine 145643:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.SetLoggerV2()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/loggerv2.go:75 +0x104
k8s.io/kubernetes/test/integration/framework.RunCustomEtcd.func2()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/framework/etcd.go:157 +0x33
sync.(*Once).doSlow()
/usr/local/go/src/sync/once.go:74 +0x101
sync.(*Once).Do()
/usr/local/go/src/sync/once.go:65 +0x46
k8s.io/kubernetes/test/integration/framework.RunCustomEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/framework/etcd.go:156 +0xb97
k8s.io/kubernetes/test/integration/apiserver.multiEtcdSetup()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:41 +0xc4
k8s.io/kubernetes/test/integration/apiserver.TestWatchCacheUpdatedByEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:92 +0xa9
testing.tRunner()
/usr/local/go/src/testing/testing.go:1576 +0x216
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1629 +0x47
grpclog.SetLoggerV is not thread-safe and may only be called before code starts
using GRPC. Calling RunCustomEtcd multiple times, for example in
k8s.io/kubernetes/test/integration/apiserver.TestWatchCacheUpdatedByEtcd,
causes a data race:
WARNING: DATA RACE
Read at 0x00000c8e8d20 by goroutine 135612:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/grpclog.go:41 +0x30
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.(*componentData).V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/component.go:103 +0x4e
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.(*loopyWriter).run.func1()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/controlbuf.go:528 +0xf1
runtime.deferreturn()
/home/prow/go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/runtime/panic.go:476 +0x32
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.newHTTP2Client.func6()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:442 +0x112
Previous write at 0x00000c8e8d20 by goroutine 140228:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.SetLoggerV2()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/loggerv2.go:76 +0xc6a
k8s.io/kubernetes/test/integration/framework.RunCustomEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/framework/etcd.go:153 +0xb89
k8s.io/kubernetes/test/integration/apiserver.multiEtcdSetup()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:40 +0xac
k8s.io/kubernetes/test/integration/apiserver.TestWatchCacheUpdatedByEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:88 +0x4a
testing.tRunner()
/home/prow/go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/testing/testing.go:1576 +0x216
testing.(*T).Run.func1()
/home/prow/go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/testing/testing.go:1629 +0x47
In contrast to EtcdMain, it can be called by individual tests or benchmarks and
each caller will get a fresh etcd instance. However, it uses the same
underlying code and the same port for all instances, so tests cannot run in
parallel.
It provides more readable output and has additional APIs for using it inside a
unit test. goleak.IgnoreCurrent is needed to filter out the goroutine that gets
started when importing go.opencensus.io/stats/view.
In order to handle background goroutines that get created on demand and cannot
be stopped (like the one for LogzHealth), a helper function ensures that those
are running before calling goleak.IgnoreCurrent. Keeping those goroutines
running is not a problem and thus not worth the effort of adding new APIs to
stop them.
Other goroutines are genuine leaks for which no fix is available. Those get
suppressed via IgnoreTopFunction, which works as long as that function
is unique enough.
Example output for the leak fixed in https://github.com/kubernetes/kubernetes/pull/115423:
E0202 09:30:51.641841 74789 etcd.go:205] "EtcdMain goroutine check" err=<
found unexpected goroutines:
[Goroutine 4889 in state chan receive, with k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop on top of the stack:
goroutine 4889 [chan receive]:
k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc0076183c0)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:268 +0x65
created by k8s.io/apimachinery/pkg/watch.NewBroadcaster
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/watch/mux.go:77 +0x116
>
we are saving this information in an env variable `KUBE_INTEGRATION_ETCD_URL`
So just pick it up from there when needed. Currently when someone uses
framework.RunCustomEtcd directly, the global variable is *not* set and the
code that uses `GetEtcdURL` returns empty string.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
- add feature gate
- add encrypted object and run generated_files
- generate protobuf for encrypted object and add unit tests
- move parse endpoint to util and refactor
- refactor interface and remove unused interceptor
- add protobuf generate to update-generated-kms.sh
- add integration tests
- add defaulting for apiVersion in kmsConfiguration
- handle v1/v2 and default in encryption config parsing
- move metrics to own pkg and reuse for v2
- use Marshal and Unmarshal instead of serializer
- add context for all service methods
- check version and keyid for healthz
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
when running integration tests without an external etcd, the framework
spawns an etcd instance executing it in its own process and killing
it once the test stops.
Instead of killing it directly, allow etcd to exit gracefully or kill
it after 5 seconds.
By parsing flags in the test's main function before starting etcd we bail out
early without ever starting etcd when the test was invoked with -help.
Otherwise etcd must be available, gets started and then hangs because
flag.Parse itself exits when called by testing.go. This bypasses the code in
EtcdMain which normally stops etcd.
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
The EtcdMain function in integration tests is designed to launch an etcd
instance only when running in a bazel test. For non-bazel, we rely on
hack/make-rules/test-integration.sh to bring up the etcd instance.
This patch fixes the following in EtcdMain:
1. If etcd is not found in ${RUNFILES_DIR} then look in ${PATH}.
2. Try to connect to the etcd started by `make test-integraion`; if it
is up, then don't start etcd.
3. Gracefully shut down etcd after tests.
4. Get a port from the OS instead of deriving it from argv[0].
5. Don't use sync.Once.
The benefit of this change is that integration tests work with `go test`
as well as `make test-integration` without users needing to do anything
special. That makes it much easier to pass go testing flags to tests and
integrate with IDEs.
abstract out etcd server creation
test/integration/framework: cleanup master_utils.go
kube-apiserver: move StartTestServer tests into test/integration/master
Fix the failing scale test
kube-apiserver's TestServer now returns a struct instead of individual values