Commit Graph

25997 Commits

Kubernetes Prow Robot
2b196cff8b Merge pull request #127589 from soltysh/timestamp_e2e
e2e: add test covering cronjob-scheduled-timestamp annotation added by cronjob
2024-09-25 17:46:09 +01:00
Kubernetes Prow Robot
5fc4e71a30 Merge pull request #127499 from pohly/scheduler-perf-updates
scheduler_perf: updates to enhance performance testing of DRA
2024-09-25 13:32:00 +01:00
Maciej Szulik
f11ddad99d e2e: add test covering cronjob-scheduled-timestamp annotation added by cronjob 2024-09-25 12:47:27 +02:00
Kubernetes Prow Robot
75214d11d5 Merge pull request #127428 from googs1025/scheduler/plugin
chore(scheduler): refactor import package ordering in scheduler
2024-09-25 11:40:07 +01:00
Lukasz Szaszkiewicz
ae35048cb0 adds watchListEndpointRestrictions for watchlist requests (#126996)
* endpoints/handlers/get: intro watchListEndpointRestrictions

* consistencydetector/list_data_consistency_detector: expose IsDataConsistencyDetectionForListEnabled

* e2e/watchlist: extract common function for adding unstructured secrets

* e2e/watchlist: new e2e scenarios covering watchListEndpointRestrictions
2024-09-25 10:12:01 +01:00
Patrick Ohly
d100768d94 scheduler_perf: track and visualize progress over time
This is useful to see whether pod scheduling happens in bursts and how it
behaves over time, which is relevant in particular for dynamic resource
allocation, where towards the end it may become harder to find a node which
still has resources available.

Besides "pods scheduled" it's also useful to know how many attempts were
needed, so schedule_attempts_total also gets sampled and stored.

To visualize the result of one or more test runs, use:

     gnuplot.sh *.dat
2024-09-25 11:09:15 +02:00
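A minimal sketch of the sampling idea behind this commit, assuming an
in-process counter per metric and a one-second interval (both assumptions;
the real collector reads schedule_attempts_total from the metrics registry):

    package main

    import (
        "fmt"
        "os"
        "sync/atomic"
        "time"
    )

    var podsScheduled, scheduleAttempts atomic.Int64

    // sample appends one whitespace-separated line per tick: elapsed seconds,
    // pods scheduled, attempts. gnuplot consumes this .dat format directly.
    func sample(f *os.File, stop <-chan struct{}) {
        start := time.Now()
        ticker := time.NewTicker(time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case t := <-ticker.C:
                fmt.Fprintf(f, "%.1f %d %d\n",
                    t.Sub(start).Seconds(),
                    podsScheduled.Load(),
                    scheduleAttempts.Load())
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go sample(os.Stdout, stop)
        for i := 0; i < 5; i++ { // stand-in for scheduling activity
            time.Sleep(time.Second)
            podsScheduled.Add(1)
            scheduleAttempts.Add(2)
        }
        close(stop)
    }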
Patrick Ohly
ded96042f7 scheduler_perf + DRA: load up cluster by allocating claims
Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating
claims and then allocating them more or less like the scheduler would when
scheduling pods is much faster and in practice has the same effect on the
dynamicresources plugin because it looks at claims, not pods.

This allows defining the "steady state" workloads with a higher number of
devices ("claimsPerNode") again. This was prohibitively slow before.
2024-09-25 09:45:39 +02:00
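A hypothetical sketch of the pre-allocation idea; Claim, Allocation, and the
device naming are stand-ins for the real ResourceClaim API, not the actual
scheduler_perf code:

    package main

    import "fmt"

    // Allocation and Claim mimic the shape of a ResourceClaim with an
    // allocation result; only the idea matters here.
    type Allocation struct {
        NodeName string
        Device   string
    }

    type Claim struct {
        Name       string
        Allocation *Allocation // nil until allocated
    }

    // loadUpCluster consumes claimsPerNode devices on every node by creating
    // claims that are already allocated, the way the scheduler eventually
    // would, but without scheduling thousands of filler pods first.
    func loadUpCluster(nodes []string, claimsPerNode int) []*Claim {
        claims := make([]*Claim, 0, len(nodes)*claimsPerNode)
        for _, node := range nodes {
            for i := 0; i < claimsPerNode; i++ {
                claims = append(claims, &Claim{
                    Name: fmt.Sprintf("%s-claim-%d", node, i),
                    Allocation: &Allocation{
                        NodeName: node,
                        Device:   fmt.Sprintf("device-%d", i),
                    },
                })
            }
        }
        return claims
    }

    func main() {
        claims := loadUpCluster([]string{"node-1", "node-2"}, 3)
        fmt.Println(len(claims), "claims pre-allocated")
    }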
Patrick Ohly
385599f0a8 scheduler_perf + DRA: measure pod scheduling at a steady state
The previous tests were based on scheduling pods until the cluster was
full. This is a valid scenario, but not necessarily realistic.

More realistic is how quickly the scheduler can schedule new pods when some
old pods have finished running, in particular in a cluster that is properly
utilized (= almost full). To test this, pods must get created, scheduled, and
then immediately deleted. This can run for a certain period of time.

Scenarios with an empty and with a full cluster have different scheduling
rates. This was previously visible for DRA because the 50th percentile of the
scheduling throughput was lower than the average, but one had to guess in which scenario
the throughput was lower. Now this can be measured for DRA with the new
SteadyStateClusterResourceClaimTemplateStructured test.

The metrics collector must watch pod events to figure out how many pods got
scheduled. Polling misses pods that already got deleted again. There seems to
be no relevant difference in the collected
metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions):

     │            before            │                     after                     │
     │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base         │
                         157.1 ± 0%                     157.1 ± 0%  ~ (p=0.329 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc50 │ SchedulingThroughput/Perc50  vs base         │
                        48.99 ± 8%                    47.52 ± 9%  ~ (p=0.937 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc90 │ SchedulingThroughput/Perc90  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc95 │ SchedulingThroughput/Perc95  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc99 │ SchedulingThroughput/Perc99  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)
2024-09-25 09:45:39 +02:00
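A hedged sketch of why watching beats polling here, written against plain
client-go rather than the actual scheduler_perf collector: the informer
observes the update that sets spec.nodeName even if the pod is deleted
moments later, whereas a poller can miss that short-lived state entirely.

    package metrics

    import (
        "sync/atomic"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/client-go/informers"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/cache"
    )

    // countScheduledPods counts unscheduled-to-scheduled transitions via a
    // pod informer; deletions shortly afterwards no longer lose samples.
    func countScheduledPods(client kubernetes.Interface, stop <-chan struct{}) *atomic.Int64 {
        var scheduled atomic.Int64
        factory := informers.NewSharedInformerFactory(client, 0)
        factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
            UpdateFunc: func(oldObj, newObj interface{}) {
                oldPod, newPod := oldObj.(*corev1.Pod), newObj.(*corev1.Pod)
                if oldPod.Spec.NodeName == "" && newPod.Spec.NodeName != "" {
                    scheduled.Add(1)
                }
            },
        })
        factory.Start(stop)
        factory.WaitForCacheSync(stop)
        return &scheduled
    }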
Patrick Ohly
51cafb0053 scheduler_perf: more useful errors for configuration mistakes
Before, the first error was reported, which typically was the "invalid opcode"
error from the createAny operation:

    scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny"

Now the opcode is determined first, then decoding into exactly the matching
operation type is attempted and validated. Unknown fields are an error.

In the case above, decoding a string into time.Duration failed:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into *benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration

Some typos:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"}

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into *benchmark.createPodsOp: json: unknown field "countParram"
2024-09-25 09:45:39 +02:00
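A minimal sketch of the two-phase decoding described above: peek at the
opcode, then strictly decode into exactly the matching op type so that
unknown fields and type mismatches surface as the precise errors shown. The
op structs are illustrative stand-ins, not the real scheduler_perf types.

    package benchmark

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "time"
    )

    type createPodsOp struct {
        Opcode   string        `json:"opcode"`
        Count    int           `json:"count"`
        Duration time.Duration `json:"duration"` // decoding "30s" fails, as above
    }

    type sleepOp struct {
        Opcode   string        `json:"opcode"`
        Duration time.Duration `json:"duration"`
    }

    func decodeOp(data []byte) (interface{}, error) {
        // Phase 1: determine the opcode, ignoring all other fields.
        var peek struct {
            Opcode string `json:"opcode"`
        }
        if err := json.Unmarshal(data, &peek); err != nil {
            return nil, err
        }
        var op interface{}
        switch peek.Opcode {
        case "createPods":
            op = &createPodsOp{}
        case "sleep":
            op = &sleepOp{}
        default:
            return nil, fmt.Errorf("unknown opcode %q in %s", peek.Opcode, data)
        }
        // Phase 2: strict decode into exactly the matching type; unknown
        // fields like "countParram" become errors.
        dec := json.NewDecoder(bytes.NewReader(data))
        dec.DisallowUnknownFields()
        if err := dec.Decode(op); err != nil {
            return nil, fmt.Errorf("decoding %s into %T: %w", data, op, err)
        }
        return op, nil
    }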
Kubernetes Prow Robot
5dd244ff00 Merge pull request #125796 from haorenfsa/fix-gc-sync-blocked
garbagecollector: controller should not be blocking on failed cache sync
2024-09-25 04:02:00 +01:00
Kubernetes Prow Robot
e9cde03b91 Merge pull request #127598 from aojea/servicecidr_seconday_dualwrite
bugfix: initialize secondary range registry with the right value
2024-09-24 21:08:08 +01:00
Antonio Ojea
7a9bca3888 bugfix: initialize secondary range registry with the right value
For the MultiCIDRServiceAllocator feature we added an additional feature
gate, DisableAllocatorDualWrite, that allows enabling mirror behavior on
the old allocator to deal with problems during cluster upgrades.

During the implementation, the secondary range of the legacy allocator
was initialized with the value of the primary range; hence, when a
Service tried to allocate a new IP on the secondary range, it succeeded
in the new IP allocator but failed when it tried to allocate the same IP
on the legacy allocator, since that has a different range.

Expand the integration test that runs over all the combinations of
Service ClusterIP possibilities to run with all the possible
combinations of the feature gates.

The integration test needs to change the way it starts the apiserver,
otherwise it will time out.
2024-09-24 17:48:13 +00:00
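A self-contained illustration of the bug (allocator names and construction
are hypothetical; the real code lives in the service IP allocators):

    package main

    import (
        "fmt"
        "net"
    )

    // rangeAllocator stands in for the legacy service IP allocator; only the
    // CIDR it covers matters for this sketch.
    type rangeAllocator struct{ cidr *net.IPNet }

    func newRangeAllocator(cidr string) *rangeAllocator {
        _, ipnet, err := net.ParseCIDR(cidr)
        if err != nil {
            panic(err)
        }
        return &rangeAllocator{cidr: ipnet}
    }

    func (r *rangeAllocator) contains(ip string) bool {
        return r.cidr.Contains(net.ParseIP(ip))
    }

    func main() {
        primaryCIDR, secondaryCIDR := "10.0.0.0/16", "fd00::/112"

        // Buggy initialization: the mirror allocator for the secondary range
        // was built from the primary range, so mirroring any secondary-range
        // IP necessarily failed.
        buggy := newRangeAllocator(primaryCIDR)
        fmt.Println(buggy.contains("fd00::10")) // false: dual write fails

        // Fixed: initialize it from the secondary range itself.
        fixed := newRangeAllocator(secondaryCIDR)
        fmt.Println(fixed.contains("fd00::10")) // true
    }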
Patrick Ohly
7bbb3465e5 scheduler_perf: more realistic structured parameters tests
Real devices are likely to have a handful of attributes and (for GPUs) the
memory as capacity. Most keys will be driver-specific; a few may eventually
have a domain (none are standardized right now).
2024-09-24 18:52:45 +02:00
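An illustrative sketch of the kind of device the test now generates; the type
and key names are invented for the example, not the real DRA API:

    package main

    import "fmt"

    // device carries a handful of attributes plus memory as capacity, the
    // shape described above. Most attribute keys would be driver-specific.
    type device struct {
        Attributes map[string]string
        Capacity   map[string]string
    }

    func exampleGPU() device {
        return device{
            Attributes: map[string]string{
                "driverVersion": "1.0.0",       // driver-specific key
                "model":         "example-gpu", // hypothetical attribute
            },
            Capacity: map[string]string{
                "memory": "80Gi", // GPU memory as a resource quantity
            },
        }
    }

    func main() {
        fmt.Printf("%+v\n", exampleGPU())
    }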
Kubernetes Prow Robot
b071443187 Merge pull request #127592 from dims/wait-for-gpus-even-for-aws-kubetest2-ec2-harness
Wait for GPUs even for AWS kubetest2 ec2 harness
2024-09-24 17:26:08 +01:00
Davanum Srinivas
472ca3b279 skip control plane nodes, they may not have GPUs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 10:09:33 -04:00
Kubernetes Prow Robot
6ded721910 Merge pull request #127496 from macsko/add_metricscollectionop_to_scheduler_perf
Add separate ops for collecting metrics from multiple namespaces in scheduler_perf
2024-09-24 14:34:00 +01:00
Davanum Srinivas
349c7136c9 Wait for GPUs even for AWS kubetest2 ec2 harness
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 09:11:18 -04:00
Maciej Skoczeń
a273e5381a Add separate ops for collecting metrics from multiple namespaces in scheduler_perf 2024-09-24 12:28:53 +00:00
Kubernetes Prow Robot
f0036aac21 Merge pull request #127572 from soltysh/reuse_helper
Reuse CreateTestCRD helper for kubectl e2e
2024-09-24 06:05:59 +01:00
Kubernetes Prow Robot
94df29b8f2 Merge pull request #127464 from sanposhiho/trigger-nodedelete
fix(eventhandler): trigger Node/Delete event
2024-09-24 02:24:00 +01:00
Kubernetes Prow Robot
1137a6a0cc Merge pull request #127093 from jpbetz/retry-generate-name-ga
Promote RetryGenerateName to GA
2024-09-24 00:46:06 +01:00
Kubernetes Prow Robot
d6bb550b10 Merge pull request #122890 from HirazawaUi/fix-pod-grace-period
[kubelet]: Fix the bug where pod grace period will be overwritten
2024-09-24 00:45:59 +01:00
Kubernetes Prow Robot
7ff0580bc8 Merge pull request #127458 from ii/promote-volume-attachment-status-test
Promote e2e test for VolumeAttachmentStatus Endpoints +3 Endpoints
2024-09-23 18:08:00 +01:00
Maciej Szulik
b51d6308a7 Reuse CreateTestCRD helper for kubectl e2e 2024-09-23 18:32:27 +02:00
Kubernetes Prow Robot
ff391cefe2 Merge pull request #127547 from dims/skip-reinstallation-of-gpu-daemonset
Skip re-installation of GPU daemonset
2024-09-23 15:28:00 +01:00
Kubernetes Prow Robot
f187480140 Merge pull request #127558 from pohly/e2e-framework-docs
e2e framework: better documentation of ExpectNoError
2024-09-23 14:12:00 +01:00
Kubernetes Prow Robot
15d08bf7c8 Merge pull request #127323 from vrutkovs/tracing-cacher-get
tracing: add span for get cacher
2024-09-23 10:27:59 +01:00
Patrick Ohly
e5aa609513 e2e framework: better documentation of ExpectNoError
It wasn't clear from the comments what "explain" does, leading to calls like
this:

   framework.ExpectNoError(fmt.Errorf("additional info ....: %v", ..., err))
2024-09-23 10:58:06 +02:00
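A sketch of the usage the clarified docs steer towards, assuming the
framework's variadic explain parameter: pass the error as-is and supply the
context as format arguments instead of wrapping the error in fmt.Errorf.

    package e2e

    import "k8s.io/kubernetes/test/e2e/framework"

    func waitExample(err error, podName string) {
        // The explain arguments describe what was being attempted; the error
        // itself is passed separately rather than wrapped into a new error.
        framework.ExpectNoError(err, "waiting for pod %s to start", podName)
    }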
Kubernetes Prow Robot
89f418f29e Merge pull request #127481 from kannon92/fix-mount-propogation-flake
Use the last kubelet pid in the pidof command
2024-09-23 09:05:59 +01:00
Davanum Srinivas
1abbb00067 Double a couple of other timeouts
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 19:36:39 -04:00
Davanum Srinivas
92683139d7 Skip re-installation of GPU daemonset
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 13:54:12 -04:00
Kensei Nakada
421f87a4e3 feat: add a requeueing integration test for PodTopologySpread with Node/delete event (QHint: disabled) 2024-09-23 00:29:56 +09:00
Kensei Nakada
bf8f7a3ad7 feat: add a requeueing integration test for PodTopologySpread with Node/delete event 2024-09-22 17:34:37 +09:00
Kubernetes Prow Robot
61dbc03563 Merge pull request #127471 from macsko/add_deletepodsop_to_scheduler_perf
Add deletePodsOp to scheduler_perf
2024-09-22 07:00:04 +01:00
Vadim Rutkovsky
dff0075e7c tracing: add span for cacher.Get
Also updates tracing integration tests for cacher.GetList
2024-09-21 09:53:43 +02:00
Kubernetes Prow Robot
221bf19ee0 Merge pull request #127309 from ii/create-csinode-lifecycle-test
Write e2e test for StorageV1CSINode Endpoints +6 Endpoints
2024-09-21 03:59:59 +01:00
Kubernetes Prow Robot
a8fd8f5a41 Merge pull request #127516 from dims/bump-timeout-to-account-for-slow-gpu-operations
Bump timeout to account for slow GPU daemonset activation
2024-09-21 02:55:58 +01:00
Davanum Srinivas
3d7d06e7cd Bump timeout to account for slow GPU operations
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-20 20:52:51 -04:00
Kubernetes Prow Robot
52095a8b7b Merge pull request #127509 from dims/test-more-gpu-stuff
Test MOAR GPU stuff (add the cuda demo suite!)
2024-09-20 23:53:58 +01:00
Kubernetes Prow Robot
7a58803c84 Merge pull request #127281 from ii/remove-node-endpoints
Remove Node endpoints from pending_eligible_endpoints.yaml
2024-09-20 22:50:04 +01:00
Kubernetes Prow Robot
f9a57ba82d Merge pull request #126760 from ncdc/ncdc/emeritus
Move ncdc to emeritus
2024-09-20 21:01:58 +01:00
Davanum Srinivas
e516e003c5 Test MOAR GPU stuff
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-20 11:40:33 -04:00
HirazawaUi
9d4e272c16 add e2e test for pod grace period being overridden 2024-09-20 22:25:03 +08:00
HirazawaUi
7c85784b9f fix the bug where pod grace period will be overwritten 2024-09-20 22:25:01 +08:00
Kubernetes Prow Robot
f2700895a4 Merge pull request #127422 from srivastav-abhishek/go-vet-fix
Go vet fixes for gotip
2024-09-20 14:37:58 +01:00
Kevin Hannon
9b6ef250fc always use the last entry in the pidof command as that is the oldest 2024-09-20 09:05:31 -04:00
Maciej Skoczeń
287b61918a Add deletePodsOp to scheduler_perf 2024-09-20 09:46:27 +00:00
Kubernetes Prow Robot
ffabcdc6d1 Merge pull request #127448 from Nordix/esotsal/fix-123852
Potentially deflake "RuntimeClass should reject a Pod requesting a deleted RuntimeClass" test
2024-09-20 08:07:43 +01:00
Abhishek Kr Srivastav
95860cff1c Fix Go vet errors for master golang
Co-authored-by: Rajalakshmi-Girish <rajalakshmi.girish1@ibm.com>
Co-authored-by: Abhishek Kr Srivastav <Abhishek.kr.srivastav@ibm.com>
2024-09-20 12:36:38 +05:30
Kubernetes Prow Robot
08aefc8a92 Merge pull request #119362 from pacoxu/add-new-eviction-pid-test
add new e2e test with PodAndContainerStatsFromCRI enabled for pid eviction order
2024-09-20 05:44:45 +01:00