Commit Graph

26230 Commits

Stephen Kitt
3124a13587 hack: configure Go environments where necessary
These scripts rely on the system's default go; this changes that by
calling kube::golang::setup_env so that the appropriate go is used when
the system's isn't sufficient.

Signed-off-by: Stephen Kitt <skitt@redhat.com>
2024-09-26 23:32:33 +02:00
Kubernetes Prow Robot
5ebd0da6cc Merge pull request #127662 from macsko/make_scheduler_perf_sleepop_duration_parametrizable
Make sleepOp duration parametrizable in scheduler_perf
2024-09-26 20:10:01 +01:00
Kubernetes Prow Robot
421436a94c Merge pull request #127473 from dom4ha/fine-grain-qhints-fit
feature(scheduler): more fine-grained Node QHint for NodeResourceFit plugin
2024-09-26 18:34:02 +01:00
Maciej Skoczeń
837d917d91 Make sleepOp duration parametrizable in scheduler_perf 2024-09-26 13:07:22 +00:00
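
For illustration, a minimal, self-contained Go sketch of what
"parametrizable" means in the commit above; the durationParam type and
resolve helper are invented for this sketch and are not the actual
scheduler_perf code:

    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
        "time"
    )

    // durationParam holds either a literal duration or the name of a
    // workload parameter (written as "$name") to be resolved later.
    type durationParam struct {
        literal time.Duration
        param   string
    }

    func (d *durationParam) UnmarshalJSON(b []byte) error {
        var s string
        if err := json.Unmarshal(b, &s); err != nil {
            return err
        }
        if strings.HasPrefix(s, "$") {
            d.param = strings.TrimPrefix(s, "$")
            return nil
        }
        v, err := time.ParseDuration(s)
        d.literal = v
        return err
    }

    // resolve returns the literal value or looks the parameter up in a
    // per-workload params map.
    func (d durationParam) resolve(params map[string]string) (time.Duration, error) {
        if d.param == "" {
            return d.literal, nil
        }
        raw, ok := params[d.param]
        if !ok {
            return 0, fmt.Errorf("unknown param %q", d.param)
        }
        return time.ParseDuration(raw)
    }

    func main() {
        var d durationParam
        _ = json.Unmarshal([]byte(`"$sleepDuration"`), &d)
        v, _ := d.resolve(map[string]string{"sleepDuration": "5s"})
        fmt.Println(v) // 5s
    }
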
dom4ha
c7db4bb450 Fine-grained QueueHints for the nodeaffinity plugin.
Skip queue on unrelated change that keeps pod schedulable when QueueHints are enabled.

Split add from QHints disabled case

Remove case when QHints are disabled

Remove two QHint alternatives in unit tests

more fine-grained Node QHint for NodeResourceFit plugin

Return early when an updated Node no longer matches

Revert "more fine-grained Node QHint for NodeResourceFit plugin"

This reverts commit dfbceb60e0c1c4e47748c12722d9ed6dba1a8366.

Add integration test for requeue of a pod previously rejected by NodeAffinity plugin when a suitable Node is added

Add integration test for a Node update operation that does not trigger requeue in NodeAffinity plugin

Remove inaccurate comment

Apply review comments
2024-09-26 10:21:08 +00:00
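
The queueing-hint idea behind the series above can be sketched
standalone. The types below (QueueingHint, Node, Pod) are simplified
stand-ins for the scheduler framework's real API, so treat this as a
rough sketch of the technique rather than the plugin's actual code:

    package main

    import "fmt"

    type QueueingHint int

    const (
        QueueSkip QueueingHint = iota // event cannot make the pod schedulable
        Queue                         // event may make the pod schedulable
    )

    type Node struct{ AllocatableCPU int64 }
    type Pod struct{ RequestedCPU int64 }

    // isSchedulableAfterNodeChange is a toy fine-grained hint for a
    // resource-fit check: only a Node update that increases allocatable
    // CPU enough to fit the pod triggers a requeue; unrelated changes
    // are skipped so the pod stays in the unschedulable pool.
    func isSchedulableAfterNodeChange(pod Pod, oldNode, newNode Node) QueueingHint {
        if newNode.AllocatableCPU <= oldNode.AllocatableCPU {
            return QueueSkip
        }
        if newNode.AllocatableCPU >= pod.RequestedCPU {
            return Queue
        }
        return QueueSkip
    }

    func main() {
        pod := Pod{RequestedCPU: 4}
        fmt.Println(isSchedulableAfterNodeChange(pod, Node{2}, Node{8})) // 1 (Queue)
        fmt.Println(isSchedulableAfterNodeChange(pod, Node{2}, Node{3})) // 0 (QueueSkip)
    }
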
tuhui1
c5e951946e fix: enable formatter rule from testifylint in module test/utils 2024-09-26 18:00:12 +08:00
dom4ha
903b1f7e28 more fine-grained Node QHint for NodeResourceFit plugin 2024-09-26 09:51:36 +00:00
Kubernetes Prow Robot
239802e4f7 Merge pull request #127574 from bouaouda-achraf/e2e-test-add-network-subnet-param
feat(test-e2e): support custom network and subnet on remote e2e mode
2024-09-26 03:50:08 +01:00
Jefftree
dacc2e1f5d Allow emulation version to be set in integration test 2024-09-25 22:01:15 -04:00
Kubernetes Prow Robot
b62d364195 Merge pull request #127200 from omerap12/version_fg_apiserver
chore: moving apiserver featuregates to versioned.
2024-09-26 02:19:28 +01:00
Kubernetes Prow Robot
45676184d4 Merge pull request #127560 from macsko/add_updateanyop_to_scheduler_perf
Add updateAnyOp to scheduler_perf
2024-09-26 00:47:28 +01:00
Kubernetes Prow Robot
2b196cff8b Merge pull request #127589 from soltysh/timestamp_e2e
e2e: add test covering cronjob-scheduled-timestamp annotation added by cronjob
2024-09-25 17:46:09 +01:00
Maciej Skoczeń
40154baab0 Add updateAnyOp to scheduler_perf 2024-09-25 12:42:25 +00:00
Kubernetes Prow Robot
5fc4e71a30 Merge pull request #127499 from pohly/scheduler-perf-updates
scheduler_perf: updates to enhance performance testing of DRA
2024-09-25 13:32:00 +01:00
Maciej Szulik
f11ddad99d e2e: add test covering cronjob-scheduled-timestamp annotation added by cronjob 2024-09-25 12:47:27 +02:00
Kubernetes Prow Robot
75214d11d5 Merge pull request #127428 from googs1025/scheduler/plugin
chore(scheduler): refactor import package ordering in scheduler
2024-09-25 11:40:07 +01:00
Lukasz Szaszkiewicz
ae35048cb0 adds watchListEndpointRestrictions for watchlist requests (#126996)
* endpoints/handlers/get: introduce watchListEndpointRestrictions

* consistencydetector/list_data_consistency_detector: expose IsDataConsistencyDetectionForListEnabled

* e2e/watchlist: extract common function for adding unstructured secrets

* e2e/watchlist: new e2e scenarios covering watchListEndpointRestrictions
2024-09-25 10:12:01 +01:00
Patrick Ohly
d100768d94 scheduler_perf: track and visualize progress over time
This is useful to see whether pod scheduling happens in bursts and how it
behaves over time, which is relevant in particular for dynamic resource
allocation where it may become harder at the end to find the node which still
has resources available.

Besides "pods scheduled" it's also useful to know how many attempts were
needed, so schedule_attempts_total also gets sampled and stored.

To visualize the result of one or more test runs, use:

     gnuplot.sh *.dat
2024-09-25 11:09:15 +02:00
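
A minimal sketch of the sampling idea from the commit above; the names
(sampleProgress, podsScheduled) are invented, and the real collector
samples scheduler metrics such as schedule_attempts_total, but the
output has the same shape: one gnuplot-friendly line of columns per
tick:

    package main

    import (
        "fmt"
        "os"
        "sync/atomic"
        "time"
    )

    var (
        podsScheduled    atomic.Int64 // bumped by the (simulated) scheduler
        scheduleAttempts atomic.Int64
    )

    // sampleProgress writes one "elapsed pods attempts" line per tick;
    // gnuplot can plot columns 1:2 and 1:3 directly.
    func sampleProgress(f *os.File, interval time.Duration, stop <-chan struct{}) {
        start := time.Now()
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case <-ticker.C:
                fmt.Fprintf(f, "%.1f %d %d\n",
                    time.Since(start).Seconds(),
                    podsScheduled.Load(),
                    scheduleAttempts.Load())
            }
        }
    }

    func main() {
        stop := make(chan struct{})
        go sampleProgress(os.Stdout, 100*time.Millisecond, stop)
        for i := 0; i < 20; i++ { // simulate bursty scheduling work
            scheduleAttempts.Add(2) // two attempts per scheduled pod
            podsScheduled.Add(1)
            time.Sleep(50 * time.Millisecond)
        }
        close(stop)
    }
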
Patrick Ohly
ded96042f7 scheduler_perf + DRA: load up cluster by allocating claims
Having to schedule 4999 pods to simulate a "full" cluster is slow. Creating
claims and then allocating them more or less like the scheduler would when
scheduling pods is much faster and in practice has the same effect on the
dynamicresources plugin because it looks at claims, not pods.

This allows defining the "steady state" workloads with a higher number
of devices ("claimsPerNode") again. This was prohibitively slow before.
2024-09-25 09:45:39 +02:00
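
Why pre-allocated claims have the same effect can be shown with a toy
fit check; the claim type and devicesInUse helper below are invented
for illustration and are not the dynamicresources plugin's actual code:

    package main

    import "fmt"

    type claim struct {
        allocatedNode string // empty until the claim is allocated
    }

    // devicesInUse is what a claims-based fit check looks at: it counts
    // allocated claims per node and never inspects pods at all.
    func devicesInUse(claims []claim, node string) int {
        n := 0
        for _, c := range claims {
            if c.allocatedNode == node {
                n++
            }
        }
        return n
    }

    func main() {
        const claimsPerNode = 10
        // Allocate claims directly, instead of scheduling thousands of
        // pods just to reach the same "node is full" state.
        claims := make([]claim, 0, claimsPerNode)
        for i := 0; i < claimsPerNode; i++ {
            claims = append(claims, claim{allocatedNode: "node-1"})
        }
        fmt.Println("node-1 full:", devicesInUse(claims, "node-1") >= claimsPerNode) // true
    }
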
Patrick Ohly
385599f0a8 scheduler_perf + DRA: measure pod scheduling at a steady state
The previous tests were based on scheduling pods until the cluster was
full. This is a valid scenario, but not necessarily realistic.

More realistic is how quickly the scheduler can schedule new pods when some
old pods finished running, in particular in a cluster that is properly
utilized (= almost full). To test this, pods must get created, scheduled, and
then immediately deleted. This can run for a certain period of time.

Scenarios with empty and full clusters have different scheduling rates. This was
previously visible for DRA because the 50% percentile of the scheduling
throughput was lower than the average, but one had to guess in which scenario
the throughput was lower. Now this can be measured for DRA with the new
SteadyStateClusterResourceClaimTemplateStructured test.

The metrics collector must watch pod events to figure out how many pods got
scheduled. Polling misses pods that already got deleted again. There seems to
be no relevant difference in the collected
metrics (SchedulingWithResourceClaimTemplateStructured/2000pods_200nodes, 6 repetitions):

     │            before            │                     after                     │
     │ SchedulingThroughput/Average │ SchedulingThroughput/Average  vs base         │
                         157.1 ± 0%                     157.1 ± 0%  ~ (p=0.329 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc50 │ SchedulingThroughput/Perc50  vs base         │
                        48.99 ± 8%                    47.52 ± 9%  ~ (p=0.937 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc90 │ SchedulingThroughput/Perc90  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc95 │ SchedulingThroughput/Perc95  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)

     │           before            │                    after                     │
     │ SchedulingThroughput/Perc99 │ SchedulingThroughput/Perc99  vs base         │
                       463.9 ± 16%                   460.1 ± 13%  ~ (p=0.818 n=6)
2024-09-25 09:45:39 +02:00
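
The polling-vs-events point from the commit above can be demonstrated
with a toy program (no client-go here; in the real collector a pod
informer plays the listener role): short-lived items are all counted by
an event handler but mostly missed by a fixed-interval poller:

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    func main() {
        var (
            mu        sync.Mutex
            live      = map[int]bool{} // "pods" that currently exist
            seen      = map[int]bool{} // pods the poller has observed
            byEvents  int
            byPolling int
        )

        done := make(chan struct{})
        go func() { // poller: samples the live set every 50ms
            for {
                select {
                case <-done:
                    return
                case <-time.After(50 * time.Millisecond):
                    mu.Lock()
                    for id := range live {
                        if !seen[id] {
                            seen[id] = true
                            byPolling++
                        }
                    }
                    mu.Unlock()
                }
            }
        }()

        for id := 0; id < 100; id++ { // create, schedule, delete quickly
            mu.Lock()
            live[id] = true
            byEvents++ // the event handler sees every pod
            mu.Unlock()
            time.Sleep(5 * time.Millisecond) // pod lives only briefly
            mu.Lock()
            delete(live, id) // usually gone before the next poll
            mu.Unlock()
        }
        close(done)
        fmt.Printf("events counted %d pods, polling saw only %d\n", byEvents, byPolling)
    }
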
Patrick Ohly
51cafb0053 scheduler_perf: more useful errors for configuration mistakes
Before, the first error was reported, which typically was the "invalid op code"
error from the createAny operation:

    scheduler_perf.go:900: parsing test cases error: error unmarshaling JSON: while decoding JSON: cannot unmarshal {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into any known op type: invalid opcode "createPods"; expected "createAny"

Now the opcode is determined first, then decoding into exactly the matching operation is
tried and validated. Unknown fields are an error.

In the case above, decoding a string into time.Duration failed:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"collectMetrics":true,"count":10,"duration":"30s","namespace":"test","opcode":"createPods","podTemplatePath":"config/dra/pod-with-claim-template.yaml","steadyState":true} into *benchmark.createPodsOp: json: cannot unmarshal string into Go struct field createPodsOp.Duration of type time.Duration

Some typos:

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: unknown opcode "sleeep" in {"duration":"5s","opcode":"sleeep"}

    scheduler_test.go:29: parsing test cases error: error unmarshaling JSON: while decoding JSON: decoding {"countParram":"$deletingPods","deletePodsPerSecond":50,"opcode":"createPods"} into *benchmark.createPodsOp: json: unknown field "countParram"
2024-09-25 09:45:39 +02:00
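
A plain-Go sketch of this two-phase strategy, with an invented sleepOp
type and a one-entry registry (the real code registers many op types):
peek at the opcode first, then strictly decode into the matching type
with unknown fields rejected:

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "time"
    )

    type sleepOp struct {
        Opcode   string        `json:"opcode"`
        Duration time.Duration `json:"duration"`
    }

    // decodeOp extracts only the opcode first, then decodes the whole
    // document into the registered op type, treating unknown fields as
    // errors.
    func decodeOp(data []byte) (any, error) {
        var peek struct {
            Opcode string `json:"opcode"`
        }
        if err := json.Unmarshal(data, &peek); err != nil {
            return nil, err
        }
        registry := map[string]any{"sleep": &sleepOp{}}
        op, ok := registry[peek.Opcode]
        if !ok {
            return nil, fmt.Errorf("unknown opcode %q in %s", peek.Opcode, data)
        }
        dec := json.NewDecoder(bytes.NewReader(data))
        dec.DisallowUnknownFields()
        if err := dec.Decode(op); err != nil {
            return nil, fmt.Errorf("decoding %s into %T: %w", data, op, err)
        }
        return op, nil
    }

    func main() {
        _, err := decodeOp([]byte(`{"duration":"5s","opcode":"sleeep"}`))
        fmt.Println(err) // unknown opcode "sleeep" in ...
        _, err = decodeOp([]byte(`{"duration":"5s","opcode":"sleep"}`))
        fmt.Println(err) // cannot unmarshal string into ... time.Duration
    }
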
Omer Aplatony
ade7305940 chore: moving apiserver featuregates to versioned
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2024-09-25 07:41:26 +03:00
Kubernetes Prow Robot
5dd244ff00 Merge pull request #125796 from haorenfsa/fix-gc-sync-blocked
garbagecollector: controller should not be blocking on failed cache sync
2024-09-25 04:02:00 +01:00
Kubernetes Prow Robot
e9cde03b91 Merge pull request #127598 from aojea/servicecidr_seconday_dualwrite
bugfix: initialize secondary range registry with the right value
2024-09-24 21:08:08 +01:00
Antonio Ojea
7a9bca3888 bugfix: initialize secondary range registry with the right value
When the MultiCIDRServiceAllocator feature is enabled, we added an
additional feature gate, DisableAllocatorDualWrite, that enables a
mirror behavior on the old allocator to deal with problems during
cluster upgrades.

During the implementation, the secondary range of the legacy allocator
was initialized with the value of the primary range; hence, when a
Service tried to allocate a new IP on the secondary range, it succeeded
in the new IP allocator but failed when it tried to allocate the same
IP on the legacy allocator, since that allocator has a different range.

Expand the integration test that runs over all the combinations of
Service ClusterIP possibilities to run with all the possible
combinations of the feature gates.

The integration test needs to change the way it starts the apiserver,
otherwise it times out.
2024-09-24 17:48:13 +00:00
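
A toy reproduction of the failure mode, with an invented allocator type
(the real allocators live in the service IP allocation code): once the
legacy "secondary" allocator is initialized with the primary CIDR, any
address from the true secondary range passes the new allocator but
fails the mirror write:

    package main

    import (
        "fmt"
        "net/netip"
    )

    type allocator struct{ prefix netip.Prefix }

    func (a allocator) Allocate(ip netip.Addr) error {
        if !a.prefix.Contains(ip) {
            return fmt.Errorf("%s is not in range %s", ip, a.prefix)
        }
        return nil
    }

    func main() {
        primary := netip.MustParsePrefix("10.0.0.0/24")
        secondary := netip.MustParsePrefix("fd00::/112")

        newAlloc := allocator{prefix: secondary}
        // The bug: the legacy "secondary" allocator was initialized
        // with the primary range instead of the secondary one.
        legacyAlloc := allocator{prefix: primary}

        ip := netip.MustParseAddr("fd00::5")
        fmt.Println("new allocator:", newAlloc.Allocate(ip))    // <nil>, succeeds
        fmt.Println("legacy mirror:", legacyAlloc.Allocate(ip)) // error, wrong range
    }
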
Patrick Ohly
7bbb3465e5 scheduler_perf: more realistic structured parameters tests
Real devices are likely to have a handful of attributes and (for GPUs)
the memory as capacity. Most keys will be driver-specific; a few may
eventually have a domain (none are standardized right now).
2024-09-24 18:52:45 +02:00
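
Conceptually, the generated test devices look something like the sketch
below; the device struct and its attribute keys are hypothetical,
chosen only to mirror the description above:

    package main

    import "fmt"

    // device is a rough sketch of the generated test data: a few
    // driver-specific attributes plus a capacity entry (memory for
    // GPUs).
    type device struct {
        name       string
        attributes map[string]string // keys are mostly driver-specific
        capacity   map[string]int64  // bytes
    }

    func main() {
        d := device{
            name: "gpu-0",
            attributes: map[string]string{
                "driverVersion": "1.2.3", // hypothetical key
                "model":         "a100",  // hypothetical key
            },
            capacity: map[string]int64{"memory": 80 << 30}, // 80 GiB
        }
        fmt.Printf("%+v\n", d)
    }
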
Kubernetes Prow Robot
b071443187 Merge pull request #127592 from dims/wait-for-gpus-even-for-aws-kubetest2-ec2-harness
Wait for GPUs even for AWS kubetest2 ec2 harness
2024-09-24 17:26:08 +01:00
Davanum Srinivas
472ca3b279 skip control plane nodes, they may not have GPUs
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 10:09:33 -04:00
Kubernetes Prow Robot
6ded721910 Merge pull request #127496 from macsko/add_metricscollectionop_to_scheduler_perf
Add separate ops for collecting metrics from multiple namespaces in scheduler_perf
2024-09-24 14:34:00 +01:00
Davanum Srinivas
349c7136c9 Wait for GPUs even for AWS kubetest2 ec2 harness
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-24 09:11:18 -04:00
Maciej Skoczeń
a273e5381a Add separate ops for collecting metrics from multiple namespaces in scheduler_perf 2024-09-24 12:28:53 +00:00
Kubernetes Prow Robot
f0036aac21 Merge pull request #127572 from soltysh/reuse_helper
Reuse CreateTestCRD helper for kubectl e2e
2024-09-24 06:05:59 +01:00
Kubernetes Prow Robot
94df29b8f2 Merge pull request #127464 from sanposhiho/trigger-nodedelete
fix(eventhandler): trigger Node/Delete event
2024-09-24 02:24:00 +01:00
Kubernetes Prow Robot
1137a6a0cc Merge pull request #127093 from jpbetz/retry-generate-name-ga
Promote RetryGenerateName to GA
2024-09-24 00:46:06 +01:00
Kubernetes Prow Robot
d6bb550b10 Merge pull request #122890 from HirazawaUi/fix-pod-grace-period
[kubelet]: Fix the bug where pod grace period will be overwritten
2024-09-24 00:45:59 +01:00
Achraf BOUAOUDA
d900efafcc feat(test-e2e): support custom network and subnet on remote e2e mode 2024-09-24 00:25:41 +02:00
Kubernetes Prow Robot
7ff0580bc8 Merge pull request #127458 from ii/promote-volume-attachment-status-test
Promote e2e test for VolumeAttachmentStatus Endpoints +3 Endpoints
2024-09-23 18:08:00 +01:00
Maciej Szulik
b51d6308a7 Reuse CreateTestCRD helper for kubectl e2e 2024-09-23 18:32:27 +02:00
Kubernetes Prow Robot
ff391cefe2 Merge pull request #127547 from dims/skip-reinstallation-of-gpu-daemonset
Skip re-installation of GPU daemonset
2024-09-23 15:28:00 +01:00
Davanum Srinivas
2fbbca6279 Remove remnants of broken stuff - nvidia/autoscaling
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-23 13:17:00 +00:00
Kubernetes Prow Robot
f187480140 Merge pull request #127558 from pohly/e2e-framework-docs
e2e framework: better documentation of ExpectNoError
2024-09-23 14:12:00 +01:00
moriya
cd0e0fc881 add_test 2024-09-23 21:49:09 +09:00
moriya
090145aadf add_non_queued_pod 2024-09-23 21:24:09 +09:00
Kubernetes Prow Robot
15d08bf7c8 Merge pull request #127323 from vrutkovs/tracing-cacher-get
tracing: add span for get cacher
2024-09-23 10:27:59 +01:00
Patrick Ohly
e5aa609513 e2e framework: better documentation of ExpectNoError
It wasn't clear from the comments what "explain" does, leading to calls like
this:

   framework.ExpectNoError(fmt.Errorf("additional info ....: %v", ..., err))
2024-09-23 10:58:06 +02:00
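
A self-contained sketch of the documented contract, using a toy
expectNoError stand-in (framework.ExpectNoError takes the error first,
followed by optional explanation arguments):

    package main

    import (
        "errors"
        "fmt"
    )

    // expectNoError is a toy stand-in for framework.ExpectNoError: the
    // error stays intact, and extra context goes into the optional
    // explanation instead of being wrapped into a new error.
    func expectNoError(err error, explain ...any) {
        if err == nil {
            return
        }
        msg := "unexpected error"
        if len(explain) > 0 {
            msg = fmt.Sprintf(explain[0].(string), explain[1:]...)
        }
        fmt.Printf("FAIL: %s: %v\n", msg, err)
    }

    func main() {
        err := errors.New("pod never became ready")
        // Preferred: keep err intact, add context separately.
        expectNoError(err, "waiting for pod %s", "test-pod")
        // Discouraged: wrapping buries the original error object.
        expectNoError(fmt.Errorf("additional info: %v", err))
    }
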
Kubernetes Prow Robot
89f418f29e Merge pull request #127481 from kannon92/fix-mount-propogation-flake
Use the last kubelet pid in the pidof command
2024-09-23 09:05:59 +01:00
Davanum Srinivas
1abbb00067 Double a couple of other timeouts
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 19:36:39 -04:00
Davanum Srinivas
92683139d7 Skip re-installation of GPU daemonset
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2024-09-22 13:54:12 -04:00
Kensei Nakada
421f87a4e3 feat: add a requeueing integration test for PodTopologySpread with Node/delete event (QHint: disabled) 2024-09-23 00:29:56 +09:00
Kensei Nakada
bf8f7a3ad7 feat: add a requeueing integration test for PodTopologySpread with Node/delete event 2024-09-22 17:34:37 +09:00