1790 Commits

Author SHA1 Message Date
Francesco Romani
70cce5e3f1 e2e: topomgr: introduce sriov setup/teardown funcs
Reorganize the code with setup and teardown functions,
to make room for the future addition of more device plugin
support, and to make the code a bit tidier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
2f0a6d2c76 e2e: topomgr: use constants for test limits
Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
fee1dba054 e2e: topomgr: improve the test logs
Add clarification to which test is doing what, to make
the test output easier to understand.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
83c344647f e2e: topomgr: better check for AffinityError
Add a helper function to check if a Pod failed
admission for Topology Affinity Error.
So far we only check the Status.Reason.
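
A minimal sketch of such a helper follows. It is not the code from this commit: `podStatus` is a hypothetical stand-in for the Pod status type, and both the regular expression and the sample reason strings are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// podStatus is a hypothetical stand-in for the status of a Pod.
// Only the Reason string matters for this check.
type podStatus struct {
	Phase  string
	Reason string
}

// affinityErrorRe matches reasons such as "TopologyAffinityError" or
// "Topology Affinity Error". The pattern is an assumption.
var affinityErrorRe = regexp.MustCompile(`Topology.*Affinity.*Error`)

// isTopologyAffinityError reports whether the status reason looks like
// a topology affinity admission failure. Only Reason is inspected.
func isTopologyAffinityError(st podStatus) bool {
	return affinityErrorRe.MatchString(st.Reason)
}

func main() {
	rejected := podStatus{Phase: "Failed", Reason: "TopologyAffinityError"}
	other := podStatus{Phase: "Failed", Reason: "SomeOtherReason"}
	fmt.Println(isTopologyAffinityError(rejected))
	fmt.Println(isTopologyAffinityError(other))
}
```

Matching on the reason alone keeps the check simple, at the cost of being sensitive to any change in the reason wording.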

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
512a4e8a3e e2e: topomgr: reduce node readiness timeout
Five minutes was initially used only to be overcautious.
From my experiments, the node is usually ready in less than a minute.
Double it to give some buffer space.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
3b4122bd03 e2e: topomgr: get and use topology hints from conf
To properly implement some e2e tests, we need to know
some basic topology facts about the system running the tests.
The bare minimum we need to know is how many PCI SRIOV devices
are attached to which NUMA node.

This way we know which core we can reserve for kube services,
and which NUMA socket we can take to test full socket reservation.

To let the tests know the PCI device topology, we use annotations
in the SRIOV device plugin ConfigMap we need anyway.
The format is

```yaml
  metadata:
    annotations:
      pcidevice_node0: "2"
      pcidevice_node1: "0"
```

with one annotation per NUMA node in the system.
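
As an illustration, the annotations above could be turned into a per-NUMA-node device count along these lines. This is only a sketch: `parsePCIDeviceHints` is a hypothetical helper name, and the annotations are modeled as a plain `map[string]string` rather than a real ConfigMap object.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// annotationPrefix follows the annotation format shown above.
const annotationPrefix = "pcidevice_node"

// parsePCIDeviceHints is a hypothetical helper. It maps each NUMA node ID
// to the number of PCI devices declared for it in the annotations.
// Annotations that do not start with the prefix are ignored.
func parsePCIDeviceHints(annotations map[string]string) (map[int]int, error) {
	hints := make(map[int]int)
	for key, val := range annotations {
		if !strings.HasPrefix(key, annotationPrefix) {
			continue
		}
		node, err := strconv.Atoi(strings.TrimPrefix(key, annotationPrefix))
		if err != nil {
			return nil, fmt.Errorf("bad NUMA node in %q: %v", key, err)
		}
		count, err := strconv.Atoi(val)
		if err != nil {
			return nil, fmt.Errorf("bad device count for %q: %v", key, err)
		}
		hints[node] = count
	}
	return hints, nil
}

func main() {
	hints, err := parsePCIDeviceHints(map[string]string{
		"pcidevice_node0": "2",
		"pcidevice_node1": "0",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(hints[0], hints[1])
}
```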

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
d9d652e867 e2e: topomgr: initial negative tests
A negative test is one where we request a guaranteed (gu) Pod that we know
the system cannot fulfill, hence we expect rejection from the topology manager.

Unfortunately, besides the trivial case of excessive cores (requesting
more cores than a NUMA node provides), we cannot easily test the
devices, because crafting a proper pod would require detailed knowledge
of the hardware topology.

Let's consider a hypothetical two-node NUMA system with two PCIe buses,
one per NUMA node, with an SRIOV device on each bus.
A proper negative test would require two SRIOV devices, which the system
can provide, but not on the same NUMA node.
Requesting, for example, three devices (one more than the system provides)
would lead to a different, legitimate admission error.

For these reasons we bootstrap the testing infra for the negative tests,
but we add just the simplest one.
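
The distinction drawn in the two-node example can be sketched as a toy model. Everything here is hypothetical: `classifyRequest` and the outcome labels are illustrative only and do not reproduce kubelet behavior or real reason strings.

```go
package main

import "fmt"

// Outcome labels are illustrative, not real kubelet reason strings.
const (
	admitted          = "admitted"
	affinityRejection = "rejected: cannot align on a single NUMA node"
	capacityRejection = "rejected: not enough devices in the system"
)

// classifyRequest is a hypothetical model of single-NUMA-node admission
// for a device request. devicesPerNode[i] is the number of devices
// attached to NUMA node i.
func classifyRequest(devicesPerNode []int, requested int) string {
	total := 0
	fitsOneNode := false
	for _, n := range devicesPerNode {
		total += n
		if n >= requested {
			fitsOneNode = true
		}
	}
	switch {
	case requested > total:
		// More than the whole system has: a different, legitimate error.
		return capacityRejection
	case !fitsOneNode:
		// Available in total, but not on one NUMA node: the case a
		// proper negative test wants to hit.
		return affinityRejection
	default:
		return admitted
	}
}

func main() {
	topology := []int{1, 1} // two NUMA nodes, one SRIOV device each
	for _, req := range []int{1, 2, 3} {
		fmt.Printf("%d device(s): %s\n", req, classifyRequest(topology, req))
	}
}
```

Picking a request that lands in the middle case requires knowing `devicesPerNode` for the host, which is the detailed topology knowledge the message refers to.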

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
ee92b4aae0 e2e: topomgr: add more positive tests
This patch builds on the topology manager e2e infrastructure to
add more positive e2e test cases.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
1b5801a086 e2e: topomgr: add option to specify the SRIOV conf
We cannot anticipate all the possible configurations
needed by the SRIOV device plugin: there is too much variety.

Hence, we need to allow the test environment to supply
a host-specific ConfigMap to properly configure the device
plugin and avoid false negatives.

We still provide the default ConfigMap as a fallback and reference.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
6687fcc78c e2e: topomgr: autodetect SRIOV resource to use
The SRIOV device plugin can create different resources depending
on both the hardware present on the system and the configuration.
As long as we have at least one SRIOV device, the tests don't actually
care which specific device it is.

Previously, the test hardcoded the most common Intel SRIOV device
identifier. This patch lifts the restriction and lets the test
autodetect and use what's available.
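
One way such autodetection could look is sketched below. This is not the code from the commit: `findSRIOVResource` is a hypothetical helper, the node's allocatable resources are modeled as a plain map, the resource name `example.com/sriov_net` is invented, and matching on the substring "sriov" is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// findSRIOVResource is a hypothetical helper. allocatable maps resource
// names to the amount the node can allocate. It returns the first
// resource, in name order, whose name mentions "sriov" and whose amount
// is positive. The substring match is an assumption.
func findSRIOVResource(allocatable map[string]int64) (string, int64, bool) {
	names := make([]string, 0, len(allocatable))
	for name := range allocatable {
		names = append(names, name)
	}
	sort.Strings(names) // make the choice deterministic
	for _, name := range names {
		if strings.Contains(strings.ToLower(name), "sriov") && allocatable[name] > 0 {
			return name, allocatable[name], true
		}
	}
	return "", 0, false
}

func main() {
	allocatable := map[string]int64{
		"cpu":                   8,
		"example.com/sriov_net": 4, // invented resource name
	}
	name, amount, ok := findSRIOVResource(allocatable)
	fmt.Println(name, amount, ok)
}
```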

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
fa26fb6817 e2e: topomgr: check pod resource alignment
This patch extends and completes the previously-added
empty topology manager test for single-NUMA node policy
by adding reporting in the test pod and checking
the resource alignment.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
cd7e3d626c e2e: topomgr: add test infra
This patch adds all the testing infra and utilities needed
to run e2e topology manager tests. This includes setting up
a guaranteed pod which needs some devices.

The simplest real devices available for the purpose
are SRIOV devices, hence we use them.

This patch pulls the SRIOV device plugin from
the official, yet external, repository.
We do it as closely as possible to how it is done for the NVIDIA GPU plugin.

This patch also performs minor refactoring for some
test framework utilities, needed to support the new
e2e tests.

Finally, we add an empty e2e topology manager test,
to be completed by the next patch.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
1fdf262137 e2e: topomgr: explicitly save the kubelet config
For the sake of readability, save the old Kubelet config
once.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Sakura
44bf3475ea Fix non-ascii characters in test/e2e_node and test/network.
Signed-off-by: Sakura <longfei.shang@daocloud.io>
2020-02-08 17:47:19 +08:00
Mike Danese
2637772298 some manual fixes 2020-02-07 18:17:40 -08:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
tanjunchen
7ff3a1f8db test/e2e/framework: remove skip.go and use e2eskipper subpackage 2020-02-01 01:18:48 +08:00
Kubernetes Prow Robot
b32725b80b Merge pull request #86413 from ohsewon/e2e_typo_fix
Fix cpu manager e2e test typo
2020-01-30 10:31:48 -08:00
Kubernetes Prow Robot
89714227ff Merge pull request #78819 from justaugustus/cni
cni: Update CNI version to v0.8.5
2020-01-30 02:12:14 -08:00
Mike Danese
d55d6175f8 refactor 2020-01-29 08:50:45 -08:00
Stephen Augustus
1174e6698e cni: Update CNI version to v0.8.5
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
2020-01-29 04:41:29 -05:00
Stephen Augustus
96f2588b61 cni: Update CNI download URLs to use new GCS bucket (k8s-artifacts-cni)
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
2020-01-29 02:32:22 -05:00
Kubernetes Prow Robot
4fc5254c2f Merge pull request #87456 from mattjmcnaughton/mattjmcnaughton/delete-todo-to-use-docker-client
Delete TODO to use docker client
2020-01-22 20:37:37 -08:00
Kubernetes Prow Robot
a06d16565c Merge pull request #86184 from vpickard/e2e-topologyManager
e2e-topology-manager: Initial commit for E2E tests
2020-01-22 08:41:14 -08:00
mattjmcnaughton
d6d08b152e Delete TODO to use docker client
Re conversation in https://github.com/kubernetes/kubernetes/pull/87373,
we should keep the current behavior (i.e. using the docker binary
instead of the docker client). Delete the TODO instructing us to change
the behavior.
2020-01-22 08:45:07 -05:00
mattjmcnaughton
16de853c5d Clean up TODO around running test as sudo
Re the TODO, this command no longer needs to be prefixed by `sudo`, as
the test is already running as `root`.
2020-01-18 13:36:57 -05:00
Davanum Srinivas
d7d316e1e7 switch to docker command line 2020-01-17 21:08:13 -05:00
Kubernetes Prow Robot
9277bac9b8 Merge pull request #87003 from odinuge/node-e2e-instance-failure
Add error check for instance insert in node e2e
2020-01-16 13:14:45 -08:00
zouyee
c1de3d6e5b fix ci-kubernetes-node-kubelet-serial Non-system critical priority classes are not allowed to have a value larger than HighestUserDefinablePriority
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2020-01-14 09:43:04 +08:00
Odin Ugedal
c04ead5fb1 Add error check for instance insert
Not all errors will happen in sync during Instances.Insert(...).Do(), so
it is important to verify the operation object to see why insert fails.
An example is when exceeding the resource quota.

Example:
could not create instance test-cos-beta-80-12739-29-0: [&{Code:QUOTA_EXCEEDED Location: Message:Quota 'CPUS' exceeded.  Limit: 24.0 in region europe-west6. ForceSendFields:[] NullFields:[]}

This fixes the issue where tests will fail "silently" when instance
insert fails.
2020-01-09 09:49:38 +01:00
Kubernetes Prow Robot
9781bb60e0 Merge pull request #86438 from klueska/upstream-e2e-approver
Add klueska as an approver in test/e2e_node/OWNERS
2020-01-06 11:12:16 -08:00
tanjunchen
fc3b210ad8 if no cycle dependency, use framework in test/e2e_node subpackage 2020-01-02 15:52:05 +08:00
Kenichi Omichi
52ddae0267 Remove Delete/CreateSyncInNamespace()
DeleteSyncInNamespace() was used at an e2e node test and DeleteSync()
only. In addition, the part of the e2e node test can be replaced with
DeleteSync(). CreateSyncInNamespace() is the same thing and can be
replaced with CreateSync(). So this replaces these functions and
removes them for the cleanup.
2019-12-30 18:59:42 +00:00
Kubernetes Prow Robot
a097243cba Merge pull request #86062 from haosdent/clean-e2e-framework-gpu
e2e: move funs of framework/gpu to e2e_node
2019-12-28 21:23:39 -08:00
danielqsj
6596a14d39 add missing alias of api errors under test 2019-12-26 17:29:38 +08:00
Kubernetes Prow Robot
9fa1e00be9 Merge pull request #83437 from matthyx/startupprobe-beta
Promote StartupProbe to beta for 1.18
2019-12-20 00:59:32 -08:00
Kevin Klues
c60802e893 Add klueska as an approver in test/e2e_node/OWNERS 2019-12-19 15:37:03 +01:00
Kubernetes Prow Robot
4e35750abc Merge pull request #86156 from tanjunchen/use-framework-Equal-test-e2e_node
test/e2e_node/:use framework.Equal() instead of using gomega.Expect(b…
2019-12-19 02:39:56 -08:00
Kubernetes Prow Robot
2f39e7304d Merge pull request #86119 from haosdent/clean-e2e-framework-metrics
e2e: move funs of framework/metrics to e2e_node
2019-12-19 00:37:56 -08:00
sewon.oh
745248dd6f Fix cpu manager e2e test typo
Signed-off-by: sewon.oh <sewon.oh@samsung.com>
2019-12-19 13:41:05 +09:00
Kubernetes Prow Robot
a1fc96f41e Merge pull request #84462 from klueska/upstream-cpu-manager-update-state-semantics
Update CPUManager stored state semantics
2019-12-17 12:00:12 -08:00
Haosdent Huang
973fddd155 e2e: move funs of framework/gpu to e2e_node 2019-12-16 00:53:01 +08:00
Haosdent Huang
4536ed50a0 e2e: move funs of framework/deviceplugin to e2e_node 2019-12-16 00:46:56 +08:00
Haosdent Huang
8d3a8d5a6c e2e: move funs of framework/metrics to e2e_node 2019-12-16 00:27:58 +08:00
Matthias Bertschy
6603f41a13 Promote StartupProbe to beta for 1.18 2019-12-15 14:49:34 +01:00
vpickard
0e644c8749 e2e-topology-manager: Fix bazel tests
Fix some tests

Signed-off-by: vpickard <vpickard@redhat.com>
2019-12-12 19:52:59 -05:00
vpickard
31b0d7f853 e2e-topology-manager: Fix package name
Change package name to e2enode

Signed-off-by: vpickard <vpickard@redhat.com>
2019-12-12 16:37:35 -05:00
vpickard
fba4a7be34 e2e-topology-manager: fixes for gofmt
Some cleanup for gofmt fixes

Signed-off-by: vpickard <vpickard@redhat.com>
2019-12-12 16:32:58 -05:00
vpickard
337fdf2f37 [WIP] e2e-topology-manager: Initial commit for E2E tests
This is the initial commit for E2E testing for Topology
Manager.

For now, run a subset of the CPU Manager tests.

Additional tests will be forthcoming.

Signed-off-by: vpickard <vpickard@redhat.com>
2019-12-12 16:32:58 -05:00