Commit Graph

1166 Commits

Author SHA1 Message Date
Fabio Bertinatto
8d644092ed Create pod to force volume provisioning in storage e2e test
Otherwise, tests can fail if the default StorageClass
is configured with late binding.
2020-06-10 08:45:41 +02:00
lixiaobing1
2d66e7ecd3 another:Replace framework.Failf with ExpectNoError 2020-06-05 16:43:22 +08:00
Kubernetes Prow Robot
64bba294ae Merge pull request #91741 from oomichi/nit-ExpectError
Replace framework.Failf with ExpectNoError
2020-06-04 13:53:05 -07:00
Kubernetes Prow Robot
1925eb81ac Merge pull request #91689 from gnufied/fix-after-suite-race
Ensure CleanupActionHandle always completes
2020-06-04 10:51:15 -07:00
Kenichi Omichi
0ebaae88b1 Replace framework.Failf with ExpectNoError 2020-06-03 20:16:12 +00:00
Kubernetes Prow Robot
f2e3154a14 Merge pull request #91642 from huffmanca/update-azure-e2e
Adjust Azure e2e binding mode
2020-06-03 05:44:32 -07:00
Hemant Kumar
74be9f04fa Ensure CleanupActionHandle always completes
The way ginkgo handles interrupts is:
 - It starts running AfterSuite hooks in a separate goroutine (this includes cleanupAction hooks)
 - Once AfterSuite hook is done executing it calls
   os.Exit(1) on test suite.

So the way a cleanupFunc() that runs via defer in a test can be
interrupted is:
 - cleanupFunc starts running via defer (or AfterEach hook), but the
   first thing that function does is remove cleanupHandle via
   framework.RemoveCleanupAction.
 - Test suite receives interrupt from user and AfterSuite block
   starts executing
 - remember that while cleanupFunc is running in goroutine#1,
   AfterSuite is running concurrently in goroutine#2.
 - AfterSuite hook has bunch of CleanupActions it needs to run which
   were registered via framework.AddCleanupAction(cleanupFunc) but
   once cleanupFunc starts executing via defer in the test, it will
   remove the cleanupHandle from framework's aftersuite hooks.
 - So if AfterSuite did not have anything to run (because
   those actions were removed via framework.RemoveCleanupAction),
   then it will simply go to the last framework.AfterEach action and call os.Exit(1)
 - So if os.Exit(1) is called before cleanupFunc has a chance to finish in defer, it will not complete.
2020-06-02 12:40:32 -04:00
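
The race described in the commit message above can be illustrated with a minimal sketch. It is not the framework's actual implementation: the registry type and its add, remove, and runAll methods are hypothetical stand-ins for the cleanup-action list. The idea shown is one possible way to close the gap: wrap each action in sync.Once and deregister it only after it has completed, so that a suite-level hook that calls the same action blocks until the in-flight cleanup is done.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// action pairs a handle with a run-once cleanup function.
type action struct {
	handle int
	run    func()
}

// registry is a hypothetical, ordered list of cleanup actions.
type registry struct {
	mu      sync.Mutex
	next    int
	actions []action
}

// add registers fn and returns its handle plus a wrapper that runs fn at
// most once. Concurrent callers of the wrapper block until fn has returned.
func (r *registry) add(fn func()) (int, func()) {
	var once sync.Once
	wrapped := func() { once.Do(fn) }
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.actions = append(r.actions, action{handle: r.next, run: wrapped})
	return r.next, wrapped
}

// remove deregisters the action with the given handle, if present.
func (r *registry) remove(handle int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for i, a := range r.actions {
		if a.handle == handle {
			r.actions = append(r.actions[:i], r.actions[i+1:]...)
			return
		}
	}
}

// runAll runs every registered action in registration order. This is what
// a suite-level hook would call before the process exits.
func (r *registry) runAll() {
	r.mu.Lock()
	pending := append([]action(nil), r.actions...)
	r.mu.Unlock()
	for _, a := range pending {
		a.run()
	}
}

func main() {
	r := &registry{}
	done := false
	handle, cleanup := r.add(func() {
		time.Sleep(50 * time.Millisecond) // simulate slow cleanup
		done = true
	})

	started := make(chan struct{})
	go func() { // goroutine #1: the test's deferred cleanup
		close(started)
		cleanup()
		r.remove(handle) // deregister only after cleanup has completed
	}()

	<-started
	r.runAll() // goroutine #2: the suite-level hook
	fmt.Println("cleanup finished before exit:", done)
}
```

Because the action stays registered while it runs, runAll either executes it itself or waits inside sync.Once for the other goroutine to finish, so the exit that follows cannot cut the cleanup short.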
Christian Huffman
7a55d3978c Adjust Azure e2e binding mode 2020-06-01 14:55:46 -04:00
Jordan Liggitt
c9638d54d0 Defer ginkgo recovers 2020-06-01 11:02:41 -04:00
Davanum Srinivas
b1742f19ef Switch kube-controller-manager to distroless image
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-21 22:33:54 -04:00
Kubernetes Prow Robot
bded41a817 Merge pull request #90689 from aojea/nfsv6
add ipv6 support to the e2e nfs tests
2020-05-21 03:30:36 -07:00
Kubernetes Prow Robot
0e8a2d2244 Merge pull request #90793 from pohly/flaky-mount-volume-calls
mock e2e test: reduce flakiness by not testing all calls
2020-05-19 15:22:19 -07:00
Davanum Srinivas
07d88617e5 Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kubernetes Prow Robot
9978c281ec Merge pull request #90773 from gnufied/fix-csi-e2e-orphans
Fix CSI e2e leaving pods in terminating state
2020-05-13 22:14:21 -07:00
Hemant Kumar
da941d8d3e Create mock CSI driver resources in different namespace 2020-05-13 11:16:00 -04:00
Hemant Kumar
708261e06c Make AfterSuite hooks ordered
ginkgo has a bug where AfterEach does not get called when the
test suite exits with certain kinds of interrupt (Ctrl-C for example).
More info - https://github.com/onsi/ginkgo/issues/222

We work around this issue in Kubernetes by adding a special hook into
the AfterSuite call, but AfterSuite cannot be used to perform certain
kinds of cleanup because it can race with the AfterEach hook, and the
framework.AfterEach hook will set framework.ClientSet to nil.

This presents a problem in cleaning up the CSI driver and test pods. This
PR removes cleanup of the driver manifest via CleanupAction because that
is racy and not safe (for example, f.ClientSet may disappear!) and makes
AfterSuite hooks run in an ordered fashion.
2020-05-13 11:15:27 -04:00
Kubernetes Prow Robot
620b7720e6 Merge pull request #90828 from gaurav1086/fix_data_race_storage
Fix data race in storage tests
2020-05-13 00:18:40 -07:00
Gaurav Singh
af74fbabf4 Remove unused err variable 2020-05-08 14:20:35 -04:00
Saikat Roychowdhury
dcfaaefc60 Pickup Snapshot Provisioner from the snapshot class "driver" info.
When using FromFile or FromExisitingClass options, snapshot provisioner
should be picked up from the "driver" tag of VolumeSnapshotClass object.
2020-05-08 05:45:36 +00:00
Kubernetes Prow Robot
7f78048594 Merge pull request #90781 from msau42/increase-timeout
Increase timeout waiting for driver to start on nodes
2020-05-06 22:23:08 -07:00
Gaurav Singh
37458b350e Fix data race in storage 2020-05-06 22:57:08 -04:00
Patrick Ohly
5aa3805a5f mock e2e test: reduce flakiness by not testing all calls
kubelet sometimes calls NodeStageVolume and NodePublishVolume too
often, which breaks this test and leads to flakiness. The test isn't
about that, so we can relax the checking and it still covers what it
was meant to cover.
2020-05-06 11:43:16 +02:00
Michelle Au
fc08f74157 Increase timeout waiting for driver to start on nodes to reduce test flakiness
Change-Id: Id553943e4473b387bf0ae14a18a90cb3a1bcd5c1
2020-05-05 18:10:10 -07:00
Kubernetes Prow Robot
fbacb6e264 Merge pull request #90335 from pohly/cleanup-late-binding
e2e storage: wait for PV deletion also for late binding
2020-05-04 18:05:07 -07:00
Antonio Ojea
26a00f9032 add ipv6 support to the e2e nfs tests
The nfs mount command needs the IP enclosed in square brackets
if it is an IPv6 address.
2020-05-03 11:06:10 +02:00
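
As a rough illustration of the bracketing rule from the commit above, the following sketch wraps a literal IPv6 address in square brackets and leaves IPv4 addresses and hostnames untouched. The function name nfsServerAddr and the export path are made up for this example and are not taken from the e2e code.

```go
package main

import (
	"fmt"
	"net"
)

// nfsServerAddr returns host in a form suitable for the server part of an
// NFS mount source ("server:/path"). Literal IPv6 addresses are enclosed in
// square brackets so their colons are not confused with the path separator.
func nfsServerAddr(host string) string {
	ip := net.ParseIP(host)
	if ip != nil && ip.To4() == nil {
		return "[" + host + "]"
	}
	return host // IPv4 address or hostname: use as-is
}

func main() {
	for _, h := range []string{"10.0.0.5", "fd00::5", "nfs.example.com"} {
		fmt.Printf("%s:/exports\n", nfsServerAddr(h))
	}
}
```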
Patrick Ohly
e3d258d6ca e2e storage: wait for PV deletion also for late binding
When a test pattern or storage class uses late binding, the cleanup
code didn't know about the PV that may have been created for the PVC
since setting it up, and thus also didn't wait for PV deletion.

This is problematic for test isolation because the next test was
allowed to be started before fully cleaning up. Worse, if the driver
gets removed after the test, the volume might never get deleted.
2020-04-27 10:34:50 +02:00
Di Xu
3f5e09b6e2 add e2e tests for HostPathType and mark as slow 2020-04-23 10:52:40 +08:00
Kubernetes Prow Robot
fc9d174102 Merge pull request #88248 from claudiubelu/tests/reduce-to-agnhost-mounttest
tests: Replaces mounttest images used with agnhost (part 4)
2020-04-22 04:53:52 -07:00
Kubernetes Prow Robot
07179d0207 Merge pull request #87998 from msau42/e2e-reattach-stress
Add stress test to repeatedly restart Pods with PVCs in parallel
2020-04-21 03:04:57 -07:00
Kubernetes Prow Robot
7c53c1eb91 Merge pull request #89819 from pohly/enhance-podlogs-master
tests: enhance podlogs
2020-04-16 22:19:07 -07:00
Kubernetes Prow Robot
ae8d30631d Merge pull request #90214 from pohly/stop-pod-master
storage tests: really wait for pod to disappear
2020-04-16 20:49:07 -07:00
Michelle Au
6596e20b18 Make stress test parameters configurable
Change-Id: Ia062f3433b6043825a51a54c7c07eb4cdf809631
2020-04-16 14:18:21 -07:00
Kubernetes Prow Robot
aa0665dfee Merge pull request #90147 from gnufied/use-random-node-zone-for-inline-e2e
Use random zone for inline volume e2e tests
2020-04-16 13:59:08 -07:00
Patrick Ohly
0cdd5365a1 storage tests: really wait for pod to disappear
As seen in one case (https://github.com/intel/pmem-csi/issues/587), a
pod can reach the "not running" state although its ephemeral volumes
are still being torn down by kubelet and the CSI driver. What happens
then is that the test returns too early and even deleting the
namespace and thus the pod succeeds before the NodeVolumeUnpublish
really finishes.

To avoid this, StopPod now waits for the pod to really disappear.
2020-04-16 21:10:56 +02:00
Michelle Au
e132b77ae4 Add stress test to repeatedly restart Pods with PVCs in parallel
Change-Id: I499571cc86b1058d0e16d79e5e998d1dedfd9a4a
2020-04-15 18:10:35 -07:00
Hemant Kumar
7d6712632c Use random zone for inline volume e2e tests 2020-04-14 23:37:21 -04:00
Patrick Ohly
2ae6cf5984 mock tests: per-test timeout for ResourceExhausted
The timeouts for the two loops inside the test itself are now bounded
by an upper limit for the duration of the entire test instead of
having their own, rather arbitrary timeouts.
2020-04-14 09:11:42 +02:00
Patrick Ohly
48f8e398fb mock tests: remove redundant wrapping of error
The "error waiting for expected CSI calls" is redundant because it's
immediately followed by checking that error with:

   framework.ExpectNoError(err, "while waiting for all CSI calls")
2020-04-07 13:09:31 +02:00
Patrick Ohly
2550051f3b mock tests: add timeout
The for loop that waited for the signal to delete pod had no timeout,
so if something went wrong, it would wait for the entire test suite to
time out.
2020-04-07 13:09:31 +02:00
Patrick Ohly
f117849582 mock tests: ResourceExhausted error handling in external-provisioner
The mock driver gets instructed to return a ResourceExhausted error
for the first CreateVolume invocation via the storage class
parameters.

How this should be handled depends on the situation: for normal
volumes, we just want external-provisioner to retry. For late binding,
we want to reschedule the pod. It also depends on topology support.
2020-04-07 13:09:31 +02:00
Patrick Ohly
367a23e4d9 mock tests: remove redundant retrieval of log output
The code became obsolete with the introduction of parseMockLogs
because that will retrieve the log itself. For debugging of a running
test the normal pod output logging is sufficient.
2020-04-07 13:07:09 +02:00
Patrick Ohly
d06589e4b6 mock tests: less verbose log output checking
parseMockLogs is called potentially multiple times while waiting for
output. Dumping all CSI calls each time is quite verbose and
repetitive. To verify what the driver has done already, the normal
capturing of the container log can be used instead:

csi-mockplugin-0/mock@127.0.0.1: gRPCCall: {"Method":"/csi.v1.Node/NodePublishVolume","Request"...
2020-04-07 13:07:09 +02:00
Patrick Ohly
981aae35dd mock tests: do not give up immediately for pod output errors
As seen in some test
runs (https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/89041),
retrieving output can fail with "the server rejected our request for
an unknown reason (get pods csi-mockplugin-0)".

If this is truly an intermittent error, then the existing retry logic in
the callers can deal with this.
2020-04-06 15:03:44 +02:00
Jan Safranek
e23a26a380 Update to new javascript 2020-04-06 15:03:22 +02:00
Jan Safranek
a4f080861f Test NodeStage error cases
Especially related to "uncertain" global mounts. A large refactoring of CSI
mock tests was necessary:
- to be able to script the driver to return errors as required by the test
- to parse the CSI driver logs to check kubelet called the right CSI calls
2020-04-06 15:03:22 +02:00
Patrick Ohly
b9c5c55c09 podlogs: avoid dumping a terminated container more than once
The original logic was that dumping can stop (for example, due to
losing the connection to the apiserver) and then will start again as
long as the container exists. That it duplicates output on restarts
is better than skipping output that might not have been dumped yet.

But that logic then also dumped the output of containers that have
terminated multiple times:
- logging is started, dumps all output and stops because the
  container has terminated
- next check finds the container again, sees no active logger,
  repeats

This wasn't a problem for short-lived logging in a custom
namespace (the way it is done for CSI drivers in Kubernetes E2E),
but other testsuites (like the one from PMEM-CSI) keep logging running
for the entire test suite duration: there duplicate output became a
problem when adding driver redeployment as part of the suite's run.

To avoid duplicated output for terminated containers, which containers
have been handled is now stored permanently. For terminated containers,
restarting of dumping is prevented. This comes with the risk that if
the previous dumping ended before capturing all output, some output
will get lost.

Marking the start and stop of the log was also useful when streaming
to a single writer and thus gets enabled.
2020-04-03 14:45:00 +02:00
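
The bookkeeping described in the commit above can be sketched as follows. This is an illustration under assumptions, not the podlogs code: logTracker, shouldStart, and finished are hypothetical names. Dumping may restart for a container that is still running, but once a dump ends for a terminated container it is recorded permanently and never restarted.

```go
package main

import (
	"fmt"
	"sync"
)

// logTracker remembers which containers are being dumped right now and
// which terminated containers have already been dumped.
type logTracker struct {
	mu      sync.Mutex
	active  map[string]bool // a dump is currently in progress
	handled map[string]bool // terminated and already dumped; never restart
}

func newLogTracker() *logTracker {
	return &logTracker{active: map[string]bool{}, handled: map[string]bool{}}
}

// shouldStart reports whether a new dump should begin for the container.
func (lt *logTracker) shouldStart(id string) bool {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	if lt.active[id] || lt.handled[id] {
		return false
	}
	lt.active[id] = true
	return true
}

// finished records that a dump ended. If the container had terminated, it
// is marked as handled so that later checks do not dump it again.
func (lt *logTracker) finished(id string, terminated bool) {
	lt.mu.Lock()
	defer lt.mu.Unlock()
	delete(lt.active, id)
	if terminated {
		lt.handled[id] = true
	}
}

func main() {
	tracker := newLogTracker()
	id := "ns/pod/container"

	fmt.Println(tracker.shouldStart(id)) // first check: start dumping
	tracker.finished(id, true)           // container had terminated
	fmt.Println(tracker.shouldStart(id)) // next check: already handled
}
```

As the commit message notes, this trades duplicated output for the risk of losing output if a dump of a terminated container ended early.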
Patrick Ohly
dbac2a369a podlogs: adapt to modified error message
Commit 8a495cb5e4 changed the spelling of the error message that we
want to ignore. In case of version skew we suppress both the old and
new spelling.
2020-04-03 14:43:52 +02:00
Kubernetes Prow Robot
7bd48eb3f6 Merge pull request #89784 from oomichi/sshPort
Add common SSHPort on e2essh
2020-04-02 21:40:40 -07:00
Kenichi Omichi
48fdb95a82 Add common SSHPort on e2essh
There were several sshPort values in e2e test packages because
we've migrated code from e2e framework by copying and pasting.
This adds common SSHPort on e2essh package to reduce such duplicated
code.
2020-04-02 17:41:49 +00:00