Fix bad config in flaky test documentation and add script to help check for flakes.

@@ -11,7 +11,7 @@ There is a testing image ```brendanburns/flake``` up on the docker hub.  We will
 
 Create a replication controller with the following config:
 ```yaml
-id: flakeController
+id: flakecontroller
 kind: ReplicationController
 apiVersion: v1beta1
 desiredState:
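
The id rename is presumably the "bad config" the commit message refers to: resource names in Kubernetes have to be valid DNS-style labels, which rules out the uppercase character in ```flakeController```. A quick way to sanity-check the corrected config (a sketch, assuming the ```cluster/kubectl.sh``` wrapper from this tree and a cluster that is already up):

```sh
# Create the controller from the fixed config, then confirm its pods appear;
# the lowercase id should now pass name validation.
./cluster/kubectl.sh create -f controller.yaml
./cluster/kubectl.sh get pods
```
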
@@ -41,14 +41,26 @@ labels:
 
 ```./cluster/kubectl.sh create -f controller.yaml```
 
-This will spin up 100 instances of the test.  They will run to completion, then exit, the kubelet will restart them, eventually you will have sufficient
-runs for your purposes, and you can stop the replication controller by setting the ```replicas``` field to 0 and then running:
+This will spin up 24 instances of the test.  They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test.
+You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently.
+You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes:
 
 ```sh
-./cluster/kubectl.sh update -f controller.yaml
-./cluster/kubectl.sh delete -f controller.yaml
+echo "" > output.txt
+for i in {1..4}; do
+  echo "Checking kubernetes-minion-${i}"
+  echo "kubernetes-minion-${i}:" >> output.txt
+  gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt
+done
+grep "Exited ([^0])" output.txt
 ```
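
A note on the loop above: it snapshots every node's full ```docker ps -a``` output into ```output.txt``` before grepping. On a single node you can get a similar view without the intermediate file; newer Docker releases support status filters directly (a sketch, assuming a Docker CLI with ```--filter``` support):

```sh
# Show only exited containers, then drop the clean (exit 0) rows.
sudo docker ps -a --filter "status=exited" | grep -v "Exited (0)"
```
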
 
-Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replica controller)
+Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running:
+
+```sh
+./cluster/kubectl.sh stop replicationcontroller flakecontroller
+```
+
+If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller.
 
 Happy flake hunting!
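
One caveat on the added ```grep``` pattern: ```[^0]``` also matches a minus sign, so the ```Exited (-1)``` rows produced by stopping the controller show up as hits. A variant that filters them out (a sketch; the exact status text can vary across Docker versions):

```sh
# Keep non-zero exits but drop the -1 statuses caused by stopping the controller.
grep "Exited ([^0])" output.txt | grep -v "Exited (-1)"
```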