Fix bad config in flaky test documentation and add script to help check for flakes.

@@ -11,7 +11,7 @@ There is a testing image ```brendanburns/flake``` up on the docker hub.  We will
 
 Create a replication controller with the following config:
 ```yaml
-id: flakeController
+id: flakecontroller
 kind: ReplicationController
 apiVersion: v1beta1
 desiredState:
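
The id rename is presumably the "bad config" the commit message refers to: resource names in Kubernetes have to be valid DNS-style labels, which rules out the uppercase character in ```flakeController```. A quick way to sanity-check the corrected config (a sketch, assuming the ```cluster/kubectl.sh``` wrapper from this tree and a cluster that is already up):

```sh
# Create the controller from the fixed config, then confirm its pods appear;
# the lowercase id should now pass name validation.
./cluster/kubectl.sh create -f controller.yaml
./cluster/kubectl.sh get pods
```
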
@@ -41,14 +41,26 @@ labels:
 
 ```./cluster/kubectl.sh create -f controller.yaml```
 
-This will spin up 100 instances of the test.  They will run to completion, then exit, the kubelet will restart them, eventually you will have sufficient
-runs for your purposes, and you can stop the replication controller by setting the ```replicas``` field to 0 and then running:
+This will spin up 24 instances of the test.  They will run to completion, then exit, and the kubelet will restart them, accumulating more and more runs of the test.
+You can examine the recent runs of the test by calling ```docker ps -a``` and looking for tasks that exited with non-zero exit codes. Unfortunately, docker ps -a only keeps around the exit status of the last 15-20 containers with the same image, so you have to check them frequently.
+You can use this script to automate checking for failures, assuming your cluster is running on GCE and has four nodes:
 
 ```sh
-./cluster/kubectl.sh update -f controller.yaml
-./cluster/kubectl.sh delete -f controller.yaml
+echo "" > output.txt
+for i in {1..4}; do
+  echo "Checking kubernetes-minion-${i}"
+  echo "kubernetes-minion-${i}:" >> output.txt
+  gcloud compute ssh "kubernetes-minion-${i}" --command="sudo docker ps -a" >> output.txt
+done
+grep "Exited ([^0])" output.txt
 ```
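
A note on the loop above: it snapshots every node's full ```docker ps -a``` output into ```output.txt``` before grepping. On a single node you can get a similar view without the intermediate file; newer Docker releases support status filters directly (a sketch, assuming a Docker CLI with ```--filter``` support):

```sh
# Show only exited containers, then drop the clean (exit 0) rows.
sudo docker ps -a --filter "status=exited" | grep -v "Exited (0)"
```
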
 
-Now examine the machines with ```docker ps -a``` and look for tasks that exited with non-zero exit codes (ignore those that exited -1, since that's what happens when you stop the replica controller)
+Eventually you will have sufficient runs for your purposes. At that point you can stop and delete the replication controller by running:
+
+```sh
+./cluster/kubectl.sh stop replicationcontroller flakecontroller
+```
+
+If you do a final check for flakes with ```docker ps -a```, ignore tasks that exited -1, since that's what happens when you stop the replication controller.
 
 Happy flake hunting!
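
One caveat on the added ```grep``` pattern: ```[^0]``` also matches a minus sign, so the ```Exited (-1)``` rows produced by stopping the controller show up as hits. A variant that filters them out (a sketch; the exact status text can vary across Docker versions):

```sh
# Keep non-zero exits but drop the -1 statuses caused by stopping the controller.
grep "Exited ([^0])" output.txt | grep -v "Exited (-1)"
```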