# Flaky tests

Any test that fails occasionally is "flaky". Since our merges only proceed when
all tests are green, and we have a number of different CI systems running the
tests in various combinations, even a small percentage of flakes results in a
lot of pain for people waiting for their PRs to merge.

Therefore, it's very important that we write tests defensively. Situations that
"almost never happen" happen with some regularity when run thousands of times in
resource-constrained environments. Since flakes can often be quite hard to
reproduce while still being common enough to block merges occasionally, it's
additionally important that the test logs be useful for narrowing down exactly
what caused the failure.
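
As a small illustration of what "useful logs" means in practice, a failure
message should carry enough context to debug from CI output alone. The sketch
below is not from the Kubernetes tree; it is a hypothetical test that contrasts
a bare assertion with a descriptive one:

```go
package example

import "testing"

// TestEndpointCount is a hypothetical test used only to illustrate
// descriptive failure messages.
func TestEndpointCount(t *testing.T) {
	got := countEndpoints() // stand-in for the real code under test

	// Unhelpful: a bare failure tells you nothing when it flakes in CI.
	//   t.Fatal("wrong endpoint count")

	// Better: record what was expected, what was observed, and any state
	// that helps narrow down the failure after the fact.
	if want := 3; got != want {
		t.Fatalf("countEndpoints() = %d, want %d (did a previous subtest leave endpoints behind?)", got, want)
	}
}

// countEndpoints stands in for the real code under test.
func countEndpoints() int { return 3 }
```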

Note that flakes can occur in unit tests, integration tests, or end-to-end
tests, but probably occur most commonly in end-to-end tests.

## Filing issues for flaky tests

Because flakes may be rare, it's very important that all relevant logs be
discoverable from the issue.

1. Search for the test name. If you find an open issue and you're 90% sure the
   flake is exactly the same, add a comment instead of making a new issue.
2. If you make a new issue, title it with the test name, prefixed by
   "e2e/unit/integration flake:" (whichever is appropriate).
3. Reference any old issues you found in step one. Also, make a comment in the
   old issue referencing your new issue, because people monitoring only their
   email do not see the backlinks GitHub adds. Alternatively, tag the person or
   people who most recently worked on it.
4. Paste, in block quotes, the entire log of the individual failing test, not
   just the failure line.
5. Link to durable storage with the rest of the logs. This means (for all the
   tests that Google runs) the GCS link is mandatory! The Jenkins test result
   link is nice but strictly optional: not only does it expire more quickly,
   it's not accessible to non-Googlers.

## Finding filed flaky test cases

Find flaky test issues on GitHub under the [kind/flake issue label][flake].
There are significant numbers of flaky tests reported on a regular basis, and P2
flakes are under-investigated. Fixing flakes is a quick way to gain expertise
and community goodwill.

[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake

## Expectations when a flaky test is assigned to you

Note that we won't randomly assign these issues to you unless you've opted in or
you're part of a group that has opted in. We are more than happy to accept help
from anyone in fixing these, but due to the severity of the problem when merges
are blocked, we need reasonably quick turn-around time on test flakes. Therefore
we have the following guidelines:

1. If a flaky test is assigned to you, it's more important than anything else
   you're doing unless you can get a special dispensation (in which case it will
   be reassigned). If you have too many flaky tests assigned to you, or you
   have such a dispensation, then it's *still* your responsibility to find new
   owners (this may just mean giving stuff back to the relevant Team or SIG Lead).
2. You should make a reasonable effort to reproduce it. Somewhere between an
   hour and half a day of concentrated effort is "reasonable". It is perfectly
   reasonable to ask for help!
3. If you can reproduce it (or it's obvious from the logs what happened), you
   should then be able to fix it, or in the case where someone is clearly more
   qualified to fix it, reassign it with very clear instructions.
4. PRs that fix or help debug flakes may have the P0 priority set to get them
   through the merge queue as fast as possible.
5. Once you have made a change that you believe fixes a flake, it is conservative
   to keep the issue for the flake open and see if it manifests again after the
   change is merged.
6. If you can't reproduce a flake: __don't just close it!__ Every time a flake comes
   back, at least 2 hours of merge time is wasted. So we need to make monotonic
   progress towards narrowing it down every time a flake occurs. If you can't
   figure it out from the logs, add log messages that would have helped you figure
   it out. If you make changes to make a flake more reproducible, please link
   your pull request to the flake you're working on.
7. If a flake has been open, could not be reproduced, and has not manifested in
   3 months, it is reasonable to close the flake issue with a note saying why.

# Reproducing unit test flakes

Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress).

First, install it:

```
$ go install golang.org/x/tools/cmd/stress
```

Then build your test binary:

```
$ go test -c -race
```

Then run it under stress:

```
$ stress ./package.test -test.run=FlakyTest
```

It runs the command and writes output to `/tmp/gostress-*` files when it fails.
It periodically reports with run counts. Be careful with tests that use the
`net/http/httptest` package; they could exhaust the available ports on your
system!
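
If port exhaustion or other resource contention is a concern, you can limit how
many copies run at once; the stress tool's `-p` flag sets the number of parallel
processes (check `stress -help` to confirm the flags available in your version):

```
$ stress -p 2 ./package.test -test.run=FlakyTest
```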

# Hunting flaky unit tests in Kubernetes

Sometimes unit tests are flaky. This means that due to (usually) race
conditions, they will occasionally fail, even though most of the time they pass.
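
As a contrived illustration (not taken from the Kubernetes tree), the sketch
below is flaky because it assumes a background goroutine finishes within a fixed
sleep; under load the assertion occasionally fails, and `go test -race` will
typically flag the unsynchronized access as well:

```go
package example

import (
	"testing"
	"time"
)

// TestFlakyIncrement is flaky: it relies on a sleep instead of real
// synchronization, so the goroutine may not have run when we check n.
func TestFlakyIncrement(t *testing.T) {
	n := 0
	go func() { n++ }()

	time.Sleep(time.Millisecond) // "almost always" long enough, but not guaranteed

	if n != 1 {
		t.Fatalf("n = %d, want 1 (goroutine had not run yet)", n)
	}
}
```

The fix is explicit synchronization (a channel or `sync.WaitGroup`) rather than
a longer sleep, which only makes the flake rarer and harder to reproduce.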

We have a goal of 99.9% flake-free tests. This means that a test should fail
spuriously at most once in one thousand runs.
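
The brute-force way to measure that locally is simply to repeat the test; for
example, with a hypothetical test name `TestFlaky` in `pkg/tools`:

```sh
# -count repeats the test 1000 times; -race catches data races along the way.
go test -race -count=1000 -run TestFlaky ./pkg/tools/
```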

Running a test 1000 times on your own machine can be tedious and time consuming.
Fortunately, there is a better way to achieve this using Kubernetes.

_Note: these instructions are mildly hacky for now; as we get run-once semantics
and logging they will get better._

There is a testing image `brendanburns/flake` up on Docker Hub. We will use
this image to test our fix.

Create a replication controller with the following config:

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: flakecontroller
spec:
  replicas: 24
  template:
    metadata:
      labels:
        name: flake
    spec:
      containers:
      - name: flake
        image: brendanburns/flake
        env:
        - name: TEST_PACKAGE
          value: pkg/tools
        - name: REPO_SPEC
          value: https://github.com/kubernetes/kubernetes
```

Note that we omit the labels and the selector fields of the replication
controller, because they will be populated from the labels field of the pod
template by default.

```sh
kubectl create -f ./controller.yaml
```

This will spin up 24 instances of the test. They will run to completion, then
exit, and the kubelet will restart them, accumulating more and more runs of the
test.
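
To watch progress, list the pods by label; since each completed run ends with
the container exiting and being restarted, the `RESTARTS` column gives a rough
count of runs per pod:

```sh
# name=flake matches the pod template label in the config above.
kubectl get pods -l name=flake
```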

You can examine the recent runs of the test by calling `docker ps -a` and
looking for tasks that exited with non-zero exit codes. Unfortunately, `docker
ps -a` only keeps around the exit status of the last 15-20 containers with the
same image, so you have to check them frequently.

You can use this script to automate checking for failures, assuming your cluster
is running on GCE and has four nodes:

```sh
echo "" > output.txt
for i in {1..4}; do
  echo "Checking kubernetes-node-${i}"
  echo "kubernetes-node-${i}:" >> output.txt
  gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt
done
grep "Exited ([^0])" output.txt
```

Eventually you will have sufficient runs for your purposes. At that point you
can delete the replication controller by running:

```sh
kubectl delete replicationcontroller flakecontroller
```

If you do a final check for flakes with `docker ps -a`, ignore tasks that
exited -1, since that's what happens when you stop the replication controller.

Happy flake hunting!