Managing flakiness
Ideally, engineers would write only deterministic tests. Not only is that unlikely to happen, it's sometimes not the best use of their time. What we really want is for a passing test to mean everything is good, and for a flaky failure not to waste our time when the infrastructure can get the test to pass with a retry.
Bazel reports a special `FLAKY` status when a test's attempts are a mix of failures and passes.
There are two reasonable approaches for CI:
- Use `--flaky_test_attempts=[number]`, commonly with a value like 2 or 3, so any failing test is rerun 1-2 additional times (see the example after this list). This is convenient because you don't have to tell Bazel which tests are flaky ahead of time, and if you only enable retries on CI, you still see the failure locally, which is a good reminder to fix the problem. The downside is that it increases the time to report a genuinely failing test to 2-3x the test's runtime.
- Let a single failure of a test fail the build, and tag known-flaky test targets with `flaky = True`. Bazel will run a flaky test up to two additional times after the first failure. The downside is that the version control system becomes the "database" of which tests are flaky, and that database must be maintained manually.
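As a minimal sketch of the first approach, a CI pipeline could pass the retry flag on its `bazel test` invocation; the target pattern and attempt count below are illustrative, not a recommendation for every repository:

```
# CI-only: allow up to 3 attempts per test, i.e. 2 retries after the
# first failure. A test that fails and then passes is reported as FLAKY.
bazel test --flaky_test_attempts=3 //...
```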
We recommend giving the BuildCop a one-click way to mark a test as flaky (or remove it): a bot commits to the repository, using Buildozer to edit the BUILD file. Aspect plans to build a GitHub bot that does exactly this.
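As a sketch of what such a bot might run (the target label here is hypothetical), Buildozer can add or drop the attribute with a one-line command:

```
# Mark a test target as flaky (adds `flaky = True` to the rule).
buildozer 'set flaky True' //my/package:some_flaky_test

# Later, once the root cause is fixed, drop the attribute again.
buildozer 'remove flaky' //my/package:some_flaky_test
```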
Determining if flakiness is fixed
When fixing a flaky test, it can be hard to tell whether the fix worked: the test passed sometimes even before the fix, so a few passing runs in a row aren't proof that it's now reliable.
If the test's non-determinism can be reproduced locally by running it a few times, use the flag `--runs_per_test=[number]` to "roll the dice" that many times.