Flaky tests
What's a flaky test?
It's a test that sometimes fails, but if you retry it enough times, it passes, eventually.
Quarantined tests
When a test frequently fails in master, a ~"master:broken" issue should be created.
If the test cannot be fixed in a timely fashion, there is an impact on the productivity of all the developers, so it should be placed in quarantine by assigning the :quarantine metadata.
This means it will be skipped unless run with --tag quarantine:
bin/rspec --tag quarantine
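As a minimal sketch of what this can look like: the spec below carries the :quarantine metadata, and a hypothetical RSpec configuration skips such examples unless quarantine was explicitly requested with --tag quarantine. The spec description and the hook are assumptions for illustration; the real configuration lives in the project's spec helper and may differ.

```ruby
require "rspec/core"

RSpec.configure do |config|
  # Assumption: skip quarantined examples unless the run was started with
  # `--tag quarantine`, which adds :quarantine to the inclusion filter.
  config.before(:each, :quarantine) do
    skip("In quarantine") unless config.inclusion_filter[:quarantine]
  end
end

RSpec.describe "a flaky feature" do
  # Symbol metadata is shorthand for `quarantine: true`.
  it "passes most of the time", :quarantine do
    expect(rand < 0.9).to be(true)
  end
end
```

With this in place, a plain bin/rspec run reports the example as skipped, while bin/rspec --tag quarantine runs it.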
Before putting a test in quarantine, you should make sure that a ~"master:broken" issue exists for it so it won't stay in quarantine forever.
Once a test is in quarantine, there are three questions to answer:
- Should the test be fixed (i.e. get rid of its flakiness)?
- Should the test be moved to a lower level of testing?
- Should the test be removed entirely (e.g. because there's already a lower-level test, or it's duplicating another same-level test, or it's testing too much etc.)?
Quarantine tests on the CI
Quarantined tests are run on the CI in dedicated jobs that are allowed to fail:
- rspec-pg-quarantine and rspec-mysql-quarantine (CE & EE)
- rspec-pg-quarantine-ee and rspec-mysql-quarantine-ee (EE only)
Automatic retries and flaky tests detection
On our CI, we use rspec-retry to automatically retry a failing example a few times (see spec/spec_helper.rb for the exact retry count).
We also use a home-made listener, RspecFlaky::Listener, which records flaky examples in a JSON report file on master (retrieve-tests-metadata and update-tests-metadata jobs), and warns when a new flaky example is detected in any other branch (flaky-examples-check job). In the future, the flaky-examples-check job will not be allowed to fail.
This was originally implemented in: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/13021.
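To make the idea concrete, here is a minimal, framework-free sketch of retrying a failing example and flagging it as flaky when it only passes after at least one failure. It is not the rspec-retry or RspecFlaky::Listener implementation; run_with_retries, the result hash, the example id, and the retry count of 3 are hypothetical names and values chosen for illustration.

```ruby
require "json"

# Runs the block up to max_attempts times and reports how it went.
# An example counts as flaky when it fails at least once and then passes.
def run_with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield(attempts)
    { passed: true, attempts: attempts, flaky: attempts > 1 }
  rescue StandardError => e
    retry if attempts < max_attempts
    { passed: false, attempts: attempts, flaky: false, error: e.message }
  end
end

# A stand-in for a flaky example: fails on the first two attempts only.
flaky_result = run_with_retries { |attempt| raise "transient failure" if attempt < 3 }

# A stand-in for a genuinely broken example: fails on every attempt.
broken_result = run_with_retries { raise "always broken" }

# Record only the flaky examples, keyed by a hypothetical example id.
report = { "spec/example_spec.rb:12" => flaky_result }.select { |_id, result| result[:flaky] }
puts JSON.pretty_generate(report)
```

The distinction matters for the report: an example that fails on every attempt is broken rather than flaky, so it is left out.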
Problems we had in the past at GitLab
- rspec-retry is biting us when some API specs fail: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9825
- Sporadic RSpec failures due to PG::UniqueViolation: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9846
- FFaker generates funky data that tests are not ready to handle (and tests should be predictable so that's bad!):
  - Make spec/mailers/notify_spec.rb more robust: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10015
  - Transient failure in spec/requests/api/commits_spec.rb: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/9944
  - Replace FFaker factory data with sequences: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10184
  - Transient failure in spec/finders/issues_finder_spec.rb: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10404
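The FFaker items above boil down to replacing random factory data with predictable values. The sketch below shows the idea in plain Ruby: a counter-backed sequence yields unique, deterministic values on every run, whereas random data can occasionally produce a value the test is not ready for. The Sequence class and the attribute formats are hypothetical and only mimic what a factory sequence does.

```ruby
# A tiny stand-in for a factory sequence: every call returns the next
# value built from an incrementing counter, so values are unique and
# identical from one test run to the next.
class Sequence
  def initialize(&formatter)
    @counter = 0
    @formatter = formatter
  end

  def next
    @counter += 1
    @formatter.call(@counter)
  end
end

usernames = Sequence.new { |n| "user#{n}" }
emails    = Sequence.new { |n| "user#{n}@example.org" }

users = Array.new(3) { { username: usernames.next, email: emails.next } }
users.each { |user| puts "#{user[:username]} <#{user[:email]}>" }
```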
Time-sensitive flaky tests
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10046
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10306
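A common cause of time-sensitive flakiness is code that reads the system clock directly, so the outcome depends on when the test happens to run. One way to avoid that, sketched below under assumptions, is to pass the current time in explicitly so the test can pin it. The expired? helper and the dates are hypothetical and are not taken from the merge requests above.

```ruby
# Hypothetical helper: takes the current time as an argument instead of
# calling Time.now internally, so callers (and tests) control the clock.
def expired?(expires_at, now: Time.now)
  now >= expires_at
end

expires_at = Time.utc(2017, 1, 1, 0, 0, 0)

# Pin "now" on each side of the boundary instead of relying on the
# moment the test suite happens to run.
just_before = expired?(expires_at, now: Time.utc(2016, 12, 31, 23, 59, 59))
exactly_at  = expired?(expires_at, now: expires_at)

puts "one second before: #{just_before}"
puts "at the boundary:   #{exactly_at}"
```

In a Rails test suite, a time-freezing helper such as travel_to from ActiveSupport::Testing::TimeHelpers serves the same purpose without changing the method signature.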
Array order expectation
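Order-dependent expectations are flaky when the code under test does not guarantee an order, for example a database query without an explicit ORDER BY. A minimal sketch in plain Ruby, with made-up data: compare the elements independently of their order, for instance by sorting both sides first. In RSpec, the match_array and contain_exactly matchers express the same order-independent comparison.

```ruby
expected = [1, 2, 3]

# Simulate a result whose order is not guaranteed.
actual = expected.shuffle

# Fragile: only true when the order happens to match.
strict_match = (actual == expected)

# Robust: true whatever the order, as long as the elements are the same.
order_independent_match = (actual.sort == expected.sort)

puts "strict match (order-dependent): #{strict_match}"
puts "order-independent match:        #{order_independent_match}"
```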
Feature tests
- Be sure to create all the data the test needs before starting the exercise phase: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12059
- Bis: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12604
- Bis: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12664
- Assert against the underlying database state instead of against a page's content: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10934
Capybara viewport size related issues
- Transient failure of spec/features/issues/filtered_search/filter_issues_spec.rb: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10411
Capybara JS driver related issues
- Don't wait for AJAX when no AJAX request is fired: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/10454
- Bis: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12626
PhantomJS / WebKit related issues
- Memory is through the roof! (TL;DR: Load images but block images requests!): https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12003
Resources
- Flaky Tests: Are You Sure You Want to Rerun Them?
- How to Deal With and Eliminate Flaky Tests
- Tips on Treating Flakiness in your Rails Test Suite
- 'Flaky' tests: a short story
- Using Insights to Discover Flaky, Slow, and Failed Tests