We have a large test codebase with more than 1500 tests for a Python/Django application. Most of the tests use factory-boy
for generating data for the project models.
Currently we are using the nose test runner, but we are open to switching to py.test.
The problem is that, from time to time, when running subsets or particular combinations of tests, we encounter unexpected failures that are not reproduced when running the whole suite or those tests individually.
It looks like the tests are actually coupled.
The Question: Is it possible to automatically detect all the coupled tests in the project?
My current thinking is to run all the tests in different random combinations or orders and report the failures. Can nose or py.test help with that?
For a definite answer you'd have to run each test in complete isolation from the rest.
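With pytest, that exhaustive check can be sketched as a small driver script. This is my own illustration, not a built-in pytest feature, and `parse_node_ids` assumes the one-node-id-per-line output format of `pytest --collect-only -q`:

```python
import subprocess
import sys

def parse_node_ids(collect_output):
    """Extract test node ids from `pytest --collect-only -q` output.

    Assumes the -q format: one node id per line, followed by a blank
    line and a summary line (e.g. "1500 tests collected in 3.21s").
    """
    node_ids = []
    for line in collect_output.splitlines():
        line = line.strip()
        # Node ids contain "::" separating the file path from the test name.
        if "::" in line:
            node_ids.append(line)
    return node_ids

def run_each_test_in_isolation():
    # Collect all test node ids without running anything.
    collected = subprocess.run(
        ["pytest", "--collect-only", "-q"],
        capture_output=True, text=True,  # requires Python 3.7+
    ).stdout
    failed = []
    for node_id in parse_node_ids(collected):
        # One fresh pytest process per test, so no in-process state leaks.
        result = subprocess.run(["pytest", node_id])
        if result.returncode != 0:
            failed.append(node_id)
    return failed

if __name__ == "__main__":
    failures = run_each_test_in_isolation()
    print("\n".join(failures) or "all tests pass in isolation")
    sys.exit(1 if failures else 0)
```

Each test then runs in a fresh interpreter process, so no in-process state can leak between them; database or filesystem state still has to be reset separately between runs.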
With pytest, which is what I use, you could implement a script that first runs pytest with --collect-only and then uses the test node ids returned to initiate an individual pytest run for each of them. This will take a good while for your 1500 tests, but it should do the job as long as you completely recreate the state of your system between each individual test.

For an approximate answer, you can try running your tests in random order and see how many start failing. I had a similar question recently, so I tried two pytest plugins, pytest-randomly and pytest-random:

https://pypi.python.org/pypi/pytest-randomly/
https://pypi.python.org/pypi/pytest-random/

Of the two, pytest-randomly looks like the more mature one and even supports repeating a certain order by accepting a seed parameter.

These plugins do a good job of randomising the test order, but for a large test suite complete randomisation may not be very workable, because you then have too many failing tests and you don't know where to start.
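What makes a seed-repeatable random order possible is simply a seeded shuffle. As a rough conceptual illustration (an invented helper, not the plugins' actual code), here is a shuffle that keeps each module's tests together, similar in spirit to a per-module randomisation bucket:

```python
import random

def shuffled_within_modules(node_ids, seed):
    """Shuffle test node ids, keeping each module's tests contiguous.

    Hypothetical sketch of bucketed random ordering; node ids are
    assumed to look like "path/to/module.py::test_name".
    """
    by_module = {}
    for node_id in node_ids:
        module = node_id.split("::", 1)[0]
        by_module.setdefault(module, []).append(node_id)
    rng = random.Random(seed)  # same seed -> same order, so failures repeat
    modules = list(by_module)
    rng.shuffle(modules)
    result = []
    for module in modules:
        tests = by_module[module]
        rng.shuffle(tests)  # shuffle within the module bucket too
        result.extend(tests)
    return result
```

Because the order is a pure function of the seed, a failing order can be replayed exactly by passing the same seed again.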
I wrote my own plugin that allows me to control the level at which the test order is randomised (module, package, or global). It is called pytest-random-order:

https://pypi.python.org/pypi/pytest-random-order/

UPDATE: In your question you say that the failure cannot be reproduced when running the tests individually. It could be that you aren't completely recreating the environment for the individual test run. I think it's OK for some tests to leave state dirty. It is the responsibility of each test case to set up the environment as it needs it, and not necessarily to clean up afterwards, given the performance overhead this would impose on subsequent tests, or simply the burden of doing it.
If test X fails as part of a larger test suite but does not fail when run individually, then test X is not doing a good enough job of setting up the environment it needs.