We have a large test codebase with more than 1500 tests for a Python/Django application. Most of the tests use factory-boy for generating data for the project models.
Currently we are using the nose test runner, but we are open to switching to py.test.
The problem is that, from time to time, when running a subset or a particular combination of tests, we encounter unexpected failures that are not reproduced when running the whole suite or when running those tests individually.
It looks like the tests are actually coupled.
The Question: Is it possible to automatically detect all the coupled tests in the project?
My current thinking is to run all the tests in different random combinations or orders and report the failures. Can nose or py.test help with that?
For a definite answer you'd have to run each test in complete isolation from the rest.
With pytest, which is what I use, you could implement a script that first runs it with `--collect-only` and then uses the returned test node ids to initiate an individual `pytest` run for each of them. This will take a good while for your 1500 tests, but it should do the job as long as you completely recreate the state of your system between the individual runs.
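A minimal sketch of such a script, assuming a plain `pytest` invocation works for your project; the `-q --collect-only` output format and the `::`-based filtering below are assumptions you may need to adjust for your pytest version:

```python
# isolate_tests.py -- run every collected test in its own pytest process.
import subprocess
import sys


def collect_node_ids():
    """Collect test node ids via `pytest -q --collect-only`."""
    output = subprocess.run(
        ["pytest", "-q", "--collect-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    # Keep only lines that look like node ids, e.g. tests/test_foo.py::test_bar
    return [line.strip() for line in output.splitlines() if "::" in line]


def main():
    failed = []
    for node_id in collect_node_ids():
        # Each test gets its own process; resetting external state (database,
        # caches, files) between runs is still up to you.
        if subprocess.run(["pytest", node_id]).returncode != 0:
            failed.append(node_id)
    print("Tests that failed in isolation:")
    print("\n".join(failed) or "(none)")
    sys.exit(1 if failed else 0)


if __name__ == "__main__":
    main()
```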
For an approximate answer, you can try running your tests in random order and see how many start failing. I had a similar question recently, so I tried two pytest plugins, pytest-randomly and pytest-random:

https://pypi.python.org/pypi/pytest-randomly/
https://pypi.python.org/pypi/pytest-random/

Of the two, pytest-randomly looks like the more mature one and even supports repeating a certain order by accepting a `seed` parameter.
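For example (option names as documented by pytest-randomly; they may differ between plugin versions):

```
pip install pytest-randomly

# Test order is shuffled automatically; the seed used is printed in the run header.
pytest

# Re-run with the same seed to reproduce a particular failing order.
pytest --randomly-seed=1234
```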
These plugins do a good job of randomising the test order, but for a large test suite complete randomisation may not be very workable, because you then have too many failing tests and you don't know where to start. I wrote my own plugin that allows me to control the level at which the tests can change order randomly (module, package, or global). It is called pytest-random-order:

https://pypi.python.org/pypi/pytest-random-order/
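For instance (the `--random-order-bucket` option is taken from the current pytest-random-order docs; early releases behaved slightly differently):

```
pip install pytest-random-order

# Constrain the shuffling to module-level buckets; package and global are other options.
pytest --random-order-bucket=module
```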
UPDATE: In your question you say that the failures cannot be reproduced when running the tests individually. It could be that you aren't completely recreating the environment for the individual runs. I think it's ok that some tests leave state dirty. It is the responsibility of each test case to set up the environment it needs, and not necessarily to clean up afterwards, due to the performance overhead this would cause for subsequent tests, or just because of the burden of doing it.
If test X fails as part of a larger test suite but does not fail when run individually, then test X is not doing a good enough job of setting up the environment it needs.
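As a hedged illustration of that point (the factory, model field, and URL below are made up, and the `client` fixture plus `django_db` marker assume pytest-django): a test should create the data it depends on rather than rely on whatever earlier tests happened to leave behind.

```python
import pytest

# Hypothetical names -- substitute your project's factory-boy factories and URLs.
from myapp.factories import ArticleFactory


@pytest.mark.django_db
def test_published_articles_are_listed(client):
    # Set up the state this test needs instead of assuming another test created it.
    article = ArticleFactory(status="published")

    response = client.get("/articles/")

    assert article.title.encode() in response.content
```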