Managing Audio Back2Back Tests
It has been a long time since the last post, and there are a lot of things to explain (GSoC results, the GSoC Mentor Summit, QtDevDays, the CLAM network refactoring script, typed controls…). But let's start by explaining some work we did on a back-to-back system we recently deployed for CLAM and our 3D acoustics project at Barcelona Media.
Back-to-back testing background
You, extreme programmer, might want to have unit tests (white-box testing) for every single line of code you write. But sometimes this is hard to achieve. For example, canonical test cases that exercise a single piece of an audio processing algorithm are very hard to find. You might also want to take control of a piece of untested code in order to refactor it without introducing new bugs. In all those cases, back-to-back tests are your most powerful tool.
Back-to-back tests (B2B) are black-box tests that compare the output of an evolved version of an algorithm with the output of a reference version, given the same set of inputs. When a back-to-back test fails, it means that something changed, but normally it doesn't tell you much more than that. If the change was expected to alter the output, you must validate the new output and make it the new reference. But if the alteration was not expected, you should either roll back the change or fix it.
In back-to-back tests there is no truth to be asserted. You just rely on the fact that the last version was OK. If B2B tests go red because of an expected change of behaviour and you don't validate the new results, you lose any control over later changes. So it is very important to keep them green by validating every new correct result. Because of that, B2B tests are very helpful in combination with a continuous integration system such as TestFarm, which can point you to the guilty commit even if further commits have been made.
CLAM's OfflinePlayer is very convenient for back-to-back testing of CLAM networks. It runs them off-line, taking input and output wave files as arguments. Automate the check by subtracting each output from a reference file and comparing the residual level against a threshold, and you have a back-to-back test.
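The subtract-and-threshold idea can be sketched in a few lines. This is just an illustration of the principle on plain sample sequences, not the actual check audiob2b performs; the function names and the threshold value are assumptions:

```python
# Hedged sketch: compare two streams of normalized samples against a
# tolerance. The real audiob2b check works on wave files and may differ.

def max_difference(result, expected):
    """Maximum absolute per-sample difference between two sequences."""
    if len(result) != len(expected):
        return float("inf")  # a length mismatch is always a failure
    return max((abs(r - e) for r, e in zip(result, expected)), default=0.0)

def back2back_ok(result, expected, threshold=1e-4):
    """A test passes when the residual stays below the threshold."""
    return max_difference(result, expected) <= threshold

reference = [0.0, 0.5, -0.5, 0.25]
print(back2back_ok([0.0, 0.5, -0.5, 0.25], reference))         # identical: True
print(back2back_ok([0.0, 0.50000001, -0.5, 0.25], reference))  # tiny drift: True
print(back2back_ok([0.0, 0.6, -0.5, 0.25], reference))         # real change: False
```

A small nonzero threshold keeps the test from going red on harmless rounding noise while still catching audible changes.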
Still, keeping the reference outputs up to date is hard work. So we developed a Python module named audiob2b.py that makes defining and maintaining B2B tests on audio very easy.
Defining a b2b test suite
A test suite is defined by a back-to-back data path and a list of test cases, each one giving a name, a command line, and a set of outputs to be checked:
```python
#!/usr/bin/env python
# back2back.py
import sys
from audiob2b import runBack2BackProgram

data_path = "b2b/mysuite"
back2BackTests = [
    ("testCase1",
        "OfflinePlayer mynetwork.clamnetwork b2b/mysuite/inputs/input1.wav -o output1.wav output2.wav",
        [
            "output1.wav",
            "output2.wav",
        ]),
    # any other test cases here
]
runBack2BackProgram(data_path, sys.argv, back2BackTests)
```
Notice that this example uses OfflinePlayer but, as you write the full command line, you are not just limited to that. Indeed for 3D acoustics algorithms we are testing other programs that also generate wave files.
Back-to-back work flow
When you run the test suite for the first time (./back2back.py without parameters) there are no reference files (expectations) and you will get a red. The current outputs will be copied into the data path like this:
```
b2b/mysuite/testCase1_output1_result.wav
b2b/mysuite/testCase1_output2_result.wav
...
```
After validating that the outputs are OK, you can accept a test case by issuing:
```
$ ./back2back.py --validate testCase1
```
The files will be moved as:
```
b2b/mysuite/testCase1_output1_expected.wav
b2b/mysuite/testCase1_output2_expected.wav
...
```
And the next time you run the tests, they will be green. At this point you can add and commit the 'expected' files to the data repository.
Whenever the output changes noticeably and you get a red, you will again have the '_result' files, and also some '_diff' files so that you can easily check the difference. All those files are cleaned up as soon as you validate them or you get the old results back. The main benefit is that managing the expectation files is almost fully automated, so it is much easier to keep the suite green.
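The validation step described above is essentially a rename from '_result' to '_expected'. A minimal sketch of that bookkeeping, following the naming convention from the post (the real audiob2b implementation may handle it differently):

```python
# Hedged sketch of the --validate step: promote every
# '<case>_*_result.wav' in the data path to '..._expected.wav'.
import os
import tempfile

def validate(data_path, case):
    """Accept a test case's current results as the new expectations."""
    for name in sorted(os.listdir(data_path)):
        if name.startswith(case + "_") and name.endswith("_result.wav"):
            expected = name[:-len("_result.wav")] + "_expected.wav"
            os.replace(os.path.join(data_path, name),
                       os.path.join(data_path, expected))

# Demo in a throwaway directory
suite = tempfile.mkdtemp()
open(os.path.join(suite, "testCase1_output1_result.wav"), "w").close()
validate(suite, "testCase1")
print(os.listdir(suite))  # ['testCase1_output1_expected.wav']
```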
Supporting architecture differences
Often the same algorithm yields slightly different values depending on the architecture you run it on, mostly because of different precision (i.e. 32 vs. 64 bits) or different implementations of the floating point functions.
Having back-to-back tests change all the time depending on which platform you run them on is not desirable. The audiob2b module generates platform-dependent expectations when you validate them with the --arch flag. Platform-dependent expectations are used instead of the regular ones only when expectations for the current platform are found.
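The lookup rule can be sketched as "prefer an architecture-specific expectation if one exists, otherwise fall back to the generic one". The suffix scheme below (appending the machine name to '_expected') is an assumption for illustration, not necessarily the naming audiob2b uses:

```python
# Hedged sketch of platform-dependent expectation lookup.
import os
import platform

def expectation_for(data_path, case, output):
    """Pick the expectation file to compare against for one output."""
    base = "%s_%s" % (case, output)
    # e.g. 'testCase1_output1_expected_x86_64.wav' (assumed naming)
    arch_specific = os.path.join(
        data_path, "%s_expected_%s.wav" % (base, platform.machine()))
    generic = os.path.join(data_path, "%s_expected.wav" % base)
    # Prefer the expectation validated for this architecture, if any.
    return arch_specific if os.path.exists(arch_specific) else generic

print(expectation_for("b2b/mysuite", "testCase1", "output1"))
```

With no architecture-specific file on disk, the generic expectation is returned, so suites without platform differences keep working unchanged.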
The near future of the tool is simply being used. We should extend the set of controlled networks and processing modules in CLAM, so I would like to invite other CLAM contributors to add more back-to-back tests. Place your suite data in clam-test-data/b2b/. We should still decide where the suite definitions themselves should be placed. Maybe somewhere in CLAM/test, but that wouldn't be fair because of dependencies on NetworkEditor and maybe on plugins.
Also, a feature that would extend the kinds of code we can control with back-to-back tests would be support for file types other than wave files, such as plain-text files or XML files (with something smarter than plain-text comparison). Any ideas? Comments?
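For plain-text outputs, one possible starting point is the standard library's difflib, which would also give us the '_diff' artifacts for free. This is only a sketch of the idea, not a feature of audiob2b:

```python
# Hedged sketch: comparing text outputs with difflib. An empty diff
# means the result matches the expectation.
import difflib

def text_diff(result_lines, expected_lines):
    """Return unified-diff lines between expected and result text."""
    return list(difflib.unified_diff(expected_lines, result_lines,
                                     fromfile="expected", tofile="result"))

same = text_diff(["a\n", "b\n"], ["a\n", "b\n"])
changed = text_diff(["a\n", "c\n"], ["a\n", "b\n"])
print(len(same) == 0, len(changed) > 0)  # True True
```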
Update 2012-11-21: The latest version of that code is no longer maintained in CLAM's Subversion repository. I deployed a separate repository on GitHub, so everybody is invited to fork and contribute, or just clone it.