Wednesday, December 1, 2010

Unit tests: the fallacy of 100% coverage


"The Hounds of Tindalos!" he muttered. "They can only reach us through angles. We must eliminate all angles from this room. I shall plaster up all of the corners, all of the crevices. We must make this room resemble the interior of a sphere."


- "The Hounds of Tindalos", Frank Belknap Long
I'm an ardent supporter of having unit tests for one's code, having experienced firsthand the benefit they give when you need to refactor. As a result, one thing that really irks me is programmers who view unit tests as, at best, a necessary evil and therefore try to write as few tests as possible. To them, a suite of tests is good if the tests execute a lot of the code. If these programmers could achieve 100% code coverage, they would feel their work is done: no more tests need to be written! While this belief might be true for some code, in general it does not hold: for most classes, 100% code coverage only shows that every line was executed, not that the code is truly doing what it should do.

As always, code speaks louder than words, so I'll move on to a concrete demonstration involving a utility class with a method to determine if a given angle measures more than 120 degrees (and hence is sufficiently like a sphere to prevent a Hound of Tindalos from appearing). We define the angle using two points a and b: start at a, move to the origin, and then to b. The method will return a value of true if the angle formed by the points measures more than 120 degrees. Here is our first (flawed) implementation:
HoundsUtil (flawed)
import java.awt.geom.Point2D;

public class HoundsUtil {

  public boolean isAngleSafe(Point2D.Double a, Point2D.Double b) {
    double dotProduct = a.getX()*b.getX() + a.getY()*b.getY();
    double angle = Math.acos(dotProduct);
    return (Math.abs(angle) > 2 * Math.PI / 3) ? true : false;
  }
}


Our test suite is composed of two JUnit tests, one checking the response for a safe angle and the other checking the response for an unsafe angle.
HoundsUtilTest
import java.awt.geom.Point2D;
import junit.framework.Assert;
import org.junit.Test;

public class HoundsUtilTest {
  @Test
  public void checkUnsafe() {
    // 90 degrees: not safe!
    Point2D.Double a = new Point2D.Double(0, 1);
    Point2D.Double b = new Point2D.Double(1, 0);
    Assert.assertFalse((new HoundsUtil()).isAngleSafe(a, b));
  }

  @Test
  public void checkSafe() {
    // 180 degrees: safe!
    Point2D.Double a = new Point2D.Double(0, 1);
    Point2D.Double b = new Point2D.Double(0, -1);
    Assert.assertTrue((new HoundsUtil()).isAngleSafe(a, b));
  }
}


We'll compile the code and then run the JUnit tests using Emma (an open source Java code coverage tool that I highly recommend) to instrument the code and give us the code coverage info (the -ix entries exclude JUnit and our test class from the coverage report):
codefhtagn: javac -cp .:junit.jar *.java
codefhtagn: java -cp .:emma.jar:junit.jar emmarun -ix -*junit* -ix -*Test -cp .:junit.jar org.junit.runner.JUnitCore HoundsUtilTest
JUnit version 4.8.1
..
Time: 0.227

OK (2 tests)

EMMA: writing [txt] report to [/codefhtagn/coverage.txt] ...

codefhtagn: cat coverage.txt
[EMMA v2.0.5312 report, generated Tue Nov 30 15:31:23 MST 2010]
-------------------------------------------------------------------------------
OVERALL COVERAGE SUMMARY:

[class, %]      [method, %]     [block, %]      [line, %]       [name]
100% (1/1)      100% (2/2)      100% (27/27)    100% (4/4)      all classes

OVERALL STATS SUMMARY:

total packages: 1
total classes:  1
total methods:  2
total executable files: 1
total executable lines: 4

COVERAGE BREAKDOWN BY PACKAGE:

[class, %]      [method, %]     [block, %]      [line, %]       [name]
100% (1/1)      100% (2/2)      100% (27/27)    100% (4/4)      default package
-------------------------------------------------------------------------------

100% coverage for classes, methods, blocks, and lines plus all our tests passed! What more could you ask for? The tests pass, the code is covered, it's time to go party!


Unfortunately, it's not party time yet. As noted earlier, the algorithm being used in the code is flawed and is only guaranteed to give a correct answer if the points lie at a distance of 1 from the origin (which happened to be the case for the two tests we wrote). This can be demonstrated by adding an additional test:
HoundsUtilTest (additional test)
  @Test
  public void checkSafeCloserPoint() {
    // 180 degrees: safe!
    Point2D.Double a = new Point2D.Double(0, 0.5);
    Point2D.Double b = new Point2D.Double(0, -0.5);
    Assert.assertTrue((new HoundsUtil()).isAngleSafe(a, b));
  }

Recompiling and rerunning the test suite gives:
codefhtagn: javac -cp .:junit.jar *.java
codefhtagn: java -cp .:junit.jar org.junit.runner.JUnitCore HoundsUtilTest
JUnit version 4.8.1
...E
Time: 0.009
There was 1 failure:
1) checkSafeCloserPoint(HoundsUtilTest)
junit.framework.AssertionFailedError: null
 at junit.framework.Assert.fail(Assert.java:47)
 at junit.framework.Assert.assertTrue(Assert.java:20)
 at junit.framework.Assert.assertTrue(Assert.java:27)
 at HoundsUtilTest.checkSafeCloserPoint(HoundsUtilTest.java:27)
... lines omitted ...

FAILURES!!!
Tests run: 3,  Failures: 1
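
The failure can be traced by hand. As a short worked derivation (standard vector geometry, not something spelled out in the code above), the dot product only equals the cosine of the angle when both points are at distance 1 from the origin:

```latex
\[
a \cdot b = \lVert a \rVert \, \lVert b \rVert \cos\theta
\quad\Longrightarrow\quad
\theta = \arccos\!\left( \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \right)
\]
\[
a = (0,\ 0.5),\quad b = (0,\ -0.5):\qquad
a \cdot b = -0.25,\qquad
\arccos(-0.25) \approx 104.5^\circ < 120^\circ
\]
```

So the flawed method sees roughly 104.5 degrees where the true angle is 180 degrees, and reports the angle as unsafe.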

For the sake of completeness, here is the corrected algorithm:
HoundsUtil (corrected)
import java.awt.geom.Point2D;

public class HoundsUtil {

  public boolean isAngleSafe(Point2D.Double a, Point2D.Double b) {
    Point2D.Double origin = new Point2D.Double(0, 0);
    double aLength = a.distance(origin);
    double bLength = b.distance(origin);
    double dotProduct = a.getX()*b.getX() + a.getY()*b.getY();
    // Divide by both lengths so the argument to acos is the cosine of the angle
    double angle = Math.acos(dotProduct / (aLength * bLength));
    return (Math.abs(angle) > 2 * Math.PI / 3) ? true : false;
  }
}

And proof that it works (for these tests):
codefhtagn: javac -cp .:junit.jar *.java
codefhtagn: java -cp .:junit.jar org.junit.runner.JUnitCore HoundsUtilTest
JUnit version 4.8.1
...
Time: 0.008

OK (3 tests)
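
One way to catch this kind of bug without guessing the exact failing input is to check a property instead of individual values: moving both points closer to the origin should never change the answer. The following is only a sketch under stated assumptions. The class ScaleInvarianceCheck and its helper methods are hypothetical names, the two isAngleSafe variants are copied into it as static methods so the example is self-contained, and plain Java is used instead of JUnit.

```java
import java.awt.geom.Point2D;
import java.util.Random;
import java.util.function.BiPredicate;

// Hypothetical check class: flawed() and corrected() are copies of the two
// isAngleSafe implementations from this article.
final class ScaleInvarianceCheck {

  static boolean flawed(Point2D.Double a, Point2D.Double b) {
    double dotProduct = a.getX() * b.getX() + a.getY() * b.getY();
    return Math.abs(Math.acos(dotProduct)) > 2 * Math.PI / 3;
  }

  static boolean corrected(Point2D.Double a, Point2D.Double b) {
    Point2D.Double origin = new Point2D.Double(0, 0);
    double lengths = a.distance(origin) * b.distance(origin);
    double dotProduct = a.getX() * b.getX() + a.getY() * b.getY();
    return Math.abs(Math.acos(dotProduct / lengths)) > 2 * Math.PI / 3;
  }

  // Random point on the unit circle.
  static Point2D.Double randomUnitPoint(Random random) {
    double radians = random.nextDouble() * 2 * Math.PI;
    return new Point2D.Double(Math.cos(radians), Math.sin(radians));
  }

  static Point2D.Double scale(Point2D.Double p, double factor) {
    return new Point2D.Double(p.getX() * factor, p.getY() * factor);
  }

  // Counts sampled pairs whose answer changes when both points are moved
  // halfway toward the origin, which leaves the angle itself unchanged.
  static int countDisagreements(
      BiPredicate<Point2D.Double, Point2D.Double> isAngleSafe, long seed, int samples) {
    Random random = new Random(seed);
    int disagreements = 0;
    for (int i = 0; i < samples; i++) {
      Point2D.Double a = randomUnitPoint(random);
      Point2D.Double b = randomUnitPoint(random);
      boolean original = isAngleSafe.test(a, b);
      boolean scaled = isAngleSafe.test(scale(a, 0.5), scale(b, 0.5));
      if (original != scaled) {
        disagreements++;
      }
    }
    return disagreements;
  }

  public static void main(String[] args) {
    System.out.println("flawed:    "
        + countDisagreements(ScaleInvarianceCheck::flawed, 42L, 1000));
    System.out.println("corrected: "
        + countDisagreements(ScaleInvarianceCheck::corrected, 42L, 1000));
  }
}
```

Scaling by 0.5 is exact in floating point, so the corrected version should report zero disagreements. The flawed version should disagree on every sampled pair whose angle exceeds 120 degrees, because after scaling its dot product can no longer drop below -0.25.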

So, if simple code coverage is not enough, then how can one tell if a class is fully tested? The unfortunate answer for most classes is: never. There are many reasons why, but the most common is that the set of possible inputs is so large that you simply cannot test them all in any reasonable fashion.


While you can't cover all the possible inputs, it isn't unrealistic to cover a reasonable sample of them. Try to think of situations where there might be a problem and write tests to cover them. For example, a good test suite for HoundsUtil should consider including tests for the following:
  • What happens when the angle is slightly more or less (whatever you take this to mean) than 120 degrees?
  • Does it give the correct answer when the angle is exactly 120 degrees (and is it even possible to come up with points that cover this case)?
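  • Could there be a problem with overflow or underflow? (Answer: Yes, there is a problem: for inputs as extreme as (0, Double.MAX_VALUE) and (0, Double.MIN_VALUE) the squared lengths overflow to infinity or underflow to zero, so the corrected implementation computes NaN instead of an angle and returns false whatever the true angle is; for (0, Double.MAX_VALUE) and (0, -Double.MAX_VALUE), which form a 180 degree angle, that answer is incorrect.)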
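
The questions above can be turned into executable checks. What follows is a minimal sketch, not a definitive test suite. The class BoundaryChecks and the helper pointAt are hypothetical names, "slightly" is assumed to mean 0.1 degrees, the overflow inputs (0, Double.MAX_VALUE) and (0, -Double.MAX_VALUE) are my own choice, and the corrected isAngleSafe is copied in as a static method so the example runs without JUnit.

```java
import java.awt.geom.Point2D;

// Hypothetical helper class; isAngleSafe is a copy of the corrected method.
final class BoundaryChecks {

  static boolean isAngleSafe(Point2D.Double a, Point2D.Double b) {
    Point2D.Double origin = new Point2D.Double(0, 0);
    double aLength = a.distance(origin);
    double bLength = b.distance(origin);
    double dotProduct = a.getX() * b.getX() + a.getY() * b.getY();
    double angle = Math.acos(dotProduct / (aLength * bLength));
    return Math.abs(angle) > 2 * Math.PI / 3;
  }

  // Point at distance `radius` from the origin, `degrees` counterclockwise
  // from the positive x axis.
  static Point2D.Double pointAt(double degrees, double radius) {
    double radians = Math.toRadians(degrees);
    return new Point2D.Double(radius * Math.cos(radians), radius * Math.sin(radians));
  }

  public static void main(String[] args) {
    Point2D.Double reference = pointAt(0, 3.0);

    // "Slightly" is taken to mean 0.1 degrees here, far above rounding error.
    System.out.println("119.9 degrees safe? "
        + isAngleSafe(reference, pointAt(119.9, 0.25)));
    System.out.println("120.1 degrees safe? "
        + isAngleSafe(reference, pointAt(120.1, 0.25)));

    // Nominally 120 degrees; the coordinates are rounded, so the result
    // is printed without asserting an expectation.
    System.out.println("~120 degrees safe?  "
        + isAngleSafe(reference, pointAt(120, 0.25)));

    // A true 180 degree angle whose intermediate products overflow.
    Point2D.Double up = new Point2D.Double(0, Double.MAX_VALUE);
    Point2D.Double down = new Point2D.Double(0, -Double.MAX_VALUE);
    System.out.println("180 degrees, huge points, safe? " + isAngleSafe(up, down));
  }
}
```

The first two checks should print false and true. The last one should print false even though the angle is 180 degrees, because the overflowing intermediate values turn the computed angle into NaN. As for the exact 120 degree case, an exact match would need a coordinate ratio involving the square root of 3, which no pair of double values can represent, so any such test only exercises an angle very close to 120 degrees.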
Always be willing to expand your suite of tests. In particular, if a bug is found in your code, your first response should be to write a test that demonstrates the bug. After you have the test written (and it fails), fix the code and watch the test pass.

At the end of the day the only thing that is certain is that having 100% code coverage (even if you can achieve it), in and of itself, is not enough. In the real world, the best test suites may fall short of 100% coverage and yet manage to truly test more of the functionality.
