# Measure coverage of QuickCheck generators

Potential supervisors:
Research groups/keywords:
Description:

The most general formulation of this problem is: Given a random generator of Haskell values (members of algebraic data types), try to determine if the random distribution is 'suitable' for testing properties of the data type. The goal of this project is to find good measurements for this suitability and algorithms for approximating them for a given generator. In the end this can help developers of generators (for QuickCheck testing) find and eliminate weaknesses or biases in their generators.

Ususally the generators have some condition that all generated values should respect, and then the coverage should not be measured over the whole type but only over the values that satisfy the condition.

The random generators are black box. Given a random seed (and maybe a size bound) it produces a stream of values. The definition of the random generators should not be analyzed to determine suitability, only the streams of random values they produce.

A subproblem of this is approximating the range of a generator, i.e. produce the set of values it can generate (if the range is finite). If the range is infinite one can still approximate the set of values whose probability exceeds some threashold, for instance the values with more than one in a million chance of being generated in a single run of the generator.

One possible algorithm for this is as follows: At every step of the algorithm we have a set of values (possibly in some clever representation), and the hypothesis is that these are the only values with a relevant probability of generating. This hypothesis can be tested to a given P-value (e.g. 0.05) by generating lots of values and check if they are all in the set. If the hypothesis holds we have a good approximation, if it is falsified we have a new hypothesis (because we found another value to add to the set).

When the set is accumulated it can perhaps be used to extract some useful information about the generator, such as the smallest value that is not covered.

Relevant Courses: