Understanding Dieharder, Chi-Square, and FIPS 140-2 Results

We have received a number of emails about performance testing of the TrueRNG. Every reported problem has ultimately been traced to a small sample size or incorrect testing.

The TrueRNG products generate true random numbers, so there is no guarantee that chi-squared or other statistical test results will fall within any given range for a small sample size.

To show this, I captured a large number of 10k blocks from the TrueRNG, ran the ‘ent’ tool on each one, and recorded the results. I then imported them into OpenOffice Calc and computed frequency distributions of the chi-squared and exceed percent values.

Here are plots of the results

The OpenOffice Calc spreadsheet is here. Here is the source data.

Notice that the chi-squared values follow an approximately normal distribution centered near 255 (the number of degrees of freedom for 256 possible byte values) and that the exceed percent values have a uniform distribution. For a large number of runs, this is the expected result.

From this, you can see that taking a small number of tests from a small sample size may give results that ‘seem’ bad. With a perfect generator, it is expected to get percent values < 5 or > 95 in 10% of the results. If you don’t see this distribution over a large number of runs, then there may be an issue.
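The same behaviour can be reproduced without any hardware. The following is a minimal sketch, assuming Python's seeded Mersenne Twister as a stand-in for a good source (it is not TrueRNG data), a hypothetical count of 1,000 blocks, and the Wilson-Hilferty approximation for the chi-squared tail, so the percent values will be close to but not identical to what ‘ent’ reports.

```python
import math
import random
from collections import Counter

BLOCK_SIZE = 10_000   # bytes per block, matching the 10k blocks described above
NUM_BLOCKS = 1_000    # assumption: number of simulated blocks
DF = 255              # degrees of freedom for 256 possible byte values


def chi_squared(block: bytes) -> float:
    """Pearson chi-squared statistic of byte counts against a uniform expectation."""
    expected = len(block) / 256
    counts = Counter(block)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def exceed_percent(chi2: float, df: int = DF) -> float:
    """Approximate percent of the time a perfect source would exceed chi2.

    Uses the Wilson-Hilferty normal approximation to the chi-squared
    upper tail; this is an approximation, not the routine 'ent' uses.
    """
    z = ((chi2 / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 100 * upper_tail


rng = random.Random(12345)  # seeded PRNG stand-in, NOT a TrueRNG capture
chi2_values = []
for _ in range(NUM_BLOCKS):
    block = rng.getrandbits(8 * BLOCK_SIZE).to_bytes(BLOCK_SIZE, "little")
    chi2_values.append(chi_squared(block))

percents = [exceed_percent(c) for c in chi2_values]
mean_chi2 = sum(chi2_values) / NUM_BLOCKS
extreme_fraction = sum(p < 5 or p > 95 for p in percents) / NUM_BLOCKS

print(f"mean chi-squared: {mean_chi2:.1f}")
print(f"fraction of blocks with exceed percent < 5 or > 95: {extreme_fraction:.3f}")
```

With a well-behaved source, the mean should sit close to the degrees of freedom and roughly one block in ten should land in the ‘extreme’ range, even though nothing is wrong with the generator.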

It is expected that a true random number generator will fail statistical tests a certain percentage of the time.

For example, typical results from the rngtest tool, which runs the FIPS 140-2 tests, are:


Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source drained
rngtest: bits received from input: 114688000000
rngtest: FIPS 140-2 successes: 5729915
rngtest: FIPS 140-2 failures: 4484
rngtest: FIPS 140-2(2001-10-10) Monobit: 595
rngtest: FIPS 140-2(2001-10-10) Poker: 548
rngtest: FIPS 140-2(2001-10-10) Runs: 1653
rngtest: FIPS 140-2(2001-10-10) Long run: 1708
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=1713062.099; avg=9994464560.376; max=0.000)bits/s
rngtest: FIPS tests speed: (min=1.557; avg=82.152; max=85.531)Mibits/s
rngtest: Program run time: 1343537517 microseconds

Notice that there are 5729915 successes and 595 Monobit failures (out of 5734399 runs). For FIPS 140-2, the monobit test is EXPECTED to fail about 1 out of every 9662 runs with a perfect random source.

For 5,734,399 runs, we would expect about 593.5 failures, which is very close to the actual number of failures of 595. Similarly, a certain number of failures is expected for each of the other tests (except for the continuous run test, obviously).
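The expected monobit failure rate can be checked directly. This is a rough sketch, assuming the FIPS 140-2 monobit rule that a 20,000-bit block passes only when the count of ones is strictly between 9,725 and 10,275, and assuming perfectly unbiased, independent bits. It sums the exact binomial tail with integer arithmetic and compares the result with the 1-in-9662 figure and the 595 observed failures.

```python
import math

BLOCK_BITS = 20_000          # bits per FIPS 140-2 test block
LOW, HIGH = 9_725, 10_275    # monobit passes only if LOW < ones < HIGH
RUNS = 5_734_399             # successes + failures from the rngtest output above
OBSERVED = 595               # Monobit failures reported above

# Sum C(n, k) for k = 0..LOW exactly with integers, updating the
# binomial coefficient incrementally instead of recomputing it each time.
coeff = 1        # C(n, 0)
lower_tail = 0
for k in range(LOW + 1):
    lower_tail += coeff
    coeff = coeff * (BLOCK_BITS - k) // (k + 1)   # C(n, k) -> C(n, k + 1)

# For unbiased bits the distribution is symmetric, so the upper tail
# (ones >= HIGH) has the same weight as the lower tail (ones <= LOW).
p_fail = 2 * lower_tail / 2 ** BLOCK_BITS

expected = RUNS * p_fail
sigma = math.sqrt(expected)  # failures are rare, so roughly Poisson spread

print(f"monobit failure probability: 1 in {1 / p_fail:.0f}")
print(f"expected failures in {RUNS} runs: {expected:.1f} (+/- {sigma:.1f})")
print(f"observed failures: {OBSERVED}")
```

The spread printed alongside the expected count gives a feel for how far the observed number could drift from the expectation before it would look suspicious.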

Notes on other failures

For Dieharder, using the correct options and having sufficient input data is very important. You need 14+ gigabytes of input data to run dieharder with a small number of rewinds. With a small input file, dieharder rewinds the file and feeds the same data into the test multiple times, which often gives false ‘FAILED’ results. I run most dieharder tests with:

dieharder -g 201 -k 2 -Y 1 -f FILENAME

-g 201 = use external file as input

-k 2 = use maximum accuracy to machine precision (slower)

-Y 1 = resolve ambiguity mode: reruns ‘WEAK’ results until a PASSED or FAILED result is obtained

-f FILENAME = the name of the file to use for the source

diehard_sums test

According to the dieharder web page, the diehard_sums test is suspect and may fail on good generators.

Other dieharder failures

As with other statistical tests for random number generators, it is expected that each dieharder test gets a ‘FAILED’ result a certain percentage of the time.

A single failure or a small number of failures on a particular test is not cause to label the generator as bad. For a ‘FAILED’ result, that test can be re-run on the same input file with a larger block size to show that the failure is an anomaly.

Re-running the same test on the same input will get the same result and not tell you anything about the rest of the file. Instead, I either use split to chop the input file into chunks and test each one separately, or re-run the single dieharder test with an increased p_value multiplier (-m), or both:

dieharder -d 201 -g 201 -k 2 -Y 1 -m 2 -f FILENAME

-d 201 = test number to run — see below
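The chunk-and-retest approach can be scripted. The sketch below is one possible way to do it in Python rather than with the split utility; the function names, the chunk file naming, and the chunk size are all hypothetical, and the command it builds uses only the options shown above. It builds the command lines but does not run dieharder itself.

```python
import os
from pathlib import Path


def split_file(path, chunk_bytes, out_dir):
    """Write consecutive chunk_bytes-sized pieces of path into out_dir.

    Note: each chunk is read into memory in one go, which is fine for a
    sketch but not ideal for multi-gigabyte chunks.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_bytes)
            if not data:
                break
            chunk_path = os.path.join(out_dir, f"chunk_{index:04d}.bin")
            with open(chunk_path, "wb") as dst:
                dst.write(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def dieharder_command(chunk_path, test_number=201, multiplier=2):
    """Build the single-test re-run command for one chunk (does not execute it)."""
    return ["dieharder", "-d", str(test_number), "-g", "201",
            "-k", "2", "-Y", "1", "-m", str(multiplier), "-f", chunk_path]


# Example use (chunk size is an arbitrary assumption, adjust to taste):
#   import subprocess
#   for chunk in split_file("FILENAME", 2**30, "chunks"):
#       subprocess.run(dieharder_command(chunk), check=True)
```

Keep in mind that smaller chunks mean more rewinds per test, so the chunk size is a trade-off against the input size requirement mentioned earlier.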