Understanding Dieharder, Chi-Square, FIPS-140-2 Results

We have received a number of emails about performance testing of the TrueRNG. Every one of the reported problems has ultimately been attributed to a small sample size or incorrect testing.

The TrueRNG products generate true random numbers, so there is no guarantee that chi-squared or other statistical tests will fall within any given range for a small sample size.

To show this, I captured a large number of 10k blocks from the TrueRNG, ran the ‘ent’ tool on each one, and recorded the results. I then imported them into OpenOffice Calc and computed frequency distributions of the chi-squared values and the exceed-percent values.

Here are plots of the results:

[Plots: Ent frequency distribution of chi-squared values; Ent frequency distribution of exceed-percent values]

The Openoffice Calc spreadsheet is here. Here is the source data.

Notice that the chi-squared values follow an approximately normal distribution centered near 255 (for byte data the statistic has 255 degrees of freedom) and that the exceed-percent values have a uniform distribution. For a large number of runs, this is the expected result.
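The shape of the chi-squared distribution can be reproduced without the hardware. The following is a minimal Python sketch, not the procedure used above. It assumes a ‘10k block’ means 10,000 bytes, uses Python's seeded pseudo-random generator as a stand-in for the TrueRNG, and computes a byte-frequency chi-squared statistic of the kind ent reports. For 256 possible byte values the statistic has 255 degrees of freedom, so theory predicts a mean of 255 and a standard deviation of about 22.6. The function names are illustrative.

```python
import math
import random
import statistics
from collections import Counter

BLOCK_SIZE = 10_000  # assumption: a "10k block" is 10,000 bytes


def chi_square_bytes(block: bytes) -> float:
    """Chi-squared statistic of byte frequencies against a uniform expectation."""
    counts = Counter(block)
    expected = len(block) / 256
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))


def simulate(num_blocks: int, seed: int = 1) -> list:
    """Chi-squared values for pseudo-random blocks (a stand-in for real captures)."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_blocks):
        block = rng.getrandbits(8 * BLOCK_SIZE).to_bytes(BLOCK_SIZE, "little")
        values.append(chi_square_bytes(block))
    return values


if __name__ == "__main__":
    values = simulate(1000)
    print("mean :", round(statistics.mean(values), 1), "(theory: 255)")
    print("stdev:", round(statistics.stdev(values), 1),
          "(theory:", round(math.sqrt(2 * 255), 1), ")")
```

Individual blocks routinely land 20 or more away from the mean, which is why a single chi-squared value from one small block says very little.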

From this, you can see that running only a few tests on small samples can give results that ‘seem’ bad. With a perfect generator, exceed-percent values below 5 or above 95 are expected in 10% of the results. If you don’t see this distribution over a large number of runs, then there may be an issue.
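To put a number on how often a ‘bad-looking’ value turns up, here is a small Python sketch. It assumes the runs are independent and that the exceed-percent value is uniformly distributed, so each run has a 10% chance of landing below 5 or above 95. The function name is illustrative.

```python
def prob_at_least_one_extreme(num_runs: int, tail_fraction: float = 0.10) -> float:
    """Chance that at least one of num_runs independent runs lands in the tails."""
    return 1.0 - (1.0 - tail_fraction) ** num_runs


if __name__ == "__main__":
    for n in (1, 5, 10, 20, 50):
        print(f"{n:3d} runs: {prob_at_least_one_extreme(n):.1%} chance of an extreme value")
```

Under these assumptions, ten runs already give roughly a two-in-three chance of seeing at least one value outside the 5 to 95 range.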

It is expected that a true random number generator will fail statistical tests a certain percentage of the time.

For example, typical results from the rngtest tool, which runs the FIPS 140-2 tests, are:


Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: entropy source drained
rngtest: bits received from input: 114688000000
rngtest: FIPS 140-2 successes: 5729915
rngtest: FIPS 140-2 failures: 4484
rngtest: FIPS 140-2(2001-10-10) Monobit: 595
rngtest: FIPS 140-2(2001-10-10) Poker: 548
rngtest: FIPS 140-2(2001-10-10) Runs: 1653
rngtest: FIPS 140-2(2001-10-10) Long run: 1708
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=1713062.099; avg=9994464560.376; max=0.000)bits/s
rngtest: FIPS tests speed: (min=1.557; avg=82.152; max=85.531)Mibits/s
rngtest: Program run time: 1343537517 microseconds
	

Notice that there are 5729915 successes and 595 Monobit failures (out of 5734399 runs). For FIPS 140-2, the monobit test is EXPECTED to fail about 1 out of every 9662 runs with a perfect random source.
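The 1-in-9662 figure can be checked against the binomial distribution. The sketch below assumes the FIPS 140-2 monobit rule as it is commonly stated: a 20,000-bit block passes only if the count of ones is strictly between 9,725 and 10,275. Those thresholds are not given in this post, so treat them as an assumption. The function name is illustrative.

```python
from fractions import Fraction

N_BITS = 20_000   # bits per FIPS 140-2 test block
LOW_FAIL = 9_725  # assumed threshold: fail if ones <= 9725 or ones >= 10275


def monobit_failure_probability(n: int = N_BITS, low: int = LOW_FAIL) -> Fraction:
    """Exact two-sided tail probability for a fair-coin bit source."""
    term = 1   # C(n, 0)
    total = 1  # running sum of C(n, k) for k = 0..low
    for k in range(low):
        term = term * (n - k) // (k + 1)  # C(n, k + 1) from C(n, k)
        total += term
    # The upper tail (ones >= n - low) mirrors the lower tail by symmetry.
    return Fraction(2 * total, 2 ** n)


if __name__ == "__main__":
    p = monobit_failure_probability()
    print("monobit failure rate: about 1 in", round(1 / float(p)))
```

If the assumed thresholds are right, the result should land close to the 1-in-9662 rate quoted above.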


For 5,734,399 runs, we would expect about 593.5 failures, which is very close to the actual count of 595. Similarly, a certain number of failures is expected for each of the other tests (except the continuous run test, obviously).
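The arithmetic can be written out as a short Python sketch. It takes the run count as successes plus failures from the rngtest output above, uses the 1-in-9662 monobit failure rate, and adds a binomial standard deviation to show how far 595 sits from the expectation. The function name is illustrative.

```python
import math

SUCCESSES = 5_729_915
FAILURES = 4_484
MONOBIT_FAILURES = 595
P_MONOBIT_FAIL = 1 / 9662  # failure rate quoted for a perfect source


def expected_and_z(runs: int, observed: int, p: float):
    """Expected failure count and z-score of the observed count (binomial model)."""
    expected = runs * p
    sigma = math.sqrt(runs * p * (1 - p))
    return expected, (observed - expected) / sigma


if __name__ == "__main__":
    runs = SUCCESSES + FAILURES
    expected, z = expected_and_z(runs, MONOBIT_FAILURES, P_MONOBIT_FAIL)
    print(f"runs={runs}  expected={expected:.1f}  observed={MONOBIT_FAILURES}  z={z:+.2f}")
```

A z-score well inside plus or minus 2 means the observed count is consistent with a perfect source under this model.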

Notes on other failures

For Dieharder, using the correct options and having sufficient input data are very important. You need 14+ gigabytes of input data to run dieharder with a small number of rewinds. A small input file feeds the same data into the tests multiple times and often gives false ‘FAILED’ results. I run most dieharder tests with:

dieharder -g 201 -k 2 -Y 1 -f FILENAME

-g 201 = use external file as input

-k 2 = use maximum accuracy to machine precision (slower)

-Y 1 = resolve ambiguity mode = reruns ‘WEAK’ results until PASSED or FAILED result is obtained

-f FILENAME = the name of the file to use for the source

diehard_sums test

According to the dieharder web page, the diehard_sums test is suspect and may fail on good generators.

Other dieharder failures

As with other statistical tests for random number generators, it is expected that each dieharder test gets a ‘FAILED’ result a certain percentage of the time.

A single failure or a small number of failures on a particular test is not cause to label the generator as bad. After a ‘FAILED’ result, that test can be re-run on the same input file with a larger block size to show that the failure is an anomaly.

To re-run a single dieharder test, I use the command below, increasing the p_value multiplier (-m) and/or using split to chop the input file into chunks and testing each one separately. Re-running the same test on the same input with the same settings gives the same result and tells you nothing about the rest of the file.

dieharder -d 201 -g 201 -k 2 -Y 1 -m 2 -f FILENAME

-d 201 = test number to run — see below

-g 201 = use external file as input

-k 2 = use maximum accuracy to machine precision (slower)

-Y 1 = resolve ambiguity mode = reruns ‘WEAK’ results until PASSED or FAILED result is obtained

-m 2 = use twice the number of p_samples as input

-f FILENAME = the name of the file to use for the source
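The chunk-and-retest approach can also be scripted. The Python sketch below is one possible way to do it, not the method used above: it splits an input file into roughly equal pieces and builds a dieharder command line for each piece, using only the flags already described (-d, -g 201, -k 2, -Y 1, -f). The helper names and the `.partNN` output naming are hypothetical, and -m is left out here because smaller chunks contain less data.

```python
import os


def chunk_ranges(total_size: int, num_chunks: int) -> list:
    """(offset, length) pairs covering the file; the last chunk takes the remainder."""
    base = total_size // num_chunks
    ranges = []
    for i in range(num_chunks):
        offset = i * base
        length = base if i < num_chunks - 1 else total_size - offset
        ranges.append((offset, length))
    return ranges


def write_chunks(path: str, num_chunks: int, buffer_size: int = 1 << 20) -> list:
    """Write each chunk to '<path>.partNN' (hypothetical naming) and return the paths."""
    out_paths = []
    total = os.path.getsize(path)
    with open(path, "rb") as src:
        for i, (offset, length) in enumerate(chunk_ranges(total, num_chunks)):
            out_path = f"{path}.part{i:02d}"
            src.seek(offset)
            remaining = length
            with open(out_path, "wb") as dst:
                while remaining > 0:
                    data = src.read(min(buffer_size, remaining))
                    if not data:
                        break
                    dst.write(data)
                    remaining -= len(data)
            out_paths.append(out_path)
    return out_paths


def dieharder_command(test_number: int, path: str) -> list:
    """Argument list for one test on one chunk, using the flags described above."""
    return ["dieharder", "-d", str(test_number),
            "-g", "201", "-k", "2", "-Y", "1", "-f", path]
```

Each path returned by `write_chunks("FILENAME", 4)` can then be passed to `dieharder_command(201, path)` and run with `subprocess.run`. Keep in mind that each chunk still needs enough data for the chosen test to avoid rewinds.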

Dieharder Test Flags
	
-d 0                            Diehard Birthdays Test              Good
-d 1                               Diehard OPERM5 Test              Good
-d 2                    Diehard 32x32 Binary Rank Test              Good
-d 3                      Diehard 6x8 Binary Rank Test              Good
-d 4                            Diehard Bitstream Test              Good
-d 5                                      Diehard OPSO           Suspect
-d 6                                 Diehard OQSO Test           Suspect
-d 7                                  Diehard DNA Test           Suspect
-d 8                Diehard Count the 1s (stream) Test              Good
-d 9                  Diehard Count the 1s Test (byte)              Good
-d 10                         Diehard Parking Lot Test              Good
-d 11         Diehard Minimum Distance (2d Circle) Test             Good
-d 12         Diehard 3d Sphere (Minimum Distance) Test             Good
-d 13                             Diehard Squeeze Test              Good
-d 14                                Diehard Sums Test        Do Not Use
-d 15                                Diehard Runs Test              Good
-d 16                               Diehard Craps Test              Good
-d 17                     Marsaglia and Tsang GCD Test              Good
-d 100                                STS Monobit Test              Good
-d 101                                   STS Runs Test              Good
-d 102                   STS Serial Test (Generalized)              Good
-d 200                       RGB Bit Distribution Test              Good
-d 201           RGB Generalized Minimum Distance Test              Good
-d 202                           RGB Permutations Test              Good
-d 203                             RGB Lagged Sum Test              Good
-d 204                RGB Kolmogorov-Smirnov Test Test              Good
-d 205                               Byte Distribution              Good
-d 206                                         DAB DCT              Good
-d 207                              DAB Fill Tree Test              Good
-d 208                            DAB Fill Tree 2 Test              Good
-d 209                              DAB Monobit 2 Test              Good

If there is consistent failure on a particular test using multiple input runs and the test is not suspect, then there may be something wrong with the random number generator or testing methodology.
