Understanding Dieharder, Chi-Square, FIPS-140-2 Results
We have received a number of emails on the performance testing of the TrueRNG. Every one of these have ultimately been attributed to a small sample size or incorrect testing.
The TrueRNG products generate true random numbers – there is no guaranty that chi-squared or other statistical tests will fall within any given range for a small sample size.
To show this, I captured a large number of 10k blocks from the TrueRNG and ran the ‘ent’ tool on each one and captured the results. I then imported them into Openoffice Calc and did a frequency distribution on the chi-squared distribution and exceed percent values
Here are plots of the results
The Openoffice Calc spreadsheet is here. Here is the source data.
Notice that the chi-squared values follow a normal distribution centered around 256 and that the exceed percent values have a uniform distribution. For a large number of runs, this is the expected result.
From this, you can see that taking a small number of tests from a small sample size may give results that ‘seem’ bad. With a perfect generator, it is expected to get percent values < 5 or > 95 in 10% of the results. If you don’t see this distribution over a large number of runs, then there may be an issue.
It is expected that a true random number generator will fail statistical tests a certain percentage of the time.
For example, typical results from rngtest tool which runs FIPS 140-2 tests are:
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
rngtest: starting FIPS tests...
rngtest: entropy source drained
rngtest: bits received from input: 114688000000
rngtest: FIPS 140-2 successes: 5729915
rngtest: FIPS 140-2 failures: 4484
rngtest: FIPS 140-2(2001-10-10) Monobit: 595
rngtest: FIPS 140-2(2001-10-10) Poker: 548
rngtest: FIPS 140-2(2001-10-10) Runs: 1653
rngtest: FIPS 140-2(2001-10-10) Long run: 1708
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=1713062.099; avg=9994464560.376; max=0.000)bits/s
rngtest: FIPS tests speed: (min=1.557; avg=82.152; max=85.531)Mibits/s
rngtest: Program run time: 1343537517 microseconds
Notice that there are 5729915 successes and 595 Monobit failures (out of 5734399 runs). For FIPS 140-2, the monobit test is EXPECTED to fail about 1 out of every 9662 runs with a perfect random source.
For 5,734,339 runs, we would expect about 593.5 failures which is very close to the actual number of failures of 595. Similarly, a certain number of failures is expected for each of the tests (except for the continuous run obviously).
Notes on other failures
For Dieharder, using the correct options and having sufficient input data is very important. You need 14+ gigabytes of input data to run dieharder with a small number of rewinds. Having a small data file input feeds the same data multiple times into the test and often gives false ‘FAILED’ results. I run most dieharder tests with:
dieharder -g 201 -k 2 -Y 1 -f FILENAME
-g 201 = use external file as input
-k 2 = use maximum accuracy to machine precision (slower)
-Y 1 = resolve ambiguity mode = reruns ‘WEAK’ results until PASSED or FAILED result is obtained
-f FILENAME = the name of the file to use for the source
diehard_sums test
Other dieharder failures
As with other statistical tests for random number generators, it is expected that each dieharder test gets a ‘FAILED’ result a certain percentage of the time.
A single of small number of failures on a particular test is not a cause to label the generator as bad. For a ‘FAILED’ result, a particular test can be re-ran on the same input file with a larger block size to show that the failure is an anomaly.
I use the following to re-run a single particular dieharder test and increase the p_value multiplier (-m) and/or use split to chop up the input file into chunks then test each one separately (re-running the same test on the same input will get the same result and not tell you anything about the rest of the file)
dieharder -d 201 -g 201 -k 2 -Y 1 -m 2 -f FILENAME
-d 201 = test number to run — see below
-g 201 = use external file as input
-k 2 = use maximum accuracy to machine precision (slower)
-Y 1 = resolve ambiguity mode = reruns ‘WEAK’ results until PASSED or FAILED result is obtained
-m 2 = use twice the number of p_samples as input
-f FILENAME = the name of the file to use for the source
Dieharder Test Flags
-d 0 Diehard Birthdays Test Good
-d 1 Diehard OPERM5 Test Good
-d 2 Diehard 32x32 Binary Rank Test Good
-d 3 Diehard 6x8 Binary Rank Test Good
-d 4 Diehard Bitstream Test Good
-d 5 Diehard OPSO Suspect
-d 6 Diehard OQSO Test Suspect
-d 7 Diehard DNA Test Suspect
-d 8 Diehard Count the 1s (stream) Test Good
-d 9 Diehard Count the 1s Test (byte) Good
-d 10 Diehard Parking Lot Test Good
-d 11 Diehard Minimum Distance (2d Circle) Test Good
-d 12 Diehard 3d Sphere (Minimum Distance) Test Good
-d 13 Diehard Squeeze Test Good
-d 14 Diehard Sums Test Do Not Use
-d 15 Diehard Runs Test Good
-d 16 Diehard Craps Test Good
-d 17 Marsaglia and Tsang GCD Test Good
-d 100 STS Monobit Test Good
-d 101 STS Runs Test Good
-d 102 STS Serial Test (Generalized) Good
-d 200 RGB Bit Distribution Test Good
-d 201 RGB Generalized Minimum Distance Test Good
-d 202 RGB Permutations Test Good
-d 203 RGB Lagged Sum Test Good
-d 204 RGB Kolmogorov-Smirnov Test Test Good
-d 205 Byte Distribution Good
-d 206 DAB DCT Good
-d 207 DAB Fill Tree Test Good
-d 208 DAB Fill Tree 2 Test Good
-d 209 DAB Monobit 2 Test Good
If there is consistent failure on a particular test using multiple input runs and the test is not suspect, then there may be something wrong with the random number generator or testing methodology.