DavidJohnston's Replies

Forum Replies Created

Viewing 1 post (of 1 total)

Author

Posts
February 6, 2020 at 7:05 pm in reply to: TrueRNG fails the Chi-square Test #2262

DavidJohnston
Member

A few years have passed, but no one has corrected this thread. It is based on misconceptions about the Chi-Sq Test of Randomness.

Given uniform random data, the ‘result’ from the test should vary uniformly between 0.0 and 1.0.
If your pass criterion is, say, a result between 1% and 99%, then you should expect 2% of samples to fail even when the RNG is feeding you fresh, fully uniform data.
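
The 2% figure can be checked with a quick simulation. This is a minimal sketch, not how ent or djent compute it: it assumes byte-sized symbols (255 degrees of freedom), uses Python's seeded Mersenne Twister as a stand-in for uniform data, and the function names, sample size and trial count are all made up for illustration.

```python
import math
import random
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2-1} h^i / i!, evaluated in log space.
        total = sum(math.exp(-h + i * math.log(h) - math.lgamma(i + 1))
                    for i in range(df // 2))
    else:
        # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j-1/2) / Gamma(j+1/2).
        total = math.erfc(math.sqrt(h))
        total += sum(math.exp(-h + (j - 0.5) * math.log(h) - math.lgamma(j + 0.5))
                     for j in range(1, (df - 1) // 2 + 1))
    return min(1.0, total)

def chi2_result(data):
    """Chi-square over byte values; returns the upper-tail probability in [0, 1]."""
    expected = len(data) / 256
    counts = Counter(data)
    chi2 = sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))
    return chi2_sf(chi2, 255)

def simulate(trials=500, sample_bytes=4096, seed=2020):
    """Run the test on many independent samples from a PRNG (assumed stand-in for an RNG)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        data = rng.getrandbits(8 * sample_bytes).to_bytes(sample_bytes, "little")
        results.append(chi2_result(data))
    return results

results = simulate()
outside = sum(1 for p in results if p < 0.01 or p > 0.99)
print(f"mean result: {sum(results) / len(results):.3f}")
print(f"fraction outside 1%..99%: {outside / len(results):.1%}")
```

With uniform input the mean should land near 0.5 and the fraction outside the 1% to 99% band near 2%, give or take sampling noise at this number of trials.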

The Chi-Sq test is not a test with a pass/fail result and a P-value. It’s a transformation of a measure of the bias of the data (a statistic that follows a chi-sq distribution) to a uniform distribution. If the results of multiple tests are hard up against an endpoint, then you have some bias.
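
Written out, the transformation looks like this. The notation is an assumption for illustration and does not come from the ent output: k equally likely symbol values (k = 256 for bytes), n symbols in total, O_i the observed count of symbol i, and F_{k-1} the cumulative distribution function of a chi-square variable with k-1 degrees of freedom.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{\left(O_i - n/k\right)^2}{n/k},
\qquad
p = 1 - F_{k-1}\!\left(\chi^2\right)
```

For uniform input, \chi^2 is approximately chi-square distributed with k-1 degrees of freedom, so p is approximately uniform on [0, 1]. The reported percentage is 100p.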

If you see this from an entropy source, that is fine. Bias is expected from an entropy source and you should expect to be feeding that into an entropy extractor to get uniform data.
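
As a toy illustration of what an extractor does, here is a minimal sketch of von Neumann debiasing. This is a swapped-in textbook technique chosen for brevity, not anything the TrueRNG or this thread prescribes, and it only works when the input bits are independent with a fixed bias. The 80% bias and the function name are assumptions made up for the example.

```python
import random

def von_neumann_extract(bits):
    """Read bits in non-overlapping pairs: (0,1) -> 0, (1,0) -> 1, equal pairs are dropped."""
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out

# Simulated biased source: independent bits that are 1 with probability 0.8 (assumed).
rng = random.Random(1)
biased = [1 if rng.random() < 0.8 else 0 for _ in range(200_000)]
extracted = von_neumann_extract(biased)

print(f"input:  {len(biased)} bits, fraction of ones {sum(biased) / len(biased):.3f}")
print(f"output: {len(extracted)} bits, fraction of ones {sum(extracted) / len(extracted):.3f}")
```

The output is much shorter than the input but should sit close to 50% ones. Real sources rarely have perfectly independent bits, which is one reason practical designs tend to use stronger extractors than this.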

Now what is completely out of whack is this bit…
would exceed this value 25.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 99.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 75.00 percent of the times.
would exceed this value 2.50 percent of the times.
would exceed this value 25.00 percent of the times.
would exceed this value 75.00 percent of the times.
would exceed this value 10.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 50.00 percent of the times.
would exceed this value 25.00 percent of the times.
would exceed this value 50.00 percent of the times.

What is that? Why are most results quantised to 75.00, 50.00 or 25.00? Why do none have anything in the bottom two significant digits? That is fantastically unlikely in uniform random data. It suggests skulduggery. I reference section 8.11, page 170 in my book “Random Number Generators, Principles and Practices”, titled ‘Results That are “Too Good”’, for a description of the statistics of such results.

Example runs of the chi-sq test over 10 samples of 1 MiB of uniform random data (from RdRand) give this:
58.49%, 64.21%, 1.26%, 72.63%, 40.34%, 55.45%, 78.88%, 52.52%, 52.12%, 32.92%

At 4 digits of precision there are 10,000 equally likely results from random data. The quoted results above are stuck on just 6 of them (25.00, 50.00, 99.00, 75.00, 2.50 and 10.00). (6/10000)^18 is a very, very small probability. The result is bunk.
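
The back-of-envelope estimate can be reproduced directly from the quoted lines. This is a minimal sketch: the list is transcribed from the 18 quoted results, the number of distinct values is counted from the list rather than hard-coded (six by this count), and the 10,000 possible results is the 4-digit-precision assumption stated above.

```python
# The 18 quoted "would exceed this value X percent of the times" results, in order.
quoted = [25.00, 50.00, 50.00, 50.00, 99.00, 50.00, 50.00, 75.00, 2.50,
          25.00, 75.00, 10.00, 50.00, 50.00, 50.00, 50.00, 25.00, 50.00]

distinct = sorted(set(quoted))
possible = 10_000  # assumed: results reported from 0.00 to 99.99 in steps of 0.01

# Chance that every one of the results lands inside that small fixed set of values,
# if each result were uniform over all possible values.
prob = (len(distinct) / possible) ** len(quoted)

print(f"{len(quoted)} results, {len(distinct)} distinct values: {distinct}")
print(f"probability of all results landing in that set: {prob:.3e}")
```

The probability comes out on the order of 1e-58, which is the point: output like this does not come from a continuous transformation of uniform random data.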

I recommend djent (https://github.com/dj-on-github/djent), my improved version of ent, for testing such data, since it gives you better control over symbol sizes, which you can match to the ADC symbol size for correct analysis.

I suggest posting actual data samples so we can run our own tests.