Monday, September 4, 2023

Pure software RNG can be truly random

This is a sequel to https://skyspiral7.blogspot.com/2014/09/truly-random.html so read that first. I noticed that Wikipedia was using the phrase "true random" without ever defining what is required for a generator to be truly random. Wikipedia seems to imply that true randomness requires measuring physical things for entropy and that software alone can't do it.

In my last post about randomness, I said "A truly random event has an outcome (or occurrence) that can never be predicted" which is true but not specific enough. If God is the only person who can predict the event and humans never can then that should also be considered true random. But "humans can never predict (even using tools like computers)" is still more strict than it needs to be.

The whole point is indeed to be impossible to predict but consider a 6 sided die: if a computer knows the exact inputs it could predict the result. If a computer doesn't know the inputs then it would have to guess the height from table, spin, initial orientation, and size of die at a minimum (table's friction, die's friction, die's weight, and die's fairness are minor). However, there are way too many possible values to guess compared to the 6 possible outcomes. This means that given a lack of information about the inputs it is easier to guess the outcome than to simulate the process. This makes the process effectively truly random since without the inputs there's no possible formula. Therefore, I propose this definition: a truly random event has an outcome (or occurrence) that can't currently be predicted with more certainty than evenly spread over every possible outcome. Given more information something can lose the "true" status but as long as it is "true" then prediction is impossible which means that a pure software pseudorandom number generator could achieve true randomness (hold your fire there's more requirements to come). To explain why I'll use OTP as an example.

I won't explain how a One Time Pad (OTP) works since it isn't very relevant (you can look it up: it's simple). OTP's perfect encryption relies on the lack of pattern which is only possible if the keys are truly random. Suppose Alice encrypted a message using OTP but the key was digits of pi. Eve knows that Alice's message starts with "DearBob" so she subtracts and sees that the keys would have to be 3141592 and Eve realizes that this is pi. Using that information Eve is able to crack the entire message. These keys are not truly random because the rest of the numbers were predicted with certainty. If Alice tries again using a pseudorandom number generator seeded once at the start, then Eve can do the same thing by guessing the generator and seed (again not truly random). One feature of OTP using truly random keys is that even if the first few letters can be guessed it will tell you nothing about the rest of the message because the rest has no connection at all to the previous numbers. In order for a pseudorandom number generator to meet that requirement it would need a new seed for every number generated, which is to say it would use entropy as-is (stretching the entropy isn't allowed since that would form a small pattern).

Wikipedia says that a cryptographically secure pseudorandom number generator (CSPRNG) needs 2 things. The first is that given the sequence so far it shouldn't be possible to guess the next bit with more than 50% accuracy. The second requirement is that if the entire state of the CSPRNG is revealed or correctly guessed, the rest of the sequence (forward and back) can't be figured out. The first requirement obviously meets my definition of true random and the second does as well because if a pattern is revealed then "true" status is lost. I won't say that all CSPRNG are true random because Wikipedia makes it sound like there's different levels of strength based on what's required and that's technically also true for true random but I just don't know enough about CSPRNG.

It is possible for software to obtain good entropy without special hardware. /dev/random for example looks at timings of low level stuff and as such is truly random (assuming it doesn't stretch the values). Measuring background static, radioactive decay, etc isn't needed (although such hardware might have a higher bandwidth of entropy). Now you might be saying that /dev/random can be fed information and in that case yes it would lose its true random status and if it can lose "true-ness" then it was never true to begin with. However, that's how all "true random" works: they can all lose their status. Rolling dice is considered true random and yet it's possible to roll dice in a way to intentionally feed it specific input values (spin etc) which destroys the entropy and makes it possible for a computer to pre-calculate the result with high certainty. Thus, destroying the entropy causes the process to lose its "true" status. That's universally true: true randomness requires entropy (which is unknown info) and by knowing that info (thus demoting the entropy to regular input data) you also cause the true randomness to be demoted to pseudorandom.

Now you may be thinking that no one can know or control certain quantum stuff and while technically true (for now at least) that isn't useful because the device that reports the value can be hijacked. Assuming that your atomic decay detector hasn't been hijacked is the same as assuming that no one has messed with /dev/random. Now imagine a "change my mind" booth meme and try to think of a reason why pure software entropy is "not as random" as hardware based entropy. That's not to say hardware entropy is useless: it might (or might not) have higher bandwidth (depending on device vs software entropy's source).

No comments:

Post a Comment