Consistent Results from Inconsistent Data

This post is a continuation of my report on the NIST Cryptographic Key Management Workshop. In the previous post I said I had more to say about Rene Struik's presentation [1] because there are multiple connections between his talk and our own presentation [2].

Dr. Struik proposed to use a physical uncloneable function (PUF) to generate a cryptographic key when the device is turned on. A PUF is a function implemented by a physical device that is difficult to predict and difficult to duplicate in a different device. One way of implementing a PUF is by applying stimuli to a device such as a chip, and reading responses that are not determined by the design of the chip, but depend instead on small random variations of the fabrication process within the tolerances of the process specification. For key generation, one stimulus is applied to the device, viz. power, and one response is produced, an uncloneable key.

One problem with this key generation method is that the response of the chip will be slightly different each time the device is turned on. Struik proposed an elegant method of removing these differences in order to obtain the same key consistently, and using these same differences as an entropy source for true random number generation.

I saw an interesting connection between Struick's uncloneable key and the hardware key used by Apple for data protection in iOS, which I discussed during my talk.

Both keys are intended to be difficult to extract from the device where they are used. The hardware key is built into the silicon of a hardware encryption chip, whose interface does not allow the key to be exported from the chip; but no claims are made as to the chip being tamper resistant, so a well-equipped attacker may be able to read the key by probing the silicon. By contrast, the uncloneable key is not present in the device when the device is turned off and thus cannot be extracted by probing the silicon; but custom code running on the device could obtain the key when the device is turned on.

The two methods could be combined. For example, a device could contain an encryption chip that would generate an uncloneable key when power is applied, and the interface of the chip could prevent the key from being exported. However, custom code running on the device could still make use of the key, just like forensic tools are able to use the hardware key in the iPhone in brute-force attacks to crack the user's passcode [3] [4].

Another connection between the two presentations is that they both proposed regenerating keys instead of storing them. In Struik's presentation a symmetric key was regenerated from the physical properties of the device. In our presentation an RSA key pair was regenerated from user input (a PIN and/or a biometric sample). Our regeneration method is not applicable if there is no user, while Struik's is; on the other hand, when used in a user-controlled mobile device, our method has the advantage that the key pair caonnot be obtained by custom code running on a device that has been lost or stolen.

But the most interesting connection, for me, was between Struik's method for consistently producing the same uncloneable key from a succession of slightly different physical responses to power-on, and the method of Hao, Anderson and Daugman [5] for producing a consistent biometric key from a succession of genuine but slightly different iris image samples. (In our presentation, we proposed to use the method of Hao et al. in two- and/or three-factor authentication.) Both methods are based on the same ingenious adaptation of the error correction paradigm for producing consistent results from inconsistent data (which was pioneered by Juels and Wattenberg [6] for implementing fuzzy commitments as explained below).

Error correction techniques were originally intended, and are most often used, for correcting errors caused by the transmission of data over a noisy channel. Redundancy added to the data at the source makes it possible to correct a small number of errors at the destination. (A unit of data to which redundancy has been added is called a codeword.)

The small variations that occur between successive biometric samples or between successive samples of an uncloneable key are similar to errors produced by channel noise, but there is no correct source data to which redundancy could be added to eliminate errors. If fact, no errors are involved because, although one particular sample may serve as a reference, no sample is more correct than any other sample.

To remove data variations, error correction is used as follows:

A random codeword c is generated by adding redundancy to a random string. The random codeword is of same length as a data sample, but is otherwise unrelated to any data sample.
The random codeword c is bitwise x-ored with a reference sample r to produce helper data h: h = c ⊕ r.
When a new sample s is obtained, it is bitwise x-ored with the helper data to produce (c ⊕ r) ⊕ s. Since x-or is associative, this is the same as c ⊕ (r ⊕ s) and since r and s are similar, i.e. differ only in a few bits, the bit-string d = r ⊕ s consists mostly of zeros. Bitwise x-oring c with d flips a few bits in c, thus having the same effect that could be produced by transmitting c over a noisy channel.
Error correction is applied to recover the random codeword c from c ⊕ d. If desired, c can then be bitwise x-ored with h = c ⊕ r to obtain r.

What this accomplishes is that the same result is consistently produced from the variable sample s, as long as s in not too different from r. Either c or r can be used as the consistent result. Struik uses r, but I believe he could equivalently use c. Hao et al. use c, as their biometric key. In biometric applications, it is important to use c rather than r to avoid revealing the user's biometric sample r, which could be a privacy violation. The codeword c contains no biometric information.

Juels and Wattenberg [6] used the same technique for implementing fuzzy commitments; as far as I can tell, they were the original inventors of the technique. In a cryptographic commitment scheme, a sender commits to a value by combining it with a witness to produce a blob, and sends the blob to a receiver. The sender can later reveal the value by providing the witness, which the receiver uses to compute the blob and verify that it coincides with the one received earlier. The receiver cannot derive the committed value from the blob without the witness, and only one value, the original committed value, can be successfully revealed by the sender. Juels and Wattenberg used the above technique to devise a fuzzy commitment scheme that tolerates small variations in the witness. With the above notations (different from their own), the committed value is the codeword c, the witness is r, and the blob is an ordered pair consisting of a conventional cryptographic hash of c and the helper data h. When the sender provides s to decommit the value, the receiver computes c as above and verifies that the hash of the computed c coincides with the hash in the blob received from the sender.

In their paper [6], Juels and Wattenberg made the connection with biometric authentication, but not with physical uncloneable functions.

References

[1]	Rene Struik. Secure Key Storage and True Random Number Generation - An Overview. NIST Cryptographic Key Management Workshop, September 2012.
[2]	Francisco Corella and Karen Lewison. Key Management Challenges of Derived Credentials and Techniques for Addressing Them. NIST Cryptographic Key Management Workshop, September 2012.
[3]	Andrey Belenko. Overcoming iOS Data Protection to Re-enable iPhone Forensics. Black Hat USA, July-August 2011.
[4]	Jean-Baptiste Bedrune and Jean Sigwald. iPhone data protection in depth. HITB Security Conference, May 2011.
[5]	Feng Hao, Ross Anderson and John Daugman. Combining Cryptography with Biometrics Effectively. IEEE Transactions on Computers, volume 5, number 9, pages 1081 - 1088, 2006. Available as a Cambridge Computer Laboratory technical report.
[6]	Ari Juels and Martin Wattenberg. A Fuzzy Commitment Scheme. ACM Conference on Computer and Communications Security, 1999.