Remote Identity Proofing Discussed at the Internet Identity Workshop

This is Part 3 of a series of posts presenting results of a project sponsored by an SBIR Phase I grant from the US Department of Homeland Security. These posts do not necessarily reflect the position or the policy of the US Government.

To get community feedback on our remote identity proofing project we made a presentation two days ago at the 23rd Internet Identity Workshop in Mountain View. The slides can be found here. We were gratified that the feedback was positive and there were in-depth discussions with identity experts both during and after the presentation.

We started by explaining the goal of the project. Remote identity proofing has often relied on asking the subject multiple-choice “knowledge questions” (e.g. which of the following zip codes did you live in five years ago?). This method is terrible for privacy, since it relies on the identity proofing service gathering and using troves of personal information about people. Furthermore, due to the proliferation of personal data available online, it has now become ineffective. This became clear when attackers succeeded last year in impersonating taxpayers by answering such knowledge questions and obtaining fraudulent tax refunds. The goal of the project was to identify five remote identity proofing solutions that could be used as alternatives to knowledge-based verification.

In-person identity proofing typically requires the presentation of a picture ID such as a driver's license or a passport as primary evidence of identity, plus secondary evidence from other identity sources, such as evidence of ownership of utility, financial and/or mobile accounts and address verification. It is not too difficult to present the secondary evidence remotely, so we focused on solutions to the problem of replacing the picture ID with primary evidence that can be presented remotely.

We have identified five solutions, which we described during the presentation. Solution 1 uses a rich credential (see below), issued by a DMV and containing a facial image, for three-factor verification with spoofing detection. Solution 2 uses an adaptation of a rich credential for use in conjunction with a blockchain. In Solution 3, the subject demonstrates possession of a contactless EMV credit card to a remote verifier via a native app. The app interacts with the card over a Near Field Communication (NFC) connection and with the verifier over a secure Internet connection, and relays Application Protocol Data Units (APDUs) between the card and the verifier. In Solution 4, the subject demonstrates possession of a contactless medical identification smart card containing a certificate and a facial image, via a native app that relays APDUs and submits an audio-visual stream that the verifier uses for face recognition with spoofing detection. Finally, Solution 5 relies on face recognition with spoofing detection using the signed facial image and biographic data contained in the RFID chip embedded in a passport.

Rich Credentials

The concept of a rich credential, which is new, was described by going over several figures in the slides. (More details can be found in this paper.) A rich credential provides multifactor verification of a subject to a verifier by demonstrating possession of a private key, knowledge of a password, and possession of one or more biometric features, even though the subject has no prior relationship with the verifier. To enable this, the issuer embeds biometric verification data into the credential and uses a salted hash of a password chosen by the subject as input to the computation of the credential signature.

A rich credential provides selective disclosure of attributes, a privacy feature that allows the subject and the verifier to negotiate the set of attributes to be disclosed in a particular presentation. It also provides selective presentation of verification factors, a novel privacy feature that allows the subject and the verifier to negotiate whether the subject is to demonstrate knowledge of the password, whether the subject is to submit one or more biometric samples, and, if the credential has embedded verification data for multiple modalities, which biometric modalities the subject must submit samples for.

In the presentation, Slide 6 shows the components of a rich credential: a private key, a secret salt, and a rich certificate. The private key and the secret salt are generated in the device where the credential is stored, and never leave the device. The certificate has a signature by the issuer that binds the public key to metadata and asserted data. The metadata is the same that may be found in a traditional (e.g. X.509) public key certificate. But the asserted data is a typed hash tree rather than just a list of attributes. Moreover, the signature is applied to the root label of the tree, allowing the tree to be modified for the purpose of selective disclosure of attributes and selective presentation of verification factors without invalidating the signature.

Slide 7 shows an example of a typed hash tree representing a collection of key-value pairs. Each node has a type (numeric in the example) and a label. An internal node has a distinguished type (0 in the example) and its label is a hash of the types and labels of its children. Some leaf nodes that have the same distinguished type as internal nodes are labeled by salts. The type-label pairs of the leaf nodes not labeled by salts comprise the collection of key-value pairs represented by the tree.
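To make the definition concrete, here is a minimal Python sketch of a typed hash tree. It is an illustration only, not the encoding used in the paper: the choice of SHA-256, the byte encoding in `encode`, the `Node` class, and the numeric types 1 and 2 for name and birthdate are all assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass, field

INTERNAL = 0  # distinguished type shared by internal nodes and salt leaves


@dataclass
class Node:
    type: int
    label: bytes = b""
    children: list = field(default_factory=list)


def encode(node_type: int, label: bytes) -> bytes:
    # Assumed encoding: 4-byte type, 4-byte label length, then the label.
    return node_type.to_bytes(4, "big") + len(label).to_bytes(4, "big") + label


def compute_label(node: Node) -> bytes:
    """A leaf keeps its own label; an internal node's label is a hash of
    the types and labels of its children."""
    if not node.children:
        return node.label
    h = hashlib.sha256()
    for child in node.children:
        h.update(encode(child.type, compute_label(child)))
    return h.digest()


def key_value_pairs(node: Node) -> list:
    """Collect the (type, label) pairs of the leaves not labeled by salts."""
    if not node.children:
        return [] if node.type == INTERNAL else [(node.type, node.label)]
    return [pair for child in node.children for pair in key_value_pairs(child)]


# Two subtrees, each with a salt leaf (type 0) and a key-value leaf.
tree = Node(INTERNAL, children=[
    Node(INTERNAL, children=[Node(INTERNAL, os.urandom(16)), Node(1, b"Alice")]),
    Node(INTERNAL, children=[Node(INTERNAL, os.urandom(16)), Node(2, b"1985-07-04")]),
])

root_label = compute_label(tree)
pairs = key_value_pairs(tree)
```

In this sketch `pairs` holds the two key-value pairs represented by the tree, and `root_label` is the value an issuer would sign.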

Slide 8 shows an example of pruning a subtree. The nodes of the subtree other than its root (two nodes in the example) are removed from the tree, but the root stays, and its type and label are not modified. The root of the subtree is now a leaf node with type 0, so we refer to its label as a salt, or, more specifically, as a computed salt because it can be computed from the types and labels of the leaf nodes of the subtree.
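The key property of pruning is that the root label of the whole tree does not change, so a signature over the root label stays valid. The following Python sketch checks this on a small tree. The hash function, the byte encoding, the `Node` class and the fixed demo salts are assumptions for the example, not details taken from the slides.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Node:
    type: int
    label: bytes = b""
    children: list = field(default_factory=list)


def encode(node_type: int, label: bytes) -> bytes:
    # Assumed encoding: 4-byte type, 4-byte label length, then the label.
    return node_type.to_bytes(4, "big") + len(label).to_bytes(4, "big") + label


def compute_label(node: Node) -> bytes:
    """Leaves keep their label; internal nodes hash their children."""
    if not node.children:
        return node.label
    h = hashlib.sha256()
    for child in node.children:
        h.update(encode(child.type, compute_label(child)))
    return h.digest()


def prune(node: Node) -> None:
    """Remove the descendants of a subtree root. The root keeps its type
    (0) and its label, which is now called a computed salt."""
    node.label = compute_label(node)
    node.children = []


# Fixed salts are used here only to keep the example short.
name = Node(0, children=[Node(0, b"demo-salt-1"), Node(1, b"Alice")])
birthdate = Node(0, children=[Node(0, b"demo-salt-2"), Node(2, b"1985-07-04")])
root = Node(0, children=[name, birthdate])

before = compute_label(root)
prune(birthdate)  # the birthdate is no longer in the tree
after = compute_label(root)
```

After the call to `prune`, the birthdate leaf is gone, yet `before` and `after` are equal.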

In a rich certificate, as shown in Slide 6, the typed hash tree is represented by a node array and a sparse label array. The node array has entries for the nodes of the tree, listed in depth-first post order, each entry containing the type of the node and the number of its children, but not the label. The sparse label array contains the labels of some of the nodes. The certificate goes through different states shown in Slide 10, and different labels are present in the label array in different states.
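One way to see why a post-order node array plus a sparse label array is sufficient is to recompute the root label from them with a stack. The Python sketch below does that. The representation of the node array as a list of `(type, number_of_children)` tuples, the label array as a dictionary keyed by node index, the hash function, and the byte encoding are all assumptions made for illustration.

```python
import hashlib


def encode(node_type: int, label: bytes) -> bytes:
    # Assumed encoding: 4-byte type, 4-byte label length, then the label.
    return node_type.to_bytes(4, "big") + len(label).to_bytes(4, "big") + label


def root_label(node_array, label_array):
    """Recompute the root label from a depth-first post-order node array
    and a sparse label array holding the labels of the leaf nodes."""
    stack = []
    for index, (node_type, n_children) in enumerate(node_array):
        if n_children == 0:
            # Raises KeyError if the label has been dropped from the array.
            label = label_array[index]
        else:
            children = stack[-n_children:]
            del stack[-n_children:]
            h = hashlib.sha256()
            for child_type, child_label in children:
                h.update(encode(child_type, child_label))
            label = h.digest()
        stack.append((node_type, label))
    if len(stack) != 1:
        raise ValueError("node array does not describe a single tree")
    return stack[0][1]


# Full tree: two subtrees, each a salt leaf plus a key-value leaf.
full_nodes = [(0, 0), (1, 0), (0, 2), (0, 0), (2, 0), (0, 2), (0, 2)]
full_labels = {0: b"demo-salt-1", 1: b"Alice", 3: b"demo-salt-2", 4: b"1985-07-04"}

# Same tree with the second subtree pruned: its root becomes a type-0
# leaf whose label is the computed salt.
computed_salt = root_label([(0, 0), (2, 0), (0, 2)],
                           {0: b"demo-salt-2", 1: b"1985-07-04"})
pruned_nodes = [(0, 0), (1, 0), (0, 2), (0, 0), (0, 2)]
pruned_labels = {0: b"demo-salt-1", 1: b"Alice", 3: computed_salt}

full_root = root_label(full_nodes, full_labels)
pruned_root = root_label(pruned_nodes, pruned_labels)
```

Both arrays yield the same root label, and a leaf label missing from the label array (as in the storage state) makes the computation fail until the label is supplied.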

A typed hash tree can be used to represent any collection of key-value pairs, and does not have any prescribed structure. But the typed hash tree in the figures is the typed hash tree of a rich credential, which has certain kinds of peripheral subtrees. A peripheral node is an internal node whose children are leaf nodes, and a peripheral subtree is a subtree rooted at a peripheral node. Selective disclosure of attributes and selective presentation of verification factors are achieved by pruning peripheral subtrees.

Slide 9 shows the kinds of peripheral subtrees that may be included in the typed hash tree of a rich credential. The first three subtrees in the first row are attribute subtrees, each having a leaf node labeled by a random salt and a leaf node labeled by an attribute, such as the subject's name, birthdate or address in the example. The last subtree in the first row is the password subtree, which has a leaf node labeled by a random salt and a leaf node labeled by a hash of the password and the secret salt, which is indicated in the figure by the acronym SHoHP (Salted Hash of Subject Password). (The salted hash is not present in the label array in the storage state of the certificate.) The three peripheral subtrees in the bottom row are biometric subtrees. A rich credential supports both revocable biometrics and traditional, non-revocable biometrics. The first two subtrees of the bottom row support two revocable biometric modalities (left iris and right iris), each comprising a node labeled by a random salt, a node labeled by a helper datum, and a node labeled by a biometric key. (The biometric key is not present in the label array in the storage state of the certificate.) The last subtree in the bottom row supports a non-revocable biometric modality, comprising a node labeled by a random salt and a node labeled by a biometric template.

During the presentation, the audience asked for a clarification of the different kinds of salts used in a rich credential. There are three kinds of salts: the secret salt that is a component of the credential; the random salts in the peripheral subtrees; and the computed salts that result from pruning peripheral subtrees. The purpose of the secret salt is to mitigate the threat of capture of the password if the subject uses the password by itself to log in to a malicious web site. The operator of the malicious web site learns the password, but cannot use it in connection with the rich credential without the secret salt. The random salts are not secret, because the random salts of peripheral subtrees that have not been pruned are sent to the verifier. But the random salt of a peripheral subtree that has been pruned is not sent, and without it the verifier cannot use the computed salt that labels the root of the peripheral subtree to mount a guessing attack against the label(s) of the other leaf nodes of the pruned peripheral subtree.
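The role of the random salt can be illustrated with a toy guessing attack in Python. A birthdate has little entropy, so if the label of a pruned birthdate subtree were a hash of the birthdate alone, a verifier could enumerate all plausible dates and find the match. The hash function, byte encoding, type number, date range, and helper names below are assumptions chosen for the demonstration.

```python
import hashlib
import os
from datetime import date, timedelta

SALT, BIRTHDATE = 0, 2  # hypothetical type numbers


def encode(node_type: int, label: bytes) -> bytes:
    # Assumed encoding: 4-byte type, 4-byte label length, then the label.
    return node_type.to_bytes(4, "big") + len(label).to_bytes(4, "big") + label


def subtree_label(leaves) -> bytes:
    """Label of a peripheral node: hash of its leaves' types and labels."""
    h = hashlib.sha256()
    for leaf_type, leaf_label in leaves:
        h.update(encode(leaf_type, leaf_label))
    return h.digest()


def guess_birthdate(target: bytes):
    """Try every date from 1900 through 2016 as the only leaf of the subtree."""
    day = date(1900, 1, 1)
    while day <= date(2016, 12, 31):
        if subtree_label([(BIRTHDATE, day.isoformat().encode())]) == target:
            return day
        day += timedelta(days=1)
    return None


secret_birthdate = b"1985-07-04"

# Hypothetical design without a random salt: the guess succeeds.
unsalted = subtree_label([(BIRTHDATE, secret_birthdate)])
found_without_salt = guess_birthdate(unsalted)

# With a 128-bit random salt that is never sent, the same search fails.
salted = subtree_label([(SALT, os.urandom(16)), (BIRTHDATE, secret_birthdate)])
found_with_salt = guess_birthdate(salted)
```

The search recovers the date in the unsalted case and returns `None` in the salted case, because the attacker would also have to guess the random salt.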

A rich certificate can be in one of four states, shown in Slide 10: the issuance state, the storage state, the presentation state, and the verification state. At issuance time, the subject submits one or more biometric samples to the issuer and chooses the password. Also at issuance time, the subject's device generates the key pair and the secret salt, and sends the public key and the salted hash of the password and the secret salt to the issuer. The issuer constructs the rich certificate in the issuance state, where the label array comprises the labels of all the leaf nodes. Before delivering the certificate to the subject's device, the issuer puts it in the storage state by “dropping” the salted hash of the password and the biometric keys, i.e. removing them from the label array. Before presenting the credential, the subject's device prunes the subtrees of the attributes not to be disclosed and the verification factors not to be presented, as agreed upon with the verifier. Then it submits the certificate in the presentation state to the verifier, together with the salted hash of the password entered by the subject. Also, the subject submits a biometric sample for each biometric modality whose subtree has not been pruned. Before verifying the signature, the verifier puts the certificate in the verification state by adding to the label array the salted hash of the password and, for each revocable biometric verification factor being presented, if any, the biometric key computed from its helper datum and the biometric sample submitted for the modality.
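The four states can be walked through in a simplified Python sketch that covers only attributes and the password, leaving out biometrics and the signature itself. Instead of verifying a signature, the sketch compares the recomputed root label with the one the issuer would have signed. The flat list-of-subtrees representation, the use of plain SHA-256 for the salted password hash, the type numbers, and all function names are assumptions made for illustration.

```python
import hashlib
import os

SALT, NAME, BIRTHDATE, PASSWORD_HASH = 0, 1, 2, 3  # hypothetical types


def encode(node_type, label):
    # Assumed encoding: 4-byte type, 4-byte label length, then the label.
    return node_type.to_bytes(4, "big") + len(label).to_bytes(4, "big") + label


def subtree_label(leaves):
    h = hashlib.sha256()
    for leaf_type, leaf_label in leaves:
        h.update(encode(leaf_type, leaf_label))
    return h.digest()


def root_label(subtrees):
    """A subtree is a list of leaves, or bytes (computed salt) if pruned."""
    h = hashlib.sha256()
    for s in subtrees:
        h.update(encode(0, s if isinstance(s, bytes) else subtree_label(s)))
    return h.digest()


def salted_password_hash(password, secret_salt):
    # Placeholder; a real design would likely use a slow password hash.
    return hashlib.sha256(secret_salt + password).digest()


def to_verification_state(presented, submitted_hash):
    """Verifier side: put the submitted salted hash back into the labels."""
    return [s if isinstance(s, bytes) else
            [(t, submitted_hash if t == PASSWORD_HASH and l is None else l)
             for t, l in s]
            for s in presented]


# Issuance: the device sends the salted hash, the issuer builds the tree.
secret_salt = os.urandom(16)  # never leaves the device
issuance = [
    [(SALT, os.urandom(16)), (NAME, b"Alice")],
    [(SALT, os.urandom(16)), (BIRTHDATE, b"1985-07-04")],
    [(SALT, os.urandom(16)),
     (PASSWORD_HASH, salted_password_hash(b"correct horse", secret_salt))],
]
signed_root = root_label(issuance)  # the issuer signs this value

# Storage: the salted hash is dropped from the labels.
storage = issuance[:2] + [[issuance[2][0], (PASSWORD_HASH, None)]]

# Presentation: the birthdate subtree is pruned to its computed salt.
presentation = [storage[0], subtree_label(storage[1]), storage[2]]

# Verification: the verifier fills in the hash submitted by the device.
good = to_verification_state(
    presentation, salted_password_hash(b"correct horse", secret_salt))
bad = to_verification_state(
    presentation, salted_password_hash(b"wrong guess", secret_salt))
accepted = root_label(good) == signed_root
rejected = root_label(bad) != signed_root
```

With the correct password the recomputed root label matches the signed one even though the birthdate was never disclosed, and with a wrong password it does not.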

Slide 10 showed a biometric sample being submitted by the subject's device to the issuer at issuance time and later to the verifier at presentation time. But in Solution 1 the issuance-time biometric sample is a facial image obtained by the DMV during an in-person visit by the subject, and the presentation-time biometric sample is a facial image extracted by the verifier from an audio-visual stream submitted by a native app that may be running in the subject's device or in a separate device. So we have revised Slide 10 to show the biometric sample as being submitted by the subject rather than by the subject's device.

Use of a rich credential in Solution 1 is illustrated in Slide 13, together with Slide 11, which describes the method used for spoofing detection. Details can be found in Section 3 of the paper.

See also:

The Remote Identity Proofing page, with links to other materials related to the project.