Using an Omission-Tolerant Checksum for Selective Disclosure of Attributes Asserted by a Public Key Certificate

This is part 2 of a series on omission-tolerant integrity protection and related topics. A technical report on the topic is available on this site and in the IACR ePrint Archive.

The first post of this series introduced a new technical report that describes the concept of an omission-tolerant checksum in a broader context than the rich credentials paper and includes a formal proof of security in an asymptotic security setting. In this post I give an example showing how an omission-tolerant checksum implemented using a typed hash tree can provide selective disclosure of attributes asserted by a public key certificate.

Typed hash trees vs. Merkle trees

A typed hash tree is a hash tree where every node has a type in addition to a label, and the label of an internal node is a cryptographic hash of an encoding of the types and labels of its children. A distinguished type is assigned to all the internal nodes, and may also be assigned to some of the leaf nodes.

Hash trees were first proposed by Merkle. In a Merkle tree, each internal node is labeled by a hash of the concatenation of the labels of its children, and each leaf node is labeled by a hash of a data block. In both a typed hash tree and a Merkle tree, a subtree can be pruned without modifying the root label of the tree. (By “pruning a subtree” we mean removing the descendants of an internal node.) But in the case of a Merkle tree this is a “bug” to be mitigated if the tree is to provide integrity protection, while in the case of a typed hash tree it is a “feature” that allows the root label of the tree to be used as an omission-tolerant checksum.

The root label of a Merkle tree is a checksum of a collection of data blocks whose hashes label the leaf nodes. The root label of a typed hash tree, on the other hand, is a checksum of a collection of key-value pairs. Each key-value pair is represented by a leaf node with a type other than the distinguished type, the type and label of the node being bit-string encodings of the key and value components of the key pair. (The types and labels of all leaf nodes contribute to the value of the root label, but only the types and labels and leaf nodes with undistinguished types are encodings of keys and values of the collection.) Pruning a subtree causes data blocks in the case of a Merkle tree, or key-value pairs in the case of typed hash tree, to be removed from the checksummed collection without modifying the checksum. But in the case of a Merkle tree, it also causes an extraneous data block to be added to the collection, viz. the concatenation of the labels of the children of the root of the subtree. In the case of a typed hash tree, on the other hand, nothing is added to the collection, because a leaf node with a distinguished type does not represent an element of the collection of key-value pairs.

Selective disclosure of attributes asserted by a public key certificate

Here is an example of selective disclosure of attributes using the root label of a typed hash tree as an omission-tolerant checksum.

Suppose you are an employee of the US Government with security clearance. Suppose you log in to a government web site using the public key certificate in your PIV card. If the site does not require security clearance, you don't want it to know whether you have one, or what level of clearance you have, so the PIV certificate does not have security clearance information. But if the site requires security clearance, it needs to know the information, so it has to look it up via an API, which introduces availability and security risks if the site that hosts the API is liable to be down or compromised.

The idea of selective disclosure is to include the selective disclosure information in the certificate but only disclose it as needed. This can be done by encoding all your attributes as key-value pairs in leaf nodes of a typed hash tree, with the security clearance information placed in a subtree that may be pruned when the information does not have to be disclosed. The tree is serialized into a subject-info bit string, omitting the bulky labels of the internal nodes since they can be computed from the types and labels of the leaf nodes. The subject-info bit string is included in a public key certificate whose signature covers metadata (such as validity period, serial number, issuer ID, etc.) and the root label of the tree computable from the bit string, but not the bit string itself. This requires a selective disclosure certificate with a format that differs from X.509. Alternatively, the root label may be placed together with the metadata in an X.509 certificate while the subject-info bit string is separate from the certificate but presented along with the certificate when the certificate is submitted to a verifier.

Now assume that you have a selective disclosure certificate in your PIV card, with a subject-info bit string that contains your security clearance information. For simplicity, assume that such information consists only of a security clearance level with two possible values, “Secret”, or “Top Secret”. Say your level is “Top Secret”. The security-clearance subtree consists of a root node with two children, a security-level node and a “salt node”; I'll describe the salt node and its purpose momentarily. The security-level node is a leaf node whose type is a bit-string indicating that its label is a security level. In your case, the label is a bit-string encoding of “Top Secret”.

Let's assume that a new version of TLS allows a selective disclosure certificate to be used as a TLS client certificate, and suppose you visit a web site that does not require security clearance. Your client software recognizes this and modifies your selective disclosure certificate as follows, without invalidating the signature. It deserializes the subject-info bit string, obtaining a typed hash tree without internal-node labels. It prunes the selective disclosure subtree, removing the selective disclosure node and the salt node, but leaving the root node of the subtree in place. The type and label of the root node are not modified, but the node becomes a leaf node whose label must be present in the serialization, so the client software computes its label from the types and labels of its children and adds it to the pruned tree before serializing it. Then the client software replaces the original subject-info bit string in the certificate with the serialization of the pruned tree, and submits the modified certificate to the web site.

To verify the signature in the certificate the web site deserializes the pruned tree and computes its root label from the types and labels of the leaf nodes. Then it computes a hash of the certificate metadata and the root label. This is the same hash that was signed by the certificate issuer, even though the subject-info bit string has been modified and no longer contains security-clearance information.

The web site cannot tell whether you have a security clearance or not. This is because the typed hash tree should include a secure-clearance subtree even if there is no security clearance. In that case the subtree consists of a single leaf node with a distinguished type and a random label that is indistinguishable from a cryptographic hash.

Even though the certificate that you submit to the web site contains no security clearance information, a malicious insider at the web site may try to figure it out by means of a brute-force guessing attack. If the pruned security-clearance subtree consisted only of its root node and the security-level node this would be very easy to do, since there are only two possible levels. The malicious insider would construct the security-clearance subtree first with “Secret” and then with “Top Secret” as the label of the leaf node. He or she would compute the label of the root node of the subtree in both cases, find out that in the second case it agrees with the label of the root node of the pruned subtree, and conclude that you have a Top Secret clearance. But the security-clearance subtree has the salt node as a second leaf node. If the salt node has a random high-entropy label, the guessing attack will fail because the malicious insider will be unable to construct guesses of the label of the root node of the subtree without knowing the label of the salt node.

In practice, the type of the salt node does not matter. In the asymptotic security setting of Section 5 of technical paper, however, it makes sense for the type of the salt node to be the distinguished type. This is because the label of a node whose type is the distinguished type, whether the node is an internal node labeled by a cryptographic hash or a leaf node with a random label, has a bit length equal to the width of the output of the hash function, which grows with the security parameter, while the labels of other nodes are independent of the security parameter.

We refer to a leaf node that is labeled by a distinguished type as a dangling node. We have seen three examples of dangling nodes, used for three different purposes. First, when there is no security clearance, a dangling node is used as the root of the security-clearance subtree to hide the fact that there is no security clearance. Second, when there is a security clearance and the subject-info bit string is presented to a web site that does not require clearance, the root of the security-clearance subtree becomes a dangling node when the subtree is pruned; this is what allows a typed hash tree to provide integrity protection, in contrast with a Merkle tree. Finally, a dangling node is included as a salt node in a security-clearance subtree to defeat guessing attacks on the security level.