A Formal Proof of Omission-Tolerant Integrity Protection

This is the fourth and last part of a series on omission-tolerant integrity protection and related topics. See also part 1, part 2, and part 3. A technical report on the topic is available on this site and in the IACR ePrint Archive.

In part 3 we saw how the system parameterization concept introduced by Boneh and Shoup in their Graduate Course in Applied Cryptography makes it possible to provide a formal definition of collision resistance for keyless hash functions. In this last post I will explain how that definition, and the security and system parameters used in the definition, are used in the proof of Theorem 3 of the technical report, which states that the root label of a typed hash tree can serve as an omission-tolerant checksum of an encoding of a set of key-value pairs.

Constructing an encoding with checksum of a set of key value-pairs dependent on security and system parameters

In part 2 I described the structure of a typed hash tree and gave an example of how a typed hash tree can be used to construct a bit-string encoding of a set of key-value pairs in a way that allows the root label of the tree to serve as an omission-tolerant checksum of the set. Now I'll show how the encoding and its checksum can be made dependent on the security and system parameters. The role of the security and system parameters in the construction is to select the hash function used to compute the labels of the internal nodes of the tree, from the family of hash functions defined by a hashing scheme with system parameterization. The encoding algorithm takes the following three steps.

First, an ordered tree is constructed with a shape freely chosen by the algorithm, the key and value of each key-value pair are encoded as bit strings and assigned as the type and label of a leaf node, and a distinguished type other than any type used to encode a key is assigned to the internal nodes and any remaining leaf nodes. Nodes with the distinguished type are called distinguished nodes, and distinguished leaf nodes are called dangling nodes. This first step does not depend on the security and system parameters. It produces an incomplete typed hash tree without the distinguished node labels.

Second, strings with a bit length equal to the bit length of output the hash function are generated at random with uniform distribution and assigned to the dangling nodes. This step depends on the security parameter, which determines the bit length of the output of the hash function. (Although the security and system parameters together determine the hash function, the security parameter by itself determines the bit length of its output.) It produces an incomplete typed hash tree without internal node labels.

Third, the tree is serialized to produce the bit-string encoding of the set of key-value pairs.

A checksum algorithm computes the omission-tolerant checksum by deserializing the bit-string encoding, computing the labels of the internal nodes, and outputting the label of the root node. The computation may fail if it encounters a prelabel of an internal node (defined as the sequence of the types and labels of its children that is input to the hash function to compute its label) that is too long for the hash function. In that case the checksum algorithm fails and stops without output.

Attack games as algorithms that take other algorithms as inputs

Security reductions based on attack games are often constructed according to the following basic pattern. A security property of a cryptographic scheme, or cryptosystem, is defined by an attack game played between a challenger and an adversary, and is said to hold if the probability that an efficient adversary has of winning the game is a negligible function of the security parameter. (By being efficient we mean, as usual, running within time polynomial on the security parameter; Boneh and Shoup use a more complex definition that allows for non-termination.) To prove that a security property of a cryptosystem 𝔄 defined by game A holds under the assumption that a security property of a cryptosystem 𝔅 defined by game B holds, one assumes that there exists an efficient adversary 𝒜 able to win game A with a probability that is a non-negligible function of the security parameter, and uses 𝒜 to construct an efficient adversary ℬ that wins game B with a probability that is likewise a non-negligible function of the security parameter.

For lack of a better term I shall refer to the games A and B used in the basic pattern as the antecedent and consequent games respectively, alluding to the fact that the existence of 𝒜 implies the existence of ℬ. The contrapositive of this implication means that the security property of cryptosystem 𝔄 defined by game A implies the security property of cryptosystem 𝔅 defined by game B.

The basic attack-game pattern specifies how the challenger and the adversary behave, but not how they are formalized. In Section 2.4.3 of their online textbook Boneh and Shoup formalize the challenger and the adversary of an attack game as a pair of interactive machines. We use a simpler formalization inspired by modern programming languages like JavaScript that are both imperative and functional, and where functions can be passed as arguments. As described in more detail in Section 2.2 of the technical report, the challenger is an algorithm, and the adversary is another algorithm that is provided to the challenger as an input, just as a function may be passed as an argument to another function in JavaScript. In a security reduction that follows the basic pattern, the efficient adversary of game B is constructed as an algorithm that runs the challenger algorithm of game A, providing it as input the efficient adversary assumed to win game A with a probability that is a non-negligible function of the security parameter.

In the attack games of the technical report the adversary does not make any queries. In games involving queries, the oracle answering the queries would be modeled as an algorithm that the challenger would provide as an input to the adversary.

The proof of Theorem 3 extends the basic attack-game pattern in two ways: there are two consequent games instead of one, and the adversary of each consequent game modifies the challenger and the adversary of the antecedent game.

Games used in the proof of Theorem 3

If the typed hash tree used in the encoding has dangling nodes, then the omission-tolerant integrity protection provided by the root label depends on both the collision resistance and the preimage resistance of the hash function. Therefore the proof of Theorem 3 uses two consequent games: the attack game against collision resistance discussed in part 3 (Game 1 in our technical report, or Figure 8.3 in Boneh and Shoup's online book), and an attack game against preimage resistance (Game 2 in the report, not provided in Boneh and Shoup's book). A hashing scheme is collision resistant according to Definition 18 (resp. preimage resistant according to Definition 19) if every efficient adversary has a probability of winning Game 1 (resp. Game 2) that is a negligible function of the security parameter.

The antecedent game used in the proof of Theorem 3 is Game 4 of the technical report. I won't try to paraphrase it here, but I'll point out the differences with Game 3, using the notations and variable names of the technical report. Game 3 is the game used in the definition of integrity protection (Definition 22), while Game 4 is used in the definition of omission-tolerant integrity protection (Definition 24).

One difference is unsurprising: in both games, the adversary has to produce an encoding y' of an element x' of S that has the same checksum z as y; but in Game 3 x' must be different from x while in Game 4, where the elements of the set S are themselves sets, x' must not be a subset of x.

The other difference may seem puzzling. In Game 4, the encoding algorithm E is split into two phases called E_I and E_II. The first phase E_I takes as input the element x of S to be encoded and is meant to produce a fixed representation X of x that does depend on the security and system parameters λ and Λ. The second phase E_II takes λ, Λ and X as inputs to produce the bit string-encoding y of x. When Game 4 is used in the context of Theorem 3, E_I performs the first of the three steps of the encoding algorithm for sets of key-value pairs and outputs a typed hash tree X without distinguished node labels, while E_II performs step 2, which assigns random labels to the dangling nodes of X, and step 3, which serializes the tree. E_II is invoked by the challenger after the challenger has run the system parameterization algorithm P on input λ to obtain Λ, while the output X of E_I is a precomputed game parameter.

The reason for splitting E into two phases is to allow Game 4 to be used in the proof of Theorem 3, where the encoding algorithm is a hybrid non-deterministic algorithm that makes both probabilistic and non-probabilistic choices. As discussed in Section 2.1 of the technical report, the output of a hybrid algorithm is a well-defined random variable only if the non-probabilistic choices are made before the probabilistic ones. Phase I of E is meant to make all the non-probabilistic choices, so that when the adversary of Game 1 or Game 2 runs the challenger of Game 4 in the security reduction of the proof of Theorem 3, the non-probabilistic choices are made before the challenger runs the system parameterization algorithm P, which makes probabilistic choices to compute the system parameter Λ.

Summary of the proof of Theorem 3

Theorem 3 proves that the root label of a typed hash tree provides omission-tolerant integrity protection, as defined by Game 4, for an encoding of a set of key-value pairs obtained as a serialization of a typed hash tree.

The proof begins by assuming that there exists an efficient adversary 𝒜 whose probability of winning Game 4 is a non-negligible function p of the security parameter λ. In an execution of Game 4 where 𝒜 wins, the challenger runs E_II on a fixed representation X of an element x of S and produces an encoding y of x with checksum z, which it provides to 𝒜 as an input. Then 𝒜 outputs an encoding y' of an element x' of S that has the same checksum z even though x' is not a subset of x.

In the context of Theorem 3, z is the root label of a typed hash tree Z obtained by deserializing y, as well as the root label of a typed hash tree Z' obtained by running the deserialization algorithm on y'. Since Z and Z' have the same root label, by Theorem 1, either (i) Z' has been obtained from Z by pruning subtrees, or (ii) the label of an internal node of Z' is equal to the label of a dangling node of Z, or (iii) an internal node of Z' has the same label as an internal node of Z but a different prelabel.

Case (i) is ruled out by the fact that x' is a subset of x. In case (ii), the prelabel of the internal node is a preimage of a random element of the codomain of the hash function, because Z is obtained by assigning random labels to the dangling nodes of X in step 2 of the encoding algorithm for sets of key-value pairs. In case (iii), the prelabels of the internal nodes that have the same label constitute a collision of the hash function.

The probability p(λ) that 𝒜 wins Game 4 is no greater than the sum of the probability p'₀(λ) that it does so and furthermore case (iii) holds, and the probabilities p'_i(λ) that it does so and furthermore case (ii) holds with the label of the i-th dangling node of Z being equal to the label of an internal node of Z'. Therefore if p is a non-negligible function, then either p'₀ or one of the p'_i(λ) must be a non-negligible function.

If p'₀ is non-negligible, an efficient adversary 𝒜_CR of Game 1 can be constructed as follows. 𝒜_CR runs Game 4 passing 𝒜 as an input to the challenger; but it modifies the challenger of Game 4 so that it uses the system parameter Λ generated by the challenger of Game 1 instead of generating a fresh one. It also modifies 𝒜 so that it outputs Z and Z' if it wins the game. When that is the case, 𝒜_CR looks for a node of Z and a node of Z' that have the same label, and outputs their prelabels if it finds them. 𝒜_CR wins Game 1 with non-negligible probability p'_0(λ).

If p'_i is non-negligible for positive i, an efficient adversary 𝒜_PR that wins Game 2 with non-negligible probability p'_i(λ) is constructed similarly. In this case 𝒜_PR modifies the challenger of Game 4 not only so that it uses the Λ generated by the challenger of Game 2, but also so that it assigns the random element of the codomain of the hash function generated by the challenger of Game 2 to the i-th dangling node of Z.

Therefore if the root label z of the typed hash tree does not provide omission-tolerant integrity protection for the encoding y of the set x of key-value pairs, then either the hash function of the typed hash tree is not collision-resistant, or it is not preimage-resistant.