* This is the fourth and last part of a series on omission-tolerant integrity
protection and related topics.
See also
part 1,
part 2,
and
part 3.
A technical report on the topic is available
on this site and
in the IACR ePrint Archive.
*

In part
3 we saw how the *system parameterization* concept introduced
by Boneh and Shoup in their
Graduate
Course in Applied Cryptography makes it possible to provide a
formal definition of *collision resistance* for keyless hash
functions. In this last post I will explain how that definition, and
the security and system parameters used in the definition, are used in
the proof of Theorem 3 of
the technical report, which states
that the root label of a typed hash tree can serve as an
omission-tolerant checksum of an encoding of a set of key-value pairs.

### Constructing an encoding with checksum of a set of key value-pairs dependent on security and system parameters

In part 2 I described the structure of a typed hash tree and gave an example of how a typed hash tree can be used to construct a bit-string encoding of a set of key-value pairs in a way that allows the root label of the tree to serve as an omission-tolerant checksum of the set. Now I’ll show how the encoding and its checksum can be made dependent on the security and system parameters. The role of the security and system parameters in the construction is to select the hash function used to compute the labels of the internal nodes of the tree, from the family of hash functions defined by a hashing scheme with system parameterization. The encoding algorithm takes the following three steps.

First, an ordered tree is constructed with a shape freely chosen by the algorithm, the key and value of each key-value pair are encoded as bit strings and assigned as the type and label of a leaf node, and a distinguished type other than any type used to encode a key is assigned to the internal nodes and any remaining leaf nodes. Nodes with the distinguished type are called distinguished nodes, and distinguished leaf nodes are called dangling nodes. This first step does not depend on the security and system parameters. It produces an incomplete typed hash tree without the distinguished node labels.

Second, strings with a bit length equal to the bit length of output the hash function are generated at random with uniform distribution and assigned to the dangling nodes. This step depends on the security parameter, which determines the bit length of the output of the hash function. (Although the security and system parameters together determine the hash function, the security parameter by itself determines the bit length of its output.) It produces an incomplete typed hash tree without internal node labels.

Third, the tree is serialized to produce the bit-string encoding of the set of key-value pairs.

A checksum algorithm computes the omission-tolerant checksum by deserializing the bit-string encoding, computing the labels of the internal nodes, and outputting the label of the root node. The computation may fail if it encounters a prelabel of an internal node (defined as the sequence of the types and labels of its children that is input to the hash function to compute its label) that is too long for the hash function. In that case the checksum algorithm fails and stops without output.

### Attack games as algorithms that take other algorithms as inputs

Security reductions based on attack games are often constructed
according to the following basic pattern. A security property of a
cryptographic scheme, or cryptosystem, is defined by an attack game
played between a challenger and an adversary, and is said to hold if
the probability that an efficient adversary has of winning the game is
a negligible function of the security parameter. (By being efficient we
mean, as usual, running within time polynomial on the security
parameter; Boneh and Shoup use a more complex definition that allows
for non-termination.) To prove that a security property of a
cryptosystem 𝔄 defined by game *A* holds under the assumption that
a security property of a cryptosystem 𝔅 defined by game *B*
holds, one assumes that there exists an efficient adversary 𝒜
able to win game *A* with a probability that is a non-negligible
function of the security parameter, and uses 𝒜 to construct an
efficient adversary ℬ that wins game *B* with a probability
that is likewise a non-negligible function of the security parameter.

For lack of a better term I shall refer to the games *A*
and *B* used in the basic pattern as the antecedent and
consequent games respectively, alluding to the fact that the existence
of 𝒜 implies the existence of ℬ. The contrapositive of
this implication means that the security property of cryptosystem
𝔄 defined by game *A* implies the security property of
cryptosystem 𝔅 defined by game *B*.

The basic attack-game pattern specifies how the challenger and the
adversary behave, but not how they are formalized. In Section 2.4.3
of
their online
textbook Boneh and Shoup formalize the challenger and the
adversary of an attack game as a pair of interactive machines. We use
a simpler formalization inspired by modern programming languages like
JavaScript that are both imperative and functional, and where
functions can be passed as arguments. As described in more detail in
Section 2.2 of the technical
report, the challenger is an algorithm, and the adversary is
another algorithm that is provided to the challenger as an input, just
as a function may be passed as an argument to another function in
JavaScript. In a security reduction that follows the basic pattern,
the efficient adversary of game *B* is constructed as an
algorithm that runs the challenger algorithm of game *A*,
providing it as input the efficient adversary assumed to win
game *A* with a probability that is a non-negligible function of
the security parameter.

In the attack games of the technical report the adversary does not make any queries. In games involving queries, the oracle answering the queries would be modeled as an algorithm that the challenger would provide as an input to the adversary.

The proof of Theorem 3 extends the basic attack-game pattern in two ways: there are two consequent games instead of one, and the adversary of each consequent game modifies the challenger and the adversary of the antecedent game.### Games used in the proof of Theorem 3

If the typed hash tree used in the encoding has dangling nodes, then the omission-tolerant integrity protection provided by the root label depends on both the collision resistance and the preimage resistance of the hash function. Therefore the proof of Theorem 3 uses two consequent games: the attack game against collision resistance discussed in part 3 (Game 1 in our technical report, or Figure 8.3 in Boneh and Shoup’s online book), and an attack game against preimage resistance (Game 2 in the report, not provided in Boneh and Shoup’s book). A hashing scheme is collision resistant according to Definition 18 (resp. preimage resistant according to Definition 19) if every efficient adversary has a probability of winning Game 1 (resp. Game 2) that is a negligible function of the security parameter.

The antecedent game used in the proof of Theorem 3 is Game 4 of the technical report. I won’t try to paraphrase it here, but I’ll point out the differences with Game 3, using the notations and variable names of the technical report. Game 3 is the game used in the definition of integrity protection (Definition 22), while Game 4 is used in the definition of omission-tolerant integrity protection (Definition 24).

One difference is unsurprising: in both games, the adversary has to
produce an encoding *y’* of an element *x’* of *S* that
has the same checksum *z* as *y*; but in Game 3 *x’*
must be different from *x* while in Game 4, where the elements of
the set *S* are themselves sets, *x’* must not be a subset
of *x*.

The other difference may seem puzzling. In Game 4, the encoding
algorithm *E* is split into two phases
called *E*_{I} and *E*_{II}. The first
phase *E*_{I} takes as input the element *x*
of *S* to be encoded and is meant to produce a fixed
representation *X* of *x* that does depend on the security
and system parameters λ and Λ. The second
phase *E*_{II} takes λ, Λ and *X* as
inputs to produce the bit string-encoding *y* of *x*. When
Game 4 is used in the context of Theorem 3, *E*_{I}
performs the first of the three steps of the encoding algorithm for
sets of key-value pairs and outputs a typed hash tree *X* without
distinguished node labels, while *E*_{II} performs step
2, which assigns random labels to the dangling nodes of *X*, and
step 3, which serializes the tree. *E*_{II} is invoked
by the challenger after the challenger has run the system
parameterization algorithm *P* on input λ to obtain
Λ, while the output *X* of *E*_{I} is a
precomputed game parameter.

The reason for splitting *E* into two phases is to allow Game 4
to be used in the proof of Theorem 3, where the encoding algorithm is
a *hybrid non-deterministic algorithm* that makes both
probabilistic and non-probabilistic choices. As discussed in Section
2.1 of the technical report, the
output of a hybrid algorithm is a well-defined random variable only if
the non-probabilistic choices are made before the probabilistic ones.
Phase I of *E* is meant to make all the non-probabilistic
choices, so that when the adversary of Game 1 or Game 2 runs the
challenger of Game 4 in the security reduction of the proof of Theorem
3, the non-probabilistic choices are made before the challenger runs
the system parameterization algorithm *P*, which makes
probabilistic choices to compute the system parameter Λ.

### Summary of the proof of Theorem 3

Theorem 3 proves that the root label of a typed hash tree provides omission-tolerant integrity protection, as defined by Game 4, for an encoding of a set of key-value pairs obtained as a serialization of a typed hash tree.

The proof begins by assuming that there exists an efficient adversary
𝒜 whose probability of winning Game 4 is a non-negligible
function *p* of the security parameter λ. In an execution
of Game 4 where 𝒜 wins, the challenger
runs *E*_{II} on a fixed representation *X* of an
element *x* of *S* and produces an encoding *y*
of *x* with checksum *z*, which it provides to 𝒜 as
an input. Then 𝒜 outputs an encoding *y’* of an
element *x’* of *S* that has the same checksum *z* even
though *x’* is not a subset of *x*.

In the context of Theorem 3, *z* is the root label of a typed
hash tree *Z* obtained by deserializing *y*, as well as the
root label of a typed hash tree *Z’* obtained by running the
deserialization algorithm on *y’*. Since *Z* and *Z’*
have the same root label, by Theorem 1, either (i) *Z’* has been
obtained from *Z* by pruning subtrees, or (ii) the label of an
internal node of *Z’* is equal to the label of a dangling node
of *Z*, or (iii) an internal node of *Z’* has the same label
as an internal node of *Z* but a different prelabel.

Case (i) is ruled out by the fact that *x’* is a subset
of *x*. In case (ii), the prelabel of the internal node is a
preimage of a random element of the codomain of the hash function,
because *Z* is obtained by assigning random labels to the
dangling nodes of *X* in step 2 of the encoding algorithm for
sets of key-value pairs. In case (iii), the prelabels of the internal
nodes that have the same label constitute a collision of the hash
function.

The probability *p*(λ) that 𝒜 wins Game 4 is no
greater than the sum of the
probability *p’*_{0}(λ) that it does so and
furthermore case (iii) holds, and the
probabilities *p’*_{i}(λ) that it does so and
furthermore case (ii) holds with the label of the *i*-th dangling
node of *Z* being equal to the label of an internal node
of *Z’*. Therefore if *p* is a non-negligible function,
then either *p’*_{0} or one of
the *p’*_{i}(λ) must be a non-negligible function.

If *p’*_{0} is non-negligible, an efficient adversary
𝒜_{CR} of Game 1 can be constructed as follows.
𝒜_{CR} runs Game 4 passing 𝒜 as an input to
the challenger; but it modifies the challenger of Game 4 so that it
uses the system parameter Λ generated by the challenger of Game
1 instead of generating a fresh one. It also modifies 𝒜 so
that it outputs *Z* and *Z’* if it wins the game. When that
is the case, 𝒜_{CR} looks for a node of *Z* and a
node of *Z’* that have the same label, and outputs their
prelabels if it finds them. 𝒜_{CR} wins Game 1 with
non-negligible probability *p’*_0(λ).

If *p’*_{i} is non-negligible for positive *i*, an
efficient adversary 𝒜_{PR} that wins Game 2 with
non-negligible probability *p’*_{i}(λ) is
constructed similarly. In this case 𝒜_{PR} modifies
the challenger of Game 4 not only so that it uses the Λ
generated by the challenger of Game 2, but also so that it assigns the
random element of the codomain of the hash function generated by the
challenger of Game 2 to the *i*-th dangling node of *Z*.

Therefore if the root label *z* of the typed hash tree does not
provide omission-tolerant integrity protection for the
encoding *y* of the set *x* of key-value pairs, then either the
hash function of the typed hash tree is not collision-resistant, or it
is not preimage-resistant.