plus/minus epsilon

White-Box Cryptography: Introduction

1 Feb 2016

White-Box Cryptography is the study of securing symmetric encryption algorithms in the white-box attack context (WBAC), where an adversary obtains an implementation of the algorithm and is allowed to observe/alter every step of its execution (with instantiated cryptographic keys).

It is a subfield of the more general Mobile Cryptography (coined in the late 90s), which today is more likely to be called Secure Outsourcing.

Secure outsourcing handles all cases where you want an adversary to run code for you. There are a lot of natural goals that a cryptographer might like to achieve in this setting. A few examples,

• Obfuscation I need you to do something, but you can't know what it is you're doing.
• Context-Locking I need you to do something, but only with inputs provided by Alice and such that only Bob knows the output.
• Constraint I want to give you the ability to do something special (sign messages in my name), but only in special cases.
• Space-Hardness, Sequential Memory-Hardness I want to give you the ability to do something special, but I want it to be really time-consuming and exhausting because I don't want you to do it that often.

I want to focus on obfuscation. Encrypting a plaintext and obfuscating a program are superficially similar. When encryption was first being explored, it took time to develop a good understanding of what it means for encryption to be secure. Similarly, researchers are now working on constructive definitions of program obfuscation.

Virtual Black Boxes

An early candidate definition was virtual black box obfuscation (VBBO), which states that anything which can be learned from the obfuscated program can also be learned with a polynomial number of queries to an oracle computing the program.

Digging into this definition, two families of circuits seem important:

• Learnable Circuits Those which can be efficiently learned by oracle queries.
• Normed Circuits Those which have efficiently computable canonical forms.

Learnable circuits are nice because they can all be virtual black box obfuscated. Define the obfuscation of a learnable circuit $C$ as what an efficient learner would output after interacting with an oracle computing $C$.

However, normed circuits cause a problem, because there are normed circuits which are not learnable and this makes them inherently unobfuscatable under this definition.

Take, for example, Ordered Binary Decision Diagrams (OBDDs). That is, functions that reference their inputs only once, in a fixed order, and output 0 or 1. The number of oracle queries it takes to learn an OBDD is obviously exponential in the number of inputs. But if we try to obfuscate an OBDD as another OBDD, this doesn't work because there are efficient algorithms to convert it back to a canonical form. The canonical form would let an adversary distinguish two circuits which are the same everywhere except at a single input from two circuits which are truly the same at every point (which strictly access to an oracle wouldn't allow).

Stronger versions of this idea are true for other families of circuits, and this is why VBBO is believed to be fundamentally flawed. Indeed, maybe VBBO is too strong and should be weakened, because learning nothing from an obfuscation is a very high bar.

Drawing a parallel to early PRP games: Say that we have a program $P$, and we give an obfuscation $\mathcal{O}(P - X)$ to an adversary, which computes $P$ on all inputs not in $X$. Does it matter if our adversary can use the obfuscation to distinguish $P$ from random on inputs in $X$, if he can't compute them? (Meaning that possessing an obfuscation is more powerful than interacting with an oracle.) It depends on context.

Obfuscating Ciphers

When obfuscating a cipher, the goal is to hide the key that the cipher was instantiated with. This is usually a precursor to any of the other goals in the intro.

For any cipher $E$, there's a family of circuits ${\forall k \in K \colon E_k }$ which compute encryption for any given key $k$. By the definition of a secure cipher, these circuits are not learnable, so we have no trivial way to obfuscate them.

At a high level, obfuscation must consist of mapping each $E_k$ to a family $\mathcal{O}(E_k)$ which:

1. Is exponentially large in a security parameter.
2. Consists of circuits of polynomial size.
3. Consists of circuits which are functionally similar to $E_k$ (similar, meaning that the actual value of $E_k$ is easy to recover if it is not given exactly).
4. Lacks an efficient way to find $E_k$ from $C \in \mathcal{O}(E_k)$.

An obfuscation of $E_k$ is then just a random element from $\mathcal{O}(E_k)$. You can see this idea in action in the construction of many multivariate public-key encryption schemes, but particularly with the Hidden Field Equations cryptosystem. They take a polynomial $P$ which is easy-to-invert, and generate random invertible affine transformations $S, T$. The hard-to-invert system $S \circ P \circ T$ is the public key, and the tuple $(P, S, T)$ is the private key. $S \circ P \circ T$ is reasonably sized and 'similar' to $P$, even though there are an exponential number of choices for $S$ and $T$. And ideally, it is difficult to recover $P$ from this system.

WBC often ends up being very close to multivariate cryptography for this reason, but it is distinguished in that WBC covers the attempted obfuscation of given ciphers like AES and DES as well as the construction of new ciphers that are easily obfuscated.