# plus/minus epsilon

### White-Box Cryptography: Introduction

1 Feb 2016White-Box Cryptography is the study of securing symmetric encryption algorithms
in the *white-box attack context (WBAC)*, where an adversary obtains an
implementation of the algorithm and is allowed to observe/alter every step of
its execution (with instantiated cryptographic keys).

It is a subfield of the more general *Mobile Cryptography* (coined in the late
90s), which today is more likely to be called *Secure Outsourcing*.

Secure outsourcing handles all cases where you want an adversary to run code for you. There are a lot of natural goals that a cryptographer might like to achieve in this setting. A few examples,

**Obfuscation**I need you to do something, but you can't know what it is you're doing.**Context-Locking**I need you to do something, but only with inputs provided by Alice and such that only Bob knows the output.**Constraint**I want to give you the ability to do something special (sign messages in my name), but only in special cases.**Space-Hardness**,**Sequential Memory-Hardness**I want to give you the ability to do something special, but I want it to be really time-consuming and exhausting because I don't want you to do it that often.

I want to focus on obfuscation. Encrypting a plaintext and obfuscating a program are superficially similar. When encryption was first being explored, it took time to develop a good understanding of what it means for encryption to be secure. Similarly, researchers are now working on constructive definitions of program obfuscation.

### Virtual Black Boxes

An early candidate definition was *virtual black box obfuscation (VBBO)*, which
states that anything which can be learned from the obfuscated program can also
be learned with a polynomial number of queries to an oracle computing the
program.

Digging into this definition, two families of circuits seem important:

**Learnable Circuits**Those which can be efficiently learned by oracle queries.**Normed Circuits**Those which have efficiently computable canonical forms.

Learnable circuits are nice because they can all be virtual black box obfuscated. Define the obfuscation of a learnable circuit $$C$$ as what an efficient learner would output after interacting with an oracle computing $$C$$.

However, normed circuits cause a problem, because there are normed circuits
which are not learnable and this makes them *inherently unobfuscatable* under
this definition.

Take, for example, Ordered Binary Decision
Diagrams (OBDDs). That
is, functions that reference their inputs only once, in a fixed order, and
output 0 or 1. The number of oracle queries it takes to learn an OBDD is
obviously exponential in the number of inputs. But if we try to obfuscate an
OBDD *as another* OBDD, this doesn't work because there are efficient algorithms
to convert it back to a canonical form. The canonical form would let an
adversary distinguish two circuits which are the same everywhere except at a
single input from two circuits which are truly the same at every point (which
strictly access to an oracle wouldn't allow).

Stronger versions of this idea are true for other families of circuits, and this
is why VBBO is believed to be fundamentally flawed. Indeed, maybe VBBO is too
strong and should be weakened, because learning *nothing* from an obfuscation is
a very high bar.

Drawing a parallel to early PRP games: Say that we have a program $$P$$, and we
give an obfuscation $$\mathcal{O}(P - X)$$ to an adversary, which computes $$P$$
on all inputs *not* in $$X$$. Does it matter if our adversary can use the
obfuscation to distinguish $$P$$ from random on inputs in $$X$$, if he can't
compute them? (Meaning that possessing an obfuscation is more powerful than
interacting with an oracle.) It depends on context.

### Obfuscating Ciphers

When obfuscating a cipher, the goal is to hide the key that the cipher was instantiated with. This is usually a precursor to any of the other goals in the intro.

For any cipher $$E$$, there's a family of circuits $${\forall k \in K \colon E_k }$$ which compute encryption for any given key $$k$$. By the definition of a secure cipher, these circuits are not learnable, so we have no trivial way to obfuscate them.

At a high level, obfuscation must consist of mapping each $$E_k$$ to a family $$\mathcal{O}(E_k)$$ which:

- Is exponentially large in a security parameter.
- Consists of circuits of polynomial size.
- Consists of circuits which are functionally similar to $$E_k$$ (similar, meaning that the actual value of $$E_k$$ is easy to recover if it is not given exactly).
- Lacks an efficient way to find $$E_k$$ from $$C \in \mathcal{O}(E_k)$$.

An obfuscation of $$E_k$$ is then just a random element from $$\mathcal{O}(E_k)$$. You can see this idea in action in the construction of many multivariate public-key encryption schemes, but particularly with the Hidden Field Equations cryptosystem. They take a polynomial $$P$$ which is easy-to-invert, and generate random invertible affine transformations $$S, T$$. The hard-to-invert system $$S \circ P \circ T$$ is the public key, and the tuple $$(P, S, T)$$ is the private key. $$S \circ P \circ T$$ is reasonably sized and 'similar' to $$P$$, even though there are an exponential number of choices for $$S$$ and $$T$$. And ideally, it is difficult to recover $$P$$ from this system.

WBC often ends up being very close to multivariate cryptography for this reason, but it is distinguished in that WBC covers the attempted obfuscation of given ciphers like AES and DES as well as the construction of new ciphers that are easily obfuscated.