# plus/minus epsilon

### Notes on the BN256 Pairing

17 Mar 2018

I've realized that some design choices in my bn256 implementation don't seem well-motivated to everybody... including myself. So I'd like to document here all of the tricks I find myself forgetting.

### Subgroup Security

<b>Background.</b> Imagine that I have a secret scalar $a$ and I'll multiply my $a$ by any point $P$ that you give me, and return the output $[a]P$ to you as long as $P$ passes a simple publicly-known test. Your goal is to learn $a$.

If I don't check anything about your point (and we're operating on a Weierstrass curve $y^2=x^3+ax+b$), your best bet is an Invalid Curve Attack. See, elliptic curve addition and doubling formulas only incorporate the $a$ coefficient of a Weierstrass curve. So if you give me any random $P = (x, y)$ and I start doing arithmetic on it, I'm operating on an implicit curve $y^2=x^3+ax+c$ where $c \neq b$ is chosen to satisfy the equation, and this curve will likely have a drastically different structure than the one I intended to work on. In particular, the implicit curve may have subgroups which are small enough for the discrete logarithm problem (DLP) to be solvable by brute force, and you may choose $P$ to lie in one of these subgroups. When you get back $[a]P$, you can solve the DLP and learn $a\ (mod\ n)$ where $n$ is the order of the small subgroup. Given $a\ (mod\ n)$ for a handful of different relatively-prime $n$, it's easy to reconstruct the full 256-bit value of $a$ with the Chinese-Remainder Theorem.

This makes it necessary for me to check that your point is on the curve, which can easily be done by checking that $(x, y)$ satisfies the curve formula. Is this sufficient? Sometimes yes, but not always: it's possible that the correct curve also has subgroups where DLP is easy, outside of the subgroup we're using for cryptography where DLP is hard. In this case, you would mount an Invalid Subgroup Attack, which is exactly the same as an Invalid Curve Attack but restricted to just using different subgroups of the same curve.

Should a sufficient number of small subgroups exist on the curve to make an Invalid Subgroup Attack feasible, then I'd need to check both that your point is on the curve as well as in the correct subgroup. This is quite expensive because I'd need to multiply $P$ by the 256-bit order of the correct subgroup and check that the result is the point at infinity, which is at least as expensive as multiplying $P$ by $a$ in the first place. (Side note: Subgroup security for Montgomery curves is different than discussed, and involves Twist Attacks; there are no Twist Attacks on a Weierstrass curve.)

<b>Discussion.</b> I've given all this background to say that my bn256 implementation implements solely IsOnCurve checks on both G1 and G2. This is sufficient for G1 because it is a prime curve, meaning it has only one prime-order subgroup, and that's the subgroup we use for cryptography. G2, on the other hand, is quite a large curve and has many subgroups which are not used for cryptography. However we can compute G2's order and see that these subgroups aren't sufficient to launch and Invalid Subgroup Attack.

The full order of $E^\prime(\mathbb{F}_{p^2})$ is given by formula, thanks to the way BN curves are constructed: $\#E^\prime(\mathbb{F}_{p^2}) = h_2(u) \cdot q$ where $u$ is the BN-parameter, $q$ is the order of the subgroup we're using for cryptography, and

$h_2(u) = 36u^4 + 36u^3 + 30u^2 + 6u + 1.$

Finally, $\#E^\prime(\mathbb{F}_{p^2}) =$

13 *
7369 *
678523854563781785784486348657681406965565833599613983254937284932557401 *
65000549695646603732796438742359905742570406053903786389881062969044166799969


The final, largest factor is $q$. I confess that it is clearly ideal to not accept adversary-provided points for G2 and rely on the stronger structure of G1. But if it is unavoidable, in order to recover a secret scalar an adversary would likely have to solve DLP in the subgroup corresponding to the third prime factor. This would take around $2^{100}$ steps.

[1]

### Lattice Decomposition

Lattice Decomposition is a generic technique for speeding up scalar multiplication on certain kinds of elliptic curves. I open-sourced an implementation of this technique as an un-merged feature branch.

<b>Background.</b> Take the curve $y^2 = x^3 + 3\ (mod\ 7)$ (playground). The first interesting aspect of this curve it that it has 13 points, so it's a prime-order curve. You can choose any point on this curve and say that it maps to 1 (besides the point at infinity, which maps to 0), and from there build up a map from any other point on the curve to an integer 2 through 12 which respects both addition on the curve and integer addition mod 13. This map is a group isomorphism.

The second interesting aspect is that both the 'base field' (integers mod 7) and 'curve field' (integers mod 13) have cubic roots of unity, or solutions to the equation $n^3 = 1$. For the base field, this number is 2 since $2^3 = 8 = 1 (mod\ 7)$, and for the curve field this number is 3 since $3^3 = 27 = 1\ (mod 13)$. Cubic roots of unity are nice because we can multiply the x-coordinate of a point and get another point on the curve since $y^2 = (xn)^3 + 3 = x^3 + 3$, and they ultimately end up playing well with the elliptic curve's addition law. That is, the function $f(x, y) = (2x, y)$ on our curve is equivalent to multiplying the point $(x, y)$ by 3. You can see this in the playground, where for example, $[3](1, 2) = (2, 2)$, then $[3](2, 2) = (4, 2)$, and finally $[3](4, 2) = (1, 2)$.

From the point of view of optimization, what we've found is a way to multiply a curve point by a fixed scalar with single base field operation. For curves of a cryptographically-useful size, a single base field multiplication is going to be more than 1000x faster than scalar multiplication on the curve. Now we just need to find a way to harness this speed-up to multiply by any scalar, not just our cubic root of unity.

<b>Lattices.</b> Our strategy for integrating this information into our scalar multiplication routine is to take the user-provided point $P$, use this trick to quickly compute $[S]P$ ($S = 3$ in the example above), multiply $P$ and $[S]P$ by different (hopefully small) scalars $x$ and $y$, and output $[x]P + [y][S]P = [x+Sy]P = [k]P$ where $k$ is the user-provided scalar. There's something called Shamir's Trick that makes computing $[x]P+[y]Q$, given $P$ and $Q$, about as fast as computing $[x]P$ or $[y]Q$ individually; it goes like this:

// combinedMult returns [x]P+[y]Q
func combinedMult(P, Q *Point, x, y *big.Int) *Point {
acc := new(Point).SetZero()

for (i := max(x.BitLen(), y.BitLen())-1; i >= 0; i-- {
acc.Double()

if (x.Bit(i) == 1 && y.Bit(i) == 1 {
} else if (x.Bit(i) == 1) {
} else if (y.Bit(i) == 1) {
}
}

return acc
}

So by minimizing the size of both $x$ and $y$, we minimize the time spent in the algorithm. Then what is the smallest $x, y$ pair such that $f(x, y) = x+Sy\ (mod\ q) = k$ where $q$ is the order of the curve? ($q = 13$ in the example above) There's a sharp edge here to be mindful of: the inputs to the function $x$ and $y$ are arbitrary integers, but $f$'s output is computed modulo $q$.

The first thing we observe is that $f$ is additive: if we have $f(x_1, y_1) = k_1$ and $f(x_2, y_2) = k_2$, then we know that $f(x_1+x_2, y_1+y_2) = k_1+k_2\ (mod\ q)$. But honestly, it's nicest when $k_1$ and $k_2$ are zero because then we can add multiples $\langle x_1, y_1 \rangle$ and $\langle x_2, y_2 \rangle$ to each other and get as many inputs to $f$ that output zero as we want. Wait, hold up?

1. Saying that any integer combination of solutions to $f(x, y) = 0$ gives another solution, is the same as saying that these solutions have closure under addition.
2. The set of solutions includes the zero vector since $f(0, 0) = 0$.
3. If $f(x, y) = 0$ is a solution, then so is the inverse solution $\langle -x, -y \rangle$. We call it an inverse because $\langle x, y \rangle + \langle -x, -y \rangle = \langle 0, 0 \rangle$, the zero vector.

We've ticked all the boxes that make this a group! Though we call it a lattice, specifically, since the solutions are integer vectors combined under addition. What this says about the output of $f$ as a whole, is that its output sort of tessellates according to the lattice of inputs that take it to zero, and that the smallest $x, y$ such that $f(x, y) = k$ for any particular $k$ is going to be the $\langle x, y \rangle$ that does so in the lattice's fundamental region. To find this $\langle x, y \rangle$, we only need to observe that $f(k, 0) = k$ and if we find the nearest vector of the lattice $\langle x^\prime, y^\prime \rangle$ to $\langle k, 0 \rangle$, then $f(k-x^\prime, -y^\prime) = k$ and $\langle x, y \rangle = \langle k-x^\prime, -y^\prime \rangle$ will be in the fundamental region. This is not as easy as it sounds though, so let's continue to unpack it.

<b>Technicalities.</b> The standard algorithm for finding the closest vector in a lattice to some other arbitrary vector is called Babai's Rounding Technique:

1. Find a short basis for the lattice. While there's an infinite number of bases for any particular lattice, a short basis is a set of linearly independent, roughly orthogonal vectors of minimal bit length. This basis generates the lattice in a relatively intuitive way, such that a linear combination of the basis vectors with large coefficients doesn't incidentally result in a very small vector.
2. Use linear algebra to find a linear combination of the basis vectors with rational coefficients that exactly equals the target vector, and then round the rational coefficients to integers. Perform the linear combination of the basis vectors with the rounded integer coefficients to get an element of the lattice that is close to (but not typically equal to) the target vector, and output this. Specifically, we embed the basis vectors as column vectors in a matrix $\boldsymbol{B}$ and solve the equation $\langle k, 0\rangle^T = \boldsymbol{B} \cdot \langle x, y\rangle^T$ for the rational vector $\langle x, y \rangle^T = \boldsymbol{B}^{-1} \cdot \langle k, 0 \rangle^T$. Note that we only need to store the first column vector of $\boldsymbol{B}^{-1}$, and in our implementation we make this column vector (which would normally be rational) an integer vector by storing a common denominator separately. The common denominator is called det because it ends up being the determinant of $\boldsymbol{B}$, due to how matrix inversion works.
3. (Optional) The last remaining issue is that Babai's Rounding Technique will sometimes return vectors with negative components, which can break lazily-written scalar multiplication algorithms. To prevent this from happening, we add the additional constraint to our short basis that at least one vector has all-positive components. We add a small multiple of this basis vector to every output, enough to ensure that every component of the output vector is positive as well.

A short basis is easily found by giving the LLL algorithm some example solutions to $f(x, y) = 0$, and tinkering with the output until a basis that satisfies the additional constraint in step 3 is found. The overall approach generalizes to groups with more than one efficiently-computable endomorphism; in the discussion so far we've only used the endormorphism provided by a cubic root of unity. With GT, for example, we have multiple Frobenius endomorphisms to choose from.

<b>Results / Caveats.</b> The lattice decomposition technique described so far makes scalar multiplication in G1 and G2 twice as fast, and four times faster in GT. One major caveat to keep in mind though, is that the algorithm relies on input points having a prescribed order $q$ to work correctly. If it is applied to attacker-provided points that are on an invalid curve or invalid subgroup, it will produce unexpected results.

[2]

### References

1. Subgroup security in pairing-based cryptography by Barreto, Costello, et al. [eprint]
2. Faster Point Multiplication on Elliptic Curves with Efficient Endomorphisms by Gallant, Lambert, and Vanstone. [iacr]
Cryptography