libsecp256k1 tutorial

1. Abstract algebra

This is a summary of groups and fields as abstract mathematical objects, and of the properties that will be relevant to us. It's something between a cheat sheet and a full course, without proofs or rigorous theorems, but there are links for further reading.

1.1. Introduction

The idea behind abstract algebra, and its use to us, is that instead of working with concrete objects, like integers, fractions, elliptic curve points, and operations like addition and multiplication between them, we introduce a layer of abstraction first. As a programming analogy, one can think of multiple implementations of a common interface. For example, a map/dictionary data structure might be implemented as a red-black tree, or a hashtable, and while the internal details are completely different, they exhibit a potentially identical interface. Similarly, we can see algebraic structures like groups and fields as interfaces, consisting of collections of operations and associated required properties that can apply to several different concrete structures.

And in many cases, it suffices to describe mathematical objects in terms of just a few properties that define them - and when seen through that lens, their distinctions and complexities disappear. Like in software development, hiding implementation details of logic behind an abstraction layer helps by allowing programmers to focus on higher-level properties, and the same holds here. There are many different concrete structures, but often all that matters for higher-level protocols and reasoning is a few properties that they exhibit.

Running example. Consider the following 4 examples:

  • Example 1: the set $\{0, 1, 2, 3\}$ with the "addition modulo 4" operation
  • Example 2: the set $\{0, 1, 2, 3\}$ with the "XOR" operation
  • Example 3: the set $\{1, 3, 7, 9\}$ with the "multiplication modulo 10" operation
  • Example 4: the set of rotations by multiples of 90° with the "combine rotations" operation.

We can show the full effect of the relevant operators using so-called Cayley tables:

Example 1 (addition modulo 4):

      +  |  0  1  2  3
    -----+------------
      0  |  0  1  2  3
      1  |  1  2  3  0
      2  |  2  3  0  1
      3  |  3  0  1  2

Example 2 (XOR):

      ⊕  |  0  1  2  3
    -----+------------
      0  |  0  1  2  3
      1  |  1  0  3  2
      2  |  2  3  0  1
      3  |  3  2  1  0

Example 3 (multiplication modulo 10):

      ×  |  1  3  7  9
    -----+------------
      1  |  1  3  7  9
      3  |  3  9  1  7
      7  |  7  1  9  3
      9  |  9  7  3  1

Example 4 (combining left rotations, written by their angle):

      ∘     |    0°   90°  180°  270°
    --------+------------------------
        0°  |    0°   90°  180°  270°
       90°  |   90°  180°  270°    0°
      180°  |  180°  270°    0°   90°
      270°  |  270°    0°   90°  180°

Which of these tables look "similar"?

  • Example 1 and Example 4 clearly behave similarly despite using different notations, as the tables line up. Each number $n$ in Example 1 becomes an $n \times 90°$ left rotation in Example 4.
  • Example 1 and Example 3 actually also behave similarly, but it requires reordering the elements in the table (swapping rows 2 & 3, and columns 2 & 3): $\{0 \rightarrow 1, 1 \rightarrow 3, 2 \rightarrow 9, 3 \rightarrow 7\}$.
  • However, Example 1 and Example 2 are fundamentally different. Note that the diagonal of Example 2's table consists entirely of the same element ($0$): every element is its own inverse. This is not true in any of the other examples, and no relabeling of the elements of Example 1 (or 3 or 4) can achieve this property.

So we conclude that even when the number of elements operated on is the same, many different (and even very differently-looking) operations can fundamentally behave the same, but similarly-looking operations can also behave fundamentally differently (e.g. Example 1 and Example 2).
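These observations can be checked mechanically. The snippet below is an illustrative sketch (not part of any library): it confirms that the mapping $\{0 \rightarrow 1, 1 \rightarrow 3, 2 \rightarrow 9, 3 \rightarrow 7\}$ turns Example 1's table into Example 3's, and that Example 2's all-identical diagonal rules out any such relabeling.

```python
# Quick mechanical check of the observations about the running examples.

add4 = lambda a, b: (a + b) % 4          # Example 1: addition modulo 4
xor = lambda a, b: a ^ b                 # Example 2: XOR
mul10 = lambda a, b: (a * b) % 10        # Example 3: multiplication modulo 10

f = {0: 1, 1: 3, 2: 9, 3: 7}             # candidate mapping (f(x) = 3^x mod 10)

# The mapping turns Example 1's operation into Example 3's for every pair.
assert all(f[add4(a, b)] == mul10(f[a], f[b]) for a in range(4) for b in range(4))

# Example 2's diagonal: a XOR a is always 0 (every element is its own inverse)...
assert all(xor(a, a) == 0 for a in range(4))
# ...while in Example 1 the diagonal takes two distinct values (0 and 2),
# so no relabeling of elements can turn one table into the other.
assert {add4(a, a) for a in range(4)} == {0, 2}
```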

The next sections go a bit deeper into cyclic groups, finite fields, and isomorphisms.

1.2. Cyclic groups

Background: https://en.wikipedia.org/wiki/Group_(mathematics) and https://en.wikipedia.org/wiki/Cyclic_group. This is a summary of what we care about. Where theorems are listed, I don't expect you to be able to prove them, just understand what is claimed, and find it sounds plausible. Working out a few examples yourself may help.

A group $G$ is an algebraic structure consisting of a set $X$ and a single binary operation called the group operation (something that takes two elements of $X$, and returns an element of $X$), which together satisfy the following properties (writing the operation as $+$):

  • Closure: for all $a, b \in X$, the result $a + b$ is again in $X$ (implied by the operation returning an element of $X$).
  • Associativity: $(a + b) + c = a + (b + c)$ for all $a, b, c \in X$.
  • Identity: there is an identity element $e \in X$ such that $e + a = a + e = a$ for all $a \in X$.
  • Inverses: every $a \in X$ has an inverse $b \in X$ such that $a + b = b + a = e$.

The group operation is typically denoted $+$ or $\times$, but could be anything. The notation for a group is $G = (X,+)$ or $G = (X,\times)$.

There exist many other group-like structures with one binary operation, but lacking some of these properties, including magmas, semigroups, and monoids. They do not matter for us, but you may encounter these names.

Most groups we will encounter will be written additively (group operation is $+$, identity is $0$, inverse is $-x$) or written multiplicatively (group operation is $\times$, identity is $1$, inverse is ${}^{-1}$). However, groups exist where neither of these notations are used, and for example the group operation is written $\circ$ or $\oplus$, or the identity element might be written $e$.

The number of elements in a group is called its order, which may be a positive integer, or infinite. Groups exist whose order is any strictly positive integer (the order cannot be 0, because this implies there is no identity).

An abelian group is a group which has one additional property: commutativity. Non-abelian groups exist, and they tend to be more complicated to study. We do not need them, so little will be said about them. Examples of non-abelian groups include multiplication of invertible square matrices, or the set of transformations of a Rubik's cube.

A group isomorphism is a bijective function that preserves the group structure. That is, it maps the identity element in one group to the identity in the other group, and it preserves the group operation: $f(a+b) = f(a)+f(b)$ if the operation is denoted $+$ in both.

Running example. The mapping $\{0 \rightarrow 1, 1 \rightarrow 3, 2 \rightarrow 9, 3 \rightarrow 7\}$ (equivalent to $f(x) = 3^x \pmod{10}$) between Example 1 and Example 3 above is a group isomorphism.

Another example. The function $f(x) = \exp(x)$ is an isomorphism from $(\mathbb{R},+)$ to $(\mathbb{R}_{>0},\times)$, because $f(a+b) = \exp(a+b) = \exp(a) \times \exp(b) = f(a) \times f(b)$. The identity element is $0$ in the first group and $1$ in the second group.

If an isomorphism exists between two groups, then those groups are called isomorphic (or "equal up to isomorphism"), and essentially "behave" the same, except the elements and operation are written differently. The notation for isomorphic groups is $G_1 \cong G_2$.

A group is a subgroup of another group if its set is a subset, the operation is the same but restricted to that subset, and the result is still a group. Lagrange's theorem states that the order of a subgroup always divides the order of the group it is a subgroup of.

Given a group $G$, and an element $g$ in it, the group $\langle g \rangle$ consists of the set $\{0, g, g+g, g+g+g, \ldots\}$, with the same operator $+$ (or $\{1, g, g^2, g^3, g^4, \ldots\}$ if written multiplicatively). This is called the subgroup generated by $g$, as it will always be a group.

If a group has a single element that generates the entire group, i.e., there exists at least one element $g$ in the group such that $\langle g \rangle = G$, the group is called cyclic, and $g$ is called a generator of the group. A cyclic group may have multiple generators that each individually generate the entire group. The order of the generated subgroup for a given element $g$ is also called the order of the element $g$. As a corollary of Lagrange's theorem, the order of any element divides the order of the group.

Running example. In Example 1 above (integer addition modulo 4, also denoted $(\mathbb{Z}/4\mathbb{Z}, +)$ ), the elements $\{1, 3\}$ are generators as for example $3$ generates in order $[0, 3, 2, 1, 0, 3, \ldots]$, which contains the whole set, thus Example 1 forms a cyclic group, as do Examples 3 and 4 that are isomorphic to it. However, $2$ is not a generator because it only generates $[0, 2, 0, 2, \ldots]$, which does form a subgroup with set $\{0, 2\}$. Example 2 above has no generators that generate the entire group: $1$ generates $[0, 1, 0, \ldots]$, $2$ generates $[0, 2, 0, \ldots]$, and $3$ generates $[0, 3, 0, \ldots]$. Thus, Example 2 is not a cyclic group.
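The generated-subgroup computation above is easy to mechanize. This sketch (illustrative, not library code) enumerates $\langle g \rangle$ for every element of Examples 1 and 2, confirming which groups are cyclic:

```python
# Compute the subgroup <g> generated by an element g under an operation op.
def generated(g, op, identity):
    """Return the set {identity, g, g+g, g+g+g, ...}."""
    seen, cur = {identity}, g
    while cur not in seen:
        seen.add(cur)
        cur = op(cur, g)
    return seen

add4 = lambda a, b: (a + b) % 4   # Example 1
xor = lambda a, b: a ^ b          # Example 2

# Example 1: 1 and 3 each generate everything, so the group is cyclic;
# 2 only generates the subgroup {0, 2}.
assert generated(1, add4, 0) == {0, 1, 2, 3}
assert generated(3, add4, 0) == {0, 1, 2, 3}
assert generated(2, add4, 0) == {0, 2}

# Example 2: no element generates more than {0, g}, so it is not cyclic.
assert all(generated(g, xor, 0) == {0, g} for g in (1, 2, 3))
```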

In what follows, it will be useful to have a notation for repeated application of the group operation. In additively-denoted groups, we will use $n \cdot x$ to mean $x + x + x + \ldots + x$, with $n$ repetitions of $x$. It is also common to just write $nx$ with this meaning, but to avoid ambiguity with the multiplicative group operation, we will use $\cdot$ exclusively in what follows. In multiplicatively-denoted groups, $x^n$ is used to refer to $n$-fold repetition of the group operation. Note that in these notations, $n$ is a non-negative integer, not a group element.

In groups we can define the discrete logarithm $\log_b(a)$ or $\textrm{DL}_b(a)$ for elements $a$ and $b$ in the group as the smallest non-negative integer $x$ for which $x$ repetitions of the group operation on $b$ yields $a$. In other words, $a = x \cdot b$ (additively) or $a = b^x$ (multiplicatively). The discrete logarithm only exists when $a$ is an element of the group generated by $b$. In cyclic groups, the discrete logarithm is well-defined with any generator as base $b$, regardless of $a$.
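In a tiny group, the discrete logarithm can simply be found by brute force, which makes the definition concrete (a sketch for illustration only; this approach is hopeless in cryptographically sized groups):

```python
# Brute-force discrete logarithm: smallest non-negative x with a = x*b
# (written additively), or None if a is not in the subgroup generated by b.
def brute_force_dl(a, b, op, identity, order):
    cur = identity
    for x in range(order):
        if cur == a:
            return x
        cur = op(cur, b)
    return None

add4 = lambda u, v: (u + v) % 4   # Example 1: addition modulo 4
assert brute_force_dl(3, 1, add4, 0, 4) == 3    # 3 = 3 * 1 (mod 4)
assert brute_force_dl(2, 3, add4, 0, 4) == 2    # 2 = 3 + 3 (mod 4)
assert brute_force_dl(1, 2, add4, 0, 4) is None # 1 is not in <2> = {0, 2}
```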

Cyclic groups have a number of interesting properties:

  • Every cyclic group of order $n$ is isomorphic to $C_n$, the integers modulo $n$ under addition; there is essentially only one cyclic group of each order.
  • All cyclic groups are abelian.
  • Every subgroup of a cyclic group is itself cyclic, and for every divisor $d$ of the group order there is exactly one subgroup of order $d$.

Groups can be composed and decomposed in various ways, but here we just care about one: the direct product. Given two groups $G_1 = (X_1, +)$ and $G_2 = (X_2, +)$, their direct product $G_1 \times G_2$ is the group whose set is the cartesian product of $X_1$ and $X_2$ (all tuples whose first element is in $X_1$ and whose second element is in $X_2$), and whose operation is defined as $(a,b) + (c,d) = (a+c, b+d)$.

The fundamental theorem of finite abelian groups states that every finite abelian group is isomorphic to the direct product of one or more cyclic groups of prime-power order. This means that when working in a group whose order can be factored into multiple prime powers, it is generally possible to switch to a subgroup of each prime-power size, solve the problem in those, and then combine the results.

Example. $C_{12} \cong C_4 \times C_3$. Using integer addition modulo 12, modulo 4, and modulo 3 as instances, we could have the function $f((a,b)) = 3a+8b \pmod{12}$ (integer addition). It holds that $2 + 3 = 1$ (modulo $4$), and it holds that $2 + 2 = 1$ (modulo $3$), so $(2,2) + (3,2) = (1,1)$ (in the direct product). Thus, $f((2,2)) + f((3,2)) = f((1,1))$ (modulo $12$), which holds indeed because $10 + 1 = 11$ (modulo $12$). However, $C_{8} \not\cong C_{4} \times C_{2} \not\cong C_2 \times C_2 \times C_2$, because $4$ and $8$ are already prime powers.
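The claimed isomorphism $C_{12} \cong C_4 \times C_3$ can be verified exhaustively. This sketch checks that $f((a,b)) = 3a+8b \bmod 12$ is bijective and preserves the operation:

```python
# Verify that f((a, b)) = 3a + 8b mod 12 is an isomorphism from
# C4 x C3 (componentwise addition mod 4 and mod 3) to C12 (addition mod 12).

def f(ab):
    a, b = ab
    return (3 * a + 8 * b) % 12

def prod_add(x, y):
    """The direct-product operation: componentwise addition."""
    return ((x[0] + y[0]) % 4, (x[1] + y[1]) % 3)

pairs = [(a, b) for a in range(4) for b in range(3)]

# Bijective: all 12 images are distinct.
assert len({f(q) for q in pairs}) == 12
# Homomorphic: f(x + y) = f(x) + f(y) mod 12, for every pair of pairs.
assert all(f(prod_add(x, y)) == (f(x) + f(y)) % 12 for x in pairs for y in pairs)
# The worked example from the text: (2,2) + (3,2) = (1,1) maps to 10 + 1 = 11.
assert f((2, 2)) == 10 and f((3, 2)) == 1 and f((1, 1)) == 11
```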

Running example. Example 1 is $C_4$ and Examples 3 and 4 are isomorphic to it. However, Example 2 is isomorphic to $C_2 \times C_2$, which is different.

Prime-ordered groups are groups whose order is a prime number. They are always cyclic (and abelian), and thus isomorphic to $C_p$. Furthermore, each of its elements besides the identity is a generator, and thus their discrete logarithm can use any non-identity element as a base. Because subgroups also have an order that divides the original group order, prime-ordered groups have no non-trivial subgroups (only the singleton group and themselves). This makes them particularly useful in cryptography, because problems in them cannot trivially be split into smaller groups to be combined, as otherwise permitted by the fundamental theorem.

1.3. Finite fields

Background: https://en.wikipedia.org/wiki/Finite_field, though anything involving non-prime fields can be skipped.

Fields correspond to what can roughly be described as "number-like" things: mathematical structures in which most of our traditional techniques for solving equations work: you can negate terms and move them to the other side of an equation, you can divide both sides by the same non-zero value, etc. The most familiar fields are things like the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, the complex numbers $\mathbb{C}$, but also (and this will be what we care about) arithmetic modulo prime numbers, and many others (see below).

Formally, a field is an algebraic structure consisting of a set $X$ together with two binary operators, typically denoted $+$ and $\times$ (unlike a group which has only one operation). It requires the following properties of its operations:

  • $(X, +)$ is an abelian group, whose identity is denoted $0$.
  • $(X \setminus \{0\}, \times)$ is an abelian group, whose identity is denoted $1$.
  • Multiplication distributes over addition: $a \times (b + c) = a \times b + a \times c$.

The field order is the number of elements in a field, which is either a positive integer $q$, or infinite. The field characteristic is the smallest non-zero number of times $1$ has to be added to itself to obtain $0$, if that is possible; otherwise the characteristic is $0$.

A finite field or Galois field is a field of finite order. The order of a finite field is always a prime power $q = p^m$, for some prime number $p$, and strictly positive integer $m$. It is not possible to construct a structure of another finite size that has all field properties. All finite fields of the same order are isomorphic with each other (w.r.t. both the $+$ and $\times$ operations). Thus, we can talk about the field of order $q = p^m$, called $\textrm{GF}(q)$. The multiplicative group of a field $(X\setminus\{0\},\times)$ is always a cyclic group, $C_{q-1}$. In other words, in every finite field every non-zero element can be written as the power of some fixed constant element of the field. Such a generator is called a primitive element of the field. The characteristic of $\textrm{GF}(p^m)$ is $p$.

A prime-ordered field is a finite field with $m=1$, or $q=p$. The canonical way of constructing a field of prime order $p$ is $\mathrm{GF}(p) \cong (\mathbb{Z}/p\mathbb{Z},+,\times)$, i.e., the integers modulo $p$ (with modular addition and modular multiplication as operations). As all equal-order finite fields are isomorphic, every field of order $p$ is isomorphic to this. Note that while finite fields of non-prime order exist ($m>1$), these finite fields are not constructed as integer addition/multiplication modulo an integer, but using extension fields, which are out of scope here.

Example. A field of order $9$ exists, but $\mathbb{Z}/9\mathbb{Z}$ (integer addition/multiplication modulo 9) is not a field, because $3$ has no modular inverse.
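The failure of $\mathbb{Z}/9\mathbb{Z}$ to be a field is easy to demonstrate by exhaustive search, and for prime moduli the inverse can be computed directly via Fermat's little theorem (a sketch for illustration):

```python
# Modular inverses exist for every non-zero element modulo a prime,
# but not necessarily modulo a composite: 3 has no inverse mod 9.

def inverse(x, m):
    """Return the multiplicative inverse of x mod m, or None if none exists."""
    for y in range(1, m):
        if (x * y) % m == 1:
            return y
    return None

# Mod 7 (prime): every non-zero element is invertible.
assert all(inverse(x, 7) is not None for x in range(1, 7))
# Mod 9 (composite): 3 * y mod 9 only takes values {0, 3, 6}, never 1.
assert inverse(3, 9) is None
# For prime p, Fermat's little theorem gives the inverse directly: x^(p-2) mod p.
assert all(inverse(x, 7) == pow(x, 5, 7) for x in range(1, 7))
```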

Weaker structures than fields with two binary operations exist too, like rings or domains, which are similar but lack some of the properties of fields. They do not matter for us, but you may encounter their names.

Other number-like operations, such as exponentiation and square roots, can be extended to general fields as well.

1.4. Isomorphisms

Background: https://en.wikipedia.org/wiki/Homomorphism (Definition, Examples, Special Homomorphisms). The following is a summary of what we care about:

Given two structures (a set with one or more associated operations, like groups or fields), a function $f$ is an isomorphism from one to the other if:

  • $f$ is a bijection between the two sets (every element of one corresponds to exactly one element of the other), and
  • $f$ preserves every operation: for each corresponding pair of operations (say $+$ in the first structure and $\oplus$ in the second), $f(a + b) = f(a) \oplus f(b)$.

Running example. Recall, Example 1 is addition modulo 4 on $\{0, 1, 2, 3\}$, Example 2 is the XOR operation on that same set, Example 3 is the multiplication modulo 10 on $\{1, 3, 7, 9\}$, and Example 4 is the rotations of multiples of 90°.

There exist isomorphisms between Example 1, Example 3, and Example 4. From 1 to 3, it is $f(x) = 3^x \pmod{10}$. From 1 to 4, it is $f(x) =$ left rotation by $90x$ °. There exist no isomorphisms between Example 2 and the others.

There are some related terms:

  • A homomorphism is a function that preserves the operations like an isomorphism does, but is not required to be a bijection.
  • An endomorphism is a homomorphism from a structure to itself.
  • An automorphism is an isomorphism from a structure to itself.

Running example.

  • An example of a homomorphism is the mapping from Example 1 to Example 3, $f(x) = 9^x \pmod{10}$. It never reaches the elements $3$ or $7$ in Example 3, yet the structure is still preserved.
  • An example of an endomorphism is $f(x) = 2x \pmod{4}$ (or $f(x) = x$ or $f(x) = 3x$) within Example 1, or equivalently, $f(x) = x^2 \pmod{10}$ within Example 3, or "do the rotation twice" in Example 4.
  • An example of an automorphism is any permutation of the non-zero elements within Example 2, for example mapping $2$ to $3$ and vice versa.

1.5. Scalars

Say we have an additively-written cyclic group $G = (X, +)$ of order $n$, with a generator $g$. It doesn't matter what generator is used, but one is fixed. Let $f(a) = a \cdot g$, the function that multiplies the argument (the scalar, $a$) with $g$.

Now observe that:

  • $f(a + b) = (a + b) \cdot g = a \cdot g + b \cdot g = f(a) + f(b)$, where the addition of scalars is performed modulo $n$, so $f$ preserves the group operation;
  • $f(0) = 0$, so the identity maps to the identity;
  • because $g$ is a generator, every group element equals $a \cdot g$ for exactly one $a \in \{0, 1, \ldots, n-1\}$, so $f$ is a bijection.

In other words, $f$ is an isomorphism from $C_n$ (integer addition modulo $n$) to $G$, and $f^{-1}$ (the inverse function from the group back to $C_n$) is actually the discrete logarithm $\mathrm{DL}_g$.

Of course, we can also define the multiplication as multiplication modulo $n$ in this set, and obtain $(\mathbb{Z}/n\mathbb{Z}, +, \times)$ for the scalars. If $n$ is prime, this is the field $\mathrm{GF}(n)$. Its additive group is isomorphic with $G$, but multiplication is meaningful too, as $a \cdot f(b) = f(ab) = b \cdot f(a) = ab \cdot f(1)$.

This lets us answer questions like how to perform "group halving" (i.e., the inverse operation of adding an element of $G$ to itself). For a given $x$, we want to find $y$ such that $y + y = x$. If $x = f(d)$, then $2 \cdot y = f(d)$, or $y = f(d/2)$, where the $/2$ is evaluated in $\mathrm{GF}(n)$, i.e., using the multiplicative inverse of $2$ modulo $n$. Equivalently, $y = f((2^{-1} \pmod{n}) \times d) = (2^{-1} \pmod{n}) \cdot f(d) = (2^{-1} \pmod{n}) \cdot x$.

This can be generalized: we can perform a scalar division in a cyclic group by multiplying with the modular inverse modulo the group order. If the group order is prime, this inverse exists for every divisor that is not a multiple of the group order.

If the inverse does not exist, it may still be possible to divide, but it is less trivial. For example, group halving in an even-ordered group is possible if the element has an even discrete logarithm (w.r.t. any generator), by halving the discrete logarithm, and optionally adding half the group order. For example, in $(\mathbb{Z}/34\mathbb{Z}, +)$, half of $18$ can be $9$ or $26$. In fact, doing this for a multiplicative group lets us compute a square root in $\mathrm{GF}(p)$. In general, one can use the Tonelli-Shanks algorithm to do this (which is, in essence, a general group-halving algorithm), though for the specific case of square roots in $\mathrm{GF}(p)$ where $p \equiv 3 \pmod{4}$, it simplifies to $\sqrt{x} = x^{\frac{p+1}{4}}$.
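Both cases are easy to check numerically. The sketch below verifies the $(\mathbb{Z}/34\mathbb{Z}, +)$ halving example, and the $p \equiv 3 \pmod 4$ square-root shortcut in a small prime field (an illustration, not library code):

```python
# Group halving in (Z/34Z, +): 2 has no inverse mod 34 (gcd(2, 34) = 2),
# yet 18 still has two "halves", differing by half the group order.
n = 34
halves = [y for y in range(n) if (y + y) % n == 18]
assert halves == [9, 26]
assert halves[1] - halves[0] == n // 2   # the two halves differ by n/2

# Square root in GF(p) for p = 3 mod 4, via the x^((p+1)/4) shortcut.
p = 23                                   # 23 = 3 mod 4
x = 13                                   # 13 is a quadratic residue mod 23
s = pow(x, (p + 1) // 4, p)
assert (s * s) % p == x                  # s is indeed a square root of x
```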

Notation-wise, in prime-ordered groups, we will just treat scalars (the numeric argument in scalar multiplication), as well as discrete logarithms, as inhabiting $\mathrm{GF}(n)$, so we can write $\frac{2}{3} \cdot x$ to mean $(2 \times 3^{-1} \pmod{n}) \cdot x$.

Extra theory. It is possible to formalize these properties without fixing a specific generator $g$.

Consider the set $M$ of all functions of the form $m_a(x) = a \cdot x$, i.e., the "scalar multiple with a constant integer $a$" functions, for any integer $a$, and group element $x \in X$. So for example $m_0$ is the function that multiplies with $0$, i.e., the constant function $m_0(x) = 0$. The function $m_1(x) = x$ is the identity function. The function $m_2(x) = 2 \cdot x = x + x$ is the group doubling function. Define two operations on this set:

  • $(m_a + m_b)(x) = m_a(x) + m_b(x) = m_{a+b}(x)$, i.e., we can write $m_2 + m_3$ to mean the $m_5$ function. This operation has $m_0$ as identity, and $m_{-a \pmod n}$ as inverse of $m_a$.
  • $(m_a \times m_b)(x) = m_a(m_b(x)) = m_{ab}(x)$, i.e., we can write $m_2 \times m_3$ to mean the $m_6$ function. This operation has $m_1$ as identity, and $m_{a^{-1} \pmod n}$ as inverse of $m_a$.

This structure $(M, +, \times)$ is isomorphic to $\mathrm{GF}(n)$ when $n$ is prime. If it isn't, it is a ring (which is a 2-operation structure like a field, but without the requirement that every non-zero element has a multiplicative inverse). It is called the endomorphism ring, because the functions $m_a \in M$ are the set of all endomorphisms of our original group $G$. Every abelian group has an endomorphism ring. If the group is cyclic, the endomorphism ring is isomorphic to the integers modulo the order. If the group is prime-ordered, the endomorphism ring is a field.

Scalar division in the group by $a$ can now be seen as the function $(m_a)^{-1}$, the multiplicative inverse in $M$ of multiplication by $a$. If we are working in a field, this always exists.

In group-based cryptography (which includes elliptic curve cryptography, Diffie-Hellman, as well as DSA, and the original non-ECC Schnorr signature scheme), the isomorphism $f$ above is exactly the mapping between private keys (scalars) and public keys (group elements).

Note that scalar division is very different from discrete logarithms. Scalar division is dividing a group element by an integer, and is cheap to do in any prime-ordered group as demonstrated above. Computing a discrete logarithm is dividing a group element by another group element, and this operation is (as far as we know) very hard in large groups used for cryptographic purposes.

Following the abstract algebra overview, it is time to dive into the actual cyclic group we care about, the secp256k1 curve.

Here you will find a description of its definition and properties, links to various sources, and some less relevant tidbits of information about it. There are exercises at the bottom. Feel free to post or DM me your answers.

2. The secp256k1 curve

In this section, the secp256k1 curve is defined and its properties are discussed.

It is an elliptic curve defined by the SECG (the Standards for Efficient Cryptography Group). General elliptic-curve cryptography standards by them can be found in their SEC1 standard, and the exact specification of the secp256k1 curve can be found in the SEC2 standard.

The libsecp256k1 repository contains Sage code that defines all the curve parameters. Sage is an open-source mathematics software system built on Python. I encourage you to experiment with it.

2.1. The coordinate field

The curve has points whose coordinates are elements of the finite field $F = (\mathbb{Z}/p\mathbb{Z}, +, \times)$, or $\textrm{GF}(p)$, where $p = 2^{256} - 2^{32} - 977$, or 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f, i.e., integer addition and multiplication modulo $p$. Its additive and multiplicative groups are both cyclic. $p \equiv 3 \pmod{4}$, so square roots can be computed as $x^\frac{p+1}{4}$. Modular inverses can be computed as $x^{p-2}$ (which follows from Fermat's little theorem), but approaches using the extended Euclidean algorithm are more efficient (see a later section).
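These field properties can be checked directly in Python (a sketch; the point coordinates used below are those of secp256k1's standard generator $G$, given in section 2.3.1):

```python
# Verifying properties of secp256k1's coordinate field GF(p).
p = 2**256 - 2**32 - 977
assert p == 0xfffffffffffffffffffffffffffffffffffffffffffffffffffffffefffffc2f
assert p % 4 == 3                 # enables the sqrt(x) = x^((p+1)/4) shortcut

# Modular inverse via Fermat's little theorem: x^(p-2) mod p.
x = 12345
assert x * pow(x, p - 2, p) % p == 1

# The standard generator G lies on the curve: gy^2 = gx^3 + 7 (mod p)...
gx = 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798
gy = 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8
assert gy * gy % p == (gx**3 + 7) % p
# ...so gx^3 + 7 is a square, and the shortcut recovers gy (or its negation).
root = pow((gx**3 + 7) % p, (p + 1) // 4, p)
assert root in (gy, p - gy)
```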

2.2. The curve group

The secp256k1 curve $E$ consists of the points $(x,y) \in F \times F$, for which $y^2 = x^3 + 7$, plus a special point at infinity, denoted $0$ (also, $O$ and $\mathcal{O}$ are in use). The point at infinity forms the identity for the elliptic curve group. It does not have coordinates; it exists outside of the curve, but is still part of the group.

This makes it an instance of a short Weierstrass curve (which are curves of the form $y^2 = x^3 + ax + b$), with coefficients $a=0$ and $b=7$ for secp256k1. The fact that $a=0$ enables a number of optimizations that are specific to the secp256k1 curve, but is an uncommon feature among elliptic curves used for cryptography.

Note that the curve is symmetric around the X-axis (if $(x,y) \in E$ then $(x,-y) \in E$ as well, due to $y$ only appearing in squared form). This holds geometrically if you were to plot the curve over $\mathbb{R}$, but it also holds for our finite field $F$.

2.2.1. The group operation

The group operation is defined as follows:

  • The point at infinity $0$ is the identity: $P + 0 = 0 + P = P$ for every point $P$.
  • The inverse of a point is its reflection across the X-axis: $-(x, y) = (x, -y)$, and $(x, y) + (x, -y) = 0$.
  • To add two distinct points $P$ and $Q$ that are not each other's inverse, draw the line through them; it intersects the curve in exactly one more point $R$, and $P + Q = -R$.
  • To add a point $P$ to itself (point doubling), the same rule applies using the tangent line to the curve at $P$.

Algebraically, this means that (non-infinity) point addition $(x_R, y_R) = (x_P, y_P) + (x_Q, y_Q)$ is defined as:

$$x_R = \lambda^2 - x_P - x_Q \qquad y_R = \lambda(x_P - x_R) - y_P$$

where $\lambda = \frac{y_Q - y_P}{x_Q - x_P}$ (the slope of the line through the two points) when $x_P \neq x_Q$, and $\lambda = \frac{3x_P^2}{2y_P}$ (the slope of the tangent line) when the points are equal.

For more information, see Point operations.

Note that terms like "line through" and "tangent" do not really have geometric meaning in a finite field (at least not in traditional Euclidean geometry). However, the same formulas as for real ($\mathbb{R}$) geometry can be used, just with the arithmetic operations reinterpreted to refer to the corresponding finite field operations instead.
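To make the formulas concrete, here is a minimal affine implementation of the group law in Python, representing the point at infinity as `None` (an illustrative sketch: real implementations such as libsecp256k1 use projective coordinates and constant-time field arithmetic instead):

```python
# Affine secp256k1 group law via the chord-and-tangent formulas.
p = 2**256 - 2**32 - 977

def ec_add(P, Q):
    """Add two points (None = point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                      # P + (-P) = infinity
    if P == Q:
        # Tangent slope 3x^2 / 2y; y != 0 always holds, since a point with
        # y = 0 would have order 2, impossible in a group of odd prime order.
        lam = 3 * x1 * x1 * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p     # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def on_curve(P):
    return P is None or (P[1] ** 2 - P[0] ** 3 - 7) % p == 0

G = (0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798,
     0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8)
assert on_curve(G)
G2 = ec_add(G, G)
assert on_curve(G2) and on_curve(ec_add(G2, G))          # 2G and 3G on curve
assert ec_add(G, (G[0], p - G[1])) is None               # G + (-G) = infinity
```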

All together, these rules result in an abelian group. It is easy to show that it is closed, every element has an inverse, and that it is commutative, but showing associativity is less trivial. All elliptic curves over finite fields are abelian groups (except a few degenerate cases like $y^2 = x^3$), and are either cyclic, or the product of exactly two cyclic groups. The secp256k1 group is cyclic, and has prime order n = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141.

When representing coordinates as integers in the range $[0, p)$, as we will commonly do, negating a non-zero value flips its parity (even to odd, odd to even). Because the two Y coordinates for a given X coordinate are each other's negation, exactly one of them will be even, and exactly one of them will be odd. Similarly, exactly one of them will be below $p/2$ (integer division by $2$, not modular) and one of them will be above $p/2$. Lastly, exactly one of them will have a square root and the other one will not.

Note on curve order. All elliptic curves over a finite field satisfy Hasse's theorem which roughly says that the order $n$ of the curve group is very close to the field size $q$ (and specifically, $|n-(q+1)| < 2\sqrt{q}$). This theorem was known in the 1930's, and indeed holds for secp256k1, where $n - (p+1) \approx -1.270769397 \sqrt{p}$.

History. Interestingly, it has only been since the 1980's, with Schoof's algorithm, and later the Schoof-Elkies-Atkin algorithm, that it became computationally feasible to exactly count the number of points on an elliptic curve (it takes a few seconds on a modern computer for secp256k1). Without this, elliptic curve cryptography as we know it would not be possible. But thanks to it, we know $n$ for secp256k1, and more importantly, secp256k1's designers were able to use this algorithm to choose curve parameters such that the curve has the desired properties (like having a prime group order).

2.2.2. Scalar multiplication

Repeated application of the group operation in an elliptic curve is called scalar multiplication, or point multiplication, which we denote $v \cdot P = P + P + \ldots + P$, though $v \times P$ or $vP$ are also common (as well as $P^v$, see below). Efficient algorithms exist to compute $v \cdot P$ for large $v$ (in our case, 256-bit numbers), similar to exponentiation by squaring in modular arithmetic. For example:

We want to compute $571 \cdot P$. The scalar $571 = 1000111011_2$ in binary, i.e., $v = 571 = 2^9 + 2^5 + 2^4 + 2^3 + 2^1 + 2^0$. By invoking doubling 9 times, starting with $P$, we can precompute $P_i = 2^i \cdot P$, for $i=0 \ldots 9$. Then $571 \cdot P$ is just $P_9 + P_5 + P_4 + P_3 + P_1 + P_0$. So overall we have computed this result using just 9 point doublings and 5 point additions, far better than the 570 point additions that would naively be required.

This is just a very basic scalar multiplication algorithm, but it gives an intuition about why scalar multiplication with a scalar just needs $\mathcal{O}(\log_2(v))$ point operations, and not $\mathcal{O}(v)$ point operations. Later we will go into much more detail on the actual algorithms used (which improve the constant factors and memory usage of the algorithm, though the complexity remains largely unchanged).
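The precompute-and-sum method from the example can be written generically, for any group operation. In this sketch the "group" is simply addition modulo 1009, so the answer is easy to check independently, and the operation counters confirm the 9-doublings-plus-5-additions count from the text:

```python
# Generic double-and-add scalar multiplication with operation counters.
def scalar_mul(v, P, op):
    """Compute v*P (v > 0) using O(log v) applications of op."""
    result, addend = None, P
    doublings = additions = 0
    while v:
        if v & 1:
            if result is None:
                result = addend              # first set bit costs no addition
            else:
                result = op(result, addend)
                additions += 1
        v >>= 1
        if v:
            addend = op(addend, addend)      # "double" for the next bit
            doublings += 1
    return result, doublings, additions

add_mod = lambda a, b: (a + b) % 1009        # stand-in for point addition
result, doublings, additions = scalar_mul(571, 5, add_mod)
assert result == 571 * 5 % 1009
assert (doublings, additions) == (9, 5)      # matches the count in the text
```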

Additive vs. multiplicative notation. While the above description, and the rest of this text, uses additive notation for elliptic curve groups, it is also common to use multiplicative notation. The difference stems from the merging of two scientific domains. In the mathematical description of elliptic curves (the domain of algebraic geometry), the additive notation is common, and one uses terms like point addition. In the domain of cryptography, many schemes were originally formulated over other groups than elliptic curve groups (as they are rarely elliptic-curve specific), in particular over groups defined as integer multiplication modulo a large prime (which is true for Diffie-Hellman, non-EC DSA, the original Schnorr signature scheme, ...). Because of this history of using (literal) multiplication as the group operation, it is common to use multiplicative notation there.

This is also where the term "discrete logarithm" originates. Logically, in an additive group, the discrete logarithm problem ought to be called the "discrete division" problem, yet it is called discrete logarithm in both settings.

Some of the most ferocious Twitter arguments involve cryptographers arguing about group notation for elliptic curves.

2.3. Isomorphism between scalars and points

2.3.1. The secp256k1 generator

As the secp256k1 group is prime-ordered, all its elements besides $0$ are generators. However, many cryptographic schemes that will be built on top need a fixed, well-defined generator. For secp256k1, this is the point $G$, with X coordinate 0x79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798 and Y coordinate 0x483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8.

Origin of the secp256k1 generator. It is not known how this point was picked by the entity that standardized it, but it has one remarkable property that almost certainly reveals part of its creation story. If $H$ is the point for which $G = 2 \cdot H$ (i.e., the point halving of $G$), then the X coordinate of $H$ is 0x3b78ce563f89a0ed9414f5aa28ad0d96d6795f9c63, a 166-bit number, which is far too small to be a coincidence. It is assumed that somehow this $H$ point was constructed first, and then doubled to obtain $G$. For most protocols, the choice of the generator does not affect its security, so the uncertainty about the construction is not a cause for concern; it is just very peculiar. See also this video of a short talk on the subject by Nadia Heninger.

2.3.2. The scalar field

Consider the function $f(a) = a \cdot G$, which maps scalars to their corresponding scalar multiple of the generator $G$. Then we get properties like:

  • $f(a + b) = (a + b) \cdot G = a \cdot G + b \cdot G = f(a) + f(b)$
  • $f(0) = 0 \cdot G = 0$ (the scalar $0$ maps to the point at infinity)
  • $f(n) = n \cdot G = 0$, so scalars behave like integers modulo $n$

This function $f$ is an isomorphism between $(\mathbb{Z}/n\mathbb{Z}, +)$ (integer addition modulo $n$) and the secp256k1 elliptic curve group, where $n$ is the secp256k1 group order given above.

The reverse operation, $f^{-1}$, from group elements (elliptic curve points) back to scalars (integers modulo $n$) is exactly the discrete logarithm w.r.t. $G$ as base. This reverse operation is assumed to be hard, which is the basis for elliptic curve cryptography. One algorithm for doing this, Pollard's rho, takes around $2^{129} \approx 10^{39}$ group additions and doublings, but only needs $\mathcal{O}(1)$ memory. More advanced algorithms exist that exploit secp256k1's special properties, or exploit a CPU-memory trade-off, but they only improve this number by a small constant factor. Note that there is no proof that no more efficient algorithm exists; the security of the group (or really, of almost all practically used cryptographic constructs) rests on the unproven assumption that no (significantly) better algorithm is discovered for the secp256k1 curve (or elliptic curves in general). There is a proof from 1997 by Shoup that if the attacker can only use group operations, and not exploit any elliptic-curve specific properties of the group (called the generic group model, GGM), computing the discrete logarithm in a prime-ordered group like ours does need $\mathcal{O}(\sqrt{n})$ group operations.
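The $\mathcal{O}(\sqrt{n})$ behavior of generic algorithms can be illustrated with baby-step giant-step, a different generic method that takes $\mathcal{O}(\sqrt{n})$ operations but also $\mathcal{O}(\sqrt{n})$ memory (Pollard's rho achieves the same operation count with $\mathcal{O}(1)$ memory). The toy below runs in a small multiplicative group; real curve groups are far too large for this:

```python
# Baby-step giant-step discrete logarithm in the multiplicative group mod a
# prime. Roughly sqrt(n) group operations and sqrt(n) table entries.
from math import isqrt

def bsgs_dl(g, h, p):
    """Return x with g^x = h (mod p), assuming h is in the subgroup <g>."""
    n = p - 1                         # group order (g is taken as a generator)
    m = isqrt(n) + 1
    table, cur = {}, 1
    for j in range(m):                # baby steps: store g^j -> j
        table.setdefault(cur, j)
        cur = cur * g % p
    factor = pow(g, -m, p)            # g^(-m) mod p (Python 3.8+ modular inverse)
    cur = h
    for i in range(m + 1):            # giant steps: h * g^(-i*m)
        if cur in table:
            return i * m + table[cur]
        cur = cur * factor % p
    return None

p, g = 1019, 2                        # 2 is a generator modulo the prime 1019
secret = 777
h = pow(g, secret, p)
assert bsgs_dl(g, h, p) == secret     # recovers the "private key"
```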

Because $n$ is prime, we can think of the scalars as inhabiting the field $\mathrm{GF}(n)$, which also has multiplication and division operations that are well-defined for all scalars (except division by $0$, of course). This lets us for example write (for the half-generator $H$ defined above) $H = f(1/2)$, or without using the $f$ function, $H = (1/2) \cdot G$.

Note that the coordinate field $\mathrm{GF}(p)$ and the scalar field $\mathrm{GF}(n)$ are not the same. $\mathrm{GF}(p)$ is the space inhabited by the X and Y coordinates of the curve. $\mathrm{GF}(n)$ is the space inhabited by private keys, and is isomorphic with the secp256k1 group itself. Think of $\mathrm{GF}(p)$ (and the constant $p$) as an internal implementation detail of the secp256k1 curve, which only matters if you care about implementing the group operation yourself. $\mathrm{GF}(n)$ (and the constant $n$) are externally-observable properties of the group; part of its "public API" so to speak.

2.4. The GLV endomorphism

The secp256k1 curve exhibits a property that is unusual among elliptic curves used for cryptography: an efficiently-computable endomorphism, named the GLV endomorphism after its inventors (Gallant, Lambert, and Vanstone). It enables a specific optimization in secp256k1's implementation, which was noted by Hal Finney, and which is the direct motivation that led to the development of the codebase that would become libsecp256k1. However, the GLV endomorphism was patent-encumbered, and the optimization was only enabled by default in libsecp256k1 in 2020, when the patent expired.

To motivate this a bit, note the remarkable property that $-(x,y) = (x, -y)$. Because we are in a cyclic group of order $n$, $-(x,y)$ is the same as $(n-1) \cdot (x, y)$, and we discover $(n-1) \cdot (x,y) = (x, -y)$. Yet $n-1$ is a huge (256-bit) number, which would ordinarily necessitate the use of a relatively slow scalar multiplication algorithm (involving hundreds of group operations). For this very specific scalar, however, it suffices to simply negate the Y coordinate.
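This claim is easy to check numerically with a toy (decidedly non-constant-time) affine implementation in Python; the constants below are the standard secp256k1 parameters, and the helper names (`add`, `mul`) are just illustrative:

```python
# Verify that (n-1)*G equals G with its Y coordinate negated.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):                 # affine point addition (None = infinity)
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = infinity
    if P == Q:
        s = 3 * x1 * x1 * pow(2 * y1, -1, p) % p     # tangent slope
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p      # chord slope
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P):                 # double-and-add scalar multiplication
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

minus_G = mul(n - 1, G)        # hundreds of group operations...
neg_G = (G[0], p - G[1])       # ...versus one field negation
```

Both computations produce the same point, one of them at a tiny fraction of the cost.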

It turns out that specifically for secp256k1, there exists another scalar with a similar property. Let:

In short, this means that it is possible to multiply a group element by $\lambda$ by simply multiplying its X coordinate by $\beta$. Since this operation can be repeated, it also holds that multiplying a point by $\lambda^2$ is the same as multiplying its X coordinate by $\beta^2$. Doing it a third time does not yield an interesting result, since $\lambda^3 = 1$ (which implies $\beta^3 = 1$ too, because multiplying by 1 must act as the identity). It can be composed with the negated-Y-coordinate property, however. Let $h((X,Y)) = (\beta X, -Y)$; then we get

$$ \begin{split} (-\lambda)^0 \cdot (x, y) & = & 1 \cdot (x, y) & = & h^0(x, y) & = & (x, y) & = & (x, y) \\ (-\lambda)^1 \cdot (x, y) & = & -\lambda \cdot (x, y) & = & h^1(x, y) & = & h((x, y)) & = & (\beta x, -y) \\ (-\lambda)^2 \cdot (x, y) & = & \lambda^2 \cdot (x, y) & = & h^2(x, y) & = & h(h((x, y))) & = & (\beta^2 x, y) \\ (-\lambda)^3 \cdot (x, y) & = & -(x, y) & = & h^3(x, y) & = & h(h(h((x, y)))) & = & (x, -y) \\ (-\lambda)^4 \cdot (x, y) & = & \lambda \cdot (x, y) & = & h^4(x, y) & = & h(h(h(h((x, y))))) & = & (\beta x, y) \\ (-\lambda)^5 \cdot (x, y) & = & -\lambda^2 \cdot (x, y) & = & h^5(x, y) & = & h(h(h(h(h((x, y)))))) & = & (\beta^2 x, -y) \end{split} $$
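The table above can be spot-checked with the same kind of toy affine implementation as before. The $\lambda$ and $\beta$ constants below are the published secp256k1 GLV constants as they appear in libsecp256k1's source; treat their exact values as an assumption of this sketch:

```python
# Verify lambda * (x, y) == (beta * x, y) on the generator.
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
lam = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72
beta = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):                 # affine point addition (None = infinity)
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        s = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        s = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (s * s - x1 - x2) % p
    return (x3, (s * (x1 - x3) - y1) % p)

def mul(k, P):                 # double-and-add
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

slow = mul(lam, G)             # generic scalar multiplication by lambda
fast = (beta * G[0] % p, G[1]) # a single field multiplication
```

One can also check directly that $\lambda^3 \equiv 1 \pmod n$ and $\beta^3 \equiv 1 \pmod p$, matching the "third time is not interesting" observation.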

We will discuss why this property is useful in a later section on efficient scalar multiplication algorithms. For now, it suffices to know that certain specific scalar multiplications can be done (much) more cheaply than general ones.

2.5. Jacobian coordinates

The coordinates we have used so far, pairs $(x,y)$ satisfying the equation $y^2 = x^3 + 7$, are called affine coordinates. When implementing the group operation in these coordinates, we notice that there are divisions involved in the computation of the slope $\lambda$ (note: not the same $\lambda$ as the constant used in the GLV endomorphism). In our field, these correspond to modular inversions, which are relatively slow (around 50x to 100x slower than a field multiplication in practice). Since naively we would need to compute one $\lambda$ per point addition or point doubling, these inversions would overwhelmingly dominate the total runtime of most operations involving elliptic curve points.

One potential solution is representing the field elements as fractions, with rules like $\frac{n_1}{d_1} \times \frac{n_2}{d_2} = \frac{n_1 n_2}{d_1 d_2}$, $\frac{n_1}{d_1} / \frac{n_2}{d_2} = \frac{n_1 d_2}{n_2 d_1}$, and $\frac{n_1}{d_1} + \frac{n_2}{d_2} = \frac{n_1 d_2 + n_2 d_1}{d_1 d_2}$. This lets us avoid all divisions, instead rewriting each as (two) multiplications, but at the cost of converting additions and subtractions to three multiplications each as well. Only at the very end, when a coordinate needs to be serialized as an integer, would an inversion be involved, to convert $\frac{n}{d}$ to $n d^{-1}$.

It turns out that one can do somewhat better than this. Instead of giving a denominator to every coordinate, we use a single denominator per point. And, because it results in simpler formulae, the denominator is raised to the second and third power for the X and Y coordinates respectively. Specifically, we will use coordinate triplets $(X,Y,Z)$, where $Z$ is the "denominator", which represent the affine coordinates $(X/Z^2, Y/Z^3)$. These are called Jacobian coordinates. Note that these are not unique: $(X, Y, Z)$ and $(q^2 X, q^3 Y, q Z)$ for non-zero $q$ represent the same point.

Interestingly, it is also possible to think of these as a way to define the elliptic curve directly, rather than seeing them as just a way of encoding affine coordinates. In this case, the curve equation becomes $Y^2 = X^3 + aXZ^4 + bZ^6$ (with again, $a=0$ and $b=7$ for secp256k1). The explicit-formula database (EFD) site has a long list of formulas in algorithm form to perform point addition, point doubling, and more, on these Jacobian coordinates. Furthermore, any affine coordinate $(x,y)$ can be converted to Jacobian as $(x, y, 1)$, which allows constructing mixed-addition formulas that add points given Jacobian and affine coordinates together. This is the approach primarily used inside libsecp256k1.
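A small Python check makes the equivalence concrete: an affine point on the curve becomes the Jacobian triplet $(x, y, 1)$, rescaling by any non-zero $q$ still satisfies the Jacobian curve equation, and converting back recovers the same affine point. The generator coordinates are the standard secp256k1 values; the helper names are illustrative:

```python
# Jacobian coordinates: (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3).
p = 2**256 - 2**32 - 977
gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def on_jacobian_curve(X, Y, Z):          # Y^2 = X^3 + 7*Z^6 (mod p)
    return (Y * Y - X * X * X - 7 * pow(Z, 6, p)) % p == 0

def rescale(X, Y, Z, q):                 # same point, new representative
    return (q * q * X % p, pow(q, 3, p) * Y % p, q * Z % p)

def to_affine(X, Y, Z):                  # the one place an inversion happens
    return (X * pow(Z, -2, p) % p, Y * pow(Z, -3, p) % p)

P1 = (gx, gy, 1)                         # affine -> Jacobian: just set Z = 1
P2 = rescale(*P1, 0xDEADBEEF)            # different triplet, same point
```

Both `P1` and `P2` satisfy the Jacobian curve equation, and `to_affine(*P2)` gives back the generator's affine coordinates.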

3. Guide to libsecp256k1

Language The library libsecp256k1 is written in nearly-pure C89 (for compatibility with old/weak devices); in addition, it needs the uint64_t and int64_t types, and optionally uses a few other things like __int128 or assembly-optimized routines.

Library interface It aims to implement secp256k1-related cryptography at a high level, i.e., protocols. The functions it provides implement ECDSA, BIP340 Schnorr signatures, BIP327 Musig2, BIP324 ElligatorSwift, ECDH. It also provides functions for key generation and key tweaking (used in BIP32, BIP38, BIP341). It does not currently aim to expose low-level operations like finite field, scalar integer, or secp256k1 group arithmetic directly, even though it contains implementations of these, for the purpose of:

A more low-level interface, without stability guarantees and with some performance overhead, may get added through a hazmat module.

Focus The library focuses on safety (through a small feature set, high-level interface, best practices like constant-time operation to avoid side-channel attacks, testing, and review) and performance.

3.1. The public API

The public API exposed by the library can be found in the include/ directory. It is divided into several .h files, a main secp256k1.h for the interface to the core library routines, and secp256k1_M.h files for each module (which are optionally-compiled components of the library). The current modules are:

Modules are essentially how all new functionality is added to the library, as they avoid cluttering the main API and allow phasing in changes.

I recommend reading at least the secp256k1.h file in full, as it includes extensive documentation about the provided interfaces, and gives a good idea of how things are generally done.

3.2. The main library code (secp256k1.c)

Single compilation unit The main library code is a single C compilation unit built from secp256k1.c (to maximize the ability to inline functions), but the large majority of the actual code is in .h files included by it, directly or indirectly. This provides some level of code organization, despite everything being a single compilation unit. These files are generally separated into "layers" (a term I just made up for this write-up), consisting of L.h files (the "header", defining the interface) and L_impl.h files with the corresponding implementation code. For some of these, there are multiple alternative definitions with corresponding implementations (generally, alternatives targeting 32-bit and 64-bit code). These take the names L_V.h and L_V_impl.h, where V is the name of the variant. In this case there is still an L.h which describes the (common) interface exposed by both variants; L_V.h just includes representation-specific definitions.

The layers used by the main library code are (the names below drop the .h and _impl.h suffixes), from low to high (later ones only depend on earlier ones):

Below we go a bit deeper into some of these. Note that they are presented in a different order than the layering order above, which I think is easier to understand.

3.2.1. General utilities with no other dependencies

3.2.2. scalar: integers modulo group order

scalar provides the secp256k1_scalar type, which represents integers modulo the group order. It has three variants:

Big integer arithmetic. The secp256k1_scalar type represents 256-bit numbers (modulo $n$, the group order), which poses a challenge. To paraphrase a StackExchange answer of mine to a somewhat infuriating question on the topic:

Computers generally cannot operate on arbitrary length numbers. Hardware can generally only perform integer arithmetic up to numbers of 32 or 64 bits in size. Anything else has to eventually be implemented in software. Many languages provide convenience functions that hide this complexity; for example Python has general-purpose arbitrary length integers as part of its standard library, but that simply means that the Python interpreter is translating operations on those apparent data types to many small individual instructions that each operate on just 64-bit integers, typically. If you wonder how that works, just imagine how you once learned to do math using numbers longer than 1 digit. Here the "digits" (often called "limbs", as they're bigger than digits) are 64-bit integers on their own, and so a 256-bit integer can be seen as a number consisting of 4 such limbs. Many operations (addition, multiplication, ...) on those are very similar to what you'd do for decimal numbers on paper.

libsecp256k1 is not written in Python, but in C, and C doesn't have support for integer types larger than what CPUs support internally, in general. So, it doesn't have the luxury of being able to just use off-the-shelf code for its big integer arithmetic; it has to be implemented by hand. Of course, that is the point: libsecp256k1 is intended to be fast, and by having custom integer code that's designed specifically for this application it can be many times faster than what a Python implementation could ever hope to achieve.

Furthermore, custom code is necessary. Arbitrary-length integer types, even if they were available, are generally not appropriate for implementing cryptography. We aim to design code that runs in constant time, so that an attacker cannot learn anything about our private key by observing how long operations on it take. Variable-length types can obviously not be constant-time, as the time to operate on them depends on how long the number is.

The scalar_4x64 and scalar_8x32 variants use this approach, representing the scalar numbers using four 64-bit limbs or eight 32-bit limbs respectively. Specifically, scalar_4x64 (see below) represents a 256-bit $x$ as $(x_0, x_1, x_2, x_3)$ such that $$ x = x_0 + 2^{64}x_1 + 2^{128}x_2 + 2^{192}x_3 \pmod n $$

Addition between two such numbers operates like schoolbook written arithmetic: add up the lowest-value limbs, take the low 64 bits of the result as the output limb, and use the overflow (if any) as a carry bit added to the next-lowest-value limbs. This means at least four "actual" additions need to be performed, but dealing with carries sometimes means even more are needed. If a carry remains when adding the highest-value limbs, we end up with a result $\geq 2^{256}$. If that happens, or if the represented value is $\geq n$, $n$ is subtracted from the result to keep the representation in the range $[0, n)$.
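A sketch of this limb-wise addition (the final reduction here takes a big-integer shortcut for clarity; the real code of course stays in limb arithmetic, and the helper names are illustrative):

```python
# 4x64 limb addition with carry propagation, plus final reduction mod n.
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
M64 = 2**64 - 1

def to_limbs(x):
    return [(x >> (64 * i)) & M64 for i in range(4)]

def from_limbs(limbs):
    return sum(v << (64 * i) for i, v in enumerate(limbs))

def scalar_add(a, b):
    r, carry = [0] * 4, 0
    for i in range(4):
        t = a[i] + b[i] + carry    # at most 65 bits
        r[i] = t & M64             # low 64 bits become the output limb
        carry = t >> 64            # overflow feeds the next limb
    v = from_limbs(r) + (carry << 256)
    return to_limbs(v - n if v >= n else v)   # reduce into [0, n)

result = scalar_add(to_limbs(n - 1), to_limbs(n - 2))
```

Since both inputs are below $n$, the sum is below $2n$, so a single conditional subtraction of $n$ suffices for the reduction.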

Multiplication similarly follows a schoolbook approach. First, all 16 pairs of limbs are multiplied together, the results are arranged by the resulting power of $2^{64}$, and then added with carry, yielding a 512-bit multiplication result consisting of 8 limbs. $$ z = x \times y = (x_0, x_1, x_2, x_3) \times (y_0, y_1, y_2, y_3) \\ = \\ \begin{split} x_0 \times & ( & y_0 + & 2^{64} y_1 + & 2^{128} y_2 + & 2^{192} y_3 & & & & & ) + \\ x_1 \times & ( & & 2^{64} y_0 + & 2^{128} y_1 + & 2^{192} y_2 + & 2^{256} y_3 & & & & ) + \\ x_2 \times & ( & & & 2^{128} y_0 + & 2^{192} y_1 + & 2^{256} y_2 + & 2^{320} y_3 & & & ) + \\ x_3 \times & ( & & & & 2^{192} y_0 + & 2^{256} y_1 + & 2^{320} y_2 + & 2^{384} y_3 & & ) \\ \end{split} \\ = \\ \begin{split} 2^{0} & ( & x_0 y_0 & ) + \\ 2^{64} & ( & x_0 y_1 + x_1 y_0 & ) + \\ 2^{128} & ( & x_0 y_2 + x_1 y_1 + x_2 y_0 & ) + \\ 2^{192} & ( & x_0 y_3 + x_1 y_2 + x_2 y_1 + x_3 y_0 & ) + \\ 2^{256} & ( & x_1 y_3 + x_2 y_2 + x_3 y_1 & ) + \\ 2^{320} & ( & x_2 y_3 + x_3 y_2 & ) + \\ 2^{384} & ( & x_3 y_3 & ) \end{split} $$ After this 512-bit result is computed, it is reduced in several steps to a value in range $[0,n)$.
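The partial-product accumulation can be sketched the same way (again taking big-integer shortcuts for brevity: the reduction below is a plain `%`, whereas the real code reduces the 8 limbs step by step; `mul_wide` is an illustrative name):

```python
# 4x64 schoolbook multiplication: 16 limb products -> an 8-limb wide result.
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
M64 = 2**64 - 1

def to_limbs(x):
    return [(x >> (64 * i)) & M64 for i in range(4)]

def mul_wide(a, b):
    acc = [0] * 8
    for i in range(4):
        for j in range(4):
            acc[i + j] += a[i] * b[j]   # 128-bit partial product at 2^(64(i+j))
    out, carry = [], 0
    for t in acc:                        # propagate carries: 8 clean limbs
        t += carry
        out.append(t & M64)
        carry = t >> 64
    return out                           # the 512-bit product

a, b = n - 12345, n - 67890              # two arbitrary scalars
wide = mul_wide(to_limbs(a), to_limbs(b))
product = sum(v << (64 * i) for i, v in enumerate(wide))
reduced = product % n                    # big-int stand-in for the reduction
```

The limb arithmetic reconstructs the exact 512-bit product, which the reduction then brings back into $[0, n)$.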

scalar_4x64 implements these in secp256k1_scalar_add and secp256k1_scalar_mul, and there are many other functions that implement various operations on scalars, but they all largely use the same principles. For x86_64 systems, there is assembly code included, as it is not possible from C to actually observe the carry (overflow) bit when performing 64-bit integer addition, even though the hardware computes it. The pure C version therefore has to resort to a hack (checking whether the sum is less than one of the inputs), but the assembly version can directly use the hardware carry flag, by using addition-with-carry instructions.

scalar_8x32 uses all the same principles, but with base $2^{32}$ and 32-bit limbs. This makes sense on systems where the hardware does not provide native 64-bit arithmetic (because such arithmetic would need to be emulated anyway, by the compiler or by us; see int128 below).

3.2.3. int128, an implementation of 128-bit arithmetic

When implementing scalar_4x64 specifically, we're faced with the problem that the intermediary limb multiplication results ($x_1y_3$ etc.) are 128-bit values, and the C language (as standardized) does not have a 128-bit integer type. Thus, multiplying two uint64_t values normally just yields the bottom 64 bits of the result, discarding the top 64 bits. However, typical 64-bit hardware does support computing the full 128-bit result of a multiplication, even though it may not have 128-bit registers. 64-bit ARM solves this by having two separate instructions for computing the low and high 64 bits of a multiplication result, respectively. x86_64 solves this by having an instruction that puts the low and high 64 bits into two separate registers.

We really want to use these instructions, because the alternative is representing the number using eight 32-bit limbs, raising the number of individual multiplication instructions from 16 to 64, which would be a very serious performance penalty.

One option for this is relying on assembly code, but that means many different versions, one for each supported platform. But many C compilers do provide access to the full 128-bit result without needing to resort to assembly. GCC and Clang provide a non-standard __int128 type, which corresponds to a pair of two 64-bit registers/variables in hardware, and is otherwise emulated in software, but allows computing a uint64_t * uint64_t -> __int128 result through this type. MSVC provides a special built-in function for wide multiplication, which returns a pair of two uint64_t.
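To illustrate what such a widening multiplication has to do, here is the 64×64→128-bit product computed from four 32×32→64-bit products, roughly what 32-bit code (or a compiler's software fallback) must perform. This is only a sketch with an illustrative name; note that in real C the `mid` sum can itself exceed 64 bits and needs an extra carry split, which Python's big integers paper over:

```python
# Emulate uint64*uint64 -> 128-bit result using only 32x32 -> 64-bit products.
M32 = 2**32 - 1

def umul128(a, b):
    a0, a1 = a & M32, a >> 32            # split each operand into 32-bit halves
    b0, b1 = b & M32, b >> 32
    ll = a0 * b0                         # contributes at 2^0
    mid = a0 * b1 + a1 * b0 + (ll >> 32) # contributes at 2^32 (can exceed 64 bits in C!)
    hi = a1 * b1 + (mid >> 32)           # contributes at 2^64
    lo = (ll & M32) | ((mid & M32) << 32)
    return hi, lo                        # the 128-bit product as two 64-bit words

hi, lo = umul128(2**64 - 1, 2**64 - 1)
```

Four multiplications plus additions per limb product, instead of one: this is the 16-versus-64 multiplication count mentioned above.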

The int128 layer provides an abstraction over these various compiler-dependent ways of accessing 128-bit multiplication results, and of using them in a few other ways (enough to implement the multiplication logic in scalar_4x64). It has two variants:

In what follows, I will refer to arithmetic using numbers represented in uint32_t but with multiplication and other temporary results in uint64_t as 32/64-bit arithmetic, and to arithmetic using numbers represented in uint64_t but with multiplication and other temporary results in int128 as 64/128-bit arithmetic.

3.2.4. field: finite field element representation

field provides the secp256k1_fe type which represents integers modulo $p = 2^{256} - 2^{32} - 977$, and many arithmetic operations on it. It has two variants:

Overcomplete representation. Despite implementing functionality that is very similar to scalar (only with a different modulus), it uses quite different representations, and different algorithms to implement them. Specifically, it uses a representation with 25% more limbs than needed, and allows each of these limbs to exceed their normal range. For example, in field_5x52, $2^{100}$ could be represented as $(0, 2^{48}, 0, 0, 0)$, but also as $(2^{52}, 2^{48}-1, 0, 0, 0)$ or $(2^{55}, 2^{48}-8, 0, 0, 0)$, or in many other ways. This has two advantages:
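The non-uniqueness, and the resulting carry-free addition, can be sketched directly (the example limb vectors are the ones from the text; `fe_value` and `fe_add` are illustrative names):

```python
# Overcomplete base-2^52 representation: limbs may exceed their normal range.
p = 2**256 - 2**32 - 977

def fe_value(limbs):                     # what a 5x52 limb vector represents
    return sum(v << (52 * i) for i, v in enumerate(limbs)) % p

a = [0, 2**48, 0, 0, 0]                  # three representations of 2^100
b = [2**52, 2**48 - 1, 0, 0, 0]
c = [2**55, 2**48 - 8, 0, 0, 0]

def fe_add(x, y):                        # limb-wise add: no carry chain at all
    return [xi + yi for xi, yi in zip(x, y)]

s = fe_add(a, b)                         # limbs may now exceed 2^52; still valid
```

The sum's limbs exceed their "normal" range, but the represented value is still correct; only occasionally does a normalization pass need to clean this up.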

Performance impact. The field logic is primarily used to implement the group operations (point doubling/addition), and in this context there are often sequences of multiple field additions and other operations (such as negations, and multiplications by a small constant) that can exploit the same advantages. By using an overcomplete representation like this, we can often avoid many modular reductions and carries. This comes at a cost, however: because there are 5 limbs, a multiplication now needs 25 uint64_t multiplications rather than 16 (or 100 instead of 64 for 32-bit code). At the time libsecp256k1 was first developed, this was still a win, especially for pure C code (which needs hacks to determine carry flags, see above). However, it would be possible to introduce new field variants that use very different representations. Experiments with hand-written assembly show that for modern platforms, a representation using four (and sometimes five) 64-bit limbs (rather than 52-bit ones) can be somewhat more performant. Actually using these would however require some CPU auto-detection code, and a way to dispatch based on that detection.

Magnitude. Of course, we need to be careful not to perform too many additions, as otherwise we may end up with a field element representation whose limbs exceed the range of the integer type (uint32_t for the field_10x26 variant, uint64_t for the field_5x52 variant), at which point carry bits would get lost without code that actually performs carries. To prevent this, we assign a "magnitude" value to each field element to keep track of how big the excess above $2^{26}$ or $2^{52}$ is allowed to be. A "normal" representation of a number (with all limbs in their normal range) is assigned magnitude 1. Adding two field elements with respective magnitudes $m_1$ and $m_2$ yields a field element with magnitude $m_1 + m_2$. The magnitude is not allowed to exceed $32$. If a number would grow too large, the calling code needs to call a function to explicitly reduce its magnitude to 1 (secp256k1_fe_normalize and some variants).

All field.h operations document how the functions propagate magnitude, and what constraints they put on magnitude. The field_5x52.h and field_10x26.h files document the actual representation used, and what magnitude translates to for them.

Magnitudes of field elements are a compile-time property of the code: they are not actually kept track of at runtime (except in debug mode, with the VERIFY macro), but are part of the interface that the higher-level code uses. If we were working in a language with a richer type system, it would be possible to make the magnitude a property of the type (e.g. as a C++ template parameter), but since this is C, it's just a property that the user of the code (i.e., higher layers within libsecp256k1) is expected to keep track of, though it is also checked in debug builds (through assertions).

Normalization. There is one more property like this. It is possible that a field element has magnitude $1$, i.e., all its limbs are within their normal range, but the number is not fully reduced modulo $p$. For example, $(2^{52}-1, 2^{52}-1, 2^{52}-1, 2^{52}-1, 2^{48}-1)$ represents $2^{256}-1 \pmod p$, but so does $(2^{32} + 976, 0, 0, 0, 0)$. Most operations can handle both, but if we're about to serialize a number to bytes to return from an API call, or if we want to determine whether a number is odd or even, it must be fully reduced so we're talking about the one canonical way of representing the number. For that, we add an additional compile-time property like magnitude to numbers, normalization, which is also only explicitly tracked in debug mode.

Simultaneous multiplication and reduction. Because the modulus $p$ has such a simple structure ($p = 2^{256} - 2^{32} - 977$), modular reductions are even simpler than in the scalar case: if $x = h_{\mathrm{low}} + 2^{256} h_{\mathrm{high}}$, and $x' = h_{\mathrm{low}} + (2^{32} + 977) h_{\mathrm{high}}$, then $x = x' \pmod p$. This operation (turning high limbs into low limbs by multiplication with $2^{32} + 977$) is so simple that the field multiplication code (secp256k1_fe_mul) and squaring code (secp256k1_fe_sqr) do this simultaneously with the multiplication, rather than in a separate phase afterwards like the scalar layer does.
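The reduction identity is easy to demonstrate in isolation (a sketch with an illustrative name; the real code folds this into the multiplication rather than looping afterwards):

```python
# Reduce a wide value using 2^256 = 2^32 + 977 (mod p).
p = 2**256 - 2**32 - 977

def reduce_mod_p(x):
    while x >> 256:                      # fold high limbs into low ones
        x = (x & (2**256 - 1)) + (2**32 + 977) * (x >> 256)
    return x - p if x >= p else x        # final conditional subtraction

x = (2**512 - 1) - 12345                 # an arbitrary 512-bit value
r = reduce_mod_p(x)
```

Because the fold multiplier $2^{32} + 977$ is so small, a 512-bit value shrinks to roughly 289 bits after one pass, and the loop finishes after only a few iterations.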

Debug mode. The library can be compiled in debug mode, by setting the VERIFY macro. In this mode, field_impl.h introduces wrappers around all the functions provided by field_5x52_impl.h and field_10x26_impl.h, which implement the tracking and checking of magnitude and normalization flags, and much more. CI includes tests that set this flag.

3.2.5. modinv32 and modinv64: modular inverses

These are implementations of generic modular inverse calculation algorithms, and Jacobi symbol calculations (a generalization of the Legendre symbol, used for determining if a number is a modular square), using variants of the safegcd algorithm. There are 3 variants implemented:

Representation Internally, these algorithms use a signed-digit representation. modinv64 uses five base-$2^{62}$ int64_t limbs (which are allowed to be negative), using 64/128-bit arithmetic internally. modinv32 uses nine base-$2^{30}$ int32_t limbs, using 32/64-bit arithmetic internally. The signed-digit representation makes it possible to easily negate values internally, and also allows for a sparser representation of the modulus (e.g., $p$ can be represented as $(-(2^{32} + 977), 0, 0, 0, 2^8)$).
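The example representation of $p$ can be checked in a couple of lines (signed digits, base $2^{62}$):

```python
# Signed-digit representation: 5 base-2^62 limbs, each allowed to be negative.
p = 2**256 - 2**32 - 977
limbs = [-(2**32 + 977), 0, 0, 0, 2**8]

# 2^8 at position 4 contributes 2^(8 + 4*62) = 2^256; the negative limb
# at position 0 subtracts 2^32 + 977, leaving exactly p.
value = sum(v << (62 * i) for i, v in enumerate(limbs))
```

Only two non-zero digits are needed, versus four non-zero limbs in the unsigned base-$2^{52}$ representation.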

Uses They are used inside scalar and field to compute modular inverses and Jacobi symbols modulo their respective prime moduli $n$ and $p$. The 64-bit variants (scalar_4x64 and field_5x52) use modinv64, while the 32-bit variants (scalar_8x32 and field_10x26) use modinv32.

Safegcd is a very fast, modern (2019) algorithm for computing GCDs and modular inverses in constant time. The latest variants of it had a few contributions from libsecp256k1 contributors, and assembly implementations of it are the fastest known constant-time modular inverse algorithms (paper coming soon, hopefully); the pure C implementation in libsecp256k1 is some 2x-2.5x slower, but still very fast (on the order of 1-2 microseconds on modern desktop systems). If you are interested in more details, I suggest the explainer (with Python pseudocode) about how it is used in libsecp256k1.

3.2.6. group: group operations on the secp256k1 curve group

This layer implements point addition, point doubling, and point negation, for points represented in affine $(x,y)$ coordinates (type secp256k1_ge) or Jacobian $(X,Y,Z)$ coordinates (type secp256k1_gej), plus conversions between the two. There are also variants implemented that perform mixed addition, with one operand affine, one Jacobian, and Jacobian output.

The purpose of Jacobian coordinates is avoiding inversions. With the safegcd-based inverses (see above), computing an inverse is relatively fast, but it is still many times slower than a field multiplication (which is on the order of tens of nanoseconds). Fully affine point addition requires at least one inversion (in the computation of the slope $\lambda = \frac{y_2 - y_1}{x_2 - x_1}$) per group operation, which would completely dominate the cost of a point addition. The $Z$ coordinate acts sort-of as a shared "denominator" for X and Y, allowing all divisions to be rewritten into multiplications, similar to how $\frac{a}{b} + \frac{c}{d} = \frac{ad+bc}{bd}$.

There are constant-time and non-constant-time variants of many functions. The constant-time ones are used when operating on private key or other secret material, while the (faster) variable-time ones have an explicit _var suffix to their name (a common convention throughout the codebase).

Constant-time point addition on short Weierstrass curves (like secp256k1) is a pain point. The curve does not have any unified formula that works correctly for all operands, so it needs special cases to deal with doubling, adding infinity, and adding points that are each other's negation. Other curves, like ed25519, admit a single unified formula for everything, which makes constant-time implementations comparatively easier. For secp256k1 it essentially requires evaluating all variants, and then picking one of them in constant time using bit-fiddling tricks (see secp256k1_fe_cmov, which in constant time picks one of two field elements). In libsecp256k1, constant-time point addition is implemented in secp256k1_gej_add_ge, which does a constant-time mixed addition, and requires the affine input to not be infinity. That works, because it is used in contexts where the affine argument comes from a precomputed table, so we know it is not infinity.
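The bit-fiddling selection can be sketched as follows: a toy single-word conditional move in the spirit of secp256k1_fe_cmov (the real function works limb by limb on field elements, and in C extra care is needed so compilers don't reintroduce branches; `cmov` here is just an illustrative name):

```python
# Branch-free conditional move: flag (0 or 1) selects one of two 64-bit words.
M64 = 2**64 - 1

def cmov(a, b, flag):
    """Return b if flag == 1, else a, without branching on flag."""
    mask = (-flag) & M64                 # flag=1 -> all ones, flag=0 -> all zeroes
    return (a & (~mask & M64)) | (b & mask)

kept = cmov(5, 9, 0)                     # selects a
swapped = cmov(5, 9, 1)                  # selects b
```

Both candidate results are always computed; only the masked combination at the end depends on the (secret) flag, and it does so without any data-dependent branch.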

An old version of this secp256k1_gej_add_ge function did not do this correctly, and it was probably the closest libsecp256k1 ever came to a security vulnerability, though we believe it was non-exploitable. An algorithm from a paper was used, and adapted for secp256k1, which claimed a unified addition formula (something we now know does not exist for secp256k1), overlooking the fact that it only works for curves where no two distinct points have opposite Y coordinates. After the bug was discovered, the algorithm was adapted to work even when opposite Y coordinates occur, through the use of more secp256k1_fe_cmov operations, resulting in something that is somewhat faster than always computing both the addition and the doubling and selecting between the two, but at the cost of some complexity. We now also have Sage code that proves that, algebraically, the fixed algorithm works for all inputs (and the same Sage code does fail when run on the old algorithm).

3.2.7. Scalar multiplications: ecmult, ecmult_gen, ecmult_const.

A significant portion (I expect at least 80%) of the time spent inside libsecp256k1 for common protocol operations is in scalar multiplications. Thus, it is not surprising that a significant portion of the codebase is dedicated to making these efficient.

While the exact algorithms used are too detailed to discuss here, the techniques they are based on are useful to have insight into.

The most naive scalar multiplication algorithm for computing $a \cdot P$ consists of first computing $P_i = 2^{i} \cdot P$ for $i \in \{0 \ldots 255\}$ through repeated doubling, and defining $a[i]$ as the i'th bit of the binary representation of $a$. The final result is then $\sum_{i=0}^{255} a[i] \cdot P_i$, selecting all of the 1 bits in the binary representation, and adding the respective precomputed $P_i$ multiples up.
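Sketched against a stand-in group where correctness is easy to check (a toy additive group of integers mod $m$, so that $k \cdot P$ is just $kP \bmod m$); the structure is identical for elliptic curve points, with `gadd` standing in for point addition:

```python
# Naive scalar multiplication: precompute P_i = 2^i * P, then add the set bits.
m = 1000003                     # toy additive group: integers mod m

def gadd(a, b):
    return (a + b) % m          # stand-in for the group operation

def naive_mul(k, P, bits=20):
    powers = []
    for _ in range(bits):       # P_i = 2^i * P by repeated doubling
        powers.append(P)
        P = gadd(P, P)
    R = 0                       # group identity ("point at infinity")
    for i in range(bits):
        if (k >> i) & 1:        # one addition per set bit of k
            R = gadd(R, powers[i])
    return R

r = naive_mul(654321, 42)       # should equal 654321 * 42 in the toy group
```

For 256-bit scalars this costs 255 doublings plus, on average, 128 additions; the techniques below all aim to shrink the addition count (and share the doublings).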

Often we want to compute the sum of multiple scalar multiplications at once. For example, inside ECDSA verification one needs to compute $u_1 \cdot G + u_2 \cdot Q$. By working from high to low bits, we only need to perform the 255 doublings once rather than twice.

Imagine we wanted to compute $101100101_2 \cdot G + 011001110_2 \cdot Q$. In every step we will double the result so far, and then - from high to low - add $G$ resp. $Q$ to the result if the respective bit is set:

Thus, with just 1 doubling per bit in the largest scalar, and 1 addition per set bit, we have computed our result. A naive approach, precomputing powers of two of both $G$ and $Q$ and adding them, would have required 1 doubling per bit in all scalars combined, and 1 addition per set bit. This principle is the basis for the speedups in batch validation, as it means a single equation with many multiplications can be verified more efficiently than multiple individual equations with the same total number of multiplications.
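The interleaved loop, run on the two example scalars above in the same toy additive group (with integers 42 and 77 standing in for the points $G$ and $Q$):

```python
# Shamir's trick: shared doublings, one pass from high bit to low bit.
m = 1000003
G, Q = 42, 77                   # stand-in "points" in the toy group
a, b = 0b101100101, 0b011001110

R = 0                           # group identity
for i in reversed(range(9)):    # 9 bits, high to low
    R = (R + R) % m             # one shared doubling per bit
    if (a >> i) & 1:
        R = (R + G) % m         # add G if a's bit is set
    if (b >> i) & 1:
        R = (R + Q) % m         # add Q if b's bit is set
```

Nine doublings serve both scalars at once, instead of nine per scalar.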

This is called Shamir's trick, and it is a special case of a more general algorithm, Strauss' technique. Instead of working bit by bit, it is possible to precompute a few multiples (say $\{0, 1, 2, 3, 4, 5, 6, 7\}$ times each input base point), chop up the scalars into groups of 3 bits, and then iterate by doubling the result 3 times followed by 1 addition with a table entry per non-zero group of 3 bits in any scalar. This adds some precomputation cost, but reduces the number of additions to be performed by ~3x.

Now observe that adding an elliptic curve point is just as hard as subtracting one (it just needs a negation of the Y coordinate before adding). This means that we can effectively generalize this technique to scalars which are written in a number system that permits negative numbers. This can be exploited to introduce more zeroes, and thus even fewer additions. This is called the non-adjacent form (NAF), because it never needs two adjacent non-zero digits. It can be combined with Strauss' technique to obtain wNAF (windowed non-adjacent form).
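Computing the NAF of a scalar is a short loop; on average only one third of its digits are non-zero, versus one half for plain binary (sketch with an illustrative name):

```python
# Non-adjacent form: digits in {-1, 0, 1}, never two adjacent non-zero digits.
def naf(k):
    digits = []
    while k:
        if k & 1:
            d = 2 - (k % 4)     # +1 if k = 1 mod 4, -1 if k = 3 mod 4,
            k -= d              # chosen so that k - d is divisible by 4
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits               # least-significant digit first

d = naf(0b101100101)            # NAF of 357, the scalar from the example above
value = sum(v << i for i, v in enumerate(d))
```

Each $-1$ digit costs a point subtraction, which (as noted above) is just as cheap as an addition.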

Lastly, there is the GLV endomorphism, which allows writing each input multiplication $a \cdot P$ as $a_1 \cdot P + a_\lambda \cdot (\lambda \cdot P)$, where $a_1$ and $a_\lambda$ are just 128-bit scalars, and $\lambda \cdot P$ is efficiently computable. So we can double the number of multiplications to perform, and make them all half length. This is beneficial, because it means only 127 doublings are needed instead of 255, while leaving the number of point additions the same.

All of these, plus several more involved optimizations like the effective-affine technique, are implemented in the variable-time ecmult layer, which is used for verification and public key tweaking (as those do not involve any private key material).

The two constant-time variants, ecmult_gen and ecmult_const, are based on a technique called the signed-digit multi-comb (SDMC), from Section 3.3 of a paper by Mike Hamburg. In the case of ecmult_const it is combined with the GLV endomorphism too (with a derivation documented in the code). For ecmult_gen, where a large precomputed table of multiples of $G$ is used, the cost of the doublings is paid at table-generation time, and thus GLV does not help avoid them at runtime.

3.2.8. The core public API implementation

Finally, secp256k1.c itself depends on all the above layers (by #includeing their _impl.h files) and more, providing thin wrappers that implement the public API using the implementation code above.

This generally involves converting input and output arguments between external and internal representations on each call. For example, public keys are internally typically represented as secp256k1_ge, which contains an X and Y coordinate, plus an "infinity" flag. Its size is platform-dependent: typically 84 or 88 bytes (40 bytes for each secp256k1_fe coordinate, plus an infinity flag, plus padding for alignment). We don't want the public API to expose this platform dependency, nor do we want to be unable to change the internal representation if a more efficient one with a different size ends up being used on some or all systems (not unlikely!). Thus, public keys are represented in the API as a different secp256k1_pubkey type of exactly 64 bytes, and all API calls convert between the two representations as necessary.

The secp256k1.c file also #includes the main_impl.h file from each enabled module, which may pull in more layers from the module's directory. This is how the modules' additional functionality gets exported.

3.3. Tests

The code is extensively tested using several approaches, and tests are expected for all new functionality added. Until recently we also had differential fuzzing through cryptofuzz, though I'm not sure what the status of that project is now.

3.3.1. Unit tests

The test code is primarily in tests.c, a cozy 7800-line file with tests for all core layers. It tests both the external API and internal layers, and thus #includes the tested layers' _impl.h files just like secp256k1.c does.

Every module has its own tests, in the module's module/M/tests_impl.h file, which is included in tests.c if the module is compiled in. Those together contain another 3500 lines of tests.

3.3.2. Exhaustive group tests

A novel testing technique used in libsecp256k1 is the so-called exhaustive tests, executed from tests_exhaustive.c. This file sets a special macro, EXHAUSTIVE_TEST_ORDER, which causes a variant of the group and scalar code to be built.

Instead of the secp256k1 curve $y^2 = x^3 + 7$, the curve $y^2 = x^3 + 2$ is implemented, with a generator that does not generate the entire group, but just a subgroup whose order is 13. Likewise, the scalar code now does not implement arithmetic modulo the 256-bit secp256k1 curve order, but simply modulo 13 (this is where the scalar_low variant comes in that was mentioned above).

With just 13 points on the curve, it is possible to check a number of cryptographic properties exhaustively with (largely) the same implementation as the production code. For example, we can test that for every message (there are just 13 of them) and every private key, the signing logic returns a valid ECDSA signature (an $(R, s)$ pair).
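The flavor of exhaustive testing can be demonstrated with a toy version in Python. The parameters below ($y^2 = x^3 + 2$ over $\mathbb{F}_{17}$) are chosen purely for illustration and are not the field or order the library's exhaustive tests use, but the principle is the same: the group is small enough to check properties over *all* inputs rather than sampled ones.

```python
# Toy exhaustive group test: enumerate every point on a tiny curve and
# verify the group laws over all pairs and triples of points.
P = 17
INF = None  # point at infinity

def add(p1, p2):
    if p1 is INF: return p2
    if p2 is INF: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF                                    # P + (-P)
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P     # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

# Enumerate every affine point on y^2 = x^3 + 2 over F_17, plus infinity.
points = [INF] + [(x, y) for x in range(P) for y in range(P)
                  if (y * y - x**3 - 2) % P == 0]

# Exhaustively check commutativity and associativity over ALL points.
for a in points:
    for b in points:
        assert add(a, b) == add(b, a)
        for c in points:
            assert add(add(a, b), c) == add(a, add(b, c))
print(len(points), "points, group laws hold")
```

With a handful of points, even the cubic number of associativity checks completes instantly; the production exhaustive tests rely on the same effect with their order-13 subgroup.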

3.3.3. Constant-time tests

Timing attacks. To protect the library against timing side-channel attacks, we want to make sure that its runtime does not reveal any information about the private key material it may operate on. This is especially important when running in environments like hardware wallets, but can in theory be a concern in many more settings.

Constant-time code. The best practice for guaranteeing this property is making the code constant time. The term is a misnomer, as the runtime does not need to be (and generally isn't) an actual constant. That is generally not even possible: e.g. depending on what else the CPU is doing, or the state of the CPU cache, memory accesses may take more or less time. What is actually needed is that the runtime is not affected by any secret information. To achieve this, one generally wants:

  • no branches whose condition depends on secret data;
  • no memory accesses whose address depends on secret data;
  • no instructions whose own execution time depends on their (secret) operands, such as division on many CPUs.
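The standard trick for avoiding secret-dependent branches is a branchless conditional move ("cmov"): instead of an if-statement, select between two values with masking, so the same instructions execute either way. A minimal Python sketch of the pattern (illustrative only; real implementations do this on fixed-width machine words in C or assembly):

```python
# Branchless conditional select: returns a if bit == 0, b if bit == 1,
# without ever branching on the (potentially secret) bit.
MASK64 = (1 << 64) - 1

def cmov(a, b, bit):
    mask = (-bit) & MASK64    # 0x000...0 if bit == 0, 0xfff...f if bit == 1
    return (a & ~mask) | (b & mask)

assert cmov(17, 42, 0) == 17
assert cmov(17, 42, 1) == 42
```

Table lookups in constant-time code follow the same idea: rather than indexing with a secret value, the code cmov-scans the entire table, so the memory access pattern is independent of the secret.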

Valgrind. The last of these three is hard to guarantee, but the first two can (perhaps surprisingly) be tested fairly easily. The valgrind tool, which can run normally-compiled binaries in a CPU emulator, has the ability to track undefined behavior that results from using uninitialized memory. The way this works is that it maintains a bitmap for all memory, tracking for each bit of memory whether its value is initialized or not. It is quite intelligent, and will in some cases not complain about using uninitialized memory, but propagate it instead. For example, copying uninitialized memory elsewhere will succeed, but marks the destination as uninitialized too. It can even perform arithmetic or bit operations on such values, just keeping track of which results are tainted by uninitializedness.

In fact, the only operations valgrind will complain about are taking a branch whose choice depends on uninitialized memory, or using uninitialized memory as a memory address. These exactly match the cases we care about for constant-timeness, except there the dependency is on secrets rather than on uninitialized memory.

ctgrind. This is exploited in the ctgrind technique. Write a test program that uses uninitialized memory as secret input to a function you want to test, and run it through valgrind. This should succeed if the function does not exhibit any non-constant-time behavior.

We don't use ctgrind itself, but implement the technique within our ctime_tests.c code. To mark inputs as "uninitialized", it uses the valgrind client API, which explicitly allows marking bytes of memory as uninitialized. The opposite is also used to "declassify" the output of certain routines (e.g. the affine result of a point multiplication can be considered non-secret even when the scalar input was secret, if we assume the discrete logarithm problem is hard). Note that this is not generally true for the Jacobian result of a multiplication by a secret scalar, as the Z coordinate might in theory reveal something.

Note that constant-timeness is compiler-dependent. No C compiler gives any guarantee about producing constant-time code, and as optimization passes within compilers become more sophisticated, it is quite possible that code which formerly compiled to constant-time machine code no longer does so in future versions. In fact, releases 0.3.1 and 0.3.2 were both made specifically to introduce workarounds for code that Clang 14 and GCC 13, respectively, started compiling as variable-time on some platforms; -O3 makes this behavior a lot worse.

MemorySanitizer. Valgrind is not available on all platforms, but on some of them, we can use msan as an alternative, which uses a very similar memory-initializedness tracking and propagation, and also exposes an API to mark memory as initialized or uninitialized.

3.4. Benchmarks

There are benchmarks, split into three programs: bench.c, bench_ecmult.c, and bench_internal.c. The first benchmarks the public API, and is thus linked against the library. The last two benchmark functions from internal layers, and thus #include the relevant layers directly.

3.5. Examples

The library also contains examples that demonstrate how to use the API, in the examples/ directory of the project root (outside of src/).


Exercises

Exercise 1. In a programming language of your choice, implement affine point addition for secp256k1. As representing scalars and field elements will be the topic of a future section, it is not important to focus on that now, so feel free to use a language or library which provides arithmetic over big integers.

Test vectors. All are given as decimal affine (x, y) coordinate pairs, where x and y are reduced modulo $p$ to the range $[0, p)$.

(67021774492365321256634043516869791044054964063002935266026048760627130221114, 22817883221438079958217963063610327523693969913024717835557258242342029550595) + (61124938217888369397608518626468079588341162087856379517664485009963441753645, 5723382937169086635766392599511664586625983027860520036338464885987365575658) = (78518484088348927894279633941273782106215956054783044881924083038087974375069, 18400956471605157290158330638123206056219981947313880254846397293938760781200)
(44797955726860071483167773525787460171685721903803276437396496681708013097206, 112878323467240798018200025047246733779416351939079609883282945822975931592141) + (44797955726860071483167773525787460171685721903803276437396496681708013097206, 2913765770075397405370959961441174073853632726560954156174638184932903079522) = 0
(95200151225387174391707134980196577229773167465894787919263504089948495725202, 94213123740092242124032541289267941722641721980066680728855126898974205181980) + (95200151225387174391707134980196577229773167465894787919263504089948495725202, 94213123740092242124032541289267941722641721980066680728855126898974205181980) = (5909177817561749019375996132097716007690336893057112295739767175467136927121, 32162989297956602751967132742255814558956860587998309119003795115938320862381)
(24050370140998638157368766089090079788245793492514664296883668741529047882113, 14478882322437672032054487172211630444001167135141445302555096737662467817571) + (15045863282447234231848775263852322721143017336655001075698483887751182719636, 14478882322437672032054487172211630444001167135141445302555096737662467817571) = (76695855813870323034353443655745505343881173836470898666875431378628604069914, 101313206914878523391516497836476277409268817530499118736902487270246366854092)
(14256779447437936128616290794341059890063336098474125854681710102809814868320, 90566103014364716248988921534849031279541603477816641946022463390335657035131) + (2303067510121489830312323422056091166740725427601969990117485452141659178613, 25225986222951479174582063473838876573728381187823922093435120617573177636532) = (95430772898311369787541983276504378677140303663720683940530878996024106515165, 48068184564993462938397020947826677061288691733511084479824032705110581338856)
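For reference, here is one possible solution sketch in Python, using the interpreter's built-in big integers as the exercise allows. It is checked against the second test vector above (the $P + (-P) = 0$ case), plus an on-curve check for the doubling case:

```python
# Affine point addition for secp256k1. Points are (x, y) tuples; None
# represents the point at infinity (written "0" in the test vectors).
p = 2**256 - 2**32 - 977   # the secp256k1 field prime

def affine_add(p1, p2):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = 0
    if (x1, y1) == (x2, y2):
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p     # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

# Second test vector: same x, opposite y, so the sum is the point at infinity.
x = 44797955726860071483167773525787460171685721903803276437396496681708013097206
y1 = 112878323467240798018200025047246733779416351939079609883282945822975931592141
y2 = 2913765770075397405370959961441174073853632726560954156174638184932903079522
assert affine_add((x, y1), (x, y2)) is None
# Doubling the first point must land back on the curve y^2 = x^3 + 7.
xd, yd = affine_add((x, y1), (x, y1))
assert (yd * yd - xd**3 - 7) % p == 0
```

Note the three-argument pow with exponent -1, which computes a modular inverse directly (Python 3.8+).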

Exercise 2. (Optional) Refer to this page with Jacobian point addition algorithms, and implement Jacobian point addition for secp256k1. You can also pick a formula from the EFD site linked above, but be aware that you'll need to detect whether the input might be a $P + P$ or $P + (-P)$ case, which need special treatment.

Test vectors. All inputs are given as decimal Jacobian (X, Y, Z) tuples. The outputs are given as affine coordinates (as the Jacobian representation of the result is not unique).

(61168739479711927142764658335960185139044138470269152817362835609619277248733, 21365265259791813296359020025112135293342760115353080382870338561918313862807, 37064183328797598544560694959943799168750358913858865780091974718018553562419) + (75776791705958340557958402430698975706422201066979121642449913138944604425660, 66383280047496136929271400526347103822935621943780462161181840552194350141564, 75975606300704613123930174557625573844043347281105167940536468038500802717509) = (72863032945283280953636129059545959529634042357753453132026174732744194676931, 111529132148508388427246132585785101600429639308058372390604751264868469767543)
(89959325059742944430358451400705002920926825355225869621717936807494095714290, 96093053924735119484524007701924861311484651710593769022900107977673928960245, 66142611799578950251083409575885695298839488135797694779041885661190727675299) + (61152793683249667605361745755257610395039301799655537107480658643593848781730, 108824838086741573141078213715633247883899533027170274847878148878014138167046, 20026567909062914103680712539641599080083135680565932483453732406779235372092) = 0
(1547568827951595983041825486208171785819741431893371520256763714464258127790, 87153109579099129796596751254693228766379983077346253255841414029284516911078, 105104885998309941273615701006706417602105584887217436384728254947105995740715) + (102754269592907928248165438489539780821724369832426272173645274109108284691770, 38298190034438650883752719589335983487411860447931052099125319988280170002045, 56745928453254477537417735654158445415425453625586007664329168279192608303666) = (21324256287414615615026299379536579336529998865990184416926039607504524853626, 96719670966356830360698314514227297774284915420887284954650836535688914930874)
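One possible solution sketch in Python, using general Jacobian addition formulas with the $P + P$ and $P + (-P)$ cases detected explicitly, as the exercise warns. Rather than copying the vectors above, it self-checks against the standard secp256k1 generator:

```python
# Jacobian point addition for secp256k1. Points are (X, Y, Z) tuples with
# affine coordinates x = X/Z^2, y = Y/Z^3; None is the point at infinity.
p = 2**256 - 2**32 - 977

def jac_double(pt):
    x, y, z = pt
    s = 4 * x * y * y % p
    m = 3 * x * x % p                 # a = 0 for secp256k1
    x3 = (m * m - 2 * s) % p
    return (x3, (m * (s - x3) - 8 * y**4) % p, 2 * y * z % p)

def jac_add(p1, p2):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1, z1), (x2, y2, z2) = p1, p2
    u1, u2 = x1 * z2 * z2 % p, x2 * z1 * z1 % p
    s1, s2 = y1 * z2**3 % p, y2 * z1**3 % p
    if u1 == u2:
        if (s1 + s2) % p == 0:
            return None               # P + (-P)
        return jac_double(p1)         # P + P
    h, r = (u2 - u1) % p, (s2 - s1) % p
    x3 = (r * r - h**3 - 2 * u1 * h * h) % p
    y3 = (r * (u1 * h * h - x3) - s1 * h**3) % p
    return (x3, y3, h * z1 * z2 % p)

def to_affine(pt):
    x, y, z = pt
    zi = pow(z, -1, p)
    return (x * zi * zi % p, y * zi**3 % p)

# Self-checks against the standard generator G.
gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
g = (gx, gy, 1)
assert jac_add(g, (gx, p - gy, 1)) is None          # G + (-G) = 0
g2 = jac_add(g, g)                                  # doubling path
x2a, y2a = to_affine(g2)
assert (y2a * y2a - x2a**3 - 7) % p == 0            # 2G is on the curve
assert to_affine(jac_add(g2, g)) == to_affine(jac_add(g, g2))  # 3G both ways
```

Comparing equal points requires normalizing the Z factors first ($X_1 Z_2^2$ versus $X_2 Z_1^2$, etc.), which is exactly what makes the special-case detection necessary.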

Exercise 3. Implement scalar multiplication for secp256k1, using the affine or Jacobian point addition you implemented above.

Test vectors.

23529072936145521956642440150769408702836782170707519110832596096096916532363 * (94777218176490725267733209794395406270863807953747235979017564313980479098344, 53121120406880321033414824968851949358991212541220678285657788880408683486672) = (81492582484984365721511233996054540050314813088236204730182464710703690737195, 84165397430175583340352582740254662715932722835371860159802475562062898918484)
77770687059601253501098075906318324640585620643934538062621691587089455400301 * (5187380010089560191829928600869675928625207216422014112981972591844926771008, 75026050083095897004323393777174635055491620440662638678606562665317466685019) = (76999255841974189685876230118581110410155956505185745130247574937430232984638, 87571171775685157828750403037960903210473289232782306139148947195874900187006)
3747619523960563074315083315669137577217731866086110333821423552891044218266 * (66371586610273545144505648512343824229224003523952192165787799288317344396675, 6489011411151914877089190610663845093649879070897583530615192453262848111419) = (109441138145498884726545575659592733193661671281368885246963601136369148387669, 83708880322787879701338478937074052809697986569225329829504559758598509123336)
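A possible solution sketch in Python: plain double-and-add on top of affine addition. This is variable time, so suitable only for exercises and testing, never for real private keys. Instead of copying the vectors above, it self-checks using the curve's group order $n$, for which $n \cdot G$ must be the point at infinity:

```python
# Scalar multiplication via double-and-add (NOT constant time).
p = 2**256 - 2**32 - 977
n = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def affine_add(p1, p2):
    if p1 is None: return p2
    if p2 is None: return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if (x1, y1) == (x2, y2):
        lam = 3 * x1 * x1 * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mul(k, pt):
    # Process k bit by bit: add the current power-of-two multiple of pt
    # whenever the corresponding bit is set, doubling as we go.
    result, addend = None, pt
    while k:
        if k & 1:
            result = affine_add(result, addend)
        addend = affine_add(addend, addend)
        k >>= 1
    return result

G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
assert scalar_mul(1, G) == G
assert scalar_mul(n, G) is None        # the group order annihilates G
xq, yq = scalar_mul(12345, G)
assert (yq * yq - xq**3 - 7) % p == 0  # result is on the curve
```

This is the naive schoolbook algorithm; the ecmult layers discussed earlier exist precisely because it is far from the fastest (or safest) way to do this.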

Exercise 4. Let $P = (-6^\frac{p+2}{9}, 1)$ be a point on the secp256k1 curve. Let $Q$ be the point on the curve for which $3 \cdot Q = 5 \cdot G - \lambda \cdot P$. What are the affine coordinates of $Q$?

The last 3 decimal digits of its X coordinate are 452.

Exercise 5. (Bonus) Define $P_0 = G$ and $P_{i+1} = P_i + P_i$. What is the first integer $i > 0$ for which $P_i = G$?

Hint: if you for whatever reason find yourself needing to factor a largish number, try Wolfram Alpha.

The last 3 decimal digits of the answer are 349.

Changelog