Skip to main content
Table of contents
Luna Tong

MPC From Scratch: Everyone Can Do it!

Building an implementation of garbled circuits from the ground up
Article heading

Suppose you and your friend are millionaires. You want to find out who is richer — but without revealing either of your net worths to each other. In other words, you’d like to jointly compute the function aba \geq b, given two numbers aa and bb, without either party knowing the other’s number. This is a classic motivating problem: the Millionaires’ problem.

MPC provides a solution for problems like this. MPC, or multiparty computation, is about how multiple parties can do shared computations on private inputs without revealing those private inputs. In fact, MPC allows us to securely compute jointly any function f(x1,x2,x3,)f(x_1, x_2, x_3, \cdots) where any subset of the inputs can be private for either party. (There are MPC protocols for more than two parties, but we’ll just cover two-party MPC in this post.)

This sounds cool, but is it possible? MPC is actually pretty easy to implement. Let’s walk through a basic implementation of MPC from scratch. In just a few hundred lines of code, we’ll be building all of the primitives needed to execute Yao’s Garbled Circuit protocol from the ground up.

Generating Primes

To do MPC, we’ll need asymmetric crypto. The easiest asymmetric primitive is probably RSA, and to do RSA, we’ll need to generate some prime numbers.

We can generate primes through guess and check. However, we first need a way to quickly check if a number is prime. A number of tests exist, but a common one is the Rabin-Miller primality test. It is a simple test based on elementary number theory, and it’s quite easy to implement:

def rabin_miller(n, k=40):
if n == 2:
return True
if n % 2 == 0:
return False
r, s = 0, n - 1
while s % 2 == 0:
r += 1
s //= 2
for _ in range(k):
a = random.randrange(2, n - 1)
x = pow(a, s, n)
if x == 1 or x == n - 1:
continue
for _ in range(r - 1):
x = pow(x, 2, n)
if x == n - 1:
break
else:
return False
return True

Let’s see if it works.

>>> rabin_miller(101)
True
>>> rabin_miller(1234567891234567)
False
>>> rabin_miller(2**2203-1)
True

Cool. This is a probabilistic algorithm, where parameter k determines how sure you are. This is because for the random bases aa‘s we choose, there’s a small false positive rate. It’s also interesting to note that if the bases can be predicted ahead of time, it’s possible to forge pseudoprimes that pass the test but are actually composite. We’re not too concerned here though because our prime candidates are randomly generated.

In practice, something like k=40 works fine for RSA-sized numbers. It runs in polylog time, so we can use this for our guess and check. In practice, it’s much faster to first sieve out any small primes first before running the more expensive full test.

def gen_prime(n=2048):
while True:
p = randbits(n)
p |= 1 # we only want odd numbers
for p in SMALL_PRIMES:
if n % p == 0:
break
else:
if rabin_miller(p):
return p

RSA

Now that we have prime numbers, we can do RSA. RSA is pretty simple, so let’s just do it.

def gen_rsa_params(n=2048):
p, q = gen_prime(n//2), gen_prime(n//2)
N = p * q
e = 65537
phi = (p-1)*(q-1)
d = pow(e, -1, phi)
return e,d,N

Python 3.8 and above supports using pow(x, -1, N) for modular inverse. Before, you could use the extended Euclidean algorithm to do it.

As always, encryption is pow(m,e,N) and decryption is pow(c,d,N). Let’s test it out:

>>> e,d,N = gen_rsa_params(32) # small N for illustrative purposes
>>> (e, N) # public key
(65537, 2393534501)
>>> d # private key
2037798773
>>> m = 123456789 # message
>>> c = pow(m,e,N) # encryption
>>> c # ciphertext
1967143072
>>> pow(c,d,N) # decryption
123456789

Note: This is textbook RSA, which is bad. In practice, securely using RSA is not easy and requires things like padding, generating random numbers securely, avoiding side channels, and more. In general, you shouldn’t roll your own crypto and should instead use libraries authored by experts, and so on. (But this is just for learning purposes, so let’s move on.)

Oblivious Transfer

Now that we have RSA, we can build oblivious transfer. Oblivious transfer is a building block for MPC. In fact, it’s complete for MPC, meaning you can build any MPC with just oblivious transfers.

In an oblivious transfer, there are two parties: the sender and the receiver. The sender Alice has two messages. She wants to let the receiver Bob choose to receive one of these two messages without revealing the other. However, the receiver also doesn’t want to reveal which message he chose.

Here’s a more intuitive analogy. Suppose you’re the sender, and I’m the receiver. You put two messages into two identical-looking boxes. Then, you turn around and I take one of the boxes. When you turn back around, you can’t tell what box I chose, and I don’t know what’s in the other box.

To accomplish this with cryptography, we’ll build a 1–2 oblivious transfer. The numbers 1 and 2 denote that the receiver can choose one out of two messages to receive. First, let’s consider the following protocol. This protocol doesn’t work, and we’ll see why:

  1. Alice has messages m0,m1m_0, m_1. Bob wants to receive mbm_b where bb is 0 or 1.
  2. Alice generates two random paddings x0,x1x_0, x_1, which are both given to Bob.
  3. Bob generates a random pad kk as well.
  4. Bob chooses one of the two pads, xbx_b, then blinds it with his own pad: xb+kx_b + k. This is returned to Alice.
  5. Alice can’t tell which xx Bob picked, because kk is unknown to her. Alice subtracts both of her own paddings to get (xb+k)x0(x_b + k) - x_0 and (xb+k)x1(x_b + k) - x_1. One of these will be equal to kk, and the other one will be meaningless to Alice (e.g., x1+kx0x_1 + k - x_0).
  6. Alice takes both values and pads both messages with them: m0+xb+kx0m_0 + x_b + k - x_0 and m1+xb+kx1m_1 + x_b + k - x_1. One of these will be equal to mb+km_b + k. These are both sent to Bob.
  7. Bob unpads kk from both messages. One of them will be mbm_b, and the other one will be mb+xbxbm_{b'} + x_b - x_{b'} (where bb' is the boolean inverse of bb).

In this protocol, Bob successfully received his message mbm_b without revealing which one he chose. However, there’s a problem: Bob can also figure out the other message, too! Since Bob has both x0x_0 and x1x_1, he can also recover mbm_{b'}. This breaks the desired property that the sender will only reveal the message that was chosen.

We can fix this with asymmetric crypto. The problem lies in Step 5, when Alice pads both messages before sending. The issue is that Bob can compute the value of the padding for both messages. We could fix this by having Alice perform some kind of private key operation (i.e., decryption) on her values before applying them to the message. A naive attempt might look like m0+(xb+kx0)dm_0 + (x_b + k - x_0)^d; however, this doesn’t work.

The issue is this: how would Bob remove Alice’s padding now? Without the private key, he has no way to compute or predict its value. To fix this, we’ll have Bob encrypt his blind kk before using it in Step 3. Now, when Alice decrypts the pads later in Step 5, this will cancel out Bob’s encryption, leaving just kk for one of the pads, which Bob knows. For the other pad, it will decrypt into meaningless junk. Now, Bob will be able to predict Alice’s pad for his message, while Alice’s pad for the other message remains undecipherable.

Here is the revised protocol in full:

  1. Alice has messages m0,m1m_0, m_1. Bob wants to receive mbm_b where bb is 0 or 1.
  2. Alice generates two random pads x0,x1x_0, x_1, which are both given to Bob.
  3. Bob generates a random pad kk as well. He encrypts it with Alice’s RSA public key: kek^e.
  4. Bob chooses one of the two pads, xbx_b, and then blinds it with his own encrypted pad: xb+kex_b + k^e. This is returned to Alice.
  5. Even with the private key, Alice can’t tell which xx Bob picked because kk could be anything. Alice subtracts both of her own paddings to get (xb+ke)x0(x_b + k^e) - x_0 and (xb+ke)x1(x_b + k^e) - x_1. One of these will be equal to kek^e, and the other one will be meaningless to Alice (e.g., x1+kex0x_1 + k^e - x_0).
  6. Alice decrypts both pads. One will be equal to kk, and the other one will be meaningless. Alice takes both values and pads both messages with them: m0+(xb+kex0)dm_0 + (x_b + k^e - x_0)^d and m1+(xb+kex1)dm_1 + (x_b + k^e - x_1)^d. One of these will be equal to mb+km_b + k. These are both sent to Bob.
  7. Bob unpads kk from both messages. One of them unpads to mbm_b. Alice’s other message is useless to Bob since he has no way to predict the value of its pad, (xb+kexb)d(x_b + k^e - x_{b'})^d without the private key.

In this new fixed version of the protocol, Bob is able to receive his message securely, while Alice only reveals one of her two messages!

This protocol is computationally secure. It isn’t information-theoretically secure, meaning Bob can use guess and check to try to guess Alice’s message. However, with sufficiently sized messages, this is unlikely to work. There are information-theoretically–secure OT protocols (e.g., cut-and-choose–based ones).

Now, let’s implement it in Python. Since it’s an interactive protocol, we will model the communication between Alice and Bob using coroutines, using yield to pass messages.

def oblivious_transfer_alice(m0, m1, e, d, N, nbits=2048):
x0, x1 = randbits(nbits), randbits(nbits)
v = yield (e, N, x0, x1) # send parameters to bob, wait for his reply
k0 = pow(v - x0, d, N)
k1 = pow(v - x1, d, N)
m0k = (m0 + k0) % N
m1k = (m1 + k1) % N
yield m0k, m1k # reply with the two padded messages

def oblivious_transfer_bob(b, nbits=2048):
assert b in (0, 1)
e, N, x0, x1 = yield # receive params from Alice
k = randbits(nbits)
v = ((x0, x1)[b] + pow(k, e, N)) % N
m0k, m1k = yield v # send (x_b + k^e) to alice, wait for her reply
mb = ((m0k, m1k)[b] - k) % N
yield mb # message received!

Each coroutine simulates one of the parties in the transfer. We can use the coroutines like this:

def oblivious_transfer(alice, bob):
e, N, x0, x1 = next(alice)
next(bob) # run Bob until first yield statement
v = bob.send((e, N, x0, x1))
m0k, m1k = alice.send(v)
mb = bob.send((m0k, m1k))
return mb

Now that we have our basic MPC building block, we can use oblivious transfers to build any MPC we want. That’s exactly what we’re going to do next with garbled circuits.

Garbled Circuits

Instead of trying to explain garbled circuits from the top down, let’s build it from the bottom up. The overall idea is that we use 1–2 oblivious transfer to evaluate individual logic gates in an MPC-like way. Then just draw the rest of the owl. Good job, now you have general-purpose MPC.

Computing a Single Gate

First, let’s try to MPC an AND gate. Our gate involves three wires: the inputs A and B and the output wire. Let’s say Alice knows AA and Bob knows BB and they want to jointly compute OutOut.

AND gate

Here’s how we’ll do this. For all three wires for each boolean value 0 and 1, choose a random kk-bit number where kk is a security parameter. Let’s call these labels. Each wire has two labels.

  • A0=12214430034326763207138511821778...A_{\color{blue}0} = 12214430034326763207138511821778...
  • A1=15587367908968267934296553115483...A_{\color{red}1} = 15587367908968267934296553115483...
  • B0=66560761019031368742675262660503...B_{\color{blue}0} = 66560761019031368742675262660503...
  • B1=B_{\color{red}1} = \cdots
  • Out0=Out_{\color{blue}0} = \cdots
  • Out1=Out_{\color{red}1} = \cdots

Okay. Now let’s look at the truth table:

ABOut
0\color{blue}00\color{blue}00\color{blue}0
0\color{blue}01\color{red}10\color{blue}0
1\color{red}10\color{blue}00\color{blue}0
1\color{red}11\color{red}11\color{red}1

For each row, take the label of the output wire value. Then, using a symmetric cipher, encrypt the label with a key derived from the labels for the row’s input wire values.

Out
EncA0,B0(Out0)Enc_{A_{\color{blue}0},B_{\color{blue}0}}(Out_{\color{blue}0})
EncA0,B1(Out0)Enc_{A_{\color{blue}0},B_{\color{red}1}}(Out_{\color{blue}0})
EncA1,B0(Out0)Enc_{A_{\color{red}1},B_{\color{blue}0}}(Out_{\color{blue}0})
EncA1,B1(Out1)Enc_{A_{\color{red}1},B_{\color{red}1}}(Out_{\color{red}1)}

Also, shuffle the table so you can’t trivially tell which logic inputs each value corresponds to.

Out
EncA0,B1(Out0)Enc_{A_{\color{blue}0},B_{\color{red}1}}(Out_{\color{blue}0})
EncA1,B1(Out1)Enc_{A_{\color{red}1},B_{\color{red}1}}(Out_{\color{red}1})
EncA1,B0(Out0)Enc_{A_{\color{red}1},B_{\color{blue}0}}(Out_{\color{blue}0})
EncA0,B0(Out0)Enc_{A_{\color{blue}0},B_{\color{blue}0}}(Out_{\color{blue}0})

This is the eponymous garbled table for the AND gate.

Here’s where the MPC protocol starts. Alice gives this garbled table to Bob along with the label for her input, either A0A_0 or A1A_1. This leaks no information to Bob, as he doesn’t know what the labels correspond to. To Bob, they’re just random numbers. How can Bob evaluate this garbled table?

Assume for now that Bob magically also has the label for his input, either B0B_0 and B1B_1. With that, he can go through all four entries in the garbled table, performing trial decryptions until one succeeds. Then, he’ll have some output label, either Out0Out_0 or Out1Out_1. As a concrete example, if Alice’s value was 0 and Bob’s value was 1, he’d be able to decrypt EncA0,B1(Out0)Enc_{A_0,B_1}(Out_0) to get Out0Out_0. Bob sends this label back to Alice.

At this point, Alice knows the gate output was 0. Keep in mind she doesn’t know whether Bob had 0 or 1 — she only knows the result. Alice shares this result with Bob as well. If Bob doesn’t want to blindly trust Alice’s result, they can run the whole protocol again with switched roles, with Bob garbling and sending the table.

Overall, the protocol works, except for one snag. Remember how we assumed that Bob magically knew what his input label was? How is he supposed to ask Alice for that without revealing his input? Alice could send both over, but then Bob could use that to extract information from the garbled table. In our example, Bob could now decrypt two rows of the table. Seeing that the values are the same, he can infer that Alice’s value must be zero, as the decrypted labels are the same value (if Alice’s value were 1, the outputs would be different).

To fix this, we use the oblivious transfers that we just developed previously! We will have Bob receive his label from Alice using a 1–2 oblivious transfer. Alice will let Bob choose between B0B_0 and B1B_1; Alice won’t know which one Bob chose, while Bob will only learn his input value and not the other.

As a final note, in this example, Bob (whose input was 1) can still infer that Alice’s input was 0 based on the output. But this is not a weakness of the protocol; it’s due to the computation itself (the AND function). If we had a circuit with more input bits, say a comparator, each party learns relatively little information about the other’s input.

Implementation for a Single Gate

Okay, let’s actually code the thing.

First, we need a symmetric crypto primitive. At the risk of going against the title of the blog post, I’m just going to import AES. There are simpler primitives (fan favorites include TEA and RC4) that would fit nicely in this post, but these algorithms are weak. (Anyways, this isn’t the main focus of the article, so let’s move on.)

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad
from Crypto.Util.number import long_to_bytes, bytes_to_long

def symmetric_enc(key, x):
cipher = AES.new(key, AES.MODE_CTR)
ciphertext = cipher.encrypt(pad(long_to_bytes(x), 16))
nonce = cipher.nonce
return ciphertext, nonce

def symmetric_dec(key, ciphertext, nonce):
cipher = AES.new(key, AES.MODE_CTR, nonce=nonce)
x = bytes_to_long(unpad(cipher.decrypt(ciphertext), 16))
return x

Let’s try it out:

>>> key = b'This is my key__'
>>> ciphertext, nonce = symmetric_enc(key, 12345)
>>> ciphertext.hex(), nonce.hex()
('db6f1eef1b94bbc04016176f8dfbdd4765b7ab61355d087d8e869c15e77b8e29', '7ad8c3a7dff064e2')
>>> symmetric_dec(key, ciphertext, nonce)
12345

Okay!

To combine two labels into a single key, we can use some hash function. I’m going to use Keccak.

from Crypto.Hash import SHA3_256

def combine_keys(keys, k=128):
h = SHA3_256.new()
for ki in keys:
h.update(long_to_bytes(ki))
return h.digest()

Now here is the actual table garbling. First, we need to label the truth table and also keep track of what wires and logic values each label corresponds to.

def label_table(logic_table, output_name, input_names, k=128):
labels = {}
# generate 0 and 1 labels for each var
for var in (output_name, *input_names):
labels[var] = [randbits(k), randbits(k)]

labeled_table = []
# some itertools b.s. to support arbitrary number of input wires
# number of input wires = number of dimensions of the logic table
for inp_values in itertools.product((0,1), repeat=len(input_names)):
output_value = functools.reduce(operator.getitem, inp_values, logic_table)
output_label = labels[output_name][output_value]
input_labels = [labels[input_names[i]][v] for i, v in enumerate(inp_values)]
labeled_table.append((output_label, input_labels))
return labeled_table, labels

Let’s try it out. We’ll set k=5k=5 so the labels are small, for illustrative purposes.

>>> labeled_table, labels = label_table([[0, 0], [0, 1]], "out", ["A", "B"], k=5)
>>> labels
{'out': [30, 27], 'A': [28, 19], 'B': [17, 30]}
>>> labeled_table
[(30, [28, 17]), # 0 = 0 & 0
(30, [28, 30]), # 0 = 0 & 1
(30, [19, 17]), # 0 = 1 & 0
(27, [19, 30])] # 1 = 1 & 1

Okay. Now let’s garble this table:

def garble_table(labeled_table, k=128):
result = []
for row in labeled_table:
output_label, input_labels = row
key = combine_keys(input_labels)
c, nonce = symmetric_enc(key, output_label)
result.append((c, nonce))
Crypto.Random.random.shuffle(result)
return result

Let’s test it out:

>>> garble_table(labeled_table)
[(b'\x02\xd6H\x01\xb2\xc9\xbe*"\x842\xb77\xf5\x81\xd2\xd2\x00\xbb\xb9\xcc\xce5\xe5\x89\x94\xc7"w\x03\x14>', b'\xdc\x0bWT\x8cM\xda\x01'),
(b'\x9f\xd7\xe71\x922V\x92s\x90Z\x81tY\x0f5\xa2K/\xb7\x10\xf5\xbec\xd0\xff&\x05\x82\xc2\x80 ', b'\x95[\xf6Q&kZE'),
(b'\xe7A\x8e\xf2#+\xcd\xc9\xa2\xe8\xb4p\xbdd\x93_\\\xf7U\xdc\x01u\xb6,JW\x18\xa2\xb5~\xe6\xd6', b'\xc1\xde\xca\xc5\xe90tu'),
(b'\xfa\xdc\x9d\xe5&\x81\x03q\xf4\x15\x1c%\xc2t\xb0\xbf\x0b0Vk\xbdNm\xe8CT\xb3\x01\xd7\x82\xe7\x12', b'J,q)B\x88\xe9r')]

The garbled table looks like an encrypted blob, as expected. To evaluate the garbled table:

def eval_garbled_table(garbled_table, inputs):
for row in garbled_table:
ciphertext, nonce = row
try:
key = combine_keys(inputs)
output_label = symmetric_dec(key, ciphertext, nonce)
except ValueError: # decryption failed, incorrect padding
continue
return output_label

>>> a, b = 1, 1
>>> eval_garbled_table(garbled_table, [labels['A'][a], labels['B'][b]])
27
>>> labels['out'][1] # verify this matches the label for out=1
27

It works! Let’s also try it out on a NOT gate. With both AND and NOT, we are functionally complete.

>>> labeled_table, labels = label_table([1, 0], "out", "x", k=5)
>>> labels
{'out': [23, 25], 'x': [31, 24]}
>>> labeled_table
[(25, [31]), (23, [24])]
>>> garbled_table = garble_table(labeled_table)
>>> eval_garbled_table(garbled_table, [labels['x'][0]])
25
>>> labels['out'][1]
25

Nice! However, there is a big problem here with our symmetric crypto. Every once in a while, the decryption succeeds on the wrong row in our garbled table by accident. Due to chance, the decryption had a valid padding, and our garbled table evaluator produces the wrong result. How can we prevent this?

Preventing False Positives on the Trial Decryption

In some sense, our padding is acting like a checksum by making it unlikely for random decryptions to succeed. Here’s an example:

>>> from Crypto.Util.Padding import pad, unpad
>>> pad(b'test', 16)
b'test\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c'
>>> unpad(b'test\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c', 16)
b'test' # <-- successful unpadding
>>> unpad(b'test\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0d', 16)
only one byte changed! ^
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "python3.10/site-packages/Crypto/Util/Padding.py", line 95, in unpad
raise ValueError("PKCS#7 padding is incorrect.")
ValueError: PKCS#7 padding is incorrect.

This is kind of like a form of error detection. Maybe we can strengthen this by using a checksum. We could use a simple one like Fletcher’s checksum or CRC-32. Unfortunately, while this reduces the error rate, we still encounter instances where the checksum is accidentally valid.

Then what about a cryptographic hash function like SHA-2? This will guarantee the absence of cases where the message is accidentally valid. So let’s just tack on a hash at the end, right? Maybe, but in general, it is very easy to get this kind of thing wrong. What we really want is an authenticated mode of encryption. Authenticated encryption protects not just the confidentiality of the plaintext but also the authenticity (i.e., integrity).

We can construct authenticated encryption using say, HMAC paired with our block cipher, but it is easier to reach for GCM. GCM is performant and gives us authenticated encryption with just AES without having to pull in another cryptographic primitive. Modifying our code to use GCM instead of CTR mode is simple. Here is the updated code:

def symmetric_enc(key, x):
cipher = AES.new(key, AES.MODE_GCM)
ciphertext, tag = cipher.encrypt_and_digest(pad(long_to_bytes(x), 16))
nonce = cipher.nonce
return ciphertext, tag, nonce

def symmetric_dec(key, ciphertext, tag, nonce):
cipher = AES.new(key, AES.MODE_GCM, nonce=nonce)
x = bytes_to_long(unpad(cipher.decrypt_and_verify(ciphertext, tag), 16))
return x

Now our code isn’t relying on the padding to detect invalid decryptions (which it isn’t designed to do!). Instead, we are relying on GCM, which is designed specifically to solve this problem. Note that instead of just storing the ciphertext and nonce, we now also need to store a tag. You can think of this tag as like a MAC that acts as an integrity check on the message.

Computing Two Gates

Now we can garble and evaluate one gate. What about two gates in series?

It’s basically the same principle as before. We generate labels for logic values 0 and 1 for all the wires: the input wires A, B, and C; the intermediate wire X; and the output wire Y. Then we garble the two truth tables, one for each gate.

To evaluate the circuit, we evaluate the gates in order, starting from the inputs and working our way forward. So first we would evaluate the garbled table for Gate 1 to get the decrypted label for wire X; then, we can evaluate the garbled table for Gate 2 to get the decrypted label for wire Y.

Supporting N Gates

Now we have the base case, and we have the inductive case. We can make any arbitrary circuits we want, basically.

We will represent our circuit as a directed graph, where the edges are wires and the nodes are gates. This is commonly referred to as a netlist.

The garbled version of our circuit is what Alice transmits to Bob. It’s also a directed graph with the same connectivity as the original circuit, but nodes are garbled tables rather than gates. The network (and its gates) may have any number of input and output wires. Some of the inputs are private to Alice and some are private to Bob. As with the 1-gate example, Alice openly provides the labels for her input wires to Bob. For Bob’s input wires, he receives them from Alice with oblivious transfer.

To evaluate the circuit, Bob proceeds from the input wires and iterates through the gates in topological order. In topological order, when a node is visited, all of its predecessors have already been visited. In our case, this means that by the time we evaluate a garbled table, all of its input wires’ labels are available.

You can find the code implementation of this algorithm here.

Logic Synthesis

This is great, but surely we don’t expect ourselves to manually build all of the computations we want to MPC using gate-level logic, right? Maybe it’s easy to manually write something like an N-bit comparator or full adder, but what about complex calculations? If we wanted to MPC the following — Sign(message, key=Hash(Alice key, Bob key)) (to make a 2-of-2 multi-sig, for example) — surely we shouldn’t have to reimplement the asymmetric crypto and hash functions from just gates, right? Luckily, this is a solved problem. We can write high-level code in an HDL like Verilog and synthesize it down to gate-level logic.

If you’re unfamiliar with hardware design, an HDL is a programming language but for circuit design. Be careful with this analogy, because several familiar programming language concepts don’t carry over to circuits. For example, we don’t really have “loops” in the traditional sense; we can pretend to have a loop by unrolling it a finite number of times. And unlike variables in programming languages, wires should not be assigned (i.e., driven by) multiple things. In some sense, everything is globally value numbered, like in SSA form. If you’ve done symbolic execution or stuff with SMT solvers you’ll be familiar with these differences.

For the actual synthesis, there are popular open-source tools, like Yosys. Yosys also handles technology mapping. The main idea of technology mapping is we tell the synthesizer what gates / IP blocks / cells we have to work with, and the synthesizer will try its best to generate a good design with those building blocks. If you’ve written LLVM backends before, I like to think of this as like LLVM’s pattern matching and lowering.

As our first nontrivial MPC example, let’s say Alice and Bob have two 32-bit numbers x and y and wish to see if they are multiplicative inverses of each other. We can write this logic easily in Verilog:

// circuit.v
module mycircuit(x, y, out);
input wire [31:0] x;
input wire [31:0] y;
output wire out = (x * y) == 1;
endmodule

And we can use the following Yosys script to synthesize it down to gates:

# read design
read_verilog circuit.v

# main logic synthesis step
synth

# technology mapping step. tells yosys what gates our "platform" supports
# we only have AND and NOT gate implemented, so tell abc this
abc -g AND # 'NOT' is always automatically added

# adjust the design so it only uses 1-bit wires
splitnets -ports -format _

# misc cleanup
clean_zerowidth
clean -purge

# save lowered design
write_verilog -simple-lhs -noattr out.v

The generated gate-level design looks like this:

And for fun, if we visualize it with Graphviz, here’s what it looks like. The 32-bit design is way too big to show, so here is a 4-bit version:

Certainly we are glad to have Yosys generate this for us rather than code it by hand. And this is just a simple multiplier. Imagine how complex the design would be for a more complicated computation!

Putting It All Together

At this point, we’re ready to actually do our MPC. Again, in our example, Alice and Bob want to confirm that their two 32-bit numbers, XX and YY, are multiplicative inverses of each other. Alice’s input XX is 1185372425, and Bob’s input YY is 1337. The circuit will be out = (x * y) == 1. Let’s test it out.

We can see all of the steps in action now. First, it creates all of the gates in the circuit. Then, it assigns labels to each wire. Bob receives his input labels using oblivious transfer, then evaluates all of the gates in topological order. Finally, he gives the input label back to Alice, who publishes the result: out=1. Hooray!

Now let’s study some of the ways we can make our implementation more efficient.

Optimization 1: Avoiding Trial Decryptions

For each gate, we have to do a worst-case of four trial decryptions, or 2.5 decryptions in the average case, before finding the correct entry in the garbled table. This is inefficient, and we can do better.

We can avoid trial decryptions by encoding information in each row as to what input keys it corresponds to. This method is described using fancy math equations in Section 5.1 of this paper, but I’ll outline it here below in plain English.

First, recall that the each ciphertext is encrypted with a joint key derived from two input labels, one label for each wire. For each input wire, there are two labels: one for logic 0 and one for logic 1. WLOG, we can generate the two labels to always have different least significant bits (LSBs). This doesn’t leak information about the actual logic value because the label for logic 0 may have either LSB 1 or 0; the labels are random after all. Then, for each table entry, we append the LSBs of both labels to the ciphertext:

Out
EncA0,B1(Out0)LSB(A0)LSB(B1)Enc_{A_{\color{blue}0},B_{\color{red}1}}(Out_{\color{blue}0}) \| LSB(A_0) \| LSB(B_1)
EncA1,B1(Out1)LSB(A1)LSB(B1)Enc_{A_{\color{red}}1,B_{\color{red}1}}(Out_{\color{red}1}) \| LSB(A_1) \| LSB(B_1)
EncA1,B0(Out0)LSB(A1)LSB(B0)Enc_{A_{\color{red}}1,B_{\color{blue}0}}(Out_{\color{blue}0}) \| LSB(A_1) \| LSB(B_0)
EncA0,B0(Out0)LSB(A0)LSB(B0)Enc_{A_{\color{blue}0},B_{\color{blue}0}}(Out_{\color{blue}0}) \| LSB(A_0) \| LSB(B_0)

Then, we can look for the row that has the matching LSBs for our input labels.

Even better, we can sort the table by the two LSBs (this is secure because the LSBs are random) for O(1)O(1) indexing as well. And at that point, we can also remove the appended LSBs, since the information is implicitly encoded in the permutation of the rows. With this optimization, we could elect to not use authenticated encryption either, since we should always have the right keys for the corresponding ciphertexts.

Optimization 2: Supporting More Gates

Our MPC implementation works, but the synthesized circuit is huge. To make our implementation more efficient, we’ll add support for some more gates beyond AND and NOT. Here are the gates we currently support:

def truth_table(gate):
if gate == 'and':
return [[0, 0], [0, 1]]
elif gate == 'not':
return [1, 0]
elif gate == 'const_0':
return 0
elif gate == 'const_1':
return 1
else:
raise ValueError('unsupported gate', gate)

The design that Yosys outputs is very big, because it has to synthesize everything out of just AND and NOT gates. But in terms of our MPC, our garbled circuit is very big and not very fast to evaluate — the depth of the circuit is very big. If we had more gate types to work with, Yosys would be able to synthesize a smaller circuit.

Technically, all we really need is NAND. In real life, there are some physical trade-offs between more complex cells versus cell size. If the cells are simpler, you can fit more into a given floor plan or die. The same principle applies to our MPC, too. For us, we can synthesize smaller circuits with fewer gates (depth) if we use more complicated truth tables. This affects the evaluation complexity. However, the number of inputs to a gate (width) affects the number of entries in the garbled table, which affects the communication complexity of our protocol.

More abstractly, we’re looking at a trade-off between circuit width and depth. We could encode our entire calculation in a giant truth table, but table size is exponential with width. We trade off some width by using several smaller gates — more depth but less width. This is the reason garbled circuits are so important over just oblivious transfer.

Anyways, we are already using AND gates, which have width 2 (= four truth table entries) and NOT gates, which have width 1 (= two truth table entries). When Yosys synthesizes NOT(AND(...)), this is really six table entries. A peephole optimization is to also add a NAND gate, which constant folds this outer NOT gate. But we can go further by supporting the common two-input gates. Let’s add support for the set of simple gates AND, NAND, OR, NOR, XOR, XNOR, ANDNOT, and ORNOT by hardcoding these truth tables in as well.

    # [...]
elif gate == 'nand':
return [[1, 1], [1, 0]]
elif gate == 'xor':
return [[0, 1], [1, 0]]
elif gate == 'nor':
return [[1, 0], [0, 0]]
# [[... you get the idea]]

And we update the abc call in our Yosys script:

# now we have more gates! please use them
# 'gates' is shorthand for AND,NAND,OR,NOR,XOR,XNOR,ANDNOT,ORNOT
abc -g gates

Now, if we look at the synthesized design, it’s remarkably smaller. Before the design had over 8,000 internal wires, whereas the new one only has 3,000. Not just that, if we calculate the total number of table entries in our garbled circuit, it’s also lower. The new design has 12,056 entries total, whereas the old one had 24,842. Adding support for these extra gates reduced the number of table entries by 50%!

Lastly, for fun, here’s the 4-bit design visualized—but using the new gate types.

Lookup Tables

We can do even better, however. ABC, the technology-mapping pass in Yosys, supports synthesizing directly to lookup tables (LUTs). Beyond thinking about gates, this is what we really want, because our garbled circuits are really just a network of truth tables.

We can specify how large we want our LUTs to be (i.e., the number of inputs). There is a width–depth trade-off here as bigger LUTs result in less circuit depth but more table entries. The number of entries in a nn-input lookup table is 2n2^n, so we expect the optimal nn to be small. Meanwhile, ABC mainly optimizes for circuit depth, at the expense of area.

nn# LUT2# LUT3# LUT4# LUT5# LUT6# LUT totalTotal LUT EntriesCircuit Depth
216072098642844
370994216381037226
46539204613761070819
533036010648912852154416
654825666804719503611612

It’s not surprising that we really feel the exponential cost of larger LUTs, so let’s stick to smaller LUTs. To do LUT synthesis, we can modify the abc call in our Yosys script like so:

abc -lut 4

The synthesized code looks like this:

  // [...]
assign _0735_ = 16'h7888 >> { x_14, y_1, x_15, y_0 }; // 4-way LUT
assign _0736_ = 8'h78 >> { _0737_, x_10, y_5 }; // 3-way LUT
assign _0737_ = 16'h7888 >> { x_11, y_4, x_12, y_3 };
assign _0738_ = 16'hb24d >> { _0739_, _0684_, _0677_, _0676_ };
// [...]

Further Optimizations

There are lots of ways to improve even further on our garbled circuits implementation. As further reading, there’s FreeXOR, which has zero-cost XOR gates; half-gates, which use two AESs per gate instead of four; and slicing and dicing, which uses 1.5 AES-block-lengths per gate. We won’t go into the details of these papers, but we recommend looking into them if you’re curious on how garbled circuits can be made more efficient for real-world use cases.

Discussion

Before we wrap up, let’s discuss a few things that are worth bringing up.

Combinatorial Logic

At this point, it’s worth pointing out that we have only combinatorial logic, not sequential. That means we have no latches or flip-flops. Because we lack sequential logic, we have no registers, and thus we can’t build anything stateful. CPUs are really just state machines. If we wanted to build a general-purpose MPC “CPU”, we would need to adapt our cryptographic primitives. We can simulate CPU cycles by unrolling the loop and duplicating the CPU design once per cycle, but this is very inefficient. One optimization here to reduce the duplication in the circuit is by exporting and reinputting any state as wires and running the garbled circuit multiple times (i.e., one evaluation per cycle).

Semi-Honest vs. Malicious Model

Our MPC implementation operates under the assumption that the participants are semi-honest. This means that the participants try everything they can to glean others’ private information, but they do not deviate from the protocol. This means, for example, that they do provide false or malicious inputs or responses into the protocol.

A more realistic model is the malicious model. In this model, an adversary corrupts and controls multiple parties in the computation and can follow any polynomial-time strategy to try to attack the protocol. We won’t cover this in this blog post, but garbled circuits can be extended to work under both regimes. For example, cut-and-choose techniques have the garbler commit to a set of circuits, of which the evaluator first opens a subset to check that they compute the correct function before using the unopened circuits for the remainder of the protocol.

Further Reading

Check out some of these resources:

Conclusion

In this post, we built a usable implementation of MPC in the form of Yao’s Garbled Circuit protocol. We did so from scratch, building up all of the necessary primitives along the way. First, we built up prime generation and RSA, and from that asymmetric primitive we built up 1–2 oblivious transfer. Using oblivious transfer as an MPC building block, we then developed garbled circuits. We explained how garbling and evaluating a single truth table works, before expanding that into a full circuit. Lastly, we reviewed how we can use logic synthesis and tools from hardware design for our MPC.

You can find the full implementation code for this project on GitHub. We encourage you to check it out!

Acknowledgments

I would like to thank Catherine (whitequark) for helpful feedback and advice about Yosys and the sections regarding hardware and logic synthesis.

I would also like to thank Avi Weinstock and Mohit Sharma from the Zellic cryptography team for useful feedback on the details of garbled circuits.

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.

Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.