This is the Move tutorial for real programmers. Everybody knows that real programmers write only in assembly. In this case, Move bytecode.
Jokes aside, with Move-based chains like Sui and Aptos growing in popularity, it’s useful for smart contract auditors to be familiar with the nitty-gritty of the Move binary format and Move assembly. In this blog, we will introduce the Move binary format and Move assembly, along with a tool that makes writing Move assembly easier.
In the Move language, there are two main types of programs: modules and scripts. Modules are permanently deployed programs meant to run on the chain, and scripts are temporary programs meant to be run by a user from off-chain. Move-based chains may also define their own custom program types. For the most part, this discussion applies to both modules and scripts, though some details are type-specific. We’ll focus on the most general type of program, the module.
Move Binary Format
The Move binary format is very simple. It starts with a magic number and a version, followed by a list of table handles giving the location and length of each table in the file, then the tables themselves, and finally, after the tables, a self-index (explained below). Think of a table as a span of raw binary data, much like an ELF section, with the table handles playing the role of ELF section headers. Although one can easily write a custom tool to parse the format, it is much easier to use Move’s crate for this purpose↗.
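To make the header layout concrete, here is a minimal Rust sketch that checks the magic and reads the version and table count. It assumes the core Move encoding as we understand it: the four magic bytes A1 1C EB 0B, a little-endian u32 version, and a ULEB128-encoded table count. Some forks may pack extra information into the version field, and the names parse_header and read_uleb128 are our own, not the API of Move’s crate.

```rust
/// The four magic bytes that open every Move binary ("A11CEB0B"), assumed from core Move.
const MOVE_MAGIC: [u8; 4] = [0xA1, 0x1C, 0xEB, 0x0B];

/// Fields read from the start of a Move binary (hypothetical struct for this sketch).
#[derive(Debug)]
struct Header {
    version: u32,
    table_count: u64,
}

/// Decodes an unsigned LEB128 integer starting at `*pos` and advances `*pos` past it.
fn read_uleb128(bytes: &[u8], pos: &mut usize) -> Result<u64, String> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        if shift >= 64 {
            return Err("ULEB128 value does not fit in 64 bits".to_string());
        }
        let byte = *bytes.get(*pos).ok_or("unexpected end of input")?;
        *pos += 1;
        // The low 7 bits carry data; the high bit says whether another byte follows.
        value |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Ok(value);
        }
        shift += 7;
    }
}

/// Checks the magic, then reads the little-endian u32 version and the table count.
fn parse_header(bytes: &[u8]) -> Result<Header, String> {
    if bytes.len() < 8 || bytes[..4] != MOVE_MAGIC {
        return Err("not a Move binary: bad magic".to_string());
    }
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let mut pos = 8;
    let table_count = read_uleb128(bytes, &mut pos)?;
    Ok(Header { version, table_count })
}
```

In a full parser, the table handles would be read next, starting at the position read_uleb128 leaves behind.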
The tables contain lists of entries that can refer to entries in other tables by a (usually 16-bit) index. At deserialization time, the tables are checked to cover the table area of the file exactly once: every byte in that area must belong to exactly one table, and the tables must be contiguous, with no gaps between them. For modules, the self-index follows the tables. It is the index of the module handle that describes the module defined in this file (every module referenced by the file, including its own, has a module handle). A value like this would more naturally live at the front of the file, in the header, yet it sits at the end. Even the Move developers find this unusual, as this comment from the Move source shows:
// load module idx (self id) - at the end of the binary. Why?
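The contiguity rule is easy to state in code. The following Rust sketch (TableHeader and check_table_layout are hypothetical names) sorts the table handles by offset and checks that they tile the table area with no gaps or overlaps. It also rejects duplicate table kinds, which we believe the real deserializer does as well; the real implementation performs further checks that are omitted here.

```rust
use std::collections::HashSet;

/// A table handle: which table it is, and where its bytes lie within the table area.
#[derive(Clone, Copy, Debug)]
struct TableHeader {
    kind: u8,
    offset: u32,
    len: u32,
}

/// Verifies that the tables cover [0, table_area_len) exactly once.
fn check_table_layout(headers: &[TableHeader], table_area_len: u32) -> Result<(), String> {
    let mut sorted = headers.to_vec();
    sorted.sort_by_key(|h| h.offset);
    let mut seen_kinds = HashSet::new();
    // Each table must start exactly where the previous one ended.
    let mut expected_offset: u64 = 0;
    for h in &sorted {
        if !seen_kinds.insert(h.kind) {
            return Err(format!("duplicate table kind {:#04x}", h.kind));
        }
        if u64::from(h.offset) != expected_offset {
            return Err(format!(
                "table kind {:#04x} starts at {}, expected {} (gap or overlap)",
                h.kind, h.offset, expected_offset
            ));
        }
        expected_offset += u64::from(h.len);
    }
    // The last table must end exactly at the end of the table area.
    if expected_offset != u64::from(table_area_len) {
        return Err(format!(
            "tables end at {}, but the table area is {} bytes long",
            expected_offset, table_area_len
        ));
    }
    Ok(())
}
```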
The relationships between the tables, and the purposes of the individual tables, are complex (and tedious), so we won’t describe them fully here. See the source code↗ for a full list and some in-line documentation. The binary format is fairly well preserved across the different Move versions used by Move-based chains.
Move Machine Model
The Move virtual machine is a stack-based machine. Unlike a traditional machine architecture, it does not operate at the byte level but at the object level. Machine-level objects are the smallest unit of data that can be stored. Objects include both complex data types, such as structs, and primitive data types, such as u256.
Upon entry into a function, a number of registers (“locals”) are available, along with a finite operand stack whose size is configured by the host chain. Every instruction that takes operands expects them to be placed on the stack first. The stack can hold objects of any type, but registers are typed: each register can only hold objects of its declared type. Each function declares the number and types of its registers. On entry, the first x registers contain the function’s x arguments, so the types of the first x registers are the function’s parameter types.
Instruction | Stack before | Stack after
---|---|---
Call | arg1, arg2, arg3, …, argn | return_value(s)
Pop | val | (empty)
LdTrue | (empty) | true

Note: the top of the stack is on the right.
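The following Rust sketch is a toy model of this machine, not the Move VM. It implements a handful of instructions over typed values, with locals and an operand stack. The instruction names mirror the bytecode, but the Instr and Value types and the run function are hypothetical. Unlike the real VM, which relies on the verifier, the toy checks types and unset locals at run time; like Move, it aborts on arithmetic overflow.

```rust
/// A few of the VM's value types (the real VM has many more).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    U64(u64),
    Bool(bool),
}

/// A handful of instructions, named after their Move bytecode counterparts.
#[derive(Clone, Copy, Debug)]
enum Instr {
    CopyLoc(usize),
    MoveLoc(usize),
    LdU64(u64),
    LdTrue,
    Add,
    Xor,
    Pop,
    Ret,
}

/// Pops the top of the stack, which must be a u64.
fn pop_u64(stack: &mut Vec<Value>) -> Result<u64, String> {
    match stack.pop() {
        Some(Value::U64(v)) => Ok(v),
        Some(other) => Err(format!("type error: expected u64, got {:?}", other)),
        None => Err("stack underflow".to_string()),
    }
}

/// Runs straight-line code. `args` initialize the first locals, and Ret returns
/// whatever is left on the stack. Local indices are assumed valid.
fn run(code: &[Instr], args: Vec<Value>) -> Result<Vec<Value>, String> {
    let mut locals: Vec<Option<Value>> = args.into_iter().map(Some).collect();
    let mut stack: Vec<Value> = Vec::new();
    for &instr in code {
        match instr {
            // CopyLoc leaves the local set; MoveLoc empties it.
            Instr::CopyLoc(i) => stack.push(locals[i].ok_or("local is unset")?),
            Instr::MoveLoc(i) => stack.push(locals[i].take().ok_or("local is unset")?),
            Instr::LdU64(v) => stack.push(Value::U64(v)),
            Instr::LdTrue => stack.push(Value::Bool(true)),
            Instr::Add => {
                let b = pop_u64(&mut stack)?;
                let a = pop_u64(&mut stack)?;
                // Move arithmetic aborts on overflow instead of wrapping.
                stack.push(Value::U64(a.checked_add(b).ok_or("arithmetic overflow")?));
            }
            Instr::Xor => {
                let b = pop_u64(&mut stack)?;
                let a = pop_u64(&mut stack)?;
                stack.push(Value::U64(a ^ b));
            }
            Instr::Pop => {
                stack.pop().ok_or("stack underflow")?;
            }
            Instr::Ret => return Ok(stack),
        }
    }
    Err("reached the end of the code without Ret".to_string())
}
```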
Move bytecode is high level, and generic Move functions are not monomorphized: they take type arguments in addition to regular arguments. When supplying type arguments, one must check that they satisfy the function’s constraints on its type parameters, such as the abilities each type parameter requires.
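As an illustration of checking type arguments against a function’s constraints, the Rust sketch below represents abilities as bit flags (the c, d, s, k letters in movetool’s output). We assume the bit values copy = 0x1, drop = 0x2, store = 0x4, and key = 0x8, which match core Move’s encoding as far as we know; check_type_args is a hypothetical helper.

```rust
/// Ability flags as bits (assumed to match core Move's encoding).
const COPY: u8 = 0x1;
const DROP: u8 = 0x2;
const STORE: u8 = 0x4;
const KEY: u8 = 0x8;

/// Checks each type argument's abilities against the matching type parameter's constraints.
fn check_type_args(arg_abilities: &[u8], constraints: &[u8]) -> Result<(), String> {
    if arg_abilities.len() != constraints.len() {
        return Err(format!(
            "expected {} type arguments, got {}",
            constraints.len(),
            arg_abilities.len()
        ));
    }
    for (i, (&have, &need)) in arg_abilities.iter().zip(constraints).enumerate() {
        // Every ability the parameter requires must be present in the argument.
        if (have & need) != need {
            return Err(format!(
                "type argument {} is missing abilities {:#06b}",
                i,
                need & !have
            ));
        }
    }
    Ok(())
}
```

For example, a primitive such as u64 has copy, drop, and store but not key, so it can instantiate a parameter constrained by drop but not one constrained by key.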
As for the Move assembly instructions themselves, the source code↗ is currently the best documentation.
Move Verifier
One of the primary motivations for working at this level is to audit the verifier. Barring compiler bugs, the compiler never produces code that the verifier rejects, so probing the verifier’s edge cases means writing bytecode by hand. Beyond auditing, it is also necessary to understand the verifier simply to write assembly code that can run on a Move chain. Unlike most assembly languages, Move bytecode must follow a very strict set of rules to pass the verifier. We summarize these rules here.
Note that we are working with core Move, although the verifier is mostly unchanged between different Move chains. There are quite a few functions that serve as entry points into the Move verifier, and the caller can pass in options that determine which limits are enforced. Different chains tend to use different entry points and options. As of the time of this writing, all such options in core Move are numerical limits that only serve to prevent resource-exhaustion attacks. It’s very hard to reach one of these limits unintentionally, so we will ignore them.
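The main unit the verifier works with is the basic block. A basic block is a contiguous run of Move bytecode that can only be entered at its first instruction and only left after its last: jump targets may appear only at the start of a block, and jumps (branches, as well as ret and abort) only at the end. A call instruction is not considered a jump, despite the transfer of control, because execution resumes at the next instruction once the callee returns.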
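A minimal Rust sketch of finding basic-block boundaries follows. It uses a hypothetical Op enum that only distinguishes control-flow instructions, treats branch operands as absolute instruction indices, and starts a new block at every branch target and after every branch, ret, or abort. Move’s own control-flow-graph construction is more involved; this only shows the idea.

```rust
use std::collections::BTreeSet;

/// Only the control-flow behavior of an instruction matters here.
#[derive(Clone, Copy, Debug)]
enum Op {
    Branch(u16),
    BrTrue(u16),
    BrFalse(u16),
    Ret,
    Abort,
    Other,
}

/// Returns the index of the first instruction of every basic block, in order.
fn block_starts(code: &[Op]) -> Vec<usize> {
    let mut starts = BTreeSet::new();
    if !code.is_empty() {
        starts.insert(0);
    }
    for (i, op) in code.iter().enumerate() {
        match *op {
            // A jump target starts a block, and so does the instruction after the jump.
            Op::Branch(target) | Op::BrTrue(target) | Op::BrFalse(target) => {
                starts.insert(usize::from(target));
                starts.insert(i + 1);
            }
            // ret and abort also end the current block.
            Op::Ret | Op::Abort => {
                starts.insert(i + 1);
            }
            Op::Other => {}
        }
    }
    // Drop any "start" one past the last instruction.
    starts.into_iter().filter(|&s| s < code.len()).collect()
}
```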
The major requirements of the verifier are as follows:
- Information stored in the module itself must generally not be duplicated. For example, identifiers are unique, functions with the same name are defined only once, and so on.
- Anything that is referenced must exist.
- Recursive data types are generally not allowed, although recursive functions are explicitly allowed. A recursive type must instead refer to its data indirectly, for example through an index into another data structure (such as a table) where the actual data is kept.
- All operations on data must follow type rules. The operands must be of the correct type and have the right abilities.
- At the end of each basic block, the stack must be empty. This means that any items pushed to the stack must be consumed before the end of the basic block. The stack is always empty at the start of each basic block.
- The control flow graph of basic blocks must be reducible↗. Simplified, this means every loop must have exactly one entry, and all edges into the loop must go to that entry. Loops can still be nested, however. This can also be regarded as a no-gotos rule.
- References must follow the reference rules (explained below).
- Any global resources↗ used must be properly acquired.
- Any local variable that is read must be guaranteed to be set at the time of the read. This means that it’s possible to put the code into SSA form↗. For example, two branches may set a local variable to different values, but every read of that variable must be preceded by an assignment on all code paths leading to it. Moving a value out of a local (moveloc) leaves the local unset until it is assigned again.
- Any values left in a local variable at the end of a function must have drop.
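The stack-emptiness rule can be checked one basic block at a time. In the Rust sketch below, each instruction is reduced to a hypothetical (values popped, values pushed) pair, and check_block_stack rejects blocks that underflow the stack or leave values behind. The real verifier also tracks types, which this sketch ignores.

```rust
/// Simulates the stack height through one basic block, given each instruction's
/// (values popped, values pushed).
fn check_block_stack(effects: &[(usize, usize)]) -> Result<(), String> {
    let mut height: usize = 0; // every block starts with an empty stack
    for (i, &(pops, pushes)) in effects.iter().enumerate() {
        height = height
            .checked_sub(pops)
            .ok_or_else(|| format!("instruction {} pops from an empty stack", i))?;
        height += pushes;
    }
    if height != 0 {
        return Err(format!("block ends with {} value(s) left on the stack", height));
    }
    Ok(())
}
```

For instance, modeling ret in a function with one return value as (1, 0), the block copyloc, copyloc, xor, moveloc, moveloc, add, ret fails this check because the XOR result is never consumed; adding a pop after the xor makes it pass.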
The Reference Rules
The real reference rules are extremely complex and would involve an extensive discussion of the borrow graph. We present a simplified version here. For the full reference rules, see the source code↗.
- References may be duplicated, but the oldest copy of the reference is considered active. If references are destroyed, the active reference may change.
- Borrows to fields are considered children of the borrow they are created from.
- When a reference is destroyed, the next most active borrow pointing to the same object inherits its children.
- Any mutable references passed into a function must be active and have no children. Immutable references can be passed into a function at any time.
Movetool
Even knowing all of this, writing Move assembly would still mean manually typing the Move binary format’s data structures into Rust code and invoking the serializer. To make this process easier, we created movetool↗. This tool disassembles Move binary modules into a custom assembly format and assembles that source back into a binary module. It is currently a little rough around the edges, but it gets the job done.
Since one of the primary motivations for writing Move assembly is to audit the Move verifier, the tool can also invoke the verifier directly and print the result.
Example: Creating a Binary Module With No-Effect Code
The Move compiler does not typically generate code with no effect, so the module we create here would be impossible to produce with the Move compiler. We will write a simple function that adds two numbers and returns the sum, but also calculates the XOR of the two numbers and discards it.
Move modules involve a massive amount of boilerplate, so we will start by creating a new Move project and modifying it.
move new addxor
We write the following code in sources/addxor.move.
module addxor::addxor {
    public fun add(a: u64, b: u64): u64 {
        a + b
    }
}
Then we must set the address of addxor in Move.toml as follows for the project to build successfully.
[addresses]
addxor = "0x1337"
Then we build and disassemble it.
move build
movetool dis < build/addxor/bytecode_modules/addxor.mv > addxor.mvasm
We obtain the following output in addxor.mvasm.
.type module
.version 6
.self_module_handle_idx 0
.table module_handles
; address_idx identifier_idx
0 1
.endtable
.table struct_handles
; abiltiies,cdsk module_idx identifier_idx
.endtable
.table function_handles
; module_idx identifier_idx parameters_sig_idx return_sig_idx type_parameters...,cdsk
0 0 0 1
.endtable
.table field_handles
; struct_def_idx member_count
.endtable
.table friend_decls
; address_idx identifier_idx
.endtable
.table struct_def_instantiations
; struct_def_idx type_params_signature_idx
.endtable
.table function_instantiations
; function_handle_idx type_params_signature_idx
.endtable
.table field_instantiations
; field_handle_idx type_params_signature_idx
.endtable
.table signatures
; arrays of types, for specifics see source code
[u64, u64]
[u64]
[]
.endtable
.table identifiers
; literal string identifiers
add
addxor
.endtable
.table address_identifiers
; addresses in hex
00000000000000000000000000000000
00000000000000000000000000000000
.endtable
.table constant_pool
; type encoded_value_in_hex
.endtable
.table metadata
; key value
.endtable
.table struct_defs
; structs can be native or declared
.endtable
.table function_defs
; function_handle visibility_public_private_friend is_entry
.func 0 public false
; indices of struct definitions acquired by this function
.acquires
.locals 2
moveloc 0
moveloc 1
add
ret
.endfunc
.endtable
The disassembled module mostly follows the layout of the binary module. The table we are interested in is the function definitions table, which contains the actual bytecode. It shows that the two arguments are moved from their locals onto the stack and added. The sum is at the top of the stack when the ret instruction runs, so it becomes the return value.
We will now add the code to calculate the XOR and discard the result. To do this, we have to copy the parameters with copyloc instead of moving them with moveloc, because moveloc leaves a local unset and the parameters must still be available for the addition. The following code is added after the .locals 2 line.
copyloc 0
copyloc 1
xor
pop
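To see why copyloc is needed here, the following Rust sketch tracks which locals are set in straight-line code. It is a simplification: LocalOp and check_locals are hypothetical, only copyloc, moveloc, and stloc affect locals, and there is no control flow to merge (the real verifier runs a dataflow analysis over the control flow graph). Using moveloc for the XOR would leave locals 0 and 1 unset before the addition reads them.

```rust
/// Whether a local currently holds a value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LocalState {
    Set,
    Unset,
}

/// The instructions that touch locals; everything else is Other.
#[derive(Clone, Copy, Debug)]
enum LocalOp {
    CopyLoc(usize),
    MoveLoc(usize),
    StLoc(usize),
    Other,
}

/// Straight-line check that every local is set when it is read.
/// Parameters start out set; the remaining locals start out unset.
fn check_locals(num_params: usize, num_locals: usize, code: &[LocalOp]) -> Result<(), String> {
    let mut state: Vec<LocalState> = (0..num_locals)
        .map(|i| if i < num_params { LocalState::Set } else { LocalState::Unset })
        .collect();
    for (pc, op) in code.iter().enumerate() {
        match *op {
            LocalOp::CopyLoc(i) | LocalOp::MoveLoc(i) if state[i] == LocalState::Unset => {
                return Err(format!("instruction {}: local {} is read while unset", pc, i));
            }
            // Moving a value out of a local leaves the local unset.
            LocalOp::MoveLoc(i) => state[i] = LocalState::Unset,
            LocalOp::StLoc(i) => state[i] = LocalState::Set,
            LocalOp::CopyLoc(_) | LocalOp::Other => {}
        }
    }
    Ok(())
}
```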
We then assemble the code and verify it.
movetool asm < addxor.mvasm > addxor.mv
movetool verify < addxor.mv
Next, we’ll deploy this to the Move sandbox. Let’s first set up the Move sandbox by deploying the original contract.
move sandbox publish
The Move sandbox deploys its contracts into the “blockchain” in “storage/”, and addxor.mv is stored in storage/0x00000000000000000000000000001337/modules/addxor.mv. We can replace this contract with our binary module by copying it into place.
cp addxor.mv storage/0x00000000000000000000000000001337/modules/addxor.mv
We can write a script like the following and place it in sources/calladdxor.move to test the program. For some reason, std::debug does not work in the Move sandbox, so we’ll use an abort to check the return value.
script {
    use addxor::addxor;
    fun main() {
        let x = addxor::add(1, 2);
        assert!(x != 3, 1);
    }
}
We run it as follows and see that it indeed aborts with code 1, which confirms that add returned 3.
move sandbox run sources/calladdxor.move
Execution aborted with code 1 in transaction script
Example: Adding a Logic Bomb Backdoor
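Now let’s add a backdoor to the addxor module that aborts if the first parameter is 1337. We change the code of the function as follows. The operand of br_true is an absolute instruction index, counting from 0, so br_true 12 jumps to the ld64 1338 instruction.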
copyloc 0
copyloc 1
xor
pop
copyloc 0
ld64 1337
eq
br_true 12
moveloc 0
moveloc 1
add
ret
ld64 1338
abort
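Because br_true takes a numeric instruction index, inserting a single instruction shifts every later branch target, and recounting by hand is error prone. A small preprocessing step can do the counting. The Rust sketch below is hypothetical and not a movetool feature: the Line type and resolve function are our own, and they replace symbolic labels with instruction indices, producing br_true 12 for the code above.

```rust
use std::collections::HashMap;

/// One line of hand-written assembly: a label definition, or an instruction
/// with an optional symbolic branch target.
enum Line<'a> {
    Label(&'a str),
    Instr(&'a str, Option<&'a str>),
}

/// Replaces symbolic branch targets with absolute instruction indices.
fn resolve(lines: &[Line]) -> Result<Vec<String>, String> {
    // First pass: record the index of the instruction that follows each label.
    let mut offsets: HashMap<&str, usize> = HashMap::new();
    let mut index = 0;
    for line in lines {
        match line {
            Line::Label(name) => {
                if offsets.insert(*name, index).is_some() {
                    return Err(format!("duplicate label {}", name));
                }
            }
            Line::Instr(..) => index += 1,
        }
    }
    // Second pass: emit the instructions with numeric targets; labels emit nothing.
    let mut out = Vec::new();
    for line in lines {
        if let Line::Instr(op, target) = line {
            match target {
                Some(label) => {
                    let offset = offsets
                        .get(label)
                        .ok_or_else(|| format!("unknown label {}", label))?;
                    out.push(format!("{} {}", op, offset));
                }
                None => out.push(op.to_string()),
            }
        }
    }
    Ok(out)
}
```

With a label named backdoor placed just before ld64 1338 and br_true written as Instr("br_true", Some("backdoor")), the resolver emits the same fourteen instructions shown above.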
We assemble it, verify it, and load it into the Move sandbox like before.
movetool asm < addxor.mvasm > addxor.mv
movetool verify < addxor.mv
cp addxor.mv storage/0x00000000000000000000000000001337/modules/addxor.mv
Then we change the script to trip the backdoor by passing 1337 as the first argument. Running the script gives the following result.
Execution aborted with code 1338 in module 00000000000000000000000000001337::addxor.
Evidently, the backdoor we added using Move assembly was triggered.
Moving On
The threat landscape for Move is constantly evolving. More Move-based chains are being released, each with its own set of natives and potentially new VM features. We’ve provided an introduction to Move’s binary format, machine model, and verifier, along with a tool to get you started, but there’s much more to research and explore. It’s time for you to make the next move!
About Us
Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.
Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.
Contact us↗ for an audit that’s better than the rest. Real audits, not rubber stamps.