How to Fix: Cranelift: GVN depends on value definition order matching visit order
Cranelift GVN Bug: Why Value Numbering Breaks When CLIF Definitions Are Out of Order
Global Value Numbering (GVN) in Cranelift can silently miss equivalent instructions when a .clif test defines values in a numeric order that does not match the traversal order of the IR. The result is subtle: two identical expressions should collapse to one, but the egraph GVN map fails to treat them as equal because it assumes a definition-order property that the input does not guarantee.
Table of Contents
This issue matters because optimization correctness should never depend on incidental value numbering. If an optimization pass behaves differently based on the textual numbering of SSA values, it introduces nondeterminism into testing, makes debugging harder, and can block expected deduplication opportunities in Cranelift’s optimizer.
Understanding the Root Cause
At the core of the bug is a mismatch between two concepts:
- Value definition order: the numeric order in which CLIF values appear, such as
v10,v2,v7. - Visit order: the order in which Cranelift’s optimization pass walks instructions and records nodes in the egraph.
The problematic assumption is that these orders are aligned. When GVN builds or queries its map, it may rely on canonicalized operands or previously seen value representatives already being available. That only works if all referenced value identities were inserted in a consistent order before equivalent instructions are compared.
When a .clif file defines values out of increasing numeric order, this assumption breaks:
- An instruction is visited.
- Its operands are converted into keys used by the GVN map.
- The optimizer expects equivalent operand representatives to already be normalized and registered.
- Because value numbering and traversal order differ, the lookup key may be built from unstable or noncanonical representatives.
- A later equivalent instruction produces a different key shape, so the map does not detect the match.
In practical terms, the bug is not that the instructions are actually different. The bug is that the GVN key construction is tied too closely to value IDs or to an insertion sequence that depends on those IDs appearing in increasing order.
A simplified example looks like this:
block0(v3: i64, v1: i64):
v5 = iadd v3, v1
v2 = iadd v3, v1
return v5
If the pass internally expects earlier-numbered values to correspond to earlier-visited definitions, then v5 and v2 may not normalize identically at the moment the lookup happens, even though they represent the same operation with the same operands.
The correct design is to make GVN depend on a stable semantic representation of an instruction, not on textual or incidental ordering of SSA value numbers.
Step-by-Step Solution
The fix is to ensure the egraph/GVN lookup key is derived from canonical value representatives that are valid regardless of original CLIF numbering. In other words, make equality depend on instruction semantics and current union-find or eclass state, not on the order values were declared.
1. Locate the GVN key construction logic
Find the code in Cranelift’s optimizer where an instruction is transformed into a key for insertion or lookup in the GVN map. This is usually where opcode, type, immediates, and operands are packed into a comparable structure.
You are looking for logic that behaves like this conceptually:
let key = ExprKey {
opcode: inst_data.opcode(),
args: inst_args.map(|v| value_to_rep(v)),
ty: ctrl_typevar(inst),
imm: extract_immediate(inst),
};
The danger zone is the value_to_rep(v) part. If that function uses raw value numbers or an order-sensitive mapping, the bug can occur.
2. Replace raw value identity with canonical representatives
Before constructing the GVN key, convert every operand into its current canonical eclass representative or equivalent normalized form.
fn canonicalize_value(ctx: &mut Ctx, val: Value) -> CanonVal {
let root = ctx.eclasses.find(val);
CanonVal::from(root)
}
fn build_gvn_key(ctx: &mut Ctx, inst: Inst) -> GvnKey {
let data = ctx.dfg.insts[inst].clone();
let args = ctx.dfg.inst_args(inst)
.iter()
.map(|&arg| canonicalize_value(ctx, arg))
.collect();
GvnKey {
opcode: data.opcode(),
ty: ctx.dfg.ctrl_typevar(inst),
args,
imm: extract_immediate(&data),
}
}
This change ensures that two semantically identical instructions generate the same key even if their original value IDs were introduced in an unusual order.
3. Avoid using insertion order as a proxy for canonicality
If the implementation implicitly assumes that earlier definitions are always canonical, remove that assumption. A robust optimizer should explicitly ask the union-find, egraph, or representative table for the current canonical identity.
// Fragile pattern
let rep = value_number[val];
// Correct pattern
let rep = egraph.find(val);
If no explicit representative structure exists, introduce one. Depending on sorted value numbers is exactly what makes this bug surface.
4. Canonicalize commutative instructions consistently
For instructions like iadd, imul, or bitwise operations, operand order can also prevent matches if not normalized. Once values are canonicalized, sort or reorder operands for commutative opcodes.
fn normalize_args_for_opcode(opcode: Opcode, args: &mut [CanonVal]) {
if opcode.is_commutative() && args.len() == 2 && args[0] > args[1] {
args.swap(0, 1);
}
}
This is not the primary bug described in the issue, but it commonly appears in the same area and should be verified while fixing GVN key stability.
5. Add a regression test using out-of-order value definitions
Create or extend a .clif test that intentionally defines values in non-increasing numeric order and expects identical expressions to be unified.
test optimize
function %gvn_order_bug(i64, i64) -> i64 {
block0(v3: i64, v1: i64):
v5 = iadd v3, v1
v2 = iadd v3, v1
return v5
}
The exact expected output depends on the optimization pipeline, but the important assertion is that the second identical instruction is recognized as redundant or mapped to the same value class.
6. Verify with optimization tests
Run the relevant Cranelift test suite and confirm that:
- Equivalent instructions are merged regardless of CLIF value numbering.
- No existing tests relying on current egraph behavior regress.
- Canonicalization still preserves type correctness and instruction semantics.
cargo test -p cranelift-codegen
cargo test -p cranelift-filetests
If your local workflow uses filetests directly, run the optimizer-focused cases that exercise egraph and GVN behavior.
7. Audit nearby code for similar ordering assumptions
Once this bug is fixed, scan adjacent optimization logic for similar mistakes. Ordering bugs often appear in:
- CSE key generation
- eclass representative selection
- instruction memoization caches
- phi/block parameter normalization
- constant folding maps keyed by noncanonical values
If one subsystem assumed numeric value order matched traversal order, another may too.
Common Edge Cases
1. Commutative vs non-commutative instructions
Canonicalizing operands is safe for commutative operations, but not for operations like subtraction, division, shifts with direction-sensitive semantics, memory loads, or comparisons where operand order matters. Do not sort operands indiscriminately.
2. Instructions with immediates or type variables
Two instructions are not equivalent if they differ in immediate values or controlling types. A stable GVN key must include:
- Opcode
- Result type or controlling type variable
- Canonical operands
- Immediates, flags, and relevant metadata
If the key omits type or flags, you can create false-positive matches while fixing the original bug.
3. Block parameters and phi-like merges
Values introduced as block parameters may have numbering patterns that differ even more dramatically from instruction traversal order. Make sure representative lookup works equally well for block params, not only for normal instruction results.
4. Multi-result instructions
Some IR operations produce more than one result. If your keying or representative logic assumes one instruction maps to exactly one value, equivalent multi-result instructions may still fail to unify correctly.
5. Egraph mutations during traversal
If the optimizer mutates equivalence classes while iterating through instructions, a key built too early may become stale. Build the key using the most current canonical representative at lookup time, not a cached earlier snapshot unless the cache is explicitly invalidated.
6. Determinism in tests
Once the ordering dependency is removed, some golden test outputs may change because the optimizer now consistently finds equivalences it previously missed. That is a healthy correction, but test expectations must be updated carefully.
FAQ
Why does this bug only appear with some .clif files?
Because many test cases accidentally use value numbering that matches instruction visitation closely enough to hide the flaw. The bug becomes visible when SSA value IDs are defined out of order, exposing the incorrect assumption inside GVN key generation.
Is this a parser bug or an optimizer bug?
It is an optimizer bug. The parser is allowed to accept valid CLIF where value numbers are not introduced in increasing order. The optimization pass must not depend on textual numbering conventions for correctness.
Would sorting all values before visiting instructions fix it?
No. That would treat a symptom, not the root cause. GVN should compare instructions using canonical semantic representatives, not a reordered traversal that tries to force value IDs into a convenient shape.
Fixing this issue properly makes Cranelift more robust, more deterministic, and more faithful to SSA semantics. The key principle is simple: instruction equivalence must be based on normalized meaning, not on incidental CLIF numbering.