How to Fix: Cranelift: GVN depends on value definition order matching visit order

Updated June 9, 2026 7 min read

Aldawsari

7 min read

Cranelift GVN Bug: Why Value Numbering Breaks When CLIF Definitions Are Out of Order

Global Value Numbering (GVN) in Cranelift can silently miss equivalent instructions when a .clif test defines values in a numeric order that does not match the traversal order of the IR. The result is subtle: two identical expressions should collapse to one, but the egraph GVN map fails to treat them as equal because it assumes a definition-order property that the input does not guarantee.

Table of Contents

Understanding the Root Cause
Step-by-Step Solution
Common Edge Cases
FAQ

This issue matters because optimization correctness should never depend on incidental value numbering. If an optimization pass behaves differently based on the textual numbering of SSA values, it introduces nondeterminism into testing, makes debugging harder, and can block expected deduplication opportunities in Cranelift’s optimizer.

Understanding the Root Cause

At the core of the bug is a mismatch between two concepts:

Value definition order: the numeric order in which CLIF values appear, such as v10, v2, v7.
Visit order: the order in which Cranelift’s optimization pass walks instructions and records nodes in the egraph.

The problematic assumption is that these orders are aligned. When GVN builds or queries its map, it may rely on canonicalized operands or previously seen value representatives already being available. That only works if all referenced value identities were inserted in a consistent order before equivalent instructions are compared.

When a .clif file defines values out of increasing numeric order, this assumption breaks:

An instruction is visited.
Its operands are converted into keys used by the GVN map.
The optimizer expects equivalent operand representatives to already be normalized and registered.
Because value numbering and traversal order differ, the lookup key may be built from unstable or noncanonical representatives.
A later equivalent instruction produces a different key shape, so the map does not detect the match.

In practical terms, the bug is not that the instructions are actually different. The bug is that the GVN key construction is tied too closely to value IDs or to an insertion sequence that depends on those IDs appearing in increasing order.

A simplified example looks like this:

block0(v3: i64, v1: i64):
    v5 = iadd v3, v1
    v2 = iadd v3, v1
    return v5

If the pass internally expects earlier-numbered values to correspond to earlier-visited definitions, then v5 and v2 may not normalize identically at the moment the lookup happens, even though they represent the same operation with the same operands.

The correct design is to make GVN depend on a stable semantic representation of an instruction, not on textual or incidental ordering of SSA value numbers.

Step-by-Step Solution

The fix is to ensure the egraph/GVN lookup key is derived from canonical value representatives that are valid regardless of original CLIF numbering. In other words, make equality depend on instruction semantics and current union-find or eclass state, not on the order values were declared.

1. Locate the GVN key construction logic

Find the code in Cranelift’s optimizer where an instruction is transformed into a key for insertion or lookup in the GVN map. This is usually where opcode, type, immediates, and operands are packed into a comparable structure.

You are looking for logic that behaves like this conceptually:

let key = ExprKey {
    opcode: inst_data.opcode(),
    args: inst_args.map(|v| value_to_rep(v)),
    ty: ctrl_typevar(inst),
    imm: extract_immediate(inst),
};

The danger zone is the value_to_rep(v) part. If that function uses raw value numbers or an order-sensitive mapping, the bug can occur.

2. Replace raw value identity with canonical representatives

Before constructing the GVN key, convert every operand into its current canonical eclass representative or equivalent normalized form.

fn canonicalize_value(ctx: &mut Ctx, val: Value) -> CanonVal {
    let root = ctx.eclasses.find(val);
    CanonVal::from(root)
}

fn build_gvn_key(ctx: &mut Ctx, inst: Inst) -> GvnKey {
    let data = ctx.dfg.insts[inst].clone();
    let args = ctx.dfg.inst_args(inst)
        .iter()
        .map(|&arg| canonicalize_value(ctx, arg))
        .collect();

    GvnKey {
        opcode: data.opcode(),
        ty: ctx.dfg.ctrl_typevar(inst),
        args,
        imm: extract_immediate(&data),
    }
}

This change ensures that two semantically identical instructions generate the same key even if their original value IDs were introduced in an unusual order.

3. Avoid using insertion order as a proxy for canonicality

If the implementation implicitly assumes that earlier definitions are always canonical, remove that assumption. A robust optimizer should explicitly ask the union-find, egraph, or representative table for the current canonical identity.

// Fragile pattern
let rep = value_number[val];

// Correct pattern
let rep = egraph.find(val);

If no explicit representative structure exists, introduce one. Depending on sorted value numbers is exactly what makes this bug surface.

4. Canonicalize commutative instructions consistently

For instructions like iadd, imul, or bitwise operations, operand order can also prevent matches if not normalized. Once values are canonicalized, sort or reorder operands for commutative opcodes.

fn normalize_args_for_opcode(opcode: Opcode, args: &mut [CanonVal]) {
    if opcode.is_commutative() && args.len() == 2 && args[0] > args[1] {
        args.swap(0, 1);
    }
}

This is not the primary bug described in the issue, but it commonly appears in the same area and should be verified while fixing GVN key stability.

5. Add a regression test using out-of-order value definitions

Create or extend a .clif test that intentionally defines values in non-increasing numeric order and expects identical expressions to be unified.

test optimize

function %gvn_order_bug(i64, i64) -> i64 {
block0(v3: i64, v1: i64):
    v5 = iadd v3, v1
    v2 = iadd v3, v1
    return v5
}

The exact expected output depends on the optimization pipeline, but the important assertion is that the second identical instruction is recognized as redundant or mapped to the same value class.

6. Verify with optimization tests

Run the relevant Cranelift test suite and confirm that:

Equivalent instructions are merged regardless of CLIF value numbering.
No existing tests relying on current egraph behavior regress.
Canonicalization still preserves type correctness and instruction semantics.

cargo test -p cranelift-codegen
cargo test -p cranelift-filetests

If your local workflow uses filetests directly, run the optimizer-focused cases that exercise egraph and GVN behavior.

7. Audit nearby code for similar ordering assumptions

Once this bug is fixed, scan adjacent optimization logic for similar mistakes. Ordering bugs often appear in:

CSE key generation
eclass representative selection
instruction memoization caches
phi/block parameter normalization
constant folding maps keyed by noncanonical values

If one subsystem assumed numeric value order matched traversal order, another may too.

Common Edge Cases

1. Commutative vs non-commutative instructions

Canonicalizing operands is safe for commutative operations, but not for operations like subtraction, division, shifts with direction-sensitive semantics, memory loads, or comparisons where operand order matters. Do not sort operands indiscriminately.

2. Instructions with immediates or type variables

Two instructions are not equivalent if they differ in immediate values or controlling types. A stable GVN key must include:

Opcode
Result type or controlling type variable
Canonical operands
Immediates, flags, and relevant metadata

If the key omits type or flags, you can create false-positive matches while fixing the original bug.

3. Block parameters and phi-like merges

Values introduced as block parameters may have numbering patterns that differ even more dramatically from instruction traversal order. Make sure representative lookup works equally well for block params, not only for normal instruction results.

4. Multi-result instructions

Some IR operations produce more than one result. If your keying or representative logic assumes one instruction maps to exactly one value, equivalent multi-result instructions may still fail to unify correctly.

5. Egraph mutations during traversal

If the optimizer mutates equivalence classes while iterating through instructions, a key built too early may become stale. Build the key using the most current canonical representative at lookup time, not a cached earlier snapshot unless the cache is explicitly invalidated.

6. Determinism in tests

Once the ordering dependency is removed, some golden test outputs may change because the optimizer now consistently finds equivalences it previously missed. That is a healthy correction, but test expectations must be updated carefully.

FAQ

Why does this bug only appear with some .clif files?

Because many test cases accidentally use value numbering that matches instruction visitation closely enough to hide the flaw. The bug becomes visible when SSA value IDs are defined out of order, exposing the incorrect assumption inside GVN key generation.

Is this a parser bug or an optimizer bug?

It is an optimizer bug. The parser is allowed to accept valid CLIF where value numbers are not introduced in increasing order. The optimization pass must not depend on textual numbering conventions for correctness.

Would sorting all values before visiting instructions fix it?

No. That would treat a symptom, not the root cause. GVN should compare instructions using canonical semantic representatives, not a reordered traversal that tries to force value IDs into a convenient shape.

Fixing this issue properly makes Cranelift more robust, more deterministic, and more faithful to SSA semantics. The key principle is simple: instruction equivalence must be based on normalized meaning, not on incidental CLIF numbering.