How to Fix: Cranelift: Fuzz failure with egraphs on AArch64

Updated June 10, 2026 7 min read

Aldawsari


Cranelift fuzz failure with egraphs on AArch64: root cause, fix strategy, and validation workflow

This failure is a classic case of a target-specific optimization mismatch. The same .clif test passes on x86 but fails on AArch64 when egraphs are enabled, which strongly suggests that the optimizer introduces a rewrite that is valid under one lowering model but incorrect, incomplete, or inconsistently legalized under another.

Table of Contents

Symptoms and reproduction
Understanding the Root Cause
Step-by-Step Solution
Common Edge Cases
FAQ

Symptoms and reproduction

The issue description points to a fuzz-generated .clif case with:

test interpret
test run
set opt_level=speed_and_size
set use_egraphs=true
target aarch64 ...

The important signal is the combination of:

AArch64-only failure
Passes on x86
Requires egraphs
Triggered under optimization

That pattern usually means one of four things:

An egraph rewrite is too aggressive and assumes semantics that do not hold after AArch64 legalization.
A transformed instruction sequence is legal in the IR but lowered incorrectly for AArch64.
AArch64 has stricter behavior around flags, lanes, immediates, shifts, extends, or bit-width canonicalization.
The interpreter and machine backend disagree because the optimization changed the IR into a shape that exposes a backend bug.

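To reproduce locally, run the exact test file through Cranelift's filetest runner, and turn on verbose logging if your setup supports it. The clif-util `test` subcommand takes a file path directly. In the cargo command, `test_aarch64_egraphs` is a placeholder filter for however your repro test is registered, not a test that ships with Cranelift:

clif-util test path/to/repro.clif
cargo test -p cranelift-filetests -- test_aarch64_egraphs --nocapture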

If you have the original generated file, run the targeted filetest directly instead of a broad suite so you can inspect the transformed IR before and after egraph extraction.

Understanding the Root Cause

At a technical level, this happens because egraphs perform equivalence-based rewriting, not just local peephole optimization. That is powerful, but it also raises the risk that a rewrite considered semantically equivalent at the IR level may stop being equivalent after target-specific lowering rules are applied.

For AArch64, common problem areas include:

Integer extension semantics, especially mixing sign-extension and zero-extension.
Shift and rotate rewrites where out-of-range behavior, masked shift amounts, or bit-width assumptions differ.
Condition code materialization and compare folding.
Vector lane transformations that are legal in generic IR but not preserved identically by the backend.
Narrow/wide value rewrites where intermediate truncation or extension is silently changed.
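To make the first item concrete, here is a small stdlib-only Rust model (not Cranelift code) of what `uextend` and `sextend` do to an i8. It shows why a rewrite that treats the two as interchangeable is only sound while the sign bit is clear.

```rust
// Models Cranelift's `uextend` and `sextend` from i8 to i32 on raw bits.
// This mirrors the opcode semantics only; it is not Cranelift code.

/// `uextend.i32` of an i8: the upper 24 bits become zero.
fn uextend_i8_to_i32(x: u8) -> u32 {
    x as u32
}

/// `sextend.i32` of an i8: the upper 24 bits are copies of bit 7.
fn sextend_i8_to_i32(x: u8) -> u32 {
    x as i8 as i32 as u32
}

/// Swapping one extension for the other preserves the value only when
/// both produce the same bits, i.e. when bit 7 of the input is clear.
fn swap_is_sound(x: u8) -> bool {
    uextend_i8_to_i32(x) == sextend_i8_to_i32(x)
}

fn main() {
    for x in [0x00u8, 0x7f, 0x80, 0xff] {
        println!(
            "x={:#04x} uextend={:#010x} sextend={:#010x} swap_ok={}",
            x,
            uextend_i8_to_i32(x),
            sextend_i8_to_i32(x),
            swap_is_sound(x)
        );
    }
}
```

Exactly half of the i8 inputs (0 through 127) survive the swap, which is why a fuzzer that happens to produce a negative value is needed to expose it.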

The reason it passes on x86 is not necessarily that the rewrite is correct. More often, x86 either:

has a lowering path that accidentally preserves the intended behavior,
accepts a broader set of instruction forms, or
does not expose the semantic discrepancy due to different legalization decisions.

In practice, the root cause is usually one of these:

A missing target guard on an egraph rewrite.
A rewrite that should only fire for a specific bit-width or value domain.
A backend legalization bug revealed by a newly rewritten form.
An extraction cost model choosing a shape that is theoretically equivalent but backend-hostile on AArch64.

If the fuzz case only fails when use_egraphs=true, the fastest path is to inspect the IR immediately before and after egraph optimization, then compare the AArch64-lowered result against the pre-egraph baseline.
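The comparison a fuzzer performs can be sketched as a differential check. In this minimal Rust sketch, two closures stand in for the pre-egraph and post-egraph versions of one expression; the example rewrite (folding `x + 1 > x` to true) is hypothetical and was chosen because it is wrong only at the wrap-around point.

```rust
/// Evaluates `baseline` and `optimized` on every input and returns the first
/// input where they disagree, together with both results.
fn first_mismatch<I, O, F, G>(inputs: &[I], baseline: F, optimized: G) -> Option<(I, O, O)>
where
    I: Copy,
    O: PartialEq,
    F: Fn(I) -> O,
    G: Fn(I) -> O,
{
    for &input in inputs {
        let expected = baseline(input);
        let actual = optimized(input);
        if expected != actual {
            return Some((input, expected, actual));
        }
    }
    None
}

fn main() {
    // Every i8 value, so the wrapping edge case is guaranteed to be covered.
    let inputs: Vec<i8> = (i8::MIN..=i8::MAX).collect();
    // Hypothetical rewrite: fold `icmp sgt (iadd x, 1), x` to true.
    let found = first_mismatch(&inputs, |x: i8| x.wrapping_add(1) > x, |_x: i8| true);
    println!("{:?}", found); // Some((127, false, true))
}
```

Exhausting the input space is cheap at i8; for wider types, a mix of boundary values and random inputs is the usual compromise.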

Step-by-Step Solution

The safest fix is to identify the offending rewrite, constrain or disable it for AArch64, and then add a regression test so the fuzz case stays fixed.

1. Isolate the optimization delta

Run the same test with and without egraphs and compare the resulting IR.

; baseline.clif
set use_egraphs=false

; repro.clif
set use_egraphs=true

If your local Cranelift tooling can print the IR after the mid-end optimizer (some versions provide a `test optimize` filetest directive for this), capture both forms. The key question is: what expression shape changed?
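Once you have both printed functions, a quick way to spot the changed shape is to list the instruction lines that appear in only one of them. This stdlib Rust helper is a rough sketch: it is a set difference rather than a true diff, so it ignores ordering and value renumbering, and the IR lines in `main` are made up.

```rust
use std::collections::HashSet;

/// Trimmed, non-empty lines that are not `;` comments.
fn normalized_lines(dump: &str) -> Vec<&str> {
    dump.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with(';'))
        .collect()
}

/// Lines of `a` that do not occur anywhere in `b`.
fn only_in<'a>(a: &'a str, b: &str) -> Vec<&'a str> {
    let other: HashSet<&str> = normalized_lines(b).into_iter().collect();
    normalized_lines(a)
        .into_iter()
        .filter(|l| !other.contains(*l))
        .collect()
}

fn main() {
    // Made-up IR lines standing in for the egraphs-off and egraphs-on dumps.
    let before = "v2 = ishl v0, v1\nv3 = ushr v2, v1\nreturn v3";
    let after = "v5 = iconst.i32 0xff\nv4 = band v0, v5\nreturn v4";
    println!("only before: {:?}", only_in(before, after));
    println!("only after:  {:?}", only_in(after, before));
}
```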

2. Minimize the fuzz case

Shrink the test until only the failing transformation remains. Keep:

the same target triple,
the same opt_level,
the same value widths,
the same opcode family involved in the rewrite.

A minimal case makes it much easier to prove whether the issue is in:

egraph rewrite rules,
instruction legalization, or
AArch64 lowering.
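Shrinking can be automated with a greedy reducer: repeatedly try deleting one item (for example, one instruction line of the .clif body) and keep the deletion if the failure still reproduces. This stdlib Rust sketch is a simplified form of delta debugging; in practice `still_fails` would rerun the filetest and should also reject reduced files that no longer verify or that fail for a different reason. If your Cranelift checkout includes clif-util's `bugpoint` subcommand, it automates this kind of reduction for .clif files.

```rust
/// Removes items one at a time while `still_fails` keeps returning true.
/// The result is 1-minimal: deleting any single remaining item makes the
/// failure disappear.
fn minimize<T: Clone>(mut items: Vec<T>, still_fails: impl Fn(&[T]) -> bool) -> Vec<T> {
    assert!(still_fails(&items), "the unreduced input must fail");
    loop {
        let mut changed = false;
        let mut i = 0;
        while i < items.len() {
            let mut candidate = items.clone();
            candidate.remove(i);
            if still_fails(&candidate) {
                items = candidate; // keep the deletion and retry the same index
                changed = true;
            } else {
                i += 1; // item i is needed; move on
            }
        }
        if !changed {
            return items;
        }
    }
}

fn main() {
    let body = vec![
        "v1 = iconst.i8 1",
        "v2 = iadd v0, v1",
        "v3 = ishl v0, v1",
        "v4 = sextend.i32 v3",
    ];
    // Pretend the bug needs both the `ishl` and the `sextend` to be present.
    let reduced = minimize(body, |b| {
        b.iter().any(|l| l.contains("ishl")) && b.iter().any(|l| l.contains("sextend"))
    });
    println!("{:?}", reduced);
}
```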

3. Inspect suspicious rewrite classes

Focus first on rules touching these areas:

- ireduce / uextend / sextend
- band / bor / bxor canonicalization
- ishl / ushr / sshr reassociation
- icmp folding and boolean normalization
- select / bint / flags-related simplification
- vector splat, extractlane, insertlane rewrites

If one rule rewrites a narrow-width operation into a wider form plus masking, verify that AArch64 lowering preserves the exact semantics.
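At narrow widths, such a check can be exhaustive. This stdlib Rust sketch compares an i8 `ushr` against a wide form computed in a 32-bit register and then masked. It assumes, for illustration, that the narrow value may sit in the wide register either zero-extended or sign-extended, because a lowering does not always guarantee what the upper bits hold.

```rust
/// Reference semantics: Cranelift masks the shift amount to the type width.
fn ushr_i8(x: u8, amt: u8) -> u8 {
    x >> (amt % 8)
}

/// Wide form: shift the whole 32-bit register, then keep the low 8 bits.
fn ushr_i8_wide(reg: u32, amt: u8) -> u8 {
    ((reg >> (amt % 8)) & 0xff) as u8
}

/// Tries every i8 value and shift amount; `widen` decides how the narrow
/// value is placed in the wide register.
fn find_counterexample(widen: impl Fn(u8) -> u32) -> Option<(u8, u8)> {
    for x in 0..=u8::MAX {
        for amt in 0..8u8 {
            if ushr_i8(x, amt) != ushr_i8_wide(widen(x), amt) {
                return Some((x, amt));
            }
        }
    }
    None
}

fn main() {
    println!("zero-extended: {:?}", find_counterexample(|x| x as u32)); // None
    println!(
        "sign-extended: {:?}",
        find_counterexample(|x| x as i8 as i32 as u32)
    ); // Some((128, 1))
}
```

The sign-extended variant fails as soon as a set sign bit is shifted right, because copies of bit 7 from the upper bits flow into the low byte.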

4. Add a target-specific guard or semantic predicate

Once you find the bad rewrite, do not just remove it blindly. Prefer one of these fixes:

Add a bit-width predicate.
Add a target predicate so it does not fire on AArch64.
Require proof that the transformed expression preserves sign/zero extension behavior.
Move the rewrite later or earlier so legalization sees a safer form.

Conceptually, the change looks like this:

;; Before: the rule always fires (illustrative rule, not taken from Cranelift)
(rule (simplify (ishl ty x y))
      (some_canonical_form ty x y))

;; After: the rule fires only when a guard holds;
;; `safe_for_bitwidth_and_target` is a hypothetical predicate
(rule (simplify (ishl ty x y))
      (if (safe_for_bitwidth_and_target ty x y))
      (some_canonical_form ty x y))
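The guard idea can also be modeled outside ISLE. In this stdlib Rust sketch, a hypothetical rule rewrites `ushr(ishl(x, k), k)` for a constant `k` into `band(x, mask)`. The mask is computed from the raw `k`, while Cranelift takes shift amounts modulo the type width, so the hypothetical `shift_in_range` predicate lets the rule fire only when `k` is below the width.

```rust
/// All-ones mask for a type of `bits` width (8, 16, 32, or 64).
fn width_mask(bits: u32) -> u64 {
    if bits == 64 { u64::MAX } else { (1u64 << bits) - 1 }
}

/// Reference semantics of `ushr(ishl(x, k), k)` at the given width.
fn eval_original(bits: u32, x: u64, k: u64) -> u64 {
    let s = (k % bits as u64) as u32; // shift amount taken modulo the width
    let m = width_mask(bits);
    (((x & m) << s) & m) >> s
}

/// Guard: the constant shift amount must be below the type width.
fn shift_in_range(bits: u32, k: u64) -> bool {
    k < bits as u64
}

/// Returns the mask for the rewritten `band`, or None if the guard rejects it.
fn try_rewrite(bits: u32, k: u64) -> Option<u64> {
    if shift_in_range(bits, k) {
        Some(width_mask(bits) >> k)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", try_rewrite(8, 3)); // Some(31)
    println!("{:?}", try_rewrite(8, 9)); // None
}
```

Without the guard, `k = 9` on an i8 would produce a mask of `0xff >> 9`, which is 0, while the original expression shifts by 9 % 8 = 1 and keeps the low 7 bits.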

If the bug is in lowering rather than rewriting, fix the AArch64 backend to correctly legalize the egraph-produced form instead.

5. Validate with interpreter and backend

Because the file includes both test interpret and test run, validate against both the IR interpreter and generated machine code.

cargo test -p cranelift-filetests -- --nocapture

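Then run the exact reduced file on every target you can, once with egraphs enabled and once with them disabled. `test run` only executes compiled code when the host can run the target, so the AArch64 check needs an AArch64 machine or an emulator such as QEMU:

# Expected after the fix: passes on x86_64 and AArch64,
# with both use_egraphs=true and use_egraphs=false
clif-util test reduced.clif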

Your goal is to confirm:

the interpreter still agrees with expected semantics,
AArch64 machine code now matches interpreter behavior,
the fix does not regress x86.

6. Add a permanent regression test

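Create a dedicated .clif regression file from the minimized repro. Keep the original settings that triggered the bug, and add a `; run:` line so that `test interpret` and `test run` have an expectation to check:

test interpret
test run
set opt_level=speed_and_size
set use_egraphs=true
target aarch64

function %repro(...) -> ... {
block0(...):
  ; minimized body here
}
; run: %repro(...) == ...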

This is critical because fuzz failures often reappear when new rewrites are added nearby.

7. If needed, temporarily disable the rewrite

If you need an immediate stabilization patch, it is acceptable to disable the specific optimization on AArch64 while preparing a proper semantic fix. That is better than leaving a known miscompile path enabled.

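// Temporary mitigation (sketch). `disable_problematic_egraph_rule` is a
// hypothetical hook, not an existing Cranelift API; `isa` is the
// &dyn TargetIsa being compiled for.
if matches!(isa.triple().architecture, target_lexicon::Architecture::Aarch64(_)) {
    disable_problematic_egraph_rule();
}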

Use this only as a short-term measure and document why the rule is being gated.

Common Edge Cases

Even after fixing the main bug, several adjacent cases can still fail if they are not explicitly tested.

1. Sign-extension versus zero-extension confusion

A rewrite may preserve value bits for positive numbers while breaking negative values. Always test with inputs that exercise the sign bit.
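A classic example of a rewrite that is right only for non-negative inputs is replacing a signed division by 2 (`sdiv x, 2`, which rounds toward zero) with an arithmetic shift (`sshr x, 1`, which rounds toward negative infinity). This hypothetical rewrite is modeled below in stdlib Rust, together with a small input set that exercises the sign bit and the edges of the i32 range.

```rust
fn sdiv_by_2(x: i32) -> i32 {
    x / 2 // Rust's `/` truncates toward zero, like `sdiv`
}

fn sshr_by_1(x: i32) -> i32 {
    x >> 1 // `>>` on a signed type is arithmetic, like `sshr`
}

/// Inputs that cover zero, small positives, negatives, and both extremes.
fn interesting_inputs() -> [i32; 8] {
    [0, 1, 2, 3, -1, -3, i32::MIN, i32::MAX]
}

fn main() {
    for x in interesting_inputs() {
        let (d, s) = (sdiv_by_2(x), sshr_by_1(x));
        let note = if d == s { "" } else { "  <- differs" };
        println!("{:>11}: sdiv={:>11} sshr={:>11}{}", x, d, s, note);
    }
}
```

Every non-negative input agrees, so a test suite without odd negative values would pass while the rewrite is still wrong.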

2. Shift counts at or beyond type width

Backend behavior can diverge if the rewrite assumes shift counts are masked or normalized differently than the target actually lowers them.
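Cranelift defines `ishl` with the shift amount masked to the type width, so an i8 shift by 9 behaves like a shift by 1. On AArch64, the register form of LSL on a 32-bit W register uses the amount modulo 32 instead. The stdlib Rust model below shows why lowering an i8 shift through a 32-bit shift needs an explicit 3-bit mask on the amount.

```rust
/// Reference: Cranelift `ishl.i8`, amount taken modulo 8.
fn ishl_i8_reference(x: u8, amt: u32) -> u8 {
    x << (amt % 8)
}

/// Models a 32-bit machine shift whose amount is taken modulo 32.
fn shift_w_register(reg: u32, amt: u32) -> u32 {
    reg << (amt & 31)
}

/// Lowering that forgets to mask the amount to the i8 width.
fn ishl_i8_lowered_unmasked(x: u8, amt: u32) -> u8 {
    shift_w_register(x as u32, amt) as u8
}

/// Lowering that masks the amount to 3 bits before the wide shift.
fn ishl_i8_lowered_masked(x: u8, amt: u32) -> u8 {
    shift_w_register(x as u32, amt & 7) as u8
}

fn main() {
    for amt in [1u32, 8, 9, 32] {
        println!(
            "amt={:>2} reference={:#04x} unmasked={:#04x} masked={:#04x}",
            amt,
            ishl_i8_reference(0x01, amt),
            ishl_i8_lowered_unmasked(0x01, amt),
            ishl_i8_lowered_masked(0x01, amt)
        );
    }
}
```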

3. Boolean canonicalization

Some rewrites assume booleans are always 0 or 1. If the target path materializes all-ones masks or uses flags-based representations, equality-preserving rewrites can still produce backend-visible differences.
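In recent Cranelift IR, a scalar `icmp` yields 0 or 1, while vector compares yield all-zeros or all-ones lanes. The stdlib Rust sketch below shows a hypothetical rewrite of `select c, a, b` into the bitwise form `(c & a) | (!c & b)` (the meaning of Cranelift's `bitselect`): the two agree for a full mask but not for a 0/1 boolean.

```rust
/// `select c, a, b`: any nonzero condition picks `a`.
fn select(c: u32, a: u32, b: u32) -> u32 {
    if c != 0 { a } else { b }
}

/// Bitwise select: each result bit comes from `a` where `c` has a 1 bit.
fn bitwise_select(c: u32, a: u32, b: u32) -> u32 {
    (c & a) | (!c & b)
}

fn main() {
    let (a, b) = (0x1234_5678u32, 0x9abc_def0u32);
    for c in [0u32, 1, u32::MAX] {
        println!(
            "c={:#010x} select={:#010x} bitwise={:#010x}",
            c,
            select(c, a, b),
            bitwise_select(c, a, b)
        );
    }
}
```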

4. Narrow integer legalization

AArch64 often legalizes subword operations through wider registers. If a rewrite changes where truncation happens, the resulting code can differ only on certain inputs.
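The following stdlib Rust model is a sketch, not a backend excerpt. It computes an i8 `iadd` feeding a `ushr` in a 32-bit register. If truncation to 8 bits happens after the shift instead of before it, only sums that carry out of bit 7 produce a different result.

```rust
/// Reference: `ushr (iadd.i8 a, b), 1` with i8 wrapping arithmetic.
fn reference(a: u8, b: u8) -> u8 {
    a.wrapping_add(b) >> 1
}

/// Truncation moved after the shift: the carry in bit 8 leaks into bit 7.
fn truncate_late(a: u8, b: u8) -> u8 {
    (((a as u32) + (b as u32)) >> 1) as u8
}

/// Truncation at the i8 boundary, as the IR requires.
fn truncate_early(a: u8, b: u8) -> u8 {
    ((((a as u32) + (b as u32)) & 0xff) >> 1) as u8
}

fn main() {
    // 200 + 100 = 300 wraps to 44 in i8, so the correct result is 22.
    println!("reference={} late={} early={}",
        reference(200, 100), truncate_late(200, 100), truncate_early(200, 100));
}
```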

5. Vector lane semantics

If the fuzz case touches vectors, verify lane ordering, extraction, insertion, and splat behavior. These are common places where target-specific lowering bugs hide.

6. Cost-model extraction issues

Sometimes every rewrite is individually valid, but the egraph extractor picks a form that stresses an incomplete backend path. In that case, the fix may belong in extraction heuristics or legalization, not the rewrite itself.
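To see how extraction alone can change the emitted shape, here is a generic cost-based extractor over a toy e-graph in stdlib Rust. It is not Cranelift's extractor, and the op names and costs are made up. Each e-class lists equivalent e-nodes, and the extractor keeps, per class, the node with the lowest own cost plus best child costs.

```rust
#[derive(Debug, Clone)]
struct ENode {
    op: &'static str,
    children: Vec<usize>, // indices of child e-classes
}

/// For each e-class, the index of the chosen node and its total cost.
fn extract(classes: &[Vec<ENode>], op_cost: impl Fn(&str) -> u64) -> Vec<Option<(usize, u64)>> {
    let mut best: Vec<Option<(usize, u64)>> = vec![None; classes.len()];
    // Iterate to a fixpoint so any class order (and cycles) are handled.
    loop {
        let mut changed = false;
        for (c, nodes) in classes.iter().enumerate() {
            for (n, node) in nodes.iter().enumerate() {
                // Usable only once every child class has a known cost.
                let child_costs: Option<u64> = node
                    .children
                    .iter()
                    .map(|&ch| best[ch].map(|(_, cost)| cost))
                    .sum();
                if let Some(children) = child_costs {
                    let total = op_cost(node.op) + children;
                    if best[c].map_or(true, |(_, cost)| total < cost) {
                        best[c] = Some((n, total));
                        changed = true;
                    }
                }
            }
        }
        if !changed {
            return best;
        }
    }
}

fn main() {
    let leaf = |op: &'static str| ENode { op, children: vec![] };
    let classes = vec![
        vec![leaf("x")],       // class 0
        vec![leaf("iconst1")], // class 1: constant 1
        vec![leaf("iconst2")], // class 2: constant 2
        // class 3: `x << 1` and `x * 2` are equivalent
        vec![
            ENode { op: "ishl", children: vec![0, 1] },
            ENode { op: "imul", children: vec![0, 2] },
        ],
    ];
    let shift_cheap = extract(&classes, |op| match op { "ishl" => 1, "imul" => 3, _ => 1 });
    let mul_cheap = extract(&classes, |op| match op { "ishl" => 5, "imul" => 1, _ => 1 });
    println!("shift cheap: {:?}", shift_cheap[3]); // Some((0, 3))
    println!("mul cheap:   {:?}", mul_cheap[3]); // Some((1, 3))
}
```

Both nodes in class 3 are equivalent, yet the cost table alone decides which one reaches the backend, which is how an untested lowering path can surface without any individual rewrite being wrong.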

FAQ

Why does this fail only on AArch64 if the IR rewrite is supposed to be target-independent?

Because target-independent IR equivalence is only safe if all downstream lowering paths preserve that equivalence. AArch64 legalization may expose a semantic gap that x86 lowering does not.

Should I disable egraphs entirely for AArch64?

No. Disable only the specific problematic rewrite or fix the backend path. A broad disable loses optimization coverage and hides the real bug.

How do I tell whether the bug is in egraphs or the AArch64 backend?

Compare three stages: original IR, post-egraph IR, and final backend behavior. If post-egraph IR already violates interpreter expectations, the bug is in the rewrite. If interpreter behavior is correct but generated machine code is wrong, the bug is in AArch64 lowering or legalization.
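The three-stage comparison can be written down as a small decision function. In this Rust sketch the names are hypothetical, and each argument is the observed result of one stage on the same input, however you obtain it.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoMismatch,
    EgraphRewrite,   // post-egraph IR already disagrees with the original IR
    BackendLowering, // the IR stages agree, but the machine code does not
}

/// Each argument is the result of one stage for the same input.
fn triage<T: PartialEq>(original_ir: &T, post_egraph_ir: &T, machine_code: &T) -> Verdict {
    if post_egraph_ir != original_ir {
        Verdict::EgraphRewrite
    } else if machine_code != original_ir {
        Verdict::BackendLowering
    } else {
        Verdict::NoMismatch
    }
}

fn main() {
    // Hypothetical results for one input: the IR agrees, the machine code does not.
    println!("{:?}", triage(&5i32, &5, &9)); // BackendLowering
}
```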

The practical resolution for this GitHub issue is to treat it as a rewrite-to-lowering contract violation: isolate the egraph transformation introduced by the fuzz case, constrain it with the right semantic or target predicate, and lock in the fix with a minimized AArch64 regression test.