How to Fix Cranelift's unimplemented >64 bits Error During `bxor_not` and `icmp` Optimization
Cranelift fails on >64-bit vectors during bxor_not and icmp optimization because the simplifier assumes a scalar-sized bitmask path that does not hold for wide SIMD values.
This issue appears when a .clif test triggers an optimization pattern involving bxor_not and icmp on values wider than 64 bits, such as SIMD vectors like i32x4. Instead of folding or canonicalizing the expression, Cranelift reaches an internal path that effectively expects a mask or immediate representation that fits within 64 bits, then aborts with an unimplemented >64 bits error.
If you are debugging Cranelift IR transforms, the important takeaway is that this is not a frontend parsing problem. It is an optimizer legalization or simplification gap caused by a mismatch between wide vector types and code written for 64-bit scalar assumptions.
Understanding the Root Cause
The failure is usually triggered by an optimization that tries to rewrite boolean or comparison logic into a simpler equivalent form. A common pattern is:
bxor_not(a, b) -> bxor(a, bnot(b))
or a compare-related fold where a bitwise expression is interpreted as a mask-producing operation that can be simplified before instruction selection.
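The identity behind that rewrite is independent of bit width, which is why the rule itself is sound for wide values. The sketch below models the operations on `u128` as a stand-in for a 128-bit value; the function names mirror the IR opcodes but are plain Rust helpers written for illustration, not Cranelift APIs.

```rust
/// Bitwise complement, modeled on a 128-bit host integer.
fn bnot(a: u128) -> u128 {
    !a
}

/// Bitwise XOR.
fn bxor(a: u128, b: u128) -> u128 {
    a ^ b
}

/// Direct model of `bxor_not`: XOR with the complement of the second operand.
fn bxor_not(a: u128, b: u128) -> u128 {
    a ^ !b
}

/// The rewritten form: bxor(a, bnot(b)).
fn rewritten(a: u128, b: u128) -> u128 {
    bxor(a, bnot(b))
}

/// True when a 128-bit result could be stored in a u64 without loss.
fn fits_in_u64(v: u128) -> bool {
    u64::try_from(v).is_ok()
}
```

Both forms agree for any inputs, but a result such as `bxor_not(5, 5)` is all ones across 128 bits, so `fits_in_u64` reports false for it. The rewrite is valid; a 64-bit container for its constants is not.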
The bug happens because one of these optimizer stages does at least one of the following:
- Attempts to materialize a constant mask using a representation only implemented for up to 64 bits.
- Calls a helper that converts a value to an integer-backed bit pattern, but that helper only supports u64-sized values.
- Treats a vector boolean result as if it were a scalar integer, then uses a scalar-only simplification path.
- Assumes lane-wise transformations can be expressed through a single wide immediate, which breaks for types like i32x4, i16x8, or other types whose total bit width exceeds 64.
In the reported case, the presence of icmp is significant because compare operations often produce masks or all-ones/all-zeros lane values. Those are easy to optimize for scalar types, but vector masks require either:
- lane-wise reasoning, or
- a backend representation that can safely encode constants wider than 64 bits.
When the optimizer uses a scalar-only helper for a vector-wide mask, Cranelift hits the unimplemented path.
In short, the root cause is an optimization rule that is semantically valid but implemented with a bit-width restriction that does not cover SIMD values wider than 64 bits.
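To make the restriction concrete, here is a minimal sketch of a scalar-only mask helper. The name `scalar_mask` is hypothetical and the code is not taken from Cranelift; it only illustrates how a helper that computes a shift from 64 has no answer for wider types. This version returns `None` where a real helper might panic instead.

```rust
/// Hypothetical scalar-only helper: an all-ones mask for a value `bits` wide.
/// Returns `None` when the width is zero or when no u64 can hold the mask.
fn scalar_mask(bits: u32) -> Option<u64> {
    if bits == 0 {
        return None;
    }
    // 64 - bits underflows for widths above 64, so checked_sub yields None.
    let shift = 64u32.checked_sub(bits)?;
    Some(u64::MAX >> shift)
}
```

Calling `scalar_mask(32)` gives a 32-bit mask, while `scalar_mask(128)` has no valid result, which is the situation an i32x4 value creates.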
Step-by-Step Solution
The safest fix is to guard the transform so it only fires for supported bit widths, then add a vector-safe implementation if the optimization is still desirable for SIMD values.
1. Reproduce the failure with a focused test
Start by minimizing the failing .clif case so the optimizer always reaches the problematic rule. Keep the wide type intact.
test optimize
set opt_level=none
set preserve_frame_pointers=true
set enable_multi_ret_implicit_sret=true
function %main() -> i32x4 fast {
; reduced reproducer preserving the wide vector path
; build a pattern involving icmp and bxor_not
}
If you are working in the Cranelift repository, place the test near related optimizer regressions and run the filecheck-based test suite.
2. Locate the offending optimization rule
Search the optimizer and canonicalization code for bxor_not, icmp, and helpers that build bit masks or convert immediates. You are looking for code paths that:
- call integer conversion helpers for constants,
- check bits() <= 64,
- use scalar-only immediate constructors, or
- panic on unsupported widths.
Typical investigation flow:
rg "bxor_not|icmp|unimplemented|64" cranelift/
Then inspect the simplification code around the match arm or rewrite rule that handles the failing pattern.
3. Add a width guard first
If the current transformation is only correct for scalar or 64-bit-backed values, prevent it from firing for larger types.
if ty.bits() > 64 {
    return None;
}
For vectors, prefer checking whether the type is scalar versus vector, because total width alone can be misleading: a 64-bit vector such as i8x8 passes a total-width check but may still need lane-wise handling.
if ty.is_vector() {
    return None;
}
This is the fastest way to stop the crash and restore correctness.
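A self-contained sketch of the combined guard follows. The `Ty` struct and `try_scalar_fold` function are hypothetical stand-ins for a Cranelift type and a rewrite entry point, used here only so the example compiles on its own.

```rust
/// Toy type descriptor: lane width in bits and number of lanes.
#[derive(Clone, Copy)]
struct Ty {
    lane_bits: u32,
    lanes: u32,
}

impl Ty {
    /// Total width of the value in bits.
    fn bits(self) -> u32 {
        self.lane_bits * self.lanes
    }

    /// A type with more than one lane is treated as a vector.
    fn is_vector(self) -> bool {
        self.lanes > 1
    }
}

/// Hypothetical rewrite entry: produces a mask only for scalars of 1..=64 bits.
/// Anything else bails out with `None` so the original IR is left untouched.
fn try_scalar_fold(ty: Ty) -> Option<u64> {
    if ty.is_vector() || ty.bits() == 0 || ty.bits() > 64 {
        return None;
    }
    Some(u64::MAX >> (64 - ty.bits()))
}
```

With this guard, a scalar i32 still folds, while i32x4, a 64-bit vector such as i8x8, and a 128-bit scalar all fall through without reaching the scalar-only path.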
4. Implement a vector-safe fold if needed
If the optimization should apply to SIMD types, rewrite it so the transform does not depend on packing the full value into a 64-bit integer.
For example, avoid logic like this:
let mask = (1u64 << ty.bits()) - 1; // shift overflows at 64 bits and cannot represent anything wider
Use lane-aware construction instead:
if ty.is_vector() {
    let lane_ty = ty.lane_type();
    let lanes = ty.lane_count();
    // Pseudocode: build per-lane all-ones or all-zeros constants
    // rather than one scalar-wide integer mask.
    let lane_mask: u64 = match lane_ty.bits() {
        8 => 0xff,
        16 => 0xffff,
        32 => 0xffff_ffff,
        64 => u64::MAX,
        _ => return None,
    };
    // Construct a splat or lane-wise constant holding `lanes` copies of
    // `lane_mask` using existing IR helpers.
}
The exact API depends on the part of Cranelift you are editing, but the rule is consistent: never require a single host integer to represent the full vector bit pattern.
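As one possible shape for that rule, the sketch below builds the byte image of a vector constant one lane at a time. The function `splat_lane_bytes` is a hypothetical helper, and the little-endian byte layout is an assumption made for illustration; it does not call any Cranelift API.

```rust
/// Build the little-endian bytes of a vector constant in which every lane
/// holds `lane_value`, truncated to `lane_bits`. Returns `None` for lane
/// widths this sketch does not handle.
fn splat_lane_bytes(lane_value: u64, lane_bits: u32, lanes: u32) -> Option<Vec<u8>> {
    // Per-lane mask: each lane fits in a u64 even when the whole vector does not.
    let lane_mask: u64 = match lane_bits {
        8 => 0xff,
        16 => 0xffff,
        32 => 0xffff_ffff,
        64 => u64::MAX,
        _ => return None,
    };
    let lane_bytes = (lane_bits / 8) as usize;
    let masked = (lane_value & lane_mask).to_le_bytes();
    let mut out = Vec::with_capacity(lane_bytes * lanes as usize);
    for _ in 0..lanes {
        // Append only the bytes that belong to one lane.
        out.extend_from_slice(&masked[..lane_bytes]);
    }
    Some(out)
}
```

For an i32x4 all-ones mask, `splat_lane_bytes(u64::MAX, 32, 4)` yields sixteen `0xff` bytes without ever forming a 128-bit host integer.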
5. Preserve semantics around boolean masks
Compare results in Cranelift may use lane masks that are not interchangeable with plain integers in every lowering path. When rewriting:
- preserve the original result type,
- preserve lane shape,
- do not assume scalar bnot semantics apply to vector booleans unless the IR guarantees that representation.
A safer transform often looks like this conceptually:
// Before
v3 = bxor_not(v1, v2)
v4 = icmp eq v3, v0
// After: only if the rewrite is lane-wise valid and type-safe
v3n = bnot(v2)
v3x = bxor(v1, v3n)
v4 = icmp eq v3x, v0
If the simplification tries to collapse the entire sequence into constants or a single mask, stop and verify that the implementation is vector-aware.
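One way to gain confidence that a rewrite is lane-wise valid is to model it outside the compiler. The sketch below represents an i32x4 as `[u32; 4]` and assumes a vector compare yields all ones in lanes where the condition holds and zero elsewhere; the helper names mirror the opcodes but are illustrative Rust functions, not Cranelift code.

```rust
type I32x4 = [u32; 4];

/// Lane-wise complement.
fn bnot(a: I32x4) -> I32x4 {
    a.map(|x| !x)
}

/// Lane-wise XOR.
fn bxor(a: I32x4, b: I32x4) -> I32x4 {
    std::array::from_fn(|i| a[i] ^ b[i])
}

/// Lane-wise model of `bxor_not`.
fn bxor_not(a: I32x4, b: I32x4) -> I32x4 {
    std::array::from_fn(|i| a[i] ^ !b[i])
}

/// Lane-wise equality: all ones where lanes match, zero otherwise (assumed).
fn icmp_eq(a: I32x4, b: I32x4) -> I32x4 {
    std::array::from_fn(|i| if a[i] == b[i] { u32::MAX } else { 0 })
}

/// The sequence before the rewrite.
fn before(v0: I32x4, v1: I32x4, v2: I32x4) -> I32x4 {
    icmp_eq(bxor_not(v1, v2), v0)
}

/// The sequence after the rewrite.
fn after(v0: I32x4, v1: I32x4, v2: I32x4) -> I32x4 {
    icmp_eq(bxor(v1, bnot(v2)), v0)
}
```

Comparing `before` and `after` over a range of inputs is a cheap sanity check that the rewritten sequence keeps the result type, lane shape, and per-lane values.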
6. Add regression tests for both scalar and vector types
Do not stop at the reproducer. Add coverage that proves the optimization:
- still works for i32 or i64,
- does not crash for i8x16, i16x8, i32x4, or other wide vectors,
- does not silently change compare semantics.
test optimize
function %scalar_ok(i64, i64) -> i64 {
block0(v0: i64, v1: i64):
    v2 = bxor_not v0, v1
    return v2
}
function %vector_no_crash(i32x4, i32x4) -> i32x4 {
block0(v0: i32x4, v1: i32x4):
    v2 = bxor_not v0, v1
    return v2
}
7. Validate with the Cranelift test suite
Run the relevant optimizer tests and, if available, target-specific SIMD legalization tests. If you are preparing a patch for review, include:
- the minimized reproducer,
- the guard or vector-safe fix,
- a short explanation of why the previous implementation was limited to 64 bits.
To contribute the fix, follow the project's normal pull request process in the Wasmtime repository on GitHub and link the patch to the corresponding issue.
Common Edge Cases
1. Vector width versus lane width confusion
A type like i32x4 has 32-bit lanes but a 128-bit total width. Code that checks only the lane width may accidentally enter a path that still tries to encode the full value as u64.
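The difference between the two checks can be sketched as follows. Both function names are hypothetical, and the types are reduced to a lane width and a lane count for illustration.

```rust
/// Insufficient guard (illustrative): inspects only the lane width.
fn lane_width_fits(lane_bits: u32, _lanes: u32) -> bool {
    lane_bits <= 64
}

/// Safer guard: inspects the total width of the value.
fn total_width_fits(lane_bits: u32, lanes: u32) -> bool {
    lane_bits
        .checked_mul(lanes)
        .map_or(false, |total| total <= 64)
}
```

For i32x4, `lane_width_fits(32, 4)` accepts the type while `total_width_fits(32, 4)` rejects it, which is exactly the gap that lets a 128-bit value reach a u64-only path.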
2. Boolean vector representations
Some optimizations assume comparisons produce canonical all-zeros or all-ones integer values. That may be true semantically, but the implementation must still use a vector-capable constant path.
3. Scalar fix that regresses SIMD optimization quality
Adding a simple if ty.is_vector() { return None; } guard fixes the crash, but it may disable an optimization previously expected by tests. If performance matters, follow up with a proper SIMD-aware fold.
4. Backend-specific legalization failures
Even after fixing the optimizer, a later backend pass may still reject the transformed pattern if the target lacks support for a certain vector operation. Verify whether the crash moved rather than disappeared.
5. Incorrect constant splat construction
Replacing a scalar-wide mask with a vector splat is correct only if every lane should receive the same bit pattern. For mixed-lane constants, you need explicit lane construction rather than splatting one value.
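A small check can decide which construction applies. The sketch below uses `[u32; 4]` as a stand-in for an i32x4 constant; `splat` and `as_splat` are hypothetical helpers written for this example.

```rust
/// Build a four-lane constant in which every lane has the same value.
fn splat(lane: u32) -> [u32; 4] {
    [lane; 4]
}

/// Return the shared lane value when all lanes are identical, or `None`
/// when the constant has mixed lanes and needs explicit per-lane construction.
fn as_splat(lanes: [u32; 4]) -> Option<u32> {
    if lanes.iter().all(|&l| l == lanes[0]) {
        Some(lanes[0])
    } else {
        None
    }
}
```

A uniform mask such as `[u32::MAX; 4]` can be emitted as a splat, whereas a pattern like `[u32::MAX, 0, u32::MAX, 0]` makes `as_splat` return `None` and must be built lane by lane.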
6. Test settings masking the real phase
Issue reproducers sometimes use settings like opt_level=none, yet still hit canonicalization or legalization code that performs local rewrites. Do not assume the bug lives only in high-level optimization passes.
FAQ
Why does this happen even when opt_level=none is set?
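Because the failing rewrite does not have to live in an optional optimization pass. Cranelift may still run IR normalization, legalization, and peephole-style simplifications when aggressive optimization is disabled, so the rule can sit in one of those always-on phases. Confirm which phase fires rather than assuming from the setting.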
Is the correct fix to disable bxor_not optimization for all vectors?
No. That is a safe short-term fix, not necessarily the best long-term one. The better solution is to make the transform vector-aware so it works without assuming a 64-bit backing representation.
How do I know whether the bug is in icmp or bxor_not?
Usually neither instruction is inherently broken. The problem is the optimization rule connecting them. Trace the rewrite that fires immediately before the unimplemented >64 bits error and inspect any helper that materializes constants or masks.
The practical resolution is straightforward: identify the scalar-only rewrite, guard it for unsupported widths, and then replace it with lane-wise logic if SIMD optimization is still desired. That fixes the crash, preserves correctness, and gives you a clean path to a proper wide-type optimization later.