How to Fix: Cranelift: should be implemented in ISLE: inst = `v3 = uunarrow.i64x2 v1, v2`

Cranelift x86_64 compile failure on uunarrow.i64x2: why it happens and how to implement the missing ISLE lowering

The failure is not in your test case. The real problem is that x86_64 is missing an ISLE lowering rule for v3 = uunarrow.i64x2 v1, v2, while AArch64 reaches a legal lowering path for the same CLIF instruction. That is why the exact same IR compiles on one backend and fails on the other.

In practical terms, this GitHub issue points to a backend coverage gap. Cranelift accepts the CLIF vector narrowing operation, but the x86_64 backend has no ISLE rule that lowers this opcode at this type. The result is an instruction-selection failure: when no x64 rule matches, lowering reports "should be implemented in ISLE". The failure looks target-specific because it is target-specific.

Understanding the Root Cause

The instruction uunarrow.i64x2 does three things. It treats every lane of the two i64x2 inputs as an unsigned 64-bit integer. It saturates each lane to the unsigned 32-bit range. It then concatenates the results into a single i32x4, with the first operand's lanes in the low half. In Cranelift, these vector transforms must eventually be mapped to backend-specific machine instructions or expanded into equivalent instruction sequences.
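It helps to make those semantics concrete before touching the backend. Below is a minimal Rust reference model. It assumes the CLIF definition of uunarrow: inputs are unsigned, each lane saturates to u32::MAX, and the first operand fills the low lanes. The function name uunarrow_i64x2 is illustrative and is not part of Cranelift.

```rust
/// Reference model of CLIF uunarrow.i64x2 x, y (illustrative, not Cranelift code).
/// Index 0 is lane 0, the least significant lane.
fn uunarrow_i64x2(x: [u64; 2], y: [u64; 2]) -> [u32; 4] {
    // Unsigned saturation: anything that does not fit in 32 bits becomes u32::MAX.
    let sat = |v: u64| -> u32 { v.min(u32::MAX as u64) as u32 };
    // Lanes of x fill result lanes 0-1; lanes of y fill lanes 2-3.
    [sat(x[0]), sat(x[1]), sat(y[0]), sat(y[1])]
}

fn main() {
    println!("{:?}", uunarrow_i64x2([1, 2], [3, 4])); // [1, 2, 3, 4]
    println!("{:?}", uunarrow_i64x2([0, u64::MAX], [5, 1u64 << 32])); // [0, 4294967295, 5, 4294967295]
}
```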

On AArch64, NEON provides unsigned saturating narrowing instructions (UQXTN and UQXTN2) that correspond closely to this operation, so a direct lowering is straightforward there. On x86_64, especially in the SSE4.2/AVX configuration shown in the issue, the selector may lack one of these pieces:

  • a direct ISLE pattern for uunarrow.i64x2
  • a legalization rewrite into simpler CLIF operations that x86_64 already supports
  • a custom lowering sequence that builds the packed result from multiple x86 vector instructions

That is why the compiler reports an error only on x86_64. The CLIF op is valid, but the backend does not know how to finish lowering it.

There is also an important semantic detail: the SSE and AVX subsets of x86 have no single instruction for an unsigned 64-bit-to-32-bit narrow. The saturating pack instructions (packsswb, packuswb, packssdw, and SSE4.1's packusdw) only accept 16-bit or 32-bit source lanes. The direct 64-to-32-bit conversion with unsigned saturation (vpmovusqd) requires AVX-512. A 64-bit unsigned narrow therefore needs a multi-instruction sequence that clamps each lane and shuffles the low 32 bits into place, or more slowly extracts and reinserts lanes. Until that sequence is written as an ISLE rule, no rule matches and instruction selection fails.
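Part of what makes the sequence longer is how x86 compares 64-bit lanes. SSE4.2's pcmpgtq compares them as signed integers, and an unsigned 64-bit min (vpminuq) only arrives with AVX-512. A standard workaround is to flip the sign bit of both operands, which turns a signed comparison into an unsigned one. The scalar Rust sketch below models a single lane. pcmpgtq_lane and unsigned_gt_mask are hypothetical names for illustration.

```rust
/// Flipping bit 63 maps unsigned order onto signed order.
const SIGN_BIT: u64 = 1 << 63;

/// Scalar stand-in for one lane of pcmpgtq: signed greater-than,
/// producing an all-ones mask when true, as the SIMD instruction does.
fn pcmpgtq_lane(a: u64, b: u64) -> u64 {
    if (a as i64) > (b as i64) { u64::MAX } else { 0 }
}

/// Unsigned a > b built only from a signed compare plus XOR.
fn unsigned_gt_mask(a: u64, b: u64) -> u64 {
    pcmpgtq_lane(a ^ SIGN_BIT, b ^ SIGN_BIT)
}

fn main() {
    let limit = u32::MAX as u64; // 0xFFFF_FFFF
    // A plain signed compare gets u64::MAX wrong, because it looks like -1.
    println!("{:#x}", pcmpgtq_lane(u64::MAX, limit)); // 0x0
    // The biased compare correctly reports it as larger than the limit.
    println!("{:#x}", unsigned_gt_mask(u64::MAX, limit)); // 0xffffffffffffffff
}
```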

So the root cause is best summarized as follows: the Cranelift x86_64 backend is missing an ISLE rule or legalization path for uunarrow with i64x2 inputs.

Step-by-Step Solution

The correct fix is to implement this operation in the x86_64 backend’s ISLE lowering pipeline. Depending on the current backend organization, that typically means one of the following:

  1. adding a new ISLE rule that matches uunarrow.i64x2
  2. lowering it to an equivalent sequence of already-supported x86 vector operations
  3. legalizing it earlier into a form the backend already knows how to select

Use the following workflow.

1. Reproduce the failure with a focused CLIF test

test compile
target x86_64 sse42 has_avx

function %main() -> i32x4 {
block0:
    v1 = vconst.i64x2 [1 2]
    v2 = vconst.i64x2 [3 4]
    v3 = uunarrow.i64x2 v1, v2
    ; return v3 so the result is live: an unused pure instruction may never be lowered
    return v3
}

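Then run the Cranelift filetest or compile test command used by your local setup. The exact command varies by repository layout, but the goal is simple: confirm that the failure reproduces on x86_64 and not on AArch64. A filetest may list several target lines, so adding target aarch64 below the x86_64 line checks both backends in one run.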

2. Inspect where x86_64 lowering stops

Search the backend for existing support for related vector narrow operations:

grep -R "uunarrow" cranelift/codegen/src
grep -R "snarrow" cranelift/codegen/src
grep -R "unarrow" cranelift/codegen/src

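Also inspect the x64 ISLE files (lower.isle and inst.isle under cranelift/codegen/src/isa/x64/) for nearby vector rules, especially the existing snarrow and unarrow rules. You are looking for either a missing matcher or similar implemented patterns that can be copied and adapted.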

3. Add a new ISLE rule or helper for uunarrow.i64x2

Add a new rule to the x64 lowering rules (lower.isle) that matches the opcode together with its result and input types. The rule should emit either a direct instruction or a multi-instruction sequence. Conceptually, it looks like this:

;; Sketch only: lower_uunarrow_i64x2 is a hypothetical helper, not an existing constructor.
;; uunarrow of two i64x2 operands produces an i32x4.
(rule (lower (has_type $I32X4 (uunarrow x @ (value_type $I64X2) y)))
      (lower_uunarrow_i64x2 x y))

The exact syntax depends on the current Cranelift version, but the implementation strategy should follow the operation’s semantics:

  1. take the two 64-bit lanes from each input vector
  2. apply unsigned saturation, as CLIF requires: any lane greater than 0xFFFF_FFFF becomes 0xFFFF_FFFF
  3. pack the four 32-bit results into the destination vector, with the lanes of the first operand (v1) in lanes 0 and 1 and those of the second operand (v2) in lanes 2 and 3

If there is no single x86 instruction for this case, implement it as a helper sequence. In pseudocode, the lowering often resembles this structure:

// Pseudocode only; helper names are illustrative
lo = saturate_each_lane_to_u32_max(v1)           // min(lane, 0xFFFF_FFFF), still 64-bit lanes
hi = saturate_each_lane_to_u32_max(v2)
result = take_low_32_bits_of_each_lane(lo, hi)   // v1 -> lanes 0-1, v2 -> lanes 2-3
return result                                    // i32x4

In a real x86 backend, the sequence may involve combinations of:

  • lane shuffle instructions
  • min/max or compare-and-mask for saturation behavior
  • packs where applicable
  • bitwise operations to assemble the final vector
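Here is one concrete option built from those pieces and restricted to SSE4.2:

1. Bias each input with pxor.
2. Compare it against a biased 0xFFFF_FFFF with pcmpgtq.
3. OR the resulting mask back into the input, so saturated lanes get an all-ones low half.
4. Gather the low 32 bits of every lane with shufps and immediate 0x88.

The Rust below simulates that sequence lane by lane and checks it against the reference semantics. It is a sketch of a possible lowering under those assumptions, not the sequence upstream Cranelift emits. All function names are hypothetical.

```rust
/// One 128-bit register viewed as two 64-bit lanes (lane 0 first).
type Xmm = [u64; 2];

const SIGN: u64 = 1 << 63;

/// View the same register as four 32-bit lanes (little-endian lane order).
fn as_u32x4(v: Xmm) -> [u32; 4] {
    [v[0] as u32, (v[0] >> 32) as u32, v[1] as u32, (v[1] >> 32) as u32]
}

fn pxor(a: Xmm, b: Xmm) -> Xmm {
    [a[0] ^ b[0], a[1] ^ b[1]]
}

fn por(a: Xmm, b: Xmm) -> Xmm {
    [a[0] | b[0], a[1] | b[1]]
}

/// pcmpgtq: signed 64-bit greater-than per lane, all ones when true.
fn pcmpgtq(a: Xmm, b: Xmm) -> Xmm {
    let gt = |x: u64, y: u64| if (x as i64) > (y as i64) { u64::MAX } else { 0 };
    [gt(a[0], b[0]), gt(a[1], b[1])]
}

/// shufps dst, src, imm: result lanes 0-1 pick from dst, lanes 2-3 from src.
fn shufps(dst: [u32; 4], src: [u32; 4], imm: u8) -> [u32; 4] {
    let sel = |shift: u8| ((imm >> shift) & 0b11) as usize;
    [dst[sel(0)], dst[sel(2)], src[sel(4)], src[sel(6)]]
}

/// Force each lane above u32::MAX to all ones; smaller lanes are unchanged.
fn saturate_low_halves(v: Xmm) -> Xmm {
    let bias = [SIGN, SIGN];
    // 0xFFFF_FFFF with its sign bit flipped, to compare against the biased input.
    let limit = [SIGN | 0xFFFF_FFFF, SIGN | 0xFFFF_FFFF];
    let too_big = pcmpgtq(pxor(v, bias), limit);
    por(v, too_big)
}

/// Simulated SSE4.2 lowering of uunarrow.i64x2 x, y.
fn uunarrow_i64x2_sse42(x: Xmm, y: Xmm) -> [u32; 4] {
    let xs = as_u32x4(saturate_low_halves(x));
    let ys = as_u32x4(saturate_low_halves(y));
    // 0x88 = 0b10_00_10_00: the low 32 bits (dwords 0 and 2) of each operand.
    shufps(xs, ys, 0x88)
}

/// Reference semantics to check the simulation against.
fn uunarrow_ref(x: Xmm, y: Xmm) -> [u32; 4] {
    let sat = |v: u64| v.min(u32::MAX as u64) as u32;
    [sat(x[0]), sat(x[1]), sat(y[0]), sat(y[1])]
}

fn main() {
    let samples = [0u64, 1, 0xFFFF_FFFE, 0xFFFF_FFFF, 1u64 << 32, 1u64 << 63, u64::MAX];
    for &a in &samples {
        for &b in &samples {
            let (x, y) = ([a, b], [b, a]);
            assert_eq!(uunarrow_i64x2_sse42(x, y), uunarrow_ref(x, y));
        }
    }
    println!("simulated sequence matches the reference on all samples");
}
```

The OR-with-mask step avoids a blend. Once a lane is known to exceed u32::MAX, setting every bit is enough, because only the low 32 bits survive the shuffle.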

4. Prefer legalization if the target lacks efficient direct support

If implementing the operation directly in x64 ISLE becomes too complex, a better design may be to legalize uunarrow.i64x2 earlier into simpler CLIF operations. For example, split the vector into scalar or half-vector pieces, narrow them independently, and reconstruct the result using operations the backend already lowers correctly.

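// Example strategy, not exact CLIF syntax:
for each (source, lane, dest_lane) in [(v1, 0, 0), (v1, 1, 1), (v2, 0, 2), (v2, 1, 3)]:
    x = extractlane source, lane              // i64
    x = umin x, 0xFFFF_FFFF                   // unsigned saturation
    n = ireduce.i32 x                         // keep the low 32 bits
    result = insertlane result, n, dest_lane  // i32x4
return result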

This approach is often slower than a native SIMD lowering, but it is correct and can unblock the backend until a more optimized path is added.

5. Add regression tests for both support and failure mode

Once the lowering exists, add a dedicated regression test that ensures x86_64 no longer errors on this instruction.

test compile
target x86_64 sse42 has_avx

function %main() -> i32x4 {
block0:
    v1 = vconst.i64x2 [0 18446744073709551615]
    v2 = vconst.i64x2 [5 6]
    v3 = uunarrow.i64x2 v1, v2
    ; return v3 so the instruction is actually lowered
    return v3
}

Then add test run cases, which execute the compiled code and compare results instead of only checking that it compiles, covering:

  • zero values
  • maximum unsigned values
  • mixed low and high lane values
  • correct lane ordering from v1 and v2
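Expected values for those cases are easy to get wrong by hand, so one option is to generate the directives from a reference model. This sketch makes two assumptions. First, the filetest directive has the form "; run: %name(args) == expected", with space-separated vector lanes. Second, the test function is named %uunarrow_i64x2 and takes two i64x2 parameters. Compare the output against existing runtests in your checkout before relying on it.

```rust
/// Reference semantics of uunarrow.i64x2 (unsigned saturating narrow).
fn uunarrow_i64x2(x: [u64; 2], y: [u64; 2]) -> [u32; 4] {
    let sat = |v: u64| v.min(u32::MAX as u64) as u32;
    [sat(x[0]), sat(x[1]), sat(y[0]), sat(y[1])]
}

/// Format one "; run:" directive (assumed syntax); lanes in hex, lane 0 first.
fn run_line(func: &str, x: [u64; 2], y: [u64; 2]) -> String {
    let r = uunarrow_i64x2(x, y);
    format!(
        "; run: %{}([{:#x} {:#x}], [{:#x} {:#x}]) == [{:#x} {:#x} {:#x} {:#x}]",
        func, x[0], x[1], y[0], y[1], r[0], r[1], r[2], r[3]
    )
}

fn main() {
    // Zero, maximum, mixed low/high, and ordering-sensitive inputs.
    let cases: [([u64; 2], [u64; 2]); 4] = [
        ([0, 0], [0, 0]),
        ([u64::MAX, u64::MAX], [u64::MAX, u64::MAX]),
        ([0xFFFF_FFFF, 1u64 << 32], [7, u64::MAX]),
        ([1, 2], [3, 4]),
    ];
    for &(x, y) in cases.iter() {
        println!("{}", run_line("uunarrow_i64x2", x, y));
    }
}
```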

6. Validate against AArch64 behavior

Since AArch64 already compiles the operation, use it as a semantic reference. Compare the intended result shape, lane ordering, and saturation behavior. The backend implementations do not need to look the same, but they must preserve identical CLIF semantics.

7. Submit the patch with a precise backend note

In the final patch description, explain that the issue was caused by a missing x86_64 ISLE lowering for uunarrow.i64x2 and that the fix adds either:

  • a direct lowering rule, or
  • a legalization path into supported operations

This makes the issue easier for reviewers to classify as a backend completeness bug rather than an IR validity problem.

Common Edge Cases

Even after adding the missing rule, several subtle bugs can still appear.

1. Wrong lane ordering

The destination vector usually combines data from both v1 and v2. A common bug is reversing halves or interleaving lanes incorrectly during shuffles or inserts. Always test known lane patterns so ordering mistakes are obvious.
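Distinct lane values matter because identical lanes hide ordering bugs. With identical lanes, a lowering that swaps or interleaves the two halves produces the same output as a correct one. The Rust sketch below uses two deliberately buggy variants (hypothetical, for illustration) and shows that per-lane sentinel values expose them.

```rust
/// Reference: lanes of x go to result lanes 0-1, lanes of y to lanes 2-3.
fn uunarrow_ref(x: [u64; 2], y: [u64; 2]) -> [u32; 4] {
    let sat = |v: u64| v.min(u32::MAX as u64) as u32;
    [sat(x[0]), sat(x[1]), sat(y[0]), sat(y[1])]
}

/// Hypothetical bug 1: the two halves are swapped.
fn buggy_swapped(x: [u64; 2], y: [u64; 2]) -> [u32; 4] {
    uunarrow_ref(y, x)
}

/// Hypothetical bug 2: lanes are interleaved as [x0, y0, x1, y1].
fn buggy_interleaved(x: [u64; 2], y: [u64; 2]) -> [u32; 4] {
    let r = uunarrow_ref(x, y);
    [r[0], r[2], r[1], r[3]]
}

fn main() {
    // Identical lanes: both bugs are invisible.
    let (x, y) = ([7, 7], [7, 7]);
    println!("{}", buggy_swapped(x, y) == uunarrow_ref(x, y)); // true
    println!("{}", buggy_interleaved(x, y) == uunarrow_ref(x, y)); // true

    // A distinct sentinel per lane: both bugs show up.
    let (x, y) = ([0x10, 0x11], [0x20, 0x21]);
    println!("{}", buggy_swapped(x, y) == uunarrow_ref(x, y)); // false
    println!("{}", buggy_interleaved(x, y) == uunarrow_ref(x, y)); // false
}
```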

2. Saturation vs truncation confusion

The semantics of unsigned narrowing must match CLIF exactly, and CLIF defines uunarrow as saturating. Any lane larger than the maximum value representable in the destination lane (u32::MAX, or 0xFFFF_FFFF) must clamp to that maximum. An implementation that simply truncates to the low 32 bits produces wrong output for large inputs.
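The difference between the two rules only appears at specific inputs, which matters when choosing test values. This small Rust comparison assumes saturation is the required behavior, per the CLIF definition. It prints both results for a few boundary inputs.

```rust
/// What CLIF uunarrow requires per lane: clamp to the largest u32.
fn saturate(v: u64) -> u32 {
    v.min(u32::MAX as u64) as u32
}

/// A common mistake: keep only the low 32 bits.
fn truncate(v: u64) -> u32 {
    v as u32
}

fn main() {
    for v in [0xFFFF_FFFFu64, 0x1_0000_0000, 0x1_0000_0005, u64::MAX] {
        println!("{:#x}: saturate = {:#x}, truncate = {:#x}", v, saturate(v), truncate(v));
    }
}
```

Only the middle two inputs disagree. u64::MAX yields 0xffffffff under both rules. A test that relies on the maximum value alone, such as the regression test in step 5, therefore cannot tell a truncating lowering from a saturating one. A value just above u32::MAX, like 0x1_0000_0000, can.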

3. ISA feature assumptions

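Your test case uses sse42 and has_avx. If the lowering accidentally depends on a later instruction set such as AVX2 or AVX-512, compilation may still fail or silently select illegal instructions for the declared target. For this operation the temptation is real. vpminuq and vpmovusqd would each do most of the work, but both require AVX-512. By contrast, pcmpgtq (SSE4.2), shufps (SSE), and the SSE2 bitwise operations are all available on the declared target.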
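One way to keep the declared target honest is to list every instruction a candidate sequence uses next to the ISA extension it requires. The Rust sketch below does that for the instructions discussed in this article. The ISA levels are simplified into a single ordered ladder. Isa, required_isa, and unsupported are hypothetical helpers, not Cranelift APIs. Unknown instructions are conservatively reported as unsupported.

```rust
/// Simplified ISA ladder; real x86 feature sets are not strictly linear.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Isa {
    Sse,
    Sse2,
    Sse41,
    Sse42,
    Avx,
    Avx512,
}

/// Minimum ISA level for the instructions discussed in this article.
fn required_isa(inst: &str) -> Option<Isa> {
    match inst {
        "shufps" => Some(Isa::Sse),
        "pxor" | "por" => Some(Isa::Sse2),
        "blendvpd" => Some(Isa::Sse41),
        "pcmpgtq" => Some(Isa::Sse42),
        "vpminuq" | "vpmovusqd" => Some(Isa::Avx512),
        _ => None,
    }
}

/// Instructions in seq that the declared level does not allow.
/// Unknown instructions are reported too, to stay conservative.
fn unsupported<'a>(seq: &[&'a str], declared: Isa) -> Vec<&'a str> {
    seq.iter()
        .copied()
        .filter(|&inst| required_isa(inst).map_or(true, |need| need > declared))
        .collect()
}

fn main() {
    // "sse42 has_avx" in the test's target line corresponds to Isa::Avx here.
    let declared = Isa::Avx;
    let sse_plan = ["pxor", "pcmpgtq", "por", "shufps"];
    let avx512_plan = ["vpminuq", "shufps"];
    println!("{:?}", unsupported(&sse_plan, declared)); // []
    println!("{:?}", unsupported(&avx512_plan, declared)); // ["vpminuq"]
}
```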

4. Type shape mismatches

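Vector operations in Cranelift are sensitive to exact lane counts and element widths. A helper written for one vector shape may accidentally accept another. It can then generate invalid machine code or trigger an internal compiler error (ICE) later in the pipeline. Match both the result type (i32x4) and the input type (i64x2) explicitly in the rule.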
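In an ISLE rule, has_type on the result and value_type on an operand perform this check. The Rust sketch below models the idea with a hypothetical Shape type. A matcher that only checks the lane count also accepts i32x2. An exact match accepts only i64x2, whose narrowed result is i32x4.

```rust
/// A vector type as lane width times lane count (hypothetical, for illustration).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Shape {
    lane_bits: u32,
    lanes: u32,
}

const I64X2: Shape = Shape { lane_bits: 64, lanes: 2 };
const I32X2: Shape = Shape { lane_bits: 32, lanes: 2 };

/// Too loose: any two-lane vector passes, whatever its lane width.
fn loose_match(s: Shape) -> bool {
    s.lanes == 2
}

/// Strict: exactly the input type the i64x2 sequence was written for.
fn strict_match(s: Shape) -> bool {
    s == I64X2
}

/// Result type of a narrowing op: half-width lanes, twice as many of them.
fn narrowed(s: Shape) -> Shape {
    Shape { lane_bits: s.lane_bits / 2, lanes: s.lanes * 2 }
}

fn main() {
    println!("{:?}", narrowed(I64X2)); // Shape { lane_bits: 32, lanes: 4 }
    println!("{} {}", loose_match(I32X2), strict_match(I32X2)); // true false
}
```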

5. Partial legalization gaps

Sometimes adding a rule for uunarrow.i64x2 fixes one path but leaves related forms unsupported. Examples are snarrow.i64x2 and unarrow.i64x2, which differ from uunarrow only in how each lane is clamped, and the same operations at other lane widths. Check nearby ops while you are in the backend to avoid repeating the same bug pattern.
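The three narrowing ops differ only in how each lane is clamped. This sketch assumes the CLIF definitions: snarrow is signed in and signed out, unarrow is signed in and unsigned out, and uunarrow is unsigned in and unsigned out. It shows the per-lane rule for 64-bit inputs. A shared helper parameterized on the clamp is one way to cover all three once the shuffle step exists.

```rust
// Per-lane clamp for the three CLIF narrowing ops on 64-bit lanes (i64x2 -> i32x4).
// Each function returns the 32-bit result lane as raw bits.

/// snarrow: input signed, output signed; clamp to [i32::MIN, i32::MAX].
fn snarrow_lane(v: u64) -> u32 {
    (v as i64).clamp(i32::MIN as i64, i32::MAX as i64) as i32 as u32
}

/// unarrow: input signed, output unsigned; negative lanes become 0.
fn unarrow_lane(v: u64) -> u32 {
    (v as i64).clamp(0, u32::MAX as i64) as u32
}

/// uunarrow: input unsigned, output unsigned; only the upper bound matters.
fn uunarrow_lane(v: u64) -> u32 {
    v.min(u32::MAX as u64) as u32
}

fn main() {
    for v in [5u64, u64::MAX, 1u64 << 32, 1u64 << 63] {
        println!(
            "{:#018x}: snarrow={:#010x} unarrow={:#010x} uunarrow={:#010x}",
            v,
            snarrow_lane(v),
            unarrow_lane(v),
            uunarrow_lane(v)
        );
    }
}
```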

6. Constant-only tests masking real bugs

If all regression tests use vconst, the backend might fold or simplify paths that never occur with live values. Include tests where inputs come from parameters or intermediate vector expressions.

FAQ

Why does this compile on AArch64 but fail on x86_64?

Because Cranelift backends are implemented independently. AArch64 already has a valid lowering or legalization path for this vector operation, while x86_64 is missing one for uunarrow.i64x2.

Is this a CLIF bug or an x86 backend bug?

This is most likely an x86_64 backend bug, specifically a missing ISLE rule or legalization path. The CLIF itself is valid: it passes the verifier and compiles on AArch64. The error is raised only when the x64 backend tries to lower it.

Should the fix be a direct x86 instruction match or a legalized expansion?

Whichever is correct and maintainable for the available ISA level. If x86 lacks a clean native instruction sequence for the exact semantics, a legalized expansion into simpler supported operations is the safest first fix, followed by optimization later.

The key takeaway is simple: this issue is a missing lowering implementation, not a malformed test. Once x86_64 learns how to lower uunarrow.i64x2 through ISLE or legalization, the backend should behave consistently with AArch64 and the compile failure should disappear.
