How to Fix: Cranelift: Wrong result for `atomic_cas.i8` on RISC-V
Cranelift on RISC-V can miscompile atomic_cas.i8 by returning the wrong value, and the failure usually comes from a subtle mismatch between byte-sized compare-and-swap semantics and the wider instructions RISC-V actually provides.
This issue matters because compare-and-swap is foundational for lock-free data structures, reference counting, and synchronization primitives. If an i8 atomic CAS returns the wrong result, generated code may appear correct under light testing but fail under fuzzing, concurrency stress, or architecture-specific validation.
Problem Overview
On RISC-V, the base "A" extension defines atomic memory operations only for word-sized (32-bit) and doubleword-sized (64-bit) accesses, not for byte-sized compare-and-swap. That means a backend like Cranelift must lower atomic_cas.i8 into a sequence that:
- Loads the containing aligned word.
- Extracts the target byte.
- Compares that byte against the expected value.
- If equal, constructs a new word with only that byte replaced.
- Retries the operation with a loop around lr/sc or equivalent atomic primitives.
- Returns the previous byte value, not the full word and not a sign-incorrect variant.
The bug appears when one of those steps is implemented with the wrong extension, masking, extraction, or reconstruction behavior. In practice, fuzzers catch this because they generate byte values around boundaries like 0x7f, 0x80, and 0xff, where sign extension and zero extension bugs become visible immediately.
Understanding the Root Cause
The core of the problem is that atomic_cas.i8 has byte-level semantics, but RISC-V atomic instructions operate on a larger granularity. So the backend must emulate byte CAS using a larger atomic transaction. That emulation is easy to get almost right and still return the wrong result.
There are three common technical failure modes here:
- Incorrect byte extraction from the loaded word. If the backend shifts the loaded word but fails to mask with 0xff, higher bits may leak into the result. On RV64, this is especially dangerous because intermediate values live in 64-bit registers, and lr.w sign-extends the loaded word into its destination register.
- Wrong extension when returning the old i8 value. Cranelift integer types carry no signedness; extension is a separate, explicit step (a uextend or sextend instruction, or a uext/sext annotation in the function signature). If the old byte is extracted as an unsigned value but later treated as signed, or vice versa, the returned result can differ from the expected i8 CAS contract. For byte values above 127, sign handling becomes observable.
- Compare performed on a widened value instead of the exact byte. If the expected argument is extended differently from the extracted byte, equality may fail or succeed incorrectly. For example, comparing a sign-extended extracted byte against a zero-extended expected byte is wrong for values like 0x80.
In short, the lowering must treat the compared and returned value as the exact low 8 bits of the selected byte lane, while still performing atomic updates on the aligned containing word. Any mismatch between masking, shifting, and extension can produce a wrong return value even when the memory update itself is correct.
This explains why fuzzing found the bug: the generated test likely exercised a byte value whose signed interpretation differed from its unsigned bit pattern, exposing a return-path mistake rather than a total CAS failure.
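To make the extension mismatch concrete, here is a minimal Rust sketch that models 64-bit registers as u64 values. The function names are illustrative only and are not Cranelift APIs; the point is simply that a byte such as 0x80 stops comparing equal to itself once the two sides are widened differently.

```rust
/// Model of a byte widened into a 64-bit register with sign extension.
fn sext8(b: u8) -> u64 {
    b as i8 as i64 as u64
}

/// Model of a byte widened into a 64-bit register with zero extension.
fn zext8(b: u8) -> u64 {
    b as u64
}

/// Buggy comparison: the two operands were widened differently.
fn mixed_compare(old: u8, expected: u8) -> bool {
    sext8(old) == zext8(expected)
}

/// Correct comparison: both operands are reduced to their low 8 bits first.
fn normalized_compare(old: u8, expected: u8) -> bool {
    (sext8(old) & 0xff) == (zext8(expected) & 0xff)
}
```

With these definitions, mixed_compare agrees with normalized_compare for 0x7f but disagrees for 0x80, which matches the boundary values that fuzzers tend to hit.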
Step-by-Step Solution
The fix is to implement or patch the RISC-V lowering for atomic_cas.i8 so that it operates on the containing word atomically, but compares and returns only the target byte with strict 8-bit semantics.
1. Locate the RISC-V atomic CAS lowering
Find the code path in Cranelift responsible for lowering small-width atomic CAS operations on RISC-V. Depending on the codebase version, this may live in the ISA lowering layer, instruction selection, or a legalization pass for unsupported atomic widths.
# Search for atomic CAS lowering on RISC-V in the Cranelift tree
rg "atomic_cas|cmpxchg|lr\.w|sc\.w|lr\.d|sc\.d" cranelift/
2. Verify the intended semantics
The correct result of atomic_cas.i8(ptr, expected, replacement) is the old byte value previously stored in memory, regardless of whether the swap succeeds. The success condition is based on exact equality of the byte lane, not on the entire containing word.
// Required logical behavior
old_byte = *ptr
if old_byte == expected {
*ptr = replacement
}
return old_byte
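The same contract can be written as a small single-threaded reference model in Rust, which is handy as an oracle when checking a lowering. The name cas_u8_model is hypothetical; it says nothing about atomicity and only captures the value semantics.

```rust
/// Reference model of a byte compare-and-swap on a single memory cell.
/// Returns the old byte whether or not the swap happened.
fn cas_u8_model(cell: &mut u8, expected: u8, replacement: u8) -> u8 {
    let old = *cell;
    if old == expected {
        *cell = replacement;
    }
    old
}
```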
3. Align the address and compute byte position
Because RISC-V atomic instructions work on wider aligned memory units, derive:
- The aligned base address of the containing word.
- The byte offset within that word.
- A shift amount equal to byte_offset * 8.
- A byte mask positioned into the containing word.
// Pseudocode for RV64 using 32-bit atomics as an example
aligned = addr & ~0x3
byte_index = addr & 0x3
shift = byte_index << 3
mask = 0xff << shift
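If the implementation uses lr.d/sc.d on RV64, the computation must reflect 8-byte granularity instead: the alignment mask becomes ~0x7, the byte index becomes addr & 0x7, and the shift ranges from 0 to 56. The byte mask itself is still 0xff, only positioned within a 64-bit container.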
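As a rough sketch, the address arithmetic for both container widths can be captured in one Rust helper. ByteLane and byte_lane are hypothetical names, and the code assumes a little-endian target, so byte index 0 maps to the low 8 bits of the container.

```rust
/// Location of one byte inside its naturally aligned container.
#[derive(Debug, PartialEq)]
struct ByteLane {
    aligned: u64, // address of the containing word or doubleword
    shift: u32,   // bit position of the byte inside the container
    mask: u64,    // 0xff moved to that bit position
}

/// `width` is the container size in bytes: 4 for lr.w/sc.w, 8 for lr.d/sc.d.
/// Assumes a little-endian target, so byte index 0 is the low 8 bits.
fn byte_lane(addr: u64, width: u64) -> ByteLane {
    assert!(width == 4 || width == 8);
    let byte_index = addr & (width - 1);
    let shift = (byte_index * 8) as u32;
    ByteLane {
        aligned: addr & !(width - 1),
        shift,
        mask: 0xffu64 << shift,
    }
}
```

Note how the same address 0x1005 lands in lane 1 of the word at 0x1004 with 4-byte granularity, but in lane 5 of the doubleword at 0x1000 with 8-byte granularity. Mixing the two assumptions selects the wrong byte.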
4. Extract the old byte correctly
After loading the containing word inside the atomic loop, extract the target byte using a shift followed by a mask. Do not rely on arithmetic shift or implicit sign propagation.
// Correct extraction
old_word = lr(aligned)
old_byte = (old_word >> shift) & 0xff
This step is critical. The returned value should come from old_byte, not from a partially shifted register and not from a sign-extended temporary.
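The following Rust sketch illustrates why the mask matters on RV64. It models the destination register of lr.w, which holds the loaded 32-bit word sign-extended to 64 bits; the helper names are made up for this illustration.

```rust
/// Models the destination register of `lr.w` on RV64, which holds the
/// loaded 32-bit word sign-extended to 64 bits.
fn lr_w_register(word: u32) -> u64 {
    word as i32 as i64 as u64
}

/// Shift only: bits above the byte lane leak into the result.
fn extract_unmasked(reg: u64, shift: u32) -> u64 {
    reg >> shift
}

/// Shift then mask: exactly the 8 bits of the selected lane survive.
fn extract_masked(reg: u64, shift: u32) -> u64 {
    (reg >> shift) & 0xff
}
```

For a word whose top byte is 0x80, the unmasked extraction of lane 3 carries the sign-extension bits along with it, while the masked version yields exactly 0x80.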
5. Compare using normalized 8-bit values
Make sure the expected value is also normalized to 8 bits before comparison. This avoids mismatches caused by earlier sign or zero extension.
expected8 = expected & 0xff
if old_byte != expected8:
return old_byte
If your IR already guarantees an i8 virtual value, this masking may be redundant logically, but adding it in the lowering often prevents backend-specific extension bugs.
6. Rebuild the new word without corrupting adjacent bytes
When the comparison succeeds, clear the target byte lane and insert the replacement byte.
replacement8 = replacement & 0xff
new_word = (old_word & ~mask) | (replacement8 << shift)
This guarantees that only the selected byte changes and the surrounding bytes remain untouched.
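A minimal Rust version of the rebuild step, assuming a 32-bit container and a shift of 0, 8, 16, or 24, might look like this (insert_byte is an illustrative name):

```rust
/// Replaces the byte at `shift` in `old_word`, leaving other lanes untouched.
/// `shift` is expected to be 0, 8, 16, or 24.
fn insert_byte(old_word: u32, shift: u32, replacement: u8) -> u32 {
    let mask = 0xffu32 << shift;
    (old_word & !mask) | ((replacement as u32) << shift)
}
```

Taking the replacement as a u8 makes the "& 0xff" normalization implicit in the type, so stray high bits cannot reach the neighboring lanes.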
7. Use a proper retry loop around store-conditional
The sc instruction may fail spuriously due to contention. Retry only when the compare succeeded but the store-conditional failed.
loop:
old_word = lr(aligned)
old_byte = (old_word >> shift) & 0xff
expected8 = expected & 0xff
if old_byte != expected8:
return old_byte
replacement8 = replacement & 0xff
new_word = (old_word & ~mask) | (replacement8 << shift)
ok = sc(aligned, new_word)
if !ok:
goto loop
return old_byte
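The loop can be prototyped on a host machine in Rust. This sketch is not Cranelift code: it uses AtomicU32::compare_exchange_weak from the standard library as a stand-in for the lr.w/sc.w pair, since it may also fail spuriously. The function name, the SeqCst ordering, and the little-endian lane numbering are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Byte CAS emulated on a 32-bit atomic word. `compare_exchange_weak` stands
/// in for the lr.w/sc.w pair: like sc, it may fail spuriously.
/// `byte_index` must be in 0..4 (little-endian lane numbering assumed).
fn cas_u8_in_word(word: &AtomicU32, byte_index: u32, expected: u8, replacement: u8) -> u8 {
    let shift = byte_index * 8;
    let mask = 0xffu32 << shift;
    let mut old_word = word.load(Ordering::SeqCst);
    loop {
        let old_byte = ((old_word >> shift) & 0xff) as u8;
        if old_byte != expected {
            // Compare failed: report the observed byte, store nothing.
            return old_byte;
        }
        let new_word = (old_word & !mask) | ((replacement as u32) << shift);
        match word.compare_exchange_weak(old_word, new_word, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(_) => return old_byte,
            // Store failed (contention or spurious): re-examine the fresh word.
            Err(current) => old_word = current,
        }
    }
}
```

As in the pseudocode, the byte comparison is re-evaluated on every iteration, so a concurrent change to the target byte ends the loop with the newly observed value instead of retrying forever.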
8. Return the byte with the correct Cranelift value semantics
This is where many bugs hide. If the function returns i8, ensure the result fed back into the Cranelift pipeline is the precise 8-bit old value. If the ABI requires extension at the boundary, let the calling convention handle that consistently rather than returning a widened backend temporary with the wrong signedness.
// Backend-side intent
result_i8 = old_byte & 0xff
return result_i8
Pay special attention if the test function uses sext in its signature. In that case, the logical i8 result must still be correct first; only afterward should ABI-mandated sign extension occur.
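The ordering of those two steps can be sketched in Rust as follows. The functions are hypothetical models, not Cranelift ABI code: the first produces the exact i8 payload, and the other two show what a sext or uext return annotation would then do to it.

```rust
/// Backend-side result: the exact old byte, with no signedness attached.
fn cas_result_payload(old_word: u32, shift: u32) -> u8 {
    ((old_word >> shift) & 0xff) as u8
}

/// What a `sext` return annotation asks for: widen the payload with its sign.
fn abi_sext_return(payload: u8) -> i64 {
    payload as i8 as i64
}

/// What a `uext` return annotation asks for: widen the payload with zeros.
fn abi_uext_return(payload: u8) -> i64 {
    payload as i64
}
```

Both extensions start from the same 8-bit payload. A lowering that hands the ABI layer an already-widened temporary skips the first step, and the caller then sees whatever extension happened to be in the register.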
9. Add a regression test from the fuzz case
Create or preserve the original .clif reproducer and reduce it if necessary. Then add focused tests for values that expose extension errors.
;; Suggested coverage ideas
;; old = 0x7f, expected = 0x7f, replacement = 0x80
;; old = 0x80, expected = 0x80, replacement = 0x01
;; old = 0xff, expected = 0xff, replacement = 0x00
;; old = 0x80, expected = 0x7f, replacement = 0x22
;; verify both returned value and memory result
Also include tests where the target byte is not at offset zero inside the containing word, because shift and mask bugs often only appear on nonzero byte lanes.
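One way to enumerate such cases is to generate them from a pure model, then compare each (returned byte, resulting word) pair against what the compiled code produces. The Rust sketch below is an assumption about how a test generator could look; Case, model, and cases are made-up names, and the 0xa5 filler pattern is arbitrary.

```rust
/// One regression case: a 32-bit word, the byte lane under test, and CAS inputs.
struct Case {
    word: u32,
    lane: u32,
    expected: u8,
    replacement: u8,
}

/// Pure model of byte CAS inside a word: returns (old byte, resulting word).
fn model(c: &Case) -> (u8, u32) {
    let shift = c.lane * 8;
    let old = ((c.word >> shift) & 0xff) as u8;
    if old != c.expected {
        return (old, c.word);
    }
    let mask = 0xffu32 << shift;
    (old, (c.word & !mask) | ((c.replacement as u32) << shift))
}

/// Boundary bytes crossed with every lane of a 32-bit word.
fn cases() -> Vec<Case> {
    let bytes = [0x00u8, 0x7f, 0x80, 0xff];
    let mut out = Vec::new();
    for lane in 0..4u32 {
        let lane_mask = 0xffu32 << (lane * 8);
        for &old in &bytes {
            for &expected in &bytes {
                // Surround the target byte with a recognizable filler pattern.
                let word = (0xa5a5_a5a5u32 & !lane_mask) | ((old as u32) << (lane * 8));
                out.push(Case { word, lane, expected, replacement: !old });
            }
        }
    }
    out
}
```

Because the filler bytes are known, any corruption of a neighboring lane shows up directly in the resulting word.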
10. Validate on real or emulated RISC-V
Run Cranelift tests under a RISC-V target environment or emulator and ensure the issue is fixed across both interpretation and code generation modes where applicable.
cargo test -p cranelift-codegen riscv
cargo test -p cranelift-filetests
# Add any project-specific test commands used for .clif execution
Common Edge Cases
Even after fixing the main bug, several adjacent problems can still break atomic_cas.i8 on RISC-V:
- Nonzero byte offsets. A CAS on the second, third, or fourth byte of a word can fail if the shift calculation is off by one or uses the wrong alignment basis.
- RV64 register-width leakage. Intermediate values in 64-bit registers may retain high bits unless every extraction path masks with 0xff.
- Sign-extension at ABI boundaries. If a function is declared with sign-extending return behavior, the backend must still return the correct i8 payload before ABI extension happens.
- Wrong atomic width selection. Using lr.d/sc.d versus lr.w/sc.w changes alignment and mask logic. Mixing those assumptions can target the wrong byte lane.
- Endianness assumptions. Byte extraction must follow the target memory layout. A backend helper that assumes lane ordering incorrectly can compare or replace the wrong byte.
- Failure to preserve neighboring bytes. If new_word is built incorrectly, the CAS may accidentally overwrite adjacent data while still returning a plausible old byte.
- Missing retry on store-conditional failure. Under contention, an implementation that does not loop on sc failure becomes non-atomic and semantically wrong.
FAQ
1. Why does this bug affect i8 but not necessarily larger CAS widths?
Because byte-sized CAS is usually emulated on RISC-V using a larger atomic transaction. Native-width atomics map more directly to hardware instructions, so there are fewer opportunities for masking, shifting, or extension mistakes.
2. Is the memory update wrong, or only the returned value?
It can be either, but in this class of bug the most common symptom is that the returned old value is wrong while the memory update path looks mostly correct. Fuzzing often detects the mismatch by checking exact CAS semantics, which require the previous value to be returned precisely.
3. What is the safest way to prevent regressions here?
Add narrow-width atomic tests that cover 0x00, 0x7f, 0x80, and 0xff, run them on multiple byte offsets inside the containing word, and verify both the returned value and final memory state. A reduced Cranelift regression test based on the fuzz-generated .clif case is the best long-term protection.
Fixing this issue comes down to one rule: perform atomicity at word granularity, but preserve compare-and-return semantics at exact byte granularity. Once the lowering consistently masks, compares, rebuilds, and returns the target byte as a true i8, atomic_cas.i8 on RISC-V behaves correctly again.