How to Fix: WASI panics in `fd_readdir` when trying to read a directory with the Cyrillic name `Действие` (“Action”)
A panic in WASI fd_readdir on a directory named Действие is a classic sign of a UTF-8 boundary bug: the runtime, host adapter, or guest-side parsing code is mishandling the raw bytes of directory entry names, slicing or decoding them at the wrong offset. As a result, most non-ASCII names appear to work until one specific multibyte sequence exposes the bug and crashes the module.
Understanding the Root Cause
In WASI, fd_readdir returns a buffer containing one or more packed directory entries. Each entry has metadata plus a raw filename payload. The filename is not a Rust String inside the syscall boundary; it is effectively a byte sequence with a length field. Problems start when one of these layers makes a wrong assumption:
- The host runtime writes a malformed name length.
- The guest parser slices the returned buffer using character count instead of byte count.
- The code assumes filenames are ASCII and performs unsafe truncation, padding, or alignment.
- A library converts bytes to &str too early and panics on invalid boundaries.
The directory name Действие is useful for exposing the issue because Cyrillic characters are encoded as multibyte UTF-8 sequences. The name is 8 characters long but occupies 16 bytes, since each of its letters takes 2 bytes in UTF-8. If code uses string indexing, stale offsets, or partially read record boundaries, Rust will panic with errors such as an invalid UTF-8 conversion, an out-of-bounds slice, or a failure while iterating directory entries.
Another important detail: WASI does not guarantee that every host filesystem name arrives as a clean Unicode Rust string. Correct implementations should treat the name as bytes first, validate boundaries carefully, and only decode to UTF-8 when appropriate. If your stack includes a WASI shim, a WebAssembly runtime, and a Rust crate wrapping fd_readdir, the bug may be in any of those layers.
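To make the character-versus-byte mismatch concrete, here is a minimal sketch in plain Rust with no WASI calls involved. The helper names char_and_byte_len and slice_by_char_count are hypothetical and exist only for illustration; the second one deliberately reproduces the mistake of treating a length field as a character count.

```rust
/// Returns (character count, UTF-8 byte count) for a name.
fn char_and_byte_len(name: &str) -> (usize, usize) {
    (name.chars().count(), name.len())
}

/// Reproduces the bug: uses the character count as if it were a byte length.
fn slice_by_char_count(name: &str) -> &[u8] {
    let n = name.chars().count();
    &name.as_bytes()[..n]
}

fn main() {
    let name = "Действие";
    let (chars, bytes) = char_and_byte_len(name);
    println!("chars={chars} bytes={bytes}");

    // Only half of the name survives when bytes are sliced by character count.
    let truncated = slice_by_char_count(name);
    println!("truncated = {:?}", String::from_utf8_lossy(truncated));

    // Odd byte offsets fall inside a 2-byte letter, so `&name[..3]` would panic.
    println!("is_char_boundary(3) = {}", name.is_char_boundary(3));
}
```

Here the wrong slice happens to end on a character boundary, so it silently truncates the name instead of panicking. With a name that mixes ASCII and Cyrillic, the same mistake can land in the middle of a character, which is where the panic comes from.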
Step-by-Step Solution
The safest fix is to make directory reading byte-length aware, avoid direct string slicing, and update any affected runtime or WASI wrapper crate.
1. Reproduce the bug with a minimal test case
Create a directory tree containing the failing name and enumerate it from the WASI module only.
test-data/
├── normal
├── пример
└── Действие
If possible, log both the raw byte length and the decoded filename value.
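use std::fs;

fn main() {
    for entry in fs::read_dir("test-data").unwrap() {
        let entry = entry.unwrap();
        // file_name() returns an OsString, so nothing is decoded yet.
        let name = entry.file_name();
        // as_encoded_bytes() requires Rust 1.74 or newer.
        println!("{:?} bytes={}", name, name.as_encoded_bytes().len());
    }
}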
If this works natively but fails in WASI, the bug is likely in the WASI runtime path rather than your business logic.
2. Update your WASI runtime and related crates
This class of bug has historically appeared in older builds of WASI adapters and runtimes. Upgrade:
- Your WebAssembly runtime such as Wasmtime, Wasmer, or another WASI host
- Your wasi* crates
- Any filesystem abstraction layer wrapping fd_readdir
For Rust projects, update dependencies and rebuild from scratch:
cargo update
cargo clean
cargo build --target wasm32-wasi
Recent Rust toolchains name this target wasm32-wasip1, and the old wasm32-wasi name was removed in Rust 1.84. On a current toolchain, build with:
cargo build --target wasm32-wasip1
3. Stop assuming filenames are valid UTF-8 strings at the syscall boundary
If your code reads raw WASI directory buffers, parse entries using the reported byte length only. Do not use character counts. Do not slice using guessed offsets.
fn safe_name_from_bytes(bytes: &[u8]) -> String {
    // from_utf8_lossy keeps valid UTF-8 unchanged and replaces invalid
    // sequences with U+FFFD instead of panicking.
    String::from_utf8_lossy(bytes).into_owned()
}
If you have custom parsing logic around fd_readdir, verify that each record is handled like this conceptually:
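// d_namlen is a length in bytes, never a character count.
let name_len = dirent.d_namlen as usize;
let name_start = base_offset + header_len;
let record_end = name_start + name_len;
// A record that runs past the buffer is either malformed or was cut off
// because the buffer was too small; either way, do not slice it.
if record_end > buffer.len() {
    return Err("malformed fd_readdir buffer".into());
}
let name_bytes = &buffer[name_start..record_end];
let name = safe_name_from_bytes(name_bytes);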
The critical rule is simple: slice by bytes, decode after slicing.
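As a fuller illustration of that rule, the following sketch parses a whole fd_readdir buffer. It assumes the WASI preview1 dirent layout (a 24-byte little-endian header holding d_next as u64, d_ino as u64, d_namlen as u32, d_type as u8, then padding), so check it against the headers your runtime actually uses. Entry, parse_dirents, and encode_dirent are hypothetical names, and encode_dirent exists only to build demo input.

```rust
/// Assumed preview1 header size: d_next (8) + d_ino (8) + d_namlen (4) + d_type (1) + padding (3).
const DIRENT_HEADER_LEN: usize = 24;

#[derive(Debug, PartialEq)]
struct Entry {
    next_cookie: u64,
    name: Vec<u8>, // raw bytes; decode later, and only if needed
}

fn read_u64_le(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

fn read_u32_le(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[..4]);
    u32::from_le_bytes(a)
}

/// Parses the used part of an fd_readdir buffer.
/// Returns the complete entries plus a flag that is true when the final
/// record was cut off by the end of the buffer.
fn parse_dirents(buf: &[u8]) -> (Vec<Entry>, bool) {
    let mut entries = Vec::new();
    let mut offset = 0usize;
    while offset < buf.len() {
        let rest = &buf[offset..];
        if rest.len() < DIRENT_HEADER_LEN {
            return (entries, true); // header itself is incomplete
        }
        let next_cookie = read_u64_le(&rest[0..8]);
        let name_len = read_u32_le(&rest[16..20]) as usize; // byte length
        let record_len = match DIRENT_HEADER_LEN.checked_add(name_len) {
            Some(n) => n,
            None => return (entries, true),
        };
        if rest.len() < record_len {
            return (entries, true); // name is incomplete
        }
        entries.push(Entry {
            next_cookie,
            name: rest[DIRENT_HEADER_LEN..record_len].to_vec(),
        });
        offset += record_len;
    }
    (entries, false)
}

/// Demo-only helper that serializes one record in the assumed layout.
fn encode_dirent(next_cookie: u64, name: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(DIRENT_HEADER_LEN + name.len());
    out.extend_from_slice(&next_cookie.to_le_bytes()); // d_next
    out.extend_from_slice(&0u64.to_le_bytes()); // d_ino
    out.extend_from_slice(&(name.len() as u32).to_le_bytes()); // d_namlen
    out.extend_from_slice(&[3, 0, 0, 0]); // d_type (directory) + padding
    out.extend_from_slice(name);
    out
}

fn main() {
    let mut buf = encode_dirent(1, "normal".as_bytes());
    buf.extend(encode_dirent(2, "Действие".as_bytes()));

    let (entries, truncated) = parse_dirents(&buf);
    for e in &entries {
        println!("{} ({} bytes)", String::from_utf8_lossy(&e.name), e.name.len());
    }
    println!("truncated={truncated}");

    // Simulate a buffer that was too small for the last record.
    let (partial, truncated) = parse_dirents(&buf[..buf.len() - 3]);
    println!("complete entries={} truncated={truncated}", partial.len());
}
```

When the flag is true, a caller would typically resume from the next_cookie of the last complete entry, or retry with a larger buffer if no entry fit at all, rather than treating the cut-off record as corruption.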
4. Avoid direct Rust string indexing
If any downstream logic trims, prefixes, truncates, or compares names, make sure it does not do this:
// Wrong: may panic on non-ASCII boundaries
let short = &name[0..5];
Use character-aware or byte-safe alternatives:
// Safer for display-oriented truncation
let short: String = name.chars().take(5).collect();
Or, if your logic must remain byte-oriented, validate first:
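fn safe_prefix(s: &str, n: usize) -> &str {
    if n >= s.len() {
        return s;
    }
    // Walk back to the nearest character boundary at or before n.
    // Offset 0 is always a boundary, so the loop terminates.
    let mut idx = n;
    while !s.is_char_boundary(idx) {
        idx -= 1;
    }
    &s[..idx]
}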
5. Add a regression test for Cyrillic and other multibyte names
This bug should never be fixed without a test. Include names from several scripts.
#[test]
fn readdir_handles_multibyte_names() {
    let names = [
        "normal",
        "пример",
        "Действие",
        "日本語",
        "emoji-📁",
    ];
    for name in names {
        // Placeholder: create the directory, enumerate it through the WASI
        // path, and assert that the name round-trips without a panic.
        assert!(!name.is_empty());
    }
}
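One way to turn that placeholder into a real check is a round trip through std::fs, sketched below. The helper name round_trip_names and the readdir-regression directory are assumptions for illustration. Under WASI the root must sit inside a directory the runtime has preopened, which is why the sketch uses a relative path instead of a system temp directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Creates one subdirectory per name under `root`, lists `root`,
/// and returns the names that came back, sorted.
fn round_trip_names(root: &Path, names: &[&str]) -> io::Result<Vec<String>> {
    fs::create_dir_all(root)?;
    for name in names {
        fs::create_dir_all(root.join(name))?;
    }
    let mut seen = Vec::new();
    for entry in fs::read_dir(root)? {
        let name = entry?.file_name();
        // to_string_lossy never panics, even on names that are not valid UTF-8.
        seen.push(name.to_string_lossy().into_owned());
    }
    seen.sort();
    Ok(seen)
}

fn main() -> io::Result<()> {
    let names = ["normal", "пример", "Действие", "日本語", "emoji-📁"];
    let root = Path::new("readdir-regression");

    let seen = round_trip_names(root, &names)?;
    for name in names {
        assert!(seen.iter().any(|s| s == name), "missing {name}");
    }
    println!("all {} names round-tripped", names.len());

    fs::remove_dir_all(root)?;
    Ok(())
}
```

Running the same program natively and under the WASI runtime gives a direct comparison. On a host filesystem that normalizes names, the byte-exact comparison may need to be relaxed.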
6. If the panic is inside the runtime, isolate and report it upstream
If your code already handles filenames as bytes safely and the panic still occurs inside the host implementation of fd_readdir, capture:
- Runtime name and version
- Rust version and target
- Minimal reproducer
- Exact panic backtrace
- Host OS and filesystem type
Then file an issue with the runtime project and include the compact reproduction.
Common Edge Cases
- Partial directory entry buffers: fd_readdir may return a buffer that does not contain all entries. Your iteration logic must resume using the correct cookie and must not assume a single read is complete.
- Invalid UTF-8 from host filesystems: some environments permit filenames that are not valid Unicode. If your app only supports UTF-8, fail gracefully instead of panicking.
- Normalization differences: visually identical names may differ in composed vs decomposed Unicode forms, especially across platforms.
- Incorrect cookie handling: resuming enumeration with the wrong cookie can make parsing appear random and may corrupt record boundaries.
- Display truncation bugs: even after successful reading, UI code that truncates by byte offset can reintroduce the panic.
- Host/runtime mismatch: an old WASI adapter with a newer guest module, or the reverse, can surface serialization and parsing incompatibilities.
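The normalization case applies directly to this directory name, because the letter й has both a composed form (U+0439) and a decomposed form (и followed by the combining breve U+0306). The sketch below only demonstrates the difference. The constant names are hypothetical, and the Rust standard library does not normalize Unicode, so real code would need a dedicated crate for that.

```rust
/// "Действие" with й as the single code point U+0439.
const COMPOSED: &str = "Де\u{0439}ствие";
/// The same visible name with й as U+0438 + U+0306 (combining breve).
const DECOMPOSED: &str = "Де\u{0438}\u{0306}ствие";

fn main() {
    println!(
        "composed:   {} chars, {} bytes",
        COMPOSED.chars().count(),
        COMPOSED.len()
    );
    println!(
        "decomposed: {} chars, {} bytes",
        DECOMPOSED.chars().count(),
        DECOMPOSED.len()
    );
    // The two names render alike but differ as byte sequences.
    println!("equal as bytes: {}", COMPOSED.as_bytes() == DECOMPOSED.as_bytes());
}
```

A length or offset computed from one form and applied to the other is off by two bytes, which is one way a single name can fail while other Cyrillic names pass.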
FAQ
Why does only the name Действие fail when other Cyrillic names work?
Because the bug is usually not about Cyrillic in general; it is about a specific byte layout, length, or buffer alignment pattern. One filename may cross a bad boundary that others do not.
Should I convert everything to ASCII or transliterated names?
No. The correct fix is to handle UTF-8 and raw filename bytes properly. Renaming directories avoids the symptom but leaves the underlying WASI parsing bug in place.
How can I tell whether the bug is in my Rust code or in the WASI runtime?
Run the same directory enumeration logic natively first. If native execution succeeds and the WASI build panics, then inspect your WASI-specific parsing and the runtime version. A minimal reproducer is the fastest way to isolate the fault.
The practical fix is consistent across implementations: treat fd_readdir names as byte sequences, respect the explicit length field, decode only after safe slicing, and upgrade the runtime if the panic occurs below your code. That eliminates crashes on Действие and prevents the same failure on other multibyte directory names later.