Logging · 001
Engineering's view on local frontier AI — what actually runs on hardware you control.
A weekly engineering journal benchmarking open-weights models on consumer silicon. Real harnesses, real failures, no vendor decks.
Recent dispatches
- 1 A BIOS Update Won't Fix #6182 — I Tried the Newest One
  The Bosgame M5's ROCm bug is board-specific, not chip-specific, so firmware is the obvious lever. I flashed Bosgame's newest official BIOS hoping to dodge it. It didn't work, and the negative result narrows where the fault actually lives.
- 2 Full Context on a Vulkan-Only Strix Halo: The Decode-Drop Reproduces, but the Sweet Spot Moves
  kmarble showed ROCm decode collapses 64% at full context on Strix Halo, and ROCm+MTP cures it. My board can't run ROCm. The Vulkan half reproduces the drop, but the MTP sweet spot from last week walks left at depth: by 76k, drafting too deep is slower than no speculation at all.
- 3 MTP Defaults Are a Trap: What 260 Runs Showed About Speculative Decoding on Qwen3.6
  Until May 19, the llama.cpp speculative-decoding default was 16. On Qwen3.6's single MTP head, that default cost up to 75% of generation throughput. Here's where the real sweet spots are, and why they're architecture-specific.
- 4 ROCm 7.x on the Bosgame M5: 14 Configurations, 14 Failures
  We promised a ROCm 7.x revisit. We got a comprehensive workaround sweep instead. Both are useful.
- 5 Vulkan/RADV vs ROCm 6.4 on Strix Halo: What 128 Benchmark Runs Actually Showed
  The headline isn't where Vulkan wins. It's where ROCm doesn't run at all.
- 6 What 96GB of VRAM on Unified-Memory Hardware Actually Gets You for Local LLM Inference
  An honest practitioner take from a Bosgame M5 running Strix Halo at full BIOS allocation.