AMD Is Selling "First-Class ROCm" on Strix Halo. I've Run the Same Chip for Six Months.

On June 8, pre-orders opened for a $3,999 AMD box built around the same chip I've run in production since the start of the year. It is marketed on full ROCm support, and one of AMD's own demos runs on the very model my board can't load under ROCm.

Author: Erik
Date: 2026-06-18
Reading time: 8 min
Topics: ROCm, Strix Halo

A note on links: some hardware below is linked through affiliate programs, marked (affiliate). If you buy through one, I earn a small commission at no cost to you. It never changes what I recommend or what I run — these are the boxes I'd point you to either way.

On June 8, Micro Center opened pre-orders for the AMD Ryzen AI Halo, a $3,999 developer box built on the Ryzen AI Max+ 395. That is the same Strix Halo silicon sitting in my Bosgame M5. The pitch is local AI without the cloud bill, and the headline is the software: full ROCm support, a pre-configured stack, day-zero model support, and the now-repeated framing that AMD treats this APU as a "first-class ROCm citizen." AMD's own cost-per-token comparison for the box, the one behind the "pays for itself" claim, runs on Qwen 3.6 35B.

I have run that exact chip, in production, since the start of the year. Qwen3.6-35B-A3B is my production model, the same one behind AMD's math. In six months, the standard ROCm packages have never loaded a single model on my board.

That reads like a contradiction with everything AMD is selling. It is not quite one, and the reason it is not is the entire point of this post. The chip is fine. The hardware is fine. What I have spent six months mapping is the distance between "ROCm" as a word on a product page and ROCm as a stack you actually have to assemble, on hardware AMD did not build and does not validate. The Halo Box is AMD's answer to that distance. It is also, just by existing, the clearest admission that the distance is real.

Here is the map, because I think it is more useful than another spec comparison.

What "no ROCm" actually means here

My board hits a specific, repeatable wall. Every attempt to load a model into the GPU under ROCm dies with the same HSA fault, "Memory critical error by agent node-0 ... Reason: Memory in use", fired during the GPU's queue setup, before the model is even resident. It is filed as ROCm issue #6182.

The detail that matters: this is board-specific, not chip-specific. The same silicon runs ROCm fine on other machines. A user with a Minisforum MS-S1 Max (affiliate), identical gfx1151, has ROCm working. I went through fourteen ROCm configurations on my board and got fourteen failures. So I did the only thing that kept the box useful: I version-locked ROCm at 6.4, froze all 66 packages so nothing could drift, and moved my entire stack to Vulkan through Mesa's RADV driver. Everything I publish runs on Vulkan because ROCm is not an option here.
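A minimal sketch of a drift check for a frozen package set like that one. It is an illustration, not the tooling used on this box: `find_drift`, the package names and the version strings are all placeholders, and the two dictionaries stand in for whatever your package manager reports.

```python
from typing import Dict, Optional, Tuple


def find_drift(
    frozen: Dict[str, str], installed: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each drifted package to (frozen_version, installed_version).

    A version of None means the package is missing on that side.
    """
    drift = {}
    for name in sorted(set(frozen) | set(installed)):
        want, have = frozen.get(name), installed.get(name)
        if want != have:
            drift[name] = (want, have)
    return drift


# Placeholder snapshots: names and versions are illustrative only.
FROZEN = {"rocm-core": "6.4.0", "hip-runtime-amd": "6.4.0"}
INSTALLED = {
    "rocm-core": "6.4.0",
    "hip-runtime-amd": "6.4.1",       # changed since the freeze
    "mesa-vulkan-drivers": "25.0",    # not part of the frozen set
}

if __name__ == "__main__":
    for name, (want, have) in find_drift(FROZEN, INSTALLED).items():
        print(f"{name}: frozen={want} installed={have}")
```

An empty result means nothing has moved since the freeze. Anything else is a package to look at before blaming the next failure on ROCm itself.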

Last week I tested whether firmware was the lever. Bosgame had shipped a newer BIOS, so I flashed it and re-ran the exact thing ROCm dies on. Same fault. A newer BIOS does not fix it, and the firmware came with no changelog to even suggest it might.

That was the state of things a week ago: board-specific bug, no firmware fix, Vulkan-only, six months in. Then a thread on r/StrixHalo changed my understanding of it, and that is the part worth your time.

The same vendor ships ROCm that crashes, ROCm that hangs, and ROCm that almost works

A commenter pointed out that AMD's own Lemonade server, which bundles its own build of llama.cpp on a different ROCm runtime (AMD's TheRock pipeline), runs models on this hardware where the standard ROCm packages crash. So I tested it, isolated in its own directory, with nothing else on the box changed: same board, same kernel, same BIOS, system ROCm untouched.

The result split cleanly, and both halves are findings.

The crash is gone. Under the TheRock 7.13 runtime that Lemonade bundles, a 1.5B model loads fully onto the GPU, all layers, and generates at 243 tokens per second. The HSA fault that the official ROCm 7.2.x packages and the community container both throw on this exact box simply does not happen. That tells you something the issue tracker has been circling for two months: #6182 is a bug in the ROCm userspace build that this board triggers, not a fault in the board's hardware. Swap the runtime, keep everything else, and the wall moves.

But it is not a working path. With the whole model on the GPU (-ngl 99), anything over roughly 20 GB loads into memory and then hangs in initialization, before a single layer is assigned. No crash, no error, just stuck. I tested my 35B MoE production model and a 27B dense model of similar size. Both hang at the same spot. I ruled out the obvious explanations one at a time:

Not a locked-memory limit. Running as root with unlimited memlock changes nothing; it hangs identically.
Not MoE-specific. The dense model of the same size hangs the same way.
Not a stale build. The bleeding-edge nightly channel hangs too.

What is left is the large GPU allocation itself. There is one way around it: split the model, keep most of it on the GPU and spill the rest to the CPU (-ngl 20 and similar). That loads, and it generates. At about 3 tokens per second, CPU-bound, which is unusable for anything I actually run. So the full ROCm picture on this board is: full GPU offload hangs, partial offload technically works but is too slow to use, and Vulkan is the only path that is both stable and fast. TheRock trades the crash for a hang, and the hang has an escape hatch that lands you somewhere you would not want to stay.
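To make the split concrete, here is a rough sketch of how a partial offload divides a model. It assumes every layer is the same size and ignores KV cache and runtime overhead, so treat the result as an upper bound on what you would pass to -ngl, not a prediction. The function name, the 40-layer count and the memory budgets are hypothetical, not measurements from this board.

```python
import math


def gpu_layers_that_fit(model_gb: float, n_layers: int, gpu_budget_gb: float) -> int:
    """Estimate how many layers fit in a GPU memory budget.

    Simplifying assumption: all layers are equally sized, so each one
    costs model_gb / n_layers. Real models are not this uniform.
    """
    if model_gb <= 0 or n_layers <= 0:
        raise ValueError("model_gb and n_layers must be positive")
    per_layer_gb = model_gb / n_layers
    fit = math.floor(gpu_budget_gb / per_layer_gb)
    # Clamp to the valid range: never negative, never more than the model has.
    return max(0, min(n_layers, fit))


if __name__ == "__main__":
    # Hypothetical 20 GB model with 40 layers, under three different budgets.
    for budget_gb in (0.0, 10.0, 30.0):
        layers = gpu_layers_that_fit(20.0, 40, budget_gb)
        print(f"budget {budget_gb} GB -> {layers} of 40 layers on the GPU")
```

Whatever does not fit runs on the CPU, which is where the speed goes.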

So line them up. For one chip, AMD effectively ships at least three ROCm userspaces. The system packages crash on load. The community container crashes on load. AMD's own Lemonade bundle clears the crash and then hangs on anything over 20 GB. Same board, same kernel, same model file. The only variable is which flavor of AMD's own ROCm you happen to be running.
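The outcomes can be told apart mechanically, which is roughly what testing each userspace by hand amounts to. A hedged sketch: run the loader with a deadline, call it a hang if the deadline passes, a crash if it exits non-zero or prints the HSA fault text, and ok otherwise. `classify_run`, the timeout values and the stand-in commands are hypothetical. A real run would substitute your own loader command and a deadline of minutes, not seconds.

```python
import subprocess
import sys

# Substring of the HSA fault quoted earlier in this post.
HSA_FAULT_MARKER = "Memory critical error by agent"


def classify_run(cmd, timeout_s):
    """Run cmd with a deadline and label the outcome 'hang', 'crash' or 'ok'."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing stays stuck.
        return "hang"
    output = proc.stdout + proc.stderr
    if proc.returncode != 0 or HSA_FAULT_MARKER in output:
        return "crash"
    return "ok"


# Stand-in commands that imitate the three behaviours without touching a GPU.
FAKE_CRASH = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('Memory critical error by agent node-0'); sys.exit(1)",
]
FAKE_HANG = [sys.executable, "-c", "import time; time.sleep(30)"]
FAKE_OK = [sys.executable, "-c", "print('loaded')"]

if __name__ == "__main__":
    for name, cmd in [("crash", FAKE_CRASH), ("hang", FAKE_HANG), ("ok", FAKE_OK)]:
        print(name, "->", classify_run(cmd, timeout_s=2))
```

A timeout cannot distinguish a true hang from a load that is merely slow, so the deadline has to be generous compared with a known-good load on the same model file.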

(One caveat, in the interest of not cherry-picking. A user on the same board reported loading a 105 GB model at full context under the same bundle. That is larger than the GPU memory on this chip, so it can only run split across GPU and CPU, the same partial-offload path I just described. It loads, the same way mine loads when split, and it almost certainly runs at the same kind of CPU-bound speed. So it is not a magic working configuration that contradicts the hang. It is the same mechanic: full GPU offload does not work, partial offload does, slowly.)

And here is the part I keep coming back to. The people who established that the headline bug is a runtime issue and not a hardware one, who narrowed the secondary failure to large allocations and ruled out the wrong explanations, were not AMD and were not the board vendor. They were a handful of people on a subreddit, over two evenings.

What the $3,999 actually buys

I want to be careful here, because the lazy version of this post is "AMD sells a broken ROCm box," and that is not what I am saying. I have no evidence for it. I have not touched a Halo Box.

In fact the opposite is likely true. AMD almost certainly validates its own hardware properly: curated drivers, a known-good ROCm build, a fixed configuration it controls end to end, and direct support. That is worth money. For $3,999, that is what you are paying for. You are not paying for the chip. The same chip sits in a GMKtec box (affiliate) for around $2,000, and in thirty-plus other third-party machines starting near $2,399. You are paying for the guarantee that the stack on top of the chip works, because AMD assembled and tested that exact stack.

Which is precisely the tell. The Halo Box is a $3,999 answer to a problem AMD's own marketing will not name: that "first-class ROCm support" for Strix Halo, today, is not a property of the chip you can rely on. It is a property of one specific board-plus-driver-plus-runtime combination that AMD will sell you pre-assembled at a premium. On the cheaper boxes running the identical silicon, you assemble that combination yourself, and as I have spent six months learning, the parts do not currently fit.

One reviewer put the buyer's side of this plainly: the clean path for someone who just wants a working local-inference setup is the AMD box, and the third-party route is for people willing to spend real time on ROCm setup. That is the same point I am making, from the other direction. The premium is not for the hardware. It is for not having to find out what I found out.

If you are buying one anyway

The useful version of all this is a checklist. If you are buying a Strix Halo machine specifically for ROCm, the question is not "does the chip support ROCm." Every one of these boxes has the chip. The real questions are all one level down, at the board:

Does ROCm actually run on this specific board, with user reports to back it? Not the chip's spec sheet. The chip's spec sheet is identical across all of them. #6182 is board-specific: two machines with the same silicon, opposite outcomes.
Is the board a rebadge of something with a track record? Mine is a Sixunited AXB35-02, which ships under several brand names. Searching the actual board, not the logo on the case, is how you find the real reports.
Which ROCm userspace are you committing to, and does anyone keep it working? The system packages, a community container, and AMD's own bundle gave me three different failure modes on one board. That is the part nobody puts on a spec sheet.
Who ships firmware for it, and does it come with anything you can read? Bosgame shipped a BIOS with no changelog that fixed nothing I was chasing.

Or you buy the box where AMD has answered all four questions for you, and you pay the roughly $2,000 premium for the answers. That is a legitimate trade. It is just not the trade the marketing describes. "The chip supports ROCm" and "your board runs ROCm" are different claims, and the $2,000 is the gap between them.
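The premium is simple arithmetic on the prices quoted above, sketched here so the rounding is visible. The names are mine, and the two third-party prices are the approximate figures from this post, not live quotes.

```python
HALO_BOX_USD = 3999           # AMD Ryzen AI Halo pre-order price
GMKTEC_USD = 2000             # "around $2,000", so an approximation
THIRD_PARTY_FLOOR_USD = 2399  # "starting near $2,399"


def premium(reference_usd: int, alternative_usd: int) -> int:
    """Extra dollars paid for the reference box over an alternative."""
    return reference_usd - alternative_usd


if __name__ == "__main__":
    print("vs GMKtec:", premium(HALO_BOX_USD, GMKTEC_USD))                   # 1999
    print("vs other boxes:", premium(HALO_BOX_USD, THIRD_PARTY_FLOOR_USD))   # 1600
```

So "roughly $2,000" is the gap against the cheapest box named here. Against the other third-party machines the gap starts closer to $1,600 and shrinks as their prices rise.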

The part the spec sheet skips

I will end on the thing that actually bothers me, which is not the price and not the bug.

In six months, the chain that is supposed to own this produced the following. AMD's runtime ships in at least three flavors that behave differently on the same chip. The board vendor shipped a BIOS with no notes. The issue tracker logged the bug, closed it for lack of an in-house reproduction, then reopened it. And the product page calls the whole thing first-class.

In two evenings, a few unpaid people on a subreddit established that the headline crash is a runtime bug rather than a hardware fault, narrowed the secondary failure to large allocations, and eliminated three wrong explanations on the way. I contributed what I could test and reported it upstream.

That is the local-AI reality the spec sheet skips. Not that the hardware is bad. It is remarkable for the money, and Vulkan carries my entire production stack with no penalty I can measure. The skipped part is that "first-class ROCm support," for now, is something you either buy pre-assembled for $3,999 or assemble yourself from parts that do not currently fit, with the assembly notes written by other users at night.

Two honest loose ends, so I am not overclaiming. An AMD engineer on the issue asked me for an amdgpu-dkms comparison that I still owe him. And I have not run a Halo Box. If AMD's pre-assembled stack does exactly what the page says, that is genuinely good, and it makes the box a real shortcut for people who want one. It would also leave the central point untouched: the shortcut costs $2,000, and the long way around still does not reach the same place.

Both of those get answered here, with numbers, when I have them. That is the whole point of this: what runs on hardware you own, measured, every Thursday. No vendor decks. Subscribe, and you'll get them.
