Field report · Strix Halo

Buy or Wait: Reading the Local LLM Hardware Question in a Memory Crunch

The instinct is to wait for the next box. In a memory crunch, that's backwards. How to read the buy-or-wait question for Strix Halo, DGX Spark, and Mac Studio.

Date: 2026-07-02
Read: 6 min read
Topics: Hardware, Buying Guide, Local LLM

On June 25, 2026, Apple raised the price of the 96GB Mac Studio from $3,999 to $5,299, and Tim Cook called the memory market a hundred-year flood. If you've been putting off buying a box for local inference, that overnight $1,300 jump tells you which way this market is moving.

(Prices and dates below are as of late June 2026, with sources linked. In a market moving this fast, treat the specific numbers as a snapshot and the reasoning as the part that lasts.)

I've run a Strix Halo box as my daily local-inference machine for six months. The question I get more than any other right now is some version of "should I buy one, or wait for the next thing?" It's a fair question, and the answer has changed. The usual instinct with computer hardware is to wait: give it a few months, the next generation lands, the current one drops in price, and you get more for less. That instinct is right almost all the time. For this class of machine, right now, it's backwards, for two reasons worth understanding before you spend two to five thousand dollars. (If you want the case for which box to buy rather than when, I ranked them by memory bandwidth in the buyer's guide, admittedly with a bias toward token generation. This one's about the other half.)

Reason one: waiting costs more right now

The memory shortage driving this isn't the usual cycle. Past shortages eased when the makers brought new capacity online. This one comes from a choice: the same fabs can produce either commodity DRAM or the high-bandwidth memory that AI accelerators need, and because the high-bandwidth kind is far more profitable, that is where the capacity is going. The specific memory in these boxes, LPDDR5X, has been hit hardest of all. One common module rose 89 percent in a single quarter. Analysts don't expect real relief before late 2027.

Now add the part specific to these machines. The memory is soldered. You pick a capacity at purchase and you live with it for the life of the box. You cannot add a stick later. So the capacity decision is permanent, and you're making it in the most expensive memory market in decades, one where prices are still climbing.

Put those together and the usual reason to wait disappears. The box you want keeps getting more expensive while you wait, and the high-memory configurations are being cut rather than added. Apple removed the 512GB Mac Studio earlier this year, and the M3 Ultra now tops out at 96GB. The DGX Spark went from a $2,999 announcement to $4,699 in about a year. I see it on the box I run: I paid under $2,200 for it, and the same one sells for around $2,800 today. Waiting used to be free. Right now it has a running cost.
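To put those examples on one scale, here is a minimal sketch that turns the before-and-after prices quoted in this post into percentage increases. The figures are the ones cited above (the Strix Halo pair is my approximate "under $2,200" and "around $2,800"), and the function name and dictionary layout are only illustrative.

```python
# Prices quoted in this post, as (earlier, current) in USD.
# The Strix Halo pair is approximate: "under $2,200" and "around $2,800".
PRICES = {
    "Mac Studio 96GB": (3999, 5299),
    "DGX Spark": (2999, 4699),
    "Strix Halo box (approx.)": (2200, 2800),
}


def percent_increase(old: float, new: float) -> float:
    """Return the rise from old to new as a percentage of old."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for name, (old, new) in PRICES.items():
        print(f"{name}: +${new - old:,} ({percent_increase(old, new):.0f}%)")
```

The DGX Spark line works out to about 57 percent. None of these increases came with a hardware change, which is what I mean by waiting having a running cost.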

Reason two: the bigger successor is not faster

This is the part that's easy to miss, and it's the one I feel most directly from running a box every day. All of these machines are limited by memory bandwidth when they generate tokens. Bandwidth sets your tokens per second, and on this class of hardware it sits around 256 to 273 GB/s. A successor with more memory but the same bandwidth lets you load a bigger model, then runs it at the same speed.

That matters for timing, because the headline reason to wait is usually "the next one has more." The next Strix Halo refresh jumps to 192GB. Apple has tested the next Mac Studio up to 768GB. Both let you load larger models. Neither makes the model you already run any faster, because the bandwidth ceiling barely moves. So "wait for the bigger box" only helps if you're short on capacity today. It does nothing for speed. If the models you run already fit, a 192GB or 768GB successor isn't the upgrade it looks like.
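A rough way to see why: for a dense model, generating each token means reading roughly every weight from memory once, so bandwidth divided by the size of the weights gives an upper bound on tokens per second. The sketch below assumes exactly that. It ignores KV-cache traffic, mixture-of-experts models (which read only the active experts per token), and real-world efficiency losses. The 40GB and 150GB model sizes are made-up examples, and the function names are mine.

```python
def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on generation speed for a dense model.

    Assumes each generated token requires reading all weights once,
    so the ceiling is memory bandwidth divided by weight size.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


def fits(weights_gb: float, capacity_gb: float) -> bool:
    """True if the weights alone fit in memory (ignores KV cache and OS use)."""
    return weights_gb <= capacity_gb


if __name__ == "__main__":
    bandwidth = 256.0  # GB/s, low end of the range quoted in this post
    model_gb = 40.0    # hypothetical dense model, size after quantization

    # Same bandwidth, different capacities: the ceiling does not change.
    for capacity_gb in (128, 192):
        ceiling = decode_ceiling_tokens_per_s(bandwidth, model_gb)
        print(f"{capacity_gb}GB box: fits={fits(model_gb, capacity_gb)}, "
              f"ceiling ~{ceiling:.1f} tok/s")

    # A model that only fits on the bigger box has a lower ceiling, not a higher one.
    big_gb = 150.0  # hypothetical
    print(f"150GB model, 192GB box: "
          f"~{decode_ceiling_tokens_per_s(bandwidth, big_gb):.1f} tok/s")
```

With these assumed numbers, the 40GB model has the same 6.4 tokens per second ceiling on either box, while a 150GB model that only fits on the larger one is capped near 1.7. Real throughput lands below the ceiling, but the ratio between the two cases is the point.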

That's the generation side. Processing the prompt in the first place, the wait before the first token appears, is a separate, compute-bound step, and there the ranking between these boxes looks different. I come back to it on the Mac, because that's where it changes a decision.

Announced is not buyable

The other half of "wait" is "wait for the next generation." Here the crunch plays a second trick. In a tight market the gap between announced and available stretches, and so does the gap between a launch date and a box actually shipping.

The examples are current. The M5 Mac Studio was expected in the first half of 2026. It slipped, and the likely window is now around October, tied directly to memory supply. AMD's next refresh, Gorgon Halo, is aimed at roughly the third quarter of 2026, but it stays on the same RDNA 3.5 graphics, so it's a memory bump rather than a new architecture. The genuine architecture jump, the RDNA5-based Medusa Halo, is a later generation that isn't on the near-term roadmap.

So the rule is simple. Let an unreleased box change your timing only if it's both announced and dated, and even then, assume it lands later and costs more than the slide suggests.

So the only real question left is your own urgency

If waiting doesn't save you money and the bigger successor isn't faster, the decision comes down to one question: do you need the machine now, or can you genuinely wait six to eighteen months? Be strict with yourself. "It would be nice" isn't "I need it," and in this market the difference is real money. Everything below is that question applied to each box.

Reading it per box

Strix Halo. This is the one I run, the mature option, and the cheapest way in, with third-party boxes well under the $3,999 of AMD's own dev kit. The near-term refresh, the 192GB AI MAX+ 495, is more capacity on more or less the same bandwidth, so it earns your wait only if you specifically need to load models past 128GB. It won't make anything noticeably faster. The real step up is Medusa Halo, a generation further out, which is a long time to sit on your hands. If 128GB covers the models you actually run, waiting buys you very little here.

DGX Spark. The price has moved in one direction, up about 57 percent from its announcement to $4,699, and NVIDIA hasn't committed to bringing it back down when memory eases. The gains since launch came through software on the same hardware, not a new box. A "Spark 2" on next-generation silicon is rumored but has no date, so it shouldn't touch your timing. The wait case here is weak. What you pay for is CUDA and a polished, supported stack today, and if that's what you need, the cheaper OEM clones like the ASUS Ascent GX10 (affiliate) take some of the sting out of the price.

Mac Studio. This is the one box where waiting has a real argument, and also the one where the market makes that argument hard to call. The M5 and M5 Ultra refresh is a genuine step rather than a spec bump, and the reason is specific. The Mac's real weakness today shows up before generation even starts: it's slow to chew through a long prompt before the first token appears, because that step leans on compute the M-series has been short on. The M5 adds matrix accelerators inside each GPU core aimed straight at that, up to four times faster time to first token. If you run long prompts, large documents, or agents, that's the upgrade that matters, and at around October it's close. If you want a Mac and can wait the few months, I lean toward waiting for it. But go in with your eyes open. You'd be waiting for performance, not a lower price. Apple just raised the current Studio by $1,300 and said more increases may follow, so the M5 is likely to land at or above today's levels, and if supply is still tight it may be capped at launch the way the M3 Ultra got cut to 96GB. This is the shakiest call in the post: the silicon says wait, the pricing is genuinely unpredictable, and Apple hitting October isn't a given. If you need the machine in front of you and the high-memory config matters, buying the current box before the next hike is just as defensible.

What it comes down to

After six months with one of these boxes, the timing question is less interesting than it looks, because two of the usual reasons to wait have stopped applying. Waiting doesn't save you money right now, and the bigger successors mostly add capacity, not speed. What's left is your own urgency, weighed against what's genuinely dated. For most people running models that fit in 128GB, the boxes that exist today do the job. The Mac is the one place a short wait makes real sense, with the caveat that you're betting on a price nobody can predict.

Whatever you decide, one thing remains: soldered memory makes capacity a one-time decision. Buy the capacity you'll need over the life of the box, not the minimum that fits today's models, because you can't add it later and the next box won't be cheaper. The bandwidth ceiling is set in silicon, so the capacity arms race mostly buys you the ability to run bigger models more slowly. Breakthrough all-in-one architectures for running bigger local models have yet to appear, and when they do, they are not likely to be affordable any time soon.
