Semiconductors & Advanced Manufacturing

May 14, 2026

For years, the AI chip industry was obsessed with one question: how smart can we make the model? This week's dominant story is that the market has quietly answered a different question — and the answer is reshaping which chip architectures are worth tens of billions of dollars.

The Speed Revolution: Fast Tokens Finally Win

Dylan Patel at SemiAnalysis — one of the sharpest technical analysts covering semiconductors — published what he calls a four-articles-in-one deep dive on Cerebras, a Silicon Valley chip company that has spent years betting on an architecture the rest of the industry considered a curiosity. That bet is now paying off spectacularly.

Cerebras builds what it calls the Wafer Scale Engine — a chip (currently the WSE-3) that is literally the size of a silicon wafer, the round disc from which chips are normally cut. Instead of dicing that wafer into hundreds of small chips, Cerebras uses the entire thing as one giant processor. The tradeoff: it's extraordinarily fast at generating tokens (the word-fragments AI models output), but it can't match the raw total throughput of a rack full of Nvidia GPUs for training large models.

For years, Patel notes, this was a problem. The industry cared about training powerful models, and Nvidia's GPU-based approach, built around HBM (High Bandwidth Memory, a type of fast memory made of stacked DRAM dies packaged right next to the processor), dominated. But the market has moved. Frontier labs like OpenAI now sell multiple pricing tiers for the same model: fast (expensive), standard, and batch (cheap and slow). Users have voted with their wallets. Past a certain intelligence threshold, Patel argues, developers prefer faster output to marginally smarter output, especially when AI is embedded in every step of a workflow.
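To see why speed matters more once a model sits in every step of a workflow, here is a rough sketch of how generation time adds up across sequential calls. Every number in it (20 steps, 500 tokens per step, 50 versus 1,000 tokens per second) is a made-up illustration, not a figure from Patel's piece or from any vendor.

```python
def workflow_seconds(steps, tokens_per_step, tokens_per_second, overhead_s=0.0):
    """Wall-clock time for `steps` model calls that run one after another."""
    per_step = overhead_s + tokens_per_step / tokens_per_second
    return steps * per_step


# Hypothetical workload: an agent that makes 20 sequential model calls,
# each producing about 500 output tokens.
steps = 20
tokens_per_step = 500

# Hypothetical decode speeds for a "standard" tier and a "fast" tier.
standard = workflow_seconds(steps, tokens_per_step, tokens_per_second=50)
fast = workflow_seconds(steps, tokens_per_step, tokens_per_second=1000)

print(f"standard tier: {standard:.0f} s")
print(f"fast tier:     {fast:.0f} s")
```

Under these assumed inputs the same task drops from minutes to seconds. The delay compounds with every added step, which is one way to read the preference for faster output.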

This shift has been validated by Nvidia's own moves. In December 2025, Nvidia "licensiquihired" Groq, a startup that built inference chips optimized for speed. (The term is Patel's coinage for a deal that was part license, part acqui-hire.) If Jensen Huang at Nvidia saw at least $20 billion of value there, the inference-speed market is no longer a niche.

For Cerebras, the vindication is concrete: a 750 megawatt compute deal with OpenAI (for context, 750MW is enough power for roughly 600,000 US homes, now being pointed at AI inference). The company is approaching an IPO with a deal that has, Patel notes, "changed the company's fortunes." His piece covers the full architecture, a BOM (Bill of Materials — the cost breakdown of hardware components) analysis, and an assessment of whether Cerebras can actually secure the data center capacity it needs by 2028 to fulfill its OpenAI commitments.
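The homes comparison can be checked with back-of-envelope arithmetic. The sketch below assumes an average US household uses about 10,500 kWh of electricity per year. That is a round number chosen for illustration, not a figure from the article, and the result moves with whatever average you plug in.

```python
HOURS_PER_YEAR = 8760

# Assumption (not from the article): average annual electricity use
# of a US household, in kWh. Change this to test other estimates.
ASSUMED_KWH_PER_HOME_PER_YEAR = 10_500


def homes_powered(megawatts, kwh_per_home_per_year=ASSUMED_KWH_PER_HOME_PER_YEAR):
    """Number of homes whose average draw adds up to `megawatts`."""
    avg_home_kw = kwh_per_home_per_year / HOURS_PER_YEAR  # average draw per home
    return megawatts * 1000 / avg_home_kw


homes = homes_powered(750)
print(f"{homes:,.0f} homes")
```

With that assumption the answer lands a little above 600,000, consistent with the rough figure quoted above. It compares average draw only and ignores peak demand.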

Separately, a startup called Fractile raised $220 million in a Series B round co-led by Accel, Factorial Funds, and Peter Thiel's Founders Fund to develop its own AI inference chips. That a fresh entrant can raise that much, at this stage, speaks to how much capital is chasing the inference-speed opportunity.

Why it matters: The shift from "train smarter models" to "serve them faster" changes which chip architectures win. Nvidia's GPU, designed for parallel matrix math during training, faces new challengers in inference. The companies — and investors — who read this shift early are now in an enviable position.

The Data Center Buildout Is Running Into America's Backyard

AI's insatiable need for compute requires physical infrastructure: buildings full of servers, drawing enormous amounts of power. This week's news carried three separate data center project withdrawals in the US — Prince William County, Virginia; Pekin, Illinois; and Lebanon County, Pennsylvania (where a $1.7 billion project was abandoned, described by opponents as a "victory"). Reasons for the withdrawals were often unstated, but the pattern is clear: community opposition, zoning pressure, and local politics are becoming meaningful friction in the AI buildout.

This is the less-discussed supply constraint in the AI chip story. You can design the world's fastest inference chip, but if you can't build the buildings to run it in, capacity doesn't materialize.

On the other side of the ledger, NHN (a South Korean internet company) launched a 7,656-GPU cluster in Seoul — a reminder that the buildout is global even as the US faces local headwinds.

Why it matters: Physical infrastructure is a genuine bottleneck for AI, not just a boring logistics story. The places that can build data centers efficiently — whether due to permissive zoning, cheap power, or political will — become strategic assets.

Liquid Cooling Goes From Experiment to Requirement

Running AI chips at high density generates extraordinary heat. The traditional answer, blowing cold air through server aisles, is struggling to keep up. This week's coverage reflects a full-scale shift to liquid cooling. Telehouse deployed direct-to-chip liquid cooling at its Toronto data centers and connected the waste heat to a local district heating system, meaning the heat from AI computation literally warms nearby buildings. Separately, a newly published industry field guide covers integrating high-density liquid cooling into existing air-cooled facilities, a sign that most operators are now retrofitting, not building from scratch.

Direct-to-chip cooling pipes liquid coolant to cold plates mounted on the processor rather than cooling the air around it. It's more efficient, but it requires new plumbing infrastructure inside facilities designed for air.
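A quick way to see why liquid is more efficient is the standard heat-transfer relation: heat removed equals mass flow times specific heat times temperature rise. The sketch below applies it to a hypothetical 100 kW rack with a 10 K coolant temperature rise. Both values are illustrative assumptions, and the fluid properties are textbook approximations for water and air near room temperature.

```python
# Approximate fluid properties near room temperature.
WATER_CP = 4186.0      # specific heat, J/(kg*K)
WATER_DENSITY = 997.0  # kg/m^3
AIR_CP = 1005.0        # specific heat, J/(kg*K)
AIR_DENSITY = 1.2      # kg/m^3


def volumetric_flow_m3_per_s(heat_watts, delta_t_kelvin, cp, density):
    """Flow needed to carry away `heat_watts` with a given temperature rise.

    Uses Q = m_dot * cp * delta_T, then converts mass flow to volume flow.
    """
    mass_flow = heat_watts / (cp * delta_t_kelvin)  # kg/s
    return mass_flow / density                      # m^3/s


# Hypothetical inputs, chosen only for illustration.
rack_heat_w = 100_000.0  # a 100 kW rack
delta_t = 10.0           # coolant warms by 10 K across the rack

water_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t, WATER_CP, WATER_DENSITY)
air_flow = volumetric_flow_m3_per_s(rack_heat_w, delta_t, AIR_CP, AIR_DENSITY)

print(f"water: {water_flow * 60_000:.0f} L/min")  # 1 m^3/s = 60,000 L/min
print(f"air:   {air_flow:.1f} m^3/s")
```

For these assumed inputs, water needs on the order of 140 liters per minute while air needs roughly 8 cubic meters per second, a volume ratio in the thousands. That gap is why dense racks push operators toward plumbing.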

Why it matters: Cooling is becoming as important a design constraint as the chips themselves. Data center operators who solve this well can run denser, more powerful hardware. Those who don't are limited in the AI workloads they can take on.

On the Frontier: Quantum Infrastructure Takes a Quiet Step

Two smaller items worth flagging for readers watching longer time horizons. Oxford Instruments — a UK scientific equipment company — installed its PlasmaPro ALD (Atomic Layer Deposition, a technique for coating surfaces with precisely controlled thin films, atom by atom) system at NYU's Nanofab facility, marking the first time the tool will be used for superconducting quantum applications in the US. Quantum computers rely on superconducting circuits that must be fabricated with extreme precision; ALD is one of the key processes that makes that possible.

Separately, a Texas startup called Casimir raised $12 million to develop a chip it claims generates electrical power by harvesting energy from quantum vacuum fields — the background energy that quantum physics says permeates all space. This is sufficiently exotic that it warrants healthy skepticism, but it reflects the breadth of approaches being funded in the search for energy solutions for power-hungry data centers.

The Trend to Watch

The inference-speed story is the one to track. For most of AI's recent history, progress meant bigger models, trained on more data, requiring ever more compute. The new axis of competition is latency: how fast a model responds. Architectures designed for speed (Cerebras's wafer-scale approach, Groq's inference chips, whatever Fractile is building) are suddenly commercially central rather than merely technically interesting. Nvidia knows this, hence the Groq deal. The question is whether Nvidia can adapt its GPU-centric ecosystem fast enough, or whether the inference era produces a more fragmented chip landscape than the training era did.

TL;DR

- Fast inference is now where the money is: AI users prefer faster responses over marginally smarter ones, and chip architectures built for speed, like Cerebras's wafer-scale engine, are attracting massive deals (750MW with OpenAI) and IPO momentum.
- A new wave of inference chip startups is being funded: Fractile's $220M raise signals that investors see real opportunity in alternatives to Nvidia for serving AI models, not just training them.
- The US data center buildout is hitting community resistance: 3 major projects were withdrawn this week; physical infrastructure is a real constraint on AI capacity expansion.
- Liquid cooling is no longer optional: high-density AI compute generates more heat than air cooling can handle. The industry is retrofitting fast, and the operators who solve this win the next wave of AI workloads.

Compiled from 2 sources · 21 items

Data Center Dynamics (20)
Dylan Patel (1)