Pure Signal AI Intelligence
<hr> <strong>PURE SIGNAL</strong> March 28, 2026 <p>Here's a paradox worth sitting with. Everyone assumed GPU prices would crater as newer chips arrived. Instead, H100s—four-year-old hardware—are now worth <em>more</em> than when they launched. That price signal tells you everything about where we are right now.</p> <p><strong>Compute Is the New Moat</strong></p> <p>The H100 rental market bottomed out after the DeepSeek shock in early 2025, and most observers assumed the depreciation cycle would continue. It didn't. Since December, prices have reversed sharply upward. Dylan Patel, speaking on the Dwarkesh podcast, put it bluntly: H100s are worth more today than three years ago.</p> <p>Why? Two forces converging. First, reasoning models and agents—the inference patterns that emerged in late 2025—are dramatically more compute-hungry than simple chat. Second, better software makes old hardware more valuable. A chip that could do X last year can now do significantly more X with improved inference stacks. The depreciation schedules that data centers built their business models around assumed static software. That assumption was wrong.</p> <p>This connects directly to what leaked this week about Anthropic. A now-pulled post described something called Claude Mythos—or Capybara—a new tier above Opus, reportedly larger and more capable than Claude Opus 4.6, with substantially better scores on coding, academic reasoning, and cybersecurity. The rollout is constrained by both cost and safety concerns. And a Financial Times report says Google is close to funding Anthropic's data center infrastructure. 
The pattern is clear: frontier competition is now gated by power and capital expenditure, not just algorithmic insight.</p> <p><strong>The Quantization Wars: Speed, Controversy, and a Challenger</strong></p> <p>While the frontier labs battle for compute, a fascinating technical fight is playing out at the other end of the stack—local inference, where researchers are squeezing every last bit of performance out of consumer hardware.</p> <p>The center of this week's debate is TurboQuant—a compression technique from Google's ICLR 2026 paper. The core idea: apply random rotations to model weights before quantizing them—essentially compressing models to use less memory—which dramatically improves accuracy at high compression ratios. Someone ran it on a base MacBook Air M4 with sixteen gigabytes of RAM, achieving twenty-thousand-token context on Qwen 3.5 nine billion. That was previously impossible on that hardware.</p> <p>But the controversy runs deep. A researcher named Gao alleges the paper misrepresented RaBitQ—a prior technique—in both theory and benchmarking, including unfair CPU-versus-GPU comparisons. This doesn't kill TurboQuant's engineering value, but it casts doubt on the published comparison claims.</p> <p>Enter RotorQuant. A new paper proposes replacing TurboQuant's expensive matrix operations with Clifford rotors—a mathematical structure borrowed from geometric algebra, where you represent rotations as compact algebraic objects rather than large matrices. The result: ten-to-nineteen times faster than TurboQuant, with forty-four times fewer parameters, and nearly identical cosine similarity scores. The tradeoff is some theoretical degradation on worst-case vectors. But in practice, on real model distributions, the fidelity holds.</p> <p>Meanwhile, a separate researcher discovered that ninety percent of the key-value cache dequantization work—where the model retrieves stored intermediate computations—can simply be skipped. 
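</p> <p>A minimal sketch of what that skip can look like, in plain Python. This is illustrative only, not the researcher's actual patch; the shapes, names, and threshold are assumptions:</p>

```python
import math

def decode_step(q, k_cache, v_q, v_scale, thresh=1e-3):
    """One attention decode step over a quantized value cache,
    skipping dequantization of rows with negligible weight.
    Everything here (shapes, names, thresh) is illustrative."""
    d = len(q)
    # Attention scores against every cached key, then a stable softmax.
    scores = [sum(kr[i] * q[i] for i in range(d)) for kr in k_cache]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * d
    for w, row, scale in zip(weights, v_q, v_scale):
        if w <= thresh:
            continue            # the skip: never dequantize this row at all
        for i in range(d):
            out[i] += w * row[i] * scale   # int8-style value times per-row scale
    return out
```

<p>Rows whose softmax weight falls below the threshold never get dequantized, which is where the decode-time savings come from. 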
Most attention weights are negligible, so why compute them? A three-line code change yielded a twenty-two-point-eight percent decode-speed improvement at thirty-two-thousand-token context. The community's reaction was delight at the elegance.</p> <p>The practical upshot: local inference is advancing fast enough that workflows previously requiring cloud APIs are migrating to consumer hardware. One developer replaced an expensive text-to-speech subscription with a local Qwen 3.5 fourteen billion setup. Another compressed a thirty-five billion parameter model to fit full context into twenty-four gigabytes of VRAM with roughly one percent average performance drop.</p> <p><strong>Agents Growing Up: From Demos to Software Infrastructure</strong></p> <p>The agent story this week is less about breakthrough capabilities and more about the infrastructure maturing around them. That's actually the more significant signal.</p> <p>Nous Research's Hermes Agent integrated Hugging Face as a first-class inference provider, giving users access to twenty-eight curated models plus hundreds more—with persistent memory and machine access. Hugging Face's CEO Clément Delangue called this a step toward genuinely open agents. User reports emphasize lower friction than browser-automation-heavy alternatives.</p> <p>LangChain pushed a cluster of production tooling simultaneously: an agent eval—evaluation—readiness checklist, IDE-style UI guidance, and prompt promotion and rollback workflows via LangSmith. The direction is unmistakable. The stack is evolving from "chatbot with tools" to something resembling a full software development lifecycle—but for agents.</p> <p>The benchmark problem is also getting more honest. Artificial Analysis introduced a new agent performance benchmark focused on real coding trajectories, sequences over a hundred thousand tokens long, and throughput measured in concurrent users per kilowatt per dollar per rack. That's a deployment-relevant abstraction. 
It reflects what actually matters when you're running agent fleets, not toy tasks.</p> <p>OpenAI's Codex ecosystem is moving the same direction—persistent workspaces, issue trackers, terminals, PR flows, and plugins. One observer described the emerging pattern as kanban-style fleet management for software agents.</p> <p><strong>What Vibe Coding Teaches You</strong></p> <p>Simon Willison got frustrated with Activity Monitor on his new M5 MacBook Pro and decided to build replacements using Claude and GPT-5.4. No Swift experience. Minimal prompting. Two working macOS menu-bar applications later—one showing per-process network bandwidth, one showing GPU and memory state—he had something worth reflecting on.</p> <p>His candid observation is worth quoting directly: he has no idea if the numbers are accurate. He caught the GPU app reporting only five gigabytes of free memory when that was clearly wrong. A screenshot into Claude Code fixed the calculation. But confidence remains low.</p> <p>This is the honest state of vibe coding—rapidly spinning up functional-looking tools, without the expertise to validate what's underneath. Willison's framing is right: these apps are interesting as demonstrations of what's possible, not as trusted instruments. The capability is real. The epistemic gap between "it runs" and "it's correct" is also real.</p> <hr> The through-line this week: raw compute is scarcer and more valuable than anyone modeled, local inference is compensating with remarkable efficiency gains, and the agent stack is quietly accumulating the production primitives that transform experiments into infrastructure. The frontier and the edge are both accelerating—just in opposite directions on the cost curve. <hr>
HN Signal Hacker News
It was a quiet Saturday on Hacker News. No viral drama, no hot takes about the latest AI model. Just a community settling into something rarer — genuine curiosity about old things, skepticism about future things, and a few small tools built with care. On days like this, you learn what the HN crowd actually loves when nobody's watching.
THE RETRO COMPUTING REVIVAL
The thread that generated the most warmth today wasn't about anything new at all. It was a circuit-level emulator — a simulator that recreates a computer at the level of individual electronic signals — for the PDP-11/34. That's a minicomputer from Digital Equipment Corporation that most people stopped using before 1990.
Here's what makes this interesting. The emulator doesn't just run old software. It simulates the actual circuits inside the machine — every transistor, every signal path. And you can run it right in your browser, thanks to a WebAssembly port. WebAssembly — or Wasm — is technology that lets complex code run inside a web browser at near-native speed.
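To make "circuit-level" concrete, here is a toy example in Python (entirely illustrative, nothing from the actual emulator): a one-bit memory cell, an SR latch, built from two cross-coupled NAND gates and settled by repeatedly re-evaluating the gates. It's the same signal-propagation idea a circuit-level emulator scales up to an entire machine.

```python
def nand(a, b):
    """A single NAND gate at the 0/1 signal level."""
    return 0 if (a and b) else 1

def sr_latch(s_n, r_n, q=0, q_n=1, steps=4):
    """Settle a cross-coupled NAND SR latch (active-low inputs) by
    iterating gate evaluations until the signals stabilize.
    Toy illustration of circuit-level simulation, not real emulator code."""
    for _ in range(steps):
        # Evaluate both gates from the previous cycle's signal values.
        q, q_n = nand(s_n, q_n), nand(r_n, q)
    return q, q_n
```

Drive s_n low and the latch sets (q becomes 1); drive r_n low and it resets; with both inputs high it simply remembers its last state. A full circuit-level emulator reproduces exactly this kind of behavior, gate by gate, across the whole machine.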
The comments lit up with nostalgia. User hank1931 wrote: "As a student I worked at a lab and had a PDP-11/10 all to myself. But of course I desired more. Six years later I worked for a company that purchased a PDP-11/70 running RSTS/E. I had died and gone to heaven." And user dboreham offered a different kind of awe — suggesting this was probably how our universe got started. As in: some god-like being's side project to build a subatomic-level simulator. That's the kind of philosophical spiraling that happens when you stare long enough at a virtual machine.
This connects to another story today — Undroidwish. It's a single-file, batteries-included binary for Tcl/Tk — a programming language and graphical toolkit that peaked in popularity around the time the internet was just getting started. The appeal here is portability. You can run it on almost anything, including Android. User pm3003 was delighted: "I've been looking for this. Tcl apps are hard to run today when you're not a developer. Like the wonderful Grimm dictionary compiled into a Tcl app ages ago by a German university." That's the through-line. Old tools, built with care, still useful — and the quiet engineering effort to keep them alive.
Then there's rpg.actor — a platform for hosting tabletop role-playing game character sheets, built on AT Protocol. AT Protocol is the open social technology behind Bluesky. The platform is running a game jam — a timed creative competition — and the first prize is a mint copy of RPG Maker 2000. User klaussilveira captured the mood perfectly: "The fact that the first prize is a mint copy of RPG Maker 2000 brings me fuzzy feelings. The Quake and RPG Maker community had a profound impact on who I am today." Three different stories. Same underlying impulse — honoring the tools and communities that shaped an entire generation of builders.
THE "COULD" PROBLEM
Shift gears, and you hit a completely different kind of energy. A Cambridge University press release announced a new computer chip material — a hafnium-based memristor, inspired by the human brain — that could dramatically slash AI energy consumption. Memristors are electronic components that can remember their state even without power, similar to how synapses work in the brain.
The HN response was swift and dry. User wg0 offered what felt like a community law: "Law of headlines — 'could happen' would never happen." User crest just quoted the word "could" with no further comment. And user random3 pointed out that nearly identical framing keeps resurfacing: the same post had hit Reddit four days earlier, and similar papers date back at least three years.
This isn't cynicism for its own sake. It's pattern recognition. HN has seen enough "new battery," "new cancer treatment," and "new AI chip material" headlines to know the distance between academic lab result and deployed product is vast. The community is genuinely interested in neuromorphic computing — computing inspired by how the brain works. But they've also learned to hold the enthusiasm at arm's length until the engineering catches up to the press release.
The paper packaging story — Fraunhofer Institute research on sealing paper without adhesives, using a carbon dioxide laser to create natural sugar-like binding compounds — got a much gentler reception. Probably because it made no sweeping claims. It just explained the technique quietly, and user adolph quoted the mechanism with obvious appreciation. When researchers show their work without hype, HN responds in kind.
WHEN TOOLING BECOMES OPINION
There's a developer argument that never really ends — it just changes clothes. Today it showed up as a blog post titled "Stop Picking My Go Version for Me." Go is Google's programming language, popular for building servers and infrastructure. The complaint: when you publish a library in Go, your go.mod file — the file that describes your project's dependencies — can push a specific version requirement onto everyone who uses your code. The author calls this "viral" and argues library authors shouldn't impose toolchain choices on their users.
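For concreteness, the mechanism lives in the go.mod file's `go` directive. A minimal sketch, with hypothetical module names and version numbers:

```go
module example.com/myapp     // hypothetical application module

go 1.24                      // the minimum Go version this module declares

require example.com/somelib v1.9.0
```

If somelib's own go.mod declares `go 1.26`, that requirement propagates: building myapp with an older toolchain either fails or, since Go 1.21, makes the go command automatically switch to (and download) a new-enough toolchain. That automatic propagation is what the post objects to.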
The response in the comments was mostly skeptical of the complaint itself. User cweagans asked the real question: "What are the actual, practical results of a package pushing you towards a higher Go version that you wouldn't otherwise have adopted right away?" User dherls was blunter — the author never actually gives a concrete example of harm. User PaulKeeble offered perhaps the most grounded take: one of Go's key advantages is that upgrading has almost never broken things. Old code compiles fine on new toolchains. User websap summarized with two words: "Skill issue."
It's a minor debate in the scheme of things. But it points at something real — the tension between library authors who want to use new language features and application developers who want stability and control over their own build environments. That tension exists in every ecosystem.
TreeTrek, a clean web viewer for raw Git repositories — the on-disk stores where Git keeps a project's code history — landed quietly alongside all this. Low score, small discussion, but user quantummagic called it "premium feel." Small tools, built with taste, often find their audience slowly.
Closing Thought
What today's HN reveals, in its quiet way, is the community's double nature. There's a deep romanticism here — for old computers, for ancient toolkits, for the RPG Maker communities of the nineties that turned kids into creators. And there's a hard-edged skepticism — for any headline that uses the word "could," for any research that sounds better in a press release than in a data sheet.
Both instincts come from the same place. A love of things that actually work — and a long memory for the things that didn't.