Pure Signal: AI Intelligence
Something fundamental shifted in how software gets built—and the cracks are starting to show. Today we're looking at three converging stories: what expert agentic engineering actually looks like in practice, the open source crisis that's emerging from AI-generated code floods, and a billion-dollar bet that the entire token-prediction paradigm is the wrong foundation for serious AI.
The Practitioner's Playbook—and Its Unintended Consequences
Simon Willison gave a detailed fireside chat this week on what he calls agentic engineering. It's worth unpacking, because he's one of the most careful thinkers actually doing this work daily.
His core thesis: the inflection point wasn't a model—it was Claude Code plus Sonnet 3.5, about a year ago. That combination was the first time a model felt capable enough to drive a terminal usefully. Since then, the progression has been rapid. Willison says he's now "oneshotting basically everything"—two-sentence prompts that reliably produce working code.
But reliability comes from discipline, not luck.
His most emphatic point is about testing. He starts every coding session with red-green TDD—test-driven development, where tests are written before code. He hated this approach for his entire career. With agents, he doesn't care. The agent spins up tests, he walks the dog, and code quality goes up dramatically. His line is sharp: "Tests are effectively free now. They are no longer even remotely optional."
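A minimal sketch of that red-green loop, with an invented `slugify` function standing in for whatever the agent happens to be building (none of this is from Willison's talk; it just shows the shape of the workflow):

```python
# Red-green TDD sketch: the agent writes the tests first, watches them
# fail (red), then implements until they pass (green).
# `slugify` is an illustrative stand-in, not from Willison's talk.

import re

def test_slugify():
    # Written BEFORE the implementation existed; running the suite at
    # that point produced a NameError -- the "red" phase.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  spaces  everywhere ") == "spaces-everywhere"
    assert slugify("Symbols! & Stuff?") == "symbols-stuff"

def slugify(text: str) -> str:
    # Minimal implementation added second, to turn the tests green.
    text = text.strip().lower()
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")

test_slugify()
print("green: all tests pass")
```

The point is the ordering: the spec exists as executable assertions before any implementation does, so the agent has an unambiguous target to iterate against while you walk the dog.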
He's also doing something clever with manual testing. He built a tool called Showboat that has agents exercise an API themselves—running curl commands, logging results, catching bugs the test suite missed. Because a passing test suite doesn't mean the server actually boots.
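Setting Showboat's specifics aside, the underlying check, actually booting the server and hitting a live endpoint, can be sketched in a few lines. The handler and endpoint below are invented for illustration, not Showboat's implementation:

```python
# Smoke-test sketch: a passing unit suite doesn't prove the server
# boots, so start it for real and make a request against it.
# (This illustrates the idea behind Showboat, not its actual code.)

import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the smoke test's output clean
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/healthz"
with urllib.request.urlopen(url, timeout=5) as resp:
    payload = json.loads(resp.read())

server.shutdown()
assert resp.status == 200 and payload == {"status": "ok"}
print(f"server booted and answered: {payload}")
```

A test suite full of mocked handlers can stay green while an import error prevents the real process from starting; this kind of check catches exactly that class of failure.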
Then there's what he calls conformance-driven development. For adding file uploads to his framework Datasette, he had Claude build a test suite that passed across six different frameworks, among them Go, Node, Django, and Starlette. Then he used that test suite as the spec: reverse-engineer six implementations to derive a standard, then implement the standard. That's a genuinely new way to build.
On trust and sandboxing: Willison describes the "lethal trifecta"—prompt injection attacks succeed when a model has access to private data, is exposed to malicious instructions, and has an exfiltration vector to send data out. His recommendation is to run coding agents in containers Anthropic controls, not on your local machine. He admits, with some self-awareness, that he mostly ignores his own advice because it's so convenient not to.
Here's the tension though. All of this individual productivity comes at a collective cost.
Willison cites Jannis Leidel's decision to sunset Jazzband—a Python open source collective built on shared push access and open membership. That model worked when the worst-case scenario was an accidentally merged PR. It doesn't work when only one in ten AI-generated pull requests meets project standards—and when curl had to shut down its bug bounty entirely because confirmation rates dropped below five percent.
GitHub's response to the flood of AI-generated spam PRs was something unprecedented: a kill switch to disable pull requests entirely. Pull requests are the fundamental unit of open source collaboration. The fact that disabling them is now on the table tells you something about how badly the signal-to-noise ratio has degraded.
Willison's observation is quietly devastating. The productivity gains that agents enable for individuals are built entirely on the open source ecosystem those same agents are now helping to overwhelm.
The Case Against Token Prediction
While the coding world debates how to use LLMs better, Yann LeCun just raised a billion dollars on the thesis that LLMs are the wrong tool for the most important jobs.
LeCun's new lab, Advanced Machine Intelligence (AMI Labs), based in Paris, closed a one-point-three-billion-dollar seed round this week, likely the largest seed round Europe has ever seen.
The intellectual argument is crisp. LLMs—large language models—predict tokens, which are discrete chunks of text. That works well for information retrieval, summarization, coding, mathematics. But factories, hospitals, and robots don't operate in tokenized environments. Reality is continuous, noisy, and high-dimensional.
LeCun's alternative is world models—AI systems that learn abstract representations of physical reality and make predictions in representation space rather than output space. Action-conditioned world models let agentic systems predict consequences of actions and plan sequences toward goals.
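A toy numerical sketch can make the distinction concrete. Everything below, the dimensions, the random linear maps, the single-step predictor, is invented for illustration and reflects nothing about AMI's actual architecture; the one load-bearing detail is where the error is measured:

```python
# Toy world-model sketch: encode observations into a latent
# representation, condition the predictor on an action, and measure
# prediction error in LATENT space rather than raw-observation space.
# All shapes and weights are illustrative, not AMI's design.

import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, LATENT_DIM, ACTION_DIM = 64, 8, 4

W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)    # encoder
W_pred = rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM)) / 4  # predictor

def encode(obs):
    return np.tanh(W_enc @ obs)

def predict_next_latent(latent, action):
    # Action-conditioned prediction: next latent from (latent, action).
    return np.tanh(W_pred @ np.concatenate([latent, action]))

obs_t, obs_t1 = rng.normal(size=OBS_DIM), rng.normal(size=OBS_DIM)
action = np.array([1.0, 0.0, 0.0, 0.0])

z_pred = predict_next_latent(encode(obs_t), action)
z_true = encode(obs_t1)

# The training signal lives in the 8-dim representation space, so the
# model never has to reproduce all 64 noisy observation dimensions.
latent_error = np.linalg.norm(z_pred - z_true)
print(f"latent-space prediction error: {latent_error:.3f}")
```

Contrast with a token predictor or pixel-level generative model, which is scored on reconstructing the full output space, noise and all. The world-model bet is that predicting in the abstract space is both easier and closer to what planning actually requires.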
AMI CEO Alexandre LeBrun put it plainly: "Despite their immense power, I do not believe that generative architectures are the path to achieving true understanding."
What's notable is the timing. Fei-Fei Li's World Labs just raised a billion dollars on a similar premise. Two of the most credentialed researchers in AI—separately, nearly simultaneously—are making the same bet against the dominant paradigm.
LeBrun's prediction: "world models" will become the next buzzword. In six months, every company will be calling its product a world model to raise funding. Which means the signal will get noisy fast. But the underlying scientific question is serious: can token prediction ever get you to the kind of grounded, physical understanding that robots and medical devices actually need?
Opening the Black Box at Scale
Bridging the gap between these two conversations is interpretability research—and Berkeley AI Research published something genuinely useful this week.
The challenge they're tackling is this: model behavior rarely comes from isolated components. It emerges from interactions—combinations of features, training examples, and internal components working together. The number of potential interactions grows exponentially as models scale. That's made comprehensive analysis computationally infeasible.
Their framework, SPEX—short for Spectral Explainer—reframes the problem. Instead of searching exhaustively, it exploits two structural properties. First, sparsity: relatively few interactions actually drive outputs. Second, low-degreeness: influential interactions typically involve only a small subset of features. Using tools from signal processing and coding theory, SPEX can identify these interactions with dramatically fewer computational probes.
A follow-on algorithm called ProxySPEX adds a third property: hierarchy. When a high-order interaction matters, its lower-order subsets usually matter too. That insight cuts the computational cost by roughly ten times.
The concrete results are striking. On a modified trolley problem where the correct answer was unambiguous—but GPT-4o mini got it right only eight percent of the time—standard attribution methods pointed to individual instances of the word "trolley" as culprits. ProxySPEX found something richer: a high-order interaction between both instances of "trolley," plus the words "pulling" and "lever." When those four words were replaced with synonyms, the failure rate dropped to near zero.
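To see why per-word attribution misses this kind of failure, here's a brute-force toy version of the interaction analysis. The model below is invented: it fails only when all four trigger words survive masking. Computing the full Möbius (interaction) transform over its sixteen word subsets reveals a single nonzero coefficient, the degree-four interaction; SPEX's contribution is recovering such sparse coefficients without this exhaustive enumeration:

```python
# Toy illustration of the sparsity SPEX exploits. We compute the full
# Möbius (interaction) transform of a 4-word model by brute force --
# 2^4 = 16 probes. SPEX recovers the same sparse coefficients with far
# fewer probes via signal-processing machinery. The model is invented.

from itertools import chain, combinations

features = ["trolley#1", "trolley#2", "pulling", "lever"]
n = len(features)

def v(kept):
    """Toy model output: fails (1.0) only when ALL four words survive."""
    return 1.0 if set(kept) == set(range(n)) else 0.0

def subsets(items):
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

# Möbius transform: m(S) = sum over T subset of S of (-1)^{|S|-|T|} v(T).
# A nonzero m(S) means the features in S interact as a group.
mobius = {}
for S in subsets(range(n)):
    m = sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
    if abs(m) > 1e-9:
        mobius[S] = m

# Of 16 possible coefficients, exactly one is nonzero: the degree-4
# interaction among all four words. No single word carries the effect
# on its own -- which is why per-word attribution fingers the wrong
# culprits.
print(mobius)
```

The brute-force cost doubles with every feature, which is exactly why sparsity and low-degreeness matter: they let SPEX find that one coefficient without probing every subset.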
The framework also works for mechanistic interpretability—understanding which attention heads inside a model are responsible for specific behaviors. The finding there connects to LeCun's broader point: early layers in transformers are largely linear, with components contributing independently. Later layers are where interactions dominate—where the complex, emergent behavior actually lives.
Understanding those interactions might be essential for knowing when to trust a model. Which connects back to Willison's point about tests being free. The interpretability research tells you why the model succeeded or failed. The testing infrastructure tells you whether it succeeded or failed. Both are load-bearing in any world where we're running agents with real consequences.
What ties today's threads together is a single underlying tension: capability is outrunning understanding. We can build things faster than we can audit them, flood platforms faster than communities can filter, and deploy models into physical environments before we've mapped what's actually happening inside them. The most interesting work happening right now—from SPEX to world models to careful agentic engineering—is all trying to close that gap.
HN Signal: Hacker News
🌅 Morning Digest — Sunday, March 15, 2026
Good morning! Here's what the tech world was chewing on while you slept.
🔺 Top Signal
[Ageless Linux: An OS that won't ask your age — and won't apologize for it](https://agelesslinux.org/) A new Linux distribution is making a pointed political statement about age verification laws — and it's picking a fight on purpose.
California recently passed a "Digital Age Assurance" law that requires operating systems to include a way for apps to verify a user's age — think of it like a digital ID check baked into your computer. Ageless Linux is a new version of Linux (Linux is a free, open-source operating system — "open source" means the code is publicly available and anyone can read, modify, or distribute it) explicitly built to refuse compliance with this law. They've registered the OS under the legal definitions of the law, then declared they won't follow it anyway. That's not accidental — it's a calculated act of civil disobedience designed to force a legal showdown. The community is split: some see it as heroic open source activism; others think it's "performative" grandstanding without a real legal defense plan. One commenter, nextos, noted that age verification debates have "popped up almost simultaneously in the US, UK, and EU" with "the same logical fallacies" — suggesting coordinated lobbying rather than organic public concern. The conversation is genuinely important: what happens when your computer's operating system becomes a checkpoint for government-mandated ID?
[HN Discussion](https://news.ycombinator.com/item?id=47381791)
[MCP Is Dead; Long Live MCP — A debate about the future of AI tool-wiring](https://chrlschn.dev/blog/2026/03/mcp-is-dead-long-live-mcp/) The AI developer community is in a loud argument about the best way to give AI systems access to outside tools — and one of the web's most respected programmers weighed in.
MCP stands for Model Context Protocol — think of it as a standardized plug adapter that lets AI assistants (like Claude or ChatGPT) connect to external services, like your calendar, a database, or a code tool. The author argues MCP is over-engineered and that simple command-line tools (programs you run by typing commands in a terminal) do the job better and more efficiently. The counterargument: MCP creates a common language that anyone can use without custom setup, which matters especially for non-developers. The reason this matters is that how we wire AI tools to the world will shape what AI agents can actually do — get it wrong, and you either have chaos or a bottleneck. The debate got extra spicy when antirez — the creator of Redis, a widely-used database tool, and something of a legend in tech circles — dropped this gem: "Often times, what is practical for humans to use, it is for LLMs. And the reply is almost never the kind of things MCP exports." In plain English: he thinks MCP exports the wrong kinds of capabilities.
[HN Discussion](https://news.ycombinator.com/item?id=47380270)
[Allow Me to Get to Know You, Mistakes and All — a plea against AI-ghostwritten communication](https://sebi.io/posts/2026-03-14-allow-me-to-get-to-know-you-mistakes-and-all/) A short essay about AI-polished Slack messages hit a nerve — and the comment section became a fascinating confessional.
The author argues that when colleagues use AI to write their internal communications, it erases the quirks, mistakes, and personality that help you actually know someone. The HN community split into two honest camps: those who find AI-polished messages hollow and alienating, and those who admitted they've started doing it — especially in high-stakes environments where one misread message can cost you your job. Commenter anal_reactor was refreshingly candid: "This is mostly an effect of the communication environment we created — taking risks is rarely rewarded, and mistakes can be very costly." Meanwhile, borski offered the most interesting counterpoint: he uses AI to get past the blank page, then edits heavily — so the ideas are his, but the AI breaks the paralysis. This isn't just a vibe debate; it's a real question about whether AI is slowly flattening human voice out of professional life.
[HN Discussion](https://news.ycombinator.com/item?id=47381736)
👀 Worth Your Attention
[How Kernel Anti-Cheats Work](https://s4dbrd.github.io/posts/how-kernel-anti-cheats-work/) A meaty technical explainer on the software that prevents cheating in online games. "Kernel level" means the anti-cheat software runs at the deepest, most privileged level of your computer — the same level as the operating system itself. It can see everything. Commenters are divided: some find this level of access horrifying from a security standpoint (game companies have accidentally shipped malware this way before), while others shrug and say it's the only arms race that works. A fascinating window into the hidden software arms race happening inside your gaming PC. [HN Discussion](https://news.ycombinator.com/item?id=47382791)
[Han: A Programming Language Written in Korean](https://github.com/xodn348/han) Someone built a fully working programming language where all the keywords are written in Korean (Hangul script) instead of English. Most programming languages use English words like `function`, `while`, and `return` — Han replaces those with Korean equivalents. It's a fun experiment, but it also surfaces a real point: millions of programmers worldwide have learned to code in a language (English) that isn't their own. zellyn put it nicely: "It's fun to look at your code samples, have absolutely no clue what any of it means, and think about just how many non-English-speaking programmers must have felt that way." [HN Discussion](https://news.ycombinator.com/item?id=47381382)
[Rack-Mount Hydroponics](https://sa.lj.am/rack-mount-hydroponics/) A delightful project where someone converted a server rack — the kind of metal shelving unit normally used to house computer servers in data centers — into a tiered hydroponic (soil-free, water-based) garden. Complete with automated pumps, grow lights, and cron jobs (scheduled computer tasks) to manage watering. It's wonderfully over-engineered in the best possible way. The comment that "growing my own produce is a great way to appreciate farmers" is funnier in context than it sounds. [HN Discussion](https://news.ycombinator.com/item?id=47384352)
[Bumblebee Queens Can Breathe Underwater for Over a Week](https://www.smithsonianmag.com/science-nature/bumblebee-queens-breathe-underwater-to-survive-drowning-revealing-how-they-can-live-submerged-for-a-week-180988330/) Researchers discovered that queen bumblebees — the ones who hibernate underground over winter — can survive complete submersion in water for more than a week. They're not just holding their breath; they appear to have a genuine mechanism for extracting oxygen from water during hibernation. Nobody knew this before. The discussion veered into whether insects feel pain (genuinely unresolved science), plus some lovely side notes about bumblebees hibernating in bags of autumn leaves. [HN Discussion](https://news.ycombinator.com/item?id=47381011)
[Hostile Volume: A Game About Terrible UI](https://hostilevolume.com/) A browser game where your only goal is to adjust a volume slider to 25% — but each level gives you a deliberately broken, infuriating interface to do it with. Rate-limited knobs, sliders that run away from your mouse, backwards controls. It's funny and quietly educational: several commenters noted they've encountered these exact horrors in real software. Retr0id delivered the most relatable observation: "There are two types of volume slider I've encountered: 'too logarithmic' and 'not logarithmic enough.'" (Logarithmic means the scale is uneven — quiet sounds need more precision than loud ones.) [HN Discussion](https://news.ycombinator.com/item?id=47379712)
💬 Comment Thread of the Day
From the MCP debate — [HN Discussion](https://news.ycombinator.com/item?id=47380270)
The MCP thread is worth reading for one reason: it's the rare technical debate where the community is genuinely split by experience level, not just opinion.
antirez (creator of Redis, a database tool used by millions of websites) cut straight to the chase:

> "As yourself: what kind of tool I would love to have, to accomplish the work I'm asking the LLM agent to do? Often times, what is practical for humans to use, it is for LLMs. And the reply is almost never the kind of things MCP exports."
The key insight here: MCP tends to expose the internals of an API (a way for software programs to talk to each other) rather than high-level, practical operations. It's like giving someone directions by listing every gear shift instead of just saying "turn left at the church."
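The contrast can be made concrete with a toy example. Everything here, the shop data, the function names, is invented to illustrate the two styles of tool surface, not taken from any real MCP server:

```python
# Sketch of antirez's point: the tool surface an agent wants is the
# high-level operation a human would ask for, not the raw plumbing.
# All data and names below are invented for illustration.

ORDERS = {"ord_1": {"customer": "cust_9", "total": 42.0, "status": "paid"}}
LEDGER = []

# Low-level surface: three plumbing calls the agent must compose
# correctly, in the right order, to accomplish one job.
def get_order(order_id): return ORDERS[order_id]
def create_credit(customer, amount): LEDGER.append((customer, amount))
def set_status(order_id, status): ORDERS[order_id]["status"] = status

# High-level surface: the one operation a human would actually want.
def refund_order(order_id):
    order = get_order(order_id)
    create_credit(order["customer"], order["total"])
    set_status(order_id, "refunded")
    return f"refunded {order['total']} to {order['customer']}"

print(refund_order("ord_1"))  # one call, no plumbing to get wrong
```

Expose only the three plumbing functions and the agent must rediscover the correct sequence every time, with a fresh chance to get it wrong; expose `refund_order` and the failure mode disappears.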
Meanwhile, MaxLeiter from Vercel's v0 product pushed back with a genuinely good counterexample: his team uses MCP so that users can connect Stripe or Supabase (common developer tools) to their AI with one click, zero configuration. That's the strongest real-world case for MCP — when the user needs the simplicity, not the developer.
And codemog just wanted everyone to know he called it from day one: "As soon as MCP came out I thought it was over-engineered crud." Reader, he may be right. He may also be the person who said the same about REST APIs in 2000.
✨ One-Liner
Today on Hacker News: an OS that refuses to card you, a game where the volume slider runs away from you, and bees that have been quietly breathing underwater this whole time — and none of us had any idea.