Pure Signal AI Intelligence

Imagine Kepler as a high-temperature language model—trying random relationships for twenty years, some completely wrong, until one works. That reframe, from a conversation between Dwarkesh Patel and Terence Tao, cuts to the heart of today's most important debate. Can AI actually do science? And separately—what happens when you give agents the ability to run science on themselves?

The Verification Problem: Why AI Scientific Discovery Is Harder Than It Looks

Tao and Patel spent two hours pulling apart a deceptively simple question. People assume AI will accelerate scientific discovery because of tight verification loops. But Tao's Kepler story complicates that immediately.

Copernicus's heliocentric model was actually less accurate than the geocentric model it replaced. The better theory survived—not because of data, but because of judgment, narrative, and heuristics we still can't fully articulate. That's a verification loop measured in decades. You can't reinforce-learn your way through it.

Tao's core argument is sharp. AI has driven the cost of idea generation down to nearly zero—the same way the internet drove communication costs to zero. That sounds great. But it shifts the bottleneck. Now we're drowning in hypotheses and starving for verification. Journals are being flooded with AI-generated submissions. Human reviewers are overwhelmed. We can generate a thousand theories a day. We can't evaluate them at that speed.

Here's the number that grounds the hype. Tao has tracked AI progress on the Erdős problems—roughly eleven hundred unsolved combinatorics challenges. About fifty were solved by AI tools over recent months. Then it stopped. Three separate attempts to have frontier models attack every remaining problem simultaneously produced almost nothing. The success rate on any given problem is roughly one to two percent. The wins get amplified on social media. The failures stay quiet. That's selection bias masquerading as a breakthrough.

What AI can do is breadth—and Tao is genuinely excited about that. AI can map an entire field rapidly. It can apply every known technique to every open problem simultaneously. Human experts provide depth. Those two things are complementary, not competing. But—and this is the crucial caveat—we haven't yet redesigned scientific workflows to take advantage of this combination. The paradigm doesn't exist yet.

There's another observation from Tao that's easy to miss. His papers are getting richer and broader because of AI. More plots, more code, deeper literature searches. But the core mathematical insight—the hardest twenty percent of any proof—he still works out on pen and paper. AI has made his papers better. It hasn't yet made them deeper.

The Loopy Era: When Agents Run Science on Themselves

While Tao describes AI's ceiling in formal science, Andrej Karpathy is busy dissolving that ceiling in a different domain. His framing for where we are right now: the loopy era. Agents running continuous self-improvement loops on code and research.

His AutoResearch project makes this concrete. One markdown prompt, roughly six hundred thirty lines of training code, a single graphics processing unit. In two days, it ran seven hundred experiments. It discovered twenty optimizations—including novel architecture tweaks like reordering query-key normalization and rotary position embeddings. No human in the loop. The agent edited the training file, tried ideas, learned from failures, and kept going.
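The loop's shape is simple even where the engineering isn't. Here is a toy sketch of the propose-run-keep pattern, not AutoResearch's actual code; `run_experiment` is a stand-in for a real training run, and the proposed change is a single hyperparameter rather than an edit to a training file:

```python
import random

def run_experiment(lr: float) -> float:
    """Toy stand-in for a training run: returns a 'loss' to minimize.
    (AutoResearch runs real training code here; this is a placeholder.)"""
    return (lr - 0.003) ** 2

def improvement_loop(n_trials: int = 50, seed: int = 0) -> tuple[float, float]:
    """Propose a change, run the experiment, keep it only if the metric improves."""
    rng = random.Random(seed)
    best_lr, best_loss = 0.01, run_experiment(0.01)
    for _ in range(n_trials):
        candidate = best_lr * rng.uniform(0.5, 2.0)  # perturb the current best
        loss = run_experiment(candidate)
        if loss < best_loss:  # keep only improvements, discard the rest
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss

best_lr, best_loss = improvement_loop()
```

The real version replaces the perturbation step with an agent rewriting code and the toy loss with actual training metrics, but the control flow, and the fact that failures are cheap and silently discarded, is the same.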

This is the thing Tao says AI can't do—build cumulative partial progress. Karpathy is claiming it's starting to happen, at least in code. The agent isn't just jumping and failing. It's staying on the handhold and pulling.

Karpathy describes a parallel shift in how he works. Coding agents—Claude Code, OpenAI Codex—crossed what he calls a coherence threshold around December of last year. He now runs grids of agents in parallel using tmux—a terminal multiplexer that lets you manage multiple sessions simultaneously. He's building watcher scripts to keep them looping. His next project is an agent command center—because the single-file IDE is dead, and the new unit is teams of agents.

The vocabulary shift he's proposing is useful. Vibe coding—where you describe what you want and get working software—was twenty twenty-five. Agentic engineering is twenty twenty-six. Humans no longer write most code. We direct, supervise, and orchestrate. Technical expertise still matters—it's a multiplier on what you can get out of the agents. But the bits humans contribute are sparse and rare.

Simon Willison offers a vivid illustration of where this is right now. He took the Turbo Pascal three-point-oh-two executable—a thirty-nine-thousand-byte file from nineteen eighty-five that somehow contained a full text editor and Pascal compiler—and asked Claude to decompile it from the raw binary. Claude did it. Then Simon had it build an interactive artifact annotating the assembly, showing which bytes correspond to which parts of the application, with reconstructed readable code. A forty-year-old binary, fully disassembled, in a single conversation. That's not agentic engineering. That's just a very good tool. But it shows the floor rising.

Who Owns the Agent Layer

The third thread running through today's content is a strategic question. As agents become real infrastructure, who captures the value?

David Singleton—former Stripe CTO, now building Dreamer with Hugo Barra of Android fame—has a clear thesis. Agents need an operating system. His platform's core insight is that the sidekick, the personal AI at the center of Dreamer, acts like a kernel. Agents and apps are like user-space processes. They don't talk to each other directly—they route through the sidekick, which enforces permissions and aligns actions with user interests.

This matters for trust at scale. Without that kernel layer, Singleton argues, you get vibe-coded apps that grab your data indiscriminately and can't collaborate. The OS abstraction is what makes agents safe enough for consumers who don't want to think about permissions.

Karpathy frames the same dynamic from a different angle. Applications like Cursor bundle and orchestrate multiple model calls into increasingly complex directed acyclic graphs—networks of model calls with dependencies between them. They do context engineering, provide a vertical-specific interface, and offer what Karpathy calls an autonomy slider. He suspects model labs will produce generally capable graduates. But LLM apps will organize those graduates into deployed professionals in specific verticals—by supplying private data, sensors, actuators, and feedback loops.
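A minimal sketch of what such a graph looks like under the hood, assuming a hypothetical `call_model` in place of a real LLM API and made-up node names; Python's standard-library `graphlib` handles the dependency ordering:

```python
from graphlib import TopologicalSorter

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    return f"response({prompt})"

# Each node is one model call; its list names the calls whose outputs it needs.
dag = {
    "extract":   [],                        # reads the user's input
    "summarize": ["extract"],               # consumes extract's output
    "critique":  ["extract"],               # independent of summarize
    "merge":     ["summarize", "critique"],  # combines both branches
}

def run_dag(dag: dict[str, list[str]]) -> dict[str, str]:
    """Execute every node in dependency order, feeding outputs forward."""
    results: dict[str, str] = {}
    for node in TopologicalSorter(dag).static_order():  # predecessors first
        inputs = ", ".join(results[dep] for dep in dag[node])
        results[node] = call_model(f"{node}: {inputs}")
    return results
```

The context engineering and autonomy slider live in how each prompt is assembled and how often a human inspects intermediate `results`; the graph itself is just this.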

Ben Thompson, in a brief aside, calls agents the reason he no longer believes we're in an AI bubble. The demand they create for compute changes the shape of everything. And he flags an interesting crack in a thesis he's held for a while. He previously argued OpenAI and Anthropic are sustainably differentiated because they integrate the model with the harness—the scaffolding around the model. But he thinks the emergence of OpenClaw—an open-source alternative to frontier model tooling—is evidence that integration might not be as defensible as he thought.

The open model ecosystem is asserting itself in another way too. Cursor's Composer Two, a coding assistant, was built on top of Kimi K-two-point-five—a model from Chinese lab Moonshot. The foundation model was open. Cursor did continued pretraining and high-compute reinforcement learning on top of it. Inference was handled by Fireworks AI. Three different organizations, one product. That's what the open model stack looks like when it works.


The thread connecting all of this: we're in a moment where the tools are clearly working—agents running experiments, decompiling binaries, building conference apps in twenty-five minutes. But Tao's caution is worth sitting with. Breadth at scale is genuinely new and genuinely powerful. Depth—the cumulative, partial-progress, narrative-building kind of understanding that let Kepler survive twenty years of being mostly wrong—that's still a human thing. The most interesting work right now is figuring out how to wire those two things together.

HN Signal: Hacker News

☀️ Hacker News Morning Digest — March 21, 2026

Good morning! Today's feed has something for everyone: AI coding tools, a military privacy blunder, and Microsoft making promises it may or may not keep. Grab your coffee — let's get into it.


🔺 Top Signal

OpenCode: A Free, Open-Source Alternative to Claude Code Is Taking Off

The AI coding tool space just got more crowded — and more interesting.

OpenCode is an open-source (meaning anyone can see, use, or modify the code for free) AI coding assistant that works like tools you may have heard of — GitHub Copilot or Anthropic's Claude Code — but lets you plug in any AI model you want, including free and locally-run ones. Think of it like a universal remote for AI coding helpers. The community is buzzing because Anthropic recently blocked OpenCode from using Claude subscriptions directly, but users have already found workarounds. What's notable is that people describe OpenCode's codebase (the code behind the tool itself) as exceptionally well-organized and educational — a rarity in fast-moving AI tooling. The vibe in the comments is less "this will replace everything" and more "this is genuinely useful today."

[HN Discussion](https://news.ycombinator.com/item?id=47460525)


France's Aircraft Carrier Was Being Tracked in Real Time — by a Fitness App

This is the third time in recent memory a fitness tracker has accidentally revealed military secrets. You'd think they'd have figured this out by now.

The French newspaper Le Monde discovered that sailors aboard France's aircraft carrier were using Strava — a popular GPS workout tracking app — while the ship was at sea. Because Strava uploads your run routes to a public map, anyone could watch the ship move across the Mediterranean in near-real time. This echoes a famous 2018 incident where US military bases in Afghanistan and Syria were accidentally mapped out because soldiers were jogging and their Strava heatmap lit up in the middle of the desert. The HN discussion raises a fair point: an aircraft carrier isn't exactly a stealth vehicle — you can see it from shore with the naked eye — but the principle matters. Personal devices on military networks are a persistent, messy security problem, and the "human firewall" keeps having holes in it. Commenter jandrewrogers noted this happens across all militaries, and it's often naïveté plus unwillingness to be inconvenienced rather than malice.

[HN Discussion](https://news.ycombinator.com/item?id=47453942)


Microsoft Says It's Serious About Windows Quality This Time

943 comments. The people have feelings.

Microsoft published a blog post promising to fix long-standing Windows frustrations: reducing unwanted AI "Copilot" pop-ups in apps like Notepad and Snipping Tool, fixing Explorer stuttering and lag, adding taskbar customization (like moving it to the top or side of your screen — something Windows 7 could do and Windows 11 quietly broke), and improving update reliability. The reaction on HN is... skeptical. Commenter gzread summed up a lot of people's feelings in four words: "Listen to their actions, not their words." Others are more hopeful — Someone1234 noted that it took Apple's recent hardware momentum (apparently a product called "the Neo" has people's attention) to finally make Microsoft listen. The announcement is real and contains specific promises, which is better than vague platitudes. Whether they follow through is the whole question.

[HN Discussion](https://news.ycombinator.com/item?id=47459296)


👀 Worth Your Attention

The Los Angeles Aqueduct Is Wild

An engineering blog digs into how Los Angeles — a desert city of millions — gets its water from rivers hundreds of miles away via a gravity-powered open canal. This isn't ancient history: the aqueduct was built in 1913, involved some genuine skullduggery (look up the California Water Wars), and the system is still running today. Commenter strongpigeon wonders aloud whether the US has lost its appetite for big infrastructure projects like this — an observation that cuts deeper than it might seem.

[HN Discussion](https://news.ycombinator.com/item?id=47416543)


We Rewrote Our Rust Parser in TypeScript — and It Got Faster

Rust is a programming language beloved for being extremely fast. TypeScript is a language used mostly for web apps — not typically known for raw speed. So when a team rewrote a Rust component in TypeScript and it ran faster, eyebrows went up. The real story, though, is in the comments: commenter blundergoat points out the actual win wasn't the language switch at all — it was fixing a hidden inefficiency (an algorithm that was doing way more work than needed). The lesson? Rewriting code forces you to rethink it, and that's where the speed comes from.

[HN Discussion](https://news.ycombinator.com/item?id=47461094)


Attention Residuals: A Possible Efficiency Boost for AI Models

A research paper (with a surprising co-author) proposes a tweak to how large language models (the technology behind ChatGPT, Claude, etc.) process information layer by layer. The change could let models perform better with less computing power, which matters both for training new models and running them on everyday hardware. Commenter jjcm calls out what may be the headline buried in the paper: significantly lower memory bandwidth requirements for inference — meaning AI models might run better on consumer hardware. Not confirmed yet at scale, but worth watching.

[HN Discussion](https://news.ycombinator.com/item?id=47458595)


Ghostling: A Minimal Terminal Built on the Ghostty Library

Ghostty is a fast, well-regarded terminal emulator (a program that gives you a text-based command-line interface on your computer). Ghostling is a tiny demo app that shows off Ghostty's underlying library being used to build something new — in this case, a stripped-down terminal with no tabs or windows. The interesting subplot in the comments is a technique the author used to bundle a font file directly into the code as raw bytes — which sparked a nerdy but genuinely fun debate about the "right" way to embed files in programs.

[HN Discussion](https://news.ycombinator.com/item?id=47461378)
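For the curious, the embedding trick itself is old and language-agnostic: read the asset's raw bytes and emit them as a literal in source code (Zig, Ghostty's language, has a `@embedFile` builtin for this). A minimal Python sketch of the same idea, with hypothetical names:

```python
from pathlib import Path

def bytes_to_module(data: bytes, name: str = "DATA") -> str:
    """Render raw bytes as Python source: a one-line module holding a bytes
    literal, so an asset (e.g. a font file) ships inside the code itself."""
    return f"{name} = {data!r}\n"

def embed_file(src: Path, dest: Path) -> None:
    """Read an asset file and write it back out as an importable module."""
    dest.write_text(bytes_to_module(src.read_bytes()))

# Usage: embed_file(Path("font.ttf"), Path("font_data.py")), after which
# `from font_data import DATA` yields the original bytes with no file I/O.
```

The debate in the comments is essentially about when this is elegant (single-binary distribution, no missing-asset errors) versus when it bloats the source and the build.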


A Glossary of Chopstick Faux Pas (Japanese Etiquette)

A more lighthearted entry: a Japanese cultural site lists ~40 named chopstick mistakes, each with its own formal term. Commenters debate which ones are genuine social landmines versus technical violations nobody actually enforces. waffletower, who lived in Japan for six years, recommends ignoring most of them and focusing on the two or three that carry real symbolic weight — particularly anything that evokes funeral rituals.

[HN Discussion](https://news.ycombinator.com/item?id=47460452)


💬 Comment Thread of the Day

From: France's Aircraft Carrier / Strava story

The thread on this story is full of gems, but commenter ck2 wrote what might be the funniest and most insightful observation of the day:

> "What's funny is I can imagine the sailor not understanding how the code works and properly setting up a 'privacy zone' while at port to mask his location and verifying it was working while there — then of course while at sea, it's the same ship but different location. Not like your home or workplace typically relocates itself."

This is perfect. Strava's privacy zone feature masks your location within a radius of your home or regular starting point — which works great when you live in an apartment. It does not work when your "home" is a 261-meter aircraft carrier that moves across the ocean. The sailor almost certainly thought they'd done everything right. It's a reminder that security advice written for civilians often fails in military contexts in ways that aren't obvious until it's too late.

Commenter helsinkiandrew added a delightful postscript: at the carrier's cruising speed of 27 knots, if you ran on the deck, your Strava pace would show roughly 1:10 min/km depending on which direction you were running. "That would really screw up your stats."

[HN Discussion](https://news.ycombinator.com/item?id=47453942)
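That figure is easy to sanity-check: one knot is exactly 1.852 km/h, so 27 knots is about 50 km/h, which works out to roughly 1:12 min/km before the runner's own speed is added with or against the ship's heading (hence the spread around 1:10). A quick back-of-envelope version, with a hypothetical helper name:

```python
def knots_to_pace_min_per_km(speed_knots: float) -> tuple[int, float]:
    """Convert a speed in knots to a running pace in (minutes, seconds) per km."""
    kmh = speed_knots * 1.852           # 1 knot = 1.852 km/h, by definition
    minutes_per_km = 60.0 / kmh         # invert speed to get pace
    minutes = int(minutes_per_km)
    seconds = (minutes_per_km - minutes) * 60.0
    return minutes, seconds

# Ship alone at 27 knots: about 1:12 min/km before the runner even moves.
m, s = knots_to_pace_min_per_km(27)
```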


🎲 One-Liner

Someone built a Bluesky social media client in Fortran — a programming language from 1957 — and the HN comments are treating it with the same reverence one might give a Viking longship that somehow docked at a modern marina. [Check it out](https://news.ycombinator.com/item?id=47461321).