Pure Signal: AI Intelligence
Ninety percent reliability sounds impressive. Andrej Karpathy says it's basically nothing. Today, his framing of the "March of Nines" is rippling through the AI community—and it connects to a bigger story about where coding agents actually stand right now.
The Reliability Trap: Why Nine-Tenths Isn't Even the Starting Line
Here's the core insight. Karpathy describes what he calls the March of Nines: each step toward production-grade AI requires not just incremental improvement but an order-of-magnitude reduction in the failures that remain. You hit ninety percent reliability and think you're close. You're not. You've reached the first nine. Then you need ninety-nine. Then ninety-nine point nine. Each nine is harder than the last.
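To make the arithmetic concrete, here's a minimal back-of-the-envelope sketch in Python (the 10,000-run count is an illustrative assumption, not a figure from Karpathy):

```python
# Back-of-the-envelope arithmetic: expected failures at each "nine" of
# reliability, over a hypothetical 10,000 agent runs (illustrative only).
runs = 10_000
for nines, reliability in enumerate([0.90, 0.99, 0.999, 0.9999], start=1):
    failures = runs * (1 - reliability)
    print(f"{nines} nine(s), {reliability:.2%} reliable: ~{failures:.0f} expected failures")
```

Each added nine wipes out ninety percent of the failures that remain; the work grows even as the visible improvement shrinks.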
This isn't abstract. A "vibe-coded" operating system—built almost entirely through AI prompting—shipped this week and promptly collapsed under the weight of its own bugs. It's a vivid illustration of the gap between demo-worthy and deployable. The demo worked most of the time. Most of the time is nowhere near enough.
The deeper point Karpathy is making isn't pessimistic. It's clarifying. The teams that understand the March of Nines will build the infrastructure to get to four or five nines. The teams that mistake a strong demo for a finished product will ship fragile systems—and wonder why users don't trust them.
The December Inflection: When Coding Agents Actually Started Working
Set the reliability question aside for a moment. Karpathy is making a second claim that deserves equal attention. He says agentic coding systems—AI tools that autonomously complete complex development tasks, not just autocomplete suggestions—basically didn't work before December. And basically work now.
That's a striking line. Not gradually improved. Not incrementally better. Crossed a threshold.
He describes the change as "extremely disruptive to the default programming workflow"—and uses the phrase "this is what post-AGI—artificial general intelligence—feels like" to describe the current moment. That's Karpathy, not a hype blogger. Someone who has been precise about capability claims for years is reaching for language that would have sounded irresponsible eighteen months ago.
Here's what connects to Simon Willison's observation this week. OpenAI just launched Codex access for open source maintainers—six months of premium tool access for projects above meaningful size thresholds, mirroring what Anthropic offered with Claude Max last month. Both labs moving simultaneously to seed their most capable coding tools into the open source ecosystem—at the exact moment Karpathy says those tools crossed a usability threshold—is worth paying attention to. The bet is clear: whoever becomes the default tool for serious developers in this window sets the trajectory.
The Military AI Fault Line
Meanwhile, a different kind of inflection is playing out in Washington and Silicon Valley simultaneously.
Following the Pentagon's decision to blacklist Anthropic—designating the company a "supply chain risk" after it refused to allow its technology for mass surveillance or fully autonomous weapons—hundreds of employees at Google and OpenAI have signed open letters expressing solidarity. One letter, titled "We Will Not Be Divided," grew from a few hundred names to nearly nine hundred within a weekend. Almost a hundred signatories came from OpenAI. Close to eight hundred from Google.
The letter's framing is strategically sharp. It argues the Pentagon is using a divide-and-conquer approach—betting that each lab will fear the others will capitulate first. By creating public solidarity across companies, the letter attempts to collapse that game theory.
What makes this substantive rather than just symbolic: Google is reportedly in active negotiations with the Pentagon about deploying its Gemini model on classified systems. The company has stayed conspicuously silent on all of this. The internal pressure from its own employees—over a hundred AI workers signed a separate internal letter to management—makes that silence increasingly difficult to hold.
The underlying question isn't whether AI should have any military applications. It's about which applications, with what constraints, negotiated how. Anthropic drew a line at autonomous weapons and mass surveillance. The Department of Defense called that line a supply chain risk. That's the fault line—and it's not going away.
Tying It Together
Three distinct conversations this week—but one through-line. Reliability gaps, agentic capabilities, and ethical constraints are all forcing the same underlying reckoning. AI systems are becoming powerful enough to matter in production, in development workflows, and in geopolitical decisions. The margin for hand-waving is shrinking. The stakes for getting it right are rising. The March of Nines applies to trust, not just software reliability.
HN Signal: Hacker News
🌅 Hacker News Morning Digest — Sunday, March 8, 2026
Good morning! Here's what the tech community was buzzing about while you slept.
🔥 Top Signal
[A Decade of Docker: The Container That Shipped the World](https://cacm.acm.org/research/a-decade-of-docker-containers/) Ten years ago, a tool changed how software gets built, shipped, and run — and it's still everywhere.
If you've ever heard someone say "it works on my machine," Docker is the punchline to that joke. A container is basically a lightweight, self-contained package that bundles an app and everything it needs to run — so it behaves identically whether it's on your laptop or a server halfway around the world. This academic retrospective celebrates Docker's first decade and digs into its surprisingly scrappy origin story. One standout detail: Docker's team once hijacked a 1990s dial-up tool originally designed for Palm Pilots — yes, the little handheld gadgets from before smartphones — to sneak container network traffic past corporate security software that kept blocking them. The community is largely celebratory, though some commenters argue containers are a clever workaround for deeper problems that were never actually solved. User `talkvoix` put it best: "We took the 'it works on my machine' excuse and turned it into the industry standard architecture — 'then we'll just ship your machine to production.'"
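If you've never poked at a container directly, here's a minimal sketch of running one from Python using the Docker SDK (assumes Docker is installed locally and the `docker` package is available via `pip install docker`; the image and command are just illustrative):

```python
# Minimal illustration of the container idea: a self-contained packaged
# environment that behaves the same on a laptop or a server.
import docker

client = docker.from_env()                 # connect to the local Docker daemon
output = client.containers.run(
    "python:3.12-slim",                    # image bundling OS libraries plus Python
    ["python", "-c", "print('same result everywhere')"],
    remove=True,                           # clean up the container once it exits
)
print(output.decode())
```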
[HN Discussion](https://news.ycombinator.com/item?id=47289311)
[LLM Writing Tropes: A Catalog of AI's Most Annoying Writing Habits](https://tropes.fyi/tropes-md) Ever notice that AI-written text sounds... a certain way? Turns out there's a whole taxonomy for it.
LLMs (Large Language Models — the AI systems behind ChatGPT, Claude, and friends) have developed remarkably consistent verbal tics. This site catalogs them: the dramatic "It's not X — it's Y" reframe. The word "tapestry." Tricolons (three things in a row, always three, never two or four). Unnecessary em dashes — just like this. The community is having a field day, partly because many commenters admit to catching themselves using the same patterns. There's a deeper discomfort underneath the laughs: as AI-generated text floods the internet, our collective sense of what "authentic" writing sounds like is shifting. The meta-irony of a tech audience on Hacker News discussing AI writing tells while potentially using AI-assisted replies is not lost on anyone.
[HN Discussion](https://news.ycombinator.com/item?id=47291513)
[Cloud VM Benchmarks 2026: Who's Actually Fast, and Who's Cheap?](https://devblog.ecuadors.net/cloud-vm-benchmarks-2026-performance-price-1i1m.html) A thorough real-world test of cloud computing performance and value — and the results might surprise you.
A VM (Virtual Machine) is a rented slice of a computer in someone else's data center — it's how most websites and apps run without the company owning physical servers. This comprehensive benchmark (a standardized speed test) pits the giants — AWS, Google Cloud, Azure — against scrappier alternatives. The headline finding: Hetzner, a European provider you may never have heard of, consistently offers the best performance-per-dollar. Meanwhile, Oracle Cloud — yes, the notoriously aggressive enterprise software company — keeps landing near the top of the value charts, which makes everyone nervous given Oracle's reputation for locking customers in with surprise fees. The comment from `preserves`, who disclosed they work on Google's VM team, is a rare moment of industry candor: they praised the writeup while noting AMD's latest chips ("Turin") are genuinely impressive.
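As a rough illustration of how a performance-per-dollar comparison works, here's a toy calculation (the scores and hourly prices below are invented placeholders purely to show the arithmetic, not the article's results):

```python
# Toy performance-per-dollar calculation with made-up placeholder numbers.
offerings = {
    "provider_a": {"benchmark_score": 1000, "usd_per_hour": 0.20},
    "provider_b": {"benchmark_score": 1400, "usd_per_hour": 0.35},
}
for name, o in offerings.items():
    value = o["benchmark_score"] / o["usd_per_hour"]  # higher is better
    print(f"{name}: {value:,.0f} score points per dollar-hour")
```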
[HN Discussion](https://news.ycombinator.com/item?id=47293119)
👀 Worth Your Attention
[The Yoghurt Delivery Women of Japan — and What They're Really Delivering](https://www.bbc.com/travel/article/20260302-the-yoghurt-delivery-women-combatting-loneliness-in-japan) Japan's Yakult Ladies have been doing door-to-door probiotic drink delivery for decades — but their real product, apparently, is human connection for isolated elderly customers. The HN crowd is divided: several commenters quickly flagged this as a "submarine ad" (industry slang for content that looks like journalism but is actually sponsored marketing), while others noted France's postal service runs a similar paid check-in program for elderly parents. Worth a skim for the genuine social question underneath the PR gloss: what happens to community in a hyper-automated, aging society?
[HN Discussion](https://news.ycombinator.com/item?id=47287344)
[FLASH Radiotherapy: Blasting Cancer in Milliseconds](https://spectrum.ieee.org/flash-radiotherapy) FLASH radiotherapy is a new cancer treatment approach that delivers radiation doses hundreds of times faster than conventional methods — in fractions of a second rather than minutes. The theory is that healthy cells and cancer cells process the resulting stress differently at ultra-high speeds, potentially sparing healthy tissue. It's genuinely promising science, though the HN discussion gets a little dark: multiple commenters noted that the commercial product is named "Theryq," which sounds uncomfortably close to "Therac-25," a notorious 1980s radiation machine that fatally overdosed several patients due to a software bug. A radiation oncologist's spouse weighed in to add useful skepticism about the long-term tissue-sparing questions that remain unanswered.
[HN Discussion](https://news.ycombinator.com/item?id=47288533)
[PyPy, the Faster Python, May Be Running Out of Steam](https://github.com/astral-sh/uv/pull/17643) Quick context: Python is the programming language; PyPy is an alternative version of it that runs programs significantly faster — sometimes 5x or more. CPython is the standard "official" Python most people use. A popular Python tool called `uv` (think of it as a fast package manager — software that helps you install other software) just added a warning label saying PyPy may be "unmaintained." That word choice sparked debate, and a PyPy core developer, `cfbolztereick`, showed up in the thread to clarify: "PyPy isn't unmaintained. We are certainly fixing bugs... however, the remaining core devs don't have the capacity to keep up with CPython." It's a delicate, honest admission about the struggles of keeping a volunteer open-source project alive when the official alternative keeps moving forward.
[HN Discussion](https://news.ycombinator.com/item?id=47293415)
[Karpathy's Autoresearch: Teaching AI to Run Its Own Experiments](https://github.com/karpathy/autoresearch) Andrej Karpathy — one of the most respected researchers in AI, formerly at Tesla and OpenAI — released a side project where AI agents autonomously run machine learning experiments on a single consumer GPU (the chip typically used for gaming, now also used for AI). Think of it as handing an AI the role of a junior research team with a lab notebook, then watching it try to squeeze better results out of a small model. The community reaction is a mix of impressed and skeptical: commenter `kubb` dryly noted it's "burning Claude tokens to slightly improve a tiny LLM," while others see it as a template for future AI-automated science. The most charming detail: when the AI runs out of ideas, it just tries changing the random seed.
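The repository is the place to look for the real thing, but as a purely hypothetical illustration of what an "agent runs its own experiments" loop can look like, a sketch might be (every name below is invented for illustration and is not Karpathy's API; only the seed-rerolling fallback echoes the detail mentioned above):

```python
# Hypothetical sketch of an autonomous experiment loop; not code from the repo.
import random

CANDIDATE_LRS = [1e-3, 3e-4, 1e-4]

def propose_change(history):
    """Pick the next experiment: try an untried learning rate, else reroll the seed."""
    tried = {h["lr"] for h in history}
    untried = [lr for lr in CANDIDATE_LRS if lr not in tried]
    if untried:
        return {"lr": untried[0], "seed": 0}
    # "Out of ideas" fallback: keep the config, just change the random seed.
    return {"lr": history[-1]["lr"], "seed": random.randint(0, 10_000)}

def train_and_eval(config):
    """Stand-in for a real training run; returns a fake validation score."""
    random.seed(config["seed"])
    return random.random() + config["lr"]  # placeholder metric, not a real model

history = []
for step in range(5):
    config = propose_change(history)
    history.append({**config, "score": train_and_eval(config)})
    print(step, history[-1])
```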
[HN Discussion](https://news.ycombinator.com/item?id=47291123)
[CasNum: A Number That Is Also a Computer](https://github.com/0x0mer/CasNum) This is a delightful rabbit hole. CasNum is a project where a single astronomical number encodes a working computer — using ancient Greek geometric math principles (compass-and-straightedge constructions) as the computational engine. It can't run Doom. It can barely do arithmetic. But the README alone is worth five minutes of your morning, including this gem in the FAQ: "Q: Why did you make this? A: I wanted arbitrary precision arithmetic, but I also wanted to feel something." In a week full of AI hype, it's a refreshing reminder that some people just build weird beautiful things for the joy of it.
[HN Discussion](https://news.ycombinator.com/item?id=47291292)
💬 Comment Thread of the Day
From the LLM Writing Tropes discussion — this thread deserves its own slot in your morning.
Researcher `capnrefsmmat` dropped a genuine data point that stopped the thread cold:
> "I work on research studying LLM writing styles... This is the first one I noticed that mentions 'tapestry', which we found is GPT-4o's second-most-overused word (after 'camaraderie', for some reason)."
"Camaraderie." GPT-4o's most overused word is camaraderie. Nobody knows why. The researchers themselves don't know why. It just... loves that word. The thread then cascades into people cataloging their own discoveries: Gemini apparently likes to say "I'll shoot straight with you" before refusing a request; Claude has a habit of calling things "genuine" and "real" when it's trying to sound earnest.
Why does this matter? Because as AI-generated text becomes ubiquitous — in blog posts, emails, pull request descriptions, news articles — these verbal fingerprints become the way attentive readers will distinguish human writing from machine writing. At least until the models get better at hiding them. The discussion is part linguistics class, part detective game, and entirely worth your time.
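Spotting these fingerprints is, at bottom, just counting. A toy sketch of what that looks like (the pattern list, sample text, and counts are arbitrary choices for illustration, not the site's methodology):

```python
# Toy trope counter: tally a few of the tells the catalog calls out.
import re

TROPE_PATTERNS = {
    "tapestry": r"\btapestry\b",
    "camaraderie": r"\bcamaraderie\b",
    "not just X": r"\bnot just\b|\bisn't just\b",
    "em dash": r"—",
}

def trope_counts(text):
    lowered = text.lower()
    return {name: len(re.findall(pattern, lowered)) for name, pattern in TROPE_PATTERNS.items()}

sample = "It's not just a feature — it's a rich tapestry of camaraderie."
print(trope_counts(sample))  # e.g. {'tapestry': 1, 'camaraderie': 1, ...}
```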
[HN Discussion](https://news.ycombinator.com/item?id=47291513)
✨ One-Liner
Today's Hacker News was a mirror held up to the internet: a story about AI writing tics, a story about AI running its own experiments, and a story about Docker turning ten — the very technology that made it possible to deploy all those AI models in the first place. The ouroboros is containerized and shipping to production.