Pure Signal AI Intelligence

Three things collided today. A landmark AI governance lawsuit. The first real demonstration of recursive self-improvement. And a $99-per-user bet that enterprise AI is about to consolidate fast. Let's get into it.

The Anthropic-Pentagon Fault Line

Anthropic filed two lawsuits against the Department of Defense overnight. And within hours, thirty-plus employees from OpenAI and Google—including Jeff Dean, Google's chief scientist and lead of the Gemini program—filed an amicus brief in support.

Here's the backstory. The Pentagon labeled Anthropic a "supply chain risk"—a designation normally reserved for foreign adversaries—after Anthropic refused two specific demands. They wouldn't allow Claude to be used for mass domestic surveillance. And they wouldn't allow it for fully autonomous lethal weapons systems—that is, AI that can kill without a human in the loop.

The Defense Department's position was that it should be able to use AI for any "lawful purpose." Anthropic drew a line. Negotiations collapsed. The blacklist followed.

The amicus brief makes a pointed argument. It says the Pentagon could have simply cancelled the contract if it didn't like the terms. Instead, the government branded a domestic company a national security threat. The brief warns this "chills professional debate on the benefits and risks of frontier AI systems"—and threatens US competitiveness.

Here's what makes this extraordinary. Jeff Dean signing onto a brief defending a direct competitor. Sam Altman—whose company quickly signed its own military contract after Anthropic's fell apart—publicly calling the designation "very bad for our industry." Even rivals agree the precedent is dangerous.

The core question this case will decide: can the US government blacklist a domestic company for taking a position on AI safety? Every lab in the industry will be watching.

Machines Improving Machines

While that drama played out in courtrooms, something quietly significant happened in a GitHub repo.

Andrej Karpathy released a tool called autoresearch. The concept is deceptively simple. An AI agent reads its own training codebase, forms a hypothesis for improvement—maybe change a learning rate, maybe adjust an architecture depth—modifies the code, runs the experiment, and evaluates the result. Then repeats. Overnight. Without a human watching.
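To make the shape of that loop concrete, here is a deliberately toy sketch. The real tool edits actual training code and runs real experiments; in this stand-in, the "codebase" is just a dictionary of hyperparameters and the "experiment" a synthetic score, both invented for illustration.

```python
import random

# Toy stand-in for an autonomous propose -> test -> keep/revert loop.
# Nothing here comes from autoresearch itself; the "experiment" is a
# synthetic objective that happens to peak at lr=3e-4, depth=12.

def run_experiment(params):
    return -((params["lr"] - 3e-4) ** 2) * 1e6 - (params["depth"] - 12) ** 2

def propose_change(params):
    # Form a hypothesis: nudge one hyperparameter at random.
    candidate = dict(params)
    if random.random() < 0.5:
        candidate["lr"] *= random.choice([0.8, 1.25])
    else:
        candidate["depth"] += random.choice([-1, 1])
    return candidate

params = {"lr": 1e-3, "depth": 8}
best = run_experiment(params)
for step in range(700):          # roughly the number of overnight changes reported
    candidate = propose_change(params)
    score = run_experiment(candidate)
    if score > best:             # keep improvements, discard the rest
        params, best = candidate, score

print(params, best)
```

The interesting part is not the loop itself, which is ordinary hill-climbing, but that the agent proposes changes to real code and evaluates them with real training runs, unattended.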

The results matter. After roughly seven hundred autonomous changes on a small language model training run, the agent improved training efficiency by about eleven percent. Not massive. But—and this is the key—it discovered changes that transferred from a smaller to a larger model configuration. The agent wasn't just tuning. It was finding generalizable improvements.

Jack Clark's Import AI newsletter frames this precisely. He cites researchers at Oxford and GovAI who just published fourteen metrics for tracking—quote—"AI R&D Automation." The idea: recursive self-improvement isn't a single event. It's a gradient. And we need to measure where we are on it before we reach the point where we can't easily see it anymore.

One of those metrics is particularly striking: how well human teams can actually supervise AI systems that are building other AI systems. That problem gets harder as the volume of AI-generated work scales faster than human reviewers can follow.

Meanwhile, Ajeya Cotra—a longtime AI forecaster—published an update saying her January predictions for 2026 are already too conservative. She had predicted AI agents would reach a twenty-four-hour task horizon by end of year. Current benchmarks have Opus 4.6 at twelve hours—already halfway there in March. Her updated guess: by December, agents could sustain over a hundred hours of autonomous work on software tasks. At that point, she writes, the concept of a "time horizon" may start to break down entirely.

ByteDance adds a concrete data point here. They fine-tuned their Seed 1.6 model—a mixture-of-experts—specifically for writing CUDA kernels—the low-level GPU code that determines how fast AI models actually run. The resulting agent hit a hundred percent success rate on two benchmark tiers, and ninety-two percent on the hardest tier. That's roughly forty percent better than Claude Opus 4.5 on that hardest tier.

The implication: AI is now being used to write the GPU code that will train the next generation of AI. The loop is closing.

Microsoft Absorbs the Competition

Ben Thompson at Stratechery has a sharp read on the Microsoft-Anthropic Copilot Cowork announcement. Microsoft's classic move is to commoditize its complements—make everything around its core products cheap or free to strengthen its own position. But Thompson notes Anthropic has a point of integration strong enough that Microsoft chose to build on top of it rather than around it.

The product itself is notable. Copilot Cowork runs in the cloud—not desktop-only like Claude's native agent. It pulls from emails, meetings, files, and calendars across Microsoft 365's entire ecosystem. Users describe an outcome. Cowork breaks it into steps and produces deliverables across apps. It launches inside a new ninety-nine dollar per user enterprise tier called E7.

The strategic read: wrapping Anthropic's agent technology inside M365's security and compliance layers gives this product something Claude alone can't easily match. Deep, integrated enterprise context across four hundred fifty million users' worth of organizational data. Anthropic gets distribution. Microsoft gets capability it didn't have to build.

Inference at Planetary Scale

The latest Latent Space podcast goes deep inside NVIDIA's Dynamo—a data-center-scale inference framework—with two engineers who live at the intersection of GPU hardware and AI deployment.

A few insights worth extracting.

The first is about a fundamental architectural shift called prefill-decode disaggregation. Historically, inference engines ping-pong between two phases: prefill—reading the input and building a cache of key-value vectors that represent the sequence—and decode—generating new tokens using that cache. Kyle Kranen, one of Dynamo's lead architects, explains that separating these phases onto different hardware pools unlocks real efficiency gains. Prefill is compute-bound. Decode is memory-bound. Mixing them on the same hardware means each phase constantly blocks the other. Disaggregating them lets you scale each independently based on actual workload.
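As a rough sketch of that split (plain Python queues standing in for real GPU pools and KV-cache transfer; none of the names are taken from Dynamo itself), the two phases look something like this:

```python
from dataclasses import dataclass, field
from queue import Queue

# Conceptual sketch of prefill/decode disaggregation; this is not the Dynamo
# API, just an illustration of why the two phases can live on separate pools.

@dataclass
class Request:
    prompt: list[str]
    kv_cache: list = field(default_factory=list)   # built during prefill
    output: list = field(default_factory=list)     # grown during decode

prefill_queue: Queue = Queue()   # served by compute-heavy workers
decode_queue: Queue = Queue()    # served by memory-heavy workers

def prefill_worker():
    # Compute-bound: read the whole prompt at once and build the KV cache.
    req = prefill_queue.get()
    req.kv_cache = [f"kv({tok})" for tok in req.prompt]
    decode_queue.put(req)        # hand off; a real system transfers the cache

def decode_worker(max_new_tokens=3):
    # Memory-bound: generate one token at a time against the cached context.
    req = decode_queue.get()
    for i in range(max_new_tokens):
        req.output.append(f"tok{i}")
        req.kv_cache.append(f"kv(tok{i})")
    return req

prefill_queue.put(Request(prompt=["the", "quick", "brown", "fox"]))
prefill_worker()
print(decode_worker().output)    # ['tok0', 'tok1', 'tok2']
```

Because the pools are separate, you can add prefill workers when prompts get long and decode workers when outputs get long, which is the independent scaling Kranen describes.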

The second insight is about agent security. Nader Khalil offers a practical mental model that's worth internalizing. Agents can do three things: access your files, access the internet, and write and execute code. His rule: only ever let an agent do two of those three at once. Files plus code execution—no internet access, because that's your injection vulnerability. Internet plus files—know the full scope before you give it those two together. The combinatorial risk is the problem.
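A minimal way to picture that rule is as a guardrail check. The capability names and the check itself are illustrative, not any real framework's API.

```python
# Sketch of the "two of three" rule as a simple guardrail.
# Capability names are illustrative, not from any real agent framework.

ALL_CAPABILITIES = {"files", "internet", "code_execution"}

def check_capabilities(granted: set) -> None:
    unknown = granted - ALL_CAPABILITIES
    if unknown:
        raise ValueError(f"unknown capabilities: {unknown}")
    if len(granted) >= 3:
        # All three together: a prompt-injected agent could read secrets,
        # run arbitrary code, and exfiltrate the results.
        raise PermissionError("grant at most two of: files, internet, code execution")

check_capabilities({"files", "code_execution"})    # OK: offline coding agent
check_capabilities({"internet", "files"})          # OK, but scope it deliberately
try:
    check_capabilities(ALL_CAPABILITIES)           # the dangerous combination
except PermissionError as err:
    print(err)
```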

The third is about where agents are heading. The current average autonomous work window for coding agents in production is twenty to forty-five minutes—that's measured in human-equivalent work, not wall-clock time. But the engineers expect this to compound. Kyle expects an agent capable of running with self-consistency for longer than twenty-four hours before end of year. The broader framing they offer: this is the year of "system as model"—where a single API call to something that looks like one model is actually a coordinated system of specialized agents underneath.

The Technology Choice Paradox

Simon Willison closes out today with an observation that cuts against a common assumption. Many people expected coding agents to push developers toward popular, well-documented technology stacks—whatever is best represented in training data. Boring tech by default.

Willison says that's not what he's seeing. Modern agents with long context windows can consume documentation for new or private tools before they start working. They iterate, test their own output, and learn patterns from existing codebases. The agent adapts to the technology choice—not the other way around.

One caveat he adds: what technology an agent recommends unprompted is a separate question. Studies show Claude Code has strong preferences—GitHub Actions, Stripe, shadcn/ui appear at near-monopoly rates when agents make choices independently. The lesson: agents adapt to your choices when you're explicit. But watch what they default to when you're not.

Today's through-line is compression. AI timelines are compressing faster than forecasters can update. The gap between human and AI-generated work is compressing. And the distinction between "AI capability" and "AI governance" is compressing into a single legal and strategic battleground—with the Anthropic lawsuit as its first major test case.


HN Signal — Hacker News

🌅 Morning Digest — Tuesday, March 10, 2026


🔝 Top Signal

Ireland Goes Coal-Free — And the Internet Has Feelings About It

The last coal plant in Ireland shut down, making it the 15th European country to go coal-free. The milestone is real — but HN's 600-comment pile-on is a masterclass in "it's complicated."

Ireland's Moneypoint power plant, which had been burning coal since 1985, is now offline. On the surface, this is a clean energy win. But the comment section quickly surfaced the trade-offs: Ireland now imports much of its electricity, energy prices have surged, and one Irish commenter (cauliflower99) pointed out that real-world cost-of-living pressures are hitting people hard. Others noted that "coal-free" doesn't mean "carbon-free" — if a country imports goods manufactured using coal elsewhere, it's really just moved the pollution off its own books. User reedf1 put it bluntly: "No country will be truly coal-free until they are a net energy exporter and they do not import any goods that use coal-based energy in their supply chain." Meanwhile, CalRobert reminded everyone that Ireland still burns turf — basically digging up ancient wetland carbon stores and setting them on fire — which doesn't show up in the coal statistics.

[HN Discussion](https://news.ycombinator.com/item?id=47307055)


When AI Rewrites Your Open-Source Code Without Asking: A Legal and Moral Debate

Someone used an AI to clone a popular open-source library, stripped out its license, and republished it. Technically legal. Deeply uncomfortable. And HN can't stop arguing about it.

This one requires a bit of background. "Open source" (think: free-to-use software where the code is publicly visible) comes in flavors. Some licenses, called copyleft licenses (like the GPL), say: "You can use my code, but if you build on it, your version must also be open source." It's a way of keeping the commons alive. The story here: a developer fed an open-source library's API (its public interface — how other programs talk to it) and test suite into an AI like Claude, had it rewrite the whole thing from scratch, and published it under a more permissive (less restrictive) license. No copyleft required. The author of the linked essay argues this isn't just a legal question — it's a legitimacy one. The community built something together, and AI let someone extract the value while dodging the social contract. The counter-argument (strongly made in comments) is that clean-room reimplementation has always been legal; AI just made it cheaper. User ordu observed that "AI is eroding copyright" broadly — not just copyleft — and that this was probably inevitable.

[HN Discussion](https://news.ycombinator.com/item?id=47310160)


Florida Judge: Red Light Camera Tickets Are Unconstitutional

A Florida judge threw out red light camera fines — not because cameras are bad, but because the law made you prove you weren't driving, instead of making the government prove you were.

In most legal systems, you're innocent until proven guilty — the government has to prove you did something wrong, not the other way around. Florida's red light camera law apparently flipped this: if a camera caught your car running a red, the ticket went to the registered owner, who then had to prove they weren't the one driving. The judge ruled this backwards, and constitutionally problematic. Commenters were quick to point out the real issue isn't cameras per se, but who bears the burden of proof. User embedding-shape called it "a no-brainer no one could disagree with." Others noted the irony that the next move by municipalities will probably be better facial recognition cameras — solving the identification problem while raising a whole new set of privacy concerns. And user lateforwork captured something genuinely interesting: "If policing is done by robots, then humans are expected to be infallible." Automated enforcement has zero tolerance for ambiguity, which creates its own injustices.

[HN Discussion](https://news.ycombinator.com/item?id=47312090)


👀 Worth Your Attention

OpenAI Is Stepping Back from the Oracle Stargate Deal

OpenAI is reportedly pulling back from its plan to expand AI data centers (massive buildings full of computer chips) with Oracle. The reason? By the time Oracle finishes building them, the hardware inside will already be outdated — NVIDIA's next-generation chips (codenamed "Vera Rubin") will offer roughly 5x better efficiency. User reisfbaker, who runs a small AI inference company, put it well: "Oracle is building today's data centers... tomorrow." This matters because it signals that the AI hardware race is moving so fast that multi-year infrastructure deals are becoming genuinely risky bets.

[HN Discussion](https://news.ycombinator.com/item?id=47315128)


No, It Doesn't Cost Anthropic $5,000 Per Power User

A Forbes article claimed Anthropic loses $5,000 per heavy Claude Code user (Claude Code is an AI tool that writes and edits software on your behalf). This post pushes back, arguing the math was based on retail API pricing — not actual compute costs. An API (Application Programming Interface) is basically a pay-per-use way to access AI models, and the public prices are set much higher than what it actually costs to run the computations. The more interesting data point buried in comments: user anonzzzies calculated their team would cost ~$200K/month at retail API prices, but pays only $1,400/month in Max subscriptions. The gap is well over a hundredfold, and nobody fully knows how to explain it.

[HN Discussion](https://news.ycombinator.com/item?id=47317132)


JSLinux Now Runs x86_64 — A Full Linux in Your Browser

Fabrice Bellard — a legendary programmer known for writing the video tool FFmpeg, among other feats — quietly updated his JSLinux project to support the modern 64-bit version of the x86 chip architecture (the kind inside most laptops and desktops). JSLinux is a full Linux operating system running entirely in your web browser, no installation needed. User testifye spent four hours in it building software from source and called it "rock solid." Simon Willison (simonw) floated an intriguing idea: using a browser-based Linux as the ultimate sandbox for running AI coding agents safely. It's about 50x slower than native hardware, but the fact it works at all is remarkable engineering.

[HN Discussion](https://news.ycombinator.com/item?id=47311484)


A Startup Tried Paying Artists Royalties for AI Art — Here's What Happened

Kapwing, a video editing company, built "Tess" — an AI image generator that licensed artist styles and paid those artists royalties (like how musicians get paid when their songs are played on the radio). The experiment failed and they've written an honest postmortem. The most striking data point: only 1 in 4 of the enrolled artists actually used the AI tool themselves. And the hardest truth: the artists were expected to help market the product to their own audiences — a community deeply skeptical of AI-generated art. The comments are a mix of sympathy, skepticism about whether the underlying model was truly ethical, and genuine appreciation for the transparency of the writeup.

[HN Discussion](https://news.ycombinator.com/item?id=47318421)


Someone Spent Two Years Running Emacs With No Plugins — Just Pure Code

For the uninitiated: Emacs is an ancient, infinitely customizable text editor that some programmers swear by. Most Emacs users rely on a large ecosystem of community-built add-ons. This author spent two years configuring it from scratch using only what ships with the program itself — 3,500 lines of hand-written configuration. The result is weirdly inspiring: a piece of software completely understood by one person, that breaks in predictable ways. User wilkystyle captured the appeal: "The code is sketchy sometimes, sure, but it's in my control." In a world of black-box AI tools and opaque dependencies, there's something quietly radical about knowing every line of your own environment.

[HN Discussion](https://news.ycombinator.com/item?id=47317616)


💬 Comment Thread of the Day

The Leap Second Bug Horror Stories — from the "No Leap Second in June 2026" Thread

A leap second is exactly what it sounds like: an extra second occasionally added to the world's official clocks to keep them in sync with Earth's slightly irregular rotation. Sounds minor. In software, it's a nightmare.

The thread on today's announcement that no leap second will be added in June 2026 turned into an impromptu therapy session for engineers who've lived through them. The best story came from Ozzie_osman:

> "The worst bug I ever dealt with in a 20 year career was a leap second bug (back in 2012). Servers all slowed down dramatically very suddenly, CPU saturated. No relevant code changes or changes in traffic. Turns out, they just got into that state due to a leap second. Some livelock bug. A restart fixed everything... many other large sites (like Reddit, LinkedIn) also had the same issue."

User imglorp added that their company maintained a wall calendar reminder to reboot a specific application before every leap second — for years. And rappatic explained Google's elegant workaround: instead of inserting one sudden extra second, they "smear" it invisibly across hours, making every second just slightly longer than normal. The computers never notice.
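In rough outline, a smear just stretches every second in a fixed window so that one extra second gets absorbed gradually. The toy calculation below uses a simple 24-hour linear window; the window length and shape are illustrative, and real deployments differ in their details.

```python
# Toy leap-smear calculation: hide one extra second by stretching every
# second in a 24-hour window, rather than inserting a sudden 23:59:60.
# Window length and shape are illustrative; deployments differ in detail.

SMEAR_WINDOW = 24 * 60 * 60     # seconds over which to absorb the leap second

def smear_offset(seconds_into_window: float) -> float:
    """Extra time (seconds) the smeared clock has added so far, from 0.0 to 1.0."""
    frac = min(max(seconds_into_window / SMEAR_WINDOW, 0.0), 1.0)
    return frac

print(smear_offset(SMEAR_WINDOW / 2))   # 0.5: halfway through, half a second absorbed
print(smear_offset(SMEAR_WINDOW))       # 1.0: window over, the full second absorbed
```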

Why does this matter to newcomers? It's a beautiful example of how a tiny edge case in a global standard — one extra second, announced months in advance — can cascade into widespread outages. Software assumes time moves forward in predictable increments. When it doesn't, chaos ensues.

[HN Discussion](https://news.ycombinator.com/item?id=47308059)


🎯 One-Liner

Today's HN proved that the two most reliable ways to generate 500+ comments are: shutting down a coal plant and putting a camera at a traffic light.