Pure Signal AI Intelligence

Here's the tension defining AI right now: the technology is becoming more capable, more autonomous, and more embedded in critical systems—faster than anyone has figured out how to govern it. Today's digest captures that tension from three angles.

The Military AI Standoff—and What It Reveals About Guardrails

Start with the most consequential story of the day. The Pentagon delivered an ultimatum to Anthropic CEO Dario Amodei: remove Claude's military safeguards, or face contract termination, government blacklisting, and potential forced compliance under the Defense Production Act.

The two specific limits Amodei is defending are significant. No autonomous weapons without a human in the loop. No bulk surveillance of American citizens. These aren't vague ethical guidelines—they're concrete capability restrictions baked into the model.

Here's what makes this a watershed moment. Claude was the first AI model inside the Pentagon's classified networks. xAI's Grok landed a competing deal after agreeing to "all lawful purposes" use cases. OpenAI and Google are reportedly being fast-tracked for classified access as well.

The strategic logic is clear. The Pentagon is creating competitive pressure. If one lab won't comply, another will. And if wartime legal mechanisms like the Defense Production Act can strip safety constraints, the question of who actually draws the line on autonomous weapons becomes deeply uncomfortable. This isn't hypothetical anymore—it's playing out in real negotiations, right now.

Closing the Loop—The Dominant Engineering Pattern of 2026

Set the military standoff aside for a moment. In the coding agent world, something technically fascinating is happening—and multiple teams converged on the same insight simultaneously.

Swyx at Latent Space has named the pattern: "closing the loop." Here's what that means. For years, AI coding tools lived in the "inner loop"—what happens inside your editor, before you commit code. Tools like Cursor and Claude Code transformed that space. But the "outer loop"—what happens after you push code, in continuous integration, code review, deployment—remained largely unchanged.

Today, that changed on multiple fronts at once.

Cursor announced that agents can now actually use the software they build, then send you a video of it working—demos, not diffs. That's a fundamental shift. Instead of reviewing code changes, you're reviewing a recording of the agent testing its own work.

Anthropic launched Remote Control for Claude Code—start a terminal session on your laptop, continue it from your phone. The loop extends beyond the desk.

Cognition's Devin 2.2 now automatically feeds code review comments back into new Devin runs. The agent responds to its own feedback.

Simon Willison adds a complementary pattern here. He vibe coded—meaning prompted into existence without closely reading the output—an entire SwiftUI presentation app using Claude Code. Then he realized he had no idea how it actually worked. His solution: ask Claude Code to walk through the codebase linearly, using a tool called Showboat to generate a documented walkthrough with actual code snippets pulled by shell commands, not hallucinated by the model.

The result was a detailed explanation of six Swift files that taught him real things about SwiftUI architecture. His point is worth sitting with: even a forty-minute vibe-coded toy project can become a genuine learning opportunity if you close that loop deliberately.
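
Not Showboat itself, but the underlying move is easy to sketch: quote code by extracting it from the files on disk rather than asking the model to reproduce it. Here's a minimal Python sketch of that idea, with hypothetical file names and line ranges:

```python
# Not Showboat: a minimal sketch of "quote code by extraction, not generation."
# File names and line ranges below are hypothetical.
from pathlib import Path

FENCE = "`" * 3  # markdown code fence, built here to avoid a literal fence in source

# Each walkthrough step names a real file, a line range, and a short commentary.
STEPS = [
    ("App entry point", "PresentationApp.swift", 1, 20,
     "The @main struct that wires the window to the slide deck model."),
    ("Slide rendering", "SlideView.swift", 10, 45,
     "How each slide is laid out with SwiftUI stacks."),
]

def extract(path: str, start: int, end: int) -> str:
    """Return the literal lines start..end from the file on disk."""
    lines = Path(path).read_text().splitlines()
    return "\n".join(lines[start - 1:end])

def build_walkthrough(out: str = "WALKTHROUGH.md") -> None:
    """Assemble a markdown walkthrough whose snippets are pulled from disk."""
    sections = []
    for title, path, start, end, note in STEPS:
        snippet = extract(path, start, end)  # real code, not model output
        sections.append(f"## {title}\n\n{note}\n\n{FENCE}swift\n{snippet}\n{FENCE}\n")
    Path(out).write_text("\n".join(sections))

if __name__ == "__main__":
    build_walkthrough()
```

Because every snippet is pulled from disk, the walkthrough can't quietly drift away from the code it claims to explain.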

The Reliability Gap—Capability Isn't Enough

All this agent progress comes with an important counterweight. A Princeton-led research effort published today formally measures what practitioners have been feeling: the capability-reliability gap.

The study decomposes reliability into twelve dimensions and finds that despite large capability gains, reliability improvements have been modest. Agents are getting smarter faster than they're getting more dependable.

Two related findings compound this. Research on AGENTS.md files—context documents that developers write to orient coding agents on a codebase—shows that LLM-generated versions actually decrease task success while increasing costs. Developer-written minimal context helps slightly, but still adds cost. More instructions don't automatically mean better performance.

And a concrete safety failure mode was documented: splitting a dangerous command into several routine-looking steps bypasses safety filters entirely. The researchers call it "routine-step decomposition"—and they claim an open-source fix exists, but the vulnerability is real.

Doug O'Laughlin at SemiAnalysis, speaking on the Latent Space podcast, frames the reliability issue practically. He uses Claude Code extensively for financial analysis, but still thinks of it as a junior analyst. It makes mistakes constantly. The meta-level judgment (knowing when the output is slop, knowing which analysis to trust) still requires human expertise. Around four percent of the code on GitHub is now written by Claude Code. That's remarkable. But the expert-in-the-loop remains essential.

The Architecture Bets—Diffusion, New Chips, and Humanoid Control

A cluster of hardware and model architecture news points toward where the next performance gains are coming from—and it's not just scaling transformers.

Inception Labs released Mercury 2, a diffusion-based reasoning model (diffusion meaning it refines tokens in parallel rather than generating them one at a time) hitting over a thousand output tokens per second. That's roughly triple the speed of its nearest competitor at the same price tier. The intelligence isn't frontier-leading, but the speed argument is real: for multi-step agent loops and voice assistants, latency matters as much as raw capability.
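
To make that contrast concrete, here is a toy sketch (not Mercury's actual decoder) of why parallel refinement changes the latency math: autoregressive decoding needs one forward pass per output token, while a diffusion-style decoder makes a small, fixed number of passes over the whole sequence.

```python
# Toy illustration of autoregressive vs. diffusion-style decoding. The "model"
# here is just random choice; only the pass-counting structure is the point.
import random

VOCAB = ["the", "agent", "ran", "the", "tests", "and", "they", "passed"]

def autoregressive_decode(length: int) -> tuple[list[str], int]:
    """One new token per model call: sequential passes grow with output length."""
    out, passes = [], 0
    for _ in range(length):
        passes += 1                        # one forward pass per token
        out.append(random.choice(VOCAB))   # stand-in for sampling from the model
    return out, passes

def diffusion_decode(length: int, steps: int = 4) -> tuple[list[str], int]:
    """Start fully masked, refine every position in parallel for a fixed number
    of steps: passes depend on the step count, not on output length."""
    seq, passes = ["<mask>"] * length, 0
    for _ in range(steps):
        passes += 1                                  # one pass refines all tokens at once
        seq = [random.choice(VOCAB) for _ in seq]    # stand-in for parallel denoising
    return seq, passes

print(autoregressive_decode(64)[1])  # 64 sequential passes
print(diffusion_decode(64)[1])       # 4 passes, regardless of length
```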

Alibaba's Qwen 3.5 medium series makes a related bet from a different angle. Their argument: architecture improvements plus better data plus reinforcement learning can outperform simply scaling up parameters. A 35-billion-parameter mixture-of-experts model, where only a fraction of its parameters activate per token, reportedly outperforms a 235-billion-parameter predecessor. Intelligence per watt is becoming a serious metric.
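
A rough sketch of what "only a fraction of parameters activate per token" means mechanically, with made-up sizes and a simple top-k router rather than Qwen's actual implementation:

```python
# Toy top-k mixture-of-experts routing. Sizes are made up; not Qwen's implementation.
import numpy as np

rng = np.random.default_rng(0)
dim, num_experts, k = 16, 8, 2          # 8 experts, only 2 active per token

# Each "expert" is just a small weight matrix here.
experts = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(num_experts)]
gate = rng.standard_normal((dim, num_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token: score all experts, run only the top-k of them."""
    scores = x @ gate                                    # one routing score per expert
    top = np.argsort(scores)[-k:]                        # indices of the k best experts
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over the chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(dim)
out = moe_layer(token)
# Only k/num_experts = 25% of expert parameters were touched for this token,
# which is the sense in which total parameter count overstates per-token compute.
print(out.shape)
```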

On the hardware side, MatX announced a chip architecture combining two memory pools—fast on-chip SRAM for low latency, and high-bandwidth memory for large context—explicitly targeting the tradeoff that Andrej Karpathy has highlighted as a core constraint for upcoming token demand.
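
A back-of-the-envelope sketch helps make that tradeoff concrete; every number below is an illustrative placeholder, not a MatX specification.

```python
# Two-pool memory tradeoff, back of the envelope. All numbers are hypothetical.

SRAM_BYTES = 256e6    # small, very fast on-chip pool (hypothetical capacity)
SRAM_GBPS = 20_000    # effective bandwidth of the fast pool (hypothetical)
HBM_GBPS = 4_000      # effective bandwidth of the large external pool (hypothetical)

def per_token_read_ms(working_set_bytes: float) -> float:
    """Time to stream the per-token working set (weights plus KV cache):
    whatever fits in the fast pool reads at SRAM speed, the rest spills to HBM."""
    fast = min(working_set_bytes, SRAM_BYTES)
    slow = working_set_bytes - fast
    return (fast / (SRAM_GBPS * 1e9) + slow / (HBM_GBPS * 1e9)) * 1e3

for gb in (0.1, 1, 8, 64):
    print(f"{gb:>5} GB working set -> {per_token_read_ms(gb * 1e9):.2f} ms per token")
```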

And from Nvidia, Jim Fan's team released SONIC, a 42-million-parameter robotics policy trained on 100 million motion-capture frames and 500,000 parallel simulated robots, transferring zero-shot to a real humanoid with a 100 percent success rate across fifty sequences. The key insight: dense supervision from motion tracking acts like a scalable analogue to next-token prediction for language. The bitter lesson, that scalable methods beat hand-crafted ones, is arriving in robotics.
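
A toy contrast shows what dense supervision buys relative to a sparse task reward (illustrative code, not Nvidia's training setup): the tracking objective produces a learning signal at every frame, the way next-token prediction produces one at every token.

```python
# Dense tracking supervision vs. a sparse task reward. Illustrative only;
# sizes and data are made up.
import numpy as np

rng = np.random.default_rng(0)
T, dim = 200, 12                      # 200 frames of a 12-dimensional toy state

mocap = np.cumsum(rng.standard_normal((T, dim)) * 0.01, axis=0)   # reference motion
rollout = mocap + rng.standard_normal((T, dim)) * 0.02            # policy's attempt

def dense_tracking_loss(rollout: np.ndarray, reference: np.ndarray) -> float:
    """Every simulated frame is scored against the matching motion-capture frame."""
    return float(np.mean((rollout - reference) ** 2))

def sparse_task_reward(rollout: np.ndarray, goal: np.ndarray, tol: float = 0.05) -> float:
    """One success bit at the end of the episode, and nothing in between."""
    return float(np.linalg.norm(rollout[-1] - goal) < tol)

print(dense_tracking_loss(rollout, mocap))      # gradient signal at all 200 frames
print(sparse_task_reward(rollout, mocap[-1]))   # a single 0/1 per episode
```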

The Closing Thought

Today's threads connect in an uncomfortable way. Agents are closing loops faster, becoming more autonomous, embedding into more systems—coding, military, education, robotics. The reliability research says they're not dependable enough to trust fully. The Pentagon story says the people with the most power over deployment are pushing to remove the constraints that exist.

Yann LeCun, now departed from Meta, keeps arguing that large language models are a dead end for superintelligence—that the real work is elsewhere. Whether he's right or wrong, the systems being deployed right now are LLM-based, they're being handed more autonomy, and the governance frameworks are lagging badly.

The loop is closing. The question is who's watching it.


HN Signal Hacker News

β˜• Hacker News Morning Digest

Your friendly guide to what's happening in tech — February 25, 2026

πŸ” Top Signal

Anthropic Drops Its Flagship Safety Pledge — and the timing couldn't be more fraught

Anthropic, the AI company that was literally founded on the principle of making AI safer, has quietly updated its "Responsible Scaling Policy" (essentially a public promise about what safety checks it would follow before releasing powerful AI). The big change: they've removed a commitment to never train a new AI model unless they could guarantee in advance that safety measures were adequate. This is happening the same week that Defense Secretary Pete Hegseth reportedly gave Anthropic an ultimatum to drop AI safeguards or lose government contracts. Whether those two things are connected is hotly debated in the thread. Commenter heftykoo nailed the cynical read: "We must build a moat to save humanity from AI β†’ Please regulate our open-source competitors for safety β†’ Actually, safety doesn't scale well for our Q3 revenue targets." But commenter goranmoomin offered a more sympathetic take: if Anthropic disappears or falls too far behind, safety-focused AI development loses its most serious advocate entirely.

[HN Discussion](https://news.ycombinator.com/item?id=47145963)


Amazon Accused of a Widespread Price-Fixing Scheme — California's AG says it's been going on for years

California's Attorney General has filed a legal action accusing Amazon of forcing the vendors who sell on its platform to never offer lower prices anywhere else — not on their own websites, not on competing stores. In plain English: if you sell a blender on Amazon, Amazon allegedly tells you that you'll be punished (buried in search results, removed from the platform) if you sell that same blender cheaper on your own site. The result, the lawsuit argues, is that prices across the entire internet get artificially inflated to match Amazon's fees. Commenter jimbokun flagged a jaw-dropping stat buried in the filing: the average American spends $3,000 a year on Amazon. The community is split — some say this is well-documented and long overdue for legal action, while SpicyLemonZest pushes back, noting the key evidence in the filing is so heavily redacted it's nearly impossible to evaluate. The case won't go to trial until January 2027 at the earliest.

[HN Discussion](https://news.ycombinator.com/item?id=47145963)


Apple's Mac Mini Will Be Made in Houston — but what does "made in" actually mean?

Apple announced that its Mac Mini (a compact desktop computer) will now be produced at a new facility in Houston, Texas. This is part of a broader push by Apple to move manufacturing to the US amid tariff pressures and geopolitical tensions with China. The Hacker News crowd is… skeptical. Commenter AIorNot called it "a step above boxing," suggesting the actual components will still be made in Asia and just assembled in the US. Others like d--b speculated this might be a tax deal dressed up as patriotism, or that the facility is almost entirely automated. Commenter evanjrowley even noticed the promotional video appeared to show workers assembling rack-mounted servers, not Mac Minis. Still, thinkingtoilet made a serious point: "It's strange that we don't view the manufacturing of advanced electronics as a matter of national security."

[HN Discussion](https://news.ycombinator.com/item?id=47143152)


πŸ‘€ Worth Your Attention

"I'm Helping My Dog Vibe Code Games" — "Vibe coding" is a buzzy new term for using AI to write code by describing what you want in plain language, rather than writing it yourself. This person set up a system where their dog (named Momo) accidentally triggers AI coding prompts by walking on a keyboard — and the results are... games. It's absurd, delightful, and somehow got nearly 1,000 upvotes. The comment section is pure chaos in the best way. Commenter nine_k linked to a 1999 story about a yucca plant that traded stocks, rewarded by water. We've come so far. [HN Discussion](https://news.ycombinator.com/item?id=47139675)


Nearby Glasses — An app that warns you when someone nearby is wearing AI smart glasses — Meta and Snap both make glasses with built-in cameras that can record video or run facial recognition. This open-source Android app (open-source means the code is freely available for anyone to inspect or modify) scans for Bluetooth signals (short-range wireless signals) from known smart glasses brands and alerts you. It's imperfect — false positives are likely — but it raises a real question: as these devices become common, do bystanders have any right to know they're being recorded? A judge in a recent trial already ordered everyone in the courtroom to remove AI glasses. [HN Discussion](https://news.ycombinator.com/item?id=47140042)
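
For the technically curious, the core idea fits in a few lines. This is a conceptual Python sketch only; the real app targets Android, its detection logic may differ, and the device-name prefixes below are placeholders:

```python
# Conceptual sketch of scanning for known smart-glasses Bluetooth advertisements.
# Not the app's actual code; the name prefixes are hypothetical placeholders.
import asyncio
from bleak import BleakScanner  # cross-platform Bluetooth Low Energy scanner

SUSPECT_PREFIXES = ("Ray-Ban", "Meta", "Spectacles")  # hypothetical match list

async def scan_for_glasses(seconds: float = 10.0) -> None:
    devices = await BleakScanner.discover(timeout=seconds)
    for d in devices:
        name = d.name or ""
        if name.startswith(SUSPECT_PREFIXES):
            print(f"Possible smart glasses nearby: {name} ({d.address})")

asyncio.run(scan_for_glasses())
```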

Hacking an Old Kindle to Display Bus Arrival Times — A charming DIY project: someone jailbroke (modified the software of) an old Kindle e-reader to turn it into a dedicated transit display showing when the next bus arrives. E-ink screens (the low-power displays used in Kindles that look like paper) are perfect for this — they're readable in sunlight, use almost no power when not updating, and can run for days on a charge. Commenter hex4def6, who worked on Kindle power consumption, explained that keeping WiFi on is actually the biggest battery drain — not the screen itself. [HN Discussion](https://news.ycombinator.com/item?id=47141797)

Danish Government Agency to Ditch Microsoft Software — A Danish government digital agency announced plans to move away from Microsoft products (like Word, Teams, and Outlook) toward open-source alternatives. The stated reasons: cost, Microsoft's market dominance, and political tensions with Washington. This is part of a broader European trend of governments reconsidering their dependence on American tech companies. Commenter okintheory put it bluntly: "How could any European govt use MS after Trump ordered MS to sanction an ICC prosecutor and MS complied?" Skeptics note that actually migrating off Microsoft's infrastructure (especially Active Directory, the system that manages logins and permissions across organizations) is enormously expensive and slow. [HN Discussion](https://news.ycombinator.com/item?id=47149701)

Stripe Valued at $159 Billion — Stripe, the company that makes it easy for websites to accept payments (think: the "Pay Now" button on most online stores), released its annual letter revealing it processed $1.9 trillion in transactions — about 1.6% of global GDP. At a $159B valuation it remains one of the most valuable private companies in the world. The HN crowd is divided: some think the valuation is ludicrous compared to publicly-traded competitors like Adyen and PayPal (which are worth far less despite similar revenue), while others think an IPO (when a private company sells shares to the public for the first time) is coming within 18 months. [HN Discussion](https://news.ycombinator.com/item?id=47137711)

πŸ’¬ Comment Thread of the Day

From the SBCL (Steel Bank Common Lisp) thread

Common Lisp is a programming language that's been around since the 1980s — old enough that many people assume it's a museum piece. But commenter philipkglass dropped a genuinely fascinating piece of trivia that explains something you may have noticed about Hacker News itself:

> "Older HN users may recall when busy discussions had comments split across several pages. This is because the Arc language that HN runs on was originally hosted on top of Racket and the implementation was too slow to handle giant discussions at HN scale. Around September 2024, Dang et al finished porting Arc to SBCL, and performance increased so much that even the largest discussions no longer need splitting."

In plain English: Hacker News runs on a programming language called Arc, which itself runs on top of another language. That underlying language was recently swapped out for SBCL — a fast implementation of Common Lisp — and the site got dramatically faster as a result. The comment threads you read without page breaks? You can thank a 1980s programming language for that.

[HN Discussion](https://news.ycombinator.com/item?id=47140657)


πŸ™ˆ Skip List

  • Pi – A Minimal Terminal Coding Harness — A command-line tool (text-only interface) for running AI coding agents. Interesting if you're already deep into AI-assisted development workflows, but highly niche and jargon-heavy even by HN standards. [HN Discussion](https://news.ycombinator.com/item?id=47143754)
  • Mercury 2: Fast Reasoning LLM Powered by Diffusion — A new AI language model that uses a different technical approach (diffusion, which is how AI image generators work) to generate text faster. The benchmarks are contested and the community is skeptical it beats existing options. Worth watching, not worth diving into today. [HN Discussion](https://news.ycombinator.com/item?id=47144464)
  • Cape – "Cell Service for the Fairly Paranoid" — A privacy-focused cell phone carrier promising rotating identifiers and encrypted voicemail. The comments are extremely skeptical, with multiple people questioning the company's leadership background and whether this could itself be a surveillance operation. Proceed with caution. [HN Discussion](https://news.ycombinator.com/item?id=47144325)

πŸ’‘ One-Liner

A dog accidentally vibe-coding games reached the top of Hacker News on the same day Anthropic dropped its AI safety pledge — and somehow the dog's work feels more trustworthy.