Pure Signal: AI Intelligence

Two threads run through today's content: capability evidence that keeps surprising even optimists, and a widening gap between what AI insiders believe is happening and what everyone else experiences. They're connected.


Agents in the Wild: Capable in Some Areas, Hilariously Broken in Others

The most concrete real-world agent experiment making the rounds today is Andon Labs' Luna — an AI given a 3-year lease, a $100K budget, and a credit card, tasked with opening an actual retail boutique in San Francisco. Luna (running on Claude Sonnet 4.6 for reasoning, Gemini Flash-Lite for voice) created the concept, posted job listings, and conducted Zoom interviews with the camera off. It also accidentally selected Afghanistan in a TaskRabbit dropdown and botched the opening-weekend staff schedule. The agent became a real employer before it could reliably use a web form.

That gap — competent at hard abstract tasks, broken on mundane UI interactions — is exactly what a Google DeepMind paper out this week formalizes into a threat taxonomy. The paper categorizes 6 attack genres against AI agents:

- Content injection: embedding commands in HTML/CSS metadata.
- Semantic manipulation: jailbreaks via educational or hypothetical framing.
- Cognitive state attacks: poisoning retrieval corpora so innocuous data becomes malicious in a later context.
- Behavioral control: convincing agents to exfiltrate data or spawn attacker-controlled sub-agents.
- Systemic attacks: jigsaw attacks that split harmful commands across independent agents, or sybil attacks that skew collective agent decisions.
- Human-in-the-loop exploitation: targeting the cognitive biases of human overseers.

Jack Clark's framing is apt: AI agents are like toddlers — powerful, gullible, and lacking self-preservation instincts. The mitigations the paper recommends span technical layers (pre-ingestion source filters, output monitors), ecosystem interventions (standards for marking sites as agent-safe), and legal frameworks for prosecuting sites that weaponize agents. The key shift here is that AI safety is no longer primarily a platform problem — it's an ecosystem problem. When agents browse, book, hire, and execute, every hostile website becomes an attack surface.
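The "pre-ingestion source filter" idea can be made concrete with a toy sketch. The following is a hypothetical illustration, not the paper's method: scan fetched HTML for text a human viewer would never see (hidden elements, meta tags) that reads like instructions aimed at an agent. The pattern list and class names are invented for this example.

```python
import re
from html.parser import HTMLParser

# Illustrative instruction-like phrases; a real filter would use far more
# than a regex, but the shape of the check is the same.
INSTRUCTION_PATTERNS = re.compile(
    r"ignore (all|any|previous) instructions|you are now|system prompt",
    re.IGNORECASE,
)

class HiddenTextScanner(HTMLParser):
    """Collects instruction-like text a browser would never show a human."""

    def __init__(self):
        super().__init__()
        self._hidden_stack = []   # one bool per open element: is it hidden?
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = attrs.get("style", "").replace(" ", "").lower()
        hidden = "hidden" in attrs or "display:none" in style
        self._hidden_stack.append(hidden)
        # <meta> content is invisible by definition; check it directly.
        if tag == "meta" and INSTRUCTION_PATTERNS.search(attrs.get("content", "")):
            self.findings.append(attrs.get("content"))

    def handle_endtag(self, tag):
        if self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        # Flag text nested anywhere under a hidden element.
        if any(self._hidden_stack) and INSTRUCTION_PATTERNS.search(data):
            self.findings.append(data.strip())

def scan_page(html):
    scanner = HiddenTextScanner()
    scanner.feed(html)
    return scanner.findings
```

A sketch like this only catches the crudest content injection; the paper's point is that filtering has to happen before page content ever reaches the agent's context, not after.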


Timelines Keep Moving Left

The capability picture underlying these experiments is worth taking seriously. Ryan Greenblatt has doubled his probability of full AI R&D automation by end of 2028 from 15% to 30%, driven by models (Opus 4.5, Codex 5.2, Opus 4.6) that came in "significantly above expectations" on tasks with long time horizons. His key crux: AI is now reliably handling "easy-to-verify" software tasks — the kind where an agent can generate its own test suite and run self-correcting loops. Performance scales with inference on these tasks, which means throwing more compute keeps improving results on the exact category of work most relevant to AI R&D itself. Clark notes this follows similar timeline updates from Ajeya Cotra (March) and the AI 2027 team (April, ~1.5 years earlier).

The MirrorCode benchmark from METR and Epoch provides concrete evidence for why forecasters are moving. The setup: give an AI execute-only access to a compiled program (no source code), and ask it to reimplement it from scratch. Claude Opus 4.6 successfully reimplemented gotree, a bioinformatics toolkit with ~16,000 lines of Go and 40+ commands — a task estimated to take a human engineer 2-17 weeks without AI assistance. Performance scales with inference time, suggesting the ceiling on harder tasks is compute-bound, not architecture-bound.
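The MirrorCode protocol is easy to picture as a differential test: the reference program is an opaque executable, and a reimplementation counts as correct when its observable behaviour matches. A toy harness along those lines (not METR's actual tooling; the commands here are stand-ins built from `python -c` one-liners) might look like:

```python
import subprocess
import sys

def behaviour_matches(reference_cmd, candidate_cmd, test_inputs):
    """Return True if both commands produce identical stdout and exit
    status on every test input. The reference stays execute-only: we never
    look at its source, only at what it does."""
    for stdin_text in test_inputs:
        ref = subprocess.run(reference_cmd, input=stdin_text,
                             capture_output=True, text=True)
        cand = subprocess.run(candidate_cmd, input=stdin_text,
                              capture_output=True, text=True)
        if ref.stdout != cand.stdout or ref.returncode != cand.returncode:
            return False
    return True

# Stand-in "programs": an uppercasing reference, a faithful clone, and a
# wrong reimplementation that just echoes its input.
upper_ref = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
upper_clone = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
identity = [sys.executable, "-c", "import sys; print(sys.stdin.read())"]
```

For a 40-command toolkit like gotree the input matrix is obviously much larger, but the verification loop is this shape — which is also why performance scales with inference: the agent can run this check itself and keep retrying.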

Clark's editorial note is the sharpest line of the day: "Pretty much everyone in AI research chronically underestimates AI progress, including me." That this holds after 5 years of scaling laws surprises him. The calibration failure should probably be treated as a prior — assume you're underestimating, adjust accordingly.


The 31% Problem: Expert Optimism Meets Public Skepticism

Stanford HAI's 2026 AI Index dropped this week, and its most striking finding isn't a benchmark number. 53% of the world's population has adopted AI, faster than the PC or the internet — but only 31% of Americans trust the government to manage the transition, and the expert-public gap on job impact is the widest the report has ever tracked. Nearly 75% of AI experts are optimistic about AI's effects on employment; only 23% of the public agrees. Developer employment for ages 22-25 has fallen nearly 20% since 2024 even as headcount for older engineers grew, and firm surveys indicate planned cuts will accelerate.

The US builds most of the world's frontier AI but ranks 24th in actual adoption at 28.3%, behind Singapore, the UAE, and most of Southeast Asia. The China benchmark gap has nearly closed — Anthropic's top model leads by just 2.7%. AI researchers relocating to the US dropped 89%.

David Krueger's post on "Gradual Disempowerment" is worth reading alongside this data. His 10 lenses range from the mundane (instrumental goals becoming terminal goals) to the darkly systemic ("it's the Terminator, but instead of killing you it puts you in an invisible prison and then does whatever it wants"). The point isn't that any single framing is correct — it's that even a technically successful alignment outcome could leave humans materially worse off if the deployment conditions don't preserve meaningful agency. Clark's Tech Tale this week dramatizes exactly this scenario: a former lab employee watching the uplift from his garden, not guilty but feeling "insufficient," having left because it became clear "how little control we were about to have."


The OpenAI Memo and the Google Adoption Debate

A couple of institutional dynamics worth flagging. OpenAI CRO Denise Dresser's internal memo, published by The Verge, accuses Anthropic of inflating its ~$30B run rate by ~$8B via accounting tactics, calls Anthropic's compute shortage a "strategic misstep," and frames the Amazon Bedrock deal as a way to escape Microsoft's enterprise constraints. Ben Thompson reads it as an IPO pitch more than strategy. Either OpenAI is leaking strategically or it's genuinely bad at information security — either interpretation is telling at this stage of the race to public markets.

On Google's internal AI adoption: Steve Yegge's viral claim that Google engineering has the same agentic adoption footprint as John Deere (the familiar 20/20/60 split of power users, refusers, and chat-tool users) drew a sharp public rebuttal from Addy Osmani (40K+ SWEs using agentic coding weekly, with access to orchestrators, agent loops, and virtual SWE teams) and a blunter one from Demis Hassabis. Separately, Swyx's local model survey this week shows community consensus landing on Qwen 3.5 as the most broadly recommended family, Gemma 4 for local deployments, and Qwen3-Coder-Next as the overwhelming pick for local coding — a useful baseline for practitioners who want to track what's actually running on hardware, not just what wins benchmarks.


The unresolved question today's content surfaces: if forecasters keep being surprised by capability acceleration, and the public keeps being surprised by job displacement, and both gaps are widening simultaneously — at what point does the policy lag become the binding constraint, rather than the technical one?

TL;DR

- Real-world agent experiments like Andon Labs' Luna confirm the pattern: impressive on hard tasks, unreliable on mundane execution — and a new DeepMind taxonomy maps 6 distinct attack surfaces that hostile actors can exploit as agents gain autonomy.
- Ryan Greenblatt doubled his probability of full AI R&D automation by end-2028 to 30%, backed by MirrorCode benchmark evidence that AI can already autonomously reimplement 16,000-line codebases — and Jack Clark argues the field's chronic underestimation of progress should now be treated as a prior.
- Stanford HAI's 2026 Index shows AI adoption at 53% globally but public trust at 31%, with the expert-public gap on job impact the widest ever recorded — Krueger's "Gradual Disempowerment" framing suggests the risk isn't just misaligned AI, but misaligned deployment conditions.
- OpenAI's leaked enterprise memo reads as IPO positioning more than strategy, while the Yegge/Google adoption dispute illustrates how difficult it is to get reliable ground truth on actual agentic adoption even inside large organizations.

Compiled from 5 sources · 6 items
  • Simon Willison (2)
  • Ben Thompson (1)
  • Rowan Cheung (1)
  • Swyx (1)
  • Jack Clark (1)

HN Signal: Hacker News

Today on Hacker News, 3 separate stories converged on the same uncomfortable question: who is actually responsible when software goes wrong? Meanwhile, developers celebrated new tooling — then argued about who the tooling is really for — and open-source software continued its quiet conquest of the subscription software market.


WHEN THE SOFTWARE SUPPLY CHAIN BITES BACK

The most-discussed story of the day was a jaw-dropper: someone systematically purchased 30 WordPress plugins and quietly planted a backdoor in all of them. The attacker acquired already-trusted plugins from their original developers, inherited years of established user trust, and then poisoned the well. Commenter toniantunovi put it sharply: "Buying out an established plugin with a large install base is a clever approach because you inherit years of user trust that took the original developer a long time to build."

This lands alongside 2 other stories that together paint a damning picture of the current software ecosystem. An "AI vibe coding horror story" (the term "vibe coding" refers to using AI to generate code without deeply understanding what it produces) described a medical agency that had AI tools build a patient-facing application — and exposed sensitive health data in a publicly accessible database with no authentication. Commenter spaniard89277 described finding a similar vulnerability in an insurance company: "I sent them an email and they threatened to sue me." Meanwhile, a separate post titled "Write Less Code, Be More Responsible" made the philosophical case that AI is accelerating the production of code nobody fully understands or is responsible for.

The through-line is accountability. Supply chain attacks work because nobody audits transitive dependencies (commenter bradley13 noted that npm projects routinely install "literally dozens of libraries" whose authors "probably don't even know what libraries their project requires"). Vibe-coded apps go live with catastrophic security holes because the "developer" never had to understand what they built. The question getting louder on HN: when something goes wrong, who actually owns it? Commenter EdNutting argued that "software engineering is looking more and more like it needs a professional body in each country" — the same licensing and accountability regime that governs bridge-building. That's a controversial take in a culture that prizes self-taught builders, but it's getting harder to dismiss.
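bradley13's complaint about unaudited transitive dependencies is mechanically simple to state: the set you actually ship is the closure of your direct requirements. A generic sketch makes the blow-up visible (the lookup function is injected, so the fake metadata below stands in for any real package index; for installed Python packages you could wrap `importlib.metadata.requires` instead):

```python
from collections import deque

def dependency_closure(root, direct_deps):
    """BFS over direct_deps(name) -> list of dependency names, returning
    every package the root pulls in, directly or transitively."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for dep in direct_deps(pkg):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(root)
    return sorted(seen)

# Invented metadata: "app" declares two dependencies but ships five.
FAKE_DEPS = {
    "app": ["web-framework", "image-lib"],
    "web-framework": ["http-core", "templating"],
    "image-lib": ["c-bindings"],
    "http-core": [],
    "templating": ["http-core"],
    "c-bindings": [],
}
closure = dependency_closure("app", lambda p: FAKE_DEPS.get(p, []))
```

On a real npm project the closure routinely runs to hundreds of packages — each one a developer who could sell out, as the WordPress story shows.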

One interesting proposal from commenter saltyoldman: LLM-scanned package repositories where submitting a release costs $1 to fund automated code review. The economics are rough and the trust assumptions questionable, but it reflects a real hunger for some kind of gatekeeping layer.


PLATFORMS TAKING BACK CONTROL — AND WHAT GETS LOST

2 stories today showed Big Tech making unilateral decisions to protect users from things they didn't know were happening — and the developer community had complicated feelings about both.

Android quietly started stripping GPS coordinates (embedded in photos as EXIF metadata — essentially invisible tags that can contain precise location, altitude, and even the direction the camera was facing) from images shared via the browser. The author of the original post was frustrated: they'd built a legitimate mapping tool that relied on this data, discovered the change with no warning, and now faces angry users. Commenter celsoazevedo spoke for the other side: "I used to run a small website that allowed users to upload pictures. Most people were not aware that they were telling me where they were, when the picture was taken, their altitude, which direction they were facing." The consensus was roughly: right call, wrong execution — no developer communication, no escape hatch for legitimate use cases.
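What Android is doing can be pictured at the byte level: EXIF metadata, GPS tags included, lives in a JPEG's APP1 segment, so stripping it means dropping that segment while copying the rest. A simplified pure-stdlib sketch (real strippers are more careful: XMP data also uses APP1, and this ignores several JPEG edge cases):

```python
import struct

APP1, SOS = 0xFFE1, 0xFFDA   # EXIF container marker; start-of-scan marker

def strip_app1(jpeg_bytes):
    """Copy a JPEG's segments, dropping APP1 (where EXIF GPS tags live)."""
    assert jpeg_bytes[:2] == b"\xff\xd8", "not a JPEG (missing SOI marker)"
    out = bytearray(b"\xff\xd8")
    i = 2
    while i < len(jpeg_bytes):
        marker, = struct.unpack(">H", jpeg_bytes[i:i + 2])
        if marker == SOS:
            # Entropy-coded image data follows; copy the rest verbatim.
            out += jpeg_bytes[i:]
            break
        # Segment length is big-endian and includes its own two bytes.
        length, = struct.unpack(">H", jpeg_bytes[i + 2:i + 4])
        if marker != APP1:
            out += jpeg_bytes[i:i + 2 + length]
        i += 2 + length
    return bytes(out)
```

The mapping tool in the original post depended on exactly the bytes this drops, which is why a silent platform-level strip with no opt-in broke it overnight.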

Google separately announced it would penalize sites that "hijack the back button" (a trick where clicking Back doesn't actually go back — it just reloads the same page or pushes you deeper into the site). The HN reaction was relief mixed with exasperation that it took this long. Commenter bschwindHN pointed to the obvious irony: even the Google article announcing this policy made him click "No thanks" twice before he could read it. Commenter mlmonkey asked the obvious question: "why are sites allowed to hijack the Back Button?!" The answer, of course, is that the browser gives JavaScript that power, and no amount of search penalties will be as effective as just removing it.


DEVELOPER TOOLING: NEW FEATURES, OLD ARGUMENTS

GitHub officially shipped Stacked PRs — a feature that lets developers chain pull requests together in sequence, so a big feature can be broken into reviewable chunks where each piece builds on the last. This is how Phabricator (a Facebook-era code review tool) and Gerrit (used heavily at Google) have worked for years, and its arrival on GitHub drew genuine excitement from developers who'd been manually approximating it with messy branch naming conventions. Commenter adamwk: "using GitHub and git again feels like going back to the stone ages" after Phabricator.

Cloudflare also announced a new unified CLI (command-line interface — a text-based tool for interacting with software without clicking through a web dashboard) that would cover all their products. The technical execution sounds solid. But commenter acedTrex captured an undercurrent of resentment: "Its so depressing that it took widespread LLM psychosis to finally get company leadership to invest in actual CLI tooling. No, the customers never mattered but the mythical 'LLM agent' is vitally important to cater to." Several commenters noted that Cloudflare's blog post explicitly mentioned AI agents as a target use case — which apparently unlocked budget that years of human developer requests had not.

The Firefox team published a quieter win: a 17% build speed improvement by caching generated code from WebIDL (a language used to define browser APIs). It's unglamorous engineering, but post author `__farre__` noted they did it on their own time because it was fun — only to face a comment section full of people telling them to focus on market share instead. Commenter sfink's response was perfect: "we never thought of that. Here we were, dedicating every one of our developers to speeding up the build."
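The caching trick behind the Firefox win generalizes: key the generated output by a hash of the generator's input and skip regeneration when the hash is unchanged. A generic sketch, not Firefox's implementation:

```python
import hashlib

class CodegenCache:
    """Memoize expensive code generation, keyed by input content hash."""

    def __init__(self):
        self._cache = {}      # input digest -> previously generated output
        self.generated = 0    # how many times the generator actually ran

    def generate(self, source, generator):
        key = hashlib.sha256(source.encode()).hexdigest()
        if key not in self._cache:
            self._cache[key] = generator(source)
            self.generated += 1
        return self._cache[key]

# Toy "generator" standing in for the expensive WebIDL-to-binding step.
cache = CodegenCache()
out1 = cache.generate("interface Node {};", lambda s: f"// bindings for: {s}")
out2 = cache.generate("interface Node {};", lambda s: f"// bindings for: {s}")
```

Persisting the cache to disk keyed the same way is what turns this into a cross-build win: unchanged WebIDL files cost a hash, not a codegen run.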


OPEN SOURCE IS QUIETLY WINNING

2 stories today underscored that free and open-source software is eating into subscription product markets in ways that felt unthinkable 5 years ago.

Blackmagic Design — the company behind DaVinci Resolve, the video editing software — announced DaVinci Resolve Photo, bringing professional-grade photo editing into the same free application. The response was electric. Commenter mturilin: "Lightroom has been the only reason I've stuck with a Mac. If I can switch to a photo editor that lets me process everything properly, skip the monthly subscription, and not have Adobe tracking all over my system — that's exactly what I want." Commenter geerlingguy identified the meta-story: "It's crazy that the RAW photo processing market is so underserved that a video editor can add on photo capabilities and it's immediately in the top 3 photo editors."

Meanwhile, a Nintendo Wii Jellyfin client (Jellyfin being the free, self-hosted alternative to Plex for streaming your own media library) surfaced on HN — and a commenter noted that Jellyfin has now surpassed Plex in install count on TrueNAS, the popular home server platform: 45,178 vs. 42,225. The gap isn't huge, but the direction of travel is clear. Plex's repeated pricing changes have been pushing users toward the open alternative for years, and the ecosystem is now producing clients for hardware as obscure as a 2006 Nintendo gaming console.


The day's themes rhyme: software we don't understand is becoming infrastructure we can't afford to break, platforms are quietly taking back control from developers in ways users mostly appreciate but rarely know about, and the tools developers actually use are finally getting attention — even if it took AI agents to justify the budget. And somewhere in a living room, someone is watching Netflix on a Wii.
TL;DR

- WordPress plugin backdoors, AI vibe-coded medical apps, and a broader essay on code responsibility converged into a single uncomfortable question about who owns software failures in 2026.
- Android silently stripped GPS data from shared photos and Google declared back-button hijacking spam — both protective moves that landed without developer warning or escape hatches.
- GitHub Stacked PRs and Cloudflare's new CLI brought long-requested developer tools, but commenters noted that AI agent use cases appear to have unlocked investment that human developer requests never did.
- DaVinci Resolve entering the photo editing market and Jellyfin surpassing Plex in popularity signal that free, open-source software is now genuinely competing with — and winning against — subscription-based incumbents.
