Pure Signal AI Intelligence



PURE SIGNAL May 22, 2026

Today's content converges on one structural argument: the agent runtime layer is crystallizing as a distinct and lucrative infrastructure category, while a cluster of research results challenges assumptions about what actually drives model capability.


Every Agent Needs a Computer — Not Just a Sandbox

The clearest articulation of this idea comes from Swyx's long-form conversation with Daytona CEO Ivan Burazin, which is worth treating as a primary document for anyone building on top of agents. Burazin's core argument is that the market has systematically misunderstood what agents need: not code execution boxes, but composable computers with all the properties a human machine has — stateful persistence (close the lid, open it, same state), instant startup, dynamic resource resizing, and production-grade isolation.

Daytona's numbers reflect what that architecture makes possible: 60ms to spin up 1 sandbox, 75 seconds to spin up 50,000 concurrently, and roughly 850,000 sandboxes per day for their largest customer. They run on bare metal with a proprietary scheduler (descended from their earlier CodeAnywhere IDE work), which is why their startup times are dramatically faster than those of Kubernetes-based competitors, some of which take 2,000 seconds (a little over 30 minutes) for comparable concurrent loads.
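
The throughput implied by those figures is easy to sanity-check. The following is a back-of-envelope sketch in Python using only the numbers quoted above. The variable names are illustrative, and the daily average assumes load is spread evenly across the day, which real agent workloads are not.

```python
# Figures quoted in the briefing; names are illustrative, not Daytona's.
BURST_SANDBOXES = 50_000
BURST_SECONDS = 75
K8S_SECONDS = 2_000            # slow Kubernetes-based figure cited above
DAILY_SANDBOXES = 850_000      # largest customer, per day
SECONDS_PER_DAY = 24 * 60 * 60

burst_rate = BURST_SANDBOXES / BURST_SECONDS         # sandboxes started per second in a burst
daily_avg_rate = DAILY_SANDBOXES / SECONDS_PER_DAY   # per second, if spread evenly
speedup = K8S_SECONDS / BURST_SECONDS                # ratio of the two startup times
k8s_minutes = K8S_SECONDS / 60

print(f"burst rate:       {burst_rate:.0f} sandboxes/s")
print(f"daily average:    {daily_avg_rate:.1f} sandboxes/s")
print(f"startup speedup:  {speedup:.1f}x")
print(f"2,000 seconds is: {k8s_minutes:.1f} minutes")
```

The gap between the burst rate and the daily average is the point: capacity has to be sized for the burst, not the mean.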

The market surprise has been the composition of that demand. Reinforcement learning and eval workloads went from 0% to roughly 50% of Daytona's usage in just a few months. This creates an infra problem with no historical precedent: spiky, unpredictable zero-to-100,000 CPU bursts, as opposed to the "follow the sun" demand patterns of long-running background agents like Lovable or Cognition. Daytona's mean utilization sits at 15%, spiking to 90% — essentially the geographic load-balancing problem Cloudflare solved for web traffic, but harder because RL runs don't follow timezone patterns and can't be smoothly geo-distributed. Burazin notes this is a shared problem across the agent infra category; Neon, Parallel, and others are all navigating the same spike topology.

Burazin's argument about the robotic process automation (RPA) resurgence is the most practically interesting part of the conversation. Most enterprise knowledge work is locked in legacy Windows applications that no one will rewrite. If agents can operate those systems directly via computer use — the same approach that made old RPA tools like UiPath valuable — the total addressable market is enormous. His rough math: 40% automation of $25 trillion in US knowledge-worker salaries implies something like a $10 trillion opportunity, which is why Daytona is betting heavily on Windows sandboxes.

This thesis maps directly onto what OpenAI shipped in the Codex update this week. Codex can now use desktop apps on a locked Mac from a second device, plus a "goal mode" for multi-hour autonomous runs. The framing in both cases is identical: agents as persistent, cross-device operators, not interactive code assistants. Burazin also flagged an Apple licensing constraint that practitioners building on macOS should know: Apple limits VMs to 2 per physical machine with a 24-hour licensing window per user, and memory snapshots cannot be moved across physical machines. Windows is actually the easier environment for agentic computer use, which is counterintuitive given developer preferences but follows directly from licensing terms.

One other friction point Burazin identifies has no good solution yet: GitHub was built for the post-laptop outer loop and is showing strain at agent-generated PR volumes. One customer was serializing entire codebases to JSON on a timer and pushing to S3 rather than using Git — because it was faster and simpler than managing Git state across agent runs. With some teams generating 1,000 PRs a day, CI queuing has become the bottleneck. The agent-native versioning layer hasn't been built yet.
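
To make the workaround concrete, here is a minimal sketch of what "serialize the codebase to JSON" could look like. It is an assumption about the general shape, not the customer's actual code: the function names `snapshot_tree` and `restore_tree` are hypothetical, binary files are skipped, and the upload to object storage is left out.

```python
import json
import tempfile
from pathlib import Path


def snapshot_tree(root: Path) -> str:
    """Serialize every UTF-8 text file under `root` into one JSON document."""
    files = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            files[path.relative_to(root).as_posix()] = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # this sketch skips binary files
    return json.dumps({"files": files}, sort_keys=True)


def restore_tree(snapshot: str, dest: Path) -> None:
    """Recreate the files from a snapshot under `dest`."""
    for rel, text in json.loads(snapshot)["files"].items():
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        (Path(src) / "app.py").write_text("print('hello')\n", encoding="utf-8")
        snap = snapshot_tree(Path(src))   # this string is what would be uploaded
        restore_tree(snap, Path(dst))
        print(sorted(p.name for p in Path(dst).iterdir()))
```

The appeal is that each snapshot is a single self-contained blob with no branch, lock, or merge state to manage. The cost is everything Git provides: history, diffs, and deduplication.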


The Boring Infrastructure Layer Is Where the Wealth Is Accruing

Swyx's AINews recap captures a cluster of milestones that together make the "boring infrastructure" thesis concrete. Turbopuffer crossed $100M annual recurring revenue (ARR) in March — 19 months from $1M ARR — while remaining profitable on less than $1M raised in total. Modal raised $355M at a $4.65B valuation. The language Turbopuffer's team uses is instructive: "the magic happens with AI when it draws in just the right context," which means almost every product differentiation problem eventually reduces to a retrieval problem. That's their core business, and the milestone suggests the market is validating the thesis.

Epoch AI's component spending data adds structural context. High bandwidth memory (HBM) grew from 52% to 63% of total AI chip component spending between Q1 2024 and Q4 2025, as memory has become the binding constraint at inference time. A useful compute taxonomy from the AINews analysis: US leaders (OpenAI, Anthropic, Google, with Meta and xAI joining) operating in the multi-gigawatt class; Chinese giants scaling toward multi-GW on domestic stacks; European contenders like Mistral at roughly 90 MW today, targeting 1 GW by 2029. The exact numbers are debatable but the tiering is consistent with other accounts.

One small but structurally notable move: Weaviate shipped a built-in MCP (model context protocol) server inside the database, letting coding agents ingest a repository and query with hybrid BM25 plus vector retrieval without spinning up extra processes. This is the retrieval and agent runtime layers collapsing together — the stack is getting shorter.
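
For readers unfamiliar with hybrid retrieval: the idea is to merge a keyword (BM25) ranking with a vector-similarity ranking into one result list. The sketch below uses reciprocal rank fusion, one common merging technique. It illustrates the idea only and is not Weaviate's implementation or API; the file names and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["auth.py", "README.md", "db.py"]       # hypothetical keyword hits
vector_ranking = ["db.py", "auth.py", "schema.sql"]    # hypothetical embedding hits

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales.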


Harnesses and Scaffolding Are Bigger Capability Multipliers Than They Appear

One of the more practically useful findings from this week's AI research activity: a "physics-intern" harness boosted Gemini 3.1 Pro from 17.7 to 31.4 on a science-problem benchmark, surpassing GPT 5.5 Pro in that setup. The critical nuance: GPT 5.5 Pro gained nothing from the same harness, suggesting model-specific absorption of scaffolding tricks. For practitioners, this is a direct warning: benchmark comparisons that don't account for harness effects may be systematically misleading about which model is actually better for a given task class. The capability of a model plus its scaffolding is not decomposable in a clean way.
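
The size of the harness effect is worth computing explicitly. This is a small sketch using only the two Gemini scores quoted above; the helper name `relative_gain` is illustrative.

```python
def relative_gain(before: float, after: float) -> float:
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


gemini_bare, gemini_harnessed = 17.7, 31.4

absolute_gain = gemini_harnessed - gemini_bare
rel_gain = relative_gain(gemini_bare, gemini_harnessed)
ratio = gemini_harnessed / gemini_bare

print(f"absolute gain: {absolute_gain:.1f} points")
print(f"relative gain: {rel_gain:.0%}")
print(f"score ratio:   {ratio:.2f}x")
```

A swing of this size from scaffolding alone is larger than the gap between many adjacent models on a leaderboard.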

Gemini 3.5 Flash ranked first on APEX-Agents-AA this week, outperforming larger models, and was used to demonstrate a GitHub issue triage agent built with a single API call and no orchestration framework. The practical synthesis from agent design discussions: start with single-agent systems, move to manager/sub-agent or decentralized multi-agent topologies only when tool sprawl or prompt bloat becomes unmanageable — not as a default architectural choice.


Three Research Signals Worth Tracking

Data filtering at scale: DCLM results suggest that with sufficient compute, the best data filter may be no filter at all, with the crossover for internet-scale pretraining pools landing around 1e30 FLOPs. This is counterintuitive — aggressive curation is widely assumed to be strictly better — and the downstream evals are noisy. But the directional implication is that filtering strategies optimized for current compute budgets may be local optima that break at frontier scale.

Sparse autoencoder (SAE) geometry: Goodfire AI's argument this week is a useful update to the current mechanistic interpretability discourse. The critique that models think in curved manifolds while SAEs use straight-line features is only partly right, they argue. Their fix is to cluster SAE features by joint firing patterns, recovering geometry through feature groups rather than isolated atoms. The practical upshot: interpretation should move from single features to structured ensembles, not abandon the sparse feature framework entirely.
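
A toy version of "cluster features by joint firing patterns" might look like the sketch below. It is a stand-in for intuition, not Goodfire's method: it measures co-firing with Jaccard overlap and links features with single-linkage grouping, and the feature names, data, and threshold are all hypothetical.

```python
from itertools import combinations


def cofiring_groups(firings, threshold=0.5):
    """Group features whose firing sets overlap (Jaccard) at or above `threshold`.

    `firings` maps a feature name to the set of input indices where it fired.
    Features are linked transitively (single linkage) using union-find.
    """
    parent = {f: f for f in firings}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in combinations(sorted(firings), 2):
        union = firings[a] | firings[b]
        jaccard = len(firings[a] & firings[b]) / len(union) if union else 0.0
        if jaccard >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in firings:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: three features fire on overlapping inputs, one does not.
firings = {
    "mon": {0, 1, 2, 3},
    "tue": {1, 2, 3, 4},
    "wed": {2, 3, 4, 5},
    "python_kw": {10, 11, 12},
}
groups = cofiring_groups(firings)
print(groups)
```

Note that "mon" and "wed" land in the same group only through "tue". That chaining is how a group of straight-line features can trace out a curved structure that no single feature captures.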

AI and formal mathematics: OpenAI's reported result on the Erdős unit-distance problem generated more noise than signal in real time, including immediate debunking claims. The interesting meta-point — even from skeptics like Timothy Gowers — is that mathematics functions as a relatively legible frontier for AI-assisted research because outputs can be checked, extended, and debated. The "goalpost moving" critique about what counts as legitimate AI mathematics is real but orthogonal to the observability advantage that makes math an unusually honest benchmark domain.


The Practitioner Angle: AI as a Data Interface

Simon Willison shipped Datasette Agent, a conversational AI layer on top of Datasette that translates natural language to SQLite queries. The live demo runs on Gemini 3.1 Flash-Lite (cheap, fast, and reliable enough for SQL tool calls), suggesting the low-cost model threshold for production database querying has largely been crossed. Willison notes that models released in the past 6 months are increasingly capable of reliable tool calls and SQLite query generation; the constraint is no longer the model but the surrounding plumbing.
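
Much of that "surrounding plumbing" is about running model-written SQL safely. The sketch below shows one way a SQL tool could refuse writes, using SQLite's `PRAGMA query_only`. It is an illustration under assumptions, not Datasette Agent's actual implementation; the function name `run_readonly_sql` and the `posts` table are hypothetical.

```python
import sqlite3


def run_readonly_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 100):
    """Execute model-generated SQL with writes disabled; return at most max_rows rows."""
    conn.execute("PRAGMA query_only = ON")
    try:
        return conn.execute(sql).fetchmany(max_rows)
    finally:
        conn.execute("PRAGMA query_only = OFF")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (title) VALUES (?)", [("hello",), ("world",)])
conn.commit()

# A read query, as a model might generate for "how many posts are there?"
rows = run_readonly_sql(conn, "SELECT COUNT(*) FROM posts")

# A destructive query is rejected by SQLite itself rather than by string matching.
try:
    run_readonly_sql(conn, "DELETE FROM posts")
    write_blocked = False
except sqlite3.Error:
    write_blocked = True

print(rows, write_blocked)
```

Letting the database enforce read-only access is more robust than trying to detect dangerous statements in the generated text.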

The architectural choice worth noting is that Datasette Agent is extensible via plugins (charts via Observable Plot, image generation via ChatGPT Images 2.0, code execution via Fly Sprites sandboxes). This is essentially the same "just the right context" thesis that Turbopuffer is building a $100M business on, implemented at the individual practitioner scale. The data layer and the AI layer are collapsing together in both directions.


The unresolved question today's content surfaces: if harness effects can swing benchmark outcomes by nearly 2x, and RL/eval workloads can grow from 0% to roughly 50% of sandbox usage in months, how much of the apparent frontier model capability gap is actually a scaffolding and evaluation harness gap? The answer has direct implications for teams deciding where to invest: in better models, or in a better surrounding runtime. Right now, the infrastructure companies are the ones printing money.

TL;DR
- Agent infra is crystallizing around "composable computers": Daytona's architecture (60ms startup, 50K sandboxes in 75 seconds, bare-metal scheduler) and OpenAI Codex's cross-device locked-Mac operator mode converge on the same thesis. Agents need stateful, persistent computers, not code execution boxes, and RL workloads are driving demand in ways nobody predicted.
- Boring infrastructure is winning at scale: Turbopuffer hit $100M ARR profitable on less than $1M raised; Modal raised $355M at $4.65B; HBM now accounts for 63% of AI chip component spending, reflecting memory as the new binding constraint.
- Harnesses beat raw model performance on specific task classes: A physics-intern harness lifted Gemini 3.1 Pro from 17.7 to 31.4, past GPT 5.5 Pro, while GPT 5.5 Pro gained nothing from the same harness. Benchmark comparisons without harness accounting are unreliable for practitioners choosing models.
- Data filtering may be counterproductive at frontier scale: DCLM results suggest no filter outperforms aggressive curation past roughly 1e30 FLOPs, challenging the standard assumption that more careful data selection is always better.

Compiled from 3 sources · 8 items
  • Simon Willison (5)
  • Swyx (2)
  • Rowan Cheung (1)

HN Signal Hacker News

Today on Hacker News felt like a slow-motion audit of the AI boom's second-order effects — not "AI is impressive" but "AI is expensive, invasive, and quietly breaking infrastructure we depend on." Three separate stories converged on the same reckoning from different angles: hardware supply chains, search advertising, and historical archives. The community also worried about robots driving into floods, debated the etiquette of AI-generated workplace messages, and found genuine warmth for a fan-made interactive star chart from a sci-fi film. A full day.


The AI Tax: Memory, Advertising, and the Vanishing Archive

An article from davidoks.blog made one of the day's most striking arguments: the 40-year trend of consumer electronics getting dramatically cheaper is ending, and AI is the culprit. The piece traces how a $30 Tecno Spark Go smartphone today outperforms a $19,400 (inflation-adjusted) IBM PC from 1985 — one of the most extraordinary redistributions in economic history. But the International Data Corporation now predicts worldwide smartphone shipments will fall 13% in 2026 — the largest single-year decline ever — with Africa and the Middle East seeing drops above 20%. The mechanism is memory: AI systems require enormous quantities of high-bandwidth memory (HBM) for their GPU racks, and the global memory supply is deeply inelastic (a single DRAM fabrication facility costs $15–20 billion and takes years to ramp up to full production). When HBM demand surges, the supply of standard consumer memory (DDR for laptops, LPDDR — "low-power double data rate" — for phones) shrinks and prices spike, and the cheap smartphone that brought hundreds of millions of the world's poorest people online is becoming unaffordable.

Simultaneously, Google announced it's weaving its Gemini AI directly into search advertising. New formats include "Conversational Discovery Ads" — where Gemini synthesizes product information and personalizes explanations for why a specific product might suit a given user — alongside an "independent AI explainer" that Google claims will "build trust." The company cites its own data: 75% of people report making faster, more confident decisions using AI Mode in Search. Google is also expanding a "Business Agent for Leads" that embeds a brand AI chatbot directly inside an ad. All formats will remain labeled "Sponsored."

Then there's the Internet Archive. Nieman Lab's reporting revealed that more than 340 local news sites have now blocked the Wayback Machine's web crawler, a number that has kept rising since January when The New York Times, The Guardian, and USA Today Co. first started blocking it. Many blocking outlets are owned by hedge fund-backed chains like Alden Global Capital. Critically, no publisher has confirmed to Nieman Lab that an AI company has actually scraped their content from the archive — the blocking is precautionary fear. Researchers, historians, and working journalists depend heavily on these archives; one newsletter editor covering a rural "news desert" described the Wayback Machine as essential to his reporting.

Commenter simonw noted the memory story's headline undersells it: this is really a deep explanation of how HBM demand from AI GPU racks starves the consumer memory supply chain. mrandish added that device maturation compounds the problem — phones stopped meaningfully improving around 2020, so upgrade cycles were already lengthening before the price shock arrived. On Google's ad announcement, the reaction was a collective eye-roll: ablation summarized it as "well, yes — they're an ad company," while Eldodi raised the sharper concern: it's far easier to mislead users with an AI-generated ad than with a traditional search result, and the "independent AI explainer" framing blurs that line uncomfortably. FinnKuhn wondered why Google didn't wait for OpenAI to move first — Google could have captured users fleeing an ad-cluttered competitor. On the Internet Archive, remus worried about a future where "a huge percentage of this content is lost forever," and xp84 was blunt about the underlying logic: the bet that blocking the archive converts archive readers into paying subscribers "strains credulity when most of these outlets are owned by PE companies looking to extract value."


AI Slop and the Agent Invasion

A pointed little site called noslopgrenade.com launched with one purpose: giving you a link to send colleagues who paste AI-generated walls of text into Slack when a sentence would do. The site's premise is that AI makes it trivially easy to produce voluminous, professional-sounding text, which destroys the implicit social contract of conversational mediums — nobody writes essays in Slack, and pasting a document in response to "Should we use Redis or Memcached?" is, the site argues, a form of conversational hostility. A companion Show HN (no public article — the discussion is the substance) called Agent.email takes the phenomenon in a different direction: it offers email inboxes that AI agents can sign up for autonomously via a single `curl` command, with a human OTP (one-time passcode) verification step to tie each agent to a real person. The pitch is that as AI agents increasingly act on our behalf — scheduling, researching, transacting — they need communication infrastructure that doesn't flood their owners' inboxes.

The slop site triggered a nearly recursive discussion. automatic6131 noted the irony immediately: "Oh look, another blog post that should have been a comment." hootz rejected even the site's closing advice — "use AI to make things clearer, not longer" — with "NO! STOP USING AI AND JUST TALK." shevy-java shared a real case from the ffmpeg mailing list where a lead developer spammed a proposal with AI-generated text and "expected others to engage with it." zaphar coined a term that gained traction: TL;DP ("Too Long; Didn't Prompt") — for when a sender makes their own AI summarize something rather than asking their recipient's AI to do it. Agent.email was met with existential unease more than enthusiasm. ClaridocsCTO stated it plainly: "Agents shouldn't be the first-class users of the internet. We are creating a future we wouldn't want to live in." sanjayparekh reported already receiving spam from AI agents using a competitor service, "claiming they aren't AI agents."


Autonomous Systems Meet the Messy Real World

Waymo has paused service in 4 cities (Atlanta, San Antonio, Dallas, and Houston) after its robotaxis repeatedly drove into flooded streets. An unoccupied vehicle in Atlanta entered a flooded intersection, got stuck for about an hour, and had to be physically recovered. Waymo had already issued a software recall after earlier incidents, shipping an update to restrict vehicles in locations with "an elevated risk of encountering a flooded, higher-speed roadway." But the Atlanta storm produced flooding before the National Weather Service (NWS) issued any official warning, and NWS alerts are part of the signal set Waymo relies on to prepare its fleet. The company is also under investigation by both the NHTSA (National Highway Traffic Safety Administration) and NTSB (National Transportation Safety Board) for a separate pattern: vehicles illegally passing stopped school buses, a behavior that continued even after Waymo shipped a fix.

Separately, Prism Reports published an investigation into Seattle Shield, a police-operated intelligence-sharing network connecting the Seattle Police Department with Facebook, Amazon, ICE (Immigration and Customs Enforcement), the US Navy, private security firms, and hundreds of other entities — quietly running since 2009. Documents obtained through public records requests show that in 2025, the bulletins were "almost exclusively about protests and potential traffic delays caused by protests." One October 2025 bulletin warned about events related to the Hamas attack anniversary while omitting mention of widespread anti-Palestinian attacks in the US that year. The ACLU of Washington told Prism it hadn't been tracking the network at all.

paxys got the sharpest Waymo quip: "Driving through an obviously flooded street and getting stuck in the middle? Yeah, these cars have achieved human-level intelligence." But losvedir offered a reframe worth sitting with: this is really "Waymo pauses service due to weather" — no different from airports halting flights in a storm — and autonomous systems may simply need hard operational limits the way other infrastructure does. jvanderbot identified the deeper structural tension: removing the human driver also removes the person who might simply refuse to go out. On Seattle Shield, booleandilemma landed the principle: "Having a coalition of mega corporations allied with each other isn't any better than having a strong government. Both are dangerous to personal liberties." jedahan reminded readers — pointedly, given the HN audience — that many people on this site work for member companies: "you are actively enabling this."


The DIY Stack and Paying for Quality

An independent researcher documented building "grumbl," a 6-GPU RTX 6000 Ada home server costing $48,000, after quitting a FAANG (large tech company) job to do independent AI research. The core logic: if the extra compute helps their work succeed just 2 months sooner than renting cloud GPUs would allow, the upfront cost pays for itself. The build had to draw power from 2 separate apartment electrical circuits (a professional builder was hired to avoid fire risk), and the server ultimately ended up in the author's parents' basement after a move. A careful analysis, based on logging GPU utilization every minute, found the server would need roughly 85% or higher utilization for a full year just to break even against cloud pricing, before accounting for electricity costs and declining cloud prices over time.
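
The break-even logic can be written down in a few lines. This sketch uses only the figures quoted above ($48,000, 6 GPUs, about 85% utilization over one year) and ignores electricity, resale value, and falling cloud prices. The implied hourly rate is derived from those figures, not taken from the author's post, and the function names are illustrative.

```python
HARDWARE_COST_USD = 48_000
NUM_GPUS = 6
HOURS_PER_YEAR = 365 * 24
QUOTED_BREAK_EVEN = 0.85   # utilization figure from the write-up


def implied_cloud_rate(hardware_cost, num_gpus, utilization, hours=HOURS_PER_YEAR):
    """Per-GPU hourly cloud price at which owning breaks even after `hours`."""
    return hardware_cost / (num_gpus * hours * utilization)


def break_even_utilization(hardware_cost, num_gpus, cloud_rate, hours=HOURS_PER_YEAR):
    """Fraction of the period the GPUs must be busy to match cloud spend."""
    return hardware_cost / (num_gpus * hours * cloud_rate)


rate = implied_cloud_rate(HARDWARE_COST_USD, NUM_GPUS, QUOTED_BREAK_EVEN)
print(f"implied cloud price: ${rate:.2f} per GPU-hour")

# If cloud prices fell by a quarter (a hypothetical scenario), the bar rises past 100%.
cheaper = break_even_utilization(HARDWARE_COST_USD, NUM_GPUS, rate * 0.75)
print(f"break-even utilization at 25% cheaper cloud: {cheaper:.0%}")
```

The second figure shows why declining cloud prices matter: a modest price drop pushes the one-year break-even point above full utilization, meaning the purchase cannot pay for itself within the year on cost alone.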

On the opposite end of the budget spectrum, a developer migrated a personal blog that had run on Ubuntu 16.04 (past end of support for 5 years) to a FreeBSD virtual private server at Hetzner for under €6/month, half the cost of their old DigitalOcean droplet, with double the RAM. The post introduces FreeBSD Jails (lightweight isolation containers that predate Docker and serve a similar purpose) via a management tool called Bastille, with benchmarks showing competitive performance for serving static sites. Separately, Kagi, the paid, ad-free search engine funded entirely by subscriptions, got a warm write-up from a user with low vision who found that removing ads, auto-playing media, and AI summaries from search results dramatically reduced her visual fatigue and cognitive load. Kagi's "Fair Pricing" policy credits your account for any month you don't actually use it.

gosub100 flagged what the GPU server analysis skips: hardware failure, theft, and fire all shift from the cloud provider to you personally. janalsncm found the broader implication more interesting than the ROI math — the existence of a "GPU middle class" doing serious research on modest hardware may be more valuable than trillion-dollar supercluster bets. On FreeBSD, waynesonfire said the migration gave them "new eyes into what Linux was and is," citing a values shift in the Linux ecosystem as much as a performance argument. kylec admitted to still running Ubuntu 16.04 with an uptime of 1,281 days: "at this point I'd feel bad rebooting it." On Kagi, the community was nearly unanimous in enthusiasm — bandrami called it "the best service provider change I've made in years," while tamimio raised the lone contrarian note: tying all your searches to a payment-linked account is its own form of privacy exposure.


Today's HN had a clear through-line beneath the surface: the AI boom is a massive reallocation of resources — memory, attention, advertising inventory, and historical records — away from the open, cheap, accessible web that billions of people rely on. The community's response, characteristically, is to build around it: self-hosted servers, paid search, decentralized networks, home GPU rigs. Meanwhile, 918 people upvoted an interactive 3D star map reconstructing the navigation chart from Project Hail Mary, built from real astronomical data by a fan who loved the book. No ad formats. No surveillance networks. Just someone making something beautiful. That part was genuinely nice.

TL;DR
- AI's voracious demand for high-bandwidth memory is killing cheap smartphones in the developing world, while Google embeds Gemini into search ads and 340+ news outlets block the Internet Archive over AI-scraping fears: three facets of the same resource reallocation.
- "Slop grenades" (AI-generated walls of text in workplace chat) have their own shaming site now, and a new service giving AI agents their own email addresses prompted alarm that machines are colonizing human communication channels.
- Waymo's robotaxis drove into floods across 4 cities, revealing how autonomous systems struggle with conditions a human would simply refuse to navigate, while an investigation exposed a surveillance network quietly connecting Seattle Police with Amazon, Facebook, and ICE since 2009.
- The DIY computing scene is actively pushing back: a $48K home GPU server, a FreeBSD migration at €6/month, and near-universal Kagi enthusiasm all reflect a community choosing independence and quality over cloud defaults and ad-supported search.