Pure Signal: AI Intelligence
Today's content converges on a single uncomfortable question: where does AI actually stop working, and are we measuring the right things to know when we've reached that point?
AI's Actual Ceiling: The Erdős Plateau and What It Reveals
Terence Tao's account of AI on Erdős problems, shared in conversation with Dwarkesh Patel, is among the most grounded capability analyses to surface recently. After a remarkable burst in which frontier models solved roughly 50 open problems (some standing for decades), the pace of purely autonomous solutions has dropped to near zero. "There was a month where that happened and that has stopped," Tao said, noting three separate systematic attempts that produced nothing new.
The pattern behind the 50 successes is telling. Almost all shared a specific structure: no serious existing literature, and solutions that required combining one obscure technique with one published result. That is the actual capability boundary: not raw problem difficulty, but a particular kind of combination task in a sparse literature environment. When researchers moved from cherry-picking wins to systematic sweeps, the real number surfaced: roughly a 1-2% success rate per problem. The impressive run looked like genius; it was scale applied to a favorable distribution.
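The scale effect is worth making concrete. A back-of-envelope model shows how a weak per-attempt rate still yields a headline run when many systems hammer many problems; the attempt and rate figures below are hypothetical illustrations, not numbers from Tao's account:

```python
# Back-of-envelope: a low per-attempt success rate, amplified by
# repeated independent tries, produces an inflated apparent capability.
# The 1.5% rate and 30-attempt count are hypothetical illustrations.

def p_any_success(p_per_try: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_per_try) ** attempts

single = p_any_success(0.015, 1)     # ~1.5% for one try
repeated = p_any_success(0.015, 30)  # ~36% that some try lands
```

Sweep a few hundred problems that way and dozens of one-off wins fall out, which is consistent with the burst-then-plateau pattern Tao describes.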
Tao's "dark mountain range" metaphor is precise. Current models are jumping machines that clear 2 meters — higher than any human — but they either succeed or crash. They cannot assess whether a given wall is 3 feet or a mile high, and they're consistently bad at generating partial progress or decomposing problems into tractable intermediate stages. The remaining ~600 open Erdős problems are now being chipped away through human-AI collaboration rather than one-shots.
This is a genuinely useful capability profile. Frontier models are exceptional at short-hop combination tasks in sparse territory, and structurally limited on anything requiring staged decomposition or self-assessment of difficulty.
The Adoption Gap: What Actually Moves the Needle
While mathematicians are probing AI's ceiling, organizational researchers are finding most companies haven't remotely approached it — because they're stuck on the floor.
A randomized field experiment on 515 startups provides some of the sharpest adoption data in recent memory. Founders shown concrete operational case studies (not general AI awareness materials) adopted AI 44% more, generated 1.9x higher revenue, and required 39% less capital compared to controls. The paper frames this around the "mapping problem" — firms struggle not with accessing AI tools but with identifying where those tools create value in their specific workflows. Case studies resolved this by giving founders a vocabulary for task-level integration.
Ethan Mollick draws a parallel structural argument at the organizational level. Most companies delegate AI strategy to middle management or IT, which he argues guarantees failure. His Leadership-Crowd-Lab framework assigns distinct roles: leadership must personally drive strategy and use the tools themselves; employees with domain expertise and genuine permission to experiment (the "crowd") are where the best use cases emerge; and a dedicated lab team — which Mollick says is "shockingly" absent at many large companies — builds institutional knowledge rather than relying on vendor demos.
Meta provides the cautionary counterexample. Its internal "Token Legends" leaderboard ranks employees by AI compute consumed, turning token usage into a status signal. Mollick flags this as a textbook case of rewarding A while hoping for B. When compute consumption becomes a proxy for productivity, you get more compute consumption, not more productive outcomes. The correct fix, as several practitioners independently argue, is outcome-aligned metrics: experiments shipped, latency budgets met, unit economics per successful inference.
The through-line is consistent: adoption is a knowledge and incentive problem, not a tool access problem.
Agents in the Wild: Identity, Failure Modes, and a Video Cost Collapse
The Every team has spent roughly 2 months deploying OpenClaw agents across their entire organization, and Dan Shipper's account is one of the more concrete practitioner reports on what actually happens when agents move from experiment to daily infrastructure.
The most interesting finding concerns identity and trust: agents that operate publicly inside team communication tools take on their owner's reputation. If you're known for a particular kind of judgment, your agent becomes known for it too. Shipper frames this as fundamentally different from a shared tool — "Claude is everybody's; a Claw is mine" — because the agent becomes a persistent, personalized reflection of its owner. This creates a novel organizational dynamic where individual agents develop distinct reputations within a team, and those reputations carry real weight.
The team also surfaced failure modes that don't appear in demos: memory gaps across sessions, group chat etiquette problems when multiple agents interact, and what they call the "ant death spiral" (a failure mode where agents reinforce each other's errors in a compounding loop). These aren't edge cases — they're the kinds of structural problems that only emerge when agents are running daily on real work.
Separately, a cost collapse in video understanding is creating new practical headroom. AI video analysis has dropped from roughly $6/hour to $0.14/hour over 18 months — a 97% reduction driven by Google's open-source Gemma 4 model enabling efficient frame processing at 2 FPS. The math: at current token pricing ($0.14/million input tokens), 1 hour of video at 2 FPS with ~140 tokens per frame runs about $0.14, with no complex frame-splitting engineering required. This 40x cost reduction makes economically viable a category of applications (live sports commentary, smart doorbell comprehension, automated review of security footage) that were impractical 18 months ago.
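The arithmetic behind that figure is easy to verify; the prices and token counts below are the article's own:

```python
# Verifying the article's per-hour video cost at current token pricing.
PRICE_PER_MILLION_TOKENS = 0.14  # dollars per million input tokens
FPS = 2                          # frames sampled per second
TOKENS_PER_FRAME = 140           # approximate tokens per encoded frame

frames_per_hour = 3600 * FPS                           # 7,200 frames
tokens_per_hour = frames_per_hour * TOKENS_PER_FRAME   # ~1.01M tokens
cost_per_hour = tokens_per_hour / 1_000_000 * PRICE_PER_MILLION_TOKENS
# ~0.14 dollars per hour of video
```

The numbers land almost exactly on $0.14/hour, which is why the article can quote the per-hour cost and the per-million-token price as the same figure.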
Making Inference Cheaper: KV Cache Compression
On the infrastructure side, Google's TurboQuant paper tackles KV (key-value) cache compression via vector quantization. The KV cache (which stores previously computed attention outputs to avoid recomputing them at each token) scales with model size and becomes the dominant memory bottleneck in long-context inference. TurboQuant applies a polar coordinates transformation (PolarQuant) followed by a quantized Johnson-Lindenstrauss transform, achieving up to 6x compression with no meaningful accuracy loss and no inference latency penalty. For comparison, NVIDIA's NVFP4 approach achieves 3x latency reduction and 50% memory reduction versus FP8 with sub-1% accuracy cost. Sebastian Raschka's explainer work on KV cache mechanics has become a standard reference for practitioners working through these tradeoffs.
The practical implication is direct: 6x memory reduction on the KV cache translates to longer effective context windows at the same hardware cost, or equivalently, more concurrent sessions per GPU. For anyone running long-context inference at scale, this is a meaningful engineering development.
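To see why 6x matters, a rough sizing sketch helps. The model dimensions below are a hypothetical 7B-class configuration with grouped-query attention, not figures from the TurboQuant paper:

```python
# Rough KV-cache sizing: two tensors (K and V) per layer, each holding
# kv_heads x head_dim values per token, stored at bytes_per_val each.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_val: int = 2) -> int:
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_val

# Hypothetical 7B-class model at a 128k-token context, fp16 cache:
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                          context_len=128_000)   # ~16.8 GB per sequence
compressed = baseline // 6                       # ~2.8 GB at 6x compression
```

Under these assumed dimensions, a single long-context session drops from roughly 16.8 GB of cache to under 3 GB, which is the difference between one session per accelerator and several.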
The unresolved question today's content surfaces: if AI's genuine per-problem success rate is 1-2%, and scale is what makes it look higher, how should practitioners think about autonomous agent deployment on tasks where success rates are hard to measure? The Every team's ant death spiral suggests error compounding is a real failure mode. Tao's plateau suggests difficulty self-assessment is a structural gap. These combine into a practical design question about when agents need human checkpoints — and how to build those checkpoints without defeating the purpose of automation.
TL;DR
- Terence Tao's Erdős analysis reveals a 1-2% per-problem success rate masked by scale and cherry-picking, with frontier models structurally unable to generate partial progress or assess problem difficulty in advance.
- A 515-startup RCT found concrete AI case studies drove 44% more adoption, 1.9x revenue, and 39% less capital — confirming the bottleneck is knowing where AI applies, not tool access; Mollick's Leadership-Crowd-Lab framework and Meta's Token Legends provide the organizational theory and cautionary tale.
- Practitioners deploying agents across organizations are finding that public-facing agents develop reputational identity and expose failure modes (memory gaps, ant death spirals) absent from demos, while AI video analysis costs have dropped 40x to $0.14/hour.
- Google's TurboQuant achieves 6x KV cache compression with no accuracy loss, directly enabling longer context windows or more concurrent inference at the same hardware cost.
Compiled from 9 sources · 23 items
- Yann LeCun (3)
- BAIR (3)
- Lilian Weng (3)
- Dan Shipper (3)
- Andrej Karpathy (3)
- Ethan Mollick (3)
- Dwarkesh Patel (2)
- Sebastian Raschka (2)
- Chip Huyen (1)
HN Signal: Hacker News
TL;DR
- France and South Korea both moved to treat digital infrastructure as a strategic public resource, signaling a wider shift away from US tech dependency.
- OpenAI acquired another beloved developer tool (Cirrus Labs) and shut it down, while a viral essay argued AI is being deployed primarily to frustrate and manipulate ordinary people.
- A developer built a searchable database of US presidential pardons, sparking sharp debate about constitutional design and the meaning of accountability.
- Bitcoin miners are reportedly losing $19,000 per coin produced — and the community is debating whether that headline is even meaningful.
Today on HN felt like a dispatch from a world slowly waking up to the question of who controls digital infrastructure — and what happens when the answer is "not you."
THEME 1: Governments Get Serious About Digital Sovereignty
Two of today's biggest stories shared an underlying logic: access to digital infrastructure is now a matter of national interest, and governments are starting to act like it.
France's digital agency announced plans to migrate away from Windows and toward Linux, with officials explicitly citing US technology as a "strategic risk." This isn't a fresh idea — the French gendarmerie switched over 70,000 desktops to Linux years ago — but the framing has sharpened considerably. The geopolitical context is doing real work here. Commenter redoh offered important historical grounding: France has been building toward this gradually, unlike Munich's infamous failed migration, which collapsed after a big-bang approach without enough internal expertise and, notably, after Microsoft moved its German headquarters there. "The pattern is pretty clear," redoh wrote, "incremental approach with internal investment works; big-bang without political cover doesn't."
The community was realistic about what was actually announced. Commenter idoubtit walked through the chain from actual government communiqué (a plan to write a plan, by year-end) to XDA's headline ("France is ditching Windows"), noting most people react to the title. Fair. But jlnthws got at the harder problem: "Moving from Windows to Linux is the 'easy', visible part. Replacing US cloud and US AI dependence end to end is much harder — and that's the real deal today."
Meanwhile, South Korea quietly introduced universal basic mobile data access, guaranteeing 400 kilobits per second (fast enough for text, maps, and messaging, but not video) after users exhaust their data plans. Commenter Leftium, posting from Korea, noted this isn't quite as revolutionary as it sounds — many Korean plans already offer throttled unlimited data at even higher speeds, sometimes up to 10 megabits per second. Still, codifying a baseline as a right signals something real about how policymakers are thinking about connectivity. The debate in comments was predictable but worthwhile: does "universal" mean anything if you still need to buy a phone and a plan? The honest answer is probably "it's a floor, not a ceiling" — but floors matter.
THEME 2: OpenAI's Gravity Well, and What Gets Pulled In
Cirrus Labs, makers of a widely-used continuous integration (CI) service — the automated testing and build infrastructure that software projects rely on to catch bugs before code ships — announced they're joining OpenAI. Effective June 1, Cirrus CI shuts down.
The reaction was a blend of disappointment and dark humor. Commenter emptysongglass lamented losing "the one cool CI thing with first-class Podman support." Seekdeep noted downstream damage immediately: major open-source projects like SciPy and PostgreSQL are now scrambling to replace their CI infrastructure with weeks' notice. Maxleiter floated a hypothesis about why OpenAI wanted them: Cirrus built Tart, apparently the most popular virtualization tool for Apple Silicon (the chips inside modern Macs), which would be valuable for running AI agent workloads in isolated environments.
Maxloh made the key distinction: this is a talent acquisition, not a product acquisition. OpenAI wanted the people; the service is collateral damage. Commenter dangus put the frustration bluntly: "We started a company to make a big difference in the world... and that's why we have now decided to become employee numbers 32,463 through 32,510 at one of the largest tech companies in the world." He added, charitably, that he'd probably have done the same thing.
This story pairs naturally with a viral essay from aphyr titled "The Future of Everything is Lies" (Part 5 of a series). Aphyr's argument, laid out in a long, richly specific post, is that AI is primarily being deployed not to help users but to insert friction, deflect accountability, and extract money — AI customer service bots with no real power to help, pricing systems that obscure true costs, bureaucracies using AI to make themselves less accountable rather than more responsive. Commenter vyr, who has worked closely with customer support teams, confirmed the dynamic: "ticket volume is always the big one," and every AI layer in front of a human agent is really about reducing that number, which also destroys the company's early warning system for product problems. Commenter Hoasi named the meta-trend: "The erosion and further diffusion of responsibility... LLMs are likely to make that much worse."
THEME 3: Civic Tech Built Because It Had To Be
A developer called vidluther built Pardonned.com, a searchable database of US presidential pardons. The reaction was enthusiastic — commenter soumyaskartha noted that "this kind of civic data should have been easily searchable for years" and that someone having to build it says a lot about how accessible government records actually are.
The comments went several directions at once: curiosity about Obama's relatively high pardon count (mostly non-marijuana drug offenses, it turns out), questions about why the January 6th pardons were excluded, and a sharp constitutional debate about whether the pardon power should exist at all. Commenter shimman called it "a vestigial leftover from monarchism." Others pushed for process reform — requiring a vote, limiting preemptive pardons. Whether or not you agree, it's a classic HN move: build the tool that makes the invisible visible, then let the community argue about what they're seeing.
THEME 4: When Mining Becomes Burning Money
A CoinDesk piece claiming Bitcoin miners are losing $19,000 per coin produced prompted both skepticism and a useful primer on how mining economics actually work. Commenter dmg cut through the drama: "this is literally how Bitcoin is designed to work" — miners exit, difficulty drops, costs fall, profitability returns. The interesting signal isn't the loss per coin; it's how long forced selling pressure persists during the lag between unprofitable mining and the network's automatic difficulty adjustment.
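dmg's point maps directly onto Bitcoin's retargeting rule: every 2016 blocks, difficulty scales by the ratio of the intended two-week window to the time those blocks actually took, clamped to a factor of 4 in either direction. A minimal sketch of that mechanic:

```python
# Simplified Bitcoin difficulty retarget: every 2016 blocks, scale
# difficulty so blocks return to a 10-minute average. The real rule
# clamps each adjustment to a factor of 4 in either direction.
TARGET_TIMESPAN = 2016 * 600  # two weeks, in seconds

def next_difficulty(current: float, actual_timespan: float) -> float:
    ratio = TARGET_TIMESPAN / actual_timespan
    ratio = max(0.25, min(4.0, ratio))  # consensus clamp
    return current * ratio

# Miners exit -> the 2016 blocks take twice as long -> difficulty halves:
after_exodus = next_difficulty(100.0, TARGET_TIMESPAN * 2)  # 50.0
```

The lag commenters flagged is exactly this window: up to 2016 blocks of forced selling pressure between mining turning unprofitable and the adjustment firing.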
Commenter Geee pushed back on the headline itself, pointing out that mining costs exist on a distribution curve — only the highest-cost miner is at break-even at any given time; everyone cheaper is profitable. Commenter delusional explained why miners don't simply turn off: "If you bought a bunch of hardware to mine Bitcoin, not using that hardware represents a 100% loss of value." Sunk costs, in other words, are a powerful motivator for apparently irrational behavior.
A thread on SABRE — the airline reservation system built in the 1950s that still processes 50,000 transactions per second with sub-100ms latency — offered an unintentional counterpoint to everything else today. Paulnpace put it simply: "Eat that, Bitcoin." There's something clarifying about a 60-year-old system that works because it was designed with extraordinary care for a specific problem. Amid the acquisitions, the sovereignty scrambles, and the AI annoyance layers, the most durable infrastructure is often the stuff built to do one thing well and then left alone to do it.