Willison & Thompson AI Signal — Featured
Today is a light day — 2 short curatorial posts from Simon Willison, both worth sitting with for what they imply together.
Simon Willison — The NYT publishes a fabricated AI-generated quote as journalism
The New York Times issued an editors' note correcting a story in which a quote attributed to Canadian Conservative leader Pierre Poilievre was not a quote at all — it was an AI-generated summary of his views, rendered as direct speech. The specific fabrication: Poilievre was attributed as calling party-switchers "turncoats." He did not use that word in the referenced April speech.
Willison flags this with minimal commentary, letting the editors' note speak for itself. The failure mode is straightforward: a reporter used an AI tool, the tool returned a plausible-sounding paraphrase dressed as a quotation, and the reporter did not verify it against the source material. The AI didn't hallucinate a fake event — it hallucinated a fake voice for a real event, which may be the more dangerous failure mode because it's harder for editors and readers to catch without going back to primary sources.
The tags Willison chose are precise: `hallucinations`, `journalism`, `ai-ethics`. No editorializing needed.
Simon Willison — Quoting Andrew Quinn on the optimal number of wheels to reinvent
Willison surfaces a footnote from Andrew Quinn's piece on replacing a 3 GB SQLite database with a 10 MB finite state transducer (FST) binary — but the philosophical observation in that footnote stands entirely on its own. Quinn describes the trap of never building things yourself for fear that a better solution already exists (the awk example: why write a TSV search-and-replace when awk already solves that class of problem?).
Quinn's claim is that the right number of wheels to reinvent is not zero and not a thousand — it's roughly 4 or 5 in most domains, and 20-30 in highly formalized fields like mathematics or computer science. Each reinvention, combined with directed questioning along the way, propels the learner toward the actual frontier of a field faster than either passive study or exhaustive from-scratch construction. The mechanism isn't mystical: you only identify what you don't know by building, and you only know when to stop building by having built enough to recognize existing solutions when you encounter them.
Willison's decision to highlight this in a week when AI-assisted coding is ubiquitous is the real editorial act here. He doesn't say so explicitly, but the implication hangs in the air: if AI tooling makes it trivially easy to skip the reinvention phase entirely, Quinn's calibration question becomes urgent — how many wheels does a developer need to have built themselves before AI assistance accelerates rather than bypasses their growth?
Synthesis
Today's 2 pieces, read together, circle the same underlying problem from opposite directions. The NYT incident is a case study in what happens when a practitioner lacks the domain-specific judgment to verify AI output — the reporter presumably couldn't catch the fabricated quote because they weren't close enough to the source material to know it was wrong. Quinn's framework offers a structural explanation for how that gap forms: if you skip the reinvention phase, you don't develop the calibrated skepticism that comes from having built things yourself and knowing where they break.
The connection isn't just thematic. The NYT failure isn't a hallucination problem in isolation — it's a verification-culture problem that AI makes newly consequential. Hallucinations have always existed in research tools (bad citations, misremembered sources); what's changed is the fluency and confidence with which AI outputs land, making them harder to flag for checking. Quinn's point about directed questioning being as important as the building itself is relevant here: the skill being lost may not be coding or writing per se, but the habit of asking "how would I know if this is wrong?"
Neither piece is alarmist. Quinn is explicitly optimistic about getting to the frontier faster with the right balance. Willison is characteristically dry. But the gap between "AI accelerates the competent" and "AI exposes the incompetent" is the same gap — and today's content quietly marks where that line sits.
TL;DR
- A real AI journalism failure: the NYT published an AI-generated paraphrase as a direct quote from a sitting politician, falsely putting the word "turncoats" in his mouth
- Quinn argues the optimal number of things to build from scratch is ~4-5 in most domains, ~20-30 in rigorous fields — enough to reach the frontier, not so many you never get there
- The two pieces together surface a single latent question: without the reinvention phase, do practitioners lose the calibrated skepticism needed to catch AI errors before they publish?
Compiled from 1 source · 2 items
- Simon Willison (2)
HN Signal — Hacker News
Today's HN had the feeling of a community taking stock. Multiple threads, from wildly different angles, converged on the same core tension: what are the real costs of letting platform giants — Google, Apple, AWS, Anthropic — run the infrastructure of your digital life? Woven through that was a practical counterthread: builders actively trying to route around dependency, one local model and one hand-crafted architecture at a time.
The Great AI Dependency Audit
The day's most-discussed cluster started with a direct challenge from the developer behind The Brutalist Report, a deliberately minimal news aggregator. His iOS app generates article summaries entirely on-device using Apple's local AI APIs — no server detours, no vendor account, no data retention footnotes required. His argument isn't anti-AI; it's anti-reflexivity. Developers reaching for an OpenAI or Anthropic API call out of habit are, he writes, "taking a UX feature and turning it into a distributed system that costs you money." The silicon in your pocket has a dedicated Neural Engine sitting mostly idle while your app waits for a JSON response from a server farm in Virginia. His closing line lands hard: "AI everywhere" is not the goal. Useful software is the goal.
A practical companion piece showed what "local AI" actually looks like in 2026. After testing several models that technically fit in memory but were unusable in practice, the author settled on Qwen 3.5 9B quantized, achieving ~40 tokens per second with thinking enabled and a 128K context window on a 24GB M4 MacBook Pro. The setup friction (choosing between Ollama, llama.cpp, and LM Studio; tuning temperature, top_k, and cache quantization types) is real but surmountable. The author's enthusiasm is genuine: "No internet connection required. Not to mention a way of reducing your dependence on big US tech, even if just a tiny bit."
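The knobs the author mentions map directly onto llama.cpp's command line. A minimal sketch, assuming a locally downloaded GGUF file (the flags are real llama.cpp options, but the model filename, quantization level, and sampling values here are illustrative placeholders, not the author's exact setup):

```shell
# Hedged sketch: real llama.cpp flags, placeholder model file and values.
llama-cli \
  -m ./qwen3.5-9b-instruct-q4_k_m.gguf \
  --ctx-size 131072 \
  --temp 0.6 \
  --top-k 40 \
  --n-gpu-layers 99 \
  -p "Summarize the following article: ..."
```

Ollama and LM Studio hide most of these flags behind defaults and config files, which is roughly the trade-off the post describes: less friction, less control.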
The most honest vibe-coding (building software by steering an AI rather than writing code directly) post-mortem in months came from a developer who built k10s — a GPU-aware Kubernetes (a system for managing containerized applications across servers) dashboard for NVIDIA cluster operators — in 234 commits over 30 weekends. The first 3 weekends produced a working clone with pods, nodes, services, and Vim keybindings. Then he added the GPU fleet view that was the whole point of the project, and it silently broke the pods view. After 7 months of velocity masking architectural rot: "AI writes features, not architecture. The longer you let it drive without constraints, the worse the wreckage gets." He's rewriting in Rust — not because Rust is better, but because it's the language where his instinct for "something's wrong" kicks in before he can articulate why. That instinct, he argues, is the one thing vibe-coding can't replace.
James Shore adds the mathematical frame. If you're writing code 2x faster with an AI agent but not halving your maintenance costs, you're trading a temporary speed boost for permanent indenture. His crowd-sourced estimate: for each month of writing code, you'll spend 10 days on maintenance in year 1 and 5 days per year after that — forever. Double your output without improving code quality, and you hit the 50% maintenance ceiling under a year earlier than before. The implication: agents that churn out plausible-looking code that quietly accrues technical debt aren't accelerating you — they're setting a trap.
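Shore's estimate is concrete enough to sanity-check. A toy simulation under stated assumptions (20 working days per month; each month of code costs 10 maintenance days in its first year and 5 days per year thereafter; maintenance does not yet crowd out new development, which Shore's fuller argument also accounts for):

```python
def months_to_maintenance_ceiling(output_rate=1, capacity_days=20, ceiling=0.5):
    """Month at which maintenance consumes `ceiling` of monthly capacity.

    output_rate: months-of-code produced per calendar month (2 = "2x with AI").
    Assumptions follow Shore's crowd-sourced estimate: 10 maintenance days
    in a cohort's first year, 5 days/year forever after.
    """
    ages = []  # age in months of each cohort of code written so far
    for month in range(1, 1201):
        ages = [a + 1 for a in ages] + [0]  # everything ages; new cohort lands
        # Maintenance burden in days/year across all cohorts written so far.
        yearly_days = sum((10 if a < 12 else 5) * output_rate for a in ages)
        if yearly_days >= ceiling * capacity_days * 12:
            return month
    return None

print(months_to_maintenance_ceiling(output_rate=1))  # → 12
print(months_to_maintenance_ceiling(output_rate=2))  # → 6
```

Doubling output halves the time to the ceiling in this toy model; Shore's point is that only a matching drop in per-unit maintenance cost keeps the speed boost from becoming permanent debt service.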
Finally, curl lead developer Daniel Stenberg was offered access to Anthropic's Mythos model — announced with considerable fanfare as "dangerously good" at finding security vulnerabilities. After delays, a third-party scan was run on curl's codebase. Mythos found 1 confirmed low-severity vulnerability. Other AI security tools had already produced 200-300 bugfixes and a dozen confirmed CVEs from curl over the prior 8-10 months. Stenberg's verdict: "the big hype around this model was primarily marketing."
The community was genuinely split. On local AI hardware, Galanwe threw cold water: frontier-worthy local inference runs $10,000-$30,000 upfront in hardware. Sourc3 offered the pragmatist's middle path: local models for fun, OpenRouter at under $3/day for actual work. The vibe-coding post drew pointed title-policing — plastic041, apt-apt-apt-apt, and xantronix all noted that "going back to writing code by hand" is misleading since Claude is still generating the code; the real change is doing architecture by hand. On Mythos, apexalpha admitted the hype "reached the CISO of my small semi-government org" and caused mild panic — but also unlocked budget, so: "never waste a good marketing scare." Vidarh offered the steelman: curl is already analyzed to death; most production software isn't, so Mythos might still be dangerous in the wild.
Platform Capture: Who Controls Your Stack
Three stories today made essentially the same argument from three different vantage points.
The most structurally important was a thread from the GrapheneOS project on hardware attestation (hardware-level cryptographic proof that your device is running approved software). The article body is empty — this is a Mastodon post, so the discussion is the substance. Apple's App Attest and Google's Play Integrity APIs let services verify that a user is on a certified, unmodified OS. As governments mandate these APIs for digital payments, age verification, and digital ID — the EU Digital Identity Wallet being the sharpest example — users of open alternatives like GrapheneOS get locked out of banking apps and government services. Not because of any vulnerability, but because Google doesn't certify their OS builds.
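The lockout mechanism is easy to see from the service side. A minimal sketch in Python: the field names follow Google's documented Play Integrity verdict format, but the policy function and the example verdicts are illustrative, not any real bank's code:

```python
def device_allowed(verdict: dict) -> bool:
    """Illustrative server-side gate on a decoded Play Integrity verdict."""
    labels = (verdict.get("deviceIntegrity", {})
                     .get("deviceRecognitionVerdict", []))
    # Only Google-certified stock OS builds earn MEETS_DEVICE_INTEGRITY.
    # GrapheneOS can prove via hardware attestation that it is a genuine,
    # unmodified GrapheneOS build, but that is not what this API checks,
    # so the label list comes back empty and the service refuses.
    return "MEETS_DEVICE_INTEGRITY" in labels

stock = {"deviceIntegrity": {"deviceRecognitionVerdict": ["MEETS_DEVICE_INTEGRITY"]}}
grapheneos = {"deviceIntegrity": {"deviceRecognitionVerdict": []}}
print(device_allowed(stock))       # → True
print(device_allowed(grapheneos))  # → False
```

The policy choice sits entirely with the service and its platform vendor — there is no field a user-controlled OS can set to pass, which is exactly the structural dependency the thread objects to.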
An early AWS evangelist returned briefly after years away — to benchmark a 192-core spot instance and test Claude on Bedrock — and had his dormant account suspended as a suspected security breach when the large instance fired up. AWS support took 3+ days to respond. Bedrock was "WAY more expensive" than a direct Anthropic subscription and slower. His broader indictment covers AWS's appropriation of open-source communities (Elasticsearch → OpenSearch, Redis → Valkey), an IAM (Identity and Access Management) system of Byzantine complexity, and a drift from customer-focused engineering to customer capture.
Maryland's Office of People's Counsel filed a formal federal complaint against being billed $2 billion — roughly $345 per residential customer over 10 years — for grid upgrades driven primarily by data center construction concentrated in Virginia, Ohio, and Pennsylvania. "Maryland customers have neither caused the need for these projects nor will they meaningfully benefit from them," said People's Counsel David Lapp. PJM Interconnection, which manages electricity transmission for 65 million Americans across 13 states, allocates upgrade costs regionally even when demand growth is highly localized.
The attestation thread drew the day's most heated commentary. ChuckMcM framed it as the end of "open anything" — once essential services require Google/Apple-certified hardware, the escape hatch closes. Miohtama landed the sharpest point: the EU Digital Identity Wallet requiring Google or Apple attestation means European digital identity is structurally dependent on American platforms — the opposite of the sovereignty the EU claims to pursue. On AWS, faangguyindia offered the counter-benchmark: "I've had apps doing a few million [requests] a day on Hetzner for nearly a decade — $20k/month on GCP? Are you kidding me?" On Maryland, trollbridge predicted AI infrastructure electricity bills will be "the biggest political issue of the upcoming midterms" — a grievance that crosses partisan lines with an easy villain.
Security: Funnier and Scarier Than It Should Be
A satirical incident report filed as "CVE-2024-YIKES" traces a (fictional but devastatingly plausible) supply chain attack: a developer's laptop stolen, credentials phished via a fake YubiKey store registered 6 hours earlier, cascading through a Rust compression library with 12 GitHub stars that is nonetheless a transitive dependency of cargo itself (the Rust package manager). Approximately 4 million developers receive malware before the attack is accidentally patched by an unrelated cryptocurrency mining worm. The satire lands because every beat is recognizable — the support ticket marked "low priority — user environment issue" and auto-closed, the legitimate maintainer who won the EuroMillions and is researching goat farming in Portugal.
The Obsidian (a popular note-taking app) story is not satire. Researchers documented a real campaign targeting finance and crypto professionals: attackers pose as venture capitalists on LinkedIn and Telegram, invite victims to share a cloud-hosted Obsidian vault, then socially engineer them into enabling community plugins — which deploy a Remote Access Trojan (RAT) named PHANTOMPULSE capable of keylogging, screenshots, and file exfiltration. Cleverly, the trojan resolves its command-and-control server address through the Ethereum blockchain, making it resistant to takedowns. Obsidian CEO kepano appeared in the thread: a major plugin security update is coming, the existing safety warnings were bypassed by victims, and this appears to be a proof of concept with no confirmed widespread infections yet.
Lynndotpy helpfully flagged that the CVE post is fiction, "but it had me very worried during a brief scan." David_shaw made the most pointed observation: agentic AI development tools that execute code automatically will "create a new era of security-related problems" by expanding the attack surface in ways we haven't fully reckoned with. On Obsidian, zhivota drew the blunt practical conclusion: "never accept a shared Obsidian vault, demand a plaintext export."
The monthly Ask HN "What are you working on?" thread — where the discussion is the substance, since there's no article — offered a useful reality check. Under all the platform drama: indie narrative RPGs, SQL canvas tools, vibe-coded games, video cloud platforms migrating off AWS to colocated servers, LLM inference gateways, security scanning tools. The community is still building things, quietly and persistently.
A 2024 piece about James Burke's 1978 TV series Connections circulated as a kind of palate cleanser. The "greatest shot in television": Burke explains thermos flask physics as a Saturn V rocket launches directly behind him on a single, unrepeatable take — the climax of a 50-minute journey from credit cards to the moon. The clip has 18 million YouTube views. Commenters called it the best science communication ever made. RachelF noted the late 1970s produced Connections, Cosmos, and Attenborough's Life on Earth in quick succession: "perhaps it's just me, but modern documentaries are rather dumbed down?" Some things are apparently timeless — including the HN instinct to notice when something was done better before.
TL;DR
- The AI dependency reckoning is in full swing: local models are getting usable (if hardware-demanding), vibe-coded architectures are collapsing under their own velocity, and Anthropic's hyped Mythos model found exactly 1 low-severity bug in a codebase already exhaustively scrutinized by other tools.
- Platform capture was the day's sharpest thread: hardware attestation is being weaponized as a monopoly mechanism, AWS has squandered its early-adopter goodwill through complexity and open-source appropriation, and Maryland residents are being billed $2 billion for data center power demands they didn't generate.
- Software security is funnier and scarier than it should be: a satirical supply chain attack was barely distinguishable from reality, while a real trojan is spreading through shared Obsidian vaults via social engineering dressed as VC networking.
- Underneath the platform drama, the HN community is still quietly building — indie games, SQL canvases, local inference rigs, and everything in between.