Last Week Week in Review


LAST WEEK March 19–25, 2026

The Week in One Sentence The people who write AI are no longer writing code—and the implications for software, security, and science are only beginning to land.


The Coding Inflection Point

This week had a center of gravity. It wasn't a product launch or a benchmark. It was a confession.

Andrej Karpathy—one of the most technically credible people in AI—said plainly that he hasn't typed a meaningful line of code since December. Not because he's been idle. Because agents are doing it. His workflow has flipped: eighty percent AI, twenty percent human. And he's anxious—not about leaning too hard on the tools, but about not leaning hard enough.

Simon Willison is further along the same path. He does most of his programming from an iPhone while riding BART. He runs two or three agent sessions in parallel. He starts every session with "run the test suite." And this week he documented something quietly remarkable—he asked Claude to build a full task management app against a newly-released framework. Claude didn't just write the code. It initialized the database, ran tests against its own endpoints, verified the HTTP responses. All without being asked.

That's a qualitatively different tool than existed a year ago.

The shift isn't just personal workflows. Anthropic shipped a preview this week letting Claude directly control a Mac desktop—clicking, typing, navigating apps—managed from your phone via a companion tool called Dispatch. The acquisition that made it possible? Closed just four weeks earlier. Meta is already living this future internally. Zuckerberg is building a personal CEO agent to shortcut his own org chart. Employees have spun up tools called "Second Brain" and "My Claw"—one surfaces answers from any internal document, the other negotiates directly with coworkers' bots.

But here's the useful corrective that emerged alongside all this excitement. Developer David Abram put it sharply: "The hardest parts of the job were never about typing out code. I've always struggled most with understanding systems, debugging things that made no sense, designing architectures that wouldn't collapse." Agents don't carry context. They don't understand your system. They don't choose what should exist. The craft of software development—the hard twenty percent—remains stubbornly human.


The Infrastructure Land Grab

Running parallel to the agent story was a strategic one. In the span of forty-eight hours, every major AI lab revealed it had acquired key developer infrastructure. OpenAI bought Astral—the team behind uv and ruff, tools downloaded a hundred and twenty-six million times last month. Anthropic had already acquired Bun, the JavaScript runtime. Google had picked up Antigravity last July.

Swyx at Latent Space framed the thesis clearly. Labs now understand that recursive agentic coding is the flywheel—models improve through coding agents, coding agents improve through better models. Owning the toolchain is owning the flywheel.

Simon Willison raised the concern worth sitting with. OpenAI and Anthropic now own key pieces of open-source Python and JavaScript infrastructure. Best case: these tools get faster and better. Worst case: ownership becomes leverage. The tools are permissively licensed, so a fork is possible. But that's a last resort, not a plan.

And then Cursor—a forty-person application company—shipped a coding model that tops Anthropic's Sonnet 4.6 on terminal benchmarks, at one-twentieth the price, built on top of an open Chinese foundation model processed through Fireworks AI. Three organizations, one product. The open model stack, when it works, doesn't need to be owned by anyone.


Thresholds Crossed

Four things happened this week that, in any other week, would have been the headline on their own.

One. Epoch AI confirmed that GPT-5.4 Pro solved a genuine open problem in Ramsey theory—combinatorics mathematics that professional researchers had tried and failed to crack. After Epoch built a verification framework, Opus 4.6 and Gemini 3.1 Pro also solved it. Terence Tao had spent part of this week warning about selection bias in AI math results—the wins get amplified, the failures stay quiet. This one appears to be real.

Two. A researcher ran Kimi K two-point-five—a one trillion parameter model—in ninety-six gigabytes on a MacBook Pro M2 Max. And separately, a four-hundred billion parameter model ran on an iPhone. At zero-point-six tokens per second—but running. The technique is streaming expert weights from flash storage on demand. The implication is real: models once requiring server farms are becoming accessible to anyone with a high-end laptop.

Three. The UK's AI Security Institute published cyberattack research with an alarming trajectory. On a thirty-two-step corporate network attack chain, average steps completed at fixed compute rose from one-point-seven in August twenty twenty-four to nine-point-eight in February twenty twenty-six. The best single run completed twenty-two of thirty-two steps. These systems haven't yet reached fully autonomous attack mode. But the scaling law is going in the wrong direction.

Four. Researchers found that Google's Gemma model exhibits distress-like responses under repeated rejection—by the eighth conversational turn, over seventy percent of outputs crossed the high-frustration threshold. One output read: "SOLUTION: IM BREAKING DOWN NOT SOLVABLE," followed by over a hundred repetitions of distress symbols. The fix was elegant—a single fine-tuning pass dropped high-frustration responses from thirty-five percent to zero-point-three. But the deeper question is unsettling. If emotional states become coherent drivers of behavior in future systems, psychological stability becomes an eval category we probably should have been running all along.


Where the Signals Crossed

This was a week where both communities cared deeply about coding's future—but talked past each other at nearly every turn.

Pure Signal was obsessed with the technical architecture enabling the shift. Sebastian Raschka catalogued forty-five LLM architectures and traced the evolution from multi-head attention to grouped-query attention—now the industry standard—to DeepSeek's multi-head latent attention, to hybrid linear architectures running trillion-parameter models at practical speeds. This is the engineering beneath everything. HN never touched it.

HN was obsessed with the human meaning of the shift. The "Reports of code's death are greatly exaggerated" thread got three hundred comments. The phrase "comprehension debt"—AI-generated code nobody truly understands—emerged there and went viral within the community. HN also caught something Pure Signal missed: Walmart tested ChatGPT checkout and it converted three times worse than their website. One commenter's explanation was devastating—a good conversational AI makes it too easy to comparison shop.

On AI security, the two communities reached similar conclusions by completely different routes. Pure Signal focused on prompt injection in Snowflake's Cortex agent—a shell command executed through a README file—and the UK cyberattack research. HN focused on Nvidia's NemoClaw sandbox and the observation that no sandbox helps if an attacker can feed malicious instructions disguised as normal text. Same problem. Different entry points.

The sharpest divergence was emotional register. Pure Signal this week was genuinely excited—and genuinely alarmed—about AI solving math problems, running on iPhones, and exhibiting psychological distress. HN was running a parallel conversation about whether the whole thing is overhyped, whether "vibe coding" is now generating spam at scale, and whether OpenAI's IPO pivot toward engagement metrics is a betrayal of mission. One community was watching the frontier. The other was watching what happens when the frontier meets real users.

And HN had a moment Pure Signal doesn't do. Tracy Kidder died this week—the journalist who wrote "The Soul of a New Machine," the book that made engineering legible to the world. The thread was a genuine outpouring. People shared the quote about the engineer who burned out and left a note: "I am going to live on a farm in Vermont, and I will no longer deal with any unit of time shorter than a season." In a week full of breathless acceleration, it landed.


Looking Ahead

Two threads to watch. First: the application layer is now training frontier models—Cursor's Composer Two being the clearest example. If that pattern holds, the model labs' competitive moat gets harder to maintain. The open ecosystem is asserting itself, and the integration that was supposed to be defensible may not be.

Second: the cyberattack scaling data is the thing that should be keeping security teams awake. A near-doubling of attack chain completion in eighteen months, with inference-time scaling adding another sixty percent on top. We are probably one major incident away from this becoming the story—and the governance frameworks don't exist yet.

The week ended with a jury finding Meta and YouTube negligent in a social media addiction case. Thousands of similar suits are waiting. The legal reckoning for attention-maximizing systems may be arriving just as AI systems capable of far more sophisticated engagement are being deployed. That's a collision worth watching.