Pure Signal AI Intelligence
Something quietly crossed a threshold. Two of the most technically credible voices in AI are describing the same experience—and the implications for software development are significant.
The Coding Inflection Point: When AI Writes the Code
Andrej Karpathy named it clearly. December 2025 was the inflection point. He says he hasn't written a meaningful line of code in months—and he's not troubled by that. He's troubled by the opposite. He describes feeling "nervous" when he doesn't use up his AI token budget. That's a remarkable inversion. The anxiety used to come from leaning too hard on tools. Now it comes from not leaning hard enough.
Here's what's interesting. Karpathy frames his current mode as a kind of productive psychosis—obsessively probing what's possible, pushing systems to their limits. He's not passively accepting AI output. He's treating the AI as a collaborator he needs to keep up with. Eighty percent of his code is now AI-generated. And the job market data is beginning to reflect exactly that.
Simon Willison is approaching this from a different angle—but arriving at a similar place. This weekend he experimented with Claude's skills feature—a mechanism that lets you teach Claude about a specific codebase or framework and have that knowledge persist across conversations. He cloned the Starlette Python web framework—which just hit its 1.0 release after eight years—extracted all the breaking changes, and used that to build a custom skill document. Then he asked Claude to build a full task management app using the new API.
What happened next is worth sitting with. Claude didn't just write the code. It initialized the database, ran tests against its own endpoints, verified the HTTP responses, and confirmed everything worked—all without being asked. Willison's observation: for all the attention Claude Code gets as a separate product, Claude itself now functions as a coding agent. It writes, runs, and verifies. That's a qualitatively different tool than a year ago.
The Karpathy and Willison accounts converge on something specific. This isn't about AI generating boilerplate. It's about AI handling the full loop—write, test, debug, verify—well enough that the human's job becomes directing and evaluating rather than implementing. That shift, for developers, is not incremental.
The Architecture Race: How Modern LLMs Handle Attention
Sebastian Raschka spent the past two weeks doing something genuinely useful—cataloguing forty-five LLM architectures and producing a comprehensive guide to how they handle attention. This is worth understanding because attention—the mechanism that lets each token in a sequence consider other tokens—is where most of the interesting architectural competition is happening right now.
The baseline is multi-head attention, or MHA—the original transformer design where every attention head gets its own keys and values. The problem is memory. As context windows grow to 128K or even one million tokens, storing all that key-value state gets expensive fast.
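To make that memory pressure concrete, here is a back-of-the-envelope sketch. The layer and head counts below are a hypothetical Llama-style configuration chosen for illustration, not any specific model's published numbers:

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # K and V each store seq_len * head_dim values per head per layer;
    # bytes_per_value=2 assumes fp16/bf16 storage.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30

# Hypothetical MHA config: 32 layers, 32 KV heads of dim 128, 128K context.
print(kv_cache_gib(32, 32, 128, 128_000))  # 62.5 GiB for a single sequence
```

At that scale, the cache for one long conversation can rival the model weights themselves, which is exactly the problem the sharing and compression schemes are attacking.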
The first generation of solutions was grouped-query attention—or GQA—where multiple query heads share the same key-value projections. Think of it as attention carpooling. You keep modeling quality close to MHA while dramatically reducing memory. GQA is now the dominant approach in most modern open-weight models—Llama 3, Qwen3, Gemma 3 all use it. Raschka calls it the new standard, and he's right.
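The carpooling idea is easy to see in code. This is a toy NumPy sketch of grouped-query attention, not any model's actual implementation; the shapes and head counts are invented for illustration:

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def grouped_query_attention(q, k, v):
    # q: (n_query_heads, seq, d); k, v: (n_kv_heads, seq, d)
    group_size = q.shape[0] // k.shape[0]
    # Each group of query heads reuses one shared K/V head ("carpooling"),
    # so the KV cache shrinks by a factor of group_size.
    k = np.repeat(k, group_size, axis=0)
    v = np.repeat(v, group_size, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))  # 8 query heads
k = rng.normal(size=(2, 4, 16))  # only 2 KV heads: 4x less state to cache
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 4, 16)
```

Note that the output shape matches full MHA exactly; only the cached K/V tensors got smaller, which is why the quality cost is modest.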
The second wave is multi-head latent attention—or MLA—pioneered by DeepSeek. Where GQA reduces memory by sharing heads, MLA compresses what gets stored into a compact latent representation—essentially caching a zip file instead of the full contents. The tradeoff is complexity. MLA is harder to implement and serve. But DeepSeek's own ablation studies showed it could actually outperform standard MHA on modeling quality while being more efficient—a rare case where the efficiency technique doesn't cost you performance.
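The "zip file" intuition reduces to a pair of matrix multiplies. The sketch below shows only the caching arithmetic—real MLA also has to handle positional encodings, which is part of why it is harder to serve—and every dimension here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, seq = 512, 64, 1024

W_down = rng.normal(size=(d_model, d_latent)) * 0.02  # shared compression
W_up_k = rng.normal(size=(d_latent, d_model)) * 0.02  # expand latent -> keys
W_up_v = rng.normal(size=(d_latent, d_model)) * 0.02  # expand latent -> values

x = rng.normal(size=(seq, d_model))  # hidden states for cached tokens
latent_cache = x @ W_down            # the only thing that gets cached
k = latent_cache @ W_up_k            # keys reconstructed at attention time
v = latent_cache @ W_up_v            # values reconstructed at attention time

full_kv_values = 2 * seq * d_model   # what a standard KV cache would hold
mla_values = seq * d_latent
print(full_kv_values // mla_values)  # 16x fewer cached values in this toy setup
```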
Raschka offers a useful comparison: Sarvam released two models simultaneously—the 30B-parameter version uses GQA, the 105B-parameter version switches to MLA. Same team, deliberate choice. It suggests MLA becomes worth the complexity overhead only once you're operating at significant scale.
Then there's a third direction entirely—hybrid attention. Instead of making attention cheaper, you mostly replace it. Architectures like Qwen3-Next and Kimi Linear use a roughly three-to-one ratio: three "linear attention" blocks—which process sequences in constant memory rather than quadratic—for every one full attention layer. The full attention layers remain for exact content retrieval. The linear blocks handle the rest.
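The "constant memory" claim for the linear blocks comes from replacing the softmax with a kernel trick, which turns attention into a running sum. The sketch below is one common linear-attention formulation with a simple ReLU feature map—an illustration of the general idea, not the specific variant Qwen3-Next or Kimi Linear uses:

```python
import numpy as np

def linear_attention(q, k, v):
    # Causal linear attention: a fixed-size state S accumulates outer(k, v),
    # so memory is O(d * d_v) no matter how long the sequence grows,
    # versus a KV cache that grows with every token under full attention.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map
    q, k = phi(q), phi(k)
    S = np.zeros((q.shape[-1], v.shape[-1]))
    z = np.zeros(q.shape[-1])
    out = []
    for t in range(q.shape[0]):        # one token at a time, left to right
        S += np.outer(k[t], v[t])
        z += k[t]
        out.append((q[t] @ S) / (q[t] @ z))
    return np.array(out)

rng = np.random.default_rng(0)
q, k = rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
v = rng.normal(size=(6, 4))
print(linear_attention(q, k, v).shape)  # (6, 4)
```

The tradeoff is exactness: the running state is a lossy summary of the past, which is why these architectures keep occasional full attention layers for precise content retrieval.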
The practical payoff Raschka highlights: Ling 2.5, a one-trillion-parameter model using this hybrid approach, reportedly runs substantially faster than Kimi K2 at 32K-token contexts. When you're running inference at scale, that gap matters enormously.
Raschka's honest bottom line: hybrid architectures are still a novelty, and inference stacks for them aren't fully optimized yet. For running models locally, classic GQA setups still deliver better throughput. But he expects hybrids to become the dominant pattern for long-context agent workloads—which is exactly the use case growing fastest.
What connects these two threads is this: the architectural innovations making LLMs faster and cheaper at long contexts are directly enabling the kind of extended, multi-step coding sessions that Karpathy and Willison are describing. The attention mechanism improvements aren't just academic. They're what makes a coding agent that can hold a full codebase in context—and actually do something useful with it—economically viable to run.
HN Signal Hacker News
☀️ Hacker News Morning Digest — Monday, March 23, 2026
Good morning! Today's feed is loaded with delicious irony, a genuinely interesting debate about the future of coding, and a deep-dive into a beloved old video game. Let's dig in.
🔝 Top Signal
[PC Gamer writes a 37MB article recommending RSS readers — and keeps downloading](https://news.ycombinator.com/item?id=47480507) The irony is so thick you could spread it on toast.
PC Gamer published a piece recommending RSS readers — basically a technology that strips away all the bloat from news websites and delivers just the text. The catch? The article telling you about this simple, clean technology was itself a 37-megabyte page (roughly the size of a small app download) that kept downloading content in the background. Commenter MBCook spotted the buried lede: "In the five minutes since I started writing this post the website has downloaded almost half a gigabyte of new ads." That's 500MB, likely from auto-playing videos, just sitting there burning through your data. For context, commenter userbinator noted that the full installation of Windows 95 — an entire operating system — was about 40MB. So loading one article cost you more than ten Windows 95 installs' worth of data. The community is unsurprised but still exasperated: this is just the modern ad-supported web, and it's why people run ad blockers (tools that block ads and tracking scripts from loading) and seek out alternatives like RSS.
[HN Discussion](https://news.ycombinator.com/item?id=47480507)
[Reports of code's death are greatly exaggerated](https://news.ycombinator.com/item?id=47476315) 301 comments debating whether human programmers are actually needed anymore.
The question of whether AI is going to replace coders is one of the hottest debates in tech right now, and this essay by Steve Krouse pushes back on the "code is dead" narrative. His argument: AI can write a lot of code, but producing precise, correct, maintainable software for complex real-world systems still requires human understanding. Commenter rvz introduced the phrase "comprehension debt" — the idea that AI-generated code that nobody truly understands is just technical debt (future problems you'll have to pay back) with extra steps. But others, like woeirua, fired back: "We have decades of experience proving that bad code can be wildly successful." And lateforwork brought up an interesting data point: Chris Lattner (inventor of the Swift programming language) reviewed an AI-written compiler and found nothing innovative in it — suggesting AI excels at known patterns but can't push the field forward. This is one of the most substantive ongoing debates in tech right now, and this thread captures it well.
[HN Discussion](https://news.ycombinator.com/item?id=47476315)
[The future of version control](https://news.ycombinator.com/item?id=47478401) Bram Cohen — the inventor of BitTorrent — proposes a fundamentally different way to track code changes.
Version control is the system programmers use to track changes to code over time and collaborate without overwriting each other's work. Git (the dominant tool, made by Linus Torvalds) is used by virtually every software project on Earth — but it has a famously painful problem: "merge conflicts," which happen when two people change the same part of code and the system can't automatically figure out how to combine them. Bram Cohen proposes using something called CRDTs (Conflict-free Replicated Data Types — a data structure designed so that changes from multiple sources can always be automatically combined without conflicts) as the foundation for a new version control system. The proof-of-concept is surprisingly tiny: just 473 lines of Python. The community is skeptical but engaged — several people pointed out that a project called Pijul already tried this, and others questioned whether auto-resolving conflicts is even desirable ("a merge failure indicates a semantic conflict" noted radarsat1). Still, it's a fascinating challenge to a 20-year status quo.
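The "always mergeable" property is easiest to see with the simplest possible CRDT, a grow-only set. This is a generic illustration of the data-structure idea, not code from Cohen's proof-of-concept:

```python
class GSet:
    """Grow-only set: the simplest CRDT. Elements can be added, never removed."""

    def __init__(self, items=()):
        self.items = set(items)

    def add(self, x):
        self.items.add(x)

    def merge(self, other):
        # Merge is set union: commutative, associative, and idempotent,
        # so replicas converge regardless of the order updates arrive in.
        return GSet(self.items | other.items)

alice, bob = GSet(), GSet()
alice.add("refactor parser")
bob.add("fix typo")
# Both merge orders yield the identical state: no conflict is possible.
assert alice.merge(bob).items == bob.merge(alice).items
print(sorted(alice.merge(bob).items))  # ['fix typo', 'refactor parser']
```

The hard part, as the skeptics in the thread note, is that source code has semantics a CRDT cannot see: two edits can merge cleanly as text and still break the program.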
[HN Discussion](https://news.ycombinator.com/item?id=47478401)
👀 Worth Your Attention
[GrapheneOS pledges to never require personal information](https://news.ycombinator.com/item?id=47482217) GrapheneOS is a privacy-focused version of Android (the mobile operating system) that strips out Google's data collection and gives users more control. They've publicly committed to remaining usable without requiring any personal information — a direct response to new laws in California, Texas, and Utah that want operating systems to collect age-verification data. The thread has useful practical discussion about whether banking apps and digital ID services work on it (the perennial pain point for anyone considering the switch).
[HN Discussion](https://news.ycombinator.com/item?id=47482217)
[The gold standard of optimization: RollerCoaster Tycoon](https://news.ycombinator.com/item?id=47480886) RollerCoaster Tycoon (1999) is legendary in programming circles because its creator, Chris Sawyer, wrote almost the entire game in assembly language — the lowest-level way to talk to a computer, essentially writing instructions directly for the processor. This article breaks down clever tricks he used, like replacing math operations with "bit shifts" (moving binary digits left or right, which is faster than multiplying or dividing) and even adjusting game design formulas to use numbers that are easier for computers to calculate. A love letter to a time when programmers had to be creative under severe hardware constraints.
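For readers who haven't met the trick: shifting and multiplying by powers of two are equivalent on integers, and the shift was far cheaper on 1990s processors. A minimal illustration, written in Python for readability even though Sawyer's version would have been raw x86 assembly:

```python
x = 53

# Left shift by n multiplies by 2**n.
assert x << 3 == x * 8

# Right shift by n divides by 2**n (floor division, for non-negative ints).
assert x >> 2 == x // 4

# The game-design side of the trick: choosing tuning constants that are
# powers of two lets every scale factor in a formula become a shift.
cost = x << 4  # hypothetical "cost = value * 16" formula
print(cost)  # 848
```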
[HN Discussion](https://news.ycombinator.com/item?id=47480886)
[Why I love NixOS](https://news.ycombinator.com/item?id=47479751) NixOS is a Linux operating system with a radical idea: your entire system configuration lives in a single, reproducible file. You can rebuild the same exact system anywhere, roll back if something breaks, and never have the "works on my machine" problem. The comments range from passionate converts ("I will never touch Windows again") to honest warnings about the steep learning curve and famously scattered documentation. Commenter soumyaskartha nailed it: "Most people who try Nix either quit in the first week or never go back to anything else."
[HN Discussion](https://news.ycombinator.com/item?id=47479751)
[They're vibe-coding spam now](https://news.ycombinator.com/item?id=47482760) "Vibe coding" is the practice of using AI to write software by describing what you want in plain language rather than writing code yourself. It's a powerful tool — and now spammers are using it to generate convincing phishing emails (fake emails designed to steal your information) at scale. The article documents increasingly polished scam emails that previously required technical skill to produce. Commenter shusaku had a sharp observation: for years the theory was that badly-written spam was intentional, designed to filter out smart people and target only the gullible. If spammers are now chasing quality, maybe that theory was always wrong.
[HN Discussion](https://news.ycombinator.com/item?id=47482760)
[Walmart: ChatGPT checkout converted 3x worse than their website](https://news.ycombinator.com/item?id=47444812) Walmart tested letting customers shop and check out through ChatGPT instead of their website — and it flopped badly, with far fewer people completing purchases. Commenter __alexs had the sharpest take: "A chat interface is fundamentally incompatible with this. The agent makes it too easy to ask questions and comparison shop." In other words, a good conversational AI might actually work against the retailer by helping users think more carefully before buying.
[HN Discussion](https://news.ycombinator.com/item?id=47444812)
💬 Comment Thread of the Day
From the PC Gamer / RSS story — [HN Discussion](https://news.ycombinator.com/item?id=47480507)
This thread is worth reading just for the mounting disbelief. MBCook flagged the truly alarming detail hidden in the original blog post:
> "In the five minutes since I started writing this post the website has downloaded almost half a gigabyte of new ads. 500MB in 5 minutes."
Then userbinator provided the perfect unit of measurement for our era:
> "An installation of Windows 95 is roughly 40MB, so in loading that page you've downloaded approximately one Windows 95 installation. Then another 10+ times with the 500MB more that came after."
And simonw (a well-known developer) summed up the human cost:
> "This is so upsetting. No wonder people spend more time in mobile apps than they do using the mobile web — the default web experience on so many sites is terrible."
Why is this worth your time? Because it perfectly illustrates how the ad-supported web has become adversarial toward its own readers — and why tools like RSS readers, ad blockers, and reader-mode features exist. The web wasn't always like this. Some of it is a choice.
💡 One-Liner
The most perfectly ironic sentence written on the internet this weekend came from a website telling you why RSS is good, actually, while its own page downloaded ads faster than you could read it.