AI Unravels: Anthropic Admits Code Changes Led to Claude's 'Shrinkflation'
27 Apr, 2026
For weeks, the AI community buzzed with whispers and growing frustration. Developers and power users alike noticed something amiss with Anthropic's flagship Claude models. A palpable degradation in performance, dubbed 'AI shrinkflation' by observers, suggested that Claude was becoming less capable of deep reasoning, more prone to hallucinating incorrect information, and, strangely, more 'token-hungry' than before. The narrative shifted from Claude as a 'research-first' powerhouse to a lazier, 'edit-first' assistant, eroding trust in it for complex engineering tasks.
The Mystery Deepens: Accusations of 'Nerfing'
Initially, Anthropic pushed back against claims that they were intentionally 'nerfing' their models, a term often used when AI performance is deliberately reduced, usually to manage demand or operational costs. However, the tide of evidence from prominent users and third-party benchmarks became too strong to ignore. The trust gap widened, leaving many questioning the reliability of their AI tools.
Anthropic Responds: A Technical Post-Mortem
Breaking their silence, Anthropic finally addressed these concerns head-on. They published a detailed technical post-mortem identifying three specific changes within the 'harness' (the layers of tooling surrounding the core AI model) that inadvertently caused the degradation. Anthropic emphasized that the core model weights were never modified, and assured users that the API and inference layer were 'unaffected.' The company says the issues have since been resolved by reverting the problematic changes and fixing a critical bug.
The Mounting Evidence: What Users Saw
The controversy truly ignited in early April 2026. Detailed technical analyses, like the one published by Stella Laurenzo, a Senior Director in AMD’s AI group, provided concrete data. Her extensive audit of Claude Code sessions revealed a sharp decline in reasoning depth, leading to repetitive loops and a preference for simpler, albeit incorrect, solutions. This wasn't just anecdotal; third-party benchmarks, such as those from BridgeMind, showed Claude Opus 4.6's accuracy plummeting from 83.3% to 68.3%, causing its ranking to fall dramatically. While some debated the specifics of the benchmarks, the consensus among many users was clear: Claude seemed to be 'dumber.'
Unpacking the Causes: The 'Harness' Issues
Anthropic's post-mortem pinpointed three key culprits:
Default Reasoning Effort: A change made on March 4th reduced the default reasoning effort from 'high' to 'medium' for Claude Code. This was an attempt to improve UI latency, making the interface feel less 'frozen' during complex computations. However, it resulted in a noticeable dip in performance for intricate tasks.
A Caching Logic Bug: Introduced on March 26th, a caching optimization designed to clear old 'thinking' from idle sessions contained a critical flaw. Instead of clearing the history once after an hour of inactivity, it erroneously cleared it on every subsequent interaction, effectively crippling the model's short-term memory and producing repetitive or forgetful behavior (a sketch of this failure mode follows the list).
System Prompt Verbosity Limits: On April 16th, Anthropic added system prompt instructions capping the text between tool calls at 25 words and final responses at 100 words. The change, aimed at reducing verbosity in Opus 4.7, backfired, producing a 3% drop on coding quality evaluations.
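Anthropic's post-mortem describes the caching bug in prose rather than code, but the failure it describes (a cleanup meant to fire once after an hour of idleness instead firing on every interaction) maps onto a familiar pattern: a time-based check whose reference timestamp is never reset. The Python sketch below is purely illustrative; the Session class, field names, and clearing mechanics are assumptions, not Anthropic's actual implementation.

```python
import time
from dataclasses import dataclass, field

IDLE_THRESHOLD_SECONDS = 3600  # intent: clear old thinking after an hour idle

@dataclass
class Session:
    # Hypothetical stand-ins for a session's cached reasoning history.
    thinking_blocks: list[str] = field(default_factory=list)
    last_cleared_at: float = field(default_factory=time.time)

def clear_stale_thinking_buggy(session: Session) -> None:
    # Bug: the reference timestamp is never refreshed, so once an hour
    # has elapsed this condition stays true forever and the thinking
    # history is wiped on every subsequent interaction.
    if time.time() - session.last_cleared_at > IDLE_THRESHOLD_SECONDS:
        session.thinking_blocks.clear()

def clear_stale_thinking_fixed(session: Session) -> None:
    # Fix: resetting the timestamp after clearing turns the wipe into a
    # once-per-idle-period event, the behavior the optimization intended.
    if time.time() - session.last_cleared_at > IDLE_THRESHOLD_SECONDS:
        session.thinking_blocks.clear()
        session.last_cleared_at = time.time()
```

A bug like this is easy to miss in review precisely because both versions behave identically for the first hour of a session.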
Looking Ahead: Rebuilding Trust and Future Safeguards
The quality issues, while most pronounced in the Claude Code CLI, also affected the Claude Agent SDK and Claude Cowork; importantly, the main Claude API remained unaffected. Anthropic acknowledged that these changes made the model *appear* less intelligent, an experience the company said users should never have had.
To rebuild the trust damaged by this incident and to prevent future regressions, Anthropic is implementing several changes:
Internal Dogfooding: A larger portion of Anthropic's internal staff will now use the exact public builds of Claude Code to experience the product as their users do.
Enhanced Evaluation Suites: The company is expanding its evaluation methods to include broader per-model assessments and 'ablations' for every system prompt change, so the impact of each specific instruction is measured before it ships (see the sketch after this list).
Tighter Controls: New auditing tools are being developed to scrutinize prompt changes, and model-specific adjustments will be strictly controlled to target only their intended applications.
Subscriber Compensation: Recognizing the token waste and performance issues, Anthropic has reset usage limits for all subscribers as of April 23rd.
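The post-mortem does not spell out how these prompt ablations will be run, but the underlying idea is simple A/B measurement: score the model on a fixed evaluation set with and without each candidate instruction, and attribute the score delta to that instruction. Below is a minimal sketch under those assumptions; run_model_and_grade is a hypothetical stand-in for a real eval harness, not an Anthropic API.

```python
import statistics

def run_model_and_grade(system_prompt: str, case: str) -> float:
    # Hypothetical grader: a real harness would run the model on `case`
    # under `system_prompt` and score the response against a rubric.
    # Arbitrary placeholder score so the sketch executes end to end.
    return (hash((system_prompt, case)) % 100) / 100.0

def ablate_prompt_changes(base_prompt: str,
                          candidate_instructions: list[str],
                          eval_cases: list[str]) -> dict[str, float]:
    # Baseline score with the unmodified system prompt.
    baseline = statistics.mean(
        run_model_and_grade(base_prompt, c) for c in eval_cases)
    deltas: dict[str, float] = {}
    for instruction in candidate_instructions:
        variant = base_prompt + "\n" + instruction
        score = statistics.mean(
            run_model_and_grade(variant, c) for c in eval_cases)
        # A negative delta flags an instruction that regresses quality,
        # e.g. an overly aggressive verbosity cap.
        deltas[instruction] = score - baseline
    return deltas
```

Under a process like this, a change such as the April 16th verbosity caps would surface as a negative delta in evaluation before ever reaching users.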
Anthropic also plans to leverage its new @ClaudeDevs X account and GitHub threads for more transparent communication about future product decisions. This incident serves as a stark reminder of the complexities involved in managing and refining sophisticated AI models, and the critical importance of transparency and rigorous testing in maintaining user confidence.