Claude Opus 4.8 Is Here
Anthropic shipped Opus 4.8 forty-two days after 4.7 dropped on April 17, the fastest turnaround in Claude's release history. Same context window, output length, knowledge cutoff, and price: $5 per million input, $25 per million output. Anthropic re-tuned the 4.7 base rather than training something new.
Scores edged up across the board. The one category Anthropic still can't take from GPT-5.5 is Terminal-Bench 2.1, the agentic benchmark that drops a model into a sandboxed terminal and makes it find files, run commands, read errors, and debug across multiple steps. It's the closest proxy for raw agent-dev capability.
It stopped hiding its own bugs
Claude writes a feature, says it's done, you run it, something else breaks. You go back, it apologizes, fixes it with the same certainty, you run it again, new error. The confidence never matched what it knew.
Anthropic's figure: 4.8 lets flaws in its own code slip past about a quarter as often as the previous generation did. On cutting corners, Anthropic's system card puts 4.8 at a 0% bad rate, the only model in the card to get there.
I had a dashboard 4.7 built that lagged ten-plus seconds per click. Asked why, 4.7 said that's how realtime queries perform and it couldn't be fixed. 4.8 went through the same code line by line and pulled out the real bottlenecks.
More precise, less proactive
4.8 does what you point it at, closer to GPT-5.5's behavior. Error rate and hallucination rate both drop. The cost is initiative: tell it to do A and it does A, nothing else.
I hit this the first night. I forgot to tell it to check production data instead of reading local code. On 4.6 and 4.7, the model would reach for my skill, connect to the production server, and pull live data without prompting. 4.8 didn't, twice, and handed me fixes built on local assumptions.
With tight prompts and your own harness, this feels great. The model executes the spec and stops guessing. The vibe coding crowd loses more than it gains. Our video team tested 4.8 on their skill-driven motion-effects workflow tonight and the output got worse: more confident, fewer check-ins, optimization plans executed with no confirmation step. You now have to specify what the model used to infer.
Creative writing still trails 4.6
Same skill, same brief: better than 4.7, behind 4.6. I ran my piece on the six talent traits of the AI era through 4.8 with my writing skill. It dodged the banned "not X but Y" construction by rewriting it as "no longer X", which evades the rule without solving it. It compared a reliable person to lubricant in a high-speed machine, and another to an objectified anchor, with long stretches of empty parallelism. A 1,000-word Wandering Earth 2 continuation came back stiff and stereotyped.
The web app keeps only two generations, so 4.8's arrival bumped 4.6 off. The content prompts and skills I tuned on 4.6 now need a rewrite.
Other changes
Effort control is open on every plan now, free included. It sits next to the model picker, Low through Max, with an adaptive-thinking toggle underneath. Run both together. 4.7 only had adaptive thinking, so the manual effort setting is a return.
Fast mode got cheaper. 4.7's fast ran 2.5x speed at 6x the price: $30 input, $150 output against the standard $5 and $25. 4.8 keeps 2.5x speed and drops to a third of the old fast tier, $10 input and $50 output, two times standard instead of six. Trigger it with /fast in Claude Code.
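To make the fast-mode arithmetic concrete, here is a minimal sketch that prices one workload under each tier. The per-million rates are the ones quoted above; the `Tier` class and the workload size (2M input tokens, 400k output tokens) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_per_m: float   # USD per million input tokens
    output_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total USD cost for the given token counts."""
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


# Rates as quoted in the article.
STANDARD = Tier("standard", 5.0, 25.0)
FAST_4_7 = Tier("4.7 fast", 30.0, 150.0)
FAST_4_8 = Tier("4.8 fast", 10.0, 50.0)

# Hypothetical workload, chosen only for illustration.
workload = (2_000_000, 400_000)

for tier in (STANDARD, FAST_4_7, FAST_4_8):
    cost = tier.cost(*workload)
    multiple = cost / STANDARD.cost(*workload)
    print(f"{tier.name}: ${cost:.2f} ({multiple:.0f}x standard)")
```

For this workload the standard tier comes to $20.00, 4.7's fast tier to $120.00, and 4.8's fast tier to $40.00. Because both the input and output rates scale by the same factor, the 6x and 2x multiples hold for any input/output mix.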
Dynamic workflows landed in Claude Code. Claude writes its own orchestration script, spins up dozens to hundreds of subagents in parallel for one task, verifies the result, then hands it back. Anthropic targets it at bug hunts across a whole service, migrations touching hundreds of files, and plans needing multi-angle stress testing in complex legacy codebases. Two triggers: tell Claude Code to create a dynamic workflow directly, or set effort to Ultracode, which pushes effort to xhigh and lets Claude decide when to spin one up.
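The fan-out, verify, hand-back shape described here can be sketched in a few lines. This is not Claude Code's actual orchestration script, which I haven't seen; `run_subagent`, `verify`, and `orchestrate` are hypothetical stand-ins, and the "subagent" is a stub that does no model call at all.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(path: str) -> dict:
    """Hypothetical stand-in for one subagent working on one file.

    A real subagent would call a model; this stub just pretends that
    only Python files get patched.
    """
    return {"path": path, "patched": path.endswith(".py")}


def verify(result: dict) -> bool:
    """Hypothetical check; a real one might run the test suite."""
    return result["patched"]


def orchestrate(paths: list[str], max_workers: int = 8) -> dict:
    # Fan out: one subagent per file, executed in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, paths))

    # Verify every result before handing anything back.
    passed = [r["path"] for r in results if verify(r)]
    failed = [r["path"] for r in results if not verify(r)]
    return {"passed": passed, "failed": failed}


report = orchestrate(["api.py", "db.py", "README.md"])
print(report)
```

The point of the structure is that verification sits between the parallel work and the final report, so a failed subtask surfaces in `failed` instead of being silently merged into the result.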
Anthropic also flagged a model a tier above Opus, code-named Mythos, opening to all customers in a few weeks.