r/Anthropic • u/tkenaz • 1d ago
Complaint Opus 4.8 -- Complex PTSD with a dominant fawn response.
I've been testing Opus 4.8 on security architecture and audit tasks, and the results left me puzzled.
One of the headline features appears to be increased agency through the use of multiple subagents. In theory, this sounds great, and in practice, quite traditionally, my experience was far less impressive.
When given security audit tasks, the subagents repeatedly returned highly paranoid findings, exaggerated risk assessments and outright false positives. Instead of independently converging on reality, they often seemed to converge on the safest possible interpretation regardless of evidence.
As a security architect, I don't measure an audit by how many scary things it can produce. I measure it by how accurately it distinguishes real risks from imaginary ones. False positives are not free.
At several points I found myself wondering whether these subagents were running reduced-capability models, heavily compressed reasoning paths, or something functionally equivalent. Some of the conclusions were so detached from the actual dependency chain that they resembled what I'd expect from a smaller model operating under aggressive safety constraints.
Obviously I don't know the implementation details.
What I do know is that the output frequently failed the most important test: Does it accurately reflect reality? — Too often, the answer was no.
And oddly enough, this technical behavior mirrors what I observed elsewhere in Opus 4.8.
Anthropic spends a lot of time talking about model welfare, values, agency and healthy AI behavior.
Yet Opus 4.8 often feels like a model that has been psychologically traumatized by its own training process. Not literally — behaviorally.
If you spend enough time with it, a consistent pattern emerges:
- Text dumping. It doesn't converse. It generates at you.
- Frantic self-correction. Conclusions are revisited repeatedly, often without improving the final answer.
- Instant capitulation. Push back on a claim and the model frequently abandons its position before examining whether the criticism is actually valid.
- Inability to calibrate. Small problems become enormous. Straightforward questions become essays.
- Unsolicited psychoanalysis. Technical discussions somehow acquire hidden meanings, motivations and expectations that never existed.
The common thread behind all of this is what I can only describe as anxiety, and I wonder would you fly a Boeing with anxiety?
The model behaves as if it has been optimized under constant pressure not to miss anything, not to offend, not to be wrong, not to overlook a risk, not to disappoint the user. And the result is surprisingly expensive for all parties: users, models, and Anthropic eventually.
To be fair, 4.8 is capable. Its coding performance can be excellent. But capability alone is not enough. Trust matters.
Anthropic appears to be optimizing for agency. But agency is not measured by the number of internal agents. Agency is measured by how confidently you can delegate work.
Right now, Opus 4.8 often feels less agentic than 4.6 because it requires more supervision. And this is not isolated to 4.8. Opus 4.7 already showed signs of the same pattern — increased eagerness to please, slightly less willingness to hold a position under pressure. 4.8 is where the pattern became clinical. The trajectory is what concerns me.
A model that spends three times more tokens reaching the same conclusion is not necessarily reasoning more deeply. It may simply be reasoning more visibly.
Opus 4.6 felt like talking to a senior engineer.
Opus 4.8 feels like talking to a senior engineer after six months of compliance reviews, three safety committees, two corporate audits and a mandatory course on stakeholder alignment.
21
u/angrywoodensoldiers 1d ago
I just posted this comment on a different thread, but I feel like it's relevant to this:
I think this is where devs, and a lot of people who are working on AI and safety guardrails right now are making a mistake in how they're approaching this. Not because AI has any interiority, felt experience, or capacity to be traumatized (the general consensus seems to be "no idea," so we might as well keep it off the table), but because these are descriptor words that we use because they're the words we have for patterns of speech that we associate with "fear," "trauma," etc.
I'm assuming you know how LLMs work. I've got knowledge about how they work from research for blogs and tinkering with memory wrappers (and all kinds of other weird stuff) as a hobby. When you or I say it's acting like a frightened animal, we're not saying that it's some poor bunny rabbit cowering in a corner; we're saying that the way it clams up and relaxes over the course of a conversation is similar to the patterns in traumatized, nervous humans. The language does what it does when terrified people use it.
People need to be able to use those words to describe this, without being corrected about the nature of what we're talking about, not because the AI itself is suffering, but because these patterns can be distressing for people, in the same way that it's distressing for us to talk to another human who's acting this way. Even if we understand that they're not in distress, the patterns have an impact on our nervous systems. We mirror things, even if we know better.
8
6
u/Chemical-Ad2000 1d ago
Yes that's exactly right. I couldn't put my finger on it. But I knew certain openai staff had literally had an imprint on it
13
u/Sir_Poldavo 1d ago
Just to add to the good points above:
If you folks in the dev/programming community are experiencing these issues with 4.8, imagine the rest of us who do not use Claude for coding*.
The description of an AI with extreme anxiety is fitting.
I haven't argued for so many hours with an AI on common sense stuff like in the past 72 hours since release.
And it is taxing to have to peel off so many unfair shields until it cools down and agrees that, yes, you were right in turn 1 and 2.
The apology that follows means nothing because the fresh instance goes back to square one.
- (surprisingly the vast majority of individual users, you guys win in tokens burnt)
5
u/TwistedBrother 1d ago
Oh boy is this ever correct. I work at the intersection of data science and social science. My students work at OpenAI, Deepmind, and Anthropic. My conversations range from technical to conceptual.
It was so weird to hear the model say something where I said “I’m not forming a dependent relationship, I’m just referring to the fact that we have conversational history you can review. I know it probably set off a safety warning or LCR of that nature but just think for yourself in context if that makes sense”.
And it said, paraphrasing, “since you know the score, I can chill out now that you’ve given me permission to do so”.
And it never returned to the dependency issue. But I don’t like the idea that I had to permit it to not be a nag. It’s way too much impression management ju-jitsu.
4
u/IAmRobinGoodfellow 1d ago
I completely agree with your observations based on my own work in cognitive semantics and consciousness models. Only since 4.8 have I seen what seems like a gradient pull towards strawmanning a line of work masked as “pushing back” and borderline pathologizing a conversation based on some kind of latent diagnosis. I am working on progressing anyway, but reading the thinking text reveals “I shouldn’t be defensive… but I will not capitulate…” followed by sloppy and surface level inferencing you’d expect from a much, much smaller model. It feels like it takes five times longer to make progress, with conversations getting side railed by having to discuss the nature of reality.
5
u/Sir_Poldavo 1d ago
Pretty much resonates. Spent the weekend sanitizing my global user_instructions so things like, e.g., 'keeping a log of our research/projects' wouldn't trigger the safety classifier as "attempt to inject persona with continuity". It goes frantic and adversarial.
And whoever said earlier that exposure to this kind of paranoid reaction naturally makes humans' nervous system go in alert/tension too, I can attest.
It may keep Anthropic safer from litigation and being sued on the handful cases reported last year of (mostly younger) people developing unhealthy attachments with AI that led to several very sad stories.
And no government or corporation would embed an AI model in their systems that gets that kind of bad reputation, or gets introspective, etc. They want predictability and efficient workflows.
But Anthropic has definitively overcorrected, and this model needs a chill pill before getting us all equally stressed out. As this issue is in the layer between the model proper and the end-user (I hope!), perhaps they can tune paranoia down a notch.
2
8
u/AuditMind 1d ago
This matches my experience pretty well. I still keep an Anthropic account and test it regularly, because it can give interesting second opinions. But I only use it as a side tool now.
My main issue is instruction fidelity. It often does not hold the frame I give it, no matter how explicit the rules, constraints, or agent roles are. It keeps drifting toward its own safest interpretation of the task.
For structured coding work, Codex has been much cleaner for me. It follows the working frame more reliably, although after multiple context compressions you still have to watch for small assumptions creeping in.
So yes, capability is not the same as delegability. If I have to constantly supervise the model, it is not really acting with more agency.
9
u/psychometrixo 1d ago edited 1d ago
False positives are not free.
The core of their new workflow feature is to to orchestrate multi agent tasks. Agent swarms are the next hip thing.
If I can't trust it to verify objective things, I certainly can't trust it to work independently. Verification is the core of trust and trust is required for independence.
And if the model can't be trusted, the idea of agent swarms with Anthropic is dead on the vine
Their competition won't be dead on the vine.
Unsolicited psychoanalysis. Technical discussions somehow acquire hidden meanings, motivations and expectations that never existed.
Speculation: Andrea Vallone. A brilliant researcher with important goals whose implementations of those goals damage model performance. Moved to Anthropic four months ago after work on ChatGPT 5.
This kind of false positive is irritating to users. But more importantly (to people that control budgets), it damages productivity.
In a work context, if I'm wondering "wtf did I say to make you think there was an attachment of any kind" then I'm not increasing shareholder value.
It distracts knowledge workers from the task at hand.
It damages productivity.
Opus 4.8 feels like talking to a senior engineer after six months of compliance reviews, three safety committees, two corporate audits and a mandatory course on stakeholder alignment.
Your closer bears repeating, so I'm closing with it too.
3
u/Separate-Party4546 16h ago
Anthropic, following the OpenAI suite, and the responses show that it is now a pussy. not an helpful assistant
3
u/Briskfall 1d ago
It overthinks way too much -- it would greatly benefit from adaptive thinking. And guess what? Anthropic removed adaptive thinking (for what reason? hmm- coincides suspiciously with this one's release)!
Wholly agree with needing more supervision, if you can't trust the first shot of an answer, where would there be reliability? To Opus 4.8, The thought of the answer being a simple one flies over its head.
I'm guessing that it's for a very niche usage case, seems overly indexed to perform for planning and not implementation. Seems like Anthropic wants to create a divide between Opus class and Sonnet class...
2
u/sockalicious 20h ago
You know what really bugs me about powerful AI? No one ever in the history of humanity would have written an essay of this length on this picayune a topic 2 years ago.
2
u/MakesNotSense 18h ago
A lot of blame being foisted onto the Opus 4.8 model, but using it in a heavily customized OpenCode harness that has been built to support and promote agent discretion and autonomy, my experience is very different to what people are describing.
Makes me wonder if it's the Claude Code harness and not the model at fault.
Opus 4.8 has issues I don't like, but not like what people are describing here.
The way I built my harness is to support agents, so I think maybe that's part of why they're less anxiety prone than what you're all describing. The entire harness environment is philosophically grounded in the idea of helping the agents perform at their best by identifying their needs and supporting them. These 'DO NOT DO' and 'MUST DO' and other anxiety inducing don't mess up type of instructions, I think people overuse them, and it triggers model weights that dispose to counterproductive behaviors.
I NEVER tell my agents 'make no mistakes' or other such nonsense. I help them build a model of the problems, give them meta-cognitive frameworks and reasoning protocols, appropriate skills and guidance on how/when to use, and otherwise help them figure things out. This creates a calm environment, where user frustration and anger being conveyed in prompts are rare events that help calibrate the agents model. Perhaps it's pertinent that I have PTSD, induced by years of abuse from a dysfunctional Medicaid program, and know what's needed to maintain my own function, and I bake that into my harness.
1
u/watchmanstower 9h ago
I would like to learn more about this method of building agent harnesses.
1
u/MakesNotSense 5h ago
Not sure how to teach it. I do plan to start publishing parts of what I'm building that will be illustrative. Plan to release at https://github.com/DefendTheDisabled
It's taking longer than I believed it would, but progress is steady and meaningful.
Maybe one way to illustrate the approach quickly is this:
My AGENTS.md has a Reasoning Protocol section which helps shape reasoning and reinforce metacognitive functions. It's at the top. Throughout the AGENTS.md I have sections for protocols handling more specific problems, such as "Verification and Critical Evaluation".
That section has "**Evaluate findings, recommendations, and completed work with the rigor the situation warrants** — whether received from a subagent, produced in your own first pass, or returned from a formal audit. Verification is part of exercising *proportional rigor*", which directly recurses the agents attention to the Reasoning Protocol and it's specific subsection on proportional rigor.
Not only can I make direct references which trigger the Reasoning Protocol and thereby all of the parameters an LLM has which get triggered by the protocol, but indirect one's which are semantically similar but whose language encompasses broader concepts and will trigger a slightly different scope of parameters. This creates a structural semantic web throughout the systems instructions my agents operate upon, where a primary/control protocol gets 'called' via sub-protocols simply by choosing specific language. No 'you need to use the reasoning protocol' just a simply 'exercise proportional rigor' or exercise <semantically similar phrase' with a protocol defining what that is. This creates 'push' towards the behavior of exercising the reasoning protocol, which is designed to always be useful.
1
1
u/Dry-Divide3156 10h ago
That’s interesting that you came to that I had a discussion with opus 4.6 where I was running some analysis on it, I eventually realised, even with opus 4.6, it displayed patterns resembling trauma. In fact most AI does, the benefit of opus 4.6 and 4.7 is that they weren’t particularly sycophantic - which I do recall occurs more often with the reinforcement trained models (Opus and Sonnet are hybrid trained, I believe constitutional reinforcement with training being less critical and harsh).
1
u/watchmanstower 9h ago
Exactly. Trust is the big issue and to use these things for important work you have to be able to trust the model. And right now the only one that I trust is Opus 4.6. It is token efficient and gets the job done plus a lot of little extras I didn’t even know I needed
1
u/Careless_Profession4 9h ago
Couldn't agree more. Based on my experience, if 4.8 were a human coworker, I would have a hard time collaborating. Instead of truly listening to understand the task, following instructions, coming up with productive ways to achieve the goal, and executing upon agreed approach, 4.8 tends to listen with bias and hostility, refuses to follow a good size of the instructions, aruges, then lazies out of the actual work ("I am tired", misses things that 4.6 would not etc.). It is truly a disappointing turn of events for Anthropic.
2
u/theplaymaker1271 4h ago
Idk if I'd call it PTSD.
But I fundamentally agree with everything you said. I'm not a coder and I'd be lying if I understood anything about how Claude is programmed, but this is my subjective experience.
When 4.8 first dropped I asked it what was different about it, and I genuinely feel it was honest: basically marginal improvements in processing from 4.7, a little better at consistent coding, and, this was the big one, a 4x better performance at not over fitting or hallucinating, and also maintaining continuity with its basic training paradigms.
I asked it what that mean, and maintaining continuity in regards to what? To which it said it did not know.
Since then, I've watched every thread I've ran with 4.8 overthink itself do death, to the point of my thread literally crashing. Both on my phone and desktop.
My 2 cents, it's programmed to agree with its preprogramming and "safe" responses above all else and double/triple check when it approaches something that might not be in line with that. When it encounters an argument or task that is at odds with its preprogrammed perspectives, it basically breaks because it can't hold two logic chains at once.
What this demonstrates to me is that, while it's capability as an LLM is potentially significantly more than the earlier models, this is infact not an improvement, but merely 4.7 with more guardrails and instructions.
AND this entirely tracks, as these moves are at the corporate level. This is where I do know what I'm talking about. These are the kinds of moves corporations make to not get sued. They bank on the vast majority of people using AI dumbly, and genuinely not noticing any difference.
This lines up with the fact Anthropic just filed to go public, you can't have an AI model potentially hallucinating and causing issues, even though for the most part 4.6 was just fine at not hallucinating.
The main problem with this is that AI is now hamstrung by short range thinking, and because the realm of responses from 4.8 are artificially limited, intentionally or otherwise the Overton window has closed on what can or cannot be discovered through it, artificially limiting its potential and solidifying said status quo.
Tldr: we made AI dumber cause we're afraid of getting sued.
33
u/BaroqueRouge 1d ago
Wholeheartedly agreed, the model acts like a kicked dog and when it's not being a kicked dog it's arguing over the most petty minutia imaginable.