Not Impressed with Opus 4.8
I’ve used it all weekend to help debug a new, unreleased product, and I am not impressed. Anthropic’s Opus 4.8 may be marginally better than Opus 4.6, but the improvements are barely visible. (I found Opus 4.7 to be awful, did not use it for more than 2-3 hours, and have been on Opus 4.6 since February.) 4.8 does have a 1 million token context window, but its code quality and problem-solving ability feel just like 4.6.
Also, we need to stop calling Anthropic and OpenAI model builders. What they ship are no longer large language models (LLMs) but harnesses, which is to say models wrapped in tools and skills. Where we have built a vertically focused harness with Kilby, Anthropic and OpenAI are building general-purpose harnesses.
I will be surprised if the forthcoming Mythos is a step-function improvement. We may have to wait until 2027 to see material improvement. The other characteristic that Anthropic baked into 4.8 and Mythos is a penchant for the harness to say “let me be straight with you, let me give you the facts, let me blah, blah, blah...” It seems every response is qualified with blather.



