Sonnet 4.5 Review: The first spec-driven model has arrived


Sonnet 4.5 is markedly different from its predecessor in both behavior and tone. The most pronounced changes are also the least visible: Sonnet 4.5’s new approach to multi-context window management and its intelligent token management.

While these two capabilities work together to make long-running tasks more manageable, the model also has a noticeably (one might even say dramatically) larger capacity for parallel execution.

Our internal testing showed a 20-25% speed advantage over Sonnet 4. Time will tell whether this is a result of more generalization, a faster model overall, or a mix of both. Either way, while some core tool calls like file search behave more or less sequentially on other models, on Sonnet 4.5 they are, at times, virtually simultaneous to the naked eye.

In isolation, each of these facets adds value in terms of speed, cost and quality. Used together, they combine into an agent that feels less like a powerful collaboration tool and more like an industrial machine: a chain of processes with enough momentum that you hesitate to halt execution before the machine comes to a natural stop.

This sense of quiet confidence is also largely due to Sonnet 4.5’s new tone, or at times lack thereof. Reasoning blocks are shorter and explanatory outputs rarer, at least by default. The result is a less conversational model that feels better suited for a more granular understanding of the task at hand. 

Enter Spec-Driven Development

Robust agent steering solutions are becoming increasingly necessary, both because of the growing adoption of agentic coding in critical applications and because of the evolving capabilities of agents themselves, particularly when it comes to long-running tasks.

Without a strong scaffold, long-running agent executions with even the best models devolve into an exquisite corpse: a seemingly functional but ultimately scattered implementation, informed primarily by the summaries of the agent’s most recent outputs rather than by a coherent vision. The old adage that “if you don’t make plans, life will make plans for you” holds true with time and tokens alike.

Specify

Highly steerable, long-running execution requires robust direction up front. This is especially important where there is more ambiguity and room for interpretation.

A specification in the context of Sonnet 4.5 isn’t just a requirements document: it’s a living contract between you and the model that establishes shared understanding. Start with the user experience: Who will use this? What problem does it solve? What does success look like? These high-level goals become anchors that prevent the agent from drifting during long execution runs.

The specification should include concrete acceptance criteria. Rather than “implement user authentication,” specify: “Users can register with email/password, receive a verification email within 60 seconds, log in with rate limiting (5 attempts per hour), and reset forgotten passwords via email token valid for 24 hours.” This level of detail channels Sonnet 4.5’s execution capacity toward precise outcomes rather than generic patterns.

Critically, include what you don’t want. Sonnet 4.5’s tendency to parallelize aggressively can lead to over-engineering if not properly bounded. Explicitly state: “Do not add OAuth providers,” “Keep the UI minimal with no animations,” or “Use only the existing database schema.” These negative constraints are as valuable as positive requirements.
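One way to keep goals, acceptance criteria and negative constraints together is to hold them in a small structure and render it into the text you hand to the agent. The sketch below is only an illustration: the `Spec` class, its field names and the goal sentence are hypothetical, and the criteria are the ones quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical container for a spec; the field names are illustrative."""

    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the spec as plain text suitable for pasting into a prompt."""
        lines = [f"Goal: {self.goal}", "", "Acceptance criteria:"]
        lines += [f"- {c}" for c in self.acceptance_criteria]
        lines += ["", "Out of scope:"]
        lines += [f"- {n}" for n in self.non_goals]
        return "\n".join(lines)


auth_spec = Spec(
    goal="Users can create an account and sign in securely.",  # illustrative wording
    acceptance_criteria=[
        "Users can register with email/password.",
        "A verification email arrives within 60 seconds.",
        "Login is rate limited to 5 attempts per hour.",
        "Password reset uses an email token valid for 24 hours.",
    ],
    non_goals=[
        "Do not add OAuth providers.",
        "Keep the UI minimal with no animations.",
        "Use only the existing database schema.",
    ],
)

print(auth_spec.render())
```

Keeping the out-of-scope items in the same object as the requirements makes it harder to forget them when the spec is revised.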

Plan

By now, most Claude Code users will be familiar with the “plan with Opus; implement with Sonnet” motion. The fact that this is now a pre-defined option in the CLI reinforces the importance of starting an agentic workflow with a well-defined scope of work. At scale (both in terms of the current scope of work and its place within an existing platform), a well-articulated implementation plan has to consider the environment as a whole.

Sonnet 4.5’s parallel execution gathers a slew of context from available sources. Its ability to think non-linearly was especially pronounced in internal testing, where it used our multi-repo search tool to skim files from dozens of other repos in our org while examining the current repo, simultaneously mapping the application’s internal architecture and its interdependencies within the larger platform. The result is an implementation plan that accounts for obstacles, contradictions or areas of ambiguity before landing on a course of action.

Task

Here’s where Sonnet 4.5’s new state-tracking behavior shines. It’s not only a backward-looking account of work done, but also a detailed, forward-looking scope of discrete units of work still to do. A good way to think about tasks is to ask the agent to think in terms of tests; this helps break tasks down into their most atomic functionalities.
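As an illustration of thinking in terms of tests, take the "rate limiting (5 attempts per hour)" criterion from the spec example earlier. A minimal sketch follows; the `LoginRateLimiter` class and its sliding-window behavior are assumptions made for the example, not something the spec or the model prescribes.

```python
from collections import deque


class LoginRateLimiter:
    """Hypothetical sliding-window limiter: at most `max_attempts` per `window_seconds`."""

    def __init__(self, max_attempts: int = 5, window_seconds: int = 3600):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: dict[str, deque] = {}

    def allow(self, user: str, now: float) -> bool:
        """Report whether an attempt at time `now` is allowed, recording it if so."""
        attempts = self._attempts.setdefault(user, deque())
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


def test_sixth_attempt_within_an_hour_is_rejected():
    limiter = LoginRateLimiter()
    assert all(limiter.allow("ada@example.com", now=t) for t in range(5))
    assert not limiter.allow("ada@example.com", now=5)


def test_attempts_are_allowed_again_after_the_window():
    limiter = LoginRateLimiter()
    for t in range(5):
        limiter.allow("ada@example.com", now=t)
    assert limiter.allow("ada@example.com", now=3600)


# Run the two tests directly; a test runner such as pytest would also collect them.
test_sixth_attempt_within_an_hour_is_rejected()
test_attempts_are_allowed_again_after_the_window()
```

Each test maps to one atomic behavior, which is roughly the granularity a task list entry should have.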

As long-running tasks progress, you may find that Sonnet 4.5 updates the task list in the face of refactors or complications. We recommend ensuring that all state-tracking files are committed, at the very least, whenever a task is completed. Strongly encouraging your agent to update its state trackers and commit its progress creates a redundant breadcrumb trail that can help reconcile any divergences from the original spec down the line.
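A state tracker can be as simple as a JSON file holding both the work done and the work still to do. The sketch below assumes a hypothetical `tasks.json` layout with `done` and `todo` keys; the file name, keys and helper functions are illustrative and not a format Sonnet 4.5 itself defines.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Load the task list, or start an empty one if the file does not exist."""
    if path.exists():
        return json.loads(path.read_text())
    return {"done": [], "todo": []}


def complete_task(path: Path, task: str, note: str = "") -> dict:
    """Move `task` from todo to done and write the file back to disk."""
    state = load_state(path)
    if task in state["todo"]:
        state["todo"].remove(task)
    state["done"].append({"task": task, "note": note})
    path.write_text(json.dumps(state, indent=2))
    return state


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    tracker = Path(tmp) / "tasks.json"  # hypothetical file name
    tracker.write_text(
        json.dumps({"done": [], "todo": ["register endpoint", "login rate limit"]})
    )
    state = complete_task(tracker, "register endpoint", note="tests passing")
    print(state["todo"])
```

In a real repository the tracker would live alongside the code, and the agent would be asked to `git add` and `git commit` it together with the changes for each completed task.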

Implement

During implementation, Sonnet 4.5 is more likely to seek input or clarification, even when given a spec. This behavior may be undesirable for those looking for purely autonomous execution; if so, the default can be adapted with a firmly worded prompt. However, we found the model’s innate aversion to ambiguity to be doubly advantageous: beyond helping the agent stay on track, it lets us as developers answer questions we hadn’t considered. In Zencoder, these questions are often presented in interactive modals using our platform’s native “gather requirements” tool, which helps structure the way the agent is steered.

Conclusion

Looking ahead, Sonnet 4.5’s improved token tracking and context management bring start-to-finish remote agentic feature development one step closer to consistent performance. At the same time, the model’s capacity for parallel tool calling opens up some exciting possibilities in time-sensitive situations, such as critical errors that require extensive context gathering through MCP connections.

Of course, there’s no need to wait for tomorrow when you can start using Sonnet 4.5 in Zencoder today; you can even bring your Claude Code subscription for the best of both worlds. Wherever these new capabilities lead, it’s refreshing to see a model focus on novel behavior that feels truly aligned with the evolving nature of AI-native engineering.

About the author
Leon Malisov

Developer Advocate @ Zencoder
