February 2026

Sunday Coffee & Code: Claude Code - Dual model repo reviews (Sonnet 4.5 vs Opus 4.6)


By Steve Harris

With so much being posted about Anthropic's Opus 4.6 model, I decided to get around to a full analysis of my RFP Responder, reviewing the codebase to identify gaps in a few areas:

• Application architecture
• Code quality and maintainability
• Security
• Reliability / Operations

I ran two independent Claude Code reviews of the same GitHub repository, one with Sonnet 4.5 and one with Opus 4.6, both using the same "report-only" prompt, and then compared the findings.

• One thing jumped out immediately: Opus got the architecture wrong. It placed sub-agents under other sub-agents, when they are all called by the Orchestrator. Sonnet was almost spot-on.
• Both missed the Docling MCP use.
• Claude 'Usage' consumption was quite different (screenshots at the bottom). Sonnet: 8 mins - 45%. Opus: 6 mins - 28%.

They converged strongly on the same core risks (no real surprises given the phase of development for this app - I knew about most of these):

• Several agent services were reachable on the network with no authentication between internal components.
• Some endpoints accepted arbitrary file paths, creating a clear path traversal / local file read risk.
• There were unauthenticated destructive operations in the RAG layer (e.g., reset/clean-up).
• Basic hygiene: .env protection / risk of accidental secret exposure.

Where they differed, they provided useful nuance (some of these were really good insights):

• Sonnet put more weight on "edge" and deployment hardening (HTTPS and Apache config), and surfaced a few extra reliability/performance details (upload memory DoS pattern, RAG naming collisions, HTTP error-handling hygiene).
• Opus went deeper on operational resilience (job lifecycle / zombie jobs - this was a good one) and highlighted architectural throughput issues in the agent layer (blocking patterns that can stall a server under load).

The takeaway for me was no real surprise: model diversity is useful, not because one is "right" and the other is "wrong", but because the overlap confirms what I should fix first, and the differences help me widen coverage before going to production. It's a simple way to reduce blind spots (and Opus doesn't appear to be a simple drop-in replacement for Sonnet that does everything Sonnet can do and more).
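The "overlap first, differences second" comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not how the comparison was actually done: the short labels below are hypothetical names I have given to the findings listed above, and it assumes each free-form report has already been boiled down by hand to a set of such labels.

```python
# Hypothetical labels for the findings described in the post. The real
# reports were free-form text, so this normalization step is an assumption.
shared_risks = {
    "no-internal-auth",        # agent services reachable without auth
    "path-traversal",          # endpoints accepting arbitrary file paths
    "unauth-destructive-rag",  # reset/clean-up without authentication
    "env-exposure",            # .env protection / secret exposure
}

sonnet_findings = shared_risks | {
    "https-apache-hardening",
    "upload-memory-dos",
    "rag-naming-collisions",
    "http-error-handling",
}

opus_findings = shared_risks | {
    "zombie-jobs",
    "blocking-agent-calls",
}


def triage(report_a: set[str], report_b: set[str]) -> dict[str, list[str]]:
    """Split findings into those both reviews raised and those only one raised."""
    return {
        "both": sorted(report_a & report_b),    # confirmed twice: fix first
        "only_a": sorted(report_a - report_b),  # extra coverage from review A
        "only_b": sorted(report_b - report_a),  # extra coverage from review B
    }


result = triage(sonnet_findings, opus_findings)
for bucket, labels in result.items():
    print(f"{bucket}: {', '.join(labels)}")
```

The "both" bucket is the fix-first list; the two "only" buckets are the extra coverage each model contributed. The manual labelling is the fragile part, since two models rarely describe the same issue in the same words.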
