Sunday Coffee & Code: Claude Code - Dual model repo reviews (Sonnet 4.5 vs Opus 4.6)

As there has been lots of posting around the Anthropic Opus 4.6 model, I decided to get around to a full analysis of 𝗺𝘆 𝗥𝗙𝗣 𝗥𝗲𝘀𝗽𝗼𝗻𝗱𝗲𝗿 to review the codebase and identify gaps in a few areas:
• Application architecture
• Code quality and maintainability
• Security
• Reliability / Operations

I ran two independent Claude Code reviews of the same GitHub repository, one with Sonnet 4.5 and one with Opus 4.6, using the same “report-only” prompt - and then compared the findings.
• One thing jumped out immediately - Opus got the architecture wrong, nesting sub-agents under other sub-agents when they are in fact all called by the Orchestrator. Sonnet was almost spot-on.
• Both missed the Docling MCP use.
• Claude ‘Usage’ consumption was quite different - screenshots at the bottom. 𝗦𝗼𝗻𝗻𝗲𝘁: 8 mins - 45%. 𝗢𝗽𝘂𝘀: 6 mins - 28%.

They converged strongly on 𝘁𝗵𝗲 𝘀𝗮𝗺𝗲 𝗰𝗼𝗿𝗲 𝗿𝗶𝘀𝗸𝘀 (no real surprises given the phase of development for this app - 𝘐 𝘬𝘯𝘦𝘸 𝘢𝘣𝘰𝘶𝘵 𝘮𝘰𝘴𝘵 𝘰𝘧 𝘵𝘩𝘦𝘴𝘦):
• Several agent services were reachable on the network with no authentication between internal components.
• Some endpoints accepted arbitrary file paths, creating a clear path traversal / local file read risk.
• There were unauthenticated destructive operations in the RAG layer (e.g., reset/clean-up).
• Basic hygiene: .env protection / risk of accidental secret exposure.

Where they 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗱 they provided useful nuance (some of these were 𝘳𝘦𝘢𝘭𝘭𝘺 𝘨𝘰𝘰𝘥 𝘪𝘯𝘴𝘪𝘨𝘩𝘵𝘴):
• Sonnet put more weight on “edge” and deployment hardening (HTTPS and Apache config), and surfaced a few extra reliability/perf details (upload memory DoS pattern, RAG naming collisions, HTTP error-handling hygiene).
• Opus went deeper on operational resilience (job lifecycle / zombie jobs - 𝘵𝘩𝘪𝘴 𝘸𝘢𝘴 𝘢 𝘨𝘰𝘰𝘥 𝘰𝘯𝘦) and highlighted architectural throughput issues in the agent layer (blocking patterns that can stall a server under load).
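For the path traversal finding that both models flagged, the usual fix is to resolve the requested path and reject anything that lands outside an allowed directory. Here is a minimal Python sketch of that check. The language, the helper name `resolve_inside` and the `/srv/uploads` root are my own illustrative assumptions, not taken from the RFP Responder code:

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, refusing anything that escapes it.

    Hypothetical helper for illustration; not from the reviewed repository.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base entirely, and ".."
    # segments are collapsed by resolve(), so both cases are caught below.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


if __name__ == "__main__":
    # "/srv/uploads" is an assumed upload root used only for this demo.
    print(resolve_inside("/srv/uploads", "rfp/doc.pdf"))
    for bad in ("../../etc/passwd", "/etc/passwd"):
        try:
            resolve_inside("/srv/uploads", bad)
        except ValueError as err:
            print("rejected:", err)
```

This only covers the file path side. The unauthenticated service and RAG endpoints would need a separate fix.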
The takeaway for me was no real surprise: model diversity is useful - not because one is “right” and the other is “wrong” - but because the overlap confirms what I should fix first, and the differences help me widen coverage before going to production. It’s a simple way to reduce blind spots (and 𝗢𝗽𝘂𝘀 𝗱𝗼𝗲𝘀𝗻’𝘁 𝗮𝗽𝗽𝗲𝗮𝗿 𝘁𝗼 𝗯𝗲 𝗮 𝘀𝗶𝗺𝗽𝗹𝗲 𝗱𝗿𝗼𝗽-𝗶𝗻 𝘁𝗼 𝗿𝗲𝗽𝗹𝗮𝗰𝗲 𝗦𝗼𝗻𝗻𝗲𝘁 that does everything Sonnet can do and more).
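The "overlap first, differences second" triage can be sketched with plain set operations. This is only a rough Python illustration. The labels are my own shorthand for the findings described in this post, not the models' actual report output, and `compare_findings` is a hypothetical helper. In practice the two reports word things differently, so the labels have to be normalised by hand first.

```python
def compare_findings(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Split two sets of finding labels into shared and unique groups."""
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Shorthand labels for the shared risks (hand-written for this sketch).
shared = {
    "unauthenticated agent services",
    "path traversal via file path endpoints",
    "unauthenticated destructive RAG operations",
    ".env / secret exposure",
}
sonnet = shared | {
    "HTTPS and Apache hardening",
    "upload memory DoS",
    "RAG naming collisions",
    "HTTP error-handling hygiene",
}
opus = shared | {
    "job lifecycle / zombie jobs",
    "blocking patterns in agent layer",
}

if __name__ == "__main__":
    result = compare_findings(sonnet, opus)
    print("Fix first (both models):", sorted(result["both"]))
    print("Sonnet only:", sorted(result["only_a"]))
    print("Opus only:", sorted(result["only_b"]))
```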

