TL;DR

A recent Google whitepaper argues that what matters most in AI development is not model size but the harness and context engineering around the model. For organizations integrating AI, that puts configuration and verification ahead of model upgrades.

A new whitepaper by Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the model accounts for only about 10% of an AI system’s behavior. Instead, the harness and context engineering surrounding the model determine 90% of performance and reliability. This insight shifts the focus of AI development away from chasing larger models toward refining system configuration, verification, and control mechanisms.

The whitepaper, titled The New SDLC With Vibe Coding, defines the harness as everything around the model: prompts, rules, tools, and observability. By its estimate that layer accounts for roughly 90% of system behavior, yet the model, at about 10%, receives most of the attention.

The paper backs the claim with benchmark results. On Terminal Bench 2.0, a public benchmark, changing only the harness while keeping the same underlying model moved one team’s coding agent from outside the top 30 into the top 5. The result suggests that configuration and setup can matter more than the choice of model.

The whitepaper also frames cost management as a matter of optimizing the harness and context rather than acquiring larger models. It argues that ad-hoc prompting and vibe coding are less efficient in the long run because they drive up token usage, maintenance burden, and security remediation. Disciplined approaches such as agentic engineering, which combines structured context, verification, and tooling, offer better economics and reliability.

At a glance

Report · When: published May 2026

The development: Google’s new whitepaper argues that harness and context engineering, not model size, is the most significant factor in AI system performance, and that this redefines software development practice.

The Model Is Only 10% — The New SDLC With Vibe Coding

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
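The split between tests and evals can be sketched as a small gate. This is a minimal illustration, not the whitepaper’s implementation: `EvalCase`, `run_tests`, `run_evals`, `ci_gate`, the 0.8 threshold, and the stub model are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvalCase:
    prompt: str
    grade: Callable[[str], float]  # hypothetical grader, returns a score in [0, 1]


def run_tests(checks: Sequence[Callable[[], bool]]) -> bool:
    """Deterministic layer: every check must pass."""
    return all(check() for check in checks)


def run_evals(generate: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Non-deterministic layer: mean graded score over all cases."""
    if not cases:
        return 0.0
    return sum(case.grade(generate(case.prompt)) for case in cases) / len(cases)


def ci_gate(tests_ok: bool, eval_score: float, threshold: float = 0.8) -> bool:
    """Pass only if tests are green AND the eval score clears the threshold."""
    return tests_ok and eval_score >= threshold


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same text.
    return "Refunds are accepted within 30 days."


checks = [lambda: 2 + 2 == 4]
cases = [
    EvalCase("Summarize the refund policy.",
             lambda out: 1.0 if "30 days" in out else 0.0),
    EvalCase("State the refund window.",
             lambda out: 1.0 if "refund" in out.lower() else 0.0),
]

gate_passed = ci_gate(run_tests(checks), run_evals(stub_model, cases))
print("gate passed:", gate_passed)
```

In a real pipeline the checks would be a test suite and the graders would be rubric or model-based scorers; the point of the sketch is only that both layers must pass before anything ships.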

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability

~90% is your surface area, not the provider’s

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
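One way to picture "Agent = Model + Harness" is a fixed model wrapped by a swappable configuration. The sketch below is an assumption-laden toy: `Harness`, `Agent`, `toy_model`, and the `CALL <tool> <arg>` reply protocol are invented for illustration and do not come from the whitepaper or any real framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Model = Callable[[str], str]  # prompt in, text out


@dataclass
class Harness:
    rules: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def build_prompt(self, task: str) -> str:
        tool_list = ", ".join(sorted(self.tools)) or "none"
        return f"{self.rules}\nTools: {tool_list}\nTask: {task}"


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        reply = self.model(self.harness.build_prompt(task))
        # Toy protocol (hypothetical): "CALL <tool> <arg>" requests a tool.
        if reply.startswith("CALL "):
            _, name, arg = reply.split(" ", 2)
            tool = self.harness.tools.get(name)
            if tool is None:
                return f"error: tool '{name}' is not configured"
            return tool(arg)
        return reply


def toy_model(prompt: str) -> str:
    # Stand-in for a real model: always asks for the test runner.
    return "CALL run_tests src/"


bare = Agent(toy_model, Harness(rules="Be concise."))
equipped = Agent(
    toy_model,
    Harness(rules="Be concise.",
            tools={"run_tests": lambda path: f"ran tests in {path}"}),
)

print(bare.run("fix the bug"))
print(equipped.run("fix the bug"))
```

Both agents use the same model. The first fails only because a tool is missing from its harness, which is the kind of configuration failure the quote describes.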

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.
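The CapEx versus OpEx trade-off and the routing lever can be made concrete with a small cost model. Every number here is hypothetical and chosen only to make the arithmetic easy to follow; the 5× per-feature gap is picked to sit inside the 3–10× range quoted above. `CostProfile`, `break_even_features`, `route_model`, and the 4000-token cutoff are illustrative names and values, not anything the paper defines.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    capex: float             # one-time setup cost (specs, evals, context)
    opex_per_feature: float  # recurring cost per shipped feature

    def total(self, features: int) -> float:
        return self.capex + self.opex_per_feature * features


def break_even_features(low_capex: CostProfile, high_capex: CostProfile) -> int:
    """Smallest feature count at which high_capex costs no more in total."""
    saving = low_capex.opex_per_feature - high_capex.opex_per_feature
    if saving <= 0:
        raise ValueError("the high-CapEx profile never catches up")
    extra_capex = high_capex.capex - low_capex.capex
    return max(0, math.ceil(extra_capex / saving))


def route_model(task_tokens: int, needs_reasoning: bool) -> str:
    """Toy router: send short, easy tasks to the cheaper tier."""
    if needs_reasoning or task_tokens > 4000:
        return "large"
    return "small"


# Hypothetical figures in arbitrary cost units.
vibe = CostProfile(capex=0, opex_per_feature=500)
agentic = CostProfile(capex=6000, opex_per_feature=100)

n = break_even_features(vibe, agentic)
print("break-even after", n, "features")
print("totals at break-even:", vibe.total(n), agentic.total(n))
print("routing a short edit:", route_model(200, needs_reasoning=False))
```

Under these assumed figures the upfront investment pays for itself once enough features ship; with real numbers the crossover point moves, but the shape of the calculation stays the same.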

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Implications for AI Development Strategies

For organizations integrating AI, the implication is that investment in system architecture, configuration, and verification is likely to return more than an upgrade to a larger model. Leaders should focus on building robust harnesses and quality-control mechanisms to improve performance, security, and cost-efficiency.

If the model is only about 10% of the equation, companies can reallocate resources toward better tooling, testing, and context management, which is where a durable advantage in AI deployment is most likely to come from.

Evolution of AI Development Practices

Historically, AI progress has been driven by larger models and more training data. However, recent industry experiments and benchmarks have shown diminishing returns from model size alone. The whitepaper builds on this trend, emphasizing that the surrounding system — prompts, rules, tools, and observability — is where effective control and reliability are achieved.

This perspective aligns with ongoing shifts in AI engineering, where disciplined system design and verification are increasingly prioritized. It also reflects a broader industry move toward cost-effective AI as token spend and operating costs become critical considerations.

“The model is only 10% of what determines behavior; the harness and context are 90%.”
— Addy Osmani

Uncertainties in Model Versus Harness Impact

While the whitepaper presents strong experimental evidence, it remains to be seen how universally applicable these findings are across different AI applications and industries. The precise quantification of the 10% versus 90% split may vary depending on use case and system complexity. Additionally, the long-term impact of focusing primarily on harness and context engineering is still being evaluated in real-world deployments.

Next Steps for Organizations Adopting AI

Organizations should reassess their AI development priorities, emphasizing the design of robust harnesses, context management, and verification processes. Future research and industry practice are likely to focus on developing standardized frameworks for system configuration and tooling. Companies that adapt quickly by investing in these areas may achieve better performance, security, and cost savings in AI deployment.

Key Questions

Why is the model only 10% of the AI system’s behavior?

The whitepaper argues that most of an AI system’s behavior depends on how the model is integrated, controlled, and verified through prompts, rules, and tooling. It puts that surrounding layer, the harness, at about 90% of the outcome.

How does this shift affect AI development costs?

Focusing on harness and context engineering can reduce long-term costs by improving efficiency, security, and reliability, even if initial setup costs are higher due to system design and testing.

What should companies do differently based on this insight?

They should prioritize building robust system configurations, verification processes, and tooling around AI models, rather than solely investing in larger or more advanced models.

Is this perspective applicable to all AI applications?

While the findings are supported by experiments, applicability may vary across different domains. Organizations should evaluate the importance of harness and context engineering within their specific use cases.

What is agentic engineering?

Agentic engineering involves designing AI systems with structured context, verification, and tooling that enable reliable and cost-effective operation, moving beyond simple prompt-based interactions.

Source: ThorstenMeyerAI.com


Author

Feature Buddies Team
