📊 Full opportunity report: The Model Is Only 10%: The Real Lesson of the New SDLC on ThorstenMeyerAI.com — validation score, market gap, and execution plan.

TL;DR

A recent Google whitepaper argues that the core of AI development is not model size but the harness and context engineering around the model. For organizations integrating AI, that puts configuration and verification ahead of model upgrades.

A new Google whitepaper by Addy Osmani, Shubham Saboo, and Sokratis Kartakis states that the model accounts for only about 10% of an AI system’s behavior. The harness and context engineering around the model account for the remaining roughly 90%. The practical consequence is a shift in focus: away from chasing larger models and toward refining system configuration, verification, and control mechanisms.

The whitepaper, titled The New SDLC With Vibe Coding, locates the dominant part of AI system behavior in the harness: the prompts, rules, tools, and observability that surround the model. It puts that layer at roughly 90% of the system and the model at about 10%, yet the model often receives disproportionate attention.

Benchmark results support this claim. On the public Terminal Bench 2.0 benchmark, changing only the harness significantly improved performance with the same underlying model: one team moved a coding agent from outside the top 30 into the top 5 without touching the model. Configuration and setup, in other words, are critical to AI success.
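The "same model, different harness" effect can be illustrated with a toy sketch. This is not the benchmark setup from the paper: `stub_model`, `Harness`, and `run_agent` are hypothetical names, and the stub stands in for a real LLM call. The point is only that the model function stays fixed while the harness decides whether it can ground its answer in a tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; the 'model' never changes."""
    if "TOOLS: word_count" in prompt:
        return "CALL word_count"
    return "I think the answer is about 5."  # ungrounded guess


@dataclass
class Harness:
    """Everything around the model: rules plus the tools it may call."""
    system_rules: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def build_prompt(self, task: str) -> str:
        tool_names = ", ".join(self.tools) if self.tools else "none"
        return f"{self.system_rules}\nTOOLS: {tool_names}\nTASK: {task}"


def run_agent(model: Callable[[str], str], harness: Harness,
              task: str, payload: str) -> str:
    reply = model(harness.build_prompt(task))
    if reply.startswith("CALL "):
        tool_name = reply[len("CALL "):].strip()
        if tool_name in harness.tools:
            return harness.tools[tool_name](payload)
    return reply


text = "the harness matters more than the model"
bare = Harness(system_rules="Answer the question.")
tooled = Harness(
    system_rules="Use a tool when one fits.",
    tools={"word_count": lambda s: str(len(s.split()))},
)

print(run_agent(stub_model, bare, "Count the words.", text))    # the model guesses
print(run_agent(stub_model, tooled, "Count the words.", text))  # the tool counts: 7
```

Both runs use the identical `stub_model`; only the harness differs, and only the second run returns the correct count.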

The whitepaper also argues that cost management in AI depends more on optimizing the harness and context than on acquiring larger models. Ad-hoc prompting and vibe coding are less efficient in the long run because they lead to higher token usage, a heavier maintenance burden, and more security remediation. Disciplined approaches such as agentic engineering, which combines structured context, verification, and tooling, offer better economics and reliability.
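One way to picture the verification side of agentic engineering is a gate that requires both deterministic tests and evals to pass. The sketch below is an assumption-laden illustration, not the paper's tooling: `slugify` plays the role of agent-generated code, and `eval_summary`, `ci_gate`, the rubric terms, and the 0.7 threshold are all made up for the example.

```python
from typing import List


def slugify(title: str) -> str:
    """Candidate code, imagined here as agent-generated."""
    return "-".join(title.lower().split())


def run_tests() -> bool:
    """Deterministic tests: exact inputs, exact expected outputs."""
    cases = {"Hello World": "hello-world", "  A  B ": "a-b"}
    return all(slugify(raw) == expected for raw, expected in cases.items())


def eval_summary(summary: str, must_mention: List[str], max_words: int = 30) -> float:
    """Eval: score free-form output against a simple rubric (0.0 to 1.0)."""
    if len(summary.split()) > max_words:
        return 0.0
    text = summary.lower()
    hits = sum(term.lower() in text for term in must_mention)
    return hits / len(must_mention)


def ci_gate(summaries: List[str], must_mention: List[str],
            threshold: float = 0.7) -> bool:
    """Pass only if the tests pass AND the mean eval score clears the threshold."""
    scores = [eval_summary(s, must_mention) for s in summaries]
    mean_score = sum(scores) / len(scores)
    return run_tests() and mean_score >= threshold


samples = [
    "The harness drives most agent behavior, not the model.",
    "Models matter a bit.",
]
print(ci_gate(samples, ["harness", "model"]))                 # mean score 0.75
print(ci_gate(samples, ["harness", "model"], threshold=0.9))  # stricter gate
```

The tests cover what has one right answer; the eval scores sampled outputs that do not, and the gate fails if either check fails.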

At a glance
When: published early 2026
The development: Google’s new whitepaper highlights that the most significant factor in AI system performance is the harness and context engineering, not the model size, redefining software development practices.
The Model Is Only 10% — The New SDLC With Vibe Coding
A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)
The ~90% is your surface area, not the provider’s.
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.
Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).
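The routing lever can be sketched as a small dispatcher that sends easy work to a cheap model and reserves the expensive one for hard tasks. Everything here is hypothetical: the tier names, the prices (in cents per 1,000 tokens), the keyword hints, and the 8,000-token cutoff are placeholders chosen for illustration, not figures from the paper or any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cents_per_1k_tokens: int  # made-up price, for illustration only


CHEAP = ModelTier("small-model", 1)
STRONG = ModelTier("large-model", 10)

# Hypothetical signals that a task needs the stronger model.
HARD_HINTS = ("refactor", "architecture", "concurrency", "security")
LARGE_CONTEXT_TOKENS = 8000


def route(task: str, context_tokens: int) -> ModelTier:
    """Pick the cheap tier unless the task looks hard or the context is large."""
    looks_hard = any(hint in task.lower() for hint in HARD_HINTS)
    if looks_hard or context_tokens > LARGE_CONTEXT_TOKENS:
        return STRONG
    return CHEAP


def estimated_cost_cents(task: str, context_tokens: int) -> float:
    tier = route(task, context_tokens)
    return tier.cents_per_1k_tokens * context_tokens / 1000


print(route("Rename a variable", 2000).name)
print(route("Refactor the auth module for security", 2000).name)
print(estimated_cost_cents("Rename a variable", 2000))
```

A production router would likely rely on richer signals than keywords, such as past failure rates per task type, but the cost logic is the same: most requests never need the expensive tier.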
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

Implications for AI Development Strategies

This shift in understanding has major implications for organizations integrating AI. It suggests that investing in system architecture, configuration, and verification processes yields greater returns than simply upgrading to larger models. Leaders should focus on building robust harnesses and quality control mechanisms to achieve better performance, security, and cost-efficiency in AI applications.

By recognizing that the model is only 10% of the equation, companies can reallocate resources toward developing better tooling, testing, and context management, ultimately gaining a durable competitive advantage in AI deployment.

Evolution of AI Development Practices

Historically, AI progress has been driven by larger models and more training data. However, recent industry experiments and benchmarks have shown diminishing returns from model size alone. The whitepaper builds on this trend, emphasizing that the surrounding system — prompts, rules, tools, and observability — is where effective control and reliability are achieved.

This perspective aligns with ongoing shifts in AI engineering, where disciplined system design and verification are increasingly prioritized. It also reflects a broader industry move toward cost-effective AI, as token spend and operational costs become critical considerations.

“The model is only 10% of what determines behavior; the harness and context are 90%.”

— Addy Osmani

Uncertainties in Model Versus Harness Impact

While the whitepaper presents strong experimental evidence, it remains to be seen how universally applicable these findings are across different AI applications and industries. The precise quantification of the 10% versus 90% split may vary depending on use case and system complexity. Additionally, the long-term impact of focusing primarily on harness and context engineering is still being evaluated in real-world deployments.

Next Steps for Organizations Adopting AI

Organizations should reassess their AI development priorities, emphasizing the design of robust harnesses, context management, and verification processes. Future research and industry practice are likely to focus on developing standardized frameworks for system configuration and tooling. Companies that adapt quickly by investing in these areas may achieve better performance, security, and cost savings in AI deployment.

Key Questions

Why is the model only 10% of the AI system’s behavior?

The whitepaper argues that most of an AI system’s behavior depends on how the model is integrated, controlled, and verified through prompts, rules, and tooling. It puts this harness layer at about 90% of the system’s behavior, leaving roughly 10% to the model itself.

How does this shift affect AI development costs?

Focusing on harness and context engineering can reduce long-term costs by improving efficiency, security, and reliability, even if initial setup costs are higher due to system design and testing.
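The trade-off between higher setup cost and lower running cost comes down to break-even arithmetic. The sketch below uses invented numbers (no upfront cost and 300 per feature for the ad-hoc approach, 2,000 upfront and 100 per feature for the disciplined one); they are placeholders to show the calculation, not figures from the whitepaper, and `break_even_features` is a hypothetical helper.

```python
import math


def total_cost(upfront: float, per_feature: float, features: int) -> float:
    """Total spend after shipping a given number of features."""
    return upfront + per_feature * features


def break_even_features(upfront_a: float, per_feature_a: float,
                        upfront_b: float, per_feature_b: float) -> int:
    """Smallest feature count at which option B costs no more than option A."""
    if per_feature_b >= per_feature_a:
        raise ValueError("Option B never catches up: its per-feature cost is not lower.")
    extra_upfront = upfront_b - upfront_a
    if extra_upfront <= 0:
        return 0
    return math.ceil(extra_upfront / (per_feature_a - per_feature_b))


# Illustrative placeholders only.
n = break_even_features(0, 300, 2000, 100)
print(n)                            # 10
print(total_cost(0, 300, n))        # 3000
print(total_cost(2000, 100, n))     # 3000
```

With these placeholder values the two approaches cost the same at ten features, and every feature after that is cheaper under the higher-setup approach.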

What should companies do differently based on this insight?

They should prioritize building robust system configurations, verification processes, and tooling around AI models, rather than solely investing in larger or more advanced models.

Is this perspective applicable to all AI applications?

While the findings are supported by experiments, applicability may vary across different domains. Organizations should evaluate the importance of harness and context engineering within their specific use cases.

What is agentic engineering?

Agentic engineering involves designing AI systems with structured context, verification, and tooling that enable reliable and cost-effective operation, moving beyond simple prompt-based interactions.

Source: ThorstenMeyerAI.com
