From Build to Buy: What Changed in Enterprise AI Procurement

A year ago, conversations with enterprise technology leaders followed a predictable pattern. They had a budget for AI initiatives. They had internal engineering resources. They had proprietary data they believed would be their moat. The question was not whether to pursue AI but how to build it themselves.

The confidence was understandable. Large organisations had watched smaller competitors deploy AI capabilities and concluded that with sufficient resources, they could do the same—better, because they understood their own domain, their own data, their own workflows. Some invested heavily. Some built dedicated AI teams. Some partnered with consultancies to develop bespoke solutions.

A year later, those same conversations sound different. The confidence has softened. The timelines have slipped. The internal teams that were supposed to deliver production systems are still iterating on pilots. And increasingly, the question has shifted from "how do we build this?" to "who can help us deploy this before next quarter?"

Industry surveys suggest the shift is significant—some indicate that three-quarters of enterprises now favour buying AI capabilities rather than building them internally, up from roughly fifty percent a year earlier. The precise numbers matter less than the pattern they reflect: something changed in how organisations think about AI development, and it changed faster than most predicted.

What Building Actually Requires

The shift is not difficult to explain once you understand what building production AI systems actually entails.

The models themselves have become commoditised. Access to capable foundation models is no longer a differentiator; it is table stakes. Any organisation with a budget can access GPT-4, Claude, Gemini, or open-source alternatives. The technology that seemed magical two years ago is now available through an API call.

What has not become commoditised is everything around the model: the scaffolding that transforms a capable model into a reliable system. This scaffolding includes context management—ensuring the system has access to relevant information without overwhelming the context window or introducing noise. It includes memory architecture—maintaining state across interactions, sessions, and time periods in ways that enable continuity rather than constant reset. It includes evaluation frameworks—measuring not just whether the system produces plausible outputs but whether those outputs are accurate, appropriate, compliant, and useful.
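
To make the context-management point concrete, here is a minimal sketch of the kind of selection logic this scaffolding involves: candidate snippets are scored for relevance and packed into a fixed token budget rather than dumped wholesale into the prompt. The names and the greedy scoring heuristic are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    relevance: float   # assumed to come from a retriever, e.g. cosine similarity
    tokens: int        # assumed to come from the model's tokenizer

def select_context(snippets: list[Snippet], budget: int) -> list[Snippet]:
    """Pack the most relevant snippets into a fixed token budget.

    Greedy by relevance: a simple stand-in for the real trade-off
    between coverage, noise, and context-window cost.
    """
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if used + s.tokens <= budget:
            chosen.append(s)
            used += s.tokens
    return chosen

# Example: a 300-token budget forces a choice, which is the whole point.
candidates = [
    Snippet("Refund policy, current version", relevance=0.92, tokens=180),
    Snippet("Refund policy, superseded 2023", relevance=0.90, tokens=170),
    Snippet("Customer's last three tickets", relevance=0.75, tokens=110),
]
for s in select_context(candidates, budget=300):
    print(s.text)
```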

The scaffolding also includes edge case handling, which in enterprise environments means handling regulatory constraints, exception workflows, escalation paths, and the twenty percent of situations that do not fit standard patterns. It includes observability—understanding what the system is doing, why it made particular decisions, and how to debug failures when they occur. It includes cost management—ensuring that token consumption and compute costs remain sustainable as usage scales. It includes governance—audit trails, access controls, explainability for regulators and internal stakeholders.
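
As a sketch of what the observability, cost, and governance pieces look like at the code level, the fragment below records each system decision as a structured, auditable event and flags low-confidence cases for escalation. The field names and the confidence threshold are assumptions for illustration only.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

ESCALATION_THRESHOLD = 0.8  # assumed policy value; set per deployment

@dataclass
class DecisionRecord:
    """One auditable event: what was decided, why, and at what cost."""
    action: str
    inputs_summary: str
    rationale: str
    confidence: float
    tokens_used: int
    decision_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)

    @property
    def needs_escalation(self) -> bool:
        return self.confidence < ESCALATION_THRESHOLD

def log_decision(record: DecisionRecord) -> None:
    # In production this would go to an append-only audit store;
    # stdout stands in for that here.
    print(json.dumps(asdict(record)))

record = DecisionRecord(
    action="approve_refund",
    inputs_summary="order #1042, within 30-day window",
    rationale="matches standard refund policy, no exceptions triggered",
    confidence=0.62,
    tokens_used=1430,
)
log_decision(record)
if record.needs_escalation:
    print("route to human review")
```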

Most internal teams underestimated this scaffolding by six to twelve months, sometimes more. They scoped projects based on the model integration work—the part that is genuinely straightforward—and discovered only later that the surrounding infrastructure would consume the majority of effort and time.

The Twenty Percent Problem

One pattern appears consistently across engagements in different sectors: the gap between what automation handles cleanly and what requires ongoing human intervention.

In legal process automation, teams report that their existing RPA and AI systems handle roughly eighty percent of cases effectively. The remaining twenty percent—edge cases, ambiguous situations, exceptions that require judgement—still require manual workflow modifications. The automation ceiling is real, and reaching it does not take long. Breaking through it requires architectural decisions that most initial implementations did not anticipate.

In financial services, conversational systems can handle product education, general enquiries, and standard workflows. But the moment complexity increases—conditional eligibility rules, regulatory constraints that vary by context, decisions that must be explainable to auditors—the scaffolding requirements multiply. The question is not whether the model can generate appropriate responses but whether the surrounding system can enforce boundaries, maintain compliance, and provide the traceability that regulated environments demand.
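
A hedged illustration of what "enforce boundaries with traceability" can mean in practice: the sketch below runs a case through explicit eligibility rules and returns not just a verdict but a per-rule trace an auditor could inspect. The rules themselves are invented placeholders, not real regulatory requirements.

```python
from typing import Callable

# Each rule is a named predicate over the case; all names are hypothetical.
Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    ("resident_of_supported_region", lambda c: c["region"] in {"NL", "DE", "FR"}),
    ("meets_minimum_income", lambda c: c["monthly_income"] >= 2000),
    ("no_active_arrears", lambda c: not c["has_arrears"]),
]

def check_eligibility(case: dict) -> dict:
    """Return a verdict plus a per-rule trace for auditability."""
    trace = [(name, rule(case)) for name, rule in RULES]
    return {
        "eligible": all(passed for _, passed in trace),
        "trace": trace,  # the part auditors and regulators actually ask for
    }

result = check_eligibility(
    {"region": "NL", "monthly_income": 1800, "has_arrears": False}
)
print(result["eligible"])  # False
for name, passed in result["trace"]:
    print(f"{name}: {'pass' if passed else 'fail'}")
```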

In manufacturing, quote automation can dramatically compress cycle times for standard configurations. But non-standard requests, margin-sensitive customers, and specifications that require engineering judgement reveal the limits of rule-based approaches. The system needs access to institutional knowledge—how similar situations were handled historically, what assumptions tend to fail, which precedents should inform current decisions—and that knowledge rarely exists in structured, accessible form.

The twenty percent is where building becomes genuinely difficult. And addressing it requires not just engineering effort but architectural decisions about memory, learning, governance, and human-in-the-loop design that most internal teams did not scope into their initial plans.

What Enterprises Are Actually Buying

The shift toward buying is real, but characterising it simply as "buying tools" misses what is actually happening. Enterprises are not purchasing software licences in the traditional sense. They are purchasing speed to production—the ability to deploy capabilities in weeks rather than quarters.

This changes what matters in vendor evaluation. The winning vendors are not necessarily those with the most sophisticated models or the most comprehensive feature sets. They are the vendors who can demonstrate production deployment rapidly, with governance frameworks that satisfy compliance requirements, and with operational patterns that have been validated in similar contexts.

The conversation has shifted from "what can your technology do?" to "how quickly can we be live, and what do we need to provide to make that happen?" Enterprises have learned, often painfully, that capability demonstrations and pilot success do not predict production viability. They want evidence that the vendor has navigated the scaffolding challenges before—context management, edge case handling, compliance frameworks, escalation paths—and can apply that experience to accelerate deployment.

This is why the distinction between general-purpose AI tools and operational AI systems matters. A general-purpose tool provides capabilities; an operational system provides capabilities embedded in workflows, with the scaffolding already solved. The enterprise does not need to build context management from scratch because the vendor has already built it. The enterprise does not need to design governance frameworks from first principles because the vendor brings patterns that have been validated in production.

The Trust Progression

Speed to production matters, but it is not the only factor. Enterprises have also learned to be cautious about systems that claim full capability at launch.

The pilots that failed most spectacularly were often those that promised too much too quickly. The system worked impressively in demonstrations, struggled with production variability, lost credibility with end users, and was quietly shelved. The failure mode was not technical incompetence but misaligned expectations—the system was positioned as ready for autonomous operation before it had earned the trust required for that autonomy.

What works instead is progressive trust-building. The system begins with limited scope—handling well-defined tasks where failure is recoverable and human oversight is intensive. As it demonstrates reliability, scope expands. Autonomy is earned through operational evidence, not claimed through marketing.

This progression requires architectural support. The system must be designed for staged deployment, with clear boundaries between what it handles autonomously and what it escalates. It must capture feedback from production operation—not just outcomes but the reasoning that led to those outcomes—so that learning is continuous rather than episodic. It must provide visibility into its own performance so that stakeholders can develop justified confidence rather than blind faith.
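
One way to picture that architectural support is a stage-gate policy like the minimal sketch below: autonomy expands one rung at a time, and only when rolling production evidence clears an explicit threshold. The stage names, thresholds, and sample sizes are assumptions, not prescriptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    min_reviewed: int     # decisions a human has checked at this stage
    min_accuracy: float   # rolling accuracy required to advance

# Illustrative ladder; real gates are negotiated per deployment.
LADDER = [
    Stage("shadow",            min_reviewed=500, min_accuracy=0.95),
    Stage("human_approves",    min_reviewed=300, min_accuracy=0.97),
    Stage("autonomous_narrow", min_reviewed=200, min_accuracy=0.98),
    Stage("autonomous_broad",  min_reviewed=0,   min_accuracy=1.00),  # terminal
]

def next_stage(current: int, reviewed: int, accuracy: float) -> int:
    """Advance one rung only when the evidence clears the gate."""
    gate = LADDER[current]
    if (current + 1 < len(LADDER)
            and reviewed >= gate.min_reviewed
            and accuracy >= gate.min_accuracy):
        return current + 1
    return current  # autonomy is earned, never assumed

stage = 0
stage = next_stage(stage, reviewed=520, accuracy=0.96)
print(LADDER[stage].name)  # human_approves
```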

Enterprises that have experienced failed pilots are increasingly sophisticated about this. They ask not just "what can the system do?" but "how does the system earn expanded authority?" They want to understand the stage gates, the performance thresholds, the escalation paths. They want to see governance frameworks that allow them to maintain control while the system proves itself.

Switching Costs and the Commitment Window

The shift toward buying creates urgency for both enterprises and vendors, but for different reasons.

For enterprises, the urgency comes from competitive dynamics. If AI capabilities provide a genuine operational advantage—faster response times, better customer experience, reduced cost per transaction—then delay means ceding ground to competitors who deploy sooner. The window to capture advantage is open now; it narrows as adoption becomes universal.

But there is a countervailing pressure: switching costs compound quickly once a vendor is selected. The system accumulates organisational knowledge through operation. It learns patterns specific to that enterprise's products, customers, and workflows. Integration deepens as the system connects to more internal processes. The cost of switching—measured not just in implementation effort but in lost learning and institutional memory—increases with each month of operation.

This creates a tension that sophisticated enterprises recognise. They want to move quickly to capture an advantage, but they want to choose carefully because the choice becomes increasingly difficult to reverse. They are buying not just current capability but a relationship that will evolve over years.

For vendors, the implication is that the current window is unusually favourable. Enterprises have largely abandoned the build-first assumption and are actively seeking external solutions. But the window is also competitive—multiple vendors can move quickly, and being first matters primarily if you can also be durable.

What the Shift Does Not Change

The build-or-buy shift changes procurement dynamics but does not change fundamental requirements. Enterprises still need systems that work in production, not just in demonstrations. They still require governance frameworks that satisfy regulators and internal stakeholders. They still need integration with existing workflows rather than parallel systems that create additional overhead. They still need operational reliability at scale.

The vendors that succeed in this environment are not those who move fastest in isolation but those who move fastest while addressing these requirements. Speed without governance creates risk. Deployment without integration creates friction. Capability without reliability creates disappointment.

The shift also does not change the underlying challenge: most enterprise AI value lies not in the model but in the scaffolding around it. The commoditisation of models raises the importance of everything else—the context management, the memory architecture, the evaluation frameworks, the governance structures. Enterprises that previously thought they were buying model access are discovering they need to evaluate scaffolding sophistication.

The Compounding Question

Perhaps the most significant question enterprises should ask—and often do not—is what happens to the investment over time.

A tool provides capability today. An operational system that learns from its own operation provides a capability that compounds. The system deployed today should be meaningfully more capable in twelve months, not because the underlying model improved but because it has accumulated organisational knowledge through use.

This is where the distinction between buying software and building institutional memory becomes relevant. The enterprise that deploys a learning system gains something that competitors cannot replicate by licensing the same technology. They gain operational history—the patterns, precedents, and institutional reasoning captured through production use. That history becomes a form of moat, even in a market where the underlying technology is commoditised.
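
To make "accumulated organisational knowledge" slightly more concrete, here is a minimal sketch of a decision memory: each production decision is stored with its reasoning, and new cases retrieve the most similar precedents. The tag-overlap similarity is a deliberately crude stand-in for whatever retrieval a real system would use; all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Precedent:
    case_tags: frozenset[str]   # e.g. {"non_standard_config", "margin_sensitive"}
    decision: str
    reasoning: str

@dataclass
class DecisionMemory:
    precedents: list[Precedent] = field(default_factory=list)

    def record(self, tags: set[str], decision: str, reasoning: str) -> None:
        """Capture one production decision and the reasoning behind it."""
        self.precedents.append(Precedent(frozenset(tags), decision, reasoning))

    def similar(self, tags: set[str], k: int = 3) -> list[Precedent]:
        """Rank stored precedents by tag overlap with the new case."""
        return sorted(
            self.precedents,
            key=lambda p: len(p.case_tags & tags),
            reverse=True,
        )[:k]

memory = DecisionMemory()
memory.record(
    {"non_standard_config", "margin_sensitive"},
    decision="route to sales engineering",
    reasoning="custom tolerances outside standard quoting rules",
)
for p in memory.similar({"non_standard_config", "rush_order"}):
    print(p.decision, "--", p.reasoning)
```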

The build-or-buy question, properly understood, is not just about how to acquire capability but about how to grow institutional intelligence. The answer, for most enterprises, is that buying provides the foundation—the scaffolding, the governance, and the speed to production—but the durable value comes from what the system learns once it is deployed.

That learning cannot be purchased. It can only be accumulated. And the organisations that begin accumulating it now will hold a lead that late movers cannot easily close.

The shift from build to buy reflects hard-won lessons about what production AI actually requires. The scaffolding around models—context, memory, governance, edge cases—is where most implementations struggle. Speed to production matters, but so does choosing partners who have solved these challenges before and can help institutional intelligence compound over time.

Mitochondria builds ATP — agentic AI for operations. It learns your workflows, earns autonomy in stages, and runs with governance built in. Your data stays yours. Based in Amsterdam and Pune, working with organisations across Europe and India.
