Beyond Cost Comparison: A Framework for Evaluating AI Deployments
There is a peculiar problem that emerges when AI deployments succeed: the value becomes invisible.
Before the system was implemented, the pain was tangible. Delays were felt. Errors were noticed. Bottlenecks were complained about. People remembered the frustration of waiting, the deals lost to slow response, and the knowledge that walked out the door when someone left.
After the system works reliably for a few months, that memory fades. The process simply happens. Quotes go out. Enquiries get responses. Information flows where it needs to flow. The absence of friction is difficult to perceive precisely because friction is what draws attention. Smooth operation is, by nature, unremarkable.
This creates a challenge for organisations evaluating whether their AI investment is justified. The comparison they instinctively reach for—what does this cost versus what we paid before?—often misses the point entirely. And when baseline metrics were never collected (which is common, because struggling processes rarely get measured systematically), the comparison becomes even harder to make.
Over multiple deployments (manufacturing, financial services, travel, eCommerce, ESG monitoring, real estate, and others), we have developed an approach to impact assessment that addresses this problem. It is a way of thinking about value that captures what AI deployments actually change, and what would be lost if they stopped.
The Wrong Comparison
The most common mistake in assessing AI impact is comparing the subscription or service cost to the cost of hiring someone to do the same work.
This comparison fails for a straightforward reason: a person cannot do the same work. A person hired to handle customer enquiries cannot guarantee response times under one minute, twenty-four hours a day, every day. A person cannot handle volume spikes without stress, error, or delay. A person cannot ensure that every output includes every required element, every stakeholder is notified, every attachment is included, and every field is correctly populated. A person cannot provide a complete, searchable audit trail of every interaction.
These are not criticisms of people. They are observations about what human work involves: variability, fatigue, availability constraints, and the ordinary limitations of attention and memory. A well-designed AI system does not have these limitations—not because it is superior to humans, but because it is solving a different kind of problem.
The correct comparison is not "what would it cost to hire someone?" but "what would it cost to build this capability any other way?" In most cases, the answer is that equivalent capability (guaranteed consistency, perfect availability, instant response, complete audit trails) cannot be achieved through hiring at any reasonable cost. Some capabilities simply do not exist in human-staffed form.
The Baseline Problem
Impact assessment requires comparison, and comparison requires a baseline. What was the situation before the AI system was deployed?
For many organisations, this baseline does not exist in measurable form. They know things were slower, but they did not track how slow. They know leads were missed, but they did not count how many. They know errors occurred, but they did not log them systematically. The pain was real but unquantified.
It would be unfair to call this negligence. Struggling processes rarely get instrumented. When a team is overwhelmed, the last thing it has capacity for is building dashboards to measure its overwhelm. The absence of baseline data is itself evidence that the process was under strain.
Our approach addresses this by reconstructing baselines through operational analysis rather than historical data. We examine the process as it existed—the steps involved, the people required, the decision points, the failure modes—and estimate what the process would produce under realistic assumptions about human performance.
How long does each step take when performed manually? How many of those steps require a specific person to be available? What happens when that person is unavailable? How often do errors occur at each handoff point? What is the cost of those errors—not just the direct cost, but the downstream consequences?
This reconstruction is not precise, but it does not need to be. Even conservative estimates typically reveal substantial gaps between what the process could theoretically produce and what it actually produced under real-world constraints.
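To make that kind of reconstruction concrete, the rough estimate can be laid out explicitly. The sketch below is a minimal illustration: the step names, timings, error rates, volumes, and costs are all hypothetical assumptions to be replaced with an organisation's own figures, not measured data.

```python
# Minimal sketch of a reconstructed baseline. Every step name, timing,
# and rate below is a hypothetical assumption, not measured data.

manual_steps = {
    # step name: (minutes per transaction, error rate at this handoff)
    "collect requirements": (15, 0.02),
    "look up pricing":      (20, 0.05),
    "assemble document":    (25, 0.04),
    "review and send":      (10, 0.01),
}

transactions_per_day = 10       # assumed volume
working_days_per_year = 230     # assumed
loaded_cost_per_hour = 45.0     # assumed fully loaded labour cost
cost_per_error = 400.0          # assumed average rework and goodwill cost

minutes_per_txn = sum(m for m, _ in manual_steps.values())
errors_per_txn = sum(e for _, e in manual_steps.values())

annual_txns = transactions_per_day * working_days_per_year
annual_hours = annual_txns * minutes_per_txn / 60
annual_labour_cost = annual_hours * loaded_cost_per_hour
annual_error_cost = annual_txns * errors_per_txn * cost_per_error

print(f"Estimated manual hours per year: {annual_hours:,.0f}")
print(f"Estimated labour cost per year:  {annual_labour_cost:,.0f}")
print(f"Estimated error cost per year:   {annual_error_cost:,.0f}")
```

Even at this level of precision, the exercise shows where the hours and the error costs actually accumulate, which is usually enough to make the baseline discussable.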
Five Dimensions of Value
An impact assessment that focuses solely on cost reduction misses most of the value AI deployments create. We evaluate impact across five dimensions, each of which captures a different aspect of operational change.
Direct capacity replacement is the most visible dimension: the hours of human work that the system now performs. If a process previously required four hours of administrative work per day, and the system now handles that work, the organisation has recovered four hours of capacity. This capacity has value whether it is converted to cost savings (fewer staff required) or redeployed to higher-value activities (same staff, different work).
Opportunity cost recovery captures what people were not doing while they performed the work that the system now handles. When senior salespeople spend hours assembling quotes, they are not spending that time on activities that only they can do: building relationships, closing complex deals, and developing new accounts. The opportunity cost of skilled people doing administrative work rarely appears in any accounting, but it compounds over time. A salesperson who recovers two hours per day has ten additional hours per week for actual selling.
Risk and error elimination captures the cost of mistakes that no longer occur. Manual processes produce manual errors. In high-value transactions (machinery quotes, financial applications, legal documents), a single material error can damage a deal or a relationship. The cost is not just the immediate loss but the erosion of trust that makes future deals harder to close. AI systems that enforce completeness and consistency eliminate entire categories of error by design.
Speed and responsiveness captures competitive advantage that is difficult to quantify but real. In markets where multiple suppliers receive the same enquiry, response speed correlates with win rates. The organisation that responds in minutes has a structural advantage over competitors who respond in hours or days. This advantage does not appear in any metric until it is lost—until the competitor who responded faster wins the business.
Scalability captures the difference between linear and fixed cost structures. Human capacity scales linearly with volume: twice the work requires roughly twice the people. AI systems handle volume increases without proportional cost increases. At ten transactions per day or fifty transactions per day, the cost is identical. As the organisation grows, this leverage compounds. The marginal cost of growth decreases rather than remaining constant.
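The structural difference between the two cost curves is easy to see in a toy comparison. The per-person throughput, salary, and subscription figures below are illustrative assumptions, not benchmarks.

```python
# Toy comparison of linear versus fixed cost structures as volume grows.
# All figures are illustrative assumptions.

def human_cost(txns_per_day, txns_per_person_per_day=12, annual_cost_per_person=40_000):
    """Linear: headcount (rounded up) scales with volume."""
    people_needed = -(-txns_per_day // txns_per_person_per_day)  # ceiling division
    return people_needed * annual_cost_per_person

def system_cost(txns_per_day, annual_subscription=24_000):
    """Fixed: the same subscription covers any volume in this range."""
    return annual_subscription

for volume in (10, 25, 50, 100):
    print(f"{volume:>3} txns/day  manual: {human_cost(volume):>8,}  system: {system_cost(volume):>8,}")
```

The manual line steps upward with every increase in volume; the system line does not. That gap is the leverage that compounds as the organisation grows.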
The Infrastructure Asset
Beyond operational metrics, AI deployments often create durable assets that would have required separate investment to build.
Consider what a well-implemented system requires to function: structured product data, documented pricing logic, formalised process flows, and clear escalation paths. Organisations that implement AI systems are often forced, for the first time, to consolidate institutional knowledge into an accessible, structured form.
Before the system, product knowledge lived in experienced employees' heads. Pricing logic was applied through judgment calls that were never documented. Process flows existed as habits rather than specifications. This unstructured knowledge was valuable but fragile—accessible only when specific people were available, lost entirely when they left.
After implementation, that knowledge exists as infrastructure. Product catalogues are documented. Pricing rules are explicit. Process flows are formalised. This documentation has value regardless of the AI system itself. It reduces key-person dependency. It accelerates onboarding. It enables consistency that was previously impossible.
The investment to create this documentation would have been substantial if undertaken as a standalone project. The AI implementation made it necessary, and the organisation now owns it permanently.
The Visibility Dividend
Most organisations have limited visibility into their own operations before AI deployment. They know roughly how much work gets done, but not the patterns within that work. They know outcomes but not the paths that led to those outcomes.
AI systems that handle operational workflows generate comprehensive data about those workflows as a byproduct of operation. Every transaction is logged. Patterns become visible: which products generate the most enquiries, which time periods see demand spikes, which customer segments require the most handling, and which process steps create the most friction.
This visibility has value beyond the immediate automation. It informs product decisions, staffing decisions, and process improvement priorities. It enables management by evidence rather than intuition. Organisations consistently report that the operational intelligence generated by AI systems (the ability to see what is actually happening) is valuable independent of the automation itself.
The Counterfactual Test
Perhaps the most clarifying question in impact assessment is the simplest: what happens if the system is switched off?
This question cuts through accounting abstractions and reveals operational reality. If the system stopped working tomorrow, what would the organisation lose?
Not "lose a chatbot" or "lose an automation tool", those framings minimise what is actually at stake. What specific capabilities would disappear? What guarantees would no longer hold? What visibility would be lost? What would the team need to do differently, and what would fall through the cracks while they adjusted?
The answers typically reveal that the system has become infrastructure rather than tooling. Switching it off does not mean reverting to a previous process; it means operating without capabilities that the previous process never provided. No fallback delivers instant response, perfect consistency, complete audit trails, and unlimited scalability. The alternative is accepting that those capabilities no longer exist.
This counterfactual clarifies value in a way that comparative metrics cannot. The organisation is not paying for marginal improvement over a previous state. It is paying for capabilities that have no equivalent substitute.
Compounding Value
The final dimension of impact assessment concerns the trajectory rather than the snapshot.
A system that operates for one month has limited learning to draw upon. A system that operates for twelve months has accumulated patterns, precedents, and institutional knowledge that make it meaningfully more capable. A system that operates for three years has compounding advantages that would take any new implementation years to replicate.
This compounding creates asymmetric value over time. Early assessments, conducted in the first three to six months, capture operational improvement but not accumulated intelligence. Later assessments reveal capabilities that did not exist at deployment and could not have been purchased: organisational memory built through operational experience.
The question for impact assessment is not just "what is the current value?" but "what is the trajectory of value?" A system that compounds learning is on a different curve than a system that provides static capability. The gap between them widens with each month of operation.
The Assessment Framework
Bringing these dimensions together, our approach to impact assessment asks six questions:
What capacity has been recovered, and how is it being redeployed? This captures direct labour replacement and opportunity cost recovery.
What errors and risks have been eliminated, and what was their historical cost? This captures quality improvement and risk reduction.
What speed and responsiveness advantages have been gained, and how do they affect competitive position? This captures the market impact that is real but difficult to quantify precisely.
What infrastructure assets have been created as a byproduct of implementation? This captures documentation, formalisation, and knowledge consolidation.
What operational visibility now exists that did not exist before? This captures the intelligence dividend from comprehensive transaction logging.
What would be lost if the system were switched off? This is the counterfactual test that clarifies what the organisation is actually paying for.
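Teams that run this assessment more than once sometimes find it useful to hold the six questions in a reusable template. The sketch below is one possible structuring; the field names are illustrative, not a prescribed schema.

```python
# One possible template for recording answers to the six assessment
# questions. Field names are illustrative, not a prescribed schema.
from dataclasses import dataclass

@dataclass
class ImpactAssessment:
    capacity_recovered: str = ""      # hours recovered and how they are redeployed
    errors_eliminated: str = ""       # error categories removed and their historical cost
    speed_advantage: str = ""         # response-time gains and competitive effect
    infrastructure_created: str = ""  # documentation and knowledge assets produced
    visibility_gained: str = ""       # operational intelligence that now exists
    counterfactual_loss: str = ""     # what disappears if the system is switched off
```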
The answers to these questions rarely produce a single number. They produce a picture, a comprehensive view of how operations have changed and what capabilities now exist. That picture is what stakeholders need to evaluate whether the investment is justified and where further investment might compound returns.
Impact assessment is not accounting but understanding, and understanding requires asking questions that accounting frameworks do not typically pose.
—
Measuring AI impact requires looking beyond cost comparison to the capabilities that now exist and would be lost if the system stopped. The value lies not just in what the system does but in what it makes possible: speed, consistency, visibility, and institutional memory that compounds with operation.
Mitochondria builds ATP — agentic AI for operations. It learns your workflows, earns autonomy in stages, and runs with governance built in. Your data stays yours. Based in Amsterdam and Pune, working with organisations across Europe and India.