Operations: Where Everything Either Lands or Doesn't

November 6, 2025

Delivery 360 — Pillar 5 of 5

Summary: In this final article of our series on the main pillars of the Delivery 360, we look at Operations and the delivery engine itself: how well the engineering system builds and ships, and whether what ships actually reaches users and gets adopted. The Delivery 360 is an independent, evidence-based diagnostic that investigates the full system around technology delivery, not just engineering in isolation. It aims to provide a clear picture of where the issues sit, why they exist, and what to do about them. The Delivery 360 assesses across five key pillars: Identity & Direction; Intelligence & Adaptation; Control & Optimisation; Coordination; and Operations.

It is in Operations that the quality of everything upstream becomes observable. Good strategy, empowered teams, sound governance, and effective coordination either produce fast, reliable, high-quality delivery, or they do not. We typically look at Operations through two lenses: Technology Delivery Operations and Go-to-Market & Launch.


Why Does the Operations Pillar Matter?

Operations is where the whole delivery system traditionally becomes measurable. It is the point at which strategy, governance, coordination, and execution stop being abstractions and start producing observable outcomes. In the Viable System Model, this is the part of the organisation that actually produces output. Everything else in the VSM exists to support, direct, and coordinate this. Identity & Direction sets direction. Intelligence & Adaptation informs what to build. Control & Optimisation governs what work enters the system. Coordination ensures multiple teams can work in parallel. And Operations is where all of that becomes real in products or features and materialises as value, or does not.


Engineering metrics such as deployment frequency, lead time for changes, change fail rate, and recovery time provide a clear quantitative picture of how delivery is performing. More recent DORA guidance has also added deployment rework rate, which sharpens that picture by showing how much delivery capacity is being consumed by fixing work that was supposedly already done. Together, these measures show performance. What they do not show on their own is why.
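
To make these measures concrete, here is a minimal Python sketch of how the five figures might be derived from a team's deployment records. The `Deployment` record, its field names, and the `dora_summary` function are illustrative assumptions, not part of DORA or the Delivery 360. Rework rate is approximated here as the share of deployments flagged as unplanned fixes for production issues.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from the CI/CD and incident tooling.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # the deployment needed remediation
    restored_at: Optional[datetime] = None  # when service was restored, if it failed
    is_rework: bool = False                 # unplanned deployment made to fix a production issue


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Summarise throughput and instability measures for one period."""
    total = len(deployments)
    if total == 0:
        raise ValueError("no deployments in the period")

    lead_times = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    recoveries = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    return {
        "deployments_per_week": total / (period_days / 7),
        "median_lead_time_hours": median(lead_times) / 3600,
        "change_fail_rate": sum(d.failed for d in deployments) / total,
        "median_recovery_hours": median(recoveries) / 3600 if recoveries else None,
        "rework_rate": sum(d.is_rework for d in deployments) / total,
    }
```

A summary like this only reports the numbers. Reading them still depends on the surrounding evidence about practices and upstream conditions.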


A team deploying once a month may be doing so because engineering practices are weak. Or because the pipeline is brittle. Or because demand management is poor enough that most work arrives under-specified and gets reworked. Or because the team structure creates too many dependencies for anything to be released independently. Or because leadership has not created enough safety for frequent release.


This is the diagnostic challenge the whole Delivery 360 is designed to address. Operations is where the symptoms appear. The causes are usually upstream.


What we examine, then, is both the delivery performance itself, via DORA metrics, quality indicators, and operational reliability data, and the practices that produce it. Strong metrics with weak practices often indicate luck, narrow local optimisation, or even metric gaming. Weak metrics with otherwise strong practices often point to an upstream constraint. It is the combination that tells the actual story.


Dimension 1: Technology Delivery Operations

The diagnostic question: Is the engineering organisation's delivery performance across speed, quality, and operational reliability measured, understood, and continuously improved?


This is where the focus on engineering productivity has been in recent years. It is critical, and our work here most directly surfaces engineering delivery performance. We rely heavily on quantitative evidence. DORA remains the clearest common language for this. It distinguishes between throughput and instability, using deployment frequency, change lead time, failed deployment recovery time, change fail rate, and deployment rework rate. It gives a broad read, but the numbers only become useful when read together. The "why" behind the numbers is what the qualitative analysis adds.


What Technology Delivery Operations looks like

At the lowest maturity levels, deployment is infrequent. Lead times are long. Change failure rates are high. Incident recovery is slow. Rework is normalised. What we also see is that engineers do not feel ownership of production systems. Reliability is often treated as somebody else's department. Continuous integration is weak or absent. Quality is controlled by a separate QA gate at the end of the process, which means defects are found late, when they are most expensive to correct.


In more mature organisations, the engineering organisation has built the technical and cultural infrastructure for faster, safer delivery. A CI/CD pipeline runs on every commit. Production deployments are automated. On-call is structured. Post-incident reviews are documented and actions tracked through to closure. Test automation coverage is not just discussed, but managed. Feature flags decouple deployment from release and make progressive rollout possible.
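
As an illustration of how a feature flag can decouple deployment from release, the sketch below shows one common approach to progressive rollout: hashing the flag name and user ID into a stable bucket and comparing it with a rollout percentage. The function names and the `new-checkout` flag are hypothetical, and real flag platforms add targeting rules, kill switches, and audit trails on top of this idea.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


# The code path is deployed for everyone, but only released to a share of users.
# Raising rollout_percent widens the audience without another deployment.
if is_enabled("new-checkout", "user-123", rollout_percent=10):
    pass  # serve the new experience
else:
    pass  # serve the existing experience
```

Because the bucket is derived from a hash and not from a random draw, a user who sees the feature at 10% keeps seeing it as the percentage rises, which keeps the rollout predictable for users and for support.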


At the highest maturity levels, deployment is routine and low drama. Teams can release on demand. Lead time is measured in hours or minutes, not weeks. Recovery from failed deployment events is fast. Rework is visible and actively reduced. The delivery platform itself is treated as a product, with a named owner, an improvement roadmap, and a review cadence. A useful shorthand is the difference between shipping often and shipping cleanly.


What we look for in the evidence

High-performing delivery organisations do not just produce better metrics. They can explain them:

engineering leaders can cite current delivery performance without preparation
teams distinguish between deployment speed and deployment quality
feature flags are standard operating practice, not reserved for exceptional releases
post-incident reviews are both blameless and operationally consequential
pipeline improvement has clear ownership, rather than being maintained reactively
rework from production issues is measured, visible, and treated as capacity loss.


The cultural dimension is also important, and it can matter as much as the technical one. "You build it, you run it" is not a slogan. It is a structural design choice. Teams that own production quality tend to take testing, observability, and release discipline seriously from the start, because they are the ones who pay the operational cost when things fail. Teams that hand off to a separate operations function at deployment are often less invested, because they do not carry that cost.


AI-assisted development, which is rapidly becoming mainstream rather than optional, adds another layer of consideration to this dimension. Developers write more code, faster. While this is genuinely valuable, it also introduces risks that organisations are only beginning to grapple with. Organisations that adopt AI coding tools without maturing their CI/CD, test coverage, and code review practices are accelerating into a higher volume of inadequately validated changes. Speed without quality infrastructure is not a capability improvement; it is a multiplication of risk. In our work we assess whether the organisation's engineering practices have kept pace with the adoption of AI development tooling.


For AI-powered products, standard delivery practices do not always work in the same way. They cannot fully account for non-deterministic outputs, prompt and model lifecycle management, hallucination risk, or output quality monitoring in production. That means organisations shipping AI systems need additional operational capability. This is still evolving, but key elements have emerged:

evaluation infrastructure for model and prompt changes
monitoring for quality drift
explicit rollback and containment mechanisms for unsafe outputs
clearer ownership of production model behaviour.

The traditional delivery engine may be optimised for traditional software but is under-instrumented for AI.
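
One way to picture evaluation infrastructure for model and prompt changes is a release gate that runs a candidate against a fixed set of checked examples and blocks it if quality regresses against the current baseline. The sketch below is a simplified assumption of how such a gate might look. The names `pass_rate` and `release_gate`, the idea of a "golden" case list, and the 2% tolerance are all hypothetical and not a description of any specific tool.

```python
from typing import Callable

# A golden case pairs an input with a check applied to the generated output.
GoldenCase = tuple[str, Callable[[str], bool]]


def pass_rate(generate: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Share of golden cases whose output passes its check."""
    passed = sum(1 for prompt, check in cases if check(generate(prompt)))
    return passed / len(cases)


def release_gate(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: list[GoldenCase],
    max_regression: float = 0.02,  # assumed tolerance, to be set per product
) -> bool:
    """Allow the candidate only if it does not regress beyond the tolerance."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) - max_regression
```

Because outputs are non-deterministic, a real gate would typically run each case several times and track the same pass rate in production to detect quality drift. This sketch shows only the comparison step.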


Dimension 2: Go-to-Market & Launch

The diagnostic question: Does shipped product reliably reach users and get adopted? And is launch treated as a coordinated system, not just an engineering event?


Shipping is not the outcome. Adoption is. Engineering often measures success at the point of deployment. But code in production is not business value. Business value appears when a user discovers the capability, understands it, adopts it, and integrates it into their work. The connection between deployment and adoption is the launch system. In many organisations, that system is weak, inconsistent, or even missing entirely.


The default launch pattern in low-maturity organisations tends to be build and announce. Engineering ships. Marketing sends an email or updates release notes. Sales hears about the feature late. Customer success is unprepared for customer questions. And so support volumes rise after launch because enablement and onboarding were deferred. This is not usually a feature failure. It is a launch system failure.


At higher maturity levels, launch is a coordinated cross-functional process rather than a last-minute comms exercise. And GTM planning often begins during development, not after it. Launch readiness criteria are specific. Sales is briefed. Customer Success & Support is trained. Documentation is published. Analytics is instrumented. Onboarding is prepared. Launch tiers are applied.


At the highest maturity levels, adoption is jointly owned across product, GTM, and CS. Post-launch performance is typically tracked at 30, 60, and 90 days. Time-to-value is defined and measured. Adoption criteria are explicit rather than vague. Launch retrospectives generate learnings that feed back into the next cycle.
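
As a rough illustration of what tracking adoption at 30, 60, and 90 days and measuring time-to-value can involve, the following Python sketch computes both from simple per-user dates. The function names, the input shapes, and the definition of a "value event" are assumptions for the example. Each organisation needs to define its own adoption criteria in behavioural terms.

```python
from datetime import date
from statistics import median
from typing import Optional


def adoption_rate(
    launch: date,
    first_use: dict[str, date],  # user id -> date of first qualifying use
    eligible_users: int,         # users who could have adopted the feature
    window_days: int,
) -> float:
    """Share of eligible users who adopted within the window after launch."""
    adopted = sum(
        1 for used_on in first_use.values()
        if 0 <= (used_on - launch).days <= window_days
    )
    return adopted / eligible_users


def median_time_to_value_days(
    enabled_on: dict[str, date],      # user id -> date the feature became available
    first_value_on: dict[str, date],  # user id -> date of first value event
) -> Optional[float]:
    """Median days from availability to first value, for users who got there."""
    days = [
        (first_value_on[user] - enabled_on[user]).days
        for user in enabled_on
        if user in first_value_on
    ]
    return median(days) if days else None
```

Calling `adoption_rate` with windows of 30, 60, and 90 gives the three checkpoints. Users who never reach a value event are excluded from the median, so the figure should be read alongside the adoption rate and not on its own.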


What we look for in the evidence

The most reliable signal here is not the launch checklist. Rather, it is the consistency of the story we hear across functions:

sales and CS describe the same launch process when interviewed separately
product defines adoption in behavioural terms, not just in terms of release completion
feature adoption is tracked
time-to-value is measured for meaningful launches
customer success and commercial readiness are treated as hard launch gates
support spikes after launch are visible and trending down as the system matures.


A strong launch system is one in which commercial teams describe being prepared, not surprised. This is also why we interview commercial and Customer Support functions without the Product Manager present. The gap between what the PM believes they have communicated and what the other functions say they received is one of the most reliable indicators of launch maturity. In weak systems, the gap is large. In strong ones, the process is clear for everyone.


What the Operations Pillar Tells Us

In many ways, the Operations pillar is where the full diagnostic comes together. Engineering metrics provide the quantitative baseline. They show, clearly and comparably, how the delivery engine is performing. The surrounding evidence from practices, behaviours, and cross-functional interviews explains why it is performing that way.


When delivery metrics are weak and upstream dimensions are also weak due to unclear strategy, poor demand management, overloaded teams, or brittle coordination, the diagnosis is systemic. The delivery engine is producing what the system conditions allow. Improvement requires attention upstream, not just pressure on the engineering team.


When delivery and operational metrics are weak but upstream conditions are reasonably sound, the issue is most likely to sit inside engineering. It could be pipeline maturity, release discipline, test automation, incident response, or platform ownership. The intervention can be narrower and more targeted.


When engineering throughput is strong but customer adoption is weak, the failure is not in shipping. It is in landing. Fast deployment without a functioning GTM system is simply faster delivery of things that cannot create value. The investment in engineering performance is only recovered when shipped work reaches users, gets adopted, and changes outcomes.
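
The three patterns above can be summarised as a small decision sketch. It is a deliberate simplification: the boolean inputs stand in for judgments that the diagnostic reaches from metrics, practices, and interviews, and the `Reading` names, the function, and the fourth "healthy" outcome are assumptions added for completeness.

```python
from enum import Enum


class Reading(Enum):
    SYSTEMIC = "attend to upstream conditions, not just engineering"
    ENGINEERING = "target pipeline, release, testing, incident, or platform practices"
    LANDING = "fix the launch and go-to-market system"
    HEALTHY = "no dominant operations constraint identified"  # assumed fourth case


def read_operations(delivery_strong: bool, upstream_sound: bool, adoption_strong: bool) -> Reading:
    """Map the three coarse signals to the most likely place to intervene."""
    if not delivery_strong:
        # Weak delivery: the cause sits inside engineering only if upstream is sound.
        return Reading.ENGINEERING if upstream_sound else Reading.SYSTEMIC
    if not adoption_strong:
        # Strong throughput that does not land with users.
        return Reading.LANDING
    return Reading.HEALTHY
```

In practice the signals are rarely binary, and more than one pattern can apply at once. The sketch only captures the order in which the questions are asked.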


The 5 Pillars

This concludes our loop through the five pillars of the Delivery 360. The five pillars form a connected system. Weakness in any one degrades the others, and the cascade is usually predictable.


Identity sets direction. Weakness here produces misaligned demand: teams receive work without strategic context, and demand management is hard because there is nothing stable to filter with.


Intelligence informs decisions. Weakness here potentially produces misdirected effort. The wrong things are being built.


Control governs investment. Weakness here produces waste because the investment profile is wrong, commercial feedback does not reach product, and pricing captures less value than the product creates.


Coordination enables parallel work. Weakness here produces friction across the organisation. Rather than supporting each other, teams block each other.


Operations produces and delivers the output. Weakness here produces slow, unreliable delivery, so whatever value was identified upstream fails to reach users.


The Delivery 360 diagnostic examines all five. Each pillar score is meaningful on its own, but the pattern across all five is where the highest-value findings sit. The diagnostic is designed to read all five pillars together and identify the connections between them. Findings are rarely isolated to one pillar. They are usually a story about how weakness in one part of the system propagates into the others, and where the highest-leverage intervention sits.


That is why the assessment exists. Not to audit engineering. Not to produce a maturity score. But to give leadership an integrated, evidence-based view of the delivery system so they can understand what it is producing, why it is producing it, and what needs to change.

