The Agent Readiness Ladder: A Framework for Where You Actually Are

Why existing AI maturity models fail practitioners, and a self-assessment scorecard for Singapore’s banks

Jun 04, 2026

A Practitioner’s Field Report, Part 2 of 4

In Part 1, I mapped where Singapore’s financial services sector actually stands on agentic AI. The short version: most banks are further along than headlines suggest, but almost nobody is where they think they are.

Today I want to give you a framework. Not another maturity model that tells you to “aspire to Level 5.” A self-assessment tool that answers a specific question: where are you starting from, and what does the next step actually require?

I call it the Agent Readiness Ladder. It came out of this research, because when I looked at how organizations were actually making decisions about agent deployment, the existing frameworks were not useful. They describe destinations without mapping the route.

Why Existing Maturity Models Fail Practitioners

There is no shortage of AI maturity models. Gartner has one. Google has one. Microsoft has one. They all follow the same basic pattern: classify organizations on a spectrum from “basic” to “advanced,” usually across 4–6 levels, with descriptions of what each level looks like.

They are useful for boardroom presentations. They are not useful for the people actually building these systems. Why? Three reasons.

First, they describe what each level looks like without specifying what it takes to get there. A bank can be told it is at “Level 2” and should aspire to “Level 4.” But the framework says nothing about the evaluation infrastructure, governance mechanisms, or organizational trust required for that transition.

Second, they treat progression as linear and universally desirable. In regulated financial services, more autonomy than a use case can support is worse than too little. Moving up the ladder on the wrong use case does not just waste money. It makes the system worse.

Third, they ignore the transition costs. The jump from Level 2 to Level 3 is qualitatively different from Level 3 to Level 4. The Level 3–4 transition is architectural. It requires fundamentally different infrastructure, evaluation approaches, and governance models.

The Five Rungs

For each rung, I define three things: what it looks like, the jump conditions to move up, and the common failure mode at that level.

Rung 1: The Chatbot

What it looks like: A single AI model handles a single task category. FAQ bots, basic customer service routing, simple document summarization. When the model cannot handle something, it routes to a human. This is table stakes by 2026.

Jump conditions to Rung 2: Domain-specific knowledge bases, a structured evaluation framework, basic compliance review, and broad organizational adoption where humans still decide.

Common failure mode: Over-scoping the chatbot. Organizations try to make a single model handle increasingly complex queries instead of building specialized capabilities. Jack of all trades, master of none.

Rung 2: The Copilot

What it looks like: AI augments human decision-making across multiple dimensions. The system suggests, prioritizes, or pre-processes; the human decides and acts. Think personalized product recommendations that a banker reviews before presenting, document drafting with human review, risk scoring that an analyst validates.

Most Singapore financial institutions sit here today. OCBC driving 40% of sales through AI-powered personalization? Rung 2. DBS-GPT used by two-thirds of staff as a daily productivity tool? Rung 2. Standard Chartered’s SC GPT deployed to 80,000 employees for document processing and review? Also Rung 2.

Jump conditions to Rung 3: Domain-scoped architecture with auto-escalation, calibrated evaluation per domain, tested compliance guardrails, graceful degradation, and organizational trust that AI can act autonomously within scope. The critical shift: the business has to be willing to let AI act, not just suggest.

Common failure mode: The Copilot Trap. The AI is good enough to assist but the organization never grants it autonomy. Every output still requires human approval, creating a bottleneck that negates much of the efficiency gain.

Worse: over time, humans who rubber-stamp AI outputs stop building the judgment that makes their approval meaningful. The approval becomes theater. This is more often a governance failure than a technology failure.

Rung 3: The Specialist

What it looks like: An autonomous AI system operates within a clearly defined domain with enforced guardrails. It makes decisions independently for routine cases and escalates edge cases to humans. Autonomous fraud detection and blocking, automated compliance screening, algorithmic trading within defined parameters.
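To make "autonomous within scope, escalate at the edges" concrete, here is a minimal sketch of a routing rule for a specialist agent. Everything in it is an assumption for illustration: the `Case` fields, the domain name, and the threshold values are hypothetical, not taken from any bank's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    domain: str        # which process the case belongs to
    confidence: float  # calibrated model confidence in its decision, 0..1
    amount: float      # monetary exposure of the case


# Hypothetical guardrail values; a real deployment would set these per domain.
ALLOWED_DOMAIN = "fraud_screening"
MIN_CONFIDENCE = 0.95
MAX_AMOUNT = 10_000.0


def route(case: Case) -> str:
    """Return 'act' for routine in-scope cases, otherwise an escalation reason."""
    if case.domain != ALLOWED_DOMAIN:
        return "escalate: out of scope"
    if case.amount > MAX_AMOUNT:
        return "escalate: above exposure limit"
    if case.confidence < MIN_CONFIDENCE:
        return "escalate: low confidence"
    return "act"


print(route(Case("fraud_screening", 0.99, 250.0)))
print(route(Case("fraud_screening", 0.80, 250.0)))
print(route(Case("wealth_advice", 0.99, 250.0)))
```

The point of the sketch is the ordering: scope and exposure checks are enforced before the model's confidence is even considered, so a confident model cannot talk its way out of its mandate.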

The leaders operate here. OCBC’s Source of Wealth Assistant (SOWA), deployed at Bank of Singapore, reduced Source of Wealth report writing from 10 days to 1 hour. It operates at Rung 3. Their publicly reported agentic AI models in production are all Rung 3 deployments: autonomous within a defined domain, with guardrails that enforce scope.

Jump conditions to Rung 4: This is the architectural transition: orchestration layer with shared state, end-to-end evaluation across agent boundaries, cross-agent accountability framework, and executive sponsorship for multi-process AI autonomy. All four dimensions shift qualitatively, not just quantitatively.

Common failure mode: Overconfidence. Success at Rung 3 breeds a dangerous assumption: if one specialist agent works well, deploying many and connecting them will work equally well. It does not. The coordination overhead is non-linear.

Rung 4: The Team (The Hard Transition)

What it looks like: Multiple specialized AI agents collaborate, each with a defined mandate, coordinated by an orchestration layer. The system handles multi-step, cross-domain processes end-to-end.

Think about what full client onboarding would look like with a team of agents: a KYC agent verifies identity documents, a compliance agent screens against sanctions lists, a product recommendation agent matches client needs to offerings, a documentation agent generates the paperwork. Each agent is a specialist. The orchestration layer coordinates the handoffs.
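A rough sketch of what "the orchestration layer coordinates the handoffs" could mean in code, assuming a simple sequential pipeline over a shared state dictionary. The agent functions are placeholders that only set flags, and all names (`orchestrate`, `PIPELINE`, the state keys) are hypothetical.

```python
from typing import Callable

Agent = Callable[[dict], dict]


def kyc_agent(state: dict) -> dict:
    # Placeholder: a real agent would verify identity documents here.
    return {**state, "identity_verified": True}


def compliance_agent(state: dict) -> dict:
    # Refuses to run unless the upstream handoff actually happened.
    if not state.get("identity_verified"):
        raise RuntimeError("compliance screening requires verified identity")
    return {**state, "sanctions_clear": True}


def recommendation_agent(state: dict) -> dict:
    return {**state, "products": ["placeholder_product"]}


def documentation_agent(state: dict) -> dict:
    if not state.get("sanctions_clear"):
        raise RuntimeError("documents require a clear sanctions screen")
    return {**state, "documents_ready": True}


PIPELINE: list[tuple[str, Agent]] = [
    ("kyc", kyc_agent),
    ("compliance", compliance_agent),
    ("recommendation", recommendation_agent),
    ("documentation", documentation_agent),
]


def orchestrate(pipeline: list[tuple[str, Agent]], initial_state: dict) -> dict:
    """Run agents in order, passing shared state and recording an audit trail."""
    state = {**initial_state, "audit_trail": []}
    for name, agent in pipeline:
        state = agent(state)
        state = {**state, "audit_trail": state["audit_trail"] + [name]}
    return state


result = orchestrate(PIPELINE, {"client_id": "C-001"})
print(result["audit_trail"])
```

Even in this toy version, two things show up that ad hoc integrations tend to skip: each agent checks the precondition it depends on, and the orchestrator (not the agents) owns the record of who acted in what order.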

Few organizations globally have achieved Rung 4 in production in regulated financial services. This is the frontier.

Why is the transition so hard? Because every dimension of readiness changes qualitatively, not just quantitatively. With N agents you get on the order of N-squared pairwise interactions (N(N-1)/2 distinct pairs), plus emergent behaviors that no single agent was designed to produce. And giving a team of agents autonomy across an entire process requires institutional trust that a single-agent deployment never tests.
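The growth in interactions is easy to check. A small sketch, counting only distinct agent pairs (an assumption: real systems may also have three-way and indirect interactions, which this ignores):

```python
def pairwise_channels(n_agents: int) -> int:
    """Number of distinct agent pairs: n * (n - 1) / 2, which grows like n squared."""
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    return n_agents * (n_agents - 1) // 2


for n in (1, 2, 4, 8):
    print(n, pairwise_channels(n))
```

Doubling the team from 4 to 8 agents takes the pair count from 6 to 28, which is why evaluation that was manageable for a handful of specialists does not scale by simply adding more of them.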

Common failure mode: Stitching specialists together. Organizations take successful Rung 3 agents and connect them with ad hoc integrations, skipping the orchestration layer, shared state management, and cross-agent evaluation. The result works in demos and fails in production.

Why Multi-Agent Is Not Always Better

Figure 2: Multi-Agent Coordination — Not Universally Better

A December 2025 study by researchers across Google Research, DeepMind, and MIT (“Towards a Science of Scaling Agent Systems,” arXiv:2512.08296) tested 180 different agent configurations using the Finance-Agent benchmark, which covers entry-level financial analyst tasks. The finding: centralized multi-agent coordination improved performance by 80.9% on parallelizable tasks but degraded performance by 39–70% on sequential tasks (varying by task complexity and coordination overhead).

Multi-agent is dramatically better for the right tasks and dramatically worse for the wrong ones. Most financial services workflows (compliance screening, client onboarding, document processing) are primarily sequential, with limited parallelizable stages.

I am not arguing against multi-agent systems. I am arguing for precision. Knowing which tasks benefit from coordination and which are better served by a single specialist is the core skill of the Rung 3 to Rung 4 transition.
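One crude way to estimate how sequential a workflow is: write down which steps depend on which, then group the steps into levels where everything in a level could run at the same time. The sketch below does that. The step names and dependencies are hypothetical examples, and the level count is only a rough proxy, not the method used in the study.

```python
def stage_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into levels; steps in one level do not depend on each other."""
    remaining = {step: set(d) for step, d in deps.items()}
    done: set[str] = set()
    levels: list[list[str]] = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in workflow")
        levels.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return levels


# Hypothetical onboarding chain: each step needs the one before it.
onboarding = {
    "kyc": set(),
    "sanctions": {"kyc"},
    "recommend": {"sanctions"},
    "paperwork": {"recommend"},
}

# Hypothetical research task: three independent analyses feed one summary.
research = {
    "filings": set(),
    "news": set(),
    "prices": set(),
    "summary": {"filings", "news", "prices"},
}

print(stage_levels(onboarding))
print(stage_levels(research))
```

Four steps in four levels means nothing can run in parallel; four steps in two levels means most of the work can. The first shape is a candidate for a single specialist, the second for coordination.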

Rung 5: The Department

What it looks like: AI agents own end-to-end processes with minimal human oversight. Humans set strategy, constraints, and objectives; agents execute across the full operational scope.

Status in regulated financial services: Largely theoretical. The most instructive example is Klarna, which in February 2024 claimed AI handled two-thirds of customer service interactions autonomously. By mid-2025, CEO Sebastian Siemiatkowski publicly acknowledged the AI-first strategy “went too far.” Service quality dropped. Klarna began rehiring human agents. In regulated financial advice, the bar is higher still.

Progress is real: MAS’s MindForge Phase 2 (concluded March 2026) delivered concrete outputs including an AI Risk Management Toolkit and Operationalisation Handbook. IMDA separately released a Model AI Governance Framework for Agentic AI in January 2026, addressing multi-agent accountability specifically.

Jump conditions to Rung 5: These do not yet exist in regulated finance. That is part of the point. The governance and regulatory prerequisites for systemic AI autonomy have not been defined by any jurisdiction.

The Self-Assessment Scorecard

Now score yourself. For each of the four dimensions (Architecture, Evaluation, Governance, and Org. Trust), find the highest rung where your organization meets the description. Be specific: pick one AI-powered process and score that process, not your organization in the abstract.

Scoring rule: Your rung = your LOWEST score across all four dimensions. If your architecture is Rung 4 but your governance is Rung 2, you are at Rung 2. The weakest dimension is your binding constraint, and it tells you exactly what to fix next.
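The scoring rule is simple enough to write down as a function. A minimal sketch, assuming each dimension is scored as an integer from 1 to 5; the dictionary keys and the `assess` name are illustrative choices, not part of any published tool.

```python
DIMENSIONS = ("architecture", "evaluation", "governance", "org_trust")


def assess(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Return (overall rung, binding constraints) using the lowest-score rule."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 1 <= scores[d] <= 5:
            raise ValueError(f"{d} must be between 1 and 5")
    rung = min(scores[d] for d in DIMENSIONS)
    # More than one dimension can tie for lowest; all of them are bottlenecks.
    binding = [d for d in DIMENSIONS if scores[d] == rung]
    return rung, binding


# The example from the text: Rung 4 architecture, Rung 2 governance.
rung, binding = assess(
    {"architecture": 4, "evaluation": 3, "governance": 2, "org_trust": 3}
)
print(rung, binding)  # 2 ['governance']
```

Returning the binding dimensions alongside the rung is the useful part: the number alone tells you where you are, the list tells you what to fix next.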

Two things this reveals that a single-score maturity model hides. First, most organizations are lopsided. Strong architecture, weak governance. Or high trust from leadership, but no evaluation infrastructure to justify it. That asymmetry is the diagnosis: it tells you where to invest next. Second, the “lowest score” rule prevents vanity scoring.

The Compounding Failure Problem

Figure 4: Compounding Failure in Multi-Step Workflows

If an agent achieves 85% accuracy per action (which is solid), a 10-step workflow succeeds only 20% of the time. (0.85¹⁰ = 0.197.)

This is pedagogical, not predictive. Real systems have correlated failure modes (which make it worse) and retry logic, checkpoints, and error recovery mechanisms (which make it better). But the principle holds: multi-step processes compound errors, and even high per-step accuracy can produce unacceptable end-to-end failure rates.
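The arithmetic, and the effect of retries, can be sketched in a few lines. Two loud assumptions: step failures are independent, and a failed step is always detected so it can be retried. Neither holds cleanly in real systems, so treat the output as an illustration rather than a forecast.

```python
def workflow_success(p_step: float, n_steps: int, attempts: int = 1) -> float:
    """End-to-end success probability for n independent steps.

    Each step succeeds with probability p_step and may be tried up to
    `attempts` times (assumes failures are detected and independent).
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be between 0 and 1")
    if n_steps < 0 or attempts < 1:
        raise ValueError("n_steps must be >= 0 and attempts >= 1")
    p_with_retry = 1 - (1 - p_step) ** attempts
    return p_with_retry ** n_steps


print(round(workflow_success(0.85, 10), 3))              # 0.197
print(round(workflow_success(0.85, 10, attempts=2), 3))  # one retry per step
```

Under these assumptions a single retry per step lifts the 10-step success rate from about 20% to roughly 80%, which is why checkpoints and error detection matter as much as per-step accuracy.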

Gartner predicted in June 2025 that over 40% of agentic AI projects would be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. Separately, a 2025 MIT report found that 95% of generative AI pilots fail to deliver measurable P&L impact, though this figure has been debated.

These numbers are not arguments against building agent systems. They are arguments for building evaluation infrastructure before you build agents. They are arguments for mastering Rung 3 before attempting Rung 4.

The Deployment Matrix: Where to Play

The upper-right quadrant (high autonomy in high-regulation environments) is where the most significant value lives. It is also where Singapore’s proactive regulatory approach — MAS MindForge, IMDA’s agentic AI governance framework — positions the market well.

Where Singapore’s Banks Sit Today

Based on publicly disclosed information through Q1 2026: annual reports, press releases, MAS filings, patent databases, and vendor announcements. Banks may have undisclosed deployments at higher rungs.

DBS — Rung 2, entering 3

The scale leader. 430+ AI use cases, 2,000+ models, S$1B confirmed economic value in 2025, the largest AI deployment in ASEAN banking. Two-thirds of employees use DBS-GPT daily. CEO Tan Su Shan announced the transition from “AI as copilot to AI operating on autopilot” in 2025.

The agentic signal: In February 2026, DBS became the first bank in Asia-Pacific to pilot Visa Intelligent Commerce: AI agents that can initiate and complete purchases on behalf of customers within pre-approved parameters using DBS/POSB cards. They also deployed a SWIFT message classification agent and CodeBuddy. Both still operate within human-approved guardrails.

Why not higher: Every agentic system maintains human checkpoints at critical decision points. The Visa pilot is their most autonomous deployment and it is still a pilot.

OCBC — Rung 2–3, leading on agentic

The agentic leader. OCBC has the most confirmed agentic AI systems in production among Singapore banks. Their Source of Wealth Assistant (SOWA) at Bank of Singapore reduced report writing from 10 days to 1 hour, with AI agents that autonomously extract information from client documents, validate plausibility against benchmarks, and generate standardized reports.

Beyond SOWA: 360 Bonus Appeal agent (automating end-to-end processing across 6 categories), Compliance Copilot (reducing onboarding compliance from days to minutes), and a multi-agent private banking onboarding system targeting 40-document processes in under 1 hour.

The ecosystem: Their GenAI Wealth Advisor Simulator (April 2026) is a training tool, not an agent — but it delivered 2x weekly client appointments and 50% revenue uplift for 900 wealth advisors. A.I. Oscar (Rung 2 recommendation engine) drove a 95% jump in trading accounts. Other named tools include OCBC GPT, Buddy, Wingman, and HOLMES AI.

UOB — Rung 2, entering 3

The surprise mover. In March 2026, UOB participated in the first live, authenticated agentic transaction in Singapore alongside DBS and Mastercard. An AI agent autonomously booked a ride to Changi Airport within the pre-authorized guardrails of Mastercard’s Agent Pay framework. UOB is also one of 13 issuers in Visa’s Agentic Ready programme.

The foundation: A 3-year Accenture MOU (April 2025) explicitly targets agentic AI via the AI Refinery platform. Project Magnet: 148% increase in genuine alert detection. However, these are detection systems, not autonomous agents.

Standard Chartered — Rung 2, building for 3

Copilot at scale. SC GPT deployed to ~80,000 employees across 54 markets (launched March 2025), delivering a 6% average productivity uplift. 300+ AI use cases in production. Mortgage memorandum generation reduced from 2 days to under 10 minutes.

The infrastructure play: Their AI Factory platform (July 2025) explicitly provides reusable components for agentic AI. They are refreshing their Responsible AI Standard for agentic challenges. But no agents in production have been announced.

HSBC — Rung 2, researching 4

The R&D play. HSBC filed a patent application for “Systems and Methods for Hybrid Multi Machine Learning Agent Orchestration” (December 2024). First Chief AI Officer (David Rice, April 2026). $1.8 billion toward AI/tech. 600+ AI use cases deployed globally.

The agentic signal: At Sibos 2025, HSBC partnered with Microsoft, ANZ, and Lloyds on a proof-of-concept where AI agents transform trade workflows. Mistral AI partnership (December 2025) for self-hosted multilingual reasoning. Both are PoC-stage, not production.

The race is tighter than it appears. All three local banks (DBS, OCBC, UOB) now have live agentic AI signals, whether in compliance (OCBC), payments (DBS and UOB), or operations. But every deployment maintains human checkpoints. The transition from individual agents to orchestrated teams remains the frontier.

What You Can Do Today

Pick three AI-powered processes in your organization. Score each one separately against the Scorecard:

Score it on all four dimensions. Architecture, Evaluation, Governance, Org. Trust. Be honest: if a human approves every output, your Governance score is Rung 2. Your rung = the lowest of the four scores.
Identify your binding constraint. Whichever dimension scored lowest is your bottleneck. From what I have seen, the answer is almost never “we need a better model.”
Check the Deployment Matrix before climbing. Not every process needs to climb. Sequential workflows often perform worse with multi-agent coordination.
Share the scorecard with your team. Instead of “we should be doing more with AI,” you get “our Evaluation is at Rung 3 but our Org. Trust is at Rung 2, so Org. Trust is the bottleneck.”

Write it down. The scoring takes two minutes. The direction it gives you would cost two months of consulting to discover.

This is Part 2 of a four-part series on AI Agent Adoption in Singapore Financial Services.

Part 3 will cover the Builder’s Playbook: practical guidance on building and operating multi-agent AI in regulated financial services.

Disclaimer: This article reflects my personal views and analysis only. It does not represent the views, positions, or research of any current or former employer. All data cited is from publicly available sources.

NextGenProdMan’s Substack
