Beyond the single rail: How we built Paymend's recovery platform

May 25, 2026

Written by: Carlos Baldacin

Failed payments are not a billing problem. They are a revenue problem, and in subscription and recurring payment businesses they are often a silent one. A card decline that goes unaddressed does not just lose a transaction; it starts a churn event. Whether you are processing thousands or tens of millions of transactions, across one payment method or many, a single-rail payment setup creates the same fundamental problem: you are one failure away from losing revenue you could have recovered.

At Paymend, payment recovery is not a feature we bolted on; it is the core of what we do. This post walks through how we architected our payment recovery platform from the ground up: the technical decisions we made, the infrastructure we chose, and the engineering principles that guide our system. 

Why single-rail payment systems break at scale

Most companies start with a single payment gateway. It is the obvious move: pick a well-known provider, integrate their SDK, and start processing. For a while, it works. Then growth exposes the cracks.

A single-rail setup has a fundamental problem: it cannot distinguish between a recoverable failure and a terminal one. When a card is declined, the gateway returns a decline code, but not all declines are equal. A soft decline (insufficient funds, temporary hold) is very different from a hard decline (card cancelled, fraud flag). Treating them the same way, retrying blindly or giving up entirely, means leaving money on the table or, worse, triggering additional failures that increase the chance of the issuing bank flagging the merchant.

Single-rail systems also create a single point of failure. If your gateway experiences downtime, your entire payment flow stops. For global merchants, gateway performance also varies significantly by geography and card network: a gateway that performs well for US Visa cards may underperform for European Mastercard transactions or LATAM-issued cards.

What is a multi-rail payment stack?

A multi-rail payment stack is an architecture where transactions can be routed across multiple payment processors, acquirers, or gateways depending on predefined logic. The "rails" are the payment pathways, each with different acquiring banks, card network agreements, and processing capabilities.

At Paymend, we built our multi-rail stack with three core components:

  • A gateway abstraction layer that normalises requests and responses across different processors,
  • An intelligent routing engine that decides which rail to use for each transaction,
  • A retry and recovery orchestrator that manages cascading retry logic across rails.
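
As a rough illustration of the first component, a gateway abstraction layer can be sketched as one interface that every processor adapter implements. All names here (`PaymentGateway`, `ChargeRequest`, `ChargeResult`, `FakeGateway`) are hypothetical and are not taken from Paymend's codebase; the only point is that callers see one request and response shape regardless of the rail.

```java
import java.util.Set;

/** Hypothetical sketch of a gateway abstraction layer; names are illustrative only. */
final class GatewayAbstraction {

    /** Processor-agnostic charge request; amounts are in minor units (for example, cents). */
    record ChargeRequest(String paymentToken, long amountMinor, String currency) {}

    /** Processor-agnostic result; rawDeclineCode is the processor's own code, or null on approval. */
    record ChargeResult(String rail, boolean approved, String rawDeclineCode) {}

    /** Each rail is wrapped in an adapter that implements this one interface. */
    interface PaymentGateway {
        String name();
        ChargeResult charge(ChargeRequest request);
    }

    /** In-memory stand-in for a real processor adapter, for illustration only. */
    static final class FakeGateway implements PaymentGateway {
        private final String name;
        private final Set<String> decliningTokens;

        FakeGateway(String name, Set<String> decliningTokens) {
            this.name = name;
            this.decliningTokens = Set.copyOf(decliningTokens);
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public ChargeResult charge(ChargeRequest request) {
            boolean declined = decliningTokens.contains(request.paymentToken());
            // "51" is a placeholder raw code; real processors each use their own code systems.
            return new ChargeResult(name, !declined, declined ? "51" : null);
        }
    }
}
```

The routing engine and retry orchestrator would then depend only on `PaymentGateway`, so adding a rail means writing one more adapter rather than touching the callers.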

Our tech stack and architecture decisions

When we started building Paymend, we made a deliberate choice: own the stack, do not depend on a single third-party provider for mission-critical flows. That philosophy extends to how we chose our technology.

Backend: Java and Spring Boot

Our core services are built in Java with Spring Boot. Java is the language powering the core systems of most major financial institutions, and for good reason: transaction integrity, concurrency at scale, and a mature ecosystem battle-tested in production for decades. For a payments company, that is not a conservative choice; it is the right one.


According to the 2024 JetBrains Developer Ecosystem Survey, over 65% of enterprise backend developers still use Java as their primary language, which means we are also hiring from one of the deepest engineering talent pools in the industry. The last thing a payments company should do is build mission-critical infrastructure on an experimental or narrowly adopted stack, where hiring is harder, community support thinner, and long-term maintainability an open question.

Spring Boot reinforces that decision: dependency injection, transaction management, and a rich integration ecosystem come out of the box, letting the team move fast without sacrificing structure or correctness.

API design: REST with an event-driven core

Our external APIs are RESTful: predictable, well-documented, easy for merchants to integrate. Internally, the system is event-driven. When a payment attempt is made, it emits an event. When a decline occurs, another event fires. The retry orchestrator listens for these events and schedules recovery attempts based on the decline category, the merchant's configuration, and our own routing logic.

This decoupling is intentional. It means the routing engine, the retry logic, and the reporting layer can all evolve independently without touching each other. It also makes the system inherently observable: every state transition is an event that can be logged, replayed, and analyzed.
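
The event flow described above can be sketched with a tiny in-process bus. This is an assumption-heavy simplification: `EventBus`, `PaymentEvent`, and this `RetryOrchestrator` are hypothetical names, the bus is synchronous and in-memory, and a production system would more likely sit on a durable message broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of an event-driven core; not Paymend's actual implementation. */
final class PaymentEvents {

    /** The two event types named in the text: an attempt was made, or a decline occurred. */
    enum Type { PAYMENT_ATTEMPTED, PAYMENT_DECLINED }

    record PaymentEvent(Type type, String paymentId, String detail) {}

    /** Minimal synchronous bus that records every event it publishes. */
    static final class EventBus {
        private final List<Consumer<PaymentEvent>> subscribers = new ArrayList<>();
        private final List<PaymentEvent> log = new ArrayList<>();

        void subscribe(Consumer<PaymentEvent> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(PaymentEvent event) {
            log.add(event); // every state transition is recorded before delivery
            subscribers.forEach(s -> s.accept(event));
        }

        /** Replays the recorded history, for example to a newly added reporting component. */
        void replayTo(Consumer<PaymentEvent> subscriber) {
            List.copyOf(log).forEach(subscriber);
        }
    }

    /** Stand-in for the retry orchestrator: it reacts to declines and ignores everything else. */
    static final class RetryOrchestrator implements Consumer<PaymentEvent> {
        final List<String> scheduled = new ArrayList<>();

        @Override
        public void accept(PaymentEvent event) {
            if (event.type() == Type.PAYMENT_DECLINED) {
                scheduled.add(event.paymentId());
            }
        }
    }
}
```

Because the orchestrator only knows about events, a reporting consumer can be subscribed or replayed against the same log without any change to the publisher.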

Infrastructure: Cloud-native and built for resilience

We run on a cloud-native infrastructure with Kubernetes orchestration. Services are containerized and independently deployable. We use managed databases for transactional data and a message queue layer for async processing. Retry jobs, for example, are queued and processed by workers that scale horizontally based on load, which is critical during high-volume periods when recovery attempts spike.

Principle 1: Intelligent routing logic

Not all transactions should travel the same rail. The direction we are building toward is a routing engine that makes a routing decision before every transaction attempt, informed by a combination of factors:

  • Card type and issuing country: some acquirers have stronger relationships with specific card networks or issuing banks in certain regions
  • Historical approval rates: tracking per-gateway, per-card-BIN performance over time to route toward higher-performing paths
  • Gateway health: real-time latency and error rate monitoring to inform routing decisions dynamically
  • Merchant configuration: merchants setting preferences, cost thresholds, or compliance requirements that the router respects

Getting this right is a journey. Every data point we collect (each approval, each decline, each retry outcome) feeds back into the platform and makes routing smarter over time. The foundation is in place; the intelligence compounds as transaction volume grows.
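
To make the routing factors concrete, here is a minimal sketch of one possible decision rule: filter out rails the merchant has not allowed or whose recent error rate is too high, then pick the rail with the best historical approval rate for the card's BIN. The types, field names, and the rule itself are illustrative assumptions, not a description of Paymend's router.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical routing sketch combining merchant config, gateway health, and approval history. */
final class RoutingSketch {

    /** Snapshot of one rail: approval rate per BIN, a fallback rate, and a recent error rate. */
    record RailStats(String rail, Map<String, Double> approvalRateByBin,
                     double defaultApprovalRate, double recentErrorRate) {
        double approvalRateFor(String bin) {
            return approvalRateByBin.getOrDefault(bin, defaultApprovalRate);
        }
    }

    /** Merchant preferences the router must respect. */
    record MerchantConfig(Set<String> allowedRails, double maxErrorRate) {}

    /** Returns the chosen rail name, or empty if no rail passes the filters. */
    static Optional<String> chooseRail(String bin, List<RailStats> rails, MerchantConfig config) {
        return rails.stream()
                .filter(r -> config.allowedRails().contains(r.rail()))     // merchant configuration
                .filter(r -> r.recentErrorRate() <= config.maxErrorRate()) // gateway health
                .max(Comparator.comparingDouble((RailStats r) -> r.approvalRateFor(bin)))
                .map(RailStats::rail);
    }
}
```

A real router would likely weigh cost, latency, and issuing country as well; the sketch keeps a single score so the filtering and ranking steps stay visible.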

Principle 2: Retry sequencing across rails

Retry logic sounds simple. In practice, it is one of the most complex parts of a payment recovery platform. Retry too aggressively and you trigger bank-side fraud flags or card network violations. Retry too passively and you leave recoverable revenue on the table.

The architecture we are building around is a cascading retry strategy. When a transaction fails on the primary rail, the system evaluates the decline code before deciding the next step. Soft declines, those categorized as temporary, trigger a retry schedule with configurable timing intervals. If the retry on the same rail fails again, the system escalates to an alternate rail. Hard declines, permanent failures, do not trigger retries at all; they surface to the merchant with enriched context so they can take action: updating payment details, contacting the customer, and so on.
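
The cascading decision can be written down as a small pure function. This is a sketch under stated assumptions: the enum names and the `maxAttemptsPerRail` parameter are hypothetical, and the real orchestrator presumably considers more inputs than these four.

```java
/** Hypothetical sketch of the cascading retry decision described in the text. */
final class CascadingRetry {

    enum DeclineKind { SOFT, HARD }

    enum Action { RETRY_SAME_RAIL, ESCALATE_TO_ALTERNATE_RAIL, SURFACE_TO_MERCHANT }

    /**
     * Decides the next step after a failed attempt.
     *
     * @param kind                        whether the decline is temporary (SOFT) or permanent (HARD)
     * @param failedAttemptsOnCurrentRail failures so far on this rail, including the one just observed
     * @param maxAttemptsPerRail          assumed merchant-configurable cap on attempts per rail
     * @param alternateRailAvailable      whether another rail is still untried
     */
    static Action nextAction(DeclineKind kind, int failedAttemptsOnCurrentRail,
                             int maxAttemptsPerRail, boolean alternateRailAvailable) {
        if (kind == DeclineKind.HARD) {
            return Action.SURFACE_TO_MERCHANT; // hard declines are never retried
        }
        if (failedAttemptsOnCurrentRail < maxAttemptsPerRail) {
            return Action.RETRY_SAME_RAIL;
        }
        return alternateRailAvailable
                ? Action.ESCALATE_TO_ALTERNATE_RAIL
                : Action.SURFACE_TO_MERCHANT;
    }
}
```

With `maxAttemptsPerRail` set to 2, the first soft decline yields a retry on the same rail and the second yields an escalation, which matches the sequence in the paragraph above.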

Timing is the most important dimension. Transaction data consistently shows that retry timing, the interval between attempts, significantly affects recovery rates. The goal is dynamic scheduling that adapts based on the decline type, the day of week, and historical patterns for each merchant's customer base. This is where recovery stops being reactive and starts being intelligent.
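
One simple way to express configurable timing is a per-category list of intervals plus an optional set of preferred days. The sketch below assumes exactly that; the class name, the category keys, and the day-of-week shift are hypothetical stand-ins for whatever historical patterns a real scheduler would learn, and the interval values a caller passes in are placeholders rather than recommendations.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of interval-based retry scheduling with a day-of-week preference. */
final class RetryScheduler {
    private final Map<String, List<Duration>> intervalsByCategory;
    private final Set<DayOfWeek> preferredDays;

    RetryScheduler(Map<String, List<Duration>> intervalsByCategory, Set<DayOfWeek> preferredDays) {
        this.intervalsByCategory = Map.copyOf(intervalsByCategory);
        this.preferredDays = Set.copyOf(preferredDays);
    }

    /**
     * Returns when retry number retryIndex (0-based) should run, measured from the latest failure,
     * or empty when the category has no schedule or the schedule is exhausted.
     */
    Optional<LocalDateTime> nextAttempt(String category, int retryIndex, LocalDateTime failedAt) {
        List<Duration> intervals = intervalsByCategory.getOrDefault(category, List.of());
        if (retryIndex < 0 || retryIndex >= intervals.size()) {
            return Optional.empty();
        }
        LocalDateTime candidate = failedAt.plus(intervals.get(retryIndex));
        if (!preferredDays.isEmpty()) {
            // Shift forward one day at a time until the attempt lands on a preferred day.
            while (!preferredDays.contains(candidate.getDayOfWeek())) {
                candidate = candidate.plusDays(1);
            }
        }
        return Optional.of(candidate);
    }
}
```

Keeping the schedule as data rather than code is what would let it vary per decline type and per merchant, and later be replaced by something learned from outcomes.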

Principle 3: Real-time failure analysis

A payment system that cannot explain why transactions fail is a black box, and black boxes do not improve. The principle guiding this part of the platform is clear: every decline should carry context, and that context should drive action.

The foundation is a normalization layer for decline codes across gateways. Every processor uses its own code system; a common taxonomy translates these into a unified set of categories: insufficient funds, do not honor, card expired, lost or stolen, and so on. This normalization is what makes cross-rail retry logic possible; without it, every gateway integration would require bespoke retry rules.
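
A normalization layer of this kind can be as simple as one lookup table per gateway. In the sketch below the gateway names (`gateway_a`, `gateway_b`) and the raw codes in the tables are placeholders chosen for illustration; the category names follow the examples in the paragraph above, and `UNKNOWN` is an added catch-all.

```java
import java.util.Map;

/** Hypothetical sketch of decline-code normalization across gateways. */
final class DeclineNormalizer {

    enum Category { INSUFFICIENT_FUNDS, DO_NOT_HONOR, CARD_EXPIRED, LOST_OR_STOLEN, UNKNOWN }

    private final Map<String, Map<String, Category>> tables;

    DeclineNormalizer(Map<String, Map<String, Category>> tables) {
        this.tables = Map.copyOf(tables);
    }

    /** Builds a normalizer with two made-up gateways that use different code styles. */
    static DeclineNormalizer withPlaceholderTables() {
        return new DeclineNormalizer(Map.of(
                "gateway_a", Map.of(            // numeric-style codes (placeholders)
                        "51", Category.INSUFFICIENT_FUNDS,
                        "05", Category.DO_NOT_HONOR,
                        "54", Category.CARD_EXPIRED,
                        "41", Category.LOST_OR_STOLEN,
                        "43", Category.LOST_OR_STOLEN),
                "gateway_b", Map.of(            // string-style codes (placeholders)
                        "insufficient_funds", Category.INSUFFICIENT_FUNDS,
                        "do_not_honor", Category.DO_NOT_HONOR,
                        "expired_card", Category.CARD_EXPIRED,
                        "lost_card", Category.LOST_OR_STOLEN,
                        "stolen_card", Category.LOST_OR_STOLEN)));
    }

    /** Maps a gateway-specific code to the shared taxonomy; anything unmapped becomes UNKNOWN. */
    Category normalize(String gateway, String rawCode) {
        return tables.getOrDefault(gateway, Map.of()).getOrDefault(rawCode, Category.UNKNOWN);
    }
}
```

Retry rules can then be written once against `Category`, and onboarding a new gateway becomes a matter of adding a table rather than new rules.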

The longer-term ambition is aggregate analytics that surface patterns across the merchant portfolio: a spike in a specific decline code might indicate a card updater issue, a BIN-level problem, or a processor-side degradation. These signals should feed back into routing decisions automatically, closing the loop between failure analysis and recovery strategy. This is the direction we are building toward.

Implementation considerations

Building a multi-rail payment stack is not just an engineering problem. A few critical considerations engineers and technical leaders should factor in:

  • PCI DSS compliance: Handling payment data across multiple rails increases your compliance surface. We address this with an encryption layer that intercepts and anonymizes PCI-relevant data before it reaches our infrastructure, meaning our systems never handle raw card data, which significantly reduces our PCI scope and the associated audit burden.
  • Observability: Distributed payment systems are notoriously hard to debug. Structured logging, distributed tracing, and alerting across all rails are a non-trivial investment, but they are essential for understanding failure modes in production.
  • Payment ledger: Multi-rail processing means money moving across different processors, acquirers, and bank accounts simultaneously. We use a dedicated ledger product that gives us a real-time, double-entry view of every funds movement: automated ledger entries on each transaction event, accurate settlement state across all rails, and a single source of truth between what was initiated and what actually cleared.
  • Card network rules: Visa and Mastercard have specific rules around retry frequency and chargeback thresholds. Violating these can result in fines or, in serious cases, losing the ability to accept those cards. Our retry logic is built with these constraints hardcoded.
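
The payment ledger point is easier to picture with a toy double-entry model. The sketch below is not the dedicated ledger product mentioned above; account names are made up, and balances use a simplified signed convention (debits subtract, credits add) rather than full accounting rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified double-entry ledger keyed by account name. */
final class LedgerSketch {

    /** One balanced movement: the same amount is debited from one account and credited to another. */
    record Entry(String transactionId, String debitAccount, String creditAccount, long amountMinor) {}

    private final List<Entry> entries = new ArrayList<>();
    private final Map<String, Long> balances = new HashMap<>();

    void post(Entry entry) {
        if (entry.amountMinor() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        entries.add(entry);
        balances.merge(entry.debitAccount(), -entry.amountMinor(), Long::sum);
        balances.merge(entry.creditAccount(), entry.amountMinor(), Long::sum);
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    int entryCount() {
        return entries.size();
    }

    /** Because every entry is balanced, the sum over all accounts is always zero. */
    long netOfAllAccounts() {
        return balances.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

Posting one entry when a charge is initiated and another when it settles leaves the pending account at zero, which is the "initiated versus cleared" reconciliation in miniature.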

Measuring success: KPIs for payment recovery

You cannot improve what you do not measure. As the platform matures, these are the KPIs that will define success for payment recovery, and that we are building our observability layer around:

  • Initial approval rate: The percentage of transactions approved on the first attempt, per gateway and per card type. The baseline everything else is measured against.
  • Recovery rate: Of all transactions that failed on the first attempt, the percentage ultimately recovered through retries or alternate rails. This is the core metric for a recovery platform.
  • Time to recovery: How long it takes, on average, to recover a failed payment. Faster recovery means less churn risk and a better merchant experience.
  • Chargeback rate: A rising chargeback rate can signal problems with retry aggressiveness or routing quality. Monitoring this per merchant and per rail is essential to staying inside card network thresholds.
  • Decline code distribution: Shifts in the mix of decline codes often surface upstream issues before they become visible in approval rates: an early warning system that pays dividends the more data accumulates.
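
The first three KPIs reduce to simple ratios and an average, which the following sketch computes from a list of per-payment outcomes. The `Outcome` record and method names are assumptions made for illustration; a real pipeline would slice these per gateway, card type, and merchant.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical sketch of the core recovery KPIs computed over a batch of outcomes. */
final class RecoveryKpis {

    /** recoveredAfter is null unless the payment failed first and was later recovered. */
    record Outcome(boolean approvedFirstAttempt, Duration recoveredAfter) {
        boolean recovered() {
            return !approvedFirstAttempt && recoveredAfter != null;
        }
    }

    /** Share of all payments approved on the first attempt. */
    static double initialApprovalRate(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long approved = outcomes.stream().filter(Outcome::approvedFirstAttempt).count();
        return (double) approved / outcomes.size();
    }

    /** Share of first-attempt failures that were eventually recovered. */
    static double recoveryRate(List<Outcome> outcomes) {
        long failedFirst = outcomes.stream().filter(o -> !o.approvedFirstAttempt()).count();
        if (failedFirst == 0) {
            return 0.0;
        }
        long recovered = outcomes.stream().filter(Outcome::recovered).count();
        return (double) recovered / failedFirst;
    }

    /** Average time from first failure to recovery, over recovered payments only. */
    static Duration meanTimeToRecovery(List<Outcome> outcomes) {
        List<Duration> times = outcomes.stream()
                .filter(Outcome::recovered)
                .map(Outcome::recoveredAfter)
                .toList();
        if (times.isEmpty()) {
            return Duration.ZERO;
        }
        long totalSeconds = times.stream().mapToLong(Duration::getSeconds).sum();
        return Duration.ofSeconds(totalSeconds / times.size());
    }
}
```

Note that the recovery rate uses first-attempt failures as its denominator, not all transactions, so it stays meaningful even when the initial approval rate is high.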

Conclusion: Own your payment infrastructure

The payment industry often presents complexity as a reason to outsource. Use an all-in-one processor. Rely on a third party to handle the hard parts. 

That approach works, until you need to understand why your approval rates are declining, or why recovery is underperforming in a specific market, or why a change in a provider's behaviour is silently affecting your revenue.

At Paymend, we made the opposite bet: build the stack, own the logic, control the product. A multi-rail architecture is not just a technical improvement; it is a strategic one. It gives merchants visibility, flexibility, and resilience that no single-provider setup can match.

Payment recovery is not a background process. It is a competitive advantage. For merchants at scale, the difference between a 70% and an 85% approval rate is not an engineering metric; it is millions in recurring revenue.

Talk to our team or explore our documentation to understand how our multi-rail payment recovery platform can work for your business.

Carlos Baldacin
VP of Engineering
Carlos has more than 20 years of experience leading tech teams in the payments industry and is currently the VP of Engineering at Paymend.