Introduction
Modern financial systems must process thousands of transactions per second with minimal latency while maintaining strict guarantees around security, consistency, and reliability. Yet systems built first for correctness and feature delivery often struggle once transaction volume grows.
This article describes a real-world engineering journey of scaling a transaction processing system from 20 Transactions Per Second (TPS) to 200 TPS, a 10× performance improvement, through architectural redesign, system optimization, and operational improvements.
Rather than relying solely on additional hardware, the solution involved addressing architectural bottlenecks, optimizing request processing pipelines, and improving system observability.
The Initial System Architecture
The system was originally designed as a synchronous microservice architecture responsible for routing and processing financial transactions between external vendors and client systems.
A typical transaction flow included:
- Incoming client request
- Authentication and validation
- Request transformation
- Routing to external vendor APIs
- Response processing
- Logging and persistence
While the architecture was modular and easy to maintain, several characteristics limited its throughput:
- Blocking request handling
- Excessive database calls
- Heavy request transformation layers
- Synchronous external service dependencies
Under load testing, the system began to degrade significantly beyond 20 TPS, with increasing latency and thread saturation.
Performance Bottleneck Discovery
Improving throughput required first understanding where time and resources were being consumed.
Several techniques were used to identify bottlenecks:
- Load testing to simulate production traffic
- Thread analysis to observe blocking operations
- Latency profiling across services
- Database query monitoring
Key issues discovered included:
Thread Pool Saturation
The traditional thread-per-request model caused threads to remain blocked while waiting for I/O operations such as database queries or external API responses.
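The ceiling this imposes is easy to see with a back-of-envelope calculation. The sketch below is illustrative only (the pool sizes and latencies are hypothetical, not measured values from the system described here): when every thread spends a request's lifetime blocked on I/O, throughput is capped at roughly pool size divided by per-request latency.

```python
# Back-of-envelope bound for a thread-per-request server: while threads
# sit blocked on I/O, at most pool_size / io_latency requests finish per
# second, no matter how fast the CPU is.
def max_blocking_tps(pool_size: int, io_latency_s: float) -> float:
    """Upper bound on throughput when each thread blocks for io_latency_s."""
    return pool_size / io_latency_s

# Hypothetical numbers: 20 threads each blocked ~1 s on a vendor call
# cap the system near 20 TPS; adding threads only scales linearly and
# eventually saturates memory and the scheduler.
print(max_blocking_tps(20, 1.0))   # 20.0
print(max_blocking_tps(200, 1.0))  # 200.0
```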
Database Dependency
Each request triggered multiple database queries for configuration, authentication, and routing logic. Under heavy load, database connection pools became saturated.
Excessive Payload Processing
Multiple layers of request transformation increased CPU overhead and added unnecessary latency to each transaction.
Network Latency Accumulation
The system relied heavily on synchronous calls to external services, lengthening the critical path of each request.
Architectural Improvements
To address these issues, several architectural optimizations were implemented.
Moving Toward Non-Blocking Processing
The system transitioned from blocking request handling to non-blocking processing models.
Using event-driven frameworks allowed the system to handle significantly more concurrent requests while using fewer threads.
Benefits included:
- Higher concurrency
- Reduced thread utilization
- Improved CPU efficiency
This change alone significantly increased the system's capacity to process transactions.
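The shape of that change can be sketched with Python's `asyncio` (the article does not name the framework actually used; this is a minimal illustration of the event-driven model, with the vendor call simulated by a non-blocking sleep):

```python
import asyncio

async def handle_transaction(txn_id: int) -> str:
    # Simulated non-blocking vendor I/O: while this coroutine awaits,
    # the event loop is free to make progress on other requests.
    await asyncio.sleep(0.01)
    return f"txn-{txn_id}:ok"

async def main() -> list:
    # 1,000 in-flight requests multiplexed on a single thread,
    # instead of 1,000 blocked threads in a thread-per-request model.
    return await asyncio.gather(*(handle_transaction(i) for i in range(1000)))

results = asyncio.run(main())
print(len(results))  # 1000
```

The same idea applies whatever the runtime: requests waiting on I/O hold a small coroutine or callback, not an entire thread stack.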
Reducing Database Interactions
Since database operations were among the slowest parts of the request flow, several optimizations were introduced:
- Caching frequently accessed configuration data
- Eliminating redundant queries
- Reducing transactional writes in the critical request path
Moving configuration lookups into memory reduced the number of database round trips per request.
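A minimal sketch of such a cache, assuming a `loader` function that performs the actual database query (the class and its names are illustrative, not the system's real implementation):

```python
import time

class ConfigCache:
    """In-memory cache with a TTL; falls through to the database on miss or expiry."""

    def __init__(self, loader, ttl_seconds: float = 60.0):
        self._loader = loader          # e.g. a function that queries the DB
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (fetch_time, value)

    def get(self, key: str):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # served from memory: no DB round trip
        value = self._loader(key)      # one round trip, then cached
        self._entries[key] = (now, value)
        return value
```

With a TTL-based cache, stale configuration is bounded by the TTL rather than requiring explicit invalidation on every change, which is usually an acceptable trade-off for slow-moving routing and vendor configuration.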
Optimizing Request Routing
The request router was redesigned to become a lightweight routing layer capable of handling high concurrency.
Key improvements included:
- Minimal validation logic in the critical path
- Direct routing rules
- Efficient payload transformation
This reduced CPU overhead and improved request processing speed.
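"Direct routing rules" can be as simple as a dictionary lookup keyed on transaction type. The sketch below is a hypothetical illustration of that idea (the `route`/`dispatch` names and the `"card"` handler are invented for the example):

```python
# Direct routing rules: the vendor handler is chosen by one dictionary
# lookup instead of per-request database queries or deep transformation layers.
ROUTES = {}

def route(txn_type: str):
    """Decorator that registers a handler for a transaction type."""
    def register(handler):
        ROUTES[txn_type] = handler
        return handler
    return register

@route("card")
def card_vendor(payload: dict) -> dict:
    # Minimal transformation: annotate and forward.
    return {"vendor": "card-gateway", **payload}

def dispatch(payload: dict) -> dict:
    handler = ROUTES.get(payload.get("type", ""))
    if handler is None:
        raise ValueError(f"no route for {payload!r}")
    return handler(payload)
```

Keeping the routing table in memory and the per-request logic to a lookup plus a thin transformation is what keeps the router cheap at high concurrency.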
Asynchronous Processing for Non-Critical Operations
Operations such as logging, analytics, and monitoring were moved to asynchronous pipelines.
This ensured that the main transaction flow remained focused on the core processing logic, improving both latency and throughput.
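One common shape for such a pipeline is a bounded queue drained by a background worker, so the request path only pays the cost of an enqueue. A minimal sketch (the sink here is a `print`; a real system would ship records to a file, collector, or analytics service):

```python
import queue
import threading

log_queue = queue.Queue()

def log_worker():
    # Drains log records off the critical path; None is a shutdown signal.
    while True:
        record = log_queue.get()
        if record is None:
            break
        print(record)  # stand-in for persisting/shipping the record

worker = threading.Thread(target=log_worker, daemon=True)
worker.start()

def process_transaction(txn_id: int) -> str:
    result = f"txn-{txn_id}:ok"            # core processing logic
    log_queue.put(f"processed {txn_id}")   # enqueue only; no I/O on this path
    return result
```

The trade-off is that logs become eventually consistent with the transaction flow, which is acceptable for analytics and monitoring but not for records that must be durable before the response is returned.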
Infrastructure Improvements
Beyond application-level changes, infrastructure tuning played a key role in improving performance.
Horizontal Scalability
The system was redesigned to be stateless, enabling multiple instances to process transactions concurrently.
Load balancers distributed traffic across service instances, allowing throughput to increase linearly with additional nodes.
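Because the instances are stateless, any distribution policy works; the simplest is round-robin. A toy sketch of that policy (the instance names are placeholders):

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin distribution across stateless service instances."""

    def __init__(self, instances):
        # Statelessness is what makes this safe: any instance can
        # serve any request, so rotation needs no session affinity.
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        return next(self._cycle)
```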
Runtime Tuning
Containerized deployments enabled tuning of runtime parameters such as:
- CPU limits
- Memory allocation
- Thread pool sizes
- Connection pools
These improvements ensured efficient resource usage under heavy load.
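For thread and connection pool sizing specifically, a widely cited heuristic for I/O-bound workloads is threads ≈ cores × (1 + wait time / compute time). The sketch below applies that heuristic; the wait and compute figures are hypothetical inputs, not measurements from this system:

```python
import os

def io_bound_pool_size(wait_ms: float, compute_ms: float, cores=None) -> int:
    """Classic sizing heuristic for blocking I/O workloads:
    threads ~= cores * (1 + wait/compute)."""
    cores = cores or os.cpu_count() or 1
    return max(1, round(cores * (1 + wait_ms / compute_ms)))

# Hypothetical: 4 cores, requests spend ~90 ms waiting per ~10 ms of CPU.
print(io_bound_pool_size(wait_ms=90, compute_ms=10, cores=4))  # 40
```

Treat the result as a starting point for load testing, not a final answer; container CPU limits and downstream connection limits both constrain the usable pool size in practice.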
Observability and Continuous Optimization
Improving performance required strong visibility into system behavior.
Several monitoring and observability tools were introduced:
- Request tracing
- Real-time performance metrics
- Latency monitoring
- Error rate tracking
These insights enabled engineers to quickly detect regressions and continuously improve system performance.
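In-process latency metrics of the kind listed above can be collected with something as small as a decorator. This is a minimal sketch of the pattern (real systems would export to a metrics backend rather than an in-memory dict):

```python
import time
from collections import defaultdict
from functools import wraps

# Per-operation latency samples; a stand-in for a real metrics backend.
metrics = defaultdict(list)

def traced(fn):
    """Record each call's latency so regressions are visible per operation."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            metrics[fn.__name__].append(time.perf_counter() - start)
    return wrapper

@traced
def validate(payload: dict) -> bool:
    # Hypothetical validation step on the request path.
    return "amount" in payload
```

Aggregating such samples into percentiles (p50/p95/p99) rather than averages is what makes latency regressions under load visible early.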
Results
After implementing these architectural and operational improvements, the system achieved:
| Metric | Before Optimization | After Optimization |
|---|---|---|
| Throughput | ~20 TPS | ~200 TPS |
| Average Latency | High under load | Stable |
| Thread Utilization | High | Efficient |
| Scalability | Limited | Horizontally scalable |
The result was a 10× increase in transaction throughput while maintaining reliability and system stability.
Key Lessons Learned
- Architectural decisions significantly impact scalability.
- Database interactions must be minimized in high-throughput systems.
- Non-blocking architectures greatly improve concurrency handling.
- Observability is essential for identifying performance bottlenecks.
- Incremental improvements with continuous testing lead to sustainable performance gains.
Conclusion
Scaling transaction processing systems requires more than simply adding infrastructure. It demands careful examination of architectural patterns, system bottlenecks, and operational practices.
By transitioning to non-blocking architectures, reducing database dependencies, optimizing request routing, and improving observability, it is possible to significantly increase throughput while maintaining system stability.
As financial systems continue to scale globally, designing for high throughput, resilience, and efficiency will remain a fundamental challenge for modern software engineers.