Cell-Based Architecture on AWS

by Nilojan Tharmarajah, Team Lead Senior Solutions Architect, Rackspace Technology

Introduction

Monolithic applications are very common, and there is no shame in running one in your workloads. These systems often evolve from many iterations of feature enhancements and fixes, gradually snowballing into what we now recognize as a monolith. This growth is also an indirect sign of your business's ability to expand and meet increasing demands. However, as your user base grows, so do the challenges associated with refactoring and re-engineering your application and workflows, ultimately leading to bottlenecks and potential single points of failure.

This blog introduces cell-based architecture, a term I came across while looking for a viable way to start decoupling several monolithic applications on AWS. The pattern is not a new concept; rather, it is a deliberate way to introduce bulkhead architecture into your solution.

If you are not familiar with the AWS Well-Architected Framework, I suggest reading the AWS whitepaper that explains the concept of bulkhead architecture.

Introducing cells

Imagine your application divided into independent, self-contained units called cells. Each cell is a complete instance of your application logic, with its own database replica or storage resources. Be careful not to confuse this with a high-availability solution spread across multiple Availability Zones or Regions, which provides physical isolation at the infrastructure layer. Here we're concerned with application isolation based on business needs and CI/CD workflows. If you strip the concept down, it is very much like containerization, but on a different scale.

In this diagram, we can consider each cell to represent an application isolated in its own AWS account. I’ll later explain the cell router depicted as the ‘thinnest possible layer’.

Cell based architecture pic 1

Here is what a cell-based architecture will look like for a simple two-tiered application.

cell based architecture pic 2

To break down the cell architecture: each cell is an AWS account. Zooming in on one of the cells, you can see the compute and storage resources set up across multiple AZs for high availability.

cell based architecture pic 3

Benefits of cell-based architecture

Enhanced manageability

  • Reduced blast radius: Issues are contained within individual cells, minimizing the impact on the overall application. This allows you to isolate problems and fix them quickly without affecting a large user base. This is the bulkhead concept mentioned above.
  • Safe deployments: New features or version upgrades can be deployed to a single cell for testing before rolling them out to all cells. This minimizes risk and allows for easier rollbacks if necessary.
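
To make the blast-radius idea concrete, here is a minimal Python sketch. The function name and the user counts are hypothetical and chosen only for illustration; it assumes users are pinned to exactly one cell, so a failed cell affects only its own users.

```python
def blast_radius(users_per_cell: list[int], failed_cells: set[int]) -> float:
    """Return the fraction of all users affected when the given cells fail.

    Assumes each user belongs to exactly one cell (an illustrative assumption).
    """
    total = sum(users_per_cell)
    if total == 0:
        return 0.0
    affected = sum(users_per_cell[i] for i in failed_cells)
    return affected / total


# Hypothetical numbers: 100,000 users spread evenly over 10 cells.
cells = [10_000] * 10
one_cell_down = blast_radius(cells, {3})        # one cell out of ten fails
monolith_down = blast_radius([100_000], {0})    # the same users in a single unit
```

With these made-up numbers, losing one cell affects a tenth of the users, while the same failure in a single shared deployment affects all of them.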

Improved fault tolerance

  • Higher mean time between failures (MTBF): Capping the size of each cell, i.e., limiting the number of users or the workload per cell, makes failures easier to predict and address. This leads to a higher MTBF, meaning your application experiences failures less frequently.
  • Lower mean time to recovery (MTTR): The capped size of cells also simplifies troubleshooting. With a smaller pool of resources to diagnose, you can resolve issues faster, resulting in a lower MTTR.
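
As a rough illustration of why both numbers matter, steady-state availability is commonly estimated as MTBF / (MTBF + MTTR). The sketch below applies that formula; the hour values are hypothetical and not measurements from any real system.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate steady-state availability as MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures for illustration only.
before = availability(mtbf_hours=1000, mttr_hours=4)  # failures are hard to diagnose
after = availability(mtbf_hours=2000, mttr_hours=1)   # smaller cells, faster recovery
```

Raising MTBF and lowering MTTR both push the ratio closer to 1, which is the effect the two bullets above describe.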

Easy scalability

  • Horizontal scaling: Unlike scaling up a monolithic application, cell-based architecture allows you to scale out by adding new cells to handle increased traffic. This provides a more efficient and manageable approach to handling growth.
  • Predictable service quota: AWS service quotas define resource usage limits for your account, and they can become a blocker when overlooked. Because each cell has a capped size and lives in its own account, its quota usage stays predictable, so quotas are less likely to hinder application functionality as you scale out.
  • Improved testability: Canary testing, where you deploy a new version to a small subset of users, becomes easy with cell-based architecture. You can test new features or updates in a single cell without impacting the entire user base.
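
The canary and safe-deployment ideas above can be sketched as a wave-based rollout: deploy to one cell first, then to progressively larger groups, and stop as soon as a wave looks unhealthy. This is a minimal sketch, not a real pipeline; `deploy` and `healthy` are hypothetical callbacks you would wire to your own CI/CD tooling and health checks, and the doubling wave size is an assumption.

```python
from typing import Callable


def plan_waves(cells: list[str], canary_count: int = 1) -> list[list[str]]:
    """Split cells into waves: a small canary wave first, then waves that double in size."""
    if canary_count < 1:
        raise ValueError("canary_count must be at least 1")
    waves: list[list[str]] = []
    start, size = 0, canary_count
    while start < len(cells):
        waves.append(cells[start:start + size])
        start += size
        size *= 2
    return waves


def roll_out(
    cells: list[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    canary_count: int = 1,
) -> list[str]:
    """Deploy wave by wave and stop after the first wave containing an unhealthy cell.

    Returns the cells that received the new version, so they can be rolled back.
    """
    deployed: list[str] = []
    for wave in plan_waves(cells, canary_count):
        for cell in wave:
            deploy(cell)
            deployed.append(cell)
        if not all(healthy(cell) for cell in wave):
            break  # halt the rollout; later waves keep the old version
    return deployed
```

If the canary cell fails its health check, only that one cell has the new version and the remaining cells are untouched.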

Building your cell-based architecture on AWS

Here I’ll discuss the key components that AWS provides to set up a cell-based architecture:

  • Route 53: This service distributes traffic across multiple supercells (groups of cells within a region) using weighted routing or advanced health checks for zonal control.
  • Cell routing layer: A microservice or containerized application that receives requests, consults a routing table (stored in DynamoDB, for example) to map users to specific cells, and forwards each request to the appropriate cell's load balancer. In the first diagram above, it's depicted as the ‘thinnest possible layer’, and that’s exactly what it should be: routing should be the only task for this layer.
  • Supercell (Region): Each AWS region represents a supercell and can contain multiple cells. It includes an Elastic Load Balancer (ELB) for traffic distribution, an Auto Scaling group for automatic scaling, a CloudWatch agent for monitoring, and optional S3 buckets for cell-specific data or logs.
  • Cell: This is the core unit. It can contain EC2 instances, Lambda functions, ECS or EKS workloads, and so on running your application code, along with its own database replica or storage and cell-specific configuration files.
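
To show how thin the cell routing layer can be, here is a minimal Python sketch of its lookup step. An in-memory dictionary stands in for the routing table (in practice this could be a DynamoDB table), and the class names, customer IDs and endpoint hostnames are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    cell_id: str
    endpoint: str  # hypothetical DNS name of the cell's load balancer


class CellRouter:
    """Maps a customer to a cell and does nothing else."""

    def __init__(self, cells: dict[str, Cell], assignments: dict[str, str]) -> None:
        self._cells = cells
        # customer_id -> cell_id; a plain dict stands in for the routing table here
        self._assignments = assignments

    def resolve(self, customer_id: str) -> Cell:
        """Return the cell that should serve this customer."""
        try:
            cell_id = self._assignments[customer_id]
        except KeyError:
            raise LookupError(f"no cell assigned to customer {customer_id!r}") from None
        return self._cells[cell_id]


router = CellRouter(
    cells={
        "cell-a": Cell("cell-a", "cell-a.example.internal"),
        "cell-b": Cell("cell-b", "cell-b.example.internal"),
    },
    assignments={"customer-1": "cell-a", "customer-2": "cell-b"},
)
target = router.resolve("customer-2")  # the request would be forwarded to target.endpoint
```

The router holds no business logic and no application state beyond the mapping, which keeps it small enough to reason about as a shared component.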

Centralized management:

With AWS Organizations and AWS Control Tower, you can automate standing up AWS landing zones while adhering to your governance requirements and adopting AWS best practices in the cloud.

Key considerations for cell-based architecture:

  • Cell placement: Decide how customers and workloads are mapped to cells. This involves partitioning data and traffic based on your specific application (B2B vs. B2C) and business context.
  • For example, do you want cells placed close to your end users’ geographic location, or assigned to customers based on their user profile?

cell based architecture pic 4
  • Routing: Implement a robust routing mechanism, using a service such as Route 53 or API Gateway, to distribute traffic to the right cells. Ensure high availability by avoiding single points of failure. Consider using multiple VPCs with VPC peering for cell communication. Route 53 Application Recovery Controller can be incorporated for advanced routing scenarios.
  • Cell sizing: Determine the optimal size for your cells to ensure efficient scaling based on your predetermined scale-out plan.
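
Cell placement and cell sizing can be sketched together in a few lines of Python. This is one possible approach under stated assumptions, not a prescribed method: customers are placed by hashing their ID, each cell has a hard cap on users, and all names and numbers are hypothetical.

```python
import hashlib
import math


def cells_needed(expected_users: int, max_users_per_cell: int) -> int:
    """Number of cells required to serve the expected users without exceeding the cap."""
    return max(1, math.ceil(expected_users / max_users_per_cell))


def place_by_hash(customer_id: str, cell_ids: list[str]) -> str:
    """Pick a cell deterministically from a hash of the customer ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cell_ids)
    return cell_ids[index]


def place_with_cap(
    customer_id: str,
    cell_ids: list[str],
    load: dict[str, int],
    max_users_per_cell: int,
) -> str:
    """Start at the hashed cell and walk forward to the first cell with spare capacity."""
    start = cell_ids.index(place_by_hash(customer_id, cell_ids))
    for offset in range(len(cell_ids)):
        cell = cell_ids[(start + offset) % len(cell_ids)]
        if load.get(cell, 0) < max_users_per_cell:
            load[cell] = load.get(cell, 0) + 1  # record the new assignment
            return cell
    raise RuntimeError("all cells are at capacity; add a new cell")
```

Because modulo hashing would move existing customers when the cell count changes, the result of a placement is best stored in the routing table once and looked up afterwards, rather than recomputed on every request.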

As mentioned earlier, cell-based architecture is not a new concept. It’s a deliberate pattern to consider both when decoupling a monolithic application and when designing a new solution. Using AWS global services such as Route 53 and CloudFront in a cell-based architecture can open up new ways to manage your CI/CD pipeline and tailor end-user experiences to your business needs.

As we navigate the complexities of modern software development, the transition from monolithic to cell-based architecture is not just an upgrade; it's a strategic move toward agility, resilience, and sustained innovation. If you're ready to enhance your system's manageability, fault tolerance and scalability with AWS, let's take the first step together.
