Highlights from Monday Night Live: Embracing the ‘How’ of AWS Innovations
by Jon (JR) Price, Sr. Manager, Rackspace Elastic Engineering, Rackspace Technology
When Peter DeSantis, SVP of Utility Computing at AWS, took the stage at Monday Night Live during AWS re:Invent 2024, the room buzzed with anticipation. Known for delivering thought-provoking insights, DeSantis’s keynotes are always a highlight of the event, and this year was no exception. His address provided a detailed look at the innovations driving AWS’s progress and their focus on understanding the ‘how’ behind their technology. DeSantis compared this philosophy to the deep taproot of a tree, drawing water from hidden underground sources — a metaphor for AWS leaders immersing themselves in technical details. This focus enables them to make informed decisions, anticipate customer needs and address challenges proactively.
Another key theme DeSantis explored was the collaborative culture within AWS. He described how teams work together across all layers of the technology stack — from data center power and networking to custom chips and software — much like the interconnected root systems of the Amazon rainforest. This synergy enables AWS to develop integrated solutions that address a wide range of customer challenges while supporting continuous innovation. By fostering such cross-functional collaboration, AWS is able to refine its offerings and adapt to the changing needs of its customers. One example of this collaborative innovation is AWS’s work on custom silicon, showcased through the evolution of their Graviton processors.
The Graviton custom silicon journey
During the keynote, DeSantis highlighted the evolution of AWS’s custom silicon development, which has been central to their strategy for optimizing cloud performance:
- Graviton (2018): Introduced to promote industry collaboration around ARM in data centers, providing developers with tangible hardware to test
- Graviton2: AWS’s first purpose-built processor, designed for scaled-out workloads such as web servers, microservices and caching fleets
- Graviton3: Delivered significant performance improvements, targeting specialized workloads requiring high compute power, including machine learning inference, scientific modeling and video transcoding
- Graviton4: AWS’s most advanced chip to date, featuring multi-socket support and three times the vCPUs of previous Graviton-based instances, making it suitable for demanding enterprise workloads such as large databases and complex analytics
Rather than focusing on synthetic benchmarks, AWS evaluates real-world performance to better align their processors with customer needs. For example, while Graviton3 demonstrated a 30% improvement over Graviton2 in traditional benchmarks, real-world applications like NGINX saw up to a 60% performance increase.
This emphasis on practical performance has contributed to the growing adoption of Graviton processors, which now account for more than 50% of all new CPU capacity in AWS data centers. By optimizing their silicon for real-world workloads, AWS has built a compelling case for customers seeking reliable and scalable cloud infrastructure.
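For teams weighing a move to Graviton, a practical first step is simply seeing which instance types in a region run on Arm. Below is a minimal sketch using boto3 (assuming AWS credentials and permissions are already configured; region and output handling are placeholders) that filters the EC2 instance-type catalog down to current-generation arm64 offerings.

```python
import boto3

# List current-generation EC2 instance types that run on 64-bit Arm
# (Graviton) processors in one region. Region is an example value.
ec2 = boto3.client("ec2", region_name="us-east-1")

paginator = ec2.get_paginator("describe_instance_types")
pages = paginator.paginate(
    Filters=[
        {"Name": "processor-info.supported-architecture", "Values": ["arm64"]},
        {"Name": "current-generation", "Values": ["true"]},
    ]
)

graviton_types = sorted(
    itype["InstanceType"]
    for page in pages
    for itype in page["InstanceTypes"]
)
print(f"{len(graviton_types)} Graviton (arm64) instance types available")
print(graviton_types[:10])
```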
Revolutionizing security with the AWS Nitro System
Security remains a top priority in the cloud, and the AWS Nitro System represents a significant evolution in how infrastructure can be built and secured. Nitro’s hardware-based security begins at manufacturing, where it provides cryptographic proof — known as attestation — to verify what is running on each system. This creates an unbroken chain of custody and verification, ensuring the integrity of components from the moment they are manufactured to when they are in operation.
With Graviton4, AWS extended attestation to the processor itself, establishing an interconnected framework of trust between critical system components. Connections such as CPU-to-CPU communication and PCIe traffic are protected through hardware-based security anchored in the manufacturing process. This design addresses challenges inherent to traditional servers and data centers, enabling enhanced protection and operational confidence.
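At its core, attestation is a signed measurement that can be checked against a root of trust established at manufacturing time. The sketch below is purely illustrative: it uses an Ed25519 key pair from Python's cryptography library to stand in for that root, and is not the actual Nitro attestation document format or API.

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative stand-in for a root of trust provisioned at manufacturing.
root_key = Ed25519PrivateKey.generate()
root_public = root_key.public_key()

# The component "measures" what it is running and signs that measurement.
firmware_image = b"example-card-firmware-v42"           # hypothetical payload
measurement = hashlib.sha256(firmware_image).digest()
attestation = root_key.sign(measurement)                # toy attestation document

def verify(image: bytes, signature: bytes) -> bool:
    """Verifier side: recompute the measurement and check it against the root key."""
    try:
        root_public.verify(signature, hashlib.sha256(image).digest())
        return True
    except InvalidSignature:
        return False

print(verify(firmware_image, attestation))        # True: measurement matches the signed claim
print(verify(b"tampered-firmware", attestation))  # False: the claim no longer holds
```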
Introducing disaggregated storage with Nitro
AWS identified a growing challenge with traditional storage servers: As hard drive capacities increased, these systems struggled to keep pace due to fixed compute-to-storage ratios and tightly coupled components. Scaling up storage servers by simply adding more capacity became inefficient and operationally complex. Recognizing these limitations, AWS took a different approach by breaking storage systems down into more manageable and scalable components.
AWS’s disaggregated storage leverages Nitro technology by embedding Nitro cards directly into JBOD (Just a Bunch of Disks) enclosures. This design gives each drive its own intelligence and network connectivity, eliminating the constraints of traditional fixed ratios. Independent scaling becomes possible, enabling flexible resource allocation based on actual needs. Failures are isolated to individual components, reducing their overall impact and accelerating recovery times. Maintenance is simplified and capacity planning becomes more manageable. As hard drive capacities continue to expand, this disaggregated approach ensures storage solutions can scale effectively into the future.
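To make the scaling argument concrete, here is a toy back-of-the-envelope model; the numbers are invented for illustration and are not AWS figures. In the coupled design, capacity and compute can only be bought together, while the disaggregated design sizes drives and head nodes independently.

```python
from math import ceil

# Invented numbers for illustration only, not AWS specifications.
DRIVE_TB = 20            # capacity of one hard drive
DRIVES_PER_SERVER = 24   # fixed drives-per-head ratio in the coupled design
IOPS_PER_HEAD = 40_000   # throughput one head node (or Nitro card) can drive

def coupled_servers(capacity_tb: int, iops: int) -> int:
    """Traditional design: compute and capacity scale only in lockstep."""
    for_capacity = ceil(capacity_tb / (DRIVE_TB * DRIVES_PER_SERVER))
    for_iops = ceil(iops / IOPS_PER_HEAD)
    return max(for_capacity, for_iops)

def disaggregated(capacity_tb: int, iops: int) -> tuple[int, int]:
    """Disaggregated design: drives and head nodes sized independently."""
    return ceil(capacity_tb / DRIVE_TB), ceil(iops / IOPS_PER_HEAD)

# A capacity-heavy, IOPS-light workload: 10 PB of data at modest throughput.
print(coupled_servers(10_000, 80_000))   # 21 full servers, bought mostly for capacity
print(disaggregated(10_000, 80_000))     # (500 drives, 2 head nodes)
```

In the toy example the coupled design pays for 21 head nodes when the workload only needs two, which is exactly the stranded-resource problem that putting a Nitro card in each enclosure is meant to avoid.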
Advancing AI infrastructure with Trainium2
AI workloads, particularly model training and inference, present unique challenges. These tasks often require a scale-up approach rather than scale-out, because techniques such as data parallelism run into limits on usable global batch size.
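A rough sketch of that limit, with invented numbers: once the largest global batch a model can usefully train with is reached, adding more data-parallel devices only shrinks the per-device batch until each accelerator sits idle, which is why more capable individual devices matter.

```python
# Invented numbers for illustration; real limits depend on the model and optimizer.
MAX_USEFUL_GLOBAL_BATCH = 4096   # largest global batch that still converges well
MIN_EFFICIENT_PER_DEVICE = 8     # smallest per-device batch that keeps a chip busy

for devices in (128, 512, 2048):
    per_device_batch = MAX_USEFUL_GLOBAL_BATCH // devices
    status = (
        "efficient"
        if per_device_batch >= MIN_EFFICIENT_PER_DEVICE
        else "underutilized: scale up, not out"
    )
    print(f"{devices:5d} devices -> per-device batch {per_device_batch:3d} ({status})")
```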
To meet these challenges, AWS developed Trainium2, a next-generation AI training chip that incorporates advanced features to optimize performance for demanding workloads:
- Systolic array architecture: Unlike traditional CPUs and GPUs, Trainium2 uses a systolic array designed specifically for AI workloads, optimizing memory bandwidth and computational efficiency (a simplified sketch of the idea follows this list)
- Advanced packaging techniques: High-bandwidth memory (HBM) modules are integrated using interposers, enabling efficient use of space and maximizing chip performance within manufacturing constraints.
- Innovations in power delivery: By positioning voltage regulators closer to the chip, AWS reduced voltage drop issues, improving performance and chip longevity.
- Automated manufacturing: The Trainium2 chip is optimized for rapid scaling and deployment, ensuring customers can access the technology quickly and seamlessly.
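To make the systolic-array idea concrete, here is a highly simplified, output-stationary sketch in NumPy: each cell of the grid owns one accumulator, operands stream through one slice per cycle, and the product builds up in place without re-reading memory. Real hardware also skews and pipelines the operand streams, which this sketch ignores; it is not Trainium2's actual microarchitecture.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Toy output-stationary systolic array: one MAC cell per output element."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    acc = np.zeros((n, m))              # one accumulator per MAC cell
    for step in range(k):               # one "cycle" per slice of streamed data
        a_col = A[:, step]              # operands entering from the left edge
        b_row = B[step, :]              # operands entering from the top edge
        acc += np.outer(a_col, b_row)   # every cell performs one multiply-accumulate
    return acc

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```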
The Trainium2 server is engineered to handle demanding workloads, offering 20 petaflops of compute capacity and 1.5 terabytes of high-bandwidth memory (HBM). AWS’s proprietary interconnect technology, NeuronLink, enables multiple Trainium2 servers to function as a single logical unit. These ‘UltraServers’ are essential for training next-generation AI models with trillions of parameters, pushing the boundaries of what’s possible in AI infrastructure.
Enhancing AI inference with Amazon Bedrock
Recognizing the critical role of both training and inference in AI workloads, AWS introduced latency-optimized options for Amazon Bedrock. These enhancements give customers access to the latest AI hardware and software optimizations, enabling faster and more efficient inference.
Through partnerships with leading model providers such as Meta and Anthropic, AWS continues to enhance performance for diverse AI use cases. For example, latency-optimized versions of Llama 3.1 405B and 70B now deliver some of the fastest inference speeds available on AWS. Similarly, a latency-optimized version of Claude 3.5 Haiku, developed in collaboration with Anthropic, achieves a 60% improvement in speed over the standard model.
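In practice, opting in is a single request parameter. The sketch below assumes boto3 credentials are configured, uses a placeholder model ID and region, and relies on the performanceConfig latency option of the Bedrock Converse API; check the Bedrock documentation for the models and regions where the option is actually offered.

```python
import boto3

# Minimal sketch: requesting latency-optimized inference through the Bedrock
# Converse API. Model ID and region below are example values.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-2")

response = bedrock.converse(
    modelId="us.anthropic.claude-3-5-haiku-20241022-v1:0",  # example inference profile ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize the benefits of latency-optimized inference."}],
    }],
    performanceConfig={"latency": "optimized"},             # opt in to the latency-optimized variant
)

print(response["output"]["message"]["content"][0]["text"])
```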
Collaborating with Anthropic: Project Rainier
Tom Brown, co-founder and Chief Compute Officer at Anthropic, provided insights into Project Rainier — a high-performance AI cluster powered by hundreds of thousands of Trainium2 chips. This cluster delivers over five times the compute power of previous generations, enabling faster development of the next generation of Claude, Anthropic’s AI assistant.
With this enhanced infrastructure, customers will gain access to smarter AI agents that operate at lower costs and faster speeds, enabling them to tackle larger and more complex projects. This collaboration exemplifies how AWS is partnering with industry leaders to push the boundaries of AI infrastructure.
Scaling AI clusters with elastic AI-optimized networking
AWS showcased its latest generation AI network fabric, the 10P10U Network, designed to deliver massive capacity and ultra-low latency. This advanced fabric provides tens of petabits of network capacity to thousands of servers, achieving latency under 10 microseconds. It can scale flexibly, from a few racks to clusters spanning multiple data center campuses.
To simplify deployment, AWS introduced proprietary trunk connectors, which reduce installation time by 54% and virtually eliminate connection errors. Another key innovation is Scalable Intent-Driven Routing (SIDR), a new network routing protocol that combines central planning with decentralized execution. This approach enables quick, autonomous responses to failures, improving both reliability and performance.
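Conceptually, central planning with decentralized execution means the controller precomputes forwarding intent, including backups, and each switch reacts to a failure on its own. The sketch below is a toy illustration of that division of labor, not AWS's actual SIDR protocol or data model.

```python
# Toy illustration only: a central planner pushes precomputed intent,
# and a switch fails over locally without consulting the planner.

# Hypothetical forwarding intent for one switch: for each destination,
# a primary next hop plus a precomputed backup.
INTENT = {
    "rack-42": {"primary": "spine-1", "backup": "spine-2"},
    "rack-77": {"primary": "spine-3", "backup": "spine-4"},
}

failed_links: set[str] = set()   # links this switch has locally observed as down

def next_hop(destination: str) -> str:
    """Decentralized execution: fall back to the precomputed backup immediately."""
    route = INTENT[destination]
    return route["backup"] if route["primary"] in failed_links else route["primary"]

print(next_hop("rack-42"))      # spine-1 while the link is healthy
failed_links.add("spine-1")     # local failure detection, no round trip to the planner
print(next_hop("rack-42"))      # spine-2, rerouted autonomously
```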
Driving innovation across the entire stack
I left Monday Night Live with a sense of how AWS’s focused approach to innovation is driving the future of cloud computing and AI infrastructure. As DeSantis emphasized, their unique culture and horizontal integration enable advancements across data centers, networking, custom silicon and software — meeting the demands of modern workloads and shaping what’s next.
Discover how Rackspace Technology can help you harness these innovations to transform your cloud strategy. Visit our AWS Marketplace profile to explore our solutions, and together, we can shape the future of cloud computing and AI.