Arista Load Balances AI, Gets Oracle Onboard

Arista Networks has introduced load balancing for AI workloads, along with advanced observability that encompasses third-party networking equipment. Dubbed EOS Smart AI Suite Innovations, these new functions open the way to advanced Ethernet cluster management for AI networks.
Let’s start at the top. Arista describes its patent-pending Cluster Load Balancing (CLB) as “a new Ethernet-based, RDMA-Aware, AI load balancing solution that enables high bandwidth utilization between spines and leaves.” CLB is designed to work with Arista switches running the vendor’s Extensible Operating System (EOS) software, which delivers cloud-enabled management across Arista’s product portfolio, and it is aimed at maximizing the efficiency of AI workloads.
In a twist that illustrates Arista's strategic approach to AI networking, CLB can work with or without third-party network interface cards (NICs). Arista calls this a "NIC-agnostic" approach, an important differentiator from NVIDIA, whose networking offerings are heavily geared to its BlueField NICs. The use of NICs has grown in AI networks because they can offer additional processing power to offload networking functions, including encryption. But they also add cost to the network, and Arista's pitch is that CLB can take advantage of NICs where customers want them yet still load balance traffic without them.
“AI clusters have highly bursty traffic with small flows and high bandwidth requirements,” said Praful Bhaidasna, Head of Products (Observability) at Arista. “AI traffic is highly latency sensitive, and workflows must be coordinated.” Without this coordination, costly GPUs can be stranded, unused, while data flows catch up. Further, disruption of traffic necessitates rerunning an entire inference workflow, racking up operating costs and interrupting progress.
Solving a Major Snarl
Traditional load balancing protocols don’t work well for AI traffic, Bhaidasna says, making performance inconsistent and adding to tail latency, or the latency of the slowest packets in a data flow. The reason is that legacy load balancing doesn’t distribute traffic evenly in both directions across all the paths in a spine-leaf configuration. Arista says it’s solved this issue by creating “intelligent, RDMA-aware flow placement in both directions to optimize AI transport.”
Arista also uses AI to optimize AI workflows. In addition to intelligent end-to-end routing, the software distributes traffic evenly across spine switches to eliminate bottlenecks and regulates flows for uniformity.
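To make the contrast concrete, here is a minimal, hypothetical Python sketch. It is not Arista's algorithm, and the uplink names, bandwidth figures, and placement policy are invented; it simply contrasts classic per-flow ECMP hashing, which can land a few large RDMA flows on the same uplink, with a greedy, RDMA-aware placement that assigns each queue pair to the least-loaded spine uplink.

```python
# Conceptual sketch only, not Arista's CLB implementation; all names and
# numbers below are hypothetical.
import hashlib

SPINE_UPLINKS = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical leaf uplinks

def ecmp_pick(five_tuple):
    """Classic per-flow ECMP: hash the 5-tuple and pick an uplink.
    A handful of large RDMA flows can hash onto the same uplink."""
    digest = hashlib.md5(str(five_tuple).encode()).hexdigest()
    return SPINE_UPLINKS[int(digest, 16) % len(SPINE_UPLINKS)]

class RdmaAwarePlacer:
    """Greedy least-loaded placement keyed on RDMA queue pairs (QPs),
    spreading a few elephant flows across spines instead of hashing blindly."""
    def __init__(self, uplinks):
        self.load = {u: 0.0 for u in uplinks}  # estimated Gbps per uplink

    def place(self, qp_id, expected_gbps):
        uplink = min(self.load, key=self.load.get)  # least-loaded spine uplink
        self.load[uplink] += expected_gbps
        return uplink

if __name__ == "__main__":
    placer = RdmaAwarePlacer(SPINE_UPLINKS)
    for qp in range(8):
        flow = ("10.0.0.1", "10.0.1.1", 49152 + qp, 4791, "UDP")  # RoCEv2 runs over UDP 4791
        print(f"QP {qp}: ecmp={ecmp_pick(flow)} rdma_aware={placer.place(qp, 50.0)}")
```

The takeaway is only that placement keyed on queue pairs and link load, rather than a blind hash, is what keeps a few elephant flows from piling onto one spine path.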
In an unusual move, Oracle has publicly endorsed Arista’s CLB. “As Oracle continues to grow our AI infrastructure, based on Arista EOS-based switches for backend training, we see a need for advanced load balancing techniques to help avoid flow contentions and increase throughput in ML networks,” said Jag Brar, OCI Distinguished Engineer. “Arista’s Cluster Load Balancing feature helps do that.”
Observability Enhanced with Third-Party Views
Arista is also announcing AI enhancements to its CloudVision Universal Network Observability (CV UNO), a software-as-a-service offering for its switching portfolio.
The SaaS can now view third-party networks as well as the applications running across them. It provides application dependency mapping, discovers physical and virtual hosts, identifies issues, and correlates events. CV UNO’s network data lake architecture centralizes network flow data for visibility.
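As a rough illustration of what application dependency mapping over centralized flow data involves, here is a hypothetical Python sketch. It is not CV UNO's internals; the flow records and service names are invented, and it only shows how flow records can be reduced to service-to-service dependencies.

```python
# Conceptual sketch only, not CV UNO's internals; the flow records and
# service names below are invented for illustration.
from collections import defaultdict

# Hypothetical flow records as they might land in a centralized data lake
flows = [
    {"src": "web-01", "dst": "app-01", "dst_port": 8080, "bytes": 120_000},
    {"src": "app-01", "dst": "db-01",  "dst_port": 5432, "bytes": 45_000},
    {"src": "web-02", "dst": "app-01", "dst_port": 8080, "bytes": 98_000},
]

def build_dependency_map(records):
    """Aggregate flows into (src, dst, port) edges with byte counts,
    the raw material for an application dependency map."""
    edges = defaultdict(int)
    for r in records:
        edges[(r["src"], r["dst"], r["dst_port"])] += r["bytes"]
    return edges

if __name__ == "__main__":
    for (src, dst, port), total in sorted(build_dependency_map(flows).items()):
        print(f"{src} -> {dst}:{port}  {total} bytes")
```

A real system would correlate these edges with device telemetry and events; the sketch only shows why centralizing flow data makes that mapping possible.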
Notably, while CV UNO can solve problems related to Arista’s networking environment, it can’t change third-party devices.
NIC- and GPU-Agnostic
As noted, CLB and CV UNO run across Arista’s Etherlink portfolio of switches and are agnostic to NICs and GPUs from other providers. And by building on standard Ethernet, Arista supports an approach increasingly taking shape across enterprises: AI networking infrastructure that remains compatible with traditional Ethernet. Avoiding vendor lock-in saves customers money and widens their networking options.
This is an important strategic move by Arista: stay true to standard Ethernet while offering its own "secret sauce" in software. With Cisco's recent partnership with NVIDIA lining it up behind Spectrum-X, the battle lines are being drawn.
CLB is initially available on Arista’s 7260X3, 7280R3, 7500R3 and 7800R3 switches, which also support CV UNO today. In the second quarter of this year, CLB will become available on Arista’s 7060X6 and 7060X5 switches, as will CV UNO’s AI enhancements, including third-party observability. CLB support on the Arista 7800R4 platform is scheduled for the second half of this year.
Futuriom Take: Arista’s Cluster Load Balancing for its switches shows how quickly the company is innovating to adapt industry-standard Ethernet for AI workloads. By adding AI-driven observability of third-party gear to its network management SaaS, the vendor is looking to extend the value of its management platform.