DriveNets Dives Into AI Networking
It’s time to pile into AI! Cloud networking pioneer DriveNets jumped on the AI wave this week, launching an AI-specific networking product that it says improves AI performance by cutting GPU idle time by as much as 30%.
You might as well call it AI week: DriveNets' announcement lands in the same week that NVIDIA stock surged on a string of AI-related news. It could mark the beginning of a new networking arms race, as vendors compete to supply the networks that will fuel the AI boom.
Called Network Cloud-AI, the product will address what DriveNets says is a $10 billion market opportunity for networking to support AI clusters. DriveNets says that Network Cloud-AI, based on Ethernet, has been validated by leading hyperscalers in recent trials as a cost-effective solution.
“AI compute resources are extremely costly and must be fully utilized to avoid ‘idle cycles’ as they await networking tasks,” said Ido Susan, DriveNets co-founder and CEO, in a company statement. “Leveraging our experience supporting the world’s largest networks, we have developed DriveNets Network Cloud-AI. Network Cloud-AI has already achieved up to a 30% reduction in idle time in recent trials, enabling exponentially higher AI throughput compared to a standard Ethernet solution. This reduction also means the network effectively ‘pays for itself’ through more efficient use of AI resources.”
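To put the CEO's “pays for itself” framing in perspective, here's a back-of-envelope sketch in Python of how a 30% cut in idle time translates into recovered GPU-hours. The cluster size, baseline idle fraction, and hourly GPU cost are illustrative assumptions, not DriveNets figures:

```python
# Illustrative back-of-envelope math (assumed figures, not DriveNets data):
# how a 30% cut in idle time changes GPU utilization and daily cost.

GPU_HOURLY_COST = 2.50   # assumed cost per GPU-hour, in dollars
CLUSTER_GPUS = 1024      # assumed cluster size
BASELINE_IDLE = 0.25     # assume GPUs sit idle 25% of the time waiting on the network

# A 30% reduction applies to the idle fraction itself.
improved_idle = BASELINE_IDLE * (1 - 0.30)   # 0.25 -> 0.175

baseline_util = 1 - BASELINE_IDLE            # 75% busy
improved_util = 1 - improved_idle            # 82.5% busy

# GPU-hours wasted per day, before and after
wasted_before = CLUSTER_GPUS * 24 * BASELINE_IDLE
wasted_after = CLUSTER_GPUS * 24 * improved_idle

print(f"Utilization: {baseline_util:.1%} -> {improved_util:.1%}")
print(f"Idle GPU-hours/day: {wasted_before:,.0f} -> {wasted_after:,.0f}")
print(f"Daily savings: ${(wasted_before - wasted_after) * GPU_HOURLY_COST:,.0f}")
```

Under these assumed numbers, the fabric recovers roughly 1,800 GPU-hours a day, which is the sense in which better networking can offset its own cost.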
Why AI Will Tax the Network
DriveNets maintains that the traditional Ethernet leaf-and-spine architecture doesn’t support high-performance AI workloads at scale. It also says that some industry solutions, such as NVIDIA's InfiniBand, are too proprietary and don’t promote interoperability.
Indeed, networking experts have pointed out that AI applications require a different network architecture because they typically involve rapid-fire exchanges of data inside a datacenter – known as East-West traffic – which demands very high-bandwidth connectivity over short distances.
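To see why this East-West traffic is so demanding, consider a rough estimate of the data moved in a single gradient-synchronization step of distributed training. The model size, GPU count, and link speed below are illustrative assumptions, not figures from the article:

```python
# Rough, illustrative estimate (assumed figures) of the East-West traffic
# generated by one gradient-synchronization step of distributed training.

PARAMS = 10e9          # assume a 10B-parameter model
BYTES_PER_GRAD = 2     # fp16 gradients
GPUS = 256             # assumed data-parallel group size

grad_bytes = PARAMS * BYTES_PER_GRAD   # ~20 GB of gradients per step

# A ring all-reduce moves roughly 2 * (N-1)/N * payload per GPU per step.
per_gpu_bytes = 2 * (GPUS - 1) / GPUS * grad_bytes

# Time spent just moving gradients over a 400 Gb/s (50 GB/s) link:
link_gbps = 400
seconds = per_gpu_bytes / (link_gbps / 8 * 1e9)

print(f"Per-GPU traffic per step: {per_gpu_bytes / 1e9:.1f} GB")
print(f"Transfer time at {link_gbps} Gb/s: {seconds:.2f} s")
```

Even at 400 Gb/s, each GPU spends the better part of a second per step shuffling gradients – time the GPUs sit idle unless the network keeps up.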
Meanwhile, the major cloud providers have announced billions of dollars of investment in new infrastructure to provide the backbone for AI services. Generative AI applications, which let humans use natural language to generate text, art, and images, are taking off – and because an AI model is only as good as the data and compute power that fuel it, demand for both will keep climbing.
These AI applications place massive demands on the infrastructure that runs training algorithms and deep-learning software. That means large volumes of specialized chips such as graphics processing units (GPUs), plus better networks with high-bandwidth connections to tie all of those GPUs together. It also brings heavy energy demands, so reducing the infrastructure's power consumption will be key.
What DriveNets Is Delivering
DriveNets has long promoted the idea that cloud-based, open networking solutions deliver lower cost and better energy efficiency. It already supplies its networking software to major service providers such as AT&T.
The DriveNets platform is also disaggregated, meaning it runs as software on industry-standard servers rather than locking customers into software and hardware from the same vendor. DriveNets says this is the key to building affordable, massively scalable networking for AI clusters on open hardware.
Network Cloud-AI uses the Open Compute Project's (OCP) Distributed Disaggregated Chassis (DDC) architecture to distribute a leaf-and-spine network across a cluster of white-box servers. DriveNets says this approach yields the massive scale needed to connect large arrays of GPUs in a compute cluster, all on standard Ethernet.
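For intuition on why distributing the chassis matters, here's a minimal sketch of the standard leaf-and-spine port math that caps how many GPUs a single two-tier Ethernet fabric can attach. Port counts are assumptions for illustration, not DriveNets hardware specs:

```python
# Minimal sketch of leaf-and-spine capacity math. Port counts are
# assumptions for illustration, not DriveNets hardware specs.

LEAF_PORTS = 64        # assumed ports per white-box leaf switch
SPINE_PORTS = 64       # assumed ports per white-box spine switch

# In a non-blocking design, half of each leaf's ports face the GPUs
# and half face the spine layer.
gpus_per_leaf = LEAF_PORTS // 2
uplinks_per_leaf = LEAF_PORTS - gpus_per_leaf

# Each spine port terminates one leaf uplink, so the fabric tops out at:
max_leaves = SPINE_PORTS            # one port per leaf on every spine
max_spines = uplinks_per_leaf       # one spine per leaf uplink
max_gpus = max_leaves * gpus_per_leaf

print(f"Leaves: {max_leaves}, Spines: {max_spines}, GPUs: {max_gpus}")
# -> 2,048 GPUs for a single two-tier fabric of 64-port boxes;
#    scaling far beyond that is what DDC-style designs aim to solve.
```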
Here's more of the data provided by DriveNets on its Network Cloud-AI:
Scale – Network Cloud-AI connects up to 32,000 GPUs in a single AI cluster, at speeds ranging from 100G to 800G, with load balancing across the fabric.
Utilization – Network Cloud-AI distributes traffic evenly across the AI network fabric, ensuring maximum network utilization and zero packet loss under the highest loads (see the sketch after this list).
Efficiency – DriveNets says that trials by leading hyperscalers using Network Cloud-AI over white boxes with Broadcom’s Jericho chipset achieved up to 30% improvement in JCT (Job Completion Time) compared to other Ethernet solutions.
Congestion and failover – DriveNets Network Cloud-AI supports congestion-free operations through end-to-end traffic scheduling, avoiding flow collisions and jitter. It also provides failover with sub-10ms automatic path convergence.
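As a rough intuition for the utilization claim above, the toy sketch below contrasts hash-based ECMP load balancing, where an entire flow sticks to one path, with even spraying of traffic across all paths. It illustrates the general technique, not DriveNets' actual scheduling algorithm:

```python
# Toy comparison (illustrative only) of hash-based ECMP vs. even spraying.
# Hashing can pile large "elephant" flows onto one link; spraying keeps
# all links evenly loaded.

import random

LINKS = 8
flows = [random.choice([1, 1, 1, 10]) for _ in range(64)]  # a few elephants

# ECMP-style: each flow is hashed (modeled here as random) onto one link.
ecmp = [0] * LINKS
for size in flows:
    ecmp[random.randrange(LINKS)] += size

# Spray-style: every flow's traffic is split evenly across all links.
spray = [sum(flows) / LINKS] * LINKS

print("ECMP link loads: ", ecmp)
print("Spray link loads:", [round(x, 1) for x in spray])
print(f"ECMP max/min imbalance: {max(ecmp) / max(min(ecmp), 1):.1f}x")
```

Run it a few times and the ECMP loads swing widely while the sprayed loads stay flat – the imbalance that scheduled, cell-level fabrics are designed to avoid.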
Altogether, it's a significant launch, and one that raises the question of how the rest of the networking industry will respond to the AI boom.
DriveNets was selected by the Futuriom analyst team for the prestigious F50 list of the most promising startups. The company has raised more than $587 million in three funding rounds.