
What is RoCE (RDMA over Converged Ethernet), and how Ubuntu supports it

Ubuntu 24.10 (Oracular Oriole) desktop with GNOME
Image: Canonical Ltd. / GPL · Wikimedia Commons

Canonical has published a technical primer on RDMA over Converged Ethernet (RoCE), the technology that brings InfiniBand’s remote memory access model to ordinary Ethernet networks. This matters because AI training and HPC jobs shuttle enormous amounts of data between nodes, and every microsecond of latency counts when hundreds of GPUs are waiting on each other.

What RoCE solves

RDMA lets one machine read from or write to another machine’s memory directly, skipping the kernel and intermediate copies. Doing that traditionally meant InfiniBand, a dedicated and expensive fabric. RoCE keeps the same programming interface (the RDMA verbs) but swaps the transport for standard Ethernet, the network you already run in the data center.

There are two variants, and the difference is practical. RoCEv1 stays inside a single Layer 2 broadcast domain, so its reach is limited. RoCEv2 encapsulates RDMA traffic in UDP/IP, which makes it routable at Layer 3 and a fit for leaf-spine topologies. That is why production deployments standardize on RoCEv2: it scales across racks without falling apart.
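To make the encapsulation difference concrete, here is a minimal Python sketch that labels a raw Ethernet frame as RoCEv1 or RoCEv2. It relies on two registered values: Ethertype 0x8915 for RoCEv1 and UDP destination port 4791 for RoCEv2. The function name `classify_roce` is hypothetical, and the sketch assumes untagged frames (no VLAN header) and IPv4 only, which real traffic will not always satisfy.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915  # RoCEv1 rides directly on Ethernet (Layer 2 only)
ROCE_V2_UDP_PORT = 4791     # RoCEv2 rides on UDP/IP (routable at Layer 3)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_UDP = 17
ETH_HEADER_LEN = 14         # assumption: no 802.1Q VLAN tag present


def classify_roce(frame: bytes) -> str:
    """Return 'RoCEv1', 'RoCEv2', or 'other' for an untagged Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ROCE_V1_ETHERTYPE:
        return "RoCEv1"
    # Assumption: only IPv4 is inspected; IPv6 would need its own branch.
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ETH_HEADER_LEN + 20:
        return "other"
    ip_header_len = (frame[ETH_HEADER_LEN] & 0x0F) * 4
    protocol = frame[ETH_HEADER_LEN + 9]
    udp_offset = ETH_HEADER_LEN + ip_header_len
    if protocol != IPPROTO_UDP or len(frame) < udp_offset + 8:
        return "other"
    (dst_port,) = struct.unpack_from("!H", frame, udp_offset + 2)
    return "RoCEv2" if dst_port == ROCE_V2_UDP_PORT else "other"
```

Because the RoCEv2 check only looks at ordinary IP and UDP headers, any Layer 3 router can forward the traffic, which is the practical point of the second variant.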

Ethernet’s problem

Ethernet delivers packets on a best-effort basis, meaning it can drop them under congestion. RDMA expects a lossless network, and performance degrades sharply when packets go missing. Canonical walks through two mechanisms that close the gap:

  • Priority Flow Control (PFC) pauses traffic before a switch queue overflows. The cost is head-of-line blocking, where unrelated traffic gets stuck behind whatever triggered the pause.
  • Explicit Congestion Notification (ECN) marks packets before queues fill up, so endpoints throttle their send rate gradually.

PFC is part of the IEEE Data Center Bridging (DCB) standards, and DCQCN builds on ECN marks to regulate transmission rate dynamically.
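As a rough illustration of how ECN feedback turns into a gradual rate change, the sketch below follows the sender-side update rules commonly described for DCQCN: a congestion notification cuts the current rate in proportion to a running congestion estimate `alpha`, and a quiet period decays `alpha` and moves the rate halfway back toward the previous target. The class name, the starting values, and the single combined "quiet period" step are simplifying assumptions; real NICs implement this in hardware with separate timers and additional increase phases.

```python
from dataclasses import dataclass


@dataclass
class DcqcnSender:
    """Simplified DCQCN-style rate state for one flow (illustrative only)."""

    current_rate: float   # Gbit/s the flow is sending at right now
    target_rate: float    # Gbit/s the flow tries to recover toward
    alpha: float = 1.0    # running estimate of congestion severity, 0..1
    g: float = 1 / 256    # assumed smoothing gain for alpha

    def on_congestion_notification(self) -> None:
        """Receiver saw ECN-marked packets and told the sender to slow down."""
        self.target_rate = self.current_rate
        self.current_rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self) -> None:
        """No notification arrived for a while: decay alpha and recover."""
        self.alpha *= 1 - self.g
        self.current_rate = (self.target_rate + self.current_rate) / 2
```

Starting from 100 Gbit/s with `alpha` at 1.0, one notification halves the rate to 50 Gbit/s, and one quiet period afterwards brings it back to 75 Gbit/s. The cut shrinks as `alpha` decays, which is what makes the throttling gradual rather than all-or-nothing.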

RoCE traffic is highly synchronized and bursty, especially during AI training. When several senders hit the same receiver at once you get incast: switch buffers fill rapidly and packets drop, which stalls queue pairs and sends latency ripples through the whole workload.
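A back-of-the-envelope calculation shows how quickly incast can exhaust a buffer. The sketch below assumes every sender and the receiver share the same link speed, that the whole buffer is available to the congested port, and that nothing (PFC, ECN, or the application) slows the senders down. The function name and the example numbers are illustrative, not taken from the primer.

```python
def incast_buffer_fill_time_us(senders: int, link_gbps: float, buffer_mb: float) -> float:
    """Microseconds until the buffer toward one receiver overflows.

    Arrival rate is senders * link_gbps, drain rate is link_gbps, so the
    buffer fills at (senders - 1) * link_gbps.
    """
    if senders <= 1:
        return float("inf")  # a single sender cannot outrun the receiver link
    excess_bits_per_second = (senders - 1) * link_gbps * 1e9
    buffer_bits = buffer_mb * 8e6  # decimal megabytes to bits
    return buffer_bits / excess_bits_per_second * 1e6
```

With 8 senders on 100 Gbit/s links and a 16 MB buffer, the result is roughly 183 microseconds, which is why congestion signals need to act before the queue is full rather than after.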

What’s coming next

The primer surveys what is moving beyond classic RoCE. NVIDIA Spectrum-X combines Spectrum switches, BlueField DPUs, adaptive routing, and telemetry into one RoCE-optimized Ethernet platform. The Ultra Ethernet Consortium, formed in 2023 under the Linux Foundation, focuses on stronger congestion signaling, multipathing, packet spraying, and tolerance for out-of-order delivery. Google’s Falcon adds hardware-assisted retransmission and programmable congestion control. Broadcom’s Scale-Up Ethernet (SUE) uses credit-based flow control and in-network collectives for tightly coupled accelerators.

What Ubuntu brings

Here is the part that matters if you deploy this. Ubuntu ships the rdma-core stack with libibverbs, which provides the consistent RDMA programming interface, plus kernel drivers from the major vendors: mlx5 (NVIDIA/Mellanox ConnectX), irdma (Intel E810), bnxt_re (Broadcom NetXtreme-E), and Intel’s ice/ixgbe. SR-IOV and VF representors are available natively.

For day-to-day operations you get ethtool and devlink-health to expose NIC capabilities, tc with mqprio to shape traffic classes and enable PFC, and cgroups with CPU pinning to keep data paths predictable. Counters under /sys/class/infiniband handle monitoring. On the orchestration side, MAAS provisions bare metal and Juju manages the lifecycle of NIC firmware and network configuration, with support for SR-IOV operators, Multus-based secondary networking, and the NVIDIA Network Operator.
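For the monitoring side, a small Python sketch can walk the sysfs tree and collect whatever integer counters the driver exposes. It assumes the usual layout of `/sys/class/infiniband/<device>/ports/<port>/counters` plus an optional `hw_counters` directory; which counters actually appear depends on the NIC and driver, and the function name `read_rdma_counters` is hypothetical.

```python
from pathlib import Path


def read_rdma_counters(root: str = "/sys/class/infiniband") -> dict:
    """Return {(device, port): {counter_name: value}} for readable counters."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # no RDMA devices, or not a Linux host
    for device in sorted(base.iterdir()):
        ports_dir = device / "ports"
        if not ports_dir.is_dir():
            continue
        for port in sorted(ports_dir.iterdir()):
            counters = {}
            # Assumption: drivers may expose either or both directories.
            for group in ("counters", "hw_counters"):
                group_dir = port / group
                if not group_dir.is_dir():
                    continue
                for entry in sorted(group_dir.iterdir()):
                    try:
                        counters[entry.name] = int(entry.read_text().strip())
                    except (OSError, ValueError):
                        continue  # skip unreadable or non-numeric files
            result[(device.name, port.name)] = counters
    return result


if __name__ == "__main__":
    for (device, port), counters in read_rdma_counters().items():
        print(device, port, len(counters), "counters")
```

Sampling this twice and diffing the values is usually more informative than a single reading, since most of these counters only ever increase.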

All of this ships by default in the LTS releases, so you don’t need to patch or recompile anything to start testing RoCE. The Ubuntu page has the rest of the releases and the long-term support details.

Source

Original article: “What is RDMA over Converged Ethernet (RoCE)?”, published by Canonical on the Ubuntu blog.