Joerg Hiller
March 14, 2025 02:22
NVIDIA's latest NCCL release, version 2.24, introduces new features to improve multi-GPU and multinode communication, including the RAS subsystem, NIC Fusion, and FP8 support, optimizing deep learning training.
The NVIDIA Collective Communications Library (NCCL) has released its latest version, 2.24, bringing significant improvements in networking reliability and observability for multi-GPU and multinode (MGMN) communication. As reported by the NVIDIA Developer Blog, NCCL is optimized specifically for NVIDIA GPUs and networking, making it an essential component of multi-GPU deep learning training.
NCCL 2.24 new features
The update includes several new features aimed at improving performance and reliability:
- Reliability, Availability, and Serviceability (RAS) subsystem
- User buffer (UB) registration for multinode collectives
- NIC Fusion
- Optional receive completions
- FP8 support
- Strict enforcement of NCCL_ALGO and NCCL_PROTO
The RAS subsystem
The RAS subsystem is one of the standout additions in NCCL 2.24. It is designed to help users diagnose application problems such as crashes and hangs, especially in large-scale deployments. This low-overhead infrastructure offers a global view of the running application, allowing the detection of anomalies such as unresponsive nodes or lagging processes. It works by creating a network of threads across NCCL processes that monitor each other's health through regular keep-alive messages.
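The keep-alive idea described above can be illustrated with a small, self-contained sketch. This is not NCCL's implementation: the `PeerMonitor` class, the rank names, and the 5-second timeout are all hypothetical, chosen only to show how a missing keep-alive can flag an unresponsive peer.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Toy model (hypothetical): remembers when each peer last sent a keep-alive."""
    timeout_s: float
    last_seen: dict = field(default_factory=dict)

    def record_keepalive(self, peer: str, now: float) -> None:
        # Store the timestamp of the most recent keep-alive from this peer.
        self.last_seen[peer] = now

    def unresponsive(self, now: float) -> list:
        # A peer is flagged once it has been silent for longer than the timeout.
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = PeerMonitor(timeout_s=5.0)
monitor.record_keepalive("rank0", now=0.0)
monitor.record_keepalive("rank1", now=0.0)
monitor.record_keepalive("rank0", now=4.0)  # rank1 sends nothing further
stalled = monitor.unresponsive(now=6.0)
print(stalled)
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to reason about.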
User buffer registration improvements
NCCL 2.24 introduces user buffer (UB) registration for multinode collectives, allowing more efficient data transfer and lower GPU resource consumption. The library now supports UB registration for multiple-ranks-per-node collective networking and for standard peer-to-peer networks, offering significant performance gains, particularly for operations such as AllGather and Broadcast.
NIC Fusion
With the emergence of systems that have many NICs per GPU, NCCL has adapted to optimize network communication. The new NIC Fusion feature allows several NICs to be logically merged into a single entity, ensuring efficient use of network resources. This capability is particularly beneficial for systems with more than one NIC per GPU, addressing problems such as crashes and inefficient resource allocation.
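As a rough illustration of what "logically merging several NICs into a single entity" means, the toy model below groups NICs by the GPU they serve and presents each group as one logical device. How NCCL actually decides which NICs to fuse is not described in this article; the `Nic` type, the device names, and the bandwidth figures are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    name: str   # illustrative device name
    gpu: int    # GPU this NIC is assumed to serve
    gbps: int   # nominal link speed, illustrative


def fuse_by_gpu(nics):
    """Group NICs per GPU and expose each group as one logical device."""
    groups = defaultdict(list)
    for nic in nics:
        groups[nic.gpu].append(nic)
    return {
        gpu: {
            "members": [n.name for n in members],
            "gbps": sum(n.gbps for n in members),  # aggregate nominal bandwidth
        }
        for gpu, members in groups.items()
    }


nics = [
    Nic("mlx5_0", gpu=0, gbps=200),
    Nic("mlx5_1", gpu=0, gbps=200),
    Nic("mlx5_2", gpu=1, gbps=200),
    Nic("mlx5_3", gpu=1, gbps=200),
]
fused = fuse_by_gpu(nics)
print(fused[0])
```

In this model, each GPU sees a single logical network device backed by two physical NICs instead of two separate devices.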
Additional features and fixes
The update also introduces optional receive completions for the LL and LL128 protocols, reducing overhead and congestion. NCCL 2.24 supports native FP8 reductions on NVIDIA Hopper and newer architectures, improving processing capabilities. In addition, stricter enforcement of NCCL_ALGO and NCCL_PROTO is implemented, giving users more precise tuning and error handling.
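NCCL_ALGO and NCCL_PROTO are environment variables, so they are typically set before the training process starts. The sketch below builds an environment for a child process and checks the chosen values first. The `nccl_env` helper is hypothetical and not part of NCCL, and the sets of accepted names are assumptions based on NVIDIA's environment-variable documentation that should be checked against the installed NCCL version.

```python
import os

# Assumed value names; verify against the documentation for your NCCL version.
KNOWN_ALGOS = {"Tree", "Ring", "CollnetDirect", "CollnetChain", "NVLS", "NVLSTree", "PAT"}
KNOWN_PROTOS = {"LL", "LL128", "Simple"}


def nccl_env(algo, proto, base=None):
    """Return a copy of the environment with NCCL_ALGO and NCCL_PROTO set.

    Hypothetical helper: it rejects names outside the assumed sets so that a
    typo is caught before the job is launched.
    """
    if algo not in KNOWN_ALGOS:
        raise ValueError(f"unknown NCCL_ALGO value: {algo!r}")
    if proto not in KNOWN_PROTOS:
        raise ValueError(f"unknown NCCL_PROTO value: {proto!r}")
    env = dict(os.environ if base is None else base)
    env["NCCL_ALGO"] = algo
    env["NCCL_PROTO"] = proto
    return env


env = nccl_env("Ring", "Simple", base={})
print(env)
# The result can be passed to a launcher, e.g. subprocess.run(cmd, env=env).
```

Checking the values up front is a convenience only; with stricter enforcement in the library, a mistyped or unsupported setting is better caught before a large job is scheduled.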
This update also includes various bug fixes and minor improvements, such as adjustments to PAT tuning and improvements to memory allocation functions, improving the overall robustness and efficiency of the NCCL library.