NCCL

Optimized primitives for collective multi-GPU communication