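UCC (Unified Collective Communication) is a collective communication library within UCF (Unified Communication Framework) that offers a rich set of features and APIs. This post covers only the part about how collective communication algorithms are selected. My understanding is still not very thorough, so treat this as a placeholder; once I know UCC more fully, I plan to write a few posts that cover it in detail. The output of ucc_info -A below lists the algorithms available for each collective.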
[rdma22@admin1 ~]$ ucc_info -A
cl/hier algorithms:
  Allreduce
    0 : rab : intra-node reduce, followed by inter-node allreduce, followed by innode broadcast
    1 : split_rail : intra-node reduce_scatter, followed by PPN concurrent inter-node allreduces, followed by intra-node allgather
  Alltoall
    0 : node_split : splitting alltoall into two concurrent a2av calls withing the node and outside of it
  Alltoallv
    0 : node_split : splitting alltoallv into two concurrent a2av calls withing the node and outside of it
tl/ucp algorithms:
  Allgather
    0 : ring : O(N) Ring
  Allgatherv
    0 : ring : O(N) Ring
  Allreduce
    0 : knomial : recursive knomial with arbitrary radix (optimized for latency)
    1 : sra_knomial : recursive knomial scatter-reduce followed by knomial allgather (optimized for BW)
  Alltoall
    0 : pairwise : pairwise two-sided implementation
    1 : onesided : naive, linear one-sided implementation
  Alltoallv
    0 : pairwise : O(N) pairwise exchange with adjustable number of outstanding sends/recvs
  Barrier
    0 : knomial : recursive knomial with arbitrary radix
  Bcast
    0 : knomial : bcast over knomial tree with arbitrary radix (optimized for latency)
    1 : sag_knomial : recursive knomial scatter followed by knomial allgather (optimized for BW)
  Fanin
    0 : knomial : fanin over knomial tree with arbitrary radix
  Fanout
    0 : knomial : fanout over knomial tree with arbitrary radix
  Gather
    0 : knomial : gather over knomial tree with arbitrary radix (optimized for latency)
  Reduce
    0 : knomial : reduce over knomial tree with arbitrary radix (optimized for latency)
  Reduce_scatter
    0 : ring : O(N) ring
  Reduce_scatterv
    0 : ring : O(N) ring
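Since each algorithm is identified by a numeric ID and a short name under its component (cl/hier or tl/ucp) and collective, it can be handy to turn this listing into a lookup table. The following is only a rough sketch in Python, not part of UCC: the names parse_ucc_algorithms, algorithm_id, and SAMPLE are made up for illustration, and the parser assumes the layout of one "<component> algorithms:" header per section, one collective name per line, and one "id : name : description" entry per line.

```python
import re

# Hypothetical helper names; the layout assumptions are described above.
SECTION_RE = re.compile(r"^(\S+) algorithms:$")
ALG_RE = re.compile(r"^(\d+)\s*:\s*(\w+)\s*:\s*(.+)$")

# Excerpt of the listing above, used as sample input.
SAMPLE = """\
cl/hier algorithms:
  Allreduce
    0 : rab : intra-node reduce, followed by inter-node allreduce, followed by innode broadcast
    1 : split_rail : intra-node reduce_scatter, followed by PPN concurrent inter-node allreduces, followed by intra-node allgather
tl/ucp algorithms:
  Allreduce
    0 : knomial : recursive knomial with arbitrary radix (optimized for latency)
    1 : sra_knomial : recursive knomial scatter-reduce followed by knomial allgather (optimized for BW)
  Bcast
    0 : knomial : bcast over knomial tree with arbitrary radix (optimized for latency)
    1 : sag_knomial : recursive knomial scatter followed by knomial allgather (optimized for BW)
"""


def parse_ucc_algorithms(text):
    """Return {component: {collective: {id: (name, description)}}}."""
    table = {}
    component = None
    collective = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        section = SECTION_RE.match(line)
        if section:
            # A new component section such as "tl/ucp algorithms:".
            component = section.group(1)
            table[component] = {}
            collective = None
            continue
        entry = ALG_RE.match(line)
        if entry and component is not None and collective is not None:
            # An "id : name : description" entry for the current collective.
            alg_id = int(entry.group(1))
            table[component][collective][alg_id] = (entry.group(2), entry.group(3))
            continue
        if component is not None:
            # Any other line inside a section is taken as a collective name.
            collective = line
            table[component][collective] = {}
    return table


def algorithm_id(table, component, collective, name):
    """Look up the numeric ID of a named algorithm, or None if it is absent."""
    for alg_id, (alg_name, _description) in table[component][collective].items():
        if alg_name == name:
            return alg_id
    return None


if __name__ == "__main__":
    algorithms = parse_ucc_algorithms(SAMPLE)
    print(algorithm_id(algorithms, "tl/ucp", "Allreduce", "sra_knomial"))
```

Keying the table by component first mirrors the listing itself, where the same collective (for example Allreduce) appears under both cl/hier and tl/ucp with different algorithms and independent ID numbering.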