allreduce

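Function Description

Operation interface for the AllReduce collective communication operator. It performs a reduce operation on the input data of all nodes in the group and then sends the result to the output buffer of every node. The reduce operation type is specified by the reduction parameter.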
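The semantics described above can be illustrated without any NPU. The following is a minimal pure-Python sketch, not the HCCL implementation: simulate_allreduce is a hypothetical helper name, each "rank" is modeled as a plain list, and the four reduction names are taken from the reduction parameter of this interface.

```python
from functools import reduce
import operator

# Binary reducers for the four reduction names documented for this interface.
_REDUCERS = {
    "sum": operator.add,
    "prod": operator.mul,
    "max": max,
    "min": min,
}

def simulate_allreduce(rank_inputs, reduction):
    """Hypothetical reference model: reduce element-wise across ranks,
    then hand every rank its own copy of the reduced result."""
    if reduction not in _REDUCERS:
        raise ValueError("unsupported reduction: %r" % (reduction,))
    op = _REDUCERS[reduction]
    # zip(*rank_inputs) groups the i-th element contributed by every rank.
    reduced = [reduce(op, column) for column in zip(*rank_inputs)]
    return [list(reduced) for _ in rank_inputs]

# Two ranks, each contributing a 3-element input.
outputs = simulate_allreduce([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], "sum")
```

Every entry of outputs holds the same reduced values, which mirrors the statement that the result is sent to the output buffer of every node.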

Function Prototype

def allreduce(tensor, reduction, fusion=1, fusion_id=-1, group="hccl_world_group")

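Parameters

Parameter	Input/Output	Description
tensor	Input	A TensorFlow tensor. For Atlas training series products, the supported data types are int8, int32, int64, float16, and float32. For the Atlas 300I Duo inference card, the supported data types are int8, int16, int32, float16, and float32. For Atlas A2 training series products, the supported data types are int8, int16, int32, int64, float16, float32, and bfp16.
reduction	Input	String. The reduce op type, which can be max, min, prod, or sum. Note: For the Atlas 300I Duo inference card, the "prod", "max", and "min" operations do not support the int16 data type in the current version. For Atlas A2 training series products, the "prod" operation does not support the int16 or bfp16 data types in the current version.
fusion	Input	int. Fusion flag of the allreduce operator. 0: no fusion; this allreduce operator is not fused with other allreduce operators. 1 (default): fusion according to the gradient segmentation policy. 2: fusion of operators that share the same fusion_id.
fusion_id	Input	Fusion ID of the allreduce operator. allreduce operators with the same fusion_id are fused.
group	Input	String, with a maximum length of 128 bytes including the terminator. The group name, which can be a user-defined group or "hccl_world_group".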
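The data type and reduction rules in the table can be encoded as a small lookup for checking a planned call before a job is launched. This is only a sketch: is_supported and the two dictionaries are hypothetical names that do not belong to npu_bridge, the product keys are shortened labels chosen for illustration, and the contents are copied from the tensor and reduction rows above.

```python
# Supported tensor data types per product, copied from the "tensor" row.
SUPPORTED_DTYPES = {
    "Atlas training series": {"int8", "int32", "int64", "float16", "float32"},
    "Atlas 300I Duo": {"int8", "int16", "int32", "float16", "float32"},
    "Atlas A2 training series": {
        "int8", "int16", "int32", "int64", "float16", "float32", "bfp16",
    },
}

# (reduction, dtype) pairs excluded in the current version, from the "reduction" row.
UNSUPPORTED_COMBOS = {
    "Atlas 300I Duo": {("prod", "int16"), ("max", "int16"), ("min", "int16")},
    "Atlas A2 training series": {("prod", "int16"), ("prod", "bfp16")},
}

REDUCTIONS = {"max", "min", "prod", "sum"}

def is_supported(product, dtype, reduction):
    """Hypothetical check: True if the table lists this combination as supported."""
    if reduction not in REDUCTIONS:
        return False
    if dtype not in SUPPORTED_DTYPES.get(product, set()):
        return False
    return (reduction, dtype) not in UNSUPPORTED_COMBOS.get(product, set())
```

For example, is_supported("Atlas A2 training series", "bfp16", "prod") evaluates to False, while the same data type with "sum" evaluates to True.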

Return Value

tensor: the result tensor obtained after the allreduce operation is performed on the input tensor.

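Constraints

The rank that calls this API must be within the range defined by the group argument passed to the API. Calls from ranks outside this range fail.
Each rank can have only one input.
The upstream node of allreduce does not currently support the variable operator.
The data size of the input tensor must not exceed 8 GB.
allreduce operator fusion supports only operators whose reduction is sum.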
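Two of the constraints above, the 8 GB input limit and the sum-only fusion rule, lend themselves to a simple pre-flight check. The helpers below are hypothetical and are not part of npu_bridge. They assume that "8 GB" means 8 * 1024**3 bytes, which the text does not specify, and they only report whether a condition holds, because the behavior of the library when a constraint is violated is not described here.

```python
# Assumption: "8 GB" is read as 8 * 1024**3 bytes; the text does not say
# whether the limit is binary or decimal.
MAX_INPUT_BYTES = 8 * 1024 ** 3

def input_size_bytes(shape, itemsize):
    """Bytes occupied by a dense tensor with the given shape and element size."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize

def within_size_limit(shape, itemsize):
    """True if the input does not exceed the assumed 8 GB limit."""
    return input_size_bytes(shape, itemsize) <= MAX_INPUT_BYTES

def fusion_applicable(reduction, fusion):
    """True when fusion is requested (1 or 2) and the reduction is "sum"."""
    if fusion not in (0, 1, 2):
        raise ValueError("fusion must be 0, 1, or 2, got %r" % (fusion,))
    return fusion != 0 and reduction == "sum"
```

A float32 tensor of shape (1, 3) occupies 12 bytes and passes the size check; fusion_applicable("max", 1) returns False because only sum reductions can be fused.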

Supported Products

Atlas training series products

Atlas 300I Duo inference card

Atlas A2 training series products

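Example

import tensorflow as tf
from npu_bridge.npu_init import *

# Build a 1x3 float32 input and sum it across all ranks in the default group.
tensor = tf.random_uniform((1, 3), minval=1, maxval=10, dtype=tf.float32)
result = hccl_ops.allreduce(tensor, "sum")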

Parent topic: npu_bridge.hccl.hccl_ops