使用说明

Ascend C提供一组Matmul高阶API，方便用户快速实现Matmul矩阵乘法的运算操作。

MatMul的计算公式为：C = A * B + Bias，其示意图如下。

A、B为源操作数，A为左矩阵，形状为[M, K]；B为右矩阵，形状为[K, N]。
C为目的操作数，存放矩阵乘结果的矩阵，形状为[M, N]。
Bias为矩阵乘偏置，形状为[1, N]。对A*B结果矩阵的每一行都采用该bias进行偏置。

图1 Matmul矩阵乘示意图

下文中提及的M轴方向，即为A矩阵纵向；K轴方向，即为A矩阵横向或B矩阵纵向；N轴方向，即为B矩阵横向。

实现MatMul矩阵乘运算的具体步骤如下：

创建Matmul对象。
初始化操作
设置左矩阵A、右矩阵B、Bias。
完成矩阵乘操作。
结束矩阵乘操作。

创建Matmul对象

创建Matmul对象的示例如下：

CUBE_ONLY（只有矩阵计算）场景下，需要设置ASCENDC_CUBE_ONLY代码宏
默认为MIX模式（包含矩阵计算和矢量计算），该场景下，不能设置ASCENDC_CUBE_ONLY代码宏

// 纯cube模式（只有矩阵计算）场景下，需要设置该代码宏，并且必须在#include "lib/matmul_intf.h"之前设置
// #define ASCENDC_CUBE_ONLY 
#include "lib/matmul_intf.h"

typedef MatmulType<TPosition::GM, CubeFormat::ND, half> aType; 
typedef MatmulType<TPosition::GM, CubeFormat::ND, half> bType; 
typedef MatmulType<TPosition::GM, CubeFormat::ND, float> cType; 
typedef MatmulType<TPosition::GM, CubeFormat::ND, float> biasType; 
Matmul<aType, bType, cType, biasType> mm;

创建对象时需要传入A、B、C、Bias的参数类型信息，类型信息通过MatmulType来定义，包括：内存逻辑位置、数据格式、数据类型。

template <TPosition POSITION, CubeFormat FORMAT, typename TYPE, bool ISTRANS = false> struct MatmulType {
    constexpr static TPosition pos = POSITION;
    constexpr static CubeFormat format = FORMAT;
    using T = TYPE;
    constexpr static bool isTrans = ISTRANS;
};

表1 MatmulType参数说明
参数	说明
POSITION	内存逻辑位置针对Atlas A2训练系列产品： A矩阵可设置为TPosition::GM，TPosition::VECOUT，TPosition::TSCM B矩阵可设置为TPosition::GM，TPosition::VECOUT，TPosition::TSCM Bias可设置为TPosition::GM，TPosition::VECOUT , TPosition::TSCM C矩阵可设置为TPosition::GM，TPosition::VECIN 针对Atlas推理系列产品AI Core： A矩阵可设置为TPosition::GM，TPosition::VECOUT B矩阵可设置为TPosition::GM，TPosition::VECOUT Bias可设置为TPosition::GM，TPosition::VECOUT C矩阵可设置为TPosition::GM，TPosition::VECIN 针对Atlas 200/500 A2推理产品： A矩阵可设置为TPosition::GM，TPosition::VECOUT，TPosition::TSCM B矩阵可设置为TPosition::GM，TPosition::VECOUT，TPosition::TSCM C矩阵可设置为TPosition::GM，TPosition::VECIN，TPosition::TSCM Bias可设置为TPosition::GM，TPosition::VECOUT
CubeFormat	针对Atlas A2训练系列产品： A矩阵可设置为CubeFormat::ND，CubeFormat::NZ B矩阵可设置为CubeFormat::ND，CubeFormat::NZ Bias可设置为CubeFormat::ND C矩阵可设置为CubeFormat::ND，CubeFormat::NZ, CubeFormat::ND_ALIGN 针对Atlas推理系列产品AI Core： A矩阵可设置为CubeFormat::ND，CubeFormat::NZ B矩阵可设置为CubeFormat::ND，CubeFormat::NZ Bias可设置为CubeFormat::ND C矩阵可设置为CubeFormat::ND，CubeFormat::NZ, CubeFormat::ND_ALIGN 注意: 针对Atlas推理系列产品AI Core，C矩阵设置为TPosition::VECIN，CubeFormat::ND时，要求尾轴32字节对齐，比如数据类型是half的情况下，N要求是16的倍数针对Atlas 200/500 A2推理产品： A矩阵可设置为CubeFormat::ND，CubeFormat::NZ B矩阵可设置为CubeFormat::ND，CubeFormat::NZ C矩阵可设置为CubeFormat::ND，CubeFormat::NZ Bias可设置为CubeFormat::ND 注意: 针对Atlas 200/500 A2推理产品， C矩阵设置为TPosition::VECIN或者TPosition::TSCM，CubeFormat::ND时，要求尾轴32字节对齐，比如数据类型是half的情况下，N要求是16的倍数；C矩阵设置为TPosition::VECIN或者TPosition::TSCM，CubeFormat::NZ时，N要求是16的倍数
TYPE	针对Atlas A2训练系列产品： A矩阵可设置为half、float、bfloat16_t 、int8_t、int4b_t B矩阵可设置为half、float、bfloat16_t 、int8_t、int4b_t Bias可设置为half、float、int32_t C矩阵可设置为half、float、bfloat16_t、int32_t、int8_t 针对Atlas推理系列产品AI Core： A矩阵可设置为half、float、int8_t B矩阵可设置为half、float、int8_t Bias可设置为half、float、int32_t C矩阵可设置为half、float、int8_t、int32_t 针对Atlas 200/500 A2推理产品： A矩阵可设置为half、float、bfloat16_t 、int8_t B矩阵可设置为half、float、bfloat16_t 、int8_t Bias矩阵可设置为half、float、int32_t C矩阵可设置为half、float、bfloat16_t、int32_t 注意：A矩阵和B矩阵数据类型需要一致，具体数据类型组合关系请参考表2。
ISTRANS	表征是否使能转置 true为使能转置，Matmul会认为A矩阵形状为[K, M]，B矩阵形状为[N, K] false为不使能转置，Matmul会认为A矩阵形状为[M, K]，B矩阵形状为[K, N] 默认为false不使能转置
LAYOUT	表征数据的排布 NONE：默认值，表示不使用BatchMatmul；其他选项表示使用BatchMatmul。 NORMAL：BMNK的数据排布格式。 BSNGD：原始BSH shape做reshape后的数据排布，具体可参考IterateBatch中对该数据排布的介绍。 SBNGD：原始SBH shape做reshape后的数据排布，具体可参考IterateBatch中对该数据排布的介绍。 BNGS1S2：一般为前两种数据排布进行矩阵乘的输出，S1S2数据连续存放，一个S1S2为一个batch的计算数据。
IBSHARE	是否使能IBShare，不支持同时复用A矩阵和B矩阵的数据

表2 Matmul输入输出数据类型的组合说明
场景	A矩阵	B矩阵	Bias	C矩阵	支持平台
非量化	float	float	half/float	float	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 推理系列产品AI Core Atlas 200I/500 A2 推理产品
	half	half	half/float	half/float	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 推理系列产品AI Core Atlas 200I/500 A2 推理产品
	int8_t	int8_t	int32_t	int32_t	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 推理系列产品AI Core Atlas 200I/500 A2 推理产品
	int4b_t	int4b_t	int32_t	int32_t	Atlas A2 训练系列产品/Atlas 800I A2 推理产品
	bfloat16_t	bfloat16_t	half/float	bfloat16_t	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 200I/500 A2 推理产品
量化	half/bfloat16_t	half/bfloat16_t	float	int8_t	Atlas A2 训练系列产品/Atlas 800I A2 推理产品
	int8_t	int8_t	int32_t	half	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 推理系列产品AI Core Atlas 200I/500 A2 推理产品
	int8_t	int8_t	int32_t	int8_t	Atlas A2 训练系列产品/Atlas 800I A2 推理产品 Atlas 推理系列产品AI Core

初始化操作。

REGIST_MATMUL_OBJ(&pipe, GetSysWorkSpacePtr(), mm, &tiling);

设置左矩阵A、右矩阵B、Bias。

mm.SetTensorA(gm_a);    // 设置左矩阵A
mm.SetTensorB(gm_b);    // 设置右矩阵B
mm.SetBias(gm_bias);    // 设置Bias

完成矩阵乘操作。
- 调用Iterate完成单次迭代计算，叠加while循环完成单核全量数据的计算。Iterate方式，可以自行控制迭代次数，完成所需数据量的计算，方式比较灵活。
```
while (mm.Iterate()) {   
    mm.GetTensorC(gm_c); 
}
```
- 调用IterateAll完成单核上所有数据的计算。IterateAll方式，无需循环迭代，使用比较简单。
```
mm.IterateAll(gm_c);
```
结束矩阵乘操作。
```
mm.End();
```

父主题： Matmul