MulAddDst
Description
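Multiplies src0Local and src1Local element-wise, adds the product to dstLocal, and stores the final result in dstLocal. The computation is shown below, where PAR is the number of elements that the vector compute unit can process in one iteration:

$$dstLocal_i = src0Local_i \times src1Local_i + dstLocal_i, \quad i \in [0, PAR)$$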
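As a host-side illustration of the formula, the following C++ sketch applies it to the first calCount elements. It is a hypothetical reference model, not the Ascend C implementation: the name `MulAddDstRef` is made up, `std::vector` stands in for `LocalTensor`, and it assumes each source element is converted to the destination type before the multiply-add.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference model of MulAddDst (first-n-elements mode).
// T is the destination type and U the source type, mirroring the template parameters.
// Assumption: sources are converted to T before the multiply-add.
template <typename T, typename U>
void MulAddDstRef(std::vector<T>& dst, const std::vector<U>& src0,
                  const std::vector<U>& src1, std::size_t calCount)
{
    assert(calCount <= dst.size() && calCount <= src0.size() && calCount <= src1.size());
    for (std::size_t i = 0; i < calCount; ++i) {
        // dst_i = src0_i * src1_i + dst_i; elements at i >= calCount are left untouched
        dst[i] = static_cast<T>(src0[i]) * static_cast<T>(src1[i]) + dst[i];
    }
}
```

For example, with dst = {1, 2, 3, 4}, src0 = {2, 2, 2, 2}, src1 = {3, 3, 3, 3} and calCount = 3, dst becomes {7, 8, 9, 4}: only the first three elements are accumulated into.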
Prototypes
- Computing the first n elements of a tensor
  ```cpp
  template <typename T, typename U>
  __aicore__ inline void MulAddDst(const LocalTensor<T>& dstLocal, const LocalTensor<U>& src0Local, const LocalTensor<U>& src1Local, const int32_t& calCount)
  ```
- High-dimensional tensor slicing
  - Bitwise mask mode
    ```cpp
    template <typename T, typename U, bool isSetMask = true>
    __aicore__ inline void MulAddDst(const LocalTensor<T>& dstLocal, const LocalTensor<U>& src0Local, const LocalTensor<U>& src1Local, const uint64_t mask[], const uint8_t repeatTimes, const BinaryRepeatParams& repeatParams)
    ```
  - Contiguous mask mode
    ```cpp
    template <typename T, typename U, bool isSetMask = true>
    __aicore__ inline void MulAddDst(const LocalTensor<T>& dstLocal, const LocalTensor<U>& src0Local, const LocalTensor<U>& src1Local, uint64_t mask, const uint8_t repeatTimes, const BinaryRepeatParams& repeatParams)
    ```
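The two mask modes express the same thing in different forms. The sketch below is a hypothetical helper (`ContiguousToBitMask` is not an Ascend C API) that expands a contiguous-mode count into the two-word bitwise form. It assumes bit i of mask[0] enables element i (0 to 63) and bit i of mask[1] enables element 64 + i, which is consistent with the samples, where contiguous mask = 64 and bitwise mask = { UINT64_MAX, 0 } are used interchangeably.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: expand a contiguous-mode mask (number of leading elements
// enabled per iteration) into the two-word bitwise form.
// Assumption: mask[0] covers elements 0..63 and mask[1] covers elements 64..127.
std::array<std::uint64_t, 2> ContiguousToBitMask(unsigned count)
{
    assert(count >= 1 && count <= 128);
    // n leading one-bits; shifting a 64-bit value by 64 is undefined, so 64 is special-cased.
    auto lowBits = [](unsigned n) -> std::uint64_t {
        return n >= 64 ? UINT64_MAX : ((std::uint64_t{1} << n) - 1);
    };
    if (count <= 64) {
        return {lowBits(count), 0};
    }
    return {UINT64_MAX, lowBits(count - 64)};
}
```

ContiguousToBitMask(64) yields { UINT64_MAX, 0 }, the value used in the bitwise-mode sample, and ContiguousToBitMask(8) yields { 0xFF, 0 }.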
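Parameters

Template parameters:

| Parameter | Description |
|---|---|
| T | Data type of the destination operand. |
| U | Data type of the source operands. |
| isSetMask | Whether the mask is set inside the API. true (the default): the API sets the mask internally. false: the API does not set the mask, so it must be configured before the call. |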
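Function parameters:

| Parameter | Input/Output | Description |
|---|---|---|
| dstLocal | Output | Destination operand. Its type is LocalTensor, and the supported TPosition values are VECIN, VECCALC, and VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| src0Local, src1Local | Input | Source operands. Their data type may differ from that of the destination operand. Their type is LocalTensor, and the supported TPosition values are VECIN, VECCALC, and VECOUT. The start address of the LocalTensor must be 32-byte aligned. |
| calCount | Input | Number of input elements to compute. |
| mask | Input | Controls which elements take part in each iteration. In contiguous mode (uint64_t mask), it is the number of leading contiguous elements that are computed. In bitwise mode (uint64_t mask[]), each bit set to 1 enables the corresponding element and each bit set to 0 disables it. |
| repeatTimes | Input | Number of repeat iterations. The vector compute unit reads 256 contiguous bytes per iteration, so processing all of the input data takes multiple iterations (repeats). repeatTimes is the number of iterations. |
| repeatParams | Input | Parameters that control the operand address strides, of type BinaryRepeatParams. They include the address stride of the same datablock between adjacent iterations and the address stride between different datablocks within one iteration. For the stride between adjacent iterations, see repeatStride; for the stride between datablocks within one iteration, see dataBlockStride. |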
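The sketch below is a hypothetical host-side model of how repeatTimes, dataBlockStride, and repeatStride map an iteration index and an element index to an offset in one operand. It assumes a 32-byte datablock and strides counted in datablocks, and that every element of each iteration is enabled by the mask; `ElementOffset` and `Offsets` are illustrative names, and the repeatStride and dataBlockStride documentation remains the authoritative description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: one datablock is 32 bytes and both strides are counted in datablocks.
constexpr std::size_t kBlockBytes = 32;

// Hypothetical model: element offset (in elements from the tensor start) of the
// j-th element processed in iteration r, for an operand with elemBytes-wide elements.
std::size_t ElementOffset(std::size_t r, std::size_t j, std::size_t elemBytes,
                          std::size_t blkStride, std::size_t repStride)
{
    const std::size_t perBlock = kBlockBytes / elemBytes;  // elements per datablock
    const std::size_t block = j / perBlock;                 // datablock index within the iteration
    return r * repStride * perBlock       // start of iteration r
         + block * blkStride * perBlock   // start of this datablock within the iteration
         + j % perBlock;                  // position inside the datablock
}

// Offsets of every element touched by repeatTimes iterations of par elements each,
// in processing order (assumes the mask enables all par elements per iteration).
std::vector<std::size_t> Offsets(std::size_t repeatTimes, std::size_t par, std::size_t elemBytes,
                                 std::size_t blkStride, std::size_t repStride)
{
    std::vector<std::size_t> out;
    out.reserve(repeatTimes * par);
    for (std::size_t r = 0; r < repeatTimes; ++r)
        for (std::size_t j = 0; j < par; ++j)
            out.push_back(ElementOffset(r, j, elemBytes, blkStride, repStride));
    return out;
}
```

With the mix-precision sample parameters (4 repeats of 64 elements; a float destination with block stride 1 and repeat stride 8; half sources with block stride 1 and repeat stride 4), both Offsets(4, 64, 4, 1, 8) and Offsets(4, 64, 2, 1, 4) produce 0 through 255 in order. In this model, that is why those stride values read and write the data contiguously even though the operands have different element sizes.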
Return Value
None
Supported Products
Constraints
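- When using the high-dimensional slicing APIs, you can save address space by defining a single Tensor that serves as both a source operand and the destination operand (that is, address overlap), subject to the following constraints:
  - Within a single iteration, the source and destination operands must overlap completely (100%); partial overlap is not supported.
  - Across iterations, address overlap is not supported when the destination of iteration N is a source of iteration N+1, because iteration N+1 depends on the result of iteration N.
- The supported precision combinations are:
  - half: src0Local = half, src1Local = half, dstLocal = half; PAR = 128
  - float: src0Local = float, src1Local = float, dstLocal = float; PAR = 64
  - mix: src0Local = half, src1Local = half, dstLocal = float; PAR = 64
- In the mix combination, the source and destination operands cannot overlap 100% (in one iteration, 64 half source elements occupy 128 bytes while 64 float destination elements occupy 256 bytes), so address overlap is not supported.
- For the alignment requirements on operand address offsets, see the general constraints.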
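The PAR values and the overlap rules above can be checked with a small host-side sketch. It is hypothetical: `Par`, `Range`, and `OverlapAllowed` are illustrative names, `Par` simply assumes that one iteration covers 256 bytes of the widest operand (which reproduces the three listed values), and `OverlapAllowed` models only the single-iteration rule, not the cross-iteration dependency rule.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one iteration covers 256 bytes of the widest operand.
// Reproduces the listed values: half 128, float 64, mix 64.
std::size_t Par(std::size_t dstBytes, std::size_t srcBytes)
{
    return 256 / (dstBytes > srcBytes ? dstBytes : srcBytes);
}

// Byte range [start, start + len) occupied by one operand in one iteration.
struct Range {
    std::size_t start;
    std::size_t len;
};

// Single-iteration rule: source and destination may share memory only if the
// ranges coincide exactly; disjoint ranges are fine, partial overlap is not.
bool OverlapAllowed(Range src, Range dst)
{
    const bool disjoint = src.start + src.len <= dst.start || dst.start + dst.len <= src.start;
    const bool identical = src.start == dst.start && src.len == dst.len;
    return disjoint || identical;
}
```

In the mix combination, a source iteration spans Par(4, 2) * 2 = 128 bytes while the destination spans 64 * 4 = 256 bytes, so two ranges starting at the same address can never be identical, which matches the statement that address overlap is unsupported there.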
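Examples
These samples show only part of the code in the Compute stage. To run a sample, copy the snippet and use it to replace the Compute function in the binary-instruction sample template provided under More Samples.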
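- High-dimensional slicing API example, contiguous mask mode (mix precision)
  ```cpp
  uint64_t mask = 64;
  // repeatTimes = 4: each iteration computes 64 elements, 256 elements in total
  // dstBlkStride, src0BlkStride, src1BlkStride = 1: data is read and written contiguously within one iteration
  // dstRepStride = 8 (64 float = 256 B = 8 datablocks), src0RepStride, src1RepStride = 4 (64 half = 128 B = 4 datablocks):
  // data is read and written contiguously across adjacent iterations
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
  ```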
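- High-dimensional slicing API example, bitwise mask mode (mix precision)
  ```cpp
  uint64_t mask[2] = { UINT64_MAX, 0 };  // all 64 bits of mask[0] set: 64 elements per iteration
  // repeatTimes = 4: each iteration computes 64 elements, 256 elements in total
  // dstBlkStride, src0BlkStride, src1BlkStride = 1: data is read and written contiguously within one iteration
  // dstRepStride = 8, src0RepStride, src1RepStride = 4: data is read and written contiguously across adjacent iterations
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, mask, 4, { 1, 1, 1, 8, 4, 4 });
  ```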
- Example computing the first n elements of a tensor (mix precision)
  ```cpp
  AscendC::MulAddDst(dstLocal, src0Local, src1Local, 256);
  ```
Input data (src0Local): [-83. 58.2 -14.28 -43.12 20.72 -79.9 54.16 31.56 -1.464 68.25 -28.31 -93.5 -4.2 -46.56 -22.23 78.5 -69.56 -37.03 -53.12 58.28 -71.56 -34.44 85.94 96.3 66.06 99.94 -45.94 8.75 -93.9 35.56 82.56 -70.8 -68.75 -35.4 95.3 -49.1 -56.34 86.75 90.25 24.17 79.06 -49.66 -95.3 -6.965 -63.72 -33.16 -15.56 -43.28 51.28 40.1 83.25 49.72 55.47 -53.7 17.55 -36.06 63. 59.16 -66.8 -9.01 25.56 44.28 22.12 -33.84 -31.9 -74.2 79.94 -34.94 1.119 18.45 -92.75 -83.25 42.66 -77.6 33.28 0.709 -19.3 44.44 45.28 -33.4 -55.94 -42.22 -37.72 39.4 87.25 23.19 34.16 51.3 -22.16 15.234 59. -20.45 -63.9 41.84 -14.63 -80.94 47.8 -36.84 8.47 -60.66 -26.06 -42.78 30.5 -91.3 55.84 -85.44 -99.44 68.2 -71.7 27.45 -11.48 -48.03 71. 71.5 -59.2 14.67 79.25 32.7 -54.22 6.17 -69.94 -49.22 87.7 -61.53 36.25 -57.84 -81.75 -24.84 -35. -62.44 -47.22 19.95 21.16 -31.56 13.38 72.4 -64.06 -89.75 -28.17 34.4 -68.06 -46.94 16.06 65.56 3.16 -59.88 -32.97 30.69 89.5 16.66 25.05 -1.988 5.27 -23.14 -26.89 -24.72 1.427 -14.46 81.9 -59.94 68.7 -83.2 -75.44 88.6 27.62 -58.06 -36.1 -49.53 27.73 89.5 -51.5 90. 67.94 -70.8 24.2 -75.8 -96.75 -22.66 33.03 6.293 -87.5 36.56 36.06 -76.8 1.786 82.9 87.6 -63.94 -4.51 -89.06 -56.06 75.2 -31.89 27.44 35.22 -27.19 37.53 96.94 -83.25 -49.6 31.78 -50.25 65.2 69.9 63.03 53. -70.1 -57.22 -11.99 -23.14 44.28 -77.3 77.25 10.805 16.3 -96.6 -94.9 34.1 -40.25 -99.7 -6.156 44.97 82.7 51.1 -53.28 85.44 -80.94 -47. -53.47 -35.22 76.75 -28.38 26.48 -67.06 34.28 -54.6 21.52 -38.9 79.75 51.7 -39.44 48.56 -91.7 -44.06 92.9 11.79 8.98 -5.074 12.375 -24.77 -27.31 76.2 39.8 -5.46 25.17 47. ]
Input data (src1Local): [-57.97 43.5 8.08 72.4 -81.44 -52. 69.1 -84.25 31.12 34.34 74.75 83.56 -83. 80.1 42.84 -31.6 88.56 47.34 18.89 -95.25 16.88 -85.75 76.75 -17.19 23.39 92.56 22.81 77.94 38.62 -55.8 38.22 -88.6 -99.4 -66.75 90.44 80.56 12.78 -12.6 -68.4 2.816 27.45 -60.88 70. 61.78 -90.56 -99.25 38.25 -14.49 -35.88 38.1 13. 29.22 -57.06 -44.7 6.535 -44.6 -76.3 91.7 36.66 83.9 66. -81.25 -50.06 68. 2.705 -51.72 66.9 49.03 15.76 9.37 33.2 99.56 -20.55 83.3 -57.1 37.06 68.94 -91.9 -46.06 -92.7 64.4 8.164 8.98 10.76 -75.6 26.94 46.8 62. 8.734 -69.25 -70.2 -59. 67.25 87.6 48.72 60.16 19.39 48.62 21.64 25.06 1.013 -36.6 -46.28 -29.14 67.44 56.7 32.03 -28.81 -94.44 49.6 0.583 -84.4 -51.53 -43. 66. -68. 77.44 -50.16 -90.4 -46.22 90.25 88. 79.25 -40.84 -71.7 -27.03 19.53 85.44 45.06 60.72 19.22 -28.95 -47.72 97.8 -51.6 31.42 31.75 -21.84 -71.4 77.9 43.12 35.66 -50.84 -52. -48.84 -53.97 -59.56 31.2 -64.3 -10.47 86.25 -84.44 -56.4 -63.03 -99.9 54.44 40.72 74.94 8.305 18.52 -47.34 -74.06 79.1 92.44 84.94 -98.7 -41.06 -80.2 -71.06 89.06 96.2 -19.83 -51.03 -92. 82.25 -75.75 58.66 22.72 -89.06 -83.06 -73.5 18.75 -0.939 -96.4 50.12 -73.9 -56.97 52.34 -95.56 11.02 -46.3 -52.2 -8.46 80.56 77. -51.72 38.8 -66.44 -69. -30.33 -53.3 5.406 74.8 52.25 -35.88 92.5 51.38 40.47 43.94 -29.05 89.7 -74.5 -83.5 81.75 -56.6 -13.625 86.9 -4.58 -67.5 -6.67 -59.53 -30.4 -91.75 -84.3 -66.6 -28.61 -13.79 -70.75 -90.2 -47.94 59.56 84.2 0.7085 -57.44 -24.94 -11.875 -90.4 54.22 -44.16 -36.34 -31.64 72.1 -81.25 75.8 93.9 -28.28 -20.53 90.2 -58.97 -95.7 59.22 -37.8 94.9 -86.7 36.16 26.47 ]
Input data (dstLocal): [-97.94773 -61.303955 32.56878 -87.50743 -78.92147 59.20739 50.336506 49.039738 -76.2525 0.25441223 -71.73807 6.481831 -55.5052 -51.057415 31.403702 63.285076 98.1897 86.71727 -50.16466 88.94256 72.111435 8.4164915 34.524082 73.14016 4.838548 69.67902 -97.855736 90.358696 9.051491 37.595695 -66.01661 -97.110634 82.84477 69.46122 25.561102 47.926853 -10.202202 78.2545 31.339691 12.940468 -31.499294 -3.351652 62.46355 45.0427 -86.02812 -43.48385 -62.274956 -36.077827 51.81446 32.47797 59.10228 68.18655 9.3604145 -76.47674 -50.29268 94.496346 30.837933 -48.315712 -44.92399 -62.369625 47.578724 84.84092 -66.64584 88.376434 95.05615 -92.37309 3.0038757 85.21814 -6.688882 97.74142 20.733965 -5.62451 69.6166 -64.435455 94.09325 -63.13334 89.150345 -17.61865 32.776333 27.28345 31.288876 -9.983517 -46.39662 -37.025536 47.853374 -30.384796 -79.801544 -11.131944 -36.417023 84.25002 -74.19904 -86.72338 -6.5878353 26.253004 -28.112898 -64.88305 -40.56897 -65.849686 22.276798 -3.356709 -78.41364 -67.26924 -10.346288 -43.172684 10.149812 -22.575602 -28.780804 -64.24396 -14.579756 -30.369322 -59.28742 -37.098255 31.078829 29.901808 50.531147 -88.35735 -45.65366 -6.7495203 6.8026304 56.172153 -0.8727364 9.618746 89.294815 75.4403 81.63827 -61.722088 -72.85743 9.296161 -69.17855 2.3497865 20.234892 -13.279363 -44.531677 55.188084 -45.736256 -30.018398 27.09971 28.841034 35.764072 21.457811 -15.206495 94.05271 79.9942 -36.39198 38.40136 5.2365685 -11.435508 67.15551 87.03286 7.9285994 78.32062 97.863335 -28.68556 -72.658554 -79.39075 -82.65206 39.52689 -22.053177 30.602457 -26.158005 49.83525 -72.24563 -97.10148 54.803936 65.070786 -57.019573 35.972733 6.694148 -74.88097 -71.13884 -84.549545 -26.875593 -3.2775877 -8.592472 -5.248627 -22.2127 98.26377 -51.741936 -69.48398 -47.230175 92.72371 18.192408 -39.66745 44.556633 -21.733562 15.191482 5.9535656 41.23602 89.30139 -32.57541 -47.595608 -50.371124 -87.899666 57.644466 38.85747 47.65093 49.42874 -32.424126 -22.5012 78.78245 -70.6598 -87.218544 50.347565 55.945244 -3.4658287 17.902784 -30.977674 53.424767 -82.00753 2.9060571 -1.010124 -94.316765 13.186674 -52.089214 58.975357 48.281635 26.436571 -27.11565 89.21593 -10.962796 49.347828 21.556795 78.163956 35.06028 10.803711 53.231297 -44.78757 -0.6473386 26.717777 63.757347 -4.90904 21.724916 37.443634 -89.250656 62.98874 72.13095 -12.19138 84.16487 71.54008 -73.41178 -97.612564 39.947853 -1.3887504 -5.6196795 -54.509125 -28.877354 26.259935 42.28702 -38.848114 -76.46558 -91.69401 71.27111 89.36143 -65.70425 -31.810083 82.811226 ]
Output data (dstLocal): [ 4.71345850e+03 2.46985229e+03 -8.27969437e+01 -3.20867920e+03 -1.76620471e+03 4.21270752e+03 3.79388721e+03 -2.61010083e+03 -1.21815369e+02 2.34421533e+03 -2.18809741e+03 -7.80661182e+03 2.93029968e+02 -3.78187769e+03 -9.21200317e+02
-2.41682422e+03 -6.06243945e+03 -1.66648096e+03 -1.05372913e+03 -5.46234668e+03 -1.13550574e+03 2.96143213e+03 6.63022705e+03 -1.58223096e+03 1.55008167e+03 9.32014355e+03 -1.14580493e+03 7.72311829e+02 -3.61687036e+03 -1.94723633e+03 3.08941895e+03 6.17864697e+03 6.91487598e+03 2.43282837e+03 8.64538574e+03 -3.90718848e+03 -7.30345764e+02 -1.01493103e+03 -6.13950391e+03 8.10182877e+01 2.13901343e+03 3.01947266e+03 -6.60941162e+03 -3.85254059e+02 5.68450098e+03 3.24727393e+03 -6.57540588e+02 5.91162170e+02 -1.78790039e+03 1.55979932e+03 1.14135229e+03 1.52090625e+03 -3.15582520e+03 2.32268335e+03 6.43788910e+01 1.70265845e+03 -4.77684961e+03 5.37557275e+03 -2.49401978e+03 -8.17899902e+02 1.73470374e+03 -3.51301074e+03 -1.17427869e+03 -2.21299854e+03 8.74725342e+00 3.74451172e+03 5.34882422e+03 -1.62781116e+03 1.09463263e+01 2.70595306e+02 -3.05740674e+03 -8.29420215e+03 -8.06836060e+02 -6.53156836e+03 -1.80605811e+03 -3.68566055e+01 -1.24112793e+03 -4.10031396e+03 -2.05299121e+03 3.12362524e+03 -3.56968774e+03 -3.54660034e+02 -3.84981323e+02 3.86899506e+02 -6.55042773e+03 5.94228516e+02 1.51913794e+03 3.17024316e+03 -2.29938019e+02 -9.70730469e+02 -4.21526172e+03 1.12001099e+03 -4.30428320e+03 3.69281152e+03 -7.41005249e+02 -4.93377930e+03 8.86545288e+02 -1.85737708e+03 2.05545837e+02 -1.52355396e+03 -1.04807014e+02 1.49825708e+03 -1.42192444e+03 2.61773071e+03 3.77611279e+03 -4.86581396e+03 -3.21388818e+03 -2.02889624e+03 6.75540869e+03 1.33113416e+03 -6.59783478e+01 4.01553857e+03 -3.62763989e+03 -3.04459814e+03 -3.85584375e+03 -1.08604480e+03 6.09126807e+03 -1.64623193e+03 4.90682227e+03 -2.29084198e+02 -6.31273193e+03 -4.32163135e+03 7.03852930e+03 2.58860718e+03 -2.51703369e+03 1.50186682e+03 -1.66953711e+03 -2.11329175e+03 -1.64636609e+03 -3.78877710e+03 -8.87250488e+02 -5.90984680e+02 -1.05408154e+03 -3.03201904e+03 -7.36205750e+02 2.24413989e+03 -2.00688464e+03 1.98931763e+03 2.04653162e+03 2.70084448e+03 -2.95040186e+03 -1.57956250e+03 -7.36683533e+02 
-3.44564209e+03 -1.15952522e+02 3.23661548e+03 1.95226562e+03 1.02470142e+03 -5.66893604e+03 -1.66441513e+02 2.23861353e+03 2.65748840e+02 -3.25920044e+02 1.38592395e+03 2.60631055e+03 -1.42827905e+03 9.76226807e+01 -1.10571973e+03 7.10548767e+02 -1.13593823e+03 -3.20208862e+03 6.08882861e+03 -6.06609375e+03 8.24707715e+03 2.41146924e+03 5.67302344e+03 1.51807239e+03 3.97848120e+03 -2.04575500e+03 7.89995508e+03 -5.03820557e+03 -1.81140686e+03 -3.47021313e+03 6.50615771e+03 1.98545837e+03 5.72058398e+03 -5.57672852e+03 -5.66463623e+02 -3.01132959e+03 -5.69939880e+02 6.52397363e+03 7.03739258e+02 -7.35288696e+01 7.44736133e+03 6.77963409e+01 -6.10719922e+03 -4.98593311e+03 -3.30549243e+03 5.20452515e+02 -1.01435034e+03 2.54879883e+03 -3.97421875e+03 1.81924927e+02 2.26807812e+03 2.75070117e+03 1.45375439e+03 1.50611035e+03 -6.47270947e+03 5.72174902e+03 1.58286792e+03 -1.76499768e+03 -3.58882599e+02 4.92718750e+03 3.70691406e+03 -2.26471191e+03 4.92040283e+03 -3.63364966e+03 -2.26214648e+03 -6.08914246e+02 6.75068909e+02 3.97046460e+03 5.66546436e+03 -6.43718848e+03 8.31193970e+02 -8.63325928e+02 1.36479724e+03 -8.21582910e+03 -1.83201096e+02 2.80609082e+03 6.54139771e+02 4.15837097e+02 -1.34577429e+03 -7.50841406e+03 -4.27278174e+03 3.56066699e+03 -2.39108228e+03 1.07126465e+03 3.32460278e+03 4.84893066e+03 1.75205615e+03 4.56651270e+03 -2.36709546e+03 5.62077103e+01 3.76265161e+03 -7.91899902e+02 7.20431763e+02 -1.95666602e+03 -2.02528333e+03 -3.44992090e+03 -1.95192932e+03 1.15021460e+03 3.54251807e+03 7.44822070e+03 -3.34610791e+03 8.66413184e+03 -3.62286774e+02 -1.58040115e+02 -4.15344086e+02 -7.68586426e+02 2.29329517e+03 -1.70910608e+03 -2.80956885e+03 3.86657227e+03 4.07690765e+02 8.78310547e+02 1.32684253e+03]
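The sample values can be spot-checked on the host. The sketch below assumes that the printed src0Local and src1Local values are decimal renderings of half-precision numbers (so it rounds them back to half before multiplying) and that the multiply-add is evaluated in float, as in the mix combination. `RoundToHalf` and `MixMulAddDst` are hypothetical helpers, and `RoundToHalf` handles only finite values in the normal half range.

```cpp
#include <cassert>
#include <cmath>

// Round a float to the nearest IEEE 754 half-precision value (normal range only,
// ties to even under the default rounding mode), to mimic half-typed sources.
float RoundToHalf(float x)
{
    if (x == 0.0f) return x;
    int e = 0;
    (void)std::frexp(x, &e);                          // x = m * 2^e with 0.5 <= |m| < 1
    const float quantum = std::ldexp(1.0f, e - 11);   // half keeps 11 significant bits
    return std::nearbyint(x / quantum) * quantum;
}

// Expected mix-precision result for one element: half sources, float accumulation.
float MixMulAddDst(float src0, float src1, float dst)
{
    return RoundToHalf(src0) * RoundToHalf(src1) + dst;
}
```

For element 0, -57.97 rounds to the half value -57.96875, and -83 × -57.96875 + (-97.94773) ≈ 4713.4585, matching the first output value; elements 1 and 2 agree with 2469.85229 and -82.7969437 in the same way.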