
Checking Ascend C Operators Launched with the Kernel Call Operator

Procedure

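  1. Complete the pre-use preparation as described in "Preparation for the Kernel Call Operator Scenario".
  2. Configure the required environment variables as described in "Preparation Before Use".
  3. Build the single-operator executable.

    Taking the Add operator as an example, the executable can be built with a command like the following: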

    bash run.sh -r npu -v <soc_version> 

    When the one-click build-and-run script finishes, it generates the NPU-side executable <kernel_name>_npu in the project directory.

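  4. Use the msSanitizer detection tool to launch the single-operator executable (add_npu in this example).
    • For memory checking, run the following command. For parameter details, see Table 2 and Table 3; for a worked example, see "Memory Check Example".
      mssanitizer --tool=memcheck ./add_npu   # memory checking requires --tool=memcheck
    • For race checking, run the following command. For parameter details, see Table 2; for a worked example, see "Race Check Example".
      mssanitizer --tool=racecheck ./add_npu  # race checking requires --tool=racecheck

    The path to the single-operator executable can be absolute or relative; set it to match your environment.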

Memory Check Example

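  • Before step 1, construct an illegal read/write scenario in the Add operator: change the DataCopy length from TILE_LENGTH to 2 * TILE_LENGTH, so that the last copy reads and writes out of bounds.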
    __aicore__ inline void CopyOut(int32_t progress)
    {
        // dequeue output tensor from VECOUT queue
        LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
        // copy progress_th tile from local tensor to global tensor
        // construct the illegal read/write scenario (the correct length is TILE_LENGTH)
        DataCopy(zGm[progress * TILE_LENGTH], zLocal, 2 * TILE_LENGTH);
        // free output tensor for reuse
        outQueueZ.FreeTensor(zLocal);
    }
    
  • The report printed by the detection tool shows an illegal write of 224 bytes to GM at line 65 of add_custom.cpp, which matches the fault constructed above.
    $ mssanitizer --tool=memcheck ./add_npu
    ====== ERROR: illegal write of size 224
    ======    at 0x12c0c002ef00 on GM
    ======    in block aiv(7)
    ======    code in pc current 0x1644 (serialNo:2342)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8
    

Race Check Example

  • Before step 1, construct an inter-core race in the Add operator: change the DataCopy length from TILE_LENGTH to 2 * TILE_LENGTH, which creates an inter-core race on GM memory.
    __aicore__ inline void CopyOut(int32_t progress)
    {
        // dequeue output tensor from VECOUT queue
        LocalTensor<half> zLocal = outQueueZ.DeQue<half>();
        // copy progress_th tile from local tensor to global tensor
        // construct the inter-core race scenario (the correct length is TILE_LENGTH)
        DataCopy(zGm[progress * TILE_LENGTH], zLocal, 2 * TILE_LENGTH);
        // free output tensor for reuse
        outQueueZ.FreeTensor(zLocal);
    }
    
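  • The report printed by the detection tool shows a potential WAW (write-after-write) hazard on GM at line 65 of add_custom.cpp between AIV block 0 and AIV block 1, which matches the fault constructed above.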
    $ mssanitizer --tool=racecheck ./add_npu
    ====== ERROR: Potential WAW hazard detected at GM :
    ======    PIPE_MTE3 Write at WAW()+0x12c0c0025f00 in block 0 (aiv) at pc current 0x1644 (serialNo:305)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8
    ======    PIPE_MTE3 Write at WAW()+0x12c0c0026000 in block 1 (aiv) at pc current 0x1644 (serialNo:329)
    ======    #0 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/impl/dav_c220/kernel_operator_data_copy_impl.h:107:9
    ======    #1 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:155:9
    ======    #2 ${ASCEND_HOME_PATH}/compiler/tikcpp/tikcfw/inner_interface/inner_kernel_operator_data_copy_intf.cppm:459:5
    ======    #3 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:65:9
    ======    #4 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:38:13
    ======    #5 samples/operator/AddCustomSample/KernelLaunch/AddKernelInvocation/add_custom.cpp:82:8