Debugging an Operator Invoked Through the PyTorch Interface

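This section shows how to use the msDebug tool to debug, on the board, an add operator invoked through the PyTorch interface. The add operator adds two vectors and outputs the result.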

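Prerequisites

Click Link to obtain the sample project in preparation for operator debugging.
You have installed the PyTorch framework and the torch_npu plugin by following "Ascend Extension for PyTorch Configuration and Installation".
You have configured the required environment variables by following "Preparations Before Use".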

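Procedure

Create the operator project by following "Creating an Operator Project".
Develop the operator by following "Operator Development".
Compile and deploy the operator by following "Operator Compilation and Deployment".
Edit the op_kernel/CMakeLists.txt file and add the compile options -O0 -g. -O0 disables compiler optimization and -g emits debugging information.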
```
add_ops_compile_options(ALL OPTIONS -O0 -g )
```
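
Before rebuilding, it can help to confirm that the debug flags are actually present in the file. The following is a minimal sketch and not part of the sample project: the helper name `has_debug_options` is hypothetical, and it only scans the text for an `add_ops_compile_options` call that lists both `-O0` and `-g` (it does not skip commented-out lines).

```python
import re


def has_debug_options(cmake_text: str) -> bool:
    """Return True if an add_ops_compile_options(...) call lists both -O0 and -g.

    Hypothetical helper for illustration; it does a plain text scan and
    does not evaluate CMake logic or ignore commented-out lines.
    """
    # Capture the argument list of every add_ops_compile_options(...) call.
    for args in re.findall(r"add_ops_compile_options\s*\(([^)]*)\)", cmake_text):
        tokens = args.split()
        if "-O0" in tokens and "-g" in tokens:
            return True
    return False


if __name__ == "__main__":
    sample = "add_ops_compile_options(ALL OPTIONS -O0 -g )\n"
    print(has_debug_options(sample))
```

In practice the text would come from reading op_kernel/CMakeLists.txt, for example with `pathlib.Path(...).read_text()`.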

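Go to the PyTorch integration project, invoke the AddCustom operator project through the PyTorch invocation method, and complete the build by following the instructions.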

PytorchInvocation
├── op_plugin_patch         
├── run_op_plugin.sh      // Required in step 5 when running the sample
└── test_ops_custom.py    // Required in step 6 when launching the tool

Run the sample. It automatically generates test data, runs the PyTorch sample, and then verifies the result.

bash run_op_plugin.sh
-- CMAKE_CCE_COMPILER: ${INSTALL_DIR}/toolkit/tools/ccec_compiler/bin/ccec
-- CMAKE_CURRENT_LIST_DIR: ${INSTALL_DIR}/AddKernelInvocation/cmake/Modules
-- ASCEND_PRODUCT_TYPE:
  Ascendxxxyy
-- ASCEND_CORE_TYPE:
  VectorCore
-- ASCEND_INSTALL_PATH:
  /usr/local/Ascend/ascend-toolkit/latest
-- The CXX compiler identification is GNU 10.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: ${INSTALL_DIR}/AddKernelInvocation/build
Scanning dependencies of target add_npu
...
[100%] Built target add_npu
INFO: Ascend C Add Custom SUCCESS
...
INFO: Ascend C Add Custom  in torch.compile graph SUCCESS
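
The final lines of the log report that the operator output matched the expected result. The sketch below illustrates that kind of check in plain Python so it runs without an NPU. The function names and the tolerance values are assumptions for illustration and are not taken from test_ops_custom.py.

```python
import math
import random


def reference_add(x, y):
    """Element-wise sum of two equal-length vectors (the expected add output)."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return [a + b for a, b in zip(x, y)]


def outputs_match(actual, expected, rel_tol=1e-3, abs_tol=1e-3):
    """Compare two vectors element by element within an assumed tolerance."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    random.seed(0)
    x = [random.uniform(-1, 1) for _ in range(8)]
    y = [random.uniform(-1, 1) for _ in range(8)]
    expected = reference_add(x, y)
    # In a real run, `actual` would be the operator output copied back to the host.
    actual = list(expected)
    print(outputs_match(actual, expected))
```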

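Manually import the operator debugging information.
- Replace ${INSTALL_DIR} with the path where files are stored after the CANN software is installed. For example, if the Ascend-cann-toolkit package is installed as the root user, the path is /usr/local/Ascend/ascend-toolkit/latest.
- Run the npu-smi info command on the server where the Ascend AI processor is installed to obtain the Chip Name. The value to configure is Ascend followed by the Chip Name. For example, if Chip Name is xxxyy, the value is Ascendxxxyy.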
```
export LAUNCH_KERNEL_PATH=${INSTALL_DIR}/opp/vendors/customize/op_impl/ai_core/tbe/kernel/SOC_VERSION/add_custom/AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o
```
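
The path above has several parts that vary per environment. As a rough sketch, it could be assembled as follows. The helper names are hypothetical, and the code assumes that the SOC_VERSION placeholder follows the "Ascend + Chip Name" rule described in the notes; check the actual directory name on disk before relying on it.

```python
from pathlib import PurePosixPath


def soc_version(chip_name: str) -> str:
    """Assumed rule from the notes: 'Ascend' followed by the Chip Name."""
    return "Ascend" + chip_name


def kernel_object_path(install_dir, chip_name, op_name, object_file,
                       vendor="customize"):
    """Build the kernel object path using the layout shown in the export command.

    Hypothetical helper; `vendor`, `op_name` and `object_file` must match
    what the operator deployment actually produced.
    """
    return str(
        PurePosixPath(install_dir)
        / "opp" / "vendors" / vendor
        / "op_impl" / "ai_core" / "tbe" / "kernel"
        / soc_version(chip_name) / op_name / object_file
    )


if __name__ == "__main__":
    print(kernel_object_path(
        "/usr/local/Ascend/ascend-toolkit/latest",
        "xxxyy",  # placeholder Chip Name, as in the notes
        "add_custom",
        "AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o",
    ))
```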

Launch the msDebug tool to start the Python program and enter the debugging interface.

msdebug python3 test_ops_custom.py
(msdebug) target create "python3"
Current executable set to '/home/mindstudio/miniconda3/envs/py37/bin/python3' (aarch64).
(msdebug) settings set -- target.run-args  "test_ops_custom.py"
(msdebug)
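
When the launch is scripted, the command line and the LAUNCH_KERNEL_PATH variable can be prepared together. The sketch below only builds the argument list and environment and does not start msDebug. The function name `build_debug_launch` and the example kernel path are hypothetical.

```python
import os
import shlex


def build_debug_launch(script, kernel_object, base_env=None):
    """Return (argv, env) for launching msdebug on a Python script.

    Hypothetical helper: it copies the environment, sets LAUNCH_KERNEL_PATH,
    and mirrors the `msdebug python3 <script>` command shown above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LAUNCH_KERNEL_PATH"] = kernel_object
    argv = ["msdebug", "python3", script]
    return argv, env


if __name__ == "__main__":
    argv, env = build_debug_launch("test_ops_custom.py", "/path/to/kernel.o")
    print(shlex.join(argv))
    print(env["LAUNCH_KERNEL_PATH"])
```

The returned pair could then be passed to `subprocess.run(argv, env=env)` on a machine where msDebug is installed.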

Set a breakpoint.

Set an NPU breakpoint in the kernel function by specifying the source file and the line number.

(msdebug) b add_custom.cpp:60
Breakpoint 1: where = AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) + 9912 [inlined] KernelAdd::Compute(int) + 3400 at add_custom.cpp:60:9, address = 0x00000000000026b8

Run the program and wait until the breakpoint is hit.

(msdebug) r
Process 197189 launched: '/home/miniconda3/envs/py38/bin/python3' (aarch64)
Process 197189 stopped and restarted: thread 1 received signal: SIGCHLD
...
[Launch of Kernel anonymous on Device 0]
Process 197189 stopped
[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]
* thread #1, name = 'python3', stop reason = breakpoint 2.1
    frame #0: 0x00000000000026b8 AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b.o`::AddCustom_1e04ee05ab491cc5ae9c3d5c9ee8950b_1(uint8_t *, uint8_t *, uint8_t *, uint8_t *, uint8_t *) [inlined] KernelAdd::Compute(this=0x000000000020efb8, progress=1) at add_custom.cpp:60:9
   57              LocalTensor<DTYPE_Y> yLocal = inQueueY.DeQue<DTYPE_Y>();
   58              LocalTensor<DTYPE_Z> zLocal = outQueueZ.AllocTensor<DTYPE_Z>();
   59              Add(zLocal, xLocal, yLocal, this->tileLength);
-> 60              outQueueZ.EnQue<DTYPE_Z>(zLocal);
   61              inQueueX.FreeTensor(xLocal);
   62              inQueueY.FreeTensor(yLocal);
   63          }
(msdebug)
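
If the session output is captured to a file, the stop message can be summarized with a small parser. This is a sketch under the assumption that the message keeps the format shown in the sample output above, which may differ between tool versions. The name `parse_stop` is hypothetical.

```python
import re

# Matches e.g. "[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]"
FOCUS_RE = re.compile(
    r"\[Switching to focus on Kernel (?P<kernel>[^,]+), "
    r"CoreId (?P<core>\d+), Type (?P<type>\w+)\]"
)
# Matches the trailing " at <file>:<line>[:<column>]" of a frame line.
LOCATION_RE = re.compile(r" at (?P<file>[\w.]+):(?P<line>\d+)(?::\d+)?\s*$")


def parse_stop(log_text):
    """Extract kernel name, core id/type and source location from a stop message."""
    info = {}
    for line in log_text.splitlines():
        focus = FOCUS_RE.search(line)
        if focus:
            info["kernel"] = focus.group("kernel")
            info["core_id"] = int(focus.group("core"))
            info["core_type"] = focus.group("type")
        location = LOCATION_RE.search(line)
        if location and "frame #0" in line:
            info["file"] = location.group("file")
            info["line"] = int(location.group("line"))
    return info


if __name__ == "__main__":
    sample = (
        "[Switching to focus on Kernel anonymous, CoreId 8, Type aiv]\n"
        "    frame #0: 0x00000000000026b8 KernelAdd::Compute(progress=1)"
        " at add_custom.cpp:60:9\n"
    )
    print(parse_stop(sample))
```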

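For other debugging operations, see "Importing Debugging Information", "Printing Memory and Variables", "Displaying Debugging Information", and "Switching Cores". The operations are the same as described there.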

Delete the breakpoint. For details, see "Deleting Breakpoints".

After debugging is complete, run the q command and enter Y or y to end the session.

(msdebug) q
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y
