DumpAccChkPoint Feature

Overview

When debugging an operator with the tool, you can print a Tensor starting from a specified offset.

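This feature is similar to DumpTensor but more flexible. When a Tensor holds a large amount of data, DumpAccChkPoint lets you specify an offset and print only a specified number of elements starting from that position.
Each core is allocated a fixed maximum of 1M of space for printed data, and this size currently cannot be changed. If the printed data exceeds 1M, the print content is no longer displayed, so keep the amount of data to be printed under control.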
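The offset window and the per-core budget described above can be sketched in plain Python. This is an illustrative model only, not part of the tool: the helper names (`dumped_window`, `payload_bytes`, `fits_budget`) are hypothetical, "1M" is assumed to mean 1 MiB, and any per-record header overhead the tool may add is not counted.

```python
# Illustrative model only; these helpers are hypothetical and not part of ascendebug.
# Element sizes (bytes) of the data types listed as supported by DumpAccChkPoint.
DTYPE_SIZES = {
    "uint8_t": 1, "int8_t": 1,
    "int16_t": 2, "uint16_t": 2, "half": 2,
    "int32_t": 4, "uint32_t": 4, "float": 4,
    "int64_t": 8, "uint64_t": 8,
}

# Assumption: the documented "1M" per-core limit is interpreted as 1 MiB.
PER_CORE_LIMIT_BYTES = 1024 * 1024


def dumped_window(values, offset, dump_num):
    """Return the elements that a call with (offset, dump_num) would select."""
    return values[offset:offset + dump_num]


def payload_bytes(dtype, dump_num, calls=1):
    """Raw element bytes produced by `calls` dumps of `dump_num` elements each."""
    return DTYPE_SIZES[dtype] * dump_num * calls


def fits_budget(dtype, dump_num, calls=1):
    """True if the raw payload stays within the assumed per-core limit."""
    return payload_bytes(dtype, dump_num, calls) <= PER_CORE_LIMIT_BYTES
```

For example, `payload_bytes("half", 128, calls=10)` gives the raw bytes produced by ten dumps of 128 `half` elements on one core, which can be compared against the budget before adding more dump calls to a kernel.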

Usage (Command Line)

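In the kernel function code, call the DumpAccChkPoint interface wherever you need to print Tensor data from an offset. For details about the interface, see DumpAccChkPoint Interface Description. Example: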
```
DumpAccChkPoint(srcLocal, 5, 32, dataLen);
```
In the NPU debugging scenario, run the following command to enable the Dump switch.
```
ascendebug kernel --backend npu --dump-mode acc_chk ... {other NPU debugging parameters}
```
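Setting --dump-mode to acc_chk enables the mode that prints a Tensor from an offset. For the other parameters, see NPU debugging parameters and configure them as needed.
View the print results.
The Tensor data dumped from the offset position is stored under ${root}/${work_dir}/npu. The result directory and result files are similar to those of the DumpTensor feature and are not described again here.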

Usage (API)

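In the kernel function code, call the DumpAccChkPoint interface wherever you need to print Tensor data from an offset. For details about the interface, see DumpAccChkPoint Interface Description. Example: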
```
DumpAccChkPoint(srcLocal, 5, 32, dataLen);
```

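Set dump_mode='acc_chk', then call the operator compile and run API interfaces. The following example prints a Tensor from an offset on the NPU board in the standard custom scenario: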

compile_npu_options = ascendebug.CompileNpuOptions(dump_mode='acc_chk')
name, kernel_file, extern = op_executor.compile_custom_npu(customize_path, tiling_info.tiling_key, compile_npu_options)
npu_compile_info = ascendebug.NpuCompileInfo(syncall=extern['cross_core_sync'], task_ration=extern['task_ration'], dump_mode='acc_chk')
run_npu_options = ascendebug.RunNpuOptions()
op_executor.run_npu(kernel_file, run_npu_options, npu_compile_info=npu_compile_info, tiling_info=tiling_info)

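View the print results.
The Tensor data dumped from the offset position is stored under ${root}/${work_dir}/npu. The result directory and result files are similar to those of the DumpTensor feature and are not described again here.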
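To inspect what a run produced, the result directory can be walked with the Python standard library. The sketch below is a hypothetical helper (`list_dump_files` is not an ascendebug API); `root` and `work_dir` stand for the `${root}` and `${work_dir}` placeholders above, and no particular file names or layout inside the `npu` directory are assumed.

```python
from pathlib import Path


def list_dump_files(root, work_dir):
    """List files under <root>/<work_dir>/npu as sorted relative POSIX paths.

    Hypothetical helper: it makes no assumption about the file names the
    tool writes and simply reports whatever is present.
    """
    npu_dir = Path(root) / work_dir / "npu"
    if not npu_dir.is_dir():
        # Nothing was dumped, or the paths do not match the run.
        return []
    return sorted(
        p.relative_to(npu_dir).as_posix()
        for p in npu_dir.rglob("*")
        if p.is_file()
    )
```

An empty list is a quick hint that either the dump mode was not enabled or the run used a different working directory.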

DumpAccChkPoint Interface Description

Table 1 DumpAccChkPoint interface description
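Function prototype	void DumpAccChkPoint(const LocalTensor<T> &tensor, uint32_t desc, uint32_t offset, uint32_t dumpNum); void DumpAccChkPoint(const GlobalTensor<T> &tensor, uint32_t desc, uint32_t offset, uint32_t dumpNum);
Function	Prints a Tensor starting from a specified offset.
Parameters (IN)	tensor	The Tensor to dump. When DumpTensor is called multiple times, it must not be repeated. If the Tensor to dump is located in Unified Buffer, L1 Buffer, or L0C Buffer, pass a LocalTensor. If it is located in Global Memory, pass a GlobalTensor. Supported data types: uint8_t, int8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, uint64_t, float, half.
	desc	User-defined additional information (a line number or any other custom number).
	offset	Offset, in number of elements.
	dumpNum	Number of elements to dump.
Parameters (OUT)	NA	-
Return value	NA	-
Constraints	This interface currently supports dumping only data located in Unified Buffer, L1 Buffer, L0C Buffer, or Global Memory. The offset must satisfy the 32-byte alignment restriction on UB reads, that is, offset * sizeof(T) must be 32-byte aligned. The size of each Dump (dumpNum * sizeof(T)) must also be 32-byte aligned.
Example call	DumpAccChkPoint(srcLocal, 7, 32, 128);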
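The two alignment constraints in the table can be checked before a kernel is compiled. The following is a minimal sketch in Python, assuming the element sizes shown in the comments; `check_dump_args` is a hypothetical helper for illustration and is not provided by the tool.

```python
# Hypothetical pre-check for DumpAccChkPoint arguments; not part of any Ascend API.
# Element sizes (bytes) of the supported data types.
DTYPE_SIZES = {
    "uint8_t": 1, "int8_t": 1,
    "int16_t": 2, "uint16_t": 2, "half": 2,
    "int32_t": 4, "uint32_t": 4, "float": 4,
    "int64_t": 8, "uint64_t": 8,
}

ALIGN_BYTES = 32  # alignment required by the constraints row of Table 1


def check_dump_args(dtype, offset, dump_num):
    """Return a list of violated constraints (empty if the arguments look valid)."""
    if dtype not in DTYPE_SIZES:
        raise ValueError(f"unsupported data type: {dtype}")
    size = DTYPE_SIZES[dtype]
    problems = []
    if (offset * size) % ALIGN_BYTES != 0:
        problems.append(f"offset * sizeof(T) = {offset * size} B is not 32 B aligned")
    if (dump_num * size) % ALIGN_BYTES != 0:
        problems.append(f"dumpNum * sizeof(T) = {dump_num * size} B is not 32 B aligned")
    return problems
```

With the example call's values (offset 32, dumpNum 128), `check_dump_args` returns an empty list for every supported type, because both byte counts are multiples of 32 even for 1-byte elements. An offset of 5 with `float` elements would be reported, since 5 * 4 = 20 bytes is not 32-byte aligned.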
