ACL single-operator model matching fails during ST testing with the error "[Match][OpModel]MatchOpModel"
2023/05/08
Issue information
Source | Product category | Keywords |
---|---|---|
Official | Operator development | ST testing, ACL single operator |
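Symptom
During ST testing, ACL single-operator model matching fails with the error "[Match][OpModel]MatchOpModel".
Cause analysis
The operator information at model load time does not match the operator information at model execution time.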
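Solution
When running the ST test, enable debug logging by setting the environment variable "export ASCEND_GLOBAL_LOG_LEVEL=0". Then, under ~/ascend/log, look for the key model-loading log "Register model. OpModelDef =" and the key model-execution log "OpExecutor::ExecuteAsync aclOp =", and check whether the operator information in the model matches between loading and execution.
Model matching failures are fairly common in single-operator model inference. The following troubleshooting procedure is provided for reference.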
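- Check the key model-loading logs.
This log indicates that a static single-operator model was loaded (debug level):
AclOpMap::Insert IN, aclOp =
This log indicates that a dynamic single-operator model was loaded (info level):
AclShapeRangeMap::Insert IN, aclOp =
If debug-level logging is not enabled, search for the INFO-level keyword (Register model. OpModelDef = ). This log is printed whenever a model is loaded, whether static or dynamic.
- Check the key model-execution logs.
This log shows the operator shapes that the user actually passed in when executing the operator (info level):
OpExecutor::ExecuteAsync aclOp =
Confirm that the loaded dynamic model matches the shape passed at execution time.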
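The log search described above can be scripted. The following is a minimal sketch, not an official tool: `find_key_logs` is a hypothetical helper name, and it assumes the log files under the log directory use a `.log` suffix. It collects the lines that contain the key markers quoted in this article.

```python
from pathlib import Path

# Key markers quoted in this article.
LOAD_MARKERS = (
    "Register model. OpModelDef =",
    "AclOpMap::Insert IN, aclOp =",
    "AclShapeRangeMap::Insert IN, aclOp =",
)
EXEC_MARKERS = ("OpExecutor::ExecuteAsync aclOp =",)


def find_key_logs(log_dir):
    """Return (load_lines, exec_lines) found in *.log files under log_dir.

    Assumption: log files end in ".log"; adjust the pattern if yours differ.
    """
    load_lines, exec_lines = [], []
    for path in sorted(Path(log_dir).expanduser().rglob("*.log")):
        with open(path, errors="replace") as fh:
            for line in fh:
                if any(marker in line for marker in LOAD_MARKERS):
                    load_lines.append(line.rstrip("\n"))
                elif any(marker in line for marker in EXEC_MARKERS):
                    exec_lines.append(line.rstrip("\n"))
    return load_lines, exec_lines
```

For example, `loaded, executed = find_key_logs("~/ascend/log")` returns the model-loading and model-execution entries as two lists, ready for side-by-side comparison.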
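- Example analysis: static single-operator model matching.
- Check the key model-loading log (Register model. OpModelDef =). This log prints the full description of the loaded model, including shape, attr, and so on.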
[INFO] ASCENDCL(30164,execute_op):2020-11-18-22:49:52.009.052 [../../../../../acl/single_op/op_model_manager.cpp:308]30164 RegisterModel: "Register model. OpModelDef = [OpModelDef] Path: op_models/0_StnPre_1_2_4_2_6_8_1_2_4_2_6_8_0_2_4_2_6_8_1_2_4_2_6_8_3_2_4_2_6_8.om, OpType: StnPre, InputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] InputDesc[1]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] InputDesc[2]: [TensorDesc] DataType = 0, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] OutputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] OutputDesc[1]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] , Attr: {align_corners = True, default_theta = [1.3, 1.2, 1.3, 1.2], size = [1, 1, 1, 1], use_default_theta = [False, False, True, False, False, False]}"
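- Check the key model-execution log (OpExecutor::ExecuteAsync aclOp =). This log prints the information the user passed when calling the execution interface, including the shape and attr used for execution.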
[INFO] ASCENDCL(30164,execute_op):2020-11-18-22:49:52.277.245 [../../../../../acl/single_op/op_executor.cpp:177]30166 ExecuteAsync: "OpExecutor::ExecuteAsync aclOp = OpType: StnPre, InputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] InputDesc[1]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] InputDesc[2]: [TensorDesc] DataType = 0, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] OutputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] OutputDesc[1]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] Attr: {align_corners = True, default_theta = [1.3, 1.2, 1.3, 1.2], size = [1, 1, 1, 1], use_default_theta = [False, False, True, False, False, False]}"
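- Static single-operator model matching succeeds only if the shape information supplied by the user is exactly the same as the information of the loaded model. Every InputDesc and OutputDesc in the two logs above must be identical; any difference causes the match to fail.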
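Comparing two long log lines by eye is error-prone. The sketch below splits a load-time line and an execute-time line into their printed descriptors and reports the ones that differ. The helper names `parse_op_desc` and `diff_static` are hypothetical, and the script only compares the text as printed in the log; it does not reproduce the actual matching logic inside ACL.

```python
import re

# Splits "InputDesc[0]: [TensorDesc] ..." style segments, keeping the names.
DESC_RE = re.compile(r"((?:Input|Output)Desc\[\d+\]): \[TensorDesc\] ")


def parse_op_desc(log_line):
    """Split one key log line into {'OpType': ..., 'InputDesc[0]': ..., 'Attr': ...}."""
    # Keep only the operator description, from "OpType:" onwards.
    body = log_line[log_line.index("OpType:"):].rstrip('"\n ')
    body, _, attr = body.partition("Attr:")
    parts = DESC_RE.split(body)
    fields = {"OpType": parts[0][len("OpType:"):].strip(" ,")}
    for name, text in zip(parts[1::2], parts[2::2]):
        fields[name] = text.strip(" ,")
    fields["Attr"] = attr.strip()
    return fields


def diff_static(load_line, exec_line):
    """Return {field: (loaded_text, executed_text)} for every field that differs."""
    loaded, executed = parse_op_desc(load_line), parse_op_desc(exec_line)
    keys = sorted(set(loaded) | set(executed))
    return {
        key: (loaded.get(key), executed.get(key))
        for key in keys
        if loaded.get(key) != executed.get(key)
    }
```

An empty result means the printed descriptors agree; any entry in the result names a descriptor worth inspecting.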
- Example analysis: dynamic single-operator model matching.
- Check the key model-loading log.
[INFO] ASCENDCL(4775,execute_mul_op):2020-11-22-00:35:36.262.762 [../../../../../acl/single_op/op_model_manager.cpp:308]4775 RegisterModel: "Register model. OpModelDef = [OpModelDef] Path: op_models/0_Where_3_2_256_3_2_-1_1.om, OpType: Where, InputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256], StorageShape = [], shapeRange = 256, 256 OutputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [-1, 1], StorageShape = [], shapeRange = [[0, 256], [1, 1]] , Attr: {}"
- Check the key model-execution log.
[INFO] ASCENDCL(4775,execute_mul_op):2020-11-22-00:35:36.516.023 [../../../../../acl/single_op/op_executor.cpp:177]4775 ExecuteAsync: "OpExecutor::ExecuteAsync aclOp = OpType: Where, InputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256], StorageShape = [], shapeRange = [] OutputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256, 1], StorageShape = [], shapeRange = [], Attr: {}"
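- For dynamic single-operator model matching, all information at execution time other than shape and shapeRange must be the same as in the loaded model, and the execution-time shape must fall within the model's shapeRange.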
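The range condition can be checked mechanically once the values are read from the logs. The following is an illustrative sketch only: `shape_in_range` is a hypothetical helper, and it assumes that shapeRange holds one [min, max] pair per dimension and that a dynamic dimension is printed as -1. Shapes of unknown rank (printed as -2) are not handled.

```python
def shape_in_range(exec_shape, model_shape, shape_range):
    """Check an execution-time shape against a dynamic model's Shape and shapeRange.

    Assumptions: shape_range has one [min, max] pair per dimension, and
    a dynamic dimension in model_shape is written as -1.
    """
    if len(exec_shape) != len(model_shape):
        return False
    for i, (actual, declared) in enumerate(zip(exec_shape, model_shape)):
        if declared == -1:
            low, high = shape_range[i]
            if not low <= actual <= high:
                return False
        elif actual != declared:
            # Fixed dimensions must be identical.
            return False
    return True
```

With the OutputDesc[0] values from the logs above, `shape_in_range([256, 1], [-1, 1], [[0, 256], [1, 1]])` evaluates to `True`, while a first dimension of 257 would fall outside the range.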
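- End-to-end flow of single-operator aclopExecute.
- Single-operator matching first tries the static map and then the dynamic map. If neither matches, matching fails.
- The following logs show that the static table was missed and matching moved on to the dynamic table. They also show that the static table was missed because the opType did not match.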
[WARNING] ASCENDCL(51297,python):2021-03-24-22:37:38.638.462 [op_model_manager.cpp:235]51297 Get: Match op type failed. opType = Equal
[INFO] ASCENDCL(51297,python):2021-03-24-22:37:38.638.466 [op_model_manager.cpp:728]51297 MatchOpModel: Match static opModels fail, begin to match model from dynamic opModels. opType = Equal
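- The following logs show that the dynamic table did not match. There are several possible reasons.
- No model for this opType exists in the dynamic table: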
[INFO] ASCENDCL(51297,python):2021-03-24-22:37:38.638.472 [op_model_manager.cpp:253]51297 GetTensorShapeStatus: GetTensorShapeStatus opType is Equal, size of shapeStatus is 0
[WARNING] ASCENDCL(51297,python):2021-03-24-22:37:38.638.475 [op_model_manager.cpp:783]51297 MatchOpModel: MatchOpModel fail from static map or dynamic map
- The inputDesc did not match:
[ERROR] ASCENDCL(51297,python):2021-03-24-22:37:38.840.261 [op_model_manager.cpp:293]51297 Get: Match op inputs failed. opType = Equal, inputDesc = 2~9_2_2_-2_false_0|9_2_false_1|
[ERROR] ASCENDCL(51297,python):2021-03-24-22:37:38.840.266 [op_model_manager.cpp:783]51297 MatchOpModel: MatchOpModel fail from static map or dynamic map
In this case, compare against the inputDesc of the dynamic model that was loaded earlier. To find it, use the method described in step 1.
- Check the information of the Equal model that has been loaded.
Insert: AclShapeRangeMap::Insert IN, aclOp = OpType: Equal, InputDesc[0]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = 2, Shape = [-2], StorageShape = [-2], shapeRange = [], memtype = 0, isConst = 0 InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], memtype = 1, isConst = 1 , isConst = true, Const Len = 2 ,Const data = 2,0, OutputDesc[0]: [TensorDesc] DataType = 12, Format = 2, StorageFormat = 2, Shape = [-2], StorageShape = [-2], shapeRange = [], memtype = 0, isConst = 0 Attr: {}
- Check the Equal operator information supplied by the user.
aclopCompileAndExecute: ExecuteAsync::aclOp = OpType: Equal, InputDesc[0]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = 2, Shape = [1], StorageShape = [1], shapeRange = [], memtype = 0, isConst = 0 InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], memtype = 1, isConst = 0 OutputDesc[0]: [TensorDesc] DataType = 12, Format = 2, StorageFormat = 2, Shape = [1], StorageShape = [1], shapeRange = [], memtype = 0, isConst = 0 Attr: {}
- Comparing the two entries above shows that for the second input tensor, isConst is 1 in the model but 0 in the user's input, so the model was not matched.
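A difference in a single flag such as isConst is easy to miss in entries this long. As a rough sketch, the printed memtype and isConst values of each tensor can be pulled out of both log lines and compared. `tensor_flags` and `flag_mismatches` are hypothetical helper names, and the parsing relies only on the field layout visible in the two entries above.

```python
import re

# Splits "InputDesc[0]: [TensorDesc] ..." style segments, keeping the names.
DESC_RE = re.compile(r"((?:Input|Output)Desc\[\d+\]): \[TensorDesc\] ")


def tensor_flags(log_line):
    """Map each tensor descriptor name to its printed (memtype, isConst) values."""
    parts = DESC_RE.split(log_line)
    flags = {}
    for name, text in zip(parts[1::2], parts[2::2]):
        memtype = re.search(r"memtype = (\d+)", text)
        is_const = re.search(r"isConst = (\d+)", text)
        flags[name] = (
            memtype.group(1) if memtype else None,
            is_const.group(1) if is_const else None,
        )
    return flags


def flag_mismatches(load_line, exec_line):
    """Return {tensor: (loaded_flags, executed_flags)} where the flags differ."""
    loaded, executed = tensor_flags(load_line), tensor_flags(exec_line)
    return {
        name: (loaded[name], executed.get(name))
        for name in loaded
        if loaded[name] != executed.get(name)
    }
```

Applied to the two Equal entries above, the result would single out InputDesc[1], which is the tensor whose isConst value differs between the loaded model and the user's input.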