ACL Single-Operator Model Matching Fails During ST Testing with the Error "[Match][OpModel]MatchOpModel"

2023/05/08

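Issue Information

Source: Official
Product category: Operator development
Keywords: ST testing, ACL single operator

Symptom

During ST testing, ACL single-operator model matching fails and the error "[Match][OpModel]MatchOpModel" is reported.

Cause Analysis

The operator information at model load time does not match the operator information at model execution time.

Solution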

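When running the ST test, enable debug logging by setting the environment variable "export ASCEND_GLOBAL_LOG_LEVEL=0". Then, under ~/ascend/log, look for the key log written when the model is loaded ("Register model. OpModelDef =") and the key log written when the model is executed ("OpExecutor::ExecuteAsync aclOp ="), and check whether the operator information in the two matches.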
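Searching the log directory by hand can be tedious, so the lookup can be sketched in Python. This is a minimal sketch, not an Ascend tool: the helper name `find_key_logs` is hypothetical, and it assumes the log files under the directory end in `.log` (adjust the glob pattern if yours differ). The two key strings are the ones quoted above.

```python
from pathlib import Path

# Key strings quoted in this article: model load and model execution.
KEY_LOGS = (
    "Register model. OpModelDef =",
    "OpExecutor::ExecuteAsync aclOp =",
)

def find_key_logs(log_root):
    """Yield (path, line_number, line) for each log line containing a key string.

    Assumption: log files are text files whose names end in ".log".
    """
    for path in sorted(Path(log_root).expanduser().rglob("*.log")):
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, 1):
                if any(key in line for key in KEY_LOGS):
                    yield path, number, line.rstrip("\n")
```

For example, iterating over `find_key_logs("~/ascend/log")` and printing each tuple lists the load and execution records in file order, ready to be compared.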

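Model matching failures are common in single-operator model inference. The following procedure for locating this kind of problem is provided for reference.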

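  1. Check the key model-loading logs.

    This log indicates that a static single-operator model was loaded (debug level):

    AclOpMap::Insert IN, aclOp =

    This log indicates that a dynamic single-operator model was loaded (info level):

    AclShapeRangeMap::Insert IN, aclOp =

    If debug-level logging is not enabled, search for the INFO-level keyword (Register model. OpModelDef = ). This log is printed whenever a model is loaded, whether static or dynamic.

  2. Check the key model-execution log.

    This log shows the operator shapes that the user actually passed at op execution (info level):

    OpExecutor::ExecuteAsync aclOp =

    Check whether the loaded dynamic model matches the shapes passed at execution.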

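  3. Example analysis: matching a static single-operator model.

    1. Check the key model-loading log (Register model. OpModelDef =). It prints the full description of the loaded model, including shape, attr, and so on.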
      [INFO] ASCENDCL(30164,execute_op):2020-11-18-22:49:52.009.052 [../../../../../acl/single_op/op_model_manager.cpp:308]30164 RegisterModel: "Register model. OpModelDef = [OpModelDef] Path: op_models/0_StnPre_1_2_4_2_6_8_1_2_4_2_6_8_0_2_4_2_6_8_1_2_4_2_6_8_3_2_4_2_6_8.om, 
      OpType: StnPre, 
      InputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = []
      InputDesc[1]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      InputDesc[2]: [TensorDesc] DataType = 0, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      OutputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      OutputDesc[1]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] , 
      Attr: {align_corners = True, default_theta = [1.3, 1.2, 1.3, 1.2], size = [1, 1, 1, 1], use_default_theta = [False, False, True, False, False, False]}"
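    2. Check the key model-execution log (OpExecutor::ExecuteAsync aclOp =). It prints the input shape information from the user's call to the execution interface, including the shape, attr, and so on used for execution.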
      [INFO] ASCENDCL(30164,execute_op):2020-11-18-22:49:52.277.245 [../../../../../acl/single_op/op_executor.cpp:177]30166 ExecuteAsync: "OpExecutor::ExecuteAsync aclOp = OpType: StnPre, 
      InputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      InputDesc[1]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      InputDesc[2]: [TensorDesc] DataType = 0, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = []
      OutputDesc[0]: [TensorDesc] DataType = 1, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      OutputDesc[1]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [4, 2, 6, 8], StorageShape = [], shapeRange = [] 
      Attr: {align_corners = True, default_theta = [1.3, 1.2, 1.3, 1.2], size = [1, 1, 1, 1], use_default_theta = [False, False, True, False, False, False]}"
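    3. Static single-operator model matching succeeds only when the shape information passed by the user is exactly identical to that of the loaded model. Every InputDesc and OutputDesc in the two logs above must be identical; any difference causes matching to fail.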
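Comparing the two records field by field is easy to get wrong by eye. The sketch below parses the InputDesc/OutputDesc lines of a load record and an execution record and reports every field that differs. The helper names `parse_descs` and `diff_static` are hypothetical, and the parsing assumes the `key = value` layout seen in the logs above; it is an offline aid, not how ACL itself performs matching.

```python
import re

# One TensorDesc line, e.g. "InputDesc[0]: [TensorDesc] DataType = 1, ..."
DESC_RE = re.compile(r"(InputDesc|OutputDesc)\[(\d+)\]: \[TensorDesc\] (.*)")
# "key = value", where value is a nested list, a flat list, or a bare token.
FIELD_RE = re.compile(r"(\w+) = (\[\[.*?\]\]|\[.*?\]|[^,\s]+)")

def parse_descs(log_text):
    """Map names such as 'InputDesc[0]' to a dict of their fields."""
    descs = {}
    for kind, index, rest in DESC_RE.findall(log_text):
        rest = rest.split("Attr:")[0]  # ignore an Attr block on the same line
        descs[f"{kind}[{index}]"] = dict(FIELD_RE.findall(rest))
    return descs

def diff_static(loaded_text, executed_text):
    """Return (desc, field, loaded_value, executed_value) for each mismatch."""
    loaded, executed = parse_descs(loaded_text), parse_descs(executed_text)
    mismatches = []
    for name in sorted(set(loaded) | set(executed)):
        a, b = loaded.get(name, {}), executed.get(name, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((name, field, a.get(field), b.get(field)))
    return mismatches
```

An empty result means the tensor descriptions agree, as they do for the two StnPre records above; any tuple in the result names a field that would make static matching fail.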

  4. Example analysis: matching a dynamic single-operator model.

    1. Check the key model-loading log.
      [INFO] ASCENDCL(4775,execute_mul_op):2020-11-22-00:35:36.262.762 [../../../../../acl/single_op/op_model_manager.cpp:308]4775 RegisterModel: "Register model. OpModelDef = [OpModelDef] Path: op_models/0_Where_3_2_256_3_2_-1_1.om, 
      OpType: Where, 
      InputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256], StorageShape = [], shapeRange = 256, 256 
      OutputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [-1, 1], StorageShape = [], shapeRange = [[0, 256], [1, 1]] , Attr: {}"
    2. Check the key model-execution log.
      [INFO] ASCENDCL(4775,execute_mul_op):2020-11-22-00:35:36.516.023 [../../../../../acl/single_op/op_executor.cpp:177]4775 ExecuteAsync: "OpExecutor::ExecuteAsync aclOp = OpType: Where, 
      InputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256], StorageShape = [], shapeRange = [] 
      OutputDesc[0]: [TensorDesc] DataType = 3, Format = 2, StorageFormat = -1, Shape = [256, 1], StorageShape = [], shapeRange = [], Attr: {}"
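    3. For dynamic single-operator model matching, all information at execution other than shape and shapeRange must be identical to that of the loaded model. In addition, the shape passed at execution must fall within the model's shapeRange.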
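The range condition can be illustrated with a few lines of Python. This is a sketch under stated assumptions: `shape_in_range` is a hypothetical helper, each shapeRange entry is read as an inclusive [min, max] pair (consistent with the [1, 1] entry for a fixed dimension of 1 in the log above), and special cases such as an empty shapeRange are not handled.

```python
def shape_in_range(exec_shape, shape_range):
    """True if exec_shape has the same rank as shape_range and every
    dimension lies within its [min, max] pair (bounds assumed inclusive)."""
    if len(exec_shape) != len(shape_range):
        return False
    return all(lo <= dim <= hi for dim, (lo, hi) in zip(exec_shape, shape_range))

# Output shapeRange of the Where model from the load log above.
model_output_range = [[0, 256], [1, 1]]

print(shape_in_range([256, 1], model_output_range))  # True
print(shape_in_range([300, 1], model_output_range))  # False
```

The first call uses the output shape [256, 1] from the execution log above, which lies inside the loaded range; the second shows a first dimension that exceeds the upper bound of 256.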

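  5. End-to-end flow of single-operator aclopExecute.

    1. Single-operator matching first tries the static map and then the dynamic map. If neither matches, matching fails.
    2. The following logs show that the dynamic table is tried after a miss in the static table, and that the static table missed because the opType did not match.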
      [WARNING] ASCENDCL(51297,python):2021-03-24-22:37:38.638.462 [op_model_manager.cpp:235]51297 Get: Match op type failed. opType = Equal
      [INFO] ASCENDCL(51297,python):2021-03-24-22:37:38.638.466 [op_model_manager.cpp:728]51297 MatchOpModel: Match static opModels fail, begin to match model from dynamic opModels. opType = Equal 
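    3. The following logs show that the dynamic table did not match either. There are several possible reasons.
      • No model for this opType exists in the dynamic table: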
        [INFO] ASCENDCL(51297,python):2021-03-24-22:37:38.638.472 [op_model_manager.cpp:253]51297 GetTensorShapeStatus: GetTensorShapeStatus opType is Equal, size of shapeStatus is 0
        [WARNING] ASCENDCL(51297,python):2021-03-24-22:37:38.638.475 [op_model_manager.cpp:783]51297 MatchOpModel: MatchOpModel fail from static map or dynamic map
      • The inputDesc did not match:
        [ERROR] ASCENDCL(51297,python):2021-03-24-22:37:38.840.261 [op_model_manager.cpp:293]51297 Get: Match op inputs failed. opType = Equal, inputDesc = 2~9_2_2_-2_false_0|9_2_false_1|
        [ERROR] ASCENDCL(51297,python):2021-03-24-22:37:38.840.266 [op_model_manager.cpp:783]51297 MatchOpModel: MatchOpModel fail from static map or dynamic map

        In this case, compare against the inputDesc of the dynamic model that was loaded earlier. To find it, use the method in step 1.
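The messages in step 5 can be turned into a small lookup for triaging a log quickly. This is a sketch only: `diagnose` and the `DIAGNOSES` table are hypothetical names, and the marker strings and their meanings are limited to the ones shown in the logs above.

```python
# Marker substring in the log line -> meaning, as described in step 5.
DIAGNOSES = (
    ("Match op type failed", "static table: opType did not match"),
    ("Match static opModels fail", "static table missed; trying the dynamic table"),
    ("size of shapeStatus is 0", "dynamic table: no model for this opType"),
    ("Match op inputs failed", "inputDesc did not match"),
    ("MatchOpModel fail from static map or dynamic map",
     "neither table matched; matching failed"),
)

def diagnose(log_line):
    """Return the meanings of all known markers found in one log line."""
    return [meaning for marker, meaning in DIAGNOSES if marker in log_line]
```

Calling `diagnose` on each WARNING or ERROR line from the ASCENDCL module shows which stage of the static-then-dynamic lookup rejected the operator.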

  6. Check the information of the Equal model that was loaded.

    Insert: AclShapeRangeMap::Insert IN, aclOp = OpType: Equal, 
    InputDesc[0]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = 2, Shape = [-2], StorageShape = [-2], shapeRange = [], memtype = 0, isConst = 0 
    InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], memtype = 1, isConst = 1 , isConst = true, Const Len = 2 ,Const data = 2,0, 
    OutputDesc[0]: [TensorDesc] DataType = 12, Format = 2, StorageFormat = 2, Shape = [-2], StorageShape = [-2], shapeRange = [], memtype = 0, isConst = 0 Attr: {}

  7. Check the Equal model information passed by the user.

    aclopCompileAndExecute: ExecuteAsync::aclOp = OpType: Equal, 
    InputDesc[0]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = 2, Shape = [1], StorageShape = [1], shapeRange = [], memtype = 0, isConst = 0 
    InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], memtype = 1, isConst = 0 
    OutputDesc[0]: [TensorDesc] DataType = 12, Format = 2, StorageFormat = 2, Shape = [1], StorageShape = [1], shapeRange = [], memtype = 0, isConst = 0 Attr: {}

  8. Comparing the two shows that for the second input tensor, isConst is 1 in the model but 0 in the user's input, so the model is not matched.
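The comparison in steps 6 to 8 can be reproduced mechanically. The sketch below parses the two InputDesc[1] lines quoted above and prints the fields whose values differ. The helper names `fields` and `differing_fields` are hypothetical; when a key is printed twice on a line (as isConst is in the load log), the first value is kept, and only keys present on both lines are compared.

```python
import re

# "key = value", where value is a nested list, a flat list, or a bare token.
FIELD_RE = re.compile(r"(\w+) = (\[\[.*?\]\]|\[.*?\]|[^,\s]+)")

def fields(desc_line):
    """Parse 'key = value' pairs from one TensorDesc line (first value wins)."""
    parsed = {}
    for key, value in FIELD_RE.findall(desc_line):
        parsed.setdefault(key, value)
    return parsed

def differing_fields(loaded_line, executed_line):
    """Return {field: (loaded, executed)} for shared fields that differ."""
    loaded, executed = fields(loaded_line), fields(executed_line)
    return {key: (loaded[key], executed[key])
            for key in loaded
            if key in executed and loaded[key] != executed[key]}

loaded_line = ("InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, "
               "StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], "
               "memtype = 1, isConst = 1 , isConst = true, Const Len = 2 ,Const data = 2,0, ")
executed_line = ("InputDesc[1]: [TensorDesc] DataType = 9, Format = 2, "
                 "StorageFormat = -1, Shape = [], StorageShape = [], shapeRange = [], "
                 "memtype = 1, isConst = 0 ")

print(differing_fields(loaded_line, executed_line))  # {'isConst': ('1', '0')}
```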