dynamo export Feature

Overview

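dynamo_export exports a graph in air format. The exported inference model no longer depends on the PyTorch framework and can be loaded and executed directly by the CANN software stack. This removes the performance overhead of framework scheduling and makes the model easier to port across deployment environments.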

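The part being exported must be capturable as a single graph.
Export is supported in both single-device and multi-device scenarios, and the exported graph may contain communication operators such as allreduce.
Because of TorchDynamo constraints, dynamic (data-dependent) if/else control flow is not supported.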
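To illustrate the if/else constraint, and why the sample inputs passed to dynamo_export matter, here is a plain-Python toy. It is not TorchDynamo: the hypothetical `trace` helper simply runs a function once on a sample input and records only the operations on the branch that input actually takes. A graph captured this way contains one branch, so a different input could call for a different graph.

```python
# Toy illustration only: this is NOT how TorchDynamo traces a model.
# It shows that recording one run captures only the branch that input takes.

def forward(x, ops):
    """Toy model with a data-dependent if/else (the pattern to avoid)."""
    if sum(x) > 0:                # branch chosen by the input's values
        ops.append("mul")
        return [v * 2 for v in x]
    ops.append("neg")
    return [-v for v in x]


def trace(fn, sample):
    """Run fn once on a sample input and return the recorded op sequence."""
    ops = []
    fn(sample, ops)
    return ops


if __name__ == "__main__":
    print(trace(forward, [1.0, 2.0]))    # ['mul']
    print(trace(forward, [-1.0, -2.0]))  # ['neg']
```

In a real model, a data-dependent branch can often be rewritten with tensor operations such as `torch.where`, so that both paths become part of one graph; whether a particular rewrite exports cleanly should be checked case by case.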

Usage

def dynamo_export(*args, model: torch.nn.Module, export_path: str = "export_file", export_name: str = "export", dynamic: bool = False, config=CompilerConfig(), **kwargs)

Table 1 Parameters
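Parameter	Description	Required
model	Model to export.	Yes
export_path	Directory in which the exported files are stored. Defaults to the "export_file" folder under the current path.	No
export_name	Name of the exported offline model. Defaults to "export".	No
dynamic	Whether to export a static or a dynamic model. Defaults to False, which exports a static model; set it to True to export a dynamic model.	No
config	Feature switches, set through a CompilerConfig object. Currently supported: (1) in front-end split scenarios (that is, when the Python script itself contains collective communication logic), automatically generating sample template JSON configuration files for ATC (Ascend Tensor Compiler); this is disabled (False) by default, in which case no configuration file is generated; (2) recording nn_module_stack information in the exported graph, so that the template can be applied for back-end splitting. A construction example follows this table.	No
*args, **kwargs	Sample inputs used to export the model. Different inputs may drive the model into different branches and therefore produce different traced graphs, so choose values typical of actual inference.	Yes

Example of constructing config:

import torch_npu
import torchair as tng

config = tng.CompilerConfig()
# Automatically generate sample ATC JSON configuration files
config.export.experimental.auto_atc_config_generated = True
# Record nn_module_stack information in the exported graph
config.export.experimental.enable_record_nn_module_stack = True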

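export_path: when an air graph is exported, weights are stored externally, so the weights are also saved under this path, and the path is recorded in the FileConstant nodes of the air file. The path may be relative or absolute. With a relative path, ATC compilation and offline-model execution must also be run from the directory the relative path is based on (its parent directory). With an absolute path there is no such restriction, but if the compiled model is copied to another server, the same absolute path must exist there; otherwise the weight files cannot be found.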
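The sketch below illustrates the relative-versus-absolute rule. It assumes, for illustration, that a relative export_path is resolved against the current working directory of the ATC compilation or execution process; `resolve_weight_path` and the example directories are hypothetical and not part of torchair.

```python
import posixpath


def resolve_weight_path(export_path, weight_file, cwd):
    """Where a weight file would be looked up (illustrative sketch).

    Assumption: a relative export_path is resolved against `cwd`, the
    working directory of the ATC compilation / model execution process.
    An absolute export_path is used as-is, so `cwd` does not matter.
    """
    if posixpath.isabs(export_path):
        return posixpath.normpath(posixpath.join(export_path, weight_file))
    return posixpath.normpath(posixpath.join(cwd, export_path, weight_file))


if __name__ == "__main__":
    # Relative path: correct only when run from the directory used at export time
    print(resolve_weight_path("./test_export_file_False", "p1", "/home/user/example_export"))
    # -> /home/user/example_export/test_export_file_False/p1
    print(resolve_weight_path("./test_export_file_False", "p1", "/tmp"))
    # -> /tmp/test_export_file_False/p1 (the weights are not there)
    # Absolute path: independent of cwd, but must exist on every target server
    print(resolve_weight_path("/data/export", "p1", "/tmp"))
    # -> /data/export/p1
```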
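Exported weights: if a weight is small enough, it is stored inside the .air file (export.air by default) and no separate weight file is generated. If the weights are so large that the air file would become too big, they are instead written as separate weight files under the export path (such as p1 and p2 in the examples). To find out whether separate weight files were generated, check whether dynamo.pbtxt contains FileConstant nodes.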
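A minimal way to run this check from a script is a plain text search. The helper names below are hypothetical, and the sketch assumes the node type appears literally as `FileConstant` in dynamo.pbtxt; it is a substring check, not a protobuf parser.

```python
from pathlib import Path


def has_file_constant(pbtxt_text):
    """True if the graph dump text mentions a FileConstant node (substring check)."""
    return "FileConstant" in pbtxt_text


def exported_weights_are_external(pbtxt_path):
    """Read a dynamo.pbtxt file and report whether weights went to separate files.

    Assumption: the FileConstant node type appears verbatim in the file.
    """
    text = Path(pbtxt_path).read_text(encoding="utf-8", errors="replace")
    return has_file_constant(text)
```

For the single-device example, this would be called as `exported_weights_are_external("./test_export_file_False/dynamo.pbtxt")`; in the multi-device case, each rank_N subdirectory has its own dynamo.pbtxt to check.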
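Recording nn_module_stack information is subject to the following constraints:
- Layers in the front-end script must be defined as an array, for example layer[0] = xxx, layer[1] = xxx. If the variable names are not in array form, identical model structures are executed repeatedly under the same name, so the layer structure cannot be recovered from the stack information and back-end splitting is not possible.
- nn_module_stack information (enable_record_nn_module_stack) can only be obtained when the model's module hierarchy is at least two levels deep.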
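The reason for the array-style naming can be sketched in plain Python. The sketch assumes that the recorded stack entries look like PyTorch's dotted submodule names (for example `layers.0.mlp`, the form used for modules stored in a `torch.nn.ModuleList`); the exact format torchair writes may differ, and `group_by_layer` is a hypothetical helper. With indexed names the entries split cleanly per layer; with one reused name they do not.

```python
import re
from collections import defaultdict


def group_by_layer(module_paths, container="layers"):
    """Group dotted module names by the list index that follows `container`.

    Assumption (illustrative): names look like "layers.0.mlp", the form PyTorch
    uses for submodules held in a list container such as torch.nn.ModuleList.
    Names without such an index are collected under the key None.
    """
    pattern = re.compile(rf"(?:^|\.){re.escape(container)}\.(\d+)(?:\.|$)")
    groups = defaultdict(list)
    for path in module_paths:
        match = pattern.search(path)
        key = int(match.group(1)) if match else None
        groups[key].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Array-style definition: each layer is distinguishable
    print(group_by_layer(["layers.0.attn", "layers.0.mlp", "layers.1.attn", "layers.1.mlp"]))
    # One module reused for every layer: repeated identical names, no layer boundaries
    print(group_by_layer(["block.attn", "block.mlp", "block.attn", "block.mlp"]))
```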

Examples

Single-device dynamo export example:

import torch
import torch_npu
import torchair

class Model(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.p1 = torch.nn.Parameter(torch.randn(2, 4))
        self.p2 = torch.nn.Parameter(torch.randn(2, 4))

    def forward(self, x, y):
        x = x + y + self.p1 + self.p2
        return x

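model = Model()
# Sample inputs used for tracing; choose values typical of real inference
x = torch.randn(2, 4)
y = torch.randn(2, 4)

# Export a static graph (dynamic=False) into ./test_export_file_False
torchair.dynamo_export(x, y, model=model, export_path="./test_export_file_False", dynamic=False)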

Run the following command to view the export results:

[root@localhost example_export]# tree
.
├── example1.py
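└── test_export_file_False  // Export folder given by export_path; created automatically if it does not exist
    ├── dynamo.pbtxt       // Human-readable graph dump, for debugging
    ├── export.air        // Exported model file, the input to ATC compilation; its FileConstant nodes record the weight paths and file names
    ├── p1               // Exported weight file
    └── p2              // Exported weight file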

1 directory, 5 files

Multi-device dynamo export example:

import os

import torch
import torch_npu
import torchair
from torchair import CompilerConfig

class AllReduceSingleGroup(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.p1 = torch.nn.Parameter(torch.tensor([[1.1, 1.1], [1.1, 1.1]]))
        self.p2 = torch.nn.Parameter(torch.tensor([[2.2, 2.2], [3.3, 3.3]]))

    def forward(self, x, y):
        x = x + y + self.p1 + self.p2
        # Collective communication inside the model; exported as an allreduce operator
        torch.distributed.all_reduce(x)
        return x

def example(rank, world_size):
    # Each spawned process joins the default process group
    torch.distributed.init_process_group("gloo", rank=rank, world_size=world_size)
    x = torch.ones([2, 2], dtype=torch.int32)
    y = torch.ones([2, 2], dtype=torch.int32)
    mod = AllReduceSingleGroup()
    config = CompilerConfig()
    # Generate ATC JSON configuration templates and record nn_module_stack
    config.export.experimental.auto_atc_config_generated = True
    config.export.experimental.enable_record_nn_module_stack = True
    torchair.dynamo_export(x, y, model=mod, dynamic=True, export_path="./mp",
                           export_name="mp_rank", config=config)

def mp():
    world_size = 2
    # Start one process per device; each process calls example(rank, world_size)
    torch.multiprocessing.spawn(example, args=(world_size,), nprocs=world_size, join=True)

if __name__ == '__main__':
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29505"
    mp()

Run the following command to view the export results:

[root@localhost example_export]# tree
.
├── example1.py
├── example2.py
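└── mp
    ├── model_relation_config.json
    ├── mp_rank0.air                  // Model file exported by the first device
    ├── mp_rank1.air                  // Model file exported by the second device
    ├── numa_config.json
    ├── rank_0                    // Subdirectory for the first device
    │   ├── dynamo.pbtxt        // Human-readable graph dump, for debugging
    │   ├── p1                 // Exported weight file
    │   └── p2                 // Exported weight file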
    └── rank_1
        ├── dynamo.pbtxt
        ├── p1
        └── p2

3 directories, 12 files

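mp: the export folder, that is, export_path; created automatically if it does not exist.
mp_rank0/mp_rank1: formed by appending the rank ID to the specified export_name.
model_relation_config.json, numa_config.json: template JSON configuration files for ATC compilation, generated automatically in front-end split scenarios.
Adjust the fields in these JSON files to your needs. For the meaning of each field, see the "--model_relation_config" and "--cluster_config" sections of the CANN ATC Tool User Guide. In multi-device scenarios, all item entries are generated in the table whose node_id is 0; distribute them manually across the appropriate nodes as required.
mp/rank_0 and mp/rank_1: generated subdirectories holding each device's dynamo.pbtxt graph dump and weight files (if the weights are not stored in the air file).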
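The naming rules shown in the two directory trees in this section can be summarized in a short sketch. `expected_export_files` is a hypothetical helper derived only from those trees; it omits external weight files (p1, p2, ...) because they depend on the model, and it does not query torchair.

```python
import os
from posixpath import join


def expected_export_files(export_path, export_name="export", world_size=None,
                          auto_atc_config=False):
    """Files the examples show dynamo_export producing (weight files omitted).

    world_size=None models the single-device case; otherwise one .air file
    per rank (export_name + rank id) plus a rank_<id>/dynamo.pbtxt per rank.
    """
    if world_size is None:  # single-device export
        return [join(export_path, f"{export_name}.air"),
                join(export_path, "dynamo.pbtxt")]
    files = []
    for rank in range(world_size):
        files.append(join(export_path, f"{export_name}{rank}.air"))
        files.append(join(export_path, f"rank_{rank}", "dynamo.pbtxt"))
    if auto_atc_config:  # ATC JSON templates (front-end split scenario)
        files += [join(export_path, "model_relation_config.json"),
                  join(export_path, "numa_config.json")]
    return files


if __name__ == "__main__":
    files = expected_export_files("./mp", "mp_rank", world_size=2, auto_atc_config=True)
    # List anything the multi-device example should have produced but did not
    print("missing:", [f for f in files if not os.path.exists(f)])
```

Comparing this list against the actual directory after a multi-device run is a quick way to spot a rank that failed to export.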
