Fixing the Addresses of Weight-Type Inputs

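Overview

In inference scenarios, this feature marks tensors whose addresses do not change during model execution as Parameter type. This shortens graph dispatch time and improves dispatch performance.

This feature is generally suited to large models such as ChatGPT and LLAMA. Enable it according to your own needs.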

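Usage

This feature is configured through the compiler_config parameter of torchair.get_npu_backend. A configuration example follows; see Table 1 for the parameter description.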

import torch_npu, torchair
config = torchair.CompilerConfig()
# Switch for fixing the addresses of weight-type inputs
config.experimental_config.frozen_parameter = True
npu_backend = torchair.get_npu_backend(compiler_config=config)

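Table 1 Parameter description
Parameter	Description	Required
frozen_parameter	Whether to fix the addresses of weight-type inputs during graph execution. False (default): the addresses are not fixed. True: the addresses are fixed.	No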

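Special Scenarios

PyTorch's to operator drops the Parameter type during conversion. Therefore, first convert the CPU tensor to an NPU tensor, and then convert the plain tensor to a Parameter-type tensor, for example with torch.nn.Parameter(tensor).

An example of conversion with the PyTorch to operator: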

import torch
import torch_npu
import torchair

config = torchair.CompilerConfig()
config.experimental_config.frozen_parameter = True
npu_backend = torchair.get_npu_backend(compiler_config=config)

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y, z):
        return torch.add(x, y*z)
 
model = Model()
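# Correct: convert the tensor to an NPU tensor first, then wrap it as a Parameter; in1 is a Parameter
in1 = torch.nn.Parameter(torch.randn(4, 1).float().npu())
# Incorrect: wrap as a Parameter first, then convert to an NPU tensor; the result is not a Parameter.
# Left commented out so that it does not overwrite the correct in1 above.
# in1 = torch.nn.Parameter(torch.randn(4, 1).float()).npu()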
in2 = torch.randn(4, 4).float().npu()
in3 = torch.randn(4, 4).int().npu()
model = torch.compile(model, backend=npu_backend, dynamic=True)
graph_result = model(in1, in2, in3)
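
The conversion order can be checked with isinstance before compiling. The following is a minimal sketch that runs on CPU only: it uses a dtype conversion through to() as a stand-in for the device conversion, on the assumption that both drop the Parameter type in the same way. The helper names as_device_parameter and is_parameter are hypothetical and are not part of torch or torchair.

```python
import torch


def as_device_parameter(tensor, device="cpu", dtype=None):
    """Hypothetical helper: convert the plain tensor first, then wrap it as a Parameter."""
    converted = tensor.to(device=device, dtype=dtype)
    return torch.nn.Parameter(converted)


def is_parameter(tensor):
    """Return True if the tensor still carries the Parameter type."""
    return isinstance(tensor, torch.nn.Parameter)


# Convert first, wrap second: the result is a Parameter.
good = as_device_parameter(torch.randn(4, 1), dtype=torch.float64)

# Wrap first, convert second: to() returns a plain tensor, so the Parameter type is lost.
bad = torch.nn.Parameter(torch.randn(4, 1)).to(torch.float64)

print(is_parameter(good), is_parameter(bad))
```

On an NPU device, the same check would presumably be applied to each weight-type input (with device set to the NPU device) before the input is passed to the compiled model.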
