
Compiling and Optimizing Text Embedding/Reranking Models with MindIE Torch

Use MindIE Torch to compile and optimize text embedding models and reranking models, then save the compiled weights.

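  • Example 1 (text embedding model):
    Create a Python file that compiles and optimizes the text embedding model, and save the resulting pt file in the model weights folder.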
    import torch
    import mindietorch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModel
    
    MIN_SHAPE = (1, 1)
    MAX_SHAPE = (300, 512)
    BATCH_SIZE = 300
    sentences = ["This is a sentence." for _ in range(BATCH_SIZE)]
    
    # load model
    with torch.no_grad():
        # Replace model_id with the actual model path
        model_id = '/home/data/embedding_models/bge-large-zh-v1.5'
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModel.from_pretrained(model_id, torchscript=True)
        model.eval()
    
        inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt', max_length=512)
        inputs['input_ids'] = inputs['input_ids'].to(torch.int32)
        inputs['attention_mask'] = inputs['attention_mask'].to(torch.int32)
        model = torch.jit.trace(model, [inputs['input_ids'], inputs['attention_mask']], strict=False)
        
        # compile
        dynamic_inputs = []
        dynamic_inputs.append(mindietorch.Input(min_shape=MIN_SHAPE, max_shape=MAX_SHAPE, dtype=inputs['input_ids'].dtype))
        dynamic_inputs.append(mindietorch.Input(min_shape=MIN_SHAPE, max_shape=MAX_SHAPE, dtype=inputs['attention_mask'].dtype))
        compiled_model = mindietorch.compile(
           model,
           inputs = dynamic_inputs,
           precision_policy = mindietorch.PrecisionPolicy.FP32,
           truncate_long_and_double=True,
           require_full_compilation=False,
           allow_tensor_replace_int=False,
           min_block_size=3,
           torch_executed_ops=[],
           soc_version="Ascend310xxx",   # Ascend AI processor type; set it to match the server's device type
           optimization_level=0
        )
    
        # save model
        compiled_model.save(model_id+"/compiled_model.pt")
        print('compiled model saved!')
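
The MIN_SHAPE and MAX_SHAPE values above declare the (batch size, sequence length) range the compiled model is built for. As a minimal sketch, a tokenized batch could be checked against that range before it is sent to the compiled model. The helper shape_in_range is hypothetical and not part of mindietorch, and the source does not say how mindietorch itself reacts to out-of-range shapes.

```python
# Hypothetical helper, not a mindietorch API: checks a batch shape against
# the dynamic range declared through MIN_SHAPE / MAX_SHAPE in the example.
MIN_SHAPE = (1, 1)
MAX_SHAPE = (300, 512)


def shape_in_range(shape, min_shape=MIN_SHAPE, max_shape=MAX_SHAPE):
    """Return True if every dimension lies within [min, max] for its axis."""
    if len(shape) != len(min_shape) or len(shape) != len(max_shape):
        return False
    return all(lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


print(shape_in_range((300, 512)))  # batch of 300, sequence length 512
print(shape_in_range((301, 512)))  # batch size above the declared maximum
```

With a real batch, the shape to check would be tuple(inputs['input_ids'].shape); a batch that fails the check could be split or truncated before inference.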
    
  • Example 2 (reranking model):

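    The Arm build of PyTorch 2.1.0 cannot trace reranker models: in that version, a Linear operation with out_features=1 hits an issue that has not been fixed yet and fails with RuntimeError: could not create a primitive descriptor for a matmul primitive. Run the trace step in a Python environment with torch>=2.2.0. After trace.py has finished, you can continue to use torch 2.1.0 and torch_npu.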

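    1. First create a Python environment that is isolated from the current one, using a container, Anaconda, or venv. The Python version should preferably match the NPU development environment, with torch>=2.2.0. For example:
      # For Anaconda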
      conda create -n myenv python=3.10
      conda activate myenv
      conda install pytorch==2.2.0 -c pytorch
      
      # For venv
      python3.10 -m venv myenv
      source myenv/bin/activate   # Linux/macOS
      # or: myenv\Scripts\activate  # Windows
      pip install --upgrade pip
      pip install torch==2.2.0
      
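    2. Modify the transformers source code to avoid numeric overflow.
      1. Run the following command to find where transformers is installed:
        pip show transformers

        The Location field of the output gives the site-packages directory; the package source is in its transformers subdirectory, for example:

        /usr/local/python3.10/site-packages/transformers
      2. Run the following command to open modeling_utils.py:
        vim /usr/local/python3.10/site-packages/transformers/modeling_utils.py
      3. Locate the following line in the invert_attention_mask method:
        extended_attention_mask = (1.0 - extended_attention_mask) * torch.finfo(dtype).min

        Change it to:

        extended_attention_mask = (1.0 - extended_attention_mask) * (-1000)
    3. Create and run the following Python file to perform the trace, saving the ts file in the model weights folder.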
      import torch
      from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModel
      
      pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
      with torch.no_grad():
          model_path = '/home/data/TEI_models/bge-reranker-large'  # Replace model_path with the actual model path
          tokenizer = AutoTokenizer.from_pretrained(model_path)
          model = AutoModelForSequenceClassification.from_pretrained(model_path, trust_remote_code=True, torchscript=True)
          model.eval()
          inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
          inputs['input_ids'] = inputs['input_ids'].to(torch.int32)
          inputs['attention_mask'] = inputs['attention_mask'].to(torch.int32)
          model = torch.jit.trace(model, [inputs['input_ids'], inputs['attention_mask']], strict=False)
          model.save(model_path+"/traced_model.ts")
          print('traced model saved!')
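    4. After the trace succeeds, leave the isolated environment:
      # For Anaconda
      conda deactivate
      
      # For venv
      deactivate
    5. In the NPU development environment, create and run the following Python file to perform the compile step. It saves the pt file in the model weights folder and completes the compilation and optimization of the model.

      Compilation takes about 30 minutes in total. The model has compiled successfully once 'compiled model saved!' is printed.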

      import torch
      import mindietorch
      from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModel
      import sys
      
      MIN_SHAPE = (1, 1)
      MAX_SHAPE = (300, 512)
      
      with torch.no_grad():
          model_id = '/home/data/TEI_models/bge-reranker-large'  # Replace model_id with the actual model path
          tokenizer = AutoTokenizer.from_pretrained(model_id)
          model = torch.jit.load(model_id+'/traced_model.ts')
          model.eval()
      
          dynamic_inputs = []
          dynamic_inputs.append(mindietorch.Input(min_shape=MIN_SHAPE, max_shape=MAX_SHAPE, dtype=torch.int32))
          dynamic_inputs.append(mindietorch.Input(min_shape=MIN_SHAPE, max_shape=MAX_SHAPE, dtype=torch.int32))
      
          compiled_model = mindietorch.compile(
             model,
             inputs = dynamic_inputs,
             precision_policy = mindietorch.PrecisionPolicy.FP16,
             truncate_long_and_double=True,
             require_full_compilation=False,
             allow_tensor_replace_int=False,
             min_block_size=3,
             torch_executed_ops=[],
             # Set soc_version for your hardware: "xxxxx" is the five-character value of the 'Name' field printed by npu-smi info
             soc_version="Ascendxxxxx",
             optimization_level=0
           )
      
          compiled_model.save(model_id+"/compiled_model.pt")
          print('compiled model saved!')
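
The source edit in step 2 replaces torch.finfo(dtype).min with -1000 to avoid overflow. One plausible reading, which is an interpretation and not something stated above, is that the FP32 minimum gets baked into the traced graph as a constant and is no longer representable once the model is compiled with PrecisionPolicy.FP16. The sketch below illustrates only the representability part, using Python's struct module instead of torch; fits_in_fp16 is a hypothetical helper.

```python
import struct


def fits_in_fp16(value):
    """Return True if value can be packed as an IEEE 754 half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


FP32_MIN = -3.4028234663852886e38  # value of torch.finfo(torch.float32).min
FP16_MAX = 65504.0                 # largest finite FP16 magnitude

print(fits_in_fp16(FP32_MIN))  # original mask fill value
print(fits_in_fp16(FP16_MAX))  # boundary of the FP16 range
print(fits_in_fp16(-1000.0))   # replacement mask fill value
```

A fill value of -1000 stays well inside the FP16 range while still being negative enough to drive the masked attention weights close to zero after softmax.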