Error "RuntimeError: context has already been set" Is Reported When Running a Model

Problem Description

Error message:

Traceback (most recent call last):
  File "bad.py", line 6, in <module>
    mp.set_start_method('spawn')
  File "/usr/local/python3.8.5/lib/python3.8/multiprocessing/context.py", line 243, in set_start_method
    raise RuntimeError('context has already been set')
RuntimeError: context has already been set
[ERROR] 2024-11-28-14:57:00 (PID:731691, Device:-1, RankID:-1) ERR99999 UNKNOWN application exception

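Problem Analysis

Ascend NPU relies on the multi-process management capability of Python's multiprocessing.get_context. Because of a constraint in Python itself, once the multiprocessing context has been set, a further call to mp.set_start_method('spawn') that tries to set the process start method again makes Python raise RuntimeError.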
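The Python constraint described above can be reproduced without any NPU software. The following is a minimal sketch using only the standard library; the helper name second_call_error is hypothetical, and force=True is used on the first call only so that the demo behaves the same even if the start method was already set earlier in the process.

```python
import multiprocessing as mp


def second_call_error(method="spawn"):
    """Set the start method twice and return the message raised by the second call."""
    # First call: fix the global start method. force=True makes the demo
    # repeatable even if something has already set the context.
    mp.set_start_method(method, force=True)
    try:
        # Second call without force: Python refuses to change the context.
        mp.set_start_method(method)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(second_call_error())
```

The second call fails even though it requests the same start method, which matches the traceback in the problem description.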

Solution

Handle the exception with a try-except statement, as follows:

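import torch
import torch_npu
import multiprocessing as mp

if __name__ == "__main__":
    try:
        # Raises RuntimeError if the start method has already been set.
        mp.set_start_method('spawn')
    except RuntimeError:
        # The context is already set; keep the existing start method and continue.
        print("context has already been set")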
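As a possible alternative to catching the exception, the standard library also offers mp.get_context('spawn'), which returns a context object bound to the 'spawn' start method and does not modify the global setting, so it does not raise this error. The sketch below is not part of the original procedure. It assumes that your own worker processes are created through the returned context object; whether this suits a given torch_npu workflow is not confirmed here. The helper names get_spawn_context and run_child are hypothetical.

```python
import multiprocessing as mp


def get_spawn_context():
    """Return a 'spawn' context without touching the global start method."""
    # Unlike set_start_method(), get_context(method) can be called repeatedly.
    return mp.get_context("spawn")


def run_child(ctx):
    """Start one child process through the local context and return its exit code."""
    # The built-in print is used as a trivial, picklable target (illustration only).
    proc = ctx.Process(target=print, args=("hello from spawned child",))
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    ctx = get_spawn_context()
    print("child exit code:", run_child(ctx))
```

With this approach, objects such as Process, Pool, and Queue are created from ctx instead of from the mp module directly.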