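After a user process hangs or is forcibly terminated by the user, the process may fail to start when it is restarted. The logs look similar to the following: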
AscendCL log: aclrtProcessReport failed
aclrtProcessReport failed, ret = 107012
aclrtProcessReport failed, ret = 107012
Runtime log: halResourceIdAlloc xxx failed
[ERROR] DRV(2086,rtstest_host):2021-06-09-02:14:46.034.368 [ascend][curpid: 2086, 2086][drv][tsdrv][halResourceIdAlloc 477]id is exhausted, type(0 stream), range[0, 1024), dev_id(0), tsid(0).
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.380 [npu_driver.cc:285]2086 StreamIdAlloc:[driver interface] halResourceIdAlloc streamid failed: device_id=0, tsId=0, drvRetCode=48!
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.401 [stream.cc:448]2086 Setup:Failed to alloc stream id, retCode=0x702001a.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.416 [context.cc:1251]2086 StreamCreate:Setup stream failed, retCode=0x702001a.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.440 [logger.cc:211]2086 StreamCreate:Create stream failed, priority=7 ,flags=0.
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.458 [api_c.cc:461]2086 rtStreamCreateWithFlags:ErrCode=207008, desc=[driver error:no stream resource], InnerCode=0x702001a
[ERROR] RUNTIME(2086,rtstest_host):2021-06-09-02:14:46.034.469 [error_message_manage.cc:26]2086 ReportFuncErrorReason:rtStreamCreateWithFlags execute failed, reason=[driver error:no stream resource]
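The logs indicate that the restart probably fails because resources such as public taskid, stream id, or eventid cannot be allocated. In the sample above, the driver reports that the stream IDs in range [0, 1024) on device 0 are exhausted.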
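To confirm which resource is exhausted without reading the log by eye, the driver message can be matched with a regular expression. The following is a minimal sketch, not an official tool: the function name find_exhausted_resources is hypothetical, and the pattern is modeled only on the stream message in the sample log, so the assumption that other resource types (for example event or task IDs) use the same message layout is unverified.

```python
import re

# Pattern modeled on the single "id is exhausted" line in the sample log.
# Assumption: other resource types are reported in the same layout.
EXHAUSTED_RE = re.compile(
    r"id is exhausted, type\((?P<type_id>\d+) (?P<type_name>\w+)\), "
    r"range\[(?P<low>\d+), (?P<high>\d+)\), "
    r"dev_id\((?P<dev_id>\d+)\), tsid\((?P<tsid>\d+)\)"
)


def find_exhausted_resources(log_text):
    """Return one dict per 'id is exhausted' message found in log_text."""
    results = []
    for match in EXHAUSTED_RE.finditer(log_text):
        results.append({
            "resource": match.group("type_name"),
            "range": (int(match.group("low")), int(match.group("high"))),
            "dev_id": int(match.group("dev_id")),
            "tsid": int(match.group("tsid")),
        })
    return results


# The DRV line from the sample log, copied verbatim.
sample = (
    "[ERROR] DRV(2086,rtstest_host):2021-06-09-02:14:46.034.368 "
    "[ascend][curpid: 2086, 2086][drv][tsdrv][halResourceIdAlloc 477]"
    "id is exhausted, type(0 stream), range[0, 1024), dev_id(0), tsid(0)."
)

for hit in find_exhausted_resources(sample):
    print(hit)
```

Run against a full log file, this lists each exhausted resource together with its device and ID range, which helps decide which device needs attention.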
To address these possible causes, proceed as follows: