Ascend AI Processor Not Detected on the PCI Bus

Applicable Scenarios

Symptom

When the run package is installed, a message reports that the firmware upgrade failed, and the log shows that no Ascend AI Processor was found:

Verifying archive integrity...  100%   SHA256 checksums are OK. All good.
Uncompressing ASCENDXXX FIRMWARE RUN PACKAGE  100%
[Firmware] [2021-06-23 19:27:30] [INFO]Start time: 2021-06-23 19:27:30
[Firmware] [2021-06-23 19:27:30] [INFO]LogFile: /var/log/ascend_seclog/ascend_install.log
[Firmware] [2021-06-23 19:27:30] [INFO]OperationLogFile: /var/log/ascend_seclog/operation.log
[Firmware] [2021-06-23 19:27:31] [INFO]base version is 1.78.T23.0.B230.
[Firmware] [2021-06-23 19:27:31] [WARNING]Do not power off or restart the system during the installation/upgrade
[Firmware] [2021-06-23 19:27:31] [ERROR]Pcie device is missing, cold reset may fix it.
[Firmware] [2021-06-23 19:27:31] [INFO]End time: 2021-06-23 19:27:31
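
To confirm this symptom without rerunning the installer, the install log can be searched for the error string. The following is a minimal sketch, assuming the log path printed on the LogFile line above. The function name `has_missing_pcie_error` and the `ASCEND_INSTALL_LOG` variable are hypothetical and not part of the run package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given log file contains the
# "Pcie device is missing" error shown in the installer output.
has_missing_pcie_error() {
    grep -q 'Pcie device is missing' "$1"
}

# Default path is taken from the "LogFile:" line of the installer output.
# ASCEND_INSTALL_LOG is a hypothetical override for other locations.
LOG_FILE="${ASCEND_INSTALL_LOG:-/var/log/ascend_seclog/ascend_install.log}"

if [ -r "$LOG_FILE" ] && has_missing_pcie_error "$LOG_FILE"; then
    echo "Firmware installer reported a missing PCIe device"
fi
```

If the message is printed, the symptom matches this case and the PCI bus check in the fault locating section applies.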

Possible Causes

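The standard PCIe card that carries the Ascend processor is not fully seated in the server, has poor contact, or is poorly ventilated and cooled.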

Fault Locating

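As shown in Figure 1, run the lspci | grep d100 command to check whether the Ascend processor is detected on the PCI bus of the current environment. If the command returns nothing, no Ascend processor is detected, which confirms that the standard card carrying the processor is in an abnormal state. In that case, check the card as described in the handling procedure.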

Figure 1 Checking the PCIe device
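
The check above can be wrapped in a small script so that the result is reported explicitly instead of being inferred from empty output. This is a sketch under stated assumptions: the function names `lists_ascend_device` and `check_pci_bus` are hypothetical, and the only match string used is d100, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: reads lspci-style text on stdin and succeeds if
# any line contains "d100", the match string used in this guide.
lists_ascend_device() {
    grep -q 'd100'
}

# Hypothetical wrapper around the lspci | grep d100 check.
# Returns 0 if a device is listed, 1 if not, 2 if lspci is unavailable.
check_pci_bus() {
    if ! command -v lspci >/dev/null 2>&1; then
        echo "lspci not found; it is provided by the pciutils package" >&2
        return 2
    fi
    if lspci | lists_ascend_device; then
        echo "Ascend processor detected on the PCI bus"
    else
        echo "No Ascend processor detected; check the card as described in the procedure"
        return 1
    fi
}
```

Call `check_pci_bus` on the affected server. A return status of 1 corresponds to the abnormal card state described above.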

Procedure

Address the possible causes identified above as follows: