Based on the queried information, the possible causes are as follows:
To address the possible causes above, try the following methods:
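Use the msnpureport tool to export and view the black box logs. Check them against the LPM3 tab in the "Black Box Log Reference" to confirm whether an exception code caused by overtemperature is present.
For details about how to use the msnpureport tool, see the "msnpureport Tool Usage" section in the "Log Reference".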
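If log messages similar to the following are present and error code -28 is returned (indicating insufficient kernel resources; on Linux, 28 is the errno value ENOSPC), the hardware environment may not have enough MSI-X interrupts: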
[ 7.448019] devdrv_device_driver 0000:09:00.0: irq 503 for MSI/MSI-X
[ 7.448024] devdrv_device_driver 0000:09:00.0: irq 504 for MSI/MSI-X
[ 7.448030] devdrv_device_driver 0000:09:00.0: irq 505 for MSI/MSI-X
[ 7.448036] devdrv_device_driver 0000:09:00.0: irq 506 for MSI/MSI-X
[ 7.448042] devdrv_device_driver 0000:09:00.0: irq 507 for MSI/MSI-X
[ 7.448124] [drv_pcie] [devdrv_init_interrupt_normal 377] <systemd-udevd:605:605> vector_num -28
[ 7.448140] [ERROR] [drv_pcie] [devdrv_init_interrupt_normal 382] <systemd-udevd:605:605> devdrv_device_driver: vector_num -28 error
[ 7.448143] [ERROR] [drv_pcie] [devdrv_probe 768] <systemd-udevd:605:605> devdrv_device_driver, init interrupt failed. ret -1
[ 7.448374] devdrv_device_driver: probe of 0000:09:00.0 failed with error -1
[ 7.448387] [drv_pcie] [devdrv_probe 703] <systemd-udevd:605:605> probe driver IN. bdf:0a:00.0
[ 7.448551] [drv_pcie] [devdrv_set_startup_status 1404] <systemd-udevd:605:605> dev id -1 startup status init jiffies 4294674711
[ 7.448641] [drv_pcie] [devdrv_register_pci_devctrl 1263] <systemd-udevd:605:605> devdrv_device_driver, dev_id:2, bus:ffff8c1b39997c00
[ 7.448643] [drv_pcie] [drvdrv_dev_startup_record 248] <systemd-udevd:605:605> probe new dev 2, add to report,dev_num:3.
[ 7.448644] [drv_pcie] [drvdrv_dev_startup_report 289] <systemd-udevd:605:605> dev startup no report id:2
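A log excerpt like the one above can also be checked mechanically. The following is a minimal Python sketch, not part of the driver tooling: the function names `split_entries` and `summarize` are hypothetical, and the regular expressions are written only against the message shapes shown in this excerpt, so they may need adjusting for other driver versions. It splits the kernel log into entries, collects the IRQs granted per device, and reports any `vector_num` error together with its errno name.

```python
import errno
import re
from collections import defaultdict

# A kernel-log entry begins with a bracketed timestamp such as "[ 7.448019]".
ENTRY_START = re.compile(r"\[\s*\d+\.\d+\]")
# Message shapes below are taken from the excerpt only (assumption: other
# driver versions may word them differently).
IRQ_LINE = re.compile(r"devdrv_device_driver (\S+): irq (\d+) for MSI/MSI-X")
VECTOR_ERR = re.compile(r"vector_num (-\d+) error")
PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def split_entries(text):
    """Split a log dump into entries, even if line breaks were lost."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


def summarize(text):
    """Return granted IRQs per device, vector_num errors, and failed probes."""
    irqs = defaultdict(list)
    vector_errors = []
    failed_probes = {}
    for entry in split_entries(text):
        m = IRQ_LINE.search(entry)
        if m:
            irqs[m.group(1)].append(int(m.group(2)))
        m = VECTOR_ERR.search(entry)
        if m:
            code = int(m.group(1))
            # Map the negative return value to its errno name, e.g. -28 -> ENOSPC.
            vector_errors.append((code, errno.errorcode.get(-code, "unknown")))
        m = PROBE_FAIL.search(entry)
        if m:
            failed_probes[m.group(1)] = int(m.group(2))
    return {
        "irqs": dict(irqs),
        "vector_errors": vector_errors,
        "failed_probes": failed_probes,
    }
```

To use it, pass the saved kernel log text to `summarize`. A device that appears in `failed_probes` and is accompanied by a `(-28, "ENOSPC")` entry in `vector_errors` matches the MSI-X shortage pattern described above.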
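As shown in Figure 2, fewer Devices are displayed than are actually installed. The cause is that the Device hardware communication line is not connected.
As shown in Figure 3, a Device is displayed in the "ff" state. The cause is that the Device communication link is down.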
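The PCIe standard card may be in poor contact with the host. In this case, power off the host, remove and reinsert the card, and then power the host on again.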
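The two symptoms above (fewer Devices than expected, or a Device in the "ff" state) can be checked with a short script. The sketch below is illustrative only. It assumes that the device listing is `lspci`-style text in which a device whose link is down is reported with `(rev ff)`; the source does not state which command produced Figures 2 and 3. The function name `check_devices`, the `expected_count` parameter, and the sample line format are hypothetical.

```python
import re

# Assumed line shape, modelled on typical lspci output (an assumption, not
# taken from the figures):
#   "09:00.0 Processing accelerators: <description> (rev 20)"
LINE = re.compile(
    r"^(?P<bdf>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<desc>.*?)"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$",
    re.IGNORECASE,
)


def check_devices(listing, expected_count):
    """Compare a device listing against the number of installed Devices."""
    devices = []
    for line in listing.splitlines():
        m = LINE.match(line.strip())
        if m:
            devices.append((m.group("bdf"), m.group("rev")))
    # A revision of "ff" is treated as the "ff" state described in the text.
    link_down = [bdf for bdf, rev in devices if rev and rev.lower() == "ff"]
    return {
        "found": len(devices),
        "missing": max(expected_count - len(devices), 0),
        "link_down": link_down,
    }
```

A nonzero `missing` value corresponds to the Figure 2 symptom, and a non-empty `link_down` list corresponds to the Figure 3 symptom. In either case, reseat the card as described above.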
If the fault persists after the preceding operations, contact technical support.