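This section uses an Atlas 800 training server (model: 9000) running openEuler 20.03 LTS as an example to describe how to pass a vNPU (virtual NPU) through to a virtual machine. The commands and printed output shown in the steps are examples only; what you see on your own system takes precedence.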
[root@localhost ~]# lspci | grep d801
3d:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
3e:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
60:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
61:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
b1:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
b2:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
da:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
db:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
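The check above can also be scripted. The following is a minimal sketch, assuming the NPUs show up in `lspci` output with the device ID d801 as in the example; `list_npu_bdfs` is a hypothetical helper name, not part of any Huawei tool.

```shell
# Hypothetical helper: read lspci-style text on stdin and print the PCI
# address (first column) of every line that mentions d801.
list_npu_bdfs() {
    awk '/d801/ { print $1 }'
}

# Typical use on the host (not run here):
#   lspci | list_npu_bdfs            # one PCI address per line
#   lspci | list_npu_bdfs | wc -l    # number of NPUs the host sees
```

On the example server this would be expected to list the eight addresses shown above.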
npu-smi info
[root@localhost ~]# npu-smi info
+-------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc1.b050 Version: 23.0.rc1.b050 |
+----------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+======================+===============+====================================================+
| 0 910A | OK | 67.8 45 0 / 0 |
| 0 | 0000:61:00.0 | 0 2006 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 1 910A | OK | 64.4 39 0 / 0 |
| 0 | 0000:DB:00.0 | 0 2022 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 2 910A | OK | 65.0 42 0 / 0 |
| 0 | 0000:B2:00.0 | 0 2022 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 3 910A | OK | 64.1 43 0 / 0 |
| 0 | 0000:3E:00.0 | 0 2020 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 4 910A | OK | 68.0 44 0 / 0 |
| 0 | 0000:60:00.0 | 0 2005 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 5 910A | OK | 63.4 38 0 / 0 |
| 0 | 0000:DA:00.0 | 0 2020 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 6 910A | OK | 65.9 40 0 / 0 |
| 0 | 0000:B1:00.0 | 0 2007 / 15039 0 / 32768 |
+======================+===============+====================================================+
| 7 910A | OK | 64.4 44 0 / 0 |
| 0 | 0000:3D:00.0 | 0 2006 / 15039 0 / 32768 |
+======================+===============+====================================================+
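Later steps need both the NPU ID and its Bus-Id, so it can help to pull that pairing out of the table. The sketch below assumes the two-rows-per-NPU layout shown in the example output (ID and name on the first row, Bus-Id in the second column of the next row); other `npu-smi` versions may print a different layout. `npu_id_to_busid` is a hypothetical helper name.

```shell
# Hypothetical helper: read "npu-smi info" text on stdin and print
# "<npu-id> <bus-id>" pairs, one per line.
npu_id_to_busid() {
    awk -F'|' '
        {
            first = $2;  gsub(/^ +| +$/, "", first)
            second = $3; gsub(/^ +| +$/, "", second)
        }
        # First row of a pair: column 1 holds "<npu-id> <name>".
        first ~ /^[0-9]+ +[^ ]+$/ { split(first, parts, " "); npu = parts[1]; next }
        # Second row of a pair: column 2 holds the Bus-Id.
        second ~ /^[0-9A-Fa-f]+:[0-9A-Fa-f]+:[0-9A-Fa-f]+\.[0-9A-Fa-f]+$/ && npu != "" {
            print npu, second
            npu = ""
        }
    '
}

# Typical use on the host (not run here):
#   npu-smi info | npu_id_to_busid
```

With the example output above, NPU 1 would be paired with 0000:DB:00.0, which is the device used in the following steps.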
[root@localhost ~]# npu-smi info -t template-info -i 1
+------------------------------------------------------------------------------------------+
|NPU instance template info is: |
|Name AICORE Memory AICPU VPC VENC JPEGD |
| GB PNGD VDEC JPEGE |
|==========================================================================================|
|vir02 2 1 1 1 0 1 |
| 1.5 1 0 |
+------------------------------------------------------------------------------------------+
|vir04 4 2 1 2 0 2 |
| 3 2 1 |
+------------------------------------------------------------------------------------------+
|vir08 8 4 3 4 0 4 |
| 6 4 2 |
+------------------------------------------------------------------------------------------+
|vir16 16 8 7 8 0 8 |
| 12 8 4 |
+------------------------------------------------------------------------------------------+
[root@localhost ~]#
cat /proc/sys/kernel/random/uuid
[root@localhost ~]# cat /proc/sys/kernel/random/uuid
bcdca436-e624-4b9c-a6e3-a62742a86ff6
[root@localhost ~]#
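Because the UUID is later written to sysfs and pasted into the VM configuration, a quick format check can catch copy-and-paste mistakes. This is a minimal sketch; `is_uuid` is a hypothetical helper, and it checks only the 8-4-4-4-12 hexadecimal shape, not uniqueness.

```shell
# Hypothetical helper: succeed only if $1 looks like a canonical
# 8-4-4-4-12 hexadecimal UUID.
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Typical use on the host (not run here):
#   uuid=$(cat /proc/sys/kernel/random/uuid)
#   is_uuid "$uuid" || echo "unexpected UUID format: $uuid" >&2
```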
[root@localhost ~]# echo bcdca436-e624-4b9c-a6e3-a62742a86ff6 > /sys/bus/pci/devices/0000\:db\:00.0/mdev_supported_types/vnpu-vir02/create
[root@localhost ~]#
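The path written to above is built from the Bus-Id and the template name. The sketch below assembles it from those two values, assuming the `vnpu-<template-name>` directory naming seen in the example; `vnpu_create_path` is a hypothetical helper. `npu-smi info` prints the Bus-Id in upper case while the example path uses lower case, so the helper lower-cases it.

```shell
# Hypothetical helper: print the sysfs "create" node for a Bus-Id
# (as shown by npu-smi info) and a template name (for example vir02).
vnpu_create_path() {
    printf '/sys/bus/pci/devices/%s/mdev_supported_types/vnpu-%s/create\n' \
        "$(printf '%s' "$1" | tr 'A-F' 'a-f')" "$2"
}

# Typical use on the host (not run here):
#   echo "$uuid" > "$(vnpu_create_path 0000:DB:00.0 vir02)"
```

When the path is passed as a quoted string like this, the colons do not need backslashes.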
To delete a vNPU that has been created, run echo 1 > /sys/bus/mdev/devices/bcdca436-e624-4b9c-a6e3-a62742a86ff6/remove.
[root@localhost ~]# ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx 1 root root 0 Feb 24 10:01 bcdca436-e624-4b9c-a6e3-a62742a86ff6 -> ../../../devices/pci0000:d7/0000:d7:00.0/0000:d8:00.0/0000:d9:04.0/0000:db:00.0/bcdca436-e624-4b9c-a6e3-a62742a86ff6
[root@localhost ~]#
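The listing above confirms that the vNPU exists. In a script the same check can be a simple existence test, as in this sketch; `vnpu_exists` is a hypothetical helper, and its optional second argument exists only so the check can be tried against a scratch directory.

```shell
# Hypothetical helper: succeed if a mediated device with UUID $1 exists.
# $2 optionally overrides the directory to inspect (defaults to sysfs).
vnpu_exists() {
    [ -e "${2:-/sys/bus/mdev/devices}/$1" ]
}

# Typical use on the host (not run here):
#   vnpu_exists "$uuid" && echo "vNPU $uuid is present"
```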
[root@localhost ~]# virsh list --all
 Id   Name        State
----------------------------
 1    openeuler   running
[root@localhost vm]# virsh shutdown openeuler
Domain openeuler is being shutdown
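`virsh shutdown` returns before the guest has actually stopped, so a script may want to wait before editing the configuration. The following sketch polls `virsh domstate` and assumes English-language output, where a stopped domain reports "shut off"; `wait_for_shutoff` and the 60-second default timeout are hypothetical choices.

```shell
# Hypothetical helper: poll "virsh domstate" until domain $1 reports
# "shut off", giving up after $2 seconds (default 60).
wait_for_shutoff() {
    _remaining=${2:-60}
    while [ "$_remaining" -gt 0 ]; do
        _state=$(virsh domstate "$1" 2>/dev/null)
        if [ "$_state" = "shut off" ]; then
            return 0
        fi
        sleep 1
        _remaining=$((_remaining - 1))
    done
    return 1
}

# Typical use on the host (not run here):
#   virsh shutdown openeuler
#   wait_for_shutoff openeuler 120 || echo "openeuler is still running" >&2
```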
[root@localhost vm]# virsh edit openeuler
Domain openeuler XML configuration edited.
[root@localhost vm]#
On Ubuntu, the following prompt may appear. Select an editor as prompted.
Select an editor. To change later, run 'select-editor'.
  1. /bin/nano <---- easiest
  2. /usr/bin/vim.basic
  3. /usr/bin/vim.tiny
  4. /bin/ed
Choose 1-4 [1]:
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='bcdca436-e624-4b9c-a6e3-a62742a86ff6'/>
  </source>
</hostdev>
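Only the UUID changes from one vNPU to the next, so the element above can be generated rather than typed by hand. This sketch prints the same XML for a given UUID; `hostdev_xml` is a hypothetical helper. In libvirt domain XML, `<hostdev>` elements belong inside `<devices>`, so the output would be pasted there during `virsh edit`.

```shell
# Hypothetical helper: print the <hostdev> element for the vNPU whose
# UUID is given as $1.
hostdev_xml() {
    cat <<EOF
<hostdev mode='subsystem' type='mdev' model='vfio-pci'>
  <source>
    <address uuid='$1'/>
  </source>
</hostdev>
EOF
}

# Typical use on the host (not run here):
#   hostdev_xml "$uuid"
```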
On the physical host, log in to the target VM by running ssh root@xxx (xxx is the IP address of the target VM, for example 192.168.1.199).
[root@localhost ~]# lspci | grep d801
08:00.0 Processing accelerators: Huawei Technologies Co., Ltd. Device d801 (rev 20)
[root@localhost ~]#
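Parameter | Description |
---|---|
device_name | PCIe name of the NPU chip. |
Bus-Id | Bus-Id of the NPU chip. It can be queried by running npu-smi info. Note: each ":" in the Bus-Id value must be escaped with "\". |
id | ID of the NPU chip. It must match the ID queried in 1.c. |
uuid | Identifier unique to each vNPU. It must not be reused. |
template-name | Name of the compute-partitioning template. |
domain | Name of the target VM. It can be viewed by running virsh list --all. |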
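To see how the parameters in the table fit together, the sketch below prints the host-side commands of this section for a given set of values without executing anything. The mapping of parameters to commands is inferred from the examples in this section. `print_vnpu_plan` and its argument order are hypothetical.

```shell
# Hypothetical dry-run helper: print, without executing, the host-side
# commands for the given parameters.
#   $1 device_name  $2 id  $3 Bus-Id  $4 uuid  $5 template-name  $6 domain
print_vnpu_plan() {
    printf 'lspci | grep %s\n' "$1"
    printf 'npu-smi info -t template-info -i %s\n' "$2"
    printf 'echo %s > /sys/bus/pci/devices/%s/mdev_supported_types/vnpu-%s/create\n' \
        "$4" "$(printf '%s' "$3" | tr 'A-F' 'a-f')" "$5"
    printf 'ls -l /sys/bus/mdev/devices/\n'
    printf 'virsh shutdown %s\n' "$6"
    printf 'virsh edit %s\n' "$6"
}

# Example (prints six lines, runs nothing):
#   print_vnpu_plan d801 1 0000:DB:00.0 "$uuid" vir02 openeuler
```

Reviewing the printed commands before running them makes it easier to confirm that the Bus-Id belongs to the intended NPU ID.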