Firewall Not Disabled

Symptom

The test fails with the error shown below, and checking the machine shows that its firewall is still running.

[root@node-87-66 hccl_test]# mpirun -f hostfile -n 16 ./bin/all_reduce_test -b 8K -e 4G -f 2 -d fp32 -p 8
the minbytes is 8192, maxbytes is 4294967296, iters is 20, warmup_iters is 5
Fatal error in PMPI_Barrier: Unknown error class, error stack:
PMPI_Barrier(425)...............: MPI_Barrier(MPI_COMM_WORLD) failed
MPIR_Barrier_impl(332)..........: Failure during collective
MPIR_Barrier_impl(327)..........:
MPIR_Barrier(292)...............:
MPIR_Barrier_intra(150).........:
barrier_smp_intra(96)...........:
MPIR_Barrier_impl(332)..........: Failure during collective
MPIR_Barrier_impl(327)..........:
MPIR_Barrier(292)...............:
MPIR_Barrier_intra(169).........:
MPIDI_CH3U_Recvq_FDU_or_AEP(629): Communication error with rank 8
barrier_smp_intra(111)..........:
MPIR_Bcast_impl(1452)...........:
MPIR_Bcast(1476)................:
MPIR_Bcast_intra(1287)..........:
MPIR_Bcast_binomial(310)........: Failure during collective

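Cause Analysis

This problem is usually caused by a firewall that has not been disabled. A running firewall can block the connections that MPI processes on different nodes use to reach each other, which surfaces as a communication error during a collective operation.

The command for checking the firewall status varies slightly between systems, for example: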

systemctl status firewalld
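
Because the check differs between systems, a small helper can try the common front ends in turn. The following is a minimal sketch, not part of the original procedure: the function name `firewall_state` is hypothetical, and it assumes that only firewalld and ufw need to be covered and that it runs as root (`ufw status` refuses to run otherwise).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether firewalld or ufw is currently active.
# Assumption: run as root; other firewall tools (plain iptables/nftables rules) are not checked.
firewall_state() {
    if command -v systemctl >/dev/null 2>&1 \
        && systemctl is-active --quiet firewalld 2>/dev/null; then
        # systemctl is-active returns 0 only when the unit is running
        echo "firewalld: active"
    elif command -v ufw >/dev/null 2>&1 \
        && ufw status 2>/dev/null | grep -q "Status: active"; then
        echo "ufw: active"
    else
        echo "no active firewall detected"
    fi
}

firewall_state
```

Running this on every node in the hostfile before starting the test shows which nodes still need the steps in the next section.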

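Solution

Use the systemctl command to turn off the firewall. The exact commands vary slightly between systems; examples are shown below:

  • Stop the firewall: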

    systemctl stop firewalld

  • Disable the firewall (so that it does not start at boot):

    systemctl disable firewalld

  • On systems that use ufw (for example, Ubuntu), check the firewall status:

    sudo ufw status
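
In a multi-node test the firewall has to be turned off on every node, so the systemctl commands above can be applied per host. The sketch below is an illustration under stated assumptions, not a documented tool: the function names are hypothetical, the hostfile is assumed to contain one `host` or `host:slots` entry per line, and passwordless SSH as `root` to each node is assumed. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host part of each hostfile entry.
# Assumed entry form: "host" or "host:slots", with optional "#" comments.
hosts_from_hostfile() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        -e 's/:.*//' "$1"
}

# Hypothetical helper: stop and disable firewalld on every host in the file.
# DRY_RUN defaults to 1, which only prints the ssh commands; set DRY_RUN=0 to execute them.
disable_firewalld_on_nodes() {
    local host
    local cmd="systemctl stop firewalld && systemctl disable firewalld"
    for host in $(hosts_from_hostfile "$1"); do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@${host} '${cmd}'"
        else
            # Assumption: passwordless root SSH to each node is configured
            ssh "root@${host}" "${cmd}"
        fi
    done
}
```

For example, `disable_firewalld_on_nodes hostfile` lists the commands for review, and `DRY_RUN=0 disable_firewalld_on_nodes hostfile` runs them. Nodes that use ufw instead of firewalld would need a different command.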