
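System Terminates Abnormally When an Application Process Exceeds Its Memory Limit

Symptom

Taking the ATC tool as an example: when a virtual memory limit is enabled and atc tries to use more memory than the limit allows, it fails with an error similar to the following: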

ATC start working now, please wait for a moment.
terminate called after throwing an instance of 'std::system_error'
  what():  Resource temporarily unavailable
/usr/local/Ascend/ascend-toolkit/latest/bin/atc: line 17: 44752 Aborted                 (core dumped) ${PKG_PATH}/bin/atc.bin "$@"
(base) root@davinci-mini:/home/lanxi-samples/samples/cplusplus/level1_single_api/4_op_dev/2_verify_op/acl_execute_add/run/out# Process ForkServerProcess-2:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/root/miniconda3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 61, in wrapper
    raise exp
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 58, in wrapper
    func(*args, **kwargs)
  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 275, in task_distribute
    key, func_name, detail = resource_proxy[TASK_QUEUE].get()
  File "<string>", line 2, in get
  File "/root/miniconda3/lib/python3.9/multiprocessing/managers.py", line 809, in _callmethod
    kind, result = conn.recv()
  File "/root/miniconda3/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/root/miniconda3/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniconda3/lib/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
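When atc fails as shown above, a first step is to check which virtual memory limit the shell that launched it actually has. The following is a minimal sketch, assuming bash (or another shell whose `ulimit -v` reports the limit in 1024-byte units, or prints `unlimited` when no limit is set). The function name `show_vmem_limit` is hypothetical and not part of the Ascend toolkit. It also spells out the conversion behind this document's example value: 2048000 KiB ÷ 1024 = 2000 MiB, roughly 2 GB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the virtual memory limit of the current shell.
# `ulimit -v` prints the limit in units of 1024 bytes (KiB),
# or the string "unlimited" when no limit is set.
show_vmem_limit() {
    local limit_kb
    limit_kb="$(ulimit -v)"
    if [ "$limit_kb" = "unlimited" ]; then
        echo "virtual memory: unlimited"
    else
        # Integer division to MiB, e.g. 2048000 KiB / 1024 = 2000 MiB.
        echo "virtual memory: ${limit_kb} KiB ($(( limit_kb / 1024 )) MiB)"
    fi
}

show_vmem_limit
```

With the limit used later in this document in place (`ulimit -v 2048000`), it prints `virtual memory: 2048000 KiB (2000 MiB)`.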

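Solution

  1. Press Ctrl+C to terminate the current process.

    If the system appears to hang, immediately press Ctrl+C in the SSH client window to terminate the current process.

    Note: The system may take up to 50 seconds to recover from the unresponsive state.

  2. Before starting a memory-intensive process such as atc, use the ulimit command to cap memory usage so that memory exhaustion cannot bring the system down. The limit applies to the current shell session and to the processes started from it.

    In current testing, limiting only virtual memory usage has also been effective in preventing system crashes.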

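    1. Enable the virtual memory limit
      ulimit -v 2048000

      2048000 is the virtual memory limit in KB (units of 1024 bytes), which is about 2 GB.

      To check whether the limit has taken effect, run ulimit -a, or run ulimit -v on its own.

      ulimit -a output: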
      (base) root@davinci-mini:~# ulimit -a
      real-time non-blocking time  (microseconds, -R) unlimited
      core file size              (blocks, -c) 0
      data seg size               (kbytes, -d) unlimited
      scheduling priority                 (-e) 0
      file size                   (blocks, -f) unlimited
      pending signals                     (-i) 12584
      max locked memory           (kbytes, -l) 449876
      max memory size             (kbytes, -m) unlimited
      open files                          (-n) 1024
      pipe size                (512 bytes, -p) 8
      POSIX message queues         (bytes, -q) 819200
      real-time priority                  (-r) 0
      stack size                  (kbytes, -s) 8192
      cpu time                   (seconds, -t) unlimited
      max user processes                  (-u) 12584
      virtual memory              (kbytes, -v) 2048000
      file locks                          (-x) unlimited
      ulimit -v output:
      (base) root@davinci-mini:~# ulimit -v
      2048000
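    2. Disable the virtual memory limit

      Once you are sure you no longer need to run programs that risk crashing the system, such as atc, you can remove the memory limit.

      ulimit -v unlimited

      To confirm that the limit has been removed, run ulimit -a again and check the limit details.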
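The steps above set the limit for the whole shell session. In bash, `ulimit -v N` without `-S` or `-H` changes both the soft and the hard limit, and only a privileged user (such as root, as in the example output) can raise a hard limit again. A regular user therefore cannot undo it with `ulimit -v unlimited` in the same shell. One way to avoid both issues is to apply the limit in a subshell that runs only the memory-hungry command. The sketch below assumes bash, and `run_with_vmem_limit` is a hypothetical helper name, not an Ascend toolkit command.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run one command under a virtual memory limit
# without changing the limit of the calling shell.
# Usage: run_with_vmem_limit LIMIT_KB COMMAND [ARGS...]
run_with_vmem_limit() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_vmem_limit LIMIT_KB COMMAND [ARGS...]" >&2
        return 2
    fi
    local limit_kb="$1"
    shift
    # The parentheses start a subshell: the ulimit call below affects only
    # the subshell and the command it runs, never the calling shell.
    (
        ulimit -v "$limit_kb" || exit 1
        "$@"
    )
    # The function returns the command's exit status (or 1 if ulimit failed).
}
```

For example, `run_with_vmem_limit 2048000 atc ...`, where `...` stands for your usual atc arguments, caps atc at 2048000 KiB while `ulimit -v` in the interactive shell stays unchanged, so there is no limit to disable afterwards.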
