torch_npu.profiler.profile(activities=None, schedule=None, on_trace_ready=None, record_shapes=False, profile_memory=False, with_stack=False, with_flops=False, with_modules=False, experimental_config=None, use_cuda=None)
Provides profiling of the data collected during training.
torch_npu.profiler.ProfilerActivity.CPU: switch for collecting framework-side data.
torch_npu.profiler.ProfilerActivity.NPU: switch for collecting CANN software stack and NPU data.
By default, both switches are enabled.
Takes effect only when torch_npu.profiler.ProfilerActivity.CPU is enabled.
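experimental_config: extended parameters, used to configure the collection items commonly used by performance analysis tools. For the supported collection items and their details, see torch_npu.profiler._ExperimentalConfig.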
import torch
import torch_npu

# Extended collection settings; see torch_npu.profiler._ExperimentalConfig.
experimental_config = torch_npu.profiler._ExperimentalConfig(
    aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
    profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
    l2_cache=False
)

# steps, train_one_step, train_loader, model, optimizer and criterion
# are defined by the user's training script.
with torch_npu.profiler.profile(
        activities=[
            torch_npu.profiler.ProfilerActivity.CPU,
            torch_npu.profiler.ProfilerActivity.NPU
        ],
        schedule=torch_npu.profiler.schedule(wait=1, warmup=1, active=2, repeat=2, skip_first=10),
        on_trace_ready=torch_npu.profiler.tensorboard_trace_handler("./result"),
        record_shapes=True,
        profile_memory=True,
        with_stack=True,
        with_flops=False,
        with_modules=False,
        experimental_config=experimental_config) as prof:
    for step in range(steps):
        train_one_step(step, steps, train_loader, model, optimizer, criterion)
        # Tell the profiler that one training step has finished.
        prof.step()
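The schedule arguments in the example decide which training steps are actually recorded. The sketch below is a pure-Python illustration that needs neither torch_npu nor NPU hardware. It assumes torch_npu.profiler.schedule follows the same semantics as torch.profiler.schedule: skip skip_first steps, then repeat a cycle of wait, warmup and active steps, for repeat cycles. The helper names phase_for_step and recorded_steps are hypothetical and not part of torch_npu.

```python
def phase_for_step(step, wait, warmup, active, repeat=0, skip_first=0):
    """Return 'none', 'warmup' or 'record' for a 0-based step index.

    Assumption: mirrors the semantics of torch.profiler.schedule.
    """
    if step < skip_first:
        return "none"              # initial steps are ignored entirely
    step -= skip_first
    cycle = wait + warmup + active
    if repeat > 0 and step // cycle >= repeat:
        return "none"              # all requested cycles are finished
    pos = step % cycle             # position inside the current cycle
    if pos < wait:
        return "none"              # idle part of the cycle
    if pos < wait + warmup:
        return "warmup"            # profiler runs, data is discarded
    return "record"                # data is collected


def recorded_steps(total_steps, **schedule_kwargs):
    """List the step indices that fall in the 'record' phase."""
    return [s for s in range(total_steps)
            if phase_for_step(s, **schedule_kwargs) == "record"]


if __name__ == "__main__":
    # Same values as the schedule in the example above.
    print(recorded_steps(30, wait=1, warmup=1, active=2, repeat=2, skip_first=10))
    # → [12, 13, 16, 17]
```

Under these assumptions, the training loop must run at least 18 steps (10 skipped plus two cycles of 4) for both collection cycles to complete.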