torch_npu.contrib.function.matmul_transpose(tensor1, tensor2):
Uses an NPU custom operator in place of the native PyTorch implementation to improve performance.
Tensor - the output tensor.
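The page does not spell out the computation. The operator name and the usage example (both inputs have shape (68, 5, 75, 16)) suggest that it computes `tensor1 @ tensor2.transpose(-2, -1)` over the last two dimensions. The CPU sketch below is a hypothetical reference built on that assumption, and `matmul_transpose_reference` is an illustrative name that is not part of torch_npu. It can serve as a sanity check on NPU results and output shapes.

```python
import torch


def matmul_transpose_reference(tensor1: torch.Tensor, tensor2: torch.Tensor) -> torch.Tensor:
    """Hypothetical CPU reference.

    Assumes matmul_transpose(a, b) == a @ b^T, with the transpose applied
    to the last two dimensions only (batch dimensions are kept).
    """
    # Swap the last two dims of tensor2, then do a batched matrix multiply.
    return torch.matmul(tensor1, tensor2.transpose(-2, -1))


a = torch.randn(68, 5, 75, 16)
b = torch.randn(68, 5, 75, 16)
out = matmul_transpose_reference(a, b)
print(out.shape)  # torch.Size([68, 5, 75, 75])
```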
In dynamic-shape scenarios, broadcasting is not supported because of operator limitations.
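When the two inputs have different leading (batch) dimensions, you can broadcast those dimensions explicitly before the call so that both tensors arrive with identical shapes. The sketch below is a hypothetical helper. `expand_batch_dims` is not part of torch_npu, and the helper uses only standard PyTorch calls (`torch.broadcast_shapes`, `Tensor.expand`). It is an assumption that the NPU kernel needs the expanded tensors materialized with `contiguous()`.

```python
import torch


def expand_batch_dims(tensor1: torch.Tensor, tensor2: torch.Tensor):
    """Hypothetical helper: give both inputs identical leading (batch) dims.

    The last two dims of each tensor are left unchanged; only the batch
    dims are broadcast explicitly with Tensor.expand.
    """
    batch = torch.broadcast_shapes(tensor1.shape[:-2], tensor2.shape[:-2])
    t1 = tensor1.expand(*batch, *tensor1.shape[-2:])
    t2 = tensor2.expand(*batch, *tensor2.shape[-2:])
    # expand() returns strided views; contiguous() copies them into dense
    # memory (assumption: a custom kernel may expect contiguous inputs).
    return t1.contiguous(), t2.contiguous()


a = torch.randn(1, 5, 75, 16)
b = torch.randn(68, 1, 75, 16)
t1, t2 = expand_batch_dims(a, b)
print(t1.shape, t2.shape)  # torch.Size([68, 5, 75, 16]) torch.Size([68, 5, 75, 16])
```

Explicit expansion costs extra memory, but it keeps the shapes the operator sees fully determined by the caller.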
```python
>>> import torch
>>> import torch_npu
>>> from torch_npu.contrib.function import matmul_transpose
>>> tensor1 = torch.randn(68, 5, 75, 16).npu()
>>> tensor1.requires_grad = True
>>> tensor2 = torch.randn(68, 5, 75, 16).npu()
>>> tensor2.requires_grad = True
>>> output = matmul_transpose(tensor1, tensor2)
>>> output.sum().backward()
```
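The example above ends with `output.sum().backward()`. Assume, as before, that the operator computes out = A Bᵀ over the last two dimensions. With upstream gradient G, the gradients are then dA = G B and dB = Gᵀ A. The CPU sketch below checks these formulas against PyTorch autograd on the reference expression. It does not exercise the NPU kernel, and it uses different row counts for the two inputs (4 and 6) only to make the transpose visible in the shapes.

```python
import torch

torch.manual_seed(0)
a = torch.randn(2, 3, 4, 5, requires_grad=True)
b = torch.randn(2, 3, 6, 5, requires_grad=True)

# Reference forward (assumed semantics): out = a @ b^T -> shape (2, 3, 4, 6)
out = torch.matmul(a, b.transpose(-2, -1))
out.sum().backward()

# For a sum() loss, the upstream gradient G is all ones with out's shape.
g = torch.ones_like(out)
grad_a = torch.matmul(g, b.detach())                     # dL/dA = G @ B   -> (2, 3, 4, 5)
grad_b = torch.matmul(g.transpose(-2, -1), a.detach())   # dL/dB = G^T @ A -> (2, 3, 6, 5)

print(torch.allclose(a.grad, grad_a), torch.allclose(b.grad, grad_b))  # True True
```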