torch_npu.npu_fused_attention_score

torch_npu.npu_fused_attention_score(Tensor query_layer, Tensor key_layer, Tensor value_layer, Tensor attention_mask, Scalar scale, float keep_prob, bool query_transpose=False, bool key_transpose=False, bool bmm_score_transpose_a=False, bool bmm_score_transpose_b=False, bool value_transpose=False, bool dx_transpose=False) -> Tensor
Implements the fused computation of the Transformer attention score, fusing operations such as matmul, transpose, add, softmax, dropout, batchmatmul, and permute into a single kernel.
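For reference, the unfused sequence of operations this kernel replaces can be sketched in plain numpy as below. This is a hedged sketch, not the torch_npu implementation: the function name, the inverted-dropout convention, and the (batch, heads, seq, head_dim) layout are assumptions for illustration.

```python
import numpy as np

def attention_score_reference(query, key, value, attention_mask,
                              scale, keep_prob, rng=None):
    """Unfused sketch of the attention-score computation (assumed layout:
    query/key/value are (batch, heads, seq, head_dim))."""
    # matmul + transpose: raw scores (batch, heads, seq, seq) = Q @ K^T
    scores = query @ key.transpose(0, 1, 3, 2)
    # scale + add: apply the scaling factor and the additive attention mask
    scores = scores * scale + attention_mask
    # softmax over the last axis (numerically stabilized)
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    # dropout with keep probability keep_prob (inverted-dropout convention)
    if keep_prob < 1.0:
        rng = rng if rng is not None else np.random.default_rng(0)
        keep_mask = rng.random(probs.shape) < keep_prob
        probs = probs * keep_mask / keep_prob
    # batchmatmul: weighted sum over the value tensor
    return probs @ value
```

The various `*_transpose` flags of the fused operator select transposed input layouts for the internal matmuls; the sketch above assumes the default (all `False`) layout.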