torch_npu.contrib.npu_fused_attention(hidden_states, attention_mask, query_kernel, key_kernel, value_kernel, query_bias, key_bias, value_bias, scale=1, keep_prob=0)

Fused implementation of BERT self-attention.

Parameters:
- hidden_states (Tensor): hidden states output by the previous layer
- attention_mask (Tensor): attention mask
- query_kernel (Tensor): weight of the query projection
- key_kernel (Tensor): weight of the key projection
- value_kernel (Tensor): weight of the value projection
- query_bias (Tensor): bias of the query projection
- key_bias (Tensor): bias of the key projection
- value_bias (Tensor): bias of the value projection
- scale=1 (double): scaling coefficient applied to the computed attention scores
- keep_prob=0: probability of keeping an element during dropout; should equal 1 - drop rate.
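The relationship between scale, keep_prob and common BERT settings can be sketched as follows. This is only an illustration: the helper name attention_hyperparams and its arguments are hypothetical, and using 1/sqrt(head_size) for scale is the usual scaled dot-product convention, assumed here rather than confirmed for this operator. Only keep_prob = 1 - drop rate comes from the parameter list above.

```python
import math


def attention_hyperparams(hidden_size, num_heads, dropout_rate):
    """Hypothetical helper: derive (scale, keep_prob) from BERT-style settings."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    if not 0.0 <= dropout_rate < 1.0:
        raise ValueError("dropout_rate must be in [0, 1)")
    head_size = hidden_size // num_heads
    # Assumption: conventional scaled dot-product factor, 1 / sqrt(head_size).
    scale = 1.0 / math.sqrt(head_size)
    # From the parameter list: keep_prob equals 1 - drop rate.
    keep_prob = 1.0 - dropout_rate
    return scale, keep_prob


# Example values chosen for illustration only.
scale, keep_prob = attention_hyperparams(hidden_size=1024, num_heads=16, dropout_rate=0.1)
```

The two values would then be passed as the scale and keep_prob arguments of the call shown in the signature.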
Returns:
torch.Tensor: the result of self-attention
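As a rough guide to what "the result of self-attention" means, the unfused computation can be sketched in plain Python. This is a simplified, single-head reference under stated assumptions, not the operator's actual implementation: inputs are plain nested lists, projections are computed as hidden @ kernel + bias (the real weight layout may differ), the mask is additive with shape (seq, seq), and dropout is omitted. All function names below are hypothetical.

```python
import math


def matmul(a, b):
    # Plain (rows x inner) @ (inner x cols) matrix product on nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    # Add a bias vector to every row.
    return [[v + b for v, b in zip(row, bias)] for row in m]


def softmax(row):
    # Numerically stable softmax over one row of scores.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def reference_self_attention(hidden, mask, wq, wk, wv, bq, bk, bv, scale=1.0):
    """Single-head reference: softmax(scale * Q K^T + mask) V, without dropout."""
    q = add_bias(matmul(hidden, wq), bq)
    k = add_bias(matmul(hidden, wk), bk)
    v = add_bias(matmul(hidden, wv), bv)
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    # Assumption: additive mask (0 keeps a position, a large negative value hides it).
    scores = [[s * scale + m for s, m in zip(s_row, m_row)]
              for s_row, m_row in zip(scores, mask)]
    probs = [softmax(row) for row in scores]
    return matmul(probs, v)


# Tiny illustrative inputs: two tokens, hidden size 2, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
no_mask = [[0.0, 0.0], [0.0, 0.0]]
out = reference_self_attention(identity, no_mask, identity, identity, identity,
                               zeros, zeros, zeros)
```

A reference of this kind is mainly useful for reasoning about shapes and the roles of the arguments; the fused operator handles multiple heads and dropout internally.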
Constraints:
None
Examples:
None