GlobalTensor

功能说明

GlobalTensor用来存放Global Memory（外部存储）的全局数据。

定义原型

template <typename T> class GlobalTensor : public BaseGlobalTensor<T> {
public:
    // PrimT用于在T传入为TensorTrait类型时萃取TensorTrait中的LiteType基础数据类型
    using PrimType = PrimT<T>;
    __aicore__ inline GlobalTensor<T>() {}
    __aicore__ inline void SetGlobalBuffer(__gm__ PrimType* buffer, uint64_t bufferSize);  // 传入全局数据的指针，并手动设置一个buffer size，初始化GlobalTensor
    __aicore__ inline void SetGlobalBuffer(__gm__ PrimType* buffer);  // 传入全局数据的指针，初始化GlobalTensor,可以不传入buffer size,但此时使用GetSize获取的长度是随机值
    __aicore__ inline const __gm__ PrimType* GetPhyAddr() const;   // 返回全局数据的地址
    __aicore__ inline __gm__ PrimType* GetPhyAddr(const uint64_t offset) const;  // 返回全局数据(指定偏移offset个元素)的地址
    __aicore__ inline __inout_pipe__(S) PrimType GetValue(const uint64_t offset) const;  // 获取Tensor的相应偏移位置的值
    __aicore__ inline __inout_pipe__(S) __gm__ PrimType& operator()(const uint64_t offset) const;   // 返回某个index位置的元素的引用
    __aicore__ inline void SetValue(const uint64_t offset, PrimType value);  // 设置Tensor相应偏移位置的值。
    __aicore__ inline uint64_t GetSize() const;  // 返回Tensor中的element个数
    __aicore__ inline GlobalTensor operator[](const uint64_t offset) const;  // 指定偏移返回一个GlobalTensor，offset单位为element
    __aicore__ inline void SetShapeInfo(const ShapeInfo& shapeInfo);
    __aicore__ inline ShapeInfo GetShapeInfo() const;
    template<CacheRwMode rwMode = CacheRwMode::RW>
    __aicore__ inline void SetL2CacheHint(CacheMode mode);   // 设置Tensor写入L2 Cache的模式（允许/禁止）
    ...
};

函数说明

类型T支持基础数据类型以及TensorTrait类型，但需要遵循使用此GlobalTensor的指令的数据类型支持情况。

表1 函数说明
函数名称	入参说明	含义
GetValue	offset：偏移量，单位为element	获取GlobalTensor中的某个值，返回T类型的立即数。注意：如果GetValue的Global Memory地址内容存在被外部改写的可能，需要先调用DataCacheCleanAndInvalid，确保Data Cache与Global Memory的Cache一致性，之后在调用此接口。
SetValue	offset：偏移量，单位为element value: 设置值，单位为任意类型	设置GlobalTensor中的某个值。因为硬件实现不同，其与通用CPU标量赋值操作存在差异。SetValue赋值操作首先改写的是每个AI Core内部的Data Cache，不会立刻写出到Global Memory，且后续写出时以CacheLine（64B）为单位。使用该接口之前必须了解Data Cache结构和Cache一致性原理（参见DataCacheCleanAndInvalid），否则可能存在误用的情况，请谨慎使用。调用SetValue后，首先改写的是每个AI Core内部的Data Cache，如果需要立即写出到Global Memory，需要在调用此接口后，再调用DataCacheCleanAndInvalid，确保Data Cache与Global Memory的Cache一致性。多核操作GM地址时，要求不同核操作的地址（通过offset参数设置元素偏移，可以转换为地址）至少有CacheLine大小的偏移，否则会出现多核数据随机覆盖。同时需要考虑地址对齐(64B)的问题。详细内容请参考调用示例。
SetGlobalBuffer	buffer：主机侧传入的全局数据指针 bufferSize：所包含的类型为T的数据个数，单位为element，需自行保证不会超出实际数据的长度	设置GlobalTensor的存储位置：buffer指向外部存储的起始地址，bufferSize为Tensor所占外部存储的大小，如指向的外部存储有连续256个int32_t，则其bufferSize为256。
GetPhyAddr	-	返回GlobalTensor的地址，如果传入offset，则表示偏移offset个元素。
GetSize	-	返回GlobalTensor的element个数。
operator[]	offset：用户指定的偏移位置	根据输入的offset偏移返回新的Tensor，offset的单位为element的个数。
operator()	index: 下标索引	获取本GlobalTensor的第index个变量的引用。与LocalTensor的operator()类似。该功能仅在Atlas A2训练系列产品/Atlas 800I A2推理产品型号支持。
SetShapeInfo	shapeInfo：ShapeInfo结构体	设置GlobalTensor的shapeInfo。
GetShapeInfo	无	获取GlobalTensor的shapeInfo。注意：Shape信息没有默认值，只有通过SetShapeInfo设置过Shape信息后，才可以调用该接口获取正确的ShapeInfo。
SetL2CacheHint	mode：用户指定的L2 Cache模式； CacheMode为枚举类型，取值为【CACHE_MODE_NORMAL, CACHE_MODE_DISABLE】	如果用户在写算子时，相比不写入L2 Cache中，某Global Tensor使用L2 Cache反而会导致实测性能下降，可以手动禁止该Global Tensor写入L2 Cache。当mode == CacheMode::CACHE_MODE_NORMAL时，Tensor可被写入L2 Cache。当mode == CacheMode::CACHE_MODE_DISABLE时，Tensor不会写入L2 Cache。如果不调用该接口，默认为CacheMode::CACHE_MODE_NORMAL，即Tensor可被写入L2 Cache。该功能仅在Atlas A2训练系列产品/Atlas 800I A2推理产品型号支持。

调用示例

void Init(__gm__ uint8_t *src_gm, __gm__ uint8_t *dst_gm)
{
    uint64_t dataSize = 256; //设置input_global的大小为256

    AscendC::GlobalTensor<int32_t> inputGlobal; // 类型为int32_t
    inputGlobal.SetGlobalBuffer(reinterpret_cast<__gm__ int32_t *>(src_gm), dataSize); // 设置源操作数在Global Memory上的起始地址为src_gm，所占外部存储的大小为256个int32_t

    AscendC::LocalTensor<int32_t> inputLocal = inQueueX.AllocTensor<int32_t>();    
    AscendC::DataCopy(inputLocal, inputGlobal, dataSize); // 将Global Memory上的inputGlobal拷贝到Local Memory的inputLocal上
    ...
}

父主题： 数据类型定义