Template Class device_tensor3

Class Documentation

template<typename T>
class icrar::cuda::device_tensor3

A CUDA device buffer object that represents a three-dimensional tensor stored in a memory buffer on a CUDA device.

Note

See https://www.quantstart.com/articles/Matrix-Matrix-Multiplication-on-the-GPU-with-Nvidia-CUDA/

Note

See https://forums.developer.nvidia.com/t/guide-cudamalloc3d-and-cudaarrays/23421

Template Parameters
  • T: the element type stored in the tensor.

Public Functions

device_tensor3(size_t sizeDim0, size_t sizeDim1, size_t sizeDim2, const T *data = nullptr)
device_tensor3(const Eigen::Tensor<T, 3> &tensor)
device_tensor3(device_tensor3 &&other)

Move constructor.

Parameters
  • other: the tensor to move from.

device_tensor3 &operator=(device_tensor3 &&other) noexcept
~device_tensor3()
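
The move constructor, move assignment operator, and destructor listed above suggest that the class owns its device allocation and transfers that ownership on move. The following is a minimal host-only sketch of that pattern, not the library's implementation: `move_only_buffer` is a hypothetical name, `std::malloc`/`std::free` stand in for the CUDA allocation calls, and the deleted copy operations are an assumption, since this page does not list a copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical host-only stand-in for a move-only device buffer.
// std::malloc/std::free play the role of the device allocation calls,
// so T is assumed to be a trivially constructible type.
template<typename T>
class move_only_buffer
{
    T* m_data = nullptr;
    std::size_t m_count = 0;

public:
    explicit move_only_buffer(std::size_t count)
    : m_data(static_cast<T*>(std::malloc(count * sizeof(T))))
    , m_count(count)
    {
        assert(m_data != nullptr || count == 0);
    }

    // Assumption: copying is disabled so two objects never own one allocation.
    move_only_buffer(const move_only_buffer&) = delete;
    move_only_buffer& operator=(const move_only_buffer&) = delete;

    // Move constructor: take over the allocation and leave `other` empty.
    move_only_buffer(move_only_buffer&& other) noexcept
    : m_data(std::exchange(other.m_data, nullptr))
    , m_count(std::exchange(other.m_count, 0))
    {}

    // Move assignment: release the current allocation, then take over other's.
    move_only_buffer& operator=(move_only_buffer&& other) noexcept
    {
        if(this != &other)
        {
            std::free(m_data);
            m_data = std::exchange(other.m_data, nullptr);
            m_count = std::exchange(other.m_count, 0);
        }
        return *this;
    }

    // Freeing a null pointer is a no-op, so moved-from objects are safe to destroy.
    ~move_only_buffer() { std::free(m_data); }

    T* Get() { return m_data; }
    std::size_t GetCount() const { return m_count; }
};
```

After `move_only_buffer<int> b(std::move(a));` the object `b` holds the pointer that `a` held, and `a` is left empty, so the allocation is released exactly once.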
__host__ __device__ T *Get()
__host__ __device__ const T *Get() const
__host__ __device__ size_t GetDimensionSize(int dim) const
__host__ __device__ Eigen::DSizes<Eigen::DenseIndex, 3> GetDimensions() const
__host__ __device__ size_t GetCount() const

Gets the total number of elements in the tensor.

Return

The total number of elements in the tensor.

__host__ __device__ size_t GetSize() const

Gets the total number of elements in the tensor.

Return

The total number of elements in the tensor.

__host__ __device__ size_t GetByteSize() const

Gets the total number of bytes in the memory buffer.

Return

The total number of bytes in the memory buffer.
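
The element count and byte size described above can be related with a short host-only sketch. It assumes the buffer is tightly packed, so the byte size is the element count multiplied by `sizeof(T)`; this page does not confirm that, and `element_count` and `byte_size` are hypothetical helpers rather than part of the class.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: number of elements for the three dimension sizes.
inline std::size_t element_count(const std::array<std::size_t, 3>& dims)
{
    return dims[0] * dims[1] * dims[2];
}

// Assumed relationship for a tightly packed buffer:
// byte size = element count * sizeof(T).
template<typename T>
std::size_t byte_size(const std::array<std::size_t, 3>& dims)
{
    return element_count(dims) * sizeof(T);
}
```

Under that assumption, a 2 x 3 x 4 tensor of `double` holds 24 elements and occupies `24 * sizeof(double)` bytes.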

__host__ void SetDataSync(const T *data)

Performs a synchronous copy of data into the device buffer.

Parameters
  • data: pointer to the source data to copy into the device buffer.

__host__ void SetDataAsync(const T *data)

Sets the data asynchronously from host memory.

Parameters
  • data: pointer to the host memory to copy from.

__host__ void SetDataAsync(const device_tensor3<T> &data)
__host__ void ToHost(T *out) const
__host__ void ToHost(std::vector<T> &out) const
__host__ void ToHost(Eigen::Tensor<T, 3> &out) const
__host__ void ToHostAsync(T *out) const