Template Class device_tensor3
Defined in File device_tensor.h
Class Documentation
-
template<typename T> class icrar::cuda::device_tensor3
A CUDA device buffer object that represents a memory buffer on a CUDA device.
- Note
See https://www.quantstart.com/articles/Matrix-Matrix-Multiplication-on-the-GPU-with-Nvidia-CUDA/
- Note
See https://forums.developer.nvidia.com/t/guide-cudamalloc3d-and-cudaarrays/23421
- Template Parameters
T
:
Public Functions
-
device_tensor3(device_tensor3 &&other)
Move constructor.
- Parameters
other
:
-
device_tensor3 &operator=(device_tensor3 &&other) noexcept
-
~device_tensor3()
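The move constructor, move assignment operator, and destructor above suggest that the class owns its buffer exclusively. The following is a minimal host-only sketch of that ownership pattern, not the library's implementation: `host_buffer3` is a hypothetical stand-in, `std::malloc`/`std::free` stand in for device allocation and release, and the deleted copy operations are an assumption that the reference above does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>

// Hypothetical host-side stand-in for a move-only buffer; NOT the icrar class.
// std::malloc/std::free stand in for device allocation and release.
template <typename T>
class host_buffer3 {
    std::size_t m_count = 0;
    T* m_data = nullptr;

public:
    host_buffer3(std::size_t x, std::size_t y, std::size_t z)
        : m_count(x * y * z),
          m_data(static_cast<T*>(std::malloc(x * y * z * sizeof(T)))) {
        // An empty buffer may legitimately have a null pointer.
        assert(m_count == 0 || m_data != nullptr);
    }

    // Assumption: copying is disabled so exactly one object owns the allocation.
    host_buffer3(const host_buffer3&) = delete;
    host_buffer3& operator=(const host_buffer3&) = delete;

    // Move constructor: take over the pointer and leave `other` empty.
    host_buffer3(host_buffer3&& other) noexcept
        : m_count(std::exchange(other.m_count, 0)),
          m_data(std::exchange(other.m_data, nullptr)) {}

    // Move assignment: release our allocation, then take over `other`'s.
    host_buffer3& operator=(host_buffer3&& other) noexcept {
        if (this != &other) {
            std::free(m_data);
            m_count = std::exchange(other.m_count, 0);
            m_data = std::exchange(other.m_data, nullptr);
        }
        return *this;
    }

    // Destructor: std::free(nullptr) is a no-op, so moved-from objects are safe.
    ~host_buffer3() { std::free(m_data); }

    std::size_t count() const { return m_count; }
    const T* get() const { return m_data; }
};
```

With this pattern, a moved-from object holds a null pointer and a count of zero, so its destructor releases nothing and the allocation is freed exactly once by its new owner.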
-
__host__ __device__ size_t GetDimensionSize(int dim) const
-
__host__ __device__ size_t GetCount() const
Gets the total number of elements in the tensor.
- Return
host
-
__host__ __device__ size_t GetSize() const
Gets the total number of elements in the tensor.
- Return
host
-
__host__ __device__ size_t GetByteSize() const
Gets the total number of bytes in the memory buffer.
- Return
host
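The size accessors above are presumably related in the usual way for a dense tensor: the element count is the product of the three dimension sizes, and the byte size is the element count times `sizeof(T)`. The sketch below illustrates that arithmetic under exactly those assumptions; `extents3` and its member names are hypothetical and do not appear in the library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper type; names are illustrative, not part of icrar::cuda.
struct extents3 {
    std::array<std::size_t, 3> dims;

    // Size of a single dimension, with dim in the range [0, 3).
    std::size_t dimension_size(int dim) const {
        assert(dim >= 0 && dim < 3);
        return dims[static_cast<std::size_t>(dim)];
    }

    // Assumed relationship: element count is the product of the dimensions.
    std::size_t count() const { return dims[0] * dims[1] * dims[2]; }

    // Assumed relationship: byte size is element count times element size.
    template <typename T>
    std::size_t byte_size() const { return count() * sizeof(T); }
};
```

For example, extents of 2 x 3 x 4 give 24 elements, and for `double` elements a byte size of `24 * sizeof(double)`.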
-
__host__ void SetDataSync(const T *data)
Performs a synchronous copy of data into the device buffer.
- Return
host
- Parameters
data
:
-
__host__ void SetDataAsync(const T *data)
Sets the data asynchronously from host memory.
- Return
host
- Parameters
data
:
-
__host__ void SetDataAsync(const device_tensor3<T> &data)