Cheetah - SKA - PSS - Prototype Time Domain Search Pipeline
ska::cheetah::hrms::cuda::Hrms Class Reference

CUDA/Thrust implementation of the Hrms algorithm.


Public Types

typedef cheetah::Cuda Architecture
 
typedef panda::nvidia::DeviceCapability< 2, 0, panda::nvidia::giga/2 > ArchitectureCapability
 
typedef panda::PoolResource< Architecture > ResourceType
 

Public Member Functions

 Hrms (Config const &config, hrms::Config const &algo_config)
 Construct an instance of Hrms.
 
 Hrms (Hrms const &)=delete
 
 Hrms (Hrms &&)=default
 
template<typename T , typename Alloc >
void process (ResourceType &gpu, data::PowerSeries< cheetah::Cuda, T, Alloc > const &input, std::vector< data::PowerSeries< cheetah::Cuda, T, Alloc >> &output)
 Perform harmonic summing of a PowerSeries object.
 
- Public Member Functions inherited from ska::cheetah::utils::AlgorithmBase< Config, hrms::Config >
 AlgorithmBase (Config const &impl_config, hrms::Config const &algo_config)
 
 AlgorithmBase (AlgorithmBase const &)=delete
 
 AlgorithmBase (AlgorithmBase &&)=default
 

Additional Inherited Members

- Protected Attributes inherited from ska::cheetah::utils::AlgorithmBase< Config, hrms::Config >
Config const & _impl_config
 
hrms::Config const & _algo_config
 

Detailed Description

CUDA/Thrust implementation of the Hrms algorithm.

Definition at line 24 of file Hrms.cuh.

Constructor & Destructor Documentation

◆ Hrms()

ska::cheetah::hrms::cuda::Hrms::Hrms (Config const & config, hrms::Config const & algo_config)

Construct an instance of Hrms.

Parameters
config: The Hrms (CUDA implementation) configuration
algo_config: The generic hrms algorithm configuration

Definition at line 8 of file Hrms.cu.

    : utils::AlgorithmBase<Config, hrms::Config>(config, algo_config)
{
}
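The constructor body is empty: it only forwards both configuration objects to utils::AlgorithmBase, which keeps them as the protected references _impl_config and _algo_config. The C++ sketch below reproduces that ownership pattern using hypothetical names (ImplConfig, AlgoConfig, AlgorithmBaseSketch, HrmsSketch). It is an illustration under those assumptions, not the cheetah source, and the real AlgorithmBase may do more than store the two references.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the two configuration types; the real
// cuda Config and hrms::Config carry far more settings than this.
struct ImplConfig { int device_count = 1; };
struct AlgoConfig { int nsums = 3; };

// Sketch of a base class that, like utils::AlgorithmBase, keeps const
// references to an implementation config and an algorithm config.
template <typename ImplConfigT, typename AlgoConfigT>
class AlgorithmBaseSketch
{
  public:
    AlgorithmBaseSketch(ImplConfigT const& impl_config, AlgoConfigT const& algo_config)
        : _impl_config(impl_config), _algo_config(algo_config) {}
    AlgorithmBaseSketch(AlgorithmBaseSketch const&) = delete;  // no copies
    AlgorithmBaseSketch(AlgorithmBaseSketch&&) = default;      // a move copies the two references

  protected:
    ImplConfigT const& _impl_config;
    AlgoConfigT const& _algo_config;
};

// Hypothetical derived algorithm: its constructor only forwards both configs,
// as the Hrms constructor listed above does.
class HrmsSketch : public AlgorithmBaseSketch<ImplConfig, AlgoConfig>
{
  public:
    HrmsSketch(ImplConfig const& config, AlgoConfig const& algo_config)
        : AlgorithmBaseSketch<ImplConfig, AlgoConfig>(config, algo_config) {}
    HrmsSketch(HrmsSketch const&) = delete;
    HrmsSketch(HrmsSketch&&) = default;

    int nsums() const { return _algo_config.nsums; }
    ImplConfig const& impl_config() const { return _impl_config; }
};
```

Because only references are stored, both configuration objects must outlive the algorithm object. The deleted copy constructor and defaulted move constructor match the declarations listed for Hrms and AlgorithmBase.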

Member Function Documentation

◆ process()

template<typename T , typename Alloc >
void ska::cheetah::hrms::cuda::Hrms::process (ResourceType & gpu, data::PowerSeries< cheetah::Cuda, T, Alloc > const & input, std::vector< data::PowerSeries< cheetah::Cuda, T, Alloc >> & output)

Perform harmonic summing of a PowerSeries object.

Parameters
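gpu: The device on which to process
input: The input PowerSeries object
output: One PowerSeries object per harmonic sum; output.size() (1 to 5) sets how many sums are computed, and each series is resized to input.size()
Template Parameters
T: The value type of the input and output series
Alloc: The allocator type of the input and output series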
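The listing below sets the metadata of output series idx from a single factor, hnum = 2^(idx+1): the degrees of freedom are multiplied by hnum and the frequency step is divided by hnum. The following C++ sketch restates only that arithmetic; OutputMetadata and output_metadata are hypothetical helpers, whereas the real code sets these values directly on each data::PowerSeries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the metadata process() assigns to one output.
struct OutputMetadata {
    double harmonics;           // hnum = 2^(idx+1)
    double degrees_of_freedom;  // input dof * hnum
    double frequency_step_hz;   // input frequency step / hnum
};

// Metadata for output series idx, given the input's dof and frequency step.
inline OutputMetadata output_metadata(std::size_t idx, double input_dof, double input_df_hz)
{
    double const hnum = static_cast<double>(1u << (idx + 1));
    return {hnum, input_dof * hnum, input_df_hz / hnum};
}
```

With the maximum of five outputs accepted by the switch in process(), the last series combines 2^5 = 32 harmonics and has a frequency step 32 times finer than the input.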

To keep the interface limited to the PowerSeries class, rather than implementing a PowerSeries2D-like class backed by a single contiguous buffer, the method takes a std::vector of PowerSeries objects as its output. Using these on the device requires a thrust::device_vector of device pointers, which can itself be passed to a kernel as a single raw pointer (T**). This is done by first creating a thrust::host_vector, filling it with the raw device pointer of each output series (obtained with thrust::raw_pointer_cast), and then copying that host_vector to the device.
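The pointer-table technique can be illustrated without a GPU. In the host-only C++ sketch below, a std::vector<T*> plays the role of the thrust::device_vector<T*>, and fill_outputs is a hypothetical stand-in for a kernel that receives only raw pointers. The placeholder arithmetic inside it is illustrative and is not the harmonic sum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": it sees only the input buffer and a T** table with
// one entry per output series, just as the device functor does.
template <typename T>
void fill_outputs(T const* input, T** outputs, std::size_t nseries, std::size_t nbins)
{
    for (std::size_t k = 0; k < nseries; ++k)
        for (std::size_t i = 0; i < nbins; ++i)
            outputs[k][i] = input[i] * static_cast<T>(k + 1);  // placeholder work
}

template <typename T>
void process_with_pointer_table(std::vector<T> const& input, std::vector<std::vector<T>>& output)
{
    // One pointer per output series, mirroring output_ptrs_host.
    std::vector<T*> output_ptrs(output.size());
    for (std::size_t idx = 0; idx < output.size(); ++idx) {
        output[idx].resize(input.size());  // resize first: it may reallocate
        output_ptrs[idx] = output[idx].data();
    }
    // The table itself is contiguous, so it travels as a single T**.
    fill_outputs(input.data(), output_ptrs.data(), output.size(), input.size());
}
```

Each series is resized before its pointer is recorded, because resizing afterwards could reallocate the buffer and leave a dangling entry in the table; process() follows the same order.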

Definition at line 89 of file Hrms.cu.

{
    PUSH_NVTX_RANGE("cuda_Hrms_process",0);
    PANDA_LOG_DEBUG << "GPU ID: " << gpu.device_id();

    PUSH_NVTX_RANGE("cuda_Hrms_process_prepare",1);
    // container for device pointers
    thrust::host_vector<T*> output_ptrs_host(output.size());

    for (std::size_t idx = 0; idx < output.size(); ++idx)
    {
        auto& series = output[idx];
        // output idx holds the sum of 2^(idx+1) harmonics
        double hnum = (double)(1 << (idx + 1));
        // resize the outputs to the correct size
        series.resize(input.size());

        // recalculate the dof
        series.degrees_of_freedom(input.degrees_of_freedom() * hnum);

        // set the outputs metadata
        series.frequency_step((input.frequency_step().value() / hnum) * data::hz);

        // store device pointer in host_vector
        output_ptrs_host[idx] = thrust::raw_pointer_cast(series.data());
    }

    // copy the host array of device pointers to the device
    thrust::device_vector<T*> output_ptrs_device = output_ptrs_host;
    auto input_ptr = thrust::raw_pointer_cast(input.data());
    auto output_ptrs = thrust::raw_pointer_cast(output_ptrs_device.data());
    thrust::counting_iterator<unsigned> begin(0);
    thrust::counting_iterator<unsigned> end = begin + input.size();

    POP_NVTX_RANGE; // cuda_Hrms_process_prepare
    PUSH_NVTX_RANGE("cuda_Hrms_process_execute_kernels",2);
    // Here we use a switch case based on the number of PowerSeries objects in
    // the output vector. This determines the number of harmonic sums to be performed.
    // The switch case then allows us to dispatch to a partial specialisation of the
    // harmonic summing functor.
    switch (output.size())
    {
        case 1:
            thrust::for_each(thrust::cuda::par, begin, end, detail::HarmonicSumFunctor<T,1>(input_ptr,output_ptrs));
            break;
        case 2:
            thrust::for_each(thrust::cuda::par, begin, end, detail::HarmonicSumFunctor<T,2>(input_ptr,output_ptrs));
            break;
        case 3:
            thrust::for_each(thrust::cuda::par, begin, end, detail::HarmonicSumFunctor<T,3>(input_ptr,output_ptrs));
            break;
        case 4:
            thrust::for_each(thrust::cuda::par, begin, end, detail::HarmonicSumFunctor<T,4>(input_ptr,output_ptrs));
            break;
        case 5:
            thrust::for_each(thrust::cuda::par, begin, end, detail::HarmonicSumFunctor<T,5>(input_ptr,output_ptrs));
            break;
        default:
            // panda::Error must be thrown: constructing it alone has no effect.
            // Close both NVTX ranges first so they stay balanced.
            POP_NVTX_RANGE; // cuda_Hrms_process_execute_kernels
            POP_NVTX_RANGE; // cuda_Hrms_process
            throw panda::Error("Invalid number of sums requested of Hrms.");
    }
    POP_NVTX_RANGE; // cuda_Hrms_process_execute_kernels
    POP_NVTX_RANGE; // cuda_Hrms_process
}
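The switch above turns a runtime count, output.size(), into the compile-time template argument of the functor, so each specialisation has a fixed number of sums. detail::HarmonicSumFunctor itself is not shown in this reference, so the host-only C++ sketch below assumes a textbook incoherent harmonic sum: output k (hnum = 2^(k+1)) at bin i adds the input bins floor(h*i/hnum) for h = 1..hnum. That choice is consistent with the frequency_step / hnum and degrees_of_freedom * hnum metadata set in process(), but it is an assumption, and HarmonicSumSketch, for_each_bin and harmonic_sum are hypothetical names. A plain loop stands in for thrust::for_each over a counting_iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// ASSUMPTION: textbook incoherent harmonic sum, not the cheetah functor.
template <typename T, int NSUMS>
struct HarmonicSumSketch {
    T const* input;
    T** outputs;
    void operator()(unsigned i) const {
        for (int k = 0; k < NSUMS; ++k) {        // NSUMS is known at compile time
            unsigned const hnum = 1u << (k + 1);
            T sum = T(0);
            for (unsigned h = 1; h <= hnum; ++h)
                sum += input[(h * i) / hnum];    // index never exceeds i
            outputs[k][i] = sum;
        }
    }
};

// Serial stand-in for thrust::for_each(begin, end, functor).
template <typename T, int NSUMS>
void for_each_bin(T const* in, T** out, unsigned nbins) {
    HarmonicSumSketch<T, NSUMS> f{in, out};
    for (unsigned i = 0; i < nbins; ++i) f(i);
}

// Runtime count -> template argument, as the switch in process() does.
template <typename T>
void harmonic_sum(T const* in, T** out, std::size_t nsums, unsigned nbins) {
    switch (nsums) {
        case 1: for_each_bin<T, 1>(in, out, nbins); break;
        case 2: for_each_bin<T, 2>(in, out, nbins); break;
        case 3: for_each_bin<T, 3>(in, out, nbins); break;
        case 4: for_each_bin<T, 4>(in, out, nbins); break;
        case 5: for_each_bin<T, 5>(in, out, nbins); break;
        default: throw std::invalid_argument("Invalid number of sums requested of Hrms.");
    }
}
```

Only the specialisations named in the switch are instantiated, which is why the accepted range is fixed at one to five sums; any other count falls through to the error.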

The documentation for this class was generated from the following files: