My arrays are stored interleaved as xyzxyz..., and I want to get the maximum and minimum along one direction (x, y, or z). Here is the test program:
#include <cuda_runtime.h>
#include <cuda_runtime_api.h> // cudaMalloc, cudaMemcpy, etc.
#include <cublas_v2.h>
#include <helper_functions.h> // shared functions common to CUDA Samples
#include <helper_cuda.h> // CUDA error checking
#include <stdio.h> // printf
#include <iostream>
template <typename T>
void print_arr(T *arr, int L)
{
    for (int i = 0; i < L; i++)
    {
        std::cout << arr[i] << " ";
    }
    std::cout << std::endl;
}

int main()
{
    float hV[10] = {3, 0, 7, 1, 2, 8, 6, 7, 6, 4};
    print_arr(hV, 10);
    float *dV;
    cudaMalloc(&dV, sizeof(float) * 10);
    cudaMemcpy(dV, hV, sizeof(float) * 10, cudaMemcpyHostToDevice);
    cublasHandle_t cublasHandle = NULL;
    checkCudaErrors(cublasCreate(&cublasHandle));
    int hResult[2] = {0};
    checkCudaErrors(cublasIsamax(cublasHandle, 10, dV, 3, hResult + 0));
    checkCudaErrors(cublasIsamin(cublasHandle, 10, dV, 3, hResult + 1));
    print_arr(hResult, 2);
    return 0;
}
Expected result:
3 0 7 1 2 8 6 7 6 4
3 2
Actual result:
3 0 7 1 2 8 6 7 6 4
3 5
Is there a problem with this result, or have I misunderstood something?
See the cuBLAS documentation for cublasIsamin.
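cublasIsamin finds the index of the element with the minimum magnitude (absolute value). This index is not computed over the original array: it is a 1-based position within the strided sequence selected by the incx parameter. Furthermore, it will search over n elements (the first parameter) regardless of other parameters such as incx. You have an array like this:
index: 0 1 2 3 4 5 6 7 8 9
value: 3 0 7 1 2 8 6 7 6 4
axis:  x y z x y z x y z x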
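Therefore the minimum x value (1) is at index 3 of the original array, which makes it the second of the n=4 (not 10) x values that should be searched. With n=10 and incx=3, the calls in the question read elements up to index 27, well past the end of the 10-element allocation, so the returned indices are not meaningful. With respect to the x values, we must begin searching dV at offset 0 with an increment of 3, for a maximum of n=4 elements. Taking all this into account, the correct calls are:
checkCudaErrors(cublasIsamax(cublasHandle, 4, dV, 3, hResult + 0));
checkCudaErrors(cublasIsamin(cublasHandle, 4, dV, 3, hResult + 1));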
And the expected result is:
3 0 7 1 2 8 6 7 6 4
3 2
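To check the indexing rule without a GPU, it can be emulated on the CPU. The following is only a minimal sketch, not cuBLAS code: strided_amax, strided_amin, count_for_axis, and report_extrema are hypothetical helper names introduced here for illustration. It assumes the same conventions described above (n elements visited with stride incx, 1-based result, comparison by absolute value, first occurrence wins). Returning 0 for invalid arguments is a choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical CPU stand-in for cublasIsamax (not part of cuBLAS).
// Visits x[0], x[incx], ..., x[(n-1)*incx] and returns the 1-based position,
// within that strided sequence, of the first element of largest magnitude.
int strided_amax(int n, const float *x, int incx)
{
    if (n <= 0 || incx <= 0) return 0; // sketch-only convention for bad input
    int best = 0;
    for (int i = 1; i < n; i++)
        if (std::fabs(x[i * incx]) > std::fabs(x[best * incx])) best = i;
    return best + 1; // convert to 1-based
}

// Same idea for the element of smallest magnitude (stand-in for cublasIsamin).
int strided_amin(int n, const float *x, int incx)
{
    if (n <= 0 || incx <= 0) return 0;
    int best = 0;
    for (int i = 1; i < n; i++)
        if (std::fabs(x[i * incx]) < std::fabs(x[best * incx])) best = i;
    return best + 1;
}

// Number of samples belonging to one axis (0 = x, 1 = y, 2 = z) in an
// interleaved array holding `total` floats with `stride` components per point.
int count_for_axis(int total, int axis, int stride)
{
    assert(stride > 0 && axis >= 0 && axis < stride);
    return (total - axis + stride - 1) / stride;
}

// Prints, for each axis, the 1-based strided positions and the matching
// 0-based indices in the original interleaved array.
void report_extrema(const float *v, int total)
{
    const int stride = 3;
    const char names[3] = {'x', 'y', 'z'};
    for (int axis = 0; axis < stride; axis++)
    {
        int n = count_for_axis(total, axis, stride);
        int imax = strided_amax(n, v + axis, stride);
        int imin = strided_amin(n, v + axis, stride);
        std::printf("%c: n=%d amax=%d (orig %d) amin=%d (orig %d)\n",
                    names[axis], n,
                    imax, axis + (imax - 1) * stride,
                    imin, axis + (imin - 1) * stride);
    }
}
```

Calling report_extrema(hV, 10) from main with the array in the question gives amax=3 and amin=2 for x, which correspond to indices 6 and 3 of the original array. The same offset idea should carry over to cuBLAS for the other directions: pass dV + 1 (y) or dV + 2 (z) as the vector pointer, keep incx=3, and use n=3, since only three y and three z values exist in a 10-element array.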