update aTable set a,b,c = func(x,y,z,…)


I need some quick how-to advice. Note that the following scenario relies on the c_api that is already available in my 64-bit monetdblite build; the intention is to use it with some ad hoc functions written in C.

Short: how can I achieve or simulate the following scenario: update aTable set a,b,c = func(x,y,z,…)

Long: many algorithms return more than one result, multiple regression for instance.

bool m_regression(IN const double **data, IN const int cols, IN const int rows, OUT double *fit_values, OUT double *residuals, OUT double *std_residuals, OUT double &p_value);

To minimize the transfer of data between monetdb and the heavy computational function, all of those results are generated in one step. The question is: how can I transfer them back all at once, minimizing computation time and memory traffic between monetdb and the external C/C++(/R/Python) function?


My first thought for solving this is something like the following:

1. update aTable set dummy = func_compute(x,y,z,…)

where dummy is a temporary __int64 column, and func_compute computes all the necessary outputs and stores them in a buffer whose address is returned as the dummy pointer. To make sure there is no issue with constant estimation, the first value in the returned array will be the real dummy pointer, and the rest just incremented values, dummy + i.

2. update aTable set a = func_ret(dummy, 1), b = func_ret(dummy, 2), c = func_ret(dummy, 3) [, dummy = func_free(dummy)];

Assuming func_ret receives the dummy values in the same order in which the first call returned them, I would just copy the prepared result into the provided storage. If the order is not preserved, I will need an extra step to find the minimum (the real dummy pointer) and then use the offset of the current value to look up the entry in my array.

__int64 real_dummy = __inputs[0][0];

double *my_pointer_data = (double *) (real_dummy + __inputs[1][0] * sizeof(double)* row_count);

memcpy(__outputs[0], my_pointer_data, sizeof(double)* row_count);

// or ============================

__int64 real_dummy = minimum(__inputs[0]);

double *my_pointer_data = (double *) (real_dummy + __inputs[1][0] * sizeof(double)* row_count); // __inputs[1][0] is the output index, as in the first variant

for (int i=0;i<row_count;i++)
   __outputs[0][i] = my_pointer_data[__inputs[0][i] - real_dummy];

How I free the temporary memory is less relevant: it can be done in the last assignment of the update or in a separate fake update statement using func_free. The problem is that, even if I save some (big) computation time, the dummy column is still passed 3 times (is there any chance that the memory is not actually copied?).
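To make the idea concrete, here is a minimal sketch of the two-step scheme in plain C, with the monetdb glue left out entirely. The `result_block` struct, the function signatures and the stand-in "computation" are assumptions made for illustration, not the c_api calling convention. Unlike the description above, it treats the handle as a single scalar rather than a per-row column.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result block: n_out output columns of row_count doubles,
   stored back to back in a single allocation. */
typedef struct {
    size_t row_count;
    size_t n_out;
    double *data; /* n_out * row_count values, one column after another */
} result_block;

/* Step 1: compute every output once and return an opaque handle.
   The computation is a stand-in: column k holds (k + 1) * x[i]. */
int64_t func_compute(const double *x, size_t row_count, size_t n_out)
{
    result_block *rb = malloc(sizeof *rb);
    if (rb == NULL)
        return 0;
    rb->data = malloc(n_out * row_count * sizeof *rb->data);
    if (rb->data == NULL) {
        free(rb);
        return 0;
    }
    rb->row_count = row_count;
    rb->n_out = n_out;
    for (size_t k = 0; k < n_out; k++)
        for (size_t i = 0; i < row_count; i++)
            rb->data[k * row_count + i] = (double)(k + 1) * x[i];
    return (int64_t)(intptr_t)rb;
}

/* Step 2: copy output column `which` (1-based, as in func_ret(dummy, 1))
   into caller-provided storage. Returns 0 on success, -1 on a bad request. */
int func_ret(int64_t handle, size_t which, double *out)
{
    result_block *rb = (result_block *)(intptr_t)handle;
    assert(out != NULL);
    if (rb == NULL || which < 1 || which > rb->n_out)
        return -1;
    memcpy(out, rb->data + (which - 1) * rb->row_count,
           rb->row_count * sizeof *out);
    return 0;
}

/* Step 3: release the block; returns 0 so the dummy column can be reset. */
int64_t func_free(int64_t handle)
{
    result_block *rb = (result_block *)(intptr_t)handle;
    if (rb != NULL) {
        free(rb->data);
        free(rb);
    }
    return 0;
}
```

In this sketch each func_ret call costs one memcpy of a single output column, and the handle itself is only 8 bytes. Whether monetdb copies the dummy column when passing it to each call is not something the sketch can settle.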

Is there a better way of achieving this?


There is 1 answer below:

I am not aware of a good way of doing this, sorry. You could retrieve the table, add your columns as BATs in whichever way you like and write it back.
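As a rough illustration of the middle step of that approach, the sketch below assumes the input columns have already been fetched into plain double arrays and fills several output columns in one call. The function name `fit_columns` is hypothetical, and a single-predictor least-squares fit stands in for the multiple regression from the question. Fetching the columns and writing them back through monetdblite is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Fit y = intercept + slope * x by ordinary least squares and fill both
   output columns in one call. Returns 0 on success, -1 if the fit is
   undefined (fewer than two rows, or all x values equal). */
int fit_columns(const double *x, const double *y, size_t n,
                double *fit_values, double *residuals)
{
    assert(x != NULL && y != NULL);
    assert(fit_values != NULL && residuals != NULL);
    if (n < 2)
        return -1;

    /* Column means. */
    double xbar = 0.0, ybar = 0.0;
    for (size_t i = 0; i < n; i++) {
        xbar += x[i];
        ybar += y[i];
    }
    xbar /= (double)n;
    ybar /= (double)n;

    /* Centered sums of squares and cross products. */
    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxx += (x[i] - xbar) * (x[i] - xbar);
        sxy += (x[i] - xbar) * (y[i] - ybar);
    }
    if (sxx == 0.0)
        return -1;

    double slope = sxy / sxx;
    double intercept = ybar - slope * xbar;

    /* Both result columns are produced in the same pass. */
    for (size_t i = 0; i < n; i++) {
        fit_values[i] = intercept + slope * x[i];
        residuals[i] = y[i] - fit_values[i];
    }
    return 0;
}
```

The point of this design is that every output column comes out of a single call, so nothing has to be recomputed or passed back through a handle between statements.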