I have a table d
with fields x, y, f
, (PK is x,y
) and would like to implement convolution, where a new column, c
, is defined, as the Convolution (2D) given an arbitrary kernel. In a procedural language, this is easy to define (see below). I'm confident it can be defined in SQL using a JOIN, but I'm having trouble doing so.
In a procedural language, I would do:
def conv(x, y):
c = 0
# x_ and y_ are pronounced "x prime" and "y prime",
# and take on *all* x and y values in the table;
# that is, we iterate through *all* rows
for all x_, y_
c += f(x_, y_) * kernel(x_ - x, y_ - y)
return c
kernel
can be any arbitrary function. In my case, it's 1/k^(sqrt(x_dist^2, y_dist^2))
, with kernel(0,0) = 1
.
For performance reasons, we don't need to look at every x_, y_
. We can filter it where the distance < threshold.
I think this can be done using a Cartesian product join, followed by aggregate SQL SUM, along with a WHERE clause.
One additional challenge of doing this in SQL is NULLs. A naive implementation would treat them as zeroes. What I'd like to do is instead treat the kernel as a weighted average, and just leave out NULLs. That is, I'd use a function wkernel
as my kernel, and modify the code above to be:
def conv(x, y):
c = 0
w = 0
for all x_, y_
c += f(x_, y_) * wkernel(x_ - x, y_ - y)
w += wkernel(x_ - x, y_ - y)
return c/w
That would make NULLs work great.
To clarify: You can't have a partial observation, where x=NULL and y=3. However, you can have a missing observation, e.g. there is no record where x=2 and y=3. I am referring to this as NULL, in the sense that the entire record is missing. My procedural code above will handle this fine.
I believe the above can be done in SQL (assuming wkernel
is already implemented as a function), but I can't figure out how. I'm using Postgres 9.4.
Sample table:
Table d
x | y | f
0 | 0 | 1.4
1 | 0 | 2.3
0 | 1 | 1.7
1 | 1 | 1.2
Output (just showing one row):
x | y | c
0 | 0 | 1.4*1 + 2.3*1/k + 1.7*1/k + 1.2*1/k^1.414
Convolution https://en.wikipedia.org/wiki/Convolution is a standard algorithm used throughout image processing and signal processing, and I believe it can be done in SQL, which is very useful given the large data sets we're now using.
I assumed a function wkernel, for example:
The following query gives what you want but without restricting to close values:
With some arbitrary restriction:
For the weighted average point:
Now onto the missing information thing. In the following code, please replace 2 by the maximum distance to be considered.
The idea is the following: We find the bounds of the considered image and we generate all the information that could be needed. With your example and with a maximum scope of 1, we need all the couples (x, y) such that (-1 <= x <= 2) and (-1 <= y <= 2).
Finding bounds and fixing scope=1 and k=2 (call this relation cfg):
Generating completed set of values (call this relation completed):
Now we just have to compute the values with the query given before and the cfg and completed relations. Note that we do not compute convolution for the values on the borders:
All in one, this gives:
I hope this helps :-)