Implementing Convolution in SQL

1.1k Views Asked by At

I have a table d with fields x, y, f, (PK is x,y) and would like to implement convolution, where a new column, c, is defined, as the Convolution (2D) given an arbitrary kernel. In a procedural language, this is easy to define (see below). I'm confident it can be defined in SQL using a JOIN, but I'm having trouble doing so.

In a procedural language, I would do:

def conv(x, y):
     c = 0
     # x_ and y_ are pronounced "x prime" and "y prime", 
     # and take on *all* x and y values in the table; 
     # that is, we iterate through *all* rows
     for all x_, y_
         c += f(x_, y_) * kernel(x_ - x, y_ - y)
     return c

kernel can be any arbitrary function. In my case, it's 1/k^(sqrt(x_dist^2, y_dist^2)), with kernel(0,0) = 1.

For performance reasons, we don't need to look at every x_, y_. We can filter it where the distance < threshold.

I think this can be done using a Cartesian product join, followed by aggregate SQL SUM, along with a WHERE clause.

One additional challenge of doing this in SQL is NULLs. A naive implementation would treat them as zeroes. What I'd like to do is instead treat the kernel as a weighted average, and just leave out NULLs. That is, I'd use a function wkernel as my kernel, and modify the code above to be:

def conv(x, y):
     c = 0
     w = 0
     for all x_, y_  
         c += f(x_, y_) * wkernel(x_ - x, y_ - y)
         w += wkernel(x_ - x, y_ - y)
     return c/w

That would make NULLs work great.

To clarify: You can't have a partial observation, where x=NULL and y=3. However, you can have a missing observation, e.g. there is no record where x=2 and y=3. I am referring to this as NULL, in the sense that the entire record is missing. My procedural code above will handle this fine.

I believe the above can be done in SQL (assuming wkernel is already implemented as a function), but I can't figure out how. I'm using Postgres 9.4.


Sample table:

 Table d
 x  | y  | f
 0  | 0  | 1.4
 1  | 0  | 2.3
 0  | 1  | 1.7
 1  | 1  | 1.2

Output (just showing one row):

 x  | y  | c
 0  | 0  | 1.4*1 + 2.3*1/k + 1.7*1/k + 1.2*1/k^1.414

Convolution https://en.wikipedia.org/wiki/Convolution is a standard algorithm used throughout image processing and signal processing, and I believe it can be done in SQL, which is very useful given the large data sets we're now using.

1

There are 1 best solutions below

6
On BEST ANSWER

I assumed a function wkernel, for example:

create or replace function wkernel(k numeric, xdist numeric, ydist numeric)
returns numeric language sql as $$
  select 1. / pow(k, sqrt(xdist*xdist + ydist*ydist))
$$;

The following query gives what you want but without restricting to close values:

select d1.x, d1.y, SUM(d2.f*wkernel(2, d2.x-d1.x, d2.y-d1.y)) AS c
from d d1 cross join d d2 
group by d1.x, d1.y;

 x | y |            c            
---+---+-------------------------
 0 | 0 | 3.850257072695778143380
 1 | 0 | 4.237864186319019036455
 0 | 1 | 3.862992722666908108145
 1 | 1 | 3.725299918145074500610
(4 rows)

With some arbitrary restriction:

select d1.x, d1.y, SUM(d2.f*wkernel(2, d2.x-d1.x, d2.y-d1.y)) AS c
from d d1 cross join d d2
where abs(d2.x-d1.x)+abs(d2.y-d1.y) < 1.1
group by d1.x, d1.y;

 x | y |            c            
---+---+-------------------------
 0 | 0 | 3.400000000000000000000
 1 | 0 | 3.600000000000000000000
 0 | 1 | 3.000000000000000000000
 1 | 1 | 3.200000000000000000000
(4 rows)

For the weighted average point:

select d1.x, d1.y, SUM(d2.f*wkernel(2, d2.x-d1.x, d2.y-d1.y)) / SUM(wkernel(2, d2.x-d1.x, d2.y-d1.y)) AS c
from d d1 cross join d d2
where abs(d2.x-d1.x)+abs(d2.y-d1.y) < 1.1
group by d1.x, d1.y;

Now onto the missing information thing. In the following code, please replace 2 by the maximum distance to be considered.

The idea is the following: We find the bounds of the considered image and we generate all the information that could be needed. With your example and with a maximum scope of 1, we need all the couples (x, y) such that (-1 <= x <= 2) and (-1 <= y <= 2).

Finding bounds and fixing scope=1 and k=2 (call this relation cfg):

SELECT MIN(x), MAX(x), MIN(y), MAX(y), 1, 2
FROM d;

 min | max | min | max | ?column? | ?column? 
-----+-----+-----+-----+----------+----------
   0 |   1 |   0 |   1 |        1 |        2

Generating completed set of values (call this relation completed):

SELECT x.*, y.*, COALESCE(f, 0)
FROM cfg
CROSS JOIN generate_series(minx - scope, maxx + scope) x
CROSS JOIN generate_series(miny - scope, maxy + scope) y
LEFT JOIN d ON d.x = x.* AND d.y = y.*;

 x  | y  | coalesce 
----+----+----------
 -1 | -1 |        0
 -1 |  0 |        0
 -1 |  1 |        0
 -1 |  2 |        0
  0 | -1 |        0
  0 |  0 |      1.4
  0 |  1 |      1.7
  0 |  2 |        0
  1 | -1 |        0
  1 |  0 |      2.3
  1 |  1 |      1.2
  1 |  2 |        0
  2 | -1 |        0
  2 |  0 |        0
  2 |  1 |        0
  2 |  2 |        0
(16 rows)

Now we just have to compute the values with the query given before and the cfg and completed relations. Note that we do not compute convolution for the values on the borders:

SELECT d1.x, d1.y, SUM(d2.f*wkernel(k, d2.x-d1.x, d2.y-d1.y)) / SUM(wkernel(k, d2.x-d1.x, d2.y-d1.y)) AS c
FROM cfg cross join completed d1 cross join completed d2
WHERE d1.x BETWEEN minx AND maxx
  AND d1.y BETWEEN miny AND maxy
  AND abs(d2.x-d1.x)+abs(d2.y-d1.y) <= scope
GROUP BY d1.x, d1.y;

 x | y |            c            
---+---+-------------------------
 0 | 0 | 1.400000000000000000000
 0 | 1 | 1.700000000000000000000
 1 | 0 | 2.300000000000000000000
 1 | 1 | 1.200000000000000000000
(4 rows)

All in one, this gives:

WITH cfg(minx, maxx, miny, maxy, scope, k) AS (
  SELECT MIN(x), MAX(x), MIN(y), MAX(y), 1, 2
  FROM d
), completed(x, y, f) AS (
  SELECT x.*, y.*, COALESCE(f, 0)
  FROM cfg
  CROSS JOIN generate_series(minx - scope, maxx + scope) x
  CROSS JOIN generate_series(miny - scope, maxy + scope) y
  LEFT JOIN d ON d.x = x.* AND d.y = y.*
)
SELECT d1.x, d1.y, SUM(d2.f*wkernel(k, d2.x-d1.x, d2.y-d1.y)) / SUM(wkernel(k, d2.x-d1.x, d2.y-d1.y)) AS c
FROM cfg cross join completed d1 cross join completed d2
WHERE d1.x BETWEEN minx AND maxx
  AND d1.y BETWEEN miny AND maxy
  AND abs(d2.x-d1.x)+abs(d2.y-d1.y) <= scope
GROUP BY d1.x, d1.y;

I hope this helps :-)