I have two questions about TensorFlow 2.0, focused on how TensorFlow handles combined conditional tests in its operations graph.
The task: cut up a volume of data points into blocks, and store the indices of the samples that belong to the volume (not the samples themselves).
My initial approach: loop over all elements and collect the indices of the data points that are inside the 'bounding volume'. This was pretty slow, no matter how I reordered the comparisons on the coordinates.
import tensorflow as tf

# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    i = 0
    indices = tf.zeros([0], dtype=tf.int32)   # start with an empty index tensor
    for x in X:
        if x[0] > xmin[0]:
            if x[1] > xmin[1]:
                if x[2] <= xmax[2]:
                    # ...and so on...
                    indices = tf.concat([indices, [i]], axis=0)
        i = i + 1
    return indices
I then came up with the idea of producing boolean tensors and logically 'and'-ing them to get the indices of the elements I need. This is a whole lot faster, as shown in the next sample:
# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    # example of 3 different conditions to clip to (a part of) the bounding volume
    # X is the data and xmin and xmax are tensors containing the bounding volume
    c0 = (X[:, 0] > xmin[0])
    c1 = (X[:, 1] > xmin[1])    # processing all elements
    c2 = (X[:, 2] <= xmax[2])   # idem
    # ... there could be many more conditions, you get the idea...
    indices = tf.where(tf.math.logical_and(c0, tf.math.logical_and(c1, c2)))
    return indices
# ...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices)
This code produces the correct result, but I wonder if it is optimal.
The first question is about scheduling:
Will the TensorFlow graph that holds the operations cull (blocks of) conditional tests if it knows that some (blocks of) elements have already tested False? Because of the logical_and combining the logical conditionals, no subsequent conditional test on these elements can ever yield a True.
Indeed, in the above example c1 and c2 ask questions about elements that c0 may already have excluded from the set. Especially when you have a high number of elements to test, this could be a waste of time, even on parallel hardware platforms.
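For what it is worth, one way to at least see which ops end up in the traced graph is to wrap getIndices in tf.function and list the operations of the concrete graph (this shows what is emitted, not how it is scheduled):

# example shapes; X, xmin and xmax as described above
X = tf.random.uniform((1000, 3))
xmin = tf.constant([0.1, 0.2, 0.3])
xmax = tf.constant([0.9, 0.8, 0.7])

concrete = tf.function(getIndices).get_concrete_function(X, xmin, xmax)
for op in concrete.graph.get_operations():
    print(op.type, op.name)   # the comparison, LogicalAnd and Where ops that were traced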
So, what if we cascade the tests based on the results of a previous test? Although this may seem to solve the problem, it is incorrect, because the final indices tensor will refer to the trimmed subset _X, not to the total set X:
# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndices(X, xmin, xmax):
    c0 = (X[:, 0] > xmin[0])
    indices = tf.where(c0)[:, 0]     # squeeze tf.where's [n, 1] result to a flat [n] index tensor
    _X = tf.gather(X, indices)
    c1 = (_X[:, 1] > xmin[1])        # processing only trimmed elements
    indices = tf.where(c1)[:, 0]
    _X = tf.gather(_X, indices)
    c2 = (_X[:, 2] <= xmax[2])       # idem
    indices = tf.where(c2)[:, 0]
    return indices
...
indices = getIndices(X, xmin, xmax)
trimmedX = tf.gather(X, indices) # fails: indices refer to a trimmed subset, not X
I could of course 'solve' this by simply expanding X, so that each element also carries its own index in the original list, and then proceed as before.
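For reference, a minimal sketch of that bookkeeping, as a variation that keeps the original row indices in a separate tensor instead of appending them to X (the function name getIndicesCascaded is just for illustration):

# X.shape == [elements, features]
# xmin.shape == xmax.shape == [features]
def getIndicesCascaded(X, xmin, xmax):
    orig = tf.range(tf.shape(X)[0])                  # index of each row in the original X

    keep = X[:, 0] > xmin[0]
    X, orig = tf.boolean_mask(X, keep), tf.boolean_mask(orig, keep)

    keep = X[:, 1] > xmin[1]                         # processing only trimmed elements
    X, orig = tf.boolean_mask(X, keep), tf.boolean_mask(orig, keep)

    keep = X[:, 2] <= xmax[2]                        # idem
    orig = tf.boolean_mask(orig, keep)

    return orig                                      # valid indices into the original X

With this, tf.gather(X, getIndicesCascaded(X, xmin, xmax)) works on the full X, but it still materializes the masks and trimmed copies, which is exactly the extra memory/time I would like to avoid.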
So my second question is about functionality:
Does TensorFlow have a method to make the GPU/tensor infrastructure provide this bookkeeping without spending memory/time on this seemingly simple problem?
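For example, I imagine something along these lines, where all the per-feature comparisons are evaluated at once and reduced along the feature axis (just a sketch of the kind of call I mean, with minimum and maximum playing the role of xmin and xmax):

# X.shape == [elements, features]
# minimum.shape == maximum.shape == [features]
inside = tf.reduce_all((X > minimum) & (X < maximum), axis=1)   # [elements] booleans
indices = tf.where(inside)                                      # indices into the original X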
This returns the indices of all elements that are larger than minimum and less than maximum, when both of these have the same number of features as X.