TensorFlow: modify an entry (tensor) inside a batch tensor


I am following the example: https://www.tensorflow.org/tutorials/structured_data/time_series.

In my case I have a sensor that collects data every hour. It has not been very reliable over the last few months, and I have lost some data. To work around this, the missing values have been replaced with the previous valid value. As a result I have many duplicated values, and I think this is why my NN is unable to predict anything. I do not want to drop the bad values before creating the dataset, because that would produce time series with non-consecutive values.

I would like to create the time series dataset as in the example and then either remove the entries/outputs (tensors) that contain a certain number of duplicated values, or overwrite those tensors with zeros.

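from collections import Counter

import numpy as np
import tensorflow as tf

# WindowGenerator is the class defined in the tutorial linked above.

def hasMultipleDuplicatedElements(mylist, multiplicity):
    # True if the most common value in the first column occurs more than `multiplicity` times.
    return Counter(mylist[:, 0]).most_common(1)[0][1] > multiplicity

WindowGenerator.hasMultipleDuplicatedElements = hasMultipleDuplicatedElements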

def dsCleanedRowsWithHighMultiplycity(self, ds, multiplicity):
    for batch in ds:
        dataBatch = batch.numpy()
        for j in range(len(dataBatch)):
            selectedDataBatch = dataBatch[j]

            # One index per time step of the window, shape (window_length, 1).
            indices = tf.constant([[i] for i in range(len(selectedDataBatch))])
            inputData = selectedDataBatch[:self.input_width]
            labelData = selectedDataBatch[self.input_width:]
            if (hasMultipleDuplicatedElements(inputData, multiplicity) or
                    hasMultipleDuplicatedElements(labelData, multiplicity)):
                # print(batch[j])
                # This returns a new tensor; batch[j] itself is left unchanged.
                tf.tensor_scatter_nd_update(
                    batch[j], indices,
                    tf.zeros(shape=selectedDataBatch.shape, dtype=tf.float32))
                # print(batch[j])

WindowGenerator.dsCleanedRowsWithHighMultiplycity = dsCleanedRowsWithHighMultiplycity

def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.preprocessing.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=32,)

  self.dsCleanedRowsWithHighMultiplycity(ds,10)

  ds = ds.map(self.split_window)

  return ds

The dataset contains batches, each with 32 entries/outputs (tensors). I scan every entry/output looking for duplicated data, using a threshold of 10 repetitions. I manage to spot these entries and create a new tensor with tf.tensor_scatter_nd_update, but what I would like is to update the original tensor inside the batch.
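For reference, here is a minimal sketch of how the zeroing could be expressed as a dataset transformation instead of an in-place update. TensorFlow tensors are immutable, so tf.tensor_scatter_nd_update always returns a new tensor. The function names max_multiplicity and zero_out_duplicated are hypothetical, and the sketch assumes batches of shape (batch, time, features), with duplicates counted in feature 0 over the whole window rather than separately for inputs and labels:

```python
import numpy as np
import tensorflow as tf


def max_multiplicity(windows):
    """Per window, count how often the most common value of feature 0 occurs.

    windows: float tensor of shape (batch, time, features).
    Hypothetical helper, not part of the tutorial.
    """
    first_feature = windows[:, :, 0]                           # (batch, time)
    # Compare every time step with every other one in the same window.
    equal = tf.equal(first_feature[:, :, None],
                     first_feature[:, None, :])                # (batch, time, time)
    counts = tf.reduce_sum(tf.cast(equal, tf.int32), axis=-1)  # (batch, time)
    return tf.reduce_max(counts, axis=-1)                      # (batch,)


def zero_out_duplicated(windows, multiplicity):
    """Return a copy of the batch where heavily duplicated windows are all zeros."""
    bad = max_multiplicity(windows) > multiplicity             # (batch,)
    # Broadcast the per-window flag over the time and feature axes.
    return tf.where(bad[:, None, None], tf.zeros_like(windows), windows)


# Toy data: three windows of 4 time steps and 1 feature, batched by 2.
toy_windows = np.array(
    [[[1.0], [1.0], [1.0], [1.0]],
     [[1.0], [2.0], [3.0], [4.0]],
     [[5.0], [5.0], [5.0], [6.0]]], dtype=np.float32)
toy_ds = tf.data.Dataset.from_tensor_slices(toy_windows).batch(2)

# map() builds a new dataset; the batches of toy_ds are not modified.
cleaned_ds = toy_ds.map(lambda batch: zero_out_duplicated(batch, multiplicity=2))
```

In make_dataset this would presumably be used as ds = ds.map(...) before ds.map(self.split_window), so that the returned dataset is the cleaned one. The pairwise comparison needs memory quadratic in the window length, which should be fine for short windows.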

If there is a way to remove the wrong tensor from the batch, it would also be an acceptable solution.
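A rough sketch of what that removal might look like, assuming the dataset yields batches of shape (batch, time, features): unbatch it, filter out the unwanted windows, and batch it again. The names is_clean and drop_duplicated_windows are hypothetical, and for brevity the check looks at feature 0 of the whole window rather than at inputs and labels separately:

```python
import numpy as np
import tensorflow as tf


def is_clean(window, multiplicity):
    """True if no value of feature 0 occurs more than `multiplicity` times.

    window: float tensor of shape (time, features). Hypothetical helper.
    """
    _, _, counts = tf.unique_with_counts(window[:, 0])
    return tf.reduce_max(counts) <= multiplicity


def drop_duplicated_windows(ds, multiplicity, batch_size=32):
    """Return a new dataset without the heavily duplicated windows."""
    return (ds.unbatch()                                   # one window per element
              .filter(lambda window: is_clean(window, multiplicity))
              .batch(batch_size))                          # rebuild the batches


# Toy data: three windows of 4 time steps and 1 feature, batched by 2.
toy_windows = np.array(
    [[[1.0], [1.0], [1.0], [1.0]],
     [[1.0], [2.0], [3.0], [4.0]],
     [[5.0], [5.0], [5.0], [6.0]]], dtype=np.float32)
toy_ds = tf.data.Dataset.from_tensor_slices(toy_windows).batch(2)

filtered_ds = drop_duplicated_windows(toy_ds, multiplicity=2, batch_size=2)
```

Because filtering changes the number of windows, the last batch may be smaller than batch_size. The result has to be assigned back (ds = drop_duplicated_windows(ds, 10)), since dataset transformations return a new dataset rather than changing the existing one.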

Thanks in advance!
