Augmentation of a Tabular Dataset

I am trying to increase the size of a numeric dataset that I have been working with. The dataset comprises the columns ['Name', 'O3_1hr', 'O3_4hr', 'O3_8hr', 'PM10 1 hr', 'PM10', 'AQI_O3_1hr', 'AQI_O3_4hr', 'AQI_PM10', 'AQI_Site', 'Date', 'Time', 'Latitude', 'Longitude'].
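
Since the generator below ends in a sigmoid layer, I scale every feature to [0, 1] first. A simplified sketch of my preprocessing, assuming pandas and scikit-learn (the file name is a placeholder, and the encoding of 'Name', 'Date' and 'Time' is simplified here):

import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

df = pd.read_csv('air_quality.csv')  # placeholder file name

# Label-encode the non-numeric columns so that all 14 features are numeric.
for col in ['Name', 'Date', 'Time']:
  df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Scale everything to [0, 1] to match the generator's sigmoid output.
scaler = MinMaxScaler()
X = scaler.fit_transform(df.values).astype('float32')  # shape: (n_samples, 14)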

Previously, I attempted to use a GAN (Generative Adversarial Network) for data augmentation. The architecture of the GAN is:

from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout, Input, LeakyReLU
from tensorflow.keras.optimizers import Adam

def build_generator(latent_dim):
  # Maps a latent noise vector to one synthetic row of 14 features.
  model = Sequential()
  model.add(Dense(128, input_dim=latent_dim))
  model.add(LeakyReLU(alpha=0.2))
  model.add(BatchNormalization(momentum=0.8))
  model.add(Dense(256))
  model.add(LeakyReLU(alpha=0.2))
  model.add(BatchNormalization(momentum=0.8))
  model.add(Dense(512))
  model.add(LeakyReLU(alpha=0.2))
  model.add(BatchNormalization(momentum=0.8))
  # Sigmoid output assumes every feature has been scaled to [0, 1].
  model.add(Dense(14, activation='sigmoid'))
  noise = Input(shape=(latent_dim,))
  feature_gen = model(noise)
  return Model(noise, feature_gen)

def build_discriminator():
  # Scores a 14-feature row as real (1) or generated (0).
  model = Sequential()
  model.add(Dense(512, input_dim=14))
  model.add(LeakyReLU(alpha=0.2))
  model.add(Dropout(0.4))
  model.add(Dense(256))
  model.add(LeakyReLU(alpha=0.2))
  model.add(Dropout(0.4))
  model.add(Dense(128))
  model.add(LeakyReLU(alpha=0.2))
  model.add(Dropout(0.4))
  model.add(Dense(1, activation='sigmoid'))
  feature = Input(shape=(14,))
  validity = model(feature)
  return Model(feature, validity)


def build_gan(generator, discriminator):
  # Freeze the discriminator inside the combined model so that
  # gan.train_on_batch only updates the generator's weights.
  discriminator.trainable = False
  gan_input = Input(shape=(100,))  # must match latent_dim
  generated_feature = generator(gan_input)
  gan_output = discriminator(generated_feature)
  gan = Model(gan_input, gan_output)
  gan.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=0.0001, beta_1=0.5))
  return gan

which yielded the following result:

Epoch 999/1000 [Dis loss: 0.3747815638780594, acc real: 78.91%, acc fake: 82.81%] [Gen loss: 2.687302589416504]
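
My training loop follows the standard alternating scheme; a simplified sketch (X is the scaled feature matrix from the preprocessing step above, and the batch size here is just the value I sketch with):

import numpy as np

latent_dim = 100
batch_size = 64
epochs = 1000

discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0001, beta_1=0.5),
                      metrics=['accuracy'])
generator = build_generator(latent_dim)
gan = build_gan(generator, discriminator)

real_labels = np.ones((batch_size, 1))
fake_labels = np.zeros((batch_size, 1))

for epoch in range(epochs):
  # 1) Train the discriminator on one real batch and one generated batch.
  idx = np.random.randint(0, X.shape[0], batch_size)
  noise = np.random.normal(0, 1, (batch_size, latent_dim))
  fake = generator.predict(noise, verbose=0)
  d_loss_real = discriminator.train_on_batch(X[idx], real_labels)
  d_loss_fake = discriminator.train_on_batch(fake, fake_labels)

  # 2) Train the generator (through the frozen discriminator) so that
  #    the discriminator labels its outputs as real.
  noise = np.random.normal(0, 1, (batch_size, latent_dim))
  g_loss = gan.train_on_batch(noise, real_labels)

# After training: sample synthetic rows and map them back to the original scale.
noise = np.random.normal(0, 1, (500, latent_dim))
synthetic = scaler.inverse_transform(generator.predict(noise, verbose=0))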

I have already done a lot of hyperparameter tuning, but this accuracy is not acceptable; it is far too low. I would greatly appreciate any guidance or recommendations. Specifically, I am interested in alternative methods of data augmentation, or in any other techniques that can effectively expand a numeric dataset while preserving the integrity and patterns that already exist in the data.
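
For instance, would a simpler, model-free approach such as SMOTE-style interpolation between nearest neighbours preserve the existing structure better than a GAN? A rough sketch of what I have in mind, assuming scikit-learn and the scaled feature matrix X from above (the function name and parameters are just placeholders):

from sklearn.neighbors import NearestNeighbors
import numpy as np

def interpolate_augment(X, n_new=500, k=5, seed=0):
  # Create synthetic rows by moving each sampled row part of the
  # way toward one of its k nearest neighbours (SMOTE-style).
  rng = np.random.default_rng(seed)
  nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
  _, idx = nn.kneighbors(X)  # idx[:, 0] is each point itself
  rows = rng.integers(0, X.shape[0], n_new)
  neighbours = idx[rows, rng.integers(1, k + 1, n_new)]
  alpha = rng.random((n_new, 1))  # interpolation weight in (0, 1)
  # Note: interpolated label-encoded columns ('Name', 'Date', 'Time')
  # would need rounding back to valid category codes.
  return X[rows] + alpha * (X[neighbours] - X[rows])

X_aug = np.vstack([X, interpolate_augment(X)])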
