I am trying to increase the size of a numeric dataset that I have been working with. The dataset comprises the columns ['Name', 'O3_1hr', 'O3_4hr', 'O3_8hr', 'PM10 1 hr', 'PM10', 'AQI_O3_1hr', 'AQI_O3_4hr', 'AQI_PM10', 'AQI_Site', 'Date', 'Time', 'Latitude', 'Longitude'].
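Before training, all 14 columns are scaled to [0, 1] so they match the generator's sigmoid output. A rough sketch of that step (the file name air_quality.csv is a placeholder, and I assume here that Name, Date and Time have already been converted to numeric values):

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Placeholder file name; Name, Date and Time are assumed already numeric here
df = pd.read_csv("air_quality.csv")
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(df.values)   # shape: (n_samples, 14)
# scaler.inverse_transform() maps generated rows back to the original ranges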
Previously, I attempted to use a GAN (Generative Adversarial Network) for data augmentation. The GAN architecture is:
from tensorflow.keras.layers import (Input, Dense, LeakyReLU,
                                     BatchNormalization, Dropout, Reshape)
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam


def build_generator(latent_dim):
    # Maps a latent noise vector to a synthetic 14-feature row
    model = Sequential()
    model.add(Dense(128, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(14, activation='sigmoid'))  # outputs in [0, 1]
    model.add(Reshape((14,)))

    noise = Input(shape=(latent_dim,))
    feature_gen = model(noise)
    return Model(noise, feature_gen)


def build_discriminator():
    # Classifies a 14-feature row as real (1) or generated (0)
    model = Sequential()
    model.add(Dense(512, input_dim=14))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Dense(256))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.4))
    model.add(Dense(1, activation='sigmoid'))

    feature = Input(shape=(14,))
    validity = model(feature)
    return Model(feature, validity)


def build_gan(generator, discriminator):
    # Stacked model used to train the generator; the discriminator is frozen here
    discriminator.trainable = False
    gan_input = Input(shape=(100,))
    generated_feature = generator(gan_input)
    gan_output = discriminator(generated_feature)
    gan = Model(gan_input, gan_output)
    gan.compile(loss='binary_crossentropy',
                optimizer=Adam(learning_rate=0.0001, beta_1=0.5))
    return gan
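Training follows the usual alternating scheme, roughly like this (a simplified sketch, not my exact configuration; the batch size of 64 and the scaled feature matrix X_scaled are placeholders):

import numpy as np

latent_dim = 100
generator = build_generator(latent_dim)
discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
                      metrics=['accuracy'])
gan = build_gan(generator, discriminator)

batch_size = 64
for epoch in range(1000):
    # Train the discriminator on a half-real, half-generated batch
    idx = np.random.randint(0, X_scaled.shape[0], batch_size)
    real = X_scaled[idx]
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    fake = generator.predict(noise, verbose=0)
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))
    # Train the generator (through the combined model) to fool the discriminator
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))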
Training for 1000 epochs yielded the following result:
Epoch 999/1000 [Dis loss: 0.3747815638780594, acc real: 78.91%, acc fake: 82.81%] [Gen loss: 2.687302589416504]
I have done a lot of hyperparameter tuning, but this accuracy is still not acceptable; it is far too low. I would greatly appreciate any guidance or recommendations. Specifically, I am interested in alternative methods of data augmentation, or any other techniques that can effectively expand a numeric dataset while preserving the integrity and patterns already present in the data.
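For example, the simplest baseline I can think of is jittering each row with small Gaussian noise (a sketch; the noise scale of 1% of each column's standard deviation is an arbitrary assumption), but I doubt this preserves the correlations between columns as well as a generative model would:

import numpy as np

def jitter_augment(X, n_copies=2, noise_frac=0.01, seed=0):
    # Stack noisy copies of X; the noise std is a fraction of each column's std
    rng = np.random.default_rng(seed)
    col_std = X.std(axis=0, keepdims=True)
    copies = [X + rng.normal(0.0, noise_frac * col_std, size=X.shape)
              for _ in range(n_copies)]
    return np.vstack([X] + copies)

X_augmented = jitter_augment(X_scaled, n_copies=2)   # three times the original rows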