I am trying to get a neural network to predict whether a transaction is suspicious. I generated 50,000 synthetic transactions for training (the format is shown below), but no matter what I do, the network slowly learns the training data to around 55% accuracy and then fails completely on the test data. I have tried different network architectures, learning rates, batch sizes, epoch counts, etc. I suspect there is a problem with the preprocessing. Thank you in advance.
Below are the code, the end of the output log, and the JSON format:
import json
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Read the JSON data
with open('new_transactions_training.json') as f:
    data = json.load(f)
# Initialize lists to store values for each column
types = []
amounts = []
transactionTimes = []
transactionLocations = []
devices = []
paymentMethods = []
recentChanges = []
suspiciousFlags = []
# Iterate over each transaction dictionary
for transaction in data:
    types.append(transaction['type'])
    amounts.append(transaction['amount'])
    transactionTimes.append(transaction['transactionTime'])
    transactionLocations.append(transaction['transactionLocation'])
    devices.append(transaction['device'])
    paymentMethods.append(transaction['paymentMethod'])
    recentChanges.append(transaction['recentChangeInAccountDetails'])
    suspiciousFlags.append(transaction['suspicious'])
# Create DataFrame from the lists of values
df = pd.DataFrame({
    'type': types,
    'amount': amounts,
    'transactionTime': transactionTimes,
    'transactionLocation': transactionLocations,
    'device': devices,
    'paymentMethod': paymentMethods,
    'recentChangeInAccountDetails': recentChanges,
    'suspicious': suspiciousFlags
})
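(Side note: I realize the per-column lists above could be replaced with a single constructor call, since pandas builds a DataFrame directly from a list of record dicts. A minimal sketch with stand-in records:)

```python
import pandas as pd

# Stand-in records with the same shape as json.load would return
data = [
    {"type": "WITHDRAWAL", "amount": 280.65, "suspicious": False},
    {"type": "DEPOSIT", "amount": 917.46, "suspicious": True},
]

# pandas maps each dict key to a column and each dict to a row
df = pd.DataFrame(data)
print(df.columns.tolist())  # ['type', 'amount', 'suspicious']
print(len(df))              # 2
```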
# Extract features and labels
X = df.drop('suspicious', axis=1) # Features
y = df['suspicious'] # Labels
# Preprocess the features
# Encode categorical variables and scale numerical features
categorical_features = ['type', 'transactionLocation', 'device', 'paymentMethod']
numerical_features = ['amount', 'transactionTime', 'recentChangeInAccountDetails']
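(One thing I wondered about while debugging: transactionTime is a raw Unix epoch timestamp, and standard-scaling it just yields a near-monotonic index, which may carry almost no usable pattern. If my label generator depends on the time of day, would deriving hour-of-day features help? A sketch of what I'm considering, assuming the timestamps are seconds since the epoch:)

```python
import math
import pandas as pd

# Epoch timestamps (seconds), taken from the JSON snippet below
times = pd.Series([1708716420.0, 1708742400.0])

# Convert to datetimes (UTC-naive) and pull out the hour of day
dt = pd.to_datetime(times, unit='s')
hour = dt.dt.hour

# Encode the hour cyclically so 23:00 and 00:00 end up close together
hour_sin = (2 * math.pi * hour / 24).apply(math.sin)
hour_cos = (2 * math.pi * hour / 24).apply(math.cos)

print(hour.tolist())  # [19, 2]
```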
# Define transformers for the preprocessing pipeline
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
numerical_transformer = StandardScaler()
# Combine transformers into a ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])
# Apply transformations
X_processed = preprocessor.fit_transform(X)
# Convert sparse matrices to dense arrays
if isinstance(X_processed, csr_matrix):
    X_processed = X_processed.toarray()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)
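(I also wondered whether an unstratified split could leave the train and test sets with different suspicious/legit ratios; passing stratify=y keeps the proportions matched. A small sketch with toy labels:)

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 20% positive labels
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 20 + [0] * 80)

# stratify=y preserves the 20/80 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_train.mean(), y_test.mean())  # both 0.2
```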
# Define the neural network architecture
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_data=(X_test, y_test))
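(I'm also aware that passing validation_data=(X_test, y_test) means I'm watching the same set I later report as test accuracy. Would carving a separate validation set out of the training portion be cleaner? A sketch of the split arithmetic, not my actual data:)

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50,000 dummy samples, mirroring the size of the synthetic dataset
X = np.arange(50000).reshape(-1, 1)
y = np.random.RandomState(0).randint(0, 2, size=50000)

# First hold out 20% as a final, untouched test set...
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# ...then split 10% of the remainder off as validation for model.fit
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.1, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 36000 4000 10000
```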
# Evaluate the model
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test Accuracy: {test_acc:.3f}')
End of output log:
Epoch 20/20
1250/1250 [==============================] - 2s 1ms/step - loss: 0.6777 - accuracy: 0.5484 - val_loss: 0.6888 - val_accuracy: 0.5209
313/313 [==============================] - 0s 1ms/step - loss: 0.6888 - accuracy: 0.5209
Test Accuracy: 0.521
JSON snippet:
[
  {"type":"WITHDRAWAL","amount":280.65,"transactionTime":1708716420.000000000,"transactionLocation":"AUSTRALIA","device":"SMART_WATCH","paymentMethod":"DEBIT_CARD","recentChangeInAccountDetails":false,"suspicious":false},
  {"type":"WITHDRAWAL","amount":917.46,"transactionTime":1708742400.000000000,"transactionLocation":"AUSTRALIA","device":"MOBILE","paymentMethod":"WIRE_TRANSFER","recentChangeInAccountDetails":false,"suspicious":false},
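(Before blaming the network, I also wanted to sanity-check the label distribution in the file: 55% train accuracy with ~52% test accuracy looks close to what a constant majority-class predictor would score. A sketch of the check, using inline stand-in records in place of the actual file:)

```python
import pandas as pd

# Stand-in records; in practice, load new_transactions_training.json instead
data = [
    {"amount": 280.65, "suspicious": False},
    {"amount": 917.46, "suspicious": False},
    {"amount": 1500.00, "suspicious": True},
]
df = pd.DataFrame(data)

# The majority-class rate is the accuracy a constant predictor achieves
balance = df['suspicious'].value_counts(normalize=True)
print(balance)
print("baseline accuracy:", balance.max())
```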
I expected close to 100% accuracy, since the code that generates the transactions and their labels is no more than 10 if/else statements.