I'm doing some research into fraud detection for academic purposes. I'd like to know specifically about techniques for feature selection/engineering from a transactional dataset. In more detail: given a dataset of transactions (credit card, for example), what kinds of features are selected for use in the model, and how are they engineered?
All the papers I've come across focus on the model itself (SVM, NN, ...) and don't really touch on this subject.
Also, if anyone knows of public datasets that are not anonymized, that would help too.
Thanks
A good understanding of feature selection/ranking is a great asset for a data scientist or machine learning practitioner. A good grasp of these methods leads to better-performing models, a better understanding of the underlying structure and characteristics of the data, and better intuition about the algorithms that underlie many machine learning models.
There are, in general, two reasons to use feature selection: 1. To reduce the number of features, which reduces overfitting and improves the generalization of models. 2. To gain a better understanding of the features and their relationship to the response variables.
Possible methods:
Univariate feature selection:
Tree-based methods:
Others:
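To make the first two families above concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy transactions, the feature names (amount, hour_of_day, tx_count_last_24h) and the helper functions are made up. The univariate part scores each feature on its own by absolute Pearson correlation with the fraud label. The tree-based part uses a single-split decision stump (best Gini impurity reduction per feature), which is only a simplified stand-in for the importances that real tree ensembles report.

```python
import math

# Toy, made-up transactions (illustrative only); label 1 = fraud.
FEATURES = ["amount", "hour_of_day", "tx_count_last_24h"]
X = [
    [12.0, 14, 1],
    [950.0, 3, 9],
    [30.5, 10, 2],
    [700.0, 15, 8],
    [45.0, 2, 1],
    [15.0, 18, 3],
    [820.0, 11, 7],
    [60.0, 20, 2],
]
y = [0, 1, 0, 1, 0, 0, 1, 0]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def univariate_scores(rows, labels):
    """Score each feature independently: |correlation with the label|."""
    return [abs(pearson(col, labels)) for col in zip(*rows)]


def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def stump_gain(xs, labels):
    """Best Gini impurity reduction from one threshold split on a feature."""
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(xs))[:-1]:  # candidate thresholds: x <= t vs x > t
        left = [l for v, l in zip(xs, labels) if v <= t]
        right = [l for v, l in zip(xs, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - child)
    return best


def stump_scores(rows, labels):
    """Score each feature by the gain of its best single split."""
    return [stump_gain(col, labels) for col in zip(*rows)]


def rank(scores):
    """Pair scores with feature names, highest score first."""
    return sorted(zip(FEATURES, scores), key=lambda kv: kv[1], reverse=True)


uni = univariate_scores(X, y)
tree = stump_scores(X, y)
print("univariate ranking:", rank(uni))
print("stump ranking:     ", rank(tree))
```

In practice you would likely reach for a library rather than hand-rolled helpers: scikit-learn, for example, offers SelectKBest for univariate scoring and exposes feature_importances_ on its tree ensembles. The idea is the same as in the sketch: rank features by a score and keep the top ones.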