KNN Classifier to detect potential credit card fraud
In this project, I set out to understand the K-nearest-neighbors (KNN) classification algorithm and the process of selecting the optimal estimator.
Understanding the dataset
The dataset is a CSV file consisting of PCA values that stand in for certain transaction information to protect consumer privacy. The Amount feature is the amount of money in each transaction, and the Class feature contains two classes: safe and fraud.
Objective
The goal was to find the optimal hyperparameters of the KNN estimator using cross-validation and then provide a final estimate of the model’s generalization performance on the held-out test set.
Methodology
A grid search was performed to optimize the following hyperparameters:
- The number of neighbors: n_neighbors
- The type of weights considered: uniform or distance, based on whether each neighbor was to be assigned a uniform weight or a weight proportional to the inverse of its distance from the query point
- The type of metrics considered: minkowski or chebyshev, to see which distance metric is better suited to the dataset
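
To make the weights and metric options concrete, here is a minimal pure-Python sketch of a KNN vote. It is an illustration only, not the project's code: the helper names (minkowski, chebyshev, knn_predict) and the toy points are hypothetical, and the Minkowski distance uses p=2 (Euclidean), which is scikit-learn's default for KNeighborsClassifier.

```python
from collections import defaultdict


def minkowski(a, b, p=2):
    """Minkowski distance of order p (p=2 is the Euclidean distance)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def chebyshev(a, b):
    """Chebyshev distance: the largest difference along any single feature."""
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k, weights="uniform", metric=minkowski):
    """Predict a label for `query` from `train`, a list of (point, label) pairs."""
    # Keep the k training points closest to the query under the chosen metric.
    by_distance = sorted(
        ((metric(point, query), label) for point, label in train),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for dist, label in by_distance[:k]:
        if weights == "uniform":
            votes[label] += 1.0  # every neighbor counts equally
        else:
            # "distance": weight proportional to the inverse of the distance.
            # Simplification: an exact match (dist == 0) gets infinite weight.
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
    return max(votes, key=votes.get)


# Toy data (made up): one nearby "fraud" point and two farther "safe" points.
train = [
    ((0.0, 0.0), "fraud"),
    ((3.0, 0.0), "safe"),
    ((0.0, 3.0), "safe"),
    ((10.0, 10.0), "safe"),
]
query = (0.5, 0.5)

print(knn_predict(train, query, k=3, weights="uniform"))   # safe
print(knn_predict(train, query, k=3, weights="distance"))  # fraud
```

With uniform weights the two farther safe neighbors outvote the single close fraud neighbor, while with distance weights the close neighbor dominates. This is the kind of difference the grid search is probing.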
The grid search was run with 5-fold cross-validation.
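
A search of this kind could be sketched with scikit-learn's GridSearchCV roughly as follows. This is a sketch under stated assumptions, not the repository's actual code: the data is a synthetic stand-in generated with make_classification, and the candidate n_neighbors values, the 80/20 split, and the random seeds are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic two-class data standing in for the real CSV.
X, y = make_classification(n_samples=800, n_features=10, random_state=0)

# Hold out a test set that the grid search never sees (split size is assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The three hyperparameters named above; the n_neighbors candidates are assumed.
param_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["minkowski", "chebyshev"],
}

# cv=5 scores every combination with 5-fold cross-validation on the training set.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean cross-validated accuracy:", search.best_score_)

# By default the best combination is refit on the full training set,
# so score() evaluates that refit model on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

Note that best_score_ is the mean accuracy across the five validation folds for the winning combination, while the test accuracy comes from data that played no part in the search.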
Results
The best parameters are {'metric': 'minkowski', 'n_neighbors': 3, 'weights': 'uniform'}
The best accuracy on the training data is 0.9546875
The best accuracy on the testing data is 0.90625
Link to Code Repository
https://github.com/vigneshsundararajan/KNN-credit-fraud-dataset