Abstract: There is a perception that Bitcoin is used for illicit purposes such as dark-web trading, money laundering, and ransomware payments linked to smart city systems. Blockchain technology records these transactions but cannot by itself identify which ones are illicit, and therefore cannot stop them. Anomaly detection is one of the most important methods for spotting possible fraud. Unfortunately, the heuristic and signature-based approaches that underpinned earlier detection techniques were insufficient to handle the complexity of anomaly detection. Machine learning (ML) is a promising tool for anomaly detection because it can be trained on large datasets of known malicious samples to learn the patterns and features of such events. The goal of this research is to develop a fraud and security threat detection model that is more effective than current approaches.
Consequently, ensemble learning can be used to identify anomalies in Bitcoin by combining multiple ML classifiers. The data balancing method in the suggested model is ADASYN-TL (Adaptive Synthetic sampling combined with Tomek Links). Because hyperparameters significantly affect model performance, hyperparameter tuning is carried out with Bayesian optimization, grid search, and random search. For classification, we combined K-Nearest Neighbors, Random Forest, Decision Tree, and Naive Bayes into a stacking model, and SHapley Additive exPlanations (SHAP) was used to analyze and interpret the stacking model's predictions. Additionally, the model investigates how well the various classifiers perform using accuracy, F1-score, and the Area Under the Receiver Operating Characteristic curve (AUC-ROC).
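A minimal sketch of this pipeline, assuming scikit-learn and imbalanced-learn as the implementation libraries; the logistic-regression meta-learner and the hyperparameter ranges shown are illustrative assumptions rather than settings reported here:

```python
# Hedged sketch: ADASYN-TL balancing followed by a stacking ensemble of
# KNN, Random Forest, Decision Tree, and Naive Bayes, with random search
# as one of the tuning strategies named in the abstract.
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import TomekLinks
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def adasyn_tomek(X, y, random_state=42):
    """ADASYN oversampling of the minority class, then Tomek-link cleaning."""
    X_over, y_over = ADASYN(random_state=random_state).fit_resample(X, y)
    return TomekLinks().fit_resample(X_over, y_over)


def build_stacking_model():
    """Stacking ensemble with the four base learners named in the abstract."""
    base_learners = [
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("nb", GaussianNB()),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
        cv=5,
    )


def tune(model, X, y):
    """Random search over an illustrative grid; grid search or Bayesian
    optimization (e.g. skopt's BayesSearchCV) plug in the same way."""
    param_distributions = {
        "rf__n_estimators": [100, 200, 500],
        "knn__n_neighbors": [3, 5, 7],
        "dt__max_depth": [None, 10, 20],
    }
    search = RandomizedSearchCV(model, param_distributions,
                                n_iter=10, scoring="f1", cv=3, random_state=42)
    return search.fit(X, y)


# Example usage (X, y = Bitcoin transaction features and fraud labels):
#   X_bal, y_bal = adasyn_tomek(X, y)
#   best = tune(build_stacking_model(), X_bal, y_bal).best_estimator_
```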
The best model is then chosen on the basis of these metrics together with precision, recall, False Positive Rate (FPR), and execution time. The suggested model aids the creation of efficient fraud detection models that overcome the shortcomings of current algorithms. Our stacking model, which combines the predictions of multiple classifiers, achieved the highest F1-score of 97%, precision of 96%, recall of 98%, accuracy of 97%, AUC-ROC of 99%, and FPR of 3%.
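A companion sketch of the evaluation and interpretation step, assuming standard scikit-learn metric functions and the model-agnostic SHAP KernelExplainer; the background-sample size is an illustrative assumption:

```python
# Hedged sketch of how the reported metrics and SHAP explanations
# could be computed for a fitted stacking model.
import time

import shap
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def evaluate(model, X_test, y_test):
    """Compute the metrics used to compare classifiers in the study."""
    start = time.perf_counter()
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    elapsed = time.perf_counter() - start

    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc_roc": roc_auc_score(y_test, y_prob),
        "fpr": fp / (fp + tn),           # False Positive Rate
        "execution_time_s": elapsed,     # prediction time on the test set
    }


def explain_with_shap(model, X_background, X_explain):
    """Model-agnostic SHAP values for the stacking model's fraud probability."""
    background = shap.sample(X_background, 100)  # assumed background size
    explainer = shap.KernelExplainer(
        lambda X: model.predict_proba(X)[:, 1], background)
    return explainer.shap_values(X_explain)
```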