Deanonymizing Bitcoin Transactions An Investigative Study On Large-scale Graph Clustering
Bitcoin has emerged from the fringes of technology to the mainstream recently. With speculation rampant, it has become more and more the subject of harsh criticism in ascertaining its use case. Unfortunately, much of Bitcoin’s present use case is for transactions in online black markets. Towards that end, various studies have sought to partially deanonymize Bitcoin transactions, identifying wallets associated with major players in the space to help forensic analysis taint wallets involved with criminal activity. Relevant past studies, however, have rigidly enforced manually constructed heuristics to perform such deanonymization, paralleling an extensive union-find algorithm. We wish to extend this work by introducing many more heuristics than were previously considered by constructing a separate “heuristics graph” layered atop the transactions graph and performing a graph clustering on this heuristics graph. Towards that end, we explored the performance of various clustering algorithms on the SBM (stochastic block model) as a prototype of the heuristics graph and additionally tested graph preprocessing algorithms, specifically sparsification and coarsening to determine the extent they could speed up computation while retaining reasonable accuracies. We found hierarchical spectral clustering and METIS to have the best performance by the standard purity, NMI, and F-score clustering accuracy metrics. We also found sparsification and coarsening to result in little reduction in time with the former severely detracting from accuracies and the latter less so, suggesting the latter holds potential given implementation optimization in future studies. METIS was subsequently employed to cluster a subset of the full graph due to major time concerns with hierarchical spectral clustering. Several wallet clusters were identified as a result, though the accuracy of this could not be determined due to the limited ground truth available. Future extensions of this work should seek to refine the hierarchical spectral clustering algorithm for its time deficiencies and extend the ground truth available.