Publications

Deanonymizing Bitcoin Transactions: An Investigative Study on Large-Scale Graph Clustering

Bitcoin has recently emerged from the fringes of technology into the mainstream. Amid rampant speculation, it has drawn increasing criticism over whether it has a legitimate use case. Unfortunately, much of Bitcoin’s present use is for transactions in online black markets. Accordingly, various studies have sought to partially deanonymize Bitcoin transactions, identifying wallets associated with major players in the space to help forensic analysts taint wallets involved in criminal activity. Relevant past studies, however, have rigidly enforced manually constructed heuristics to perform such deanonymization, paralleling an extensive union-find algorithm. We wish to extend this work by introducing many more heuristics than were previously considered, constructing a separate “heuristics graph” layered atop the transaction graph and performing graph clustering on it. Towards that end, we explored the performance of various clustering algorithms on the stochastic block model (SBM) as a prototype of the heuristics graph, and additionally tested graph preprocessing algorithms, specifically sparsification and coarsening, to determine the extent to which they could speed up computation while retaining reasonable accuracy. We found hierarchical spectral clustering and METIS to perform best under the standard purity, NMI, and F-score clustering accuracy metrics. We also found that sparsification and coarsening yielded little reduction in runtime, with the former severely degrading accuracy and the latter less so, suggesting that coarsening holds potential given implementation optimization in future studies. METIS was subsequently employed to cluster a subset of the full graph due to major time concerns with hierarchical spectral clustering. Several wallet clusters were identified as a result, though their accuracy could not be determined due to the limited ground truth available. Future extensions of this work should seek to address the time deficiencies of the hierarchical spectral clustering algorithm and to extend the available ground truth.

FairTear: Automated Probabilistic Analysis on Dataset Models

Given the extent to which machine learning algorithms have come to shape our lives, on both a daily and a long-term basis, the study of their ingrained biases is very much in order. Many tools have emerged to probe such biases, both those that explicitly examine the underlying classifier code (white-box) and those that are agnostic to it (black-box). White-box tools can provide greater insight, but are typically limited in the types of models they can analyze. A newer tool, FairSquare, provides a method of applying white-box techniques to more complex models. However, since FairSquare requires a new classifier syntax and knowledge of an underlying population model, it leaves much to be desired for end users. We present a tool, FairTear, which provides a clean UI through which end users can feed in their classifier and view its analysis result from the FairSquare tool. Our tool automates both the process of generating the population model and the process of converting a classifier to the FairSquare syntax. In turn, the user is fully abstracted from the FairSquare back-end, allowing them to determine the fairness of their algorithm without any knowledge beyond what is contained in their code. FairTear is capable of making use of nearly all of FairSquare’s supported functionality, including multi-level conditioning of population model features and different feature distributions (Gaussian and multi-step uniform). FairTear also integrates with the popular scikit-learn Python machine learning package, supporting several of its classifiers (decision trees, SVMs, and neural networks) as well as preprocessing steps (StandardScaler). In doing so, we hope to allow a variety of end users, from academia and industry alike, to take advantage of our system in real-world machine learning pipelines. Tests revealed full automation on all ends (i.e., supporting each of the classifiers referenced above), with fairness results displayed on the front-end and an appropriate classifier decomposition visible on the back-end. In line with that, we considered further extensions to both our tool and FairSquare, largely revolving around supporting a greater extent of the sklearn library, including additional distributions, preprocessing features, and classifiers.
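One ingredient of the population-model automation can be sketched as fitting an independent Gaussian to each numeric feature of a dataset. This is only a minimal illustration: the function name and the dictionary schema are assumptions, not FairSquare’s actual input grammar, and the real pipeline also handles conditioning and multi-step uniform distributions.

```python
# Sketch of one step of population-model generation: estimate
# per-feature Gaussian parameters (mean, std) from tabular data.
from statistics import mean, stdev

def fit_population_model(rows, feature_names):
    """rows: list of equal-length numeric records (illustrative format)."""
    model = {}
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]
        model[name] = {"dist": "gaussian",
                       "mu": mean(column),
                       "sigma": stdev(column)}
    return model
```

A fitted model in this shape would then be serialized into the probabilistic-program syntax the analysis back-end expects.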

An Analysis of Selfish Mining Attacker Incentives in Bitcoin and Ethereum

Both the Bitcoin and Ethereum decentralized systems rely on the same distributed public blockchain mining model for transmitting and recording history. Previous thought was that this system would be held in check through a balanced proof-of-work incentive system. However, prior studies have revealed an attack dubbed “selfish mining,” whereby miners can exploit this incentive system to increase their expected rewards. Such models have further been applied to studying the transaction fee system that is expected to largely replace the block reward system over the coming years. Despite extensive past study, these models have failed to include the associated effects of selfish mining attacks on exchange rates, which is the primary focus herein. We further extend these models to the context of the Ethereum network, which had not previously been studied with respect to selfish mining. In addition, this study sought to compare the current empirical status of the Bitcoin and Ethereum networks against the model results, to determine whether it is currently in miners’ economic interest to engage in selfish mining. In the end, the necessary devaluation was obtained as a function of the attacker’s hashrate, selfish mining (SM) hashrate proportion, SM engagement delay, and uncle block reward (Ethereum only), and we found the current states of Bitcoin and Ethereum to be highly conducive to selfish mining, making countermeasures a subject of interest for future studies.
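The revenue side of the underlying selfish-mining state machine (Eyal and Sirer’s model) can be sketched as a Monte Carlo simulation. This is a simplified sketch only: it omits the exchange-rate and uncle-reward terms that this study adds, α is the attacker’s hashrate, and γ is the fraction of honest miners who build on the attacker’s branch during a tie.

```python
# Monte Carlo sketch of the selfish-mining reward model: the attacker
# withholds blocks, publishing only to win races, and we measure the
# attacker's share of main-chain block rewards.
import random

def selfish_revenue_share(alpha, gamma, rounds, seed=0):
    rng = random.Random(seed)
    rev_a = rev_h = 0
    priv = pub = 0      # blocks mined on each branch since the fork point
    tie = False         # both branches published at equal length one
    for _ in range(rounds):
        if tie:
            r = rng.random()
            if r < alpha:                          # attacker extends own branch
                rev_a += 2                         # wins both blocks
            elif r < alpha + (1 - alpha) * gamma:  # honest miner on attacker branch
                rev_a += 1
                rev_h += 1
            else:                                  # honest branch wins the race
                rev_h += 2
            tie, priv, pub = False, 0, 0
        elif rng.random() < alpha:
            priv += 1                              # attacker mines privately
        else:
            if priv == 0:
                rev_h += 1                         # no fork: honest block pays out
            else:
                pub += 1
                if priv - pub == 0:                # lead erased: publish, race to a tie
                    tie = True
                elif priv - pub == 1:              # lead down to one: publish all, win
                    rev_a += priv
                    priv = pub = 0
                # with a lead of two or more, keep mining privately
    return rev_a / (rev_a + rev_h)
```

With γ = 0, the simulated share exceeds the honest share α once α passes roughly one third, reproducing the known profitability threshold; the exchange-rate devaluation studied above enters as a multiplier on these rewards.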

Optimal Charging Station Locations

Electric vehicles (EVs), with Tesla at the forefront, have become more prevalent over the last decade. With the sharp rise in projected EV growth, the question of where to place charging stations has moved to the forefront of customers’ and business owners’ minds alike. We seek to address this problem, namely by investigating policies for determining the optimal locations at which to place charging stations in a city setting. For this task, we developed a lookup-table model with altered updating equations and tested several learning policies: online and offline Knowledge Gradient (KG) exploration, Interval Estimation (IE), Boltzmann Exploration, and Pure Exploitation. Upon doing so, we found the Knowledge Gradient policy to be the most effective at maximizing total usage over all stations within our time horizon. We therefore recommend it as a baseline for building future policies in this context of maximizing station utilization. Future studies may wish to expand upon the charging-station bottleneck employed in the model and to incorporate time inhomogeneity.
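For a lookup-table model with independent Gaussian beliefs, the offline knowledge-gradient policy can be sketched as below: it measures, for each station, how much a single additional observation is expected to improve the best usage estimate. This is a generic sketch under stated assumptions (independent beliefs, known Gaussian observation noise, at least two stations), not the paper’s exact altered updating equations.

```python
# Sketch of the offline knowledge-gradient (KG) policy for a lookup
# table with independent Gaussian beliefs about station usage.
import math

def _phi(z):   # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def _Phi(z):   # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def kg_choice(mu, sigma, noise_std):
    """mu/sigma: prior mean and std of usage at each station (illustrative)."""
    best = None
    for x in range(len(mu)):
        post_var = 1 / (1 / sigma[x] ** 2 + 1 / noise_std ** 2)
        s_tilde = math.sqrt(sigma[x] ** 2 - post_var)   # predictive belief change
        best_other = max(mu[y] for y in range(len(mu)) if y != x)
        zeta = -abs(mu[x] - best_other) / s_tilde
        nu = s_tilde * (zeta * _Phi(zeta) + _phi(zeta))  # KG value of measuring x
        if best is None or nu > best[0]:
            best = (nu, x)
    return best[1]
```

An online variant would instead rank stations by the current estimate plus the KG value scaled by the remaining horizon, trading off immediate usage against information.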