Abstract
Blockchain datasets, such as those generated by Bitcoin, Ethereum, and other popular cryptocurrencies, are intriguing examples of big data. Analysis of these datasets has diverse applications, including detecting fraud and illegal transactions, characterizing major services, identifying financial hotspots, and profiling the usage and performance of large peer-to-peer consensus-based systems. Unsupervised learning methods in general, and clustering methods in particular, can discover unanticipated patterns that lead to valuable insights. However, the volume, velocity, and variety of blockchain data, together with the difficulty of evaluating results, pose significant challenges to applying clustering methods efficiently and effectively. Nevertheless, recent and ongoing work has adapted classic methods and developed new ones tailored to the characteristics of such data. This chapter motivates the study of clustering methods for blockchain data and introduces the key blockchain concepts from a data-centric perspective. It presents the models and methods used for clustering blockchain data, and describes the challenges of evaluating such methods along with some solutions.
Acknowledgements
This work was supported in part by the US National Science Foundation grants EAR-1027960 and PLR-1142007. Several improvements resulted from detailed feedback from the reviewers.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chawathe, S.S. (2019). Clustering Blockchain Data. In: Nasraoui, O., Ben N'Cir, CE. (eds) Clustering Methods for Big Data Analytics. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-97864-2_3
DOI: https://doi.org/10.1007/978-3-319-97864-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97863-5
Online ISBN: 978-3-319-97864-2
eBook Packages: Engineering, Engineering (R0)