Abstract
In most information retrieval systems, software processes (whether agent-based or not) reason about passive items of data. An alternative approach instantiates each record as an agent that actively self-organizes with other agents (including queries). Imitating the movement of bodies under physical forces, we describe a distributed algorithm (“force-based clustering,” or FBC) for dynamically clustering and querying large, heterogeneous, dynamic collections of entities. The algorithm moves entities in a virtual space in a way that estimates the transitive closure of the pairwise comparisons. We demonstrate this algorithm on a large, heterogeneous collection of records, each representing a person. We have some information about a person of interest, but no record in the collection directly matches this information. Application of FBC identifies a small subset of records that are good candidates for describing the person of interest, for further manual investigation and verification.
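The idea of records that move toward similar records can be illustrated with a toy model. The sketch below is not the chapter's FBC algorithm: it assumes records are sets of attribute tokens, uses Jaccard similarity as the pairwise comparison, starts all entities on a unit circle, and applies attraction only. The names `jaccard`, `force_based_clustering`, `candidates`, and all sample records are hypothetical. It shows how repeated local, pairwise moves can bring a query close to a record it shares no attribute with, provided a chain of pairwise matches links them.

```python
import math
import random


def jaccard(a, b):
    """Similarity of two records, each modelled as a set of attribute tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def force_based_clustering(records, steps=20000, rate=0.5, seed=0):
    """Toy attraction-only dynamics (an assumption, not the chapter's FBC).

    `records` maps a name to a set of tokens. Returns name -> [x, y].
    """
    rng = random.Random(seed)
    names = list(records)
    n = len(names)
    # Deterministic start: spread the entities evenly on the unit circle.
    pos = {
        name: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, name in enumerate(names)
    }
    for _ in range(steps):
        # One local interaction: entity a compares itself with entity b.
        a, b = rng.sample(names, 2)
        pull = rate * jaccard(records[a], records[b])
        # a moves a fraction `pull` of the way toward b (no move if dissimilar).
        pos[a][0] += pull * (pos[b][0] - pos[a][0])
        pos[a][1] += pull * (pos[b][1] - pos[a][1])
    return pos


def candidates(pos, query, radius=0.1):
    """Names of entities that ended up within `radius` of the query entity."""
    qx, qy = pos[query]
    return sorted(
        name for name, (x, y) in pos.items()
        if name != query and math.hypot(x - qx, y - qy) <= radius
    )


# Hypothetical records: q matches a1 only, a1 matches a2, a2 matches a3.
records = {
    "q": {"name:jon smith", "phone:555-0101"},
    "u1": {"name:ann lee", "phone:555-0199"},
    "a1": {"phone:555-0101", "addr:12 elm st"},
    "u2": {"name:bo chen", "addr:9 oak ave"},
    "a2": {"addr:12 elm st", "email:js@example.org"},
    "a3": {"email:js@example.org", "name:j. smythe"},
}

pos = force_based_clustering(records)
print(candidates(pos, "q"))  # → ['a1', 'a2', 'a3']
```

Here `a3` is returned as a candidate although it has no token in common with `q`; the link exists only through `a1` and `a2`. With attraction alone, any connected set of records collapses to a single point, so a practical variant would presumably also need a repulsive term between dissimilar entities; that extension is left out of this sketch.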
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Van Dyke Parunak, H., Brueckner, S. (2014). Transitive Identity Mapping Using Force-Based Clustering. In: Cao, L., Zeng, Y., Symeonidis, A., Gorodetsky, V., Müller, J., Yu, P. (eds) Agents and Data Mining Interaction. ADMI 2013. Lecture Notes in Computer Science, vol 8316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55192-5_10
DOI: https://doi.org/10.1007/978-3-642-55192-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55191-8
Online ISBN: 978-3-642-55192-5
eBook Packages: Computer Science, Computer Science (R0)