Abstract
In this abstract we address the problem of learning approximate Markov Random Fields (MRFs) from large transactional data. Examples of such data include market basket data and co-authorship network data. Such data can be represented by a binary data matrix in which entry (i, j) takes the value one if item j is in basket i and zero otherwise. “Large” means that the data matrix can have many rows or columns. To model such data effectively and answer queries about it efficiently, we consider the use of probabilistic models. Specifically, we employ frequent itemsets to learn approximate global MRFs on large transactional data. We conduct an empirical study on real datasets to show the efficiency and effectiveness of our model in solving the query selectivity estimation problem, that is, approximately computing the marginal probability of a set of items (see [1] for the experimental results). In the market basket domain, this is the problem of computing the likelihood of seeing a particular combination of grocery items; translated into the social network domain, it is the probability of a group of professors coauthoring a paper in a co-authorship network. This marginal probability computation is also useful for anomalous link detection [2] in social network analysis. A link in a social network corresponds to a pair of items, and links whose associated marginal probabilities are significantly low can be thought of as anomalous.
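To make the query concrete, the following is a minimal sketch of the quantity that selectivity estimation targets: the marginal probability of an itemset in a binary basket-by-item matrix, together with a simple low-probability check on item pairs (links). It is not the approximate MRF model described above. It computes the empirical probability by scanning every row, which is the exact-counting baseline that a learned model would approximate without a full scan. The toy data, the function names `empirical_marginal` and `low_probability_pairs`, and the threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each row is a basket, each column an item (1 = present).
ITEMS = ["bread", "milk", "eggs", "beer"]
BASKETS = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
]


def empirical_marginal(baskets, columns):
    """Fraction of baskets that contain every item whose column index is in `columns`."""
    hits = sum(1 for row in baskets if all(row[j] == 1 for j in columns))
    return hits / len(baskets)


def low_probability_pairs(baskets, threshold):
    """Item pairs (links) whose empirical co-occurrence probability is below `threshold`.

    The threshold is an assumption of this sketch, not a value from the paper.
    """
    n_items = len(baskets[0])
    return [
        (i, j)
        for i, j in combinations(range(n_items), 2)
        if empirical_marginal(baskets, (i, j)) < threshold
    ]


if __name__ == "__main__":
    # Marginal probability of seeing bread and milk together in one basket.
    print(empirical_marginal(BASKETS, (0, 1)))
    # Candidate anomalous links under the illustrative threshold.
    for i, j in low_probability_pairs(BASKETS, 0.1):
        print(ITEMS[i], ITEMS[j])
```

Exact counting like this costs one pass over the data per query, which is the expense that motivates answering such queries from a compact probabilistic model instead.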
This work is supported by DOE Award No. DE-FG02-04ER25611 and NSF CAREER Grant IIS-0347662. We refer the reader to a longer version of this paper for experimental results and complete proofs and discussions.
References
1. Wang, C., Parthasarathy, S.: Learning approximate MRFs from large transactional data. Technical Report: OSU-CISRC-5/06–TR59, The Ohio State University (2006)
2. Rattigan, M.J., Jensen, D.: The case for anomalous link discovery. ACM SIGKDD Explorations Newsletter 7, 41–47 (2005)
3. Pavlov, D., Mannila, H., Smyth, P.: Beyond independence: probabilistic models for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering 15, 1409–1421 (2003)
4. Goldenberg, A., Moore, A.: Tractable learning of large Bayes net structures from sparse data. In: Proceedings of the twenty-first international conference on Machine learning (2004)
5. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48, 96–129 (1998)
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Wang, C., Parthasarathy, S. (2007). Learning Approximate MRFs from Large Transactional Data. In: Airoldi, E., Blei, D.M., Fienberg, S.E., Goldenberg, A., Xing, E.P., Zheng, A.X. (eds) Statistical Network Analysis: Models, Issues, and New Directions. ICML 2006. Lecture Notes in Computer Science, vol 4503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73133-7_16
DOI: https://doi.org/10.1007/978-3-540-73133-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73132-0
Online ISBN: 978-3-540-73133-7
eBook Packages: Computer Science (R0)