Actionable Mining of Large, Multi-relational Data Using Localized Predictive Models

Ghosh, Joydeep; Sharma, Aayush

doi:10.1007/978-3-642-29764-9_1

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 272)

Abstract

Many large datasets associated with modern predictive data mining applications are complex and heterogeneous, possibly involving multiple relations or exhibiting a dyadic nature with associated side information. For example, one may wish to predict the preferences of a large set of customers for a variety of products, given properties of both customers and products as well as past purchase history, a social network on the customers, and a conceptual hierarchy on the products. This article provides an overview of recent approaches to predictive modeling for such data, along with concrete application scenarios that highlight the issues involved. The common philosophy across the approaches described is to pursue a simultaneous problem decomposition and modeling strategy that can exploit heterogeneity in behavior, use the wide variety of information available, and yield more interpretable solutions than global "one-shot" approaches. Since both the problem domains and the approaches considered are quite new, we also highlight the potential for further investigation at several points throughout the article.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas, 78712, U.S.A.
Joydeep Ghosh & Aayush Sharma

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, University of Reading, UK
Kecheng Liu
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe

Cite this paper

Ghosh, J., Sharma, A. (2013). Actionable Mining of Large, Multi-relational Data Using Localized Predictive Models. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2010. Communications in Computer and Information Science, vol 272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29764-9_1

DOI: https://doi.org/10.1007/978-3-642-29764-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29763-2
Online ISBN: 978-3-642-29764-9
eBook Packages: Computer Science, Computer Science (R0)
