Abstract
Given information about medical drugs and their properties, how can we automatically discover that Aspirin has blood-thinning properties and thus prevents heart attacks? More generally, if we have a large information network that integrates data from heterogeneous sources, how can we extract semantic information that provides a better understanding of the integrated data and also helps us identify missing links? We propose to extract concepts that describe groups of objects and their common properties from the integrated data. The discovered concepts provide semantic information as well as an abstract view of the integrated data, and thus improve the understanding of complex systems. Our proposed method has the following desirable properties: (a) it is parameter-free and therefore requires no user-defined parameters; (b) it is fault-tolerant, allowing for the detection of missing links; and (c) it is scalable, being linear in the input size. We demonstrate the effectiveness and scalability of the proposed method on real, publicly available graphs.
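To make the idea of a fault-tolerant concept and a missing link more concrete, the Python sketch below treats the network as a bipartite mapping from objects to properties. It is only an illustration under stated assumptions and is not the authors' algorithm: the paper's method is parameter-free and discovers concepts automatically, whereas here the candidate concept (a set of objects plus a set of properties) is fixed by hand. The functions `common_properties`, `missing_links` and `density`, as well as the toy drugs `drug_A`, `drug_B`, `drug_C` and the property `shared_property`, are all hypothetical names chosen for this example.

```python
from itertools import product

# Hypothetical toy network: each object (drug) maps to the properties it is linked to.
# drug_A lacks the "blood_thinning" link that the other two drugs have.
TOY_EDGES = {
    "drug_A": {"shared_property", "prevents_heart_attack"},
    "drug_B": {"shared_property", "blood_thinning", "prevents_heart_attack"},
    "drug_C": {"shared_property", "blood_thinning", "prevents_heart_attack"},
}

# Hypothetical candidate concept, chosen by hand for illustration only.
CONCEPT_OBJECTS = {"drug_A", "drug_B", "drug_C"}
CONCEPT_PROPERTIES = {"shared_property", "blood_thinning", "prevents_heart_attack"}


def common_properties(objects, edges):
    """Properties linked to every object in `objects` (exact, non-fault-tolerant view)."""
    prop_sets = [edges.get(o, set()) for o in objects]
    return set.intersection(*prop_sets) if prop_sets else set()


def missing_links(objects, properties, edges):
    """(object, property) pairs inside the candidate concept that are absent from the data."""
    return sorted(
        (o, p)
        for o, p in product(sorted(objects), sorted(properties))
        if p not in edges.get(o, set())
    )


def density(objects, properties, edges):
    """Fraction of the object-property pairs in the concept that are present as links."""
    total = len(objects) * len(properties)
    if total == 0:
        return 0.0
    return 1.0 - len(missing_links(objects, properties, edges)) / total


if __name__ == "__main__":
    # An exact concept would drop "blood_thinning", because drug_A lacks that link.
    print(sorted(common_properties(CONCEPT_OBJECTS, TOY_EDGES)))
    # A fault-tolerant concept keeps it and reports the absent pair as a missing-link candidate.
    print(missing_links(CONCEPT_OBJECTS, CONCEPT_PROPERTIES, TOY_EDGES))
    print(density(CONCEPT_OBJECTS, CONCEPT_PROPERTIES, TOY_EDGES))
```

Here the exact intersection yields only `prevents_heart_attack` and `shared_property`, while tolerating a single absent link keeps `blood_thinning` in the concept and flags `("drug_A", "blood_thinning")` as a candidate missing link, with 8 of the 9 pairs present. How much tolerance is acceptable is exactly what this sketch leaves open and what a parameter-free method has to decide on its own.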
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kötter, T., Günnemann, S., Berthold, M.R., Faloutsos, C. (2014). Fault-Tolerant Concept Detection in Information Networks. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol. 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_34
DOI: https://doi.org/10.1007/978-3-319-06608-0_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06607-3
Online ISBN: 978-3-319-06608-0
eBook Packages: Computer Science, Computer Science (R0)