Abstract
Proteins perform their biological functions by interacting with other proteins or compounds. Because protein-protein interactions are intrinsic to most cellular processes, predicting them is an important problem in post-genomic biology, where many research groups have produced abundant interaction data. In this paper, we present an associative feature mining method to predict implicit protein-protein interactions of S. cerevisiae from public protein-protein interaction data. To overcome the dimensionality problem of conventional data mining approaches, we employ a feature dimension reduction filter (FDRF) method, based on information theory, to select optimal informative features and to speed up the overall mining procedure. To predict interactions, we use an association rule discovery algorithm for associative feature and rule mining. Using the discovered associative features, we predict implicit protein interactions that have not been observed in the training data. In our experiments, the proposed method achieves about 94.8% prediction accuracy with reduced computation time, running 32.5% faster than the conventional method that has no feature filter.
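The two stages named in the abstract, an information-theoretic feature filter followed by association rule discovery, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the FDRF scoring function, so symmetrical uncertainty is used as a stand-in; the rule miner is restricted to item pairs with support and confidence thresholds; and the names `filter_features` and `mine_rules`, the feature labels, and the toy data are hypothetical, not the authors' implementation or data.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def filter_features(rows, labels, threshold):
    """Assumed filter step: keep features whose SU with the label exceeds threshold."""
    names = list(rows[0].keys())
    scores = {
        f: symmetrical_uncertainty([r[f] for r in rows], labels) for f in names
    }
    selected = sorted(f for f in names if scores[f] > threshold)
    return selected, scores


def mine_rules(transactions, min_support, min_confidence):
    """Pairwise rules a -> b that meet the support and confidence thresholds."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in set(t))
    pair_count = Counter(
        p for t in transactions for p in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Toy data (hypothetical): binary features per protein pair, label 1 = interacts.
rows = [
    {"f1": 1, "f2": 1, "f3": 1},
    {"f1": 1, "f2": 0, "f3": 1},
    {"f1": 1, "f2": 1, "f3": 1},
    {"f1": 0, "f2": 0, "f3": 1},
    {"f1": 0, "f2": 1, "f3": 1},
    {"f1": 0, "f2": 0, "f3": 1},
]
labels = [1, 1, 1, 0, 0, 0]
selected, scores = filter_features(rows, labels, threshold=0.5)
print("selected features:", selected)

# Toy transactions (hypothetical): feature items observed for interacting pairs.
transactions = [
    {"kinase", "cyclin"},
    {"kinase", "cyclin"},
    {"kinase", "cyclin", "membrane"},
    {"membrane"},
]
for rule in mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    print(rule)
```

Filtering before mining shrinks the item vocabulary, which is one plausible reading of how a feature filter could reduce mining time; the sketch makes no claim about the accuracy or speed-up figures reported above.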
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Eom, JH., Chang, JH., Zhang, BT. (2004). Prediction of Implicit Protein-Protein Interaction by Optimal Associative Feature Mining. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_13
DOI: https://doi.org/10.1007/978-3-540-28651-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6
eBook Packages: Springer Book Archive