Abstract
Over the last decade, several clustering and association rule mining techniques have been applied to highlight groups of co-regulated genes in gene expression data. Integrating these data and biological knowledge into a single framework has now become a major challenge, both to improve the relevance of mined patterns and to simplify their interpretation by biologists. GenMiner was developed for mining association rules from such integrated datasets. It combines a new normalized discretization method, called NorDi, with the JClose algorithm to extract condensed representations of association rules. Experimental results show that GenMiner requires less memory than Apriori-based approaches and that it improves the relevance of the extracted rules. Moreover, the association rules obtained reveal significant co-annotated and co-expressed gene patterns, showing important biological relationships supported by recent biological literature.
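As a rough illustration of the kind of rule the abstract refers to, the sketch below computes support, confidence and the closure of an itemset over a tiny integrated dataset in which each gene carries both discretized expression items and annotation items. It is not GenMiner, NorDi or JClose: the gene identifiers, item labels and function names are all hypothetical, and the closure operator is shown only because condensed representations of association rules are commonly built on closed itemsets.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy data: one row per gene, mixing discretized expression
# levels ("cond1:up") with annotations ("annot:ribosome").
GENES: Dict[str, Set[str]] = {
    "g1": {"cond1:up", "cond2:up", "annot:ribosome"},
    "g2": {"cond1:up", "cond2:up", "annot:ribosome"},
    "g3": {"cond1:up", "annot:glycolysis"},
    "g4": {"cond1:down", "cond2:down", "annot:glycolysis"},
}


def support(itemset: Set[str]) -> float:
    """Fraction of genes whose row contains every item of the itemset."""
    hits = sum(1 for row in GENES.values() if itemset <= row)
    return hits / len(GENES)


def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs in the dataset")
    return support(antecedent | consequent) / base


def closure(itemset: Set[str]) -> FrozenSet[str]:
    """Items shared by all genes that contain the itemset."""
    rows = [row for row in GENES.values() if itemset <= row]
    if not rows:
        return frozenset(itemset)
    return frozenset(set.intersection(*rows))


if __name__ == "__main__":
    rule_from = {"annot:ribosome"}
    rule_to = {"cond1:up", "cond2:up"}
    print("support:", support(rule_from | rule_to))
    print("confidence:", confidence(rule_from, rule_to))
    print("closure of {cond2:up}:", sorted(closure({"cond2:up"})))
```

In this toy dataset, every gene annotated `annot:ribosome` is also up-regulated in both conditions, so the rule from annotation to expression has full confidence. The closure of `{cond2:up}` gathers every item that always co-occurs with it, which is the sense in which a closed itemset can stand in for several redundant itemsets.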
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Martinez, R., Pasquier, N., Pasquier, C. (2009). Mining Association Rule Bases from Integrated Genomic Data and Annotations. In: Masulli, F., Tagliaferri, R., Verkhivker, G.M. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2008. Lecture Notes in Computer Science, vol 5488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02504-4_7
DOI: https://doi.org/10.1007/978-3-642-02504-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02503-7
Online ISBN: 978-3-642-02504-4
eBook Packages: Computer Science (R0)