Abstract
Order-Preserving Submatrices (OPSMs) have been widely accepted as a pattern-based biclustering model and are used in gene expression data analysis. The OPSM problem aims to find groups of genes that exhibit similar rises and falls under a subset of conditions. However, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an algorithm is adapted to find all common subsequences (ACS) between every pair of sequences. Then an improved prefix-tree data structure is used to store and traverse all common subsequences, and the Apriori principle is employed to mine frequent sequential patterns efficiently. Finally, experiments were conducted on a real data set, and Gene Ontology (GO) analysis was applied to assess whether the discovered patterns were biologically significant. The results demonstrate the effectiveness and efficiency of this method.
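The idea of treating OPSM discovery as sequential pattern mining can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the ACS computation and the prefix tree with a plain level-wise search that grows each frequent column ordering by one column and discards any extension supported by too few rows (the Apriori pruning step). All function names and the toy matrix are hypothetical, and expression values within a row are assumed to be distinct.

```python
def row_to_sequence(row):
    """Return the column indices of a row ordered by ascending value."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    it = iter(sequence)
    # `col in it` advances the iterator, so order is enforced.
    return all(col in it for col in pattern)


def support(pattern, sequences):
    """Indices of the rows whose column ordering contains the pattern."""
    return [i for i, seq in enumerate(sequences) if is_subsequence(pattern, seq)]


def frequent_patterns(matrix, min_rows, min_cols=2):
    """Find every column ordering shared by at least min_rows rows.

    Assumption: a brute-force level-wise search stands in for the
    ACS + prefix-tree machinery described in the abstract.
    """
    sequences = [row_to_sequence(r) for r in matrix]
    n_cols = len(matrix[0])
    level = [(c,) for c in range(n_cols)]  # single columns: supported by every row
    results = {}
    while level:
        next_level = []
        for pattern in level:
            for c in range(n_cols):
                if c in pattern:
                    continue
                candidate = pattern + (c,)
                rows = support(candidate, sequences)
                # Apriori pruning: an infrequent candidate is never extended.
                if len(rows) >= min_rows:
                    next_level.append(candidate)
                    if len(candidate) >= min_cols:
                        results[candidate] = rows
        level = next_level
    return results


# Toy expression matrix: rows are genes, columns are conditions.
toy = [
    [1, 5, 3, 7],
    [2, 9, 4, 10],
    [8, 1, 6, 3],
]
patterns = frequent_patterns(toy, min_rows=2)
print(patterns[(0, 2, 1, 3)])  # rows 0 and 1 share the full ordering: [0, 1]
```

Each key of the returned dictionary is a column ordering and each value is the list of supporting rows, so every entry corresponds to one OPSM. The search is exponential in the number of columns in the worst case, which is the cost the prefix tree and ACS steps in the paper are meant to reduce.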
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xue, Y. et al. (2014). Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining. In: Zhang, Y., Yao, G., He, J., Wang, L., Smalheiser, N.R., Yin, X. (eds) Health Information Science. HIS 2014. Lecture Notes in Computer Science, vol 8423. Springer, Cham. https://doi.org/10.1007/978-3-319-06269-3_20
DOI: https://doi.org/10.1007/978-3-319-06269-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06268-6
Online ISBN: 978-3-319-06269-3
eBook Packages: Computer Science, Computer Science (R0)