Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining

Xue, Yun; Li, Yuting; Deng, Weijun; Li, Jiejin; Tang, Jianxiong; Liao, Zhengling; Li, Tiechen

doi:10.1007/978-3-319-06269-3_20


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8423)

Included in the following conference series:

International Conference on Health Information Science


Abstract

Order-preserving submatrices (OPSMs) are widely accepted as a pattern-based biclustering model and are used in gene expression data analysis. The OPSM problem aims to find groups of genes that exhibit similar rises and falls under certain conditions. However, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In this paper, we propose an exact method that discovers all OPSMs based on frequent sequential pattern mining. First, an algorithm is adapted to find all common subsequences (ACS) between every pair of sequences. Then an improved prefix-tree data structure is used to store and traverse all common subsequences, and the Apriori principle is employed to mine frequent sequential patterns efficiently. Finally, experiments were conducted on a real data set, and GO analysis was applied to determine whether the discovered patterns are biologically significant. The results demonstrate the effectiveness and efficiency of the method.
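
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (row_to_sequence, all_common_subsequences, mine_opsms) and the parameters min_rows and min_cols are hypothetical, ties in a row are assumed to be broken by column index, and the prefix tree and Apriori pruning are replaced here by plain brute-force support counting. Each row (gene) is turned into the sequence of column indices sorted by expression value, all common subsequences are collected for every pair of rows, and a pattern is kept if enough rows contain it as a subsequence.

```python
from itertools import combinations


def row_to_sequence(row):
    """Column indices ordered by ascending value (ties broken by index)."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def all_common_subsequences(a, b, min_len=2):
    """All common subsequences of two permutations of the same columns.

    Worst-case output is exponential in the number of columns.
    """
    pos_b = {c: i for i, c in enumerate(b)}
    ending = []   # ending[i]: common subsequences that end with a[i]
    result = set()
    for i, c in enumerate(a):
        subs = {(c,)}
        for k in range(i):
            # a[k] precedes c in a; extend only if it also precedes c in b
            if pos_b[a[k]] < pos_b[c]:
                subs |= {s + (c,) for s in ending[k]}
        ending.append(subs)
        result |= {s for s in subs if len(s) >= min_len}
    return result


def mine_opsms(matrix, min_rows=2, min_cols=2):
    """Map each column pattern to the rows supporting it.

    Assumes min_rows >= 2, so every reported pattern is a common
    subsequence of at least one pair of rows.
    """
    seqs = [row_to_sequence(r) for r in matrix]
    positions = [{c: i for i, c in enumerate(s)} for s in seqs]
    candidates = set()
    for a, b in combinations(seqs, 2):
        candidates |= all_common_subsequences(a, b, min_cols)
    found = {}
    for p in candidates:
        # a row supports p if the columns of p appear in the same order
        rows = [i for i, pos in enumerate(positions)
                if all(pos[x] < pos[y] for x, y in zip(p, p[1:]))]
        if len(rows) >= min_rows:
            found[p] = rows
    return found


if __name__ == "__main__":
    toy = [[1, 2, 3, 4], [10, 20, 30, 5], [4, 3, 2, 1]]
    for pattern, rows in sorted(mine_opsms(toy).items()):
        print(pattern, rows)
```

In this toy matrix, rows 0 and 1 rise together over columns 0, 1 and 2, so the pattern (0, 1, 2) is supported by both. A prefix tree would share common prefixes among candidates, and Apriori pruning would discard any pattern with an infrequent sub-pattern before counting; both are omitted here to keep the sketch short.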



Author information

Authors and Affiliations

School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou, China, 510006
Yun Xue, Yuting Li, Weijun Deng, Jiejin Li, Jianxiong Tang, Zhengling Liao & Tiechen Li


Editor information

Editors and Affiliations

Centre for Applied Informatics, Victoria University, 8001, Melbourne, VIC, Australia
Yanchun Zhang & Xiaoxia Yin
Faculty of Medicine, University of Southampton, Southampton, SO16 6YD, UK
Guiqing Yao
College of Engineering and Science, Victoria University, 8001, Melbourne, VIC, Australia
Jing He
Institute of Biomedical and Health Engineering, Shenzhen Institute of Advanced Technology, 518055, Shenzhen, China
Lei Wang
Psychiatric Institute, University of Illinois at Chicago, MC912, 1601 W. Taylor Street, 60612, Chicago, IL, USA
Neil R. Smalheiser


About this paper

Cite this paper

Xue, Y. et al. (2014). Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining. In: Zhang, Y., Yao, G., He, J., Wang, L., Smalheiser, N.R., Yin, X. (eds) Health Information Science. HIS 2014. Lecture Notes in Computer Science, vol 8423. Springer, Cham. https://doi.org/10.1007/978-3-319-06269-3_20


DOI: https://doi.org/10.1007/978-3-319-06269-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06268-6
Online ISBN: 978-3-319-06269-3
eBook Packages: Computer Science, Computer Science (R0)
