Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining

  • Conference paper
Health Information Science (HIS 2014)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8423))

Abstract

Order-Preserving Submatrices (OPSMs) are widely accepted as a pattern-based biclustering model and are used in gene expression data analysis. The OPSM problem aims to find groups of genes that exhibit similar rises and falls under certain conditions. However, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an algorithm is adapted to find all common subsequences (ACS) between every two sequences. Then, an improved prefix-tree data structure is used to store and traverse all common subsequences, and the Apriori principle is employed to mine frequent sequential patterns efficiently. Finally, experiments were conducted on a real data set, and GO analysis was applied to determine whether the discovered patterns are biologically significant. The results demonstrate the effectiveness and efficiency of the method.
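
To make the link between OPSMs and sequential patterns concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each row (gene) is turned into the sequence of column (condition) indices sorted by ascending expression value, so that an OPSM corresponds to a column subsequence shared by at least `min_rows` rows. Candidates are grown level by level with Apriori pruning; the ACS computation and the improved prefix tree described in the abstract are replaced here by a plain dictionary. The names `row_to_sequence`, `is_subsequence`, `mine_opsms`, `min_rows` and `min_cols` are hypothetical, and values within a row are assumed to be distinct.

```python
def row_to_sequence(row):
    """Return the column indices of a row ordered by ascending value."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order, not necessarily contiguously."""
    it = iter(sequence)
    return all(col in it for col in pattern)


def mine_opsms(matrix, min_rows, min_cols=2):
    """Map each frequent column pattern to the list of row indices supporting it.

    A pattern (c1, c2, ..., ck) is supported by a row when
    row[c1] < row[c2] < ... < row[ck].
    """
    sequences = [row_to_sequence(row) for row in matrix]
    if not sequences or len(sequences) < min_rows:
        return {}
    n_cols = len(matrix[0])

    # Level 1: every single column is trivially supported by every row.
    level = {(j,): list(range(len(sequences))) for j in range(n_cols)}
    results = {}
    while level:
        next_level = {}
        for pattern, rows in level.items():
            for j in range(n_cols):
                if j in pattern:
                    continue
                candidate = pattern + (j,)
                # Apriori pruning: the candidate without its first column
                # must itself be frequent at the current level.
                if candidate[1:] not in level:
                    continue
                # Only rows supporting the prefix can support the candidate.
                support = [i for i in rows
                           if is_subsequence(candidate, sequences[i])]
                if len(support) >= min_rows:
                    next_level[candidate] = support
        for pattern, rows in next_level.items():
            if len(pattern) >= min_cols:
                results[pattern] = rows
        level = next_level
    return results
```

Calling `mine_opsms(matrix, min_rows=2, min_cols=3)` on a list of equal-length rows returns every column ordering of length at least three that is preserved by at least two rows. The sketch enumerates all frequent patterns rather than only maximal ones, so its output can be large for wide matrices.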

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Xue, Y. et al. (2014). Mining Order-Preserving Submatrices Based on Frequent Sequential Pattern Mining. In: Zhang, Y., Yao, G., He, J., Wang, L., Smalheiser, N.R., Yin, X. (eds) Health Information Science. HIS 2014. Lecture Notes in Computer Science, vol 8423. Springer, Cham. https://doi.org/10.1007/978-3-319-06269-3_20

  • DOI: https://doi.org/10.1007/978-3-319-06269-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06268-6

  • Online ISBN: 978-3-319-06269-3

  • eBook Packages: Computer Science, Computer Science (R0)
