A Linear Time Biclustering Algorithm for Time Series Gene Expression Data

  • Conference paper
Algorithms in Bioinformatics (WABI 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3692)

Abstract

Several unsupervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, an unsupervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources.
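To make the distinction between clustering and biclustering concrete: two genes can look unrelated when compared over all conditions yet be perfectly correlated on a subset of conditions, which is the structure a bicluster (a subset of rows together with a subset of columns) captures. The sketch below uses Pearson correlation purely as an illustrative coherence measure; that choice, the helper names `pearson` and `columns`, and the expression values are all assumptions, not taken from the paper.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def columns(row, cols):
    """Restrict a row of the expression matrix to a subset of conditions."""
    return [row[c] for c in cols]


# Made-up expression values for two genes under six conditions.
gene_a = [1, 2, 3, 4, 5, 1]
gene_b = [2, 4, 6, 8, 1, 9]

r_all = pearson(gene_a, gene_b)        # correlation over all six conditions
subset = [0, 1, 2, 3]                  # candidate bicluster columns
r_sub = pearson(columns(gene_a, subset), columns(gene_b, subset))
print(f"all conditions: {r_all:.2f}, conditions {subset}: {r_sub:.2f}")
```

Over all six conditions the correlation is slightly negative (about -0.23), while on the first four conditions gene_b is exactly twice gene_a and the correlation is 1.0; a clustering of whole rows would miss the pair, whereas the bicluster ({gene_a, gene_b}, {0, 1, 2, 3}) exposes it.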

In this work, we examine a particular setting of the problem, where we are concerned with finding biclusters in time series expression data. In this context, we are interested in finding biclusters with consecutive columns. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear in the size of the data matrix. This complexity is obtained by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results on both synthetic and real data that show the effectiveness of the approach.
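The idea of discretizing the matrix and then looking for identical symbol runs at the same columns can be sketched as follows. This is not the authors' linear-time algorithm: the D/N/U alphabet, the `threshold` rule, the `min_rows` cut-off and the function names are assumptions for illustration, and instead of a generalized suffix tree the sketch enumerates every column-aligned substring (the nodes a suffix trie over column-tagged rows would contain), which costs O(n·m³) rather than linear time.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Turn each row of real values into D/N/U symbols, one per pair of
    consecutive time points (hypothetical threshold rule)."""
    symbols = []
    for row in matrix:
        row_symbols = []
        for before, after in zip(row, row[1:]):
            delta = after - before
            if delta > threshold:
                row_symbols.append("U")   # expression goes up
            elif delta < -threshold:
                row_symbols.append("D")   # expression goes down
            else:
                row_symbols.append("N")   # no significant change
        symbols.append(row_symbols)
    return symbols


def maximal_ccc_biclusters(symbols, min_rows=2):
    """Return (rows, first_col, last_col, pattern) for every bicluster whose
    rows share the same symbols over a run of consecutive columns and that
    cannot be extended by a row or by a column on either side."""
    if not symbols:
        return []
    n_cols = len(symbols[0])
    # Keying on (start, end, pattern) mimics column-tagged symbols: equal
    # patterns only group together when they occur at the same columns.
    groups = defaultdict(set)
    for r, row in enumerate(symbols):
        for start in range(n_cols):
            for end in range(start, n_cols):
                groups[(start, end, tuple(row[start:end + 1]))].add(r)

    result = []
    for (start, end, pattern), rows in groups.items():
        # Row-maximal by construction: the set holds every matching row.
        if len(rows) < min_rows:
            continue
        # Column-maximal: the rows must disagree on the neighbouring column,
        # or the run must already touch the border of the matrix.
        left_ok = start == 0 or len({symbols[r][start - 1] for r in rows}) > 1
        right_ok = end == n_cols - 1 or len({symbols[r][end + 1] for r in rows}) > 1
        if left_ok and right_ok:
            result.append((sorted(rows), start, end, "".join(pattern)))
    return sorted(result)


expression = [
    [0, 1, 2, 2, 1],  # gene 0
    [5, 6, 7, 7, 8],  # gene 1
    [3, 2, 1, 1, 2],  # gene 2
]
found = maximal_ccc_biclusters(discretize(expression))
for rows, first, last, pattern in found:
    print(rows, first, last, pattern)
```

Symbol column j describes the transition from time point j to j + 1. On this toy matrix the sketch reports genes 0 and 1 sharing `UUN` over symbol columns 0 to 2, all three genes sharing `N` at column 2, and genes 1 and 2 sharing `NU` over columns 2 to 3. Reaching the linear bound stated in the abstract would require replacing the enumeration with a generalized suffix tree (for example, built with Ukkonen's online construction) and reading the biclusters off its nodes.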

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Madeira, S.C., Oliveira, A.L. (2005). A Linear Time Biclustering Algorithm for Time Series Gene Expression Data. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_4

  • DOI: https://doi.org/10.1007/11557067_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29008-7

  • Online ISBN: 978-3-540-31812-5

  • eBook Packages: Computer Science, Computer Science (R0)
