A Linear Time Biclustering Algorithm for Time Series Gene Expression Data

Madeira, Sara C.; Oliveira, Arlindo L.

doi:10.1007/11557067_4

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3692)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Several unsupervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, an unsupervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions such that the genes exhibit highly correlated behavior across those conditions. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources.

In this work, we examine a particular setting of the problem: finding biclusters in time series expression data. In this context, we are interested in biclusters with consecutive columns, that is, contiguous time points. For this version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear in the size of the data matrix. This complexity is achieved by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results on both synthetic and real data that show the effectiveness of the approach.
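
To make the idea of biclusters with consecutive columns concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes a simple three-symbol discretization (U for up, D for down, N for no change between consecutive time points) with a hypothetical `threshold` parameter, and it groups genes by brute-force enumeration of contiguous substrings rather than with a suffix tree. It therefore runs in time quadratic in the number of columns per gene, not linear, and it reports every shared pattern rather than only maximal ones. The function names `discretize` and `contiguous_biclusters` are illustrative.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Turn each row of expression values into a string over {U, D, N}.

    Assumed rule (not from the paper): compare consecutive time points and
    emit 'U' if the value rises by more than `threshold`, 'D' if it falls
    by more than `threshold`, and 'N' otherwise.
    """
    patterns = []
    for row in matrix:
        symbols = []
        for prev, curr in zip(row, row[1:]):
            diff = curr - prev
            if diff > threshold:
                symbols.append("U")
            elif diff < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        patterns.append("".join(symbols))
    return patterns


def contiguous_biclusters(patterns, min_genes=2, min_cols=2):
    """Group genes that share the same symbols over the same contiguous columns.

    Brute force: every (start, end, substring) triple is a dictionary key,
    so a pattern only matches when it occurs at the same column positions.
    A suffix-tree approach would avoid enumerating all substrings.
    """
    groups = defaultdict(list)
    for gene, pattern in enumerate(patterns):
        n = len(pattern)
        for start in range(n):
            for end in range(start + min_cols, n + 1):
                groups[(start, end, pattern[start:end])].append(gene)
    return {key: genes for key, genes in groups.items() if len(genes) >= min_genes}


if __name__ == "__main__":
    # Toy data: three genes measured at four time points.
    expression = [
        [1.0, 2.0, 1.0, 1.0],
        [0.0, 1.5, 0.2, 3.0],
        [5.0, 4.0, 4.1, 4.0],
    ]
    for key, genes in contiguous_biclusters(discretize(expression)).items():
        print(key, genes)
```

Keying on the start and end columns as well as the substring is what enforces column alignment: two genes that show the same pattern at different time offsets are not placed in the same bicluster.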

Author information

Authors and Affiliations

INESC-ID, Lisbon, Portugal
Sara C. Madeira & Arlindo L. Oliveira
IST, Technical University of Lisbon, Lisbon, Portugal
Sara C. Madeira & Arlindo L. Oliveira
University of Beira Interior, Covilhã, Portugal
Sara C. Madeira

Editor information

Editors and Affiliations

Biocomputing Group, University of Bologna, Italy
Rita Casadio
Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, USA
Gene Myers

Cite this paper

Madeira, S.C., Oliveira, A.L. (2005). A Linear Time Biclustering Algorithm for Time Series Gene Expression Data. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_4

DOI: https://doi.org/10.1007/11557067_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science, Computer Science (R0)
