Abstract
This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. Similarity between two graphs can be computed using a metric based on the Maximal Common Subgraph (MCS). By maintaining an inverted file that indexes all pathways in a database on their edges, our algorithm finds and ranks all pathways similar to the user's query pathway in time that is linear in the total number of occurrences, across the entire database, of the edges shared with the query.
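The edge-based inverted file can be illustrated with a minimal sketch; it is not the paper's implementation. It rests on several assumptions that the abstract does not state: each compound has a unique label, so the set of shared labeled edges stands in for the MCS; graph size is measured in edges; and the score follows the general form of an MCS-based similarity, shared size divided by the larger graph's size. The names `build_index` and `search`, and the toy pathways, are hypothetical.

```python
from collections import defaultdict


def build_index(pathways):
    """Build an inverted file: labeled edge -> IDs of pathways containing it.

    `pathways` maps a pathway ID to a list of (source, target) compound
    labels. Assumption: compound labels are unique identifiers.
    """
    index = defaultdict(list)
    sizes = {}
    for pid, edges in pathways.items():
        unique_edges = set(edges)
        sizes[pid] = len(unique_edges)
        for edge in unique_edges:
            index[edge].append(pid)
    return index, sizes


def search(query_edges, index, sizes):
    """Rank pathways by the fraction of edges they share with the query.

    Only the posting lists of the query's own edges are visited, so the
    counting work is proportional to the total number of occurrences of
    those edges in the database. The final sort is extra work on top.
    """
    query = set(query_edges)
    shared = defaultdict(int)
    for edge in query:
        for pid in index.get(edge, ()):
            shared[pid] += 1
    # Assumed score: shared edges / size of the larger graph (in edges).
    ranked = [(pid, n / max(len(query), sizes[pid])) for pid, n in shared.items()]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy data for illustration only; these are not real database entries.
pathways = {
    "glycolysis_fragment": [("glucose", "g6p"), ("g6p", "f6p"), ("f6p", "f16bp")],
    "ppp_fragment": [("glucose", "g6p"), ("g6p", "6pgl")],
    "unrelated": [("acetyl-coa", "citrate")],
}
index, sizes = build_index(pathways)
results = search([("glucose", "g6p"), ("g6p", "f6p")], index, sizes)
```

Pathways that share no edge with the query never appear in a visited posting list, so they are neither touched nor returned. That is what keeps the counting step independent of the database's overall size.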
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neglur, G., Grossman, R.L., Maltsev, N., Yu, C. (2006). Using Term Lists and Inverted Files to Improve Search Speed for Metabolic Pathway Databases. In: Leser, U., Naumann, F., Eckman, B. (eds) Data Integration in the Life Sciences. DILS 2006. Lecture Notes in Computer Science, vol 4075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11799511_15
DOI: https://doi.org/10.1007/11799511_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36593-8
Online ISBN: 978-3-540-36595-2
eBook Packages: Computer Science, Computer Science (R0)