A Parallel-Processor Implementation of an Algorithm to Delineate Distantly Related Protein Sequences using Conserved Motifs and Neural Networks

Vogt, Gerhard; Frishman, Dmitrij; Argos, Patrick

doi:10.1007/978-3-642-46808-7_37


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Summary

A sensitive technique for recognizing distantly related protein sequences has been implemented on a DEC massively parallel computer (MASPAR). The method finds conserved subsequence regions (motifs) in multiple sequence alignments, trains neural nets to recognize them, and then searches large protein sequence databases for all segments to which the nets respond well. The net/motif search has several advantages over other present-day techniques: the sensitivity of neural nets, independence from deletions and insertions as well as from gap penalties, and concentration on the most conserved subsegments for searching and scoring. Running the search on parallel processors gives the molecular biological community ready access to the comparison results and a quick response.
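The three steps named in the summary (collect motif segments, train a net, score every database segment) can be illustrated with a toy sketch. Everything specific here is an assumption for illustration and not taken from the chapter: the training segments, the function names, the window width, and above all the model, which is a single threshold unit (a perceptron) standing in for whatever network architecture the authors actually used. It runs serially; no claim is made about how the work was distributed on the MASPAR.

```python
# Toy motif search: train a single threshold unit on aligned motif segments,
# then score every fixed-width window of a query sequence.
# All data and names below are hypothetical, chosen only for illustration.

MOTIF_SEGMENTS = ["GDSGG", "GDSGA", "GDAGG"]       # assumed motif instances
BACKGROUND_SEGMENTS = ["AKLVE", "PLTRW", "MKKLV"]  # assumed non-motif segments


def features(segment):
    """One-hot style encoding: one (position, residue) feature per column."""
    return list(enumerate(segment))


def score(weights, bias, segment):
    """Weighted sum of the active features plus the bias."""
    return bias + sum(weights.get(f, 0) for f in features(segment))


def train_perceptron(positives, negatives, max_epochs=100):
    """Classic perceptron rule: update only on misclassified samples."""
    weights, bias = {}, 0
    samples = [(s, 1) for s in positives] + [(s, -1) for s in negatives]
    for _ in range(max_epochs):
        errors = 0
        for segment, label in samples:
            if label * score(weights, bias, segment) <= 0:
                for f in features(segment):
                    weights[f] = weights.get(f, 0) + label
                bias += label
                errors += 1
        if errors == 0:  # every training segment is classified correctly
            break
    return weights, bias


def scan(weights, bias, sequence, width, threshold=0):
    """Score all windows of the given width; return hits above threshold,
    best first, as (start, segment, score) tuples."""
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        s = score(weights, bias, segment)
        if s > threshold:
            hits.append((start, segment, s))
    return sorted(hits, key=lambda h: -h[2])


if __name__ == "__main__":
    w, b = train_perceptron(MOTIF_SEGMENTS, BACKGROUND_SEGMENTS)
    for hit in scan(w, b, "MKTAGDSGGPLV", width=5):
        print(hit)
```

Because each window is scored without reference to its neighbours, no alignment gaps are involved, and the windows can in principle be evaluated independently of one another, which is the property that makes this kind of search a natural candidate for parallel hardware.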



Author information

Authors and Affiliations

European Molecular Biology Laboratory, Postfach 10 22 09, Meyerhofstrasse 1, Germany
Gerhard Vogt, Dmitrij Frishman & Patrick Argos
Institute of Evolutionary Physiology and Biochemistry, Thorez pr. 44, St Petersburg, 194223, Russia
Dmitrij Frishman

Editor information

Editors and Affiliations

Institut für Statistik, Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Wüllnerstr. 3, D-52056, Aachen, Germany
Hans-Hermann Bock
Forschungsstelle “Mathematische Logik” der Heidelberger Akademie der Wissenschaften Fachbereich Informatik, Universität Kaiserslautern, Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Wolfgang Lenski
Fachbereich Informatik, Universität Kaiserslautern, Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Michael M. Richter
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI), Erwin-Schrödinger-Str. 57, D-67653, Kaiserslautern, Germany
Michael M. Richter

Cite this paper

Vogt, G., Frishman, D., Argos, P. (1994). A Parallel-Processor Implementation of an Algorithm to Delineate Distantly Related Protein Sequences using Conserved Motifs and Neural Networks. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_37

DOI: https://doi.org/10.1007/978-3-642-46808-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58057-7
Online ISBN: 978-3-642-46808-7
eBook Packages: Springer Book Archive
