Summary
A sensitive technique for recognizing distantly related protein sequences has been implemented on a DEC massively parallel computer (MASPAR). The method finds conserved subsequence regions (motifs) in multiple sequence alignments, trains neural nets to recognize them, and then scans large protein sequence databases for all segments to which the nets respond well. The net/motif search has several advantages over other present-day techniques: the sensitivity of neural nets, independence from deletions, insertions, and gap penalties, and concentration on the most conserved subsegments for searching and scoring. Use of the parallel processors gives the molecular biology community quick response times and ready access to the comparison results.
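The search step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the summary does not give the net architecture, input encoding, or training procedure, so the single sigmoid unit, the one-hot encoding, the toy motif segments, and all function names below are hypothetical stand-ins, not the authors' implementation.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode(segment):
    """One-hot encode a segment: 20 inputs per residue position (assumed encoding)."""
    vec = [0.0] * (len(segment) * len(AMINO_ACIDS))
    for pos, residue in enumerate(segment):
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # unknown residues leave their position all-zero
            vec[pos * len(AMINO_ACIDS) + idx] = 1.0
    return vec

def respond(weights, bias, segment):
    """Output of a single sigmoid unit, in (0, 1), for one segment."""
    z = bias + sum(w * x for w, x in zip(weights, encode(segment)))
    return 1.0 / (1.0 + math.exp(-z))

def train(positives, negatives, epochs=200, rate=0.5):
    """Fit the unit by stochastic gradient descent (toy stand-in for net training)."""
    width = len(positives[0])
    weights = [0.0] * (width * len(AMINO_ACIDS))
    bias = 0.0
    data = [(s, 1.0) for s in positives] + [(s, 0.0) for s in negatives]
    for _ in range(epochs):
        for segment, target in data:
            error = target - respond(weights, bias, segment)
            for i, x in enumerate(encode(segment)):
                if x:
                    weights[i] += rate * error
            bias += rate * error
    return weights, bias

def scan(weights, bias, sequence, width):
    """Score every ungapped window of the sequence; returns (response, start) pairs."""
    return [(respond(weights, bias, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]

# Hypothetical motif segments (as if cut from an alignment) and unrelated segments.
motif_segments = ["GDSGG", "GDSGS", "GDAGG"]
background_segments = ["AKLVE", "PLTQR", "MNKYF"]

weights, bias = train(motif_segments, background_segments)
hits = scan(weights, bias, "MKTAYGDSGGPLVAA", width=5)
best_response, best_start = max(hits)
```

Because each window is scored independently of every other window, and no alignment gaps are introduced, a scan of this kind lends itself to distribution across many processors, which is presumably what makes a massively parallel machine a natural fit.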
Copyright information
© 1994 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Vogt, G., Frishman, D., Argos, P. (1994). A Parallel-Processor Implementation of an Algorithm to Delineate Distantly Related Protein Sequences using Conserved Motifs and Neural Networks. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_37
DOI: https://doi.org/10.1007/978-3-642-46808-7_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58057-7
Online ISBN: 978-3-642-46808-7
eBook Packages: Springer Book Archive