A Parallel-Processor Implementation of an Algorithm to Delineate Distantly Related Protein Sequences using Conserved Motifs and Neural Networks

  • Conference paper
Information Systems and Data Analysis

Summary

A sensitive technique for recognizing distantly related protein sequences has been implemented on a DEC massively parallel computer (MASPAR). The method finds conserved subsequence regions (motifs) in multiple sequence alignments, trains neural nets to recognize them, and then searches large protein sequence databases for all segments to which the nets respond well. The net/motif search has several advantages over other present-day techniques: the sensitivity of neural nets, independence from insertions, deletions, and gap penalties, and concentration on the most conserved subsegments for searching and scoring. The parallel processors give the molecular biological community ready access to comparison results and quick response times.
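The train-then-scan idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: a single logistic unit stands in for the neural net, the one-hot input encoding is an assumption, and the motif examples, background segments, and query sequence are invented purely for illustration. All function names (`encode`, `respond`, `train`, `scan`) are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(segment):
    """One-hot encode a segment: 20 inputs per residue position (assumed encoding)."""
    vec = [0.0] * (len(segment) * len(AMINO_ACIDS))
    for pos, aa in enumerate(segment):
        vec[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1.0
    return vec


def respond(weights, bias, segment):
    """Response in (0, 1) of the logistic unit to one fixed-length segment."""
    x = encode(segment)
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(positives, negatives, epochs=200, rate=0.5):
    """Fit the logistic unit by stochastic gradient descent (stand-in for the net)."""
    width = len(positives[0])
    weights = [0.0] * (width * len(AMINO_ACIDS))
    bias = 0.0
    data = [(s, 1.0) for s in positives] + [(s, 0.0) for s in negatives]
    for _ in range(epochs):
        for segment, target in data:
            error = target - respond(weights, bias, segment)
            for i, xi in enumerate(encode(segment)):
                if xi:  # only the active one-hot inputs receive an update
                    weights[i] += rate * error
            bias += rate * error
    return weights, bias


def scan(weights, bias, width, sequence, threshold=0.9):
    """Score every window independently; no alignment or gap penalty is involved."""
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        score = respond(weights, bias, segment)
        if score >= threshold:
            hits.append((start, segment, score))
    return hits


# Invented motif occurrences (as if taken from one conserved alignment block)
# and invented background segments of the same width.
motif_examples = ["GDSGGP", "GDSGGA", "GDAGGP", "GNSGGP"]
background = ["MKTAYI", "AKQRLV", "LVCNDE", "PLVCND",
              "TAYIAK", "QRLVCN", "EEKLMN", "WYFHIT"]

weights, bias = train(motif_examples, background)
hits = scan(weights, bias, 6, "MKTAYIAKQRGDSGGPLVCNDE")
for start, segment, score in hits:
    print(start, segment, round(score, 3))
```

Because each window is scored on its own, the scan loop has no dependencies between iterations. That is the property that would let such a search be spread across many processors, for example one window or one database sequence per processing element. How the work was actually distributed on the MASPAR is not stated in the summary.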



Copyright information

© 1994 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Vogt, G., Frishman, D., Argos, P. (1994). A Parallel-Processor Implementation of an Algorithm to Delineate Distantly Related Protein Sequences using Conserved Motifs and Neural Networks. In: Bock, HH., Lenski, W., Richter, M.M. (eds) Information Systems and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-46808-7_37

  • DOI: https://doi.org/10.1007/978-3-642-46808-7_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58057-7

  • Online ISBN: 978-3-642-46808-7

  • eBook Packages: Springer Book Archive
