Encoding of primary structures of biological macromolecules within a data mining perspective

Maddouri, Mondher; Elloumi, Mourad

doi:10.1007/BF02944786

Published: January 2004

Volume 19, pages 78–88, (2004)

Journal of Computer Science and Technology

Mondher Maddouri¹ &
Mourad Elloumi²


Abstract

The encoding method has a direct effect on the quality and the representation of the knowledge discovered by data mining systems. Biological macromolecules are encoded as strings of characters, called primary structures. Because data mining systems usually represent data as relational tables, these strings must be re-encoded and transformed into relational tables. In this paper, we present a comparative study of the existing static encoding methods, which are based on biologists' know-how, and our new dynamic encoding method, which is based on the construction of Discriminant and Minimal Substrings (DMS). Several classification methods are used in this study. The experimental results show that our dynamic encoding method is more efficient than the static ones for encoding biological macromolecules within a data mining perspective.
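
The abstract does not define how the DMS are built, so the following Python sketch only illustrates the general idea of turning strings into a relational table. It assumes, hypothetically, that a substring is "discriminant" for a family when it occurs in at least a fraction alpha of that family's sequences and in at most a fraction beta of the other sequences, and "minimal" when it contains no shorter discriminant substring. The function names, the thresholds alpha and beta, the max_len bound, and the toy sequences are all illustrative assumptions, not the authors' algorithm.

```python
def substrings(s, max_len):
    """Return all distinct substrings of s with length 1..max_len."""
    return {s[i:i + k]
            for k in range(1, max_len + 1)
            for i in range(len(s) - k + 1)}


def support(sub, sequences):
    """Fraction of the given sequences that contain sub."""
    return sum(sub in seq for seq in sequences) / len(sequences)


def discriminant_minimal_substrings(family, others,
                                    alpha=1.0, beta=0.0, max_len=4):
    """Assumed DMS definition (not taken from the paper):
    frequent in `family` (>= alpha), rare in `others` (<= beta),
    and containing no shorter substring with the same property."""
    candidates = set().union(*(substrings(s, max_len) for s in family))
    discriminant = {c for c in candidates
                    if support(c, family) >= alpha
                    and support(c, others) <= beta}
    # Keep only substrings that do not contain another discriminant one.
    return sorted(d for d in discriminant
                  if not any(o != d and o in d for o in discriminant))


def encode(sequences, features):
    """Relational table: one row per sequence, one 0/1 column per feature."""
    return [[int(f in seq) for f in features] for seq in sequences]


# Toy data, invented for illustration only.
family_a = ["AAGT", "CAAG", "AAGC"]
family_b = ["GTCA", "CAGT", "AGCT"]

features = discriminant_minimal_substrings(family_a, family_b)
table = encode(family_a + family_b, features)
print(features)  # ['AA']
print(table)     # [[1], [1], [1], [0], [0], [0]]
```

In this toy case "AAG" is also discriminant, but it is discarded because it contains the shorter discriminant substring "AA". The resulting table has one row per sequence and can be passed to any classifier that expects fixed-length attribute vectors.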

Author information

Authors and Affiliations

Computer Science Department, National Institute of Applied Sciences and Technologies, Tunis-Carthage, 2035, Tunis, Tunisia
Mondher Maddouri
Computer Science Department, Faculty of Economic Sciences and Management of Tunis, El Manar, 2092, Tunis, Tunisia
Mourad Elloumi

Corresponding author

Correspondence to Mondher Maddouri.

Additional information

Mondher Maddouri received a B.S. degree in mathematics and physics in 1990, an M.S. degree in computer engineering in 1994, and a Ph.D. degree in computer science in 2000, all from the Faculty of Sciences of Tunis, Tunisia. He is currently an associate professor in the Computer Science Department at the National Institute of Applied Sciences and Technologies, Tunis, Tunisia. His research interests are machine learning, knowledge discovery and data mining, and computational molecular biology.

Mourad Elloumi received a B.S. degree in mathematics and physics in 1984 and an M.S. degree in computer engineering in 1988, both from the Faculty of Sciences of Tunis, Tunisia. He also received an M.S. degree in computer science in 1989 and a Ph.D. degree in computer science in 1994, both from the University of Aix-Marseilles III, France. He is currently an associate professor in the Computer Science Department at the Faculty of Economic Sciences and Management of Tunis, Tunisia. His research interests are computational molecular biology, algorithmics, and knowledge discovery and data mining.

About this article

Cite this article

Maddouri, M., Elloumi, M. Encoding of primary structures of biological macromolecules within a data mining perspective. J. Comput. Sci. & Technol. 19, 78–88 (2004). https://doi.org/10.1007/BF02944786

Received: 26 May 2003
Revised: 21 August 2003
Issue Date: January 2004
DOI: https://doi.org/10.1007/BF02944786
