Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset

Yearwood, J. L.; Kang, B. H.; Kelarev, A. V.

doi:10.1007/978-3-642-10509-8_34

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 5899)

Included in the following conference series:

International Conference on Future Generation Information Technology

Abstract

This article experimentally investigates three machine learning algorithms on the ITS dataset, measuring how well each agrees with classes previously published in the biological literature. The ITS dataset consists of nuclear ribosomal DNA sequences, for which rather sophisticated alignment scores have to be used as the measure of distance. These scores do not form a Minkowski metric, and the sequences cannot be regarded as points in a finite-dimensional space, so novel machine learning approaches are needed to analyse datasets of this sort. The paper introduces a k-committees classifier and compares it with the discrete k-means and Nearest Neighbour classifiers. All three algorithms turn out to be efficient and can be used to automate future biologically significant classifications of datasets of this kind. A simplified synthetic dataset on which the k-committees classifier outperforms the k-means and Nearest Neighbour classifiers is also presented.
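
To illustrate classification when only pairwise dissimilarities are available, here is a minimal Python sketch, not the authors' implementation. It makes several assumptions: plain edit distance stands in for the alignment scores (which the abstract does not specify), the sequences and labels in `toy_train` are invented, and "discrete k-means" is read as choosing each class centre from among the class members, with one centre per class. The k-committees classifier is not defined in the abstract, so it is not sketched.

```python
from typing import Callable, Dict, List, Tuple

Dist = Callable[[str, str], int]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: a toy stand-in for an alignment score (assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_neighbour(query: str, train: List[Tuple[str, str]], dist: Dist) -> str:
    """Return the label of the training sequence closest to the query."""
    return min(train, key=lambda item: dist(query, item[0]))[1]


def medoid(members: List[str], dist: Dist) -> str:
    """Pick the member with the smallest total dissimilarity to its class.

    The centre is always an actual sequence, so no vector-space
    coordinates or averaging are needed.
    """
    return min(members, key=lambda m: sum(dist(m, other) for other in members))


def medoid_classifier(query: str, train: List[Tuple[str, str]], dist: Dist) -> str:
    """Assign the query to the class whose medoid is closest (one centre per class)."""
    by_class: Dict[str, List[str]] = {}
    for seq, label in train:
        by_class.setdefault(label, []).append(seq)
    centres = {label: medoid(seqs, dist) for label, seqs in by_class.items()}
    return min(centres, key=lambda label: dist(query, centres[label]))


# Hypothetical toy data, not taken from the ITS dataset.
toy_train = [
    ("ACGTACGT", "A"), ("ACGTACGA", "A"), ("ACGTTCGT", "A"),
    ("TTGGCCAA", "B"), ("TTGGCCAT", "B"), ("TTGGACAA", "B"),
]
```

Both classifiers take the dissimilarity as a function argument, so a real alignment score could be substituted for `edit_distance` without changing the rest of the code.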

Author information

Authors and Affiliations

School of Information Technology and Mathematical Sciences, University of Ballarat, P.O. Box 663, Ballarat, Victoria, 3353, Australia
J. L. Yearwood & A. V. Kelarev
School of Computing and Information Systems, University of Tasmania, Private Bag 100, Hobart, Tasmania, 7001, Australia
B. H. Kang

Editor information

Editors and Affiliations

Sogang University, South Korea
Young-hoon Lee
Hannam University, Daejeon, South Korea
Tai-hoon Kim
National Chiao Tung University, Hsinchu, Taiwan
Wai-chi Fang
University of Warsaw & Infobright Inc., Poland
Dominik Ślęzak

Cite this paper

Yearwood, J.L., Kang, B.H., Kelarev, A.V. (2009). Experimental Investigation of Three Machine Learning Algorithms for ITS Dataset. In: Lee, Yh., Kim, Th., Fang, Wc., Ślęzak, D. (eds) Future Generation Information Technology. FGIT 2009. Lecture Notes in Computer Science, vol 5899. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10509-8_34

DOI: https://doi.org/10.1007/978-3-642-10509-8_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10508-1
Online ISBN: 978-3-642-10509-8
eBook Packages: Computer Science, Computer Science (R0)
