Abstract
The primary objective of clustering is to discover structure in data by forming a number of clusters or groups. To achieve optimal clustering results with current soft computing approaches, two fundamental questions should be considered: (i) how many clusters are actually present in the given data, and (ii) how real or good the clustering itself is. Because of these two questions, almost every clustering method needs to determine the number of clusters. Yet it is difficult to determine the optimal number of clusters for each data set. Hence, DNA-based clustering algorithms were proposed to solve the clustering problem without requiring any preliminary parameters such as the number of clusters or the number of iterations.
Because DNA-based solutions and silicon-based solutions rely on processes of a different nature, it is critical to evaluate the results obtained from DNA-based clustering, to ensure that they can be accepted on the same footing as those of other soft computing techniques. Thus, this study proposes two different techniques for evaluating DNA-based clustering algorithms, to determine whether their results can be accepted like those of other soft computing techniques or are not reliable enough to be employed.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Bakar, R.A., Yu-Yi, C., Watada, J. (2011). Robustness of DNA-Based Clustering. In: Ruano, A.E., Várkonyi-Kóczy, A.R. (eds) New Advances in Intelligent Signal Processing. Studies in Computational Intelligence, vol 372. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11739-8_4
DOI: https://doi.org/10.1007/978-3-642-11739-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11738-1
Online ISBN: 978-3-642-11739-8
eBook Packages: Engineering, Engineering (R0)