Abstract
Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze, and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development. One challenge is to identify genomic markers in the Hepatitis B Virus (HBV) that are associated with the development of hepatocellular carcinoma (HCC, a form of liver cancer) by comparing the complete HBV genomic sequences of patients with HCC against those of patients without HCC. In this study, a data mining framework that combines molecular evolution analysis with classification is introduced. Our research group has collected HBV DNA sequences of genotype B or C from over 200 patients specifically for this project. Through molecular evolution analysis and clustering, three subgroups have been identified within genotype C, and a clustering method has been developed to separate them. A new classification method based on nonlinear integrals has also been developed. Its good performance stems from the use of a fuzzy measure and the corresponding nonlinear integral: the non-additivity of the fuzzy measure reflects both the importance of the individual feature attributes and the interactions among them. A thorough comparison of these two methods with existing methods is detailed. Important mutation markers (sites) have been found for genotype B and for each of the genotype C subgroups C1, C2, and C3. The two classification methods have been applied to previously unseen examples for validation. The results show accuracy above 70 percent and sensitivity above 80 percent for most data sets, which is considered high for an initial screening method for liver cancer diagnosis.
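Molecular evolution analysis of aligned sequences usually starts from pairwise evolutionary distances. The abstract does not state which distance model was used, so the sketch below assumes the Kimura two-parameter model, a standard choice that corrects the observed proportions of transitions (P) and transversions (Q) separately: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The function name and the rule of skipping gapped or ambiguous sites are illustrative assumptions, not the authors' implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption of this sketch: sites with a gap or ambiguity code in either
    sequence are skipped rather than modelled.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G or C<->T is a transition; purine<->pyrimidine is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # One A->G transition in ten sites: P = 0.1, Q = 0
    print(round(kimura_2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))  # 0.1116
```

A matrix of such distances over all patient sequences could then feed a tree-building or clustering step; how the genotype C subgroups were actually separated is not specified here.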
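The role of the fuzzy measure can be illustrated with the discrete Choquet integral, one widely used nonlinear integral; whether it is exactly the integral used in this study is an assumption. With the feature values sorted so that f(x_(1)) <= ... <= f(x_(n)) and f(x_(0)) = 0, the integral is the sum over i of (f(x_(i)) - f(x_(i-1))) * mu({x_(i), ..., x_(n)}). Because mu need not be additive, mu({a, b}) can exceed mu({a}) + mu({b}), so a pair of mutation sites can count for more together than separately. The feature names (site_a, site_b), the measure values, the helper predict_positive, and the threshold below are all hypothetical; in practice the measure and decision boundary would be learned from training data.

```python
from typing import Dict, FrozenSet, Mapping

# A fuzzy measure stored as a table from feature subsets to measure values.
FuzzyMeasure = Dict[FrozenSet[str], float]

# Hypothetical measure on two hypothetical mutation sites: each site alone
# weighs 0.3, but together they weigh 1.0 (non-additive, synergistic).
EXAMPLE_MU: FuzzyMeasure = {
    frozenset(): 0.0,
    frozenset({"site_a"}): 0.3,
    frozenset({"site_b"}): 0.3,
    frozenset({"site_a", "site_b"}): 1.0,
}


def choquet_integral(values: Mapping[str, float], mu: FuzzyMeasure) -> float:
    """Discrete Choquet integral of nonnegative feature values w.r.t. mu.

    mu must contain every upper set {x_(i), ..., x_(n)} reached below.
    """
    if any(v < 0 for v in values.values()):
        raise ValueError("this sketch assumes nonnegative feature values")
    order = sorted(values, key=lambda name: values[name])  # ascending by value
    total = 0.0
    previous = 0.0  # f(x_(0)) = 0
    for i, name in enumerate(order):
        upper_set = frozenset(order[i:])  # A_(i) = {x_(i), ..., x_(n)}
        total += (values[name] - previous) * mu[upper_set]
        previous = values[name]
    return total


def predict_positive(values: Mapping[str, float], mu: FuzzyMeasure,
                     threshold: float) -> bool:
    """Hypothetical two-class rule: positive if the projection exceeds threshold."""
    return choquet_integral(values, mu) > threshold


if __name__ == "__main__":
    print(choquet_integral({"site_a": 1.0, "site_b": 1.0}, EXAMPLE_MU))  # 1.0
    print(choquet_integral({"site_a": 1.0, "site_b": 0.0}, EXAMPLE_MU))  # 0.3
    print(predict_positive({"site_a": 1.0, "site_b": 1.0}, EXAMPLE_MU, 0.5))  # True
```

With EXAMPLE_MU, a sequence carrying both hypothetical mutations scores 1.0 and one carrying only site_a scores 0.3, whereas an additive weighted sum with the same singleton weights would give 0.6 and 0.3; the gap between 1.0 and 0.6 is the interaction that the non-additive measure captures.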
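Accuracy and sensitivity, the two figures quoted in the abstract, follow their standard definitions. Treating HCC as the positive class (encoded as 1, an assumption of this sketch): accuracy = (TP + TN) / N and sensitivity = TP / (TP + FN). The labels in the usage example are made up for illustration and are not the study's data.

```python
from typing import Sequence, Tuple


def accuracy_and_sensitivity(y_true: Sequence[int],
                             y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, sensitivity), treating label 1 (e.g. HCC) as positive."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)  # missed positives
    correct = sum(1 for t, p in pairs if t == p)        # TP + TN
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    return accuracy, sensitivity


if __name__ == "__main__":
    # Toy labels: 4 positives, 6 negatives; one positive missed, one false alarm
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
    print(accuracy_and_sensitivity(truth, preds))  # (0.8, 0.75)
```

Sensitivity is reported alongside accuracy because, for a screening test, missing a patient who will develop HCC (a false negative) is usually costlier than a false alarm.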
© 2014 Springer International Publishing Switzerland
Rekha, H.S., Vijaya Lakshmi, P. (2014). Classification on DNA Sequences of Hepatitis B Virus. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_46
DOI: https://doi.org/10.1007/978-3-319-03095-1_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03094-4
Online ISBN: 978-3-319-03095-1
eBook Packages: Engineering, Engineering (R0)