Classification on DNA Sequences of Hepatitis B Virus

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 249)

Abstract

Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze, and integrate biological and genetic information, which can then be applied to gene-based drug discovery and development. One challenge is to identify genomic markers in Hepatitis B Virus (HBV) that are associated with the development of hepatocellular carcinoma (HCC, liver cancer) by comparing complete HBV genomic sequences from patients with HCC and patients without HCC. This study introduces a data mining framework that includes molecular evolution analysis and classification. Our research group has collected HBV DNA sequences, of either genotype B or genotype C, from over 200 patients specifically for this project. In the molecular evolution analysis and clustering, three subgroups have been identified in genotype C, and a clustering method has been developed to separate them. A new classification method based on the Nonlinear Integral has also been developed. Its good performance comes from the use of a fuzzy measure and the corresponding nonlinear integral: the non-additivity of the fuzzy measure reflects the importance of the individual feature attributes as well as their interactions. A thorough comparison of these two methods with existing methods is presented. Important mutation markers (sites) have been found for genotype B and for each of the genotype C subgroups C1, C2, and C3. These two classification methods have been applied to previously unseen examples for validation. The results show that the classification methods achieve more than 70 percent accuracy and 80 percent sensitivity on most data sets, which is considered high for an initial screening method for liver cancer diagnosis.
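To illustrate how a non-additive fuzzy measure can capture both the importance of individual attributes and their interactions, the sketch below computes a discrete Choquet integral, one common form of nonlinear integral. The abstract does not say which nonlinear integral the authors use, so the choice of the Choquet integral is an assumption, and the function name `choquet_integral`, the feature names, and the measure values are all hypothetical.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of non-negative feature values.

    values:  dict mapping feature name -> non-negative number
    measure: dict mapping frozenset of feature names -> fuzzy measure value,
             monotone with respect to set inclusion (illustrative values only)
    """
    # Sort feature names by ascending feature value.
    ordered = sorted(values, key=values.get)
    total = 0.0
    previous = 0.0
    for i, name in enumerate(ordered):
        # Features whose value is at least the current one.
        upper = frozenset(ordered[i:])
        # Weight the increment in value by the measure of that set.
        total += (values[name] - previous) * measure[upper]
        previous = values[name]
    return total


# Hypothetical measure over two mutation sites: the pair together (1.0)
# counts for more than the sum of the single sites (0.3 + 0.4), which
# models a positive interaction between them.
example_measure = {
    frozenset(): 0.0,
    frozenset({"site_a"}): 0.3,
    frozenset({"site_b"}): 0.4,
    frozenset({"site_a", "site_b"}): 1.0,
}

score = choquet_integral({"site_a": 1.0, "site_b": 0.5}, example_measure)
print(score)
```

When the measure is additive, this reduces to an ordinary weighted sum of the feature values; a classifier would then compare the integral against a threshold, which is likewise an assumption here rather than a detail given in the abstract.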

Author information

Corresponding author

Correspondence to H. Swapna Rekha.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Rekha, H.S., Vijaya Lakshmi, P. (2014). Classification on DNA Sequences of Hepatitis B Virus. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_46

  • DOI: https://doi.org/10.1007/978-3-319-03095-1_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03094-4

  • Online ISBN: 978-3-319-03095-1

  • eBook Packages: Engineering, Engineering (R0)
