Analysis and Comparison of Genomes of HIV-1 and HIV-2 Using Apriori Algorithm, Decision Tree, and Support Vector Machine
AIDS is caused by HIV, which can be divided into two strains: HIV-1 and HIV-2. HIV-1 is distributed worldwide and is the major cause of global infections, whereas HIV-2 is less infectious and less transmissible and is therefore largely confined to West Africa. This research aims to account for that difference by analyzing the genome sequences of HIV-1 and HIV-2 with three methods: the Apriori algorithm, a decision tree, and a support vector machine (SVM). Apriori shows that the typical amino acids of HIV-1 are lysine, arginine, and serine, while those of HIV-2 are glycine, lysine, leucine, and arginine. The decision tree identifies the amino acid positions that best distinguish the two viruses: pos5 in the 9-window, pos13 in the 13-window, and pos16 in the 19-window. The SVM indicates that the two viruses appear similar but are in fact different. Together, the results provide a biologically verifiable background for developing effective HIV vaccines, especially for HIV-2.
Keywords: HIV-1, HIV-2, Amino acids, Bioinformatics, Data mining, Apriori algorithm, Decision tree, Support vector machine (SVM)
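The windowed Apriori step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "9 window" denotes a sliding window of nine residues and that each window is encoded as positional items such as pos5=K; neither detail is confirmed by the abstract. The function names (sliding_windows, to_transaction, apriori), the window size of 3, the support threshold, and the toy sequence are all hypothetical and chosen only to keep the example small; the sequence is not real HIV data.

```python
from itertools import combinations


def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues from `sequence`."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


def to_transaction(window):
    """Encode a window as positional items such as 'pos5=K' (1-based)."""
    return frozenset(f"pos{i}={aa}" for i, aa in enumerate(window, start=1))


def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support."""
    n = len(transactions)
    # Level 1 candidates: every single item seen in any transaction.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets
        # were frequent at the previous level.
        current = {c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent


# Toy placeholder sequence (assumption: NOT real HIV data).
toy_sequence = "KRSKRSKRS"
windows = sliding_windows(toy_sequence, size=3)
transactions = [to_transaction(w) for w in windows]
frequent = apriori(transactions, min_support=0.4)

for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(support, 3))
```

In a real analysis the same encoding could feed a decision tree or an SVM, with one categorical feature per window position; that pipeline is not shown here because the abstract gives no details about it.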