Abstract
The Human Immunodeficiency Virus (HIV) is a retrovirus that attacks the human immune system, reducing its effectiveness. Combinations of antiretroviral drugs are used to treat HIV infection. However, the high mutation rate of HIV can make it resistant to some antiretroviral drugs and lead to treatment failure. Computational methods based on machine learning now attempt to predict patients’ response to therapies. In this bioinformatics study we apply data preprocessing techniques to identify features in HIV sequences that are significant for predicting patients’ short-term progression. Experiments were conducted with four classification methods on datasets composed of different sets of attributes. Classifiers trained with a dataset including solely viral load, CD4+ cell counts and information about mutations in the viral genome achieved accuracies ranging from 50.29% to 63.87%. Adding further attributes (antiretroviral drug resistance levels, HIV subtype, epitope occurrence and others) to the dataset improved the accuracy of the classifiers in almost all tests executed in this work, indicating their relevance to the prediction task discussed here.
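The comparison described in the abstract (the same classifier evaluated on a baseline attribute set and on an extended one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic with arbitrary effect sizes, the attribute names and helper functions are hypothetical, and a small Gaussian naive Bayes classifier stands in for the four classification methods, which the abstract does not name. It is not the study's pipeline and does not reproduce its results.

```python
import math
import random


def make_synthetic_patients(n, seed=0):
    """Generate toy records (hypothetical; NOT the study's dataset)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        responder = rng.random() < 0.5
        # Baseline attributes: only weakly related to the label (arbitrary choice).
        viral_load = rng.gauss(4.0 if responder else 4.5, 1.0)
        cd4_count = rng.gauss(380.0 if responder else 330.0, 120.0)
        n_mutations = rng.gauss(5.0 if responder else 6.0, 3.0)
        # Extra attribute: strongly related to the label (arbitrary choice).
        resistance = rng.gauss(0.0 if responder else 3.0, 1.0)
        rows.append([viral_load, cd4_count, n_mutations, resistance])
        labels.append(1 if responder else 0)
    return rows, labels


def fit_gnb(rows, labels):
    """Fit per-class log prior and per-feature (mean, variance)."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(len(members[0])):
            col = [r[j] for r in members]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[c] = (math.log(len(members) / len(rows)), stats)
    return model


def predict_gnb(model, row):
    """Return the class with the highest Gaussian log posterior."""
    def score(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=score)


def cv_accuracy(rows, labels, columns, k=10):
    """k-fold cross-validated accuracy using only the given columns."""
    data = [[r[j] for j in columns] for r in rows]
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(data), k))
        train_x = [x for i, x in enumerate(data) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        model = fit_gnb(train_x, train_y)
        correct += sum(predict_gnb(model, data[i]) == labels[i] for i in test_idx)
    return correct / len(data)


rows, labels = make_synthetic_patients(400)
baseline_acc = cv_accuracy(rows, labels, columns=[0, 1, 2])
extended_acc = cv_accuracy(rows, labels, columns=[0, 1, 2, 3])
print(f"baseline: {baseline_acc:.2%}  extended: {extended_acc:.2%}")
```

Holding the classifier and the cross-validation folds fixed while varying only the attribute set is what lets a change in accuracy be attributed to the added attributes; the gap seen here follows from how the toy data were generated and says nothing about the real attributes.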
This work was supported in part by a UFOP MSc scholarship and in part by CNPq research grant 307711/2010-2.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de Lima Oliveira, S.E., de Campos Merschmann, L.H., Bouillet, L.E.M. (2011). Identifying Significant Features in HIV Sequence to Predict Patients’ Response to Therapies. In: Norberto de Souza, O., Telles, G.P., Palakal, M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2011. Lecture Notes in Computer Science, vol 6832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22825-4_3
DOI: https://doi.org/10.1007/978-3-642-22825-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22824-7
Online ISBN: 978-3-642-22825-4
eBook Packages: Computer Science, Computer Science (R0)