Abstract
Acquired immunodeficiency syndrome (AIDS) is a syndrome caused by the human immunodeficiency virus (HIV). As AIDS progresses, the patient's immune system weakens, which increases susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. In addition, mutations can make the virus resistant to treatments that are not currently being used, which is known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multi-label classification problem. Given this multi-class relationship, traditional single-label classification methods usually fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this paper, we propose a novel multi-label Robust Sample Specific Distance (RSSD) method to identify multi-class HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given nucleoside analogue and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of \(\ell _1\)-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, non-greedy, iterative algorithm with rigorously proved convergence. We verified our method on a public HIV-1 drug resistance data set with over 600 RT sequences and five nucleoside analogues. We compared our method against other state-of-the-art multi-label classification methods, and the experimental results demonstrate the effectiveness of the proposed method.
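To make the multi-label framing concrete, the following is a minimal sketch of how resistance annotations for a five-drug panel could be encoded as a binary label matrix. It is only an illustration under stated assumptions: the drug names, the `encode_labels` helper, and the toy annotations are hypothetical and are not taken from the paper or its data set.

```python
import numpy as np

# Hypothetical five-drug panel: placeholder names, not the nucleoside
# analogues used in the paper.
DRUGS = ["drug_A", "drug_B", "drug_C", "drug_D", "drug_E"]


def encode_labels(resistant_to):
    """Build a binary (n_samples, n_drugs) matrix; 1 means 'resistant'."""
    index = {name: j for j, name in enumerate(DRUGS)}
    Y = np.zeros((len(resistant_to), len(DRUGS)), dtype=int)
    for i, drugs in enumerate(resistant_to):
        for name in drugs:
            Y[i, index[name]] = 1
    return Y


# Toy annotations (made up for illustration): each entry lists the drugs
# that one sequence is resistant to.
toy = [["drug_A", "drug_C"], [], ["drug_B"], ["drug_A", "drug_B", "drug_E"]]

Y = encode_labels(toy)

# Number of drugs each sequence resists. Any row with more than one label
# cannot be represented by a single class index, which is why a
# single-label classifier cannot express cross-resistance directly.
labels_per_sample = Y.sum(axis=1)
```

Each row of `Y` is one sequence and each column is one drug, so a multi-label method predicts a whole row at once rather than a single class.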
Acknowledgments
This work was partially supported by the National Science Foundation under Grant NSF-IIS 1652943. This research was also partially supported by the Army Research Office (ARO) under Grant W911NF-17-1-0447, the U.S. Air Force Academy (USAFA) under Grant FA7000-18-2-0016, and the Distributed and Collaborative Intelligent Systems and Technology (DCIST) CRA under Grant W911NF-17-2-0181.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Brand, L., Yang, X., Liu, K., Elbeleidy, S., Wang, H., Zhang, H. (2019). Learning Robust Multi-label Sample Specific Distances for Identifying HIV-1 Drug Resistance. In: Cowen, L. (eds) Research in Computational Molecular Biology. RECOMB 2019. Lecture Notes in Computer Science, vol 11467. Springer, Cham. https://doi.org/10.1007/978-3-030-17083-7_4
DOI: https://doi.org/10.1007/978-3-030-17083-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17082-0
Online ISBN: 978-3-030-17083-7
eBook Packages: Computer Science, Computer Science (R0)