Learning Robust Multi-label Sample Specific Distances for Identifying HIV-1 Drug Resistance

Brand, Lodewijk; Yang, Xue; Liu, Kai; Elbeleidy, Saad; Wang, Hua; Zhang, Hao

doi:10.1007/978-3-030-17083-7_4

Lodewijk Brand ORCID: orcid.org/0000-0001-6296-2895¹⁵,
Xue Yang¹⁵,
Kai Liu ORCID: orcid.org/0000-0002-1272-0262¹⁵,
Saad Elbeleidy¹⁵,
Hua Wang ORCID: orcid.org/0000-0002-5986-7413¹⁵ &
Hao Zhang¹⁵

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11467)

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

Abstract

Acquired immunodeficiency syndrome (AIDS) is a syndrome caused by the human immunodeficiency virus (HIV). During the progression of AIDS, a patient's immune system is weakened, which increases the patient's susceptibility to infections and diseases. Although antiretroviral drugs can effectively suppress HIV, the virus mutates very quickly and can become resistant to treatment. Through mutation, the virus can also become resistant to other drugs that are not currently being used, a phenomenon known in the clinical research community as cross-resistance. Since a single HIV strain can be resistant to multiple drugs, this problem is naturally represented as a multi-label classification problem. Given these multi-label relationships, traditional single-label classification methods usually fail to effectively identify the drug resistances that may develop after a particular virus mutation. In this paper, we propose a novel multi-label Robust Sample Specific Distance (RSSD) method to identify multi-label HIV drug resistance. Our method is novel in that it can illustrate the relative strength of the drug resistance of a reverse transcriptase (RT) sequence against a given nucleoside analogue drug and learn the distance metrics for all the drug resistances. To learn the proposed RSSDs, we formulate a learning objective that maximizes the ratio of the summations of a number of \(\ell _1\)-norm distances, which is difficult to solve in general. To solve this optimization problem, we derive an efficient, non-greedy, iterative algorithm with rigorously proved convergence. Our new method has been verified on a public HIV-1 drug resistance data set with over 600 RT sequences and five nucleoside analogues. We compared our method against other state-of-the-art multi-label classification methods, and the experimental results demonstrate the effectiveness of our proposed method.
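
The abstract names the optimization pattern (a ratio of sums of \(\ell _1\)-norm distances, solved iteratively) but not the update rule. The Python sketch below is not the authors' RSSD algorithm. It is a simplified, hypothetical stand-in that learns a single projection vector \(w\) by maximizing \(\sum_i |a_i^\top w| / \sum_j |b_j^\top w|\), where the rows \(a_i\) are assumed to be distance vectors that should stay large (for example, differences between sequences with different resistance labels) and the rows \(b_j\) are distance vectors that should stay small. Each iteration is a minorize-maximize step chosen here for illustration: the numerator is bounded below by \(m^\top w\) with \(m = \sum_i \mathrm{sign}(a_i^\top w_t)\, a_i\), and each \(|b_j^\top w|\) is bounded above by \((b_j^\top w)^2 / (2\delta_j) + \delta_j / 2\) with \(\delta_j = |b_j^\top w_t|\). The surrogate ratio is then maximized in closed form by \(w \propto M^{-1} m\), where \(M = \sum_j b_j b_j^\top / \delta_j\), so in exact arithmetic the objective does not decrease between iterations (as long as no \(\delta_j\) is clipped by the small `eps` guard). The function names, the `eps` guard, and the random data in `demo` are all assumptions made for this example; plain Python is used so the sketch has no dependencies.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def l1_ratio(w, A, B):
    """Objective sum_i |a_i . w| / sum_j |b_j . w| for rows a_i of A and b_j of B."""
    return sum(abs(dot(a, w)) for a in A) / sum(abs(dot(b, w)) for b in B)


def solve(M, m):
    """Solve M x = m by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [m[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def maximize_l1_ratio(A, B, w0, iters=20, eps=1e-8):
    """Hypothetical minorize-maximize sketch for max_w sum|a_i.w| / sum|b_j.w|."""
    d = len(w0)
    w = list(w0)
    history = [l1_ratio(w, A, B)]
    for _ in range(iters):
        # Linear lower bound on the numerator, tight at the current w:
        # sum_i |a_i.w| >= m.w with m = sum_i sign(a_i.w_t) a_i.
        m = [0.0] * d
        for a in A:
            s = 1.0 if dot(a, w) >= 0 else -1.0
            for k in range(d):
                m[k] += s * a[k]
        # Quadratic upper bound on each denominator term:
        # |x| <= x^2 / (2 delta) + delta / 2, tight when delta = |b_j.w_t|.
        # eps keeps the weights finite if some b_j.w_t is (near) zero.
        M = [[0.0] * d for _ in range(d)]
        for b in B:
            delta = max(abs(dot(b, w)), eps)
            for p in range(d):
                for q in range(d):
                    M[p][q] += b[p] * b[q] / delta
        # The surrogate ratio is maximized along M^{-1} m; the original
        # ratio is scale-invariant, so only the direction matters.
        w_new = solve(M, m)
        norm = dot(w_new, w_new) ** 0.5
        w = [x / norm for x in w_new]
        history.append(l1_ratio(w, A, B))
    return w, history


def demo(seed=0, d=4, n=30, iters=20):
    """Run the sketch on random stand-in difference vectors (hypothetical data)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    w0 = [1.0] + [0.0] * (d - 1)
    w, history = maximize_l1_ratio(A, B, w0, iters=iters)
    return A, B, w, history


if __name__ == "__main__":
    _, _, w, history = demo()
    print("ratio at start:", history[0], "after iterations:", history[-1])
```

Running the file prints the ratio at the starting vector and after the final iteration; `history` records every iterate so the non-decreasing behavior can be checked directly. Extending the sketch to a projection matrix, or to one learned distance per drug label, would require a different update that this example does not attempt.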

Acknowledgments

This work was partially supported by the National Science Foundation under Grant NSF-IIS 1652943. This research was also partially supported by the Army Research Office (ARO) under Grant W911NF-17-1-0447, the U.S. Air Force Academy (USAFA) under Grant FA7000-18-2-0016, and the Distributed and Collaborative Intelligent Systems and Technology (DCIST) CRA under Grant W911NF-17-2-0181.

Author information

Authors and Affiliations

Department of Computer Science, Colorado School of Mines, Golden, CO, 80401, USA
Lodewijk Brand, Xue Yang, Kai Liu, Saad Elbeleidy, Hua Wang & Hao Zhang

Corresponding author

Correspondence to Hua Wang.

Editor information

Editors and Affiliations

Tufts University, Cambridge, MA, USA
Lenore J. Cowen

About this paper

Cite this paper

Brand, L., Yang, X., Liu, K., Elbeleidy, S., Wang, H., Zhang, H. (2019). Learning Robust Multi-label Sample Specific Distances for Identifying HIV-1 Drug Resistance. In: Cowen, L. (eds) Research in Computational Molecular Biology. RECOMB 2019. Lecture Notes in Computer Science, vol 11467. Springer, Cham. https://doi.org/10.1007/978-3-030-17083-7_4

DOI: https://doi.org/10.1007/978-3-030-17083-7_4
Published: 02 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17082-0
Online ISBN: 978-3-030-17083-7
eBook Packages: Computer Science, Computer Science (R0)
