Improving Positive Unlabeled Learning Algorithms for Protein Interaction Prediction

Pancaroglu, Doruk; Tan, Mehmet

doi:10.1007/978-3-319-07581-5_10

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 294)

Abstract

In binary classification, it is sometimes difficult to label training samples as negative. This difficulty in obtaining true negative samples has created a need for learning algorithms that do not use negative samples, known as positive unlabeled (PU) learning algorithms. This study aims to improve upon two PU learning algorithms, AGPS [2] and Roc-SVM [3], for protein interaction prediction. Two extensions to these algorithms are proposed: the first is to use Random Forests as the classifier instead of support vector machines (SVMs), and the second is to combine the results of AGPS and Roc-SVM using a voting system. After these two approaches were implemented, their results were compared with those of the original algorithms as well as two well-known learning algorithms, ARACNE [9] and CLR [10]. In the comparisons, both the Random Forest variants (called AGPS-RF and Roc-RF) and the Hybrid algorithm performed well against the original SVM-based algorithms. The improved algorithms also performed well against ARACNE and CLR.
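
The abstract does not specify how the voting system combines the two algorithms, so the following is only a minimal sketch of one possible scheme: soft voting, in which the confidence scores of two classifiers are averaged for each candidate protein pair and then thresholded. The function name `soft_vote`, the 0.5 threshold, the assumption that scores lie in [0, 1], and the example scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # a candidate protein pair


def soft_vote(
    scores_a: Dict[Pair, float],
    scores_b: Dict[Pair, float],
    threshold: float = 0.5,
) -> Dict[Pair, bool]:
    """Combine two classifiers' per-pair scores by averaging (soft voting).

    Assumption: scores lie in [0, 1]. A pair is predicted as interacting
    when the mean of the two scores reaches the threshold.
    """
    if scores_a.keys() != scores_b.keys():
        raise ValueError("both classifiers must score the same pairs")
    return {
        pair: (scores_a[pair] + scores_b[pair]) / 2.0 >= threshold
        for pair in scores_a
    }


# Hypothetical scores standing in for the outputs of two PU learners.
agps_like = {("P1", "P2"): 0.9, ("P1", "P3"): 0.4, ("P2", "P3"): 0.1}
roc_like = {("P1", "P2"): 0.7, ("P1", "P3"): 0.7, ("P2", "P3"): 0.2}

hybrid = soft_vote(agps_like, roc_like)
```

With only two voters, hard majority voting is undefined whenever they disagree, which is why this sketch averages scores instead; the paper's actual rule may differ.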

References

1. Kilic, C., Tan, M.: Positive unlabelled learning for deriving protein interaction networks. Netw. Modeling Anal. in Health Inform. and Bioinform. 1(3), 87–102 (2012)
2. Zhao, X.-M., Wang, Y., Chen, L., Aihara, K.: Gene function prediction using labeled and unlabeled data. BMC Bioinformatics 9, 57 (2008)
3. Li, X., Liu, B.: Learning to classify texts using positive and unlabeled data. In: IJCAI 2003: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 587–592 (2003)
4. Wang, C., Ding, C., Meraz, R.F., Holbrook, S.R.: PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 22(21), 2590–2596 (2006)
5. Carter, R.J., Dubchak, I., Holbrook, S.R.: A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res. 29(19), 3928–3938 (2001)
6. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: KDD 2008: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220. ACM, New York (2008)
7. Mordelet, F., Vert, J.-P.: A bagging SVM to learn from positive and unlabeled examples (2010)
8. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially supervised classification of text documents. In: Proceedings of the Nineteenth International Conference on Machine Learning, ICML (2002)
9. Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., Califano, A.: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7(suppl. 1), S7 (2006)
10. Faith, J.J., Hayete, B., Thaden, J.T., Mogno, I., Wierzbowski, J., et al.: Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5(1), e8 (2007), doi:10.1371/journal.pbio.0050008
11. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)
12. Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)
13. Näppi, J.J., Regge, D., Yoshida, H.: Comparative performance of random forest and support vector machine classifiers for detection of colorectal lesions in CT colonography. In: Yoshida, H., Sakas, G., Linguraru, M.G. (eds.) Abdominal Imaging. LNCS, vol. 7029, pp. 27–34. Springer, Heidelberg (2012)
14. Tang, Y., Krasser, S., He, Y., Yang, W., Alperovitch, D.: Support vector machines and random forests modeling for spam senders behavior analysis. In: Proceedings of IEEE Global Communications Conference (IEEE GLOBECOM 2008), Computer and Communications Network Security Symposium, New Orleans, LA (2008)
15. Rios, G., Zha, H.: Exploring support vector machines and random forests for spam detection. In: Proceedings of the First Conference on Email and Anti-Spam, Mountain View, CA, USA (2004)
16. Faith, J.J., et al.: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 36(Database issue), D866–D870 (2008), doi:10.1093/nar/gkm815
17. Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., Dumousseau, M., Feuermann, M., Hinz, U., Jandrasits, C., Jimenez, R.C., Khadake, J., Mahadevan, U., Masson, P., Pedruzzi, I., Pfeiffenberger, E., Porras, P., Raghunath, A., Roechert, B., Orchard, S., Hermjakob, H.: The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40(1), D841–D846 (2011), doi:10.1093/nar/gkr1088
18. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (October 1999), http://www.cs.waikato.ac.nz/ml/weka/

Author information

Authors and Affiliations

Department of Computer Engineering, TOBB University of Economics and Technology, Ankara, Turkey
Doruk Pancaroglu & Mehmet Tan

Corresponding author

Correspondence to Doruk Pancaroglu.

Editor information

Editors and Affiliations

EMBL Outstation - Hinxton, European Bioinformatics Institute, Hinxton, United Kingdom
Julio Saez-Rodriguez
Department of Informatics, University of Minho, Braga, Portugal
Miguel P. Rocha
Department of Informatics Campus Universitario As Lagoas s/n, University of Vigo, Ourense, Spain
Florentino Fdez-Riverola
Department of Computing Science, University of Salamanca, Salamanca, Spain
Juan F. De Paz Santana

About this paper

Cite this paper

Pancaroglu, D., Tan, M. (2014). Improving Positive Unlabeled Learning Algorithms for Protein Interaction Prediction. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_10

DOI: https://doi.org/10.1007/978-3-319-07581-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering, Engineering (R0)
