Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 294))

Abstract

In binary classification, it is sometimes difficult to label training samples as negative. This difficulty in obtaining true negative samples has created a need for learning algorithms that do not use negative samples, an approach known as positive unlabeled (PU) learning. This study aims to improve two PU learning algorithms, AGPS [2] and Roc-SVM [3], for protein interaction prediction. Two extensions to these algorithms are proposed: the first uses Random Forests as the classifier instead of support vector machines (SVMs), and the second combines the results of AGPS and Roc-SVM using a voting system. After these two approaches were implemented, their results were compared with those of the original algorithms as well as two well-known learning algorithms, ARACNE [9] and CLR [10]. In the comparisons, both the Random Forest variants (called AGPS-RF and Roc-RF) and the Hybrid algorithm performed well against the original SVM-based versions. The improved algorithms also performed well against ARACNE and CLR.
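
The abstract does not say how the voting system works, so the following is only a minimal sketch of one plausible reading: each classifier assigns a score to every candidate protein pair, and a pair is labeled positive when a strict majority of classifiers score it at or above a threshold. The function name `combine_by_vote`, the 0.5 threshold, the majority rule, and the example scores are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # hypothetical key: one candidate protein pair


def combine_by_vote(
    score_sets: List[Dict[Pair, float]],
    threshold: float = 0.5,  # assumed cut-off; the abstract gives none
) -> Dict[Pair, bool]:
    """Label a pair positive when a strict majority of classifiers
    score it at or above the threshold (an assumed voting rule)."""
    if not score_sets:
        raise ValueError("need at least one classifier's scores")

    # Only pairs scored by every classifier can be voted on.
    shared = set(score_sets[0])
    for scores in score_sets[1:]:
        shared &= set(scores)

    result: Dict[Pair, bool] = {}
    for pair in sorted(shared):
        votes = sum(1 for scores in score_sets if scores[pair] >= threshold)
        result[pair] = votes * 2 > len(score_sets)
    return result


# Made-up scores standing in for the outputs of two PU learners.
agps_scores = {("A", "B"): 0.9, ("A", "C"): 0.4, ("B", "C"): 0.7}
roc_scores = {("A", "B"): 0.8, ("A", "C"): 0.6, ("B", "C"): 0.2}

combined = combine_by_vote([agps_scores, roc_scores])
```

With only two voters, a strict majority means both must agree, so in this example only the pair `("A", "B")` is labeled positive. A looser rule, such as averaging the two scores, would be an equally valid assumption.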

References

  1. Kilic, C., Tan, M.: Positive unlabelled learning for deriving protein interaction networks. Netw. Modeling Anal. in Health Inform. and Bioinform. 1(3), 87–102 (2012)

  2. Zhao, X.-M., Wang, Y., Chen, L., Aihara, K.: Gene function prediction using labeled and unlabeled data. BMC Bioinformatics 9, 57 (2008)

  3. Li, X., Liu, B.: Learning to classify texts using positive and unlabeled data. In: IJCAI 2003: Proceedings of the 18th International Joint Conference on Artificial Intelligence, pp. 587–592 (2003)

  4. Wang, C., Ding, C., Meraz, R.F., Holbrook, S.R.: PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics 22(21), 2590–2596 (2006)

  5. Carter, R.J., Dubchak, I., Holbrook, S.R.: A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res. 29(19), 3928–3938 (2001)

  6. Elkan, C., Noto, K.: Learning classifiers from only positive and unlabeled data. In: KDD 2008: Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 213–220. ACM, New York (2008)

  7. Mordelet, F., Vert, J.-P.: A bagging SVM to learn from positive and unlabeled examples (2010)

  8. Liu, B., Lee, W.S., Yu, P.S., Li, X.: Partially supervised classification of text documents. In: Proceedings of the Nineteenth International Conference on Machine Learning, ICML (2002)

  9. Margolin, A.A., Nemenman, I., Basso, K., Wiggins, C., Stolovitzky, G., Dalla Favera, R., Califano, A.: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7(suppl. 1), S7 (2006)

  10. Faith, J.J., Hayete, B., Thaden, J.T., Mogno, I., Wierzbowski, J., et al.: Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles. PLoS Biol. 5(1), e8 (2007), doi:10.1371/journal.pbio.0050008

  11. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27:1–27:27 (2011)

  12. Breiman, L.: Random Forests. Machine Learning 45(1), 5–32 (2001)

  13. Näppi, J.J., Regge, D., Yoshida, H.: Comparative Performance of Random Forest and Support Vector Machine Classifiers for Detection of Colorectal Lesions in CT Colonography. In: Yoshida, H., Sakas, G., Linguraru, M.G. (eds.) Abdominal Imaging. LNCS, vol. 7029, pp. 27–34. Springer, Heidelberg (2012)

  14. Tang, Y., Krasser, S., He, Y., Yang, W., Alperovitch, D.: Support Vector Machines and Random Forests Modeling for Spam Senders Behavior Analysis. In: Proceedings of IEEE Global Communications Conference (IEEE GLOBECOM 2008), Computer and Communications Network Security Symposium, New Orleans, LA (2008)

  15. Rios, G., Zha, H.: Exploring support vector machines and random forests for spam detection. In: Proceedings of the First Conference on Email and Anti-Spam, Mountain View, CA, USA (2004)

  16. Faith, J.J., et al.: Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata. Nucleic Acids Res. 36(Database issue), D866–D870 (2008)

  17. Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., Dumousseau, M., Feuermann, M., Hinz, U., Jandrasits, C., Jimenez, R.C., Khadake, J., Mahadevan, U., Masson, P., Pedruzzi, I., Pfeiffenberger, E., Porras, P., Raghunath, A., Roechert, B., Orchard, S., Hermjakob, H.: The IntAct molecular interaction database in 2012. Nucleic Acids Res. 40(1), D841–D846 (2011), doi:10.1093/nar/gkr1088

  18. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (October 1999), http://www.cs.waikato.ac.nz/ml/weka/

Author information

Corresponding author

Correspondence to Doruk Pancaroglu.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Pancaroglu, D., Tan, M. (2014). Improving Positive Unlabeled Learning Algorithms for Protein Interaction Prediction. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_10

  • DOI: https://doi.org/10.1007/978-3-319-07581-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07580-8

  • Online ISBN: 978-3-319-07581-5

  • eBook Packages: Engineering, Engineering (R0)