An Active Learning Based on Uncertainty and Density Method for Positive and Unlabeled Data

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Abstract

Active learning selects the most informative unlabeled samples for manual annotation in order to enlarge the training set. Many active learning methods have been proposed, but most of them assume that labeled data are available for every class. Only a few methods work with positive and unlabeled data, and their computational complexity is so high that they do not scale to big data. In this paper, we propose an active learning approach that works well when only a small number of positive samples are available in a large dataset. We use data preprocessing to remove most of the outliers, which simplifies the density calculation relative to the KNN algorithm, and our proposed sample selection strategy, Min-Uncertainty Density (MDD), selects unlabeled samples that are both more uncertain and of higher density with less computation. We also propose a technique that combines semi-supervised learning with active learning (MDD-SSAL): it automatically annotates some confident unlabeled samples in each iteration to reduce the number of manually annotated samples. Experimental results indicate that the proposed method is competitive with other similar methods.
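The abstract does not give the MDD formula, so the following is only a generic sketch of uncertainty-and-density sample selection with automatic annotation of confident samples, not the authors' method. The function names, the uncertainty measure (distance of the positive-class probability from 0.5), the density measure (inverse mean distance to the rest of the pool), the product scoring rule, and the 0.95 confidence threshold are all assumptions made for illustration.

```python
import math

def uncertainty(p_pos):
    """Assumed measure: 1.0 when p_pos is 0.5, falling to 0.0 at p_pos of 0 or 1."""
    return 1.0 - abs(2.0 * p_pos - 1.0)

def density(pool):
    """Assumed measure: inverse of the mean Euclidean distance to all other pool samples."""
    n = len(pool)
    scores = []
    for i, x in enumerate(pool):
        total = sum(math.dist(x, y) for j, y in enumerate(pool) if j != i)
        mean_dist = total / max(n - 1, 1)
        scores.append(1.0 / (1.0 + mean_dist))
    return scores

def select_queries(pool, p_pos, n_queries=1):
    """Rank unlabeled samples by uncertainty * density and return the top indices."""
    dens = density(pool)
    score = [uncertainty(p) * d for p, d in zip(p_pos, dens)]
    ranked = sorted(range(len(pool)), key=lambda i: score[i], reverse=True)
    return ranked[:n_queries]

def confident_indices(p_pos, threshold=0.95):
    """Indices the classifier is confident about; candidates for automatic annotation."""
    return [i for i, p in enumerate(p_pos) if max(p, 1.0 - p) >= threshold]

# Hypothetical pool: three samples in a dense cluster and one isolated sample.
pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
# Hypothetical positive-class probabilities from some current classifier.
p_pos = [0.5, 0.9, 0.99, 0.5]

to_annotate_manually = select_queries(pool, p_pos, n_queries=1)
to_annotate_automatically = confident_indices(p_pos)
```

In this toy pool, samples 0 and 3 are equally uncertain, but sample 0 sits in the dense cluster and is therefore ranked first for manual annotation, while sample 2 is confident enough to be labeled automatically. The all-pairs density here costs quadratic time in the pool size; it is used only to keep the sketch short.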

Acknowledgments

Supported by the National Science and Technology Major Project (2018ZX03001019-003) and the National Natural Science Foundation of China (Grant No. 61372088).

Corresponding author

Correspondence to Jun Luo.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Luo, J., Zhou, W., Du, Y. (2018). An Active Learning Based on Uncertainty and Density Method for Positive and Unlabeled Data. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_16

  • DOI: https://doi.org/10.1007/978-3-030-05051-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05050-4

  • Online ISBN: 978-3-030-05051-1

  • eBook Packages: Computer Science, Computer Science (R0)
