Abstract
Semi-supervised learning can be applied to datasets that contain both labeled and unlabeled instances, and it can yield more accurate predictions than fully supervised or unsupervised learning when only limited labeled data is available. A subclass of these problems, called Positive-Unlabeled (PU) learning, focuses on cases in which the labeled instances contain only positive examples. Given the lack of negatively labeled data, estimating the general performance of a classifier is difficult. In this paper, we propose a new approach to approximate the \(F_1\) score for PU learning. It requires an estimate of the fraction of all positive instances that appear in the labeled set. We derive theoretical properties of the approach and apply it to several datasets to study its empirical behavior and to compare it to the most widely used score in the field, the LL score. Results show that even when this estimate deviates considerably from the true fraction of labeled positives, our approximation of the \(F_1\) score remains significantly more accurate than the LL score.
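To make the quantities in the abstract concrete, the sketch below shows one way such scores can be computed from PU predictions. It is a minimal sketch, not the paper's exact formulation: it assumes recall is estimated on the labeled positives (treated as a random sample of all positives), that the total number of positives is inferred from the labeled-fraction estimate, and that the LL score follows Lee and Liu's recall\(^2\)/Pr(f(x)=1) criterion. Function and variable names are illustrative and are not taken from the paper or its code repository.

```python
import numpy as np

def estimate_scores(y_pred_labeled, y_pred_all, alpha):
    """Approximate the F1 score and the LL score from PU predictions.

    y_pred_labeled : 0/1 predictions on the labeled (all-positive) instances
    y_pred_all     : 0/1 predictions on the full dataset (labeled + unlabeled)
    alpha          : estimated fraction of all positives that are labeled
    """
    n_labeled = len(y_pred_labeled)

    # Recall: the hit rate on the labeled positives estimates recall on the
    # positive class, assuming labeled positives are selected at random.
    recall = np.mean(y_pred_labeled)

    # Total number of positives implied by the labeled-fraction estimate.
    est_n_pos = n_labeled / alpha

    # Estimated true positives and precision over all predicted positives.
    n_pred_pos = np.sum(y_pred_all)
    est_tp = recall * est_n_pos
    precision = min(1.0, est_tp / max(n_pred_pos, 1))

    f1_approx = (2 * precision * recall / (precision + recall)
                 if precision + recall > 0 else 0.0)

    # LL score (Lee & Liu, 2003): recall^2 / Pr(f(x) = 1).
    pr_pred_pos = np.mean(y_pred_all)
    ll_score = recall ** 2 / pr_pred_pos if pr_pred_pos > 0 else 0.0

    return f1_approx, ll_score
```

Both estimates use only the classifier's predictions on the labeled positives and on the full dataset; the quality of the labeled-fraction estimate `alpha` is therefore the main source of error, which is exactly the sensitivity the paper studies empirically.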
Notes
1. All code is available on GitHub: https://github.com/SEYED7037/PU-Learning-Estimating-F1-LOD2020-.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tabatabaei, S.A., Klein, J., Hoogendoorn, M. (2020). Estimating the \(F_1\) Score for Learning from Positive and Unlabeled Examples. In: Nicosia, G., et al. (eds) Machine Learning, Optimization, and Data Science. LOD 2020. Lecture Notes in Computer Science, vol. 12565. Springer, Cham. https://doi.org/10.1007/978-3-030-64583-0_15
DOI: https://doi.org/10.1007/978-3-030-64583-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64582-3
Online ISBN: 978-3-030-64583-0
eBook Packages: Computer Science, Computer Science (R0)