Abstract
This study presents a positive-unlabeled (PU) learning model for extending a Vietnamese petroleum dictionary using Vietnamese Wikipedia data. Machine learning algorithms that learn from positive and unlabeled data are used together with the Google similarity distance and the cosine similarity distance, both separately and in combination. The data sources integrated are an English–Vietnamese oil and gas dictionary and the Vietnamese Wikipedia. The results show that combining these algorithms offers a novel way to integrate the data with higher accuracy. As a result, the first Vietnamese oil and gas ontology was built in Vietnam. This ontology is a useful tool for staff in the oil and gas industry for training, research, and everyday search.
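The abstract does not spell out how the two distances are computed or combined, so the following is only a minimal sketch under stated assumptions, not the authors' model. It assumes the Normalized Google Distance (NGD) is computed from document counts over a collection (standing in for Wikipedia articles) rather than search-engine hit counts, that cosine similarity is taken over co-occurrence context vectors, and that the two are merged by a hypothetical weighted mean of the cosine score and exp(-NGD) with weight `alpha`. The PU step is reduced to a simple similarity-based heuristic: unlabeled candidate terms are accepted when their mean similarity to the positive (dictionary) seeds reaches a threshold. The corpus, the term lists, and all helper names (`doc_count`, `context_vector`, `similarity`, `expand`) are hypothetical toy illustrations.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is the set of terms it mentions.
DOCS = [
    {"crude oil", "refinery", "petroleum"},
    {"crude oil", "drilling rig", "petroleum"},
    {"drilling rig", "offshore", "petroleum"},
    {"football", "stadium"},
    {"football", "league", "stadium"},
]


def doc_count(*terms: str) -> int:
    """Number of documents that contain every given term."""
    return sum(1 for doc in DOCS if all(t in doc for t in terms))


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from document counts (0 = always co-occur)."""
    if fx == 0 or fy == 0 or fxy == 0:
        return math.inf  # the terms never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    denom = math.log(n) - min(lx, ly)
    return 0.0 if denom == 0 else (max(lx, ly) - lxy) / denom


def context_vector(term: str) -> Counter:
    """Counts of the other terms that co-occur with `term` in a document."""
    ctx = Counter()
    for doc in DOCS:
        if term in doc:
            ctx.update(t for t in doc if t != term)
    return ctx


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity(x: str, y: str, alpha: float = 0.5) -> float:
    """Hypothetical combination: weighted mean of cosine and exp(-NGD)."""
    d = ngd(doc_count(x), doc_count(y), doc_count(x, y), len(DOCS))
    c = cosine(context_vector(x), context_vector(y))
    return alpha * c + (1 - alpha) * math.exp(-d)  # exp(-inf) == 0.0


def expand(positives, unlabeled, threshold):
    """Accept unlabeled terms whose mean similarity to the positive seeds
    reaches `threshold`, ranked from most to least similar."""
    scored = [(u, sum(similarity(u, p) for p in positives) / len(positives))
              for u in unlabeled]
    return sorted([s for s in scored if s[1] >= threshold],
                  key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    seeds = ["crude oil", "drilling rig"]  # stand-ins for dictionary entries
    candidates = ["petroleum", "offshore", "football", "stadium"]
    for term, score in expand(seeds, candidates, threshold=0.3):
        print(f"{term}: {score:.3f}")
```

On this toy corpus the script prints `offshore: 0.523` and then `petroleum: 0.515`; the two sports terms share no documents or context with the seeds, score 0, and are rejected. Combining the two measures lets a term such as "offshore", which never co-occurs with "crude oil" (infinite NGD), still be accepted through its shared context.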
Acknowledgements
This project was carried out by staff of the Vietnamese Petroleum Institute (VPI), Vietnam National Oil and Gas Group (PetroVietnam).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Vu, NT., Nguyen, QD., Nguyen, TD., Nguyen, MC., Vu, VV., Ha, QT. (2018). A Positive-Unlabeled Learning Model for Extending a Vietnamese Petroleum Dictionary Based on Vietnamese Wikipedia Data. In: Nguyen, N., Hoang, D., Hong, TP., Pham, H., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2018. Lecture Notes in Computer Science, vol 10751. Springer, Cham. https://doi.org/10.1007/978-3-319-75417-8_18
DOI: https://doi.org/10.1007/978-3-319-75417-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75416-1
Online ISBN: 978-3-319-75417-8
eBook Packages: Computer Science, Computer Science (R0)