Abstract
Topic detection is the process of analyzing the words in a collection of textual data to determine the topics in the collection, how they relate to each other, and how they change over time. Fuzzy C-Means (FCM) and Kernel-based Fuzzy C-Means (KFCM) are clustering methods that are often used for topic detection. Both FCM and KFCM can group a low-dimensional dataset into multiple clusters, but they fail on high-dimensional datasets. To overcome this problem, the dimension of the dataset is reduced before topic detection is carried out with FCM or KFCM. In this study, a dataset of tweets from national news accounts on Twitter was used for topic detection with the Randomspace-based Fuzzy C-Means (RFCM) method and the Kernelized Randomspace-based Fuzzy C-Means (KRFCM) method. Both methods consist of two steps: first, the dataset is reduced to a lower-dimensional dataset using random projection; second, FCM (for RFCM) or KFCM (for KRFCM) is run on the reduced dataset. The resulting topics are then evaluated by calculating their coherence, measured in this study with Pointwise Mutual Information (PMI). The average PMI values of RFCM and KRFCM were compared with those of Eigenspace-based Fuzzy C-Means (EFCM) and Kernelized Eigenspace-based Fuzzy C-Means (KEFCM). The results on the national news accounts' tweets showed that RFCM and KRFCM offered a faster running time for dimension reduction but produced smaller average PMI values than EFCM and KEFCM.
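The two-step RFCM procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian random projection matrix and the standard FCM update rules with fuzzifier m = 2, and it runs on synthetic data instead of tweets. The function names, parameters, and toy dataset are hypothetical.

```python
import math
import random


def random_projection(X, k, rng):
    """Project the rows of X (n x d) onto k dimensions (assumed Gaussian matrix)."""
    d = len(X[0])
    # Entries drawn from N(0, 1/k) so squared distances are preserved in expectation.
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * R[i][j] for i in range(d)) for j in range(k)] for row in X]


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-6, rng=None):
    """Standard FCM: alternate center and membership updates until memberships settle."""
    rng = rng or random.Random(0)
    n, k = len(X), len(X[0])
    # Random initial memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * X[i][t] for i in range(n)) / total
                            for t in range(k)])
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(X[i], centers[j]), 1e-12) for j in range(c)]
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if shift < tol:
            break
    return U, centers


# Toy data (hypothetical): 20 points in 50 dimensions forming two separated groups.
rng = random.Random(42)
X = [[rng.gauss(0.0, 0.1) + (5.0 if i >= 10 else 0.0) for _ in range(50)]
     for i in range(20)]

Z = random_projection(X, k=5, rng=rng)      # step 1: reduce 50 -> 5 dimensions
U, centers = fuzzy_c_means(Z, c=2, rng=rng)  # step 2: FCM in the reduced space
labels = [max(range(2), key=lambda j: U[i][j]) for i in range(len(Z))]
print(labels)
```

A kernelized variant (KRFCM) would replace the Euclidean distance in the membership update with a kernel-induced distance. Drawing the projection matrix at random is what would make this dimension-reduction step cheaper than an eigendecomposition-based one, which is consistent with the running-time result reported above.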
Acknowledgment
This work was supported by Universitas Indonesia under the PIT 9 2019 grant. Any opinions, findings, and conclusions or recommendations are the authors' and do not necessarily reflect those of the sponsor.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yusdiansyah, M.R., Murfi, H., Wibowo, A. (2019). Randomspace-Based Fuzzy C-Means for Topic Detection on Indonesia Online News. In: Chamchong, R., Wong, K. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2019. Lecture Notes in Computer Science, vol 11909. Springer, Cham. https://doi.org/10.1007/978-3-030-33709-4_12
DOI: https://doi.org/10.1007/978-3-030-33709-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33708-7
Online ISBN: 978-3-030-33709-4
eBook Packages: Computer Science, Computer Science (R0)