Skip to main content

Randomspace-Based Fuzzy C-Means for Topic Detection on Indonesia Online News

  • Conference paper
  • First Online:
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2019)

Abstract

Topic detection is a process used to analyze words in a collection of textual data to determine the topics in the collection, how they relate to each other, and how they change from time to time. Fuzzy C-Means (FCM) and Kernel-based Fuzzy C-Means (KFCM) method are clustering method that is often used in topic detection problems. Both FCM and KFCM can group dataset into multiple clusters on a low-dimensional dataset, but fail on high-dimensional dataset. To overcome this problem, dimension reduction is carried out on the dataset before topic detection is carried out using the FCM or KFCM method. In this study, the national news account’s tweets dataset on Twitter were used for topic detection using the Randomspace-based Fuzzy C-Means (RFCM) method and Kernelized Randomspace-based Fuzzy C-Means (KRFCM) method. The RFCM and KRFCM learning methods are divided into two steps, which are reducing the dimension of the dataset into a lower-dimensional dataset using random projection and conducting the FCM learning method on the RFCM and the KFCM learning method on KRFCM. After obtaining the topics, then an evaluation is carried out by calculating the coherence value on the topics. The coherence value used in this study uses the Pointwise Mutual Information (PMI) unit. The study was conducted by comparing the average PMI values of RFCM and KRFCM with Eigenspace-based Fuzzy C-Means (EFCM) and Kernelized Eigenspace-based Fuzzy C-Means (KRFCM). The results obtained using national news account’s tweets showed that the RFCM and KRFCM methods offered faster running time for a dimensional reduction but had smaller average PMI values compared to the average PMI values generated by the EFCM and KEFCM learning methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://scikit-learn.org/stable/modules/decomposition.html.

References

  1. Xie, W., Zhu, F., Jiang, J., Lim, E.-P., Wang, K.: Topic sketch: real-time bursty topic detection from Twitter. IEEE Trans. Knowl. Data Eng. 28(8), 2216–2229 (2016)

    Article  Google Scholar 

  2. Craig, T., Ludloff, E.M.: Privacy and Big Data. O’Reilly Media Inc., Sebastopol (2011)

    Google Scholar 

  3. Aiello, L.M., et al.: Sensing trending topics in Twitter. IEEE Trans. Multimedia 15(6), 1268–1282 (2013). https://doi.org/10.1109/TMM.2013.2265080

    Article  Google Scholar 

  4. Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)

    Article  Google Scholar 

  5. Petkos, G., Papadopoulos, S., Kompatsiaris, Y.: Two-level message clustering for topic detection in Twitter. In: Proceedings of the SNOW 2014 Data Challenge, Seoul, Korea, 8 April 2014 (2014)

    Google Scholar 

  6. Nur’aini, K., Najahaty, I., Hidayati, L., Murfi, H., Nurrohmah, S.: Combination of singular value decomposition and k-means clustering method for topic detection on Twitter. In: Proceedings of International Conference on Advanced Computer Science and Information System, Depok, Indonesia, 10–11 October 2015 (2015)

    Google Scholar 

  7. Fitriyani, S.R., Murfi, H.: The k-means with mini batch algorithm for topics detection on online news. In: Proceedings of the 4th International Conference on Information and Communication Technology, Bandung, Indonesia, 25–27 May 2016 (2016)

    Google Scholar 

  8. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Platinum Press, New York (1981)

    Book  Google Scholar 

  9. Daniel, G., Witold, P.: Kernel-based fuzzy clustering and fuzzy clustering: a comparative experimental study. Fuzzy Sets Syst. 161(3), 522–543 (2010). https://doi.org/10.1016/j.fss.2009.10.021

    Article  MathSciNet  Google Scholar 

  10. Winkler, R., Klawonn, F., Kruse, R.: Fuzzy c means in high dimensional spaces. Int. J. Fuzzy Syst. Appl. 1, 1–16 (2011)

    Article  Google Scholar 

  11. Muliawati, T., Murfi, H.: Eigenspace-based fuzzy c-means for sensing trending topics in Twitter. In: AIP Conference Proceedings, vol. 1862, no. 1, July 2017. http://doi.org/10.1063/1.4991244

  12. Murfi, H.: The accuracy of fuzzy c-means in lower-dimensional space for topic detection. In: Qiu, M. (ed.) SmartCom 2018. LNCS, vol. 11344, pp. 321–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05755-8_32

    Chapter  Google Scholar 

  13. Prakoso, Y., Murfi, H., Wibowo, A.: Kernelized eigenspace based fuzzy C means for sensing trending topics on Twitter. In: Proceedings of the International Conference on Data Science and Information Technology, Singapore (2018)

    Google Scholar 

  14. Vu, K.K.: Random projection for high-dimensional optimization. Optimization and Control. Université Paris-Saclay. English (2016)

    Google Scholar 

  15. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipshitz mapping into Hilbert space. In: Conference in Modern Analysis and Probability. Contemporary Mathematics, vol. 26, pp. 189–206. American Mathematical Society (1984)

    Google Scholar 

  16. Manning, C.D., Schuetze, H., Raghavan, P.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)

    Book  Google Scholar 

  17. Loper, E., Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL Interact. Present. Sess, pp. 69–72 (2006)

    Google Scholar 

  18. Bingham, E., Mannila, H.: Random projection in dimensionality reduction. In: Proceeding of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York (2001)

    Google Scholar 

Download references

Acknowledgment

This work was supported by Universitas Indonesia under PIT 9 2019 grant. Any opinions, findings, and conclusions or recommendations are the authors’ and do not necessarily reflect those of the sponsor.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Muhammad Rifky Yusdiansyah .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Yusdiansyah, M.R., Murfi, H., Wibowo, A. (2019). Randomspace-Based Fuzzy C-Means for Topic Detection on Indonesia Online News. In: Chamchong, R., Wong, K. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2019. Lecture Notes in Computer Science(), vol 11909. Springer, Cham. https://doi.org/10.1007/978-3-030-33709-4_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-33709-4_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33708-7

  • Online ISBN: 978-3-030-33709-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics