Abstract
A large volume of user reviews on tourist attractions can prevent travel businesses from grasping consumers’ overall expectations, and consumers themselves from seeing the big picture and making thoughtful trip-planning decisions. Summarizing the reviews allows both parties to catch the main themes and underlying tones of the attractions. In this paper, we address the task of topic clustering by applying a graph-based approach to group the reviews into clusters. To interpret the resulting review clusters, WordNet and Inverse Document Frequency (IDF) are used to extract keywords from each cluster that represent its topic. We evaluate the graph-based clustering approach against gold standard data annotated by humans, and compare the results against Latent Dirichlet Allocation (LDA), a widely used algorithm for topic discovery. The approach is shown to be competitive with LDA in clustering user reviews on tourist attractions. Unlike LDA, which requires the number of clusters as an input, the graph-based approach clusters the reviews dynamically, revealing the number of clusters.
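To illustrate how a graph-based method can reveal the number of clusters rather than take it as an input, the following is a minimal sketch and not the method evaluated in the paper. It assumes Jaccard overlap of word sets as the edge weight, a hypothetical `threshold` for keeping edges, and connected components as the grouping step; all function names and sample reviews are illustrative.

```python
from itertools import combinations


def tokenize(text):
    """Lower-case word set, ignoring punctuation and very short words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def jaccard(a, b):
    """Jaccard overlap between two word sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_reviews(reviews, threshold=0.2):
    """Group reviews by connected components of a similarity graph.

    Nodes are reviews; an edge links two reviews whose Jaccard overlap is
    at least `threshold` (a hypothetical value). The number of clusters is
    an outcome of the graph structure, not a parameter.
    """
    tokens = [tokenize(r) for r in reviews]
    neighbours = {i: set() for i in range(len(reviews))}
    for i, j in combinations(range(len(reviews)), 2):
        if jaccard(tokens[i], tokens[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)

    clusters, seen = [], set()
    for start in range(len(reviews)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


sample_reviews = [
    "The temple architecture is stunning",
    "Stunning temple with golden architecture",
    "Ticket queue was long and slow",
    "Long queue to buy a ticket",
]
sample_clusters = cluster_reviews(sample_reviews)
```

Here `sample_clusters` holds lists of review indices, and `len(sample_clusters)` is the discovered number of clusters. A flow-based graph clustering algorithm could replace the connected-components step without changing the overall shape of the pipeline.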
Notes
- 1. Lexical database which measures the relatedness of terms.
- 4. Path similarity scores two word senses according to the shortest path that connects them in the WordNet hierarchy. The score lies between 0 and 1, and word senses are more similar when the score is closer to 1.
- 5. BCubed is an evaluation metric which compares the resulting clusters generated by an algorithm with the gold standard clusters.
- 6. There is a statistically significant difference between the two results if the p-value is less than 0.05 (p < 0.05).
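The BCubed comparison described in the notes can be sketched as follows. This is a minimal illustration of the standard item-averaged BCubed precision, recall and F-score for hard (non-overlapping) clusters, not the evaluation code used in the paper; the function name `bcubed` and the toy labels are assumptions.

```python
def bcubed(predicted, gold):
    """BCubed precision, recall and F1 for hard clusterings.

    `predicted` and `gold` map each item to a cluster label and a gold
    category label respectively (hypothetical input format).
    """
    items = list(gold)
    precision_sum = recall_sum = 0.0
    for i in items:
        same_cluster = [j for j in items if predicted[j] == predicted[i]]
        same_gold = [j for j in items if gold[j] == gold[i]]
        # Items that share both the predicted cluster and the gold category.
        correct = sum(1 for j in same_cluster if gold[j] == gold[i])
        precision_sum += correct / len(same_cluster)
        recall_sum += correct / len(same_gold)

    n = len(items)
    precision, recall = precision_sum / n, recall_sum / n
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: three reviews on topic X, one on topic Y, split into two
# predicted clusters that only partly agree with the gold standard.
gold_labels = {"a": "X", "b": "X", "c": "X", "d": "Y"}
predicted_labels = {"a": 1, "b": 1, "c": 2, "d": 2}
p, r, f = bcubed(predicted_labels, gold_labels)
```

In the toy example the precision is 0.75 and the recall is 2/3, since reviews `c` and `d` share a predicted cluster but not a gold category, and review `c` is separated from the other two reviews on its topic.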
Acknowledgments
This research project was supported by the Faculty of Information and Communication Technology, Mahidol University. The study was carried out under the research framework of Mahidol University.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sirilertworakul, N., Yimwadsana, B. (2019). A Graph-Based Approach to Topic Clustering of Tourist Attraction Reviews. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2019. Communications in Computer and Information Science, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-030-30275-7_26
DOI: https://doi.org/10.1007/978-3-030-30275-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30274-0
Online ISBN: 978-3-030-30275-7
eBook Packages: Computer Science, Computer Science (R0)