A Graph-Based Approach to Topic Clustering of Tourist Attraction Reviews

Sirilertworakul, Nuttha; Yimwadsana, Boonsit

doi:10.1007/978-3-030-30275-7_26

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1078)

Included in the following conference series:

International Conference on Information and Software Technologies

Abstract

A large volume of user reviews of tourist attractions can prevent travel businesses from grasping consumers’ overall expectations, and can prevent consumers themselves from seeing the big picture and making thoughtful trip-planning decisions. Summarizing the reviews lets both parties catch the main themes and underlying tones of the attractions. In this paper, we address the task of topic clustering by applying a graph-based approach to group the reviews into clusters. To interpret the resulting review clusters, WordNet and Inverse Document Frequency (IDF) are used to extract keywords from each cluster, which represent its topic. We evaluate the graph-based clustering approach against gold standard data annotated by humans, and compare the results against Latent Dirichlet Allocation (LDA), a widely used algorithm for topic discovery. The approach is shown to be competitive with LDA at clustering user reviews of tourist attractions. Unlike LDA, which requires the number of clusters as an input, the graph-based approach clusters the reviews dynamically and thereby reveals the number of clusters.
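
To illustrate how a graph-based method can reveal the number of clusters instead of taking it as input, here is a minimal sketch. This page does not name the clustering algorithm, so the sketch assumes Markov clustering (MCL, van Dongen 2000) as a stand-in. The function names `mcl` and `normalize_columns`, the inflation value, and the six-node toy graph are all hypothetical and not taken from the paper.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Toy Markov clustering on a dense adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into column-stochastic form.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (flow along two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        m = normalize_columns([[x ** inflation for x in row] for row in m])
    # Rows that still carry mass are attractors; the columns they attract
    # form one cluster each. The number of clusters is an output.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps)
                for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical similarity graph over six reviews: two triangles
# (reviews 0-2 and 3-5) joined by a single edge between 2 and 3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(graph)
print(clusters)       # [[0, 1, 2], [3, 4, 5]]
print(len(clusters))  # 2
```

In a real setting the 0/1 entries would be replaced by review-to-review similarity weights. How those weights are computed is not stated on this page.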

Notes

1. A lexical database used to measure the relatedness of terms.
2. https://www.tripadvisor.com/.
3. https://github.com/koadman/proxigenomics.
4. Path similarity scores two word senses by the shortest path that connects them. Word senses are more similar when the score is closer to 1.
5. BCubed is an evaluation metric which compares the resulting clusters generated by an algorithm with the gold standard clusters.
6. There is a statistically significant difference between the two results if the p-value is less than 0.05 (p < 0.05).
7. https://cytoscape.org/.
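
Notes 4 and 5 can be made concrete with a small sketch. It assumes the common definition of path similarity as 1 / (1 + shortest path length), and the item-averaged BCubed precision and recall. The toy graph `toy_wordnet`, the review labels, and the function names are hypothetical illustrations, not the paper's data or code.

```python
from collections import deque


def path_similarity(graph, a, b):
    """Return 1 / (1 + shortest path length) between nodes a and b."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1.0 + dist)
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two nodes are not connected


def bcubed(predicted, gold):
    """Return BCubed (precision, recall, F1) for two item -> label dicts."""
    items = list(predicted)

    def avg_overlap(x, y):
        # For each item: among the items sharing its x-label, the
        # fraction that also share its y-label. Averaged over all items.
        total = 0.0
        for i in items:
            same_x = [j for j in items if x[j] == x[i]]
            total += sum(y[j] == y[i] for j in same_x) / len(same_x)
        return total / len(items)

    precision = avg_overlap(predicted, gold)
    recall = avg_overlap(gold, predicted)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical, undirected miniature of a sense hierarchy.
toy_wordnet = {
    "temple": ["building"],
    "museum": ["building"],
    "building": ["temple", "museum", "structure"],
    "structure": ["building"],
}
print(path_similarity(toy_wordnet, "temple", "temple"))  # 1.0
print(path_similarity(toy_wordnet, "temple", "museum"))  # 0.333...

# Hypothetical cluster assignments for four reviews.
predicted = {"r1": 1, "r2": 1, "r3": 1, "r4": 2}
gold = {"r1": "X", "r2": "X", "r3": "Y", "r4": "Y"}
p, r, f = bcubed(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.75 0.706
```

Identical senses score exactly 1 and the score shrinks as the path grows, which matches the reading of note 4. BCubed precision penalizes clusters that mix gold topics, and recall penalizes gold topics that are split across clusters.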

Acknowledgments

This research project was supported by the Faculty of Information and Communication Technology, Mahidol University. The study was carried out under the research framework of Mahidol University.

Author information

Authors and Affiliations

Faculty of Information and Communication Technology, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand
Nuttha Sirilertworakul & Boonsit Yimwadsana
Integrated Computational BioScience Center, Office of the President, Mahidol University, 999 Phuttamonthon 4 Road, Salaya, 73170, Nakhon Pathom, Thailand
Boonsit Yimwadsana

Corresponding author

Correspondence to Nuttha Sirilertworakul.

Editor information

Editors and Affiliations

Kaunas University of Technology, Kaunas, Lithuania
Robertas Damaševičius
Kaunas University of Technology, Kaunas, Lithuania
Giedrė Vasiljevienė

About this paper

Cite this paper

Sirilertworakul, N., Yimwadsana, B. (2019). A Graph-Based Approach to Topic Clustering of Tourist Attraction Reviews. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2019. Communications in Computer and Information Science, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-030-30275-7_26

DOI: https://doi.org/10.1007/978-3-030-30275-7_26
Published: 03 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30274-0
Online ISBN: 978-3-030-30275-7
eBook Packages: Computer Science, Computer Science (R0)
