Bundles: A Framework to Optimise Topic Analysis in Real-Time Chat Discourse

Dunne, Jonathan; Malone, David; Penrose, Andrew

doi:10.1007/978-3-319-99504-5_6

Jonathan Dunne (ORCID: orcid.org/0000-0002-2303-7792),
David Malone &
Andrew Penrose

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11001)

Included in the following conference series:

International Conference on Collaboration and Technology

Abstract

Collaborative chat tools and large text corpora are ubiquitous in today’s world of real-time communication. As micro-teams and start-ups adopt such tools, there is a need to understand the meaning (even at a high level) of chat conversations within collaborative teams. In this study, we propose a technique for segmenting chat conversations that increases the number of words available for text mining (by 19% on average). Using an open source dataset, we examine whether having more words available for text mining produces more useful information for the end user. Our technique can help micro-teams and start-ups with limited resources to model their conversations efficiently, affording a higher degree of readability and comprehension.

Acknowledgements

The authors would like to personally thank the 24 individuals who took part in our topic modelling comprehension experiment.

Author information

Authors and Affiliations

Hamilton Institute, Maynooth University, Maynooth, Ireland
Jonathan Dunne & David Malone
IBM Technology Campus, Dublin, Ireland
Andrew Penrose

Corresponding author

Correspondence to Jonathan Dunne.

Editor information

Editors and Affiliations

Department of Computer Science, Faculty of Science and Technology, Universidade NOVA de Lisboa, Caparica, Portugal
Armanda Rodrigues
Universidade de Trás-os-Montes e Alto Douro, Vila Real, Portugal
Benjamim Fonseca
Department of Computer Science, Faculty of Science and Technology, Universidade NOVA de Lisboa, Caparica, Portugal
Nuno Preguiça

About this paper

Cite this paper

Dunne, J., Malone, D., Penrose, A. (2018). Bundles: A Framework to Optimise Topic Analysis in Real-Time Chat Discourse. In: Rodrigues, A., Fonseca, B., Preguiça, N. (eds) Collaboration and Technology. CRIWG 2018. Lecture Notes in Computer Science, vol 11001. Springer, Cham. https://doi.org/10.1007/978-3-319-99504-5_6

DOI: https://doi.org/10.1007/978-3-319-99504-5_6
Published: 08 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99503-8
Online ISBN: 978-3-319-99504-5
eBook Packages: Computer Science, Computer Science (R0)
