Abstract
Data mining has attracted much attention in recent years and has broad prospects for application in the rapidly developing era of big data. According to the broad definition given by Han et al. (Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann, 2012), data mining is the process of discovering interesting patterns and knowledge from large amounts of data. The data sources include databases, data warehouses, the web, other information repositories, and data that flow into the system dynamically. Because the technology was originally proposed for discovering and extracting useful knowledge from databases, it is also commonly referred to as knowledge discovery in databases (KDD).
This book introduces methods and techniques for mining patterns and knowledge of interest to users from natural language texts. This technology is called text data mining, often shortened to text mining. The text referred to here includes plain TXT files, doc/docx files, PDF files, HTML files, and data files in any other format whose main content is natural language text.
References
Aggarwal, C. C. (2018). Machine learning for text. Berlin: Springer.
Cheng, X., & Zhu, Q. (2010). Text mining principles. Beijing: Science Press (in Chinese).
Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques (3rd ed.). Burlington: Morgan Kaufmann.
Mani, I. (2001). Automatic summarization. Amsterdam: John Benjamins Publishing Co.
Li, H. (2019). Statistical machine learning (2nd ed.). Beijing: Tsinghua University Press (in Chinese).
Li, X., Dong, Y., & Li, J. (2010b). Data mining and knowledge discovering. Beijing: Higher Education Press (in Chinese).
Liu, B. (2011). Web data mining: Exploring hyperlinks, contents, and usage data. Berlin: Springer.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.
Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge: Cambridge University Press.
Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159–165.
Mao, G., Duan, L., & Wang, S. (2007). Principles and algorithms on data mining. Beijing: Tsinghua University Press (in Chinese).
Marcu, D. (2000). The theory and practice of discourse parsing and summarization. Cambridge: MIT Press.
Petrov, S., & McDonald, R. (2012). Overview of the 2012 shared task on parsing the web. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).
Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.
Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37.
Yu, J. (2017). Machine learning: From axiom to algorithm. Beijing: Tsinghua University Press (in Chinese).
Zhang, X. (2016). Pattern recognition (3rd ed.). Beijing: Tsinghua University Press (in Chinese).
Zhang, Z. (2014). Research and implementation of sentiment analysis methods on Chinese weibo. Master's thesis (in Chinese).
Zhou, Z. (2016). Machine learning. Beijing: Tsinghua University Press (in Chinese).
Zong, C. (2013). Statistical natural language processing (2nd ed.). Beijing: Tsinghua University Press (in Chinese).
Copyright information
© 2021 Tsinghua University Press
About this chapter
Cite this chapter
Zong, C., Xia, R., Zhang, J. (2021). Introduction. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_1
DOI: https://doi.org/10.1007/978-981-16-0100-2_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0099-9
Online ISBN: 978-981-16-0100-2
eBook Packages: Computer Science, Computer Science (R0)