
Introduction

Chapter in: Text Data Mining

Abstract

Data mining technology has attracted considerable attention in recent years and has broad, important application prospects in the rapidly developing era of big data. Following the broad definition given in Han et al. (2012), data mining is the process of discovering interesting patterns and knowledge from large amounts of data. The data sources include databases, data warehouses, the web, other information repositories, and data streamed into the system dynamically. Because the technology was originally proposed for discovering and extracting useful knowledge from databases, data mining is also often called knowledge discovery in databases (KDD).

This book introduces methods and techniques for mining patterns and knowledge of interest to users from natural language texts, a task called text data mining, or text mining for short. The texts considered here include plain-text (TXT) files, Word (DOC/DOCX) files, PDF files, HTML files, and data files in any other format whose main content is natural language text.
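As a toy illustration of this scope (a sketch of our own, not a method taken from this book), the Python below treats an HTML file as a text source: it strips the markup with the standard library's html.parser and then counts word frequencies, about the simplest kind of "pattern" one can mine from text. The names TextExtractor, html_to_text, and top_terms are hypothetical. The tokenizer is a naive regular expression that keeps only ASCII letters, so it assumes English input; Chinese text, for example, would first need word segmentation.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is not inside a script or style element.
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Strip markup and return the page's text (entities are decoded by the parser)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def top_terms(text: str, k: int = 5, stopwords: frozenset = frozenset()) -> list:
    """Return the k most frequent words not in `stopwords`.

    Assumption: naive tokenizer that keeps lowercase ASCII letters only,
    i.e. English-style text; it does not segment Chinese.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(k)


if __name__ == "__main__":
    page = (
        "<html><head><style>p {color: red}</style></head>"
        "<body><p>Text mining finds patterns in text.</p>"
        "<p>Mining text at scale.</p></body></html>"
    )
    print(top_terms(html_to_text(page), k=2))  # [('text', 3), ('mining', 2)]
```

The same two-step shape (extract the text, then mine it) would carry over to the other formats listed above by swapping in a suitable reader for PDF or DOC/DOCX files.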

Notes

  1. https://www-nlpir.nist.gov/related_projects/muc/.

  2. https://baike.baidu.com/item/中国图书馆图书分类法/1919634?fr=aladdin (Baidu Baike entry on the Chinese Library Classification).

  3. https://www.sina.com.cn/.

References

  • Aggarwal, C. C. (2018). Machine learning for text. Berlin: Springer.

  • Cheng, X., & Zhu, Q. (2010). Text mining principles. Beijing: Science Press (in Chinese).

  • Han, J., Kamber, M., & Pei, J. (2012). Data mining: Concepts and techniques (3rd ed.). Burlington: Morgan Kaufmann.

  • Li, H. (2019). Statistical machine learning (2nd ed.). Beijing: Tsinghua University Press (in Chinese).

  • Li, X., Dong, Y., & Li, J. (2010b). Data mining and knowledge discovering. Beijing: Higher Education Press (in Chinese).

  • Liu, B. (2011). Web data mining: Exploring hyperlinks, contents, and usage data. Berlin: Springer.

  • Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1–167.

  • Liu, B. (2015). Sentiment analysis: Mining opinions, sentiments, and emotions. Cambridge: Cambridge University Press.

  • Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159–165.

  • Mani, I. (2001). Automatic summarization. Amsterdam: John Benjamins Publishing Co.

  • Mao, G., Duan, L., & Wang, S. (2007). Principles and algorithms on data mining. Beijing: Tsinghua University Press (in Chinese).

  • Marcu, D. (2000). The theory and practice of discourse parsing and summarization. Cambridge: MIT Press.

  • Petrov, S., & McDonald, R. (2012). Overview of the 2012 shared task on parsing the web. In Notes of the First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL).

  • Sarawagi, S. (2008). Information extraction. Foundations and Trends in Databases, 1(3), 261–377.

  • Wu, X., Kumar, V., Quinlan, J. R., Ghosh, J., Yang, Q., Motoda, H., et al. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1–37.

  • Yu, J. (2017). Machine learning: From axiom to algorithm. Beijing: Tsinghua University Press (in Chinese).

  • Zhang, X. (2016). Pattern recognition (3rd ed.). Beijing: Tsinghua University Press (in Chinese).

  • Zhang, Z. (2014). Research and implementation of sentiment analysis methods on Chinese Weibo. Master's thesis (in Chinese).

  • Zhou, Z. (2016). Machine learning. Beijing: Tsinghua University Press (in Chinese).

  • Zong, C. (2013). Statistical natural language processing (2nd ed.). Beijing: Tsinghua University Press (in Chinese).

Copyright information

© 2021 Tsinghua University Press

About this chapter

Cite this chapter

Zong, C., Xia, R., Zhang, J. (2021). Introduction. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_1

  • DOI: https://doi.org/10.1007/978-981-16-0100-2_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0099-9

  • Online ISBN: 978-981-16-0100-2

  • eBook Packages: Computer Science, Computer Science (R0)