
Recent Advances in Data Mining for Categorizing Text Records

Chapter

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

In highly competitive markets, almost every business organization needs an effective coordination and decision-support tool that lets it operate as a predictive enterprise in daily practice, directing, optimizing, and automating specific decision-making processes. Improved decision support helps people examine data on past circumstances and present events and project future actions, which continually improves the quality of products and services. This improvement has been driven by recent advances in digital data collection and storage technology, which have produced massive, rapidly growing databases, sometimes called data avalanches. Such databases arise in many applications, including the service industry, global supply chain organizations, air traffic control, nuclear reactors, aircraft fly-by-wire systems, real-time sensor networks, industrial process control, hospital healthcare, and security systems. On the one hand, massive data, especially text records, may contain a great wealth of knowledge and information. On the other hand, some of that information may be unreliable because of the many sources of uncertainty in a changing environment. Moreover, manually classifying thousands of text records by content is demanding and overwhelming. Over the past decade, data mining has attracted considerable attention from researchers and practitioners as an emerging research area for finding meaningful patterns that make sense of massive data sets.
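The abstract motivates replacing manual classification of text records with automatic categorization but does not state which method the chapter uses. As a minimal illustration only, the sketch below uses a multinomial naive Bayes classifier with add-one (Laplace) smoothing. This is a swapped-in technique chosen for brevity, not the authors' approach. The tokenizer, the class name NaiveBayesTextClassifier, the category labels, and the sample records are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical crude tokenizer: lowercase, replace non-alphanumerics with spaces, split.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def fit(self, records, labels):
        self.class_counts = Counter(labels)          # number of records per category
        self.word_counts = defaultdict(Counter)      # per-category word frequencies
        self.vocab = set()
        for text, label in zip(records, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_records = len(labels)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior of the category.
            score = math.log(count / self.n_records)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                # Smoothed log likelihood of the word given the category.
                num = self.word_counts[label][tok] + 1
                score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage with hypothetical text records and categories.
records = [
    "engine failure reported during takeoff",
    "hydraulic pump leak found during inspection",
    "passenger complained about late meal service",
    "flight attendant was rude to passenger",
]
labels = ["maintenance", "maintenance", "service", "service"]

clf = NaiveBayesTextClassifier().fit(records, labels)
print(clf.predict("pump failure during inspection"))  # maintenance
print(clf.predict("rude passenger"))                  # service
```

Working in log space avoids numerical underflow when a record contains many words. Add-one smoothing keeps a word that is absent from one category from forcing that category's probability to zero. In practice, support vector machines, such as those in the text-categorization work listed among the chapter's references, are a common stronger alternative.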

Copyright information

© 2008 Springer London

About this chapter

Cite this chapter

Chaovalitwongse, W., Pham, H., Hwang, S., Liang, Z., Pham, C. (2008). Recent Advances in Data Mining for Categorizing Text Records. In: Pham, H. (eds) Recent Advances in Reliability and Quality in Design. Springer Series in Reliability Engineering. Springer, London. https://doi.org/10.1007/978-1-84800-113-8_21

  • DOI: https://doi.org/10.1007/978-1-84800-113-8_21

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-112-1

  • Online ISBN: 978-1-84800-113-8

  • eBook Packages: Engineering, Engineering (R0)