Abstract
Text categorization is a text mining task in which useful information is extracted from unstructured documents and the documents are then assigned to categories. In this paper, we classify the TechTC dataset, which was collected from various Web directories. We employed feature selection methods such as the Gini index, chi-square, t-statistic, and correlation, which drastically reduced the model building time. Neural network models such as the probabilistic neural network, group method of data handling, and multilayer perceptron yielded higher accuracies than other techniques reported in the literature.
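As an illustration of one of the feature selection methods named above, the following is a minimal sketch of chi-square term scoring for a two-class problem. It assumes binary term presence per document and uses the standard 2x2 contingency form of the statistic; the function names `chi_square` and `select_top_k` and the toy documents are hypothetical and are not taken from the paper, whose exact procedure is not described in the abstract.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        # A term present in all or none of the documents carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(docs, labels, k):
    """Rank terms by chi-square score and keep the k highest.

    docs:   list of token lists (illustrative input format, an assumption)
    labels: list of 0/1 class labels, one per document
    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y == 1 and term in d)
        n10 = sum(1 for d, y in zip(docs, labels) if y == 0 and term in d)
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy corpus, invented purely for illustration.
toy_docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "team"],
    ["market", "price", "stock"],
]
toy_labels = [1, 1, 0, 0]
top_terms, term_scores = select_top_k(toy_docs, toy_labels, 3)
```

Keeping only the highest-scoring terms shrinks the input dimensionality before a classifier is trained, which is the mechanism by which feature selection can shorten model building time.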
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Shravankumar, B., Ravi, V. (2015). Text Classification Using Ensemble Features Selection and Data Mining Techniques. In: Panigrahi, B., Suganthan, P., Das, S. (eds) Swarm, Evolutionary, and Memetic Computing. SEMCCO 2014. Lecture Notes in Computer Science, vol 8947. Springer, Cham. https://doi.org/10.1007/978-3-319-20294-5_16
DOI: https://doi.org/10.1007/978-3-319-20294-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20293-8
Online ISBN: 978-3-319-20294-5
eBook Packages: Computer Science, Computer Science (R0)