Weighting of Features by Sequential Selection

Feature Selection for Data and Pattern Recognition

Part of the book series: Studies in Computational Intelligence (SCI, volume 584)

Abstract

Constructing a set of characteristic features for supervised classification can be regarded as a preliminary task, merely a step on the way to the intended goal. Yet because of its bearing on the outcome, its difficulty, and the computational costs involved, the problem has grown over time into a field of intense study in its own right. Features can be chosen with the help of statistics, available expert domain knowledge, or specialised procedures. We can start from the set of all accessible features and reduce it backward, or examine features one by one and select them forward. In a wrapper model, sequential selection is conditioned by the performance of a classification system, and observations about the selected variables can lead to the assignment of weights and a ranking. The chapter illustrates weighting of features by sequential backward and forward selection for rule-based and connectionist classifiers employed in the stylometric task of authorship attribution.
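The idea of sequential selection in a wrapper model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the chapter: `evaluate` is a hypothetical stand-in for training and testing a classifier on a feature subset, the feature names and `TOY_GAIN` values are invented, and the mapping from rank to weight in `rank_weights` is one simple choice among many.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

# A wrapper evaluator: takes a feature subset, returns classifier performance.
Evaluator = Callable[[FrozenSet[str]], float]


def forward_selection(features: Sequence[str], evaluate: Evaluator) -> List[str]:
    """Return features in the order a greedy forward search adds them."""
    selected: List[str] = []
    remaining = list(features)
    while remaining:
        # Try each candidate together with what is already selected.
        best = max(remaining, key=lambda f: evaluate(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(features: Sequence[str], evaluate: Evaluator) -> List[str]:
    """Return features in the order a greedy backward search discards them."""
    current = list(features)
    discarded: List[str] = []
    while len(current) > 1:
        # Discard the feature whose absence leaves the best-scoring subset.
        drop = max(current, key=lambda f: evaluate(frozenset(current) - {f}))
        current.remove(drop)
        discarded.append(drop)
    return discarded + current  # the last survivor is listed last


def rank_weights(most_to_least_important: Sequence[str]) -> Dict[str, float]:
    """Assumed weighting: linear in rank, 1.0 for the top feature."""
    n = len(most_to_least_important)
    return {f: (n - i) / n for i, f in enumerate(most_to_least_important)}


# Hypothetical contributions (in percentage points) of four stylometric
# markers; a real wrapper would train and test a classifier instead.
TOY_GAIN = {"the": 5, "and": 30, "but": 20, ",": 10}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    """Toy score: a 50-point baseline plus the gains of the chosen features."""
    return 50 + sum(TOY_GAIN[f] for f in subset)


forward_order = forward_selection(list(TOY_GAIN), toy_evaluate)
backward_order = backward_elimination(list(TOY_GAIN), toy_evaluate)

# Forward: added earlier means more important.
forward_weights = rank_weights(forward_order)
# Backward: discarded earlier means less important, so reverse the order.
backward_weights = rank_weights(list(reversed(backward_order)))

print(forward_weights)
print(backward_weights)
```

With this additive toy score the two directions agree on the ranking. With a real classifier they need not, because the value of a feature depends on which other features are present.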

Author information

Corresponding author

Correspondence to Urszula Stańczyk.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stańczyk, U. (2015). Weighting of Features by Sequential Selection. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_5

  • DOI: https://doi.org/10.1007/978-3-662-45620-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45619-4

  • Online ISBN: 978-3-662-45620-0

  • eBook Packages: Engineering, Engineering (R0)
