Ranking-Based Rule Classifier Optimisation

Advances in Feature Selection for Data and Pattern Recognition

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 138)

Abstract

Ranking is a widely used strategy for estimating the relevance or importance of available characteristic features. Depending on the applied methodology, variables are assessed individually or as subsets: by statistics drawn from information theory, by machine learning algorithms, or by specialised procedures that perform a systematic search through the feature space. Information about the importance of attributes can be used in pre-processing, during initial data preparation, to remove irrelevant or superfluous elements. It can also be employed in post-processing, to optimise already constructed classifiers. The chapter describes research on the latter approach, which involves filtering inferred decision rules by exploiting the ranking positions and scores of features. The optimised rule classifiers were applied in stylometric analysis of texts, to the task of binary authorship attribution.
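The post-processing idea summarised above can be illustrated with a minimal Python sketch. It assumes a hypothetical rule representation (`Rule`: a dict of attribute-value conditions plus a decision), uses information gain as a stand-in ranking measure, and shows two illustrative filtering criteria: keeping only rules whose condition attributes all fall within the top-k ranking positions, and keeping rules whose lowest-scored condition attribute reaches a score threshold. All names, the toy data, and both criteria are assumptions made for illustration, not the exact procedure studied in the chapter.

```python
from collections import Counter
from dataclasses import dataclass
from math import log2


@dataclass
class Rule:
    """Hypothetical rule: IF all `conditions` (attribute -> value) hold THEN `decision`."""
    conditions: dict
    decision: str


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Drop in class entropy after partitioning the rows by the value of `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, attrs):
    """Return (attribute, score) pairs, most relevant first."""
    scored = [(a, information_gain(rows, labels, a)) for a in attrs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def filter_by_position(rules, ranking, k):
    """Keep rules whose condition attributes all lie within the top-k ranking positions."""
    top = {a for a, _ in ranking[:k]}
    return [r for r in rules if set(r.conditions) <= top]


def filter_by_score(rules, ranking, threshold):
    """Keep rules whose lowest-scored condition attribute reaches `threshold`.

    Assumption: a rule without conditions is treated as scoring 0.0.
    """
    score = dict(ranking)
    return [r for r in rules
            if min((score[a] for a in r.conditions), default=0.0) >= threshold]


# Toy data (invented): discretised frequencies of three function words, two authors.
rows = [
    {"but": "hi", "and": "lo", "not": "lo"},
    {"but": "hi", "and": "hi", "not": "lo"},
    {"but": "lo", "and": "lo", "not": "hi"},
    {"but": "lo", "and": "hi", "not": "lo"},
]
labels = ["A", "A", "B", "B"]
rules = [
    Rule({"but": "hi"}, "A"),
    Rule({"but": "lo", "not": "hi"}, "B"),
    Rule({"and": "hi"}, "B"),
]

ranking = rank_attributes(rows, labels, ["but", "and", "not"])
print([a for a, _ in ranking])                              # ['but', 'not', 'and']
print(len(filter_by_position(rules, ranking, k=1)))         # 1
print(len(filter_by_score(rules, ranking, threshold=0.3)))  # 2
```

Cutting by position bounds which attributes the retained rules may reference, while a score threshold follows the gaps between attribute scores; which criterion suits a given rule set may depend on the ranking method and the data.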

Notes

  1.

    The works are available for on-line reading and download in various e-book formats thanks to Project Gutenberg (see http://www.gutenberg.org).

References

  1. Ahonen, H., Heinonen, O., Klemettinen, M., Verkamo, A.: Applying data mining techniques in text analysis. Technical report C-1997-23, Department of Computer Science, University of Helsinki, Finland (1997)

  2. Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)

  3. Argamon, S., Karlgren, J., Shanahan, J.: Stylistic analysis of text for information access. In: Proceedings of the 28th International ACM Conference on Research and Development in Information Retrieval, Brazil (2005)

  4. Baayen, H., van Halteren, H., Tweedie, F.: Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)

  5. Baron, G.: Comparison of cross-validation and test sets approaches to evaluation of classifiers in authorship attribution domain. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences, Communications in Computer and Information Science, vol. 659, pp. 81–89. Springer, Cracow (2016)

  6. Bayardo Jr., R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)

  7. Biesiada, J., Duch, W., Kachel, A., Pałucha, S.: Feature ranking methods based on information entropy with Parzen windows. In: Proceedings of International Conference on Research in Electrotechnology and Applied Informatics, pp. 109–119, Katowice (2005)

  8. Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)

  9. Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1, 131–156 (1997)

  10. Düntsch, I., Gediga, G.: Rough Set Data Analysis: A Road to Noninvasive Knowledge Discovery. Methoδos Publishers, Bangor (2000)

  11. Fiesler, E., Beale, R.: Handbook of Neural Computation. Oxford University Press, Oxford (1997)

  12. Greco, S., Matarazzo, B., Słowiński, R.: The use of rough sets and fuzzy sets in multi criteria decision making. In: Gal, T., Hanne, T., Stewart, T. (eds.) Advances in Multiple Criteria Decision Making, Chap. 14, pp. 14.1–14.59. Kluwer Academic Publishers, Dordrecht (1999)

  13. Greco, S., Matarazzo, B., Słowiński, R.: Rough set theory for multicriteria decision analysis. Eur. J. Oper. Res. 129(1), 1–47 (2001)

  14. Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach as a proper way of handling graduality in rough set theory. Trans. Rough Sets VII 4400, 36–52 (2007)

  15. Greco, S., Słowiński, R., Stefanowski, J.: Evaluating importance of conditions in the set of discovered rules. Lect. Notes Artif. Intell. 4482, 314–321 (2007)

  16. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)

  17. Holte, R.: Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 11, 63–91 (1993)

  18. Jensen, R., Shen, Q.: Computational Intelligence and Feature Selection. Wiley, Hoboken (2008)

  19. Jockers, M., Witten, D.: A comparative study of machine learning methods for authorship attribution. Lit. Linguist. Comput. 25(2), 215–223 (2010)

  20. John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Machine Learning: Proceedings of the 11th International Conference, pp. 121–129. Morgan Kaufmann Publishers (1994)

  21. Khmelev, D., Tweedie, F.: Using Markov chains for identification of writers. Lit. Linguist. Comput. 16(4), 299–307 (2001)

  22. Koppel, M., Argamon, S., Shimoni, A.: Automatically categorizing written texts by author gender. Lit. Linguist. Comput. 17(4), 401–412 (2002)

  23. Liu, H., Motoda, H.: Computational Methods of Feature Selection. Chapman & Hall/CRC, Boca Raton (2008)

  24. Lynam, T., Clarke, C., Cormack, G.: Information extraction with term frequencies. In: Proceedings of the Human Language Technology Conference, pp. 1–4. San Diego (2001)

  25. Moshkov, M., Piliszczuk, M., Zielosko, B.: On partial covers, reducts and decision rules with weights. Trans. Rough Sets VI 4374, 211–246 (2006)

  26. Pawlak, Z.: Computing, artificial intelligence and information technology: rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 136, 181–189 (2002)

  27. Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)

  28. Peng, R.: Statistical aspects of literary style. Bachelor’s thesis, Yale University (1999)

  29. Peng, R., Hengartner, H.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)

  30. Shen, Q.: Rough feature selection for intelligent classifiers. Trans. Rough Sets VII 4400, 244–255 (2006)

  31. Sikora, M.: Rule quality measures in creation and reduction of data rule models. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Słowiński, R. (eds.) Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, vol. 4259, pp. 716–725. Springer, Berlin (2006)

  32. Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. Lect. Notes Comput. Sci. (Lect. Notes Artif. Intell.) 4585, 5–11 (2007)

  33. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. 60(3), 538–556 (2009)

  34. Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions 3, Advances in Intelligent and Soft Computing, vol. 242, pp. 475–483. Springer, Berlin (2013)

  35. Stańczyk, U.: Attribute ranking driven filtering of decision rules. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z. (eds.) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol. 8537, pp. 217–224. Springer, Berlin (2014)

  36. Stańczyk, U.: Feature evaluation by filter, wrapper and embedded approaches. In: Stańczyk, U., Jain, L. (eds.) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol. 584, pp. 29–44. Springer, Berlin (2015)

  37. Stańczyk, U.: Selection of decision rules based on attribute ranking. J. Intell. Fuzzy Syst. 29(2), 899–915 (2015)

  38. Stańczyk, U.: The class imbalance problem in construction of training datasets for authorship attribution. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 535–547. Springer, Berlin (2016)

  39. Stańczyk, U.: Weighting and pruning of decision rules by attributes and attribute rankings. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences, Communications in Computer and Information Science, vol. 659, pp. 106–114. Springer, Cracow (2016)

  40. Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, San Francisco (2011)

  41. Wróbel, L., Sikora, M., Michalak, M.: Rule quality measures settings in classification, regression and survival rule induction – an empirical approach. Fundamenta Informaticae 149, 419–449 (2016)

  42. Zielosko, B.: Application of dynamic programming approach to optimization of association rules relative to coverage and length. Fundamenta Informaticae 148(1–2), 87–105 (2016)

  43. Zielosko, B.: Optimization of decision rules relative to coverage–comparison of greedy and modified dynamic programming approaches. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 639–650. Springer, Berlin (2016)

Acknowledgements

The research used the WEKA workbench [40]. The 4eMka software, employed for DRSA (Dominance-based Rough Set Approach) processing [32], was developed at the Laboratory of Intelligent Decision Support Systems, Poznań, Poland. The research was performed at the Silesian University of Technology, Gliwice, within the project BK/RAu2/2017.

Author information

Correspondence to Urszula Stańczyk.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Stańczyk, U. (2018). Ranking-Based Rule Classifier Optimisation. In: Stańczyk, U., Zielosko, B., Jain, L. (eds) Advances in Feature Selection for Data and Pattern Recognition. Intelligent Systems Reference Library, vol 138. Springer, Cham. https://doi.org/10.1007/978-3-319-67588-6_7

  • DOI: https://doi.org/10.1007/978-3-319-67588-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67587-9

  • Online ISBN: 978-3-319-67588-6

  • eBook Packages: Engineering, Engineering (R0)
