Ranking-Based Rule Classifier Optimisation

Stańczyk, Urszula

doi:10.1007/978-3-319-67588-6_7

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 138)

Abstract

Ranking is a widely used strategy for estimating the relevance or importance of available characteristic features. Depending on the applied methodology, variables are assessed individually or as subsets, using statistics grounded in information theory, machine learning algorithms, or specialised procedures that search the feature space systematically. Information about the importance of attributes can be used in pre-processing, during initial data preparation, to remove irrelevant or superfluous elements. It can also be employed in post-processing, to optimise classifiers that have already been constructed. The chapter describes research on the latter approach, in which inferred decision rules are filtered by exploiting the ranking positions and scores of features. The optimised rule classifiers were applied to stylometric analysis of texts, for the task of binary authorship attribution.
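
The post-processing idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's procedure, which the preview does not show; it only assumes that a rule is kept when every attribute in its conditions lies among the top-ranked features. The `Rule` structure, the function `filter_rules_by_ranking`, the parameter `top_k`, and the sample ranking and rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical decision rule: IF all conditions hold THEN decision."""
    conditions: dict  # attribute name -> required (discretised) value
    decision: str     # predicted class label
    support: int      # number of training samples matching the rule


def filter_rules_by_ranking(rules, ranking, top_k):
    """Keep rules whose condition attributes all belong to the top_k ranked ones.

    `ranking` lists attribute names from most to least important.
    This selection criterion is an assumption made for illustration.
    """
    allowed = set(ranking[:top_k])
    return [rule for rule in rules if set(rule.conditions) <= allowed]


# Invented ranking of stylometric features, most important first.
ranking = ["and", "but", "comma", "that", "semicolon"]

# Invented rules for a two-author attribution task.
rules = [
    Rule({"and": "high"}, "author_A", 12),
    Rule({"and": "low", "comma": "high"}, "author_B", 9),
    Rule({"that": "high", "semicolon": "low"}, "author_A", 4),
]

kept = filter_rules_by_ranking(rules, ranking, top_k=3)
for rule in kept:
    print(rule.conditions, "->", rule.decision)
```

With `top_k=3` the third rule is dropped, because it refers to attributes ranked fourth and fifth. Raising `top_k` to the full ranking length returns the original rule set unchanged.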

Notes

1. The works are available for online reading and download in various e-book formats thanks to Project Gutenberg (see http://www.gutenberg.org).

References

Ahonen, H., Heinonen, O., Klemettinen, M., Verkamo, A.: Applying data mining techniques in text analysis. Technical report C-1997-23, Department of Computer Science, University of Helsinki, Finland (1997)
Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)
Argamon, S., Karlgren, J., Shanahan, J.: Stylistic analysis of text for information access. In: Proceedings of the 28th International ACM Conference on Research and Development in Information Retrieval, Brazil (2005)
Baayen, H., van Halteren, H., Tweedie, F.: Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)
Baron, G.: Comparison of cross-validation and test sets approaches to evaluation of classifiers in authorship attribution domain. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences, Communications in Computer and Information Science, vol. 659, pp. 81–89. Springer, Cracow (2016)
Bayardo Jr., R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)
Biesiada, J., Duch, W., Kachel, A., Pałucha, S.: Feature ranking methods based on information entropy with Parzen windows. In: Proceedings of International Conference on Research in Electrotechnology and Applied Informatics, pp. 109–119, Katowice (2005)
Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1, 131–156 (1997)
Düntsch, I., Gediga, G.: Rough Set Data Analysis: A Road to Noninvasive Knowledge Discovery. Methoδos Publishers, Bangor (2000)
Fiesler, E., Beale, R.: Handbook of Neural Computation. Oxford University Press, Oxford (1997)
Greco, S., Matarazzo, B., Słowiński, R.: The use of rough sets and fuzzy sets in multi criteria decision making. In: Gal, T., Hanne, T., Stewart, T. (eds.) Advances in Multiple Criteria Decision Making, Chap. 14, pp. 14.1–14.59. Kluwer Academic Publishers, Dordrecht (1999)
Greco, S., Matarazzo, B., Słowiński, R.: Rough set theory for multicriteria decision analysis. Eur. J. Oper. Res. 129(1), 1–47 (2001)
Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach as a proper way of handling graduality in rough set theory. Trans. Rough Sets VII 4400, 36–52 (2007)
Greco, S., Słowiński, R., Stefanowski, J.: Evaluating importance of conditions in the set of discovered rules. Lect. Notes Artif. Intell. 4482, 314–321 (2007)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Holte, R.: Very simple classification rules perform well on most commonly used datasets. Mach. Learn. 11, 63–91 (1993)
Jensen, R., Shen, Q.: Computational Intelligence and Feature Selection. Wiley, Hoboken (2008)
Jockers, M., Witten, D.: A comparative study of machine learning methods for authorship attribution. Lit. Linguist. Comput. 25(2), 215–223 (2010)
John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Machine Learning: Proceedings of the 11th International Conference, pp. 121–129. Morgan Kaufmann Publishers (1994)
Khmelev, D., Tweedie, F.: Using Markov chains for identification of writers. Lit. Linguist. Comput. 16(4), 299–307 (2001)
Koppel, M., Argamon, S., Shimoni, A.: Automatically categorizing written texts by author gender. Lit. Linguist. Comput. 17(4), 401–412 (2002)
Liu, H., Motoda, H.: Computational Methods of Feature Selection. Chapman & Hall/CRC, Boca Raton (2008)
Lynam, T., Clarke, C., Cormack, G.: Information extraction with term frequencies. In: Proceedings of the Human Language Technology Conference, pp. 1–4. San Diego (2001)
Moshkov, M., Piliszczuk, M., Zielosko, B.: On partial covers, reducts and decision rules with weights. Trans. Rough Sets VI 4374, 211–246 (2006)
Pawlak, Z.: Computing, artificial intelligence and information technology: rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 136, 181–189 (2002)
Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)
Peng, R.: Statistical aspects of literary style. Bachelor’s thesis, Yale University (1999)
Peng, R., Hengartner, H.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)
Shen, Q.: Rough feature selection for intelligent classifiers. Trans. Rough Sets VII 4400, 244–255 (2006)
Sikora, M.: Rule quality measures in creation and reduction of data rule models. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Słowiński, R. (eds.) Rough Sets and Current Trends in Computing, Lecture Notes in Computer Science, vol. 4259, pp. 716–725. Springer, Berlin (2006)
Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. Lect. Notes Comput. Sci. (Lect. Notes Artif. Intell.) 4585, 5–11 (2007)
Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. 60(3), 538–556 (2009)
Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions 3, Advances in Intelligent and Soft Computing, vol. 242, pp. 475–483. Springer, Berlin (2013)
Stańczyk, U.: Attribute ranking driven filtering of decision rules. In: Kryszkiewicz, M., Cornelis, C., Ciucci, D., Medina-Moreno, J., Motoda, H., Raś, Z. (eds.) Rough Sets and Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol. 8537, pp. 217–224. Springer, Berlin (2014)
Stańczyk, U.: Feature evaluation by filter, wrapper and embedded approaches. In: Stańczyk, U., Jain, L. (eds.) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol. 584, pp. 29–44. Springer, Berlin (2015)
Stańczyk, U.: Selection of decision rules based on attribute ranking. J. Intell. Fuzzy Syst. 29(2), 899–915 (2015)
Stańczyk, U.: The class imbalance problem in construction of training datasets for authorship attribution. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 535–547. Springer, Berlin (2016)
Stańczyk, U.: Weighting and pruning of decision rules by attributes and attribute rankings. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences, Communications in Computer and Information Science, vol. 659, pp. 106–114. Springer, Cracow (2016)
Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, San Francisco (2011)
Wróbel, L., Sikora, M., Michalak, M.: Rule quality measures settings in classification, regression and survival rule induction – an empirical approach. Fundamenta Informaticae 149, 419–449 (2016)
Zielosko, B.: Application of dynamic programming approach to optimization of association rules relative to coverage and length. Fundamenta Informaticae 148(1–2), 87–105 (2016)
Zielosko, B.: Optimization of decision rules relative to coverage–comparison of greedy and modified dynamic programming approaches. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 639–650. Springer, Berlin (2016)

Acknowledgements

The research used the WEKA workbench [40]. The 4eMka software used for DRSA processing [32] was developed at the Laboratory of Intelligent Decision Support Systems, Poznań, Poland. The research was performed at the Silesian University of Technology, Gliwice, within the project BK/RAu2/2017.

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland
Urszula Stańczyk

Corresponding author

Correspondence to Urszula Stańczyk.

Editor information

Editors and Affiliations

Silesian University of Technology, Gliwice, Poland
Urszula Stańczyk
University of Silesia in Katowice, Katowice, Poland
Beata Zielosko
University of Bournemouth, Poole, United Kingdom
Lakhmi C. Jain

About this chapter

Cite this chapter

Stańczyk, U. (2018). Ranking-Based Rule Classifier Optimisation. In: Stańczyk, U., Zielosko, B., Jain, L. (eds) Advances in Feature Selection for Data and Pattern Recognition. Intelligent Systems Reference Library, vol 138. Springer, Cham. https://doi.org/10.1007/978-3-319-67588-6_7

DOI: https://doi.org/10.1007/978-3-319-67588-6_7
Published: 17 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67587-9
Online ISBN: 978-3-319-67588-6
eBook Packages: Engineering, Engineering (R0)
