
Weighted Voting and Meta-Learning for Combining Authorship Attribution Methods

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2018 (IDEAL 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11314)

Abstract

Our research concentrates on ways to combine machine learning techniques for authorship attribution. Traditionally, authorship attribution research has focused on developing new base-classifiers (combinations of stylometric features and learning methods). The many base-classifiers developed for authorship attribution vary in accuracy and often propose different authors for the same disputed document. In this research, we use the predictions of multiple base-classifiers as a knowledge base for learning the true author.

We introduce and compare two novel methods that combine multiple base-classifiers. In the Weighted Voting approach, each base-classifier supports an author in proportion to its accuracy in leave-one-out classification. In our Meta-Learning approach, each base-classifier is treated as a feature, and the base-classifiers' predictions in leave-one-out cross-validation serve as training data from which machine learning methods produce an aggregated decision.
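Both combination schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the leave-one-out predictions of every base-classifier on the training documents are already available as a table (one row per document, one column per base-classifier). Weighted Voting uses each column's leave-one-out accuracy as its vote weight. For Meta-Learning, the abstract does not name the learners used, so a categorical naive Bayes with Laplace smoothing is swapped in purely as an example meta-learner. All function names and the toy author labels "A", "B", "C" are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def loo_accuracies(loo_preds, labels):
    """Leave-one-out accuracy of each base-classifier (one column per classifier)."""
    n_docs = len(labels)
    n_clf = len(loo_preds[0])
    return [
        sum(loo_preds[i][j] == labels[i] for i in range(n_docs)) / n_docs
        for j in range(n_clf)
    ]


def weighted_vote(weights, disputed_preds):
    """Each base-classifier adds its weight to the author it predicts; highest total wins."""
    scores = defaultdict(float)
    for weight, author in zip(weights, disputed_preds):
        scores[author] += weight
    return max(scores, key=scores.get)


def train_meta_nb(loo_preds, labels, alpha=1.0):
    """Illustrative meta-learner: categorical naive Bayes over base-classifier predictions."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (classifier index, true author) -> predicted-author counts
    seen_values = defaultdict(set)         # classifier index -> authors it has predicted
    for row, author in zip(loo_preds, labels):
        for j, predicted in enumerate(row):
            feature_counts[(j, author)][predicted] += 1
            seen_values[j].add(predicted)
    return class_counts, feature_counts, seen_values, alpha


def predict_meta_nb(model, disputed_preds):
    """Return the author with the highest log posterior given the base-classifier predictions."""
    class_counts, feature_counts, seen_values, alpha = model
    total = sum(class_counts.values())
    best_author, best_score = None, float("-inf")
    for author, n_author in class_counts.items():
        score = log(n_author / total)
        for j, predicted in enumerate(disputed_preds):
            n_values = len(seen_values[j]) + 1  # +1 reserves smoothed mass for unseen predictions
            count = feature_counts.get((j, author), Counter())[predicted]
            score += log((count + alpha) / (n_author + alpha * n_values))
        if score > best_score:
            best_author, best_score = author, score
    return best_author


# Toy data (hypothetical): 5 training documents, 3 base-classifiers.
LABELS = ["A", "A", "B", "B", "C"]
LOO_PREDS = [
    ["A", "A", "B"],
    ["A", "B", "A"],
    ["B", "B", "B"],
    ["B", "A", "C"],
    ["C", "C", "A"],
]
DISPUTED = ["B", "A", "B"]  # base-classifier predictions for a disputed document

weights = loo_accuracies(LOO_PREDS, LABELS)  # accuracies 1.0, 0.6, 0.4
print(weighted_vote(weights, DISPUTED))      # B (1.4 vs 0.6)
print(predict_meta_nb(train_meta_nb(LOO_PREDS, LABELS), DISPUTED))  # B
```

In this sketch the vote weight is the raw leave-one-out accuracy, which is one direct reading of "in proportion to its accuracy". The meta-learner could be replaced by any classifier that accepts categorical features, because its only inputs are the rows of base-classifier predictions.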

We illustrate our results on a collection of 18th-century political writings. Anonymously written essays were common during this period, which has led to frequent disagreements among scholars over their attribution.

This research was partially supported by a generous grant from the Robert David Lion Gardiner Foundation to Iona College’s Institute for Thomas Paine Studies (ITPS).



Author information


Correspondence to Smiljana Petrovic.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Petrovic, S., Petrovic, I., Palesi, I., Calise, A. (2018). Weighted Voting and Meta-Learning for Combining Authorship Attribution Methods. In: Yin, H., Camacho, D., Novais, P., Tallón-Ballesteros, A. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2018. IDEAL 2018. Lecture Notes in Computer Science, vol 11314. Springer, Cham. https://doi.org/10.1007/978-3-030-03493-1_35


  • DOI: https://doi.org/10.1007/978-3-030-03493-1_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03492-4

  • Online ISBN: 978-3-030-03493-1

  • eBook Packages: Computer Science, Computer Science (R0)
