Authorship Verification in the Absence of Explicit Features and Thresholds

  • Oren Halvani
  • Lukas Graner
  • Inna Vogel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)

Abstract

Enhancing information retrieval systems with the ability to take people's writing style into account opens the door to a number of applications. For example, one can link articles by authorship, which can help identify authors who generate hoaxes and deliberate misinformation in news stories distributed across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV addresses the task of judging whether two or more documents stem from the same author. The majority of existing AV approaches rely on machine learning concepts based on explicitly defined stylistic features and complex models that involve a fair number of parameters. Moreover, many existing AV methods are based on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach that derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each in a different language), we show that our approach yields results competitive with the current state of the art and other noteworthy AV baselines.
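To illustrate how a compressor can compare two texts without any explicitly defined stylistic features, the following minimal sketch computes the normalized compression distance (NCD) with Python's standard zlib module. This is an assumption chosen for illustration only: the abstract does not state which compressor or dissimilarity measure the proposed approach uses, and the function names `compressed_size` and `ncd` are hypothetical. How the approach derives its per-case threshold is not described in the abstract and is not modeled here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Illustrative stand-in, not the measure from the paper. Lower values
    mean the compressor finds more shared structure between x and y.
    No stylistic features are extracted by hand.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Toy inputs; real verification cases would use full documents.
    known = "the quick brown fox jumps over the lazy dog " * 20
    other = "pack my box with five dozen liquor jugs " * 20
    print("known vs. known:", ncd(known, known))
    print("known vs. other:", ncd(known, other))
```

A text compared with itself should score lower than a comparison between two unrelated texts. A verification method would still need a rule for turning such a score into an accept or reject decision.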

Keywords

One-class · Compression · Intrinsic authorship verification

Notes

Acknowledgments

This work was supported by the German Federal Ministry of Education and Research (BMBF) under the project “DORIAN” (Scrutinise and thwart disinformation). We would like to thank Christian Winter and Felix Mayer for their valuable reviews that helped to improve the quality of this paper.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Fraunhofer Institute for Secure Information Technology, Darmstadt, Germany