Abstract
The analysis of textual writing styles is a well-studied problem with active research in fields such as authorship attribution, author profiling, text segmentation, and plagiarism detection. While many features have been proposed and shown to be effective for characterizing authors or document types in terms of high-dimensional feature vectors, an intuitive, human-friendly view of the computed data is often lacking. For example, machine learning algorithms can use those features to attribute previously unseen documents to one of a set of known authors, but a visualization of the most discriminating features is usually not provided. To this end, we present StyleExplorer, a freely available web tool that extracts textual features from documents and visualizes them in multiple variants. Besides analyzing single documents intrinsically, it can also visually compare multiple documents in a single view with respect to selected metrics, making it a valuable analysis tool for various tasks in natural language processing as well as for areas of the humanities that work with and analyze textual data.
Notes
1. Available at https://dbis-styleexplorer.uibk.ac.at, [Review login: ecir2019/ecir2019].
2. https://www.meteor.com, visited October 2018.
3. https://reactjs.org, visited October 2018.
4. And also be downloaded in JSON format for individual further postprocessing.
5. Utilizing Highcharts, https://www.highcharts.com, visited October 2018.
6. E.g., recent and popular techniques like word2vec [12].
References
1. Gamon, M.: Linguistic correlates of style: authorship classification with deep linguistic analysis features. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING), p. 611. ACL (2004)
2. Gibbons, J.: Forensic Linguistics: An Introduction to Language in the Justice System. Wiley-Blackwell, Hoboken (2003)
3. Misra, H., et al.: Text segmentation: a topic modeling perspective. Inf. Process. Manage. 47(4), 528–544 (2011)
4. Huber, B.: Evaluation of Style Features of Text Documents, Bachelor thesis. Department of Computer Science, Universität Innsbruck (2016)
5. Koppel, M., Schler, J.: Exploiting stylistic idiosyncrasies for authorship attribution. In: Proceedings of the 18th International Joint Conference on AI, vol. 69, pp. 72–80 (2003)
6. Potthast, M., et al.: Overview of the 5th international competition on plagiarism detection. In: Notebook Papers of the 9th PAN Evaluation Lab (2013)
7. Mosteller, F., Wallace, D.: Inference and Disputed Authorship: The Federalist. Addison-Wesley, Boston (1964)
8. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of the 4th author profiling task at PAN 2016. In: Working Notes Papers of the CLEF 2016 Evaluation Labs, vol. 1609 (2016)
9. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. 60(3), 538–556 (2009). https://doi.org/10.1002/asi.v60:3
10. Stamatatos, E.: Intrinsic plagiarism detection using character n-gram profiles. In: Notebook Papers of the 5th PAN Evaluation Lab (2011)
11. Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Lang. Resour. Eval. 45(1), 63–82 (2011)
12. Mikolov, T., et al.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
13. Tschuggnall, M., Specht, G.: Using grammar-profiles to intrinsically expose plagiarism in text documents. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds.) NLDB 2013. LNCS, vol. 7934, pp. 297–302. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38824-8_28
14. Zheng, R., Li, J., Chen, H., Huang, Z.: A framework for authorship identification of online messages: writing-style features and classification techniques. J. Am. Soc. Inf. Sci. Technol. 57(3), 378–393 (2006)
15. Eissen, S.M., Stein, B.: Intrinsic plagiarism detection. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds.) ECIR 2006. LNCS, vol. 3936, pp. 565–569. Springer, Heidelberg (2006). https://doi.org/10.1007/11735106_66
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tschuggnall, M., Gerrier, T., Specht, G. (2019). StyleExplorer: A Toolkit for Textual Writing Style Visualization. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_28
DOI: https://doi.org/10.1007/978-3-030-15719-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15718-0
Online ISBN: 978-3-030-15719-7
eBook Packages: Computer Science (R0)