Assortative Mixture of English Parts of Speech

  • Conference paper
Complex Networks & Their Applications VI (COMPLEX NETWORKS 2017)

Part of the book series: Studies in Computational Intelligence ((SCI,volume 689))

Abstract

Network data analysis is an emerging area of study that applies quantitative analysis to complex data from a variety of application fields. Its methods enable the visualization of relational data as graphs and yield descriptive characteristics and predictive graph models. This paper applies network data analysis to the authorship attribution problem. Specifically, we show how representing a text as a word graph produces the well-documented feature sets used in authorship attribution tasks, such as the word frequency model and the part-of-speech (POS) bigram model. Analysis of these models, along with word graph characteristics, provides insights into the English language. In particular, analysis of the nominal assortative mixture of parts of speech, a statistic that measures the tendency of words of the same POS to be connected by an edge in the word network, reveals regular structural properties of English grammar.
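
As a rough illustration of the statistic described in the abstract, the sketch below computes an assortativity coefficient for a categorical vertex attribute on a toy word graph. It assumes the statistic is Newman's coefficient for nominal attributes and that the word graph is undirected; the paper's exact construction may differ. The sentence, its hand-assigned POS tags, and the function name `nominal_assortativity` are illustrative assumptions, not the authors' data or code.

```python
from collections import Counter


def nominal_assortativity(edges, label):
    """Assortativity coefficient for a categorical vertex attribute.

    edges: iterable of (u, v) pairs, treated as undirected.
    label: dict mapping each vertex to its category (here, a POS tag).
    Returns r = (trace(e) - sum(a_i^2)) / (1 - sum(a_i^2)).
    """
    # Count every edge in both directions so the mixing matrix is symmetric.
    pair_counts = Counter()
    for u, v in edges:
        pair_counts[(label[u], label[v])] += 1
        pair_counts[(label[v], label[u])] += 1

    total = sum(pair_counts.values())
    categories = {c for pair in pair_counts for c in pair}

    # Fraction of edge ends that join two vertices of the same category.
    trace = sum(pair_counts[(c, c)] for c in categories) / total
    # a[c]: fraction of edge ends attached to category c.
    a = {c: sum(pair_counts[(c, d)] for d in categories) / total
         for c in categories}
    expected = sum(share ** 2 for share in a.values())

    if expected == 1:
        raise ValueError("coefficient is undefined when all vertices share one category")
    return (trace - expected) / (1 - expected)


# Toy example (assumed tags): "the cat sat on the mat"
pos = {"the": "DET", "cat": "NOUN", "sat": "VERB", "on": "ADP", "mat": "NOUN"}
toy_edges = [("the", "cat"), ("cat", "sat"), ("sat", "on"),
             ("on", "the"), ("the", "mat")]
r_toy = nominal_assortativity(toy_edges, pos)

# Two disconnected same-category pairs give the maximum value of 1.
r_perfect = nominal_assortativity(
    [("a", "b"), ("c", "d")],
    {"a": "NOUN", "b": "NOUN", "c": "VERB", "d": "VERB"},
)
```

In the toy sentence no edge joins two words with the same tag, so `r_toy` is negative (disassortative), while `r_perfect` equals 1. A single sentence says nothing about English as a whole; it only shows how the coefficient responds to same-POS and cross-POS edges.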

Notes

  1. To calculate vertex degree, words that end a sentence must be connected to a dummy end vertex.

  2. Excluding symbols and list item markers.

  3. Consider the French use of the articles le and la.
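
One possible reading of Note 1: without a dummy end vertex, a sentence-final word gains no outgoing edge for that occurrence, so its degree would undercount how often it appears. The following is a minimal sketch under that reading, assuming a directed word graph whose edges link each word to the next word in the same sentence. The `END` marker, the function names, and the sample sentences are hypothetical and not taken from the paper.

```python
from collections import Counter

END = "<END>"  # hypothetical label for the dummy end vertex


def word_graph_edges(sentences):
    """Count directed edges (word -> next word) over tokenized sentences."""
    edges = Counter()
    for tokens in sentences:
        tokens = list(tokens)
        # Pair each word with its successor; the last word is paired with END.
        for current, following in zip(tokens, tokens[1:] + [END]):
            edges[(current, following)] += 1
    return edges


def out_degree(edges):
    """Weighted out-degree of every vertex that has at least one outgoing edge."""
    degree = Counter()
    for (source, _target), weight in edges.items():
        degree[source] += weight
    return degree


sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["dogs", "bark"]]
edges = word_graph_edges(sentences)
degrees = out_degree(edges)
frequencies = Counter(word for tokens in sentences for word in tokens)
```

With the dummy vertex in place, every occurrence of a word contributes exactly one outgoing edge, so `degrees` matches `frequencies` in this sketch. This is how a word frequency model can be read off such a graph.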

Author information

Corresponding author

Correspondence to Natallia V. Katenka.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Leonard, T., Hamel, L., Daniels, N.M., Katenka, N.V. (2018). Assortative Mixture of English Parts of Speech. In: Cherifi, C., Cherifi, H., Karsai, M., Musolesi, M. (eds) Complex Networks & Their Applications VI. COMPLEX NETWORKS 2017. Studies in Computational Intelligence, vol 689. Springer, Cham. https://doi.org/10.1007/978-3-319-72150-7_38

  • DOI: https://doi.org/10.1007/978-3-319-72150-7_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72149-1

  • Online ISBN: 978-3-319-72150-7

  • eBook Packages: Engineering, Engineering (R0)
