Topic Modelling vs Distant Supervision: A Comparative Evaluation Based on the Classification of Parliamentary Enquiries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11799)

Abstract

We investigate two approaches to text classification by categorising enquiries submitted to the House of Commons Library by elected Members of the UK Parliament. One is an unsupervised approach, namely topic modelling; the other is a supervised approach based on weakly labelled data, namely distant supervision. Models were trained on two types of feature sets: one based only on bag of words, and the other combining bag of words with structured metadata attached to enquiries. Our results show that topic modelling obtains superior performance on this task, and that incorporating structured metadata as learning features does not significantly improve model performance.

Supported by an EPSRC Impact Acceleration Account awarded to the University of Manchester.
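
As a rough illustration of the distant supervision idea summarised in the abstract, the following minimal sketch assigns weak labels to enquiries by keyword lookup and then trains a bag-of-words classifier on those labels. It is not the authors' pipeline: the seed keywords, the example enquiries, the helper names and the choice of multinomial Naive Bayes are all assumptions made for illustration. Only the use of bag-of-words features, weakly labelled data and a limit of two labels per enquiry comes from this page.

```python
from collections import Counter
import math

# Hypothetical seed keywords per topic (assumption: the real taxonomy is not shown here).
SEED_KEYWORDS = {
    "transport": {"rail", "bus", "road"},
    "health": {"nhs", "hospital", "gp"},
}

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (w.strip(".,;:?!()").lower() for w in text.split())
    return [t for t in tokens if t]

def bag_of_words(text):
    """Bag-of-words representation: token -> count."""
    return Counter(tokenize(text))

def weak_labels(text, max_labels=2):
    """Assign up to max_labels topics by counting seed-keyword hits."""
    tokens = tokenize(text)
    hits = {t: sum(tok in kws for tok in tokens) for t, kws in SEED_KEYWORDS.items()}
    ranked = sorted((t for t, n in hits.items() if n > 0), key=lambda t: (-hits[t], t))
    return ranked[:max_labels]

def train_naive_bayes(texts):
    """Fit multinomial Naive Bayes counts using the weak labels as targets."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text in texts:
        bag = bag_of_words(text)
        for label in weak_labels(text):
            label_counts[label] += 1
            word_counts.setdefault(label, Counter()).update(bag)
            vocab.update(bag)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the most probable label, using Laplace smoothing for unseen words."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in sorted(label_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word, n in bag_of_words(text).items():
            score += n * math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Made-up enquiries, for illustration only.
enquiries = [
    "Delays on the rail network and bus timetable changes",
    "Waiting times at the local hospital and NHS funding",
]
model = train_naive_bayes(enquiries)
```

The trained model can label an enquiry that contains no seed keyword at all, for example `predict(model, "timetable for the new train timetable")`, because it generalises from words that co-occurred with the seeds in the weakly labelled training text.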

Notes

  1. There are eight research sections in the Library: Business and Transport Section (BTS), Economic Policy and Statistics (EPAS), Home Affairs Section (HAS), International Affairs and Defence Section (IADS), Parliament and Constitution Centre (PCC), Science and Environment Section (SES), Social and General Statistics (SGS), Social Policy Section (SPS).

  2. https://www.parliament.uk/topics/topical-issues.htm

  3. http://www.structuraltopicmodel.com/

  4. http://topepo.github.io/caret/index.html

  5. Each enquiry was assigned at most two labels or topics from the taxonomy.

  6. The political affiliation of the MP’s office at the time they submitted an enquiry, which was obtained from the UK Parliament’s data platform using the pdpr R package: https://github.com/olihawkins/pdpr

References

  1. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  2. Joachims, T.: Text categorization with Support Vector Machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) Machine Learning: ECML 1998. LNCS, pp. 137–142. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0026683

  3. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, ACL 2009, pp. 1003–1011. Association for Computational Linguistics, Stroudsburg (2009)

  4. Roberts, M.E., Stewart, B.M., Tingley, D., et al.: STM: R package for structural topic models. J. Stat. Softw. 10, 1–40 (2014)

Author information

Correspondence to Riza Batista-Navarro.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Batista-Navarro, R., Hawkins, O. (2019). Topic Modelling vs Distant Supervision: A Comparative Evaluation Based on the Classification of Parliamentary Enquiries. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham. https://doi.org/10.1007/978-3-030-30760-8_46

  • DOI: https://doi.org/10.1007/978-3-030-30760-8_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30759-2

  • Online ISBN: 978-3-030-30760-8

  • eBook Packages: Computer Science, Computer Science (R0)
