An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2011)

Abstract

This paper describes an investigation into the potential effectiveness of generating text classifiers from secondary data for the purpose of text summarisation. The application scenario assumes a questionnaire corpus for which we wish to summarise the nature of the free-text responses, but for which no suitable training data is available. The advocated approach is to build the required text summarisation classifiers from secondary data and then apply them to the primary data (the questionnaire free text) to produce the summary. We refer to this approach by the acronym CGUSD (Classifier Generation Using Secondary Data). The approach is evaluated using real questionnaire data obtained as part of the SAVSNET (Small Animal Veterinary Surveillance Network) project.
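The abstract does not say which classifier, features or summary format the authors used, so the following is only a minimal sketch of the CGUSD idea under stated assumptions. It assumes a multinomial Naive Bayes classifier over bag-of-words features (a swapped-in technique, not necessarily the authors' method) and assumes that a "summary" is simply the distribution of predicted class labels over the primary responses. The function names (`train_classifier`, `classify`, `summarise`), the class labels and the example texts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_classifier(labelled_docs):
    """Fit a multinomial Naive Bayes model (Laplace smoothing) on
    (text, label) pairs drawn from the SECONDARY data."""
    class_counts = Counter()            # documents per label
    word_counts = defaultdict(Counter)  # token frequencies per label
    vocab = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the most probable label for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token in vocab:  # skip words never seen in the secondary data
                score += math.log(
                    (word_counts[label][token] + 1) / (n_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def summarise(model, primary_docs):
    """Summarise the unlabelled PRIMARY corpus as counts of predicted labels."""
    return Counter(classify(model, doc) for doc in primary_docs)


# Hypothetical toy data: labelled secondary texts, unlabelled primary responses.
secondary = [
    ("dog vomiting after eating", "digestive"),
    ("cat has diarrhoea and is vomiting", "digestive"),
    ("itchy skin and hair loss", "skin"),
    ("red rash on skin", "skin"),
]
primary = ["owner reports vomiting", "skin looks red and itchy"]

model = train_classifier(secondary)
print(summarise(model, primary))  # Counter({'digestive': 1, 'skin': 1})
```

The key design point is that no labels are ever required for the primary (questionnaire) data: the model is fitted entirely on the secondary corpus and then only applied to the primary corpus, which is the scenario the abstract describes.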

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garcia-Constantino, M., Coenen, F., Noble, P.J., Radford, A., Setzkorn, C., Tierney, A. (2011). An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_29

  • DOI: https://doi.org/10.1007/978-3-642-23199-5_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science, Computer Science (R0)