An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data

Garcia-Constantino, Matias; Coenen, Frans; Noble, P. -J.; Radford, Alan; Setzkorn, Christian; Tierney, Aine

doi:10.1007/978-3-642-23199-5_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

An investigation into the potential effectiveness of generating text classifiers from secondary data for the purpose of text summarisation is described. The application scenario assumes a questionnaire corpus for which we wish to summarise the nature of the free-text element of the questionnaires, but for which no suitable training data is available. The advocated approach is to build the required text summarisation classifiers from secondary data and then apply them to the primary data to produce the summary. We refer to this approach using the acronym CGUSD (Classifier Generation Using Secondary Data). The approach is evaluated using real questionnaire data obtained as part of the SAVSNET (Small Animal Veterinary Surveillance Network) project.
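The two CGUSD steps (train on secondary data, then apply to primary data) can be illustrated with a minimal sketch. It assumes that the secondary data is a small set of labelled free-text snippets and that a "summary" of the primary questionnaire corpus can be expressed as the proportion of free-text responses assigned to each class. The classifier is a plain multinomial Naive Bayes model swapped in purely for illustration; it is not the classification technique used in the paper. The helper names `NaiveBayesTextClassifier` and `summarise`, the class labels and all example texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text):
    # Lower-case and keep alphabetic tokens only (no stemming or stop-word removal).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical stand-in classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        # Step 1 of the sketch: learn from the labelled *secondary* data.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenise(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        # Words never seen in the secondary data carry no evidence and are ignored.
        tokens = [t for t in tokenise(doc) if t in self.vocab]
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.n_docs)  # log prior
            for t in tokens:
                # Laplace-smoothed log likelihood of each token given the class.
                score += math.log((self.word_counts[label][t] + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def summarise(classifier, primary_docs):
    # Step 2 of the sketch: apply the classifier to the unlabelled *primary* data and
    # report the share of responses per class (an assumed form of summary).
    predictions = Counter(classifier.predict(d) for d in primary_docs)
    n = len(primary_docs)
    return {label: predictions[label] / n for label in sorted(predictions)}


# Hypothetical secondary data: labelled snippets from some other source.
secondary_docs = [
    "vomiting and diarrhoea for two days",
    "diarrhoea with blood in stool",
    "coughing and sneezing all week",
    "nasal discharge and coughing",
]
secondary_labels = ["gastrointestinal", "gastrointestinal", "respiratory", "respiratory"]

# Hypothetical primary data: unlabelled free-text questionnaire answers.
primary_docs = [
    "dog has diarrhoea since yesterday",
    "cat sneezing a lot",
    "persistent coughing at night",
]

clf = NaiveBayesTextClassifier().fit(secondary_docs, secondary_labels)
summary = summarise(clf, primary_docs)
print(summary)
```

The point of the sketch is only the data flow: no labels from the primary corpus are ever used, so the quality of the resulting summary depends entirely on how well the vocabulary and classes of the secondary data transfer to the questionnaire free text.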

Author information

Authors and Affiliations

Department of Computer Science, The University of Liverpool, Liverpool, L69 3BX, UK
Matias Garcia-Constantino & Frans Coenen
School of Veterinary Science, University of Liverpool, Leahurst, Neston, CH64 7TE, UK
P. -J. Noble, Alan Radford, Christian Setzkorn & Aine Tierney

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107 Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Garcia-Constantino, M., Coenen, F., Noble, P.J., Radford, A., Setzkorn, C., Tierney, A. (2011). An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_29

DOI: https://doi.org/10.1007/978-3-642-23199-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science; Computer Science (R0)