Abstract
This paper investigates the potential effectiveness of generating text classifiers from secondary data for the purpose of text summarisation. The application scenario is a questionnaire corpus for which we wish to summarise the nature of the free-text element of the questionnaires, but for which no suitable training data is available. The advocated approach is to build the desired text summarisation classifiers using secondary data and then apply these classifiers to the primary data to produce the summaries. We refer to this approach by the acronym CGUSD (Classifier Generation Using Secondary Data). The approach is evaluated using real questionnaire data obtained as part of the SAVSNET (Small Animal Veterinary Surveillance Network) project.
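The idea of training on secondary data and then labelling primary data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a small multinomial naive Bayes classifier stands in for whichever classifier generator is actually used, the function names `train` and `classify` are hypothetical, and the toy texts and class labels are invented rather than taken from SAVSNET.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenisation; a real pipeline would likely add
    # stop-word removal and stemming.
    return text.lower().split()


def train(labelled_docs):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the most probable label for an unlabelled text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Laplace smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Secondary data: labelled texts from another source (invented examples).
secondary = [
    ("dog vomiting and diarrhoea for two days", "gastrointestinal"),
    ("cat with diarrhoea and weight loss", "gastrointestinal"),
    ("dog coughing and sneezing", "respiratory"),
    ("cat sneezing with nasal discharge", "respiratory"),
]

# Primary data: unlabelled questionnaire free text (invented examples).
primary = ["puppy has diarrhoea since yesterday", "kitten sneezing a lot"]

model = train(secondary)
summary = [classify(model, text) for text in primary]
print(summary)
```

In this sketch the predicted class label acts as the summary of each free-text answer, so the quality of the summaries depends on how well the secondary vocabulary overlaps with the primary data.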
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcia-Constantino, M., Coenen, F., Noble, P.J., Radford, A., Setzkorn, C., Tierney, A. (2011). An Investigation Concerning the Generation of Text Summarisation Classifiers Using Secondary Data. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_29
DOI: https://doi.org/10.1007/978-3-642-23199-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)