Abstract
This paper reports on a novel text reduction technique, called Text Denoising, that highlights information-rich content when processing large volumes of text data, especially from the biomedical domain. The core feature of the technique, the text readability index, embodies the hypothesis that complex text is more information-rich than the rest of the text. When the technique is applied to tasks such as extracting biomedical relation-bearing text, keyphrase indexing, and extracting sentences that describe protein interactions, the results indicate that the reduced set of text produced by text denoising is more information-rich than the text it leaves out.
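The idea of readability-based filtering can be illustrated with a minimal sketch. The abstract does not say which readability index or cut-off the technique uses, so the sketch below assumes the Gunning Fog index computed per sentence, a crude vowel-group syllable counter, and a hypothetical threshold parameter; the function names are illustrative and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel runs (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fog_index(sentence: str) -> float:
    """Gunning Fog index of one sentence.

    Fog = 0.4 * (words per sentence + 100 * complex words / words),
    where a complex word is assumed here to have three or more syllables.
    For a single sentence, words per sentence is just the word count.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) + 100.0 * len(complex_words) / len(words))


def denoise(sentences, threshold):
    """Keep sentences whose Fog index meets the threshold (an assumed, tunable cut-off)."""
    return [s for s in sentences if fog_index(s) >= threshold]


if __name__ == "__main__":
    sample = [
        "The cat sat on the mat.",
        "Phosphorylation of the receptor regulates intracellular signalling.",
    ]
    for kept in denoise(sample, threshold=12.0):
        print(kept)
```

In this sketch the short everyday sentence scores low and is dropped, while the sentence dense with polysyllabic terms is retained, which mirrors the hypothesis that more complex text carries more information. A real system would need a better syllable counter and a threshold chosen empirically.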
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shams, R. (2013). Extracting Information-Rich Part of Texts Using Text Denoising. In: Zaïane, O.R., Zilles, S. (eds) Advances in Artificial Intelligence. Canadian AI 2013. Lecture Notes in Computer Science, vol 7884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38457-8_39
DOI: https://doi.org/10.1007/978-3-642-38457-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38456-1
Online ISBN: 978-3-642-38457-8
eBook Packages: Computer Science (R0)