Extracting Information-Rich Part of Texts Using Text Denoising

Shams, Rushdi

doi:10.1007/978-3-642-38457-8_39

Rushdi Shams

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7884)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

This paper reports on a novel text reduction technique, called Text Denoising, that highlights information-rich content when processing large volumes of text, especially from the biomedical domain. The core feature of the technique, the text readability index, embodies the hypothesis that complex text is more information-rich than the rest. When applied to tasks such as extracting biomedical relation-bearing text, keyphrase indexing, and extracting sentences that describe protein interactions, the reduced set of text produced by text denoising proves to be more information-rich than the text that is discarded.
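The idea of ranking text by a readability index can be illustrated with a small sketch. The abstract does not specify which index, syllable counter, or cut-off the paper uses, so everything below is an assumption made for illustration: the sketch scores each sentence with the standard Gunning fog formula, approximates syllables by counting vowel groups, and keeps a fixed fraction of the highest-scoring sentences. The names `fog_index`, `denoise`, and `keep_fraction` are hypothetical and not taken from the paper.

```python
import re
from typing import List


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (a heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(sentence: str) -> float:
    """Gunning fog index of one sentence: 0.4 * (word count + percentage of complex words)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    # Assumption: a "complex" word has three or more estimated syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) + 100.0 * complex_words / len(words))


def denoise(text: str, keep_fraction: float = 0.3) -> List[str]:
    """Keep the hardest-to-read fraction of sentences, preserving their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    # Rank sentence positions by descending fog index.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: fog_index(sentences[i]),
                    reverse=True)
    n_keep = max(1, round(len(sentences) * keep_fraction))
    return [sentences[i] for i in sorted(ranked[:n_keep])]


if __name__ == "__main__":
    sample = ("The cat sat. "
              "Phosphorylation of the receptor activates downstream signalling cascades. "
              "It was fine.")
    for sentence in denoise(sample, keep_fraction=0.34):
        print(sentence)
```

On the sample input, only the sentence with long, polysyllabic terms survives, which matches the hypothesis that complex text carries more information. A real system would likely need a better syllable counter and a tuned threshold.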

Author information

Authors and Affiliations

Department of Computer Science, University of Western Ontario, London, ON, N6A 5B7, Canada
Rushdi Shams

Editor information

Editors and Affiliations

University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane
Department of Computer Science, University of Regina, Canada
Sandra Zilles

Cite this paper

Shams, R. (2013). Extracting Information-Rich Part of Texts Using Text Denoising. In: Zaïane, O.R., Zilles, S. (eds) Advances in Artificial Intelligence. Canadian AI 2013. Lecture Notes in Computer Science, vol 7884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38457-8_39

DOI: https://doi.org/10.1007/978-3-642-38457-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38456-1
Online ISBN: 978-3-642-38457-8
eBook Packages: Computer Science, Computer Science (R0)