Extracting Information-Rich Part of Texts Using Text Denoising

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7884)

Abstract

This paper reports on a novel text reduction technique, called Text Denoising, that highlights information-rich content when processing large volumes of text, especially text from the biomedical domain. The core feature of the technique, the text readability index, embodies the hypothesis that complex text is more information-rich than simpler text. When the technique is applied to tasks such as extracting text that bears biomedical relations, keyphrase indexing, and extracting sentences that describe protein interactions, the reduced text it produces proves to be more information-rich than the text it discards.
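
The idea of keeping only the harder-to-read text can be sketched in a few lines. The abstract does not say which readability index or cut-off the technique uses, so the sketch below assumes the Gunning Fog index, one well-known readability measure, applied one sentence at a time. The vowel-group syllable heuristic, the threshold value, and the function names `count_syllables`, `fog_index` and `denoise` are all illustrative assumptions, not the paper's implementation.

```python
import re

# Assumption: syllables are approximated by counting groups of vowels.
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups, never less than 1."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def fog_index(sentence: str) -> float:
    """Gunning Fog index of a single sentence.

    Fog = 0.4 * (words per sentence + 100 * complex words / words),
    where a complex word has three or more syllables. For one sentence,
    words per sentence is simply the word count.
    """
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) + 100.0 * complex_words / len(words))


def denoise(sentences, threshold):
    """Keep sentences whose Fog index reaches the (hypothetical) threshold."""
    return [s for s in sentences if fog_index(s) >= threshold]


if __name__ == "__main__":
    sample = [
        "The cat sat on the mat.",
        "Phosphorylation of the receptor regulates downstream interaction.",
    ]
    for kept in denoise(sample, threshold=12.0):
        print(kept)
```

With a threshold of 12.0, only the second sample sentence is kept, because its long, polysyllabic words push its score well above that of the first. In practice the threshold would have to be tuned on data from the target task.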

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shams, R. (2013). Extracting Information-Rich Part of Texts Using Text Denoising. In: Zaïane, O.R., Zilles, S. (eds) Advances in Artificial Intelligence. Canadian AI 2013. Lecture Notes in Computer Science, vol 7884. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38457-8_39

  • DOI: https://doi.org/10.1007/978-3-642-38457-8_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38456-1

  • Online ISBN: 978-3-642-38457-8

  • eBook Packages: Computer Science, Computer Science (R0)
