An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems

Matos, Pablo F.; Lombardi, Leonardo O.; Pardo, Thiago A. S.; Ciferri, Cristina D. A.; Vieira, Marina T. P.; Ciferri, Ricardo R.

doi:10.1007/978-3-642-13022-9_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6096)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems

Abstract

This paper addresses the problem of extracting and processing relevant information from unstructured electronic documents in the biomedical domain, specifically full scientific papers. The problem poses several challenges: identifying text passages that contain relevant information, collecting the relevant pieces of information, populating a database and a data warehouse, and mining the resulting data. To this end, the paper proposes IEDSS-Bio, an environment for Information Extraction and Decision Support System in the Biomedical domain. In a case study on Sickle Cell Anemia papers, machine learning experiments for identifying relevant text passages (disease and treatment effects, and number of patients) showed that the best results (95.9% accuracy) were obtained with a statistical method combined with preprocessing techniques that resample the examples and eliminate noise.
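
The passage-identification step in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistical method or the resampling technique, so the multinomial Naive Bayes classifier and random oversampling used here are assumptions chosen for illustration, not the authors' implementation. The toy sentences, the class labels (`effect`, `patients`, `other`), and all function names are hypothetical, and the noise-elimination step is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,;:()%").lower() for w in sentence.split())
    return [t for t in tokens if t]


def oversample(examples, seed=0):
    """Randomly duplicate minority-class examples until classes are balanced."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    return balanced


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy sentences; not taken from the paper's corpus.
TRAIN = [
    ("Hydroxyurea reduced the frequency of painful crises", "effect"),
    ("Treatment decreased the rate of acute chest syndrome", "effect"),
    ("Transfusion therapy reduced the risk of stroke", "effect"),
    ("A total of 299 patients were enrolled in the study", "patients"),
    ("The samples were stored at room temperature", "other"),
    ("Informed consent was obtained from all participants", "other"),
    ("Statistical analysis was performed with standard software", "other"),
]

balanced = oversample(TRAIN)           # the single "patients" example is duplicated
model = NaiveBayes().fit(balanced)
prediction = model.predict("The study enrolled 120 patients")
print(prediction)
```

Oversampling is applied before training so that the rare class is not swamped by the class prior; on a real corpus it would be applied to the training split only, so that duplicated sentences do not leak into the evaluation data.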

Author information

Authors and Affiliations

Department of Computer Science, Federal University of São Carlos, São Carlos/SP, Brazil
Pablo F. Matos & Ricardo R. Ciferri
Faculty of Mathematical and Nature Sciences, Methodist University of Piracicaba, Piracicaba/SP, Brazil
Leonardo O. Lombardi & Marina T. P. Vieira
Department of Computer Science, University of São Paulo, São Carlos/SP, Brazil
Thiago A. S. Pardo & Cristina D. A. Ciferri

Editor information

Editors and Affiliations

Dept. of Computing and Numerical Analysis, University of Cordoba, Campus Universitario de Rabanales, Einstein Building, 3rd floor, 14071, Cordoba, Spain
Nicolás García-Pedrajas
Dept. of Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
Francisco Herrera
School of Computing, University of the West of Scotland, PA1 2BE, Paisley, UK
Colin Fyfe
Dept. Computer Science and Artificial Intelligence, ETS de Ingenierias Informática y de Telecomunicación, University of Granada, 18071, Granada, Spain
José Manuel Benítez
Department of Computer Science, Texas State University-San Marcos, 601 University Drive, TX 78666-4616, San Marcos, USA
Moonis Ali

About this paper

Cite this paper

Matos, P.F., Lombardi, L.O., Pardo, T.A.S., Ciferri, C.D.A., Vieira, M.T.P., Ciferri, R.R. (2010). An Environment for Data Analysis in Biomedical Domain: Information Extraction for Decision Support Systems. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6096. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13022-9_31

DOI: https://doi.org/10.1007/978-3-642-13022-9_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13021-2
Online ISBN: 978-3-642-13022-9
eBook Packages: Computer Science, Computer Science (R0)