Case Study: Chemistry

Handbook of Linguistic Annotation

Abstract

We describe how we developed and applied two annotation schemes for journal articles in chemistry. The first sets out the criteria for identifying a chemical named entity and assigning it a “type”: roughly speaking, deciding whether it is a small molecular species, a process that a small molecular species might be involved in, an enzyme, an adjective, or a prefix. The second assigns each chemical named entity a “subtype” that describes what it refers to: for example, whether “imidazole” refers to the imidazole molecule itself, to the imidazole motif within a larger molecule, or to any of a family of molecules bearing the imidazole motif. We also describe how these guidelines, and the resulting corpora and software, have subsequently been used.
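The two-level scheme summarised in the abstract can be pictured as a small data structure. The following is only an illustrative sketch: the class names, enum labels, the `annotate` helper, and the example sentence are all hypothetical and paraphrase the abstract; they are not the tag names or tooling used in the chapter.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    # Hypothetical labels paraphrasing the abstract, not the scheme's own tags.
    SMALL_MOLECULE = "small molecular species"
    PROCESS = "process a small molecular species might be involved in"
    ENZYME = "enzyme"
    ADJECTIVE = "adjective"
    PREFIX = "prefix"


class ReferenceSubtype(Enum):
    # Hypothetical labels for what the name refers to.
    EXACT = "the molecule itself"
    PART = "a motif within a larger molecule"
    CLASS = "a family of molecules bearing the motif"


@dataclass(frozen=True)
class ChemicalEntity:
    text: str
    start: int  # character offset of the first character
    end: int    # character offset one past the last character
    entity_type: EntityType
    subtype: ReferenceSubtype


def annotate(sentence, surface, entity_type, subtype):
    """Build an annotation for the first occurrence of `surface` in `sentence`."""
    start = sentence.index(surface)
    return ChemicalEntity(surface, start, start + len(surface), entity_type, subtype)


# Invented example sentence: here "imidazole" names a motif, not a free molecule.
sentence = "The imidazole ring coordinates the metal centre."
entity = annotate(sentence, "imidazole", EntityType.SMALL_MOLECULE, ReferenceSubtype.PART)
print(entity.entity_type.name, entity.subtype.name, sentence[entity.start:entity.end])
```

Keeping the type and the subtype as separate fields mirrors the abstract's description of two distinct annotation decisions made about the same span of text.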

Notes

  1. http://oscar3.sourceforge.net/ and https://bitbucket.org/wwmm/oscar4/.

Acknowledgements

We thank the UK eScience Programme and EPSRC (EP/C010035/1) for funding.

Corresponding author

Correspondence to Colin Batchelor.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Batchelor, C., Corbett, P., Teufel, S. (2017). Case Study: Chemistry. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_33

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_33

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
