Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks

Yoon, Hong-Jun; Qiu, John X.; Christian, J. Blair; Hinkle, Jacob; Alamudun, Folami; Tourassi, Georgia

doi:10.1007/978-3-030-16841-4_9


Part of the book series: Proceedings of the International Neural Networks Society (INNS, volume 1)

Included in the following conference series:

INNS Big Data and Deep Learning conference


Abstract

To trust model predictions, it is important to ensure that new data scored by the model come from the same population used for model training. If the model is used to score new data that differ from its training data, then its predictions and performance metrics cannot be trusted. Identifying and excluding such anomalous data points is therefore an important task when models are used in the real world. Traditional machine learning algorithms and classifiers do not have the capability to abstain in this case. Here we propose a data-novelty detection algorithm for Convolutional Neural Network (CNN) classifiers that yields a rejection score for each newly scored data point. It is a post-modeling procedure that examines the distribution of the convolution filters to determine whether a prediction should be trusted. We apply this algorithm to an information extraction model for a natural language text corpus and evaluate its performance using a primary cancer site classification model applied to cancer pathology reports. Results demonstrate that the algorithm is an effective way to exclude cancer pathology reports from model scoring when they do not contain the information needed to accurately classify the primary cancer type.
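The abstract does not specify the rejection statistic, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each report is summarized by the max-pooled activation of every convolution filter, and that a report is suspect when many filters respond far below their training-set levels (i.e., the report lacks the patterns the classifier learned to look for). The function names (`fit_filter_stats`, `rejection_score`, `predict_or_abstain`), the z-score cutoff of -2 and the abstention threshold of 0.5 are hypothetical choices made for illustration.

```python
# Illustrative sketch only: a hypothetical filter-activation novelty score,
# NOT the algorithm from this chapter (whose exact statistic is not given here).
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

FilterStats = List[Tuple[float, float]]  # per-filter (mean, std) on training data


def fit_filter_stats(train_acts: Sequence[Sequence[float]]) -> FilterStats:
    """Estimate the per-filter mean/std of max-pooled activations.

    train_acts[d][k] is the max-pooled activation of filter k on training document d.
    """
    n_filters = len(train_acts[0])
    stats: FilterStats = []
    for k in range(n_filters):
        column = [doc[k] for doc in train_acts]
        # Guard against zero spread so the z-score below is always defined.
        stats.append((mean(column), pstdev(column) or 1e-8))
    return stats


def rejection_score(acts: Sequence[float], stats: FilterStats, z_cutoff: float = -2.0) -> float:
    """Fraction of filters whose activation lies far below its training mean.

    A high score suggests the document lacks the patterns the filters learned to detect.
    """
    weak = sum(1 for a, (mu, sd) in zip(acts, stats) if (a - mu) / sd < z_cutoff)
    return weak / len(stats)


def predict_or_abstain(
    probs: Sequence[float],
    acts: Sequence[float],
    stats: FilterStats,
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the argmax class index, or None (abstain) if the rejection score is too high."""
    if rejection_score(acts, stats) > threshold:
        return None
    return max(range(len(probs)), key=lambda i: probs[i])


if __name__ == "__main__":
    # Toy data: 3 training documents x 3 filters (values made up for illustration).
    stats = fit_filter_stats([[1.0, 2.0, 3.0], [1.2, 2.2, 3.2], [0.8, 1.8, 2.8]])
    print(predict_or_abstain([0.1, 0.7, 0.2], [1.0, 2.0, 3.0], stats))  # → 1
    print(predict_or_abstain([0.1, 0.7, 0.2], [0.0, 0.0, 0.0], stats))  # → None
```

In practice the abstention threshold would be tuned on held-out reports to trade off coverage (the fraction of reports the model scores) against accuracy on the reports it keeps.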

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).



Acknowledgment

This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.

The authors wish to thank Valentina Petkov of the Surveillance Research Program of the National Cancer Institute and the SEER registries of Connecticut, Hawaii, Kentucky, New Mexico, and Seattle for the pathology reports used in this investigation.

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Author information

Authors and Affiliations

Biomedical Sciences, Engineering and Computing Group, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
Hong-Jun Yoon, John X. Qiu, J. Blair Christian, Jacob Hinkle, Folami Alamudun & Georgia Tourassi

Corresponding author

Correspondence to Hong-Jun Yoon.

Editor information

Editors and Affiliations

Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Luca Oneto
Department of Mathematics, University of Padova, Padua, Italy
Nicolò Navarin
Department of Mathematics, University of Padova, Padua, Italy
Alessandro Sperduti
Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Davide Anguita

About this paper

Cite this paper

Yoon, HJ., Qiu, J.X., Christian, J.B., Hinkle, J., Alamudun, F., Tourassi, G. (2020). Selective Information Extraction Strategies for Cancer Pathology Reports with Convolutional Neural Networks. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Advances in Big Data and Deep Learning. INNSBDDL 2019. Proceedings of the International Neural Networks Society, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-16841-4_9

DOI: https://doi.org/10.1007/978-3-030-16841-4_9
Published: 03 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16840-7
Online ISBN: 978-3-030-16841-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics