Abstract
Pathology is a branch of medical science focused on the cause, origin, and nature of disease. A typical pathology laboratory workflow involves preparation of a tissue specimen on a glass slide using a stain designed to enhance imaging and analysis by a board-certified pathologist using a conventional light microscope. Digital pathology is the process of digitizing an analog image so that it can be manipulated by computer. Digitizing pathology slides into whole slide images provides many benefits including real-time, remote analysis of the specimen. Digital pathology is creating an enormous opportunity for the application of machine learning techniques to automate and accelerate the diagnostic process. Over ten million pathology slides are produced and interpreted by experts annually in the United States alone. This suggests that there is an ample supply of data to support machine learning research if it can be acquired and curated in a cost-effective manner.
In this chapter, we discuss the development of the world’s largest open-source corpus of digitized pathology images and review the process being used to collect the digital images, along with the associated standards for annotation and archival. These images are currently being collected at Temple University Hospital and are facilitating the development of automated interpretation technology. This corpus, known as the Temple University Hospital Digital Pathology Corpus (TUHDP), is expected to reach one million images, or one petabyte of data, over the next decade. Though this corpus is currently being collected using a single digital scanner at one institution, we hope to include data from other hospitals and scanning equipment over time. The initial phase of the project, which is described here, focuses on generating 100,000 images that will be released by December 2020. The first installment of this release, over 20,000 images, is now publicly available.
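The scale quoted above can be made concrete with a little arithmetic. The following is a minimal sketch, not a figure taken from the chapter: it assumes decimal (SI) units and a uniform average image size, and the function names are hypothetical. Under those assumptions, one petabyte spread over one million images implies roughly one gigabyte per image.

```python
# Back-of-the-envelope storage arithmetic for the corpus sizes quoted in the text.
# Assumptions (not from the source): decimal SI units and a uniform average image size.

GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def average_image_bytes(total_bytes: int, image_count: int) -> float:
    """Average size of one image, given a total size and an image count."""
    if image_count <= 0:
        raise ValueError("image_count must be positive")
    return total_bytes / image_count


def projected_storage_bytes(image_count: int, avg_bytes: float) -> float:
    """Storage needed for image_count images of avg_bytes each."""
    return image_count * avg_bytes


if __name__ == "__main__":
    avg = average_image_bytes(PETABYTE, 1_000_000)
    print(f"average image size: {avg / GIGABYTE:.1f} GB")
    # Hypothetical extrapolation: the same average applied to the initial phase.
    phase_one = projected_storage_bytes(100_000, avg)
    print(f"initial phase (100,000 images): {phase_one / TERABYTE:.0f} TB")
```

The extrapolation to the initial phase is only as good as the uniform-size assumption; actual whole slide image sizes vary with the scanned area and compression.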
The performance of deep learning systems depends heavily on the breadth and quality of the data used. In this chapter, we also introduce pilot experiments on classifying various types of images using a deep learning system that combines convolutional neural networks and long short-term memory networks. We show that sensitivity on relatively simple tasks, such as artifact classification, exceeds 95%. We discuss several approaches to the memory management and computational complexity issues raised by these ultra-high-resolution images. We demonstrate that the field of pathology is sufficiently rich to support the development of high-performance classification systems. These systems enable a new generation of decision support technology for pathologists, which directly addresses a future industry need for efficient workflows in response to the projected decline in the number of board-certified pathologists.
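Two ideas in this paragraph lend themselves to a short illustration. The sketch below is not the chapter's implementation; it shows one common way to bound memory use on very large images, namely visiting the image as fixed-size tiles, together with the standard definition of sensitivity (true positives divided by all actual positives). The function names, the tile size, and the label encoding are assumptions made for illustration.

```python
from typing import Iterator, Sequence, Tuple


def tile_windows(width: int, height: int, tile: int,
                 stride: int = 0) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) windows that cover a width x height image.

    Only window coordinates are produced, so a caller can load one
    tile at a time instead of holding the whole image in memory.
    A stride of 0 means non-overlapping tiles (stride == tile).
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    step = stride if stride > 0 else tile
    for y in range(0, height, step):
        for x in range(0, width, step):
            # Windows at the right and bottom edges are clipped to the image.
            yield x, y, min(tile, width - x), min(tile, height - y)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int],
                positive: int = 1) -> float:
    """Sensitivity (recall): TP / (TP + FN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical 5 x 3 pixel image split into 2 x 2 tiles.
    for window in tile_windows(5, 3, 2):
        print(window)
    # Hypothetical labels: 1 = artifact, 0 = not an artifact.
    print(sensitivity([1, 1, 0, 1], [1, 0, 0, 1]))
```

In practice each window would be passed to an image reader that can fetch a region on demand, and the per-tile predictions would then be aggregated; how that aggregation is done is a design choice this sketch does not cover.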
Change history
30 October 2022
This book was inadvertently published with sensitive patient information in Figure 3.9 of this chapter for which we did not have permission to display. We have now revised the figure wherein the date of birth of this patient has been whited out. https://doi.org/10.1007/978-3-030-36844-9_3
Acknowledgments
This material is supported by the National Science Foundation under grant nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
The open source libraries used to develop the deep learning model presented in this chapter are: Shapely v1.6.4, OpenSlide v1.1.1, Abstract Syntax Library, OpenCV-Python v3.4.1, NumPy v1.14.2, PIL v4.2.1, TensorFlow v1.9.0, and Keras v2.2.4.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Shawki, N. et al. (2020). The Temple University Hospital Digital Pathology Corpus. In: Obeid, I., Selesnick, I., Picone, J. (eds) Signal Processing in Medicine and Biology. Springer, Cham. https://doi.org/10.1007/978-3-030-36844-9_3
DOI: https://doi.org/10.1007/978-3-030-36844-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36843-2
Online ISBN: 978-3-030-36844-9
eBook Packages: Engineering, Engineering (R0)