The Temple University Hospital Digital Pathology Corpus
Pathology is a branch of medical science focused on the cause, origin, and nature of disease. A typical pathology laboratory workflow involves preparation of a tissue specimen on a glass slide using a stain designed to enhance imaging and analysis by a board-certified pathologist using a conventional light microscope. Digital pathology is the process of digitizing an analog image so that it can be manipulated by computer. Digitizing pathology slides into whole slide images provides many benefits including real-time, remote analysis of the specimen. Digital pathology is creating an enormous opportunity for the application of machine learning techniques to automate and accelerate the diagnostic process. Over ten million pathology slides are produced and interpreted by experts annually in the United States alone. This suggests that there is an ample supply of data to support machine learning research if it can be acquired and curated in a cost-effective manner.
In this chapter, we discuss the development of the world’s largest open source corpus of digitized pathology images and review the process being used to collect the digital images along with associated standards for annotation and archival. These images are currently being collected at Temple University Hospital and are facilitating the development of automated interpretation technology. This corpus, known as the Temple University Hospital Digital Pathology Corpus (TUHDP), is expected to reach one million images, or one petabyte of data, over the next decade. Though this corpus is currently being collected using a single digital scanner at one institution, we hope over time we can include data from other hospitals and scanning equipment. The initial phase of the project, which is described here, focuses on generating 100,000 images that will be released by December 2020. The first installment of this release, over 20,000 images, is now publicly available.
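The size targets quoted above imply a rough per-image storage budget. The following is a back-of-the-envelope sketch only: it assumes decimal (SI) units and that the one-petabyte-per-million-images ratio also holds for the initial 100,000-image phase, neither of which is stated in the text. All variable names are illustrative.

```python
# Back-of-the-envelope storage estimate from the corpus targets quoted above.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15

target_images = 1_000_000      # "one million images"
target_bytes = 1 * PETABYTE    # "one petabyte of data"

# Average size per whole slide image implied by the two targets.
avg_bytes_per_image = target_bytes // target_images

# Assumption: the same average applies to the initial 100,000-image phase.
phase_one_images = 100_000
phase_one_bytes = phase_one_images * avg_bytes_per_image

print(f"average image size: {avg_bytes_per_image / GIGABYTE:.1f} GB")
print(f"phase one storage:  {phase_one_bytes / TERABYTE:.1f} TB")
```

Under these assumptions the average image is on the order of one gigabyte, which is why storage and memory management become central design concerns for a corpus of this kind.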
Performance of deep learning systems is heavily dependent on the breadth and quality of the data used. In this chapter, we also introduce some pilot experiments on classifying various types of images using a deep learning system that is based on a combination of convolutional neural networks and long short-term memory networks. We show that performance on relatively simple tasks, such as artifact classification, exceeds 95% sensitivity. We discuss several approaches to memory management and computational complexity issues for these ultra-high-resolution images. We demonstrate that the field of pathology is sufficiently rich to support the development of high-performance classification systems. These systems enable a new generation of decision support technology for pathologists. This directly addresses a future industry need for efficient workflows in response to the projected decline in the number of board-certified pathologists.
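One widely used way to keep memory bounded when working with ultra-high-resolution images is to process them as fixed-size tiles rather than loading the full image. The sketch below illustrates that general idea only; the abstract does not say which memory management approaches the authors adopted, and the function name `iter_tiles`, the example image dimensions, and the tile size are all hypothetical.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def iter_tiles(width: int, height: int, tile_size: int) -> Iterator[Window]:
    """Yield non-overlapping windows that together cover the whole image.

    Windows on the right and bottom edges are clipped so that no window
    extends past the image boundary.
    """
    if width <= 0 or height <= 0 or tile_size <= 0:
        raise ValueError("width, height, and tile_size must be positive")
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))


# Hypothetical example: a 5000 x 3000 pixel image cut into 2048-pixel tiles.
tiles = list(iter_tiles(5000, 3000, 2048))

# Upper bound on memory for one uncompressed 8-bit RGB tile (3 bytes/pixel).
max_tile_bytes = 2048 * 2048 * 3

print(len(tiles), "tiles; largest tile needs", max_tile_bytes, "bytes")
```

Each window could then be read and classified independently, for example with a region reader such as the one OpenSlide provides, so that peak memory depends on the tile size rather than on the size of the whole slide image.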
Keywords: Digital pathology, Deep learning, Big data, Convolutional neural networks, CNN, Long short-term memory networks, LSTM
This material is supported by the National Science Foundation under grant nos. CNS-1726188 and 1925494. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
The open source libraries used to develop the deep learning model presented in this chapter are: Shapely v1.6.4, OpenSlide v1.1.1, Abstract Syntax Library, OpenCV-Python v3.4.1, NumPy v1.14.2, PIL v4.2.1, TensorFlow v1.9.0, and Keras v2.2.4.