Cloud-Based Whole Slide Image Analysis Using MapReduce

  • Conference paper
Data Management and Analytics for Medicine and Healthcare (DMAH 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10186)

Abstract

Systematic analysis of high-resolution whole slide images enables more effective diagnosis, prognosis, and prediction of cancer and other important diseases. Because whole slide images are enormous in size and dimension, their analysis requires extensive computing resources that are not commonly available. Memory limitations force images to be divided into smaller regions for processing, which leads to inaccurate results when objects that cross region boundaries are ignored. In this paper, we propose a highly scalable and cost-effective MapReduce-based image analysis framework for whole slide image processing, and provide a cloud-based implementation. The framework adopts a grid-based overlapping partitioning scheme and parallelizes image segmentation with MapReduce. It handles boundary objects gracefully with a highly efficient matching method based on spatial indexing, thus avoiding the loss of accuracy caused by partitioning. We demonstrate that the system achieves high scalability and is cost-effective: in our experiments, analyzing one image on Amazon Elastic MapReduce costs less than fifteen cents on average.
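The grid-based overlapping partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tile size of 4096 pixels, the 64-pixel overlap, and the example bounding box are all assumptions chosen for illustration. The sketch expands each grid cell by a fixed margin and checks which tiles fully contain an object that straddles a cell boundary.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Tile(NamedTuple):
    """Axis-aligned tile in pixel coordinates; x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int


def overlapping_tiles(width: int, height: int, tile: int, overlap: int) -> Iterator[Tile]:
    """Yield grid cells of side `tile`, each padded by `overlap` and clipped to the image."""
    if tile <= 0 or overlap < 0:
        raise ValueError("tile must be positive and overlap non-negative")
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield Tile(
                max(0, x - overlap),
                max(0, y - overlap),
                min(width, x + tile + overlap),
                min(height, y + tile + overlap),
            )


def contains(t: Tile, box: Tuple[int, int, int, int]) -> bool:
    """Return True if the bounding box (x0, y0, x1, y1) lies entirely inside the tile."""
    bx0, by0, bx1, by1 = box
    return t.x0 <= bx0 and t.y0 <= by0 and bx1 <= t.x1 and by1 <= t.y1


def tiles_containing(tiles: List[Tile], box: Tuple[int, int, int, int]) -> List[Tile]:
    """List every tile that sees the whole object."""
    return [t for t in tiles if contains(t, box)]


# Hypothetical 10000 x 8000 image and an object crossing the x = 4096 cell boundary.
nucleus = (4080, 100, 4110, 130)
padded = list(overlapping_tiles(10000, 8000, tile=4096, overlap=64))
plain = list(overlapping_tiles(10000, 8000, tile=4096, overlap=0))

print(len(tiles_containing(plain, nucleus)))   # no unpadded cell holds the whole object
print(len(tiles_containing(padded, nucleus)))  # padded tiles on both sides hold it
```

Without overlap, no tile contains the whole object, so any per-tile segmentation would see only fragments. With overlap, the object is fully visible in more than one tile, which means the same object can be segmented more than once. That duplication is why a partitioned pipeline of this kind needs a subsequent matching step for boundary objects, which the abstract says the framework performs with spatial indexing.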

Notes

  1. http://hadoop.apache.org

  2. http://aws.amazon.com/s3/

  3. http://opencv.org

  4. http://www.opengeospatial.org/standards/sfs

  5. http://www.boost.org

  6. http://www.angusj.com/delphi/clipper.php

  7. https://tcga-data.nci.nih.gov/

Acknowledgements

This work is supported in part by NSF IIS 1350885, by NSF ACI 1350885, by Grant Number K25CA181503 from the National Institutes of Health, by Grant Number R01LM009239 from the National Library of Medicine, by Grant Number 1U24CA180924-01A1 from the National Cancer Institute, and by CNPq.

Author information

Corresponding author

Correspondence to Fusheng Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Vo, H. et al. (2017). Cloud-Based Whole Slide Image Analysis Using MapReduce. In: Wang, F., Yao, L., Luo, G. (eds) Data Management and Analytics for Medicine and Healthcare. DMAH 2016. Lecture Notes in Computer Science, vol 10186. Springer, Cham. https://doi.org/10.1007/978-3-319-57741-8_5

  • DOI: https://doi.org/10.1007/978-3-319-57741-8_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-57740-1

  • Online ISBN: 978-3-319-57741-8

  • eBook Packages: Computer Science, Computer Science (R0)
