Scalable Histopathological Image Analysis via Active Learning

  • Yan Zhu
  • Shaoting Zhang
  • Wei Liu
  • Dimitris N. Metaxas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8675)


Training an effective and scalable system for medical image analysis usually requires a large amount of labeled data, which imposes a heavy annotation burden on pathologists. Recent progress in active learning can alleviate this issue, greatly reducing the labeling cost without sacrificing much predictive accuracy. However, most existing active learning methods disregard the “structured information” that may exist in medical images (e.g., data from individual patients) and make the simplifying assumption that unlabeled data are independently and identically distributed. Neither choice may be suitable for real-world medical images. In this paper, we propose a novel batch-mode active learning method that explores and leverages such structured information in annotations of medical images to enforce diversity among the selected data, thereby maximizing the information gain. We formulate the active learning problem as an adaptive submodular function maximization problem subject to a partition matroid constraint, and present an efficient greedy algorithm that achieves a good solution with a theoretically proven bound. We demonstrate the efficacy of our algorithm on thousands of histopathological images of breast microscopic tissues.
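To make the constraint concrete, the following is a minimal sketch of greedy selection under a partition matroid, not the method from the paper. It makes several assumptions that the abstract does not state: the objective is a simple set-coverage function (a standard monotone submodular stand-in), the selection is non-adaptive (no labels are observed between picks), each partition is a hypothetical patient, and the names `greedy_partition_matroid`, `group_of`, `covers`, and `cap_per_group` are illustrative only.

```python
from collections import Counter


def greedy_partition_matroid(items, group_of, covers, cap_per_group):
    """Greedily pick items to maximise coverage, at most cap_per_group per group.

    items: iterable of sortable, hashable item ids (hypothetical image ids)
    group_of: dict mapping item -> group id (assumed here to be a patient id)
    covers: dict mapping item -> set of elements that the item covers
    cap_per_group: per-group limit that defines the partition matroid
    """
    selected = []
    covered = set()
    used = Counter()  # number of items already taken from each group
    remaining = set(items)

    while remaining:
        # Keep only items whose group still has room (matroid feasibility).
        feasible = sorted(i for i in remaining if used[group_of[i]] < cap_per_group)
        if not feasible:
            break
        # Marginal gain of the stand-in objective f(S) = |union of covers|.
        best = max(feasible, key=lambda i: len(covers[i] - covered))
        if not covers[best] - covered:
            break  # no feasible item adds anything new
        selected.append(best)
        covered |= covers[best]
        used[group_of[best]] += 1
        remaining.remove(best)

    return selected, covered


# Toy data (made up for illustration): two images from patient A, one each from B and C.
group_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
covers = {"a1": {1, 2, 3, 4}, "a2": {5, 6, 7}, "b1": {4, 5}, "c1": {8}}

selected, covered = greedy_partition_matroid(group_of.keys(), group_of, covers, cap_per_group=1)
print(selected)  # ['a1', 'b1', 'c1']
```

In this toy run, `a2` has the second-largest marginal gain but is skipped because patient A has already contributed one image, so the batch is spread across patients. This is the kind of diversity that a per-patient cap enforces; the adaptive objective and the bound referred to in the abstract are not reproduced here.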


Active Learning, Submodular Function, Histopathological Image, Active Learning Method, Label Cost
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yan Zhu (1)
  • Shaoting Zhang (2)
  • Wei Liu (3)
  • Dimitris N. Metaxas (1)

  1. Department of Computer Science, Rutgers University, Piscataway, USA
  2. Department of Computer Science, University of North Carolina at Charlotte, USA
  3. IBM T.J. Watson Research Center, USA
