Improving Representation of the Positive Class in Imbalanced Multiple-Instance Learning

  • Carlos Mera
  • Mauricio Orozco-Alzate
  • John Branch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8814)


In standard supervised learning, the problem of learning from imbalanced data has been widely addressed in order to improve the performance of learning algorithms in the presence of underrepresented data. In Multiple-Instance Learning (MIL), however, where the imbalance problem is more complex, it has received little attention. Motivated by the need for further studies, we discuss the multiple-instance imbalance problem and propose a method to improve the representation of the positive class. Our approach looks for the target concept in the positive bags and tries to strengthen it using an oversampling technique, while removing the borderline (ambiguous) instances from both positive and negative bags. We evaluate our method on several standard MIL benchmark data sets in order to show its ability to yield an enhanced representation of the positive class.
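As a rough illustration of the oversampling idea mentioned in the abstract (not the authors' exact method), the sketch below generates synthetic minority-class instances by SMOTE-style interpolation between an instance and one of its nearest same-class neighbours; the function name, parameters, and toy data are all hypothetical.

```python
import numpy as np

def smote_oversample(instances, n_new, k=3, rng=None):
    """SMOTE-style oversampling: create synthetic points by interpolating
    between a randomly chosen instance and a random one of its k nearest
    neighbours within the same (e.g. positive) class."""
    rng = np.random.default_rng(rng)
    X = np.asarray(instances, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        # Euclidean distances from X[i] to every other instance
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the point itself
        neighbours = np.argsort(d)[:k]     # indices of the k nearest
        j = rng.choice(neighbours)
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack(synthetic)

# Toy positive instances clustered near an assumed target concept at (1, 1)
pos = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0]])
new_pts = smote_oversample(pos, n_new=5, k=2, rng=0)
print(new_pts.shape)  # (5, 2)
```

Because each synthetic point lies on a segment between two existing positives, the new instances stay inside the region occupied by the positive class rather than drifting toward the (majority) negative side.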


Keywords: Multiple-instance learning · Class imbalance learning · Oversampling · Undersampling





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Carlos Mera (1)
  • Mauricio Orozco-Alzate (2)
  • John Branch (1)
  1. Universidad Nacional de Colombia, Sede Medellín, Medellín, Colombia
  2. Universidad Nacional de Colombia, Sede Manizales, Manizales, Colombia
