Detecting outliers with one-class selective transfer machine

  • Hirofumi Fujita
  • Tetsu Matsukawa
  • Einoshin Suzuki
Regular Paper

Abstract

In this paper, we propose a method for detecting outliers in an unlabeled target dataset by exploiting an unlabeled source dataset. Outlier detection has attracted the attention of data miners for over two decades, since outliers can be crucial in decision making, knowledge discovery, and fraud detection, to name but a few. Because outliers are scarce and often tedious to label, researchers have proposed methods that detect them from an unlabeled dataset, some of which borrow strength from relevant labeled datasets in the framework of transfer learning. He et al. tackled a more challenging situation in which the input datasets, coming from multiple tasks, are all unlabeled. Their method, ML-OCSVM, conducts multi-task learning with one-class support vector machines (SVMs) and yields a mean model plus task-specific increments to detect outliers in the test datasets of the multiple tasks. We inherit part of their problem setting, taking only unlabeled datasets as input, but increase the difficulty by assuming only one source dataset in addition to the target dataset. Consequently, the source dataset consists of examples relevant to the target task as well as examples that are less relevant. To cope with this situation, we extend Selective Transfer Machine, which weights individual examples in the framework of covariate shift and learns an SVM classifier, to our one-class setting by replacing the binary SVMs with one-class SVMs. Experiments on two public datasets and an artificial dataset show that our method mostly outperforms baseline methods, including ML-OCSVM and a state-of-the-art ensemble anomaly detection method, in F1 score and AUC.
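
The general idea of weighting source examples and then training a one-class SVM can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the weights come from a simplified kernel mean matching step solved by projected gradient descent (the usual sum constraint is omitted), the weights are computed once rather than optimized together with the classifier, and the one-class SVM is trained on the weighted source examples plus the target examples with unit weight. The function names `kmm_weights` and `fit_weighted_ocsvm` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM


def kmm_weights(X_src, X_tgt, gamma=0.5, B=10.0, n_iter=500):
    """Approximate kernel mean matching weights for the source examples.

    Minimizes 0.5 * w^T K w - kappa^T w subject to 0 <= w_i <= B
    by projected gradient descent (simplified: no sum constraint).
    """
    n_s, n_t = len(X_src), len(X_tgt)
    K = rbf_kernel(X_src, X_src, gamma=gamma)
    kappa = (n_s / n_t) * rbf_kernel(X_src, X_tgt, gamma=gamma).sum(axis=1)
    # Step size 1 / (largest eigenvalue of K) keeps the descent stable.
    step = 1.0 / np.linalg.eigvalsh(K)[-1]
    w = np.ones(n_s)
    for _ in range(n_iter):
        w = np.clip(w - step * (K @ w - kappa), 0.0, B)
    return w


def fit_weighted_ocsvm(X_src, X_tgt, gamma=0.5, nu=0.1):
    """Train a one-class SVM on weighted source data plus target data."""
    w = kmm_weights(X_src, X_tgt, gamma=gamma)
    X = np.vstack([X_src, X_tgt])
    # Assumption of this sketch: target examples all get weight 1.
    sample_weight = np.concatenate([w, np.ones(len(X_tgt))])
    model = OneClassSVM(kernel="rbf", gamma=gamma, nu=nu)
    model.fit(X, sample_weight=sample_weight)
    return model, w
```

With `X_src` and `X_tgt` as NumPy arrays of shape (n, d), `model.predict(X_tgt)` returns -1 for examples flagged as outliers and 1 otherwise. Source examples that lie far from the target distribution should receive weights close to zero and therefore have little influence on the learned boundary.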

Keywords

  • One-class outlier detection
  • One-class support vector machines
  • Kernel mean matching
  • Transfer learning

Notes

Acknowledgements

A part of this research was supported by Grants-in-Aid for Scientific Research JP15K12100 and JP18H03290 from the Japan Society for the Promotion of Science (JSPS).

References

  1. Hawkins DM (1980) Identification of outliers. Chapman and Hall, London
  2. Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253
  3. Deguchi Y, Suzuki E (2015) Hidden fatigue detection for a desk worker using clustering of successive tasks. In: Ambient Intelligence, vol 9425 of LNCS. Springer, pp 268–283
  4. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471
  5. Tax DMJ, Duin RPW (2004) Support vector data description. Mach Learn 54(1):45–66
  6. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
  7. Ganin Y, Lempitsky V (2015) Unsupervised domain adaptation by backpropagation. In: Proceedings of ICML, pp 1180–1189
  8. Sener O, Song HO, Saxena A, Savarese S (2016) Learning transferrable representations for unsupervised domain adaptation. In: Proceedings of NIPS, pp 2110–2118
  9. Long M, Zhu H, Wang J, Jordan MI (2016) Unsupervised domain adaptation with residual transfer networks. In: Proceedings of NIPS, pp 136–144
  10. Yang H, King I, Lyu MR (2010) Multi-task learning for one-class classification. In: Proceedings of IJCNN, pp 1–8
  11. He X, Mourot G, Maquin D, Ragot J, Beauseroy P, Smolarz A, Grall-Maës E (2014) Multi-task learning with one-class SVM. Neurocomputing 133:416–426
  12. Chu W-S, Torre FDL, Cohn JF (2017) Selective transfer machine for personalized facial expression analysis. IEEE Trans Pattern Anal Mach Intell 39(3):529–545
  13. Gretton A, Smola A, Huang J, Schmittfull M, Borgwardt K, Schölkopf B (2009) Covariate shift by kernel mean matching. In: Dataset shift in machine learning, chapter 8, pp 131–160. The MIT Press, Cambridge
  14. Sugiyama M, Krauledat M, Müller K-R (2007) Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 8:985–1005
  15. Fujita H, Matsukawa T, Suzuki E (2018) One-class selective transfer machine for personalized anomalous facial expression detection. In: Proceedings of VISIGRAPP, vol 5: VISAPP, pp 274–283
  16. Han J, Kamber M, Pei J (2012) Data mining, 3rd edn. Morgan Kaufmann, Waltham
  17. Schapire RE (1999) A brief introduction to boosting. In: Proceedings of IJCAI, pp 1401–1406
  18. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp 226–231
  19. Ankerst M, Breunig MM, Kriegel H-P, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of SIGMOD, pp 49–60
  20. Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. Proc KDD 98:58–65
  21. Wang W, Yang J, Muntz RR (1997) STING: a statistical information grid approach to spatial data mining. In: Proceedings of VLDB, pp 186–195
  22. Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of SIGMOD, pp 94–105
  23. Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. Proc SIGMOD Rec 29(2):93–104
  24. Sugiyama M, Borgwardt K (2013) Rapid distance-based outlier detection via sampling. In: Proceedings of NIPS, pp 467–475
  25. Bellman RE (1961) Adaptive control processes: a guided tour. Princeton University Press, Princeton
  26. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
  27. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167
  28. Liu H, Liu T, Wu J, Tao D, Fu Y (2015) Spectral ensemble clustering. In: Proceedings of KDD, pp 715–724
  29. Zhao Y, Nasrullah Z, Hryniewicki MK, Li Z (2019) LSCP: locally selective combination in parallel outlier ensembles. In: Proceedings of SDM
  30. Bakker B, Heskes T (2003) Task clustering and gating for Bayesian multitask learning. J Mach Learn Res 4:83–99
  31. Yao Y, Doretto G (2010) Boosting for transfer learning with multiple sources. In: Proceedings of CVPR, pp 1855–1862
  32. Ge L, Gao J, Ngo H, Li K, Zhang A (2014) On handling negative transfer and imbalanced distributions in multiple source transfer learning. Stat Anal Data Min ASA Data Sci J 7(4):254–271
  33. Cao B, Pan SJ, Zhang Y, Yeung D-Y, Yang Q (2010) Adaptive transfer learning. In: Proceedings of AAAI, pp 407–712
  34. Tzeng E, Hoffman J, Darrell T, Saenko K (2015) Simultaneous deep transfer across domains and tasks. In: Proceedings of ICCV, pp 4068–4076
  35. Chen J, Liu X, Tu P, Aragones A (2013) Learning person-specific models for facial expression and action unit recognition. Pattern Recognit Lett 34(15):1964–1970
  36. Kodirov E, Xiang T, Fu Z-Y, Gong S (2015) Unsupervised domain adaptation for zero-shot learning. In: Proceedings of ICCV, pp 2452–2460
  37. Chen J, Liu X (2014) Transfer learning with one-class data. Pattern Recognit Lett 37:32–40
  38. Sangineto E, Zen G, Ricci E, Sebe N (2014) We are not all equal: personalizing models for facial expression analysis with transductive parameter transfer. In: Proceedings of ACM international conference on multimedia, pp 357–366
  39. Zen G, Porzi L, Sangineto E, Ricci E, Sebe N (2016) Learning personalized models for facial expression analysis and gesture recognition. IEEE Trans Multimed 18(4):775–788
  40. Sugiyama M, Nakajima S, Kashima H, Buenau PV, Kawanabe M (2008) Direct importance estimation with model selection and its application to covariate shift adaptation. In: Proceedings of NIPS, pp 1433–1440
  41. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
  42. Candela JQ, Sugiyama M, Schwaighofer A, Lawrence ND (2009) Dataset shift in machine learning. MIT Press, Cambridge
  43. Chapelle O (2007) Training a support vector machine in the primal. Neural Comput 19(5):1155–1178
  44. Amari S, Wu S (1999) Improving support vector machine classifiers by modifying kernel functions. Neural Netw 12(6):783–789
  45. Gorski J, Pfeuffer F, Klamroth K (2007) Biconvex sets and optimization with biconvex functions: a survey and extensions. Math Methods Oper Res 66(3):373–407
  46. Monteiro RDC, Adler I (1989) Interior path following primal–dual algorithms. Part II: convex quadratic programming. Math Program 44(1–3):43–66
  47. Lucey P, Cohn JF, Prkachin KM, Solomon PE, Matthews I (2011) Painful data: the UNBC-McMaster shoulder pain expression archive database. In: Proceedings of IEEE international conference on automatic face and gesture recognition and workshops, pp 57–64
  48. Prkachin KM, Solomon PE (2008) The structure, reliability and validity of pain expression: evidence from patients with shoulder pain. Pain 139(2):267–274
  49. Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag 20(3):45–50
  50. Lowe DG (1999) Object recognition from local scale-invariant features. In: Proceedings of ICCV, pp 1150–1157
  51. Ahonen T, Hadid A, Pietikäinen M (2006) Face description with local binary patterns: application to face recognition. IEEE Trans Pattern Anal Mach Intell 28(12):2037–2041
  52. Cootes TF, Edwards GJ, Taylor CJ (2001) Active appearance models. IEEE Trans Pattern Anal Mach Intell 23(6):681–685
  53. Ekman P, Friesen WV (1975) Unmasking the face: a guide to recognizing emotions from facial cues. Prentice Hall, Englewood Cliffs
  54. Mohammadian A, Aghaeinia H, Towhidkhah F et al (2016) Subject adaptation using selective style transfer mapping for detection of facial action units. Expert Syst Appl 56:282–290
  55. Pan J, Tompkins WJ (1985) A real-time QRS detection algorithm. IEEE Trans Biomed Eng BME–32(3):230–236
  56. Yu S-N, Chen Y-H (2007) Electrocardiogram beat classification based on wavelet transformation and probabilistic neural network. Pattern Recognit Lett 28(10):1142–1150
  57. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
  58. Andrei N (2019) PyClustering: data mining library. J Open Source Softw 4(36):1230
  59. Zhao Y, Nasrullah Z, Li Z (2019) PyOD: a Python toolbox for scalable outlier detection. arXiv preprint arXiv:1901.01588
  60. Andersen M, Dahl J, Liu Z, Vandenberghe L (2011) Interior-point methods for large-scale cone programming. In: Optimization for machine learning, pp 55–83. MIT Press, Cambridge

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Graduate School, Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan