
1 Anomaly Detection: Motivation and Challenges

Group anomaly detection (GAD) is an important part of data analysis for many group-based applications. Pointwise anomaly detection focuses on individual data instances that do not conform with the expected pattern in a dataset. With the increasing availability of multifaceted information, GAD research has recently explored datasets involving groups or collections of observations. Many pointwise anomaly detection methods cannot detect the variety of deviations that are evident in group datasets. For example, Muandet et al. [20] potentially discover Higgs bosons as a group of collision events in high energy particle physics, whereas pointwise methods are unable to distinguish this anomalous behavior. Detecting group anomalies therefore requires more specialized techniques for robustly differentiating group behaviors.

GAD aims to identify groups that deviate from the regular group pattern. Generally, a group consists of two or more points, and group behaviors are more adequately described by a greater number of observations. A point-based anomalous group is a collection of individual pointwise anomalies that deviate from the expected pattern. It is more difficult to detect distribution-based group anomalies, where individual points appear regular but their collective behavior is anomalous. It is also possible to characterize group anomalies by certain properties and subsequently apply pointwise anomaly detection methods. In image applications, a distribution-based anomalous group has an irregular mixture of visual features compared to the expected group pattern.

GAD is a difficult problem for many real-world applications, especially those involving complicated group behaviors such as image datasets. Xiong et al. [29] propose a novel method for detecting group anomalies; however, an improvement in their detection results is possible for image applications. Images are modeled as groups of pixels or visual features, and it may be difficult to accurately characterize anomalous images by deviating properties. For example, it is difficult to distinguish regular groups (cat images) from anomalous groups (tiger images) that possess cat whiskers but also irregular features such as tiger stripes. The problem of GAD in image datasets is useful and applicable to similarly challenging real-world applications where group distributions are complex and difficult to characterize.

Figure 1 illustrates examples of point-based and distribution-based group anomalies where the innermost circle contains images exhibiting regular behaviors whereas the outer circle conveys group anomalies. Plot (A) displays tiger images as point-based group anomalies as well as rotated cat images as distribution-based group anomalies (\(180^\circ \) rotation). In plot (B), distribution-based group anomalies are irregular mixtures of cats and dogs in a single image while plot (C) depicts anomalous images stitched from different scene categories of cities, mountains or coastlines. Our image data experiments will mainly focus on detecting group anomalies in these scenarios.

Fig. 1. Examples of point-based and distribution-based group anomalies in various image applications. The expected group behavior represents images in the inner concentric circle while the outer circle contains images that are group anomalies.

Even though the GAD problem may seem like a straightforward comparison of group observations, many complications and challenges arise. Since pixel locations in a high-dimensional space are interdependent, appropriate features in an image may be difficult to extract. For effective detection of anomalous images, an adequate description of images is required for model training. Further complications arise in images, such as low resolution, poor illumination, different viewing angles, scaling and rotation. As in other anomaly detection applications, ground truth labels are usually unavailable for training or evaluation purposes. A number of pre-processing and feature extraction techniques can be applied to address different aspects of these challenges.

In order to detect distribution-based group anomalies in various image applications, we propose using deep generative models (DGMs). The main contributions of this paper are:

  • We formulate DGMs for the problem of detecting group anomalies using a group reference function.

  • Although deep generative models have been applied in various image applications, to the best of our knowledge this is the first work to apply them to the GAD problem.

  • A variety of experiments are performed on both synthetic and real-world datasets to demonstrate the effectiveness of deep generative models for detecting group anomalies as compared to other GAD techniques.

The rest of the paper is organized as follows. An overview of related work is provided (Sect. 2) and preliminaries for understanding approaches for detecting group anomalies are also described (Sect. 3). We formulate our problem and then proceed to elaborate on our proposed solution that involves deep generative models (Sect. 4). Our experimental setup and key results are presented in Sect. 5 and Sect. 6 respectively. Finally, Sect. 7 provides a summary of our findings as well as recommends future directions for GAD research.

2 Background and Related Work on Group Anomaly Detection

GAD applications are an emerging area of research, and most state-of-the-art techniques have been developed only recently. While group anomalies are briefly discussed in anomaly detection surveys such as Chandola et al. [4] and Austin [10], Xiong [28] elaborates on more recent state-of-the-art GAD methods. Yu et al. [33] further review GAD techniques where group structures are not known a priori and clusters are inferred from pairwise relationships between data instances. Recently, Toth and Chawla [27] provided a comprehensive overview of GAD methods as well as a detailed description of detecting temporal changes in groups over time. This paper explores group anomalies where group memberships are known a priori, such as in image applications.

Previous studies on image anomaly detection can be understood in terms of group anomalies. Quellec et al. [22] examine mammographic images where point-based group anomalies represent potentially cancerous regions. Perera and Patel [21] learn features from a collection of images containing regular chair objects and detect point-based group anomalies where chairs have abnormal shapes, colors and other irregular characteristics. On the other hand, regular categories in Xiong et al. [29] represent scene images such as inside city, mountain or coast and distribution-based group anomalies are stitched images with a mixture of different scene categories. At a pixel level, Xiong et al. [30] apply GAD methods to detect anomalous galaxy clusters with irregular proportions of RGB pixels. We emphasize detecting distribution-based group anomalies rather than point-based anomalies in our subsequent image applications.

The discovery of group anomalies is of interest to a number of diverse domains. Muandet et al. [20] investigate GAD for physical phenomena in high energy particle physics, where Higgs bosons are observed as slight excesses in a collection of collision events rather than in individual events. Xiong et al. [29] analyze a fluid dynamics application where a group anomaly represents unusual vorticity and turbulence in fluid motion. In topic modeling, Soleimani and Miller [25] characterize documents by topics, and anomalous clusters of documents are discovered by their irregular topic mixtures. By incorporating additional information from pairwise connection data, Yu et al. [34] find potentially irregular communities of co-authors across various research communities. Thus there are many GAD applications other than image anomaly detection.

A discipline related to image anomaly detection is video anomaly detection, where many deep learning architectures have been applied. Sultani et al. [26] detect real-world anomalies such as burglary, fighting and vandalism in CCTV footage using deep learning methods. In a review, Kiran et al. [15] compare DGMs with different convolutional architectures for video anomaly detection applications. Recent work [3, 23, 32] illustrates the effectiveness of generative models for high-dimensional anomaly detection. Although existing works have applied deep generative models in image-related applications, they have not been formulated as a GAD problem. We leverage autoencoder-based DGMs to detect group anomalies in a variety of data experiments.

3 Preliminaries

In this section, we summarize state-of-the-art techniques for detecting group anomalies. We also assess the strengths and weaknesses of existing models compared with the proposed deep generative models.

3.1 Mixture of Gaussian Mixture (MGM) Models

A hierarchical generative approach, MGM, is proposed by Xiong et al. [30] for detecting group anomalies. The data generating process in MGM assumes that each group follows a Gaussian mixture, where more than one regular mixture proportion is possible. For example, an image is a distribution over visual features such as paws and whiskers from a cat image, and each image is categorized into possible regular behaviors or genres (e.g. dogs or cats). An anomalous group is then characterized by an irregular mixture of visual features, such as a cat and dog in a single image. MGM is useful for distinguishing multiple types of group behaviors; however, poor results are obtained when group observations do not follow the assumed generative process.
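To make the assumed data generating process concrete, the following is a simplified numpy sketch, not the parameterization or inference procedure of [30]; the genre probabilities, mixing weights and Gaussian parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified sketch of the assumed generative process: T regular genres, each a
# categorical distribution over L shared Gaussian components.
T, L, V, N_m = 2, 3, 2, 100                      # genres, mixtures, dimension, group size
genre_probs = np.array([0.5, 0.5])               # probability of each regular genre
mix_weights = rng.dirichlet(np.ones(L), size=T)  # per-genre mixing proportions
means = rng.normal(size=(L, V))                  # shared component means
covs = np.stack([np.eye(V) * 0.1] * L)           # shared component covariances

def sample_group(n):
    t = rng.choice(T, p=genre_probs)             # pick a regular group behavior (genre)
    comps = rng.choice(L, size=n, p=mix_weights[t])
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])

G_m = sample_group(N_m)                          # one group of N_m V-dimensional points
```

A group whose empirical mixture of components departs from every regular genre would then be flagged as a distribution-based group anomaly.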

3.2 One-Class Support Measure Machines (OCSMM)

Muandet et al. [20] propose OCSMM to maximize the margin separating the regular class of group behaviors from anomalous groups. Each group is first characterized by a mean embedding function, and the resulting group representations are then separated by a parameterized hyperplane. OCSMM is able to classify groups as exhibiting regular or anomalous behaviors; however, careful parameter selection is required in order to effectively detect group anomalies.
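As a rough illustration of the mean-embedding idea, the sketch below approximates OCSMM by running a one-class SVM on a precomputed group-level kernel whose entries are the averaged pairwise RBF kernel values between two groups' points (the inner product of their empirical kernel mean embeddings). This is not the authors' implementation, and the bandwidth and nu values are placeholders.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import OneClassSVM

def group_kernel(groups, gamma=1.0):
    """Group-level kernel: average pairwise RBF kernel between two groups' points,
    i.e. the inner product of their (empirical) kernel mean embeddings."""
    M = len(groups)
    K = np.zeros((M, M))
    for i in range(M):
        for j in range(i, M):
            K[i, j] = K[j, i] = rbf_kernel(groups[i], groups[j], gamma=gamma).mean()
    return K

# groups: list of (N_m x V) arrays; nu is the expected proportion of anomalous groups.
groups = [np.random.randn(50, 2) for _ in range(20)]
K = group_kernel(groups)
model = OneClassSVM(kernel="precomputed", nu=0.1).fit(K)
scores = model.decision_function(K)   # lower scores => more anomalous groups
```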

3.3 One-Class Support Vector Machines (OCSVM)

If each group distribution is reduced to and characterized by a single pointwise representation then OCSVM from Schölkopf et al. [24] can be applied to the GAD problem. OCSVM separates data points using a parameterized hyperplane, similar to OCSMM. OCSVM requires additional pre-processing to convert groups of visual features into pointwise observations. We follow the bag-of-features approach in Azhar et al. [1], where k-means is applied to visual image features and centroids are clustered into histogram intervals before implementing OCSVM. OCSVM is a popular pointwise anomaly detection method; however, it may not accurately capture group anomalies if the initial group characterizations are inadequate.
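A minimal sketch of this bag-of-features pipeline follows, using hypothetical descriptor arrays and placeholder values for k and nu; the exact histogram construction in [1] may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

# Hypothetical visual descriptors: one (N_m x V) array per image (e.g. from HOG/SIFT).
features = [np.random.randn(100, 64) for _ in range(200)]

# Build a visual vocabulary with k-means, then represent each image as a normalized
# histogram over the k centroids (bag of features).
k = 40
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(features))
histograms = np.stack(
    [np.bincount(kmeans.predict(f), minlength=k) / len(f) for f in features]
)

# OCSVM on the per-image histograms; nu is the expected proportion of anomalies.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(histograms)
scores = ocsvm.decision_function(histograms)   # lower scores => more anomalous images
```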

3.4 Deep Generative Models for Anomaly Detection

This section describes the mathematical background of deep generative models that will be applied for detecting group anomalies. The following notation considers data involving M groups where the mth group is denoted by \(G_m\).

Autoencoders: An autoencoder is trained to learn reconstructions that are close to its original input. The autoencoder consists of an encoder \(f_\phi \) that embeds the input into a latent or hidden representation and a decoder \(g_\psi \) that reconstructs the input from the hidden representation. The reconstruction loss of an autoencoder is defined as the squared error between the input \(G_{m}\) and output \(\hat{G}_{m}\), given by

$$\begin{aligned} { L_r(G_{m},\hat{G}_{m} )} = ||{ G_m - \hat{G}_m }||^2 \end{aligned}$$
(1)

Autoencoders leverage reconstruction error as an anomaly score where data points with significantly high errors are considered to be anomalies.
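A minimal Keras sketch of this scheme, under assumed layer sizes: a dense autoencoder is trained on flattened 3072-dimensional inputs and the squared reconstruction error of Eq. (1) is used as the anomaly score.

```python
import numpy as np
from tensorflow import keras

# Dense autoencoder on flattened 3072-dimensional inputs; hidden sizes are assumptions.
inputs = keras.Input(shape=(3072,))
h = keras.layers.Dense(256, activation="relu")(inputs)        # encoder f_phi
z = keras.layers.Dense(64, activation="relu")(h)              # latent representation
h = keras.layers.Dense(256, activation="relu")(z)             # decoder g_psi
outputs = keras.layers.Dense(3072, activation="sigmoid")(h)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(500, 3072).astype("float32")               # placeholder training groups
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

recon = autoencoder.predict(X, verbose=0)
scores = np.sum((X - recon) ** 2, axis=1)   # L_r of Eq. (1); large values flag anomalies
```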

Variational Autoencoders (VAE): The variational autoencoder (VAE) [14] is a generative analogue of the standard deterministic autoencoder. The VAE imposes a constraint while inferring the latent variable z: the latent codes produced by the encoder \(f_\phi \) are constrained to follow a prior distribution P(z). The core idea of the VAE is to approximate the intractable posterior \(P(z|G_m)\) with \(f_\phi (z|G_m)\) using variational inference (VI), which yields the objective

$$\begin{aligned} { L(G_m,\hat{G}_m)} = { L_r(G_m,\hat{G}_m)} + KL\big (f_\phi (z|G_m)\, || \, P(z)\big ) \end{aligned}$$
(2)

In order to optimize the Kullback–Leibler (KL) divergence, a simple reparameterization trick is applied: instead of the encoder producing a real-valued latent vector directly, it produces a vector of means \(\varvec{\mu }\) and a vector of standard deviations \(\varvec{\sigma }\). A new sample that replicates the data distribution \(P(G_m)\) can then be generated from the learned parameters (\(\varvec{\mu }\), \(\varvec{\sigma }\)) by passing the resulting latent representation z through the decoder \(g_\psi \) to reconstruct the original group observations. The VAE uses reconstruction probabilities [3] or reconstruction error to compute anomaly scores.
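The reparameterization trick and the closed-form KL term against a standard normal prior can be sketched as follows; these are generic VAE building blocks written in TensorFlow, not code from the paper.

```python
import tensorflow as tf

def reparameterize(mu, log_var):
    # z = mu + sigma * eps with eps ~ N(0, I); sampling stays differentiable in mu, sigma.
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # Closed-form KL between N(mu, sigma^2) and the standard normal prior P(z),
    # summed over latent dimensions and averaged over the minibatch.
    return tf.reduce_mean(
        -0.5 * tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=1)
    )

def vae_loss(x, x_hat, mu, log_var):
    # Total VAE objective of Eq. (2): reconstruction error plus the KL regularizer.
    recon = tf.reduce_mean(tf.reduce_sum(tf.square(x - x_hat), axis=1))
    return recon + kl_divergence(mu, log_var)
```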

Adversarial Autoencoders (AAE): One of the main limitations of the VAE is the lack of a closed-form analytical solution for the KL divergence term except for a few distributions. Adversarial autoencoders (AAE) [19] avoid the KL divergence by adopting adversarial learning, which allows a broader set of distributions to be used as priors for the latent code. The architecture consists of an autoencoder, with encoder \(f_\phi \) and decoder \(g_\psi \), and a discriminator, and training proceeds in two phases. First, a latent representation z is generated by the encoder (generator) network \(f_\phi (z|G_m)\) and the decoder reconstructs the input \(\hat{G}_m\) from z; the weights of the encoder \(f_\phi \) and decoder \(g_\psi \) are updated by backpropagating the reconstruction loss between \(\hat{G}_m\) and \(G_m\). Second, the discriminator receives z distributed as \(f_\phi (z|G_m)\) and \(z'\) sampled from the true prior P(z), and computes a score for each (D(z) and \(D(z')\)); the resulting loss is minimized by backpropagating through the discriminator to update its weights. The generator loss \(L_G\), which is combined with the reconstruction error to update the autoencoder, and the discriminator loss \(L_D\) are given by

$$\begin{aligned} {L_G} = \frac{1}{M'} \sum _{m=1}^{M'} \log D(z_m) \quad \hbox {and}\quad L_D = -\frac{1}{M'} \sum _{m=1}^{M'} \big [\log D(z'_m)+ \log (1- D(z_m)) \big ] \end{aligned}$$
(3)

where \(M'\) is the minibatch size while z represents the latent code generated by encoder and \(z'\) is a sample from the true prior P(z).
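The adversarial terms of Eq. (3) can be written directly from the discriminator outputs; a hedged TensorFlow sketch follows, with the reconstruction term and the optimizer updates omitted.

```python
import tensorflow as tf

EPS = 1e-8  # numerical stability inside the logarithms

def generator_loss(d_fake):
    # L_G of Eq. (3): the mean of log D(z_m) over the minibatch of encoder codes.
    # The encoder is updated so that this quantity increases (equivalently, its
    # negative is minimized), alongside the reconstruction loss.
    return tf.reduce_mean(tf.math.log(d_fake + EPS))

def discriminator_loss(d_real, d_fake):
    # L_D of Eq. (3): the discriminator should score prior samples z' highly
    # and encoder codes z lowly.
    return -tf.reduce_mean(tf.math.log(d_real + EPS) + tf.math.log(1.0 - d_fake + EPS))
```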

4 Problem and Model Formulation

Problem Definition: The following formulation follows the problem definition introduced in Toth and Chawla [27]. Suppose groups \(\mathcal {G} = \big \{ \mathbf{G}_m \big \} _{ m=1 }^M \) are observed, where M is the number of groups and the mth group has group size \(N_m\) with V-dimensional observations, that is \(\mathbf{G}_m \in \mathbb {R}^{N_m \times V} \). In GAD, the behavior or properties of the mth group are captured by a characterization function \(f: \mathbb {R}^{N_m \times V} \rightarrow \mathbb {R}^{D}\), where D is the dimensionality of the transformed feature space. After the characterization function is applied to a training dataset, group information is combined using an aggregation function \(g: \mathbb {R}^{M \times D} \rightarrow \mathbb {R}^{D}\). A group reference is composed of characterization and aggregation functions applied to the input groups:

$$\begin{aligned} \mathcal {G}^{(ref)} = g \Big [ \big \{ f(\mathbf{G}_{m} ) \big \}_{m=1}^M \Big ] \end{aligned}$$
(4)

Then a distance metric \(d(\cdot ,\cdot ) \ge 0 \) is applied to measure the deviation of a particular group from the group reference. The distance score \( d\Big (\mathcal {G}^{(ref)} , \mathbf{G}_{m} \Big )\) quantifies the deviance of the mth group from the expected group pattern, where larger values are associated with more anomalous groups. Group anomalies are effectively detected when the characterization function f and aggregation function g capture properties of the group distributions and appropriately combine this information into a group reference. For example, in a variational autoencoder setting, the encoder function f characterizes the mean and standard deviation of group distributions whereas the decoder function g reconstructs the original sample. Further descriptions of functions f and g for VAE and AAE are provided in Algorithm 1.

Algorithm 1
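To make Eq. (4) concrete, the following numpy sketch uses simple hand-crafted choices for f, g and d (per-group moment features, averaging over groups, and Euclidean distance); in the proposed approach these roles are instead played by the learned encoder and decoder of the VAE or AAE.

```python
import numpy as np

def f(G_m):                        # characterization: R^{N_m x V} -> R^{D}
    return np.concatenate([G_m.mean(axis=0), G_m.std(axis=0)])

def g(characterizations):          # aggregation: R^{M x D} -> R^{D}
    return np.mean(characterizations, axis=0)

def d(ref, h):                     # distance metric; larger => more anomalous
    return np.linalg.norm(ref - h)

groups = [np.random.randn(100, 2) for _ in range(50)]     # placeholder groups G_m
G_ref = g(np.stack([f(G_m) for G_m in groups]))           # group reference of Eq. (4)
scores = np.array([d(G_ref, f(G_m)) for G_m in groups])   # deviation of each group
```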

4.1 Training the Model

The variational and adversarial autoencoders are trained according to the objective functions given in Eqs. (2) and (3) respectively. The objective functions of the DGMs are optimized using the standard backpropagation algorithm. Given known group memberships, the AAE is fully trained on input groups to obtain a representative group reference \(\mathcal {G}^{(ref)}\) as described in Eq. (4). In the case of the VAE, \(\mathcal {G}^{(ref)}\) is obtained by drawing samples using the mean and standard deviation parameters inferred by the VAE, as illustrated in Algorithm 1.

4.2 Predicting with the Model

In order to identify group anomalies, the distance of each group from the group reference \(\mathcal {G}^{(ref)}\) is computed. The output scores are sorted in descending order, so that groups furthest from \(\mathcal {G}^{(ref)}\) are considered the most anomalous. One convenient property of DGMs is that the anomaly detector is inductive, i.e. it can generalize to unseen data points. One can interpret the model as learning a robust representation of group distributions. An appropriate characterization of groups results in more accurate detection, where any unseen observation either lies within the reference group manifold or deviates from the expected group pattern.
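A small scoring sketch consistent with this procedure: given a characterization function f and a group reference, groups (including unseen ones) are ranked by their distance in descending order. The names below are illustrative.

```python
import numpy as np

def rank_groups(groups, f, G_ref):
    # Distance of each group from the group reference, sorted in descending order
    # so that the most anomalous groups come first; applies equally to unseen groups.
    scores = np.array([np.linalg.norm(G_ref - f(G_m)) for G_m in groups])
    order = np.argsort(-scores)
    return order, scores[order]

# Example usage with an f and G_ref as sketched earlier (names are illustrative):
# order, sorted_scores = rank_groups(test_groups, f, G_ref)
# top10 = order[:10]   # candidate group anomalies
```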

5 Experimental Setup

In this section we describe our setup for demonstrating the empirical effectiveness of deep generative models over state-of-the-art methods on real-world data. Our primary focus is on non-trivial image datasets, although our method is applicable in any context where autoencoders are useful, e.g. speech or text.

5.1 Methods Compared

We compare our proposed technique using deep generative models (DGMs) with the following state-of-the-art methods for detecting group anomalies:

  • Mixture of Gaussian Mixture (MGM) Model, as per [30].

  • One-Class Support Measure Machines (OCSMM), as per [20].

  • One-Class Support Vector Machines (OCSVM), as per [24].

  • Variational Autoencoder (VAE) [9], as per Eq. (2).

  • Adversarial Autoencoder (AAE) [19], as per Eq. (3).

We used Keras [5] and TensorFlow [2] for the implementation of AAE and VAE, while MGM, OCSMM and OCSVM are applied using publicly available code.

5.2 Datasets

We compare all methods on the following datasets:

  • synthetic data follows Muandet et al. [20] where regular groups are generated by bivariate Gaussian distributions while anomalous groups have rotated covariance matrices.

  • cifar-10 [16] consists of \(32\times 32\) color images over 10 classes with 6000 images per class.

  • scene image data following Xiong et al. [31] where anomalous images are stitched from different scene categories.

  • Pixabay [11] is used to obtain tiger images as well as images of cats and dogs together. These images are rescaled to match the dimensions of the cat images in the cifar-10 dataset.

The real-world data experiments are previously illustrated in Fig. 1.

5.3 Parameter Selection

We now briefly discuss model and parameter selection for applying the various techniques in GAD applications. A pre-processing stage is required for the state-of-the-art GAD methods when dealing with images, where feature extraction methods such as SIFT [18] or HOG [7] represent images as collections of visual features. In MGM, the number of regular group behaviors T and the number of Gaussian mixtures L are selected using information criteria. The kernel bandwidth smoothing parameter in OCSMM [20] is chosen as \( \hbox {median}\big \{ || \mathbf{G}_{m,i} -\mathbf{G}_{l,j} ||^2 \big \} \) for all \(i,j \in \{1,2,\dots ,N_m \}\) and \(m,l \in \{1,2,\dots ,M\}\), where \( \mathbf{G}_{m,i} \) represents the ith random vector in the mth group. In addition, the parameter for the expected proportion of anomalies in OCSMM and OCSVM is set to the true value in the respective datasets.
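The median heuristic above can be computed directly; a short sketch follows (this materializes all pairwise distances, so it is only practical for modest sample sizes).

```python
import numpy as np
from scipy.spatial.distance import pdist

def median_bandwidth(groups):
    # Median of squared pairwise distances between all observations G_{m,i}
    # pooled across all groups (groups: list of N_m x V arrays).
    X = np.vstack(groups)
    return np.median(pdist(X, metric="sqeuclidean"))

# sigma2 = median_bandwidth(groups)   # used as the OCSMM smoothing parameter
```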

When applying VAE and AAE, four network parameters require careful selection: (a) the number of convolutional filters, (b) the filter size, (c) the strides of the convolution operation and (d) the activation function. We tuned additional hyper-parameters via grid search, including the number of hidden-layer nodes \(H \in \{3, 64, 128\}\) and the regularization \(\lambda \) within the range [0, 100]. The learning and drop-out rates and the regularization parameter \(\mu \) were sampled from a uniform distribution in the range [0.05, 0.1]. The embedding and initial weight matrices are all sampled from a uniform distribution within the range \([-1, 1]\).

6 Experimental Results

In this section, we explore a variety of GAD experiments. As anomaly detection is an unsupervised learning problem, model evaluation is highly challenging. We therefore employ anomaly injection, where known group anomalies are injected into real-world image datasets. The performance of DGMs is evaluated against state-of-the-art GAD methods using the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUROC). AUPRC is more appropriate than AUROC for binary classification on class-imbalanced datasets such as those in GAD applications [8]. In our experiments, a high AUPRC score indicates that regular groups are accurately identified, while AUROC accounts for the false positive rate of the detection methods.
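Both metrics are available in scikit-learn; a small sketch with placeholder labels and scores follows. Here anomalies are treated as the positive class; scoring regular groups as positive, as described above, would simply flip the labels.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = np.array([0] * 95 + [1] * 5)   # 1 marks an injected group anomaly
scores = np.random.rand(100)            # placeholder detector output (higher => more anomalous)

auprc = average_precision_score(y_true, scores)   # area under the precision-recall curve
auroc = roc_auc_score(y_true, scores)             # area under the ROC curve
print(f"AUPRC={auprc:.4f}  AUROC={auroc:.4f}")
```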

6.1 Synthetic Data: Rotated Gaussian Distribution

Firstly we generate synthetic data where regular behavior consists of bivariate Gaussian samples while anomalous groups have rotated covariance structures. More specifically, \(M=500\) regular group distributions have correlation \(\rho =0.7\) while 50 anomalous groups are generated with correlation \(\rho =-0.7\). The mean vectors are randomly sampled from uniform distributions while covariances of group distributions are given by

$$\begin{aligned} \varvec{\varSigma }_m=\left\{ \begin{array}{ll} \begin{pmatrix} 0.2 &{} 0.14 \\ 0.14 &{} 0.2 \end{pmatrix}, &{} m=1,2,\dots ,500 \\ \begin{pmatrix} 0.2 &{} -0.14 \\ -0.14 &{} 0.2 \end{pmatrix}, &{} m=501,502,\dots ,550 \end{array}\right. \end{aligned}$$
(5)

with each group having \(N_m = 1536\) observations. Since the proposed DGMs are configured with an architecture suited to \(32\times 32\) pixel images with 3 channels (red, green, blue), the dataset is constructed so that each group consists of bivariate observations with a total of \(1536 \times 2 = 3072\) values.
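A numpy sketch of this construction under Eq. (5) follows; the uniform range for the mean vectors is an assumption, since the exact range is not specified.

```python
import numpy as np

rng = np.random.default_rng(0)
N_m, V = 1536, 2                                      # group size and dimension (Eq. 5 setup)
cov_reg = np.array([[0.2, 0.14], [0.14, 0.2]])        # rho =  0.7 (regular groups)
cov_anom = np.array([[0.2, -0.14], [-0.14, 0.2]])     # rho = -0.7 (anomalous groups)

def make_group(cov):
    mean = rng.uniform(-1.0, 1.0, size=V)             # uniformly sampled mean (range assumed)
    return rng.multivariate_normal(mean, cov, size=N_m)

groups = [make_group(cov_reg) for _ in range(500)] + \
         [make_group(cov_anom) for _ in range(50)]    # M = 550 groups, 50 anomalous
labels = np.array([0] * 500 + [1] * 50)

# Each group is flattened to 1536 x 2 = 3072 values to match the 32 x 32 x 3 DGM input.
X = np.stack([G.reshape(-1) for G in groups])
```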

Parameter Settings: GAD methods are applied to the raw data with the following parameter settings. MGM is trained with \(T=1\) regular group type and \(L=3\) Gaussian mixtures. The expected proportion of group anomalies in OCSMM and OCSVM is set to the true proportion \(\nu = 50/M\), where \(M= 550\) or \(M= 5050\). In addition, OCSVM is applied by treating each Gaussian distribution as a single high-dimensional observation.

Results: Table 1 presents the results of detecting distribution-based group anomalies for different numbers of groups. For the smaller number of groups \(M= 550\), state-of-the-art GAD methods achieve higher performance than DGMs; however, for the larger training set with \(M= 5050\), deep generative models achieve the highest performance. AAE and VAE attain similar results on both synthetic datasets. This indicates that DGMs require a larger number of group observations in order to train an appropriate model.

Table 1. Results for detecting rotated Gaussian distributions in the synthetic datasets, where AAE and VAE attain poor detection results for the smaller dataset but achieve the highest performance (highlighted in gray) given a larger number of groups.

6.2 Detecting Tigers Within Cat Images

Firstly we explore the detection of point-based group anomalies (or image anomalies) by injecting 50 anomalous images of tigers among 5000 cat images. As illustrated in Fig. 1, the visual features of cats are considered regular behavior while the characteristics of tigers are anomalous. The goal is to correctly detect images of tigers (point-based group anomalies) in an unsupervised manner.

Parameter Settings: In this experiment, HOG extracts visual features as inputs for the state-of-the-art GAD methods. MGM is trained with \(T=1\) regular cat type and \(L=3\) Gaussian mixtures. The parameter in OCSMM and OCSVM is set to \(\nu = 50/5050\) and OCSVM is applied with k-means (\(k=40\)). Following the success of Batch Normalization [12] and Exponential Linear Units (elu) [6], we found that convolution+batch-normalization+elu layers provide a better representation of convolutional filters for the DGMs. Hence, in this experiment the autoencoder of both AAE and VAE adopts four (conv-batch-normalization-elu) layers in the encoder as well as in the decoder portion of the network. The network parameters (number of filters, filter size, strides) are chosen to be (16, 3, 1) for the first and second layers and (32, 3, 1) for the third and fourth layers of both the encoder and decoder. The middle hidden layer size is set equal to the rank \(K = 64\) and the model is trained using Adam [13]. The decoding layer uses a sigmoid function in order to capture the nonlinear characteristics of the latent representations produced by the hidden layer. Similar parameter settings are selected for the DGMs in subsequent experiments.
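A Keras sketch of one plausible encoder/decoder following this description is shown below, with four conv-batch-normalization-elu blocks using filter settings (16, 3, 1) and (32, 3, 1), a latent size of K = 64, the Adam optimizer and a sigmoid output layer; the exact layer ordering and the dense/reshape details are assumptions rather than the paper's architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_bn_elu(x, filters, kernel_size=3, strides=1):
    x = layers.Conv2D(filters, kernel_size, strides=strides, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("elu")(x)

# Encoder: four (conv-batch-normalization-elu) blocks with (filters, size, strides)
# of (16, 3, 1) for the first two and (32, 3, 1) for the last two, then a K = 64 bottleneck.
inputs = keras.Input(shape=(32, 32, 3))
x = conv_bn_elu(inputs, 16)
x = conv_bn_elu(x, 16)
x = conv_bn_elu(x, 32)
x = conv_bn_elu(x, 32)
x = layers.Flatten()(x)
z = layers.Dense(64, name="latent")(x)

# Decoder: mirror of the encoder with a sigmoid output layer.
x = layers.Dense(32 * 32 * 32, activation="elu")(z)
x = layers.Reshape((32, 32, 32))(x)
x = conv_bn_elu(x, 32)
x = conv_bn_elu(x, 32)
x = conv_bn_elu(x, 16)
x = conv_bn_elu(x, 16)
outputs = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer=keras.optimizers.Adam(), loss="mse")
```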

Results: From Table 2, AAE attains the highest AUROC value of 0.9906 while OCSMM achieves an AUPRC of 0.9941. MGM, OCSMM and OCSVM are associated with high AUPRC, as regular groups are correctly identified, but their low AUROC scores indicate poor detection of group anomalies. Figure 2(a) further investigates the top 10 anomalous images detected by these methods and shows that AAE correctly detects all images of tigers while OCSMM erroneously captures regular cat images.

6.3 Detecting Cats and Dogs

We further investigate GAD where images of a single cat or a single dog are considered regular groups while images containing both cats and dogs are distribution-based group anomalies. The constructed dataset consists of 5050 images: 2500 single cats, 2500 single dogs and 50 images of cats and dogs together. As previously illustrated in Fig. 1(B), our goal is to detect all images with irregular mixtures of cats and dogs in an unsupervised manner.

Parameter Settings: In this experiment, HOG extracts visual features as inputs for the GAD methods. MGM is trained with \(T=2\) regular group types and \(L=3\) Gaussian mixtures, while OCSVM is applied with k-means (\(k=30\)).

Results: Table 2 highlights (in gray) that AAE achieves the highest AUPRC and AUROC values. Other state-of-the-art GAD methods attain high AUPRC but their AUROC values are relatively low. From Fig. 2(a), the top 10 anomalous images containing both cats and dogs are correctly detected by AAE while OCSMM erroneously captures regular cat images. In fact, OCSMM incorrectly but consistently detects regular cats, with results similar to Subsect. 6.2.

6.4 Discovering Rotated Entities

We now explore the detection of distribution-based group anomalies with 5000 regular cat images and 50 images of rotated cats. As illustrated in Fig. 1(A), images of rotated cats are anomalous compared to regular images of cats. Our goal is to detect all rotated cats in an unsupervised manner.

Parameter Settings: In this experiment involving rotated entities, HOG is used to extract visual features because SIFT is rotation invariant and therefore cannot distinguish rotated images. MGM is trained with \(T=1\) regular cat type and \(L=3\) mixtures while OCSVM is applied with k-means (\(k=40\)).

Results: In Table 2, AAE and VAE achieve the highest AUROC, with AAE having slightly better detection results. MGM, OCSMM and OCSVM achieve high AUPRC but low AUROC. Figure 3 illustrates the top 10 most anomalous groups, where AAE correctly detects images containing rotated cats while MGM incorrectly identifies regular cats as anomalous.

Fig. 2. Top 10 anomalous groups are presented for AAE and the best GAD method respectively, where red boxes outlining images represent true group anomalies. AAE accurately detects the anomalous tigers injected into the cifar-10 dataset as well as the anomalous images containing both cats and dogs. On the other hand, OCSMM consistently but erroneously identifies similar cat images as the most anomalous images. (Color figure online)

6.5 Detecting Stitched Scene Images

A scene image dataset is also explored where 100 images originate from each of the categories "inside city", "mountain" and "coast". 66 group anomalies are injected, where each anomalous image is stitched from two scene categories. Illustrations are provided in Fig. 1(C), where a stitched image may contain half a coast and half a city street view. These anomalies are challenging to detect since they have the same local features as regular images, yet as a collection they are anomalous. Our objective is to detect stitched scene images in an unsupervised manner.

Parameter Settings: The state-of-the-art GAD methods utilize SIFT feature extraction in this experiment. MGM is trained with \(T=3\) regular scene types and \(L=4\) Gaussian mixtures while OCSVM is applied with k-means (\(k=10\)). The scene images are rescaled so that the same DGM architecture as in the previous experiments can be applied. The parameter settings for both AAE and VAE follow the setup described in Subsect. 6.2.

Results: In Table 2, OCSMM achieves the highest AUROC score while DGMs are less effective in detecting distribution-based group anomalies in this experiment. We attribute this to the fact that only \(M=366\) groups are available for training in the scene dataset, compared to \(M=5050\) groups in the previous experiments. Figure 3(b) displays the top 10 most anomalous images, where OCSMM achieves better detection results than AAE.

Fig. 3. Top 10 anomalous groups are presented, where red boxes outlining images represent true group anomalies in the given datasets. AAE performs well in (a), with number of groups \(M=5050\); however it does not effectively detect group anomalies in (b), where the number of groups is \(M=366\). MGM is unable to correctly detect any rotated cats while OCSMM is able to detect group anomalies in the scene dataset. (Color figure online)

6.6 Results Summary and Discussion

Table 2 summarizes the performance of the detection methods in our experiments. AAE usually achieves better results than VAE, as AAE has the advantage of better embedding coverage in the latent space [19]. AAE enforces a better mapping of input variables to the embedding space and hence captures more robust input features. Thus AAE achieves the highest detection performance in most experiments; however, poor results are obtained for the scene image data due to the limited number of groups. As demonstrated on our synthetic data and scene images, DGMs perform significantly worse on datasets with a smaller number of groups. Thus, given a sufficient number of group observations for training, DGMs are effective in detecting group anomalies, whereas poor detection occurs for a small number of groups.

Table 2. Summary of results for the various data experiments, where the first two rows contain the deep generative models and the remaining rows contain state-of-the-art GAD methods. The highest values of the performance metrics are shaded in gray.

Comparison of Training Times: We add a final remark about applying the proposed DGMs to GAD problems in terms of computational time and training efficiency. For example, including the time taken to calculate SIFT features on the small-scale scene dataset, MGM takes 42.8 s to train, OCSMM 3.74 min and OCSVM 27.9 s. In comparison, the training times for our AAE and VAE are 6.5 min and 8.5 min respectively. All the experiments involving DGMs were conducted on a MacBook Pro equipped with an Intel Core i7 at 2.2 GHz and 16 GB of RAM (DDR3, 1600 MHz). The ability to leverage recent advances in deep learning as part of our optimization (e.g. training models on a GPU) is a salient feature of our approach. We also note that while MGM and OCSMM are faster to train on small-scale datasets, they suffer from at least \(O(N^2)\) complexity in the total number of observations N. It is plausible that one could leverage recent advances in fast approximations of kernel methods [17] for OCSMM, and studying these would be of interest in future work.

7 Conclusion

Group anomaly detection is a challenging area of research, especially when dealing with complex group distributions such as image data. In order to detect group anomalies in various image applications, we formulate deep generative models (DGMs) for detecting distribution-based group anomalies. DGMs outperform state-of-the-art GAD techniques in many experiments involving both synthetic and real-world image datasets; however, DGMs require a large number of group observations for model training. To the best of our knowledge, this is the first paper to formulate and apply DGMs to the problem of detecting group anomalies. A future direction for research involves using recurrent neural networks to detect temporal changes in groups of time series.