1 Introduction

With the boom of e-commerce websites such as Amazon and eBay, recommender systems have increasingly become a useful tool for handling massive amounts of information. Among traditional recommender systems [1, 18, 29, 31, 32], collaborative filtering (CF) achieves better results than other methods and is widely applied in product recommendation [32] and movie recommendation [18, 31]. However, due to the rapidly increasing number of registered users and products, the cold start problem for users (new users of a recommender system with few historical records) and the sparsity of datasets have become increasingly intractable [1].

Fortunately, with the proliferation of social networks such as Facebook, Twitter and Pinterest, users often prefer to share their images, ratings and reviews on the Internet. In addition, more and more users build circles of interest and show their relationships explicitly. As shown in Fig. 1, users post their favorite images of clothing items on Pinterest and get reviews from other users. At the same time, users can establish relationships with others and keep interacting with them. Such relationships among users are called social circles. Rich social media provide valuable clues for solving the problems of traditional recommendation systems, and new types of recommendation systems, such as video [37], travel [7, 15] and local service [28] recommendation, have emerged.

Fig. 1 The illustration of social circles: user \(u_{i}\) follows other users \(v_{1}\), \(v_{2}\), \(v_{3}\) and \(v_{4}\). It conforms to the real social situation: some of her social friends share the same clothing items with her, like \(v_{4}\). \(S_{u_{i},v_{1}}, S_{u_{i},v_{2}} , S_{u_{i},v_{3}}\) and \(S_{u_{i},v_{4}}\) are the interpersonal influence values of \(u_{i}\) to other friends, respectively

Generally, social-based recommendation systems [4, 5, 7, 12, 14, 15, 24, 27, 28, 37, 41] are devoted to mining the interpersonal trust relationships between users from complex social networks. A method to infer trust circles among users’ friends is proposed in [41]. Personal interest is demonstrated to be a significant factor for recommendation systems in [14]. Three factors, namely personal interest, interpersonal interest similarity and interpersonal influence, are considered for local service recommendation in [28], which indicates that leveraging user social circles can effectively improve the performance of recommendation systems.

In this paper, we focus on personalized clothing recommendation. Although user social circles offer a new perspective for boosting the performance of recommendation systems, they are still not enough for clothing recommendation. Compared with movie or local service recommendation, clothing recommendation has its own characteristics. First, visual content is more important for clothing items than for other products. Users often buy clothing that meets their visual preferences. Second, there is fashion style consistency between clothing items, which has a huge impact on user decision making. Fashion style consistency means that matched pairs of clothing items visually belong to a similar style. As shown in Fig. 2, each image consists of two clothing items with the same style. For instance, the tops and bottoms in the left column of Fig. 2 are sports style. If the style of a T-shirt is consistent with the trousers a user has bought, the user is likely to buy it. Therefore, fashion style consistency between clothing items is a critical factor for clothing recommendation.

Fig. 2 The illustration of fashion style consistency: clothing items in the first column are sports style, those in the middle column are street style and those in the last column are casual style

Our work is a practical real-world application, which is illustrated in Fig. 3. On a fashion website such as Pinterest or Mogujie, based on a user’s favorite items, the recommendation system recommends a list of clothing items according to her social information and the matched clothing pairs previously shared by fashion icons. This task raises some challenging problems that have not been well studied yet. First, there are large visual differences between various kinds of clothes within a similar fashion style. As shown in Fig. 2, although the tops and bottoms of the middle column are visually different, users think they belong to the same style and will buy them. How to measure fashion style consistency between clothing items of different categories is a novel topic, which has not been explored in recommendation systems. Second, the fusion of user social circles and fashion style consistency is an interesting issue for clothing recommendation.

Fig. 3 The illustration of our task: make a recommendation of clothing based on fashion style consistency between clothing items and user’s social circles

To investigate fashion style consistency across different clothing categories, matched pairs of clothing items are collected from a social fashion website. A novel learning model based on a Siamese Convolutional Neural Network (SCNN) is introduced to learn a feature transformation from clothing images to a latent feature space that expresses fashion style consistency. In the training phase, a new sampling strategy based on deep features is explored to ensure that the training dataset is reasonable. This model is exploited to analyze fashion style consistency from a visual perspective. To implement personalized clothing recommendation, interpersonal influence, personal interest, interpersonal interest similarity and fashion style consistency are integrated into a unified framework based on probabilistic matrix factorization (PMF). Each factor, acting as a constraint, is used to refine the representation of users and clothing items in the latent feature space.

The main contributions of this paper are summarized as follows:

  • To the best of our knowledge, this is the first work to explore clothing recommendation by simultaneously considering user social circles and fashion style consistency between clothing items, which meets the urgent needs of social fashion websites.

  • A novel learning model based on SCNN with a new sampling strategy is proposed to measure the fashion style consistency between clothing items, which can transform clothing items of different categories to a uniform feature space.

  • A unified framework is constructed for clothing recommendation, which considers four factors: interpersonal influence, personal interest, interpersonal interest similarity and fashion style consistency.

  • To evaluate our method, two datasets are collected from a real social fashion website. One is called the Fashion Match Dataset, which contains 37343 pairs of clothing items shared by fashion icons. The other dataset is used for personalized clothing recommendation, which consists of 1031 users, 30050 rating records of clothing items and their social relationships. Extensive experiments demonstrate that the proposed model achieves better performance than state-of-the-art methods.

The rest of this paper is organized as follows: related work is reviewed in Section 2. The model of personalized clothing recommendation is elaborated in Section 3. Experiments and results are presented in Section 4. Finally, this paper is concluded with a summary in Section 5.

2 Related work

The branches of research most related to our work are recommender systems based on collaborative filtering (CF) and clothing research. Each is summarized in the following subsections.

2.1 Recommender system based on collaborative filtering

Collaborative Filtering (CF) is one of the most popular approaches to building recommender systems. It is first utilized in the GroupLens system [29], which recommends items based on users with similar interests. The drawback of user-based CF is that the amount of computation increases dramatically with the number of users. Item-based CF is explored to address this issue, which uses the relationships between different items to make recommendations [32]. However, these traditional CF algorithms can handle neither very large datasets nor users who have very few ratings. These are called the problems of scalability and data sparsity, which are recognized as the two most crucial challenges of recommendation systems [1]. In the Netflix Prize [18], a multifaceted collaborative filtering method combining matrix factorization and neighborhood models greatly improves the accuracy, which has made matrix factorization techniques widely applied in recommendation systems [18]. A Probabilistic Matrix Factorization (PMF) model is presented to handle the large Netflix dataset, in which the capacity of the model can be controlled automatically [31]. Though the scalability of recommendation systems can be addressed by PMF, data sparsity is still an unsolved problem for traditional recommendation systems.

With the popularity of social networks, more users prefer to share their experiences on the Internet, such as ratings, reviews, images and locations, which makes it possible to address the problems of cold start and data sparsity in recommendation systems. Following the assumption that the social circles of a person affect his/her behaviors, a factor analysis based PMF is proposed to solve the problem of data sparsity by employing both users’ social circles and rating records [24]. Users’ tastes and the favors of their trusted friends are naturally fused into a novel probabilistic factor analysis framework for recommendation in [24]. A trust model between users is built in [12], in which trust is measured by a random walk model. A novel problem of item-level social influence prediction is proposed in [4, 5]. The model proposed in [5] incorporates rich prior knowledge on both the user dimension and the item dimension, while this problem is transformed into two information retrieval scenarios, user ranking and item ranking, in [4]. The concept of inferred category-specific circles is introduced to improve recommendation accuracy in [41], which divides the social circles of users into several subsets according to the item category. Motivated by the fact that social contextual factors are significant for users on social networks, a social recommendation approach based on psychology and sociology studies is investigated in [14], which considers two crucial factors: interpersonal influence derived from social networks and personal interest derived from social context. A joint social-content recommendation framework is designed in [37] to suggest videos which can be imported or re-shared on the online social network. A personalized tagging approach for photos is proposed in [27]. It can recommend tags for uploaded photos according to users’ tagging habits. Compared with [14], personal interest, interpersonal interest similarity and interpersonal influence are fused into a unified personalized recommendation model based on PMF in [28]. An Author Topic-based Collaborative Filtering (ATCF) method is proposed to facilitate comprehensive Points of Interest (POI) recommendation [15]. User preferred topics are extracted from the geo-tag constrained textual descriptions of photos via the author topic model instead of only from the geo-tags (GPS locations). Three types of information, i.e., POI properties, user interests and sentiment indications, are modeled under a unified POI recommendation framework in [7] with consideration of their relationships.

From the analysis of the aforementioned works, we can see that user social circles enrich the information of users, so that the performance can be improved based on social networks. Unfortunately, clothing recommendation considering user social circles is not well explored, which motivates this work.

2.2 Clothing research

Recently, many works have begun to study related issues of clothing, such as clothing annotation and attribute learning [2, 3, 26, 34], clothing retrieval [6, 10, 21, 23, 36], clothing segmentation and parsing [16, 19, 22, 38, 39, 42], fashion analysis of clothing [8, 17, 33, 40] and clothing recommendation [9, 11, 20, 25, 35].

Clothing annotation and attribute learning are treated as classification problems in [2, 3, 26], which model the relationship between well-defined attributes and low-level visual features using machine learning techniques. In contrast, clothing annotation is regarded as a data mining problem in [34], which uses the tags of visual neighbors to label a query clothing image.

Clothing retrieval attempts to find similar clothing items for a query clothing item. Earlier works rely on handcrafted features, such as color histogram [36] and visual phrases [6]. A practical problem, called cross-scenario clothing retrieval, is first introduced in [21]: given a daily human photo captured in general environment, finding similar clothing images in online shops. There are large discrepancies between daily scenarios and online shopping scenarios. To address these issues of cross domain clothing retrieval, several methods are explored including transfer learning techniques [21], Dual Attribute-aware Ranking Network (DARN) [10] and Fashionnet [23].

Clothing segmentation and parsing predict pixel-wise labels for clothing items, which provides the foundation for other tasks, such as clothing recognition. Several solutions have been proposed [16, 19, 22, 38, 39, 42]. The problem of clothing segmentation is formulated as cross-scenario retrieval in [16]. A novel clothing cosegmentation algorithm is proposed to improve the accuracy of clothing extraction by exploiting the properties of multiple clothing images with the same apparel [42]. An elegant framework is proposed to iteratively refine the human detection results and fashion parsing [38]. Different from [38], the parsing result is improved with the help of fashion images of similar styles in [39]. Without pixel-level labels in the training phase, fashion image parsing with weak labels is addressed in [22]. It combines human pose estimation, MRF-based color and category inference, and superpixel-level category classifier learning. A well-engineered framework is constructed for jointly parsing a batch of clothing images given image-level tags [19], which consists of two phases: image co-segmentation and region co-labeling.

Due to the rise of fashion-focused online communities, such as Polyvore and Chictopia, there are some works on fashion analysis and clothing recommendation. The relationship between personal style and clothing items is examined through an online competitive style rating game called Hipster Wars [17]. Similar to [17], visual popularity and perception of fashionability are discussed in [33, 40], respectively. The visual evolution of fashion trends is modeled in [8], which can simultaneously discover the visual appearance of fashion products as well as their evolution over time.

Only a handful of works explore the problem of clothing recommendation. An occasion-oriented clothing recommendation system [20] is constructed by considering two key criteria: wear properly and wear aesthetically. Two classes of recommenders are proposed in [11], namely deterministic and stochastic fashion recommenders, which mainly focus on color modeling for recommendation. The task of fashion matching is considered in [9, 25, 35], which explore logistic regression [25], convolutional neural networks [35] and tensor factorization [9] to measure the similarity of different clothing items in a latent feature space.

From this summary of related works, it is clear that the study of clothing has drawn much attention from researchers. Most existing works on clothing recommendation are based on the visual analysis of clothing images, which cannot meet the requirements of active users on social fashion websites. Visual content and social circles are key factors for clothing items and social websites, respectively. Therefore, these two factors should be leveraged for social clothing recommendation. Unfortunately, to the best of our knowledge, there is no work integrating social circles and fashion style consistency so far.

3 Personalized clothing recommendation

3.1 Framework

The framework of the proposed personalized clothing recommendation is illustrated in Fig. 4. The symbols and notations of this paper are listed in Table 1. On social fashion websites, there are three kinds of data: (1) user social circles, showing the relationships among users; (2) user-clothing records, indicating the preference of users for clothing items; and (3) fashion matched pairs, representing fashion style consistency between different clothing items. An influence matrix between users, denoted by S, is constructed by mining user social circles, and indicates the degree of trust of one user in another. Note that it is asymmetric, because the relationship between users is not always bi-directional. Fashion matched pairs shared by fashion icons provide opportunities to discover the implied associations among different clothing items, which are called fashion style consistency. It is modeled by SCNN in this paper. This implied association is denoted as matrix Y. Each value of Y indicates the similarity between two different clothing items in the fashion style space. The user-clothing rating records are expressed with matrix R. Each value of R denotes the rating of a user for a clothing item, which is a binary value of 0 or 1. User interests can be projected to the fashion style space by mining R and Y. The interest similarity between users is then calculated, which is denoted as W. Note that W is symmetric. Meanwhile, the relevance matrix Q of user interests to clothing items is obtained. The values of W and Q are within the range of [0, 1]. Finally, a novel PMF framework is proposed to integrate user-user social influence S, user-user interest similarity W, user-clothing similarity Q and clothing-clothing fashion style consistency Y to recommend appropriate clothing items to users, which is formulated as an optimization problem.

Fig. 4 The framework of the proposed personalized clothing recommendation

Table 1 Symbols and their descriptions utilized in this paper

Similar to existing works [14, 28, 41], given the observed matrices R, S, W, Q and Y, our task is to represent users and clothing items with K-dimensional vectors in a latent space, which is implemented by PMF. The unknown ratings can be directly predicted from the latent vectors of users and clothing items, which is expressed as follows:

$$ \hat{R}=r+UP^{T} $$
(1)

where r is an offset value, which is set as the average rating value in the training data.
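As an illustration of (1), the following Python sketch computes the predicted rating matrix from latent factors. The toy sizes, the random factors, the hypothetical training ratings and the function name `predict_ratings` are assumptions made for this example only; they are not taken from the original implementation.

```python
import numpy as np

def predict_ratings(U, P, r):
    """Predicted rating matrix R_hat = r + U P^T, as in Eq. (1)."""
    return r + U @ P.T

# Toy sizes (assumed): 4 users, 5 clothing items, K = 3 latent dimensions.
rng = np.random.default_rng(0)
U = rng.normal(0.0, 0.1, size=(4, 3))    # one latent row vector per user
P = rng.normal(0.0, 0.1, size=(5, 3))    # one latent row vector per clothing item
observed = np.array([1, 0, 1, 1, 0, 1])  # hypothetical binary training ratings
r = observed.mean()                      # offset: average training rating
R_hat = predict_ratings(U, P, r)         # one predicted rating per (user, item)
```

With all-zero latent vectors the prediction reduces to the offset r for every user and item, which is why r is set to the average training rating.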

The following subsections describe how the observed matrices R, S, W, Q and Y are calculated and how the model is trained.

3.2 Fashion style consistency modeling

Given a pair of clothing items from different categories, our goal is to measure their style consistency. In other words, our model maps clothing items from different categories to a specific style space that captures style compatibility, which is implemented by SCNN [35]. Since SCNN [35] learns this style space from the training dataset, the sampling method has an impact on the performance. Therefore, we first describe our novel sampling strategy for generating the training dataset and then introduce the training of SCNN [35] to learn the specific style space.

3.2.1 Training dataset generation

The clothing items are simply divided into two categories: tops and bottoms. The whole dataset is split into training, validation and test datasets in a ratio of 7:1:2. The balance of positive and negative examples in the three datasets is considered in the sampling phase. Note that only positive examples can be obtained from social fashion websites, where each positive example contains two compatible clothing items. Negative examples therefore have to be constructed by ourselves according to different strategies. In this paper, three strategies are considered:

Naive:

All positive and negative examples are randomly selected from the whole dataset. Positive and negative examples can be from the same category or different categories. This sampling method serves only as a baseline in [35].

Strategic:

This sampling approach is introduced in [35]. Its motivation is as follows: clothing items from the same category are mostly visually similar to each other, and clothing items from different categories tend to be visually dissimilar. Moreover, convolutional neural networks often map visually similar clothing items close to each other in the output feature space. Therefore, to learn the concept of style across categories, matched clothing items from different categories should be located close together in the feature space. All positive examples should be from different categories, while negative examples can be either from different categories or from the same category. Note that negative examples are also randomly selected from the whole dataset.

Ours:

Positive examples in our dataset are professionally selected and recommended by fashion icons or experts. More importantly, the association of positive examples in our dataset is based entirely on visual compatibility, while that of the dataset from [35] may be based on price, function or other factors of clothing items. Therefore, we only need to select negative examples. Random sampling as in [35] is unreasonable, since there are a large number of visually similar clothing items in the dataset. For example, suppose a positive example contains a T-shirt and a pair of jeans, and a clothing item is to be selected for this T-shirt to construct a negative example. With random sampling as in [35], another visually similar pair of jeans may well be sampled. Then there is almost no difference between the positive and negative examples, which degrades the performance of SCNN [35]. Therefore, a novel sampling strategy based on deep features is proposed in our work. For each positive example \((t_{i}, b_{i})\), negative examples are constructed for \(t_{i}\) and \(b_{i}\) separately. We take \(t_{i}\) as an example. The details are as follows:

  i. The training dataset is divided into two categories, the tops and bottoms sets, denoted as T and B, respectively.

  ii. Visual neighbors of \(t_{i}\), denoted as \(T_{s}\), are obtained by searching for the visually similar items of \(t_{i}\) in T.

  iii. For each \(t_{j} \in T_{s}\), there exists a positive example \((t_{j}, b_{j})\). Accordingly, the visual neighbors of all such \(b_{j}\) form \(B_{s}\).

  iv. Finally, negative examples for \(t_{i}\) are randomly selected from the candidate set \(C = B \setminus B_{s}\).

Here, the visual features used in this section are the deep features of [25].
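The sampling steps for a top \(t_{i}\) can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that the text does not specify: visual neighbors are taken as the k nearest items in Euclidean distance, \(t_{i}\) counts as its own neighbor, row j of each feature matrix holds the deep feature of the j-th positive pair, and the function names `nearest` and `sample_negatives` as well as the random toy features are hypothetical.

```python
import numpy as np

def nearest(features, query, k):
    """Indices of the k rows of `features` closest to `query` (Euclidean)."""
    dist = np.linalg.norm(features - query, axis=1)
    return np.argsort(dist)[:k]

def sample_negatives(top_feats, bottom_feats, i, k, n_neg, rng):
    """Sample bottoms that form negative pairs with top i.

    Row j of `top_feats` and row j of `bottom_feats` are assumed to be the
    deep features of the j-th positive pair (t_j, b_j).
    """
    # Step ii: visual neighbors T_s of t_i among all tops (t_i itself included).
    T_s = nearest(top_feats, top_feats[i], k)
    # Step iii: B_s collects the visual neighbors of every matched bottom b_j.
    B_s = set()
    for j in T_s:
        B_s.update(nearest(bottom_feats, bottom_feats[j], k).tolist())
    # Step iv: candidate set C = B \ B_s, then uniform random sampling.
    C = np.array([b for b in range(len(bottom_feats)) if b not in B_s], dtype=int)
    return rng.choice(C, size=min(n_neg, len(C)), replace=False)

rng = np.random.default_rng(0)
top_feats = rng.normal(size=(50, 8))      # toy stand-ins for deep features of tops
bottom_feats = rng.normal(size=(50, 8))   # toy stand-ins for deep features of bottoms
negatives = sample_negatives(top_feats, bottom_feats, i=0, k=3, n_neg=5, rng=rng)
```

Because the matched bottom \(b_{i}\) and its visual neighbors fall into \(B_{s}\), none of them can be drawn as a negative partner for \(t_{i}\), which is the property the strategy is designed to provide.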

3.2.2 SCNN training

To train the SCNN, we follow the same training procedure and network parameters as [35]. In this work, we use GoogLeNet [30] pretrained on ILSVRC2012 and augment the network with a 256-dimensional fully connected layer. The network is then fine-tuned on about 500 thousand positive and negative examples in a ratio of 1:16. Training takes approximately 48 hours on a server with an Intel Core i7 5930K CPU and four NVIDIA GTX TITAN-X 12GB GPUs using the Caffe library [13].

The scores between tops and bottoms of 30050 clothing items are calculated by the trained SCNN. The matrix Y is then constructed, which can express the fashion style consistency between a top and a bottom.

3.3 User modeling

3.3.1 User interest description

Clothing items are mapped to a specific style space in Section 3.2. User interests are then profiled in this style space according to user rating records. User interest description is measured as follows:

$$ D_{u}= \frac{1}{\left| H_{u}\right|}\sum\limits_{i\in H_{u}}^{}D_{i} $$
(2)

where \(D_{u}\) and \(D_{i}\) represent the feature vectors of user u and clothing item i in the fashion style space, and \(H_{u}\) is the set of clothing items rated by user u.
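Equation (2) is simply the mean of the style-space vectors of the items a user has rated. A minimal Python sketch is given below; the two-dimensional toy features and the function name `user_interest` are assumptions used for illustration only.

```python
import numpy as np

def user_interest(item_feats, rated_items):
    """D_u: mean style-space vector of the items in H_u, as in Eq. (2)."""
    return item_feats[rated_items].mean(axis=0)

# Toy style-space vectors D_i for three clothing items (assumed values).
item_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
H_u = [0, 2]                           # items rated by a hypothetical user u
D_u = user_interest(item_feats, H_u)   # mean of rows 0 and 2
```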

3.3.2 Personal interest modeling

Users with many rating records usually have unique tastes in clothing items and are not easily influenced by their friends. Therefore, personal interest is a significant factor affecting their decision-making, which is measured by the similarity between user interest vectors and clothing item vectors. In our work, this similarity matrix is denoted by Q, which is calculated as the cosine similarity between \(D_{u}\) and \(D_{i}\):

$$ Q_{u,i}= cos\left( D_{u},D_{i}\right) $$
(3)

In effect, the personal interest Q can be viewed as the prior rating value of a user for a clothing item. Thus, it can also make recommendation systems more robust by reducing the impact of malicious ratings.
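A possible way to compute the whole matrix Q of (3) at once is sketched below in Python. The helper name `cosine_matrix`, the small constant `eps` guarding against zero vectors and the toy vectors are assumptions of this sketch. Note also that cosine values lie in [0, 1] only for non-negative feature vectors; how the values are brought into that range otherwise is not stated in the text.

```python
import numpy as np

def cosine_matrix(A, B, eps=1e-12):
    """Pairwise cosine similarity between the rows of A and the rows of B."""
    A_unit = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return A_unit @ B_unit.T

# Toy interest vectors D_u (rows: users) and item vectors D_i (rows: items).
D_users = np.array([[1.0, 0.0],
                    [1.0, 1.0]])
D_items = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
Q = cosine_matrix(D_users, D_items)   # Q[u, i] = cos(D_u, D_i), Eq. (3)
```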

3.3.3 Interpersonal interest similarity

The motivation for taking interpersonal interest similarity into consideration is that a user’s latent feature vector should be similar to his/her friends’ latent feature vectors. Here the interest similarity value between user u and user v is denoted by \(W_{u,v}\), which is measured as the cosine similarity between the feature vectors \(D_{u}\) and \(D_{v}\):

$$ W_{u,v}= cos\left( D_{u},D_{v}\right) $$
(4)

Note that each row of W is normalized.
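The computation of W can be sketched in Python as follows. The text does not say how the rows are normalized, so this sketch assumes that W is restricted to each user's friends and that every non-empty row is scaled to sum to 1; both choices, the function name `interest_similarity` and the toy data are assumptions.

```python
import numpy as np

def interest_similarity(D, friends):
    """Row-normalized W built from user interest vectors D, following Eq. (4).

    `friends[u]` lists the users that u follows. Restricting W to these
    entries and scaling each row to sum to 1 are assumptions of this sketch.
    """
    unit = D / np.linalg.norm(D, axis=1, keepdims=True)
    cos = unit @ unit.T                  # cos[u, v] = cos(D_u, D_v)
    W = np.zeros_like(cos)
    for u, F_u in enumerate(friends):
        if F_u:
            W[u, F_u] = cos[u, F_u]
            total = W[u].sum()
            if total > 0:
                W[u] /= total            # normalize the row of user u
    return W

D = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])               # toy interest vectors for three users
friends = [[1, 2], [0], []]              # hypothetical follow lists
W = interest_similarity(D, friends)
```

In this toy example user 0 follows users 1 and 2 but shares no interest with user 2, so the whole weight of the row goes to user 1.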

3.3.4 Interpersonal influence mining

The interpersonal influence between users is represented by the matrix S. Since we only have one category, CircleCon2a in [41] is utilized in our model to calculate the interpersonal influence, as illustrated in Fig. 1. The influence value of v on u is denoted by a positive value \(S_{u,v}\), which is calculated as follows:

$$ S_{u,v}= \left| H_{v}\right| $$
(5)

where \(H_{v}\) is the set of clothing items rated by v and \(|H_{v}|\) is the total number of clothing items in \(H_{v}\). The interpersonal influence value is normalized over the friend set \(F_{u}\) of user u as follows:

$$ S_{u,v}^{*}=S_{u,v}/\sum\limits_{v\in F_{u}}S_{u,v} $$
(6)

Finally, the matrix S consists of \(S_{u,v}^{*}\).
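Equations (5) and (6) can be illustrated with the short Python sketch below. The list-based inputs, the function name `influence_matrix` and the toy numbers are assumptions; \(F_{u}\) is read here as the list of users followed by u.

```python
import numpy as np

def influence_matrix(friends, n_rated):
    """S*_{u,v} = |H_v| / sum_{w in F_u} |H_w| for v in F_u, Eqs. (5)-(6)."""
    n_users = len(friends)
    S = np.zeros((n_users, n_users))
    for u, F_u in enumerate(friends):
        total = sum(n_rated[v] for v in F_u)
        if total > 0:
            for v in F_u:
                S[u, v] = n_rated[v] / total
    return S

friends = [[1, 2], [2], []]   # F_u: users followed by u (toy data)
n_rated = [4, 6, 2]           # |H_v|: number of items rated by each user
S = influence_matrix(friends, n_rated)
```

Each row of S that belongs to a user with at least one friend sums to 1, and S is asymmetric in general, in line with the unidirectional relationships described in Section 3.1.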

3.4 Personalized clothing recommendation

The personalized clothing recommendation model contains four factors: fashion style consistency \(Y_{i,j}\), which indicates the association between clothing items i and j; personal interest \(Q_{u,i}\), which indicates what clothing items a user would be interested in; interpersonal interest similarity \(W_{u,v}\), which indicates whose interests are similar to the user’s; and interpersonal influence \(S_{u,v}\), which indicates whom the user trusts. Given these observed constraints, the objective function of our model is formulated as follows:

$$ \begin{array}{llllll} & \varphi \left( R,U,P,S,W,Q,Y\right) \\ & = \frac{1}{2}\sum_{u,i}^{} \left( R_{u,i}-\hat{R}_{u,i}\right)^{2}+\frac{\lambda }{2}\left( \left\| U\right\|_{F}^{2}+\left\| P\right\|_{F}^{2}\right) \\ & +\frac{\alpha }{2}\sum_{u}^{} \left( \left( U_{u}-\sum_{v}^{} S_{u,v}U_{v}\right) \left( U_{u}-\sum_{v}^{} S_{u,v}U_{v}\right)^{T}\right) \\& +\frac{\beta }{2}\sum_{u}^{} \left( \left( U_{u}-\sum_{v}^{} W_{u,v}U_{v}\right) \left( U_{u}-\sum_{v}^{} W_{u,v}U_{v}\right)^{T}\right) \\& +\frac{\gamma }{2}\sum_{i}^{} \left( \left( P_{i}-\sum_{j}^{} Y_{i,j}P_{j}\right) \left( P_{i}-\sum_{j}^{} Y_{i,j}P_{j}\right)^{T}\right) \\& +\frac{\eta }{2}\sum_{u,i}^{}\left| H_{u}\right|\left( Q_{u,i}-U_{u}{P_{i}^{T}}\right)^{2} \end{array} $$
(7)

The descriptions of the symbols in (7) are listed in Table 1. The factor of interpersonal influence is enforced by the third term, which indicates that the latent feature vector of user u should be affected by his/her friends’ latent feature vectors with weights \(S_{u,v}\). The factor of interpersonal interest similarity is enforced by the fourth term, which means that the latent feature vector of user u should be similar to his/her friends’ latent feature vectors with weights \(W_{u,v}\). The factor of fashion style consistency is enforced by the fifth term, which means that the latent feature vector of clothing item i should be similar to the latent feature vectors of clothing items with the same fashion style. The factor of personal interest is enforced by the last term, which means that the relevance of the latent feature vectors of user u and clothing item i should be subject to the prior value \(Q_{u,i}\).
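To make the six terms of (7) concrete, the following Python sketch evaluates the objective in matrix form. It is a sketch under assumptions rather than the original implementation: the rating term and the personal interest term are restricted to observed entries through a 0/1 indicator matrix I, which is the reading suggested by the indicator \(I_{u,i}^{R}\) in the gradients (8) and (9), and the function and argument names are hypothetical.

```python
import numpy as np

def objective(R, I, U, P, S, W, Y, Q, H_size, r, lam, alpha, beta, gamma, eta):
    """Value of Eq. (7); I is the 0/1 indicator of observed ratings (assumed)."""
    UPt = U @ P.T
    fit = 0.5 * np.sum(I * (R - (r + UPt)) ** 2)            # rating error
    reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(P ** 2))     # Frobenius norms
    influence = 0.5 * alpha * np.sum((U - S @ U) ** 2)      # third term
    similarity = 0.5 * beta * np.sum((U - W @ U) ** 2)      # fourth term
    style = 0.5 * gamma * np.sum((P - Y @ P) ** 2)          # fifth term
    # Last term: personal interest prior, weighted by |H_u| for each user.
    interest = 0.5 * eta * np.sum(I * H_size[:, None] * (Q - UPt) ** 2)
    return fit + reg + influence + similarity + style + interest
```

Here `S @ U` stacks the weighted sums \(\sum_{v} S_{u,v}U_{v}\) for all users as rows, and `Y @ P` does the same for clothing items, so each squared norm in (7) becomes a sum of squared matrix entries.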

Note that the objective functions of BaseMF [31], CircleMF [41] and ContextMF [14] consist of the first two, three and four terms of (7), respectively. Compared with ContextMF, the objective function of CombinedMF [28] adds the last term. Furthermore, CircleMF, ContextMF and CombinedMF only add constraints to refine the results of PMF from the user’s perspective. We consider the refinement of PMF from both the user’s and the item’s perspectives.

Algorithm 1

3.5 Model training

The objective function can be minimized by the gradient descent approach, which yields the latent feature vectors U and P. The gradients with respect to \(U_{u}\) and \(P_{i}\) are expressed as (8) and (9), respectively.

$$ \begin{array}{llllll} \frac{\partial \varphi }{\partial U_{u}}= & \sum_{i\in H_{u}}^{}I_{u,i}^{R}\left( \hat{R}_{u,i}-R_{u,i}\right)P_{i}+\lambda U_{u} \\ & +\alpha \left( U_{u}-\sum_{v\in F_{u}}^{}S_{u,v}U_{v}\right) \\ & -\alpha \sum_{v:u\in F_{v}}^{}S_{v,u}\left( U_{v}-\sum_{\omega \in F_{v}}^{}S_{v,\omega }U_{\omega }\right) \\ & +\beta \left( U_{u}-\sum_{v\in F_{u}}^{}W_{u,v}U_{v}\right) \\ & -\beta \sum_{v:u\in F_{v}}^{}W_{v,u}\left( U_{v}-\sum_{\omega \in F_{v}}^{}W_{v,\omega }U_{\omega }\right) \\ & +\eta \sum_{i\in H_{u}}^{}I_{u,i}^{R}\left| H_{u}\right|\left( U_{u}{P_{i}^{T}} -Q_{u,i}\right)P_{i} \end{array} $$
(8)
$$ \begin{array}{llllll} \frac{\partial \varphi }{\partial P_{i}}= & \sum_{u}^{}I_{u,i}^{R}\left( \hat{R}_{u,i}-R_{u,i}\right)U_{u}+\lambda P_{i} \\ & +\gamma \left( P_{i}-\sum_{j\in F_{i}}^{}Y_{i,j}P_{j}\right) \\ & -\gamma \sum_{j:i\in F_{j}}^{}Y_{j,i}\left( P_{j}-\sum_{k \in F_{j}}^{}Y_{j,k }P_{k }\right) \\ & +\eta \sum_{u}^{}I_{u,i}^{R}\left| H_{u}\right|\left( U_{u}{P_{i}^{T}} -Q_{u,i}\right)U_{u} \end{array} $$
(9)

where \(I_{u,i}^{R}\in \left \{ 0,1\right \}\) is an indicator: \(I_{u,i}^{R}=1\) when user u has rated clothing item i, and \(I_{u,i}^{R}=0\) otherwise. The descriptions of the other symbols are listed in Table 1. U and P are initialized by sampling from a zero-mean normal distribution, as in [28]. The training algorithm is shown as Algorithm 1. l is the step size and t is the number of iterations, which can be adjusted to ensure that the objective function decreases in the training phase.
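A compact Python sketch of the gradients and of plain gradient descent is given below. It is written in matrix form and rests on assumptions: S, W and Y are zero outside the friend (or matched item) sets so that the sums over \(F_{u}\) become matrix products, the β terms use W throughout as the fourth term of (7) implies, the standard deviation 0.1 of the initialization is an arbitrary choice, and the function names `gradients` and `train` are hypothetical.

```python
import numpy as np

def gradients(R, I, U, P, S, W, Y, Q, H_size, r, lam, alpha, beta, gamma, eta):
    """Gradients of the objective w.r.t. U and P in matrix form, Eqs. (8)-(9)."""
    UPt = U @ P.T
    E = I * (r + UPt - R)                  # rating residuals on observed entries
    G = I * H_size[:, None] * (UPt - Q)    # personal interest residuals
    A = U - S @ U                          # interpersonal influence residuals
    B = U - W @ U                          # interest similarity residuals
    C = P - Y @ P                          # fashion style consistency residuals
    grad_U = (E @ P + lam * U + alpha * (A - S.T @ A)
              + beta * (B - W.T @ B) + eta * (G @ P))
    grad_P = E.T @ U + lam * P + gamma * (C - Y.T @ C) + eta * (G.T @ U)
    return grad_U, grad_P

def train(R, I, S, W, Y, Q, H_size, K=3, l=0.01, t=100, seed=0, **weights):
    """Plain gradient descent with step size l for t iterations."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0.0, 0.1, size=(R.shape[0], K))   # zero-mean initialization
    P = rng.normal(0.0, 0.1, size=(R.shape[1], K))
    r = R[I > 0].mean()                              # average observed rating
    for _ in range(t):
        gU, gP = gradients(R, I, U, P, S, W, Y, Q, H_size, r, **weights)
        U, P = U - l * gU, P - l * gP
    return U, P, r
```

A call such as `train(R, I, S, W, Y, Q, H_size, lam=0.1, alpha=0.2, beta=0.3, gamma=0.4, eta=0.05)` passes the five trade-off weights by keyword; these particular values are placeholders, not the settings used in the experiments.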

4 Experiments

In this section, two kinds of experiments are conducted: one is to evaluate the performance of our fashion consistency model and compare with the existing model [35]; the other is to evaluate the performance of the proposed personalized clothing recommendation and compare with the existing approaches. The compared approaches are BaseMF [31], CircleMF [41], ContextMF [14] and CombinedMF [28]. The details of experiments are elaborated as follows.

4.1 Dataset

There are several existing clothing datasets available, including the outfit dataset [9], the Amazon clothing dataset [35], the eBay dataset [11], magic closet [20] and deep fashion [23]. However, none of them is suitable for our task, due to the lack of user records, user social circles and matched pairs of clothing items. Moreover, these three types of data need to be crawled from the same social fashion website in the same time period, since fashion changes with time and region. Therefore, a clothing dataset is collected to meet these requirements. Similar to Pinterest and Polyvore, Mogujie is a famous social fashion website in China, which had more than 50 million registered users and almost 80 million page views in December 2016. It allows users to share their favorite clothing items and submit reviews about clothing items. Moreover, users can build relationships with others who have the same taste in fashion. Note that the relationship between users is unidirectional on this website. Owing to the popularity of this website, a large number of fashion icons are attracted to post matched pairs of clothing items, which provides valuable clues for the study of fashion.

For our task, two kinds of data are collected. One is for fashion consistency modeling. We collect 37343 pairs of clothing items shared by fashion icons. For images that contain fashion models, we manually crop the clothing items from the images. The other is for personalized clothing recommendation. It contains 1031 users with their relationships and 30050 clothing items with reviews. The statistics of this dataset are listed in Table 2. Note that these two datasets are crawled from January 2015 to April 2015, which ensures that they are from the same period and region. In addition, for the whole experiment, clothing items are classified into just two categories: tops and bottoms. To the best of our knowledge, it is the first dataset which fully contains user records on clothing items, user social circles and matched pairs of clothing items.

Table 2 Statistics of the dataset for personalized clothing recommendation

4.2 The evaluation of fashion style consistency modeling

In this subsection, we evaluate the performance of the fashion consistency modeling. As introduced in Section 3.2, three sampling approaches are compared. The metric is Receiver Operating Characteristic (ROC), which is widely applied in performance evaluation of binary classification problems.
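As an illustration of the metric, the following minimal sketch computes ROC points and the area under the curve from binary labels and classifier scores. It is not the evaluation code used in the paper; the function names `roc_points` and `auc` are hypothetical, and ties in the scores are handled by processing equal scores together.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points of the ROC curve.

    labels: 1 for a matched pair, 0 for an unmatched pair.
    scores: higher score means "more likely matched".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A classifier whose scores carry no information produces a curve close to the diagonal and an area close to 0.5, whereas a perfect ranking gives an area of 1.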

Our test dataset contains positive and negative examples with a ratio of 1:1. The ROC curves are shown in Fig. 5, which clearly shows that our sampling approach outperforms the two baselines. The naive approach samples the training dataset randomly, so its curve lies close to the diagonal line, which indicates that its predictions on the test dataset are essentially random. Positive examples in the strategic approach are matched pairs of clothing items shared by fashion icons, from which the SCNN can capture the fashion style consistency between clothing items of different categories. However, negative examples in the strategic approach are randomly selected from the same category or from different categories. Two clothing items with a similar fashion style may therefore be selected as a negative example, so that the SCNN cannot clearly distinguish positive from negative examples. Compared with these two baselines, our sampling approach uses the same positive examples as the strategic approach, while negative examples are sampled based on deep features of clothing items, which prevents visually similar clothing items from being selected as negative examples. Essentially, our sampling approach refines the training dataset. Therefore, the SCNN trained with our sampling approach achieves the best performance, as validated in Fig. 5. Some results of the detection of fashion style consistency based on our SCNN model are shown in Fig. 6. Each column is a matched pair of clothing items that receives a high score from our model. The two items in each pair belong to the same fashion style. For instance, the first two columns are business style, the middle two columns are casual style, and the last two columns are sport suits.

Fig. 5
figure 5

ROC curves of three sampling approaches

Fig. 6
figure 6

Some results with fashion style consistency obtained by our SCNN model
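The idea of filtering negative examples by deep features can be sketched as follows. This is only one possible reading, not the procedure of Section 3.2: the rejection rule (cosine similarity to the matched item above a threshold), the threshold value `max_similarity`, and all function and parameter names are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_negatives(
    positive_pairs: List[Tuple[str, str]],
    features: Dict[str, Sequence[float]],
    candidates: List[str],
    max_similarity: float = 0.5,   # assumed threshold, not from the paper
    max_tries: int = 50,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """For each matched (top, bottom) pair, draw a bottom that is NOT visually
    similar to the matched bottom and use (top, drawn bottom) as a negative."""
    rng = rng or random.Random(0)
    negatives = []
    for top, bottom in positive_pairs:
        for _ in range(max_tries):
            cand = rng.choice(candidates)
            if cand == bottom:
                continue
            # Reject candidates that look like the true match: they would
            # probably share its fashion style and confuse the classifier.
            if cosine(features[cand], features[bottom]) <= max_similarity:
                negatives.append((top, cand))
                break
    return negatives
```

Dropping the similarity check turns this sketch into purely random negative sampling, which is the behavior the paragraph above attributes to the strategic baseline.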

4.3 The evaluation of personalized clothing recommendation

4.3.1 Parameter settings

Here we explain the meaning of each parameter and give the settings used in our experiment, which follow [28].

  • K: the dimension of the latent space. Users and clothing items are represented by K-dimensional vectors in the latent space. If K is too small, it is hard to distinguish users and clothing items. If K is too large, the complexity of the recommendation system increases considerably. Previous work [14] studies how performance changes with K. Since the same K is used for all compared models, the comparison is fair whatever value is chosen. Therefore, K is set to 10.

  • λ: the normalized parameter is set to 0.1.

  • α: it indicates the weight of user social circle, which is set to 30.

  • β: it indicates the weight of inferred interest circle, which is also set to 30.

  • γ: the weight of fashion style consistency is set to 30, which keeps balance with other factors.

  • η: the weight of personal interest is set to 30.

λ, α, β, γ and η are tradeoff parameters in our model, which adjust the strengths of different terms in the objective function (7).
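For reference, the settings listed above can be collected in a single configuration object. The sketch below is illustrative only; the class name `RecommenderConfig` and its field names are hypothetical, and only the values come from the list above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RecommenderConfig:
    latent_dim: int = 10      # K: dimension of the latent space
    reg_lambda: float = 0.1   # λ
    alpha: float = 30.0       # α: weight of user social circle
    beta: float = 30.0        # β: weight of inferred interest circle
    gamma: float = 30.0       # γ: weight of fashion style consistency
    eta: float = 30.0         # η: weight of personal interest

    def __post_init__(self) -> None:
        # Basic sanity checks; a negative weight would flip the sign of a term.
        if self.latent_dim <= 0:
            raise ValueError("latent_dim must be positive")
        for name in ("reg_lambda", "alpha", "beta", "gamma", "eta"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")


if __name__ == "__main__":
    print(asdict(RecommenderConfig()))
```

Keeping the object immutable (`frozen=True`) makes it harder to change a weight accidentally between the runs of the compared models.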

4.3.2 Performance metrics

For personalized clothing recommendation, the dataset is split into training and test sets with a ratio of 8:2: 80% of each user’s rating data is used in the training phase, while the remaining 20% is used in the test phase. Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) are used as metrics in our experiment, as they are the most popular metrics for the evaluation of recommendation systems. They are defined as follows:

$$ RMSE=\sqrt{\frac{\sum_{\left( u,i\right)\in R_{test}}\left( R_{u,i}-\hat{R}_{u,i}\right)^{2}}{\left| R_{test}\right|}} $$
(10)
$$ MAE=\frac{\sum_{\left( u,i\right)\in R_{test}}\left| R_{u,i}-\hat{R}_{u,i}\right|}{\left| R_{test}\right|} $$
(11)

where \(R_{u,i}\) is the real rating value of user u on clothing item i, and \(\hat {R}_{u,i}\) is the corresponding rating value predicted by the model, which is calculated as (1). \(\left| R_{test}\right|\) is the number of user-clothing pairs in the test dataset.
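The per-user 8:2 split and the two metrics in (10) and (11) can be sketched in a few lines. This is a minimal illustration rather than the experimental code: the tuple layout `(user, item, rating)`, the random shuffling with a fixed seed, and the function names are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating); assumed layout


def split_per_user(ratings: List[Rating], train_ratio: float = 0.8,
                   seed: int = 0) -> Tuple[List[Rating], List[Rating]]:
    """Split each user's ratings into train and test parts (8:2 by default)."""
    rng = random.Random(seed)
    by_user: Dict[str, List[Rating]] = {}
    for row in ratings:
        by_user.setdefault(row[0], []).append(row)
    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        cut = int(round(train_ratio * len(rows)))
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error over the test pairs, as in (10)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error over the test pairs, as in (11)."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)
```

Because RMSE squares each error before averaging, it penalizes a few large errors more heavily than MAE does, which is why the two metrics are usually reported together.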

4.3.3 Performance comparison

The results of all models are listed in Table 3. From Table 3, we can see that our personalized clothing recommendation model achieves the best accuracy among all models. BaseMF suffers easily from the sparsity of the dataset, since it does not consider any other factors. Compared with BaseMF, CircleMF decreases the prediction error by 20% on RMSE and 25% on MAE, which indicates that the factor of user social circles is highly significant on social websites. ContextMF takes the interpersonal interest similarity into consideration, which decreases the prediction error by 19% on RMSE and 18% on MAE compared with CircleMF. This supports the assumption that the interest of a user should be similar to the interests of his/her friends. The factor of user personal interest is added to CombinedMF on the basis of ContextMF. It measures the extent to which a clothing item meets the user's interest, so that CombinedMF achieves better performance than ContextMF. As mentioned above, these models profile users and clothing items in the latent space from the users’ perspective. The factor of fashion style consistency between clothing items is integrated into our model. It reduces the prediction error by 28% on RMSE and 30% on MAE compared with CombinedMF, which validates that the consistency of fashion style between clothing items is an essential factor for clothing recommendation. Meanwhile, all factors, acting as constraints, alleviate the impact of the sparsity of the dataset. For the fairness of the comparison, note that the parameters for all factors are set to 30, which means that all factors have equal weight.
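Assuming the percentages quoted in this comparison are relative reductions with respect to the baseline model's error, they can be computed as sketched below. The function name and the example error values are hypothetical and are not taken from Table 3.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error.

    A return value of 0.20 means the new model's error is 20% lower.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical errors, for illustration only.
    print(f"{relative_reduction(1.00, 0.80):.0%}")
```

Note that successive relative reductions compound multiplicatively: a 20% reduction followed by a further 19% reduction leaves 0.80 × 0.81 = 0.648 of the original error, not 0.61.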

Table 3 Performance comparison of all models (Note: the best performance is emphasized in bold)

4.3.4 Discussion

Data sparsity is the key problem for recommendation systems, which is associated with the amount of user records (the number of user rated clothing items and users’ friends) in our experiment. Therefore, we will discuss its impact. Meanwhile, the effectiveness of four factors aforementioned will also be evaluated in this subsection.

The impact of the amount of user records

There are two types of user records: users’ rated clothing items and users’ friends in their social circles. The parameters are the same as in Section 4.3.1. First, the impact of the number of user rated clothing items is analyzed. The test dataset is divided into five groups according to the number of user rated clothing items, as listed in Table 4. The histograms of RMSE and MAE are illustrated in Figs. 7 and 8, respectively. The values on the horizontal axis represent the number of user rated clothing items. As the number of user rated clothing items increases, the prediction errors of all models decrease, which illustrates that the sparsity of the dataset is still the main problem for recommendation systems. In addition, our model is superior to the compared models in each group. Furthermore, the errors of our model differ less across groups than those of the other models, which indicates the robustness of our model on sparse data. To analyze the impact of the number of users’ friends, the test dataset is also split into five groups according to the number of users’ friends, as listed in Table 5. The histograms of RMSE and MAE are shown in Figs. 9 and 10. We can see that the prediction errors drop as the number of users’ friends increases. Meanwhile, compared to BaseMF, which does not consider social factors, the prediction accuracy of the other models is significantly improved. These results demonstrate that social factors are effective for recommendation systems.
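The grouped analysis can be sketched as follows: each test prediction is assigned to a group according to how many records its user has, and RMSE is computed per group. The group boundaries used in the paper are those of Tables 4 and 5; the boundaries in the example call, the record layout and the function names here are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

# (user, actual rating, predicted rating); assumed layout
Record = Tuple[str, float, float]


def group_index(count: int, boundaries: Sequence[int]) -> int:
    """Group g covers boundaries[g-1] <= count < boundaries[g]."""
    return bisect_right(boundaries, count)


def rmse_by_group(records: List[Record], counts: Dict[str, int],
                  boundaries: Sequence[int]) -> Dict[int, float]:
    """RMSE of the test predictions, grouped by each user's record count.

    counts maps a user to the number of rated items (or friends).
    Groups without any test record are omitted from the result.
    """
    squared: Dict[int, List[float]] = {}
    for user, actual, predicted in records:
        g = group_index(counts[user], boundaries)
        squared.setdefault(g, []).append((actual - predicted) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in sorted(squared.items())}


if __name__ == "__main__":
    # Hypothetical data: four boundaries give five groups.
    demo = [("u1", 4.0, 3.0), ("u2", 5.0, 4.5)]
    print(rmse_by_group(demo, {"u1": 5, "u2": 90}, [10, 20, 40, 80]))
```

The same function serves both analyses: passing the number of rated items per user reproduces the grouping behind Figs. 7 and 8, and passing the number of friends per user reproduces the grouping behind Figs. 9 and 10.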

Fig. 7
figure 7

RMSE histograms of the impact of the number of user rated clothing items

Fig. 8
figure 8

MAE histograms of the impact of the number of user rated clothing items

Fig. 9
figure 9

RMSE histograms of the impact of the number of users’ friends

Fig. 10
figure 10

MAE histograms of the impact of the number of users’ friends

Table 4 Number of users in each group based on the number of user rated clothing items
Table 5 Number of users in each group based on the number of users’ friends

The impact of four independent factors

In this subsection, the performances of the four independent factors are compared. As stated in Section 4.3.1, α, β, γ and η are the parameters that adjust the weights of the four factors. Therefore, we can test different combinations of factors by setting these parameters. For example, the factor of interpersonal influence can be tested by setting α = 30 and β = γ = η = 0. Furthermore, we can set α = β = 30 and γ = η = 0 to test the combination of interpersonal influence and interest similarity. The weight of each active factor is set to 30 for fairness. RMSE and MAE are again used as evaluation metrics.

The performances of different combinations of factors are illustrated in Figs. 11 and 12. R, S, W, Q and Y denote the basic PMF without any factors, interpersonal influence, interpersonal interest similarity, personal interest, and fashion style consistency, respectively. R + S denotes that we consider only the factor of interpersonal influence. R + S + W + Q + Y denotes that we take all factors into consideration. From Figs. 11 and 12, we can see that the more factors we consider, the lower the prediction error. The four single-factor models, R + S, R + W, R + Q and R + Y, all obtain much better results than R. In particular, R + Y achieves the best performance among them, which indicates that fashion style consistency is a significant factor for clothing recommendation and plays an important role in the decision-making process of users. It also demonstrates that it is reasonable to integrate the relationship between items as a constraint for the refinement of PMF.
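The way factor combinations are switched on and off through the weights can be sketched as follows. The mapping from factor labels to parameters follows the text (S to α, W to β, Q to η, Y to γ); the function name and the dictionary layout are hypothetical, and the sketch enumerates every subset of factors, which may be more combinations than are plotted in Figs. 11 and 12.

```python
from itertools import combinations
from typing import Dict

# Factor label -> weight parameter that activates it.
FACTOR_TO_PARAM = {
    "S": "alpha",  # interpersonal influence
    "W": "beta",   # interpersonal interest similarity
    "Q": "eta",    # personal interest
    "Y": "gamma",  # fashion style consistency
}


def ablation_settings(weight: float = 30.0) -> Dict[str, Dict[str, float]]:
    """Map each combination label (e.g. "R+S+W") to its weight setting.

    A factor is active when its weight equals `weight` and inactive when 0.
    "R" is the basic PMF with every factor switched off.
    """
    settings = {"R": {p: 0.0 for p in FACTOR_TO_PARAM.values()}}
    labels = list(FACTOR_TO_PARAM)
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            params = {p: 0.0 for p in FACTOR_TO_PARAM.values()}
            for factor in combo:
                params[FACTOR_TO_PARAM[factor]] = weight
            settings["R+" + "+".join(combo)] = params
    return settings


if __name__ == "__main__":
    for label, params in ablation_settings().items():
        print(label, params)
```

Each setting would then be passed to one training run with otherwise identical parameters, so that differences in RMSE and MAE can be attributed to the active factors.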

Fig. 11
figure 11

RMSE histograms of the combinations of different factors

Fig. 12
figure 12

MAE histograms of the combinations of different factors

5 Conclusion

In this paper, we address a practical problem on social fashion websites: given a user's historical records of clothing items and his/her social information, a recommendation system should recommend a list of clothing items that both meet the user's interest and match the clothing items he/she bought previously in terms of fashion style consistency. Fashion style consistency between clothing items is modeled by an SCNN with a novel sampling approach. As a new constraint, it is integrated into the PMF framework together with the other three factors (i.e., personal interest, interpersonal interest similarity and interpersonal influence). To verify our model, experiments are conducted on our datasets crawled from the social fashion website Mogujie. The results demonstrate that the proposed method is more effective than the state-of-the-art approaches. However, given the massive number of clothing items on social fashion websites, our method currently needs considerable time to produce recommendations. In the future, we will improve the efficiency of our model using distributed systems.