1 Introduction

In recent years, with the growth of online shopping, the shopping experience of online customers has been investigated by many researchers [4, 26, 28, 36] from different perspectives. An important issue among these studies is the difference between the online and offline shopping experience [10, 18]. This research showed that the socioeconomic variables traditionally considered important have become less significant than before, whereas security has become more relevant. Based on those findings, online shopping websites have been built to improve the shopping experience from several perspectives, including website quality control [26], interface design for elderly people [27], and service quality [4]. Previous studies have identified that product uncertainty and low retailer visibility have a negative impact on customer satisfaction and thus lead to poor online shopping experiences [33]. Researchers have endeavored to capture more information about products and other features to enhance customers’ online shopping experience, including by using recently developed big data, computer vision, and machine learning techniques.

Online recommendation systems for improving the customer shopping experience have gained popularity because past transaction data can be used to predict customers’ purchasing choices [45]. There are many successful solutions for online customer recommendation systems [2, 39]. For example, Amazon increased its sales by nearly 30% by developing an online recommendation system based on customer browsing history. The online recommendation system also helps Amazon control the security and price of the items it sells by analyzing the big data provided by customers and products [31, 38]. However, most existing online recommendation systems are built on readable text [3, 21, 40], leaving many newer types of data, such as image and multimedia data, unused. Multimedia and image data provide much richer information than readable text. How to extract meaningful information from multimedia resources such as images and apply it in online shopping recommendation systems has rarely been considered, mainly because the relevant computer vision technologies are relatively recent. This study explores new techniques from the Computer Vision and Machine Learning perspective and proposes a framework for integrating these techniques with existing online shopping recommendation systems. The fashion and clothing industry is used in this study as an example to explore this possibility.

We first reviewed past research on the online shopping experience, mainly on the design of online shopping websites and on online customer satisfaction, and then searched for Computer Vision methods that may be applied to improve the online shopping experience. We reviewed most of the top conferences on Computer Vision, such as Computer Vision and Pattern Recognition and the ACM Multimedia Conference, targeting especially the fashion and clothing area. The review results indicate that attribute learning methods could be used to improve the online shopping experience. We illustrate this by showing how a fashion item recommendation system could be developed with attribute learning methods from computer vision. The implications for both researchers and practitioners are then discussed.

2 Computer Vision Methods and Online Recommendation Systems

Previous research on online shopping behavior [30] shows that the dimensions of website design, reliability, responsiveness, and trust affect overall service quality and customer satisfaction. This paper mainly explores how the online shopping experience could be improved from the website design perspective and which new technologies from Computer Vision could be used to improve website design. We also discuss the implications of machine learning and computer vision for information systems theories. We first discuss recent developments in the Computer Vision area and then explore how these new techniques could be applied to online shopping recommendation systems.

2.1 Extract Semantic Attributes from Images

Early studies on online recommendation systems rarely considered images an important factor, other than displaying pictures clearly to present the product to best effect [16]. The information in the picture was not fully explored, mainly because image processing techniques had not been fully developed in those early days. Alongside the development of Image Interactivity technology, which enables the creation and manipulation of product images, the potential to exploit more features from images has increased. In the beginning, researchers focused on sketching and modelling fashion items [7]. Recently, thanks to techniques from machine learning, Computer Vision has witnessed some big breakthroughs. One of the major breakthroughs in Computer Vision is the recognition of image categories [14, 29, 42]. The first improvement came with the feature representation of images: at the feature level, various kinds of features can be extracted by different methods, including SIFT [24], GIFT [35], Histograms of Oriented Gradient (HOG) [11], Local Binary Pattern (LBP) [1], and Maximum Response Filters [43]. Based on these features, a well-trained model can be developed to classify different objects, such as shirts, shoes or hats, into categories. Semantic attributes provided by researchers can be used to further assist object classification. Some business solutions have already used this method to perform image mining and achieved satisfactory results [5]. An example of semantic attributes on clothes is shown in Table 1.
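To make the notion of a low-level image feature concrete, the following is a minimal sketch of a basic Local Binary Pattern histogram. It assumes a grayscale image stored as a list of lists of integers and uses the plain 3x3, 8-neighbour variant; the function names and the bit ordering are choices made for this illustration and are not taken from [1].

```python
from collections import Counter

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
# The bit order is an arbitrary choice made for this illustration.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count LBP codes over all interior pixels of a grayscale image."""
    rows, cols = len(image), len(image[0])
    return Counter(
        lbp_code(image, r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    )
```

The resulting histogram is the kind of fixed-length texture descriptor that could be passed to a classifier; a production system would more likely rely on an established image-processing library than on this hand-written version.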

Table 1. Example of semantic attribute on clothes [5]

However, the problem with this kind of recognition mechanism is that it usually ignores certain aspects of an object’s appearance, such as color and texture. To solve this problem, new models were introduced to learn visual attributes [15]. With this method, human-understandable properties can be extracted from images. If we attach those properties to images as labels, we can group images by a combination of labels [13, 25]. For example, we can describe a shirt in a specific style with black and white stripes, or a white shirt with a red circle on it, and classify clothes by these properties. With those methods, we can extract high-level semantic features from images, such as clothing style, patterns and textures. But these methods only work well with clear and simple image data. As a result, in a realistic online shopping environment, they can hardly handle complex and noisy image resources.
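Grouping images by a combination of labels can be sketched in a few lines. The example below assumes the attributes have already been extracted and are stored as one dictionary per image; the image identifiers and attribute names are hypothetical.

```python
from collections import defaultdict


def group_by_attributes(image_attributes, keys):
    """Group image ids by the values they take on the chosen attribute keys.

    image_attributes: dict mapping image id -> dict of attribute -> value.
    keys: the attributes whose combination defines a group.
    """
    groups = defaultdict(list)
    for image_id, attrs in image_attributes.items():
        # Missing attributes become None so they form their own group.
        signature = tuple(attrs.get(key) for key in keys)
        groups[signature].append(image_id)
    return dict(groups)


# Hypothetical labels for three product images.
catalog = {
    "img_001": {"category": "shirt", "color": "white", "pattern": "striped"},
    "img_002": {"category": "shirt", "color": "white", "pattern": "plain"},
    "img_003": {"category": "shirt", "color": "white", "pattern": "striped"},
}
striped_groups = group_by_attributes(catalog, ["color", "pattern"])
```

Here `striped_groups` places `img_001` and `img_003` under the same `("white", "striped")` key, which is the kind of label-combination grouping described above.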

To solve this problem, some object detection models have been developed [9, 46]. These models use human pose estimation or simple object detection methods to locate the item of interest in an image, so that attribute learning is applied only to the located item. With this kind of preprocessing, we can extract semantic attributes from images in a real online shopping environment. There is already some successful research in this area. For example, by collecting a well-labelled dataset, Chen et al. [8] extracted complex semantic features from clothing, as shown in Fig. 1. Moreover, Liu et al. [32] collected both top and bottom clothes and identified the semantic feature relations between them, which enabled them to make further suggestions on clothing combinations.
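The benefit of locating the item before describing it can be illustrated with a toy example. The sketch below assumes that some detector (not implemented here) has already returned a bounding box, and it uses the most frequent pixel label as a stand-in for a real attribute extractor; the image, the box and the function names are all hypothetical.

```python
from collections import Counter


def crop(image, box):
    """Return the sub-image inside box = (top, left, bottom, right).

    bottom and right are exclusive, as in Python slicing.
    """
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def dominant_color(region):
    """Return the most frequent pixel label in a 2D region."""
    counts = Counter(pixel for row in region for pixel in row)
    return counts.most_common(1)[0][0]


# Toy image: a red garment on a grey street background.
image = [
    ["grey", "grey", "grey", "grey"],
    ["grey", "red", "red", "grey"],
    ["grey", "red", "red", "grey"],
    ["grey", "grey", "grey", "grey"],
]
# Assumed output of a hypothetical clothing detector.
garment_box = (1, 1, 3, 3)

whole_image_color = dominant_color(image)
garment_color = dominant_color(crop(image, garment_box))
```

On the whole image the background dominates, whereas on the cropped region the garment color is recovered, which is why detection is used as a preprocessing step for noisy images.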

Fig. 1. Extracting clothing attributes

As shown above, applying the information collected by Computer Vision methods could help to improve website design, improving not only the description of products but also the shopping experience. However, in research on website complexity, Park and Kim [36] separated the whole website into six aspects and found that the parts are not equally important, and the design of a website should not be too complex [17]. So when applying these new technologies in the online shopping environment, we need to consider the complexity of the new features. To make use of the huge amount of information provided by Computer Vision methods, work is required in the information systems area to measure the effects of those semantic features. Currently, no research in the information systems area explores the use of computer vision methods to improve customer experience. This paper aims to explore the new perspectives and new theories that might arise from the interaction between the computer vision and information systems research areas.

2.2 Enrich Recommendation System with Image Features

Analyzing customers’ behavior from their shopping history and using this information to make recommendations, so that the shopping experience is enhanced, has become a trend on most e-commerce websites [10, 36]. Currently, most online shopping websites, such as Amazon and eBay, make suggestions to their customers by analyzing their search or shopping history. This method is successful because related items, or products similar to those in their browsing history, can be pushed to customers. The limitation is that all predictions are based only on item-to-item or user-to-item combinations [31, 37]. The algorithms of these models consider only the relations between item and user or between item and item, and ignore the features of the products themselves.
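The item-to-item idea can be sketched as follows. This is a simplified illustration that assumes purchase histories are available as one set of items per user and scores item pairs by cosine similarity over their buyers; it is not the algorithm of any particular website, and the data are hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(purchases):
    """Cosine similarity between items, based only on who bought them.

    purchases: dict mapping user -> set of purchased item ids.
    """
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                if overlap:
                    norm = math.sqrt(len(buyers[a]) * len(buyers[b]))
                    sims[(a, b)] = overlap / norm
    return sims


def recommend(purchases, user, top_n=3):
    """Rank items the user does not own by similarity to owned items."""
    sims = item_similarities(purchases)
    owned = purchases[user]
    scores = defaultdict(float)
    for (a, b), similarity in sims.items():
        if a in owned and b not in owned:
            scores[b] += similarity
    # Sort by descending score, then by item id for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical purchase histories.
history = {
    "u1": {"shirt", "jeans"},
    "u2": {"shirt", "jeans", "hat"},
    "u3": {"shirt"},
}
suggestions = recommend(history, "u3")
```

Note that nothing in this computation looks at what a shirt or a hat actually is: two items are related only because the same users bought them, which is the limitation discussed above.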

The most salient features extracted from images on e-commerce websites could be used to enhance online recommendation systems and thus the shopping experience. Extracted features could be descriptive features perceived by human beings, such as color and style. For example, clothes on the Amazon website usually carry about five labels, including color, sleeve style, material and brand, but from the pictures provided by the website we can extract more than 10 additional labels, such as length, cut, pocket, collar, and material [8, 12]. Moreover, new algorithms could be built on public training datasets [5, 23], and a well-trained model can automatically locate the clothing region and analyze the possible labels of each garment. These labels could be defined from a human perspective, and cognitive factors could also be used to extract useful information from clothing pictures. For instance, personality type could be used to classify clothing style based on attributes extracted from images. There is thus a possibility of describing products more accurately at a higher cognitive and conceptual level, so that customers are provided with richer product information.
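One simple way in which image-derived labels could feed a recommender is attribute overlap between products. The sketch below assumes each product already has a set of extracted labels and compares products with the Jaccard index; the catalog entries and label names are hypothetical, and Jaccard similarity is an illustrative choice rather than a method taken from the cited works.

```python
def jaccard(a, b):
    """Jaccard index of two label sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(catalog, item_id, top_n=2):
    """Return the ids of the items whose labels best overlap with item_id."""
    target = catalog[item_id]
    scored = [
        (jaccard(target, labels), other)
        for other, labels in catalog.items()
        if other != item_id
    ]
    # Highest similarity first; ties broken by item id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Hypothetical labels extracted from product pictures.
products = {
    "shirt_a": {"white", "long_sleeve", "collar", "cotton"},
    "shirt_b": {"white", "long_sleeve", "collar", "pocket"},
    "tee_c": {"white", "short_sleeve", "cotton"},
    "coat_d": {"black", "long_sleeve", "wool"},
}
alternatives = most_similar(products, "shirt_a")
```

Unlike the purchase-history approach, this comparison works for a brand-new product with no sales yet, provided its picture has been labelled.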

The overall trend in online fashion recommendation is toward more personalized online shopping systems. There are some successful examples of fashion recommendation systems that mine a combination of text and image features. Jagadeesh et al. [22] proposed a fashion recommender that analyzes a color model built from street images for item recommendation. Iwata et al. [20] collected text and image data from fashion magazines to build a topic-based recommendation system. These two works are item-based: they consider only the relationship between items, and the item-user relationship is not considered. With the development of social networks, personalized recommendation systems with image features are gaining popularity in recent research. Sigurbjornsson et al. [41] proposed a personalized tag recommendation system based on a Flickr dataset. In this work, they analyzed customers’ frequently used tags to automatically recommend personalized tags for newly added photos. Yue et al. [47] provided a similar personalized recommendation system by collecting customer feedback. This type of research mostly concentrates on the customer side and provides recommendations by finding similar customers. There is also some research that considers the user-to-item and item-to-item relationships at the same time. Hu et al. [19] built a model of each customer’s preferred fashion items and then combined these items to make a personalized recommendation for a set of fashion items, as shown in Fig. 2.

Fig. 2. Finding tops to match a given bottom and shoes using image features [19]

As shown in Fig. 2, researchers build various recommendation systems by mining the large sets of data collected with computer vision methods. However, the contribution of these new papers lies mostly in new mathematical methods or algorithms that can handle different types of datasets. These works focus only on the recommendation algorithm from the technology perspective. How customers will respond to this new type of data has not been investigated from the information systems perspective. What types of features should be extracted, and which features are more salient in improving online customers’ shopping experience, have not been explored either.

2.3 Image Analysis with Humans in the Loop

Most Computer Vision problems are solved by machine learning algorithms, and it is not necessary to build a huge image dataset for the algorithm to learn from. Rather, researchers need to collect a well-labelled fashion dataset for training purposes. The quality of that dataset determines the accuracy of the computer vision model to a certain degree. However, collecting such a dataset is normally expensive and time consuming. In the fashion and clothing industry specifically, products and styles change every year, and fashion companies update their datasets frequently. To address this issue, the humans in the loop method has been proposed [6, 34]. In this method, human answers are collected for specifically designed questions, and these answers serve as human knowledge to enrich the model. Compared with the previous algorithms, the humans in the loop method uses less data and obtains more intelligent results in a dynamic way.
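The question-and-answer idea can be illustrated with a deliberately simplified sketch. It assumes each candidate class is described by a set of binary attributes, asks the yes/no question that splits the remaining candidates most evenly, and filters the candidates by the human answer. The classes, attribute names and the `ask` callback are hypothetical, and the sketch omits the computer vision model that the methods in [6, 34] combine with the human answers.

```python
import math


def split_entropy(candidates, attribute):
    """Entropy (in bits) of the yes/no split an attribute induces."""
    yes = sum(1 for attrs in candidates.values() if attribute in attrs)
    total = len(candidates)
    if yes in (0, total):
        return 0.0  # The question would not separate anything.
    p = yes / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def next_question(candidates, attributes):
    """Pick the attribute whose answer is expected to be most informative."""
    return max(sorted(attributes), key=lambda a: split_entropy(candidates, a))


def identify(candidates, attributes, ask):
    """Narrow down the candidates by asking a human yes/no questions.

    ask: callable taking an attribute name and returning True or False.
    """
    remaining = dict(candidates)
    unused = set(attributes)
    while len(remaining) > 1 and unused:
        question = next_question(remaining, unused)
        if split_entropy(remaining, question) == 0.0:
            break  # No remaining question can tell the candidates apart.
        unused.discard(question)
        answer = ask(question)
        remaining = {
            name: attrs
            for name, attrs in remaining.items()
            if (question in attrs) == answer
        }
    return sorted(remaining)


# Hypothetical garment classes and a simulated human who is looking at a polo.
garments = {
    "polo": {"collar", "short_sleeve"},
    "tee": {"short_sleeve"},
    "dress_shirt": {"collar", "long_sleeve"},
    "sweater": {"long_sleeve"},
}
questions = {"collar", "short_sleeve", "long_sleeve"}
result = identify(garments, questions, lambda q: q in garments["polo"])
```

In this toy setting two answers are enough to single out one of four classes, which hints at why a few well-chosen human answers may substitute for part of a large labelled dataset.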

So far, humans in the loop methods have been widely used only on animal datasets [6] or unfamiliar classes [44]. There is no work on fashion items, mainly because feedback on fashion items differs among customer groups, unlike the structured feedback on animals. To improve humans in the loop methods for fashion items, feedback from different customer groups could be incorporated into the algorithm. Past marketing research findings on customer segmentation could be applied to humans in the loop methods. The integration of previous marketing theories and information systems theories is expected to contribute to these methods.

3 Conclusion

The purpose of this research is to explore the potential of combining computer vision methods with information systems methods to improve the online shopping experience. We have reviewed a series of computer vision methods and machine learning techniques, especially from the fashion area, followed by the current development of online shopping recommendation systems. We found that most online shopping recommendation systems use only the text information about products, and that a large amount of information from pictures is not considered in current online recommendation systems.

We proposed that finer and richer information extracted from product pictures with computer vision methods could improve the online shopping experience, and illustrated this with the current progress in this area. Although the potential of online recommendation systems based on computer vision methods is very promising, there are still many issues to be tackled. We have proposed two important perspectives to be considered in order to better apply computer vision methods to online recommendation systems. Firstly, what types of semantic features should be used to build the conceptual models for extracting attributes from product pictures? We may have excellent computer vision techniques, but customers may not value the information extracted from product pictures. Conceptual models and even past marketing theories could be used to make the conceptual features more meaningful for computer vision methods. Secondly, with humans in the loop methods, what type of customer knowledge should be used to build the algorithm for fashion items?

To apply computer vision methods to online recommendation systems, it is thus essential to gain insights and knowledge from the customers’ perspective. More research should focus on testing and investigating customer feedback on current online recommendation systems that use computer vision methods. From the technology perspective, there are also issues to be solved before information extracted from images can be applied to online recommendation systems. The new algorithms mentioned above all concentrate on the technology side, and most of them only work well with training data labelled in detail. In realistic situations, it might be difficult to build well-labelled training data, and the images to be analyzed also contain a lot of noise. In this case, the performance of current Computer Vision algorithms should be carefully tested before they are put into use.