1 Introduction

Recently, EC (electronic commerce) sites, which are online stores for purchasing a variety of products through the internet, have been gaining popularity (Fig. 1). In this situation, real-world retailers such as supermarkets are required to improve sales by offering unique services.

Fig. 1. Global online retail sales have increased 17% yearly since 2007 [1]

In order to propose new unique services, analysts must know the characteristics of customers and the tendencies of their purchasing behavior. Moreover, the characteristics of each store are equally important to consider, because characteristics such as floor layout, competitors in the area and the population of the market area differ greatly between stores. However, previous analyses of real-world retailing used only purchase history. Because customers differ depending on the surrounding environment, it is necessary to analyze store causal data as well in order to improve sales.

2 Purpose of This Study

In this study, we aim to clarify customer characteristics and purchasing behavior using store causal data. Compared with analysis using only sales history, this can provide useful suggestions for sales strategies such as shelf allocation within stores and the types of products to be stocked. In addition, analysis of the relationships between product sales histories will lead to improvements in store sales and customer satisfaction.

3 Analytical Procedure

We use ID-POS (point of sales with customer ID number) data from 53 stores of a supermarket chain, covering March 2015 to February 2016 and excluding nonmember data. Figures 2 and 3 summarize the data.

Fig. 2. Male-female ratio of all customers

Fig. 3. Compared ratio of age

Figure 4 shows the outline of the analytical procedure. First, we classify stores by cluster analysis using store causal data. Second, focusing on each cluster, we aggregate customer characteristics such as age structure and household composition and classify customers. Third, we perform association analysis on purchasing behavior and clarify concurrent selling commodities for each cluster. Finally, we clarify customer characteristics and purchasing behavior according to store causal data from these analysis results.

Fig. 4. Outline of analytical procedure

4 Analysis Method

4.1 Hierarchical Clustering Analysis

Hierarchical cluster analysis is a method that classifies objects by grouping those similar to each other, based on the distance between objects, from a set in which objects with different properties are mixed.

In this study, the 53 stores were classified into nine clusters using store causal data, in order to clarify the relationship between store characteristics and customer characteristics and purchasing behavior.

The cosine distance was used as the distance between objects for classification, and the Ward method [2] was adopted to merge clusters (Table 1).

Table 1. Variables used for hierarchical clustering
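
The source does not state how the cosine distance and the Ward method were combined. As a minimal sketch, one possible approach is to scale each store's variable vector to unit length and then apply Ward merging on Euclidean distance, since for unit vectors the squared Euclidean distance equals twice the cosine distance. The function names and the three-variable store vectors below are hypothetical and only illustrate the idea; they are not the variables of Table 1.

```python
import math
from itertools import combinations

def unit(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def ward_clusters(vectors, k):
    """Greedy Ward agglomeration on unit-length vectors down to k clusters.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [{"members": [i], "centroid": unit(v)} for i, v in enumerate(vectors)]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            na, nb = len(a["members"]), len(b["members"])
            d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
            # Ward criterion: increase in within-cluster sum of squares.
            cost = na * nb / (na + nb) * d2
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(sorted(c["members"]) for c in clusters)

# Hypothetical store vectors: (floor area, parking spaces, trade-area population).
stores = [
    [300, 10, 50000],
    [320, 12, 52000],
    [1500, 120, 8000],
    [1600, 150, 7000],
]
groups = ward_clusters(stores, 2)
```

In practice the variables would probably need standardizing before clustering, and a library implementation would be preferable for 53 stores; the sketch only shows how the two ingredients named in the text can fit together.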

4.2 Decile Analysis

Decile analysis is an analytical method that ranks all customers by purchase price, based on purchase history data, divides them into ten equal groups, and calculates the sales composition ratio of each rank.

Generally, decile analysis is conducted to identify customers with high purchase price and to evaluate how their purchasing behavior differs from that of other customers.

In this study, we performed decile analysis for each cluster obtained by hierarchical cluster analysis, and we defined customers in decile ranks 1 to 8 as “general customers” and those in ranks 9 and 10 as “good customers”.
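
A minimal sketch of the decile step follows. It assumes customers are sorted by total purchase price in ascending order so that rank 10 holds the highest spenders; the source labels ranks 9 and 10 as “good customers” but does not state the ranking direction. The function names and the sample data are hypothetical.

```python
def decile_ranks(spend_by_customer):
    """Map each customer to a decile rank 1..10.

    Assumption: rank 10 contains the highest spenders.
    Ties are broken by customer ID so the result is deterministic.
    """
    ordered = sorted(spend_by_customer, key=lambda c: (spend_by_customer[c], c))
    n = len(ordered)
    return {c: (i * 10) // n + 1 for i, c in enumerate(ordered)}

def sales_composition(spend_by_customer, ranks):
    """Share of total sales contributed by each decile rank."""
    total = sum(spend_by_customer.values())
    shares = {r: 0.0 for r in range(1, 11)}
    for customer, amount in spend_by_customer.items():
        shares[ranks[customer]] += amount / total
    return shares

def customer_segment(rank):
    """Ranks 9 and 10 are 'good' customers, ranks 1 to 8 are 'general'."""
    return "good" if rank >= 9 else "general"

# Hypothetical data: 20 customers spending 100, 200, ..., 2000.
spend = {f"c{i:02d}": i * 100 for i in range(1, 21)}
ranks = decile_ranks(spend)
shares = sales_composition(spend, ranks)
segments = {c: customer_segment(r) for c, r in ranks.items()}
```

Running this separately on the customers of each store cluster would mirror the per-cluster decile analysis described above.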

4.3 Association Analysis

Association analysis, also called market basket analysis in marketing, is used to extract meaningful relationships between products from enormous log data. We conducted association analysis to identify concurrent selling relationships and to obtain suggestions for sales measures such as display and sales floor placement.

In this analysis, association analysis is performed on the “good customer” and “general customer” groups defined for each cluster. We used the basket ID assigned to each shopping cart as the key for identifying concurrent selling relationships.
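
As a rough illustration of rules of length 2 keyed by basket ID, the sketch below groups purchase rows into baskets and computes support and lift for each item pair. The function names, the sample rows and the default thresholds are assumptions for illustration; in particular, the default `min_lift` of 1.0 is a placeholder and not a setting taken from this study.

```python
from collections import defaultdict
from itertools import combinations

def baskets_from_rows(rows):
    """Group (basket_id, item) rows into one item set per basket."""
    baskets = defaultdict(set)
    for basket_id, item in rows:
        baskets[basket_id].add(item)
    return list(baskets.values())

def pair_rules(baskets, min_support=0.001, min_lift=1.0):
    """Return (item_a, item_b, support, lift) for item pairs passing both thresholds.

    Support and lift are symmetric for a pair, so each pair is reported once.
    """
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        lift = support / ((item_count[a] / n) * (item_count[b] / n))
        if support >= min_support and lift >= min_lift:
            rules.append((a, b, support, lift))
    # Strongest association first.
    return sorted(rules, key=lambda r: -r[3])

# Hypothetical purchase rows: (basket ID, item category).
rows = [
    (1, "banana"), (1, "yogurt"),
    (2, "banana"), (2, "yogurt"),
    (3, "chocolate"), (3, "snack"),
    (4, "banana"), (4, "chocolate"),
]
rules = pair_rules(baskets_from_rows(rows))
```

Applying the same function to the baskets of general customers and of good customers in each cluster would give one rule set per group, which is the comparison the text describes.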

5 Results of Analysis

5.1 Store Classification

The results of the hierarchical cluster analysis are shown in Tables 2 and 3.

  • Clusters 1 and 8

    They are composed only of city center stores.

  • Clusters 2, 3, 6 and 7

    They are composed only of suburban stores.

  • Clusters 4 and 5

    They are composed only of mountainous stores.

  • Cluster 9

    It is composed of city center stores and suburban stores.

Table 2. Result of hierarchical cluster analysis (1)
Table 3. Result of hierarchical cluster analysis (2)

In order to extract customer features for each cluster, we compiled the ratio of male and female (Fig. 5), the ratio of age (Fig. 6), the ratio of unmarried and married (Fig. 7), and the ratio of the number of household members (Fig. 8).

Fig. 5. Ratio of male and female per cluster

Fig. 6. Ratio of age per cluster

Fig. 7. Ratio of unmarried and married per cluster

In the ratio of male to female (Fig. 5), the cluster with the lowest female ratio is cluster 1 and the cluster with the highest is cluster 9. In the ratio of unmarried and married (Fig. 7), the cluster with the lowest married ratio is cluster 1, and the clusters with the highest are clusters 2 and 7. However, there was not much difference between the clusters in either case.

Fig. 8. Ratio of the number of household members per cluster

On the other hand, the number of household members and age differed among the clusters. In the ratio of the number of household members (Fig. 8), the proportion of “1 person” is high in clusters 1, 8 and 9, while the ratio of “5 or more people” is high in clusters 4 and 5.

In this way, it was found that there is a difference in characteristics of customers for each cluster (Fig. 7).

5.2 Classification of Customers and Evaluation of Purchasing Behavior

We performed a decile analysis for each cluster and classified the customers as “general customers” and “good customers.”

Then, we analyzed purchasing behavior, for example by aggregating the product categories with high purchase price.

Table 4 shows the number of items per purchasing opportunity for general customers and good customers in each cluster.

Table 4. Average number of items per purchase
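
The per-purchase average behind a table of this kind can be sketched as follows, assuming each purchase row carries a customer segment, a basket ID and a quantity. The row layout and the function name are hypothetical and do not reflect the actual ID-POS schema.

```python
from collections import defaultdict

def mean_items_per_purchase(rows):
    """Mean number of items per basket for each customer segment.

    rows: iterable of (segment, basket_id, quantity) tuples.
    """
    # Total items in each basket, kept separate per segment.
    basket_items = defaultdict(int)
    for segment, basket_id, quantity in rows:
        basket_items[(segment, basket_id)] += quantity
    totals = defaultdict(int)
    counts = defaultdict(int)
    for (segment, _), items in basket_items.items():
        totals[segment] += items
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}

# Hypothetical purchase rows for one cluster.
rows = [
    ("general", "b1", 2), ("general", "b1", 1), ("general", "b2", 1),
    ("good", "b3", 5), ("good", "b4", 3), ("good", "b4", 4),
]
averages = mean_items_per_purchase(rows)
```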

Tables 5, 6, 7 and 8 list the top five item categories with high purchase price. In this paper, we describe only the results for cluster 1 and cluster 4.

Table 5. Top 5 items of category in cluster 1 (general customers)
Table 6. Top 5 items of category in cluster 1 (good customers)
Table 7. Top 5 items of category in cluster 4 (general customers)
Table 8. Top 5 items of category in cluster 4 (good customers)

The number of items per purchase is higher for good customers than for general customers. The number of items per purchasing opportunity also differs among clusters.

  • Cluster 1

For general customers, merchandise categories that do not require cooking, such as ready-to-eat sushi and frozen boiled rice, have higher purchase price, whereas for good customers, categories that require cooking, such as brand pork and Japanese beef, have higher purchase price.

Although not listed in the table, beer and third beer (a malt-free, beer-like alcoholic beverage) were among the top categories by purchase price for both general customers and good customers.

  • Cluster 4

There was almost no difference between the categories with high purchase price for general customers and those for good customers.

5.3 Association Analysis

Using the purchase history from 2015/03/01 to 2016/02/29, we conducted association analysis with basket ID as the key for general customers and good customers in all clusters.

We extracted association rules using the following thresholds: support greater than or equal to 0.1%, lift greater than or equal to 1%, and rule length 2 (Table 9).

Table 9. Top 5 items of category in cluster 4 (general customers)

Many association rules were extracted for both general customers and good customers. The clusters with many extracted rules are clusters 4, 5 and 6, and the clusters with few extracted rules are clusters 1, 3 and 9.

Tables 10, 11, 12 and 13 list characteristic rules among the extracted association rules. In this study, we describe only the results for cluster 1 and cluster 4.

Table 10. The extracted association rule in cluster 1 (general customers)
Table 11. The extracted association rule in cluster 1 (good customers)
Table 12. The extracted association rule in cluster 4 (general customers)
Table 13. The extracted association rule in cluster 4 (good customers)

Although there is a difference between general customers and good customers, we did not find much difference between clusters.

6 Discussions

As customer characteristics of clusters 1 and 8, one-person households are common, the unmarried rate is slightly high, and the proportion of elderly people is low. Clusters 1 and 8 are therefore considered to have more single-person households in their 30s to 40s than other clusters. As a feature of purchasing behavior, cluster 1 includes beer and third beer among the top categories by purchase price. This seems to be because customers from single-person households use the stores in cluster 1, which consists mainly of small stores, to buy alcoholic drinks.

As customer characteristics of the suburban stores in clusters 2, 3, 6 and 7, households of 2 to 4 people account for 70% of the household composition, which is higher than in other clusters, while the ratio of households with one person or with five or more people is low. From these facts, we speculate that the customers are mainly housewives of nuclear families living in the suburbs.

As a characteristic of purchasing behavior, clusters 2, 6 and 7 show a large number of purchased items per purchasing opportunity and a large number of extracted association rules. In cluster 3, the number of items purchased per purchasing opportunity is small, probably because the average store area is small and the number of products is small. As for association rules, no typical characteristic was found.

In clusters 4 and 5 in the mountainous area, the ratio of households with 5 or more people was high, and the ratio of customers in their 60s and in their 70s or over was high among the age groups. From these facts, clusters 4 and 5 are considered to contain more families with elderly couples living in mountainous areas than other clusters.

As characteristics of purchasing behavior, the purchase price of sushi is high, the number of purchased items per purchasing opportunity is large, and the number of extracted association rules is large.

Also, since these are clusters with an average of 100 or more parking spaces, it is thought that many customers visit by car and buy many items at a time.

As a feature of purchasing behavior common to all clusters, product categories that do not require cooking rank higher for general customers, whereas product categories that require cooking tend to rank higher for good customers. In addition, as a result of association analysis, rules such as “chocolate - snack” and “banana - yogurt” are found for general customers, and rules such as “potatoes - onions” and “wooden mushrooms - tofu” are found for good customers.

For this reason, we think that general customers and good customers use supermarkets for different purposes, and that it is necessary to propose appropriate measures for each.

7 Conclusion and Future Works

In this study, first, we classified stores by cluster analysis using store causal data. Second, focusing on each cluster, we aggregated customer characteristics such as age structure and household composition and classified customers. In addition, we performed association analysis on purchasing behavior and clarified concurrent selling commodities for each cluster. Finally, we clarified customer characteristics and purchasing behavior according to store causal data from these analysis results.

Our analysis revealed that customer characteristics and purchasing behavior differ depending on the characteristics of each store, such as the sales floor area and the population within the trading area. By using this result, it is possible to propose marketing measures unique to each store according to the surrounding environment, store size and other factors, which were not taken into account in analysis using only the purchase history. In order to classify the stores more accurately, it is necessary to consider a method for characterizing the commercial area of each store that takes into account competing stores and similar factors. In addition, we think that more useful suggestions can be obtained by looking at changes in purchasing trends, such as seasonal changes in each cluster, and by considering customers who use multiple stores.