1 Introduction

Decision making is an integral part of human life. Every day, each of us faces decision-making problems that can affect both professional and private life. Examples of such problems include a change of legal regulations in the state, the choice of a university, the purchase of a new car, the determination of the amount of personal income tax, the selection of a suitable location for the construction of a nuclear power plant, the adoption of a research plan, or the sale or purchase of stock exchange shares.

In the majority of cases, decision-making problems are based on many, often contradictory, decision criteria. Therefore, multi-criteria decision-analysis (MCDA) methods and decision support systems enjoy deep interest both in business and in science. In almost every case, a reliable decision requires the analysis of many alternatives, each of which should be assessed from the perspective of all the criteria characterizing its acceptability. As the complexity of the problem increases, it becomes more and more challenging to make the optimal decision. An additional complication is that there is usually no complete mathematical model linking the criteria to the expected consequences. In particularly important problems, the role of the decision-maker is entrusted to an expert in a given field or to a group of experts who help to identify the best solution; we then speak of individual or group decision making, respectively. Even then, determining the right decision can be problematic both for an individual expert and for collegiate bodies. In such cases, MCDA methods can be helpful.

MCDA methods are great tools to support the decision-maker in the decision-making process. Two main groups of MCDA methods can be identified, i.e., the American and the European school [33]. Methods of the American school of decision support are based on the utility or value function [5, 16]. The most important methods belonging to this family are: the analytic hierarchy process (AHP) [34], the analytic network process (ANP) [35], utility theory additive (UTA) [21], the simple multi-attribute rating technique (SMART) [28], the technique for order preference by similarity to ideal solution (TOPSIS) [3, 32], and measuring attractiveness by a categorical based evaluation technique (MACBETH) [2]. Methods of the European school of decision support use an outranking relation in the preference aggregation process, the most popular being the ELECTRE family [1, 36] and the PROMETHEE methods [8, 15]. Additionally, we can indicate a set of techniques based strictly on decision rules. These methods use fuzzy set theory (COMET) [12, 25,26,27] and rough set theory (DRSA) [29].

Generally, MCDA methods help to create a ranking of decision variants in which the most preferred alternative comes first [4]. The problem arises when we use more than one MCDA method and the obtained rankings are not identical. The question then arises of how to compare the obtained rankings. Currently, the most popular approach is an analysis based on the correlation between two or more rankings [7, 12, 19, 24]. However, we are going to show that this analysis is insufficient in the decision support domain. An appropriate approach should make it possible to identify which ranking is better in terms of the order of alternatives. Then, with a proper benchmark, it would be possible to assess the correctness of MCDA methods in terms of the rankings they generate [22].

In this paper, we identify the shortcomings of the coefficients currently used to measure the similarity of two rankings. The most significant contribution is the WS coefficient, whose value depends strictly on the position at which a difference in the ranking occurs. Afterward, three linguistic terms are defined by using trapezoidal fuzzy numbers, i.e., low, medium, and high similarity. We compare the proposed coefficient with the \(\rho \) Spearman, \(\tau \) Kendall, and \(\gamma \) Goodman-Kruskal coefficients, which are commonly used to measure ranking similarity in MCDA problems [9, 12, 17, 18, 23]. In addition, the proposed approach is compared with the similar coefficients presented in [6, 10, 13]. For this purpose, numerical experiments are discussed.

The rest of the paper is organized as follows: In Sect. 2, some basic preliminary concepts are discussed. Section 3 introduces a new coefficient of rankings similarity in the decision-making problems. In Sect. 4, the practical feasibility study of the WS coefficient is discussed. In Sect. 5, we present the summary and conclusions.

2 Preliminaries

An important issue is how to compare the correctness of the order of two rankings. The simplest method is to check whether the rankings are consistent or inconsistent. Such an approach is not sufficient and can be used almost exclusively for 2- or 3-element rankings [27]. A much more common approach is to use one of the coefficients of monotonic dependence of two variables, where the rankings obtained for a set of considered alternatives are our variables. The most commonly used symmetrical coefficient of such dependence is Spearman's coefficient [9, 17, 18, 23], which is expressed by the following formula (1):

$$\begin{aligned} r_s=1-\frac{6 \cdot \sum _{i=1}^{n} d_{i}^{2}}{n \cdot \left( n^{2}-1\right) } \end{aligned}$$
(1)

where \(d_i\) is defined as the difference between the ranks, \(d_i = R_{xi} - R_{yi}\), and n is the number of elements in the ranking. Spearman's coefficient is interpreted as the percentage of the rank variance of one variable that is explained by the other variable [31].
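
As an illustration, formula (1) can be computed directly from two rankings. The following minimal Python sketch assumes that both rankings are tie-free permutations of \(1, \ldots, n\); the function name is only illustrative.

```python
# Minimal sketch of Spearman's coefficient, formula (1); assumes tie-free rankings.
def spearman_rs(rank_x, rank_y):
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Example: swapping the two best alternatives of a five-element ranking.
print(spearman_rs([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # 0.9
```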

The most frequently used asymmetrical coefficients of monotonic dependence of two variables are the Kendall [12, 20] and Goodman-Kruskal [12, 14] coefficients. They are expressed by formulas (2) and (3), respectively:

$$\begin{aligned} \tau =2 \cdot \frac{N_{s}-N_{d}}{n \cdot (n-1)} \end{aligned}$$
(2)
$$\begin{aligned} G=\frac{N_{s}-N_{d}}{N_{s}+N_{d}} \end{aligned}$$
(3)

where \(N_s\) is the number of concordant pairs, \(N_d\) is the number of discordant pairs, and n is the number of ranked elements. The Kendall and Goodman-Kruskal coefficients, unlike Spearman's, are interpreted in terms of probability. They represent the difference between the probability that the compared variables will be in the same order and the probability that they will be in the opposite order.
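
Formulas (2) and (3) share the same numerator and differ only in their denominators, which is easy to see when both are computed from the same concordance counts; a short Python sketch (illustrative names, tie-free rankings assumed) is given below.

```python
# Sketch of the Kendall (2) and Goodman-Kruskal (3) coefficients via pair counting.
from itertools import combinations

def concordance_counts(rank_x, rank_y):
    ns = nd = 0                                   # concordant / discordant pairs
    for i, j in combinations(range(len(rank_x)), 2):
        sign = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if sign > 0:
            ns += 1
        elif sign < 0:
            nd += 1
    return ns, nd

def kendall_tau(rank_x, rank_y):                  # formula (2)
    n = len(rank_x)
    ns, nd = concordance_counts(rank_x, rank_y)
    return 2 * (ns - nd) / (n * (n - 1))

def goodman_kruskal_g(rank_x, rank_y):            # formula (3)
    ns, nd = concordance_counts(rank_x, rank_y)
    return (ns - nd) / (ns + nd)

print(kendall_tau([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))        # 0.8
print(goodman_kruskal_g([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # 0.8
```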

The presented coefficients are the most frequently used measures for analyzing the similarity of rankings in decision-making problems [9, 12, 17, 18, 23]. However, we want to point out a significant shortcoming, which is related to the place where a difference occurs. The idea of measuring the similarity of rankings is not new and has been the subject of many works [11, 30]. Particularly interesting in the context of the presented approach are the works related to Blest's measure of rank correlation \(v\) and the weighted rank measure of correlation \(r_w\) [6, 10, 13]. They are expressed by formulas (4) and (5), respectively:

$$\begin{aligned} r_{w}=1-\frac{6 \sum _{i=1}^{n}\left( R_{xi}-R_{yi}\right) ^{2}\left( \left( n-R_{xi}+1\right) +\left( n-R_{yi}+1\right) \right) }{n^{4}+n^{3}-n^{2}-n}\end{aligned}$$
(4)
$$\begin{aligned} v=1-\frac{12 \sum _{i=1}^{n}(n+1-R_{xi})^{2} \cdot R_{yi}-{n(n+1)^{2}(n+2)}}{n(n+1)^{2}(n-1)}\end{aligned}$$
(5)

For the coefficients (1)-(3), it does not matter whether the error occurs at the top or at the bottom of the ranking; their values will be identical. In Table 1, a simple example shows five rankings, including one reference ranking (\(R_x\)) and four test rankings (\(R_y^{(1)} - R_y^{(4)}\)). The test rankings were created by swapping two adjacent alternatives in the correct ranking. We want to stress that rankings are determined in order to choose the best possible solution, and the value of the preferences decreases with each position in the ranking. A difference at the top should therefore weigh more than an error at the bottom of the ranking: exchanging the alternatives in the first and second positions is a more considerable error than swapping the second and third positions. However, the values of the coefficients indicate that the similarity of the test rankings to the reference ranking is the same for all test sets.
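
This observation can be checked with off-the-shelf implementations; the sketch below (using SciPy, with the test rankings reconstructed as the four adjacent swaps described above) yields the same \(r_s\) and \(\tau \) values for every swap position.

```python
# Sketch of the Table 1 observation: adjacent swaps at different positions
# give identical classical coefficient values.
from scipy.stats import kendalltau, spearmanr

reference = [1, 2, 3, 4, 5]
tests = {
    "swap positions 1-2": [2, 1, 3, 4, 5],
    "swap positions 2-3": [1, 3, 2, 4, 5],
    "swap positions 3-4": [1, 2, 4, 3, 5],
    "swap positions 4-5": [1, 2, 3, 5, 4],
}
for name, ranking in tests.items():
    rs = spearmanr(reference, ranking).correlation
    tau = kendalltau(reference, ranking).correlation
    print(f"{name}: r_s = {rs:.2f}, tau = {tau:.2f}")   # 0.90 and 0.80 each time
```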

3 WS Coefficient of Rankings Similarity

The new ranking similarity factor should be resistant to the situation described in the previous section, and at the same time, should be sensitive to significant changes in the ranking. Besides, this factor should be easy to interpret, and its values should be limited to a specific interval.

We assumed that the new indicator should be strongly related to the difference between two rankings on particular positions. An additional assumption is that the top has a more significant influence on similarity than the bottom of the ranking. Based on these assumptions, a new indicator was developed, which can be presented as (6):

$$\begin{aligned} WS=1 - \sum _{i=1}^{n} \left( 2^{-R_{xi}} \cdot \frac{ |R_{xi}- R_{yi}|}{\max \{|1-R_{xi}|, |N-R_{xi}|\}} \right) \end{aligned}$$
(6)

where WS is the value of the similarity coefficient, N is the length of the ranking, and \(R_{xi}\) and \(R_{yi}\) denote the positions of the i-th element in ranking x and ranking y, respectively.
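
A direct Python sketch of formula (6) is given below; it is only an illustration of the definition (the function name is ours), assuming the reference ranking \(R_x\) is passed first.

```python
# Sketch of the WS coefficient, formula (6); rank_x is the reference ranking.
def ws_coefficient(rank_x, rank_y):
    n = len(rank_x)
    total = 0.0
    for rxi, ryi in zip(rank_x, rank_y):
        total += 2.0 ** (-rxi) * abs(rxi - ryi) / max(abs(1 - rxi), abs(n - rxi))
    return 1 - total

# Adjacent swaps at different positions now yield different similarity values.
print(ws_coefficient([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # ~0.79 (error at the top)
print(ws_coefficient([1, 2, 3, 4, 5], [1, 2, 3, 5, 4]))  # ~0.97 (error at the bottom)
```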

The proof of convergence of the WS coefficient is quite simple. Formula (6) can be divided into two main components. The first one (7) makes the WS value dependent on the position in the reference ranking (\(R_x\)).

$$\begin{aligned} 2^{-R_{xi}} \end{aligned}$$
(7)

We are dealing with a convergent geometric series: since \(R_{xi}\) is a permutation of \(1, \ldots, n\), the partial sum equals \(\sum _{i=1}^{n} 2^{-i} = 1 - 2^{-n}\), which gives the trivial limit (8).

$$\begin{aligned} \lim _{n \rightarrow \infty } \sum _{i=1}^{n} \left( 2 \right) ^{-R_{xi}}=1 \end{aligned}$$
(8)

The second component (9) determines to what extent the difference in rankings affects the similarity. Its value ranges from zero (the positions are identical) to one (the largest possible difference for a given position).

$$\begin{aligned} \frac{ |R_{xi}- R_{yi}|}{\max \{|1-R_{xi}|, |N-R_{xi}|\}} \end{aligned}$$
(9)

If we multiply (7) by (9), the sum of the resulting series cannot exceed one. Therefore, it is clear that the WS coefficient can only take values from zero to one. All coefficients can be compared on the simple example in Table 1. The WS, \(r_w\), and \(v\) coefficients take into account the position at which the error occurs, whereas the remaining coefficients stay the same regardless of where the error occurs. In the next section, further tests comparing the performance of the indicators are presented and discussed.

Table 1. Summary of the test with the reference ranking (\(R_x\)) and four test rankings (\(R_y^{(1)} - R_y^{(4)}\)), with the calculated correlation factors and the proposed WS coefficient for a set of five alternatives (\(A_1 - A_5\)), each having a different position in the ranking.
Table 2. Summary of the test with the reference ranking (\(R_x\)) and four test rankings (\(R_y^{(1)} - R_y^{(4)}\)), with the calculated correlation factors and the proposed WS coefficient for a set of five alternatives (\(A_1 - A_5\)), where one pair has the same position in the ranking.

4 Results and Discussion

4.1 Analysis of Five-Element Rankings

The first experiment concerns tied ranks, i.e., the same values in the ranking. This happens when two alternatives share the same place. For example, if two decision variants jointly take first place, the ranking will contain the value 1.5 for both (the average of their positions). Table 2 shows the results of calculations for five-element rankings, where different locations of the tied pair are considered.
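
Such average (tied) ranks can be produced, for instance, with SciPy's rankdata; the sketch below uses made-up preference values purely for illustration.

```python
# Sketch of tied (average) ranks: two alternatives sharing first place both get 1.5.
from scipy.stats import rankdata

preferences = [0.9, 0.9, 0.7, 0.5, 0.3]        # illustrative values; higher = better
ranking = rankdata([-p for p in preferences])  # negate so rank 1 is the most preferred
print(ranking)                                 # [1.5 1.5 3.  4.  5. ]
```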

Once again, the WS, \(r_w\), and \(v\) coefficients change their values together with the position at which the tied pair occurs. This is exactly the property whose absence was identified as a significant drawback of the currently used methods, i.e., \(\rho \) Spearman, \(\tau \) Kendall, and \(\gamma \) Goodman-Kruskal. Ranking \(R_y^{(4)}\) is more similar to the reference than \(R_y^{(1)}\) because full correctness occurs in the first three positions and not in the last three.

Another simple experiment consists of creating test rankings in which successive rankings differ from the base ranking in the alternative indicated as the best. The results for the five-element ranking are shown in Table 3. Replacing the best option with the worst one results in a negative value for all coefficients except WS. This means a negative correlation, which is not trivial to interpret in decision-making problems. Moreover, the case of the \(R_y^{(3)}\) ranking is interesting because it obtained a total lack of correlation (for the \(\tau \) and G coefficients), which would mean that the order of the base ranking and this ranking are utterly unrelated to each other. This confirms that the classical rank coefficients do not examine the similarity of two rankings thoroughly. In general, all coefficients assess the test rankings against the base ranking in a somewhat similar way. The rationale for this is that three positions in the ranking have been indicated flawlessly.

The last example in this subsection examines two pairs of test rankings, i.e., \(R_y^{(1)}\) and \(R_y^{(2)}\), and \(R_y^{(3)}\) and \(R_y^{(4)}\). Once again, the \(R_x\) ranking is used as a reference point. The detailed results are presented in Table 4.

Rankings \(R_y^{(1)}\) and \(R_y^{(2)}\) have equal values for most of the coefficients, with only WS and \(v\) being exceptions. Ranking \(R_y^{(1)}\) is significantly better than ranking \(R_y^{(2)}\): even though the \(A_5\) alternative has been identified as the best in \(R_y^{(1)}\), the rest of this ranking is in the correct order, whereas in \(R_y^{(2)}\) the best alternative was wrongly rated as the worst. Therefore, the best alternative (\(A_1\)) has a chance to be chosen in the first case but not in the second, and these rankings cannot be evaluated as being the same. This shows the superiority of the WS and \(v\) coefficients in the analysis of decision-making rankings. Rankings \(R_y^{(3)}\) and \(R_y^{(4)}\) show greater variability of the coefficients, i.e., ranking \(R_y^{(3)}\) has a coefficient value less than, equal to, or greater than ranking \(R_y^{(4)}\), depending on which coefficient is taken into account, but WS and \(v\) again point to the superiority of ranking \(R_y^{(3)}\).

Table 3. Summary of the test with the reference ranking (\(R_x\)) and four test rankings (\(R_y^{(1)} - R_y^{(4)}\)), with the calculated correlation factors and the proposed WS coefficient for a set of five alternatives (\(A_1 - A_5\)), where each ranking has a different position error.
Table 4. Summary of the test with the reference ranking (\(R_x\)) and four test rankings (\(R_y^{(1)} - R_y^{(4)}\)), with the calculated correlation factors and the proposed WS coefficient for a set of five alternatives (\(A_1 - A_5\)), where the change of the coefficients is investigated.

4.2 Influence of a Ranking Size on Coefficients

In this subsection, we want to show the impact of the ranking size on the obtained value of the indicator. Figure 1 shows a comparison of the WS, \(r_s\), \(r_w\), and \(v\) coefficients when only the alternatives in the first and second positions are swapped; this scenario is a consequence of the conclusion drawn from Table 1. We take into account rankings with 5 to 50 elements. We can observe that, with increasing ranking size, the similarity under this scenario increases. The WS coefficient is characterized by the greatest variability depending on the size of the ranking. Figure 2 shows the changes in the WS value when the best element is swapped with the second, third, fourth, or fifth one (the positions of a single pair of alternatives are exchanged). As we can see, the WS values decrease accordingly, since the quality of the ranking decreases as the best solution moves away from the top of the ranking.
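
The trend in Fig. 1 can be reproduced with a short sketch; assuming the experiment swaps only the first two positions, WS and \(r_s\) are computed here for a few ranking lengths (illustrative code, not the original experimental script).

```python
# Sketch of the Fig. 1 experiment: swap positions 1 and 2, vary the ranking length n.
from scipy.stats import spearmanr

def ws(rank_x, rank_y):                            # formula (6)
    n = len(rank_x)
    return 1 - sum(2.0 ** (-rx) * abs(rx - ry) / max(abs(1 - rx), abs(n - rx))
                   for rx, ry in zip(rank_x, rank_y))

for n in (5, 10, 20, 50):
    reference = list(range(1, n + 1))
    test = [2, 1] + list(range(3, n + 1))          # only the top two are swapped
    print(n, round(ws(reference, test), 4),
          round(spearmanr(reference, test).correlation, 4))
```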

Fig. 1. The value of the coefficients depending on the length of the ranking (n), where one error occurs (the first and second positions in the ranking are swapped).

Fig. 2. The value of the WS coefficient depending on the length of the ranking (n) and the swapped positions in the ranking.

Fig. 3. Sorted distribution of all values of the Kendall coefficient in relation to the length of the ranking (n).

Fig. 4. Sorted distribution of all values of the Spearman coefficient in relation to the length of the ranking (n).

Fig. 5. Sorted distribution of all values of the WS coefficient in relation to the length of the ranking (n).

4.3 Distribution of Coefficient Values

In the next step, we visualize the distributions of three indicators. Figure 3 presents the distribution of the \(\tau \) Kendall coefficient for all possible permutations of sets of five, six, seven, eight, nine, and ten elements. Figures 4 and 5 show the distributions of the \(\rho \) Spearman coefficient and the WS coefficient, respectively. The shape of the \(\rho \) Spearman distribution is smoother than that of \(\tau \) Kendall. Both indicators have a symmetrical distribution, unlike the WS coefficient. The interpretation of the WS value may be a problem, because it is a new approach, and the question arises when the similarity measured by the WS coefficient is low, medium, or high. A statistical analysis of the distribution of WS should therefore be carried out to define three appropriate linguistic terms and answer this research question.
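
A distribution such as the one in Fig. 5 can be obtained by enumerating all permutations against a fixed reference ranking; the sketch below does this for a six-element ranking (an illustration of the procedure, not the original plotting script).

```python
# Sketch of the Fig. 5 data: WS values for all permutations of a 6-element ranking.
from itertools import permutations

def ws(rank_x, rank_y):                            # formula (6)
    n = len(rank_x)
    return 1 - sum(2.0 ** (-rx) * abs(rx - ry) / max(abs(1 - rx), abs(n - rx))
                   for rx, ry in zip(rank_x, rank_y))

reference = tuple(range(1, 7))
values = sorted(ws(reference, perm) for perm in permutations(reference))
print(len(values), min(values), max(values))       # 720 values, all within [0, 1]
```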

Table 5. A summary of the basic statistics of the WS coefficient for all possible permutations, where n is the length of the ranking.
Fig. 6. The definitions of three linguistic terms, i.e., low, medium, and high similarity of rankings, by using trapezoidal fuzzy numbers.

4.4 Definition of Rankings Similarity

All possible permutations and the corresponding values of the WS coefficient are determined for rankings of 3 to 10 elements. Based on the obtained values, we calculated basic statistics, which are presented in Table 5. For larger rankings, the statistics are based on random samples of 100,000 rankings; thus both population and random sampling data are used. Note the convergence of the arithmetic mean, the standard deviation, and the ranges of typical values. The biggest differences concern the arithmetic mean, and it is equal to 0.207 (for rankings of 10 and 1000 elements). Based on the analysis of typical values, i.e., the interval [\(\bar{x} - S_x\); \(\bar{x} + S_x\)], we identified the linguistic terms low, medium, and high similarity of rankings.
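
The procedure behind Table 5 can be sketched as follows: full enumeration of permutations for small rankings and random samples for larger ones, followed by the mean, the standard deviation, and the typical-value interval. The enumeration threshold below is illustrative and kept small for speed; the paper enumerates up to 10 elements and samples 100,000 rankings otherwise.

```python
# Sketch of the Table 5 statistics: mean, standard deviation and typical-value interval.
import random
from itertools import permutations
from statistics import mean, stdev

def ws(rank_x, rank_y):                            # formula (6)
    n = len(rank_x)
    return 1 - sum(2.0 ** (-rx) * abs(rx - ry) / max(abs(1 - rx), abs(n - rx))
                   for rx, ry in zip(rank_x, rank_y))

def ws_statistics(n, sample_size=100_000):
    reference = list(range(1, n + 1))
    if n <= 8:                                     # full enumeration for small rankings
        values = [ws(reference, p) for p in permutations(reference)]
    else:                                          # random sample for larger rankings
        values = [ws(reference, random.sample(reference, n)) for _ in range(sample_size)]
    m, s = mean(values), stdev(values)
    return m, s, (m - s, m + s)                    # typical-value interval

print(ws_statistics(5))
```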

It can be said that if the WS value is lower than 0.234, then the similarity is low, and if it is higher than 0.808, then the similarity is high. Medium similarity, which corresponds to typical values, covers the range from 0.352 to 0.689. For the remaining values, we can talk about partial belonging to the linguistic concepts according to fuzzy set theory, or the mixed low/medium and medium/high concepts can be used. Detailed definitions are presented in Fig. 6. The linguistic values are important because they can be used to evaluate how well a test ranking matches the reference ranking.
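
Under the assumption that the trapezoids in Fig. 6 use exactly the breakpoints quoted above (0.234, 0.352, 0.689, 0.808), the three terms can be sketched as membership functions; the exact shapes in the figure may differ slightly.

```python
# Sketch of the low/medium/high similarity terms as trapezoidal membership functions,
# assuming the breakpoints 0.234, 0.352, 0.689 and 0.808 quoted in the text.
def trapezoid(x, a, b, c, d):
    """0 outside (a, d), 1 on [b, c], linear on (a, b) and (c, d)."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def similarity_terms(ws_value):
    return {
        "low":    trapezoid(ws_value, -1.0, 0.0, 0.234, 0.352),
        "medium": trapezoid(ws_value, 0.234, 0.352, 0.689, 0.808),
        "high":   trapezoid(ws_value, 0.689, 0.808, 1.0, 2.0),
    }

print(similarity_terms(0.3))   # partial membership in both low and medium
```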

5 Conclusions

The main contribution of the paper is the proposal of a new coefficient of ranking similarity. For this purpose, a short analysis of the classical coefficients is presented, and some of their shortcomings are emphasized. The most critical one is the equality of the values of the classical coefficients when the ranking error concerns the swap of a pair of adjacent alternatives (Table 1). The paper presents the theoretical foundation of the proposed WS coefficient, which ensures that the new coefficient is free of the identified shortcomings.

The results of the numerical experiments compare the correctness of all analyzed coefficients, i.e., the \(\rho \) Spearman, \(\tau \) Kendall, G Goodman-Kruskal, and WS coefficients. Then, the distributions of the \(\tau \) Kendall, \(\rho \) Spearman, and WS coefficients were compared. The WS values can be used to measure the similarity of rankings.

Finally, three linguistic concepts were formulated for the low, medium, and high similarity of the two rankings. The properties of the WS coefficient indicate that it is a useful tool for comparing the similarity of rankings and is better suited for this purpose than the currently used correlation coefficients.

During the research, some improvement areas have been identified. The future work directions should concentrate on:

  • further comparison between existing coefficients and the proposed WS coefficient;

  • testing the use of the WS coefficient in real-life examples;

  • detection and correction of WS coefficient shortcomings;

  • adaptation of the proposed coefficient to uncertain (fuzzy) rankings.