Keywords

1 Introduction

Over the past three decades, numerous scholars have worked to develop reliable and valid scales for measuring customers’ perceived service quality in service industries. The best known of these scales is SERVQUAL, developed by Parasuraman et al. (1985; 1988). SERVQUAL is based on a multidimensional model with five dimensions and 22 items, and it has had a far-reaching impact on later studies of service quality measurement. However, Parasuraman et al. (2005) found that judging online service quality differs from judging traditional service quality, and developed two new scales for measuring online service quality. Ladhari (2009) reviewed numerous studies on e-service quality and found that the dimensions of e-service quality tend to be contingent on the service industry. Therefore, scales for measuring the online service quality of a specific service industry need to be studied independently.

Retail banking was one of the sample service industries used in developing SERVQUAL. Since many banks now offer Internet banking, a large number of customers use retail banking services online. Recently, two obvious changes have occurred in Internet banking for retail customers. Firstly, with the rise of the Internet and smartphones, mobile applications and mobile payment have become easy to use and increasingly popular, and banks have launched mobile applications for retail customers one after another in order to offer more convenient service. Secondly, many mobile payment companies have launched online financial products to attract individuals, who may then prefer to keep their money in these applications rather than in banks; as a result, the role of service quality in fostering the growth of online financial services has received much attention in the academic and practitioner communities. To win back lost customers, many banks have also begun offering personal wealth management as a new online service, which is now one of the most popular services in online retail banking. Consequently, online service quality now has a significant and changing influence on many important aspects of retail banking service. An understanding of how consumers currently evaluate Internet banking is thus of the utmost importance for scholars and practitioners alike.

The purpose of this study is to review the reliability and validity of scales measuring Internet banking service quality in recent studies, and to summarize guidelines for developing reliable and valid scales. We reviewed the literature on Internet banking, with an emphasis on the methodological issues involved in developing measurement scales and on issues related to the reliability and validity of the Internet banking service quality construct. We selected studies on Internet banking published in the last five years from Web of Science and subjected them to a thorough content analysis.

2 Issues of Internet Banking Service

Internet banking began in the 1990s. In 1992, some American banks launched their e-banking services (Sikdar et al. 2015). Since then, e-banking has become increasingly important and popular for both banks and customers (Keskar and Pandey 2018). Today, mobile phones, tablets, and laptops have almost replaced the bank branch. Customers use Internet banking services to make deposits, remittances, transfers, payments, and securities orders, to obtain insurance services and interest rate information, and more recently to invest, using electronic devices at any time and place without visiting a bank in person.

Previous studies agreed that service quality is a multidimensional construct, but there was no firm agreement regarding its generic dimensions (Peng and Moghavvemi 2015). The key dimensions of Internet banking service quality still need to be explored today (Dharmavaram and Nittala 2018) because of the ever-changing needs of customers and the addition of new Internet banking services. Previously, customers were concerned only with the safety and ease of operating Internet banking interfaces. As Internet banking evolved, websites adopted user-centered design, mobile services were launched, and new money management services were offered; customers then began focusing on other factors, such as information content and the professionalism and attitudes of customer service personnel in online banking. Thus, previous scales for measuring Internet banking service quality are likely to be inappropriate today, and attention should be paid to recent studies on measuring Internet banking service quality.

3 Methodological Issues in Developing Service Quality Scales for Internet Banking

We searched Web of Science using the keywords “Online”, “Bank”, “Service quality”, and “Measure”. Only studies published in the last five years that focused on developing a complete instrument for measuring Internet banking service quality were included and subjected to a comprehensive, in-depth content analysis. Only five studies reporting a complete instrument development procedure and a final scale were found; thirteen other, incomplete studies were also reviewed. Table 1 shows the complete studies we found.

Table 1. Selected studies on Internet banking service quality scale development.

The general method for developing scales that measure perceived Internet banking service quality can be summarized from the literature in eight steps: (1) define the construct of service quality; (2) identify dimensions; (3) generate items for all dimensions and design a scale from the items; (4) collect data; (5) purify the scale; (6) collect fresh data from a new sample on the set of items that emerged from the previous step; (7) further purify the scale to obtain its final version; (8) evaluate the reliability and validity of the scale. In the first three steps, researchers design an original version of the service quality scale, using methods intended to ensure its reliability and validity. In the next four steps, the original scale is purified by factor analysis to improve reliability and validity. In the last step, the final version of the scale is complete, and researchers measure its reliability and validity to demonstrate that the scale is usable and effective.

The methodological issues identified in this review can be summarized as follows: research methods for identifying dimensions; research methods for generating items; sampling methods; assessment and purification of the scale; and scale reliability and validity.

3.1 Research Methods for Identifying Dimensions

All of the studies we reviewed used one or more qualitative methods to identify the dimensions of Internet banking service quality, and all of them conducted a full literature review.

Raza et al. (2015) decided to use the five dimensions of the SERVQUAL model directly. Amin (2016) used the scales from Herington and Weaven (2009) and Ho and Lin (2010). Arcand et al. (2017) integrated all the literature they reviewed and arrived at five dimensions: security/privacy, practicality, design/aesthetics, sociality, and enjoyment. However, none of these three studies explained why they chose these dimensions.

Jovovic et al. (2016) stated that the theoretical model applied in their study was E-S-QUAL/E-RecS-QUAL, developed by Parasuraman et al. (2005) and modified for measuring the quality of online banking services. They considered the quality dimensions of e-banking services to consist of efficiency, privacy, and readiness to provide answers/contact, as well as security, empathy, and website design. Efficiency, privacy, and responsiveness were taken from E-S-QUAL/E-RecS-QUAL, which served as the basis, while the remaining dimensions were taken from other similar models in accordance with the needs of e-banking services in Montenegro. However, although Parasuraman et al. (2005) considered E-S-QUAL/E-RecS-QUAL suitable for measuring all online services, they in fact used only the electronic commerce industry as their research sample, which is very different from Internet banking.

Only Roy and Balaji (2015) used more than one method to identify dimensions. Their study was based on Grönroos’s (1984) service quality model and Delone and McLean’s (2003) information systems success model. In addition, three focus group interviews of eight participants each and eight in-depth interviews, exploring participants’ insights into how they evaluate the quality of online financial services, were carried out to design the original scales. Each focus group lasted about 90 min and was moderated by an experienced moderator. They finally identified four dimensions: system quality, information quality, interaction quality, and image quality.

Because of the major changes in platforms and service modes in recent Internet banking, service quality may include new dimensions. Researchers should therefore use qualitative research at the earliest stage and combine more than one method; relying on literature review alone is inadvisable. One method that is seldom used in complete scale development studies is the critical incident technique (CIT), a qualitative interview method for studying significant processes, incidents, and events identified by respondents (Chell 2004). Jun and Palacios (2016) used CIT to identify the dimensions of mobile banking service quality and found a total of 17 dimensions, five of which were considered the main sources of customer satisfaction. This suggests that the dimensionality of recent Internet banking service quality is far more complex than that of past services because of business and platform changes. CIT is suitable for finding new quality dimensions in new services but also requires more effort, so researchers can use CIT selectively or refer to the outcomes of other CIT-based studies.

3.2 Research Methods for Generation of Items

Because scale items are specific to the context of Internet banking service, they are generated using both deductive methods (such as literature reviews) and inductive methods (such as exploratory research).

Researchers are more active in generating items than in identifying dimensions, and their articles describe the item generation procedure in more detail. Still, some studies relied on deductive methods alone. Raza et al. (2015) conducted only a literature review and did not even report the literature sources of their items. Jovovic et al. (2016) also gathered items through literature review only. Arcand et al. (2017) gathered items through literature review and modified some items themselves to fit local Internet banking services.

Roy and Balaji (2015) used both literature reviews and offline interviews with experts to generate items. Measurement scales from previous research studies, together with additional items identified in the interviews and focus group discussions, formed the initial pool of measurement items for the dimensions. Five administrators of online financial service providers in India were then interviewed to clarify the composition of the constructs. The items were then adjusted according to the interviewees’ perceptions of the importance of each dimension.

Amin (2016) conducted an offline questionnaire survey after the literature review to generate items. The questionnaire was written in both Bahasa Malaysia and English to ensure clarity, and its content validity (wording and meaning) was checked carefully by two Malaysian experts. A convenience sampling approach was used: respondents were selected from among customers who visited the sampled banks during the daytime on various days over a week or a month. Ten commercial banks and forty branches were selected in four different cities in Peninsular Malaysia. Finally, 25 respondents participated in the research.

In several studies, items were generated from both the literature and qualitative research such as expert interviews and focus groups, whereas in other studies they were based solely on the literature. Future research should develop a more specific theoretical framework for identifying scale items.

3.3 Sampling Methods

The samples used for purifying scales that measure Internet banking service quality are drawn from a variety of populations. Most studies use convenience sampling or random sampling. Raza et al. (2015) collected data from 400 Internet banking users of different banks located in Karachi, Pakistan. Roy and Balaji (2015) collected data from 630 customers who had used online financial services in the previous 12 months. Arcand et al. (2017) worked with a marketing research firm that randomly sent invitations to panelists, and collected data from 375 users, all accustomed to conducting banking activities on mobile platforms. Amin (2016) and Jovovic et al. (2016) did not describe specific sampling methods.

With convenience sampling, participants’ reasons for using the Internet and their behavior may differ from those of people in other places. The literature on traditional service quality shows that the dimensions of service quality differ from one country to another (Ladhari 2008). Therefore, future studies should use more diversified samples.

3.4 Assessment and Purification of Scale

The dimensionality and items of a scale are commonly assessed using exploratory factor analysis (EFA) and/or confirmatory factor analysis (CFA, a form of structural equation modeling). Factor analysis is used to remove items whose factor loadings are low and to confirm the number of dimensions.

Some researchers used factor analysis directly to purify the original version of the scale. Raza et al. (2015) and Jovovic et al. (2016) performed EFA after collecting data. In the study of Raza et al. (2015), the 25 questions related to Internet banking and customer satisfaction were categorized into five overlapping groupings of items. All factor loadings were above 0.5 except for one item, whose loading was negative; the authors reasoned that this item’s question needs to be interpreted in the opposite direction from the way it is written for that factor. The outcome of the EFA was satisfactory, since the dimensions and items were unchanged, which supported the reliability and validity of SERVQUAL. Jovovic et al. (2016) performed EFA, which yielded three dimensions and led to the rejection of one item because it loaded equally on two dimensions.

Other researchers ran pre-tests to assess the scale before the factor analysis. Roy and Balaji (2015) completed pilot studies in two steps. The first step was a student evaluation, in which 30 students were asked to comment on the general design of the questionnaire and the clarity of individual measurement items. The second step was interviewing actual users of online financial services by computer-assisted telephone interviewing (CATI), which yielded 190 usable responses. They then performed both EFA and CFA and obtained a 5-dimension model with 25 items, after dropping 19 items whose factor loadings were below 0.5. Amin (2016) conducted a pre-test to improve the structure and content of the questionnaire before using CFA, but the article does not describe the pre-test in detail. The outcome of the CFA was good, and a 4-dimension model with 14 items was confirmed with no original items dropped. Arcand et al. (2017) stated that their questionnaire was developed by the research team and pretested twice to validate the measures and ensure that the questions/statements were clear and well understood. They then used structural equation modeling to assess the scale and obtained a 5-dimension model with 16 items. No original items were dropped.

EFA and CFA are suitable for purifying scales. Pre-testing to revise item wording or to develop a clear questionnaire is also important, because it avoids bias arising from respondents’ misunderstanding of the scale and so leads to better factor analysis outcomes.
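The purification rule discussed in this section (dropping items whose loadings fall below 0.5, or that load about equally on two dimensions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `purify_items`, the loading values, and the cross-loading gap of 0.2 are hypothetical and are not taken from any of the reviewed studies; only the 0.5 threshold comes from the text above.

```python
def purify_items(loadings, min_loading=0.5, min_gap=0.2):
    """Split items into kept and dropped based on their factor loadings.

    loadings maps an item name to its loadings on each factor.
    An item is dropped when its strongest absolute loading is below
    min_loading, or when its two strongest loadings differ by less than
    min_gap (a cross-loading). min_gap is an assumed, illustrative value.
    """
    kept, dropped = {}, {}
    for item, row in loadings.items():
        ranked = sorted((abs(v) for v in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary < min_loading:
            dropped[item] = "low loading"
        elif primary - secondary < min_gap:
            dropped[item] = "cross-loading"
        else:
            # Record the index of the factor the item loads on most strongly.
            kept[item] = max(range(len(row)), key=lambda k: abs(row[k]))
    return kept, dropped


# Made-up rotated loadings for four items on three factors.
example = {
    "q1": [0.78, 0.10, 0.05],
    "q2": [0.12, 0.66, 0.08],
    "q3": [0.41, 0.30, 0.22],  # no loading reaches 0.5
    "q4": [0.55, 0.54, 0.03],  # loads almost equally on two factors
}
kept, dropped = purify_items(example)
print("kept:", kept)
print("dropped:", dropped)
```

In practice the loading matrix would come from an EFA run in a statistics package, and the retained items would be re-estimated on fresh data, as in steps (6) and (7) of the general procedure.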

3.5 Scale Reliability and Validity

The reliability of a scale (that is, the internal homogeneity of a set of items) is usually assessed by Cronbach’s α coefficient. All the scales in the present review exhibit good reliability in terms of Cronbach’s α, with values greater than 0.70, the minimum standard according to Nunnally (1978). The good Cronbach’s α values in all five studies indicate that the scales in recent studies of Internet banking service quality have good internal consistency.
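As an illustration of how Cronbach’s α is obtained, the following minimal sketch computes it from raw item scores using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the response data are invented for this example and do not come from the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up 5-point Likert answers from six respondents on four items.
data = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.3f}, meets 0.70 standard: {alpha > 0.70}")
```

A dimension-level α is computed the same way by passing only the items that belong to that dimension.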

Validity is more complex than reliability and is more often neglected by researchers in Internet banking service quality studies. Raza et al. (2015) and Jovovic et al. (2016) measured only reliability and did not use any method to assess the validity of the final scale.

Other researchers measured the construct validity of their scales, including discriminant validity (that is, the extent to which measures of theoretically unrelated constructs do not correlate with one another) and convergent validity (that is, the extent to which a set of items assumed to represent a construct does in fact converge on the same construct). Convergent validity can be ascertained if the factor loadings in the factor analysis are greater than 0.5 (Fornell and Larcker 1981), the composite reliability (CR) is greater than 0.7 (Hair et al. 2011), and the average variance extracted (AVE) is greater than 0.5 (Fornell and Larcker 1981). Discriminant validity can be ascertained by comparing the AVE values with the corresponding squared inter-factor correlations. Roy and Balaji (2015) used factor loadings and AVE values to confirm convergent and discriminant validity. Amin (2016) used factor loadings, CR, and AVE values to assess convergent validity. Arcand et al. (2017) used factor loadings and AVE values to assess validity.
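The convergent and discriminant validity criteria above can be made concrete with a small sketch. It assumes standardized loadings from a CFA with uncorrelated measurement errors, so that each item’s error variance is 1 − λ². The construct names are borrowed from the dimensions of Roy and Balaji (2015) purely as labels; the loadings and the correlation are made up, and the helper names are hypothetical.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability, assuming error variance 1 - loading**2 per item."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)
    return s * s / (s * s + error)


def discriminant_ok(ave_by_construct, correlations):
    """Check that each construct's AVE exceeds its squared correlation
    with every other construct (the AVE comparison described above)."""
    return all(
        min(ave_by_construct[a], ave_by_construct[b]) > r * r
        for (a, b), r in correlations.items()
    )


# Made-up standardized loadings for two constructs.
loadings = {
    "system quality": [0.80, 0.75, 0.70],
    "information quality": [0.85, 0.80, 0.60],
}
aves = {name: ave(ls) for name, ls in loadings.items()}
crs = {name: composite_reliability(ls) for name, ls in loadings.items()}
# Made-up correlation between the two constructs.
corr = {("system quality", "information quality"): 0.55}

for name in loadings:
    print(name, "AVE > 0.5:", aves[name] > 0.5, "CR > 0.7:", crs[name] > 0.7)
print("discriminant validity supported:", discriminant_ok(aves, corr))
```

Reporting AVE and CR per dimension alongside the loadings lets readers verify both criteria without access to the raw data.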

Predictive/nomological validity (that is, the extent to which the scores of one construct are empirically related to the scores of other conceptually related constructs), which belongs to criterion-related validity, was assessed by structural equation modeling in several studies (Amin 2016; Arcand et al. 2017; Roy and Balaji 2015). However, none of the studies measured overall service quality during data collection, so they could not measure the correlation between the scale outcome and overall service quality, which relates to content validity and to concurrent validity, another type of criterion-related validity.

Good reliability combined with neglect of content and concurrent validity means that the recent scales for Internet banking service quality measure some concept consistently, but that concept may not be service quality, or may not be the whole of service quality. Given the major recent changes in Internet banking, the possibility of poor validity is high. Researchers should consider the validation process a major issue. Future studies on measuring Internet banking service quality should rigorously test and report the psychometric properties of newly developed scales.

4 Conclusion

Even though the reliability of the new scales is good according to our review, there are three obvious validity problems in recent studies of Internet banking service quality scales:

Firstly, when identifying dimensions and generating items, researchers mostly refer to the literature without considering the changes in Internet banking services, and some researchers developed their scales from the literature alone. No new dimensions or items can be found during such a scale development procedure, which causes low content validity when measuring the quality of new services;

Secondly, convenience sampling and the lack of pre-testing before factor analysis, together with the brief descriptions of sampling and pre-testing preparations in the articles, both harm the validity of the scale;

Thirdly, researchers paid more attention to the relationship between service quality and other concepts, such as customer satisfaction and loyalty, than to the relationship between the scale outcome and overall service quality. Few recent studies measured overall service quality during data collection, which means they cannot measure the correlation between the scale outcome and overall service quality. Thus, they cannot confirm the validity of their new scales.

Since there have been many major changes in Internet banking services, as mentioned above, new scales need to be developed from the very beginning of the development procedure. However, the current state of research is unsatisfactory. We therefore put together integrated guidelines, in the hope that future studies can develop reliable and valid new scales for measuring the quality of today’s Internet banking services. The guidelines for the reliability and validity of scales are summarized as follows:

4.1 Ways to Ensure Reliability and Validity

We summarize ways to ensure the reliability and validity of a service quality scale during the design and development process.

Researchers can use many methods to ensure the reliability and validity of the original scale during the design process. Researchers first identify the critical factors or determinants of service quality and, through literature review, build a service quality model containing several dimensions with a number of items under each dimension. A draft scale for measuring service quality is then designed according to this model. Because some of the dimensions and items come from service quality models or scales developed for other service industries or other ranges of application, they need to be modified to improve content validity, construct validity, and internal consistency reliability. Experts or customers are invited to take part in interviews, in-depth interviews, focus groups, CIT, or pre-testing to comment on any perceived ambiguities, omissions, or errors in the dimensions and items of the draft scale; changes are made accordingly, and the original scale is built.

Iterative questionnaire surveys and exploratory factor analysis are the common ways to purify an original scale for better validity and reliability. Researchers send the original scale as a questionnaire to target customers and collect data from them. Customer sampling is important for representative reliability, so most questionnaire surveys take into account the balance of age, gender, location, and so on. The data are then analyzed by EFA. Before the factor analysis, the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test should be computed to check sampling adequacy. The dimensions and items of the scale are modified according to the outcome of the EFA, which is called scale purification. The original scale is purified several times to arrive at the final version. Factor loadings in the EFA that exceed the recommended level of 0.5 support the construct validity and internal consistency reliability of the scale.
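For reference, the two checks named above are usually written as follows. The notation is an assumption introduced for this sketch and does not appear in the reviewed studies: R = (r_ij) is the p × p correlation matrix of the items, a_ij is the partial correlation between items i and j given all other items, and n is the sample size.

```latex
% Kaiser-Meyer-Olkin measure of sampling adequacy
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}

% Bartlett's test of sphericity (null hypothesis: R is the identity matrix)
\chi^{2} = -\left(n - 1 - \frac{2p + 5}{6}\right) \ln \lvert R \rvert ,
\qquad \mathrm{df} = \frac{p(p - 1)}{2}
```

KMO values close to 1 indicate that partial correlations are small relative to the raw correlations, so the items share enough common variance for factor analysis; a commonly cited rule of thumb treats values below about 0.5 as unacceptable. A significant Bartlett statistic rejects the hypothesis that the items are uncorrelated.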

4.2 Measurement of Reliability and Validity

We summarize methods for measuring the reliability and validity of a service quality scale.

Reliability.

The internal consistency of scales has been demonstrated in numerous studies applying the measures. Most researchers have used Cronbach’s alpha to evaluate the reliability of scales, which is necessary. In addition, we offer three pieces of advice for evaluating scale reliability. Firstly, the test-retest method can be used to measure the coefficient of stability, which supports test-retest reliability. Secondly, shuffling the item order can be used to measure the coefficient of equivalence, which supports alternative-form reliability. Thirdly, although most researchers have traditionally used Cronbach’s alpha to evaluate reliability, this practice has drawn increasing criticism in the past ten years. Several authors have suggested that Cronbach’s alpha might not be the most appropriate measure of psychometric quality and recommend the Spearman–Brown formula or item-to-total correlation.
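The two alternatives mentioned above can be sketched as follows. The example assumes an odd/even split of the items for the split-half estimate and uses the corrected item-to-total correlation (each item against the sum of the remaining items); both choices, the helper names, and the response data are illustrative assumptions rather than procedures taken from the reviewed studies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """Correlate odd-position and even-position item sums, then correct."""
    first = [sum(row[0::2]) for row in responses]
    second = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson(first, second))


def corrected_item_total(responses, item):
    """Correlation of one item with the sum of all the other items."""
    item_scores = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest)


# Made-up 5-point Likert answers from six respondents on four items.
data = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
    [1, 2, 1, 2],
]
print("split-half (Spearman-Brown):", round(split_half_reliability(data), 3))
for i in range(len(data[0])):
    print(f"item {i + 1} corrected item-total r:",
          round(corrected_item_total(data, i), 3))
```

The same `pearson` helper gives a test-retest coefficient of stability when it is applied to the total scores of the same respondents at two points in time.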

Validity.

Content validity is ensured by constructing a quality model and by pretesting. Construct validity is classified into two types, convergent validity and discriminant validity. Convergent validity is supported when the factor loadings for all constructs exceed the recommended level of 0.7, indicating acceptable item convergence on the intended constructs. Discriminant validity is supported by the correlations between constructs, when no pair of measures has a correlation exceeding the criterion (0.9 and above). Overall construct validity can be assessed by structural equation models such as confirmatory factor analysis, by regression models, by correlations between dimensions, and by Fornell and Larcker’s discriminant validity test. Criterion-related validity is tested by the correlation between the scale score and a criterion score. Common criteria are customers’ overall perceived service quality, recommendation intention, complaint level, satisfaction, or another service quality scale.

This study reviewed the development of scales measuring Internet banking service quality over the last five years and offered guidelines for obtaining reliable and valid scales. The findings should be valuable to academics and practitioners alike.
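The criterion-related check described above requires a criterion measure collected in the same survey, for example a single overall service quality rating. A minimal sketch, assuming such a rating exists and using made-up data and hypothetical helper names, is:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))


def scale_scores(responses):
    """Scale score per respondent, taken here as the mean of the item scores."""
    return [sum(row) / len(row) for row in responses]


# Made-up item responses (four items) from six respondents.
items = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
    [1, 2, 1, 2],
]
# Made-up single-item rating of overall service quality (the criterion),
# collected from the same six respondents.
overall = [5, 3, 5, 2, 4, 1]

r = pearson(scale_scores(items), overall)
print("correlation between scale score and overall quality:", round(r, 3))
```

A high positive correlation supports concurrent validity; the other criteria listed above (recommendation intention, complaint level, satisfaction) can be substituted for `overall` in the same way.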