Forecasting satisfaction of grid electricity to a rural household: examples from Nepal

Abstract

In this paper, the energy poverty of 300 households that use national grid electricity is quantified and analyzed using multivariate statistics. These households have only partial access to electricity. Canonical correlation analysis (CCA) and structural equation modeling (SEM) are used to forecast consumer satisfaction and thus to assess energy poverty. Multivariate data from a sample survey of the 300 households are used. Structural relationships between distance covered for firewood, time spent in firewood collection, collection of firewood (who), electricity satisfaction, residents 15 years and more of age, size of females in a family, size of family, payment of electricity bill, registration of electricity connection, decisions regarding electricity, profit kept from electricity connection and delivery at home or hospital are quantified and analyzed. Time spent in firewood collection is predicted with SEM. Among the several structural models specified, the two most suitable time spent models are discussed in detail, and one is chosen as final following model specification, identification, estimation, testing and validation procedures. CCA is used to quantify the relationship between these variables when they are classified into two groups, cause and effect (of energy poverty). Latent factors playing a critical role are identified and measured. Such studies are crucial for countries like Nepal, where official data are limited and scarce.

Introduction

Energy is needed to power and sustain the societies of the world. This energy comes from renewable and non-renewable sources. Nepal, a Himalayan country with several rivers and their tributaries, stands fifth in the world in its hydropower potential. The national grid electricity of Nepal is also hydropower based. In 2018, 42.18% of the total electricity generated in Nepal was consumed by the household sector, and 77.8% of the total population had access to this electricity (Ministry of Finance 2018). Of the 12.76% of municipalities that had no access to electricity in 2018, 89.01% were in rural areas. Many households have merely a connection to the national grid. Of the 59.8% of municipalities that had partial access to electricity, 62.5% were in rural areas. This is shown by the percentage of rural electrification achieved till 2018. The electricity is mainly used for lighting and for running radios, TVs and rice cookers. Operating spice grinding mills and water pumps for irrigation are other uses of electricity in such areas. The industrial sector accounted for about 7.3% of the total energy consumption in Nepal, which places it third after the residential sector and the trade sector. According to the 2011 census, 67.26% of the total households have electricity and 17.23% of the total households have electricity connection and are in rural areas (CBS Nepal 2018).

Nepal, with a total population of 29.10 million, had an average annual population growth rate of 1.3 per cent in 2019 (ADB 2019). The per capita gross national income of Nepal was 800 USD in 2019. The gross domestic product (GDP) of Nepal was 29,040 million USD in 2018. Nepal was ranked 102 by the World Bank on the basis of its GDP in 2018. The United States topped the list with a GDP of 20,544,343 million USD (World Bank 2019).

Countries like Nepal lack a good infrastructure for good quality official data. Remote geographical location and a lack of incentives and awareness have resulted in limited and scarce data. But accurate data on issues related to sustainable development goals are needed. This is because what gets measured also gets done. Thus, data-based studies like this are crucial for such countries. They also provide an evidence-based perspective on a development issue. This approach makes the observations more reliable and harder to dispute.

Multivariate statistics has vast interdisciplinary applications. Multiple variables can be considered at the same time in such analyses. It comprises several techniques that can be applied to diverse fields. For example, Garcia and Caraschi (2019) used hierarchical clustering and principal components to select the most favorable vegetable biomass for the production of bio-fuel pellets. Bystrzanowska and Tobiszewski (2020) used cluster analysis, principal component analysis and K-means hierarchical clustering to find the patterns in the dataset and the discriminators between the clusters of compounds. Xu et al. (2019) predicted the power of a power generation system. Three factors, solar radiation intensity, temperature and humidity, were chosen, and a multivariate statistical regression model was established using multivariate statistical theory. Sarkodie and Ozturk (2020) conducted a statistically inspired modification of partial least squares regression in a study of energy efficiency and energy consumption indicators in Kenya. It revealed that renewable energy played an important role in the reduction of greenhouse gas emissions. Similarly, Makki and Mosly (2020) explored factors affecting public willingness to adopt renewable energy technologies in the western region of Saudi Arabia. Using the dimension reduction technique of principal component analysis, five main components clustering the 19 extracted factors were revealed, namely cost and government regulations and policies, public awareness and local market, environment and public infrastructure, residential building, and renewable energy technology systems. Urugulu (2019) conducted a study based on data from Turkey, in which natural gas, electricity and oil consumption were estimated with multiple regression equations that used income and population as independent variables. Liubachyna et al. (2017) analyzed the current situation of State Forest Management Organizations by grouping them with the help of a cluster analysis. The grouping was based on indicators that reflect the three pillars of the common understanding of the sustainable forest management (SFM) concept. Simon et al. (2018) incorporated multivariate data analysis and biological gas conversion mechanistics in a study on CO2-based biological methane gas production. Todde et al. (2016) used a multivariate statistical approach in livestock classification, in which quantitative and qualitative variables were used throughout the statistical analysis to obtain farm descriptions. Wanne et al. (2017) established a multiple linear regression approach to estimate the solar radiation in the Senegalese territories.

The paper is arranged as follows. This section is followed by “Methods” and then by “Results and discussion”. These sections are followed by “Conclusion” and “Acknowledgements”.

Methods

Sample survey and data

A sample survey of 300 households of national grid electricity users was conducted. This survey generated more than 350 multivariate data. The aim was to study the energy consumption dynamics of a rural household. These households are partially connected to the grid. They use electricity mainly for lighting and are located in the Kavre district of Nepal, at a distance of 40 km from Kathmandu. In spite of being close to the capital city, this area does not have frequent public transportation. Without any industry, this area is very rural despite being so close to the capital.

The questionnaire developed for this study was structured. The answers were provided as multiple choice options. These options were scaled in a way that generated data on an ordinal or nominal scale. The questionnaire was pretested on 10 households and refined, which helped improve the accuracy of the data collected.

Detailed information on the type of agricultural production, the amount consumed by the household and the amount sold was also collected. Type of house, source of water for daily needs of cooking, washing, bathing and irrigation, type and location of latrine and details of electronic devices used were considered as information on proxy asset indicators. These asset indicators indirectly assessed the socioeconomic status of the household. The net average income of the household and the distance of the household from bank, employer, market, health post, bus stop and cinema were also noted. The age distribution, number of adult family members and number of females in the family helped assess the family composition. The development status of the area was assessed through seven questions on reproductive health. Information on time spent on firewood collection, source of firewood collected and distance covered was also collected. Data were also collected on income generating activities of the household and the role of women in decision-making.

Statistical methods

Canonical correlation analysis (CCA) Let \(X\in {M}_{n, q}\) and \(Y\in {M}_{n, p}\) be data matrices with mean vectors \(\mu_{x } \left( {1 \times q} \right)\) and \({\mu }_{y}\) (1 × p), respectively (Bhuyan 2008). The covariance matrices of these two data matrices are \(\Sigma_{xx}\) (q × q) and \(\Sigma_{yy} \left( {p \times p} \right)\), respectively. Let the covariance matrix between X and Y be \(\Sigma_{xy } \left( {q \times p} \right)\). Let \({S}_{xx}\), \({S}_{yy}\) and \({S}_{xy}\) be the estimates of \({\Sigma }_{xx}\), \({\Sigma }_{yy}\) and \({\Sigma }_{xy}\), respectively. If \(X\sim {N}_{q}({\mu }_{x}, {\Sigma }_{xx})\) and \(Y\sim {N}_{p}({\mu }_{y}, {\Sigma }_{yy})\), that is, if X and Y are multivariate normal with means \({\mu }_{x}, {\mu }_{y}\) and covariance matrices \({\Sigma }_{xx}\) and \({\Sigma }_{yy}\), then \({S}_{xx}\), \({S}_{yy}\) and \({S}_{xy}\) are the ML estimators of \({\Sigma }_{xx}\), \({\Sigma }_{yy}\) and \({\Sigma }_{xy}\). Here, the assumption of multivariate normality of data is not essential, especially if the analysis is descriptive in nature.

It is observed that the product matrices \({\Sigma }_{xx}^{-1}{\Sigma }_{xy}{\Sigma }_{yy}^{-1}{\Sigma }_{yx}\) and \({\Sigma }_{yy}^{-1}{\Sigma }_{yx}{\Sigma }_{xx}^{-1}{\Sigma }_{xy}\) are used as input for canonical correlation analysis. Theoretically, the largest eigenvalue of either of these product matrices is the square of the canonical correlation coefficient. The two eigenvectors for these product matrices are \(a=\frac{1}{\sqrt{\lambda }}{\Sigma }_{xx}^{-1}{\Sigma }_{xy}b\) and \(b=\frac{1}{\sqrt{\lambda }}{\Sigma }_{yy}^{-1}{\Sigma }_{yx}a\) .

The sample based product matrices are \({S}_{xx}^{-1}{S}_{xy}{S}_{yy}^{-1}{S}_{yx}\) and \({S}_{yy}^{-1}{S}_{yx}{S}_{xx}^{-1}{S}_{xy}\) which can be used in analysis in practice. If the variables are standardized, the product matrices transform to \({R}_{xx}^{-1}{R}_{xy}{R}_{yy}^{-1}{R}_{yx}\) and \({R}_{yy}^{-1}{R}_{yx}{R}_{xx}^{-1}{R}_{xy}\), where

$$R=\left[\begin{array}{cc}{R}_{xx}& {R}_{xy}\\ {R}_{yx}& {R}_{yy}\end{array}\right]$$

is the sample correlation matrix. Thus, to estimate the canonical correlation coefficients and the canonical weights, the product matrices \({R}_{xx}^{-1}{R}_{xy}{R}_{yy}^{-1}{R}_{yx}\) and \({R}_{yy}^{-1}{R}_{yx}{R}_{xx}^{-1}{R}_{xy}\) can be used as input. Each eigenvalue of either of these product matrices equals the squared canonical correlation for the corresponding pair of canonical variates. Thus, the eigenvalues are real and non-negative, and those that correspond to nonzero canonical correlations are positive.

Let the eigenvalues of the product matrix be \({l}_{1}>{l}_{2}>{l}_{3}>\dots >{l}_{k}\), where \(k=\mathrm{min}\left(p, q\right)\), and let the estimates of the eigenvectors \(a\) and \(b\) be \(\widehat{a}\) and \(\widehat{b}\), respectively. Here, \({l}_{j}\in {R}^{+}, \forall j=1, 2, \dots k.\) Let the eigenvectors corresponding to the largest eigenvalue be \({\widehat{a}}_{(1)}\) and \({\widehat{b}}_{(1)}\). Then the first canonical variate pair is \({\widehat{a}}_{(1)}^{^{\prime}}X\) and \({\widehat{b}}_{(1)}^{^{\prime}}Y\). The second canonical variate pair is \({\widehat{a}}_{(2)}^{^{\prime}}X\) and \({\widehat{b}}_{(2)}^{^{\prime}}Y\), where \({\widehat{a}}_{(2)}\) and \({\widehat{b}}_{(2)}\) are the eigenvectors corresponding to the second largest eigenvalue \({l}_{2}\). The second pair is uncorrelated with the first pair.

Since \(k\) canonical variate pairs are extracted from the sample data, where \(k=\mathrm{min}(p, q)\), CCA is also a data reduction technique. The reduced data are the canonical scores. The scores will be dimensionless if canonical correlation analysis is performed from correlation input. On the other hand, \(\widehat{a}\) and \(\widehat{b}\) will have units related to those of the respective responses in each set if the analysis is done from covariance input. In that case, the dimensionality of the respective canonical variates will be meaningful.
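The eigenvalue route described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the software used for the results in this paper: the function name canonical_correlations and the synthetic arrays X_demo and Y_demo are hypothetical, and the data are random numbers rather than the survey data. The sketch forms the product matrix \({R}_{xx}^{-1}{R}_{xy}{R}_{yy}^{-1}{R}_{yx}\), takes the square roots of its \(k\) largest eigenvalues as canonical correlations, and obtains the second set of weights from \({R}_{yy}^{-1}{R}_{yx}a\) up to a scale factor.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations and weights from correlation input.

    X is an n x q array and Y an n x p array (names follow the text).
    """
    q, p = X.shape[1], Y.shape[1]
    # Sample correlation matrix R, partitioned into Rxx, Rxy, Ryx, Ryy.
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    Rxx, Rxy = R[:q, :q], R[:q, q:]
    Ryx, Ryy = R[q:, :q], R[q:, q:]

    # Product matrix Rxx^{-1} Rxy Ryy^{-1} Ryx (q x q).
    Ryy_inv_Ryx = np.linalg.solve(Ryy, Ryx)
    M = np.linalg.solve(Rxx, Rxy) @ Ryy_inv_Ryx

    eigvals, eigvecs = np.linalg.eig(M)
    # Eigenvalues are real in theory; drop any round-off imaginary parts.
    eigvals, eigvecs = eigvals.real, eigvecs.real

    k = min(p, q)
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest eigenvalues
    l = np.clip(eigvals[order], 0.0, 1.0)   # squared canonical correlations
    A = eigvecs[:, order]                   # columns: weights for the X set
    B = Ryy_inv_Ryx @ A                     # proportional to weights for the Y set
    return np.sqrt(l), A, B


# Synthetic data only (not the survey data): 300 rows, q = 3, p = 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
Y_demo = np.column_stack([X_demo[:, 0] + 0.5 * rng.normal(size=300),
                          rng.normal(size=300)])
r_demo, A_demo, B_demo = canonical_correlations(X_demo, Y_demo)
```

The weights returned here are not rescaled to unit-variance canonical variates; since correlations are scale free, this does not change the canonical correlations themselves.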

Structural equation modeling (SEM) This technique uses various types of models to depict relationships among observed variables, with the same basic goal of providing a quantitative test of a theoretical model. The goal here is to determine the extent to which the theoretical model is supported by the sample data (Schumacker and Lomax 2010). The theoretical model is tested with SEM using scientific methods of hypothesis testing. This helps us understand complex relationships between the constructs. In other words, SEM is an extension of multiple regression. It has more than one regression-like equation and can also include latent variables. Here variables can be explanatory in one equation and response in another. These structural equation models can be explained by path diagrams.

The structural model is written in terms of following matrix Eqs. (1), (2) and (3):

$$\eta =B\eta +\Gamma \xi +\zeta .$$
(1)

Here η is the \(m \times 1\) vector of endogenous concepts and \(\xi\) is the \(n \times 1\) vector of exogenous concepts. \(B\) and \(\Gamma\) are \(m \times m\) and \(m \times n\) matrices of structural coefficients, respectively. \(\zeta\) is the \(m \times 1\) error vector. The variance–covariance matrix of \(\xi\) is \(\Phi\), an \(n \times n\) matrix. The variance–covariance matrix of the error \(\zeta\) is \(\Psi\), an \(m \times m\) matrix.

The measurement model is represented by Eq. (2). Latent independent variables are represented by Eq. (3):

$${\mu }_{y}^{^{\prime}}={\Lambda }_{Y}\eta +\varepsilon .$$
(2)

Here \(\mu_{y}^{^{\prime}}\) is the \(p \times 1\) vector, \(\Lambda_{Y}\) is the \(p \times m\) matrix, \(\eta\) is the \(m \times 1\) vector and \(\varepsilon\) is the \(p \times 1\) error vector. The variance–covariance matrix of ε is \(\Theta_{\varepsilon }\), a \(p \times p\) matrix.

$${\mu }_{x}^{^{\prime}}={\Lambda }_{X}\xi +\delta .$$
(3)

Here \(\mu_{x}^{^{\prime}}\) is the \(q \times 1\) vector, \(\Lambda_{X}\) is the \(q \times n\) matrix, \(\xi\) is the \(n \times 1\) vector and \(\delta\) is the \(q \times 1\) error vector. The variance–covariance matrix of \(\delta\) is \(\Theta_{\delta }\), a \(q \times q\) matrix.

SEM has a broader multivariate perspective, as it may involve estimation of measurement error as a part of the analysis.
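To make the role of the matrices in Eqs. (1) and (2) concrete, the following sketch computes the covariance matrices implied by a given set of parameter values. It relies on the usual assumptions that \(I-B\) is invertible, that \(\xi\) and \(\zeta\) are uncorrelated, and that \(\varepsilon\) is uncorrelated with \(\eta\); under these assumptions Eq. (1) gives \(\eta ={(I-B)}^{-1}(\Gamma \xi +\zeta )\). The function name implied_covariances and all numerical values are hypothetical toy inputs, not estimates from this study.

```python
import numpy as np

def implied_covariances(B, Gamma, Phi, Psi, Lambda_Y, Theta_eps):
    """Model-implied covariance matrices of eta and of the indicators y.

    Assumes (I - B) is invertible, xi and zeta are uncorrelated,
    and eps is uncorrelated with eta.
    """
    m = B.shape[0]
    inv = np.linalg.inv(np.eye(m) - B)
    # Cov(eta) = (I - B)^{-1} (Gamma Phi Gamma' + Psi) (I - B)^{-1}'
    cov_eta = inv @ (Gamma @ Phi @ Gamma.T + Psi) @ inv.T
    # Cov(y) = Lambda_Y Cov(eta) Lambda_Y' + Theta_eps
    cov_y = Lambda_Y @ cov_eta @ Lambda_Y.T + Theta_eps
    return cov_eta, cov_y


# Toy values (hypothetical): one endogenous and one exogenous concept,
# two indicators of the endogenous concept.
B_toy = np.array([[0.0]])
Gamma_toy = np.array([[2.0]])
Phi_toy = np.array([[1.0]])
Psi_toy = np.array([[0.5]])
Lambda_Y_toy = np.array([[1.0], [0.5]])
Theta_eps_toy = np.diag([0.1, 0.2])

cov_eta_toy, cov_y_toy = implied_covariances(
    B_toy, Gamma_toy, Phi_toy, Psi_toy, Lambda_Y_toy, Theta_eps_toy
)
```

Estimation in SEM amounts to choosing the parameter matrices so that such implied covariances are as close as possible to the sample covariances.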

Results and discussion

The 300 households of national grid electricity users are farmers by profession. Figures 1 and 2 give the cultivation pattern of these 300 households. More than 85% of the households claimed wheat and paddy as their major cultivation. More than 60% claimed that they cultivated mustard partially. Similarly, more than 70% claimed that they cultivated tomato and potato partially.

Fig. 1 Cultivation of electricity users of rural household (percentage of households)

Fig. 2 Cultivation of electricity users of rural household (percentage of households)

In the search for a better understanding of electricity consumption dynamics with respect to energy poverty, several models were explored. Here time spent in firewood collection is considered as a proxy indicator of energy poverty. It is modeled using SEM. A review of alternative measures and indicators of energy poverty, targeted to specific audiences and purposes, is given by Pachauri and Spreng. Desperately energy poor households are those that have access only to biomass and kerosene and use barely enough of it to cook a meal a day (Pachauri and Spreng 2011).

The two time spent models selected are summarized in Table 1. This table encompasses all the parameters and variables used in the two models. Their physical significance and meaning are explained as follows. There are no intercept terms. Thus, the endogenous and exogenous variables are expressed as deviations from their means, and an increase or decrease in an exogenous or endogenous variable is to be read as an increase or decrease relative to its mean value.

Table 1 A comparison of two most suitable structural models

As also seen from Table 1, Model I is identified as an SEM with two latent variables, namely family strength and status of women, whereas Model II has three latent variables, with energy satisfaction as the third in addition to the two of Model I. The efficiency of both models is explained with the help of \({\chi }^{2}\), p value, SRMR, RMSEA, \(\frac{{\chi }^{2}}{df} ,\) CFI and TLI. These model efficiency parameters can be classified as absolute fit and comparative fit indices. Absolute fit indices measure how well the specified model reproduces the data. They provide an assessment of how well a researcher’s theory fits the sample data (Hair 2006). The main absolute fit index is the \({\chi }^{2}\) (Chi square), which tests the extent of misspecification. A significant \({\chi }^{2}\) suggests that the model does not fit the sample data. A non-significant \({\chi }^{2}\) suggests that the model fits the data well. But the test statistic \({\chi }^{2}\) is very sensitive to the sample size. Its value also increases when the number of observed variables increases. So, a significant \({\chi }^{2}\) value is not uncommon in cases of large sample sizes, even when the model closely fits the observed data. So, \({\chi }^{2}\) is not the sole indicator of model fit in SEM. The other commonly used fit measures are SRMR, RMSEA, \({\chi }^{2}/df\), CFI and TLI (Khine 2013). Standardized root mean square residual (SRMR) is an indication of the extent of error resulting from the estimation of the specified model. The amount of error or residual illustrates how accurate the model is; hence, lower SRMR values (< 0.05) represent a better model fit. The root mean square error of approximation (RMSEA) corrects the tendency of \({\chi }^{2}\) to reject models with large sample size or number of variables. Like SRMR, a lower RMSEA (< 0.05) value indicates a good fit. The ratio of the \({\chi }^{2}\) test statistic to the degrees of freedom, denoted by \(\frac{{\chi }^{2}}{df}\), gives the efficiency of the model. It should be < 2. In comparative fitting, the hypothesized model is assessed on whether it is better than a competing model, also called the null model. In the null model, it is assumed that all the variables are uncorrelated. A widely used index is the comparative fit index (CFI). It is normed and varies between 0 and 1. Here higher values represent better fit. The CFI is widely used because of its strengths, including its relative insensitivity to model complexity. A value of > 0.95 for CFI is associated with a good model. Another comparative fit index is the Tucker–Lewis index (TLI). Since TLI is not normed, its values can fall below 0 and above 1. Typically, models with good fit have values that approach 1.
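The indices that depend only on \({\chi }^{2}\) values can be computed directly, as the following sketch shows. It uses commonly quoted textbook definitions of RMSEA, CFI and TLI, with \(N-1\) in the RMSEA denominator; software packages differ in such details, so the numbers need not match any particular program. The function name fit_indices and all input values (model and null-model \({\chi }^{2}\) and degrees of freedom) are hypothetical and are not taken from Table 1.

```python
import math

def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Chi-square based fit indices from common textbook definitions."""
    ratio = chi2_model / df_model
    # RMSEA: excess of chi-square over its degrees of freedom, per df and per case.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
    # CFI: relative reduction in non-centrality compared with the null model.
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, chi2_model - df_model, 0.0)
    cfi = 1.0 - d_model / d_null if d_null > 0 else 1.0
    # TLI: not normed, so it can fall below 0 or above 1.
    null_ratio = chi2_null / df_null
    tli = (null_ratio - ratio) / (null_ratio - 1.0)
    return {"chi2_df": ratio, "rmsea": rmsea, "cfi": cfi, "tli": tli}


# Hypothetical inputs for illustration only.
example_fit = fit_indices(chi2_model=30.0, df_model=20,
                          chi2_null=500.0, df_null=28, n=300)

# Thresholds quoted in the text.
meets_criteria = (example_fit["chi2_df"] < 2 and example_fit["rmsea"] < 0.05
                  and example_fit["cfi"] > 0.95 and example_fit["tli"] > 0.95)
```

SRMR is not included here because it requires the residual covariance matrix rather than the \({\chi }^{2}\) values.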

As seen from Table 1, the \({\chi }^{2}\) value for the goodness of fit of Model I is 33.032, with a p value of 0.017. For Model II, the \({\chi }^{2}\) value is 80.122 and the p value is 0.003. These values are statistically significant at α = 0.05 for both models. But for a large sample size of 300, this is normally the case even for good models. Both these models satisfy the criteria of a good model with respect to the model efficiency parameters. The criteria of model efficiency parameters are, namely, \(\frac{{\chi }^{2}}{df}<2\), RMSEA < 0.05, SRMR < 0.05, CFI > 0.95 and TLI > 0.95 (Schumacker and Lomax 2010). Model II is selected as it explains the time spent (in firewood collection) with respect to three latent variables. The factor loadings of the latent variables family strength, status of women and energy satisfaction are also given in Table 1. In Model I, time spent (for firewood collection) is regressed on family strength and status of women. This is given by the following multiple regression:

$${\text{Time spent}} = - 0.078 \times {\text{Family strength}} - 1.127 \times {\text{Status of women}}{.}$$

This implies that as family strength increases by one unit, the time spent decreases by 0.078 units. As status of women increases by one unit, the time spent decreases by 1.127 units. Similarly, Model II can be expressed as the following regression equation:

$${\text{Time spent}} = - 0.044 \times {\text{Family strength}} - 1.085 \times {\text{Status of women}} + 4.694 \times {\text{Energy satisfaction}}{.}$$

This implies that as family strength increases by one unit, the time spent decreases by 0.044 units. As status of women increases by one unit, the time spent decreases by 1.085 units. Also, as energy satisfaction increases by one unit, the time spent increases by 4.694 units, that is, by more than four units.
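The two regression equations can be evaluated directly, which may help in reading the coefficients. The sketch below uses the coefficients reported above; the function names are hypothetical, and the inputs are deviations of the latent variables from their means, since the models have no intercept terms.

```python
def time_spent_model_1(family_strength, status_of_women):
    """Model I: predicted deviation of time spent from its mean."""
    return -0.078 * family_strength - 1.127 * status_of_women


def time_spent_model_2(family_strength, status_of_women, energy_satisfaction):
    """Model II: predicted deviation of time spent from its mean."""
    return (-0.044 * family_strength
            - 1.085 * status_of_women
            + 4.694 * energy_satisfaction)


# Effect of a one-unit increase in each latent variable, holding the others
# at their means (i.e. at zero deviation).
effect_family = time_spent_model_2(1.0, 0.0, 0.0)
effect_status = time_spent_model_2(0.0, 1.0, 0.0)
effect_energy = time_spent_model_2(0.0, 0.0, 1.0)
```

Because the latent variables are on model-dependent scales, these unit effects are comparable only within the scaling adopted in Table 1.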

Here time spent (in the collection of firewood) is taken as an indicator of energy poverty. If more time is spent in the collection of firewood, then the household is relatively poorer with respect to energy needs. In Model I, higher status of women has a negative influence on energy poverty. Here status of woman is a latent variable and measures the participation of the woman in all the decisions related to electricity. It also measures her reproductive health. It is seen from Table 1 that it is the most sensitive variable in this model, as its coefficient takes the value of − 1.127. The pattern reflected in Model I is also reflected in Model II. But in Model II, the coefficient of energy satisfaction (4.694) is the largest in magnitude, more than four times that of status of women (− 1.085), so time spent is most sensitive to energy satisfaction. The values of these coefficients in Model I and Model II have a physical significance. Model II is taken as the final structural equation model. The factor loadings of latent variables along with the p values are also provided in Table 1. The significance of factor loadings is tested at α = 0.05. Out of the 11 parameters of Model II, nine are highly significant. This is denoted by asterisks.

Model II is explained with the help of the path diagram given in Fig. 3. Here familyStrength (strength of family), statusWoman (status of woman) and fuelSatisf (energy satisfaction) are the three latent variables. The latent variable familyStrength explains the latent relationship between res15mor (residents 15 years and more of age), toefem (size of females in a family) and totfamno (size of family). The latent variable statusWoman explains payment (payment of electricity bill), regisrat (registration of electricity connection), decision (decisions regarding electricity), profit (profit kept from electricity connection) and delivery (delivery at home or hospital). Finally, the latent variable fuelSatisf explains elecsati (electricity satisfaction), distance (distance covered for firewood) and collFirewood (collection of firewood).

Fig. 3 Path diagram of SEM Model II

A comparison between Model I and Model II is shown in Fig. 4. It is done in terms of model parameters and efficiency indices. In order to provide a better overview of the results, the values given in Table 1 are also represented with the help of bar graphs.

Fig. 4 A comparison between Model I and Model II

The correlations between the 12 variables used in this study are given in Table 2. It is seen that the time spent variable is significantly correlated with seven variables. Size of the family has a high positive significant correlation with residents 15 years or more of age and size of females in the family. Payment of electricity bill has a high positive significant correlation with decisions regarding electricity and profit kept from electricity connection.

Table 2 Correlation matrix of variables of time spent Model II

The details of the variables used in CCA and SEM are provided in Table 3. Three variables are measured on a ratio scale. The other nine are categorical variables that can be classified as either nominal or ordinal. Descriptive statistics of these variables in terms of mean, median, standard deviation (SD), first quartile (\({Q}_{1}\)), third quartile (\({Q}_{3}\)), skewness and kurtosis are also provided here. The mean and median give an idea of the center of gravity of the data. The SD gives the spread of the data.

Table 3 Details of the variables

Model II is also validated on the basis of four random samples of 200 households drawn from the data of 300 households. It is shown in Table 4. The values of model efficiency parameters, \(\frac{{\chi }^{2}}{df}\), RMSEA, SRMR, CFI and TLI fulfill the criteria of a good SEM mentioned above. The consistency of the values of these parameters further confirms the choice of Model II.
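The subsampling step of this validation can be sketched as follows. The sketch assumes that each sample of 200 households is drawn without replacement from the 300 and that households are indexed from 0 to 299; neither detail is stated in the text. The function name draw_validation_samples and the seed are hypothetical, and refitting the model on each subsample is left to whichever SEM software is used.

```python
import numpy as np

def draw_validation_samples(n_households=300, sample_size=200,
                            n_samples=4, seed=0):
    """Draw row indices for repeated random subsamples of the survey data.

    Assumption: sampling is without replacement within each subsample.
    """
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n_households, size=sample_size, replace=False))
            for _ in range(n_samples)]


validation_indices = draw_validation_samples()
# Each element can be used to select rows of the data matrix, for example
# data[validation_indices[0]] for a NumPy array `data` with 300 rows.
```

The fit indices obtained on each subsample can then be compared with the criteria quoted above to judge the stability of the model.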

Table 4 Validation of time spent Model II by comparing results from four samples

The original, reproduced, residual and standardized residual covariance matrices for the time spent Model II are provided in Table 5. There is a close correspondence between the original and the reproduced covariance matrices. The residual matrix is close to a null matrix, as seen in Table 5. This supports the goodness of the model. The standardized residual matrix, also provided in Table 5, gives the values in standard deviation units. This matrix accounts for the different variances of the variables and expresses the residuals in the same unit. This facilitates comparison among different variables.
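The relation between the original, reproduced and residual matrices, and its link to SRMR, can be sketched as below. The residual matrix is simply the difference between the sample and the model-implied covariance matrices. The scaling used here divides each residual by the product of the sample standard deviations, which is the form used in one common definition of SRMR; it is an assumption of this sketch and is not necessarily the standardization used for Table 5, since some software divides residuals by their standard errors instead. The function name residual_summary and the toy matrices are hypothetical.

```python
import numpy as np

def residual_summary(S, Sigma_hat):
    """Residual matrix, scaled residuals and SRMR (one common definition)."""
    S = np.asarray(S, dtype=float)
    Sigma_hat = np.asarray(Sigma_hat, dtype=float)
    residual = S - Sigma_hat
    # Scale each residual by the product of the sample standard deviations.
    sd = np.sqrt(np.diag(S))
    scaled = residual / np.outer(sd, sd)
    # SRMR: root mean square of the scaled residuals over the lower triangle,
    # diagonal included.
    rows, cols = np.tril_indices(S.shape[0])
    srmr = float(np.sqrt(np.mean(scaled[rows, cols] ** 2)))
    return residual, scaled, srmr


# Toy matrices (hypothetical), not the values of Table 5.
S_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma_toy = np.array([[1.0, 0.4], [0.4, 1.0]])
residual_toy, scaled_toy, srmr_toy = residual_summary(S_toy, Sigma_toy)
```

A residual matrix close to the null matrix gives an SRMR close to zero, which is the sense in which small residuals indicate a well-fitting model.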

Table 5 Original, reproduced, residual and standardized residual covariance matrices for time spent Model II

The variables listed in Table 3 are also classified into two groups, namely Cause and Effect. These are the cause and effect of energy poverty. Cause comprises the following variables: residents 15 years and more of age, size of females in a family, size of family, payment of electricity bill, registration of electricity connection, decisions regarding electricity, profit kept from electricity connection and delivery at home or hospital. Effect comprises the following variables: distance covered for firewood, time spent in firewood collection, collection of firewood (who) and electricity satisfaction. In the notation of the CCA theory, n = 300, q = 8 and p = 4. CCA is conducted to identify the structural relationship between these two sets of variables simultaneously.

The results of CCA conducted on 300 households are given in Tables 6 and 7. The test of dimensionality for the canonical correlation analysis is shown in Table 6. It indicates that one of the four canonical dimensions is statistically significant at the 0.05 level. The F value for the first dimension is 2.189 with a p value of 0.00017. It is also seen from Table 6 that dimension 1 had a canonical correlation of 0.352 between the sets of variables, while for dimension 2 the canonical correlation was lower at 0.26. Table 7 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the cause variables, the first canonical dimension is most strongly influenced by payment of electricity (− 1.022) and size of the family (− 0.676). This implies that as payment of electricity increases by 1 standard deviation unit, the first canonical variate in the first set decreases by 1.022 standard deviation units. Similarly, as size of family increases by 1 standard deviation unit, the first canonical variate of the first set decreases by 0.676 standard deviation units. For the second set, time spent in firewood collection (1.008) and collection of firewood (0.328) play a critical role in influencing the first canonical dimension. This shows that as time spent in firewood collection increases by 1 standard deviation unit, the first canonical variate in the second set increases by 1.008 standard deviation units. Similarly, as collection of firewood increases by 1 standard deviation unit, the first canonical variate of the second set increases by 0.328 standard deviation units.

Table 6 Test for canonical dimensions based on 300 households
Table 7 Standardized canonical coefficients

The results of the CCA conducted on 300 households are validated on two random samples of size 200 each, drawn from the data of the 300 households. The results are shown in Table 8, which validates the test of dimensionality for the canonical correlation analysis. It indicates that one of the four canonical dimensions is statistically significant at the 0.05 level. The F value for the first dimension is 1.573 and 1.583, with p values of 0.0242 and 0.023, for the first and second sample, respectively. It is also seen from Table 8 that, for the first sample, dimension 1 had a canonical correlation of 0.354 between the sets of variables, while for dimension 2 the canonical correlation was lower at 0.303. Similarly, for the second sample, dimension 1 had a canonical correlation of 0.358 between the sets of variables, and for dimension 2 the canonical correlation was lower at 0.301. The pattern reflected by the canonical correlation coefficients in Table 6 is also exhibited by the canonical correlations for the two samples in Table 8.

Table 8 Validation of CCA from two random samples

Conclusion

SEM and CCA are used here to provide a holistic view of the energy consumption pattern. This is with reference to rural households having a partial access to electricity. These methods give a better understanding of interplay of several variables in the energy consumption dynamics. Direct benefits are obvious. The impact of latent variables in energy consumption dynamics is measured.

SEM is used in exploring and validating theoretical relationships between variables. It is expressed in terms of several multiple linear regressions. Out of many models explored, two models, Model I and Model II, are discussed in detail, and Model II is confirmed as final. Model II is explained with the help of a path diagram. Model II says that the independent variables energy satisfaction and status of women significantly affect the dependent variable time spent (in the collection of firewood). These independent variables are also latent variables. Here the variable time spent is a proxy indicator of energy poverty. The poorer the household was in terms of energy sources, the more time it spent in firewood collection. In the variable energy satisfaction, the commonality between responses to electricity satisfaction, distance covered and collection of firewood was considered. The variable status of women is based on the commonality between the variables on the role of women in payment, registration, decision, profit from electricity and reproductive health. The reproductive health of a woman is assessed from her response to the question on the delivery of her child at home or in the hospital. Here, family strength is the latent variable for residents 15 years or more of age, size of females in family and family size. The results of SEM of Model II are validated by four random samples of size 200 each.

The variables are also divided into two groups: cause and effect (of energy poverty) for CCA. It is found that the first canonical variate can explain the correlation between these two groups at α = 0.05 significance. These results are also validated by two random samples of size 200 each drawn from 300 households. It is also seen that payment of electricity bill in the set cause and time spent on firewood collection in the set effect are most sensitive to the first canonical variate.

In the developing world, policies are made, but their impacts often cannot be quantified. This data-based approach to assessing energy poverty is hence crucial. These results can also be generalized to other countries of South Asia and Africa. They provide guidelines to policy-makers and planners of such countries, where good-quality official records are lacking.

The obvious advantage of grid electricity, lighting houses, is easily seen. But improved access to electricity and a reduction in electricity prices can have a great effect on several latent variables and their interrelationships. These are not directly visible, yet they have a huge impact on a rural household. This impact is quantified in this paper with SEM and CCA.

In future research, the causal relationships between the variables discussed here will be explored in depth using Granger causality and entropy.

Availability of data and material

All the details of the data are provided in the paper.

Code availability

Not applicable.

Abbreviations

SEM: Structural equation modeling

CCA: Canonical correlation analysis

References

  1. ADB (2019) [Internet]. https://www.adb.org/publications/basic-statistics-2019. Accessed on 21 Apr 2020

  2. Bhuyan KC (2008) Multivariate analysis and its applications. New Central Book Agency, Delhi

  3. Bystrzanowska M, Tobiszewski M et al (2020) Searching for solvents with an increased carbon dioxide solubility using multivariate statistics. Molecules 25:1156

  4. CBS, Nepal (2018) Statistical year book 2018. Central Bureau of Statistics, Kathmandu, p 12

  5. Garcia DP, Caraschi JC et al (2019) Assessment of plant biomass for pellet production using multivariate statistics (PCA and HCA). Renew Energy 139:796–805

  6. Hair J et al (2006) Multivariate data analysis, 6th edn. Prentice Education Inc, Upper Saddle River

  7. Khine M (2013) Applications of structural equation modeling in education research and practice. Sense Publishers, Rotterdam

  8. Liubachyna A, Bubbico A, Secco L, Pettenella D (2017) Management goals and performance: clustering State Forest Management Organizations in Europe with multivariate statistics. Forests 8:504

  9. Makki A, Mosly I (2020) Factors affecting public willingness to adopt renewable energy technologies: an exploratory analysis. Sustainability 12:845

  10. Ministry of Finance (2018) Economic survey: Ministry of Finance. Government of Nepal, Kathmandu

  11. Pachauri S, Spreng D (2011) Measuring and monitoring energy poverty. Energy Policy 39:7497–7504

  12. Sarkodie SA, Ozturk I (2020) Investigating the environmental Kuznets curve hypothesis in Kenya: a multivariate analysis. Renew Sustain Energy Rev. https://doi.org/10.1016/j.rser.2019.109481

  13. Schumacker RE, Lomax RG (2010) A beginner’s guide to structural equation modeling, 3rd edn. Taylor & Francis Group, Routledge

  14. Simon K, Rittmann MR, Seifert AH, Kinetics BS (2018) multivariate statistical modeling and physiology of CO2 based biological methane production. Appl Energy 218:751–760

  15. Todde G, Murgia E, Caria M, Pazzona A (2016) A multivariate statistical analysis approach to characterize mechanization, structural and energy profile in Italian dairy farms. Energy Rep 2:129–134

  16. Urugulu E (2019) Estimating demand of Turkish energy market: a multivariate regression model. J Energy Energija 68:3–10

  17. Wanne O, Navarro AA, Ramirez L, Valenzuela RX, Vindel JM, Cobocs FF, Kebe CMF, Zarzalejo LF (2017) Innovations and interdisciplinary solutions for underserved areas ses applications. In: Conference proceeding CNRIA

  18. World Bank (2019) [Internet]. https://databank.worldbank.org/data/download/GDP.pdf. Accessed on 21 Apr 2020

  19. Xu J, Zhou H, Fang Y (2019) Prediction of grid-connected photovoltaic power generation based on multivariate statistics. Concepts Mod Opt 51:105–119

Acknowledgements

The author gratefully acknowledges the contributions of the partner industry, Rapti Renewable and Energy Services, and of the students of the data collection team. She is also very thankful to Renewable Nepal, financially supported by NORAD and SINTEF Energy Research of Norway, for funding this work.

Funding

This work is funded by NORAD and SINTEF Energy Research of Norway under Renewable Nepal Project grant number RENP-10-06-PID-379 under further support.

Author information

Corresponding author

Correspondence to Jyoti U. Devkota.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Cite this article

Devkota, J.U. Forecasting satisfaction of grid electricity to a rural household: examples from Nepal. SN Bus Econ 1, 30 (2021). https://doi.org/10.1007/s43546-020-00036-3

Keywords

  • Canonical correlation analysis
  • Structural equation modeling
  • Model validation
  • Model efficiency parameters
  • Structured questionnaire