1 Introduction

Productivity and technical change have long been of interest to economists. Beginning with Tinbergen (1942), economists have used a time trend in economic functions (e.g., production functions or their dual representations such as cost functions) to represent the rate at which new technology is introduced into the production unit. Specifically, the time trend approach usually involves adding a linear term in the time trend, a quadratic term in the time trend, and/or interactions of the time trend with factor input prices or outputs to a fixed coefficient economic function. Given estimation of the economic function, technical change (thus productivity growth) can then be readily expressed in terms of the estimated coefficients of the economic function, e.g., by taking the derivative of the economic function with respect to the time trend. Due to its empirical tractability, this approach has enjoyed considerable popularity since its introduction, and in fact it remains the dominant econometric approach to productivity measurement.

Despite its popularity, the time trend approach has two major drawbacks. First, it “produces a smooth, slowly changing characterization of the pace of technical change” (Baltagi and Griffin, 1988). This pattern of technical change is not supported by the evidence from index number approaches to calculating rates of technical change. For example, Baltagi and Griffin (1988), using the Divisia productivity index, found that productivity growth in the U.S. electricity industry showed considerable variability across time periods. Feng and Serletis (2008), using the Fisher productivity index, found that productivity growth in the U.S. manufacturing industry varied substantially from year to year. In addition, the smooth, slowly changing pattern of technical change obtained using the time trend approach is also inconsistent with findings in the investment literature (Cooper et al., 1999; Abel and Eberly, 1994) that suggest new technology adoptions occur in a “lumpy” fashion with discrete jumps. Hall and Mairesse (1995) also argued that disembodied technical change in practice includes “any errors in the price deflators common across firms, or other macro influences which may affect measured outputs and inputs”, and thus could not be smooth over time. Diewert and Wales (1992) argued against the standard time trend approach, because the single linear time trend approach “frequently does not fit the data well” and “rates of technical progress rarely remain even approximately constant”. Fox (1998) also argued that “there is no reason to expect, a priori, that technical progress enters each equation in a linear fashion”.

Second, the time trend approach often suffers from a built-in problem—“the problem of trending elasticities”. This problem was first noted by Diewert and Lawrence (2002) in the context of the standard time-trend normalized quadratic (NQ) profit function. Specifically, Diewert and Lawrence (2002) found that the price elasticities produced by this functional form often exhibited little variation over time, and further showed that this was mainly because the coefficients for the quadratic terms in input and output prices, on which price elasticities depend, were constant over time. Unfortunately, this problem is not confined to the NQ functional form. Taking for example the standard time-trend translog cost function to be examined in this paper, the price elasticity of demand for input i with respect to input price j is calculated as ηij = βij/si + sj − δij, where βij is the coefficient for the quadratic term in log input prices, si (sj) is the cost share of input i (j), and δij = 1 if i = j and 0 otherwise. As can be seen, δij is a constant, and si and sj are cost shares that do not vary much for some industries (Diewert and Lawrence, 2002; Feng and Serletis, 2008). If βij is also restricted to be constant over time, then price elasticities (i.e., ηij’s) will inevitably show little temporal variation.
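The elasticity formula above can be illustrated with a minimal Python sketch. The function name `price_elasticities` and the two-input numbers are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
def price_elasticities(beta, shares):
    """Return eta[i][j] = beta[i][j] / s_i + s_j - delta_ij for a translog cost function."""
    n = len(shares)
    return [
        [beta[i][j] / shares[i] + shares[j] - (1.0 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-input example: beta is symmetric and its rows sum to zero.
beta = [[0.05, -0.05], [-0.05, 0.05]]
shares = [0.4, 0.6]
eta = price_elasticities(beta, shares)

# Own-price elasticity of input 1 and the cross-price elasticity with input 2.
print(eta[0][0], eta[0][1])
```

Because βij is held fixed, the only source of movement in `eta` across periods is the cost shares, which is why elasticities show little temporal variation when shares are stable.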

To overcome the first drawback, Baltagi and Griffin (1988) proposed an innovative procedure, which involves first replacing linear and quadratic terms in the time trend in the standard translog cost function with a general index of technical change, and then estimating the index by use of a set of time-specific dummies and their interactions with input prices and output quantities. This procedure offers numerous advantages over the time trend approach, among which a major one is that it is capable of producing estimates of productivity growth that closely track the “observed” productivity growth represented by the Divisia productivity index. While not discussed in their paper, this procedure still has the problem of trending elasticities, because the coefficients for the quadratic terms in log input prices in the translog cost function (i.e., βij discussed above) remain constant over time.

The purpose of this paper is to propose a new procedure to simultaneously overcome the two drawbacks inherent in the standard time trend approach. Specifically, we propose a categorical time-varying coefficient translog cost function, whose primary feature is that each of its coefficients is expressed as a nonparametric function of a categorical time variable (which consists of T time points or T categories, where T is the total number of discrete time periods). The advantage of this feature is that it allows each time period to have its own set of coefficients and thus its own cost function. In other words, the new cost function has time-specific coefficients, thus enabling one to model production technology in a time-specific manner. To see this clearly, let tc = 1, 2, …, T denote the categorical time variable, y denote a vector of outputs, and w denote a vector of input prices. The categorical time-varying coefficient cost function can then be written as \(C^{t^c}\)(y, w), where the superscript tc = 1, 2, …, T indicates that the coefficients of the cost function differ across time periods. In contrast, all the coefficients are restricted to be constant over time in standard time trend models. To distinguish the categorical time-varying coefficient translog cost function from the standard time trend translog cost function, we refer to the former as “the categorical time-varying coefficient model” and the latter as “the standard time trend model”.

The formulation of the categorical time-varying coefficient model is inspired by recent econometric advances in varying-coefficient models (Fan and Zhang, 1999; Fan and Zhang, 2008; Gao and Phillips, 2013), particularly in categorical varying coefficient models (Li et al., 2013). The main feature of varying-coefficient models is that their regression coefficients are not set to be constants but are allowed to evolve with certain characteristics (covariates). Because these models allow the exploration of dynamic features that may exist in the data set (Fan and Zhang, 2008), they have received increasing attention in different areas of economics, such as monetary policy (Primiceri, 2005) and growth theory (Durlauf et al., 2001). However, to the best of our knowledge, this is the first study that uses a varying-coefficient model to model technical change and price elasticities.

The categorical time-varying coefficient model has two advantages. First, it is capable of producing estimates of productivity growth that closely track the Törnqvist discrete approximation to the Divisia productivity index (hereafter “the discrete Divisia index”). Index numbers (such as the discrete Divisia and Fisher indexes) are widely used as benchmarks to check the accuracy of productivity estimates obtained from econometric models (Baltagi and Griffin, 1988; Feng and Serletis, 2008). There are two reasons for this. One is that these indexes are simple and transparent. As pointed out by Good et al. (1997), these indexes “embody less stringent assumptions than are required by econometric models” and thus “provide valuable checks on the results of those (econometric) models”. Hulten (2001) also recommended that researchers “exploit the relative simplicity and transparency of these indexes to serve as a benchmark for interpreting the more complicated results of the parametric (econometric) approach”. The other is that these indexes, particularly the Divisia and Fisher indexes, satisfy many desirable statistical properties such as constant quantities, time reversal, and proportionality (Dean et al., 1996). In this paper, we follow Baltagi and Griffin (1988) and use the discrete Divisia index as a benchmark to evaluate productivity estimates obtained from our model.

We apply the categorical time-varying coefficient model to a panel of 81 electricity firms in the U.S. over the period 1986–1998. We find that the productivity estimates obtained from the new model show considerable year-to-year variation. In particular, we compare the productivity estimates obtained from the new model with those constructed from the discrete Divisia index and find that the former closely track the latter, suggesting that the new model is capable of producing estimates of productivity growth with accuracy comparable to the discrete Divisia index. In addition, we find that our productivity estimates also closely track those obtained using the general index of technical change of Baltagi and Griffin (1988), further confirming the ability of the new model to closely track well-known productivity indexes. In contrast, we find that the standard time trend model yields only a smoothed version of the discrete Divisia index. Moreover, we find that the standard time trend model results in a misleading conclusion regarding the relative importance of technical change and scale effects.

Second, the categorical time-varying coefficient model is capable of producing price elasticities that show considerable year-to-year variation, as indicated by our empirical results. This is not surprising because all the coefficients of this new model, including βij on which all price elasticities are based, are allowed to vary from time period to time period. In contrast, we find that the standard time trend model generates price elasticities that show little temporal variation. Due to the importance and wide application of price elasticities, this latter advantage should be of interest to economists in many fields, such as energy economics, public economics, international economics, and labor economics (Hochman et al., 2010; Farrell and Walker, 1999).

The rest of the paper is organized as follows. Section 2 provides a brief summary of two approaches to estimating technical change—the econometric approach and the index number approach. Section 3 presents three competing econometric methods for estimating technical change and price elasticities: the standard time trend model, the categorical time-varying coefficient model, and the Baltagi and Griffin (1988) model. Section 4 discusses the estimation procedure for the categorical time-varying coefficient model. Section 5 deals with data issues. Section 6 compares the empirical results of the three models. Section 7 concludes the paper.

2 Overview of index number and econometric approaches to measuring technical change

Generally, there are four approaches to measuring technical change: the growth accounting approach, the index number approach, the nonparametric frontier approach, and the econometric approach (see, for example, Hulten, 2001). In this section we focus on the two approaches that are related to this paper: the index number approach and the econometric approach.

2.1 Index number approach

In general, a total factor productivity growth (TFPG) index is defined as the growth in outputs not attributable to the growth in inputs. An advantage of index numbers is that they “embody less stringent assumptions than are required by econometric models” and thus “provide valuable checks on the results of those (econometric) models” (Good et al., 1997). For example, Baltagi and Griffin (1988) used a Divisia productivity index as a benchmark to evaluate productivity estimates obtained from the general index of technical change. In this paper, we follow Baltagi and Griffin (1988) and use a Divisia productivity index to evaluate productivity estimates obtained from our categorical time-varying coefficient model. Therefore, in what follows we provide an overview of Divisia productivity indexes.

Solow (1957) was the first to propose a Divisia TFPG index. Specifically, he began with an aggregate production function with a Hicksian neutral shift parameter and constant returns to scale. Assuming each input is paid the value of its marginal product, Solow (1957) showed that a Divisia TFPG index based on this production function was calculated as output growth minus observed cost-share-weighted input growth. Later, Jorgenson and Griliches (1967) generalized this index to a multiple-output framework and showed that a Divisia TFPG index based on a multiple output production function was calculated as the observed revenue-share-weighted output growth rate minus the observed cost-share-weighted input growth rate. Formally,

$$TFPG = \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \tilde s_m\dot y_m - \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} s_n\dot x_n,$$
(1)

where ym is output m (m = 1, 2, …, M); xn (n = 1, 2, …, N) is input n; a dot over a variable indicates the percentage growth of that variable (i.e., \(\dot y = d{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y{\mathrm{/}}dt\)); \(\tilde s_m\) is the observed revenue share for output m; and sn is the observed cost share for input n.

While having enjoyed considerable popularity, these two indexes are restricted in the sense that they are obtained under perfect competition and constant returns to scale. Noting this problem, Denny et al. (1981) replaced the observed revenue shares with cost elasticity shares, resulting in an index that is valid in the presence of imperfect competition and increasing returns to scale. Specifically, Denny et al. (1981) assumed that the underlying production process of a cost-minimizing productive firm was represented by the following cost function

$$C\left( t \right) = {\boldsymbol{w}}\prime {\boldsymbol{x}} = C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right),$$

where C is total cost; w = (w1, …, wN)′ is an N × 1 vector of input prices; x = (x1, …, xN)′ is an N × 1 input vector; y = (y1, …, yM)′ is an M × 1 output vector; and t is a time trend.

Denny et al. (1981) showed that the conceptually correct expression for TFPG for the cost-minimizing firm was

$$\begin{array}{*{20}{l}}{\mathrm{TFPG}} \hfill & = \hfill & - \left( {\dot C - \mathop {\sum}\limits_{m = 1}^M \frac{{\epsilon _m}}{\epsilon }\dot y_m - \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} s_n\dot w_n} \right)\hfill \\ \hfill & = \hfill & \mathop {\sum}\limits_{m = 1}^M \frac{{\epsilon _m}}{\epsilon }{\kern 1pt} \dot y_m - \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} s_n\dot x_n,\hfill\end{array}$$
(2)

where \(\epsilon _m = \partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} C\)(y, w, t)/∂ ln ym is the elasticity of the cost function with respect to output m, and \(\epsilon = \mathop {\sum}\nolimits_{m = 1}^M {\kern 1pt} \epsilon _m\) is the reciprocal of local returns to scale. The TFPG index defined in Eq. [2] has been widely used and discussed in the literature (see, for example, Jorgenson, 1991).

The continuous-time Divisia TFPG indexes given in Eqs. [1] and [2] must be replaced by discrete-time approximations because data are not observed in continuous time. As is well known, the Törnqvist approximation to the Divisia index is “exact” if the production/cost function has the translog form. In other words, under the right conditions the Törnqvist index is not an approximation at all but is exact. In addition, because the translog function is a second-order approximation to other production/cost functions, the discrete-time Törnqvist index is a sensible choice even if the underlying true functional form is not a translog (see, for example, Hulten, 2001). The continuous-time Divisia TFPG index given in Eq. [1] can be approximated by the following discrete-time Törnqvist index (Fuss, 1994):

$$\begin{array}{*{20}{l}}{\mathrm{TFPG}} \hfill & = \hfill & \mathop {\sum}\limits_{m = 1}^M \frac{1}{2}\left( {\tilde s_{m,t} + \tilde s_{m,t - 1}} \right){\mathrm{\Delta }}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m\hfill\\ \hfill & \hfill & - \mathop {\sum}\limits_{n = 1}^N \frac{1}{2}\left( {s_{n,t} + s_{n,t - 1}} \right){\mathrm{\Delta }}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} x_n.\hfill\end{array}$$
(3)
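As a minimal sketch of how Eq. [3] can be evaluated for a single firm between two adjacent periods, the Python function below averages the observed shares and weights the log changes. The function name `tornqvist_tfpg` and all numbers are hypothetical illustrations, not data from this paper.

```python
import math


def tornqvist_tfpg(y_prev, y_curr, x_prev, x_curr,
                   rev_prev, rev_curr, cost_prev, cost_curr):
    """Discrete Divisia (Tornqvist) TFPG between periods t-1 and t, as in Eq. [3]."""
    # Revenue-share-weighted output growth, with shares averaged over the two periods.
    output_growth = sum(
        0.5 * (r1 + r0) * math.log(y1 / y0)
        for y0, y1, r0, r1 in zip(y_prev, y_curr, rev_prev, rev_curr)
    )
    # Cost-share-weighted input growth, with shares averaged over the two periods.
    input_growth = sum(
        0.5 * (s1 + s0) * math.log(x1 / x0)
        for x0, x1, s0, s1 in zip(x_prev, x_curr, cost_prev, cost_curr)
    )
    return output_growth - input_growth


# Hypothetical firm: one output, two inputs, unchanged shares.
tfpg = tornqvist_tfpg(
    y_prev=[100.0], y_curr=[110.0],
    x_prev=[50.0, 20.0], x_curr=[52.0, 20.0],
    rev_prev=[1.0], rev_curr=[1.0],
    cost_prev=[0.6, 0.4], cost_curr=[0.6, 0.4],
)
print(tfpg)
```

Only observed quantities and shares enter the calculation, so no cost function has to be specified or estimated.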

With respect to the continuous-time Divisia TFPG index in Eq. [2], it can be approximated by the following discrete-time Törnqvist index (Fuss, 1994):

$${\mathrm{TFPG}} = \mathop {\sum}\limits_{m = 1}^M \frac{1}{2}\left( {\frac{{\epsilon _{m,t}}}{{\epsilon _t}} + \frac{{\epsilon _{m,t - 1}}}{{\epsilon _{t - 1}}}} \right){\mathrm{\Delta }}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m - \mathop {\sum}\limits_{n = 1}^N \frac{1}{2}\left( {s_{n,t} + s_{n,t - 1}} \right){\mathrm{\Delta }}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} x_n.$$
(4)

In this paper, we follow Baltagi and Griffin (1988) and use Eq. [3] as a benchmark to compare productivity estimates obtained using the categorical time-varying coefficient model with those obtained using the standard time trend model. This is because Eq. [3] does not require econometric specification and estimation of technology. In contrast, Eq. [4], while theoretically correct, involves the specification and estimation of a cost function to obtain estimates of elasticities of cost with respect to output. This means that TFPG estimates obtained using Eq. [4] may vary considerably depending on how the cost function is specified and estimated, making Eq. [4] less suitable as a benchmark.

2.2 Econometric approach

The econometric approach to productivity measurement involves estimating the parameters of an economic function—a production, cost, or profit function. Productivity growth can then be expressed in terms of the estimated parameters of the economic function. Compared with the index number approach, this approach has three advantages. First, it avoids the need to impose the marginal productivity conditions that are required by the Solow (1957) and Jorgenson and Griliches (1967) TFPG indexes. Second, it gives a full representation of the technology such that the estimated parameters can be used not only in the calculation of productivity but also in the calculation of substitution elasticities and scale parameters. Third, noncompetitive pricing behavior, nonconstant returns, and factor-augmenting technical change can be accommodated to help “explain” the sources of productivity (Hulten, 2001).

The dominant econometric approach is the standard time trend approach, which involves using a time trend in cost or production functions to represent the rate at which new technology is introduced into the production unit. While this approach has enjoyed considerable popularity since its inception by Tinbergen (1942), no theoretical justification exists for the use of the time trend to proxy technical change. In fact, this approach produces a smooth, slowly changing characterization of the pace of technical change (Baltagi and Griffin, 1988), which is neither supported by the evidence from index number approaches to calculating rates of technical change, nor consistent with findings that suggest production technology proceeds in a “lumpy” fashion with discrete jumps.

Dissatisfaction with the standard time trend approach has led researchers to propose different techniques to overcome its problems. For example, Baltagi and Griffin (1988) proposed an innovative procedure that involves the use of time-specific dummies and their interactions with input prices and output quantities. In this paper, we approach the problem from a different perspective: via the coefficients. More specifically, we allow the coefficients of our model (the categorical time-varying coefficient model) to vary over time by expressing each coefficient as a nonparametric function of a categorical time variable. This flexible treatment allows each time period to have its own set of coefficients and thus its own cost function, which in turn leads to two major advantages: (1) our model is capable of producing estimates of productivity growth that closely track those obtained using the Divisia productivity index, and (2) our model is capable of overcoming the well-known “trending elasticities problem”. In the following section we explain the categorical time-varying coefficient model in more detail.

3 Model specifications

In this section, we specify the standard time trend model and the categorical time-varying coefficient model respectively. While the focus of this study is on the comparison between the standard time trend model and the categorical time-varying coefficient model, it would also be of interest to compare the latter model with the Baltagi and Griffin (1988) model. Therefore, we also specify the Baltagi and Griffin (1988) model at the end of this section.

3.1 The standard time trend model

The standard time trend model is written as

$$\begin{array}{*{20}{l}} {{\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)} \hfill & = \hfill & {\mathop {\sum}\limits_{k = 2}^K {\kern 1pt} \tilde \lambda _kD_k + \alpha _0 + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \alpha _i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \gamma _j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j + \tau t} \hfill \\ {} \hfill & + \hfill & {\frac{1}{2}\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n + \frac{1}{2}\mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m + \frac{1}{2}\delta t^2} \hfill \\ {} \hfill & + \hfill & {\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \phi _it{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \varphi _jt{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j,} \hfill \end{array}$$
(5)

where C is total cost; w = (w1, …, wN)′ is an N × 1 vector of variable input prices; y = (y1, …, yM)′ is an M × 1 output vector; t is a time trend; K is the number of firms; Dk (k = 2, …, K) are firm-specific dummies; and \(\tilde \lambda _k\) (k = 2, …, K) are the corresponding coefficients for the dummies. The usual symmetry restrictions require βin = βni (i, n = 1, …, N) and \(\gamma _{jm}^ \ast = \gamma _{mj}^ \ast\) (j, m = 1, …, M). Moreover, homogeneity of degree one in input prices implies the following restrictions:

$$\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \alpha _i = 1,\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \beta _{in} = \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{ni} = \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \psi _{ij} = \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \phi _i = 0.$$
(6)

Although we could estimate Eq. [5] directly, efficiency gains can be realized by estimating Eq. [5] together with its cost share equations, which can be obtained by applying Shephard’s lemma to the cost function Eq. [5]:

$$s_i = \frac{{w_ix_i}}{C} = \alpha _i + \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j + \phi _it,\,i = 1, \cdots ,N,$$
(7)

where si is the cost share for input i. It is worth noting that the parameters αi, βin, ψij, and ϕi are common across the system of equations.
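To make the link between the restrictions in Eq. [6] and the share equations in Eq. [7] concrete, the following Python sketch evaluates fitted cost shares for a hypothetical two-input, one-output case. The function name `cost_shares` and every coefficient value are assumptions chosen to satisfy symmetry and linear homogeneity; they are not estimates from this paper.

```python
def cost_shares(alpha, beta, psi, phi, ln_w, ln_y, t):
    """Fitted cost shares s_i from Eq. [7] of the standard time trend model."""
    n, m = len(alpha), len(ln_y)
    return [
        alpha[i]
        + sum(beta[i][k] * ln_w[k] for k in range(n))
        + sum(psi[i][j] * ln_y[j] for j in range(m))
        + phi[i] * t
        for i in range(n)
    ]


# Hypothetical coefficients obeying Eq. [6]: alpha sums to one, and the
# columns of beta and psi as well as phi sum to zero across inputs.
alpha = [0.4, 0.6]
beta = [[0.05, -0.05], [-0.05, 0.05]]
psi = [[0.01], [-0.01]]
phi = [0.002, -0.002]

shares = cost_shares(alpha, beta, psi, phi, ln_w=[0.1, -0.2], ln_y=[0.5], t=3)
print(shares, sum(shares))
```

Because the restrictions force the fitted shares to add up to one at every data point, one share equation is redundant, which is the usual reason a share equation is dropped when the system is estimated.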

Given the estimated parameters from Eqs. [5] and [7], technical change can be computed as follows:

$${\mathrm{TC}} = - \frac{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)}}{{\partial t}} = - \left( {\tau + \delta t + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \phi _i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \varphi _j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j} \right).$$

Total factor productivity growth can then be computed as in Baltagi and Griffin (1988) and Fuss (1994)

$${\mathrm{TFPG}} = {\mathrm{TC}} + \left( {1 - \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}} \right)\dot y,$$
(8)

where for j = 1, 2, …, M,

$$\epsilon _{cy_j} = \frac{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)}}{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j}} = \gamma _j + \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \varphi _jt$$

is the cost elasticity of the jth output, and \(\dot y\) = \(\mathop {\sum}\nolimits_{j = 1}^M \left( {\epsilon _{cy_j}{\mathrm{/}}\mathop {\sum}\nolimits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}} \right)\dot y_j\) is the cost-elasticity-share weighted growth rate of outputs. According to Eq. [8], productivity growth can be decomposed into two components: technical change (TC) and scale effects (\(\left( {1 - \mathop {\sum}\nolimits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}} \right)\dot y\)). The latter component is positive (negative) in the presence of increasing (decreasing) returns to scale.
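The decomposition in Eq. [8] can be sketched in a few lines of Python. The function names `technical_change_trend` and `tfpg_decomposition`, and all parameter values, are hypothetical and serve only to show how the two components are assembled from estimated coefficients.

```python
def technical_change_trend(tau, delta, phi, varphi, t, ln_w, ln_y):
    """TC = -d ln C / dt for the standard time trend model."""
    return -(
        tau
        + delta * t
        + sum(p * lw for p, lw in zip(phi, ln_w))
        + sum(v * ly for v, ly in zip(varphi, ln_y))
    )


def tfpg_decomposition(tc, cost_elasticities, output_growth):
    """Return (TFPG, scale effect) following Eq. [8]."""
    total = sum(cost_elasticities)
    # Cost-elasticity-share weighted growth rate of outputs.
    y_dot = sum(e / total * g for e, g in zip(cost_elasticities, output_growth))
    scale_effect = (1.0 - total) * y_dot
    return tc + scale_effect, scale_effect


# Hypothetical values: costs fall by 1% per year and the single cost
# elasticity of 0.8 implies increasing returns to scale.
tc = technical_change_trend(tau=-0.01, delta=0.0, phi=[0.002, -0.002],
                            varphi=[0.0], t=5, ln_w=[0.0, 0.0], ln_y=[0.0])
tfpg, scale = tfpg_decomposition(tc, cost_elasticities=[0.8], output_growth=[0.05])
print(tc, scale, tfpg)
```

In this example half of measured productivity growth comes from technical change and half from scale effects. Note that TC is linear in t by construction, which is the source of the smooth pattern discussed in the Introduction.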

3.2 The categorical time-varying coefficient model

Our categorical time-varying coefficient model involves specifying each coefficient of the standard translog cost function (without the usual time trend) as a nonparametric function of a categorical time variable:

$$\begin{array}{*{20}{l}} {{\mathrm{ln}}{\kern 1pt} C^{t^c}\left( {{\boldsymbol{y}},{\boldsymbol{w}}} \right)} \hfill & = \hfill & {\mathop {\sum}\limits_{k = 2}^K {\kern 1pt} \lambda _kD_k + \alpha _0\left( {t^c} \right) + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \alpha _i\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \gamma _j\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j} \hfill \\ {} \hfill & + \hfill & {\frac{1}{2}\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n + \frac{1}{2}\mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast \left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m} \hfill \\ {} \hfill & + \hfill & {\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j,} \hfill \end{array}$$
(9)

where tc is the categorical time variable (which consists of T time points or T categories); \(C^{t^c}\)(y, w) is the cost function for period tc; Dk (k = 2, …, K) are firm-specific dummies; and λk are the corresponding parameters, which are assumed to be constant over time. Symmetry requires βin(tc) = βni(tc) (i, n = 1, …, N) and \(\gamma _{jm}^ \ast \left( {t^c} \right)\) = \(\gamma _{mj}^ \ast \left( {t^c} \right)\) (j, m = 1, …, M). Linear homogeneity in w implies

$$\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \alpha _i\left( {t^c} \right) = 1,\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \beta _{in}\left( {t^c} \right) = \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{ni}\left( {t^c} \right) = \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \psi _{ij}\left( {t^c} \right) = 0.$$
(10)

Applying Shephard’s lemma to the cost function Eq. [9] yields the following cost share equations:

$$\begin{array}{*{20}{l}}s_i^{t^c}\hfill & = \hfill & \frac{{w_ix_i}}{C} = \alpha _i\left( {t^c} \right) + \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} w_n + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j,\hfill\\ i \hfill & = \hfill & 1, \cdots ,N.\hfill\end{array}$$
(11)

Note that the parameters αi(tc), βin(tc), and ψij(tc) are common across the cost system.

Noting that tc is a categorical variable, technical change from period t − 1 to t is computed as:

$$\begin{array}{*{20}{l}} {{\mathrm{TC}}_{t - 1,t}} \hfill & = \hfill & { - \frac{1}{2}\left\{ {\left[ {{\mathrm{ln}}{\kern 1pt} C^{t^c}\left( {{\boldsymbol{y}}^{t^c - 1},{\boldsymbol{w}}^{t^c - 1}} \right) - {\mathrm{ln}}{\kern 1pt} C^{t^c - 1}\left( {{\boldsymbol{y}}^{t^c - 1},{\boldsymbol{w}}^{t^c - 1}} \right)} \right]} \right.} \hfill \\ {} \hfill & {} \hfill & {\left. { + \left[ {{\mathrm{ln}}{\kern 1pt} C^{t^c}\left( {{\boldsymbol{y}}^{t^c},{\boldsymbol{w}}^{t^c}} \right) - {\mathrm{ln}}{\kern 1pt} C^{t^c - 1}\left( {{\boldsymbol{y}}^{t^c},{\boldsymbol{w}}^{t^c}} \right)} \right]} \right\}{\mathrm{,}}} \hfill \end{array}$$

where \({\boldsymbol{y}}^{t^c}\) (\({\boldsymbol{w}}^{t^c}\)) is the output (input price) vector for period tc, and \({\boldsymbol{y}}^{t^c - 1}\) (\({\boldsymbol{w}}^{t^c - 1}\)) is the output (input price) vector for the previous period. Given estimation of TC, the total factor productivity growth can be computed as

$${\mathrm{TFPG}} = {\mathrm{TC}} + \left( {1 - \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}^{t^c}} \right)\dot y{\mathrm{,}}$$
(12)

where, for j = 1, 2, …, M, the cost elasticity of the jth output, \(\epsilon _{cy_j}^{t^c}\), is

$$\epsilon _{cy_j}^{t^c} = \frac{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} C^{t^c}\left( {{\boldsymbol{y}},{\boldsymbol{w}}} \right)}}{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j}} = \gamma _j\left( {t^c} \right) + \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast \left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_m + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \psi _{ij}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} w_i$$
(13)

As in the case of the standard time trend model, Eq. [12] suggests that productivity growth can be decomposed into two components: technical change (TC) and scale effects \(\left( {\left( {1 - \mathop {\sum}\nolimits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}^{t^c}} \right)\dot y} \right)\).
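The discrete technical change measure for the categorical model averages the shift in log cost between two adjacent period-specific cost functions, evaluated once at the previous period's data and once at the current period's data. The Python sketch below illustrates this under stated assumptions: the names `ln_cost` and `technical_change_categorical`, the dictionary layout of the coefficients, and all numbers are hypothetical. Firm dummies are left out because λk is constant over time and cancels in each difference.

```python
def ln_cost(coef, ln_w, ln_y):
    """Translog log cost for one period's coefficient set (firm dummies omitted)."""
    n, m = len(ln_w), len(ln_y)
    value = coef["alpha0"]
    value += sum(coef["alpha"][i] * ln_w[i] for i in range(n))
    value += sum(coef["gamma"][j] * ln_y[j] for j in range(m))
    value += 0.5 * sum(coef["beta"][i][k] * ln_w[i] * ln_w[k]
                       for i in range(n) for k in range(n))
    value += 0.5 * sum(coef["gamma_star"][j][k] * ln_y[j] * ln_y[k]
                       for j in range(m) for k in range(m))
    value += sum(coef["psi"][i][j] * ln_w[i] * ln_y[j]
                 for i in range(n) for j in range(m))
    return value


def technical_change_categorical(coef_prev, coef_curr, obs_prev, obs_curr):
    """Average of the cost-function shift evaluated at t-1 data and at t data."""
    shift_at_prev = ln_cost(coef_curr, *obs_prev) - ln_cost(coef_prev, *obs_prev)
    shift_at_curr = ln_cost(coef_curr, *obs_curr) - ln_cost(coef_prev, *obs_curr)
    return -0.5 * (shift_at_prev + shift_at_curr)


# Hypothetical period t-1 coefficients (two inputs, one output).
coef_prev = {
    "alpha0": 1.00,
    "alpha": [0.4, 0.6],
    "gamma": [0.8],
    "beta": [[0.05, -0.05], [-0.05, 0.05]],
    "gamma_star": [[0.02]],
    "psi": [[0.01], [-0.01]],
}
# Period t differs only in the intercept, which falls by 0.02.
coef_curr = dict(coef_prev, alpha0=0.98)

obs_prev = ([0.10, -0.20], [0.50])   # (ln w, ln y) observed at t-1
obs_curr = ([0.15, -0.10], [0.60])   # (ln w, ln y) observed at t

tc = technical_change_categorical(coef_prev, coef_curr, obs_prev, obs_curr)
print(tc)
```

When only the intercept shifts, TC equals the size of that shift regardless of the data. When slope coefficients also differ between periods, the two evaluation points generally give different shifts, and the average treats the two periods symmetrically.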

The categorical time-varying coefficient model is similar in spirit to the smooth coefficient estimators of systems of equations proposed by Orbe et al. (2003) and Henderson et al. (2015). Specifically, Orbe et al. (2003) estimated a varying-coefficient SUR (seemingly unrelated regressions) system, but their coefficients are treated as functions of a continuous time variable (as opposed to discrete), while Henderson et al. (2015) estimated a varying-coefficient translog cost system, but their smooth coefficients depend on a different, continuous variable (i.e., bank size).

3.3 The Baltagi and Griffin (1988) model

The Baltagi and Griffin (1988) model is written as

$$\begin{array}{*{20}{l}} {{\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)} \hfill & = \hfill & {\mathop {\sum}\limits_{k = 2}^K {\kern 1pt} \overline \lambda _kD_k + \alpha _0 + A(t) + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \alpha _i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \gamma _j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j} \hfill \\ {} \hfill & + \hfill & {\frac{1}{2}\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n + \frac{1}{2}\mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m} \hfill \\ {} \hfill & + \hfill & {\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \phi _iA\left( t \right){\mathrm{ln}}{\kern 1pt} w_i + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \varphi _jA\left( t \right){\mathrm{ln}}{\kern 1pt} y_j,} \hfill \end{array}$$
(14)

where \(\overline \lambda _k\) (k = 2, …, K) are the coefficients for the firm-specific dummies. The same symmetry and linear homogeneity restrictions are imposed on Eq. [14] as in the case of the standard time trend model.

Applying Shephard’s lemma to the cost function Eq. [14] yields the following cost share equations:

$$s_i = \alpha _i + \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j + \phi _iA\left( t \right),\,i = 1, \cdots ,N,$$
(15)

As in Baltagi and Griffin (1988), Eqs. [14] and [15] are estimated by replacing A(t) with time-specific dummies. A(t)’s can then be recovered in the same way as in Baltagi and Griffin (1988) (see Eqs. (12a)–(12c) in Baltagi and Griffin (1988) for more details). With the estimates of A(t)’s, technical change can be calculated as follows

$$\begin{array}{*{20}{l}} {\mathrm{{TC}}} \hfill & = \hfill & {A(t) - A(t - 1) + \mathop {\sum}\limits_i {\kern 1pt} \phi _i\left[ {A(t) - A(t - 1)} \right]{\mathrm{ln}}{\kern 1pt} w_i} \hfill \\ {} \hfill & {} \hfill & { + \mathop {\sum}\limits_j {\kern 1pt} \varphi _j\left[ {A(t) - A(t - 1)} \right]{\mathrm{ln}}{\kern 1pt} y_j.} \hfill \end{array}$$
(16)

Total factor productivity growth can then be computed as follows

$${\mathrm{TFPG}} = {\mathrm{TC}} + \left( {1 - \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \epsilon _{cy_j}} \right)\dot y,$$
(17)

where the cost elasticity of the jth output is defined as

$$\epsilon _{cy_j} = \frac{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)}}{{\partial {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j}} = \gamma _j + \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i + \varphi _jA\left( t \right).$$
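As an illustration only, the calculations in Eqs. [16] and [17] can be sketched in a few lines of NumPy for a single firm. The function names, array shapes, and numerical inputs below are hypothetical and are not taken from the paper's data; output growth is treated as a scalar series, which is an assumption that fits the single-output case.

```python
import numpy as np

def bg_technical_change(A, phi, varphi, ln_w, ln_y):
    """Eq. [16] for one firm: TC_t for t = 2, ..., T.

    A: (T,) recovered A(t); phi: (N,); varphi: (M,);
    ln_w: (T, N) log input prices; ln_y: (T, M) log outputs.
    """
    dA = np.diff(A)  # A(t) - A(t-1)
    # Eq. [16] factors as dA * (1 + sum_i phi_i ln w_i + sum_j varphi_j ln y_j).
    return dA * (1.0 + ln_w[1:] @ phi + ln_y[1:] @ varphi)

def bg_cost_elasticities(A, gamma, gamma_star, psi, varphi, ln_w, ln_y):
    """eps[t, j] = gamma_j + sum_m gamma*_jm ln y_m + sum_i psi_ij ln w_i
    + varphi_j A(t); psi has shape (N, M), gamma_star has shape (M, M)."""
    return gamma + ln_y @ gamma_star.T + ln_w @ psi + np.outer(A, varphi)

def bg_tfpg(tc, eps, y_growth):
    """Eq. [17]: TFPG = TC + (1 - sum_j eps_j) * output growth (assumed scalar)."""
    return tc + (1.0 - eps[1:].sum(axis=1)) * y_growth

# Hypothetical inputs: T = 3 periods, N = 2 inputs, M = 1 output.
A = np.array([0.0, -0.02, -0.05])
phi, varphi = np.array([0.1, -0.1]), np.array([0.05])
ln_w = np.array([[0.0, 0.0], [1.0, 2.0], [0.5, 0.5]])
ln_y = np.array([[0.0], [1.0], [2.0]])
tc = bg_technical_change(A, phi, varphi, ln_w, ln_y)
```

The sketch follows the sign convention of Eq. [16] as written; any aggregation across firms is left to the caller.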

4 Semiparametric estimation

In this section we detail the semiparametric estimation procedure for the categorical time-varying coefficient model (i.e., Eqs. [9]–[11]). In doing so, we draw on recent advances in semiparametric estimation for categorical varying coefficient models (Li et al., 2013). The estimation procedures for the standard time trend model (Eqs. [5]–[7]) have been widely documented in the traditional factor demand literature (Barten, 1969; Christensen and Greene, 1976) and thus are not discussed in this paper.

Before proceeding to the semiparametric estimation procedure, we first impose the linear homogeneity restrictions in Eq. [10]. This is done by normalizing the cost and input prices in Eqs. [9] and [11] by one of the input prices (say, wN)

$$\begin{array}{*{20}{l}} {{\mathrm{ln}}\frac{{C^{t^c}\left( {{\boldsymbol{y}},{\boldsymbol{w}}} \right)}}{{w_N}}} \hfill & = \hfill & {\mathop {\sum}\limits_{k = 2}^K {\kern 1pt} \lambda _kD_k + \alpha _0\left( {t^c} \right) + \mathop {\sum}\limits_{i = 1}^{N - 1} {\kern 1pt} \alpha _i\left( {t^c} \right){\mathrm{ln}}\frac{{w_i}}{{w_N}} + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \gamma _j\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j} \hfill \\ {} \hfill & + \hfill & {\frac{1}{2}\mathop {\sum}\limits_{i = 1}^{N - 1} {\kern 1pt} \mathop {\sum}\limits_{n = 1}^{N - 1} {\kern 1pt} \beta _{in}\left( {t^c} \right){\mathrm{ln}}\frac{{w_i}}{{w_N}}{\mathrm{ln}}\frac{{w_n}}{{w_N}} + \frac{1}{2}\mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast \left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j{\mathrm{ln}}{\kern 1pt} y_m} \hfill \\ {} \hfill & + \hfill & {\mathop {\sum}\limits_{i = 1}^{N - 1} {\kern 1pt} \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}\left( {t^c} \right){\mathrm{ln}}\frac{{w_i}}{{w_N}}{\mathrm{ln}}{\kern 1pt} y_j,} \hfill \end{array}$$
(18)

and

$$\begin{array}{*{20}{l}} s_i \hfill & = \hfill & \alpha _i\left( {t^c} \right) + \mathop {\sum}\limits_{n = 1}^{N - 1} {\kern 1pt} \beta _{in}\left( {t^c} \right){\mathrm{ln}}\frac{{w_n}}{{w_N}} + \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}\left( {t^c} \right){\mathrm{ln}}{\kern 1pt} y_j,\hfill\\ i \hfill & = \hfill & 1, \cdots ,N - 1.\hfill\end{array}$$
(19)

This normalization method has been widely used to impose the linear homogeneity property on economic functions (Griffiths et al., 2000).

Equations [18] and [19] can then be combined to form a system of N equations that, upon appending idiosyncratic error terms, takes the common seemingly unrelated regression (SUR) form. In estimating this SUR system, we follow the spirit of Bai (2009) and use an iteration scheme, where each iteration involves two steps: 1) given the time-invariant coefficients for the firm-specific dummy variables (i.e., λk, k = 2, …, K), we compute the time-varying coefficients for the non-dummy variables (i.e., α0(tc), αi(tc), γj(tc), βin(tc), \(\gamma _{jm}^ \ast \left( {t^c} \right)\), and ψij(tc)) using the semiparametric estimation procedure developed by Li et al. (2013); and 2) given the time-varying coefficients, compute the time-invariant coefficients. As pointed out by Bai (2009), this iteration scheme is very robust and has an excellent convergence property. Considering that the second step is straightforward, we elaborate on the first step in what follows.
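Before turning to the first step, the alternation itself can be illustrated with a deliberately simplified sketch. It assumes a single equation rather than the full SUR system, an identity error covariance, and a fixed number of iterations in place of a convergence criterion; the function name `alternate` and its arguments are hypothetical.

```python
import numpy as np

def alternate(q, X, D, t_obs, lam, n_iter=50):
    """Two-step iteration for q = D @ firm_coef + X @ beta(t) + u.

    q: (n,) dependent variable; X: (n, p) non-dummy regressors;
    D: (n, K-1) firm dummies; t_obs: (n,) period labels; lam: kernel weight.
    """
    periods = np.unique(t_obs)
    firm_coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Step 1: period-specific coefficients given the firm effects.
        r = q - D @ firm_coef
        beta = {}
        for tc in periods:
            w = np.where(t_obs == tc, 1.0, lam)      # categorical kernel weights
            Xw = X * w[:, None]
            beta[tc] = np.linalg.solve(Xw.T @ X, Xw.T @ r)
        # Step 2: firm effects given the period-specific coefficients.
        fit = np.array([X[i] @ beta[t_obs[i]] for i in range(len(q))])
        firm_coef = np.linalg.lstsq(D, q - fit, rcond=None)[0]
    return firm_coef, beta, q - fit - D @ firm_coef
```

When `lam` is zero, each step minimizes the same sum of squared residuals over one block of coefficients, so the residual sum of squares cannot increase from one iteration to the next.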

The SUR system in the first step is subject to many cross-equation restrictions implied by Shephard’s lemma. Specifically, as can be seen from Eqs. [18] and [19], the coefficients αi(tc), βin(tc), and ψij(tc) are common across the cost and share equations. To allow for such equality restrictions, we follow Wooldridge (2010, p.188) and Cameron and Trivedi (2005, p. 210) and redefine the regressors and coefficients given in Eqs. [18] and [19] so that the SUR system in the first step can be estimated by the method of least squares. Specifically, we first define the dependent variable vector and the disturbance vector. Let ql be an N × 1 vector representing the dependent variables associated with the lth observation with the first element being (ln \({\textstyle{{C_l^{t^c}\left( {{\boldsymbol{y}},{\boldsymbol{w}}} \right)} \over {w_{lN}}}}\) − \(\mathop {\sum}\nolimits_{k = 2}^K {\kern 1pt} \hat \lambda _kD_{lk}\)) and the second to the last elements being the N − 1 shares (i.e., sl,i, i = 1, …, N − 1), and ul = (ul1, …, ulN)′ be an N × 1 disturbance vector, whose variance-covariance matrix is Σ = E(ul\({\boldsymbol{u}}_l^\prime\)|Xl). We then define regressors and coefficients equation by equation. For the normalized cost equation, let Xl1 be a \(1 \times {\textstyle{{N^2 + M^2 + 2MN + M + N} \over 2}}\) vector representing all the non-dummy regressors in the normalized cost function (i.e., Eq. [18]) and let β(tc) be the corresponding coefficient vector, i.e., all coefficients for non-dummy variables in Eq. [18]. The first equation of the N-equation system can be written as

$$q_{l1} = {\boldsymbol{X}}_{l1}{\boldsymbol{\beta }}\left( {t^c} \right) + u_{l1}.$$

For the first normalized share equation, we still use β(tc) as our redefined coefficient vector. However, the regressor vector, Xl2, is redefined in such a way that Xl2β(tc) is equal to the right hand side of the first normalized share equation. Formally, Xl2 = (0, 1, \({\mathbf{0}}_{N + M - 2}\), \({\mathrm{ln}}{\textstyle{{w_{l2}} \over {w_{lN}}}}, \ldots ,{\mathrm{ln}}{\textstyle{{w_{l,N - 1}} \over {w_{lN}}}}\), \({\mathbf{0}}_{{\textstyle{{N^2 - 5N + 6} \over 2}}}\), \({\mathrm{ln}}{\textstyle{{w_{l1}} \over {w_{lN}}}}\), \({\mathbf{0}}_{N - 2 + {\textstyle{{M(M + 1)} \over 2}}}\), \({\mathrm{ln}}{\kern 1pt} y_{l1}\), 0N−2, …, ln ylM, 0N−2), where 0p is a 1 × p vector of zeros, so that Xl2 has the same length as Xl1. Thus, the second equation of the N-equation system can be written as

$$q_{l2} = {\boldsymbol{X}}_{l2}{\boldsymbol{\beta }}\left( {t^c} \right) + u_{l2}.$$

The ith (i = 2, 3, …, N − 1) normalized share equation can be redefined in a similar manner. Stacking all the N equations associated with the lth (l = 1, ⋯, KT) observation in the data set yields:

$${\boldsymbol{q}}_l = {\boldsymbol{X}}_l{\boldsymbol{\beta }}\left( {t^c} \right) + {\boldsymbol{u}}_l,$$
(20)
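To make the redefinition concrete, the following sketch builds the stacked regressor matrix for the case N = 3 and M = 1 used later in the paper. The ordering of the coefficient vector (intercept, first-order price and output terms, the off-diagonal price term, the own-price terms, the output quadratic term, and the price-output interactions) is an assumption chosen to match the layout of Xl2 described above, and the function name is hypothetical.

```python
import numpy as np

def stacked_regressors(w, y):
    """Return the 3 x 10 matrix X_l for N = 3 inputs and M = 1 output.

    w: length-3 array of input prices; y: scalar output.
    Assumed coefficient order:
    (a0, a1, a2, g1, b12, b11, b22, g11*, psi11, psi21).
    """
    w1, w2 = np.log(w[0] / w[2]), np.log(w[1] / w[2])  # normalized log prices
    ly = np.log(y)
    cost = [1, w1, w2, ly, w1 * w2, 0.5 * w1**2, 0.5 * w2**2,
            0.5 * ly**2, w1 * ly, w2 * ly]
    share1 = [0, 1, 0, 0, w2, w1, 0, 0, ly, 0]   # derivative of cost row w.r.t. w1
    share2 = [0, 0, 1, 0, w1, 0, w2, 0, 0, ly]   # derivative of cost row w.r.t. w2
    return np.array([cost, share1, share2], dtype=float)
```

Because every row multiplies the same coefficient vector, the cross-equation restrictions implied by Shephard’s lemma hold by construction: each share row is the derivative of the cost row with respect to the corresponding normalized log price.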

The entire system of equations associated with the KT observations can then be written as

$$\left[ {\begin{array}{*{20}{c}} {{\boldsymbol{q}}_1} \\ \vdots \\ {{\boldsymbol{q}}_{KT}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {{\boldsymbol{X}}_1} \\ \vdots \\ {{\boldsymbol{X}}_{KT}} \end{array}} \right]{\boldsymbol{\beta }}\left( {t^c} \right) + \left[ {\begin{array}{*{20}{c}} {{\boldsymbol{u}}_1} \\ \vdots \\ {{\boldsymbol{u}}_{KT}} \end{array}} \right],$$
(21)

which can be written more compactly as

$${\boldsymbol{q}} = {\boldsymbol{X\beta }}\left( {t^c} \right) + {\boldsymbol{u}},$$
(22)

where q and u are NKT × 1 vectors and X is a \(NKT \times {\textstyle{{N^2 + M^2 + 2MN + M + N} \over 2}}\) matrix. The NKT × 1 disturbance vector u has the following variance-covariance matrix: Ω = E(uu′) = IKT ⊗ Σ, where IKT is an identity matrix of dimension KT and ⊗ denotes the Kronecker product.

Following Li et al. (2013) and Henderson et al. (2015), we estimate the nonparametric categorical time-varying coefficient functions (β(tc)) by a kernel-based nonparametric method. Specifically, the least-squares estimator of β(tc) in Eq. [22] is the solution to

$${\mathbf{0}} = {\boldsymbol{X}}\prime {\boldsymbol{L}}\left( {t^c} \right)^{1/2}{\bf{\Omega }}^{ - 1}{\boldsymbol{L}}\left( {t^c} \right)^{1/2}\left[ {{\boldsymbol{q}} - {\boldsymbol{X\beta }}\left( {t^c} \right)} \right],$$
(23)

where L(tc) is a NKT × NKT kernel diagonal matrix with the ith (i = 1, 2, …, NKT) diagonal element defined as in Li et al. (2013)Footnote 3

$$l\left( {t_i^c,t^c,\lambda } \right) = \left\{ {\begin{array}{*{20}{c}} {1,} & {{\mathrm{when}}\,t_i^c = t^c} \\ {\lambda ,} & {{\mathrm{when}}\,t_i^c \ne t^c} \end{array}} \right.,$$
(24)

where λ is a smoothing parameterFootnote 4. The range of λ is from 0 to 1. λ = 0 leads to an indicator function, while λ = 1 gives a uniform weight function. Equation [24] indicates that when estimating β(tc), the kernel-based estimation method allows “borrowing” information from periods other than tc. It also indicates that, for any λ < 1, observations in period tc are given more weight than observations in other periods.

Solving for β(tc) in Eq. [23] leads to the estimator

$$\widehat {\boldsymbol{\beta }}\left( {t^c} \right) = \left[ {{\boldsymbol{X}}\prime {\boldsymbol{L}}\left( {t^c} \right)^{1/2}{\bf{\Omega }}^{ - 1}{\boldsymbol{L}}\left( {t^c} \right)^{1/2}{\boldsymbol{X}}} \right]^{ - 1}{\boldsymbol{X}}\prime {\boldsymbol{L}}\left( {t^c} \right)^{1/2}{\bf{\Omega }}^{ - 1}{\boldsymbol{L}}\left( {t^c} \right)^{1/2}{\boldsymbol{q}},$$
(25)

where the error covariance matrix Ω, as in the case of the standard feasible generalized least squares (FGLS) method for SUR models (Wooldridge, 2010 p.176), can be estimated by using the consistent system estimator which ignores the information in the variance-covariance matrix (i.e., by setting Ω = INKT). In this case, Eq. [25] reduces to

$$\widetilde {\boldsymbol{\beta }}\left( {t^c} \right) = \left[ {{\boldsymbol{X}}\prime {\boldsymbol{L}}\left( {t^c} \right){\boldsymbol{X}}} \right]^{ - 1}{\boldsymbol{X}}\prime {\boldsymbol{L}}\left( {t^c} \right){\boldsymbol{q}}.$$
(26)
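A minimal sketch of the first-stage estimator in Eq. [26], combined with the kernel in Eq. [24], is given below. The function names are hypothetical, and the inputs are assumed to be the stacked matrix X, the stacked vector q, and a vector giving the period of each stacked row.

```python
import numpy as np

def kernel_weights(t_obs, t_c, lam):
    """Eq. [24]: weight 1 for rows observed in period t_c, lam otherwise."""
    return np.where(np.asarray(t_obs) == t_c, 1.0, lam)

def local_ls(X, q, t_obs, t_c, lam):
    """Eq. [26]: (X' L X)^{-1} X' L q, with L the diagonal kernel matrix."""
    w = kernel_weights(t_obs, t_c, lam)
    Xw = X * w[:, None]                 # rows of X scaled by their weight, i.e. L X
    return np.linalg.solve(Xw.T @ X, Xw.T @ q)
```

Setting `lam` to zero reproduces ordinary least squares on the observations of period `t_c` alone, while setting it to one reproduces pooled least squares, which mirrors the two limiting cases described for λ.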

Using Eq. [26], we can obtain the N × 1 vector of residuals associated with the lth observation as \({\tilde{\boldsymbol u}}_l = {\boldsymbol{q}}_l - {\boldsymbol{X}}_l\widetilde {\boldsymbol{\beta }}\left( {t^c} \right)\) = \(\left[ {\tilde u_{l1},\tilde u_{l2}, \ldots ,\tilde u_{lN}} \right]\prime\). The estimate of the variance-covariance matrix is given by \(\widehat {\bf{\Sigma }} = \frac{1}{{KT}}\mathop {\sum}\nolimits_{l = 1}^{KT} {\kern 1pt} {\tilde{\boldsymbol u}}_l{\tilde{\boldsymbol u}}_l^\prime\), and hence we can construct our estimator of Ω.
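The second stage can be sketched as follows, under the assumption that all N equations of an observation share the same kernel weight (they belong to the same period) and that Ω is block diagonal with identical N × N blocks. In that case the weighting matrix in Eq. [25] is block diagonal as well, and the estimator reduces to weighted sums over observations. The function names and array layout are hypothetical.

```python
import numpy as np

def sigma_hat(resid):
    """resid: (KT, N) first-stage residuals; returns (1/KT) * sum_l u_l u_l'."""
    return resid.T @ resid / resid.shape[0]

def local_fgls(X, q, t_obs, t_c, lam, Sigma):
    """Eq. [25] with block-diagonal Omega.

    X: (KT, N, p) regressor blocks X_l; q: (KT, N); t_obs: (KT,) periods.
    """
    w = np.where(np.asarray(t_obs) == t_c, 1.0, lam)
    Sinv = np.linalg.inv(Sigma)
    # sum_l w_l X_l' Sigma^{-1} X_l  and  sum_l w_l X_l' Sigma^{-1} q_l
    lhs = np.einsum('l,lnp,nm,lmr->pr', w, X, Sinv, X)
    rhs = np.einsum('l,lnp,nm,lm->p', w, X, Sinv, q)
    return np.linalg.solve(lhs, rhs)
```

Passing an identity matrix for `Sigma` recovers the first-stage estimator, which provides a simple consistency check on the implementation.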

The choice of the smoothing parameter λ is crucial. When λ = 0, our estimator is equivalent to estimating T independent cost functions with one for each period, whereas when λ = 1, \(\widetilde {\boldsymbol{\beta }}\left( {t^c} \right)\) becomes unrelated to tc, implying that the coefficients are constant over time. When choosing λ, we follow Li et al. (2013) and minimize the following least squares cross-validation criterion:

$${\mathrm{CV}}\left( \lambda \right) = \frac{1}{{NKT}}\mathop {\sum}\limits_{j = 1}^{NKT} \left[ {g_j - {\boldsymbol{z}}_j\widehat {\boldsymbol{\beta }}_{ - j}\left( {t_j^c} \right)} \right]^2,$$

where gj is the jth row of q, zj is the jth row of X, and the leave-one-out estimate of the time-varying coefficients is expressed as

$$\widehat {\boldsymbol{\beta }}_{ - j}\left( {t_j^c} \right) = \left( {{\boldsymbol{X}}_{ - j}^\prime {\boldsymbol{L}}_{ - j}\left( {t_j^c} \right)^{1/2}{\bf{\Omega }}_{ - j}^{ - 1}{\boldsymbol{L}}_{ - j}\left( {t_j^c} \right)^{1/2}{\boldsymbol{X}}_{ - j}} \right)^{ - 1}{\boldsymbol{X}}_{ - j}^\prime {\boldsymbol{L}}_{ - j}\left( {t_j^c} \right)^{1/2}{\bf{\Omega }}_{ - j}^{ - 1}{\boldsymbol{L}}_{ - j}\left( {t_j^c} \right)^{1/2}{\boldsymbol{q}}_{ - j}$$

and the notation −j implies that the jth row is removed from Ω, L(tc), X and q.
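The cross-validation search can be sketched as a brute-force loop over a grid of candidate values. For brevity the sketch assumes Ω is the identity matrix, so the leave-one-out estimator is the weighted least squares formula without the covariance weighting; the function names and the grid are hypothetical.

```python
import numpy as np

def cv_score(X, q, t_obs, lam):
    """Leave-one-out CV(lam) for the kernel-weighted LS estimator (Omega = I)."""
    n = len(q)
    err = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j                          # drop row j
        w = np.where(t_obs[keep] == t_obs[j], 1.0, lam)   # kernel centered at t_j
        Xw = X[keep] * w[:, None]
        beta = np.linalg.solve(Xw.T @ X[keep], Xw.T @ q[keep])
        err[j] = q[j] - X[j] @ beta
    return np.mean(err ** 2)

def select_lambda(X, q, t_obs, grid):
    """Return the grid value with the smallest cross-validation score."""
    scores = [cv_score(X, q, t_obs, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
```

The loop refits the model once per row and per grid point, which is adequate for a sample of this size but would need a more efficient updating formula for much larger panels.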

5 Data

Our data were provided by Kumbhakar and Tsionas (2011) and consist of annual data for 81 investor-owned electric utilities in the United States over the period 1986–1998, giving a balanced panel of 81 × 13 = 1053 observations. The choice of U.S. electric utilities is particularly relevant considering the numerous studies of productivity in this industry.

With regard to the specification of outputs and inputs, one output is specified (i.e., M = 1) and represented by net steam electric power generation in megawatt-hours, which is defined as the amount of power produced using fossil-fuel fired boilers to produce steam for turbine generators during a given period of time. On the input side, three inputs are specified (i.e., N = 3): the aggregate of labor and maintenance, fuels, and capital stocks. The aggregate price of labor and maintenance is a cost-share weighted price for labor and maintenance. The price of labor is a company-wide average wage rate. The price of maintenance and other supplies is a price index of electrical supplies from the Bureau of Labor Statistics. The weight is calculated from the labor cost share of nonfuel variable costs for those utilities with entirely steam power production. Quantities of labor and maintenance equal the aggregate costs of labor and maintenance divided by a cost-share weighted price for labor and maintenance. The price of fuel aggregate is a Törnqvist price index of fuels (i.e., coal, oil, gas). The fuel quantities are calculated by dividing the fuel expenses by the Törnqvist price of fuel aggregate. The values of capital stocks are calculated by the valuation of base and peak load capacity at replacement cost to estimate capital stocks in a base year and then updating it in the subsequent years based upon the value of additions and retirements to steam power plant. The price of capital is the yield of the firm’s latest issue of long-term debt adjusted for appreciation and depreciation of the capital good using the Christensen and Jorgenson (1970) cost of capital formula.

6 Empirical results

In this section, we compare empirical results among the three models, namely, the categorical time-varying coefficient model, the standard time trend model, and the Baltagi and Griffin (1988) model. We estimate the three models separately for the electric utilities, and report the estimated parameters and their associated standard errors in Tables 1–3. As can be seen from Table 1, the point estimates of the coefficients for the categorical time-varying coefficient model vary considerably over time. Taking α1 for example, its point estimate varies markedly from 0.120 to 0.614.

Table 1 Parameter estimates for the categorical time-varying coefficient model for years 1986–1998
Table 2 Parameter estimates for the standard time trend model
Table 3 Parameter estimates for the Baltagi and Griffin (1988) model

With regard to the smoothing parameter (λ) for the categorical time-varying coefficient model, its estimate is very close to zero (0.039), indicating that the categorical time variable, tc, has a strong impact on the coefficients of the model. It is also indicative of an estimator that is close to the pure frequency estimator, where T independent cost systems are estimated with one for each period. The small bandwidth could be due to the fact that least-squares cross-validation tends to select a relatively small bandwidth (undersmoothing) (Chu et al. 2017). In our case where a geometric discrete kernel is used, the small bandwidth could also be an artifact of the selection mechanism, rather than the appropriate amount of smoothing (Rajagopalan and Lall, 1995). Having said that, we find that the estimates of productivity growth and technical change produced by the categorical time-varying coefficient model (presented in Subsections 6.2 and 6.3) are very close to those produced by the fully parametric Baltagi and Griffin (1988) model, whose results do not depend on bandwidth. This finding suggests indirectly that undersmoothing may not be a serious issue in this paper.

We check monotonicity and curvature for each of the three models. Monotonicity requires that the first-order derivatives of the cost function, which correspond to input demands, be nonnegative. Curvature requires that the cost function be a concave function of prices or, equivalently, that the Hessian matrix of the cost function be negative semidefinite (see Serletis and Feng, 2015). We find that monotonicity is satisfied for every observation for each model. However, concavity violations are observed for 149 of the 1053 observations in the standard time trend model, for all observations in the Baltagi and Griffin (1988) modelFootnote 5, and for 340 of the 1053 observations in the categorical time-varying coefficient model. A possible reason for the large number of violations in the three models is that some electric utilities may not be cost minimizers throughout the sample period. Specifically, until the mid-1990s electric utilities in the U.S. typically operated as state-regulated monopolies under the jurisdiction of regulatory commissions in each state. Because of asymmetric information between these regulatory bodies and producers, effort-averse managers may engage in inefficient activities, resulting in possible deviations from cost-minimization (Fabrizio et al., 2007). These distortions may be amplified because electric power rates were set by the asymmetrically informed regulatory bodies (Laffont and Tirole, 1993; Fabrizio et al., 2007).
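For readers who wish to reproduce such checks, a minimal sketch is given below. It relies on a standard property of the translog form: with positive prices and cost, the Hessian of the cost function with respect to prices is negative semidefinite at a data point if and only if the matrix B − diag(s) + ss′ is, where B collects the second-order price coefficients βin and s is the vector of fitted cost shares at that point. The function name and tolerance are hypothetical, and the sketch is not claimed to be the exact procedure used for the counts reported above.

```python
import numpy as np

def check_regularity(B, s, tol=1e-10):
    """Check monotonicity and concavity of a translog cost function at one point.

    B: (N, N) symmetric matrix of beta_in; s: (N,) fitted cost shares.
    Returns (monotone, concave) as booleans.
    """
    s = np.asarray(s, dtype=float)
    H = B - np.diag(s) + np.outer(s, s)       # Hessian up to a positive congruence
    monotone = bool(np.all(s >= 0))           # nonnegative fitted shares
    concave = bool(np.max(np.linalg.eigvalsh(H)) <= tol)
    return monotone, concave
```

Applying the function to each observation and counting the failures gives violation totals of the kind reported in this section.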

6.1 Comparison of the categorical time-varying coefficient model and the standard time trend model

Note that the standard time trend model is a special case of the categorical time-varying coefficient model when the coefficient functions of the latter model reduce to constants. This fact enables us to formally compare these two models using the parameter constancy test in Li and Racine (2010). To this end, we first rewrite the standard time trend model as follows:

$$\begin{array}{*{20}{l}} {{\mathrm{ln}}{\kern 1pt} C\left( {{\boldsymbol{y}},{\boldsymbol{w}},t} \right)} \hfill & = \hfill & {\mathop {\sum}\limits_{k = 2}^K {\kern 1pt} \tilde \lambda _kD_k + \left( {\alpha _0 + \tau t + \frac{1}{2}\delta t^2} \right) + \mathop {\sum}\limits_{i = 1}^N \left( {\alpha _i + \phi _it} \right){\mathrm{ln}}{\kern 1pt} w_i} \hfill \\ {} \hfill & + \hfill & {\mathop {\sum}\limits_{j = 1}^M \left( {\gamma _j + \varphi _jt} \right){\mathrm{ln}}{\kern 1pt} y_j + \frac{1}{2}\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{n = 1}^N {\kern 1pt} \beta _{in}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_n} \hfill \\ {} \hfill & + \hfill & {\frac{1}{2}\mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \mathop {\sum}\limits_{m = 1}^M {\kern 1pt} \gamma _{jm}^ \ast {\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_m + \mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \mathop {\sum}\limits_{j = 1}^M {\kern 1pt} \psi _{ij}{\kern 1pt} {\mathrm{ln}}{\kern 1pt} w_i{\kern 1pt} {\mathrm{ln}}{\kern 1pt} y_j.} \hfill \end{array}$$
(27)

If we treat \(\left( {\alpha _0 + \tau t + {\textstyle{1 \over 2}}\delta t^2} \right)\) in Eq. [27] as the coefficient for the constant term, (αi + ϕit) as the coefficient for ln wi, and (γj + φjt) as the coefficient for ln yj, then a comparison of Eq. [27] with the categorical time-varying coefficient model in Eq. [9] reveals that the former model is a special case of the latter model at t = tc = 1, 2, …, T. More specifically, let θ0(tc) denote the vector of coefficients of Eq. [27] at period tc and θ(tc) denote the vector of coefficients of the categorical time-varying coefficient model at the same period; then θ(tc) nests θ0(tc) as a special parametric case. Thus, it is of interest to test whether θ(tc) is of the parametric form θ0(tc) at t = tc = 1, 2, …, T. If so, we should estimate the standard time trend model, because correctly specified parametric models are relatively more efficient than their semiparametric counterparts. Otherwise, we should estimate the semiparametric categorical time-varying coefficient model, because misspecified parametric models will lead to inconsistent results.

When conducting the test, we employ the parameter constancy test in Li and Racine (2010). Specifically, we formulate the null hypothesis as follows: H0 : Pr(θ(tc) = θ0(tc)) = 1, i.e., the probability of θ(tc) being equal to θ0(tc) is one. Following Li and Racine (2010), we use a modified version of the sum of squared differences \(\left( {\mathop {\sum}\nolimits_{t^c = 1}^T \left[ {\hat \theta \left( {t^c} \right) - \hat \theta _0\left( {t^c} \right)} \right]\prime \left[ {\hat \theta \left( {t^c} \right) - \hat \theta _0\left( {t^c} \right)} \right]} \right)\) as our test statistic. Here, a hat (∧) indicates a parameter estimate. As shown in Li and Racine (2010), the test statistic follows a standard normal distribution under the null hypothesis. Since the test statistic is notationally complicated, we omit it here to save space. Interested readers are referred to Eq. [27] and Theorem 4.1 of Li and Racine (2010, p. 1617) for the test statistic and its distribution, respectively. In our particular case, the test statistic is 6.667 with an associated p-value of approximately zero; hence we reject the null that the general nonparametric coefficient function θ(tc) is of the parametric form θ0(tc).
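The magnitude of the p-value can be checked with a short calculation. The sketch below assumes a one-sided test that rejects for large positive values of the statistic, which is an assumption made here for illustration; the function name is hypothetical.

```python
import math

def upper_tail_p(z):
    """Upper-tail probability of a standard normal variate at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_value = upper_tail_p(6.667)   # on the order of 1e-11
```

A value of this order is consistent with the description of the p-value as approximately zero, and a two-sided version would simply double it.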

We note that while semi-parametric or nonparametric models in general offer additional flexibility in comparison with parametric models, this comes at the cost of potential overparameterization because of the large effective number of parameters required. This is a particular concern if the sample size is small. In our case, the effective number of parameters for the semi-parametric categorical time-varying coefficient model is 67Footnote 6, higher than the number of the parameters in the standard time trend model Eq. [25] and that in the Baltagi and Griffin (1988) model Eq. [30]. This relatively higher number of parameters, together with our relatively small sample size, results in wide confidence intervals for some of the time-specific parameters and price elasticities in Table 1 and Figs. 5-13. Therefore, we caution the reader that the categorical time-varying coefficient model does require a large sample size (especially a large number of firms) in order to avoid wide confidence intervals.

Fig. 1 Estimates of productivity growth
Fig. 2 95% Confidence intervals for productivity growth estimated from the categorical time-varying model
Fig. 3 Estimates of scale effects
Fig. 4 Estimates of technical change
Fig. 5 Estimates of own elasticity of labor
Fig. 6 Estimates of elasticity of demand for labor with respect to the price of fuel
Fig. 7 Estimates of elasticity of demand for labor with respect to the price of capital
Fig. 8 Estimates of elasticity of demand for fuel with respect to the price of labor
Fig. 9 Estimates of own elasticity of fuel

6.2 Estimates of total factor productivity

In this subsection, we compare the performance of the three models in terms of their ability to estimate total factor productivity growth. In doing so, we compute four industry-level TFPG indexes with the first one based on the standard time trend model (denoted by TFPGTime), the second one based on the categorical time-varying coefficient model (denoted by TFPGCTC), the third one based on the Baltagi and Griffin (1988) model (denoted by TFPGBG), and the fourth one based on the discrete Divisia TFPG index (denoted by TFPGDivisia). Specifically, TFPGTime is obtained by first computing utility-level total factor productivity growth using Eq. [8] and then computing an industry-level index as an average of the 81 utility-level TFPG estimates. TFPGCTC, TFPGBG and TFPGDivisia are obtained in a similar way but by using Eqs. [12], [17], and [3], respectively. As discussed in the Introduction, TFPGDivisia is used here as a benchmark for assessing TFPGTime, TFPGBG and TFPGCTC, because the discrete Divisia index does not require direct estimation of the underlying technology, satisfies many desirable statistical properties, and also is widely used by major statistical agencies around the world (see, for example, Good et al. 1997; Dean et al. 1996; Hulten, 2001). We also construct 95% confidence intervals for TFPGCTC using 1000 bootstrap replications.

Figure 1 plots TFPGTime, TFPGBG, TFPGCTC, and TFPGDivisia over the sample period. To avoid graphical clutter, the 95% bootstrap confidence intervals for TFPGCTC are plotted in Fig. 2. We first compare TFPGTime and TFPGDivisia. As can be seen from Fig. 1, TFPGTime shows much less variation than TFPGDivisia and is roughly a smoothed version of TFPGDivisia. To better observe this, we first examine the temporal pattern of the benchmark series, TFPGDivisia. Roughly speaking, the series TFPGDivisia can be divided into five segments: 1987–1988, 1988–1989, 1989–1993, 1993–1996, and 1996–1998, with the first, third and fifth segments representing three productivity slowdowns and the second and fourth segments representing two productivity resurgences. More specifically, in the first segment it decreases significantly from 8.37% in 1987 to 0.98% in 1988; in the second segment it rebounds from 0.98% in 1988 to 6.08% in 1989; in the third segment it declines substantially from 6.08% in 1989 to −3.14% in 1993; in the fourth segment it rebounds from −3.14% in 1993 to 3.68% in 1996; and in the last segment it drops sharply from 3.68% in 1996 to −3.78% in 1998. Turning now to TFPGTime, we see that for each of the five TFPGDivisia segments, TFPGTime passes close by the mean of the segment. Considering the third segment as an example, the TFPGTime series crosses this segment at a point where TFPG is approximately 0, which is close to the mean (0.0006) of this segment. This suggests that the series TFPGTime can be regarded as being obtained by linking the means of the five segments. In this sense, TFPGTime is roughly a smoothed version of TFPGDivisia.

Fig. 10 Estimates of elasticity of demand for fuel with respect to the price of capital
Fig. 11 Estimates of elasticity of demand for capital with respect to the price of labor

Turning to TFPGBG and TFPGCTC, we see from Fig. 1 that both indexes track TFPGDivisia very closely, dropping in 1988, 1991, 1993, 1995, 1998 and rising in 1989, 1994, 1996. Thus, compared with the standard time trend model, both the categorical time-varying coefficient model and the Baltagi and Griffin (1988) model are capable of producing productivity estimates that closely track the discrete Divisia index.

To quantify the performance of TFPGTime, TFPGBG, and TFPGCTC, we calculate the mean squared error (MSE) for each of them: \({\mathrm{MSE}}\) = \({\textstyle{1 \over T}}\mathop {\sum}\nolimits_{t = 1}^T \left( {\mathrm{{T}}\widehat {\mathrm{{FP}}}{\mathrm{G}}_t - {\mathrm{TFPG}}_t^{\mathrm{{Divisia}}}} \right)^2\), where \({\mathrm{T}}\widehat {\mathrm{{FP}}}{\mathrm{G}}_t\) stands for an estimate of industry-level TFPG for period t (e.g., \({\mathrm{TFPG}}_t^{\mathrm{{CTC}}}\), \({\mathrm{TFPG}}_t^{\mathrm{{BG}}}\), or \({\mathrm{TFPG}}_t^{\mathrm{{Time}}}\)). Our results show that TFPGCTC has an MSE of 0.016%, TFPGBG has a slightly higher MSE of 0.021%, whereas TFPGTime has a much higher MSE of 0.067%. This confirms that both the categorical time-varying coefficient model and the Baltagi and Griffin (1988) model track the discrete Divisia index more closely than does the standard time trend model.
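The MSE comparison amounts to the short calculation sketched below. The function name is hypothetical. The last lines convert the reported MSE values into root mean squared errors under the assumption, made here only for illustration, that a reported value such as 0.016% corresponds to 0.00016 when growth rates are expressed as decimals.

```python
import math
import numpy as np

def mse(estimate, benchmark):
    """Mean squared deviation of an estimated TFPG series from the benchmark."""
    estimate = np.asarray(estimate, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    return float(np.mean((estimate - benchmark) ** 2))

# Reported MSEs read as decimals (an assumption about units, see the lead-in).
reported = {"CTC": 0.00016, "BG": 0.00021, "Time": 0.00067}
rmse = {name: math.sqrt(value) for name, value in reported.items()}
```

Under that reading, the root mean squared errors are roughly 1.3, 1.4, and 2.6 percentage points for the three models, respectively.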

We would like to point out here that in general one should not prefer one parametric model over another because the former can produce productivity estimates closer to those produced by a non-parametric modelFootnote 7. This is because endogeneity in general cannot be easily addressed in most nonparametric models, whereas it can be easily addressed in parametric models. For example, productivity estimates obtained from a DEA non-parametric model in general cannot be used as a standard for evaluating parametric models that do treat endogeneity, because most DEA non-parametric models cannot handle endogeneity. For the same reason, Törnqvist productivity estimates in general cannot be used as a standard to compare parametric models that treat endogeneity. In our case, however, both the categorical time-varying coefficient cost function and the Törnqvist index are based on prices and output, which can be considered to be exogenous under the assumption of cost minimization. However, if a distance function were estimated then endogeneity would have to be dealt with.

6.3 Decomposition of total factor productivity into technical change and scale effects

An interesting question to ask at this point is: what causes the lack of variation in TFPGTime? To answer this question, we decompose each of the three TFPG indexes, TFPGTime, TFPGBG, and TFPGCTC, into two components: technical change and scale effectsFootnote 8. For notational clarity, let TCTime (SCTime) denote technical change (scale effects) obtained using the standard time trend model, TCBG (SCBG) denote technical change (scale effects) obtained using the Baltagi and Griffin (1988) model, and TCCTC (SCCTC) denote technical change (scale effects) obtained using the categorical time-varying coefficient model. Figure 3 presents the estimates of SCTime, SCBG, and SCCTC. Looking at this figure, we see that SCTime, SCBG, and SCCTC closely track each other, with the latter two almost coinciding. This suggests that SCTime, SCBG, and SCCTC show a similar degree of variation, which in turn implies that scale effects cannot lead to the lack of variation in TFPGTime.

Fig. 12 Estimates of elasticity of demand for capital with respect to the price of fuel

Figure 4 presents the estimates of TCTime, TCBG, and TCCTC. As can be seen, TCBG and TCCTC closely track each other, both showing large year-to-year variation. In contrast, TCTime declines in a linear fashion over the sample period, suggesting that it is the lack of variability in TCTime that leads to the lack of variation in TFPGTime. In fact, TCTime is roughly a smoothed version of TCCTC. To see this, we follow Feng and Serletis (2008) and obtain a smoothed TCCTC series by regressing the raw TCCTC series on firm dummies and a continuous time trend, calculating the fitted values, and aggregating across individual electric utilities. The smoothed TCCTC is also plotted in Fig. 4. As can be seen, TCTime follows a pattern similar to that of the smoothed TCCTC, confirming that TCTime is roughly a smoothed version of TCCTC. In fact, TCTime is also a smoothed version of TCBG. However, to avoid graphical clutter, we do not plot the smoothed version of TCBG.
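The smoothing step can be sketched as follows, assuming a linear time trend, a full set of firm dummies in place of a separate intercept, and a simple average across firms as the aggregation rule. These choices and the function name are assumptions made for illustration.

```python
import numpy as np

def smoothed_industry_series(tc):
    """Regress firm-level TC on firm dummies and a linear trend, then average.

    tc: (K, T) array of firm-by-year technical change estimates.
    Returns a length-T industry-level series of averaged fitted values.
    """
    K, T = tc.shape
    firm = np.repeat(np.arange(K), T)                       # firm index of each row
    trend = np.tile(np.arange(1, T + 1), K).astype(float)   # t = 1, ..., T
    D = (firm[:, None] == np.arange(K)).astype(float)       # firm dummies
    Z = np.column_stack([D, trend])
    coef, *_ = np.linalg.lstsq(Z, tc.ravel(), rcond=None)
    fitted = (Z @ coef).reshape(K, T)
    return fitted.mean(axis=0)
```

By construction the resulting series is linear in time, which is why it can be compared directly with a technical change series that moves in a linear fashion.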

Fig. 13 Estimates of own elasticity of capital

It is worth noting here that as the bandwidth (λ) increases, one would expect TCTime and TCCTC to converge. In particular, when the bandwidth approaches one, there will be little difference between TCTime and TCCTC, because in that case the categorical time-varying coefficient model reduces to the standard time trend model. When the bandwidth approaches zero (as in our case), there will be a large difference between TCTime and TCCTC.
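To make the role of the bandwidth concrete, the sketch below assumes a kernel for an unordered categorical variable that gives weight 1 to observations from the target period and weight λ to observations from every other period. This kernel form is an assumption made for illustration (the model's kernel is defined earlier in the paper), and the function name `period_weights` is hypothetical.

```python
import numpy as np

def period_weights(years, target_year, lam):
    """Kernel weight of each observation when fitting coefficients for target_year.

    Assumed kernel: 1 if the observation's year equals target_year, lam otherwise,
    with 0 <= lam <= 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    years = np.asarray(years)
    return np.where(years == target_year, 1.0, lam)
```

Under this assumed kernel, λ = 1 weights all periods equally, so every period is fit with the same pooled information, while λ = 0 fits each period using only its own observations.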

It would be of interest to discuss possible reasons for the large variation in technical change found with the categorical time-varying coefficient model. Before that, we emphasize the following two points. First, the estimates of technical change produced by the well-known Baltagi and Griffin (1988) model also show a similar, large year-to-year variation, providing support for the large variation in technical change found with the categorical time-varying coefficient model. Second, in practice technical change not only includes technological change, but also captures effects of other factors that shift the cost frontier such as regulatory changes, improvements in input quality, and organizational innovations (see, for example, Griliches, 1994; Hulten, 2001).

In our case, the estimates of technical change produced by the categorical time-varying coefficient model (i.e., TCCTC in Fig. 4) reveal two pronounced temporal changes: a technical change slowdown over 1991–1993 and a technical change resurgence in the post-1993 period. The resurgence is likely to have been caused by the deregulation of the electric industry, which began with the Energy Policy Act of 1992 (see footnote 9). This deregulation is well documented to have reduced costs and improved technical efficiency in investor-owned utilities in restructured states. For example, using annual plant-level data for large fossil-fueled generating plants owned by US electric utilities over the period 1985–1999, Fabrizio et al. (2007) find that investor-owned plants in restructuring regimes reduced their labor and non-fuel operating expenses by 6 to 12 percent relative to government- and cooperatively owned plants that were largely insulated from restructuring incentives. In addition, previous studies find that the deregulation reduced costs and improved technical efficiency in investor-owned utilities in non-restructured states as well. For example, Fabrizio et al. (2007, p. 1266) find that “even IOUs in nonrestructuring regimes improved their input use to a large extent, perhaps in response to latent threats of increased competition and restructuring.” Rungsuriyawiboon and Stefanou (2007, p. 233) also find that the deregulation of the electricity industry resulted in “an increase of allocative and technical efficiencies of variable inputs for both electric utilities located in and outside states with a deregulation plan”, although “electric utilities located in states with a deregulation plan have higher increases of average technical efficiency of variable inputs than those located outside states with a deregulation plan.” These findings suggest that the deregulation increased the technical efficiency of investor-owned utilities in both restructured and non-restructured states, and thus is a likely cause of the technical change resurgence in the post-1993 period.

The technical change slowdown in the early 1990s is likely due to the Clean Air Act Amendments (CAAA) of 1990, which established a sulfur dioxide (SO2) allowance trading program intended to cut nationwide emissions of SO2 by 50 percent below 1980 levels by the year 2000. The program was implemented in two stages: Phase I (starting January 1, 1995) and Phase II (starting January 1, 2000). In Phase I, half the total reductions were required by January 1, 1995, largely by requiring 110 electric power generating plants to cut sulfur dioxide emission rates to 2.5 lbs/million British thermal units (mmBtu). According to the U.S. Department of Energy (1997), Phase I utilities, which accounted for 62 percent of total SO2 emissions of the industry in 1990, reduced their SO2 emissions by 45.4 percent between 1990 and 1995. This significant reduction in SO2 emissions was achieved by methods such as switching to expensive lower-sulfur coal, installing flue gas desulfurization equipment, and retiring units (U.S. Department of Energy 1997). All of these emission-reducing methods increased the operating costs of the utilities for a given level of output and thus plausibly lowered measured technical change in the early 1990s. Again, this is because in practice technical change not only includes technological change, but also captures the effects of other factors that shift the cost frontier.

6.4 Estimates of price elasticities

As discussed in the Introduction, the standard normalized quadratic (NQ) functional form suffers from “the problem of trending elasticities”, that is, the price elasticities produced by this functional form often exhibit little variation over time. This problem raises an intriguing question: Do the three competing translog models in the present paper suffer from the same problem? Figs. 5–13 present the estimates of own price elasticity of labor (η11), elasticity of demand for labor with respect to the price of fuel (η12), elasticity of demand for labor with respect to the price of capital (η13), elasticity of demand for fuel with respect to the price of labor (η21), own price elasticity of fuel (η22), elasticity of demand for fuel with respect to the price of capital (η23), elasticity of demand for capital with respect to the price of labor (η31), elasticity of demand for capital with respect to the price of fuel (η32), and own price elasticity of capital (η33). For each model, firm-level estimates of price elasticity of demand for input i with respect to input price n are calculated as

$$\eta _{in} = \beta _{in}{\mathrm{/}}s_i + s_n - \delta _{in}, \tag{28}$$

where βin is the coefficient for ln wi ln wn, si (sn) is the cost share of input i (n), and δin = 1 if i = n and 0 otherwise. These firm-level estimates are then aggregated to produce industry-level estimates of price elasticities. In each figure, the solid line shows the estimates obtained from the categorical time-varying coefficient model and the dotted lines show the associated 95% bootstrap confidence intervals. The dashed line shows the estimates obtained from the standard time trend model. The dashdotted line shows the estimates obtained from the Baltagi and Griffin (1988) model.
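As a minimal sketch of Eq. (28), the function below computes the full matrix of firm-level price elasticities from a coefficient matrix and a vector of cost shares. The function name `price_elasticities` is hypothetical, and the aggregation from firm level to industry level is not shown because its weights are not specified in this section.

```python
import numpy as np

def price_elasticities(beta, shares):
    """Return eta with eta[i, n] = beta[i, n] / s_i + s_n - delta_in (Eq. 28).

    beta:   (k, k) matrix of coefficients on ln w_i ln w_n.
    shares: length-k vector of cost shares (s_1, ..., s_k).
    """
    beta = np.asarray(beta, dtype=float)
    s = np.asarray(shares, dtype=float)
    # s[:, None] divides row i by s_i; s[None, :] adds s_n to column n;
    # the identity matrix supplies the Kronecker delta.
    return beta / s[:, None] + s[None, :] - np.eye(len(s))
```

When each row of the coefficient matrix sums to zero (linear homogeneity in input prices) and the shares sum to one, each row of the resulting elasticity matrix also sums to zero, which provides a quick sanity check on any estimates.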

Looking at the industry-level estimates of the own elasticity of labor demand implied by the standard time trend model (i.e., \(\eta _{11}^{\mathrm{{Time}}}\)) in Fig. 5, we see that it varies within a very narrow range between −0.7510 and −0.7398. This result is not surprising. As can be seen from Eq. (28), δin is a constant, while si and sn are cost shares that do not vary much in practice, especially over a short period of time. If the coefficient, βin, is also restricted to be constant over time as in the standard time trend translog cost function, then ηin is bound to show little variation. Thus, as with the standard normalized quadratic functional form, the standard translog functional form suffers from a similar problem (i.e., a lack of variation in price elasticities), due largely to the time-invariant nature of βin.

Turning to the industry-level estimates of the own elasticity of labor demand implied by the Baltagi and Griffin (1988) model (denoted by \(\eta _{11}^{\mathrm{{BG}}}\)) in Fig. 5, we see that \(\eta _{11}^{\mathrm{{BG}}}\) almost coincides with \(\eta _{11}^{\mathrm{{Time}}}\), showing little variation over the sample period. This finding is not surprising, because although the use of time dummies enables the Baltagi and Griffin (1988) model to track the Divisia index closely, it still does not allow βin, on which the price elasticities are based, to vary over time. Specifically, as in the case of the standard time trend model, the price elasticity of demand for input i with respect to input price n in the Baltagi and Griffin (1988) model is also calculated as

$$\eta _{in} = \beta _{in}{\mathrm{/}}s_i + s_n - \delta _{in},$$

where βin is the coefficient for ln wi ln wn, si (sn) is the cost share of input i (n), and δin = 1 if i = n and 0 otherwise. As can be seen from this expression, the constancy of βin, together with the facts that δin is a constant and that si and sn do not vary much in practice, implies that the estimates of price elasticities based on the Baltagi and Griffin (1988) model are bound to show little variation.

Turning lastly to the industry-level estimates of the own elasticity of labor demand implied by the categorical time-varying coefficient model (i.e., \(\eta _{11}^{\mathrm{{CTC}}}\)) in Fig. 5, we see that it shows more year-to-year variation. Specifically, it starts at −0.8550 in 1986, rises to −0.7612 in 1987, remains at roughly the same value in 1988, drops to −0.8128 in 1989, rebounds to −0.7463 in 1990, falls to −0.8032 in 1991, rises to −0.6042 in 1993, falls to −0.6955 in 1994, rebounds to −0.6249 in 1995, and falls to −1.0217 in 1998. The finding that \(\eta _{11}^{\mathrm{{CTC}}}\) shows much more variation is not surprising, because β11 is no longer a constant in the categorical time-varying coefficient model. Instead, it varies from time period to time period, thus allowing η11 to vary from one period to another even when δ11 is a constant and s1 does not vary much. More specifically, β11 is a nonparametric function of the categorical time variable (tc) and thus is very flexible with respect to time. This flexible treatment allows β11 to vary over time, which in turn allows price elasticities (i.e., η11) to vary over time.
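The contrast among the three models can be read off the total differential of the elasticity formula. The following is a sketch derived directly from the expression for ηin given above, treating βin, si, and sn as the only quantities that can change from one period to the next (δin is a constant and drops out):

$$d\eta _{in} = \frac{1}{s_i}\,d\beta _{in} - \frac{\beta _{in}}{s_i^{2}}\,ds_i + ds_n$$

In the standard time trend model and the Baltagi and Griffin (1988) model, dβin = 0, so only the two share terms remain, and these are small whenever cost shares move little. In the categorical time-varying coefficient model, the first term is free to change from period to period, and because it is scaled by 1/si it can produce sizable movements in ηin even when the shares are nearly constant.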

It is worth noting that in Fig. 5 the elasticity estimates obtained from the standard time trend model (i.e., \(\eta _{11}^{\mathrm{{Time}}}\)) fall into the 95% bootstrap confidence intervals for those obtained from the categorical time-varying coefficient model (i.e., \(\eta _{11}^{\mathrm{{CTC}}}\)). There are two possible reasons why this happens. First, as can be seen from the figure, the dashed line representing \(\eta _{11}^{\mathrm{{Time}}}\) is roughly a smoothed version of the solid line representing \(\eta _{11}^{\mathrm{{CTC}}}\), making it very likely that the former line falls into the 95% confidence intervals associated with the latter line. Second, as noted above, the categorical time-varying coefficient model requires a large number of observations for each period (i.e., a large number of firms) in order to obtain tight elasticity confidence intervals. In our case, however, we have only 81 observations (firms) for each period, which may not be large enough to produce tight elasticity confidence intervals. Therefore, we would like to warn the reader again that the categorical time-varying coefficient model does require a large sample size (especially a large number of firms) in order to avoid wide confidence intervals.

It would be of interest to discuss why the price elasticities varied substantially over time. Before that, note that this is not the first study that attempts to correct the problem of trending elasticities inherent in locally flexible functional forms. For example, Diewert and Lawrence (2002) noted that the normalized quadratic (NQ) function suffered from this problem because the substitution matrix of this function was constant over time. To solve this problem, they allowed the substitution matrix to change over time by setting the matrix equal to a weighted average of a matrix C (which characterizes substitution possibilities at the beginning of the sample period) and a matrix D (which characterizes substitution possibilities at the end of the sample period). In other words, they allowed the NQ function to be flexible at two points (the first sample point and the last) instead of the usual one point. By applying this modified NQ model to Australian aggregate data for 1967–1997, Diewert and Lawrence (2002) found that some of their price elasticities showed large year-to-year variations.
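The interpolation idea can be illustrated with a short sketch. It assumes weights that move linearly from the first period to the last, which is one natural reading of “weighted average” but is an assumption here rather than a restatement of the Diewert and Lawrence (2002) specification. The function name `substitution_matrix` is hypothetical.

```python
import numpy as np

def substitution_matrix(C, D, t, T):
    """Substitution matrix for period t (1..T) as a weighted average of C and D.

    C describes substitution possibilities at the start of the sample,
    D at the end. Assumed weight on D: (t - 1) / (T - 1), linear in time.
    """
    if T < 2 or not 1 <= t <= T:
        raise ValueError("need T >= 2 and 1 <= t <= T")
    w = (t - 1) / (T - 1)
    return (1.0 - w) * np.asarray(C, dtype=float) + w * np.asarray(D, dtype=float)
```

With this weighting the matrix equals C in the first period and D in the last, so the functional form is pinned down at both end points while intermediate periods blend the two.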

Turning back to our case, the large variations in the price elasticities are not surprising, considering that the restructuring of the U.S. electric power industry in the 1990s has been described as “one of the largest single industrial reorganizations in the history of the world” (Kwoka, 2008). This restructuring changed the production process of the electric utilities in many ways. First, many vertically integrated electric utilities sold some or all of their generating plants to non-utility producers, thereby dissolving the structure traditionally relied upon to capture economies of coordination among vertical stages of electricity production (Kwoka, 2002; Wolfram, 2005; Borenstein and Bushnell, 2015). Second, electric utilities transitioned from a cost-of-service regulation model, in which they were compensated based upon average production cost, to a market-based pricing model, under which these assets earned a market price for the output they were able to produce (Borenstein and Bushnell, 2015). This transition changed another important aspect of the production process: the input mix (Wolfram, 2005). Specifically, as is well known, a major problem with the cost-of-service regulation model is that firms regulated in this manner tend to engage in excessive capital accumulation in order to expand the volume of their profits, thus resulting in inefficiently high capital-labor ratios. For example, studies find that during the 1970s and 1980s, some utilities invested in nuclear power plants that far exceeded their budgets (Wolfram, 2005). However, utilities under the market-based pricing model are less likely to engage in capital over-accumulation and more likely to choose an input mix that minimizes their total costs at prevailing input prices (Wolfram, 2005).
Third, facilities incorporating new technologies, which were smaller and more efficient than the existing large facilities employed by utilities, prompted nonutility producers to enter the competitive power market. According to the U.S. Department of Energy (DOE, 1996, p. 38), “No longer is it necessary to build a 1,000 megawatt generating plant to exploit economies of scale”, because new technologies such as aero-derivative gas turbines could be efficient at scales as small as 10 megawatts. These new technologies met the needs of nonutilities and enabled them “to generate electricity more cheaply than the total (regulatory) costs of many utilities that now use previously developed fossil-fueled or nuclear-fueled technologies” (DOE, 1996, p. 35).

All the changes in the production process discussed above could potentially affect price elasticities. For example, using a 1992–2000 panel of 34 major US investor-owned electric utilities, Granderson and Forsund (2014) found that the shift from cost-of-service regulation to competition significantly changed own input price elasticities of demand for labor and fuel. Here, we would like to point out that restructuring initiatives progressed at different paces in different states. This implies that the changes in price elasticities were likely to last for a considerable amount of time, as shown in Figs. 5–13.

7 Conclusion

The literature on the econometric approach to productivity measurement has long been dominated by the time trend approach. Despite its popularity, this approach has two major drawbacks. First, it produces a smooth, slowly changing characterization of the pace of technical change. This pattern of technical change is neither supported by the evidence from index number approaches to calculating rates of technical change, nor consistent with findings in the investment literature that suggest technologies are introduced in a “lumpy” fashion with discrete jumps. Second, it suffers from the problem of trending elasticities.

To overcome the two drawbacks associated with the standard time trend approach, we propose in the present paper a categorical time-varying coefficient translog cost function. The main feature of this model is that each of its coefficients is expressed as a nonparametric function of a categorical time variable (which consists of T time points or T categories, where T is the total number of discrete time periods), thus allowing each time period to have its own set of coefficients and cost function. In this sense, the time-varying feature of the new cost function relaxes the restrictive implicit assumption underlying the standard time trend models that all sample years share the same set of coefficients, thus making the new cost function a more general representation of production technology. Our technique requires panel data on firms within the same industry to allow the coefficients to differ across time periods.

We apply the categorical time-varying coefficient model to a sample of 81 electric utilities in the United States over the period 1986–1998. We find that the categorical time-varying coefficient model is capable of producing estimates of productivity growth that closely track those obtained using the Divisia productivity index. In contrast, the standard time trend model produces estimates of technical change that are only a smoothed version of those implied by the Divisia productivity index. We also find that the categorical time-varying coefficient model is free of the problem of trending elasticities. Specifically, we find that the price elasticities produced by the model show considerable year-to-year variation, whereas those produced by the standard time trend model or by the Baltagi and Griffin (1988) model vary within very narrow ranges. Considering the importance and wide applications of price elasticities, this latter advantage should be of interest to applied economists in many fields.