NOTE: The constant term, one, must be the first variable in the Rhs list in all model specifications.
The default specification is Aigner, Lovell and Schmidt's (ALS) canonical normal-half normal model. The default form is a production frontier model,

y = β′x + v − u,  u = |U|.

That is, the right hand side of the equation specifies the maximum goal attainable. To specify a cost frontier model or other model in which the frontier represents a minimum, so that

y = β′x + v + u,  u = |U|,

use

; Cost
This specification is used in all forms of the stochastic frontier model. As noted below, one additional specification you may find useful is

; Start = values for β, λ, σ.

(The meanings of the parameters are developed below.) ALS also developed the normal-exponential model, in which u has an exponential distribution rather than a half normal distribution. To request the exponential model, use

; Model = Exponential (or ; Model = E )

in the FRONTIER command. For this model, the parameters are (β, θ, σv). Further details appear below. There are also several model forms, and numerous modifications, such as heteroscedasticity, that are developed below.
This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands
; Par keeps ancillary parameters λ, σ, etc. with the main parameter vector in b.
; OLS displays least squares starting values when (and if) they are computed.
; Table = name saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices
; Covariance Matrix displays the estimated asymptotic covariance matrix (normally not shown), same as ; Printvc.
; Choice uses the choice based sampling (sandwich with weighting) estimated matrix.
; Cluster = spec requests computation of the cluster form of corrected covariance estimator.
Optimization Controls for Nonlinear Optimization
; Start = list
; Tlg [ = value]
; Tlf [ = value]
; Tlb[ = value]
; Alg = name
; Maxit = n
; Output = n
; Set
ei = yi − b′xi

This residual is usually not of interest in itself. It is, however, the crucial ingredient in the efficiency estimator discussed in Section E62.8. The estimator of ui that we will use is computed by the Jondrow et al. (JLMS) formula, E[u|v−u], or E[u|v+u] if based on a cost frontier,

E[u|ε] = [σλ/(1 + λ²)] [φ(w)/(1 − Φ(w)) − w],  w = ελ/σ,  ε = v − u,

where

σ² = σv² + σu²,  λ = σu/σv.

In the JLMS formula, ei is the estimator of εi. The formulas and computations are discussed in Section E62.8.
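For readers who want to check the JLMS computation outside LIMDEP, here is a minimal Python sketch using only the standard library. The function name and arguments are ours, not part of the program, and it assumes the normal-half normal parameterization above.

```python
from statistics import NormalDist

def jlms_halfnormal(e, lam, sigma, cost=False):
    """JLMS estimator E[u|eps] for the normal-half normal model (a sketch).

    e     : residual, y - b'x
    lam   : lambda = sigma_u / sigma_v
    sigma : sqrt(sigma_u^2 + sigma_v^2)
    cost  : True for a cost frontier (eps = v + u), False for production
    """
    nd = NormalDist()
    S = -1.0 if cost else 1.0
    w = S * e * lam / sigma                     # w = S * eps * lambda / sigma
    sig_star = sigma * lam / (1.0 + lam ** 2)   # = sigma_u * sigma_v / sigma
    # E[u|eps] = sig_star * [phi(w)/(1 - Phi(w)) - w]
    return sig_star * (nd.pdf(w) / (1.0 - nd.cdf(w)) - w)
```

At e = 0 this reduces to sig_star × √(2/π); for a production frontier, more negative residuals imply larger estimated inefficiency, as expected.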
The frontier model is, save for its involved disturbance term, a linear regression model. The conditional mean in the model is

E[yi|xi] = β′xi − E[ui|xi].

In most cases, E[ui|xi] is not a function of xi, so the derivatives of E[yi|xi] with respect to xi are just β. In other cases that we will consider, the conditional mean of ui does depend on xi or other variables, so the partial effects in the model might be more involved than this. Once again, however, these will usually not be of direct interest in the study. But, in all cases, E[u|ε] will be an involved function of xi and any other variables that appear anywhere else in the model. We will examine the partial effects on the efficiency estimators in Section E62.8.
b    = regression parameters (estimates of β)
varb = asymptotic covariance matrix
Scalars:
Use ; Par to add the ancillary parameters to these. The ancillary parameters that are estimated for the various models are as follows, including the scalars saved by the estimation program:

Half and truncated normal: λ, σ
Truncated normal: μ, in addition to λ and σ
Exponential: θ, σv
Heteroscedastic model: the coefficients in the variance function
Heterogeneity in mean: the coefficients in the mean function
year     = 1970,...,1984
revenue  = revenue
points   = number of points served
mtl      = materials quantity
fuel     = fuel quantity
eqpt     = equipment quantity
labor    = labor quantity
property = property quantity
pk       = capital price index
lc       = log(cost)
lpm      = log(pm)
lpl      = log(pl)
lpmpp    = log(pm/pp)
lplpp    = log(pl/pp)
le       = log(eqpt)
lq       = log(output)
cn       = cost/pp
lpf      = log(pf)
lpp      = log(pp)
lpfpp    = log(pf/pp)
lf       = log(fuel)
ll       = log(labor)
lq2      = .5*lq^2
lcn      = log(cn)
lpe      = log(pe)
lpk      = log(pk)
lpepp    = log(pe/pp)
lm       = log(mtl)
lp       = log(property)
Variable   Std. Dev.    Description
country    *            country number omitting internal units, 1,...,191
year       *            year (1993-1997)
small      *            internal political unit, 0 for countries, else 1,...,6
comp       12.2051123   composite health care attainment
dale       12.1442590   disability adjusted life expectancy
hexp       694.216237   health expenditure per capita, PPP units
educ       2.73370613   educational attainment, years
oecd       .449149577   OECD member country, dummy variable
gdpc       7891.20036   per capita GDP in PPP units
popden     2871.84294   population density per square KM
gini       .090206941   gini coefficient for income distribution
tropics    .498933251   dummy variable for tropical location
pubthe     20.2340835   proportion of health spending paid by government
geff       .915983955   World Bank government effectiveness measure
voice      .952225978   World Bank measure of democratization
(The data were analyzed in Greene (2004a,b). Some of the variables, such as popden and gdpc, were augmented from other sources in these studies.) Although the data are a five year panel (a few countries were observed for fewer than five years), there is almost no cross year variation in any variable. (The proportion of total variation that is within groups is less than 1% for the four time varying variables.) We have created a cross section from these data as follows: First, we discarded the data on internal political units. We then averaged comp, dale, hexp and educ across the five years. We retained a sample of 191 cross sectional (country) units. The following command set creates the data set.
SAMPLE   ; 1-840 $
REJECT   ; small > 0 $
SETPANEL ; Group = country ; Pds = ti $
RENAME   ; hc3 = educ $
CREATE   ; lpubthe = log(pubthe) $
CREATE   ; dalebar = Group Mean(dale, Pds = ti) $
CREATE   ; compbar = Group Mean(comp, Pds = ti) $
CREATE   ; educbar = Group Mean(educ, Pds = ti) $
CREATE   ; hexpbar = Group Mean(hexp, Pds = ti) $
CREATE   ; logdbar = Log(dalebar) ; logcbar = Log(compbar) $
CREATE   ; logebar = Log(educbar) ; loghbar = Log(hexpbar) $
CREATE   ; loghbar2 = loghbar^2 $
REJECT   ; year # 1997 $
CALC     ; Ran(12345) $
SAMPLE   ; 1-500 $
CREATE   ; u = Abs(Rnn(0,2))
         ; v = Rnn(0,1)
         ; x = Rnn(0,1)
         ; y = x + v + u $
REGRESS  ; Lhs = y ; Rhs = one,x
         ; Res = e $
FRONTIER ; Lhs = y ; Rhs = one,x $
KERNEL   ; Rhs = e $
The CREATE command generates y exactly according to the model, except note that u is not subtracted; it is added. Thus, we should expect this model to perform poorly. The estimation results from the FRONTIER command are shown below. Note the string of warnings. Estimation is allowed to proceed, but the results are not a frontier as such. The final estimate of λ is essentially zero, with a huge standard error, and the reported estimate of σu² in the box above the results is 0.0000. The other estimates are, in fact, the same as OLS. The kernel density estimator for the OLS residuals is clearly skewed in the positive, that is, the wrong, direction. Once again, we emphasize, this is a failure of the data to conform to the model.
Error   315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Line search at iteration 30 does not improve fn. Exiting optimization.
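For readers who want to see the phenomenon outside LIMDEP, the experiment above can be replicated with a short numpy sketch (the seed and sample size follow the commands above; the OLS step uses lstsq rather than REGRESS, and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 500
u = np.abs(rng.normal(0.0, 2.0, size=n))   # half normal "inefficiency"
v = rng.normal(0.0, 1.0, size=n)           # symmetric noise
x = rng.normal(0.0, 1.0, size=n)
y = x + v + u                              # u ADDED: wrong sign for a production frontier

# OLS of y on (1, x)
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

m2, m3 = np.mean(e**2), np.mean(e**3)
print("OLS residual skewness:", m3 / m2**1.5)   # positive, i.e., the wrong direction
```

Because u enters with the wrong sign, the third moment of the OLS residuals is positive, which is exactly the condition that triggers Error 315 above.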
Unfortunately, the Waldman result is a sufficient condition, not a necessary one. That is, it
has been shown that when the OLS residuals have the right skewness, then the MLE for the frontier
model is unique, and you will have no trouble in estimation. When they have the wrong skewness,
it is only shown that the OLS results are a local stationary point of the log likelihood, not that they
are the global maximizers. There may be another point that is yet better than OLS. Our airline data
used below provide an example. Consider the following results, where we present both the
stochastic frontier estimates and OLS. (The model, itself, is developed later, so we show only the
useful results here.) As above, we receive the initial warning about the skewness of the OLS residuals. Then, estimation proceeds and an apparently routine solution emerges that is different from, and better than, OLS (it has a higher log likelihood).
Error   315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Normal exit: 11 iterations. Status=0, F=  -105.0617
----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                    LQ
Log likelihood function        105.06169
Variances: Sigma-squared(v)=      .02411
           Sigma-squared(u)=      .00457
           Sigma(v)        =      .15527
           Sigma(u)        =      .06757
Stochastic Production Frontier, e = v-u
--------+-------------------------------------------------------------------
        |              Standard            Prob.       95% Confidence
      LQ| Coefficient     Error       z   |z|>Z*          Interval
--------+-------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|  -1.05847***    .02333   -45.37  .0000    -1.10419  -1.01274
      LF|    .38355***    .07045     5.44  .0000      .24547    .52163
      LE|    .21961***    .07300     3.01  .0026      .07653    .36270
      LM|    .71667***    .07654     9.36  .0000      .56666    .86668
      LL|   -.41139***    .06382    -6.45  .0000     -.53647   -.28630
      LP|    .18973***    .02960     6.41  .0000      .13171    .24775
        |Variance parameters for compound error
  Lambda|    .43515**     .20117     2.16  .0305      .04086    .82944
   Sigma|    .16933***    .00057   295.74  .0000      .16821    .17045
--------+-------------------------------------------------------------------
Ordinary least squares regression ............
Diagnostic   Log likelihood    =    105.05876
Standard error of e            =       .16244
--------+-------------------------------------------------------------------
        |              Standard            Prob.       95% Confidence
      LQ| Coefficient     Error       t   |t|>T*          Interval
--------+-------------------------------------------------------------------
Constant|  -1.11237***    .01015  -109.57  .0000    -1.13227  -1.09247
      LF|    .38283***    .07116     5.38  .0000      .24335    .52231
      LE|    .21922***    .07389     2.97  .0033      .07441    .36404
      LM|    .71924***    .07732     9.30  .0000      .56769    .87078
      LL|   -.41015***    .06455    -6.35  .0000     -.53665   -.28364
      LP|    .18802***    .02980     6.31  .0000      .12961    .24643
--------+-------------------------------------------------------------------
There is no simple bulletproof strategy for handling this situation. You can try different starting values with ; Start = values for β, λ, σ that differ from OLS, but it is hard to know where these will come from. Moreover, it is likely that you will end up at OLS anyway. As Waldman points out, this is a potentially ill behaved log likelihood function. We offer the preceding as a caution for the practitioner. For the particular data set used here, we can identify a specific culprit. The failure of the model emerges in the presence of the variable lm, and does not occur when lm is omitted from the equation. We have no theory, however, for why this should be the case. Simply deleting variables from the model until one emerges that does not have the skewness problem does not seem like an effective strategy.

We do note that the failure might signal a misspecified model. For example, for our airlines example, the specification above omits the capital variable. When lk = log(k) is added to the model, we obtain the following quite routine results (albeit with the wrong signs on the capital and labor inputs).
Normal exit: 13 iterations. Status=0, F=  -108.4392
----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                    LQ
Log likelihood function        108.43918
Estimation based on N =      256, K = 9
Inf.Cr.AIC  =   -198.9 AIC/N =    -.777
Variances: Sigma-squared(v)=      .01902
           Sigma-squared(u)=      .01692
           Sigma(v)        =      .13791
           Sigma(u)        =      .13007
Sigma = Sqr[(s^2(u)+s^2(v)]=      .18957
Gamma = sigma(u)^2/sigma^2 =      .47074
Var[u]/{Var[u]+Var[v]}     =      .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+-------------------------------------------------------------------
        |              Standard            Prob.       95% Confidence
      LQ| Coefficient     Error       z   |z|>Z*          Interval
--------+-------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|  -2.98823***    .72136    -4.14  .0000    -4.40206  -1.57439
      LF|    .37257***    .07038     5.29  .0000      .23463    .51052
      LE|   2.09473***    .68790     3.05  .0023      .74647   3.44299
      LM|    .69910***    .07580     9.22  .0000      .55054    .84766
      LL|   -.42909***    .06315    -6.79  .0000     -.55287   -.30530
      LP|    .44533***    .09498     4.69  .0000      .25917    .63149
      LK|  -2.09806***    .76556    -2.74  .0061    -3.59853   -.59759
        |Variance parameters for compound error
  Lambda|    .94309***    .16870     5.59  .0000      .61244   1.27373
   Sigma|    .18957***    .00064   297.81  .0000      .18832    .19082
--------+-------------------------------------------------------------------
We emphasize that the Waldman result, and this particular theoretical outcome, are specific to the normal-half normal model. However, when it occurs, problems of a similar sort will often, but not always, show up in other models. Thus, in spite of a warning, your fitted exponential or panel data model may be quite satisfactory.
in which e has zero mean and constant variance, and is orthogonal to (1,x1). Thus, the model as shown can be estimated consistently by OLS. The constant term estimates α = (β0 − E[u]). Assuming that E[u] is estimable, therefore, estimation of β by MLE vs. OLS is a question of efficiency, not consistency. (However, we remain interested in estimation of u, so this may be a moot point.)
The following shows computation of a COLS estimator for the airlines. The FRONTIER command requests both the inefficiency estimates, ui, and the cost efficiency estimates, eui_cost. The kernel density estimate for the cost efficiency is shown in Figure E62.3. The results for the estimator begin with the standard output for least squares regression. The second panel includes some preliminary results for the stochastic frontier model, including the chi squared test for zero skewness (which is rejected); χ² = (n/6)(m3/s³)². The standard normal statistic is the signed (based on m3) square root of χ². The third panel presents descriptive statistics for ui and exp(−ui).
CREATE   ; lc = Log(cost/pp)
         ; lpkp = Log(pk/pp)
         ; lplp = Log(pl/pp)
         ; lpmp = Log(pm/pp)
         ; lpep = Log(pe/pp)
         ; lpfp = Log(pf/pp) $
CREATE   ; lk = Log(k) $
CREATE   ; ly = Log(output) ; ly2 = .5*ly*ly $
FRONTIER ; Lhs = lc ; Rhs = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
         ; Cost ; Model = COLS
         ; Costeff = Eui_cost ; Eff = ui $
KERNEL   ; Rhs = eui_cost
         ; Title = Estimated Cost Efficiency Based on COLS Estimator $
Skewness[e] = -Skewness[u]

since v is symmetric. The left hand sides can be consistently estimated using the OLS residuals:

m2 = (1/n) Σi ei²

and

m3 = (1/n) Σi ei³.

Both of the functions on the right hand side are known for the half normal and exponential models. In particular, for the half normal model, the moment equations are

m2 = σv² + [1 − 2/π]σu²,
m3 = (2/π)^(1/2) [1 − 4/π]σu³.

Solving for the underlying parameters,

σu = [m3 √(π/2) / (1 − 4/π)]^(1/3)

and σv = √[m2 − (1 − 2/π)σu²]. Note that there is no solution for σu if m3 is not negative, which is the problem discussed in Section E62.5. Assuming that this problem does not arise, the corrected constant term is

a + Est.E[u] = a + σu √(2/π).
This is the modified least squares (MOLS) estimator that is discussed in a number of sources, such as Greene (2005). These are the values used as starting values for the MLE, as well. Looking ahead, note that there is no natural method of moments estimator for the mean parameter μ in the truncated normal model discussed in Section E63.3. For this model, we use

μ/σu = 0.
For the normal-exponential model, the moment equations that correspond to the preceding are

m2 = σv² + 1/θ²,
m3 = −2/θ³.

Therefore,

θ = [−2/m3]^(1/3)

and σv = √[m2 − 1/θ²], and the corrected constant term is

a + Est.E[u] = a + 1/θ.
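As a numerical check on the method of moments formulas above, they can be coded directly; this is a sketch (the function names are ours), which recovers, e.g., σu = σv = 1 from the moments those values imply:

```python
import math

def mols_halfnormal(a_ols, m2, m3):
    """MOLS estimates for the normal-half normal model (a sketch).

    a_ols : OLS constant term
    m2,m3 : second and third central moments of the OLS residuals
            (reverse the residual signs first if fitting a cost frontier)
    Requires m3 < 0; otherwise there is no solution for sigma_u.
    """
    su = (m3 * math.sqrt(math.pi / 2.0) / (1.0 - 4.0 / math.pi)) ** (1.0 / 3.0)
    sv = math.sqrt(m2 - (1.0 - 2.0 / math.pi) * su ** 2)
    a = a_ols + su * math.sqrt(2.0 / math.pi)   # corrected constant term
    return a, su, sv

def mols_exponential(a_ols, m2, m3):
    """MOLS estimates for the normal-exponential model (requires m3 < 0)."""
    theta = (-2.0 / m3) ** (1.0 / 3.0)
    sv = math.sqrt(m2 - 1.0 / theta ** 2)
    a = a_ols + 1.0 / theta                      # corrected constant term
    return a, theta, sv
```

Note that when m3 < 0 the argument of the cube root in the half normal case is positive, since (1 − 4/π) is itself negative.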
The header information in the results table will display the decomposition of the variance of the composed error into two parts. In the case of the half normal model,

Var[u] = [(π − 2)/π]σu²,

not σu². Therefore, the estimated parameters might be a bit misleading as to the relative influence of u on the total variation in the structural disturbance.
We note that these estimators are sometimes quite far from the maximum likelihood estimators, particularly when the sample is small. But, they are generally quite satisfactory as starting values for the MLE. The following demonstrates these results for the airline data, where we use MOLS and MLE to fit a normal-half normal cost frontier. (Note that the signs of the OLS residuals are reversed because we are fitting a cost function.) In the results below, we have imposed the assumption of linear homogeneity in prices in the cost function by normalizing the six input prices, pk, pl, pe, pp, pm, pf, by the property price, pp. The model contains log(pj/pp). To complete the constraint, we have also normalized total cost by pp before taking logs.
CREATE   ; lpk = Log(pk) $
CREATE   ; lpmpp = lpm - lpp ; lpfpp = lpf - lpp ; lpepp = lpe - lpp
         ; lplpp = lpl - lpp ; lpkpp = lpk - lpp $
CREATE   ; lcp = lc - lpp $
NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp $
REGRESS  ; Lhs = lc ; Rhs = x ; Res = e $
CREATE   ; e = -e ; e2 = e*e ; e3 = e2*e $
CALC     ; m2 = Xbr(e2) ; m3 = Xbr(e3) $
CALC     ; List ; su = (m3 * Sqr(pi/2) / (1-4/pi))^(1/3)
         ; sv = Sqr(m2 - (1-2/pi) * su^2)
         ; a = b(1) + su * Sqr(2/pi) ; lambda = su/sv
         ; sgma = Sqr(su^2 + sv^2) $
FRONTIER ; Lhs = lc ; Rhs = x ; Cost $
The first set of results below are the OLS estimates with the correction to the constant term and the method of moments estimators of σu and σv used to start the MLE. The maximum likelihood estimators are shown next. The estimates for the stochastic frontier model include the log likelihood and the implied estimates of σu, σv and their squares, based on the estimates of λ = σu/σv and σ² = σu² + σv², which are estimated by ML. (The reverse transformations are σu² = σ²λ²/(1 + λ²) and σv² = σ²/(1 + λ²).) The MLE is documented further in the next section.
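The reverse transformations are easy to verify numerically. A small Python sketch (names ours), applied to the estimates reported in the half normal production frontier results above (Lambda = .43515, Sigma = .16933):

```python
def decompose(lam, sigma):
    """(lambda, sigma) -> (sigma_u, sigma_v), using
    su^2 = sigma^2*lam^2/(1+lam^2) and sv^2 = sigma^2/(1+lam^2)."""
    su = (sigma ** 2 * lam ** 2 / (1.0 + lam ** 2)) ** 0.5
    sv = (sigma ** 2 / (1.0 + lam ** 2)) ** 0.5
    return su, sv

def compose(su, sv):
    """The forward transformation: lambda = su/sv, sigma = sqrt(su^2 + sv^2)."""
    return su / sv, (su ** 2 + sv ** 2) ** 0.5

su, sv = decompose(0.43515, 0.16933)
print(su, sv)   # approximately .06757 and .15527, the Sigma(u), Sigma(v) reported above
```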
----------------------------------------------------------------------------
Ordinary least squares regression ............
LHS=LC     Mean                =       2.84024
           Standard deviation  =       1.09256
           No. of observations =           256   Degrees of freedom
Regression Sum of Squares      =       300.028            7
Residual   Sum of Squares      =       4.36487          248
Total      Sum of Squares      =       304.393          255
           Standard error of e =        .13267
Fit        R-squared           =        .98566   R-bar squared =    .98526
Model test F[  7,   248]       =    2435.25310   Prob F > F*   =    .00000
Diagnostic Log likelihood      =     157.91523   Akaike I.C.   =  -4.00909
           Restricted (b=0)    =    -385.41031   Bayes I.C.    =  -3.89830
           Chi squared [  7]   =    1086.65108   Prob C2 > C2* =    .00000
U ~ N[0,σu²],  v ~ N[0,σv²].

The default form is the normal-half normal model. In this form, model estimates consist of β, σ² = σv² + σu² and λ = σu/σv, and the usual set of diagnostic statistics for models fit by maximum likelihood. The other basic form in the ALS model is the exponential model,

u ~ θ exp(−θu), u > 0,

which has mean inefficiency E[u] = 1/θ and standard deviation σu = 1/θ. The parameters estimated in the exponential specification are (β, θ, σv). The estimate of σu is reported in the results as well.

The following illustrates the estimator, with a normal-half normal cost frontier and a normal-exponential production frontier. The coefficient estimates for the exponential cost frontier are shown as well.
FRONTIER
FRONTIER
The stochastic frontier results include the standard output for MLEs. The derived estimates of σu, σv, σu², σv² and λ are shown as well. The value of γ = σu²/σ² is given for comparability with other parts of the literature. This ratio, which lies in (0,1), is sometimes reported as a variance decomposition of ε. However, the variance of u = |U| is (1 − 2/π)σu², so the appropriate decomposition is (1 − 2/π)σu²/[σv² + (1 − 2/π)σu²]. This is the value shown next under Var[u]/{Var[u]+Var[v]} in the results.
A likelihood ratio test against the hypothesis of no inefficiency follows the variance estimates. The degrees of freedom for the test are accumulated in the table. The first is for σu in the base case. The second is for the heteroscedasticity terms in Var[u] when they are introduced in the model. Heteroscedasticity is developed in Chapter E63. The third term is for the truncation parameters in the normal-truncated normal model, also developed in the next chapter. The degrees of freedom for the inefficiency model are the sum of these three terms. The likelihood ratio statistic is presented next. This is a nonstandard test because the null value of σu is on the boundary of the parameter space. Appropriate tables for the mixed chi squared test used here are given in Kodde and Palm (1986). (A copy of the relevant parts of the table is kept internally by the program.) See also Coelli, Rao and Battese (1998) for further details.
Results for the normal-exponential model appear below. It is not possible to use a LR test to choose between these two models. The test has zero degrees of freedom; neither model is obtained by a restriction on the other. One possibility might be a Vuong (1989) statistic, which would be computed as

V = √n m̄ / sm,  mi = log(fi | normal) − log(fi | exponential),

where m̄ and sm are the sample mean and standard deviation of mi. Results of the test are shown below the model results. The statistic is well inside the inconclusive region.
FRONTIER
CREATE
FRONTIER
CREATE
CALC
[CALC] VUONG
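The Vuong computation itself is simple once the per-observation log densities under the two models are in hand. A stdlib Python sketch (the function name and arguments are ours; the arrays would hold log fi under each model):

```python
import math

def vuong(logf1, logf2):
    """Vuong statistic for two non-nested models from per-observation
    log densities. Large positive values favor model 1, large negative
    values model 2; |V| < 1.96 is the 5% inconclusive region."""
    m = [a - b for a, b in zip(logf1, logf2)]
    n = len(m)
    mbar = sum(m) / n
    sm = math.sqrt(sum((mi - mbar) ** 2 for mi in m) / (n - 1))
    return math.sqrt(n) * mbar / sm
```

By construction the statistic changes sign when the roles of the two models are exchanged.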
E62.7.1 Log Likelihoods for the Half Normal and Exponential Models
As will be evident below, different formulations of the log likelihood are most convenient for estimation of the different forms of the frontier models. (And, different authors sometimes parameterize the models differently.) The base case is the normal-half normal model. In this form, vi ~ N[0,σv²] and ui = |Ui| where Ui ~ N[0,σu²]. It follows that f(ui) = (2/σu)φ(ui/σu), ui > 0. The density of εi = vi − ui has been shown to be

f(εi) = (2/σ) φ(εi/σ) Φ(−λεi/σ).

The most common form of the individual term in the log likelihood function (and the one used in LIMDEP) is

log Li = ½ log(2/π) − log σ − ½(εi/σ)² + log Φ[−Sλεi/σ]

where

εi = yi − β′xi,
λ = σu/σv,
σ² = σu² + σv²,  σv² = σ²/(1 + λ²),  σu² = σ²λ²/(1 + λ²),

and S = +1 for a production frontier, −1 for a cost frontier. Olsen's transformation is used for maximizing the log likelihood. We reparameterize the function in terms of θ = 1/σ and γ = (1/σ)β. Then,

log Li = ½ log(2/π) + log θ − ½ εi² + log Φ(−Sλεi)

where now

εi = θyi − γ′xi,
ai = −Sλεi,
δi = φ(ai)/Φ(ai),
Δi = −aiδi − δi².
The derivatives of log Li with respect to the transformed parameters are

∂log Li/∂γ = (εi + Sλδi) xi,
∂log Li/∂θ = 1/θ − (εi + Sλδi) yi,
∂log Li/∂λ = −S δi εi,

and the second derivatives are

∂²log Li/∂γ∂γ′ = −(1 − λ²Δi) xi xi′,
∂²log Li/∂θ²   = −1/θ² − (1 − λ²Δi) yi²,
∂²log Li/∂γ∂θ  = (1 − λ²Δi) yi xi,
∂²log Li/∂γ∂λ  = (Sδi − λΔiεi) xi,
∂²log Li/∂θ∂λ  = −(Sδi − λΔiεi) yi,
∂²log Li/∂λ²   = Δi εi².
For the normal-exponential model, the corresponding term in the log likelihood is

log Li = log θ + ½θ²σv² + θSεi + log Φ(ai),  ai = −(Sεi/σv + θσv).

With δi = φ(ai)/Φ(ai), the first derivatives are

∂log Li/∂β  = S(δi/σv − θ) xi,
∂log Li/∂θ  = 1/θ + θσv² + Sεi − σvδi,
∂log Li/∂σv = θ²σv + δi(Sεi/σv² − θ).

The second derivatives follow in the same fashion, using Δi = −aiδi − δi².
The parameterization in terms of is more convenient but does not produce different results.
E[u|ε] = [σλ/(1 + λ²)] [φ(w)/(1 − Φ(w)) − w],  w = Sελ/σ.

(This is an indirect estimator of u. Unfortunately, it is not possible to estimate ui directly from any observed sample information. The various surveys noted earlier discuss the computation of and properties of this estimator.) The counterpart for the normal-exponential model is

E[u|ε] = σv [φ(w)/(1 − Φ(w)) − w],  w = (Sε/σv + θσv).
These are computed and saved as new variables in your data set with
; Eff = variable name
The ; List specification will also request a listing of this variable. This form is used for all
distributions and all variations of the stochastic frontier model.
By adding ; Eff = u to the FRONTIER command, then

KERNEL ; Rhs = u $

we obtain the results below. (We also added the title to the command with ; Title = .) Note an important element of the estimation. The Standard Deviation reported below is .054895, whereas the estimate of σu is .13746. The difference arises because the .054895 is an estimate of the standard deviation of E[u|ε], not the standard deviation of u.
+---------------------------------------+
| Kernel Density Estimator for U        |
| Observations        =          256    |
| Points plotted      =          256    |
| Bandwidth           =      .016298    |
| Statistics for abscissa values------- |
| Mean                =      .109394    |
| Standard Deviation  =      .054895    |
| Minimum             =      .030722    |
| Maximum             =      .350422    |
| ------------------------------------- |
| Kernel Function     =     Logistic    |
| Cross val. M.S.E.   =      .000000    |
| Results matrix      =       KERNEL    |
+---------------------------------------+
The efficiency measure is

y / Optimal y = Exp(−u)

or

Cost / Optimal cost = Exp(u)

if you estimate a cost frontier instead. You may compute both inefficiencies and efficiency measures in the same command. Figure E62.5 was obtained by adding

; Costeff = ecu

to the FRONTIER command, then requesting the kernel density estimator as before (with the title changed accordingly).
λ = σu/σv,
μi* = −εiσu²/σ² = −εiλ²/(1 + λ²),
σ* = σuσv/σ = σλ/(1 + λ²),
LBi = μi* + σ* Φ⁻¹[1 − (1 − α/2) Φ(μi*/σ*)],
UBi = μi* + σ* Φ⁻¹[1 − (α/2) Φ(μi*/σ*)].

Then, if these were the true parameters, the region [LBi, UBi] would encompass 100(1 − α)% of the distribution of ui|εi. For constructing confidence intervals for technical efficiency, TEi|εi, it is necessary only to compute TEUBi = exp(−LBi) and TELBi = exp(−UBi).
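These bounds are quantiles of a normal distribution truncated at zero, so they are easy to sketch in stdlib Python (the function name is ours, and it assumes the half normal parameterization above):

```python
from statistics import NormalDist

def hs_interval(e, lam, sigma, alpha=0.05, cost=False):
    """Bounds encompassing 100(1-alpha)% of the conditional distribution
    u_i | eps_i for the normal-half normal frontier (a sketch).

    e: residual, lam = sigma_u/sigma_v, sigma^2 = sigma_u^2 + sigma_v^2.
    """
    nd = NormalDist()
    S = -1.0 if cost else 1.0
    mu_star = -S * e * lam ** 2 / (1.0 + lam ** 2)
    sig_star = sigma * lam / (1.0 + lam ** 2)
    z = nd.cdf(mu_star / sig_star)
    lb = mu_star + sig_star * nd.inv_cdf(1.0 - (1.0 - alpha / 2.0) * z)
    ub = mu_star + sig_star * nd.inv_cdf(1.0 - (alpha / 2.0) * z)
    return lb, ub
```

The lower bound is never negative, since the conditional distribution of u is truncated at zero, and widening the coverage (smaller alpha) widens the interval.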
We note two caveats about the estimator. First, the received papers based on classical methods have labeled this a confidence interval for ui. However, it is a range that encompasses 100(1 − α)% of the probability in the conditional distribution of ui|εi, based on E[ui|εi], not ui, itself. The interval is centered at the estimator of the conditional mean, E[ui|εi], not the estimator of ui, itself, as a conventional confidence interval would be. The estimator is actually characterizing the conditional distribution of ui|εi, not constructing any kind of interval that brackets a particular ui; that is not possible. Second, these limits are conditioned on known values of the parameters, so they ignore any variation in the parameter estimates used to construct them. Thus, we regard this as a minimal width interval.
You can request computation of these lower and upper bounds by adding

; CI(100(1 − α)) = lower, upper

where 100(1 − α) is one of 90, 95, or 99 and lower, upper are names for two variables that will be created. You may use this feature with ; Eff = variable or ; Techeff = variable (or ; Costeff = variable for a cost frontier). If you have both ; Eff and ; Techeff in the command, the confidence intervals are computed for ; Techeff. (You can obtain the interval for ; Eff in this case by computing the negatives of the logs with CREATE.)
We obtained these bounds for our cost function with
; Costeff = euc ; CI(95) = eucl,eucu
We followed the estimation with
PLOT
; Rhs = eucl,ecu,eucu
; Title = Upper and Lower Bound Estimates of Cost Efficiency
; Vaxis = Cost Efficiency$
The centipede plot is also a useful device in this context. The following redraws Figure E62.6 using a different view for the lower and upper bounds.
CREATE ; Firm_i = Trn(1,1) $
PLOT   ; Lhs = firm_i ; Rhs = eucl,eucu
       ; Centipede ; Endpoints = 0,260 ; Grid
       ; Title = Confidence Limits for Cost Efficiency $
where m = half normal or exponential, σm = σλ/(1 + λ²) for the half normal model and σv for the exponential model, and wm is defined earlier. We now suppose that

ε = y − β′x − δ′z

where x is the theoretical inputs to the goal and z are the environmental variables. We require the derivatives with respect to z. For convenience, let W = −w and exploit the symmetry of the normal density. Then, A[wm(ε)] = [φ(W)/Φ(W) + W]. The derivative is

∂Efficiency/∂z = Efficiency × {−σm dA(W)/dW × (−1) × ∂wm/∂ε × (−δ)}.

The two terms that we need to complete the derivation are ∂wm/∂ε = Sλ/σ for the half normal model and S/σv for the exponential model, and

dA(W)/dW = 1 − {W[φ(W)/Φ(W)] + [φ(W)/Φ(W)]²} = D(W).

Collecting terms,

∂Efficiency/∂z = −Efficiency × D(W) × S × [λ²/(1 + λ²)] × δ

for the half normal model, or

∂Efficiency/∂z = −Efficiency × D(W) × S × δ

for the exponential model.
We can sign this result, though the magnitude will be empirical. The first three terms are all between
zero and one, as is their product. S is either +1 for a production frontier or -1 for a cost frontier.
Thus, in total, the derivative is a fraction of the corresponding coefficient, which takes the same sign
for a cost frontier and the opposite sign for a production frontier.
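The signing argument can be checked numerically. The following Python sketch (function name ours) returns the multiplier of δ in the half normal expression, −D(W) × S × λ²/(1 + λ²); it lies strictly between −1 and 0 for a production frontier, consistent with the discussion above:

```python
from statistics import NormalDist

def eff_multiplier(W, lam, cost=False):
    """Multiplier of delta in dEfficiency/dz = -Efficiency*D(W)*S*[lam^2/(1+lam^2)]*delta
    (half normal model), excluding the Efficiency factor itself."""
    nd = NormalDist()
    r = nd.pdf(W) / nd.cdf(W)      # inverse Mills ratio phi(W)/Phi(W)
    D = 1.0 - (W * r + r * r)      # dA(W)/dW, which lies in (0, 1)
    S = -1.0 if cost else 1.0
    return -D * S * lam ** 2 / (1.0 + lam ** 2)
```

Flipping to a cost frontier (S = −1) flips the sign of the multiplier, which is the statement in the text.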
Partial derivatives and simulations are computed with PARTIALS and SIMULATE. The
general approach would be
FRONTIER
; Cost (optional)
; Lhs = goal variable
; Rhs = one, x variables, z variables $
The command might also contain ; Eff = variable, ; Techeff = variable or ; Costeff = variable.
Then, you may follow it with
PARTIALS

or

SIMULATE
The function analyzed in these two commands is the technical or cost efficiency,
Efficiency = exp{- E [u | ] }.
The following demonstrates using the cost frontier, with variables z = (load factor, log stage length,
points served). Data on z are missing for one of the firms.
CREATE   ; logstage = Log(stage) $
NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
         ; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z
         ; Eff = u ; Costeff = euc ; CI(95) = eucl,eucu $
SIMULATE ; Scenario: & loadfctr = .4(.025)1 ; Plot(ci) $
NAMELIST ; x = one,lq,lq^2,lpmpp,lpfpp,lpepp,lplpp,lpkpp $
NAMELIST ; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $
PARTIALS ; Effects : lq ; Summary $
Note that the specification will correctly account for the fact that the square of LogQ appears in the
cost function when it computes the partial effects.
The Rnk function sorts the data for you and creates the ranking variable. The observation with the highest value gets the rank of one. The lowest gets a rank of n. Note that tied observations do not get the same rank. Tied observations are ranked in the order in which they appear in the data. For example, in a sample of 100, if 10 observations are tied for third place, they will receive ranks 3 through 12.
Two CALC functions provide descriptive measures for ranks. For two sets of ranks, the Spearman rank correlation coefficient is computed as

rs = 1 − 6 Σi di² / [n(n² − 1)],  di = variable1i − variable2i.
; List ; Rkc(variable1,variable2) $
The rank correlation is a correlation coefficient, so it has a natural range of measurement. (See the application below.) For more than two sets of ranks, a useful statistic is Kendall's coefficient of concordance,

W = 12 Σi (Si − S̄)² / [nK²(n² − 1)]

where

Si = Σk rankk,i.

It is requested with

; List ; Cnc(ranks1,...,ranksK) $

The concordance coefficient is not a correlation coefficient, so its magnitude is ambiguous. It can be used for a large sample test of discordance. Under the null hypothesis that the sets of ranks are independent, the statistic has a large sample chi squared distribution. In particular,

K(n − 1)W ~ χ²[n − 1].
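Both measures are straightforward to sketch in stdlib Python (the function names are ours); the ranking convention, highest value gets rank one and ties are broken in data order, matches the Rnk description above:

```python
def ranks(x):
    """Rank 1 (largest) to n (smallest); ties broken in data order."""
    order = sorted(range(len(x)), key=lambda i: -x[i])   # stable sort keeps tie order
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(r1, r2):
    """Spearman rank correlation: 1 - 6*sum(d^2)/[n(n^2-1)]."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))

def kendall_w(rank_sets):
    """Kendall's coefficient of concordance for K sets of ranks of n items."""
    K, n = len(rank_sets), len(rank_sets[0])
    S = [sum(rs[i] for rs in rank_sets) for i in range(n)]
    Sbar = sum(S) / n
    return 12.0 * sum((s - Sbar) ** 2 for s in S) / (K ** 2 * n * (n ** 2 - 1))
```

Identical rankings give a Spearman coefficient of 1 and W = 1; exactly reversed rankings give a Spearman coefficient of −1.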
To illustrate these computations, we have analyzed the WHO data described in Section
E62.4.2. We have fit identical stochastic frontier models for the two attainment variables, lcomp, the
log of the composite measure, and ldale, the log of disability adjusted life expectancy. We then
computed the ranks for the 191 countries and plotted the ranks for the two measures as well as the
raw efficiency measures. The simple correlation for the efficiency measures and the rank correlation
for the ranks are displayed. The commands are as follows:
NAMELIST ; x = one,logebar,loghbar,loghbar2 $
NAMELIST ; z = gini,lpopden,lgdpc,geff,voice,oecd,lpubthe,tropics $
FRONTIER ; Lhs = logdbar ; Rhs = x,z
         ; Eff = udale ; Techeff = edale $
FRONTIER ; Lhs = logcbar ; Rhs = x,z
         ; Eff = ucomp ; Techeff = ecomp $
CREATE   ; dalerank = 192 - Rnk(edale) $
CREATE   ; comprank = 192 - Rnk(ecomp) $
PLOT     ; Lhs = dalerank ; Rhs = comprank
         ; Endpoints = 0,200 ; Limits = 0,200
         ; Title = Ranks of Efficiencies: DALE vs. COMP $
PLOT     ; Lhs = edale ; Rhs = ecomp ; Endpoints = .8,1 ; Grid
         ; Title = Efficiencies: DALE vs. COMP $
CALC     ; List ; Rkc(dalerank,comprank) $
CALC     ; List ; Cor(edale,ecomp) $
The rank correlation of the two sets of ranks is .6353076; the simple correlation of the two efficiency measures is .6062125.
E62.9.1 Application
We have reestimated the airlines cost frontier with the semiparametric estimator. The
frontier functions differ noticeably, primarily in the parameter estimates that are statistically
insignificant. The kernel estimators suggest, however, that the differences in the estimates of
inefficiency are quite modest. The descriptive statistics suggest the same pattern. The final plot
shows more graphically how the nonparametric function has changed the estimates. The fact that
most of the estimates from the nonparametric estimator lie below the 45 degree line is consistent
with the appearance that, generally, they are smaller than the parametric values. The last set of
results is the ordinary (Pearson) correlation and Kendall's tau.
FRONTIER
FRONTIER
KERNEL
DSTAT
PLOT
CALC
The two scalar results are .8690148 (the Pearson correlation) and .6339461 (Kendall's tau).
a = λ/(1 + λ²)^(1/2)
s² = q²/[1 − (2/π)a²]
m = as√(2/π)
eᵢ = residualᵢ − m.
These residuals and s are used to compute log Lᵢ and the derivative with respect to λ. This estimation
step provides the estimator of λ that we need to compute the efficiencies. After estimation of λ,
computation of the JLMS estimates of inefficiency is done the same as in the parametric form of the
model, using the LOWESS residuals.
The normal-gamma model is based on the gamma density for the inefficiency term,
f(uᵢ) = θᴾ exp(−θuᵢ) uᵢᴾ⁻¹ / Γ(P),  uᵢ ≥ 0, P > 0, θ > 0.
This model is more flexible than the half normal or exponential model in that, with two parameters, it
allows both the shape and the location to vary independently. (The truncation model does likewise,
but it is considerably more difficult to estimate.) To specify the gamma model, use
; Model = Gamma (or ; Model = G)
The normal-gamma model is estimated by the method of simulated maximum likelihood.
(See Greene (2000b) and the details in Section E62.10.2.) The counterpart to the JLMS estimator of
the inefficiency, E[u|ε], must also be estimated by simulation.
We note that, by the Wald and likelihood ratio tests, we cannot reject the hypothesis of the exponential
model (P is close to one). The similarity of the kernel density estimators is consistent with this finding.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                    LC
Log likelihood function        159.94270
Estimation based on N =     256, K = 11
Inf.Cr.AIC  =   -297.9 AIC/N =   -1.164
Model estimated: Aug 22, 2011, 22:09:16
Normal-Gamma frontier model
Variances: Sigma-squared(v)=    .01169
           Sigma-squared(u)=    .00547
           Sigma(v)        =    .10814
           Sigma(u)        =    .07399
Stochastic Cost Frontier Model, e = v+u
Half Normal:u(i)=|U(i)|; frontier model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean:    0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0         157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =   4.055
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error        z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    22.9007       27.13658      .84   .3987   -30.2860    76.0874
      LY|     .96086***      .02028    47.38   .0000     .92112    1.00061
     LY2|     .09283***      .01327     7.00   .0000     .06682     .11883
    LPKP|    1.67283        2.12387      .79   .4309   -2.48987    5.83553
    LPLP|    -.01112         .06724     -.17   .8687    -.14290     .12066
    LPMP|    -.07676        1.37564     -.06   .9555   -2.77297    2.61944
    LPEP|    -.63376         .68533     -.92   .3551   -1.97698     .70946
    LPFP|    -.06405***      .02311    -2.77   .0056    -.10934    -.01876
        |Variance parameters for compound error
   Theta|   12.4180**       5.05037     2.46   .0139     2.5194    22.3165
       P|     .84426         .69128     1.22   .2220    -.51062    2.19913
  Sigmav|     .10814***      .01148     9.42   .0000     .08563     .13064
--------+--------------------------------------------------------------------
Figure E62.13 Kernel Density Estimates for Gamma and Exponential Inefficiencies
h(r,εᵢ) = ∫₀∞ zʳ (1/σᵥ)φ[(z − μᵢ)/σᵥ] dz / ∫₀∞ (1/σᵥ)φ[(z − μᵢ)/σᵥ] dz,  μᵢ = −εᵢ − θσᵥ².
The normal-exponential model results if P = 1. Computation of the function h(r,εᵢ) is the obstacle to
estimation. Beckers and Hammond (1987) derived a closed form expression, but the result has never
been operationalized; it is complex in the extreme. Greene (1990) attempted estimation by using a
crude approximation based on Simpson's rule, but failed to obtain reasonable results. (See Ritter and
Simar (1997).)
A satisfactory solution is produced by the technique of maximum simulated likelihood. The
integral and its derivatives can be estimated consistently by Monte Carlo simulation. The crucial
result is that h(r,εᵢ) is the expectation of a random variable:
h(r,εᵢ) = E[zʳ | z ≥ 0]
where
z ~ N[μᵢ, σᵥ²],  μᵢ = −εᵢ − θσᵥ².
Therefore, h(r,εᵢ) is the expected value of zʳ where z has a truncated at zero normal distribution.
Thus, we estimate h(r,εᵢ) by using the mean of a sample of draws from this distribution. For given
values of μᵢ and σᵥ (i.e., yᵢ, xᵢ, β, σᵥ, θ, r), h(r,εᵢ) is consistently estimated by
ĥ(r,εᵢ) = (1/Q) Σq₌₁Q zᵢqʳ
where zᵢq is a random draw from the truncated normal distribution with mean parameter μᵢ and
scale parameter σᵥ. This produces the simulated log likelihood function
log Lₛ = log L(exponential) + n[(P − 1)log θ − log Γ(P)] + Σᵢ log ĥ(P − 1, εᵢ)
which for a given set of draws is a smooth and continuous function of the parameters.
Random draws from the truncated distribution are obtained using Geweke's method as
follows: Let
L = truncation point = 0 for this application
PL = Φ[(L − μ)/σ].
Then,
z = μ + σ Φ⁻¹[PL + F(1 − PL)],
where F is a draw from the uniform (0,1) distribution.
Collecting all terms, then, this produces the simulated log likelihood function:
log Lₛ = n[log θ + θ²σᵥ²/2] + Σᵢ {θεᵢ + log Φ[−(εᵢ/σᵥ + θσᵥ)]}
       + n[(P − 1)log θ − log Γ(P)]
       + Σᵢ log {(1/Q) Σq₌₁Q [μᵢ + σᵥ Φ⁻¹(Fᵢq + (1 − Fᵢq)Φ(−μᵢ/σᵥ))]ᴾ⁻¹}
where
εᵢ = yᵢ − β′xᵢ
μᵢ = −εᵢ − θσᵥ²
and Fᵢq is a fixed set of Q draws from U[0,1] specific to the individual. Derivatives of h(r,εᵢ) and log
h(r,εᵢ) are also estimated by simulation. The JLMS efficiency measure has the simple form
E[u|εᵢ] = h(P,εᵢ) / h(P−1,εᵢ).
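The simulation strategy can be sketched in Python (not LIMDEP code; the function names are our own, and the draws use the inverse-CDF method for the zero-truncated normal described above):

```python
from statistics import NormalDist
import random

N01 = NormalDist()  # standard normal cdf, pdf and inverse cdf

def h_sim(r, eps, theta, sigma_v, Q=20000, seed=12345):
    """Simulated h(r, eps) = E[z^r | z >= 0], z ~ N(mu, sigma_v^2),
    with mu = -eps - theta*sigma_v**2, averaging Q truncated normal draws."""
    rng = random.Random(seed)
    mu = -eps - theta * sigma_v ** 2
    PL = N01.cdf(-mu / sigma_v)          # Prob(z < 0) before truncation
    total = 0.0
    for _ in range(Q):
        F = rng.random()
        z = mu + sigma_v * N01.inv_cdf(PL + F * (1.0 - PL))
        total += z ** r
    return total / Q

def jlms_gamma(eps, theta, P, sigma_v, Q=20000):
    """Simulated JLMS-type inefficiency estimate: h(P, eps) / h(P-1, eps)."""
    return h_sim(P, eps, theta, sigma_v, Q) / h_sim(P - 1, eps, theta, sigma_v, Q)
```

With r = 0 the simulated mean is exactly one, which is a convenient sanity check on the draw mechanism.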
The final consideration is the method of obtaining the draws. The default method is to use
the random number generators. Since this is a very computation intensive model, it is usually more
efficient to use Halton draws; you can use many fewer Halton draws than random draws to obtain
the same quality of results. Halton draws are discussed in Section R24.7. To use Halton draws with
this estimator, add
; Halton
to the command. The number of points for either method is specified with
; Pts = the desired number of draws
We have used this feature in the example in the previous section.
d* = α′z + w,  d = 1(d* > 0)
y = β′x + v − u
Thus, the selection operates through the heterogeneity component of the production model, not the
inefficiency. (That is, whether an observation enters the sample is not viewed as a function of the
level of inefficiency.)
The model is fit by maximum simulated likelihood. To request it, use LIMDEP's usual
format for sample selection models,
PROBIT ; Lhs = ... ; Rhs = ... ; Hold $
FRONTIER ; Lhs = ... ; Rhs = ... ; Selection $
The model must be the base case, half normal, with no panel data application, no truncation, no
heteroscedasticity, etc. You may control the simulations with ; Halton and ; Pts for the simulation.
Efficiency and inefficiency estimates are saved as with other models with ; Eff and ; Techeff.
However, observations in the nonselected part of the sample are given missing values (-999) for any
of these computations. The PARTIALS and SIMULATE commands do not inherit the selection
model; these commands are not available after fitting this model.
E62.11.1 Application
The following creates a data set that conforms exactly to the assumptions of the model.
CALC ; Ran(123457) $
SAMPLE ; 1-2000 $
CREATE ; z1 = Rnn(0,1) ; z2 = Rnn(0,1) $
CREATE ; v1 = Rnn(0,1) ; v2 = Rnn(0,1) $
CREATE ; e1 = v1 ; e2 = .7071 * (v1+v2) $
CREATE ; ds = z1 + z2 + e1 ; d = ds > 0 $
CREATE ; u = Abs(Rnn(0,1)) ; x1 = Rnn(0,1) ; x2 = Rnn(0,1) $
CREATE ; y = x1 + x2 + e2 - u $
PROBIT ; Lhs = d ; Rhs = one,z1,z2 ; Hold $
FRONTIER ; Lhs = y ; Rhs = one,x1,x2 ; Selection $
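For reference, the same data generating process can be sketched in Python (the LIMDEP commands above remain the definitive version; the Python seed only echoes the CALC ; Ran(123457) $ setting and does not reproduce LIMDEP's generator):

```python
import random

rng = random.Random(123457)      # illustrative seed only
data = []
for _ in range(2000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    v1, v2 = rng.gauss(0, 1), rng.gauss(0, 1)
    e1 = v1                      # selection equation disturbance
    e2 = .7071 * (v1 + v2)       # frontier noise; Corr(e1, e2) = .7071 by construction
    d = 1 if z1 + z2 + e1 > 0 else 0
    u = abs(rng.gauss(0, 1))     # half normal inefficiency
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    y = x1 + x2 + e2 - u         # production frontier observed when d = 1
    data.append((d, y, x1, x2, z1, z2))

n_selected = sum(row[0] for row in data)
```

By construction roughly half the sample is selected, and the selection and frontier disturbances are correlated, so the selectivity correction matters.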
y = β′x + σᵥv − σᵤu
(Note for convenience later, we have moved the scale parameters into the structural model.) To set
up the estimator, we now write w in its conditional on v form,
w|v = ρv + h where h ~ N[0, (1 − ρ²)] and h is independent of v.
Then,
z v
For the selected observations, d = 1, conditioned on v, the joint density for y and d is the product of
the marginals since conditioned on v, y and d are independent;
f(y, d = 1|x,z,v) = f(y|x,v) Prob(d = 1|z,v).
We have the second part above. For the first part,
y|x,v = (β′x + σᵥv) − σᵤu
where u is the truncation at zero of a standard normal variable, so f(u) = 2φ(u), u ≥ 0. The Jacobian of
the transformation from u to y is 1/σᵤ, so by the change of variable, the conditional density is
f(y|x,v) = (2/σᵤ) φ[((β′x + σᵥv) − y)/σᵤ],  (β′x + σᵥv) − y ≥ 0.
Combining terms,
f(y, d = 1|x,z,v) = (2/σᵤ) φ[((β′x + σᵥv) − y)/σᵤ] Φ[(α′z + ρv)/√(1 − ρ²)].
To obtain the unconditional density, it is necessary to integrate v out of the conditional density.
Thus,
f(y, d = 1|x,z) = ∫ᵥ (2/σᵤ) φ[(σᵥv − (y − β′x))/σᵤ] Φ[(α′z + ρv)/√(1 − ρ²)] f(v) dv.
The relevant term in the log likelihood is log f(y, d = 1|x,z). For the nonselected observations, the
contribution to the log likelihood is the log of the unconditional probability of nonselection, which is
Prob(d = 0|z) = ∫ᵥ Φ[−(α′z + ρv)/√(1 − ρ²)] f(v) dv.
The integrals do not exist in closed form, so these terms cannot be evaluated as is. Before
proceeding, we note the additional complication: for the selected observations, β′x + σᵥv − y = σᵤu > 0,
so the density f(v) is not the standard normal that intuition might suggest; it is a truncated normal.
The integrals can be computed by simulation. By construction,
∫ᵥ (2/σᵤ) φ[((β′x + σᵥv) − y)/σᵤ] Φ[(α′z + ρv)/√(1 − ρ²)] f(v) dv
   = Eᵥ { (2/σᵤ) φ[((β′x + σᵥv) − y)/σᵤ] Φ[(α′z + ρv)/√(1 − ρ²)] },
so by sampling from the distribution of v, we can compute the function of v and average to obtain the
integrals. In order to sample the draws on v, we note the implied truncation,
v > (y − β′x)/σᵥ, or v > ε/σᵥ.
Draws from the truncated normal can be obtained using result (E-1) in Greene (2011). Let Aᵣ equal a
draw from the uniform (0,1) population. The desired draw from the truncated normal distribution
will be
vᵣ = Φ⁻¹[Φ(ε/σᵥ) + Aᵣ Φ(−ε/σᵥ)].
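A minimal Python sketch of this inverse-CDF draw (our own helper name, using the standard library's NormalDist; truncation point c plays the role of ε/σᵥ):

```python
from statistics import NormalDist
import random

N01 = NormalDist()

def draw_truncated_std_normal(c, A):
    """Draw v from a standard normal truncated to v > c:
    v = Phi^{-1}[Phi(c) + A*(1 - Phi(c))], with A uniform on (0,1)."""
    Pc = N01.cdf(c)
    return N01.inv_cdf(Pc + A * (1.0 - Pc))

rng = random.Random(99)
draws = [draw_truncated_std_normal(0.5, rng.random()) for _ in range(5000)]
```

Every draw satisfies the truncation restriction, which is the property the simulated likelihood needs.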
Collecting all terms, then, the simulated log likelihood will be
log Lₛ = Σᵢ log (1/R) Σᵣ₌₁R { dᵢ (2/σᵤ) φ[((β′xᵢ + σᵥvᵢᵣ) − yᵢ)/σᵤ] Φ[(α′zᵢ + ρvᵢᵣ)/√(1 − ρ²)]
              + (1 − dᵢ) Φ[−(α′zᵢ + ρvᵢᵣ)/√(1 − ρ²)] }
where the draws on vᵢᵣ are as shown above. Derivatives of this simulated log likelihood are obtained
numerically using finite differences.
Heteroscedasticity in v and/or u
Truncated normal with nonzero, heterogeneous mean in the underlying U
Heterogeneity in the θ parameter of the exponential or gamma distribution
Amsler et al.'s scaling model
The models of scale heterogeneity may extend either variance parameter with the
specification of the variance functions
Var[U|zᵢ] = σᵤᵢ² = σᵤ² exp(γ′zᵢ)   (heteroscedastic u)
Var[v|wᵢ] = σᵥᵢ² = σᵥ² exp(δ′wᵢ)   (heteroscedastic v)
or both (doubly heteroscedastic).
There is no requirement that the same variables enter the two functions, and either or both may be
heterogeneous. The model specification is
; Heteroscedasticity or ; Het
and either or both of
; Hfv = variables in the variance of v
; Hfu = variables in the variance of u
If either variance is not given, it is assumed to be constant. The variance function is the exponential
format used throughout LIMDEP. If either variance is unspecified, the implied model is σⱼ,ᵢ² = exp(γ₀)
or exp(δ₀) (j = u or v), which is the same as
; Hfv = one or ; Hfu = one
If both are unspecified, then the implied model
; Het ; Hfv = one ; Hfu = one
is the default, normal-half normal stochastic frontier model. It provides identical estimates. (Try it.)
A constant (one) is automatically inserted into both lists if you do not include it. This form may be
used with the normal-half normal and normal-truncated normal models.
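The exponential variance-function form can be sketched in Python (our own function names; this is an illustration of the functional form, not of LIMDEP's estimator):

```python
import math

def variance_functions(z, w, gamma, delta):
    """sigma_u,i^2 = exp(gamma'z_i) and sigma_v,i^2 = exp(delta'w_i):
    the exponential variance functions behind ; Hfu and ; Hfv."""
    su2 = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    sv2 = math.exp(sum(d * wi for d, wi in zip(delta, w)))
    return su2, sv2

# With constant-only lists (z = w = [1.0]) the model collapses to fixed variances;
# e.g. a constant gamma0 = 2*log(sigma_u) reproduces sigma_u^2 exactly:
su2, sv2 = variance_functions([1.0], [1.0], [2 * math.log(0.3)], [2 * math.log(0.2)])
```

This is why ; Het ; Hfv = one ; Hfu = one reproduces the default homoscedastic model: the exponential of a constant is just a fixed positive variance.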
The list should not contain a constant term, one. This may be used in all implementations of the
exponential and gamma models. Note, however, that in the panel data settings, the θ parameter is
assumed to be time invariant. The values for zᵢ are taken from the data record for the last period for
firm i.
We will return to this subject below. The symmetric component, v, may also be heteroscedastic, as
in the other models, with
; Hfv = list of variables.
E[u|ε] = (σᵤσᵥ/σ) [φ(w)/(1 − Φ(w)) − w],  w = Sελ/σ
for the half normal model, and
E[u|ε] = σᵥ [φ(w)/(1 − Φ(w)) − w],  w = Sε/σᵥ + θσᵥ
for the exponential models. These functions are evaluated for each observation at
λᵢ = σᵤ,ᵢ/σᵥ,ᵢ
and
σᵢ² = σᵤ,ᵢ² + σᵥ,ᵢ²
for the half normal model, and at σᵥ,ᵢ and θᵢ likewise in the exponential and gamma models.
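The two conditional mean functions can be sketched in Python; this assumes the standard JLMS forms for the half normal and exponential models, and the function names are our own:

```python
from statistics import NormalDist
import math

N01 = NormalDist()

def jlms_half_normal(eps, sigma_u, sigma_v, S=1):
    """E[u|eps] for the normal-half normal model; S = +1 production, -1 cost."""
    sigma = math.hypot(sigma_u, sigma_v)        # sigma = sqrt(su^2 + sv^2)
    lam = sigma_u / sigma_v
    w = S * eps * lam / sigma
    return (sigma_u * sigma_v / sigma) * (N01.pdf(w) / (1.0 - N01.cdf(w)) - w)

def jlms_exponential(eps, theta, sigma_v, S=1):
    """E[u|eps] for the normal-exponential model."""
    w = S * eps / sigma_v + theta * sigma_v
    return sigma_v * (N01.pdf(w) / (1.0 - N01.cdf(w)) - w)
```

Both functions are positive for any residual (the normal hazard exceeds its argument), and for a production frontier a more negative residual implies a larger estimated inefficiency.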
E63.2.4 Application
The estimates below show a production frontier based on the six inputs. The second set of
results presents the heteroscedastic model, with the variance of v a function of the log of the average
stage length and the variance of u depending on the load factor and the log of the number of points
served. We examine the efficiency results, then compute the average partial effects of the
environmental variables on technical efficiency.
FRONTIER
FRONTIER
PARTIALS
KERNEL
PLOT
The figure below displays the kernel density estimators for the two sets of estimated
inefficiencies. The upper one is for the heteroscedastic model. The figure shows clearly the
influence of the heterogeneity. The means of the two distributions are virtually the same, but the
variance in the heteroscedastic model is considerably higher.
Figure E63.1 Kernel Estimators for Density of E[u|ε] with and without Heteroscedasticity
The log likelihood for the doubly heteroscedastic half normal model is
log Lᵢ = ½ log(2/π) − log σᵢ − ½(εᵢ/σᵢ)² + log Φ(−Sεᵢλᵢ/σᵢ)
where
λᵢ = σᵤᵢ/σᵥᵢ
σᵤᵢ² = exp(γ′zᵢ)
σᵥᵢ² = exp(δ′wᵢ),
where S = +1 for a production frontier and -1 for a cost frontier. Likewise, for the truncation model,
log Lᵢ = −½ log 2π − log σᵢ − ½[(Sεᵢ + μ)/σᵢ]²
         + log Φ[(μ/λᵢ − Sεᵢλᵢ)/σᵢ] − log Φ(μ/σᵤ,ᵢ).
We build the structure of the model with two freely varying variance parameters, σᵤ,ᵢ and σᵥ,ᵢ, rather
than the reduced form parameters λ and σ. The use of λᵢ as a free parameter would not be
appropriate because the numerator and denominator of λᵢ must be allowed to vary freely and
independently. A like consideration rules out the composed parameter σᵢ. The formulation of the
log likelihood and its derivatives follows the results given earlier for the homogeneous cases. Where
the derivatives with respect to λ and σ emerge, we use the chain rule to differentiate with respect to
σᵤ,ᵢ and σᵥ,ᵢ first. Note that the independent parameters σᵤ and σᵥ have been absorbed into the
exponential functions. Thus, σᵥ is exp(δ₀). This ensures that the variances are always positive.
The normal-gamma and normal-exponential models are not reparameterized. The log
likelihood for the exponential model with variance heterogeneity is
log Lᵢ = log θᵢ + θᵢ²σᵢ,ᵥ²/2 + θᵢSεᵢ + log Φ[−Sεᵢ/σᵢ,ᵥ − θᵢσᵢ,ᵥ]
where
θᵢ = exp(−γ′zᵢ)
and
σᵢ,ᵥ = σᵥ exp(δ′wᵢ).
The sign change in θᵢ is used to make the normal-exponential model comparable to the normal-half
normal model, since Var[uᵢ] = 1/θᵢ².
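The observation-level log likelihood just given can be sketched in Python (our own function name; θᵢ and σᵢ,ᵥ are computed from the variance-function lists as in the text):

```python
from statistics import NormalDist
import math

N01 = NormalDist()

def loglik_exponential_het(eps, z, w, gamma, delta, sigma_v, S=1):
    """Observation-level log likelihood for the normal-exponential model with
    theta_i = exp(-gamma'z_i) and sigma_v,i = sigma_v*exp(delta'w_i)."""
    theta_i = math.exp(-sum(g * zi for g, zi in zip(gamma, z)))
    sv_i = sigma_v * math.exp(sum(d * wi for d, wi in zip(delta, w)))
    return (math.log(theta_i) + 0.5 * (theta_i * sv_i) ** 2 + theta_i * S * eps
            + math.log(N01.cdf(-S * eps / sv_i - theta_i * sv_i)))
```

With constant-only lists and zero coefficients this collapses to the homogeneous normal-exponential log likelihood, which is one way to check an implementation.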
The specification of the cost frontier and the estimator of technical inefficiency are requested in the
same fashion,
; Cost
and
; Eff = variable name
Other optional parts of the command are the same as that for the normal-half normal model.
We note, this model is extremely volatile, owing to the rather weak identification of the
parameter μ. It is difficult to distinguish the mean from the variance parameter in this model. In the
truncation model,
E[uᵢ] = μ + σᵤ φ(μ/σᵤ)/Φ(μ/σᵤ).
This implies that σᵤ and μ can covary so as to produce little or no variation in the expectation of uᵢ.
The likelihood is not a function of the square of uᵢ, so this mean is the only source of information
about these two parameters. (By totally differentiating the expected value, one can solve for the
implicit relationship, dμ/dσᵤ, that produces dE[uᵢ] = 0.) The example below suggests how this aspect
of the model influences (or fails to influence) the estimates of inefficiency. For purposes of the JLMS
estimator for the half normal model, when the mean of U is a nonzero μ, the argument w to the Φ
function is replaced with
w = Sελ/σ − μ/(σλ).
The remaining part of the computation is the same.
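The weak identification argument can be illustrated numerically: for each σᵤ there is a μ that reproduces the same E[uᵢ], so the data can pin down the mean of uᵢ without pinning down the pair (μ, σᵤ). A Python sketch (our own code, using simple bisection and exploiting the fact that E[u] is increasing in μ):

```python
from statistics import NormalDist

N01 = NormalDist()

def mean_u(mu, sigma_u):
    """E[u] = mu + sigma_u*phi(mu/sigma_u)/Phi(mu/sigma_u) in the truncation model."""
    return mu + sigma_u * N01.pdf(mu / sigma_u) / N01.cdf(mu / sigma_u)

target = mean_u(0.5, 1.0)
pairs = []
for su in (0.8, 1.0, 1.2):
    lo, hi = -5.0, 5.0                  # bisect on mu to hold E[u] fixed
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if mean_u(mid, su) < target:
            lo = mid
        else:
            hi = mid
    pairs.append((0.5 * (lo + hi), su))  # distinct (mu, sigma_u) with equal E[u]
```

The three (μ, σᵤ) pairs are quite different, yet each implies the same expected inefficiency, which is exactly the near-flat direction of the likelihood described above.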
E63.3.1 Application
The results below show estimates of a stochastic cost frontier with the half normal and then the
truncated normal specifications. The additional parameterization appears to have had a large impact
on the results; the estimates are noticeably different. The plot of the two sets of inefficiency
estimates suggests that the effect of the new specification has been little more than to double the
estimated values from the model; the dashed line in the figure shows the function uTN = 2uHN. The
extremely large estimates of μ and its standard error do suggest that something is amiss with the
model, however.
The commands are:
FRONTIER
FRONTIER
PLOT
DSTAT
) + log( - di/).
The function is then maximized with respect to the reparameterized coefficients. After optimization,
the structural parameter is recovered by inverting the same transformation. For the model with
heterogeneity in the mean presented in Section E63.3.4,
μᵢ = δ′zᵢ
we simply replace μ with μᵢ = δ′zᵢ, then recover the parameter vector from the same transformation
as before.
For purposes of the JLMS estimator for the half normal model, when the mean of U is a
nonzero μ, the argument w to the Φ function is replaced with
w = Sελ/σ − μ/(σλ).
The remaining part of the computation is the same.
; All $
; loadfctr = 0 $
; i = Seq(firm) $
; Expand(i,0) $
; lk = Log(k) $
; xp = one,lf,lm,le,ll,lp,lk $
; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_ $
; Lhs = lq ; Rhs = xp ; Model = T ; Rh2 = loadfctr,_i_
; Het ; Hfv = lstage $
Note in this case, Rh2 and Hfu give the same list. To obtain the scaling model without forcing the
equality of the coefficient vectors in the mean and variance functions, use
FRONTIER
Note, ; Model = Scaling is used in the equality constrained case and ; Model = S when the equality
constraint is relaxed. (In this formulation, the variable lists could differ.) To constrain the mean
coefficients to equal zero, which just produces the heteroscedasticity model, use
FRONTIER
To constrain the variance coefficients to equal zero, you would use the available setup for the truncated
normal form, but with ; Model = S rather than ; Model = T to obtain the exponential scaling of the mean.
FRONTIER
Finally, with both coefficient vectors equal to zero, this is just the standard normal-truncated normal model.
Technical Details
The implementation of the scaling model in LIMDEP is just a version of the truncation
model with heteroscedasticity. The modifications of that model are:
The constant terms in the mean and variance are enforced by the program.
The mean function is exponential.
In the first form of the model, a constraint is imposed that the coefficients in the mean and
variance functions are the same.
As Alvarez et al. note in their paper, this model is not supported by any particular theory of the
frontier framework. They suggest it as a natural extension of the familiar model with truncation;
indeed, they argue that the unnatural form of the model would be the one with different scaling
factors in the mean and variance functions.
Application
To illustrate the scaling model, we use the airlines cost data. The cost function is fit with
truncation mean and variance functions that depend on the load factor and (log of) the average stage
length. The equality constraint is imposed in the first model and relaxed in the second.
FRONTIER
FRONTIER
The panel models developed here will share features with other panel models in LIMDEP, as
presented in Chapters R22-R25. As in other settings, panels in all models may be unbalanced. Panels
are identified by
SETPANEL ; Group = the group id variable ; Pds = the group size variable $
then
; Panel
in the command, or
; Pds = group count
Nearly all of the models to be presented here actually require panel data, but a few will work, albeit
not as well as otherwise, with ; Pds = 1, i.e., with a cross section. This will be specifically noted
below when it is the case. Second, in all models, the cost form as opposed to the production form is
requested with
; Cost
This and other model specifications are generally the same as the cross sectional cases.
Pitt and Lee is the default panel data model. The only necessary change for the default case is
specification of the panel with ; Panel. As in the cross section case, the normal-exponential case is
requested with
; Model = Exponential
while the normal-truncated normal is requested with
; Rh2 = one or ; Rh2 = one, additional variables
(The ; Model = T is not needed.) The truncation model may not be combined with the exponential
specification; it is only supported for the normal-truncated normal form.
NOTE: The gamma model does not have a random effects (panel data) version. The model
extensions, such as the scaling model and sample selection described in Chapter E63 likewise do not
support a Pitt and Lee style random effects version.
There is an important consideration for the truncation version with heterogeneous mean. If
you are fitting a panel data version of this model, note that the assumption underlying the model is
that the same ui occurs in every period. Therefore, the zi must be the same in every period.
LIMDEP will assume this is the case, and only use the Rh2 variables provided for the first period.
When the random effects model is estimated, maximum likelihood estimates of the cross
section models are always computed first to obtain the starting values. This will produce a full set of
results which will ignore the panel nature of the data set. A second full set of results will then follow
for the random effects model.
The model estimates retained for all cases are
b = regression parameters, β
varb = asymptotic covariance matrix.
Use ; Par to retain the additional parameters in b and varb. As seen in the applications below, the
parameters estimated in each case will differ depending on the model formulation. The ancillary
parameters that are estimated for the various models are the same ones saved by the cross section
versions. All models save sy, ybar, nreg, kreg, and logl as well as s, b, varb, etc.
WARNING: Numerous experiments and applications have suggested that the normal-truncated
normal model is a difficult one to estimate. Identification appears to be highly variable, and small
variations in the data can produce large variation in the results. The model often fails to converge
even when convergence of the restricted model with zero underlying mean is routine.
E64.3.2 Applications
The following illustrates a few of the numerous formats of the random effects frontiers. The
data set used is the Swiss railroad data used in Greene (2011, Table F19.1). These data are provided
with the program as swissrailroads.lpj. The variables used here are
ct
pk
pe
pl
q2
q3
rack
tunnel
virage
narrow_t
= total cost
= capital price
= electricity price
= labor price
= passenger output passenger km
= freight output ton km
= dummy variable for rack rail in network
= dummy variable for network with tunnels over 300 meters on average
= dummy variable for networks with narrow radius curvature
= dummy variable for narrow track (1m as opposed to standard 1.435m).
Preparing the data set includes bypassing one firm for which there is only a single year of data. For
the remaining 49 firms, Ti is a mixture of 3, 7, 10, 12 or 13. Figure E64.1 details the distribution of
group sizes.
Descriptive statistics for the data are shown below. Variables with names beginning with M are
firm means, repeated for each year for the firm.
We fit four models to illustrate the estimator: the pooled normal-half normal, the pooled
normal-truncated normal (heterogeneous), the basic Pitt and Lee model, and a full model with time
invariant inefficiency, truncation (heterogeneous) and double heteroscedasticity.
SETPANEL ; Group = id ; Pds = ti $
REJECT ; ti = 1 $
CREATE ; lple = Log(pl/pe) ; lpke = Log(pk/pe) ; lnc = Log(ct/pe) $
NAMELIST ; x = one,lnq2,lnq3,lple,lpke $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = eusfp_l $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Rh2 = rack,tunnel
         ; Het ; Hfu = virage ; Hfv = virage ; Costeff = eushet_t $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Rh2 = rack,tunnel
         ; Het ; Hfu = virage ; Hfv = virage ; Costeff = fullmodl $
--------+---------------------------------------------------------------------
Variable|        Mean     Std.Dev.      Minimum      Maximum   Cases  Missing
--------+---------------------------------------------------------------------
      ID|    25.48760     14.60037          1.0         51.0     605        0
    YEAR|    90.91570     3.692372         85.0         97.0     605        0
      NI|    12.58347     1.305259          1.0         13.0     605        0
   STOPS|    20.42479     18.48285          4.0        121.0     605        0
 NETWORK|    39431.66     56642.38       3898.0     376997.0     605        0
LABOREXP|    12801.95     26232.69        951.0     173549.0     605        0
   STAFF|    170.3810     333.0317         11.0       1934.0     605        0
 ELECEXP|    968.1521     1944.830         14.0      14737.0     605        0
     KWH|    7602.221     15608.39         82.0     104923.0     605        0
 TOTCOST|    22470.44     42283.57       1534.0     280871.0     605        0
NARROW_T|     .676033      .468375          0.0          1.0     605        0
    RACK|     .234711      .424169          0.0          1.0     605        0
  TUNNEL|     .188430      .391379          0.0          1.0     605        0
       T|    5.915702     3.692372          0.0         12.0     605        0
      Q1|    813914.0      1083923      61000.0      6409000     605        0
      Q2| .308145D+08  .550599D+08     409000.0  .311000D+09     605        0
      Q3| .101934D+08  .527303D+08        150.0  .477000D+09     605        0
      CT|    26728.37     49883.51     2120.968     307433.4     605        0
      PL|    86051.77     6484.535     60932.91     104930.4     605        0
      PE|     .157485      .022766      .076344      .265182     605        0
      PK|    4534.491     2128.307     1040.323     14466.06     605        0
  VIRAGE|     .715702      .451452          0.0          1.0     605        0
   LABOR|    52.40245     9.598136     20.03025     73.11581     605        0
    ELEC|    4.044504     1.422098      .568412     9.311660     605        0
 CAPITAL|    43.55305     9.461303     23.88916     77.33154     605        0
    LNCT|    11.30622     1.101691     9.462956     14.57019     605        0
    LNQ1|    13.06322     1.010039     11.01863     15.67321     605        0
    LNQ2|    16.31759     1.339167     12.92147     19.55500     605        0
    LNQ3|    12.49439     2.716709     5.010635     19.98343     605        0
   LNNET|    3.200860      .908512     1.360464     5.932237     605        0
    LNPL|    13.21935      .163565     12.60449     13.77599     605        0
    LNPE|   -1.859557      .152870    -2.572503    -1.327338     605        0
    LNPK|    10.17950      .438886     8.740266     11.37466     605        0
--------+---------------------------------------------------------------------
This is the original Pitt and Lee normal-half normal model with time invariant inefficiency.
In comparison to the pooled model above, σᵤ has tripled and σᵥ has decreased by two thirds. The
assumption of time invariance of the inefficiency produces a large reallocation of the random
components between noise and inefficiency. This is evident in the kernel estimate below as well.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LNC
Log likelihood function        527.11659
Estimation based on N =     604, K =  7
Inf.Cr.AIC  =  -1040.2 AIC/N =   -1.722
Stochastic frontier based on panel data
Estimation based on      49 individuals
Variances: Sigma-squared(v)=    .00621
           Sigma-squared(u)=    .92297
           Sigma(v)        =    .07879
           Sigma(u)        =    .96071
Sigma = Sqr[(s^2(u)+s^2(v)]=    .96394
Gamma = sigma(u)^2/sigma^2 =    .99332
Var[u]/{Var[u]+Var[v]}     =    .98183
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean:    0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0        -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1475.140
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
     LNC|  Coefficient       Error        z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|   -7.25643***      .24767   -29.30   .0000   -7.74185   -6.77101
    LNQ2|     .36259***      .01503    24.12   .0000     .33312     .39205
    LNQ3|     .01902***      .00240     7.94   .0000     .01432     .02372
    LPLE|     .64148***      .02112    30.38   .0000     .60009     .68287
    LPKE|     .30842***      .00700    44.08   .0000     .29471     .32214
        |Variance parameters for compound error
  Lambda|   12.1932**       5.55909     2.19   .0283     1.2975    23.0888
Sigma(u)|     .96071***      .13303     7.22   .0000     .69998    1.22145
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
This is the same model as immediately above, with the additional assumption that the
inefficiency is time invariant. Compared to the previous specification, σᵤ has now increased by a
factor of 30 while σᵥ has nearly vanished, falling from 0.27 to 0.005, that is, by a factor of 50.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LNC
Log likelihood function        532.94237
Estimation based on N =     604, K = 11
Inf.Cr.AIC  =  -1043.9 AIC/N =   -1.728
Variances: Sigma-squared(v)=    .00003
           Sigma-squared(u)=    .76238
           Sigma(u)        =    .87314
           Sigma(v)        =    .00543
Sigma = Sqr[(s^2(u)+s^2(v)]=    .87316
Variances averaged over observations
Stochastic frontier based on panel data
Estimation based on      49 individuals
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 1
Deg. freedom for truncation mean:    2
Deg. freedom for inefficiency model: 4
LogL when sigma(u)=0        -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1486.792
Kodde-Palm C*: 95%: 8.761, 99%: 12.483
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
     LNC|  Coefficient       Error        z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|   -7.26117***      .25317   -28.68   .0000   -7.75738   -6.76496
    LNQ2|     .36162***      .01558    23.20   .0000     .33107     .39216
    LNQ3|     .01947***      .00257     7.58   .0000     .01444     .02451
    LPLE|     .64342***      .02165    29.72   .0000     .60099     .68584
    LPKE|     .30730***      .00727    42.24   .0000     .29305     .32156
        |Mean of underlying truncated distribution
    RACK|     .81356         .52427     1.55   .1207    -.21399    1.84112
  TUNNEL|    1.46353***      .47072     3.11   .0019     .54094    2.38613
        |Scale parms. for random components of e(i)
ln_sgmaU|    -.17921         .21781     -.82   .4106    -.60611     .24769
ln_sgmaV|   -4.94678***      .20426   -24.22   .0000   -5.34711   -4.54644
        |Heteroscedasticity in variance of truncated u(i)
  VIRAGE|     .06076         .04703     1.29   .1964    -.03142     .15294
        |Heteroscedasticity in variance of symmetric v(i)
  VIRAGE|    -.37544         .44206     -.85   .3957   -1.24185     .49097
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
The kernel estimator compares the estimated cost efficiency distributions for the pooled and
basic Pitt and Lee model. The pattern suggested earlier is clearly evident. The same comparison
appears for the truncated normal/heteroscedasticity models. (The estimated cost efficiency results
for the basic Pitt and Lee model and the expanded one are the same to three or four digits.) The
partial listing below shows the estimates for the four models, noting the time invariance of the Pitt
and Lee estimates.
Let
λ² = σᵤ²/σᵥ²
ε̄ᵢ = (1/Tᵢ) Σₜ εᵢₜ
Aᵢ = 1 + λ²Tᵢ
hᵢ = −Sλ√Tᵢ ε̄ᵢ / (σᵥ√Aᵢ).
Then, the contribution of individual i to the log likelihood function for the normal-half normal model
is
log Lᵢ = log 2 − (Tᵢ/2) log 2π − Tᵢ log σᵥ − ½ log Aᵢ − ½ Σₜ₌₁Tᵢ εᵢₜ²/σᵥ² + ½ Tᵢhᵢ² + log Φ(hᵢ√Tᵢ).
The Jondrow estimator, as formulated in Battese and Coelli (1988), is as follows: Let
γᵢ = 1/(1 + λ²Tᵢ),
σᵢ² = σᵤ²γᵢ = (1 − γᵢ)(σᵥ²/Tᵢ),
Eᵢ = γᵢμ + (1 − γᵢ)(−Sε̄ᵢ),
and
ε̄ᵢ = (1/Tᵢ) Σₜ εᵢₜ.
Then, uᵢ given (εᵢ₁,...,εᵢTᵢ) has a truncated at zero normal distribution with mean Eᵢ and variance σᵢ²,
with μ = 0 in the half normal case.
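A Python sketch of this conditional mean computation (our own function name, following the Battese and Coelli (1988) form with μ = 0):

```python
from statistics import NormalDist

N01 = NormalDist()

def bc_inefficiency(resids, sigma_u, sigma_v, S=1):
    """E[u_i | eps_i1,...,eps_iT] for the Pitt and Lee half normal model;
    the group's T residuals are pooled through their mean."""
    T = len(resids)
    lam2 = (sigma_u / sigma_v) ** 2
    gamma_i = 1.0 / (1.0 + lam2 * T)
    ebar = sum(resids) / T
    E_i = (1.0 - gamma_i) * (-S * ebar)          # conditional mean (mu = 0)
    s_i = (sigma_u ** 2 * gamma_i) ** 0.5        # conditional std. deviation
    w = E_i / s_i
    return E_i + s_i * N01.pdf(w) / N01.cdf(w)   # mean of N+[E_i, s_i^2]
```

The estimate is a single time invariant value per firm, which is the defining feature of the Pitt and Lee specification discussed above.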
uᵢ = maxⱼ(aⱼ) − aᵢ ≥ 0.
(To change this to a cost frontier, change uᵢ to [aᵢ − minⱼ(aⱼ)].) This bears resemblance to a stochastic
frontier model, though in fact, it is a deterministic frontier model. The signature feature is that uᵢ
equals zero for the most efficient firm in the sample. A natural interpretation of this is that what
we measure with the model is not the absolute inefficiency, but the inefficiency of firm i relative to the
other firms in the sample. From the modeler's point of view, this approach has several substantive
advantages and disadvantages. As illustrated in the results below, this approach tends to produce
very large estimates of uᵢ.
The invariance assumption about ui has been criticized elsewhere. Attempts to relax this assumption
are a recurrent theme in the literature, including the Battese and Coelli and true fixed and random
effects approaches described later. Other early work on the model suggested direct manipulation of
the fixed effects, for example,
αit = αi0 + αi1 t + αi2 t².
Other more recent research (Han, Orea and Schmidt (2005)) has proposed factor analytic forms for αit. The sections to follow will include several of these different approaches.
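The CSS computation itself is just arithmetic on the estimated fixed effects. A minimal sketch, using hypothetical fixed effect values ai:

```python
import math

# Hypothetical LSDV fixed effect estimates a_i (illustrative values only)
a = {"firm1": -0.52, "firm2": -0.31, "firm3": -0.44}

a_max = max(a.values())
u = {i: a_max - ai for i, ai in a.items()}       # u_i = max_j(a_j) - a_i >= 0
te = {i: math.exp(-ui) for i, ui in u.items()}   # technical efficiency estimate

# For a cost frontier the comparison runs the other way:
a_min = min(a.values())
u_cost = {i: ai - a_min for i, ai in a.items()}
```

By construction the most efficient firm in the sample gets u = 0 and efficiency 1, which is the "relative inefficiency" interpretation discussed above.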
Application
This Cornwell, Schmidt and Sickles (CSS) approach requires only a linear fixed effects regression and a few instructions to manipulate the fixed effects. The following commands analyze the Swiss railroad data with this approach, computing the CSS estimates and comparing them to the unstructured pooled estimates (using the normal-half normal model from Chapter E62) and to the Pitt and Lee model introduced above:
SAMPLE    ; All $
CREATE    ; Railroad = id $
CREATE    ; If(railroad > 20) railroad = railroad - 1 $ (There is a gap in the data)
HISTOGRAM ; Rhs = railroad
          ; Title = Number of Observations for Firms in Swiss Railroad Sample $
SETPANEL  ; Group = id ; Pds = ti $
REJECT    ; ti = 1 $
FRONTIER  ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
CREATE    ; pooled = Group Mean(eusfpool, Pds = ti) $
FRONTIER  ; Lhs = lnc ; Cost ; Rhs = x ; Panel ; Costeff = pittlee $
REGRESS   ; Lhs = lnc ; Rhs = x ; Panel ; Fixed Effects $
CREATE    ; ai = alphafe(railroad) $
CALC      ; minai = Min(ai) $
CREATE    ; css = Exp((minai - ai)) $
CREATE    ; Period = Ndx(id,1) $
REJECT    ; period # 1 $
PLOT      ; Lhs = railroad ; Rhs = pooled,css ; Grid ; Fill ; Limits = 0,1
          ; Vaxis = Estimated Cost Efficiency
          ; Title = Half Normal vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $
PLOT      ; Lhs = railroad ; Rhs = css,pittlee ; Grid ; Fill ; Limits = 0,1
          ; Vaxis = Estimated Cost Efficiency
          ; Title = Pitt and Lee RE vs. Cornwell, Schmidt, Sickles FE Cost Efficiencies $
The results below show the considerable differences in the parameter estimates produced by the three models. Figure E64.4 demonstrates the expected, quite large differences between the time varying estimates (using the group means) and the time invariant results based on the CSS model. Figure E64.5 also shows a striking, albeit commonly observed, result: the CSS and Pitt and Lee estimates are virtually identical.
-----------------------------------------------------------------------------
LSDV least squares with fixed effects ....
LHS=LNC      Mean                  =      11.30305
             Standard deviation    =       1.09984
             No. of observations   =           604   Degrees of freedom
Regression   Sum of Squares        =       726.000            52
Residual     Sum of Squares        =       3.41179           551
Total        Sum of Squares        =       729.412           603
             Standard error of e   =        .07869
Fit          R-squared             =        .99532   R-bar squared =   .99488
Model test   F[ 52,   551]         =    2254.77325   Prob F > F*   =   .00000
Diagnostic   Log likelihood        =     706.21504   Akaike I.C.   = -5.00084
             Restricted (b=0)      =    -914.01557   Bayes I.C.    = -4.61443
             Chi squared [ 52]     =    3240.46122   Prob C2 > C2* =   .00000
             Estd. Autocorrelation of e(i,t)  =      .668792
-----------------------------------------------------------------------------
Panel: Groups   Empty     0,   Valid data    49
                Smallest  3,   Largest       13
                Average group size in panel 12.33
Variances       Effects a(i)      Residuals e(i,t)
                .423441           .006192
--------+--------------------------------------------------------------------
        |                 Standard            Prob.      95% Confidence
     LNC|  Coefficient      Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
    LNQ2|     .29374***     .02850    10.31  .0000     .23789    .34959
    LNQ3|     .01612***     .00543     2.97  .0030     .00547    .02676
    LPLE|     .66452***     .03580    18.56  .0000     .59434    .73469
    LPKE|     .31777***     .01863    17.05  .0000     .28125    .35430
--------+--------------------------------------------------------------------
(These are the estimated parameters in the estimated pooled stochastic frontier model.)
Constant|   -10.0907***    1.14284    -8.83  .0000   -12.3306   -7.8507
    LNQ2|     .64179***     .01371    46.80  .0000     .61491    .66867
    LNQ3|     .06855***     .00655    10.46  .0000     .05570    .08139
    LPLE|     .53971***     .08858     6.09  .0000     .36610    .71333
    LPKE|     .26045***     .03260     7.99  .0000     .19655    .32435
        |Variance parameters for compound error
  Lambda|    1.29697***     .13854     9.36  .0000    1.02545   1.56850
   Sigma|     .44345***     .00056   789.05  .0000     .44235    .44455
(These are the estimated parameters in the estimated Pitt and Lee model.)
        |Deterministic Component of Stochastic Frontier Model
Constant|   -7.25643***     .24767   -29.30  .0000   -7.74185  -6.77101
    LNQ2|     .36259***     .01503    24.12  .0000     .33312    .39205
    LNQ3|     .01902***     .00240     7.94  .0000     .01432    .02372
    LPLE|     .64148***     .02112    30.38  .0000     .60009    .68287
    LPKE|     .30842***     .00700    44.08  .0000     .29471    .32214
        |Variance parameters for compound error
  Lambda|    12.1932**     5.55909     2.19  .0283     1.2975   23.0888
Sigma(u)|     .96071***     .13303     7.22  .0000     .69998   1.22145
Figure E64.5 Estimated Inefficiencies from Cornwell et al. and Pitt and Lee Models
uit = g(zit) |Ui|.
Several formulations are available. In Battese and Coelli's original formulation, the distribution was half normal and the base specification was
g(zit) = exp[-η(t − T)]
where T is the number of periods in their balanced panel. (Here it would be Ti.) They also suggested
g(zit) = exp[-η1(t − T) − η2(t − T)²].
The first (linear) form is taken to be the default case for this model. The second is not provided in this package. The BC92 model is requested by adding
; Model = BC ; Panel
to the FRONTIER command.
E64.5.1 Application
To illustrate the Battese and Coelli models, we return to the railroad data used previously.
The base case is the pooled data stochastic cost frontier. This is followed by the Pitt and Lee model
and, finally, by the original Battese and Coelli time decay model,
g(zit) = exp[-η(t − Ti)].
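The decay factor is easy to tabulate. A sketch of the BC92 weight (our own helper, not a LIMDEP function):

```python
import math

def bc92_g(t, Ti, eta):
    """Battese-Coelli (1992) decay factor g(z_it) = exp(-eta*(t - Ti))."""
    return math.exp(-eta * (t - Ti))

# With eta > 0, u_it = g(z_it)*|U_i| starts above |U_i| and declines
# monotonically until it equals |U_i| in the last period t = Ti.
path = [bc92_g(t, 5, 0.1) for t in range(1, 6)]
```

A negative η reverses the pattern, so inefficiency grows over the observation period instead.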
The commands are
SAMPLE   ; All $
REJECT   ; ti = 1 $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Costeff = eusfpool $
FRONTIER ; Lhs = lnc ; Cost ; Rhs = x ; Model = BC ; Panel ; Costeff = eucbc92 $
DSTAT    ; Rhs = eucbc92,eusfpool $
KERNEL   ; Rhs = eucbc92,eusfpool
         ; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pooled $
KERNEL   ; Rhs = eucbc92,pittlee
         ; Title = Estimated Cost Efficiencies - Battese-Coelli 1992 vs. Pitt and Lee $
The kernel density estimators are used to compare the efficiency estimates from the pooled data model to those from the Battese and Coelli model. The estimates of exp(-E[uit|εi]) from the Battese and Coelli model are far larger than those from the pooled model. The assumption of time invariance of the random term is a major component of this model. The second kernel estimator below compares Battese-Coelli to Pitt-Lee. The correspondence of the two results is striking, albeit to be expected given the small estimated value of η.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                LNC
Log likelihood function    -209.42340
Estimation based on N =  604, K =  7
Inf.Cr.AIC  =   432.8  AIC/N =  .717
Variances: Sigma-squared(v) =   .07332
           Sigma-squared(u) =   .12333
           Sigma(v)         =   .27077
           Sigma(u)         =   .35119
Sigma = Sqr[(s^2(u)+s^2(v)] =   .44345
Gamma = sigma(u)^2/sigma^2  =   .62716
Var[u]/{Var[u]+Var[v]}      =   .37937
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean:    0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0         -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] =    2.060
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
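The chi squared statistic in this listing can be reproduced directly from the two log likelihoods it reports. Because u ≥ 0, the statistic is compared with the Kodde and Palm one-sided critical value rather than the usual chi squared value:

```python
# Log likelihoods from the output above
logl_sf  = -209.42340     # stochastic frontier
logl_ols = -210.45352     # least squares, sigma(u) = 0

lr = 2.0 * (logl_sf - logl_ols)   # LR statistic for one restriction
reject_95 = lr > 2.706            # Kodde-Palm 95% critical value
```

Here the statistic (2.060) falls short of 2.706, so the hypothesis of no inefficiency is not rejected at the 95% level in this model.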
--------+--------------------------------------------------------------------
Variable|      Mean      Std.Dev.     Minimum     Maximum     Cases  Missing
--------+--------------------------------------------------------------------
 EUCBC92|   .514566      .231680     .085140     .982112       604        0
EUSFPOOL|   .760991      .095229     .478178     .906348       604        0
--------+--------------------------------------------------------------------
Figure E64.6 Kernel Density Estimates for Inefficiencies from Battese and Coelli Model
The log likelihood is built up from the general form uit = git|Ui|, where git = g(zit). Define

σ² = σu² + σv²,  γ = σu²/σ²,
εit = yit − β′xit,
μ = 0, or μ0, or θ′wi  (half normal, truncated normal, or heterogeneous truncated normal mean),
Ai = (1 − γ) + γ Σt git².

Then the contribution of firm i to the log likelihood is

log Li = −(Ti/2) log 2π − (Ti/2) log σ² − ((Ti − 1)/2) log(1 − γ) − (1/2) log Ai
         − log Φ(μ/√(γσ²)) + log Φ(μi*/σi*)
         − Σt εit²/(2(1 − γ)σ²) + (1/2)[μi*²/σi*² − μ²/(γσ²)],

with μi* and σi* defined below.
Derivatives of this function are complicated in the extreme, and are omitted here. (Some useful
results for obtaining them are found in Battese and Coelli (1992, 1995).)
The Jondrow estimator of uit is

E[uit | εi1, εi2, ...] = git E[ui | εi1, εi2, ...]
                       = git [ μi* + σi* φ(μi*/σi*) / Φ(μi*/σi*) ]
where
μi* = [(1 − γ)μ − γ Σt git εit] / [(1 − γ) + γ Σt git²],
σi*² = γ(1 − γ)σ² / [(1 − γ) + γ Σt git²].
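The Jondrow step in the (γ, σ²) parameterization can be sketched as follows; this is again an illustration with our own function names, not LIMDEP's code:

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bc_jondrow(eps, g, mu, gamma, sigma2):
    """g_it * E[u_i | eps_i] for the Battese-Coelli form u_it = g_it|U_i|,
    production frontier sign convention."""
    denom = (1.0 - gamma) + gamma * sum(gi * gi for gi in g)
    mu_star = ((1.0 - gamma) * mu
               - gamma * sum(gi * ei for gi, ei in zip(g, eps))) / denom
    sig_star = math.sqrt(gamma * (1.0 - gamma) * sigma2 / denom)
    z = mu_star / sig_star
    Eu = mu_star + sig_star * norm_pdf(z) / norm_cdf(z)
    return [gi * Eu for gi in g]
```

When git = 1 for all t and μ = 0 the expression reduces to the Pitt and Lee half normal estimator of the previous sections, which is a useful consistency check.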
The default form used earlier is g(zit) = exp[-η(t − Ti)]. You may also use a more general form,
g(zit) = exp(η′zit)
where zit contains any desired set of variables. For this extension, use
FRONTIER
As before, the truncated normal version of the model is also supported. For an example, we have
used
FRONTIER
The estimates of cost efficiency produced by this model are identical to those from the base model in
the previous section.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                LNC
Log likelihood function     529.63533
Stochastic frontier based on panel data
Estimation based on  49 individuals
Variances: Sigma-squared(v) =   .00615
           Sigma-squared(u) =   .94808
           Sigma(v)         =   .07840
           Sigma(u)         =   .97369
Sigma = Sqr[(s^2(u)+s^2(v)] =   .97685
Gamma = sigma(u)^2/sigma^2  =   .99356
Var[u]/{Var[u]+Var[v]}      =   .98247
Stochastic Cost Frontier Model, e = v+u
Battese-Coelli Models: Time Varying uit
Time varying uit=exp[eta*z(i,t)]*|U(i)|
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 3
Deg. freedom for truncation mean:    0
Deg. freedom for inefficiency model: 4
LogL when sigma(u)=0         -210.45352
Chi-sq=2*[LogL(SF)-LogL(LS)] = 1480.178
Kodde-Palm C*: 95%: 8.761, 99%: 12.483
--------+--------------------------------------------------------------------
        |                 Standard            Prob.      95% Confidence
     LNC|  Coefficient      Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|   -6.89845***     .32923   -20.95  .0000   -7.54374  -6.25316
    LNQ2|     .35751***     .01591    22.47  .0000     .32632    .38870
    LNQ3|     .02149***     .00236     9.10  .0000     .01686    .02613
    LPLE|     .61741***     .02430    25.40  .0000     .56977    .66504
    LPKE|     .30892***     .00759    40.71  .0000     .29405    .32380
        |Variance parameters for compound error
  Lambda|    12.4202***     .01108  1120.76  .0000    12.3984   12.4419
Sigma(u)|     .97369***     .13513     7.21  .0000     .70884   1.23855
        |Coefficients in u(i,t)=[exp{eta*z(i,t)}]*|U(i)|
    RACK|     .00024        .01743      .01  .9889    -.03392    .03441
  VIRAGE|    -.02096        .01321    -1.59  .1126    -.04685    .00493
  TUNNEL|     .00219        .01625      .14  .8926    -.02966    .03405
--------+--------------------------------------------------------------------
uit = | N[0, σu²] |.
This model (like the others) is fit by maximum likelihood, not least squares. The normal-half normal model is applied to the stochastic part of the model. Note that the inefficiency term in this model is time varying. The heterogeneity may appear in Stevenson's truncated normal model as follows. This is a true fixed effects, normal-truncated normal model.
yit = αi + β′xit + θ′zi + vit − uit,
uit = | N[μ, σu²] |.
In this form, the heterogeneity is still retained in the production function part of the model. Another possibility is to allow the heterogeneity to enter the mean of the inefficiency distribution rather than the production function; this seems the most natural of the three forms. In this case,
yit = β′xit + vit − uit,
uit = | N[μit, σu²] |,
μit = μi + μ0 (a nonzero constant) or μi + θ′zit.
The mean of the inefficiency distribution shifts in time, but also has a firm specific component.
Finally, the heterogeneity may be shifted to the variance of the inefficiency distribution. In this form, we have
yit = β′xit + vit − uit,
uit = | N[0, σui²] |.
Matrices:   b       = estimate of β
            varb    = asymptotic covariance matrix for estimate of β
            alphafe = estimated fixed effects (if ; Par is in the command)
Scalars:    kreg, nreg, logl
Last Model: b_variables
The model must be fit twice. The first model is a pooled data model which provides the starting values
for the second. The second command is identical to the first save for the addition of the panel data
specification. In order to set up the initial values correctly, it is essential that your initial model include
the constant term first in the Rhs list and that the second model specification be identical to the first.
Other options and specifications for the fixed effects models are the same as in other applications. (See
Chapter R23 for details.) The fixed effects command also contains the constant term, but this will be
removed by the command processor later. See the example below for the operation of the command.
NOTE: Starting values must be provided by the first estimator. The specification ; Start = list of
values is not available for this model. You must fit both models each time you fit an FEM. The
starting values are not retained after the FEM is estimated.
All fixed effects forms are estimated by maximum likelihood. You may also fit a two way fixed effects model
yit = αi + γt + β′xit + vit − ui  (change to v + u for a stochastic cost frontier),
ui = | N[0, σu²] |
where γt is an additional, time (period) specific effect. The time specific effect is requested by adding
; Time
to the command if the panel is balanced, and
; Time = variable name
if the panel is unbalanced.
For the unbalanced panel, we assume that, overall, the sample observation period is t = 1,2,...,Tmax and that the time variable gives, for the specific group, the particular values of t that apply to its observations. Thus, suppose your overall sample covers five periods. The first group has three observations, in periods 1, 2 and 4, while the second group has four observations, in periods 2, 3, 4 and 5. Then, your panel specification would be
; Pds = Ti
; Time = Pd
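Building the two companion variables for this example can be sketched as follows (the group labels are hypothetical):

```python
# Overall window t = 1,...,5; each group lists the periods it is observed in
periods = {"group1": [1, 2, 4], "group2": [2, 3, 4, 5]}

rows = []
for gid, pds in periods.items():
    Ti = len(pds)               # the ; Pds = Ti value, repeated for the group
    for t in pds:
        rows.append({"id": gid, "ti": Ti, "pd": t})   # pd feeds ; Time = Pd
```

The time variable pd carries the calendar position of each row within the overall window, while ti carries the group's own observation count.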
This command recovers the estimated fixed effects from the Cornwell et al. model, then replicates them for each year in the data set. This is used to create the plot of the two sets of estimates of ui shown below.
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                 LQ
Log likelihood function     108.43918
Estimation based on N =  256, K =  9
Inf.Cr.AIC  =  -198.9  AIC/N = -.777
Model estimated: Aug 17, 2011, 06:36:42
Variances: Sigma-squared(v) =   .01902
           Sigma-squared(u) =   .01692
           Sigma(v)         =   .13791
           Sigma(u)         =   .13007
Sigma = Sqr[(s^2(u)+s^2(v)] =   .18957
Gamma = sigma(u)^2/sigma^2  =   .47074
Var[u]/{Var[u]+Var[v]}      =   .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):   1
Deg. freedom for heteroscedasticity: 0
Deg. freedom for truncation mean:    0
Deg. freedom for inefficiency model: 1
LogL when sigma(u)=0          108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
-----------------------------------------------------------------------------
LSDV least squares with fixed effects ....
LHS=LQ       Mean                  =      -1.11237
             Standard deviation    =       1.29728
             No. of observations   =           256   Degrees of freedom
Regression   Sum of Squares        =       426.103            30
Residual     Sum of Squares        =       3.04876           225
Total        Sum of Squares        =       429.152           255
             Standard error of e   =        .11640
Fit          R-squared             =        .99290   R-bar squared =   .99195
Model test   F[ 30,   225]         =    1048.21999   Prob F > F*   =   .00000
Diagnostic   Log likelihood        =     203.84835   Akaike I.C.   = -4.18825
             Restricted (b=0)      =    -429.37729   Bayes I.C.    = -3.75896
             Chi squared [ 30]     =    1266.45126   Prob C2 > C2* =   .00000
             Estd. Autocorrelation of e(i,t)  =      .575211
-----------------------------------------------------------------------------
Panel: Groups   Empty     0,   Valid data    25
                Smallest  2,   Largest       15
                Average group size in panel 10.24
Variances       Effects a(i)      Residuals e(i,t)
                .030410           .013550
--------+--------------------------------------------------------------------
        |                 Standard            Prob.      95% Confidence
      LQ|  Coefficient      Error       t    |t|>T*         Interval
--------+--------------------------------------------------------------------
      LF|     .14860        .09677     1.54  .1259    -.04107    .33828
      LM|     .80497***     .07843    10.26  .0000     .65125    .95868
      LE|     .68672        .67075     1.02  .3069    -.62792   2.00136
      LL|    -.15977        .11829    -1.35  .1780    -.39162    .07208
      LP|     .16227        .09973     1.63  .1050    -.03320    .35774
      LK|    -.37897        .74689     -.51  .6123   -1.84284   1.08490
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
Figure E64.8 plots the Jondrow estimates of exp(-E[uit|εit]) from the true fixed effects model
and the estimates of ui from the Cornwell, Schmidt and Sickles model of Section E64.4 for each
firm. Since the true FE estimates vary by period, we have plotted the group means. The implication
of the regression based model is clear in the figure. The estimates of technical efficiency from the
true FEM are generally considerably larger than those from the deterministic model.
The model is
yit = αi + β′xit + vit − uit,
uit = | N[μit, σu²] |,
μit = μ0 (nonzero) or θ′zit.
It is requested with a pair of FRONTIER commands, the first (with no panel specification) providing the starting values for the second.
The ; Rh2 specification is optional in the first command if you have only a constant term in the mean of the truncated distribution. But you should include it nonetheless, to ensure the match between the first and second commands. Also, it is essential that both Rhs and Rh2 include constant terms in the first positions.
To move the heterogeneity to the mean of the underlying truncated normal distribution, use the same pair of FRONTIER commands for the model
yit = β′xit + vit − uit,
uit = | N[μit, σu²] |,
μit = αi + θ′zit.
Note that this version differs from the earlier one only in the presence of ; Model = T in the second
form and its absence in the first. Again, the variable specifications in the two commands must be
identical, and both must include constant terms in the first position in both lists. As before, you may
use ; Rh2 = one if you do not require variables zit in the mean. (This constant term will be removed
from the fixed effects model, but this common value is used as the starting value for the firm specific
estimates.)
We note that we have had scant success with this model even with a carefully constructed data set and good starting values. The problem appears to be Newton's method, which must be used by the general fixed effects program of which this is a part. If you have a small panel with no more than 100 groups, an alternative approach appears to work better. You may provide a stratification variable in the cross section template to request that a set of dummy variables be inserted directly into the function.
The stratification variable must take the full set of values 1 to N (up to 100), and all groups must have at least two observations. For the second form, with the heterogeneity embedded in the mean
of the truncated normal distribution, add
; Mean
to the command.
This provides four possible forms of the model, which we illustrate with the airline data:
NAMELIST
; x = one,lf,lm,le,ll,lp,lk $
This is a true fixed effects model with normal-truncated normal structure for uit.
FRONTIER
; Lhs = lq ; Rhs = x
; Model = T
; Str = firm $
This model is the same as the preceding one except that now μi = θ1 + θ2 loadfctri.
FRONTIER
; Lhs = lq ; Rhs = x
; Model = T
; Rh2 = one,loadfctr
; Str = firm $
This is a true fixed effects model with the fixed effects appearing in μi rather than in the production function.
FRONTIER
; Lhs = lq ; Rhs = x
; Model = T
; Mean
; Str = firm $
This model is the same as the preceding model except that loadfctr now also appears in the mean of
the truncated variable.
FRONTIER
; Lhs = lq ; Rhs = x
; Model = T
; Rh2 = one,loadfctr ; Mean
; Str = firm $
uit = | N[0, σuit²] |,  σuit² = σu² exp(δ′zit)
and, in the doubly heteroscedastic form,
uit = | N[0, σuit²] |,
vit ~ N[0, σvit²],
σvit² = σv² exp(θ′wit).
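Both variance functions are simple exponential index functions, which guarantees positivity of the variances for any parameter values. A sketch (the parameter names are our own):

```python
import math

def var_u(su2, delta, z):
    """sigma_u(it)^2 = sigma_u^2 * exp(delta'z_it): positive by construction."""
    return su2 * math.exp(sum(d * zi for d, zi in zip(delta, z)))

def var_v(sv2, theta, w):
    """sigma_v(it)^2 = sigma_v^2 * exp(theta'w_it)."""
    return sv2 * math.exp(sum(th * wi for th, wi in zip(theta, w)))
```

With δ = 0 (or θ = 0) the corresponding variance collapses to the homoscedastic base value, which is the restriction tested by the heteroscedasticity degrees of freedom in the output.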
The command would be
FRONTIER
To continue the earlier example, the following fits a model of heteroscedasticity to the
airline data. The first model has heteroscedasticity and the fixed effects in the variance of ui. The
second is doubly heteroscedastic, again with the fixed effects in the variance of ui.
NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x
         ; Het ; Hfu = one,loadfctr ; Hfv = one ; Str = firm $
FRONTIER ; Lhs = lq ; Rhs = x
         ; Het ; Hfu = one,loadfctr ; Hfv = one,loadfctr ; Str = firm $
Details on estimating random parameters models appear in Chapter R24, so they will be omitted
here.
The command structure for the true random effects model is similar to that for the true fixed effects model. The frontier model must be fit twice, with a pair of FRONTIER commands: first with no effects to generate the starting values, then with the effect specified.
Application
To illustrate the true random effects model, we continue the analysis of the airline data. The
commands below estimate the pooled model, then the true RE model. In like fashion to the analysis
of fixed effects, we then compare the true random effects estimates of inefficiency to the Pitt and Lee
estimates. Figure E64.8 illustrates the general result that the estimated inefficiencies in the true fixed
effects model will differ considerably from those produced by the Cornwell et al. approach to fixed
effects. Figure E64.9 shows the same result for the two approaches to random effects. Numerous
studies in the literature (see Greene (2005) for discussion) have documented the similarity of the
random and fixed approaches when the same overall structure is used. Thus, Figure E64.10 shows
similar results for the true fixed and random effects models and for the Pitt and Lee and Cornwell et
al. models.
NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; Eff = uplre $
FRONTIER ; Lhs = lq ; Rhs = x ; Par $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; RPM ; Eff = utre
         ; Fcn = one(n) ; Pts = 50 ; Halton $
FRONTIER ; Lhs = lq ; Rhs = x ; Par $
FRONTIER ; Lhs = lq ; Rhs = x ; Panel ; FEM ; Eff = utfe $
DSTAT    ; Rhs = uplre,utre $
CREATE   ; utrebar = Group Mean(utre, Str = firm) $
PLOT     ; Lhs = uplre ; Rhs = utrebar ; Grid
         ; Title = Group Means of u(i,t) vs. Time Invariant u(i) $
PLOT     ; Lhs = utfe ; Rhs = utre ; Grid
         ; Title = Time Varying FE u(i) vs. Time Varying RE u(i) $
These are the estimates of the true random effects model. Note that the variation of the
random terms in the model has been rearranged. In the pooled model, sv = 0.138 and su = 0.130. In
the random effects model, we have sv = .099 and su= .100. But, sw = .140. The proportional
allocation of the total to u and v has stayed roughly the same, but some additional variation is now
attributed to the random effect. Note that the production function parameters have changed
substantially as well.
Figure E64.10 Comparison of Time Varying Fixed and Random Effects Estimates
uit = | N[μit, σuit²] |,
μit = θi′mit,
σuit² = σu² exp(δi′wit).
The model allows, all at once, a half normal or truncated normal distribution for uit and firmwise and/or timewise heteroscedasticity in uit. The model form allows parameters to be random in all three parts of the specification, with the single restriction noted below. (Only the variance of the disturbance vit is assumed to be constant. In addition, this model form does not accommodate heteroscedasticity in vit.) As will be clear in what follows, the true random effects model developed in the previous section is a special case of this model with nonrandom parameters in μit and σuit² and only a random constant term in βi.
NOTE: The random parameters normal-truncated normal model with heteroscedasticity (in uit) at
the same time is not identified. Only one of these two should be specified. The command parser
will not prevent you from specifying such a model, but it will ultimately be impossible to obtain the
parameter estimates.
The general structure of the random parameters stochastic frontier model is based on the conditional density
f(yit | xit, βi) = f(yit − βi′xit),  i = 1,...,N, t = 1,...,Ti,
where
βi = β + Δzi + Γvi
and f(.) is the density for the stochastic frontier regression model. The model assumes that the parameters are randomly distributed with possibly heterogeneous (across individuals) means
E[βi | zi] = β + Δzi.
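One draw from this parameter distribution is easy to simulate. A sketch (our own helper, with Γ taken lower triangular so that ΓΓ′ is the conditional covariance of βi):

```python
import random

def draw_beta_i(beta, Delta, Gamma, z_i, rng):
    """One draw of beta_i = beta + Delta*z_i + Gamma*v_i, v_i ~ N(0, I)."""
    K = len(beta)
    v = [rng.gauss(0.0, 1.0) for _ in range(K)]
    draw = []
    for k in range(K):
        mean_k = beta[k] + sum(Delta[k][j] * z_i[j] for j in range(len(z_i)))
        # lower triangular Gamma: only elements Gamma[k][0..k] are used
        draw.append(mean_k + sum(Gamma[k][j] * v[j] for j in range(k + 1)))
    return draw
```

The simulated likelihood averages the frontier density over many such draws per individual (Halton sequences replace the pseudo-random v in the estimator described here).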
(Note, again, only one of the two optional specifications noted should be specified.)
NOTE: For this model, your Rhs list must include a constant term. Though not strictly necessary,
you should also include constants in Rh2 or Hfn if they are specified.
The difference in the three formulations is in the enclosures: ( ) for the production function, [ ] for the mean of the truncated distribution, and < > for the variance of the one sided disturbance. This distinction is necessary because the lists might have variables in common, and this is the only way to distinguish them. In particular, it is likely that all three lists would include one, so this device is used to distinguish the three functions.
Three distributions may be specified. All random variables have mean 0:
n = standard normal distribution, variance = 1,
t = triangular (tent shaped) distribution on [-1,+1], variance = 1/6,
u = standard uniform distribution on [-1,+1], variance = 1/3.
Note that each of these is scaled as it enters the distribution, so the variance is only that of the
random draw before multiplication. (See Chapter R23 for discussion of this computation and for
other distributions that can be specified.) The latter two distributions are provided as one may wish
to reduce the amount of variation in the tails of the distribution of the parameters across individuals
and to limit the range of variation. (See Train (2010) for discussion.) For example, to specify that
the constant term and the coefficient on x1 are normally distributed with fixed mean and variance,
and a normally distributed constant in the mean of the truncated distribution, you might use
; Fcn = one(n), x1(n), one[n]
This specifies that the first and second coefficients are random while the remainder are not. The
parameters estimated will be the mean and standard deviations of the distributions of these two
parameters and the fixed values of the other three.
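The stated variances of the triangular and uniform primitives are easy to confirm by simulation:

```python
import random

def mc_variance(draw, n, seed=42):
    """Monte Carlo estimate of the variance of a random draw function."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n)]
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

var_t = mc_variance(lambda r: r.triangular(-1.0, 1.0, 0.0), 200000)  # ~ 1/6
var_u = mc_variance(lambda r: r.uniform(-1.0, 1.0), 200000)          # ~ 1/3
```

Since the estimator multiplies each primitive draw by a free scale parameter, only the shape (and bounded support of t and u) matters for the fitted model, not these raw variances.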
NOTE: If you use the wrong enclosures for the variables, a diagnostic will appear that the program
does not recognize a variable. For example:
FRONTIER
The reason for the diagnostic is that the lf[n] would indicate a specification for the truncation model,
using ; Rh2 = list. But, this command specifies only heteroscedasticity, which is denoted with <>
enclosures. Hence, when the lf[n] is encountered, LIMDEP searches for lf in an Rh2 list, and finding
no such list, issues the diagnostic.
Matrices:   b      = estimate of β
            varb   = asymptotic covariance matrix for estimate of β
            beta_i = individual specific parameters, if ; Par is requested
Scalars:    kreg, nreg, logl
Last Model: b_variables
Application
We continue the earlier application by fitting the stochastic frontier model with random parameters. The random parameters truncation model appears to be unidentified in these data, so the second model fit is with heteroscedasticity. In the first model, the constant and one of the production coefficients are specified to be random. In the second, these two coefficients and the parameter on the variable that enters the variance function are all taken to be random. The kernel density estimators compare the efficiency estimates from the random parameters model to those from the simplest pooled estimator.
NAMELIST ; x = one,lf,lm,le,ll,lp,lk $
FRONTIER ; Lhs = lq ; Rhs = x ; Eff = u $
FRONTIER ; Lhs = lq ; Rhs = x
         ; RPM ; Panel ; Pts = 50 ; Halton ; Fcn = one(n),lf(n) ; Eff = urp1 $
KERNEL   ; Rhs = urp1,u $
FRONTIER ; Lhs = lq ; Rhs = x $
FRONTIER ; Lhs = lq ; Rhs = x ; Hfn = one,loadfctr
         ; RPM ; Panel ; Pts = 50 ; Halton
         ; Fcn = one(n),lf(n),loadfctr<n> $
Figure E64.11 shows the distributions of the estimates of inefficiencies from the random parameters
model and the simple, pooled fixed parameters model. The figure suggests that the RP formulation
is moving some of the variation of the outcome variable out of the inefficiency term and into the
production model, in the form of parameter variation.
Figure E64.11 Kernel Density Estimator for Random Parameters Model Inefficiencies
-----------------------------------------------------------------------------
Random Coefficients FrntrTrn Model
Dependent variable                 LQ
Log likelihood function     199.14429
Estimation based on N =  256, K = 13
Unbalanced panel has  25 individuals
Stochastic frontier, truncation/hetero.
Simulation based on  50 Halton draws
Estimated parameters of efficiency dstn
s(u) =  .189842   s(v) =  .07165
avgE[u|e] =  .10986   avgE[TE|e] = .90303
Lambda = su/sv =  2.64974
--------+--------------------------------------------------------------------
        |                 Standard            Prob.      95% Confidence
      LQ|  Coefficient      Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Nonrandom parameters
      LM|     .62243***     .04223    14.74  .0000     .53966    .70521
      LE|     .38353        .28063     1.37  .1717    -.16649    .93355
      LL|    -.36579***     .03589   -10.19  .0000    -.43614   -.29544
      LP|     .15282***     .04217     3.62  .0003     .07017    .23547
      LK|    -.16125        .31392     -.51  .6075    -.77652    .45401
   suONE|    9.05239***    1.65934     5.46  .0000    5.80014  12.30464
        |Means for random parameters
Constant|   -1.17144***     .29799    -3.93  .0001   -1.75549   -.58739
      LF|     .49011***     .04904     9.99  .0000     .39398    .58623
suLOADFC|   -16.4160***    3.47560    -4.72  .0000   -23.2281    -9.6039
        |Scale parameters for dists. of random parameters
Constant|     .12591***     .00859    14.65  .0000     .10906    .14275
      LF|     .01186**      .00593     2.00  .0456     .00023    .02350
suLOADFC|    1.47653***     .36192     4.08  .0000     .76718   2.18589
        |Sigma(v) from symmetric disturbance
Sigma(v)|     .07165***     .00670    10.69  .0000     .05851    .08478
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------
αi = α + θ1 wi + θ2 (½ wi²),
βk,i = βk + θk wi,
wi ~ N[0,1],
vit ~ N[0, σv²],
uit = | N[0, σu²] |.
This model is specified simply by creating the necessary variables, then building a random
parameters model with the two additional specifications,
; Common ; Mgt
The ; Common specification alone is generic, and applies to all random parameters models. Use it
to specify that the same random component appears in all random parameters. The ; Mgt
specification has no function outside the frontier model. It is used only with the frontier model to
specify this particular form. For example, consider the following three factor translog model:
FRONTIER
FRONTIER
(It is always necessary to fit the frontier model with fixed parameters first to generate the starting
values.)
An extension of this model that the authors considered was intended to ameliorate the
probable correlation between the random effect wi and the independent variables (factors). The
Mundlak approach to this problem is to incorporate the group means of the variables in the model.
For this model, they proposed
wi = Σk θk log x̄i,k + fi,  k = 1,...,K,
where fi is now the structural random variable that drives the random parameters. This extension is
requested with
; Means
(The program deduces internally which variables are nonconstant and should be used.)
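The group means themselves are ordinary panel arithmetic. A sketch with hypothetical data:

```python
def group_means(panel):
    """Means of the log inputs by firm: the log xbar_{i,k} terms in the
    Mundlak device w_i = sum_k theta_k * log xbar_{i,k} + f_i."""
    groups = {}
    for firm, logx in panel:          # logx = [log x_1, ..., log x_K]
        groups.setdefault(firm, []).append(logx)
    return {firm: [sum(r[k] for r in rows) / len(rows)
                   for k in range(len(rows[0]))]
            for firm, rows in groups.items()}

panel = [(1, [1.0, 2.0]), (1, [3.0, 4.0]), (2, [0.5, 0.5])]
means = group_means(panel)
```

Replicating each firm's mean vector across its observations produces time invariant regressors, which is what allows them to absorb the correlation between wi and the inputs.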
Application
The following is the Alvarez, Arias and Greene application. The data consist of six years of observations on 247 Spanish dairy farms. The output, yit, is milk production. The four inputs, x1, x2, x3 and x4, are feed, land, labor and cows. Commands for fitting the model are as follows. (We have restricted the number of iterations and the number of replications for purposes of this numerical illustration.) Both models (with and without the Mundlak adjustment) are shown.
FRONTIER ; Lhs = yit
         ; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $
FRONTIER ; Lhs = yit
         ; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44
         ; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25 ; Common ; Mgt
         ; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $
FRONTIER ; Lhs = yit
         ; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44 ; Par $
FRONTIER ; Lhs = yit
         ; Rhs = one,x1,x2,x3,x4,x11,x12,x13,x14,x23,x24,x34,x44
         ; RPM ; Halton ; Pts = 25 ; Pds = 6 ; Maxit = 25
         ; Common ; Mgt ; Means
         ; Fcn = one(n),x1(n),x2(n),x3(n),x4(n) $
The first set of results is the pooled stochastic frontier model with no extensions or
modifications.
(As in other panel data settings, it is necessary to fit the pooled model first to compute the starting
values.)
The Battese and Coelli models may be specified here with
; Model = BC
for the decay model, and
; Model = BC
; Hfu = one, heteroscedasticity variables
for the more general form.
For this model, you must fit the identical Battese and Coelli model without the latent class
specification first. The application below demonstrates.
The basic form of the latent class model assumes that the class probabilities are fixed values.
You may make them dependent on time invariant variables, wi with
; LCM = list of variables in w
Do not include one in the list.
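When the class probabilities depend on wi, a conventional choice is a multinomial logit function of wi. This parameterization is an assumption for illustration, not a statement of LIMDEP's internal form, and the names below are hypothetical:

```python
# Sketch: prior class probabilities as a multinomial logit in the
# time-invariant covariates w_i; class J is normalized to theta_J = 0.
# Illustrative convention only.
import numpy as np

def class_probabilities(W, Theta):
    """W: (n, L) covariates incl. constant; Theta: (J-1, L) parameters.
    Returns (n, J) probabilities that sum to one across classes."""
    logits = W @ Theta.T                                      # (n, J-1)
    logits = np.hstack([logits, np.zeros((W.shape[0], 1))])   # class J = 0
    logits -= logits.max(axis=1, keepdims=True)               # stabilize exp
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

W = np.array([[1.0, 0.5]])
Theta = np.array([[0.2, -0.1]])   # two classes
P = class_probabilities(W, Theta)
```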
Some particular variables computed for the latent class model are
; Group = the index of the most likely latent class
; Cprob = estimated probability for the most likely latent class
You can obtain a listing of these two results by using
; List
An example appears below. You can also use the ; Rst = list option to structure the latent class
model so that different variables appear in different classes or that certain coefficients are equal
across classes. Examples are given in Chapter E20.
Estimates retained by this model include:

Matrices: b        = estimated parameter vector
          varb     = estimated asymptotic covariance matrix
          beta_i   = individual specific parameter estimates
Note that b and varb involve J(K+2) estimates. Two additional matrices are created,
          b_class  = a J×K matrix with each row equal to the corresponding βj
          class_pr = a J×1 vector containing the estimated class probabilities

Scalars:  kreg     = number of variables in Rhs
          nreg     = number of observations
          logl     = log likelihood function
          exitcode = exit status of the iterations
Standard Model Specifications for the Latent Class Stochastic Frontier Model
This is the full list of general specifications that are applicable to this model estimator.
Controlling Output from Model Commands
; Par             keeps ancillary parameters with the main parameter vector in b.
; Partial Effects
; OLS             displays least squares starting values when (and if) they are computed.
; Table = name    saves model results to be combined later in output tables.
Application
The airline data used in the preceding examples are clearly not compatible with this model;
no configuration of the equation produces meaningful results. To illustrate the estimator, we have
borrowed the Spanish dairy data used in the previous section. The following commands fit a
two-class Battese and Coelli decay model.
NAMELIST ; x = one,x1,x2,x3,x4 $
FRONTIER ; Lhs = yit ; Rhs = x
         ; Model = BC
         ; Pds = 6 $
FRONTIER ; Lhs = yit ; Rhs = x
         ; Model = BC
         ; LCM ; Pts = 2 ; Pds = 6 ; List $
μ′ys / ν′xs ≤ 1, s = 1,...,N
μm ≥ 0, m = 1,...,M
νk ≥ 0, k = 1,...,K
The optimization program seeks the optimal weights to maximize the efficiency of firm i subject to
the restrictions that the efficiencies of all firms are less than or equal to one and that all weights are
nonnegative. Because the objective function is homogeneous of degree zero (any multiple of the
weights produces the same solution), it is normalized with a restriction such as ν′xi = 1.
Transforming and simplifying the problem a bit produces the equivalent program,
Maximize wrt μ, ν:  μ′yi
Subject to
    ν′xi = 1
    μ′ys - ν′xs ≤ 0, s = 1,...,N
    μ ≥ 0
    ν ≥ 0
An equivalent form of the problem is the envelopment form (hence the name),
Minimize wrt θi, λ:  θi
Subject to
    Σs λs ys - yi ≥ 0
    θi xi - Σs λs xs ≥ 0
    λs ≥ 0.
The value of θi is the input oriented technical efficiency score for the ith firm,
    TEINPUT,i = θi.
It measures the extent to which the firm could reduce its inputs and still obtain the same output,
relative to the other firms in the sample. Note that the program is solved for each firm in the sample;
an efficiency score θi is generated for each firm. For some firms in the sample, the efficiency score
will be 1.0. This indicates firms deemed to be technically efficient. Otherwise, θi < 1.
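The envelopment program maps directly into a generic linear program solver. The sketch below uses SciPy's linprog under the VRS restriction; the function and variable names are my own, and production DEA software adds many refinements this sketch omits:

```python
# Input-oriented DEA (envelopment form): for each firm i, minimize theta
# subject to Y'lam >= y_i, theta*x_i - X'lam >= 0, sum(lam) = 1, lam >= 0.
# A minimal sketch using scipy.optimize.linprog, not production code.
import numpy as np
from scipy.optimize import linprog

def dea_input_te(X, Y, vrs=True):
    """X: (N, K) inputs; Y: (N, M) outputs. Returns theta for each firm."""
    N, K = X.shape
    M = Y.shape[1]
    scores = []
    for i in range(N):
        c = np.zeros(N + 1)
        c[-1] = 1.0                                   # minimize theta
        A_out = np.hstack([-Y.T, np.zeros((M, 1))])   # -Y'lam <= -y_i
        A_in = np.hstack([X.T, -X[i][:, None]])       # X'lam - theta*x_i <= 0
        A_ub = np.vstack([A_out, A_in])
        b_ub = np.concatenate([-Y[i], np.zeros(K)])
        if vrs:                                       # convexity: sum(lam) = 1
            A_eq = np.zeros((1, N + 1)); A_eq[0, :N] = 1.0
            b_eq = np.array([1.0])
        else:
            A_eq = b_eq = None
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * N + [(0, 1)], method="highs")
        scores.append(res.x[-1])
    return np.array(scores)

# Two firms, one input, one output: the second uses twice the input
# for the same output, so its input-oriented score is 0.5.
X = np.array([[2.0], [4.0]])
Y = np.array([[2.0], [2.0]])
te = dea_input_te(X, Y)
```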
The preceding formulation includes an implicit assumption of constant returns to scale
(CRS). The assumption is relaxed to variable returns to scale (VRS) by adding the restriction
    Σs λs = 1.
Variable returns to scale is the standard assumption in contemporary applications. This provides a
means by which the scale efficiency of the firm can be measured. Let θiC denote the technical
efficiency measure obtained assuming constant returns and θiV the variable returns to scale
counterpart. Then, the scale efficiency may be measured by
    SEi = θiC / θiV.
This can be computed using the results of the two different programs after computation. A
nonincreasing returns to scale (NRS) version of the program can be obtained by changing the adding
up restriction to
    Σs λs ≤ 1.
An alternative view of the optimization process is to consider the extent to which outputs
could conceivably be increased using the same inputs, again relative to the standard of the other
firms in the sample. The linear program which produces this solution is
Maximize wrt φi, λ:  φi
Subject to
    Σs λs ys - φi yi ≥ 0
    xi - Σs λs xs ≥ 0
    λs ≥ 0.
Once again, this assumes constant returns to scale. The variable returns to scale form is obtained by
adding the constraint Σs λs = 1. In this solution, 1 ≤ φi < ∞. The technical efficiency measure is
    0 < TEOUTPUT,i = 1/φi ≤ 1.
As before, some firms in the sample (the same firms) will be found to be technically efficient by this
output oriented efficiency measure.
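The output oriented program also maps directly into a linear program. A sketch with assumed names, using SciPy; the small example uses CRS, under which the two orientations yield the same scores:

```python
# Output-oriented DEA: for each firm i, maximize phi subject to
# Y'lam - phi*y_i >= 0, X'lam <= x_i, (VRS: sum(lam) = 1), lam >= 0;
# the reported efficiency is TE = 1/phi. Sketch only.
import numpy as np
from scipy.optimize import linprog

def dea_output_te(X, Y, vrs=True):
    """Returns 1/phi for each firm (0 < TE <= 1)."""
    N, K = X.shape
    M = Y.shape[1]
    scores = []
    for i in range(N):
        c = np.zeros(N + 1)
        c[-1] = -1.0                                  # maximize phi
        A_in = np.hstack([X.T, np.zeros((K, 1))])     # X'lam <= x_i
        A_out = np.hstack([-Y.T, Y[i][:, None]])      # phi*y_i - Y'lam <= 0
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([X[i], np.zeros(M)])
        if vrs:
            A_eq = np.zeros((1, N + 1)); A_eq[0, :N] = 1.0
            b_eq = np.array([1.0])
        else:
            A_eq = b_eq = None
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * N + [(1, None)], method="highs")
        scores.append(1.0 / res.x[-1])
    return np.array(scores)

X = np.array([[2.0], [4.0]])
Y = np.array([[2.0], [2.0]])
te_crs = dea_output_te(X, Y, vrs=False)  # CRS: matches input-oriented scores
```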
    Σs λs ys - yi ≥ 0
    χi - Σs λs xs ≥ 0
    λs ≥ 0.
As before, to allow for variable returns to scale (VRS), we add Σs λs = 1. In this program, χi gives the
cost minimizing vector of inputs for output yi and input prices wi. The cost efficiency for the ith firm is
then the ratio
    0 < CEi = wi′χi / wi′xi ≤ 1.
Allocative efficiency may be measured using
    0 < AEi = CEi / TEINPUT,i ≤ 1.
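Cost and allocative efficiency follow from one more linear program, choosing the input vector that minimizes cost at the firm's prices. A sketch with my own names, using SciPy:

```python
# Cost efficiency via DEA: for firm i, minimize w_i'chi subject to
# Y'lam >= y_i, X'lam - chi <= 0, sum(lam) = 1 (VRS), lam >= 0, chi >= 0.
# Then CE = w_i'chi* / w_i'x_i and AE = CE / TE_input. Sketch only.
import numpy as np
from scipy.optimize import linprog

def dea_cost_efficiency(X, Y, W):
    """X: (N, K) inputs; Y: (N, M) outputs; W: (N, K) input prices."""
    N, K = X.shape
    M = Y.shape[1]
    ce = []
    for i in range(N):
        c = np.concatenate([np.zeros(N), W[i]])      # minimize w_i'chi
        A_out = np.hstack([-Y.T, np.zeros((M, K))])  # -Y'lam <= -y_i
        A_in = np.hstack([X.T, -np.eye(K)])          # X'lam - chi <= 0
        A_ub = np.vstack([A_out, A_in])
        b_ub = np.concatenate([-Y[i], np.zeros(K)])
        A_eq = np.zeros((1, N + K)); A_eq[0, :N] = 1.0   # VRS convexity
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                      b_eq=np.array([1.0]),
                      bounds=[(0, None)] * (N + K), method="highs")
        ce.append(res.fun / float(W[i] @ X[i]))      # CE = min cost / actual cost
    return np.array(ce)

# One input at unit price: cost efficiency reduces to input-oriented TE,
# so the second firm's CE is 0.5.
X = np.array([[2.0], [4.0]])
Y = np.array([[2.0], [2.0]])
W = np.ones((2, 1))
ce = dea_cost_efficiency(X, Y, W)
```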
Subject to
    bL ≤ Aδ ≤ bU
    dL ≤ δ ≤ dU.
We will define the components for the three programs defined earlier. Note, first, for convenience,
we define the data matrices, Y and X. Y is an N×M matrix of outputs whose ith row is the vector of
outputs for firm i; X is the N×K matrix of inputs, defined likewise. For an individual firm, we define
yi to be the M×1 column vector of outputs for firm i; thus, yi is the transpose of the ith row of Y.
Likewise, xi is the column vector of K inputs for firm i, the transpose of the ith row of X. Finally,
the column vector of weights is λ = (λ1,...,λN)′. Thus,
    Σs λs ys = Y′λ and Σs λs xs = X′λ.
Finally, we note once again, the programs about to be defined are solved for each firm to obtain the
efficiency scores. (In fact, λ should be indexed by firm, since it is recomputed each time. For
convenience, we have omitted this subscript.) We use the symbols ∞K and ∞M to indicate a vector
whose every element equals infinity (or sometimes minus infinity) and boldface 1 or 0 to indicate a
vector of ones or zeros, with a subscript to indicate the number of elements. Finally, our tableaus
include the VRS restriction, which may be suppressed by the user for the CRS form.
With all this in place, we can define the solutions to the optimization problems just by
identifying the components of the linear programming problems. These are as follows:
Input Oriented Technical Efficiency

    δ = (λ′, θi)′,
    dL = [0N ; 0],   c = [0N ; 1],   dU = [∞N ; 1]

          [ X′   -xi ]
    A  =  [ Y′    0M ] ,   bL = [-∞K ; yi ; 1],   bU = [0K ; ∞M ; 1]
          [ 1N′    0 ]
Output Oriented Technical Efficiency
    δ = (λ′, φi)′,
    dL = [0N ; 1],   c = [0N ; 1],   dU = [∞N ; ∞]

          [ X′    0K ]
    A  =  [ Y′   -yi ] ,   bL = [-∞K ; 0M ; 1],   bU = [xi ; ∞M ; 1]
          [ 1N′    0 ]
Allocative Efficiency
    δ = (λ′, χi′)′,
    dL = [0N ; 0K],   c = [0N ; wi],   dU = [∞N ; ∞K]

          [ X′    -IK  ]
    A  =  [ Y′   0M×K ] ,   bL = [-∞K ; yi ; 1],   bU = [0K ; ∞M ; 1]
          [ 1N′   0K′  ]
One final note: DEA requires a fair amount of computation. The linear program involves
M+K+1 constraints and N+1 activities, and it is computed once for each of the N firms in the sample.
The amount of computation increases with the square of N. The particular computations are quite
fast, however.
FRONTIER ; Lhs = milk
         ; Rhs = cows,land,labor,feed
         ; Alg = DEA $
+---------------------------------------------------------------------------+
| Data Envelopment Analysis                                                 |
| Output Variables: MILK                                                    |
| Input Variables:  COWS  LAND  LABOR  FEED                                 |
| Underlying Technology assumes VARIABLE Returns to Scale.                  |
+---------------------------------------------------------------------------+
| Estimated Efficiencies:       Mean  Std.Deviation  Minimum  Maximum       |
| Technical Efficiency        ======  =============  =======  =======       |
|   Input Oriented             .8301      .1416       .4823   1.0000        |
|   Output Oriented            .7388      .1268       .3875   1.0000        |
| Sample Size: 1482 Observations.  1482 Complete observations               |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E           |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE            |
| Incomplete observations are filled with zeros for efficiency values.      |
+---------------------------------------------------------------------------+
As noted, the computed efficiency scores are saved in two places. In the data area, they are saved as
variables deaeff_i and deaeff_o, and deaeff_e if you provide input prices for the economic efficiency
analysis. The same results are saved as matrices, dea_effo, dea_effi, dea_effe. Note that in both cases,
the estimator bypasses missing and bad (nonpositive) data. If any of the variables used in the
analysis are missing, the observation is assigned an efficiency score of 0.0. The matrices will have
row dimension equal to the original sample size, before the bypass of missing values.
The example below includes a listing of the efficiency scores. The observation identifier
shows I = the sequence number of the observation used in the analysis. The R = value shows,
instead, the actual location of the observation in the raw data set. I will not equal R if you have used
a subset of the data (e.g., with SAMPLE or REJECT), or if the program has bypassed missing data;
the listing will only show the complete observations. If you have included observation labels, e.g.,
firm names, in your data set, these observation and row identifiers will be replaced with the
observation names for your data set.
For a second example, the following analyzes the Christensen and Greene (1976) electricity
generation data. For these data, we have the input prices, so we do the full analysis.
FRONTIER ; Lhs = output
         ; Rhs = labor,capital,fuel
         ; Rh2 = lprice,cprice,fprice
         ; Alg = DEA $
+---------------------------------------------------------------------------+
| Data Envelopment Analysis                                                 |
| Output Variables: OUTPUT                                                  |
| Input Variables:  LABOR  CAPITAL  FUEL                                    |
| Price Variables:  LPRICE  CPRICE  FPRICE                                  |
| Underlying Technology assumes VARIABLE Returns to Scale.                  |
+---------------------------------------------------------------------------+
| Estimated Efficiencies:       Mean  Std.Deviation  Minimum  Maximum       |
| Technical Efficiency        ======  =============  =======  =======       |
|   Input Oriented             .7692      .1390       .3464   1.0000        |
|   Output Oriented            .7657      .1467       .2960   1.0000        |
| Economic Efficiency          .4331      .1965       .1411   1.0000        |
| Allocative Effic.            .5473      .1754       .1796   1.0000        |
| Sample Size: 123 Observations.  123 Complete observations                 |
| Efficiencies saved as variables DEAEFF_O, DEAEFF_I and DEAEFF_E           |
| Efficiencies saved as matrices DEA_EFFO, DEA_EFFI and DEA_EFFE            |
| Incomplete observations are filled with zeros for efficiency values.      |
| Compute allocative efficiency as technical divided by economic efficiency |
+---------------------------------------------------------------------------+
Estimated Efficiency Values for Individual Decision Making Units
(Results are listed only for complete observations)
===============================================================================
 Observation    | Input Oriented| Output Oriented|    Economic   |  Allocative
 Sample   Data  | Rank    Value | Rank     Value | Rank    Value | Rank   Value
================+===============+================+===============+=============
 I=  1 R=   1   |    1  1.00000 |    1   1.00000 |    1  1.00000 |    1 1.00000
 I=  2 R=   2   |   13   .98446 |   16    .92501 |   53   .43644 |   87  .44333
 I=  3 R=   3   |   16   .96243 |   28    .88393 |  119   .17287 |  123  .17962
 I=  4 R=   4   |   46   .79469 |   83    .73593 |   96   .29127 |  103  .36652
 I=  5 R=   5   |  115   .57426 |  118    .44224 |   47   .44703 |   15  .77845
 I=  6 R=   6   |  120   .44307 |  122    .35608 |  103   .26194 |   43  .59120
 I=  7 R=   7   |   80   .73356 |  100    .64826 |  101   .26996 |  102  .36801
 I=  8 R=   8   |  123   .34637 |  123    .29601 |  121   .15388 |   85  .44425
 I=  9 R=   9   |  106   .62517 |  110    .57829 |  109   .21689 |  111  .34692
 I= 10 R=  10   |  103   .63852 |  107    .59578 |   66   .38812 |   39  .60783
(Remaining observations are omitted.)
----------------------------------------------------------------------------
Results of Bootstrap analysis of technical efficiency. 50 replications
----------------------------------------------------------------------------
                Technical  Estimated  Corrected  Standard    Confid. Limits
Observation     Efficiency    Bias    Tech.Eff.  Deviation   Lower    Upper
I=  1 R=  1       1.0000     .0000     1.0000     .0000      1.0000  1.0000
I=  2 R=  2        .9845    -.0634     1.0479     .1008       .6583  1.0000
I=  3 R=  3        .9624    -.0898     1.0522     .1391       .5023  1.0000
I=  4 R=  4        .7947     .1091      .6856     .0953       .7222  1.0000
I=  5 R=  5        .5743     .3006      .2737     .1215       .6007  1.0000
I=  6 R=  6        .4431     .4318      .0113     .1246       .5785  1.0000
I=  7 R=  7        .7336     .1086      .6250     .1131       .6609  1.0000
I=  8 R=  8        .3464     .5317     -.1853     .0979       .6977  1.0000
I=  9 R=  9        .6252     .2154      .4097     .1265       .5131  1.0000
I= 10 R= 10        .6385     .2267      .4118     .1062       .6645  1.0000
It is always interesting to compare the DEA results with those obtained using the stochastic
frontier model. The following fits a translog stochastic frontier production function for the
Christensen and Greene data, computes the technical efficiencies, and plots them against the DEA
efficiency scores. As has been widely documented, the results are not as close to each other as one
might hope.
FRONTIER ; Lhs = logq
         ; Rhs = one,logcap,loglabor,logfuel,
           loglsq,logksq,logfsq,logklogl,logklogf,logllogf
         ; Techeff = tesf $
PLOT     ; Lhs = tesf ; Rhs = deaeff_i
         ; Grid ; Title = DEA Efficiencies vs. Stochastic Frontier JLMS $
E65.5.2 Application
The following uses all the features of the routine save for the Malmquist TFP computation
and the allocative efficiency routine. The sample data are read from the file testdea.csv:
IMPORT   ; File = testdea.csv $
FRONTIER ; Lhs = cameras,video,warranty
         ; Rhs = floor,staff
         ; Alg = DEA ; CRS
         ; Peers
         ; Nbt = 50 $
SAMPLE   ; All $
REJECT   ; Small > 0 $
CREATE   ; dalebar = Group Mean(dale, Str = country) $
CREATE   ; hexpbar = Group Mean(hexp, Str = country) $
CREATE   ; educbar = Group Mean(educ, Str = country) $
REJECT   ; year # 1997 $
CREATE   ; logdbar = Log(dalebar) $
CREATE   ; loghbar = Log(hexpbar) $
CREATE   ; logebar = Log(educbar) $
FRONTIER ; Lhs = logdbar ; Rhs = one,loghbar,logebar ; Techeff = effsfa $
FRONTIER ; Lhs = dalebar ; Rhs = hexpbar,educbar ; Alg = DEA $
DSTAT    ; Rhs = effsfa,deaeff_i,deaeff_o ; Output = 2 $
PLOT     ; Lhs = effsfa ; Rhs = deaeff_i ; Grid
         ; Title = SFA Efficiencies vs. DEA Input Efficiencies $
PLOT     ; Lhs = effsfa ; Rhs = deaeff_o ; Limits = .4,1.1 ; Grid
         ; Title = SFA Efficiencies vs. DEA Output Efficiencies $
CREATE   ; sfarank = Rnk(effsfa) $
CREATE   ; dearanki = Rnk(deaeff_i) $
CREATE   ; dearanko = Rnk(deaeff_o) $
CALC     ; List ; Rkc(sfarank,dearanki)
         ; Rkc(sfarank,dearanko)
         ; Rkc(dearanki,dearanko) $
PLOT     ; Lhs = sfarank ; Rhs = dearanki
         ; Endpoints = 0,200 ; Limits = 0,200 ; Grid
         ; Title = Ranks of SFA Efficiencies vs. DEA Input Efficiencies $
PLOT     ; Lhs = sfarank ; Rhs = dearanko
         ; Endpoints = 0,200 ; Limits = 0,200 ; Grid
         ; Title = Ranks of SFA Efficiencies vs. DEA Output Efficiencies $
Figure E65.4 Plot of Ranks of SFA Efficiency Scores vs. Ranks of DEA Scores
The Malmquist TFP index for firm i is

    Mi(t+1, t) = { [TEi(t+1|t) / TEi(t|t)] × [TEi(t+1|t+1) / TEi(t|t+1)] }^(1/2)

where TEi(r|s) indicates the earlier defined output oriented technical efficiency index for firm i, using
inputs xi,r and producing outputs yi,r, relative to production (and input usage) for firms based in period s.
This index is computed using the following program:
    δ = (λ′, φir)′,
    dL = [0N ; 0],   c = [0N ; 1],   dU = [∞N ; ∞]

          [ Xs′    0K ]
    A  =  [ Ys′  -yir ] ,   bL = [-∞K ; 0M],   bU = [xir ; ∞M]
This uses the constant returns to scale form. Also, since the period r output and input vectors for firm i
will not appear in Ys and Xs when r does not equal s, φir need not be larger than one. Note that this
requires solution of four linear programs for each firm in each period, so the total number of programs to
solve will be 4NT. Each is quite fast, so overall, the computations do not take long. In the sample of
247 firms and six periods, the nearly 6,000 programs, each involving 248 activities and six constraints,
took about 10 seconds.
These computations are carried out for each firm in each period save the last one, and produce an
N×T matrix of TFP values, one row for each firm, one column for each period. The TFP value for the last
period is recorded as 1.0, though this is just a space filler.
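The four cross-period efficiencies and the geometric mean that combines them can be sketched as follows. The CRS output oriented program is the one described in the text; all function and variable names here are my own:

```python
# Malmquist TFP: TE_i(r|s) is the CRS output-oriented efficiency of firm i's
# period-r data against the period-s technology; the index is the geometric
# mean of the two single-period ratios. Sketch with assumed names.
import numpy as np
from scipy.optimize import linprog

def te_crs(x0, y0, Xref, Yref):
    """Output-oriented CRS TE of point (x0, y0) vs. a reference technology."""
    N = Xref.shape[0]
    c = np.zeros(N + 1); c[-1] = -1.0                          # maximize phi
    A_in = np.hstack([Xref.T, np.zeros((Xref.shape[1], 1))])   # X'lam <= x0
    A_out = np.hstack([-Yref.T, y0[:, None]])                  # phi*y0 <= Y'lam
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([x0, np.zeros(len(y0))]),
                  bounds=[(0, None)] * (N + 1), method="highs")
    return 1.0 / res.x[-1]

def malmquist(x_t, y_t, x_t1, y_t1, Xt, Yt, Xt1, Yt1):
    """Four LPs per firm-period, combined as a geometric mean."""
    return np.sqrt((te_crs(x_t1, y_t1, Xt, Yt) / te_crs(x_t, y_t, Xt, Yt)) *
                   (te_crs(x_t1, y_t1, Xt1, Yt1) / te_crs(x_t, y_t, Xt1, Yt1)))

# A firm that doubles output at unchanged input has a TFP index of 2.
Xt = np.array([[2.0], [4.0]]); Yt = np.array([[2.0], [2.0]])
Xt1 = np.array([[2.0], [4.0]]); Yt1 = np.array([[4.0], [2.0]])
m = malmquist(np.array([2.0]), np.array([2.0]),
              np.array([2.0]), np.array([4.0]), Xt, Yt, Xt1, Yt1)
```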
To compute the Malmquist TFP indices, you will require a panel of data, at least two periods, for
each of N firms. Unlike other panel data routines in LIMDEP, this computation always requires a
balanced panel. Every firm must be observed in the same T periods. Also, this routine has no procedures
for avoiding missing or invalid data such as zero values for inputs or outputs. The balanced panel must be
clean before computation begins. To request the computations, just add
; Pds = t, the fixed number of periods.
Nothing else need be changed. There is no bootstrap feature (; Nbt = 0); the computations assume
constant returns to scale (; CRS is the default and cannot be changed) and no allocative efficiency (; Rh2
is ignored).
FRONTIER ; Lhs = milk
         ; Rhs = cows,land,labor,feed
         ; Alg = DEA ; Pds = 6
         ; List $
The following results are displayed. In addition, a matrix containing the full table, named malmquist, is
created.
==============================================================================
Malmquist TFP Index for Productivity Change
Panel contained 247 firms each observed in
6 periods
Full Results saved as matrix MALMQIST
==============================================================================
Average results across firms, by period:
==============================================================================
Period:        1        2        3        4        5
TFP         1.0476   1.0233   1.0247   1.0298   1.0349
==============================================================================
Individual calculations by firm
(Only 8 periods can be displayed. TFP for the final period is not computed.)
==============================================================================
Observation      1       2       3       4       5       6       7       8
Firm =  1     1.1301  1.1002   .9736  1.0291  1.0901  1.0000
Firm =  2     1.0528  1.0343  1.0212  1.0109  1.0416  1.0000
Firm =  3     1.0525  1.0383   .9477  1.0465  1.0395  1.0000
Firm =  4     1.1418  1.0129  1.0079   .9829  1.0476  1.0000
Firm =  5     1.1192  1.0240  1.0082  1.0245  1.0641  1.0000
Firm =  6      .9871  1.0073   .9785  1.0322  1.0464  1.0000
Firm =  7      .9851  1.1484  1.1599   .8054  1.1110  1.0000
Firm =  8     1.0746   .9796   .9636  1.0671   .9753  1.0000
Firm =  9      .8977  1.1496   .9818  1.0500   .9867  1.0000
Firm = 10     1.0105  1.1507   .9751  1.0055  1.0469  1.0000
Firm = 11     1.1276   .9867   .9636  1.0826   .9873  1.0000
Firm = 12     1.0310  1.1020   .9822  1.0438   .9914  1.0000
Firm = 13     1.0549  1.1263   .9221  1.0723  1.1945  1.0000
Firm = 14      .9408  1.0740   .9938   .9739  1.0336  1.0000
Firm = 15      .8952   .7156  1.5056   .8614   .9204  1.0000
(Rows 16 - 247 omitted.)