SAS/STAT 9.2 User's Guide: The MIXED Procedure (Book Excerpt)
SAS® Documentation
This document is an individual chapter from SAS/STAT® 9.2 User’s Guide.
The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2008. SAS/STAT® 9.2
User’s Guide. Cary, NC: SAS Institute Inc.
Copyright © 2008, SAS Institute Inc., Cary, NC, USA
All rights reserved. Produced in the United States of America.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor
at the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation
by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19,
Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st electronic book, March 2008
2nd electronic book, February 2009
SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to
its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the
SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Chapter 56
The MIXED Procedure
Contents
Overview: MIXED Procedure
    Basic Features
    Notation for the Mixed Model
    PROC MIXED Contrasted with Other SAS Procedures
Getting Started: MIXED Procedure
    Clustered Data Example
Syntax: MIXED Procedure
    PROC MIXED Statement
    BY Statement
    CLASS Statement
    CONTRAST Statement
    ESTIMATE Statement
    ID Statement
    LSMEANS Statement
    MODEL Statement
    PARMS Statement
    PRIOR Statement
    RANDOM Statement
    REPEATED Statement
    WEIGHT Statement
Details: MIXED Procedure
    Mixed Models Theory
    Parameterization of Mixed Models
    Residuals and Influence Diagnostics
    Default Output
    ODS Table Names
    ODS Graphics
    Computational Issues
Examples: MIXED Procedure
    Example 56.1: Split-Plot Design
    Example 56.2: Repeated Measures
    Example 56.3: Plotting the Likelihood
    Example 56.4: Known G and R
    Example 56.5: Random Coefficients
    Example 56.6: Line-Source Sprinkler Irrigation
    Example 56.7: Influence in Heterogeneous Variance Model
    Example 56.8: Influence Analysis for Repeated Measures Data
    Example 56.9: Examining Individual Test Components
References
Overview: MIXED Procedure
The MIXED procedure fits a variety of mixed linear models to data and enables you to use these
fitted models to make statistical inferences about the data. A mixed linear model is a generalization
of the standard linear model used in the GLM procedure, the generalization being that the data
are permitted to exhibit correlation and nonconstant variability. The mixed linear model, therefore,
provides you with the flexibility of modeling not only the means of your data (as in the standard
linear model) but their variances and covariances as well.
The primary assumptions underlying the analyses performed by PROC MIXED are as follows:
- The data are normally distributed (Gaussian).
- The means (expected values) of the data are linear in terms of a certain set of parameters.
- The variances and covariances of the data are in terms of a different set of parameters, and
  they exhibit a structure matching one of those available in PROC MIXED.
Since Gaussian data can be modeled entirely in terms of their means and variances/covariances, the
two sets of parameters in a mixed linear model actually specify the complete probability distribution
of the data. The parameters of the mean model are referred to as fixed-effects parameters, and the
parameters of the variance-covariance model are referred to as covariance parameters.
The fixed-effects parameters are associated with known explanatory variables, as in the standard
linear model. These variables can be either qualitative (as in the traditional analysis of variance)
or quantitative (as in standard linear regression). However, the covariance parameters are what
distinguishes the mixed linear model from the standard linear model.
The need for covariance parameters arises quite frequently in applications, the following being the
two most typical scenarios:
- The experimental units on which the data are measured can be grouped into clusters, and the
  data from a common cluster are correlated.
- Repeated measurements are taken on the same experimental unit, and these repeated measurements
  are correlated or exhibit variability that changes.
The first scenario can be generalized to include one set of clusters nested within another. For example,
if students are the experimental unit, they can be clustered into classes, which in turn can be
clustered into schools. Each level of this hierarchy can introduce an additional source of variability
and correlation. The second scenario occurs in longitudinal studies, where repeated measurements
are taken over time. Alternatively, the repeated measures could be spatial or multivariate in nature.
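The nested-cluster scenario can be sketched as follows. This is an illustration only: the data set scores and the variables Score, Method, School, and Class are hypothetical and do not appear elsewhere in this chapter.

```sas
/* Hypothetical data: one row per student, with a test score,  */
/* a teaching method, and the class and school of the student. */
proc mixed data=scores;
   class Method School Class;
   model Score = Method;           /* fixed effect: models the mean        */
   random School Class(School);    /* variance components for schools and  */
                                   /* for classes nested within schools    */
run;
```

Each term in the RANDOM statement contributes one variance component, so students in the same class share both the school and the class effect, while students in different classes of the same school share only the school effect.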
PROC MIXED provides a variety of covariance structures to handle the previous two scenarios.
The most common of these structures arises from the use of random-effects parameters, which are
additional unknown random variables assumed to affect the variability of the data. The variances of
the random-effects parameters, commonly known as variance components, become the covariance
parameters for this particular structure. Traditional mixed linear models contain both fixed- and
random-effects parameters, and, in fact, it is the combination of these two types of effects that led
to the name mixed model. PROC MIXED fits not only these traditional variance component models
but numerous other covariance structures as well.
PROC MIXED fits the structure you select to the data by using the method of restricted maximum
likelihood (REML), also known as residual maximum likelihood. It is here that the Gaussian assumption
for the data is exploited. Other estimation methods are also available, including maximum
likelihood and MIVQUE0. The details behind these estimation methods are discussed in subsequent
sections.
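As a brief sketch of how the estimation method is chosen, the METHOD= option in the PROC MIXED statement selects among the methods named above. The example assumes the heights data set and the random-effects model from the section “Getting Started: MIXED Procedure.”

```sas
/* METHOD=REML is the default; METHOD=ML and METHOD=MIVQUE0 */
/* request the alternatives mentioned in the text.          */
proc mixed data=heights method=ml;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```

Changing only the METHOD= value while keeping the remaining statements fixed is a simple way to compare the covariance parameter estimates that the different methods produce.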
After a model has been fit to your data, you can use it to draw statistical inferences via both the
fixed-effects and covariance parameters. PROC MIXED computes several different statistics suitable for
generating hypothesis tests and confidence intervals. The validity of these statistics depends upon
the mean and variance-covariance model you select, so it is important to choose the model carefully.
Some of the output from PROC MIXED helps you assess your model and compare it with others.
Basic Features
PROC MIXED provides easy access to numerous mixed linear models that are useful in many
common statistical analyses. In the style of the GLM procedure, PROC MIXED fits the specified
mixed linear model and produces appropriate statistics.
Here are some basic features of PROC MIXED:
- covariance structures, including variance components, compound symmetry, unstructured,
  AR(1), Toeplitz, spatial, general linear, and factor analytic
- GLM-type grammar, by using MODEL, RANDOM, and REPEATED statements for model
  specification and CONTRAST, ESTIMATE, and LSMEANS statements for inferences
- appropriate standard errors for all specified estimable linear combinations of fixed and random
  effects, and corresponding t and F tests
- subject and group effects that enable blocking and heterogeneity, respectively
- REML and ML estimation methods implemented with a Newton-Raphson algorithm
- capacity to handle unbalanced data
- ability to create a SAS data set corresponding to any table
PROC MIXED uses the Output Delivery System (ODS), a SAS subsystem that provides capabilities
for displaying and controlling the output from SAS procedures. ODS enables you to convert any
of the output from PROC MIXED into a SAS data set. See the section “ODS Table Names.”
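A minimal sketch of converting PROC MIXED output into data sets follows. CovParms and Tests3 are the ODS table names for the covariance parameter estimates and the Type 3 tests; the output data set names cp and t3 are arbitrary choices, and the heights data set is the one defined in the section “Getting Started: MIXED Procedure.”

```sas
/* Route two PROC MIXED tables to SAS data sets. */
ods output CovParms=cp Tests3=t3;
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* Inspect the captured covariance parameter estimates. */
proc print data=cp;
run;
```

The ODS OUTPUT statement must precede the procedure step whose tables it captures.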
The MIXED procedure now uses ODS Graphics to create graphs as part of its output. For general
information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific
information about the statistical graphics available with the MIXED procedure, see the PLOTS
option in the PROC MIXED statement and the section “ODS Graphics.”
Notation for the Mixed Model
This section introduces the mathematical notation used throughout this chapter to describe the
mixed linear model. You should be familiar with basic matrix algebra (see Searle 1982). A more
detailed description of the mixed model is contained in the section “Mixed Models Theory.”
A statistical model is a mathematical description of how data are generated. The standard linear
model, as used by the GLM procedure, is one of the most common statistical models:
$$y = X\beta + \epsilon$$
In this expression, $y$ represents a vector of observed data, $\beta$ is an unknown vector of fixed-effects
parameters with known design matrix $X$, and $\epsilon$ is an unknown random error vector modeling the
statistical noise around $X\beta$. The focus of the standard linear model is to model the mean of $y$
by using the fixed-effects parameters $\beta$. The residual errors $\epsilon$ are assumed to be independent and
identically distributed Gaussian random variables with mean 0 and variance $\sigma^2$.
The mixed model generalizes the standard linear model as follows:
$$y = X\beta + Z\gamma + \epsilon$$
Here, $\gamma$ is an unknown vector of random-effects parameters with known design matrix $Z$, and $\epsilon$
is an unknown random error vector whose elements are no longer required to be independent and
homogeneous.
To further develop this notion of variance modeling, assume that $\gamma$ and $\epsilon$ are Gaussian random
variables that are uncorrelated and have expectations 0 and variances $G$ and $R$, respectively. The
variance of $y$ is thus
$$V = ZGZ' + R$$
Note that, when $R = \sigma^2 I$ and $Z = 0$, the mixed model reduces to the standard linear model.
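The variance formula follows in a few steps from the stated assumptions. The derivation below is a sketch that uses only the facts given above: $X\beta$ is a constant, $\gamma$ and $\epsilon$ are uncorrelated, and their variances are $G$ and $R$.

```latex
\begin{aligned}
\mathrm{Var}(y) &= \mathrm{Var}(X\beta + Z\gamma + \epsilon) \\
                &= \mathrm{Var}(Z\gamma) + \mathrm{Var}(\epsilon)
                   && \text{($X\beta$ is fixed; $\gamma$ and $\epsilon$ are uncorrelated)} \\
                &= Z\,\mathrm{Var}(\gamma)\,Z' + R \\
                &= ZGZ' + R
\end{aligned}
```

Because $y$ is a linear function of Gaussian variables, it is itself Gaussian with mean $X\beta$ and variance $V = ZGZ' + R$. Setting $Z = 0$ and $R = \sigma^2 I$ gives $V = \sigma^2 I$, which is the standard linear model.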
You can model the variance of the data, $y$, by specifying the structure (or form) of $Z$, $G$, and $R$. The
model matrix $Z$ is set up in the same fashion as $X$, the model matrix for the fixed-effects parameters.
For $G$ and $R$, you must select some covariance structure. Possible covariance structures include the
following:
- variance components
- compound symmetry (common covariance plus diagonal)
- unstructured (general covariance)
- autoregressive
- spatial
- general linear
- factor analytic
By appropriately defining the model matrices $X$ and $Z$, as well as the covariance structure matrices
$G$ and $R$, you can perform numerous mixed model analyses.
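The correspondence between the four matrices and the PROC MIXED statements can be sketched as follows. The data set growth and the variables Response, Treatment, Time, and Subject are hypothetical, and the particular structures chosen (a random intercept and a first-order autoregressive R) are assumptions made only for illustration.

```sas
/* Hypothetical longitudinal data: repeated responses per subject. */
proc mixed data=growth;
   class Subject Treatment Time;
   model Response = Treatment Time Treatment*Time;  /* columns of X             */
   random intercept / subject=Subject;              /* columns of Z; G holds    */
                                                    /* one variance component   */
   repeated Time / subject=Subject type=ar(1);      /* R: AR(1) blocks, one     */
                                                    /* block per subject        */
run;
```

The TYPE= option in the RANDOM and REPEATED statements is where a covariance structure from the preceding list is selected for G and R, respectively.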
PROC MIXED Contrasted with Other SAS Procedures
PROC MIXED is a generalization of the GLM procedure in the sense that PROC GLM fits standard
linear models, and PROC MIXED fits the wider class of mixed linear models. Both procedures
have similar CLASS, MODEL, CONTRAST, ESTIMATE, and LSMEANS statements, but their
RANDOM and REPEATED statements differ (see the following paragraphs). Both procedures use
the non-full-rank model parameterization, although the sorting of classification levels can differ
between the two. PROC MIXED computes only Type I–Type III tests of fixed effects, while PROC
GLM computes Types I–IV.
The RANDOM statement in PROC MIXED incorporates random effects constituting the vector $\gamma$
in the mixed model. However, in PROC GLM, effects specified in the RANDOM statement are still
treated as fixed as far as the model fit is concerned, and they serve only to produce corresponding
expected mean squares. These expected mean squares lead to the traditional ANOVA estimates of
variance components. PROC MIXED computes REML and ML estimates of variance parameters,
which are generally preferred to the ANOVA estimates (Searle 1988; Harville 1988; Searle, Casella,
and McCulloch 1992). Optionally, PROC MIXED also computes MIVQUE0 estimates, which are
similar to ANOVA estimates.
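The difference in how the two procedures treat the RANDOM statement can be sketched with the heights data set from the section “Getting Started: MIXED Procedure.” The choice of Family and Family*Gender as the random effects is taken from that section; the TEST option in PROC GLM is added here as an illustration.

```sas
/* PROC GLM: the Family terms stay in the MODEL statement and are fit as   */
/* fixed effects. RANDOM only requests expected mean squares, and TEST     */
/* requests F tests that use them.                                         */
proc glm data=heights;
   class Family Gender;
   model Height = Gender Family Family*Gender;
   random Family Family*Gender / test;
run;

/* PROC MIXED: random effects are left out of the MODEL statement and      */
/* enter the model only through the RANDOM statement.                      */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```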
The REPEATED statement in PROC MIXED is used to specify covariance structures for repeated
measurements on subjects, while the REPEATED statement in PROC GLM is used to specify various
transformations with which to conduct the traditional univariate or multivariate tests. In repeated
measures situations, the mixed model approach used in PROC MIXED is more flexible and
more widely applicable than either the univariate or multivariate approach. In particular, the mixed
model approach provides a larger class of covariance structures and a better mechanism for handling
missing values (Wolfinger and Chang 1995).
PROC MIXED subsumes the VARCOMP procedure. PROC MIXED provides a wide variety of covariance
structures, while PROC VARCOMP estimates only simple random effects. PROC MIXED
carries out several analyses that are absent in PROC VARCOMP, including the estimation and testing
of linear combinations of fixed and random effects.
The ARIMA and AUTOREG procedures provide more time series structures than PROC MIXED,
although they do not fit variance component models. The CALIS procedure fits general covariance
matrices, but the fixed effects structure of the model is formed differently than in PROC MIXED.
The LATTICE and NESTED procedures fit special types of mixed linear models that can also be
handled in PROC MIXED, although PROC MIXED might run slower because of its more general
algorithm. The TSCSREG procedure analyzes time series cross-sectional data, and it fits some
structures not available in PROC MIXED.
The GLIMMIX procedure fits generalized linear mixed models (GLMMs). Linear mixed models,
in which the data are normally distributed given the random effects, are in the class of GLMMs. The
MIXED procedure can estimate covariance parameters with ANOVA methods that are not available
in the GLIMMIX procedure (see METHOD=TYPE1, METHOD=TYPE2, and METHOD=TYPE3
in the PROC MIXED statement). Also, PROC MIXED can perform a sampling-based Bayesian
analysis through the PRIOR statement, and the procedure supports certain Kronecker-type covariance
structures. These features are not available in the GLIMMIX procedure. The GLIMMIX
procedure, on the other hand, accommodates nonnormal data and offers a broader array of postprocessing
features than the MIXED procedure.
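As a small sketch of the ANOVA-type estimation mentioned above, the following step requests METHOD=TYPE3 for the heights data set and the random-effects model from the section “Getting Started: MIXED Procedure.” The choice of data set and model is an assumption made for illustration.

```sas
/* Estimate the variance components by equating Type 3 mean squares */
/* to their expected values instead of maximizing a likelihood.     */
proc mixed data=heights method=type3;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```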
Getting Started: MIXED Procedure
Clustered Data Example
Consider the following SAS data set as an introductory example:
data heights;<br />
input Family Gender$ Height @@;<br />
datalines;<br />
1 F 67 1 F 66 1 F 64 1 M 71 1 M 72 2 F 63<br />
2 F 63 2 F 67 2 M 69 2 M 68 2 M 70 3 F 63<br />
3 M 64 4 F 67 4 F 66 4 M 67 4 M 67 4 M 69<br />
;<br />
The response variable Height measures the heights (in inches) of 18 individuals. The individuals
are classified according to Family and Gender. You can perform a traditional two-way analysis of
variance of these data with the following PROC MIXED statements:
proc mixed data=heights;<br />
class Family Gender;<br />
model Height = Gender Family Family*Gender;<br />
run;<br />
The PROC MIXED statement invokes the procedure. The CLASS statement instructs PROC
MIXED to consider both Family and Gender as classification variables. Dummy (indicator) variables
are, as a result, created corresponding to all of the distinct levels of Family and Gender. For
these data, Family has four levels and Gender has two levels.
The MODEL statement first specifies the response (dependent) variable Height. The explanatory
(independent) variables are then listed after the equal (=) sign. Here, the two explanatory variables
are Gender and Family, and these are the main effects of the design. The third explanatory term,
Family*Gender, models an interaction between the two main effects.
PROC MIXED uses the dummy variables associated with Gender, Family, and Family*Gender to
construct the $X$ matrix for the linear model. A column of 1s is also included as the first column of
$X$ to model a global intercept. There are no $Z$ or $G$ matrices for this model, and $R$ is assumed to
equal $\sigma^2 I$, where $I$ is an 18 × 18 identity matrix.
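The size of the resulting $X$ matrix can be worked out directly from the class levels. The count below is a sketch based on the description above: one intercept column plus one dummy column per level of each effect.

```latex
\underbrace{1}_{\text{intercept}}
+ \underbrace{2}_{\text{Gender}}
+ \underbrace{4}_{\text{Family}}
+ \underbrace{4 \times 2}_{\text{Family*Gender}}
= 15 \text{ columns}
```

This agrees with the value of 15 reported for “Columns in X” in the “Dimensions” table (Figure 56.3). The columns are not linearly independent. Because all eight Family-by-Gender cells are observed in these data, the rank of $X$ is 8, which leaves $18 - 8 = 10$ residual degrees of freedom, the denominator degrees of freedom shown in Figure 56.7.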
The RUN statement completes the specification. The coding is precisely the same as with the GLM
procedure. However, much of the output from PROC MIXED is different from that produced by
PROC GLM.
The output from PROC MIXED is shown in Figure 56.1 through Figure 56.7.
The “Model Information” table in Figure 56.1 describes the model, some of the variables that it
involves, and the method used in fitting it. This table also lists the method (profile, factor, parameter,
or none) for handling the residual variance.
Figure 56.1 Model Information
The Mixed Procedure
Model Information
Data Set                     WORK.HEIGHTS
Dependent Variable           Height
Covariance Structure         Diagonal
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Residual
The “Class Level Information” table in Figure 56.2 lists the levels of all variables specified in the
CLASS statement. You can check this table to make sure that the data are correct.
Figure 56.2 Class Level Information
Class Level Information
Class     Levels    Values
Family         4    1 2 3 4
Gender         2    F M
The “Dimensions” table in Figure 56.3 lists the sizes of relevant matrices. This table can be useful
in determining CPU time and memory requirements.
Figure 56.3 Dimensions
Dimensions
Covariance Parameters     1
Columns in X             15
Columns in Z              0
Subjects                  1
Max Obs Per Subject      18
The “Number of Observations” table in Figure 56.4 displays information about the sample size
being processed.
Figure 56.4 Number of Observations
Number of Observations
Number of Observations Read        18
Number of Observations Used        18
Number of Observations Not Used     0
The “Covariance Parameter Estimates” table in Figure 56.5 displays the estimate of $\sigma^2$ for the
model.
Figure 56.5 Covariance Parameter Estimates
Covariance Parameter Estimates
Cov Parm    Estimate
Residual      2.1000
The “Fit Statistics” table in Figure 56.6 lists several pieces of information about the fitted mixed
model, including values derived from the computed value of the restricted/residual likelihood.
Figure 56.6 Fit Statistics
Fit Statistics
-2 Res Log Likelihood        41.6
AIC (smaller is better)      43.6
AICC (smaller is better)     44.1
BIC (smaller is better)      43.9
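The information criteria in Figure 56.6 can be reproduced by hand from the value of $-2$ times the residual log likelihood. The sketch below assumes the usual REML definitions: $d = 1$ is the number of covariance parameters, and $n^* = 18 - 8 = 10$ is the number of observations minus the rank of $X$. Taking $n = 10$ in the BIC is also an assumption, chosen because it reproduces the displayed value; the exact definitions are given in the description of the fit statistics later in the chapter.

```latex
\begin{aligned}
\mathrm{AIC}  &= -2\ell_R + 2d = 41.6 + 2(1) = 43.6 \\
\mathrm{AICC} &= -2\ell_R + \frac{2\,d\,n^*}{n^* - d - 1}
               = 41.6 + \frac{2(1)(10)}{10 - 1 - 1} = 41.6 + 2.5 = 44.1 \\
\mathrm{BIC}  &= -2\ell_R + d \log n = 41.6 + \log 10 \approx 41.6 + 2.3 = 43.9
\end{aligned}
```

All three criteria add a penalty to the same likelihood value, so differences among them reflect only how heavily each one penalizes the number of covariance parameters.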
The “Type 3 Tests of Fixed Effects” table in Figure 56.7 displays significance tests for the three
effects listed in the MODEL statement. The Type 3 F statistics and p-values are the same as those
produced by the GLM procedure. However, because PROC MIXED uses a likelihood-based estimation
scheme, it does not directly compute or display sums of squares for this analysis.
Figure 56.7 Tests of Fixed Effects
Type 3 Tests of Fixed Effects
                 Num    Den
Effect            DF     DF    F Value    Pr > F
Gender             1     10      17.63    0.0018
Family             3     10       5.90    0.0139
Family*Gender      3     10       2.89    0.0889
The Type 3 test for the Family*Gender effect is not significant at the 5% level, but the tests for both main
effects are significant.
The important assumptions behind this analysis are that the data are normally distributed and that
they are independent with constant variance. For these data, the normality assumption is probably
realistic since the data are observed heights. However, since the data occur in clusters (families),
it is very likely that observations from the same family are statistically correlated (that is, not
independent).
The methods implemented in PROC MIXED are still based on the assumption of normally distributed
data, but you can drop the assumption of independence by modeling statistical correlation
in a variety of ways. You can also model variances that are heterogeneous (that is, nonconstant).
For the height data, one of the simplest ways of modeling correlation is through the use of random
effects. Here the family effect is assumed to be normally distributed with zero mean and some
unknown variance. This is in contrast to the previous model in which the family effects are just
constants, or fixed effects. Declaring Family as a random effect sets up a common correlation among
all observations having the same level of Family.
Declaring Family*Gender as a random effect models an additional correlation between all observations
that have the same level of both Family and Gender. One interpretation of this effect is that a
female in a certain family exhibits more correlation with the other females in that family than with
the other males, and likewise for a male. With the height data, this model seems reasonable.
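The correlation pattern described in the two preceding paragraphs can be written out explicitly. In the sketch below, $\sigma^2_F$ and $\sigma^2_{FG}$ denote the variances of the Family and Family*Gender random effects, and $\sigma^2$ is the residual variance. These symbols are introduced here for illustration and are not notation from the original text.

```latex
\mathrm{Cov}(y, y^{*}) =
\begin{cases}
\sigma^2_F + \sigma^2_{FG} + \sigma^2 & \text{$y$ and $y^{*}$ are the same observation (the variance)} \\
\sigma^2_F + \sigma^2_{FG} & \text{different individuals, same family, same gender} \\
\sigma^2_F & \text{same family, different gender} \\
0 & \text{different families}
\end{cases}
```

Two relatives of the same gender share both random effects, while two relatives of different gender share only the family effect, which is why the first pair is the more strongly correlated whenever $\sigma^2_{FG} > 0$.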
The statements to fit this correlation model in PROC MIXED are as follows:
proc mixed;<br />
class Family Gender;<br />
model Height = Gender;<br />
random Family Family*Gender;<br />
run;<br />
Note that Family and Family*Gender are now listed in the RANDOM statement. The dummy variables
associated with them are used to construct the $Z$ matrix in the mixed model. The $X$ matrix
now consists of a column of 1s and the dummy variables for Gender.
The $G$ matrix for this model is diagonal, and it contains the variance components for both Family
and Family*Gender. The $R$ matrix is still assumed to equal $\sigma^2 I$, where $I$ is an identity matrix.
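One way to inspect the covariance structure implied by these matrices is to rewrite the same model with a SUBJECT= effect and request the V and VCORR options of the RANDOM statement. The following is a sketch, not part of the original example: it assumes that a random intercept and a random Gender effect within each family describe the same variance component model as the Family and Family*Gender effects.

```sas
/* Equivalent formulation (an assumption for illustration): one block per family. */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   /* V and VCORR print the estimated covariance and correlation */
   /* matrices of the observations for the first subject.        */
   random intercept Gender / subject=Family v vcorr;
run;
```

With SUBJECT=Family, the printed block covers only the five observations of the first family, which is easier to read than the full 18 × 18 matrix.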
The output from this analysis is as follows.
Figure 56.8 Model Information
The Mixed Procedure
Model Information
Data Set                     WORK.HEIGHTS
Dependent Variable           Height
Covariance Structure         Variance Components
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment
The “Model Information” table in Figure 56.8 shows that the containment method is used to compute
the degrees of freedom for this analysis. This is the default method when a RANDOM statement
is used; see the description of the DDFM= option for more information.
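If the containment default is not wanted, the DDFM= option in the MODEL statement selects another method. The sketch below requests the Satterthwaite approximation. The choice of this particular method is an illustration, not a recommendation from the original text.

```sas
/* Override the default containment method for denominator degrees of freedom. */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender / ddfm=satterth;
   random Family Family*Gender;
run;
```

The “Degrees of Freedom Method” row of the “Model Information” table reports which method was actually used.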
Figure 56.9 Class Level Information
Class Level Information
Class     Levels    Values
Family         4    1 2 3 4
Gender         2    F M
The “Class Level Information” table in Figure 56.9 is the same as before. The “Dimensions” table
in Figure 56.10 displays the new sizes of the $X$ and $Z$ matrices.
Figure 56.10 Dimensions and Number of Observations
Dimensions
Covariance Parameters     3
Columns in X              3
Columns in Z             12
Subjects                  1
Max Obs Per Subject      18
Number of Observations
Number of Observations Read        18
Number of Observations Used        18
Number of Observations Not Used     0
The “Iteration History” table in Figure 56.11 displays the results of the numerical optimization
of the restricted/residual likelihood. Six iterations are required to achieve the default convergence
criterion of 1E-8.
Figure 56.11 REML Estimation Iteration History
Iteration History
Iteration    Evaluations    -2 Res Log Like    Criterion
        0              1        74.11074833
        1              2        71.51614003    0.01441208
        2              1        71.13845990    0.00412226
        3              1        71.03613556    0.00058188
        4              1        71.02281757    0.00001689
        5              1        71.02245904    0.00000002
        6              1        71.02245869    0.00000000
Convergence criteria met.
The “Covariance Parameter Estimates” table in Figure 56.12 displays the results of the REML
fit. The Estimate column contains the estimates of the variance components for Family and Family*Gender,
as well as the estimate of $\sigma^2$.
Figure 56.12 Covariance Parameter Estimates (REML)
Covariance Parameter Estimates
Cov Parm         Estimate
Family             2.4010
Family*Gender      1.7657
Residual           2.1668
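The estimates in Figure 56.12 translate into estimated correlations between family members. The arithmetic below is a sketch that assumes the standard variance component interpretation, in which observations sharing a random effect share its variance as covariance.

```latex
\begin{aligned}
\widehat{\mathrm{Var}}(\text{Height}) &= 2.4010 + 1.7657 + 2.1668 = 6.3335 \\[4pt]
\widehat{\mathrm{Corr}}(\text{same family, same gender})
   &= \frac{2.4010 + 1.7657}{6.3335} = \frac{4.1667}{6.3335} \approx 0.66 \\[4pt]
\widehat{\mathrm{Corr}}(\text{same family, different gender})
   &= \frac{2.4010}{6.3335} \approx 0.38
\end{aligned}
```

Under this model, observations from different families are uncorrelated.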
The “Fit Statistics” table in Figure 56.13 contains basic information about the REML fit.
Figure 56.13 Fit Statistics
Fit Statistics
-2 Res Log Likelihood        71.0
AIC (smaller is better)      77.0
AICC (smaller is better)     79.0
BIC (smaller is better)      75.2
<strong>The</strong> “Type 3 Tests of Fixed Effects” table in Figure 56.14 contains a significance test for the lone<br />
fixed effect, Gender. Note that the associated p-value is not nearly as significant as in the previous<br />
analysis. This illustrates the importance of correctly modeling correlation in your data.
3896 ✦ Chapter 56: <strong>The</strong> <strong>MIXED</strong> <strong>Procedure</strong><br />
Figure 56.14 Type 3 Tests of Fixed Effects<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Gender 1 3 7.95 0.0667<br />
An additional benefit of the random effects analysis is that it enables you to make inferences about<br />
gender that apply to an entire population of families, whereas the inferences about gender from the<br />
analysis where Family and Family*Gender are fixed effects apply only to the particular families in the<br />
data set.<br />
PROC <strong>MIXED</strong> thus offers you the ability to model correlation directly and to make inferences about<br />
fixed effects that apply to entire populations of random effects.<br />
Syntax: <strong>MIXED</strong> <strong>Procedure</strong><br />
<strong>The</strong> following statements are available in PROC <strong>MIXED</strong>.<br />
PROC <strong>MIXED</strong> < options > ;<br />
BY variables ;<br />
CLASS variables ;<br />
ID variables ;<br />
MODEL dependent = < fixed-effects > < / options > ;<br />
RANDOM random-effects < / options > ;<br />
REPEATED < repeated-effect >< / options > ;<br />
PARMS (value-list) . . . < / options > ;<br />
PRIOR < distribution >< / options > ;<br />
CONTRAST ’label’ < fixed-effect values . . . ><br />
< | random-effect values . . . >, . . . < / options > ;<br />
ESTIMATE ’label’ < fixed-effect values . . . ><br />
< | random-effect values . . . >< / options > ;<br />
LSMEANS fixed-effects < / options > ;<br />
WEIGHT variable ;<br />
Items within angle brackets ( < > ) are optional. <strong>The</strong> CONTRAST, ESTIMATE, LSMEANS, and<br />
RANDOM statements can appear multiple times; all other statements can appear only once.<br />
<strong>The</strong> PROC <strong>MIXED</strong> and MODEL statements are required, and the MODEL statement must appear<br />
after the CLASS statement if a CLASS statement is included. <strong>The</strong> CONTRAST, ESTIMATE,<br />
LSMEANS, RANDOM, and REPEATED statements must follow the MODEL statement. <strong>The</strong><br />
CONTRAST and ESTIMATE statements must also follow any RANDOM statements.<br />
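The ordering rules above can be illustrated with a minimal sketch. The data set name work.demo, its variables, and all data values are hypothetical and serve only to make the step runnable; they are not the data analyzed earlier in this chapter.

```sas
/* Hypothetical data: names and values are illustrative assumptions only */
data work.demo;
   input Family Gender $ Height @@;
   datalines;
1 F 67  1 F 66  1 M 71  1 M 72
2 F 63  2 F 64  2 M 69  2 M 68
3 F 65  3 F 64  3 M 70  3 M 71
;
run;

proc mixed data=work.demo method=reml;
   class Family Gender;                    /* CLASS precedes MODEL            */
   model Height = Gender / s;              /* required; fixed effects (X)     */
   random Family Family*Gender;            /* RANDOM follows MODEL            */
   estimate 'F minus M' Gender 1 -1 / cl;  /* ESTIMATE follows any RANDOM     */
   lsmeans Gender / diff;                  /* LSMEANS follows MODEL           */
run;
```

With the default level ordering, F sorts before M, so the coefficients 1 -1 in the ESTIMATE statement correspond to F minus M.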
Table 56.1 summarizes the basic functions and important options of each PROC <strong>MIXED</strong> statement.
<strong>The</strong> syntax of each statement in Table 56.1 is described in the following sections in alphabetical<br />
order after the description of the PROC <strong>MIXED</strong> statement.<br />
Table 56.1 Summary of PROC <strong>MIXED</strong> Statements<br />
Statement | Description | Important Options<br />
PROC <strong>MIXED</strong> | invokes the procedure | DATA= specifies input data set, METHOD= specifies estimation method<br />
BY | performs multiple PROC <strong>MIXED</strong> analyses in one invocation | none<br />
CLASS | declares qualitative variables that create indicator variables in design matrices | none<br />
ID | lists additional variables to be included in predicted values tables | none<br />
MODEL | specifies dependent variable and fixed effects, setting up X | S requests solution for fixed-effects parameters, DDFM= specifies denominator degrees of freedom method, OUTP= outputs predicted values to a data set, INFLUENCE computes influence diagnostics<br />
RANDOM | specifies random effects, setting up Z and G | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, S requests solution for random-effects parameters, G displays estimated G<br />
REPEATED | sets up R | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, R displays estimated blocks of R, GROUP= enables between-subject heterogeneity, LOCAL adds a diagonal matrix to R<br />
PARMS | specifies a grid of initial values for the covariance parameters | HOLD= and NOITER hold the covariance parameters or their ratios constant, PARMSDATA= reads the initial values from a <strong>SAS</strong> data set<br />
PRIOR | performs a sampling-based Bayesian analysis for variance component models | NSAMPLE= specifies the sample size, SEED= specifies the starting seed<br />
CONTRAST | constructs custom hypothesis tests | E displays the L matrix coefficients<br />
ESTIMATE | constructs custom scalar estimates | CL produces confidence limits<br />
LSMEANS | computes least squares means for classification fixed effects | DIFF computes differences of the least squares means, ADJUST= performs multiple comparisons adjustments, AT changes covariates, OM changes weighting, CL produces confidence limits, SLICE= tests simple effects<br />
WEIGHT | specifies a variable by which to weight R | none<br />
PROC <strong>MIXED</strong> Statement<br />
PROC <strong>MIXED</strong> < options > ;<br />
<strong>The</strong> PROC <strong>MIXED</strong> statement invokes the procedure. Table 56.2 summarizes important options in<br />
the PROC <strong>MIXED</strong> statement by function. <strong>The</strong>se and other options in the PROC <strong>MIXED</strong> statement<br />
are then described fully in alphabetical order.<br />
Table 56.2 PROC <strong>MIXED</strong> Statement Options<br />
Option Description<br />
Basic Options<br />
DATA= specifies input data set<br />
METHOD= specifies the estimation method<br />
NOPROFILE includes scale parameter in optimization<br />
ORDER= determines the sort order of CLASS variables<br />
Displayed Output<br />
ASYCORR displays asymptotic correlation matrix of covariance parameter estimates<br />
ASYCOV displays asymptotic covariance matrix of covariance parameter estimates<br />
CL requests confidence limits for covariance parameter estimates<br />
COVTEST displays asymptotic standard errors and Wald tests for covariance<br />
parameters<br />
IC displays a table of information criteria<br />
ITDETAILS displays estimates and gradients added to “Iteration History”<br />
LOGNOTE writes periodic status notes to the log<br />
MMEQ displays mixed model equations<br />
MMEQSOL displays the solution to the mixed model equations<br />
NOCLPRINT suppresses “Class Level Information” completely or in parts<br />
NOITPRINT suppresses “Iteration History” table<br />
PLOTS produces ODS statistical graphics<br />
RATIO produces ratio of covariance parameter estimates with residual<br />
variance<br />
Optimization Options<br />
MAXFUNC= specifies the maximum number of likelihood evaluations<br />
MAXITER= specifies the maximum number of iterations
Computational Options<br />
CONVF requests and tunes the relative function convergence criterion<br />
CONVG requests and tunes the relative gradient convergence criterion<br />
CONVH requests and tunes the relative Hessian convergence criterion<br />
DFBW selects between-within degree of freedom method<br />
EMPIRICAL computes empirical (“sandwich”) estimators<br />
NOBOUND unbounds covariance parameter estimates<br />
RIDGE= specifies starting value for minimum ridge value<br />
SCORING= applies Fisher scoring where applicable<br />
You can specify the following options.<br />
ABSOLUTE<br />
makes the convergence criterion absolute. By default, it is relative (divided by the current<br />
objective function value). See the CONVF, CONVG, and CONVH options in this section for<br />
a description of various convergence criteria.<br />
ALPHA=number<br />
requests that confidence limits be constructed for the covariance parameter estimates with<br />
confidence level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the default is<br />
0.05.<br />
ANOVAF<br />
<strong>The</strong> ANOVAF option computes F tests in models with a REPEATED statement and without a<br />
RANDOM statement by a method similar to that of Brunner, Domhof, and Langer (2002).<br />
<strong>The</strong> method consists of computing special F statistics and adjusting their degrees of freedom.<br />
<strong>The</strong> technique is a generalization of the Greenhouse-Geisser adjustment in MANOVA models<br />
(Greenhouse and Geisser 1959). For more details, see the section “F Tests With the ANOVAF<br />
Option” on page 3973.<br />
ASYCORR<br />
produces the asymptotic correlation matrix of the covariance parameter estimates. It is<br />
computed from the corresponding asymptotic covariance matrix (see the description of the<br />
ASYCOV option, which follows). For ODS purposes, the name of the “Asymptotic Correlation”<br />
table is “AsyCorr.”<br />
ASYCOV<br />
requests that the asymptotic covariance matrix of the covariance parameters be displayed. By<br />
default, this matrix is the observed inverse Fisher information matrix, which equals $2H^{-1}$,<br />
where H is the Hessian (second derivative) matrix of the objective function. See the section<br />
“Covariance Parameter Estimates” on page 3991 for more information about this matrix.<br />
When you use the SCORING= option and PROC <strong>MIXED</strong> converges without stopping the<br />
scoring algorithm, PROC <strong>MIXED</strong> uses the expected Hessian matrix to compute the covariance<br />
matrix instead of the observed Hessian. For ODS purposes, the name of the “Asymptotic<br />
Covariance” table is “AsyCov.”
CL< =WALD ><br />
requests confidence limits for the covariance parameter estimates. A Satterthwaite approximation<br />
is used to construct limits for all parameters that have a lower boundary constraint of<br />
zero. <strong>The</strong>se limits take the form<br />
$$\frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,1-\alpha/2}} \le \sigma^2 \le \frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,\alpha/2}}$$<br />
where $\nu = 2Z^2$, $Z$ is the Wald statistic $\hat{\sigma}^2/\mathrm{se}(\hat{\sigma}^2)$, and the denominators are quantiles of<br />
the $\chi^2$-distribution with $\nu$ degrees of freedom. See Milliken and Johnson (1992) and Burdick<br />
and Graybill (1992) for similar techniques.<br />
For all other parameters, Wald Z-scores and normal quantiles are used to construct the limits.<br />
Wald limits are also provided for variance components if you specify the NOBOUND option.<br />
<strong>The</strong> optional =WALD specification requests Wald limits for all parameters.<br />
<strong>The</strong> confidence limits are displayed as extra columns in the “Covariance Parameter Estimates”<br />
table. <strong>The</strong> confidence level is $1 - \alpha = 0.95$ by default; this can be changed with the ALPHA=<br />
option.<br />
CONVF< =number ><br />
requests the relative function convergence criterion with tolerance number. <strong>The</strong> relative function<br />
convergence criterion is<br />
$$\frac{|f_k - f_{k-1}|}{|f_k|} \le \text{number}$$<br />
where $f_k$ is the value of the objective function at iteration $k$. To prevent the division by $|f_k|$,<br />
use the ABSOLUTE option. <strong>The</strong> default convergence criterion is CONVH, and the default<br />
tolerance is 1E-8.<br />
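As a worked illustration of the relative function criterion, the objective function values from the “Iteration History” table in Figure 56.11 can be plugged into the formula. This is only a sketch: it assumes that the objective function is the value in the “-2 Res Log Like” column, and the “Criterion” column in that figure reports the default CONVH criterion, not CONVF.

```latex
\begin{align*}
k = 5:\quad \frac{|f_5 - f_4|}{|f_5|}
  &= \frac{|71.02245904 - 71.02281757|}{71.02245904}
   \approx 5.0 \times 10^{-6} > 10^{-8} \\
k = 6:\quad \frac{|f_6 - f_5|}{|f_6|}
  &= \frac{|71.02245869 - 71.02245904|}{71.02245869}
   \approx 4.9 \times 10^{-9} \le 10^{-8}
\end{align*}
```

Under these assumptions, a CONVF tolerance of 1E-8 would first be met at iteration 6.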
CONVG < =number ><br />
requests the relative gradient convergence criterion with tolerance number. <strong>The</strong> relative gradient<br />
convergence criterion is<br />
$$\frac{\max_j |g_{jk}|}{|f_k|} \le \text{number}$$<br />
where $f_k$ is the value of the objective function, and $g_{jk}$ is the $j$th element of the gradient<br />
(first derivative) of the objective function, both at iteration $k$. To prevent division by $|f_k|$,<br />
use the ABSOLUTE option. <strong>The</strong> default convergence criterion is CONVH, and the default<br />
tolerance is 1E-8.<br />
CONVH< =number ><br />
requests the relative Hessian convergence criterion with tolerance number. <strong>The</strong> relative Hessian<br />
convergence criterion is<br />
$$\frac{g_k' H_k^{-1} g_k}{|f_k|} \le \text{number}$$<br />
where $f_k$ is the value of the objective function, $g_k$ is the gradient (first derivative) of the<br />
objective function, and $H_k$ is the Hessian (second derivative) of the objective function, all at<br />
iteration $k$.<br />
If $H_k$ is singular, then PROC <strong>MIXED</strong> uses the following relative criterion:<br />
$$\frac{g_k' g_k}{|f_k|} \le \text{number}$$<br />
To prevent the division by $|f_k|$, use the ABSOLUTE option. <strong>The</strong> default convergence criterion<br />
is CONVH, and the default tolerance is 1E-8.<br />
COVTEST<br />
produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates.<br />
DATA=<strong>SAS</strong>-data-set<br />
names the <strong>SAS</strong> data set to be used by PROC <strong>MIXED</strong>. <strong>The</strong> default is the most recently created<br />
data set.<br />
DFBW<br />
has the same effect as the DDFM=BW option in the MODEL statement.<br />
EMPIRICAL<br />
computes the estimated variance-covariance matrix of the fixed-effects parameters by using<br />
the asymptotically consistent estimator described in Huber (1967), White (1980), Liang and<br />
Zeger (1986), and Diggle, Liang, and Zeger (1994). This estimator is commonly referred to<br />
as the “sandwich” estimator, and it is computed as follows:<br />
$$(X'\hat{V}^{-1}X)^{-}\left(\sum_{i=1}^{S} X_i'\hat{V}_i^{-1}\hat{\epsilon}_i\hat{\epsilon}_i'\hat{V}_i^{-1}X_i\right)(X'\hat{V}^{-1}X)^{-}$$<br />
Here, $\hat{\epsilon}_i = y_i - X_i\hat{\beta}$, $S$ is the number of subjects, and matrices with an $i$ subscript are<br />
those for the $i$th subject. You must include the SUBJECT= option in either a RANDOM or<br />
REPEATED statement for this option to take effect.<br />
When you specify the EMPIRICAL option, PROC <strong>MIXED</strong> adjusts all standard errors and test<br />
statistics involving the fixed-effects parameters. This changes output in the following tables<br />
(listed in Table 56.22): Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, Slices,<br />
SolutionF, Tests1–Tests3. <strong>The</strong> OUTP= and OUTPM= data sets are also affected. Finally,<br />
the Satterthwaite and Kenward-Roger degrees of freedom methods are not available if you<br />
specify the EMPIRICAL option.<br />
IC<br />
displays a table of various information criteria. <strong>The</strong> criteria are all in smaller-is-better form,<br />
and are described in Table 56.3.<br />
Table 56.3 Information Criteria<br />
Criterion | Formula | Reference<br />
AIC | $-2\ell + 2d$ | Akaike (1974)<br />
AICC | $-2\ell + 2dn^{*}/(n^{*} - d - 1)$ | Hurvich and Tsai (1989); Burnham and Anderson (1998)<br />
HQIC | $-2\ell + 2d \log\log n$ | Hannan and Quinn (1979)<br />
BIC | $-2\ell + d \log n$ | Schwarz (1978)<br />
CAIC | $-2\ell + d(\log n + 1)$ | Bozdogan (1987)<br />
Here $\ell$ denotes the maximum value of the (possibly restricted) log likelihood, $d$ the dimension<br />
of the model, and $n$ the number of observations. In <strong>SAS</strong> 6 of <strong>SAS</strong>/<strong>STAT</strong> software, $n$ equals<br />
the number of valid observations for maximum likelihood estimation and $n - p$ for restricted<br />
maximum likelihood estimation, where $p$ equals the rank of X. In later versions, $n$ equals the<br />
number of effective subjects as displayed in the “Dimensions” table, unless this value equals<br />
1, in which case $n$ equals the number of levels of the first random effect you specify in a<br />
RANDOM statement. If the number of effective subjects equals 1 and you have no RANDOM<br />
statements, then $n$ reverts to the <strong>SAS</strong> 6 values. For AICC (a finite-sample corrected version<br />
of AIC), $n^{*}$ equals the <strong>SAS</strong> 6 values of $n$, unless this number is less than $d + 2$, in which<br />
case it equals $d + 2$.<br />
For restricted likelihood estimation, $d$ equals $q$, the effective number of estimated covariance<br />
parameters. In <strong>SAS</strong> 6, when a parameter estimate lies on a boundary constraint, then it is still<br />
included in the calculation of $d$, but in later versions it is not. <strong>The</strong> most common example<br />
of this behavior is when a variance component is estimated to equal zero. For maximum<br />
likelihood estimation, $d$ equals $q + p$.<br />
For ODS purposes, the name of the “Information Criteria” table is “InfoCrit.”<br />
INFO<br />
is a default option. <strong>The</strong> creation of the “Model Information,” “Dimensions,” and “Number of<br />
Observations” tables can be suppressed by using the NOINFO option.<br />
Note that in <strong>SAS</strong> 6 this option displays the “Model Information” and “Dimensions” tables.<br />
ITDETAILS<br />
displays the parameter values at each iteration and enables the writing of notes to the <strong>SAS</strong> log<br />
pertaining to “infinite likelihood” and “singularities” during Newton-Raphson iterations.<br />
LOGNOTE<br />
writes periodic notes to the log describing the current status of computations. It is designed<br />
for use with analyses requiring extensive CPU resources.<br />
MAXFUNC=number<br />
specifies the maximum number of likelihood evaluations in the optimization process. <strong>The</strong><br />
default is 150.<br />
MAXITER=number<br />
specifies the maximum number of iterations. <strong>The</strong> default is 50.<br />
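The optimization controls described so far can be combined in a single PROC MIXED statement. The following is a minimal sketch rather than a recommended setting: the data set work.demo, its variables, its values, and the particular tolerance and limits are hypothetical choices made only for illustration.

```sas
/* Hypothetical data: names and values are illustrative assumptions only */
data work.demo;
   input Family Gender $ Height @@;
   datalines;
1 F 67  1 F 66  1 M 71  1 M 72
2 F 63  2 F 64  2 M 69  2 M 68
3 F 65  3 F 64  3 M 70  3 M 71
;
run;

/* Loosen the relative Hessian tolerance, raise the iteration and       */
/* likelihood-evaluation limits, and display per-iteration details.     */
proc mixed data=work.demo convh=1e-6 maxiter=100 maxfunc=300 itdetails;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```

Adding the ABSOLUTE option to the same statement would remove the division by the objective function value from the criterion.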
METHOD=REML<br />
METHOD=ML<br />
METHOD=MIVQUE0<br />
METHOD=TYPE1<br />
METHOD=TYPE2<br />
METHOD=TYPE3<br />
specifies the estimation method for the covariance parameters. <strong>The</strong> REML specification performs<br />
residual (restricted) maximum likelihood, and it is the default method. <strong>The</strong> ML specification<br />
performs maximum likelihood, and the MIVQUE0 specification performs minimum<br />
variance quadratic unbiased estimation of the covariance parameters.
<strong>The</strong> METHOD=TYPEn specifications apply only to variance component models with no<br />
SUBJECT= effects and no REPEATED statement. An analysis of variance table is included<br />
in the output, and the expected mean squares are used to estimate the variance components<br />
(see Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>,” for further explanation). <strong>The</strong> resulting method-of-moment<br />
variance component estimates are used in subsequent calculations, including standard<br />
errors computed from ESTIMATE and LSMEANS statements. For ODS purposes, the<br />
new table names are “Type1,” “Type2,” and “Type3,” respectively.<br />
MMEQ<br />
requests that coefficients of the mixed model equations be displayed. <strong>The</strong>se are<br />
$$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix}, \quad \begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}$$<br />
assuming that $\hat{G}$ is nonsingular. If $\hat{G}$ is singular, PROC <strong>MIXED</strong> produces the following<br />
coefficients:<br />
$$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z\hat{G} \\ \hat{G}Z'\hat{R}^{-1}X & \hat{G}Z'\hat{R}^{-1}Z\hat{G} + \hat{G} \end{bmatrix}, \quad \begin{bmatrix} X'\hat{R}^{-1}y \\ \hat{G}Z'\hat{R}^{-1}y \end{bmatrix}$$<br />
See the section “Estimating Fixed and Random Effects in the Mixed Model” on page 3970<br />
for further information about these equations.<br />
MMEQSOL<br />
requests that a solution to the mixed model equations be produced, as well as the inverted<br />
coefficients matrix. Formulas for these equations are provided in the preceding description of<br />
the MMEQ option.<br />
When $\hat{G}$ is singular, $\hat{\tau}$ and a generalized inverse of the left-hand-side coefficient matrix are<br />
transformed by using $\hat{G}$ to produce $\hat{\gamma}$ and $\hat{C}$, respectively, where $\hat{C}$ is a generalized inverse<br />
of the left-hand-side coefficient matrix of the original equations.<br />
NAMELEN< =number ><br />
specifies the length to which long effect names are shortened. <strong>The</strong> default and minimum value<br />
is 20.<br />
NOBOUND<br />
has the same effect as the NOBOUND option in the PARMS statement.<br />
NOCLPRINT< =number ><br />
suppresses the display of the “Class Level Information” table if you do not specify number.<br />
If you do specify number, only levels with totals that are less than number are listed in the<br />
table.<br />
NOINFO<br />
suppresses the display of the “Model Information,” “Dimensions,” and “Number of Observations”<br />
tables.<br />
NOITPRINT<br />
suppresses the display of the “Iteration History” table.<br />
NOPROFILE<br />
includes the residual variance as part of the Newton-Raphson iterations. This option applies<br />
only to models that have a residual variance parameter. By default, this parameter is profiled<br />
out of the likelihood calculations, except when you have specified the HOLD= option in the<br />
PARMS statement.<br />
ORD<br />
displays ordinates of the relevant distribution in addition to p-values. <strong>The</strong> ordinate can be<br />
viewed as an approximate odds ratio of hypothesis probabilities.<br />
ORDER=DATA<br />
ORDER=FORMATTED<br />
ORDER=FREQ<br />
ORDER=INTERNAL<br />
specifies the sorting order for the levels of all CLASS variables. This ordering determines<br />
which parameters in the model correspond to each level in the data, so the ORDER= option<br />
can be useful when you use CONTRAST or ESTIMATE statements.<br />
<strong>The</strong> default is ORDER=FORMATTED, and its behavior has been modified for <strong>SAS</strong> 8. When<br />
the default ORDER=FORMATTED is in effect for numeric variables for which you have supplied<br />
no explicit format, the levels are ordered by their internal values. In releases previous to<br />
<strong>SAS</strong> 8, numeric class levels with no explicit format were ordered by their BEST12. formatted<br />
values. In order to revert to the previous method you can specify this format explicitly for<br />
the CLASS variables. <strong>The</strong> change was implemented because the former default behavior for<br />
ORDER=FORMATTED often resulted in levels not being ordered numerically and required<br />
you to use an explicit format or ORDER=INTERNAL to get the more natural ordering.<br />
Table 56.4 shows how PROC <strong>MIXED</strong> interprets values of the ORDER= option.<br />
Table 56.4 Sort Order and Value of ORDER= Option<br />
Value of ORDER= Levels Sorted By<br />
DATA order of appearance in the input data set<br />
FORMATTED external formatted value, except for numeric variables<br />
with no explicit format, which are sorted by their unformatted<br />
(internal) value<br />
FREQ descending frequency count; levels with the most observations<br />
come first in the order<br />
INTERNAL unformatted value<br />
For FORMATTED and INTERNAL, the sort order is machine dependent.<br />
For more information about sort order, see the chapter on the SORT procedure in the <strong>SAS</strong><br />
<strong>Procedure</strong>s <strong>Guide</strong> and the discussion of BY-group processing in <strong>SAS</strong> Language Reference:<br />
Concepts.
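Because the level order determines which coefficient applies to which level, the same ESTIMATE coefficients can produce a different comparison under a different ORDER= value. The following sketch illustrates this with a hypothetical data set (work.demo, with made-up variables and values) in which the M records appear before the F records.

```sas
/* Hypothetical data: names and values are illustrative assumptions only. */
/* Gender level M appears first in the data.                             */
data work.demo;
   input Family Gender $ Height @@;
   datalines;
1 M 71  1 M 72  1 F 67  1 F 66
2 M 69  2 M 68  2 F 63  2 F 64
3 M 70  3 M 71  3 F 65  3 F 64
;
run;

/* Default ORDER=FORMATTED: levels sort as F, M, so 1 -1 is F minus M */
proc mixed data=work.demo;
   class Family Gender;
   model Height = Gender;
   random Family;
   estimate 'F minus M' Gender 1 -1;
run;

/* ORDER=DATA: levels follow order of appearance (M, then F),         */
/* so the same coefficients now estimate M minus F                    */
proc mixed data=work.demo order=data;
   class Family Gender;
   model Height = Gender;
   random Family;
   estimate 'M minus F' Gender 1 -1;
run;
```

Checking the “Class Level Information” table before writing CONTRAST or ESTIMATE coefficients is a simple way to confirm the order in effect.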
PLOTS < (global-plot-options ) > < =plot-request < (options ) > ><br />
PLOTS < (global-plot-options ) > < = (plot-request< (options) >< . . . plot-request< (options) > >) ><br />
requests that the <strong>MIXED</strong> procedure produce statistical graphics via the Output Delivery System,<br />
provided that the ODS GRAPHICS statement has been specified. For general information<br />
about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For examples<br />
of the basic statistical graphics produced by the <strong>MIXED</strong> procedure and aspects of their computation<br />
and interpretation, see the section “ODS Graphics” on page 3998.<br />
<strong>The</strong> global-plot-options apply to all relevant plots generated by the <strong>MIXED</strong> procedure. <strong>The</strong><br />
global-plot-options supported by the <strong>MIXED</strong> procedure follow.<br />
Global Plot Options<br />
OBSNO<br />
uses the data set observation number to identify observations in tooltips, provided that<br />
the observation number can be determined. Otherwise, the number displayed in tooltips<br />
is the index of the observation as it is used in the analysis within the BY group.<br />
ONLY<br />
suppresses the default plots. Only the plots specifically requested are produced.<br />
UNPACK<br />
breaks a graphic that is otherwise paneled into individual component plots.<br />
Specific Plot Options<br />
<strong>The</strong> following listing describes the specific plots and their options.<br />
ALL<br />
requests that all plots appropriate for the particular analysis be produced.<br />
BOXPLOT < (boxplot-options) ><br />
requests box plots for the effects in your model that consist of classification effects only.<br />
Note that these effects can involve more than one classification variable (interaction<br />
and nested effects), but they cannot contain any continuous variables. By default, the<br />
BOXPLOT request produces box plots based on (conditional) raw residuals for the<br />
qualifying effects in the MODEL, RANDOM, and REPEATED statements. See the<br />
discussion of the boxplot-options in a later section for information about how to tune<br />
your box plot request.<br />
DISTANCE< (USEINDEX) ><br />
requests a plot of the likelihood or restricted likelihood distance. When influence diagnostics<br />
are requested with set selection according to an effect, the USEINDEX option<br />
enables you to replace the formatted tick values on the horizontal axis with integer indices<br />
of the effect levels in order to reduce the space taken up by the horizontal plot<br />
axis.
INFLUENCEESTPLOT< (options) ><br />
requests panels of the deletion estimates in an influence analysis, provided that the<br />
INFLUENCE option is specified in the MODEL statement. No plots are produced for<br />
fixed-effects parameters associated with singular columns in the X matrix or for covariance<br />
parameters associated with singularities in the ASYCOV matrix. By default,<br />
separate panels are produced for the fixed-effects and covariance parameter deletion estimates.<br />
<strong>The</strong> FIXED and RANDOM options enable you to select these specific panels.<br />
<strong>The</strong> UNPACK option produces separate plots for each of the parameter estimates. <strong>The</strong><br />
USEINDEX option replaces formatted tick values for the horizontal axis with integer<br />
indices.<br />
INFLUENCE<strong>STAT</strong>PANEL< (options) ><br />
requests panels of influence statistics. For iterative influence analysis (see the<br />
INFLUENCE option in the MODEL statement), the panel shows the Cook’s D and<br />
CovRatio statistics for fixed-effects and covariance parameters, enabling you to gauge<br />
impact on estimates and precision for both types of estimates. In noniterative analysis,<br />
only statistics for the fixed effects are plotted. <strong>The</strong> UNPACK option produces separate<br />
plots from the elements in the panel. <strong>The</strong> USEINDEX option replaces formatted tick<br />
values for the horizontal axis with integer indices.<br />
RESIDUALPANEL < (residualplot-options) ><br />
requests a panel of raw residuals. By default, the conditional residuals are produced.<br />
See the discussion of residualplot-options in a later section for information about how<br />
to tune this panel.<br />
STUDENTPANEL < (residualplot-options) ><br />
requests a panel of studentized residuals. By default, the conditional residuals are produced.<br />
See the discussion of residualplot-options in a later section for information<br />
about how to tune this panel.<br />
PEARSONPANEL < (residualplot-options) ><br />
requests a panel of Pearson residuals. By default, the conditional residuals are produced.<br />
See the discussion of residualplot-options in a later section for information<br />
about how to tune this panel.<br />
PRESS< (USEINDEX) ><br />
requests a plot of PRESS residuals or PRESS statistics. <strong>The</strong>se are based on “leave-one-out”<br />
or “leave-set-out” prediction of the marginal mean. When influence diagnostics<br />
are requested with set selection according to an effect, the USEINDEX option enables<br />
you to replace the formatted tick values on the horizontal axis with integer indices of<br />
the effect levels in order to reduce the space taken up by the horizontal plot axis.<br />
VCIRYPANEL < (residualplot-options) ><br />
requests a panel of residual graphics based on the scaled residuals. See the VCIRY<br />
option in the MODEL statement for details about these scaled residuals. Only the<br />
UNPACK and BOX options of the residualplot-options are available for this type of<br />
residual panel.<br />
NONE<br />
suppresses all plots.
Residual Plot Options<br />
<strong>The</strong> residualplot-options determine both the composition of the panels and the type of<br />
residuals being plotted.<br />
BOX<br />
BOXPLOT<br />
replaces the inset of summary statistics in the lower-right corner of the panel with<br />
a box plot of the residual (the “PROC GLIMMIX look”).<br />
CONDITIONAL<br />
BLUP<br />
constructs plots from conditional residuals.<br />
MARGINAL<br />
NOBLUP<br />
constructs plots from marginal residuals.<br />
UNPACK<br />
produces separate plots from the elements of the panel. <strong>The</strong> inset statistics are<br />
not part of the unpack operation.<br />
Box Plot Options<br />
<strong>The</strong> boxplot-options determine whether box plots are produced for residuals or for<br />
residuals and observed values, and for which model effects the box plots are constructed.<br />
<strong>The</strong> available boxplot-options are as follows.<br />
CONDITIONAL<br />
BLUP<br />
constructs box plots from conditional residuals, that is, residuals using the estimated<br />
BLUPs of random effects.<br />
FIXED<br />
produces box plots for all fixed effects (MODEL statement) consisting entirely<br />
of classification variables.<br />
GROUP<br />
produces box plots for all GROUP= effects (RANDOM and REPEATED statement)<br />
consisting entirely of classification variables<br />
MARGINAL<br />
NOBLUP<br />
constructs box plots from marginal residuals.
NPANEL=number<br />
provides the ability to break a box plot into multiple graphics. If number is<br />
negative, no balancing of the number of boxes takes place and number is the<br />
maximum number of boxes per graphic. If number is positive, the number of<br />
boxes per graphic is balanced. For example, suppose variable A has 125 levels,<br />
and consider the following statements:<br />
ods graphics on;<br />
proc mixed plots=boxplot(npanel=20);<br />
class A;<br />
model y = A;<br />
run;<br />
<strong>The</strong> box balancing results in six plots with 18 boxes each and one plot with<br />
17 boxes. If number is zero, and this is the default, all levels of the effect are<br />
displayed in a single plot.<br />
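The counts quoted for this example can be checked with a short calculation. The rule used here, which first fixes the number of graphics and then spreads the boxes as evenly as possible, is an assumed reading that is consistent with the stated result; the text does not spell out the exact balancing algorithm.

```latex
\begin{align*}
\text{number of graphics} &= \lceil 125 / 20 \rceil = 7 \\
125 &= 7 \times 17 + 6 \\
\text{boxes per graphic} &= \underbrace{18, \dots, 18}_{6\ \text{graphics}},\ 17
  \qquad (6 \times 18 + 17 = 125)
\end{align*}
```

Six of the seven graphics receive one extra box beyond the base count of 17, which gives the six plots with 18 boxes and one plot with 17 boxes.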
OBSERVED<br />
adds box plots of the observed data for the selected effects.<br />
RANDOM<br />
produces box plots for all random effects (RANDOM statement) consisting entirely<br />
of classification variables. This does not include effects specified in the<br />
GROUP= or SUBJECT= options of the RANDOM statement.<br />
REPEATED<br />
produces box plots for the repeated effects (REPEATED statement). This does<br />
not include effects specified in the GROUP= or SUBJECT= options of the<br />
REPEATED statement.<br />
STUDENT<br />
constructs box plots from studentized residuals rather than from raw residuals.<br />
SUBJECT<br />
produces box plots for all SUBJECT= effects (RANDOM and REPEATED statement)<br />
consisting entirely of classification variables.<br />
USEINDEX<br />
uses as the horizontal axis label the index of the effect level rather than the formatted<br />
value(s). For classification variables with many levels or model effects<br />
that involve multiple classification variables, the formatted values identifying the<br />
effect levels can take up too much space as axis tick values, leading to extensive<br />
thinning. <strong>The</strong> USEINDEX option replaces tick values constructed from formatted<br />
values with the internal level number.
Multiple Plot Request<br />
You can list a plot request one or more times with different options. For example, the following<br />
statements request a panel of marginal raw residuals, individual plots generated from a<br />
panel of the conditional raw residuals, and a panel of marginal studentized residuals:<br />
ods graphics on;<br />
proc mixed plots(only)=(<br />
ResidualPanel(marginal)<br />
ResidualPanel(unpack conditional)<br />
StudentPanel(marginal box));<br />
<strong>The</strong> inset of residual statistics is replaced in this last panel by a box plot of the studentized<br />
residuals. Similarly, if you specify the INFLUENCE option in the MODEL statement, then<br />
the following statements request statistical graphics of fixed-effects deletion estimates (in a<br />
panel), covariance parameter deletion estimates (unpacked in individual plots), and box plots<br />
for the SUBJECT= and fixed classification effects based on residuals and observed values:<br />
ods graphics on / imagefmt=staticmap;<br />
proc mixed plots(only)=(<br />
InfluenceEstPlot(fixed)<br />
InfluenceEstPlot(random unpack)<br />
BoxPlot(observed fixed subject));<br />
<strong>The</strong> <strong>STAT</strong>ICMAP image format enables tooltips that show, for example, values of influence<br />
diagnostics associated with a particular deletion estimate.<br />
This concludes the syntax section for the PLOTS= option in the PROC <strong>MIXED</strong> statement.<br />
RATIO<br />
produces the ratio of the covariance parameter estimates to the estimate of the residual variance<br />
when the latter exists in the model.<br />
RIDGE=number<br />
specifies the starting value for the minimum ridge value used in the Newton-Raphson algorithm.<br />
<strong>The</strong> default is 0.3125.<br />
SCORING< =number ><br />
requests that Fisher scoring be used in association with the estimation method up to iteration<br />
number, which is 0 by default. When you use the SCORING= option and PROC <strong>MIXED</strong><br />
converges without stopping the scoring algorithm, PROC <strong>MIXED</strong> uses the expected Hessian<br />
matrix to compute approximate standard errors for the covariance parameters instead of<br />
the observed Hessian. <strong>The</strong> output from the ASYCOV and ASYCORR options is similarly<br />
adjusted.<br />
SIGITER<br />
is an alias for the NOPROFILE option.<br />
UPDATE<br />
is an alias for the LOGNOTE option.
BY Statement<br />
BY variables ;<br />
You can specify a BY statement with PROC <strong>MIXED</strong> to obtain separate analyses on observations in<br />
groups defined by the BY variables. When a BY statement appears, the procedure expects the input<br />
data set to be sorted in order of the BY variables. <strong>The</strong> variables are one or more variables in the<br />
input data set.<br />
If your input data set is not sorted in ascending order, use one of the following alternatives:<br />
Sort the data by using the SORT procedure with a similar BY statement.<br />
Specify the BY statement options NOTSORTED or DESCENDING in the BY statement for<br />
the <strong>MIXED</strong> procedure. <strong>The</strong> NOTSORTED option does not mean that the data are unsorted<br />
but rather that the data are arranged in groups (according to values of the BY variables) and<br />
that these groups are not necessarily in alphabetical or increasing numeric order.<br />
Create an index on the BY variables by using the DATASETS procedure (in Base <strong>SAS</strong> software).<br />
Because sorting the data changes the order in which PROC <strong>MIXED</strong> reads observations, the sorting<br />
order for the levels of the CLASS variable might be affected if you have specified ORDER=DATA in<br />
the PROC <strong>MIXED</strong> statement. This, in turn, affects specifications in the CONTRAST or ESTIMATE<br />
statement.<br />
For more information about the BY statement, see <strong>SAS</strong> Language Reference: Concepts. For more<br />
information about the DATASETS procedure, see the Base <strong>SAS</strong> <strong>Procedure</strong>s <strong>Guide</strong>.<br />
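The alternatives listed above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the data set WORK.TRIAL and the variables center, trt, and y are hypothetical names chosen for the example.<br />

```sas
/* Hypothetical data set WORK.TRIAL with variables center, trt, and y */

/* Alternative 1: sort by the BY variable, then analyze each group */
proc sort data=work.trial out=work.trial_sorted;
   by center;
run;

proc mixed data=work.trial_sorted;
   by center;
   class trt;
   model y = trt;
run;

/* Alternative 2: data are grouped by center but not in sorted order */
proc mixed data=work.trial;
   by center notsorted;
   class trt;
   model y = trt;
run;

/* Alternative 3: create an index on the BY variable */
proc datasets library=work nolist;
   modify trial;
   index create center;
quit;
```

If ORDER=DATA is in effect, it is worth checking the "Class Level Information" table after sorting, because the level order that CONTRAST and ESTIMATE coefficients refer to might have changed.<br />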
CLASS Statement<br />
CLASS variables ;<br />
<strong>The</strong> CLASS statement names the classification variables to be used in the analysis. If the CLASS<br />
statement is used, it must appear before the MODEL statement.<br />
Classification variables can be either character or numeric. By default, class levels are determined<br />
from the entire formatted values of the CLASS variables. Note that this represents a slight change<br />
from previous releases in the way in which class levels are determined. In releases prior to <strong>SAS</strong> ® 9,<br />
class levels were determined by using no more than the first 16 characters of the formatted values.<br />
If you want to revert to this previous behavior, you can use the TRUNCATE option in the CLASS<br />
statement. In any case, you can use formats to group values into levels. See the discussion of<br />
the FORMAT procedure in the Base <strong>SAS</strong> <strong>Procedure</strong>s <strong>Guide</strong> and the discussions of the FORMAT<br />
statement and <strong>SAS</strong> formats in <strong>SAS</strong> Language Reference: Dictionary. You can adjust the order of<br />
CLASS variable levels with the ORDER= option in the PROC <strong>MIXED</strong> statement.<br />
You can specify the following option in the CLASS statement after a slash (/):
TRUNCATE<br />
specifies that class levels should be determined by using no more than the first 16 characters<br />
of the formatted values of CLASS variables. When formatted values are longer than 16<br />
characters, you can use this option in order to revert to the levels as determined in releases<br />
previous to <strong>SAS</strong> ® 9.<br />
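As a brief sketch of the syntax, the following statements request truncated class levels. The data set WORK.TRIAL and the variable names are hypothetical; the example assumes that site_name has formatted values longer than 16 characters.<br />

```sas
/* Hypothetical data: site_name has long formatted values */
proc mixed data=work.trial;
   class site_name / truncate;   /* levels from the first 16 characters only */
   model y = site_name;
run;
```

With TRUNCATE, two values of site_name that agree in their first 16 formatted characters are treated as the same level, so this option should be used only when that grouping is intended.<br />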
CONTRAST Statement<br />
CONTRAST 'label' < fixed-effect values . . . ><br />
< | random-effect values . . . >, . . . < / options > ;<br />
<strong>The</strong> CONTRAST statement provides a mechanism for obtaining custom hypothesis tests. It is<br />
patterned after the CONTRAST statement in PROC GLM, although it has been extended to include<br />
random effects. This enables you to select an appropriate inference space (McLean, Sanders, and<br />
Stroup 1991).<br />
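You can test the hypothesis $\mathbf{L}'\boldsymbol{\phi} = \mathbf{0}$, where $\mathbf{L}' = (\mathbf{K}' \;\; \mathbf{M}')$ and $\boldsymbol{\phi}' = (\boldsymbol{\beta}' \;\; \boldsymbol{\gamma}')$, in several inference<br />
spaces. <strong>The</strong> inference space corresponds to the choice of $\mathbf{M}$. When $\mathbf{M} = \mathbf{0}$, your inferences apply<br />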
to the entire population from which the random effects are sampled; this is known as the broad<br />
inference space. When all elements of M are nonzero, your inferences apply only to the observed<br />
levels of the random effects. This is known as the narrow inference space, and you can also choose<br />
it by specifying all of the random effects as fixed. <strong>The</strong> GLM procedure uses the narrow inference<br />
space. Finally, by setting to zero the portions of M corresponding to selected main effects and<br />
interactions, you can choose intermediate inference spaces. <strong>The</strong> broad inference space is usually<br />
the most appropriate, and it is used when you do not specify any random effects in the CONTRAST<br />
statement.<br />
<strong>The</strong> CONTRAST statement has the following arguments:<br />
label identifies the contrast in the table. A label is required for every contrast specified.<br />
Labels can be up to 200 characters and must be enclosed in quotes.<br />
fixed-effect identifies an effect that appears in the MODEL statement. <strong>The</strong> keyword INTERCEPT<br />
can be used as an effect when an intercept is fitted in the model. You do<br />
not need to include all effects that are in the MODEL statement.<br />
random-effect identifies an effect that appears in the RANDOM statement. <strong>The</strong> first random<br />
effect must follow a vertical bar (|); however, random effects do not have to be<br />
specified.<br />
values are constants that are elements of the L matrix associated with the fixed and<br />
random effects.<br />
<strong>The</strong> rows of $\mathbf{L}'$ are specified in order and are separated by commas. <strong>The</strong> rows of the $\mathbf{K}'$ component<br />
of $\mathbf{L}'$ are specified on the left side of the vertical bars (|). <strong>The</strong>se rows test the fixed effects and are,<br />
therefore, checked for estimability. <strong>The</strong> rows of the $\mathbf{M}'$ component of $\mathbf{L}'$ are specified on the right<br />
side of the vertical bars. <strong>The</strong>y test the random effects, and no estimability checking is necessary.<br />
If PROC <strong>MIXED</strong> finds the fixed-effects portion of the specified contrast to be nonestimable (see the<br />
SINGULAR= option), then it displays a message in the log.<br />
<strong>The</strong> following CONTRAST statement reproduces the F test for the effect A in the split-plot example<br />
(see Example 56.1):<br />
contrast 'A broad'<br />
A 1 -1 0 A*B .5 .5 -.5 -.5 0 0 ,<br />
A 1 0 -1 A*B .5 .5 0 0 -.5 -.5 / df=6;<br />
Note that no random effects are specified in the preceding contrast; thus, the inference space is<br />
broad. <strong>The</strong> resulting F test has two numerator degrees of freedom because $\mathbf{L}'$ has two rows. <strong>The</strong><br />
denominator degrees of freedom is, by default, the residual degrees of freedom (9), but the DF=<br />
option changes the denominator degrees of freedom to 6.<br />
<strong>The</strong> following CONTRAST statement reproduces the F test for A when Block and A*Block are considered<br />
fixed effects (the narrow inference space):<br />
contrast 'A narrow'<br />
A 1 -1 0<br />
A*B .5 .5 -.5 -.5 0 0 |<br />
A*Block .25 .25 .25 .25<br />
-.25 -.25 -.25 -.25<br />
0 0 0 0 ,<br />
A 1 0 -1<br />
A*B .5 .5 0 0 -.5 -.5 |<br />
A*Block .25 .25 .25 .25<br />
0 0 0 0<br />
-.25 -.25 -.25 -.25 ;<br />
<strong>The</strong> preceding contrast does not contain coefficients for B and Block, because they cancel out in<br />
estimated differences between levels of A. Coefficients for B and Block are necessary to estimate the<br />
mean of one of the levels of A in the narrow inference space (see Example 56.1).<br />
If the elements of L are not specified for an effect that contains a specified effect, then the elements<br />
of the specified effect are automatically “filled in” over the levels of the higher-order effect. This<br />
feature is designed to preserve estimability for cases where there are complex higher-order effects.<br />
<strong>The</strong> coefficients for the higher-order effect are determined by equitably distributing the coefficients<br />
of the lower-level effect, as in the construction of least squares means. In addition, if the intercept<br />
is specified, it is distributed over all classification effects that are not contained by any other<br />
specified effect. If an effect is not specified and does not contain any specified effects, then all of<br />
its coefficients in L are set to 0. You can override this behavior by specifying coefficients for the<br />
higher-order effect.<br />
If too many values are specified for an effect, the extra ones are ignored; if too few are specified,<br />
the remaining ones are set to 0. If no random effects are specified, the vertical bar can be omitted;<br />
otherwise, it must be present. If a SUBJECT effect is used in the RANDOM statement, then the<br />
coefficients specified for the effects in the RANDOM statement are equitably distributed across the<br />
levels of the SUBJECT effect. You can use the E option to see exactly which L matrix is used.<br />
<strong>The</strong> SUBJECT and GROUP options in the CONTRAST statement are useful for the case when a<br />
SUBJECT= or GROUP= variable appears in the RANDOM statement, and you want to contrast
different subjects or groups. By default, CONTRAST statement coefficients on random effects are<br />
distributed equally across subjects and groups.<br />
PROC <strong>MIXED</strong> handles missing level combinations of classification variables similarly to the way<br />
PROC GLM does. Both procedures delete fixed-effects parameters corresponding to missing levels<br />
in order to preserve estimability. However, PROC <strong>MIXED</strong> does not delete missing level combinations<br />
for random-effects parameters because linear combinations of the random-effects parameters<br />
are always estimable. <strong>The</strong>se conventions can affect the way you specify your CONTRAST coefficients.<br />
<strong>The</strong> CONTRAST statement computes the statistic<br />
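$$F = \frac{\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\gamma}} \end{bmatrix}' \mathbf{L} \left(\mathbf{L}'\hat{\mathbf{C}}\mathbf{L}\right)^{-1} \mathbf{L}' \begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\gamma}} \end{bmatrix}}{r}$$<br />
where $r = \mathrm{rank}(\mathbf{L}'\hat{\mathbf{C}}\mathbf{L})$, and approximates its distribution with an F distribution. In this expression,<br />
$\hat{\mathbf{C}}$ is an estimate of the generalized inverse of the coefficient matrix in the mixed model equations.<br />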
See the section “Inference and Test Statistics” on page 3972 for more information about this F<br />
statistic.<br />
<strong>The</strong> numerator degrees of freedom in the F approximation are $r = \mathrm{rank}(\mathbf{L}'\hat{\mathbf{C}}\mathbf{L})$, and the denominator<br />
degrees of freedom are taken from the “Tests of Fixed Effects” table and correspond to the final<br />
effect you list in the CONTRAST statement. You can change the denominator degrees of freedom<br />
by using the DF= option.<br />
You can specify the following options in the CONTRAST statement after a slash (/).<br />
CHISQ<br />
requests that chi-square tests be performed in addition to any F tests. A chi-square statistic<br />
equals its corresponding F statistic times the associated numerator degrees of freedom, and<br />
the same degrees of freedom are used to compute the p-value for the chi-square test. This<br />
p-value is always less than that for the F test, as it effectively corresponds to an F test with<br />
infinite denominator degrees of freedom.<br />
DF=number<br />
specifies the denominator degrees of freedom for the F test. <strong>The</strong> default is the denominator<br />
degrees of freedom taken from the “Tests of Fixed Effects” table and corresponds to the final<br />
effect you list in the CONTRAST statement.<br />
E<br />
requests that the L matrix coefficients for the contrast be displayed. For ODS purposes, the<br />
name of this “L Matrix Coefficients” table is “Coef.”<br />
GROUP coeffs<br />
GRP coeffs<br />
sets up random-effect contrasts between different groups when a GROUP= variable appears in<br />
the RANDOM statement. By default, CONTRAST statement coefficients on random effects<br />
are distributed equally across groups.<br />
SINGULAR=number<br />
tunes the estimability checking. If $\mathbf{v}$ is a vector, define ABS($\mathbf{v}$) to be the absolute value of the<br />
element of $\mathbf{v}$ with the largest absolute value. If ABS($\mathbf{K}' - \mathbf{K}'\mathbf{T}$) is greater than C*number for<br />
any row of $\mathbf{K}'$ in the contrast, then $\mathbf{K}$ is declared nonestimable. Here $\mathbf{T}$ is the Hermite form<br />
matrix $(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'\mathbf{X}$, and C is ABS($\mathbf{K}'$) except when it equals 0, and then C is 1. <strong>The</strong> value for<br />
number must be between 0 and 1; the default is 1E-4.<br />
SUBJECT coeffs<br />
SUB coeffs<br />
sets up random-effect contrasts between different subjects when a SUBJECT= variable appears<br />
in the RANDOM statement. By default, CONTRAST statement coefficients on random<br />
effects are distributed equally across subjects.<br />
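The relationship between the chi-square and F results described under the CHISQ option can be checked numerically. The following DATA step is a minimal sketch: the F statistic and the degrees of freedom are hypothetical values, and PROBF and PROBCHI are the Base SAS cumulative distribution functions.<br />

```sas
/* Hypothetical values: F statistic with 2 numerator and 6 denominator df */
data _null_;
   f   = 4.07;
   ndf = 2;
   ddf = 6;

   /* Chi-square statistic = F statistic times numerator df */
   chisq = ndf * f;

   /* Upper-tail p-values for the F test and the chi-square test */
   p_f     = 1 - probf(f, ndf, ddf);
   p_chisq = 1 - probchi(chisq, ndf);

   put f= chisq= p_f= p_chisq=;
run;
```

The log line written by the PUT statement shows both p-values side by side; increasing ddf moves the F-test p-value toward the chi-square p-value, which is the sense in which the chi-square test corresponds to infinite denominator degrees of freedom.<br />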
ESTIMATE Statement<br />
ESTIMATE 'label' < fixed-effect values . . . ><br />
< | random-effect values . . . >< / options > ;<br />
<strong>The</strong> ESTIMATE statement is exactly like a CONTRAST statement, except only one-row $\mathbf{L}$ matrices<br />
are permitted. <strong>The</strong> actual estimate, $\mathbf{L}'\hat{\mathbf{p}}$, is displayed along with its approximate standard error. An<br />
approximate t test that $\mathbf{L}'\hat{\mathbf{p}} = 0$ is also produced.<br />
PROC <strong>MIXED</strong> selects the degrees of freedom to match those displayed in the “Tests of Fixed<br />
Effects” table for the final effect you list in the ESTIMATE statement. You can modify the degrees<br />
of freedom by using the DF= option.<br />
If PROC <strong>MIXED</strong> finds the fixed-effects portion of the specified estimate to be nonestimable, then<br />
it displays “Non-est” for the estimate entries.<br />
<strong>The</strong> following examples of ESTIMATE statements compute the mean of the first level of A in the<br />
split-plot example (see Example 56.1) for various inference spaces:<br />
estimate 'A1 mean narrow' intercept 1<br />
A 1 B .5 .5 A*B .5 .5 |<br />
block .25 .25 .25 .25<br />
A*Block .25 .25 .25 .25<br />
0 0 0 0<br />
0 0 0 0;<br />
estimate 'A1 mean intermed' intercept 1<br />
A 1 B .5 .5 A*B .5 .5 |<br />
Block .25 .25 .25 .25;<br />
estimate 'A1 mean broad' intercept 1<br />
A 1 B .5 .5 A*B .5 .5;<br />
<strong>The</strong> construction of the L vector for an ESTIMATE statement follows the same rules as listed under<br />
the CONTRAST statement.<br />
You can specify the following options in the ESTIMATE statement after a slash (/).
ALPHA=number<br />
requests that a t-type confidence interval be constructed with confidence level 1 − number.<br />
<strong>The</strong> value of number must be between 0 and 1; the default is 0.05.<br />
CL<br />
requests that t-type confidence limits be constructed. <strong>The</strong> confidence level is 0.95 by default;<br />
this can be changed with the ALPHA= option.<br />
DF=number<br />
specifies the degrees of freedom for the t test and confidence limits. <strong>The</strong> default is the denominator<br />
degrees of freedom taken from the “Tests of Fixed Effects” table and corresponds<br />
to the final effect you list in the ESTIMATE statement.<br />
DIVISOR=number<br />
specifies a value by which to divide all coefficients so that fractional coefficients can be entered<br />
as integer numerators.<br />
E<br />
requests that the L matrix coefficients be displayed. For ODS purposes, the name of this “L<br />
Matrix Coefficients” table is “Coef.”<br />
GROUP coeffs<br />
GRP coeffs<br />
sets up random-effect contrasts between different groups when a GROUP= variable appears<br />
in the RANDOM statement. By default, ESTIMATE statement coefficients on random effects<br />
are distributed equally across groups.<br />
LOWER<br />
LOWERTAILED<br />
requests that the p-value for the t test be based only on values less than the t statistic. A<br />
two-tailed test is the default. A lower-tailed confidence limit is also produced if you specify<br />
the CL option.<br />
SINGULAR=number<br />
tunes the estimability checking as documented for the SINGULAR= option in the<br />
CONTRAST statement.<br />
SUBJECT coeffs<br />
SUB coeffs<br />
sets up random-effect contrasts between different subjects when a SUBJECT= variable appears<br />
in the RANDOM statement. By default, ESTIMATE statement coefficients on random<br />
effects are distributed equally across subjects. For example, the ESTIMATE statement in the<br />
following code from Example 56.5 constructs the difference between the random slopes of<br />
the first two batches.<br />
proc mixed data=rc;<br />
class batch;<br />
model y = month / s;<br />
random int month / type=un sub=batch s;<br />
estimate 'slope b1 - slope b2' | month 1 / subject 1 -1;<br />
run;
UPPER<br />
UPPERTAILED<br />
requests that the p-value for the t test be based only on values greater than the t statistic. A<br />
two-tailed test is the default. An upper-tailed confidence limit is also produced if you specify<br />
the CL option.<br />
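Several of the preceding ESTIMATE options can be combined in one statement. The following statements are a sketch that assumes a split-plot layout like the one referenced earlier (A with three levels, B with two levels, four blocks); the data set name WORK.SP and the response name y are hypothetical.<br />

```sas
/* Hypothetical split-plot data: A (3 levels), B (2 levels), Block (4 levels) */
proc mixed data=work.sp;
   class A B Block;
   model y = A B A*B;
   random Block A*Block;

   /* (A1 + A2)/2 - A3: integer coefficients 1 1 -2 divided by DIVISOR=2, */
   /* with 90% two-sided t-type confidence limits                         */
   estimate 'avg(A1,A2) - A3' A 1 1 -2 / divisor=2 cl alpha=0.1;

   /* A1 - A2 with an upper-tailed p-value and an upper-tailed limit */
   estimate 'A1 - A2 upper' A 1 -1 / upper cl;

   /* Display the L matrix coefficients that are actually used */
   estimate 'A1 - A3' A 1 0 -1 / e;
run;
```

Because no random effects appear after a vertical bar, all three estimates are in the broad inference space, and the A*B coefficients are filled in automatically from the A coefficients.<br />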
ID Statement<br />
ID variables ;<br />
<strong>The</strong> ID statement specifies which variables from the input data set are to be included in the OUTP=<br />
and OUTPM= data sets from the MODEL statement. If you do not specify an ID statement, then<br />
all variables are included in these data sets. Otherwise, only the variables you list in the ID statement<br />
are included. Specifying an ID statement with no variables prevents any variables from being<br />
included in these data sets.<br />
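As a sketch of how the ID statement interacts with the OUTP= and OUTPM= options, consider the following statements. The data set WORK.RC and the output data set names are hypothetical; the model mirrors the random coefficients example shown for the SUBJECT option.<br />

```sas
/* Hypothetical input data set WORK.RC with variables batch, month, and y */
proc mixed data=work.rc;
   class batch;
   /* OUTP= holds conditional predictions, OUTPM= marginal predictions */
   model y = month / outp=work.pred outpm=work.predm;
   random int month / type=un sub=batch;
   /* Carry only batch and month from the input data into the output data sets */
   id batch month;
run;
```

Omitting the ID statement here would instead copy every input variable into WORK.PRED and WORK.PREDM.<br />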
LSMEANS Statement<br />
LSMEANS fixed-effects < / options > ;<br />
<strong>The</strong> LSMEANS statement computes least squares means (LS-means) of fixed effects. As in the<br />
GLM procedure, LS-means are predicted population margins—that is, they estimate the marginal<br />
means over a balanced population. In a sense, LS-means are to unbalanced designs as class and<br />
subclass arithmetic means are to balanced designs. <strong>The</strong> L matrix constructed to compute them is<br />
the same as the L matrix formed in PROC GLM; however, the standard errors are adjusted for the<br />
covariance parameters in the model.<br />
Each LS-mean is computed as $\mathbf{L}\hat{\boldsymbol{\beta}}$, where $\mathbf{L}$ is the coefficient matrix associated with the least squares<br />
mean and $\hat{\boldsymbol{\beta}}$ is the estimate of the fixed-effects parameter vector (see the section “Estimating Fixed<br />
and Random Effects in the Mixed Model” on page 3970). <strong>The</strong> approximate standard error for the<br />
LS-mean is computed as the square root of $\mathbf{L}(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-}\mathbf{L}'$.<br />
LS-means can be computed for any effect in the MODEL statement that involves CLASS variables.<br />
You can specify multiple effects in one LSMEANS statement or in multiple LSMEANS statements,<br />
and all LSMEANS statements must appear after the MODEL statement. As in the ESTIMATE<br />
statement, the L matrix is tested for estimability, and if this test fails, PROC <strong>MIXED</strong> displays<br />
“Non-est” for the LS-means entries.<br />
Assuming the LS-mean is estimable, PROC <strong>MIXED</strong> constructs an approximate t test to test the null<br />
hypothesis that the associated population quantity equals zero. By default, the denominator degrees<br />
of freedom for this test are the same as those displayed for the effect in the “Tests of Fixed Effects”<br />
table (see the section “Default Output” on page 3989).<br />
Table 56.5 summarizes important options in the LSMEANS statement. All LSMEANS options are<br />
subsequently discussed in alphabetical order.
Table 56.5 Summary of Important LSMEANS Statement Options<br />
Option Description<br />
Construction and Computation of LS-Means<br />
AT modifies covariate value in computing LS-means<br />
BYLEVEL computes separate margins<br />
DIFF requests differences of LS-means<br />
OM specifies weighting scheme for LS-mean computation<br />
SINGULAR= tunes estimability checking<br />
SLICE= partitions F tests (simple effects)<br />
Degrees of Freedom and P-values<br />
ADJDFE= determines whether to compute row-wise denominator degrees of freedom with DDFM=SATTERTHWAITE or DDFM=KENWARDROGER<br />
ADJUST= determines the method for multiple comparison adjustment of LS-mean differences<br />
ALPHA=α determines the confidence level (1 − α)<br />
DF= assigns specific value to degrees of freedom for tests and confidence limits<br />
Statistical Output<br />
CL constructs confidence limits for means and/or mean differences<br />
CORR displays correlation matrix of LS-means<br />
COV displays covariance matrix of LS-means<br />
E prints the L matrix<br />
You can specify the following options in the LSMEANS statement after a slash (/).<br />
ADJDFE=SOURCE<br />
ADJDFE=ROW<br />
specifies how denominator degrees of freedom are determined when p-values and confidence<br />
limits are adjusted for multiple comparisons with the ADJUST= option. When you do not<br />
specify the ADJDFE= option, or when you specify ADJDFE=SOURCE, the denominator<br />
degrees of freedom for multiplicity-adjusted results are the denominator degrees of freedom<br />
for the LS-mean effect in the “Type 3 Tests of Fixed Effects” table. When you specify<br />
ADJDFE=ROW, the denominator degrees of freedom for multiplicity-adjusted results correspond<br />
to the degrees of freedom displayed in the DF column of the “Differences of Least Squares<br />
Means” table.<br />
<strong>The</strong> ADJDFE=ROW setting is particularly useful if you want multiplicity adjustments to<br />
take into account that denominator degrees of freedom are not constant across LS-mean<br />
differences. This can be the case, for example, when the DDFM=SATTERTHWAITE or<br />
DDFM=KENWARDROGER degrees-of-freedom method is in effect.<br />
In one-way models with heterogeneous variance, combining certain ADJUST= options with<br />
the ADJDFE=ROW option corresponds to particular methods of performing multiplicity adjustments<br />
in the presence of heteroscedasticity. For example, the following statements fit a
heteroscedastic one-way model and perform Dunnett’s T3 method (Dunnett 1980), which is<br />
based on the studentized maximum modulus (ADJUST=SMM):<br />
proc mixed;<br />
class A;<br />
model y = A / ddfm=satterth;<br />
repeated / group=A;<br />
lsmeans A / adjust=smm adjdfe=row;<br />
run;<br />
If you combine the ADJDFE=ROW option with ADJUST=SIDAK, the multiplicity adjustment<br />
corresponds to the T2 method of Tamhane (1979), while ADJUST=TUKEY<br />
corresponds to the method of Games-Howell (Games and Howell 1976). Note that<br />
ADJUST=TUKEY gives the exact results for the case of fractional degrees of freedom in<br />
the one-way model, but it does not take into account that the degrees of freedom are subject<br />
to variability. A more conservative method, such as ADJUST=SMM, might protect the<br />
overall error rate better.<br />
Unless the ADJUST= option of the LSMEANS statement is specified, the ADJDFE= option<br />
has no effect.<br />
ADJUST=BON<br />
ADJUST=DUNNETT<br />
ADJUST=SCHEFFE<br />
ADJUST=SIDAK<br />
ADJUST=SIMULATE< (sim-options) ><br />
ADJUST=SMM | GT2<br />
ADJUST=TUKEY<br />
requests a multiple comparison adjustment for the p-values and confidence limits for the<br />
differences of LS-means. By default, PROC <strong>MIXED</strong> adjusts all pairwise differences unless<br />
you specify ADJUST=DUNNETT, in which case PROC <strong>MIXED</strong> analyzes all differences<br />
with a control level. <strong>The</strong> ADJUST= option implies the DIFF option.<br />
<strong>The</strong> BON (Bonferroni) and SIDAK adjustments involve correction factors described in Chapter<br />
39, “<strong>The</strong> GLM <strong>Procedure</strong>,” and Chapter 58, “<strong>The</strong> MULTTEST <strong>Procedure</strong>;” also see Westfall<br />
and Young (1993) and Westfall et al. (1999). When you specify ADJUST=TUKEY<br />
and your data are unbalanced, PROC <strong>MIXED</strong> uses the approximation described in Kramer<br />
(1956). Similarly, when you specify ADJUST=DUNNETT and the LS-means are correlated,<br />
PROC <strong>MIXED</strong> uses the factor-analytic covariance approximation described in Hsu (1992).<br />
<strong>The</strong> preceding references also describe the SCHEFFE and SMM adjustments.<br />
<strong>The</strong> SIMULATE adjustment computes adjusted p-values and confidence limits from the simulated<br />
distribution of the maximum or maximum absolute value of a multivariate t random<br />
vector. All covariance parameters except the residual variance are fixed at their estimated<br />
values throughout the simulation, potentially resulting in some underdispersion. <strong>The</strong> simulation<br />
estimates $q$, the true $(1 - \alpha)$th quantile, where $1 - \alpha$ is the confidence coefficient. <strong>The</strong><br />
default $\alpha$ is 0.05, and you can change this value with the ALPHA= option in the LSMEANS<br />
statement.
<strong>The</strong> number of samples is set so that the tail area for the simulated $q$ is within $\gamma$ of $1 - \alpha$ with<br />
$100(1 - \epsilon)\%$ confidence. In equation form,<br />
$$P\left(\left|F(\hat{q}) - (1 - \alpha)\right| \le \gamma\right) = 1 - \epsilon$$<br />
where $\hat{q}$ is the simulated $q$ and $F$ is the true distribution function of the maximum; see<br />
Edwards and Berry (1987) for details. By default, $\gamma$ = 0.005 and $\epsilon$ = 0.01, placing the tail<br />
area of $\hat{q}$ within 0.005 of 0.95 with 99% confidence. <strong>The</strong> ACC= and EPS= sim-options reset<br />
$\gamma$ and $\epsilon$, respectively; the NSAMP= sim-option sets the sample size directly; and the SEED=<br />
sim-option specifies an integer used to start the pseudo-random number generator for the<br />
simulation. If you do not specify a seed, or if you specify a value less than or equal to zero,<br />
the seed is generated from reading the time of day from the computer clock. For additional<br />
descriptions of these and other simulation options, see the section “LSMEANS Statement”<br />
on page 2456 in Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>.”<br />
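The sim-options named above can be supplied in parentheses after ADJUST=SIMULATE. The following statements are a sketch only: the data set WORK.SP, the response y, and the particular ACC=, EPS=, and SEED= values are hypothetical choices for illustration.<br />

```sas
/* Hypothetical split-plot data: A (3 levels), B (2 levels), Block (4 levels) */
proc mixed data=work.sp;
   class A B Block;
   model y = A B A*B;
   random Block A*Block;

   /* Simulation-based adjustment of all pairwise differences of A.        */
   /* ACC= tightens the accuracy radius, EPS= keeps 99% confidence, and    */
   /* SEED= fixes the random number stream so that results are repeatable. */
   lsmeans A / adjust=simulate(acc=0.002 eps=0.01 seed=20090201) cl;
run;
```

Specifying a positive SEED= value is the main practical point: without it, the seed comes from the computer clock and the adjusted p-values can differ slightly from run to run.<br />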
ALPHA=number<br />
requests that a t-type confidence interval be constructed for each of the LS-means with confidence<br />
level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the default is 0.05.<br />
AT variable = value<br />
AT (variable-list) = (value-list)<br />
AT MEANS<br />
enables you to modify the values of the covariates used in computing LS-means. By default,<br />
all covariate effects are set equal to their mean values for computation of standard LS-means.<br />
<strong>The</strong> AT option enables you to assign arbitrary values to the covariates. Additional columns in<br />
the output table indicate the values of the covariates.<br />
If there is an effect containing two or more covariates, the AT option sets the effect equal<br />
to the product of the individual means rather than the mean of the product (as with standard<br />
LS-means calculations). <strong>The</strong> AT MEANS option sets covariates equal to their mean values<br />
(as with standard LS-means) and incorporates this adjustment to crossproducts of covariates.<br />
As an example, consider the following invocation of PROC <strong>MIXED</strong>:<br />
proc mixed;<br />
class A;<br />
model Y = A X1 X2 X1*X2;<br />
lsmeans A;<br />
lsmeans A / at means;<br />
lsmeans A / at X1=1.2;<br />
lsmeans A / at (X1 X2)=(1.2 0.3);<br />
run;<br />
For the first two LSMEANS statements, the LS-means coefficient for X1 is $\bar{x}_1$ (the mean<br />
of X1) and for X2 is $\bar{x}_2$ (the mean of X2). However, for the first LSMEANS statement, the<br />
coefficient for X1*X2 is $\overline{x_1 x_2}$, but for the second LSMEANS statement, the coefficient is<br />
$\bar{x}_1 \cdot \bar{x}_2$. <strong>The</strong> third LSMEANS statement sets the coefficient for X1 equal to 1.2 and leaves it<br />
at $\bar{x}_2$ for X2, and the final LSMEANS statement sets these values to 1.2 and 0.3, respectively.<br />
If a WEIGHT variable is present, it is used in processing AT variables. Also, observations<br />
with missing dependent variables are included in computing the covariate means, unless these
observations form a missing cell and the FULLX option in the MODEL statement is not in<br />
effect. You can use the E option in conjunction with the AT option to check that the modified<br />
LS-means coefficients are the ones you want.<br />
<strong>The</strong> AT option is disabled if you specify the BYLEVEL option.<br />
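A minimal sketch of the check described above, assuming the same hypothetical variables A, Y, X1, and X2 as in the preceding example: adding the E option prints the L matrix coefficients, so the covariate values that the AT option substitutes can be inspected directly.<br />

```sas
proc mixed;
   class A;
   model Y = A X1 X2 X1*X2;
   /* E displays the LS-means coefficients (ODS table "Coef"); with this
      AT specification the entries for X1 and X2 should be 1.2 and 0.3
      instead of the covariate means */
   lsmeans A / at (X1 X2)=(1.2 0.3) e;
run;
```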
BYLEVEL<br />
requests PROC <strong>MIXED</strong> to process the OM data set by each level of the LS-mean effect<br />
(LSMEANS effect) in question. For more details, see the OM option later in this section.<br />
CL<br />
requests that t-type confidence limits be constructed for each of the LS-means. <strong>The</strong> confidence<br />
level is 0.95 by default; this can be changed with the ALPHA= option.<br />
CORR<br />
displays the estimated correlation matrix of the least squares means as part of the “Least<br />
Squares Means” table.<br />
COV<br />
displays the estimated covariance matrix of the least squares means as part of the “Least<br />
Squares Means” table.<br />
DF=number<br />
specifies the degrees of freedom for the t test and confidence limits. <strong>The</strong> default is the denominator<br />
degrees of freedom taken from the “Tests of Fixed Effects” table corresponding to<br />
the LS-means effect unless the DDFM=SATTERTHWAITE or DDFM=KENWARDROGER<br />
option is in effect in the MODEL statement. For these DDFM= methods, degrees of freedom<br />
are determined separately for each test; see the DDFM= option for more information.<br />
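As an illustration, assuming a hypothetical classification effect A and response Y, the following statements request 90% confidence limits for the LS-means and override the default denominator degrees of freedom with an arbitrary value of 12:<br />

```sas
proc mixed;
   class A;
   model Y = A;
   /* CL requests t-type limits, ALPHA=0.1 sets the confidence level
      to 0.90, and DF=12 replaces the default degrees of freedom */
   lsmeans A / cl alpha=0.1 df=12;
run;
```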
DIFF< =difftype ><br />
PDIFF< =difftype ><br />
requests that differences of the LS-means be displayed. <strong>The</strong> optional difftype specifies which<br />
differences to produce, with possible values being ALL, CONTROL, CONTROLL, and<br />
CONTROLU. <strong>The</strong> difftype ALL requests all pairwise differences, and it is the default. <strong>The</strong><br />
difftype CONTROL requests the differences with a control, which, by default, is the first level<br />
of each of the specified LSMEANS effects.<br />
To specify which levels of the effects are the controls, list the quoted formatted values in<br />
parentheses after the keyword CONTROL. For example, if the effects A, B, and C are classification<br />
variables, each having two levels, 1 and 2, the following LSMEANS statement<br />
specifies the (1,2) level of A*B and the (2,1) level of B*C as controls:<br />
lsmeans A*B B*C / diff=control('1' '2' '2' '1');<br />
For multiple effects, the results depend upon the order of the list, and so you should check the<br />
output to make sure that the controls are correct.<br />
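For a single effect the control specification is simpler. The following sketch, which assumes a hypothetical classification variable A with formatted levels 1, 2, and 3, compares every other level of A with level 2:<br />

```sas
proc mixed;
   class A;
   model Y = A;
   /* Level '2' of A is the control; CL adds confidence limits
      for each difference from the control */
   lsmeans A / diff=control('2') cl;
run;
```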
Two-tailed tests and confidence limits are associated with the CONTROL difftype. For one-tailed<br />
results, use either the CONTROLL or CONTROLU difftype. <strong>The</strong> CONTROLL difftype<br />
tests whether the noncontrol levels are significantly smaller than the control; the upper confidence<br />
limits for the control minus the noncontrol levels are considered to be infinity and<br />
are displayed as missing. Conversely, the CONTROLU difftype tests whether the noncontrol<br />
levels are significantly larger than the control; the upper confidence limits for the noncontrol<br />
levels minus the control are considered to be infinity and are displayed as missing.<br />
If you want to perform multiple comparison adjustments on the differences of LS-means, you<br />
must specify the ADJUST= option.<br />
<strong>The</strong> differences of the LS-means are displayed in a table titled “Differences of Least Squares<br />
Means.” For ODS purposes, the table name is “Diffs.”<br />
E<br />
requests that the L matrix coefficients for all LSMEANS effects be displayed. For ODS<br />
purposes, the name of this “L Matrix Coefficients” table is “Coef.”<br />
OM< =OM-data-set ><br />
OBSMARGINS< =OM-data-set ><br />
specifies a potentially different weighting scheme for the computation of LS-means coefficients.<br />
<strong>The</strong> standard LS-means have equal coefficients across classification effects; however,<br />
the OM option changes these coefficients to be proportional to those found in OM-data-set.<br />
This adjustment is reasonable when you want your inferences to apply to a population that is<br />
not necessarily balanced but has the margins observed in OM-data-set.<br />
By default, OM-data-set is the same as the analysis data set. You can optionally specify another<br />
data set that describes the population for which you want to make inferences. This data<br />
set must contain all model variables except for the dependent variable (which is ignored if it<br />
is present). In addition, the levels of all CLASS variables must be the same as those occurring<br />
in the analysis data set. Specifying an OM-data-set enables you to construct arbitrarily<br />
weighted LS-means.<br />
In computing the observed margins, PROC <strong>MIXED</strong> uses all observations for which there<br />
are no missing or invalid independent variables, including those for which there are missing<br />
dependent variables. Also, if OM-data-set has a WEIGHT variable, PROC <strong>MIXED</strong> uses<br />
weighted margins to construct the LS-means coefficients. If OM-data-set is balanced, the<br />
LS-means are unchanged by the OM option.<br />
<strong>The</strong> BYLEVEL option modifies the observed-margins LS-means. Instead of computing the<br />
margins across all of the OM-data-set, PROC <strong>MIXED</strong> computes separate margins for each<br />
level of the LSMEANS effect in question. In this case the resulting LS-means are actually<br />
equal to raw means for fixed-effects models and certain balanced random-effects models, but<br />
their estimated standard errors account for the covariance structure that you have specified. If<br />
the AT option is specified, the BYLEVEL option disables it.<br />
You can use the E option in conjunction with either the OM or BYLEVEL option to check that<br />
the modified LS-means coefficients are the ones you want. It is possible that the modified LS-means<br />
are not estimable when the standard ones are, or vice versa. Nonestimable LS-means<br />
are noted as “Non-est” in the output.<br />
PDIFF<br />
is the same as the DIFF option.<br />
SINGULAR=number<br />
tunes the estimability checking as documented for the SINGULAR= option in the<br />
CONTRAST statement.<br />
SLICE= fixed-effect<br />
SLICE= (fixed-effects)<br />
specifies effects by which to partition interaction LSMEANS effects. This can produce what<br />
are known as tests of simple effects (Winer 1971). For example, suppose that A*B is significant,<br />
and you want to test the effect of A for each level of B. <strong>The</strong> appropriate LSMEANS<br />
statement is as follows:<br />
lsmeans A*B / slice=B;<br />
This code tests for the simple main effects of A for B, which are calculated by extracting the<br />
appropriate rows from the coefficient matrix for the A*B LS-means and by using them to form<br />
an F test. See the section “Inference and Test Statistics” for more information about this F<br />
test.<br />
<strong>The</strong> SLICE option produces a table titled “Tests of Effect Slices.” For ODS purposes, the<br />
table name is “Slices.”<br />
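To slice by more than one effect, list the effects in parentheses. The following sketch assumes hypothetical classification variables A and B; the data set name SliceTests is also an assumption. It tests B within each level of A and A within each level of B, and it saves the results through the ODS table name given above:<br />

```sas
proc mixed;
   class A B;
   model Y = A B A*B;
   /* SLICE=(A B): tests of B at each level of A
      and tests of A at each level of B */
   lsmeans A*B / slice=(A B);
   /* Save the "Tests of Effect Slices" table to a data set */
   ods output Slices=SliceTests;
run;
```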
MODEL Statement<br />
MODEL dependent = < fixed-effects >< / options > ;<br />
<strong>The</strong> MODEL statement names a single dependent variable and the fixed effects, which determine the<br />
X matrix of the mixed model (see the section “Parameterization of Mixed Models” on page 3975<br />
for details). <strong>The</strong> specification of effects is the same as in the GLM procedure; however, unlike<br />
PROC GLM, you do not specify random effects in the MODEL statement. <strong>The</strong> MODEL statement<br />
is required.<br />
An intercept is included in the fixed-effects model by default. If no fixed effects are specified, only<br />
this intercept term is fit. <strong>The</strong> intercept can be removed by using the NOINT option.<br />
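The two cases can be sketched as follows, assuming a hypothetical response Y and classification variable A. The first step fits an intercept-only fixed-effects model; the second removes the intercept so that the solution reports one estimate per level of A:<br />

```sas
/* No fixed effects listed: only the intercept is fit */
proc mixed;
   model Y = / solution;
run;

/* NOINT removes the intercept; SOLUTION displays the estimates */
proc mixed;
   class A;
   model Y = A / noint solution;
run;
```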
Table 56.6 summarizes options in the MODEL statement. <strong>The</strong>se are subsequently discussed in<br />
detail in alphabetical order.<br />
Table 56.6 Summary of Important MODEL Statement Options<br />
Option Description<br />
Model Building<br />
NOINT excludes fixed-effect intercept from model<br />
Statistical Computations<br />
ALPHA=α determines the confidence level (1 − α) for fixed effects<br />
ALPHAP=α determines the confidence level (1 − α) for predicted values<br />
CHISQ requests chi-square tests<br />
DDF= specifies denominator degrees of freedom (list)<br />
DDFM= specifies the method for computing denominator degrees of freedom<br />
HTYPE= selects the type of hypothesis test<br />
INFLUENCE requests influence and case-deletion diagnostics<br />
NOTEST suppresses hypothesis tests for the fixed effects<br />
OUTP= specifies output data set for predicted values and related quantities<br />
OUTPM= specifies output data set for predicted means and related quantities<br />
RESIDUAL adds Pearson-type and studentized residuals to output data sets<br />
VCIRY adds scaled marginal residual to output data sets<br />
Statistical Output<br />
CL displays confidence limits for fixed-effects parameter estimates<br />
CORRB displays correlation matrix of fixed-effects parameter estimates<br />
COVB displays covariance matrix of fixed-effects parameter estimates<br />
COVBI displays inverse covariance matrix of fixed-effects parameter estimates<br />
E, E1, E2, E3 displays L matrix coefficients<br />
INTERCEPT adds a row for the intercept to test tables<br />
SOLUTION displays fixed-effects parameter estimates (and scale parameter in GLM models)<br />
Singularity Tolerances<br />
SINGCHOL= tunes sensitivity in computing Cholesky roots<br />
SINGRES= tunes singularity criterion for residual variance<br />
SINGULAR= tunes the sensitivity in sweeping<br />
ZETA= tunes the sensitivity in forming Type 3 functions<br />
You can specify the following options in the MODEL statement after a slash (/).<br />
ALPHA=number<br />
requests that a t-type confidence interval be constructed for each of the fixed-effects parameters<br />
with confidence level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the<br />
default is 0.05.<br />
ALPHAP=number<br />
requests that a t-type confidence interval be constructed for the predicted values with confidence<br />
level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the default is 0.05.<br />
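For example, the following sketch (with hypothetical variables A, X, and Y and a hypothetical output data set name Pred) requests 90% confidence limits both for the fixed-effects estimates and for the predicted values written by the OUTP= option:<br />

```sas
proc mixed;
   class A;
   /* ALPHA= applies to the CL limits in the solution table;
      ALPHAP= applies to the limits stored in the OUTP= data set */
   model Y = A X / solution cl alpha=0.1 outp=Pred alphap=0.1;
run;
```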
CHISQ<br />
requests that chi-square tests be performed for all specified effects in addition to the F tests.<br />
Type 3 tests are the default; you can produce the Type 1 and Type 2 tests by using the HTYPE=<br />
option.<br />
CL<br />
requests that t-type confidence limits be constructed for each of the fixed-effects parameter<br />
estimates. <strong>The</strong> confidence level is 0.95 by default; this can be changed with the ALPHA=<br />
option.<br />
CONTAIN<br />
has the same effect as the DDFM=CONTAIN option.<br />
CORRB<br />
produces the approximate correlation matrix of the fixed-effects parameter estimates. For<br />
ODS purposes, the name of this table is “CorrB.”<br />
COVB<br />
produces the approximate variance-covariance matrix of the fixed-effects parameter estimates<br />
$\hat{\beta}$. By default, this matrix equals $(X'\hat{V}^{-1}X)^{-}$ and results from sweeping $(X\ y)'\hat{V}^{-1}(X\ y)$ on<br />
all but its last pivot and removing the y border. <strong>The</strong> EMPIRICAL option in the PROC <strong>MIXED</strong><br />
statement changes this matrix into “empirical sandwich” form. For ODS purposes, the name<br />
of this table is “CovB.” If the degrees-of-freedom method of Kenward and Roger (1997) is in<br />
effect (DDFM=KENWARDROGER), the COVB matrix changes because the method entails<br />
an adjustment of the variance-covariance matrix of the fixed effects by the method proposed<br />
by Prasad and Rao (1990) and Harville and Jeske (1992); see also Kackar and Harville (1984).<br />
COVBI<br />
produces the inverse of the approximate variance-covariance matrix of the fixed-effects parameter<br />
estimates. For ODS purposes, the name of this table is “InvCovB.”<br />
DDF=value-list<br />
enables you to specify your own denominator degrees of freedom for the fixed effects. <strong>The</strong><br />
value-list specification is a list of numbers or missing values (.) separated by commas. <strong>The</strong><br />
degrees of freedom should be listed in the order in which the effects appear in the “Tests of<br />
Fixed Effects” table. If you want to retain the default degrees of freedom for a particular<br />
effect, use a missing value for its location in the list. For example, the following statement<br />
assigns 3 denominator degrees of freedom to A and 4.7 to A*B, while those for B remain the<br />
same:<br />
model Y = A B A*B / ddf=3,.,4.7;<br />
If you specify DDFM=SATTERTHWAITE or DDFM=KENWARDROGER, the DDF= option<br />
has no effect.<br />
DDFM=CONTAIN<br />
DDFM=BETWITHIN<br />
DDFM=RESIDUAL<br />
DDFM=SATTERTHWAITE<br />
DDFM=KENWARDROGER< (FIRSTORDER) ><br />
specifies the method for computing the denominator degrees of freedom for the tests of fixed<br />
effects resulting from the MODEL, CONTRAST, ESTIMATE, and LSMEANS statements.<br />
Table 56.7 lists syntax aliases for the degrees-of-freedom methods.
Table 56.7 Aliases for DDFM= Option<br />
DDFM= Option Alias<br />
BETWITHIN BW<br />
CONTAIN CON<br />
KENWARDROGER KENROG, KR<br />
RESIDUAL RES<br />
SATTERTHWAITE SATTERTH, SAT<br />
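Any alias in Table 56.7 can be used in place of the full keyword. As a sketch, assuming hypothetical classification effects A and B and a hypothetical random effect Block, the following MODEL statement uses the KR alias; the equivalent long form is shown in a comment:<br />

```sas
proc mixed;
   class A B Block;
   /* Equivalent long form: model Y = A B A*B / ddfm=kenwardroger; */
   model Y = A B A*B / ddfm=kr;
   random Block;
run;
```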
<strong>The</strong> DDFM=CONTAIN option invokes the containment method to compute denominator degrees<br />
of freedom, and it is the default when you specify a RANDOM statement. <strong>The</strong> containment<br />
method is carried out as follows: Denote the fixed effect in question A, and search<br />
the RANDOM effect list for the effects that syntactically contain A. For example, the random<br />
effect B(A) contains A, but the random effect C does not, even if it has the same levels as B(A).<br />
Among the random effects that contain A, compute their rank contribution to the (X Z) matrix.<br />
<strong>The</strong> DDF assigned to A is the smallest of these rank contributions. If no effects are found,<br />
the DDF for A is set equal to the residual degrees of freedom, $N - \mathrm{rank}(X\ Z)$. This choice of<br />
DDF matches the tests performed for balanced split-plot designs and should be adequate for<br />
moderately unbalanced designs.<br />
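As a sketch of the containment search, consider a split-plot layout with hypothetical variables Block, A (whole-plot factor), and B (subplot factor). Because a RANDOM statement is present, DDFM=CONTAIN is the default and does not need to be specified:<br />

```sas
proc mixed;
   class A B Block;
   model Y = A B A*B;
   /* A*Block syntactically contains A, so its rank contribution
      supplies the denominator degrees of freedom for A.
      No random effect contains B or A*B, so those effects
      receive the residual degrees of freedom. */
   random Block A*Block;
run;
```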
CAUTION: If you have a Z matrix with a large number of columns, the overall memory<br />
requirements and the computing time after convergence can be substantial for the containment<br />
method. If it is too large, you might want to use the DDFM=BETWITHIN option.<br />
<strong>The</strong> DDFM=BETWITHIN option is the default for REPEATED statement specifications<br />
(with no RANDOM statements). It is computed by dividing the residual degrees of freedom<br />
into between-subject and within-subject portions. PROC <strong>MIXED</strong> then checks whether<br />
a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom<br />
to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect (see<br />
Schluchter and Elashoff 1990). If there are multiple within-subject effects containing classification<br />
variables, the within-subject degrees of freedom are partitioned into components<br />
corresponding to the subject-by-effect interactions.<br />
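A sketch with hypothetical variables: Person identifies subjects, Gender is constant within a person, and Time varies within a person. With a REPEATED statement and no RANDOM statement, DDFM=BETWITHIN is already the default; it is written out here only for clarity:<br />

```sas
proc mixed;
   class Person Gender Time;
   /* Gender does not change within Person, so it is assigned
      between-subject degrees of freedom. Time and Gender*Time
      change within Person, so they are assigned within-subject
      degrees of freedom. */
   model Y = Gender Time Gender*Time / ddfm=bw;
   repeated Time / type=cs subject=Person;
run;
```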
One exception to the preceding method is the case where you have specified no RANDOM<br />
statements and a REPEATED statement with the TYPE=UN option. In this case, all effects<br />
are assigned the between-subject degrees of freedom to provide for better small-sample approximations<br />
to the relevant sampling distributions. DDFM=KENWARDROGER might be a<br />
better option to try for this case.<br />
<strong>The</strong> DDFM=RESIDUAL option performs all tests by using the residual degrees of freedom,<br />
$n - \mathrm{rank}(X)$, where n is the number of observations.<br />
<strong>The</strong> DDFM=SATTERTHWAITE option performs a general Satterthwaite approximation for<br />
the denominator degrees of freedom, computed as follows. Suppose $\theta$ is the vector of unknown<br />
parameters in V, and suppose $C = (X'V^{-1}X)^{-}$, where $^{-}$ denotes a generalized inverse.<br />
Let $\hat{C}$ and $\hat{\theta}$ be the corresponding estimates.<br />
Consider the one-dimensional case, and consider $\ell$ to be a vector defining an estimable linear<br />
combination of $\beta$. <strong>The</strong> Satterthwaite degrees of freedom for the t statistic<br />
$$t = \frac{\ell\hat{\beta}}{\sqrt{\ell\hat{C}\ell'}}$$<br />
is computed as<br />
$$\nu = \frac{2(\ell\hat{C}\ell')^2}{g'Ag}$$<br />
where g is the gradient of $\ell C\ell'$ with respect to $\theta$, evaluated at $\hat{\theta}$, and A is the asymptotic<br />
variance-covariance matrix of $\hat{\theta}$ obtained from the second derivative matrix of the likelihood<br />
equations.<br />
For the multidimensional case, let L be an estimable contrast matrix and denote the rank of<br />
$L\hat{C}L'$ as q > 1. <strong>The</strong> Satterthwaite denominator degrees of freedom for the F statistic<br />
$$F = \frac{\hat{\beta}'L'(L\hat{C}L')^{-1}L\hat{\beta}}{q}$$<br />
are computed by first performing the spectral decomposition $L\hat{C}L' = P'DP$, where P is an orthogonal<br />
matrix of eigenvectors and D is a diagonal matrix of eigenvalues, both of dimension<br />
$q \times q$. Define $\ell_m$ to be the mth row of PL, and let<br />
$$\nu_m = \frac{2(D_m)^2}{g_m'Ag_m}$$<br />
where $D_m$ is the mth diagonal element of D and $g_m$ is the gradient of $\ell_m C\ell_m'$ with respect to<br />
$\theta$, evaluated at $\hat{\theta}$. <strong>The</strong>n let<br />
$$E = \sum_{m=1}^{q} \frac{\nu_m}{\nu_m - 2}\, I(\nu_m > 2)$$<br />
where the indicator function eliminates terms for which $\nu_m \le 2$. <strong>The</strong> degrees of freedom for<br />
F are then computed as<br />
$$\nu = \frac{2E}{E - q}$$<br />
provided E > q; otherwise $\nu$ is set to zero.<br />
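As a purely numerical illustration of the last two formulas for the F statistic (the values are invented for the example and do not come from a fitted model), suppose q = 2 and the two component degrees of freedom are $\nu_1 = 10$ and $\nu_2 = 6$. Both exceed 2, so both terms enter E:<br />

```latex
\begin{align*}
E   &= \frac{10}{10-2} + \frac{6}{6-2} = 1.25 + 1.5 = 2.75, \\
\nu &= \frac{2E}{E-q} = \frac{2(2.75)}{2.75-2} = \frac{5.5}{0.75} \approx 7.33 .
\end{align*}
```

If the second component were 2 instead of 6, the indicator would drop its term, leaving E = 1.25. Because that value does not exceed q = 2, the degrees of freedom would be set to zero.<br />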
This method is a generalization of the techniques described in Giesbrecht and Burns (1985),<br />
McLean and Sanders (1988), and Fai and Cornelius (1996). <strong>The</strong> method can also include estimated<br />
random effects. In this case, append $\hat{\gamma}$ to $\hat{\beta}$ and change $\hat{C}$ to be the inverse of the coefficient<br />
matrix in the mixed model equations. <strong>The</strong> calculations require extra memory to hold c<br />
matrices that are the size of the mixed model equations, where c is the number of covariance<br />
parameters. In the notation of Table 56.25, this is approximately $8q(p + g)(p + g)/2$ bytes.<br />
Extra computing time is also required to process these matrices. <strong>The</strong> Satterthwaite method<br />
implemented here is intended to produce an accurate F approximation; however, the results<br />
can differ from those produced by PROC GLM. Also, the small sample properties of this
approximation have not been extensively investigated for the various models available with<br />
PROC <strong>MIXED</strong>.<br />
<strong>The</strong> DDFM=KENWARDROGER option performs the degrees of freedom calculations detailed<br />
by Kenward and Roger (1997). This approximation involves inflating the estimated<br />
variance-covariance matrix of the fixed and random effects by the method proposed<br />
by Prasad and Rao (1990) and Harville and Jeske (1992); see also Kackar and Harville<br />
(1984). Satterthwaite-type degrees of freedom are then computed based on this adjustment.<br />
By default, the observed information matrix of the covariance parameter estimates<br />
is used in the calculations. For covariance structures that have nonzero second derivatives<br />
with respect to the covariance parameters, the Kenward-Roger covariance matrix adjustment<br />
includes a second-order term. This term can result in standard error shrinkage.<br />
Also, the resulting adjusted covariance matrix can then be indefinite and is not invariant under<br />
reparameterization. <strong>The</strong> FIRSTORDER suboption of the DDFM=KENWARDROGER<br />
option eliminates the second derivatives from the calculation of the covariance matrix<br />
adjustment. For the case of scalar estimable functions, the resulting estimator is referred<br />
to as the Prasad-Rao estimator $\tilde{m}^{@}$ in Harville and Jeske (1992). <strong>The</strong> following<br />
are examples of covariance structures that generally lead to nonzero second derivatives:<br />
TYPE=ANTE(1), TYPE=AR(1), TYPE=ARH(1), TYPE=ARMA(1,1), TYPE=CSH,<br />
TYPE=FA, TYPE=FA0(q), TYPE=TOEPH, TYPE=UNR, and all TYPE=SP() structures.<br />
When the asymptotic variance matrix of the covariance parameters is found to be singular,<br />
a generalized inverse is used. Covariance parameters with zero variance then do<br />
not contribute to the degrees-of-freedom adjustment for DDFM=SATTERTHWAITE and<br />
DDFM=KENWARDROGER, and a message is written to the log.<br />
This method changes output in the following tables (listed in Table 56.22): Contrast, CorrB,<br />
CovB, Diffs, Estimates, InvCovB, LSMeans, Slices, SolutionF, SolutionR, Tests1–Tests3.<br />
<strong>The</strong> OUTP= and OUTPM= data sets are also affected.<br />
E<br />
requests that Type 1, Type 2, and Type 3 L matrix coefficients be displayed for all specified<br />
effects. For ODS purposes, the name of the table is “Coef.”<br />
E1<br />
requests that Type 1 L matrix coefficients be displayed for all specified effects. For ODS<br />
purposes, the name of the table is “Coef.”<br />
E2<br />
requests that Type 2 L matrix coefficients be displayed for all specified effects. For ODS<br />
purposes, the name of the table is “Coef.”<br />
E3<br />
requests that Type 3 L matrix coefficients be displayed for all specified effects. For ODS<br />
purposes, the name of the table is “Coef.”<br />
FULLX<br />
requests that columns of the X matrix that consist entirely of zeros not be eliminated from X;<br />
otherwise, they are eliminated by default. For a column corresponding to a missing cell to<br />
be added to X, its particular levels must be present in at least one observation in the analysis<br />
data set along with a missing dependent variable. <strong>The</strong> use of the FULLX option can affect<br />
coefficient specifications in the CONTRAST and ESTIMATE statements, as well as covariate<br />
coefficients from LSMEANS statements specified with the AT MEANS option.<br />
HTYPE=value-list<br />
indicates the type of hypothesis test to perform on the fixed effects. Valid entries for value<br />
are 1, 2, and 3; the default value is 3. You can specify several types by separating the values<br />
with a comma or a space. <strong>The</strong> ODS table names are “Tests1” for the Type 1 tests, “Tests2”<br />
for the Type 2 tests, and “Tests3” for the Type 3 tests.<br />
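For instance, assuming hypothetical classification effects A and B, the following sketch requests both Type 1 and Type 3 tests in a single run:<br />

```sas
proc mixed;
   class A B;
   /* Produces the "Tests1" and "Tests3" ODS tables */
   model Y = A B A*B / htype=1,3;
run;
```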
INFLUENCE< (influence-options) ><br />
specifies that influence and case deletion diagnostics are to be computed.<br />
<strong>The</strong> INFLUENCE option computes influence diagnostics by noniterative or iterative methods.<br />
<strong>The</strong> noniterative diagnostics rely on recomputation formulas under the assumption that<br />
covariance parameters or their ratios remain fixed. With the possible exception of a profiled<br />
residual variance, no covariance parameters are updated. This is the default behavior because<br />
of its computational efficiency. However, the impact of an observation on the overall analysis<br />
can be underestimated if its effect on covariance parameters is not assessed. Toward this end,<br />
iterative methods can be applied to gauge the overall impact of observations and to obtain<br />
influence diagnostics for the covariance parameter estimates.<br />
If you specify the INFLUENCE option without further suboptions, PROC <strong>MIXED</strong> computes<br />
single-case deletion diagnostics and influence statistics for each observation in the data set by<br />
updating estimates for the fixed-effects parameter estimates, and also the residual variance, if<br />
it is profiled. <strong>The</strong> EFFECT=, SELECT=, ITER=, SIZE=, and KEEP= suboptions provide additional<br />
flexibility in the computation and reporting of influence statistics. Table 56.8 briefly<br />
describes important suboptions and their effect on the influence analysis.<br />
Table 56.8 Summary of INFLUENCE Default and Suboptions<br />
Description Suboption<br />
Compute influence diagnostics for individual observations default<br />
Measure influence of sets of observations chosen according to a classification variable or effect EFFECT=<br />
Remove pairs of observations and report the results sorted by degree of influence SIZE=2<br />
Remove triples, quadruples of observations, etc. SIZE=<br />
Allow selection of individual observations, observations sharing specific levels of effects, and construction of tuples from specified subsets of observations SELECT=<br />
Update fixed effects and covariance parameters by refitting the mixed model, adding up to n iterations ITER=n > 0<br />
Compute influence diagnostics for the covariance parameters ITER=n > 0<br />
Update only fixed effects and the residual variance, if it is profiled ITER=0<br />
Add the reduced-data estimates to the data set created with ODS OUTPUT ESTIMATES<br />
<strong>The</strong> modifiers and their default values are discussed in the following paragraphs. <strong>The</strong> set<br />
of computed influence diagnostics varies with the suboptions. <strong>The</strong> most extensive set of<br />
influence diagnostics is obtained when ITER=n with n > 0.<br />
You can produce statistical graphics of influence diagnostics when the ODS GRAPHICS<br />
statement is specified. For general information about ODS Graphics, see Chapter 21,<br />
“Statistical Graphics Using ODS.” For specific information about the graphics available in<br />
the <strong>MIXED</strong> procedure, see the section “ODS Graphics” on page 3998.<br />
You can specify the following influence-options in parentheses:<br />
EFFECT=effect<br />
specifies an effect according to which observations are grouped. Observations sharing<br />
the same level of the effect are removed from the analysis as a group. <strong>The</strong> effect must<br />
contain only classification variables, but they need not be contained in the model.<br />
Removing observations can change the rank of the $(X'V^{-1}X)^{-}$ matrix. This is particularly<br />
likely to happen when multiple observations are eliminated from the analysis.<br />
If the rank of the estimated variance-covariance matrix of $\hat{\beta}$ changes or its singularity<br />
pattern is altered, no influence diagnostics are computed.<br />
ESTIMATES<br />
EST<br />
specifies that the updated parameter estimates should be written to the ODS output<br />
data set. <strong>The</strong> values are not displayed in the “Influence” table, but if you use ODS<br />
OUTPUT to create a data set from the listing, the estimates are added to the data set.<br />
If ITER=0, only the fixed-effects estimates are saved. In iterative influence analyses,<br />
fixed-effects and covariance parameters are stored. <strong>The</strong> p fixed-effects parameter estimates<br />
are named Parm1–Parmp, and the q covariance parameter estimates are named<br />
CovP1–CovPq. <strong>The</strong> order corresponds to that in the “Solution for Fixed Effects” and<br />
“Covariance Parameter Estimates” tables. If parameter updates fail (for example, because<br />
of a loss of rank or a nonpositive definite Hessian), missing values are reported.<br />
ITER=n<br />
controls the maximum number of additional iterations PROC <strong>MIXED</strong> performs to update<br />
the fixed-effects and covariance parameter estimates following data point removal.<br />
If you specify n > 0, then statistics such as DFFITS, MDFFITS, and the likelihood<br />
distances measure the impact of observation(s) on all aspects of the analysis. Typically,<br />
the influence will grow compared to values at ITER=0. In models without RANDOM<br />
or REPEATED effects, the ITER= option has no effect.<br />
This documentation refers to analyses when n > 0 simply as iterative influence analysis,<br />
even if final covariance parameter estimates can be updated in a single step (for<br />
example, when METHOD=MIVQUE0 or METHOD=TYPE3). This nomenclature reflects<br />
the fact that only if n > 0 are all model parameters updated, which can require<br />
additional iterations. If n > 0 and METHOD=REML (default) or METHOD=ML,<br />
the procedure updates fixed effects and variance-covariance parameters after removing<br />
the selected observations with additional Newton-Raphson iterations, starting from the<br />
converged estimates for the entire data. <strong>The</strong> process stops for each observation or set of<br />
observations if the convergence criterion is satisfied or the number of further iterations
exceeds n. If n > 0 and METHOD=TYPE1, TYPE2, or TYPE3, ANOVA estimates of<br />
the covariance parameters are recomputed in a single step.<br />
Compared to noniterative updates, the computations are more involved. In particular<br />
for large data sets and/or a large number of random effects, iterative updates require<br />
considerably more resources. A one-step (ITER=1) or two-step update might be a<br />
good compromise. <strong>The</strong> output includes the number of iterations performed, which is<br />
less than n if the iteration converges. If the process does not converge in n iterations,<br />
you should be careful in interpreting the results, especially if n is fairly large.<br />
Bounds and other restrictions on the covariance parameters carry over from the full-data<br />
model. Covariance parameters that are not iterated in the model fit to the full<br />
data (the NOITER or HOLD option in the PARMS statement) are likewise not updated<br />
in the refit. In certain models, such as random-effects models, the ratios between the<br />
covariance parameters and the residual variance are maintained rather than the actual<br />
value of the covariance parameter estimate (see the section “Influence Diagnostics” on<br />
page 3982).<br />
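The EFFECT=, ESTIMATES, and ITER= suboptions can be combined. In the following sketch, the variables Subject, A, X, and Y and the data set name InflStats are assumptions made for illustration. Observations are deleted one subject at a time, up to five additional iterations are allowed for each refit, and the reduced-data estimates are added to the ODS output data set:<br />

```sas
proc mixed;
   class Subject A;
   model Y = A X / influence(effect=Subject iter=5 estimates);
   /* A RANDOM (or REPEATED) effect is needed for ITER= to matter */
   random intercept / subject=Subject;
   /* "Influence" is the table name referred to in the text */
   ods output Influence=InflStats;
run;
```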
KEEP=n<br />
determines how many observations are retained for display and in the output data set or<br />
how many tuples if you specify SIZE=. <strong>The</strong> output is sorted by an influence statistic as<br />
discussed for the SIZE= suboption.<br />
SELECT=value-list<br />
specifies which observations or effect levels are chosen for influence calculations. If<br />
the SELECT= suboption is not specified, diagnostics are computed as follows:<br />
for all observations, if neither EFFECT= nor SIZE= is given<br />
for all levels of the specified effect, if EFFECT= is specified<br />
for all tuples of size k formed from the observations in value-list, if SIZE=k is<br />
specified<br />
When you specify an effect with the EFFECT= option, the values in value-list represent<br />
indices of the levels in the order in which PROC <strong>MIXED</strong> builds classification effects.<br />
Which observations in the data set correspond to this index depends on the order of the<br />
variables in the CLASS statement, not the order in which the variables appear in the<br />
interaction effect. See the section “Parameterization of Mixed Models” on page 3975<br />
to understand precisely how the procedure indexes nested and crossed effects and how<br />
levels of classification variables are ordered. <strong>The</strong> actual values of the classification<br />
variables involved in the effect are shown in the output so you can determine which<br />
observations were removed.<br />
If the EFFECT= suboption is not specified, the SELECT= value list refers to the sequence<br />
in which observations are read from the input data set or from the current BY<br />
group if there is a BY statement. This indexing is not necessarily the same as the observation<br />
numbers in the input data set, for example, if a WHERE clause is specified or<br />
during BY processing.
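The following sketch combines SELECT= with EFFECT= so that only the first and third levels of a classification effect are deleted in turn. The data set and variable names are again hypothetical, and which clinics the indices 1 and 3 refer to depends on how the procedure orders the levels.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc mixed data=work.trial;
   class trt clinic;
   /* Delete all observations of Clinic level 1, then of Clinic level 3 */
   model y = trt / influence(effect=clinic select=1,3);
   random clinic;
run;
```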
SIZE=n<br />
instructs PROC MIXED to remove groups of observations formed as tuples of size<br />
n. For example, SIZE=2 specifies all $n(n-1)/2$ unique pairs of observations.<br />
The number of tuples for SIZE=k is $n!/(k!(n-k)!)$ and grows quickly with n and<br />
k. Using the SIZE= option can result in considerable computing time. <strong>The</strong> <strong>MIXED</strong><br />
procedure displays by default only the 50 tuples with the greatest influence. Use the<br />
KEEP= option to override this default and to retain a different number of tuples in the<br />
listing or ODS output data set. Regardless of the KEEP= specification, all tuples are<br />
evaluated and the results are ordered according to an influence statistic. This statistic<br />
is the (restricted) likelihood distance as a measure of overall influence if ITER=n > 0<br />
or when a residual variance is profiled. When likelihood distances are unavailable, the<br />
results are ordered by the PRESS statistic.<br />
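To give a sense of this growth, consider a hypothetical analysis with 50 usable observations (the value 50 is chosen only for illustration). The number of deletion sets for SIZE=2 and SIZE=3 is then

```latex
\[
\binom{50}{2} = \frac{50 \cdot 49}{2} = 1225,
\qquad
\binom{50}{3} = \frac{50 \cdot 49 \cdot 48}{6} = 19600 .
\]
```

In this case, moving from pairs to triples multiplies the number of model refits or updates by 16.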
To reduce computational burden, the SIZE= option can be combined with the<br />
SELECT=value-list modifier. For example, the following statements evaluate all<br />
$15 = 6 \times 5/2$ pairs formed from observations 13, 14, 18, 30, 31, and 33 and display<br />
the five pairs with the greatest influence:<br />
proc mixed;<br />
class a m f;<br />
model penetration = a m /<br />
influence(size=2 keep=5<br />
select=13,14,18,30,31,33);<br />
random f(m);<br />
run;<br />
If any observation in a tuple contains missing values or has otherwise not contributed to<br />
the analysis, the tuple is not evaluated. This guarantees that the displayed results refer<br />
to the same number of observations, so that meaningful statistics are available by which<br />
to order the results. If computations fail for a particular tuple (for example, because<br />
the $X'V^{-1}X$ matrix changes rank or the G matrix is not positive definite), no results<br />
are produced. Results are retained when the maximum number of iterative updates is<br />
exceeded in iterative influence analyses.<br />
<strong>The</strong> SIZE= suboption cannot be combined with the EFFECT= suboption. As in the<br />
case of the EFFECT= suboption, the statistics being computed are those appropriate<br />
for removal of multiple data points, even if SIZE=1.<br />
For ODS purposes the name of the “Influence Diagnostics” table is “Influence.” <strong>The</strong> variables<br />
in this table depend on whether you specify the EFFECT=, SIZE=, or KEEP= suboption and<br />
whether covariance parameters are iteratively updated. When ITER=0 (the default), certain<br />
influence diagnostics are meaningful only if the residual variance is profiled. Table 56.9 and<br />
Table 56.10 summarize the statistics obtained depending on the model and modifiers. <strong>The</strong><br />
last column in these tables gives the variable name in the ODS OUTPUT INFLUENCE=<br />
data set. Restricted likelihood distances are reported instead of the likelihood distance unless<br />
METHOD=ML. See the section “Influence Diagnostics” on page 3982 for details about the<br />
individual statistics.
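A minimal sketch of saving the “Influence” table with ODS and listing the most influential observations follows. The data set and variable names are hypothetical. The variable CookD is used for sorting because it appears in every configuration listed in the tables.

```sas
/* Hypothetical data set and variable names, for illustration only */
ods output Influence=work.infl;      /* capture the "Influence" table */
proc mixed data=work.trial;
   class trt clinic;
   model y = trt / influence;
   random clinic;
run;

/* List the ten deletion sets with the largest Cook's D */
proc sort data=work.infl;
   by descending CookD;
run;

proc print data=work.infl(obs=10);
run;
```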
Table 56.9 Statistics Computed with INFLUENCE Option, Noniterative Analysis (ITER=0)<br />
Suboption | $\sigma^2$ Profiled | Statistic | Variable Name<br />
Default | Yes | Observed value | Observed<br />
Default | Yes | Predicted value | Predicted<br />
Default | Yes | Residual | Residual<br />
Default | Yes | Leverage | Leverage<br />
Default | Yes | PRESS residual | PRESSRes<br />
Default | Yes | Internally studentized residual | Student<br />
Default | Yes | Externally studentized residual | RStudent<br />
Default | Yes | RMSE without deleted obs | RMSE<br />
Default | Yes | Cook’s D | CookD<br />
Default | Yes | DFFITS | DFFITS<br />
Default | Yes | CovRatio | COVRATIO<br />
Default | Yes | (Restricted) likelihood distance | RLD, LD<br />
Default | No | Observed value | Observed<br />
Default | No | Predicted value | Predicted<br />
Default | No | Residual | Residual<br />
Default | No | Leverage | Leverage<br />
Default | No | PRESS residual | PRESSRes<br />
Default | No | Internally studentized residual | Student<br />
Default | No | Cook’s D | CookD<br />
EFFECT=, SIZE=, or KEEP= | Yes | Observations in level (tuple) | Nobs<br />
EFFECT=, SIZE=, or KEEP= | Yes | PRESS statistic | PRESS<br />
EFFECT=, SIZE=, or KEEP= | Yes | Cook’s D | CookD<br />
EFFECT=, SIZE=, or KEEP= | Yes | MDFFITS | MDFFITS<br />
EFFECT=, SIZE=, or KEEP= | Yes | CovRatio | COVRATIO<br />
EFFECT=, SIZE=, or KEEP= | Yes | COVTRACE | COVTRACE<br />
EFFECT=, SIZE=, or KEEP= | Yes | RMSE without deleted level (tuple) | RMSE<br />
EFFECT=, SIZE=, or KEEP= | Yes | (Restricted) likelihood distance | RLD, LD<br />
EFFECT=, SIZE=, or KEEP= | No | Observations in level (tuple) | Nobs<br />
EFFECT=, SIZE=, or KEEP= | No | PRESS statistic | PRESS<br />
EFFECT=, SIZE=, or KEEP= | No | Cook’s D | CookD<br />
Table 56.10 Statistics Computed with INFLUENCE Option, Iterative Analysis (ITER=n > 0)<br />
Suboption | Statistic | Variable Name<br />
Default | Number of iterations | Iter<br />
Default | Observed value | Observed<br />
Default | Predicted value | Predicted<br />
Default | Residual | Residual<br />
Default | Leverage | Leverage<br />
Default | PRESS residual | PRESSres<br />
Default | Internally studentized residual | Student<br />
Default | Externally studentized residual | RStudent<br />
Default | RMSE without deleted obs (if possible) | RMSE<br />
Default | Cook’s D | CookD<br />
Default | DFFITS | DFFITS<br />
Default | CovRatio | COVRATIO<br />
Default | Cook’s D CovParms | CookDCP<br />
Default | CovRatio CovParms | COVRATIOCP<br />
Default | MDFFITS CovParms | MDFFITSCP<br />
Default | (Restricted) likelihood distance | RLD, LD<br />
EFFECT=, SIZE=, or KEEP= | Observations in level (tuple) | Nobs<br />
EFFECT=, SIZE=, or KEEP= | Number of iterations | Iter<br />
EFFECT=, SIZE=, or KEEP= | PRESS statistic | PRESS<br />
EFFECT=, SIZE=, or KEEP= | RMSE without deleted level (tuple) | RMSE<br />
EFFECT=, SIZE=, or KEEP= | Cook’s D | CookD<br />
EFFECT=, SIZE=, or KEEP= | MDFFITS | MDFFITS<br />
EFFECT=, SIZE=, or KEEP= | CovRatio | COVRATIO<br />
EFFECT=, SIZE=, or KEEP= | COVTRACE | COVTRACE<br />
EFFECT=, SIZE=, or KEEP= | Cook’s D CovParms | CookDCP<br />
EFFECT=, SIZE=, or KEEP= | CovRatio CovParms | COVRATIOCP<br />
EFFECT=, SIZE=, or KEEP= | MDFFITS CovParms | MDFFITSCP<br />
EFFECT=, SIZE=, or KEEP= | (Restricted) likelihood distance | RLD, LD<br />
INTERCEPT<br />
adds a row to the tables for Type 1, 2, and 3 tests corresponding to the overall intercept.<br />
LCOMPONENTS<br />
requests an estimate for each row of the L matrix used to form tests of fixed effects. Components<br />
corresponding to Type 3 tests are the default; you can produce the Type 1 and Type 2<br />
component estimates with the HTYPE= option.<br />
Tests of fixed effects involve testing of linear hypotheses of the form $L\beta = 0$. The matrix<br />
L is constructed from Type 1, 2, or 3 estimable functions. By default the <strong>MIXED</strong> procedure<br />
constructs Type 3 tests. In many situations, the individual rows of the matrix L represent<br />
contrasts of interest. For example, in a one-way classification model, the Type 3 estimable<br />
functions define differences of factor-level means. In a balanced two-way layout, the rows of<br />
L correspond to differences of cell means.<br />
For example, suppose factors A and B have a and b levels, respectively. <strong>The</strong> following statements<br />
produce $(a-1)$ one-degree-of-freedom tests for the rows of L associated with the Type<br />
1 and Type 3 estimable functions for factor A, $(b-1)$ tests for the rows of L associated with<br />
factor B, and a single test for the Type 1 and Type 3 coefficients associated with regressor X:<br />
class A B;<br />
model y = A B x / htype=1,3 lcomponents;<br />
<strong>The</strong> denominator degrees of freedom associated with a row of L are the same as those in the<br />
corresponding “Tests of Fixed Effects” table, except for DDFM=KENWARDROGER and<br />
DDFM=SATTERTHWAITE. For these degree of freedom methods, the denominator degrees<br />
of freedom are computed separately for each row of L.<br />
For ODS purposes, the name of the table containing all requested component tests is “LComponents.”<br />
See Example 56.9 for applications of the LCOMPONENTS option.<br />
NOCONTAIN<br />
has the same effect as the DDFM=RESIDUAL option.<br />
NOINT<br />
requests that no intercept be included in the model. An intercept is included by default.<br />
NOTEST<br />
specifies that no hypothesis tests be performed for the fixed effects.<br />
OUTP=<strong>SAS</strong>-data-set<br />
OUTPRED=<strong>SAS</strong>-data-set<br />
specifies an output data set containing predicted values and related quantities. This option<br />
replaces the P option from <strong>SAS</strong> 6.<br />
Predicted values are formed by using the rows from (X Z) as L matrices. Thus, predicted<br />
values from the original data are $X\hat{\beta} + Z\hat{\gamma}$. Their approximate standard errors of prediction<br />
are formed from the quadratic form of L with $\hat{C}$ defined in the section “Statistical Properties”<br />
on page 3971. <strong>The</strong> L95 and U95 variables provide a t-type confidence interval for the<br />
predicted values, and they correspond to the L95M and U95M variables from the GLM and<br />
REG procedures for fixed-effects models. <strong>The</strong> residuals are the observed minus the predicted<br />
values. Predicted values for data points other than those observed can be obtained by using<br />
missing dependent variables in your input data set.<br />
Specifications that have a REPEATED statement with the SUBJECT= option and missing dependent<br />
variables compute predicted values by using empirical best linear unbiased prediction<br />
(EBLUP). Using hats ($\hat{\ }$) to denote estimates, the EBLUP formula is<br />
\[ \hat{m} = X_m\hat{\beta} + \hat{C}_m\hat{V}^{-1}(y - X\hat{\beta}) \]<br />
where m represents a hypothetical realization of a missing data vector with associated design<br />
matrix Xm. <strong>The</strong> matrix Cm is the model-based covariance matrix between m and the observed<br />
data y, and other notation is as presented in the section “Mixed Models <strong>The</strong>ory” on page 3962.<br />
<strong>The</strong> estimated prediction variance is as follows:<br />
\[ \widehat{\mathrm{Var}}(\hat{m} - m) = \hat{V}_m - \hat{C}_m\hat{V}^{-1}\hat{C}_m^T + [X_m - \hat{C}_m\hat{V}^{-1}X](X^T\hat{V}^{-1}X)^{-}[X_m - \hat{C}_m\hat{V}^{-1}X]^T \]<br />
where Vm is the model-based variance matrix of m. For further details, see Henderson (1984)<br />
and Harville (1990). This feature can be useful for forecasting time series or for computing<br />
spatial predictions.<br />
By default, all variables from the input data set are included in the OUTP= data set. You can<br />
select a subset of these variables by using the ID statement.<br />
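The following sketch shows one way to obtain predictions for unobserved data points by appending rows with a missing response, as described above. The data sets Work.Growth and Work.Future and the variables Person, Time, and Y are hypothetical. Work.Future is assumed to contain Person and Time but no response values.

```sas
/* Hypothetical data: Work.Growth has observed records (Person, Time, Y);  */
/* Work.Future has new Person/Time combinations without a response.        */
data work.combined;
   set work.growth work.future;   /* Y is missing for the appended rows */
run;

proc mixed data=work.combined;
   class person;
   model y = time / outp=work.pred;
   random int time / type=un sub=person;
   id person time y;              /* carry only these input variables */
run;

/* Predicted values for the points that were not observed */
proc print data=work.pred;
   where y is missing;
run;
```

Rows with a missing response do not contribute to the fit, so the estimates are the same as those from an analysis of Work.Growth alone.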
OUTPM=<strong>SAS</strong>-data-set<br />
OUTPREDM=<strong>SAS</strong>-data-set<br />
specifies an output data set containing predicted means and related quantities. This option<br />
replaces the PM option from <strong>SAS</strong> 6.<br />
<strong>The</strong> output data set is of the same form as that resulting from the OUTP= option, except<br />
that the predicted values do not incorporate the EBLUP values $Z\hat{\gamma}$. They also do not use the<br />
EBLUPs for specifications that have a REPEATED statement with the SUBJECT= option and<br />
missing dependent variables. The predicted values are formed as $X\hat{\beta}$ in the OUTPM= data<br />
set, and standard errors are quadratic forms in the approximate variance-covariance matrix of<br />
$\hat{\beta}$ as displayed by the COVB option.<br />
By default, all variables from the input data set are included in the OUTPM= data set. You<br />
can select a subset of these variables by using the ID statement.<br />
RESIDUAL<br />
requests that Pearson-type and (internally) studentized residuals be added to the OUTP= and<br />
OUTPM= data sets. Studentized residuals are raw residuals standardized by their estimated<br />
standard error. When residuals are internally studentized, the data point in question has<br />
contributed to the estimation of the covariance parameter estimates on which the standard<br />
error of the residual is based. Externally studentized residuals can be computed with the<br />
INFLUENCE option. Pearson-type residuals scale the residual by the standard deviation of<br />
the response.<br />
<strong>The</strong> option has no effect unless the OUTP= or OUTPM= option is specified or unless you request<br />
statistical graphics with the ODS GRAPHICS statement. For general information about<br />
ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific information<br />
about the graphics available in the <strong>MIXED</strong> procedure, see the section “ODS Graphics” on<br />
page 3998. For computational details about studentized and Pearson residuals in <strong>MIXED</strong>,<br />
see the section “Residual Diagnostics” on page 3980.<br />
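As a sketch, the following statements add the Pearson-type and studentized residuals to both output data sets. The data set and variable names are hypothetical. The OUTP= data set is based on predicted values that include the random effects, whereas the OUTPM= data set is based on the predicted means only.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc mixed data=work.growth;
   class person;
   model y = time / residual
                    outp=work.cond_res     /* residuals from predicted values */
                    outpm=work.marg_res;   /* residuals from predicted means  */
   random int / sub=person;
run;
```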
SINGCHOL=number<br />
tunes the sensitivity in computing Cholesky roots. If a diagonal pivot element is less than<br />
D*number as PROC <strong>MIXED</strong> performs the Cholesky decomposition on a matrix, the associated<br />
column is declared to be linearly dependent upon previous columns and is set to 0. <strong>The</strong><br />
value D is the original diagonal element of the matrix. The default for number is 1E4 times<br />
the machine epsilon; this product is approximately 1E-12 on most computers.<br />
SINGRES=number<br />
sets the tolerance for which the residual variance is considered to be zero. <strong>The</strong> default is 1E4<br />
times the machine epsilon; this product is approximately 1E-12 on most computers.<br />
SINGULAR=number<br />
tunes the sensitivity in sweeping. If a diagonal pivot element is less than D*number as<br />
PROC <strong>MIXED</strong> sweeps a matrix, the associated column is declared to be linearly dependent<br />
upon previous columns, and the associated parameter is set to 0. <strong>The</strong> value D is the original<br />
diagonal element of the matrix. The default is 1E4 times the machine epsilon; this product is<br />
approximately 1E-12 on most computers.<br />
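The order of magnitude of the default tolerance shared by the SINGCHOL=, SINGRES=, and SINGULAR= options can be checked as follows, assuming IEEE 754 double-precision arithmetic, for which the machine epsilon is $2^{-52}$. This assumption is not stated in the text and might not hold on every platform.

```latex
\[
10^{4} \times 2^{-52} \approx 10^{4} \times 2.22 \times 10^{-16}
= 2.22 \times 10^{-12} .
\]
```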
SOLUTION<br />
S<br />
requests that a solution for the fixed-effects parameters be produced. Using notation from<br />
the section “Mixed Models Theory” on page 3962, the fixed-effects parameter estimates<br />
are $\hat{\beta}$ and their approximate standard errors are the square roots of the diagonal elements<br />
of $(X'\hat{V}^{-1}X)^{-}$. You can output this approximate variance matrix with the COVB option<br />
or modify it with the EMPIRICAL option in the PROC MIXED statement or the<br />
DDFM=KENWARDROGER option in the MODEL statement.<br />
Along with the estimates and their approximate standard errors, a t statistic is computed as the<br />
estimate divided by its standard error. The degrees of freedom for this t statistic match those<br />
appearing in the “Tests of Fixed Effects” table under the effect containing the parameter.<br />
The “Pr > |t|” column contains the two-tailed p-value corresponding to the t statistic and<br />
associated degrees of freedom. You can use the CL option to request confidence intervals<br />
for all of the parameters; they are constructed around the estimate by using a radius of the<br />
standard error times a percentage point from the t distribution.<br />
VCIRY<br />
requests that responses and marginal residuals be scaled by the inverse Cholesky root of the<br />
marginal variance-covariance matrix. The variables ScaledDep and ScaledResid are added to<br />
the OUTPM= data set. These quantities can be important in bootstrapping of data or residuals.<br />
Examination of the scaled residuals is also helpful in diagnosing departures from normality.<br />
Notice that the results of this scaling operation can depend on the order in which the MIXED<br />
procedure processes the data.<br />
The VCIRY option has no effect unless you also use the OUTPM= option or unless you request<br />
statistical graphics with the ODS GRAPHICS statement. For general information about<br />
ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific information<br />
about the graphics available in the MIXED procedure, see the section “ODS Graphics” on<br />
page 3998.<br />
XPVIX<br />
is an alias for the COVBI option.<br />
XPVIXI<br />
is an alias for the COVB option.<br />
ZETA=number<br />
tunes the sensitivity in forming Type 3 functions. Any element in the estimable function basis<br />
with an absolute value less than number is set to 0. The default is 1E-8.<br />
PARMS Statement<br />
PARMS (value-list) . . . < / options > ;<br />
<strong>The</strong> PARMS statement specifies initial values for the covariance parameters, or it requests a grid<br />
search over several values of these parameters. You must specify the values in the order in which<br />
they appear in the “Covariance Parameter Estimates” table.<br />
<strong>The</strong> value-list specification can take any of several forms:<br />
m | a single value<br />
m1, m2, ..., mn | several values<br />
m to n | a sequence where m equals the starting value, n equals the ending value, and the increment equals 1<br />
m to n by i | a sequence where m equals the starting value, n equals the ending value, and the increment equals i<br />
m1, m2 to m3 | mixed values and sequences<br />
You can use the PARMS statement to input known parameters. Referring to the split-plot example<br />
(Example 56.1), suppose the three variance components are known to be 60, 20, and 6. <strong>The</strong> <strong>SAS</strong><br />
statements to fix the variance components at these values are as follows:<br />
proc mixed data=sp noprofile;<br />
class Block A B;<br />
model Y = A B A*B;<br />
random Block A*Block;<br />
parms (60) (20) (6) / noiter;<br />
run;<br />
<strong>The</strong> NOPROFILE option requests PROC <strong>MIXED</strong> to refrain from profiling the residual variance parameter<br />
during its calculations, thereby enabling its value to be held at 6 as specified in the PARMS<br />
statement. <strong>The</strong> NOITER option prevents any Newton-Raphson iterations so that the subsequent<br />
results are based on the given variance components. You can also specify known parameters of G<br />
by using the GDATA= option in the RANDOM statement.<br />
If you specify more than one set of initial values, PROC <strong>MIXED</strong> performs a grid search of the<br />
likelihood surface and uses the best point on the grid for subsequent analysis. Specifying a large<br />
number of grid points can result in long computing times. <strong>The</strong> grid search feature is also useful for<br />
exploring the likelihood surface. (See Example 56.3.)<br />
<strong>The</strong> results from the PARMS statement are the values of the parameters on the specified grid (denoted<br />
by CovP1–CovPn), the residual variance (possibly estimated) for models with a residual<br />
variance parameter, and various functions of the likelihood.<br />
For ODS purposes, the name of the “Parameter Search” table is “ParmSearch.”<br />
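A minimal sketch of a grid search follows. It reuses the split-plot model shown earlier, which has three covariance parameters. The grid values are arbitrary and are chosen only to illustrate the value-list forms. The ODS OUTPUT statement saves the “Parameter Search” table in a hypothetical data set named Work.Grid.

```sas
/* Grid values are arbitrary and serve only as an illustration */
ods output ParmSearch=work.grid;          /* capture the "Parameter Search" table */
proc mixed data=sp;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
   /* 3 values x 2 values x 1 value = 6 grid points */
   parms (40 to 80 by 20) (10, 20) (6);
run;
```

Because the NOITER option is not specified here, iterations start from the best of the six grid points.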
You can specify the following options in the PARMS statement after a slash (/).
HOLD=value-list<br />
EQCONS=value-list<br />
specifies which parameter values PROC <strong>MIXED</strong> should hold to equal the specified values.<br />
For example, the following statement constrains the first and third covariance parameters to<br />
equal 5 and 2, respectively:<br />
parms (5) (3) (2) (3) / hold=1,3;<br />
LOGDETH<br />
evaluates the log determinant of the Hessian matrix for each point specified in the PARMS<br />
statement. A Log Det H column is added to the “Parameter Search” table.<br />
LOWERB=value-list<br />
enables you to specify lower boundary constraints on the covariance parameters. The value-list<br />
specification is a list of numbers or missing values (.) separated by commas. You must<br />
list the numbers in the order that PROC <strong>MIXED</strong> uses for the covariance parameters, and<br />
each number corresponds to the lower boundary constraint. A missing value instructs PROC<br />
<strong>MIXED</strong> to use its default constraint, and if you do not specify numbers for all of the covariance<br />
parameters, PROC <strong>MIXED</strong> assumes the remaining ones are missing.<br />
An example for which this option is useful is when you want to constrain the G matrix to<br />
be positive definite in order to avoid the more computationally intensive algorithms required<br />
when G becomes singular. <strong>The</strong> corresponding statements for a random coefficients model are<br />
as follows:<br />
proc mixed;<br />
class person;<br />
model y = time;<br />
random int time / type=fa0(2) sub=person;<br />
parms / lowerb=1e-4,.,1e-4;<br />
run;<br />
Here the TYPE=FA0(2) structure is used in order to specify a Cholesky root parameterization<br />
for the $2 \times 2$ unstructured blocks in G. This parameterization ensures that the G matrix is<br />
nonnegative definite, and the PARMS statement then ensures that it is positive definite by<br />
constraining the two diagonal terms to be greater than or equal to 1E-4.<br />
NOBOUND<br />
requests the removal of boundary constraints on covariance parameters. For example, variance<br />
components have a default lower boundary constraint of 0, and the NOBOUND option<br />
allows their estimates to be negative.<br />
NOITER<br />
requests that no Newton-Raphson iterations be performed and that PROC <strong>MIXED</strong> use the<br />
best value from the grid search to perform inferences. By default, iterations begin at the best<br />
value from the PARMS grid search.<br />
NOPROFILE<br />
specifies a different computational method for the residual variance during the grid search.<br />
By default, PROC <strong>MIXED</strong> estimates this parameter by using the profile likelihood when
appropriate. This estimate is displayed in the Variance column of the “Parameter Search”<br />
table. The NOPROFILE option suppresses the profiling and uses the actual value of the<br />
specified variance in the likelihood calculations.<br />
OLS<br />
requests starting values corresponding to the usual general linear model. Specifically, all<br />
variances and covariances are set to zero except for the residual variance, which is set equal<br />
to its ordinary least squares (OLS) estimate. This option is useful when the default MIVQUE0<br />
procedure produces poor starting values for the optimization process.<br />
PARMSDATA=<strong>SAS</strong>-data-set<br />
PDATA=<strong>SAS</strong>-data-set<br />
reads in covariance parameter values from a <strong>SAS</strong> data set. <strong>The</strong> data set should contain the Est<br />
or Covp1–Covpn variables.<br />
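The following sketch passes the estimates from one run to a later run through the PARMSDATA= option. It reuses the split-plot model shown earlier. The data set names Work.Cp and Work.Start are hypothetical. The DATA step copies the Estimate column of the saved “Covariance Parameter Estimates” table into a variable named Est, which is one of the two forms that the option accepts.

```sas
/* First run: save the covariance parameter estimates */
ods output CovParms=work.cp;
proc mixed data=sp;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
run;

/* Store the estimates in a variable named Est */
data work.start;
   set work.cp;
   Est = Estimate;
   keep Est;
run;

/* Second run: read the saved values as starting values */
proc mixed data=sp;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
   parms / parmsdata=work.start;
run;
```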
RATIOS<br />
indicates that ratios with the residual variance are specified instead of the covariance parameters<br />
themselves. <strong>The</strong> default is to use the individual covariance parameters.<br />
UPPERB=value-list<br />
enables you to specify upper boundary constraints on the covariance parameters. The value-list<br />
specification is a list of numbers or missing values (.) separated by commas. You must<br />
list the numbers in the order that PROC <strong>MIXED</strong> uses for the covariance parameters, and<br />
each number corresponds to the upper boundary constraint. A missing value instructs PROC<br />
<strong>MIXED</strong> to use its default constraint, and if you do not specify numbers for all of the covariance<br />
parameters, PROC <strong>MIXED</strong> assumes that the remaining ones are missing.<br />
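As a sketch, the following PARMS statement combines starting values with lower and upper bounds for the split-plot model shown earlier. The bound values are arbitrary. The missing value in the LOWERB= list keeps the default lower bound for the second parameter, and the UPPERB= list names only the first parameter, so the remaining upper bounds are treated as missing.

```sas
/* Bound values are arbitrary and serve only as an illustration */
proc mixed data=sp;
   class Block A B;
   model Y = A B A*B;
   random Block A*Block;
   parms (60) (20) (6) / lowerb=1,.,1 upperb=200;
run;
```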
PRIOR Statement<br />
PRIOR < distribution >< / options > ;<br />
<strong>The</strong> PRIOR statement enables you to carry out a sampling-based Bayesian analysis in PROC<br />
<strong>MIXED</strong>. It currently operates only with variance component models. <strong>The</strong> analysis produces a<br />
<strong>SAS</strong> data set containing a pseudo-random sample from the joint posterior density of the variance<br />
components and other parameters in the mixed model.<br />
<strong>The</strong> posterior analysis is performed after all other PROC <strong>MIXED</strong> computations. It begins with the<br />
“Posterior Sampling Information” table, which provides basic information about the posterior sampling<br />
analysis, including the prior densities, sampling algorithm, sample size, and random number<br />
seed. For ODS purposes, the name of this table is “Posterior.”<br />
By default, PROC <strong>MIXED</strong> uses an independence chain algorithm in order to generate the posterior<br />
sample (Tierney 1994). This algorithm works by generating a pseudo-random proposal from a<br />
convenient base distribution, chosen to be as close as possible to the posterior. <strong>The</strong> proposal is then<br />
retained in the sample with probability proportional to the ratio of weights constructed by taking<br />
the ratio of the true posterior to the base density. If a proposal is not accepted, then a duplicate of<br />
the previous observation is added to the chain.
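The acceptance rule can be written compactly in the standard form of the independence chain given by Tierney (1994). It is assumed here that this is the rule that the preceding paragraph paraphrases; the notation is introduced only for this illustration. Let $f$ denote the posterior density, $g$ the base density, and $w(\theta) = f(\theta)/g(\theta)$ the weight. With current state $\theta^{(t)}$ and a proposal $\theta^{*}$ drawn from $g$, the acceptance probability is

```latex
\[
\alpha\bigl(\theta^{(t)}, \theta^{*}\bigr)
  = \min\left\{ 1, \frac{w(\theta^{*})}{w(\theta^{(t)})} \right\},
\qquad
w(\theta) = \frac{f(\theta)}{g(\theta)} .
\]
```

If the proposal is accepted, $\theta^{(t+1)} = \theta^{*}$; otherwise $\theta^{(t+1)} = \theta^{(t)}$, which is the duplicate observation mentioned above. The closer $g$ is to $f$, the closer the weights are to a constant and the higher the acceptance rate.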
In selecting the base distribution, PROC <strong>MIXED</strong> makes use of the fact that the fixed-effects parameters<br />
can be analytically integrated out of the joint posterior, leaving the marginal posterior density<br />
of the variance components. In order to better approximate the marginal posterior density of the<br />
variance components, PROC <strong>MIXED</strong> transforms them by using the MIVQUE(0) equations. You<br />
can display the selected transformation with the PTRANS option or specify your own with the<br />
TDATA= option. <strong>The</strong> density of the transformed parameters is then approximated by a product of<br />
inverted gamma densities (see Gelfand et al. 1990).<br />
To determine the parameters for the inverted gamma densities, PROC <strong>MIXED</strong> evaluates the logarithm<br />
of the posterior density over a grid of points in each of the transformed parameters, and you<br />
can display the results of this search with the PSEARCH option. PROC <strong>MIXED</strong> then performs a<br />
linear regression of these values on the logarithm of the inverted gamma density. <strong>The</strong> resulting base<br />
densities are displayed in the “Base Densities” table; for ODS purposes, the name of this table is<br />
“BaseDen.” You can input different base densities with the BDATA= option.<br />
At the end of the sampling, the “Acceptance Rates” table displays the acceptance rate computed<br />
as the number of accepted samples divided by the total number of samples generated. For ODS<br />
purposes, the name of the “Acceptance Rates” table is “AcceptanceRates.”<br />
<strong>The</strong> OUT= option specifies the output data set containing the posterior sample. PROC <strong>MIXED</strong> automatically<br />
includes all variance component parameters in this data set (labeled COVP1–COVPn),<br />
the Type 3 F statistics constructed as in Ghosh (1992) discussing Schervish (1992) (labeled T3Fn),<br />
the log values of the posterior (labeled LOGF), the log of the base sampling density (labeled<br />
LOGG), and the log of their ratio (labeled LOGRATIO). If you specify the SOLUTION option<br />
in the MODEL statement, the data set also contains a random sample from the posterior density<br />
of the fixed-effects parameters (labeled BETAn); and if you specify the SOLUTION option in<br />
the RANDOM statement, the data set also contains a random sample from the posterior density of the<br />
random-effects parameters (labeled GAMn). PROC <strong>MIXED</strong> also generates additional variables<br />
corresponding to any CONTRAST, ESTIMATE, or LSMEANS statement that you specify.<br />
Subsequently, you can use <strong>SAS</strong>/INSIGHT or the UNIVARIATE, CAPABILITY, or KDE procedure<br />
to analyze the posterior sample.<br />
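A minimal sketch of a posterior analysis for a one-way variance component model follows. The data set Work.Batches and the variables Y and Batch are hypothetical. NSAMPLE= and OUT= are the PRIOR statement options discussed in this section. The SEED= option sets the random number seed that is reported in the “Posterior Sampling Information” table, and its value here is arbitrary.

```sas
/* Hypothetical one-way random-effects data: response Y, grouping variable Batch */
proc mixed data=work.batches;
   class batch;
   model y = / solution;
   random batch;                      /* variance component model */
   prior jeffreys / nsample=10000 seed=12345 out=work.post;
run;

/* Summarize the sampled variance components (Batch and residual) */
proc univariate data=work.post;
   var covp1 covp2;
run;
```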
<strong>The</strong> prior density of the variance components is, by default, a noninformative version of Jeffreys’<br />
prior (Box and Tiao 1973). You can also specify informative priors with the DATA= option or a<br />
flat (equal to 1) prior for the variance components. <strong>The</strong> prior density of the fixed-effects parameters<br />
is assumed to be flat (equal to 1), and the resulting posterior is conditionally multivariate normal<br />
(conditioning on the variance component parameters) with mean .X 0 V 1 X/ X 0 V 1 y and variance<br />
.X 0 V 1 X/ .<br />
<strong>The</strong> distribution argument in the PRIOR statement determines the prior density for the variance<br />
component parameters of your mixed model. Valid values are as follows.<br />
DATA=<br />
enables you to input the prior densities of the variance components used by the sampling<br />
algorithm. This data set must contain the Type and Parm1–Parmn variables, where n is the<br />
largest number of parameters among each of the base densities. <strong>The</strong> format of the DATA=<br />
data set matches that created by PROC <strong>MIXED</strong> in the “Base Densities” table, so you can<br />
output the densities from one run and use them as input for a subsequent run.
JEFFREYS<br />
specifies a noninformative reference version of Jeffreys’ prior constructed by using the square<br />
root of the determinant of the expected information matrix as in (1.3.92) of Box and Tiao<br />
(1973). This is the default prior.<br />
FLAT<br />
specifies a prior density equal to 1 everywhere, making the likelihood function the posterior.<br />
You can specify the following options in the PRIOR statement after a slash (/).<br />
ALG=IC | INDCHAIN<br />
ALG=IS | IMPSAMP<br />
ALG=RS | REJSAMP<br />
ALG=RWC | RWCHAIN<br />
specifies the algorithm used for generating the posterior sample. <strong>The</strong> ALG=IC option requests<br />
an independence chain algorithm, and it is the default. <strong>The</strong> option ALG=IS requests<br />
importance sampling, ALG=RS requests rejection sampling, and ALG=RWC requests a random<br />
walk chain. For more information about these techniques, see Ripley (1987), Smith and<br />
Gelfand (1992), and Tierney (1994).<br />
BDATA=<br />
enables you to input the base densities used by the sampling algorithm. This data set must<br />
contain the Type and Parm1–Parmn variables, where n is the largest number of parameters<br />
among each of the base densities. <strong>The</strong> format of the BDATA= data set matches that created<br />
by PROC <strong>MIXED</strong> in the “Base Densities” table, so you can output the densities from one run<br />
and use them as input for a subsequent run.<br />
GRID=(value-list)<br />
specifies a grid of values over which to evaluate the posterior density. <strong>The</strong> value-list syntax is<br />
the same as in the PARMS statement, and you must specify an output data set name with the<br />
OUTG= option.<br />
GRIDT=(value-list)<br />
specifies a transformed grid of values over which to evaluate the posterior density. <strong>The</strong><br />
value-list syntax is the same as in the PARMS statement, and you must specify an output data set<br />
name with the OUTGT= option.<br />
IFACTOR=number<br />
is an alias for the SFACTOR= option.<br />
LOGNOTE=number<br />
instructs PROC <strong>MIXED</strong> to write a note to the <strong>SAS</strong> log after it generates the sample corresponding<br />
to each multiple of number. This is useful for monitoring the progress of CPU-intensive<br />
runs.<br />
LOGRBOUND=number<br />
specifies the bounding constant for rejection sampling. <strong>The</strong> value of number equals the maximum<br />
of $\log(f/g)$ over the variance component parameter space, where $f$ is the posterior<br />
density and $g$ is the product of inverted gamma densities used to perform rejection sampling.<br />
When performing the rejection sampling, you might encounter the following message:
WARNING: <strong>The</strong> log ratio bound of LL was violated at sample XX.<br />
When this occurs, PROC <strong>MIXED</strong> reruns an optimization algorithm to determine a new log<br />
upper bound and then restarts the rejection sampling. <strong>The</strong> resulting OUT= data set contains<br />
all observations that have been generated; therefore, assuming that you have requested N<br />
samples, you should retain only the final N observations in this data set for analysis purposes.<br />
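As a minimal sketch of that trimming step, the following DATA step keeps only the last N observations of the OUT= data set.<br />
<strong>The</strong> data set names post and post_final and the value N = 1000 are assumptions made for illustration.<br />

```sas
/* Assumption: the PRIOR statement wrote its sample to OUT=post */
/* and NSAMPLE=1000 was requested.                               */
%let n = 1000;

data post_final;
   /* NOBS= stores the total number of observations in POST */
   set post nobs=total;
   /* subsetting IF: keep only the final &n observations */
   if _n_ > total - &n;
run;
```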
NSAMPLE=number<br />
specifies the number of posterior samples to generate. <strong>The</strong> default is 1000, but more accurate<br />
results are obtained with larger samples such as 10000.<br />
NSEARCH=number<br />
specifies the number of posterior evaluations PROC <strong>MIXED</strong> makes for each transformed<br />
parameter in determining the parameters for the inverted gamma densities. <strong>The</strong> default is 20.<br />
OUT=<strong>SAS</strong>-data-set<br />
creates an output data set containing the sample from the posterior density.<br />
OUTG=<strong>SAS</strong>-data-set<br />
creates an output data set from the grid evaluations specified in the GRID= option.<br />
OUTGT=<strong>SAS</strong>-data-set<br />
creates an output data set from the transformed grid evaluations specified in the GRIDT=<br />
option.<br />
PSEARCH<br />
displays the search used to determine the parameters for the inverted gamma densities. For<br />
ODS purposes, the name of the table is “Search.”<br />
PTRANS<br />
displays the transformation of the variance components. For ODS purposes, the name of the<br />
table is “Trans.”<br />
SEED=number<br />
specifies an integer used to start the pseudo-random number generator for the simulation. If<br />
you do not specify a seed, or if you specify a value less than or equal to zero, the seed is by<br />
default generated from reading the time of day from the computer clock. You should use a<br />
positive seed (less than $2^{31} - 1$) whenever you want to duplicate the sample in another run of<br />
PROC <strong>MIXED</strong>.<br />
SFACTOR=number<br />
enables you to adjust the range over which PROC <strong>MIXED</strong> searches the transformed parameters<br />
in order to determine the parameters for the inverted gamma densities. PROC <strong>MIXED</strong><br />
determines the range by first transforming the estimates from the standard PROC <strong>MIXED</strong><br />
analysis (REML, ML, or MIVQUE0, depending upon which estimation method you select).<br />
It then multiplies and divides the transformed estimates by $2^{\mathit{number}}$ to obtain upper and<br />
lower bounds, respectively. Transformed values that produce negative variance components<br />
in the original scale are not included in the search. <strong>The</strong> default value is 1; number must be<br />
greater than 0.5.
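<br />
To make the search range concrete, write $\hat{t}$ for one transformed estimate (a symbol introduced here only for illustration) and read the factor as 2 raised to the power number.<br />
Assuming $\hat{t} > 0$, the bounds can then be sketched as follows:<br />

```latex
\[
  \text{lower} = \frac{\hat{t}}{2^{\mathit{number}}},
  \qquad
  \text{upper} = \hat{t} \cdot 2^{\mathit{number}}
\]
% With the default number = 1, the search covers [\hat{t}/2, \; 2\hat{t}].
```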
TDATA=<br />
enables you to input the transformation of the covariance parameters used by the sampling<br />
algorithm. This data set should contain the CovP1–CovPn variables. <strong>The</strong> format of the<br />
TDATA= data set matches that created by PROC <strong>MIXED</strong> in the “Trans” table, so you can<br />
output the transformation from one run and use it as input for a subsequent run.<br />
TRANS=EXPECTED<br />
TRANS=MIVQUE0<br />
TRANS=OBSERVED<br />
specifies the particular algorithm used to determine the transformation of the covariance parameters.<br />
<strong>The</strong> default is MIVQUE0, indicating a transformation based on the MIVQUE(0)<br />
equations. <strong>The</strong> other two options indicate the type of Hessian matrix used in constructing the<br />
transformation via a Cholesky root.<br />
UPDATE=number<br />
is an alias for the LOGNOTE= option.<br />
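<strong>The</strong> following sketch shows how several of these options might be combined in one PRIOR statement.<br />
<strong>The</strong> data set assay and the variables batch and y are hypothetical; the seed is an arbitrary positive integer.<br />

```sas
/* Hypothetical data: response y measured in several batches */
proc mixed data=assay;
   class batch;
   model y = ;                 /* intercept-only fixed effects */
   random batch;               /* variance component model     */
   /* Jeffreys' prior and the independence chain are the defaults; */
   /* they are written out here only to make the choices visible.  */
   prior jeffreys / alg=ic nsample=10000 seed=20080301
                    lognote=1000 out=post;
run;
```

With a positive SEED= value, rerunning the step should reproduce the same posterior sample in the data set post.<br />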
RANDOM Statement<br />
RANDOM random-effects < / options > ;<br />
<strong>The</strong> RANDOM statement defines the random effects constituting the $\gamma$ vector in the mixed model.<br />
It can be used to specify traditional variance component models (as in the VARCOMP procedure)<br />
and to specify random coefficients. <strong>The</strong> random effects can be classification or continuous, and<br />
multiple RANDOM statements are possible.<br />
Using notation from the section “Mixed Models <strong>The</strong>ory” on page 3962, the purpose of the RANDOM<br />
statement is to define the Z matrix of the mixed model, the random effects in the $\gamma$ vector,<br />
and the structure of G. <strong>The</strong> Z matrix is constructed exactly as the X matrix for the fixed effects,<br />
and the G matrix is constructed to correspond with the effects constituting Z. <strong>The</strong> structure of G is<br />
defined by using the TYPE= option.<br />
You can specify INTERCEPT (or INT) as a random effect to indicate the intercept. PROC <strong>MIXED</strong><br />
does not include the intercept in the RANDOM statement by default as it does in the MODEL<br />
statement.<br />
Table 56.11 summarizes important options in the RANDOM statement. All options are subsequently<br />
discussed in alphabetical order.<br />
Table 56.11 Summary of Important RANDOM Statement Options<br />
Option Description<br />
Construction of Covariance Structure<br />
GDATA= requests that the G matrix be read from a <strong>SAS</strong> data set<br />
GROUP= varies covariance parameters by groups<br />
LDATA= specifies data set with coefficient matrices for TYPE= LIN<br />
NOFULLZ eliminates columns in Z corresponding to missing values
RATIOS indicates that ratios are specified in the GDATA= data set<br />
SUBJECT= identifies the subjects in the model<br />
TYPE= specifies the covariance structure<br />
Statistical Output<br />
ALPHA=$\alpha$ determines the confidence level ($1 - \alpha$)<br />
CL requests confidence limits for predictors of random effects<br />
G displays the estimated G matrix<br />
GC displays the Cholesky root (lower) of estimated G matrix<br />
GCI displays the inverse Cholesky root (lower) of estimated G matrix<br />
GCORR displays the correlation matrix corresponding to estimated G matrix<br />
GI displays the inverse of the estimated G matrix<br />
SOLUTION displays solutions $\hat{\gamma}$ of the G-side random effects<br />
V displays blocks of the estimated V matrix<br />
VC displays the lower-triangular Cholesky root of blocks of the estimated<br />
V matrix<br />
VCI displays the inverse Cholesky root of blocks of the estimated V<br />
matrix<br />
VCORR displays the correlation matrix corresponding to blocks of the estimated<br />
V matrix<br />
VI displays the inverse of the blocks of the estimated V matrix<br />
You can specify the following options in the RANDOM statement after a slash (/).<br />
ALPHA=number<br />
requests that a t-type confidence interval be constructed for each of the random-effect estimates<br />
with confidence level $1 - \mathit{number}$. <strong>The</strong> value of number must be between 0 and 1; the<br />
default is 0.05.<br />
CL<br />
requests that t-type confidence limits be constructed for each of the random-effect estimates.<br />
<strong>The</strong> confidence level is 0.95 by default; this can be changed with the ALPHA= option.<br />
G<br />
requests that the estimated G matrix be displayed. PROC <strong>MIXED</strong> displays blanks for values<br />
that are 0. If you specify the SUBJECT= option, then the block of the G matrix corresponding<br />
to the first subject is displayed. For ODS purposes, the name of the table is “G.”<br />
GC<br />
displays the lower-triangular Cholesky root of the estimated G matrix according to the rules<br />
listed under the G option. For ODS purposes, the name of the table is “CholG.”<br />
GCI<br />
displays the inverse Cholesky root of the estimated G matrix according to the rules listed<br />
under the G option. For ODS purposes, the name of the table is “InvCholG.”
GCORR<br />
displays the correlation matrix corresponding to the estimated G matrix according to the rules<br />
listed under the G option. For ODS purposes, the name of the table is “GCorr.”<br />
GDATA=<strong>SAS</strong>-data-set<br />
requests that the G matrix be read in from a <strong>SAS</strong> data set. This G matrix is assumed to<br />
be known; therefore, only R-side parameters from effects in the REPEATED statement are<br />
included in the Newton-Raphson iterations. If no REPEATED statement is specified, then<br />
only a residual variance is estimated.<br />
<strong>The</strong> information in the GDATA= data set can appear in one of two ways. <strong>The</strong> first is a<br />
sparse representation for which you include Row, Col, and Value variables to indicate the row,<br />
column, and value of G, respectively. All unspecified locations are assumed to be 0. <strong>The</strong><br />
second representation is for dense matrices. In it you include Row and Col1–Coln variables to<br />
indicate, respectively, the row and columns of G, which is a symmetric matrix of order n. For<br />
both representations, you must specify effects in the RANDOM statement that generate a Z<br />
matrix that contains n columns. (See Example 56.4.)<br />
If you have more than one RANDOM statement, only one GDATA= option is required in any<br />
one of them, and the data set you specify must contain the entire G matrix defined by all of<br />
the RANDOM statements.<br />
If the GDATA= data set contains variance ratios instead of the variances themselves, then use<br />
the RATIOS option.<br />
Known parameters of G can also be input by using the PARMS statement with the HOLD=<br />
option.<br />
GI<br />
displays the inverse of the estimated G matrix according to the rules listed under the G option.<br />
For ODS purposes, the name of the table is “InvG.”<br />
GROUP=effect<br />
GRP=effect<br />
defines an effect specifying heterogeneity in the covariance structure of G. All observations<br />
having the same level of the group effect have the same covariance parameters. Each new<br />
level of the group effect produces a new set of covariance parameters with the same structure<br />
as the original group. You should exercise caution in defining the group effect, because<br />
strange covariance patterns can result from its misuse. Also, the group effect can greatly<br />
increase the number of estimated covariance parameters, which can adversely affect the optimization<br />
process.<br />
Continuous variables are permitted as arguments to the GROUP= option. PROC <strong>MIXED</strong><br />
does not sort by the values of the continuous variable; rather, it considers the data to be<br />
from a new subject or group whenever the value of the continuous variable changes from the<br />
previous observation. Using a continuous variable decreases execution time for models with<br />
a large number of subjects or groups and also prevents the production of a large “Class Level<br />
Information” table.<br />
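As an illustration of heterogeneity in G, the following sketch estimates a separate random-intercept variance for each level of a classification variable.<br />
<strong>The</strong> data set growth and the variables person, gender, age, and y are hypothetical.<br />

```sas
proc mixed data=growth;
   class person gender;
   model y = gender age gender*age;
   /* one random-intercept variance per level of gender */
   random intercept / subject=person group=gender;
run;
```

Each additional level of gender adds another set of covariance parameters, so the number of parameters grows with the number of groups.<br />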
LDATA=<strong>SAS</strong>-data-set<br />
reads the coefficient matrices associated with the TYPE=LIN(number) option. <strong>The</strong> data set
must contain the variables Parm, Row, Col1–Coln or Parm, Row, Col, Value. <strong>The</strong> Parm variable<br />
denotes which of the number coefficient matrices is currently being constructed, and the Row,<br />
Col1–Coln, or Row, Col, Value variables specify the matrix values, as they do with the GDATA=<br />
option. Unspecified values of these matrices are set equal to 0.<br />
NOFULLZ<br />
eliminates the columns in Z corresponding to missing levels of random effects involving<br />
CLASS variables. By default, these columns are included in Z.<br />
RATIOS<br />
indicates that ratios with the residual variance are specified in the GDATA= data set instead of<br />
the covariance parameters themselves. <strong>The</strong> default GDATA= data set contains the individual<br />
covariance parameters.<br />
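<strong>The</strong> following sketch shows one way a sparse GDATA= data set might be built and used.<br />
<strong>The</strong> data set names gmat and trial, the variables block, x, and y, and the numeric values are all assumptions; block is assumed to have exactly two levels so that Z has two columns.<br />

```sas
/* Hypothetical known 2 x 2 diagonal G matrix in sparse form: */
/* only nonzero entries are listed, all others default to 0   */
data gmat;
   input Row Col Value;
   datalines;
1 1 4.0
2 2 4.0
;

proc mixed data=trial;
   class block;                 /* assumed to have two levels */
   model y = x;
   /* G is read from gmat and treated as known */
   random block / gdata=gmat;
run;
```

If the Value column held ratios to the residual variance rather than variances, the RATIOS option would be added after GDATA=gmat.<br />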
SOLUTION<br />
S<br />
requests that the solution for the random-effects parameters be produced. Using notation<br />
from the section “Mixed Models <strong>The</strong>ory” on page 3962, these estimates are the empirical<br />
best linear unbiased predictors (EBLUPs) $\hat{\gamma} = \hat{G} Z' \hat{V}^{-1} (y - X\hat{\beta})$. <strong>The</strong>y can be useful for<br />
comparing the random effects from different experimental units and can also be treated as<br />
residuals in performing diagnostics for your mixed model.<br />
<strong>The</strong> numbers displayed in the SE Pred column of the “Solution for Random Effects” table<br />
are not the standard errors of the $\hat{\gamma}$ displayed in the Estimate column; rather, they are the<br />
standard errors of predictions $\hat{\gamma}_i - \gamma_i$, where $\hat{\gamma}_i$ is the $i$th EBLUP and $\gamma_i$ is the $i$th random-effect<br />
parameter.<br />
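A minimal sketch of requesting the EBLUPs and saving them for diagnostics follows.<br />
<strong>The</strong> data set growth, the variables person, age, and y, and the output data set name eblup are hypothetical.<br />

```sas
/* Save the "Solution for Random Effects" table as a data set */
ods output SolutionR=eblup;

proc mixed data=growth;
   class person;
   model y = age;
   /* SOLUTION prints the EBLUPs; CL with ALPHA=0.1 adds 90% limits */
   random intercept / subject=person solution cl alpha=0.1;
run;
```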
SUBJECT=effect<br />
SUB=effect<br />
identifies the subjects in your mixed model. Complete independence is assumed across subjects;<br />
thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal<br />
structure in G with identical blocks. <strong>The</strong> Z matrix is modified to accommodate this block<br />
diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the<br />
RANDOM statement within the subject effect.<br />
Continuous variables are permitted as arguments to the SUBJECT= option. PROC <strong>MIXED</strong><br />
does not sort by the values of the continuous variable; rather, it considers the data to be<br />
from a new subject or group whenever the value of the continuous variable changes from the<br />
previous observation. Using a continuous variable decreases execution time for models with<br />
a large number of subjects or groups and also prevents the production of a large “Class Level<br />
Information” table.<br />
When you specify the SUBJECT= option and a classification random effect, computations<br />
are usually much quicker if the levels of the random effect are duplicated within each level of<br />
the SUBJECT= effect.<br />
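<strong>The</strong> nesting equivalence described above can be sketched with two specifications of the same random effects.<br />
<strong>The</strong> data set trial and the variables block, trt, and y are hypothetical.<br />

```sas
/* Form 1: subject effect; G is block diagonal, one block per BLOCK level */
proc mixed data=trial;
   class block trt;
   model y = trt;
   random intercept trt / subject=block;
run;

/* Form 2: the same random effects written as nested effects */
proc mixed data=trial;
   class block trt;
   model y = trt;
   random block trt(block);
run;
```

Both forms describe the same variance components; the first lets PROC <strong>MIXED</strong> process the data by subjects.<br />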
TYPE=covariance-structure<br />
specifies the covariance structure of G. Valid values for covariance-structure and their descriptions<br />
are listed in Table 56.13 and Table 56.14. Although a variety of structures are
available, most applications call for either TYPE=VC or TYPE=UN. <strong>The</strong> TYPE=VC (variance<br />
components) option is the default structure, and it models a different variance component<br />
for each random effect.<br />
<strong>The</strong> TYPE=UN (unstructured) option is useful for correlated random coefficient models. For<br />
example, the following statement specifies a random intercept-slope model that has different<br />
variances for the intercept and slope and a covariance between them:<br />
random intercept age / type=un subject=person;<br />
You can also use TYPE=FA0(2) here to request a G estimate that is constrained to be nonnegative<br />
definite.<br />
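Placed in a complete step, the FA0(2) variant of the random intercept-slope model might look like the following sketch.<br />
<strong>The</strong> data set growth and the variables person, age, and y are hypothetical.<br />

```sas
proc mixed data=growth;
   class person;
   model y = age / solution;
   /* FA0(2) constrains the G estimate to be nonnegative definite; */
   /* G and GCORR display the estimate and its correlation form    */
   random intercept age / type=fa0(2) subject=person g gcorr;
run;
```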
If you are constructing your own columns of Z with continuous variables, you can use the<br />
TYPE=TOEP(1) structure to group them together to have a common variance component. If<br />
you want to have different covariance structures in different parts of G, you must use multiple<br />
RANDOM statements with different TYPE= options.<br />
V< =value-list ><br />
requests that blocks of the estimated V matrix be displayed. <strong>The</strong> first block determined by<br />
the SUBJECT= effect is the default displayed block. PROC <strong>MIXED</strong> displays entries that are<br />
0 as blanks in the table.<br />
You can optionally use the value-list specification, which indicates the subjects for which<br />
blocks of V are to be displayed. For example, the following statement displays block matrices<br />
for the first, third, and seventh persons:<br />
random int time / type=un subject=person v=1,3,7;<br />
<strong>The</strong> table name for ODS purposes is “V.”<br />
VC< =value-list ><br />
displays the Cholesky root of the blocks of the estimated V matrix. <strong>The</strong> value-list specification<br />
is the same as in the V= option. <strong>The</strong> table name for ODS purposes is “CholV.”<br />
VCI< =value-list ><br />
displays the inverse of the Cholesky root of the blocks of the estimated V matrix. <strong>The</strong><br />
value-list specification is the same as in the V= option. <strong>The</strong> table name for ODS purposes is<br />
“InvCholV.”<br />
VCORR< =value-list ><br />
displays the correlation matrix corresponding to the blocks of the estimated V matrix. <strong>The</strong><br />
value-list specification is the same as in the V= option. <strong>The</strong> table name for ODS purposes is<br />
“VCorr.”<br />
VI< =value-list ><br />
displays the inverse of the blocks of the estimated V matrix. <strong>The</strong> value-list specification is<br />
the same as in the V= option. <strong>The</strong> table name for ODS purposes is “InvV.”
REPEATED Statement<br />
REPEATED < repeated-effect >< / options > ;<br />
<strong>The</strong> REPEATED statement is used to specify the R matrix in the mixed model. Its syntax is different<br />
from that of the REPEATED statement in PROC GLM. If no REPEATED statement is specified, R<br />
is assumed to be equal to $\sigma^2 I$.<br />
For many repeated measures models, no repeated effect is required in the REPEATED statement.<br />
Simply use the SUBJECT= option to define the blocks of R and the TYPE= option to define their<br />
covariance structure. In this case, the repeated measures data must be similarly ordered for each<br />
subject, and you must indicate all missing response variables with periods in the input data set unless<br />
they all fall at the end of a subject’s repeated response profile. <strong>The</strong>se requirements are necessary in<br />
order to inform PROC <strong>MIXED</strong> of the proper location of the observed repeated responses.<br />
Specifying a repeated effect is useful when you do not want to indicate missing values with periods<br />
in the input data set. <strong>The</strong> repeated effect must contain only classification variables. Make sure that<br />
the levels of the repeated effect are different for each observation within a subject; otherwise, PROC<br />
<strong>MIXED</strong> constructs identical rows in R corresponding to the observations with the same level. This<br />
results in a singular R and an infinite likelihood.<br />
Whether you specify a REPEATED effect or not, the rows of R for each subject are constructed in<br />
the order in which they appear in the input data set.<br />
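<strong>The</strong> two ways of specifying a repeated measures structure can be sketched as follows.<br />
<strong>The</strong> data set growth and the variables person, time, and y are hypothetical.<br />

```sas
/* With a repeated effect: TIME identifies each measurement's position, */
/* so missing time points need not appear as records with periods       */
proc mixed data=growth;
   class person time;
   model y = time;
   repeated time / type=ar(1) subject=person;
run;

/* Without a repeated effect: records must be ordered alike within */
/* each person, with missing responses entered as periods          */
proc mixed data=growth;
   class person time;
   model y = time;
   repeated / type=ar(1) subject=person;
run;
```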
Table 56.12 summarizes important options in the REPEATED statement. All options are subsequently<br />
discussed in alphabetical order.<br />
Table 56.12 Summary of Important REPEATED Statement Options<br />
Option Description<br />
Construction of Covariance Structure<br />
GROUP= defines an effect specifying heterogeneity in the R-side covariance<br />
structure<br />
LDATA= specifies data set with coefficient matrices for TYPE= LIN<br />
LOCAL requests that a diagonal matrix be added to R<br />
LOCALW specifies that only the local effects are weighted<br />
NONLOCALW specifies that only the nonlocal effects are weighted<br />
SUBJECT= identifies the subjects in the R-side model<br />
TYPE= specifies the R-side covariance structure<br />
Statistical Output<br />
HLM produces a table of Hotelling-Lawley-McKeon statistics (McKeon<br />
1974)<br />
HLPS produces a table of Hotelling-Lawley-Pillai-Samson statistics (Pillai<br />
and Samson 1959)<br />
R displays blocks of the estimated R matrix<br />
RC displays the Cholesky root (lower) of blocks of the estimated R<br />
matrix
RCI displays the inverse Cholesky root (lower) of blocks of the estimated<br />
R matrix<br />
RCORR displays the correlation matrix corresponding to blocks of the estimated<br />
R matrix<br />
RI displays the inverse of blocks of the estimated R matrix<br />
You can specify the following options in the REPEATED statement after a slash (/).<br />
GROUP=effect<br />
GRP=effect<br />
defines an effect specifying heterogeneity in the covariance structure of R. All observations<br />
having the same level of the GROUP effect have the same covariance parameters. Each<br />
new level of the GROUP effect produces a new set of covariance parameters with the same<br />
structure as the original group. You should exercise caution in defining the GROUP<br />
effect, because strange covariance patterns can result from its misuse. Also, the GROUP effect<br />
can greatly increase the number of estimated covariance parameters, which can adversely<br />
affect the optimization process.<br />
Continuous variables are permitted as arguments to the GROUP= option. PROC <strong>MIXED</strong><br />
does not sort by the values of the continuous variable; rather, it considers the data to be<br />
from a new subject or group whenever the value of the continuous variable changes from the<br />
previous observation. Using a continuous variable decreases execution time for models with<br />
a large number of subjects or groups and also prevents the production of a large “Class Level<br />
Information” table.<br />
HLM<br />
produces a table of Hotelling-Lawley-McKeon statistics (McKeon 1974) for all fixed effects<br />
whose levels change across data having the same level of the SUBJECT= effect (the<br />
within-subject fixed effects). This option applies only when you specify a REPEATED statement<br />
with the TYPE=UN option and no RANDOM statements. For balanced data, this model is<br />
equivalent to the multivariate model for repeated measures in PROC GLM.<br />
<strong>The</strong> Hotelling-Lawley-McKeon statistic has a slightly better F approximation than the<br />
Hotelling-Lawley-Pillai-Samson statistic (see the description of the HLPS option, which follows).<br />
Both of the Hotelling-Lawley statistics can perform much better in small samples than<br />
the default F statistic (Wright 1994).<br />
Separate tables are produced for Type 1, 2, and 3 tests, according to the ones you select. For<br />
ODS purposes, the table names are “HLM1,” “HLM2,” and “HLM3,” respectively.<br />
HLPS<br />
produces a table of Hotelling-Lawley-Pillai-Samson statistics (Pillai and Samson 1959) for all<br />
fixed effects whose levels change across data having the same level of the SUBJECT= effect<br />
(the within-subject fixed effects). This option applies only when you specify a REPEATED<br />
statement with the TYPE=UN option and no RANDOM statements. For balanced data, this
model is equivalent to the multivariate model for repeated measures in PROC GLM, and this<br />
statistic is the same as the Hotelling-Lawley Trace statistic produced by PROC GLM.<br />
Separate tables are produced for Type 1, 2, and 3 tests, according to the ones you select. For<br />
ODS purposes, the table names are “HLPS1,” “HLPS2,” and “HLPS3,” respectively.<br />
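A sketch of a step that satisfies the conditions for both statistics (TYPE=UN in the REPEATED statement and no RANDOM statement) follows.<br />
<strong>The</strong> data set growth and the variables person, time, trt, and y are hypothetical.<br />

```sas
proc mixed data=growth;
   class person time trt;
   model y = trt time trt*time;
   /* unstructured R within person; HLM and HLPS add their tables */
   /* for the within-subject fixed effects (time and trt*time)    */
   repeated time / type=un subject=person hlm hlps;
run;
```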
LDATA=<strong>SAS</strong>-data-set<br />
reads the coefficient matrices associated with the TYPE=LIN(number) option. <strong>The</strong> data set<br />
must contain the variables Parm, Row, Col1–Coln or Parm, Row, Col, Value. <strong>The</strong> Parm variable<br />
denotes which of the number coefficient matrices is currently being constructed, and the<br />
Row, Col1–Coln, or Row, Col, Value variables specify the matrix values, as they do with the<br />
RANDOM statement option GDATA=. Unspecified values of these matrices are set equal to<br />
0.<br />
LOCAL<br />
LOCAL=EXP(effects)<br />
LOCAL=POM(POM-data-set)<br />
requests that a diagonal matrix be added to R. With just the LOCAL option, this diagonal<br />
matrix equals $\sigma^2 I$, and $\sigma^2$ becomes an additional variance parameter that PROC <strong>MIXED</strong><br />
profiles out of the likelihood provided that you do not specify the NOPROFILE option in the<br />
PROC <strong>MIXED</strong> statement. <strong>The</strong> LOCAL option is useful if you want to add an observational<br />
error to a time series structure (Jones and Boadi-Boateng 1991) or a nugget effect to a spatial<br />
structure (Cressie 1993).<br />
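As a sketch of the time series use case, the following step adds an observational-error term to an AR(1) structure.<br />
<strong>The</strong> data set growth and the variables person, time, and y are hypothetical.<br />

```sas
proc mixed data=growth;
   class person time;
   model y = time;
   /* AR(1) within person, plus a diagonal matrix added to R by LOCAL */
   repeated time / type=ar(1) subject=person local;
run;
```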
<strong>The</strong> LOCAL=EXP(effects) option produces exponential local effects, also known as dispersion<br />
effects, in a log-linear variance model. <strong>The</strong>se local effects have the form<br />
$\sigma^2 \operatorname{diag}[\exp(U\delta)]$<br />
where U is the full-rank design matrix corresponding to the effects that you specify and $\delta$ are<br />
the parameters that PROC <strong>MIXED</strong> estimates. An intercept is not included in U because it is<br />
accounted for by $\sigma^2$. PROC <strong>MIXED</strong> constructs the full-rank U in terms of 1s and $-1$s for<br />
classification effects. Be sure to scale continuous effects in U sensibly.<br />
<strong>The</strong> LOCAL=POM(POM-data-set) option specifies the power-of-the-mean structure. This<br />
structure possesses a variance of the form $\sigma^2 |x_i' \beta^*|^{\theta}$ for the $i$th observation, where $x_i$ is the<br />
$i$th row of X (the design matrix of the fixed effects) and $\beta^*$ is an estimate of the fixed-effects<br />
parameters that you specify in POM-data-set.<br />
<strong>The</strong> <strong>SAS</strong> data set specified by POM-data-set contains the numeric variable Estimate (in previous<br />
releases, the variable name was required to be EST), and it has at least as many observations<br />
as there are fixed-effects parameters. <strong>The</strong> first p observations of the Estimate variable<br />
in POM-data-set are taken to be the elements of $\beta^*$, where p is the number of columns of<br />
X. You must order these observations according to the non-full-rank parameterization of the<br />
<strong>MIXED</strong> procedure. One easy way to set up POM-data-set for a $\beta^*$ corresponding to ordinary<br />
least squares is illustrated by the following statements:<br />
ods output SolutionF=sf;<br />
proc mixed;<br />
class a;<br />
model y = a x / s;<br />
run;
proc mixed;<br />
class a;<br />
model y = a x;<br />
repeated / local=pom(sf);<br />
run;<br />
Note that the generalized least squares estimate of the fixed-effects parameters from the second<br />
PROC <strong>MIXED</strong> step usually is not the same as your specified $\beta^*$. However, you can<br />
iterate the POM fitting until the two estimates agree. Continuing from the previous example,<br />
the statements for performing one step of this iteration are as follows:<br />
ods output SolutionF=sf1;<br />
proc mixed;<br />
class a;<br />
model y = a x / s;<br />
repeated / local=pom(sf);<br />
run;<br />
proc compare brief data=sf compare=sf1;<br />
var estimate;<br />
run;<br />
data sf;<br />
set sf1;<br />
run;<br />
Unfortunately, this iterative process does not always converge. For further details, refer to the<br />
description of pseudo-likelihood in Chapter 3 of Carroll and Ruppert (1988).<br />
LOCALW<br />
specifies that only the local effects and no others be weighted. By default, all effects are<br />
weighted. <strong>The</strong> LOCALW option is used in connection with the WEIGHT statement and the<br />
LOCAL option in the REPEATED statement.<br />
NONLOCALW<br />
specifies that only the nonlocal effects and no others be weighted. By default, all effects are<br />
weighted. <strong>The</strong> NONLOCALW option is used in connection with the WEIGHT statement and<br />
the LOCAL option in the REPEATED statement.<br />
R< =value-list ><br />
requests that blocks of the estimated R matrix be displayed. <strong>The</strong> first block determined by<br />
the SUBJECT= effect is the default displayed block. PROC <strong>MIXED</strong> displays blanks for<br />
values that are 0.<br />
<strong>The</strong> value-list indicates the subjects for which blocks of R are to be displayed. For example,<br />
the following statement displays block matrices for the first, third, and fifth persons:<br />
repeated / type=cs subject=person r=1,3,5;<br />
See the PARMS statement for the possible forms of value-list. <strong>The</strong> table name for ODS<br />
purposes is “R.”
RC< =value-list ><br />
produces the Cholesky root of blocks of the estimated R matrix. <strong>The</strong> value-list specification<br />
is the same as with the R option. <strong>The</strong> table name for ODS purposes is “CholR.”<br />
RCI< =value-list ><br />
produces the inverse Cholesky root of blocks of the estimated R matrix. <strong>The</strong> value-list specification<br />
is the same as with the R option. <strong>The</strong> table name for ODS purposes is “InvCholR.”<br />
RCORR< =value-list ><br />
produces the correlation matrix corresponding to blocks of the estimated R matrix. <strong>The</strong><br />
value-list specification is the same as with the R option. <strong>The</strong> table name for ODS purposes is<br />
“RCorr.”<br />
RI< =value-list ><br />
produces the inverse of blocks of the estimated R matrix. <strong>The</strong> value-list specification is the<br />
same as with the R option. <strong>The</strong> table name for ODS purposes is “InvR.”<br />
SSCP<br />
requests that an unstructured R matrix be estimated from the sum-of-squares-and-crossproducts<br />
matrix of the residuals. It applies only when you specify TYPE=UN and<br />
have no RANDOM statements. Also, you must have a sufficient number of subjects for the<br />
estimate to be positive definite.<br />
This option is useful when the size of the blocks of R is large (for example, greater than 10)<br />
and you want to use or inspect an unstructured estimate that is much quicker to compute than<br />
the default REML estimate. <strong>The</strong> two estimates will agree for certain balanced data sets when<br />
you have a classification fixed effect defined across all time points within a subject.<br />
SUBJECT=effect<br />
SUB=effect<br />
identifies the subjects in your mixed model. Complete independence is assumed across subjects;<br />
therefore, the SUBJECT= option produces a block-diagonal structure in R with identical<br />
blocks. When the SUBJECT= effect consists entirely of classification variables, the<br />
blocks of R correspond to observations sharing the same level of that effect. These blocks are<br />
sorted according to this effect as well.<br />
Continuous variables are permitted as arguments to the SUBJECT= option. PROC MIXED<br />
does not sort by the values of the continuous variable; rather, it considers the data to be<br />
from a new subject or group whenever the value of the continuous variable changes from the<br />
previous observation. The input data set must therefore already be grouped by that variable.<br />
Using a continuous variable decreases execution time for models with<br />
a large number of subjects or groups and also prevents the production of a large “Class Level<br />
Information” table.<br />
If you want to model nonzero covariance among all of the observations in your SAS data set,<br />
specify SUBJECT=INTERCEPT to treat the data as if they are all from one subject. However,<br />
be aware that in this case PROC MIXED manipulates an R matrix with dimensions equal to<br />
the number of observations. If no SUBJECT= effect is specified, then every observation is<br />
assumed to be from a different subject and R is assumed to be diagonal. For this reason, you<br />
usually want to use the SUBJECT= option in the REPEATED statement.<br />
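The two ways of naming a SUBJECT= effect can be sketched as follows. This is a minimal illustration, not part of the original text: the data set long and the variables id, visit, and y are hypothetical and are simulated here only so that the statements can run.

```sas
/* Hypothetical data: 20 subjects with 4 visits each.                  */
/* A subject-level shift u induces correlation within each subject.    */
data long;
   call streaminit(2718);
   do id = 1 to 20;
      u = rand('normal');
      do visit = 1 to 4;
         y = 10 + 0.5*visit + u + rand('normal');
         output;
      end;
   end;
   drop u;
run;

/* (a) id is a CLASS variable: blocks of R are formed by level of id. */
proc mixed data=long;
   class id visit;
   model y = visit;
   repeated visit / type=cs subject=id;
run;

/* (b) id is continuous (not in CLASS): a new block starts whenever   */
/*     id changes, so the data are grouped by id beforehand.          */
proc sort data=long;
   by id visit;
run;

proc mixed data=long;
   class visit;
   model y = visit;
   repeated visit / type=cs subject=id;
run;
```

Both calls describe the same block-diagonal R; version (b) omits id from the “Class Level Information” table.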
REPEATED Statement ✦ 3953<br />
TYPE=covariance-structure<br />
specifies the covariance structure of the R matrix. The SUBJECT= option defines the<br />
blocks of R, and the TYPE= option specifies the structure of these blocks. Valid values<br />
for covariance-structure and their descriptions are provided in Table 56.13 and Table 56.14.<br />
The default structure is VC.<br />
Table 56.13 Covariance Structures<br />
Structure | Description | Parms | $(i,j)$th element<br />
ANTE(1) | Ante-dependence | $2t-1$ | $\sigma_i \sigma_j \prod_{k=i}^{j-1} \rho_k$<br />
AR(1) | Autoregressive(1) | $2$ | $\sigma^2 \rho^{\lvert i-j \rvert}$<br />
ARH(1) | Heterogeneous AR(1) | $t+1$ | $\sigma_i \sigma_j \rho^{\lvert i-j \rvert}$<br />
ARMA(1,1) | ARMA(1,1) | $3$ | $\sigma^2 [\gamma \rho^{\lvert i-j \rvert - 1} 1(i \ne j) + 1(i = j)]$<br />
CS | Compound Symmetry | $2$ | $\sigma_1 + \sigma^2 1(i = j)$<br />
CSH | Heterogeneous CS | $t+1$ | $\sigma_i \sigma_j [\rho 1(i \ne j) + 1(i = j)]$<br />
FA(q) | Factor Analytic | $\frac{q}{2}(2t-q+1) + t$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma_i^2 1(i = j)$<br />
FA0(q) | No Diagonal FA | $\frac{q}{2}(2t-q+1)$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk}$<br />
FA1(q) | Equal Diagonal FA | $\frac{q}{2}(2t-q+1) + 1$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma^2 1(i = j)$<br />
HF | Huynh-Feldt | $t+1$ | $(\sigma_i^2 + \sigma_j^2)/2 + \lambda 1(i \ne j)$<br />
LIN(q) | General Linear | $q$ | $\sum_{k=1}^{q} \theta_k A_{ij}$<br />
TOEP | Toeplitz | $t$ | $\sigma_{\lvert i-j \rvert + 1}$<br />
TOEP(q) | Banded Toeplitz | $q$ | $\sigma_{\lvert i-j \rvert + 1} 1(\lvert i-j \rvert < q)$<br />
TOEPH | Heterogeneous TOEP | $2t-1$ | $\sigma_i \sigma_j \rho_{\lvert i-j \rvert}$<br />
TOEPH(q) | Banded Hetero TOEP | $t+q-1$ | $\sigma_i \sigma_j \rho_{\lvert i-j \rvert} 1(\lvert i-j \rvert < q)$<br />
UN | Unstructured | $t(t+1)/2$ | $\sigma_{ij}$<br />
UN(q) | Banded | $\frac{q}{2}(2t-q+1)$ | $\sigma_{ij} 1(\lvert i-j \rvert < q)$<br />
UNR | Unstructured Corrs | $t(t+1)/2$ | $\sigma_i \sigma_j \rho_{\max(i,j)\,\min(i,j)}$<br />
UNR(q) | Banded Correlations | $\frac{q}{2}(2t-q+1)$ | $\sigma_i \sigma_j \rho_{\max(i,j)\,\min(i,j)}$<br />
UN@AR(1) | Direct Product AR(1) | $t_1(t_1+1)/2 + 1$ | $\sigma_{i_1 j_1} \rho^{\lvert i_2 - j_2 \rvert}$<br />
UN@CS | Direct Product CS | $t_1(t_1+1)/2 + 1$ | $\sigma_{i_1 j_1}$ if $i_2 = j_2$; $\sigma^2 \sigma_{i_1 j_1}$ if $i_2 \ne j_2$; with $0 \le \sigma^2 \le 1$<br />
UN@UN | Direct Product UN | $t_1(t_1+1)/2 + t_2(t_2+1)/2 - 1$ | $\sigma_{1,i_1 j_1} \sigma_{2,i_2 j_2}$<br />
VC | Variance Components | $q$ | $\sigma_k^2 1(i = j)$, and $i$ corresponds to $k$th effect<br />
In Table 56.13, “Parms” is the number of covariance parameters in the structure, $t$ is the<br />
overall dimension of the covariance matrix, and $1(A)$ equals 1 when $A$ is true and 0 otherwise.<br />
For example, $1(i = j)$ equals 1 when $i = j$ and 0 otherwise, and $1(\lvert i-j \rvert < q)$ equals<br />
1 when $\lvert i-j \rvert < q$ and 0 otherwise. For the TOEPH structures, $\rho_0 = 1$, and for the UNR<br />
structures, $\rho_{ii} = 1$ for all $i$. For the direct product structures, the subscripts “1” and “2” refer<br />
to the first and second structure in the direct product, respectively, and<br />
$i_1 = \mathrm{int}((i + t_2 - 1)/t_2)$, $j_1 = \mathrm{int}((j + t_2 - 1)/t_2)$,<br />
$i_2 = \mathrm{mod}(i - 1, t_2) + 1$, and $j_2 = \mathrm{mod}(j - 1, t_2) + 1$.<br />
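As a worked illustration of the direct product index mapping, consider a hypothetical UN@AR(1) structure with $t_1 = 2$ and $t_2 = 3$ (these dimensions are chosen only for the example). The element in row $i = 5$ and column $j = 3$ of the $6 \times 6$ matrix is located as follows:

```latex
\begin{align*}
i_1 &= \mathrm{int}\bigl((5 + 3 - 1)/3\bigr) = \mathrm{int}(7/3) = 2, &
i_2 &= \mathrm{mod}(5 - 1,\, 3) + 1 = 2, \\
j_1 &= \mathrm{int}\bigl((3 + 3 - 1)/3\bigr) = \mathrm{int}(5/3) = 1, &
j_2 &= \mathrm{mod}(3 - 1,\, 3) + 1 = 3, \\
\text{element}(5,3) &= \sigma_{i_1 j_1}\, \rho^{\lvert i_2 - j_2 \rvert}
  = \sigma_{21}\, \rho^{\lvert 2 - 3 \rvert} = \sigma_{21}\,\rho .
\end{align*}
```

In words, row 5 is the second time point of the second response and column 3 is the third time point of the first response, so the element combines the cross-response covariance with a lag-1 autocorrelation.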
Table 56.14 Spatial Covariance Structures<br />
Structure | Description | Parms | $(i,j)$th element<br />
SP(EXP)(c-list) | Exponential | $2$ | $\sigma^2 \exp\{-d_{ij}/\theta\}$<br />
SP(EXPA)(c-list) | Anisotropic Exponential | $2c+1$ | $\sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k\, d(i,j,k)^{p_k}\}$<br />
SP(EXPGA)(c1 c2) | 2D Exponential, Geometrically Anisotropic | $4$ | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)/\rho\}$<br />
SP(GAU)(c-list) | Gaussian | $2$ | $\sigma^2 \exp\{-d_{ij}^2/\rho^2\}$<br />
SP(GAUGA)(c1 c2) | 2D Gaussian, Geometrically Anisotropic | $4$ | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)^2/\rho^2\}$<br />
SP(LIN)(c-list) | Linear | $2$ | $\sigma^2 (1 - \rho d_{ij})\, 1(\rho d_{ij} \le 1)$<br />
SP(LINL)(c-list) | Linear Log | $2$ | $\sigma^2 (1 - \rho \log(d_{ij}))\, 1(\rho \log(d_{ij}) \le 1)$<br />
SP(MATERN)(c-list) | Matérn | $3$ | $\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}}{2\rho}\right)^{\nu} 2K_\nu(d_{ij}/\rho)$<br />
SP(MATHSW)(c-list) | Matérn (Handcock-Stein-Wallis) | $3$ | $\sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}\sqrt{\nu}}{\rho}\right)^{\nu} 2K_\nu\!\left(\frac{2 d_{ij}\sqrt{\nu}}{\rho}\right)$<br />
SP(POW)(c-list) | Power | $2$ | $\sigma^2 \rho^{d_{ij}}$<br />
SP(POWA)(c-list) | Anisotropic Power | $c+1$ | $\sigma^2 \rho_1^{d(i,j,1)} \rho_2^{d(i,j,2)} \cdots \rho_c^{d(i,j,c)}$<br />
SP(SPH)(c-list) | Spherical | $2$ | $\sigma^2 \left[1 - \left(\frac{3 d_{ij}}{2\rho}\right) + \left(\frac{d_{ij}^3}{2\rho^3}\right)\right] 1(d_{ij} \le \rho)$<br />
SP(SPHGA)(c1 c2) | 2D Spherical, Geometrically Anisotropic | $4$ | $\sigma^2 \left[1 - \left(\frac{3 d_{ij}(\theta,\lambda)}{2\rho}\right) + \left(\frac{d_{ij}(\theta,\lambda)^3}{2\rho^3}\right)\right] 1(d_{ij}(\theta,\lambda) \le \rho)$<br />
In Table 56.14, c-list contains the names of the numeric variables used as coordinates of the<br />
location of the observation in space, and $d_{ij}$ is the Euclidean distance between the $i$th and<br />
$j$th vectors of these coordinates, which correspond to the $i$th and $j$th observations in the input<br />
data set. For SP(POWA) and SP(EXPA), $c$ is the number of coordinates, and $d(i,j,k)$ is the<br />
absolute distance between the $k$th coordinate, $k = 1, \ldots, c$, of the $i$th and $j$th observations in<br />
the input data set. For the geometrically anisotropic structures SP(EXPGA), SP(GAUGA),<br />
and SP(SPHGA), exactly two spatial coordinate variables must be specified as c1 and c2.<br />
Geometric anisotropy is corrected by applying a rotation $\theta$ and scaling $\lambda$ to the coordinate<br />
system, and $d_{ij}(\theta,\lambda)$ represents the Euclidean distance between two points in the transformed<br />
space. SP(MATERN) and SP(MATHSW) represent covariance structures in a class defined<br />
by Matérn (see Matérn 1986, Handcock and Stein 1993, Handcock and Wallis 1994). The<br />
function $K_\nu$ is the modified Bessel function of the second kind of (real) order $\nu > 0$; the<br />
parameter $\nu$ governs the smoothness of the process (see below for more details).<br />
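As a quick check of the Matérn parameterization, the following sketch sets $\nu = 1/2$ in the SP(MATERN) covariance $\sigma^2 \frac{1}{\Gamma(\nu)} (d_{ij}/(2\rho))^{\nu}\, 2K_\nu(d_{ij}/\rho)$ and uses the standard identities $K_{1/2}(x) = \sqrt{\pi/(2x)}\, e^{-x}$ and $\Gamma(1/2) = \sqrt{\pi}$. It is an illustration only and is not part of the original text.

```latex
\begin{align*}
\sigma^2 \frac{1}{\Gamma(1/2)} \left(\frac{d_{ij}}{2\rho}\right)^{1/2} 2K_{1/2}\!\left(\frac{d_{ij}}{\rho}\right)
&= \sigma^2 \frac{2}{\sqrt{\pi}} \sqrt{\frac{d_{ij}}{2\rho}}\,
   \sqrt{\frac{\pi \rho}{2 d_{ij}}}\; e^{-d_{ij}/\rho} \\
&= \sigma^2 \frac{2}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2}\; e^{-d_{ij}/\rho}
 = \sigma^2 e^{-d_{ij}/\rho} .
\end{align*}
```

The result has the exponential form, with the Matérn range parameter $\rho$ playing the role of the exponential range parameter.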
Table 56.15 lists some examples of the structures in Table 56.13 and Table 56.14.<br />
Table 56.15 Covariance Structure Examples<br />
Description | Structure | Example<br />
Variance Components | VC (default) | $\begin{bmatrix} \sigma_B^2 & 0 & 0 & 0 \\ 0 & \sigma_B^2 & 0 & 0 \\ 0 & 0 & \sigma_{AB}^2 & 0 \\ 0 & 0 & 0 & \sigma_{AB}^2 \end{bmatrix}$<br />
Compound Symmetry | CS | $\begin{bmatrix} \sigma^2+\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2+\sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2+\sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2+\sigma_1 \end{bmatrix}$<br />
Unstructured | UN | $\begin{bmatrix} \sigma_1^2 & \sigma_{21} & \sigma_{31} & \sigma_{41} \\ \sigma_{21} & \sigma_2^2 & \sigma_{32} & \sigma_{42} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{43} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$<br />
Banded Main Diagonal | UN(1) | $\begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}$<br />
First-Order Autoregressive | AR(1) | $\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$<br />
Toeplitz | TOEP | $\begin{bmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{bmatrix}$<br />
Toeplitz with Two Bands | TOEP(2) | $\begin{bmatrix} \sigma^2 & \sigma_1 & 0 & 0 \\ \sigma_1 & \sigma^2 & \sigma_1 & 0 \\ 0 & \sigma_1 & \sigma^2 & \sigma_1 \\ 0 & 0 & \sigma_1 & \sigma^2 \end{bmatrix}$<br />
Spatial Power | SP(POW)(c) | $\sigma^2 \begin{bmatrix} 1 & \rho^{d_{12}} & \rho^{d_{13}} & \rho^{d_{14}} \\ \rho^{d_{21}} & 1 & \rho^{d_{23}} & \rho^{d_{24}} \\ \rho^{d_{31}} & \rho^{d_{32}} & 1 & \rho^{d_{34}} \\ \rho^{d_{41}} & \rho^{d_{42}} & \rho^{d_{43}} & 1 \end{bmatrix}$<br />
Heterogeneous AR(1) | ARH(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 & \sigma_1\sigma_4\rho^3 \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho^2 \\ \sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_3\sigma_4\rho \\ \sigma_4\sigma_1\rho^3 & \sigma_4\sigma_2\rho^2 & \sigma_4\sigma_3\rho & \sigma_4^2 \end{bmatrix}$<br />
First-Order Autoregressive Moving-Average | ARMA(1,1) | $\sigma^2 \begin{bmatrix} 1 & \gamma & \gamma\rho & \gamma\rho^2 \\ \gamma & 1 & \gamma & \gamma\rho \\ \gamma\rho & \gamma & 1 & \gamma \\ \gamma\rho^2 & \gamma\rho & \gamma & 1 \end{bmatrix}$<br />
Heterogeneous CS | CSH | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho & \sigma_1\sigma_4\rho \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho \\ \sigma_3\sigma_1\rho & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_3\sigma_4\rho \\ \sigma_4\sigma_1\rho & \sigma_4\sigma_2\rho & \sigma_4\sigma_3\rho & \sigma_4^2 \end{bmatrix}$<br />
First-Order Factor Analytic | FA(1) | $\begin{bmatrix} \lambda_1^2+d_1 & \lambda_1\lambda_2 & \lambda_1\lambda_3 & \lambda_1\lambda_4 \\ \lambda_2\lambda_1 & \lambda_2^2+d_2 & \lambda_2\lambda_3 & \lambda_2\lambda_4 \\ \lambda_3\lambda_1 & \lambda_3\lambda_2 & \lambda_3^2+d_3 & \lambda_3\lambda_4 \\ \lambda_4\lambda_1 & \lambda_4\lambda_2 & \lambda_4\lambda_3 & \lambda_4^2+d_4 \end{bmatrix}$<br />
Huynh-Feldt | HF | $\begin{bmatrix} \sigma_1^2 & \frac{\sigma_1^2+\sigma_2^2}{2}-\lambda & \frac{\sigma_1^2+\sigma_3^2}{2}-\lambda \\ \frac{\sigma_2^2+\sigma_1^2}{2}-\lambda & \sigma_2^2 & \frac{\sigma_2^2+\sigma_3^2}{2}-\lambda \\ \frac{\sigma_3^2+\sigma_1^2}{2}-\lambda & \frac{\sigma_3^2+\sigma_2^2}{2}-\lambda & \sigma_3^2 \end{bmatrix}$<br />
First-Order Ante-dependence | ANTE(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 \\ \sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_2 \\ \sigma_3\sigma_1\rho_2\rho_1 & \sigma_3\sigma_2\rho_2 & \sigma_3^2 \end{bmatrix}$<br />
Heterogeneous Toeplitz | TOEPH | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_2 & \sigma_1\sigma_4\rho_3 \\ \sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_1 & \sigma_2\sigma_4\rho_2 \\ \sigma_3\sigma_1\rho_2 & \sigma_3\sigma_2\rho_1 & \sigma_3^2 & \sigma_3\sigma_4\rho_1 \\ \sigma_4\sigma_1\rho_3 & \sigma_4\sigma_2\rho_2 & \sigma_4\sigma_3\rho_1 & \sigma_4^2 \end{bmatrix}$<br />
Unstructured Correlations | UNR | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{21} & \sigma_1\sigma_3\rho_{31} & \sigma_1\sigma_4\rho_{41} \\ \sigma_2\sigma_1\rho_{21} & \sigma_2^2 & \sigma_2\sigma_3\rho_{32} & \sigma_2\sigma_4\rho_{42} \\ \sigma_3\sigma_1\rho_{31} & \sigma_3\sigma_2\rho_{32} & \sigma_3^2 & \sigma_3\sigma_4\rho_{43} \\ \sigma_4\sigma_1\rho_{41} & \sigma_4\sigma_2\rho_{42} & \sigma_4\sigma_3\rho_{43} & \sigma_4^2 \end{bmatrix}$<br />
Direct Product AR(1) | UN@AR(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_{21} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} \otimes \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_1^2\rho & \sigma_1^2\rho^2 & \sigma_{21} & \sigma_{21}\rho & \sigma_{21}\rho^2 \\ \sigma_1^2\rho & \sigma_1^2 & \sigma_1^2\rho & \sigma_{21}\rho & \sigma_{21} & \sigma_{21}\rho \\ \sigma_1^2\rho^2 & \sigma_1^2\rho & \sigma_1^2 & \sigma_{21}\rho^2 & \sigma_{21}\rho & \sigma_{21} \\ \sigma_{21} & \sigma_{21}\rho & \sigma_{21}\rho^2 & \sigma_2^2 & \sigma_2^2\rho & \sigma_2^2\rho^2 \\ \sigma_{21}\rho & \sigma_{21} & \sigma_{21}\rho & \sigma_2^2\rho & \sigma_2^2 & \sigma_2^2\rho \\ \sigma_{21}\rho^2 & \sigma_{21}\rho & \sigma_{21} & \sigma_2^2\rho^2 & \sigma_2^2\rho & \sigma_2^2 \end{bmatrix}$<br />
The following provides some further information about these covariance structures:<br />
TYPE=ANTE(1) specifies the first-order antedependence structure (see Kenward 1987, Patel<br />
1991, and Macchiavelli and Arnold 1994). In Table 56.13, $\sigma_i^2$ is the $i$th<br />
variance parameter, and $\rho_k$ is the $k$th autocorrelation parameter satisfying<br />
$\lvert \rho_k \rvert < 1$.<br />
TYPE=AR(1) specifies a first-order autoregressive structure. PROC MIXED imposes the<br />
constraint $\lvert \rho \rvert < 1$ for stationarity.<br />
TYPE=ARH(1) specifies a heterogeneous first-order autoregressive structure. As with<br />
TYPE=AR(1), PROC MIXED imposes the constraint $\lvert \rho \rvert < 1$ for stationarity.<br />
TYPE=ARMA(1,1) specifies the first-order autoregressive moving-average structure. In<br />
Table 56.13, $\rho$ is the autoregressive parameter, $\gamma$ models a moving-average<br />
component, and $\sigma^2$ is the residual variance. In the notation of Fuller (1976,<br />
p. 68), $\rho = \theta_1$ and<br />
\[ \gamma = \frac{(1 + b_1\theta_1)(\theta_1 + b_1)}{1 + b_1^2 + 2 b_1 \theta_1} \]<br />
The example in Table 56.15 and $\lvert b_1 \rvert < 1$ imply that<br />
\[ b_1 = \frac{\beta - \sqrt{\beta^2 - 4\alpha^2}}{2\alpha} \]<br />
where $\alpha = \gamma - \rho$ and $\beta = 1 + \rho^2 - 2\gamma\rho$. PROC MIXED imposes the<br />
constraints $\lvert \rho \rvert < 1$ and $\lvert \gamma \rvert < 1$ for stationarity, although for some values<br />
of $\rho$ and $\gamma$ in this region the resulting covariance matrix is not positive<br />
definite. When the estimated value of $\rho$ becomes negative, the computed<br />
covariance is multiplied by $\cos(\pi d_{ij})$ to account for the negativity.<br />
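The expression for $b_1$ can be recovered by treating the relation for $\gamma$ as a quadratic in $b_1$. The following derivation is a sketch that assumes the Fuller parameterization quoted above, with $\theta_1 = \rho$, $\alpha = \gamma - \rho$, and $\beta = 1 + \rho^2 - 2\gamma\rho$:

```latex
\begin{align*}
\gamma\,(1 + b_1^2 + 2 b_1 \rho) &= (1 + b_1\rho)(\rho + b_1) \\
\gamma + \gamma b_1^2 + 2\gamma\rho\, b_1 &= \rho + b_1 + \rho^2 b_1 + \rho\, b_1^2 \\
(\gamma - \rho)\, b_1^2 - (1 + \rho^2 - 2\gamma\rho)\, b_1 + (\gamma - \rho) &= 0 \\
\alpha\, b_1^2 - \beta\, b_1 + \alpha = 0
\quad &\Longrightarrow\quad
b_1 = \frac{\beta \pm \sqrt{\beta^2 - 4\alpha^2}}{2\alpha}
\end{align*}
```

The two roots have product $\alpha/\alpha = 1$, so at most one of them lies strictly inside the unit interval. Because $\beta \ge (1 - \lvert\rho\rvert)^2 > 0$ when $\lvert\rho\rvert < 1$ and $\lvert\gamma\rvert < 1$, the root of smaller magnitude is the one with the minus sign, which is the root selected by $\lvert b_1 \rvert < 1$.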
TYPE=CS specifies the compound-symmetry structure, which has constant variance<br />
and constant covariance.<br />
TYPE=CSH specifies the heterogeneous compound-symmetry structure. This structure<br />
has a different variance parameter for each diagonal element, and it<br />
uses the square roots of these parameters in the off-diagonal entries. In<br />
Table 56.13, $\sigma_i^2$ is the $i$th variance parameter, and $\rho$ is the correlation parameter<br />
satisfying $\lvert \rho \rvert < 1$.<br />
TYPE=FA(q) specifies the factor-analytic structure with $q$ factors (Jennrich and<br />
Schluchter 1986). This structure is of the form $\Lambda\Lambda' + D$, where $\Lambda$<br />
is a $t \times q$ rectangular matrix and $D$ is a $t \times t$ diagonal matrix with $t$<br />
different parameters. When $q > 1$, the elements of $\Lambda$ in its upper-right<br />
corner (that is, the elements in the $i$th row and $j$th column for $j > i$) are<br />
set to zero to fix the rotation of the structure.<br />
TYPE=FA0(q) is similar to the FA(q) structure except that no diagonal matrix $D$ is included.<br />
When $q < t$ (that is, when the number of factors is less than<br />
the dimension of the matrix), this structure is nonnegative definite but not<br />
of full rank. In this situation, you can use it for approximating an unstructured<br />
G matrix in the RANDOM statement or for combining with the<br />
LOCAL option in the REPEATED statement. When $q = t$, you can use<br />
this structure to constrain G to be nonnegative definite in the RANDOM<br />
statement.<br />
TYPE=FA1(q) is similar to the FA(q) structure except that all of the elements in $D$ are<br />
constrained to be equal. This offers a useful and more parsimonious alternative<br />
to the full factor-analytic structure.<br />
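The use of FA0 with $q = t$ to keep G nonnegative definite can be sketched as follows for a random intercept and slope ($t = 2$). The data set growth and its variables are hypothetical and are simulated here only so that the step runs; the G option is added to display the estimated matrix.

```sas
/* Hypothetical random-coefficients data: 30 subjects, 4 time points. */
data growth;
   call streaminit(7);
   do id = 1 to 30;
      b0 = 2*rand('normal');              /* random intercept deviation */
      b1 = 0.3*b0 + 0.5*rand('normal');   /* correlated random slope    */
      do time = 0 to 3;
         y = 10 + b0 + (1 + b1)*time + rand('normal');
         output;
      end;
   end;
   drop b0 b1;
run;

/* TYPE=FA0(2) parameterizes the 2 x 2 G matrix as Lambda*Lambda',   */
/* so the estimate cannot leave the nonnegative definite region.     */
proc mixed data=growth;
   class id;
   model y = time / solution;
   random intercept time / type=fa0(2) subject=id g;
run;
```

Replacing TYPE=FA0(2) with TYPE=UN fits the same two-dimensional G without that constraint.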
TYPE=HF specifies the Huynh-Feldt covariance structure (Huynh and Feldt 1970).<br />
This structure is similar to the CSH structure in that it has the same number<br />
of parameters and heterogeneity along the main diagonal. However, it<br />
constructs the off-diagonal elements by taking arithmetic rather than geometric<br />
means.<br />
You can perform a likelihood ratio test of the Huynh-Feldt conditions<br />
by running PROC MIXED twice, once with TYPE=HF and once with<br />
TYPE=UN, and then subtracting their respective values of $-2$ times the<br />
maximized log likelihood.<br />
If PROC MIXED does not converge under your Huynh-Feldt model, you<br />
can specify your own starting values with the PARMS statement. The<br />
default MIVQUE(0) starting values can sometimes be poor for this structure.<br />
A good choice for starting values is often the parameter estimates<br />
corresponding to an initial fit that uses TYPE=CS.<br />
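The likelihood ratio test of the Huynh-Feldt conditions can be sketched as follows. The data set rm and its variables are hypothetical and simulated only for illustration. The sketch assumes the default REML method, under which the first row of the “FitStatistics” ODS table holds the -2 Res Log Likelihood; with $t = 4$ time points the test has $t(t+1)/2 - (t+1) = 5$ degrees of freedom.

```sas
/* Hypothetical repeated measures: 40 subjects, 4 time points. */
data rm;
   call streaminit(314);
   do id = 1 to 40;
      u = rand('normal');
      do time = 1 to 4;
         y = 5 + time + u + sqrt(time)*rand('normal');
         output;
      end;
   end;
   drop u;
run;

/* Fit the restricted (HF) and the general (UN) structures. */
ods output FitStatistics=fit_hf;
proc mixed data=rm;
   class id time;
   model y = time;
   repeated time / type=hf subject=id;
run;

ods output FitStatistics=fit_un;
proc mixed data=rm;
   class id time;
   model y = time;
   repeated time / type=un subject=id;
run;

/* Difference of the two -2 Res Log Likelihood values (first rows). */
data lrt;
   set fit_hf(obs=1 rename=(Value=m2ll_hf));
   set fit_un(obs=1 rename=(Value=m2ll_un));
   chisq = max(m2ll_hf - m2ll_un, 0);
   df    = 4*(4 + 1)/2 - (4 + 1);      /* UN parms minus HF parms */
   p     = 1 - probchi(chisq, df);
   keep m2ll_hf m2ll_un chisq df p;
run;

proc print data=lrt;
run;
```

Both fits use the same fixed effects, so the REML likelihoods are comparable. If the HF fit fails to converge on your data, the PARMS statement can supply starting values as described above.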
TYPE=LIN(q) specifies the general linear covariance structure with $q$ parameters. This<br />
structure consists of a linear combination of known matrices that are input<br />
with the LDATA= option. This structure is very general, and you need to<br />
make sure that the variance matrix is positive definite. By default, PROC<br />
MIXED sets the initial values of the parameters to 1. You can use the<br />
PARMS statement to specify other initial values.<br />
TYPE=SIMPLE is an alias for TYPE=VC.<br />
TYPE=SP(EXPA)(c-list) specifies the spatial anisotropic exponential structure, where c-list<br />
is a list of variables indicating the coordinates. This structure has $(i,j)$th<br />
element equal to<br />
\[ \sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k\, d(i,j,k)^{p_k}\} \]<br />
where $c$ is the number of coordinates and $d(i,j,k)$ is the absolute distance<br />
between the $k$th coordinate ($k = 1, \ldots, c$) of the $i$th and $j$th observations<br />
in the input data set. There are $2c + 1$ parameters to be estimated: $\theta_k$, $p_k$<br />
($k = 1, \ldots, c$), and $\sigma^2$.<br />
You might want to constrain some of the EXPA parameters to known values.<br />
For example, suppose you have three coordinate variables C1, C2,<br />
and C3 and you want to constrain the powers $p_k$ to equal 2, as in Sacks et<br />
al. (1989). Suppose further that you want to model covariance across the<br />
entire input data set and you suspect the $\theta_k$ and $\sigma^2$ estimates are close to<br />
3, 4, 5, and 1, respectively. Then specify the following statements:<br />
repeated / type=sp(expa)(c1 c2 c3)<br />
           subject=intercept;<br />
parms (3) (4) (5) (2) (2) (2) (1) /<br />
      hold=4,5,6;<br />
TYPE=SP(EXPGA)(c1 c2)<br />
TYPE=SP(GAUGA)(c1 c2)<br />
TYPE=SP(SPHGA)(c1 c2) specify modifications of the isotropic SP(EXP), SP(SPH), and<br />
SP(GAU) covariance structures that allow for geometric anisotropy in two<br />
dimensions. The coordinates are specified by the variables c1 and c2.<br />
If the spatial process is geometrically anisotropic in $c = [c_{i1}, c_{i2}]$, then it<br />
is isotropic in the coordinate system<br />
\[ Ac = \begin{bmatrix} 1 & 0 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} c = c^{*} \]<br />
for a properly chosen angle $\theta$ and scaling factor $\lambda$. Elliptical isocorrelation<br />
contours are thereby transformed to spherical contours, adding two parameters<br />
to the respective isotropic covariance structures. Euclidean distances<br />
(see Table 56.14) are expressed in terms of $c^{*}$.<br />
The angle $\theta$ of the clockwise rotation is reported in radians, $0 \le \theta \le 2\pi$.<br />
The scaling parameter $\lambda$ represents the ratio of the range parameters in the<br />
direction of the major and minor axis of the correlation contours. In other<br />
words, following a rotation of the coordinate system by angle $\theta$, isotropy<br />
is achieved by compressing or magnifying distances in one coordinate by<br />
the factor $\lambda$.<br />
Fixing $\lambda = 1.0$ reduces the models to isotropic ones for any angle of<br />
rotation. If the scaling parameter is held constant at 1.0, you should also<br />
hold constant the angle of rotation, as in the following statements:<br />
repeated / type=sp(expga)(gxc gyc)<br />
           subject=intercept;<br />
parms (6) (1.0) (0.0) (1) / hold=2,3;<br />
If $\lambda$ is fixed at any other value than 1.0, the angle of rotation can be estimated.<br />
Specifying a starting grid of angles and scaling factors can considerably<br />
improve the convergence properties of the optimization algorithm<br />
for these models. Only a single random effect with geometrically<br />
anisotropic structure is permitted.<br />
TYPE=SP(MATERN)(c-list)<br />
TYPE=SP(MATHSW)(c-list) specifies covariance structures in the Matérn class of covariance<br />
functions (Matérn 1986). Two observations for the same subject<br />
(block of R) that are Euclidean distance $d_{ij}$ apart have covariance<br />
\[ \sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}}{2\rho}\right)^{\nu} 2K_\nu(d_{ij}/\rho), \qquad \nu > 0,\ \rho > 0 \]<br />
where $K_\nu$ is the modified Bessel function of the second kind of (real) order<br />
$\nu > 0$. The smoothness (continuity) of a stochastic process with covariance<br />
function in this class increases with $\nu$. The Matérn class thus enables<br />
data-driven estimation of the smoothness properties. The covariance<br />
is identical to the exponential model for $\nu = 0.5$ (TYPE=SP(EXP)(c-list)),<br />
while for $\nu = 1$ the model advocated by Whittle (1954) results.<br />
As $\nu \to \infty$ the model approaches the gaussian covariance structure<br />
(TYPE=SP(GAU)(c-list)).<br />
The MATHSW structure represents the Matérn class in the parameterization<br />
of Handcock and Stein (1993) and Handcock and Wallis (1994),<br />
\[ \sigma^2 \frac{1}{\Gamma(\nu)} \left(\frac{d_{ij}\sqrt{\nu}}{\rho}\right)^{\nu} 2K_\nu\!\left(\frac{2 d_{ij}\sqrt{\nu}}{\rho}\right) \]<br />
Since computation of the function $K_\nu$ and its derivatives is numerically<br />
very intensive, fitting models with Matérn covariance structures can be<br />
more time-consuming than with other spatial covariance structures. Good<br />
starting values are essential.<br />
TYPE=SP(POW)(c-list)<br />
TYPE=SP(POWA)(c-list) specifies the spatial power structures. When the estimated<br />
value of $\rho$ becomes negative, the computed covariance is multiplied by<br />
$\cos(\pi d_{ij})$ to account for the negativity.<br />
TYPE=TOEP< (q) > specifies a banded Toeplitz structure. This can be viewed as a moving-average<br />
structure with order equal to $q - 1$. The TYPE=TOEP option is<br />
a full Toeplitz matrix, which can be viewed as an autoregressive structure<br />
with order equal to the dimension of the matrix. The specification<br />
TYPE=TOEP(1) is the same as $\sigma^2 I$, where $I$ is an identity matrix, and<br />
it can be useful for specifying the same variance component for several<br />
effects.<br />
TYPE=TOEPH< (q) > specifies a heterogeneous banded Toeplitz structure. In<br />
Table 56.13, $\sigma_i^2$ is the $i$th variance parameter and $\rho_j$ is the $j$th correlation<br />
parameter satisfying $\lvert \rho_j \rvert < 1$. If you specify the order parameter $q$,<br />
then PROC MIXED estimates only the first $q$ bands of the matrix, setting<br />
all higher bands equal to 0. The option TOEPH(1) is equivalent to both<br />
the UN(1) and UNR(1) options.<br />
TYPE=UN< (q) > specifies a completely general (unstructured) covariance matrix parameterized<br />
directly in terms of variances and covariances. The variances are<br />
constrained to be nonnegative, and the covariances are unconstrained. This<br />
structure is not constrained to be nonnegative definite in order to avoid<br />
nonlinear constraints; however, you can use the FA0 structure if you want<br />
this constraint to be imposed by a Cholesky factorization. If you specify<br />
the order parameter $q$, then PROC MIXED estimates only the first $q$ bands<br />
of the matrix, setting all higher bands equal to 0.<br />
TYPE=UNR< (q) > specifies a completely general (unstructured) covariance matrix parameterized<br />
in terms of variances and correlations. This structure fits the same<br />
model as the TYPE=UN(q) option but with a different parameterization.<br />
The $i$th variance parameter is $\sigma_i^2$. The parameter $\rho_{jk}$ is the correlation between<br />
the $j$th and $k$th measurements; it satisfies $\lvert \rho_{jk} \rvert < 1$. If you specify<br />
the order parameter $q$, then PROC MIXED estimates only the first $q$ bands<br />
of the matrix, setting all higher bands equal to zero.<br />
TYPE=UN@AR(1)<br />
TYPE=UN@CS<br />
TYPE=UN@UN specify direct (Kronecker) product structures designed for multivariate repeated<br />
measures (see Galecki 1994). These structures are constructed by<br />
taking the Kronecker product of an unstructured matrix (modeling covariance<br />
across the multivariate observations) with an additional covariance<br />
matrix (modeling covariance across time or another factor). The upper-left<br />
value in the second matrix is constrained to equal 1 to identify the model.<br />
See the SAS/IML User’s Guide for more details about direct products.<br />
To use these structures in the REPEATED statement, you must specify<br />
two distinct REPEATED effects, both of which must be included in the<br />
CLASS statement. The first effect indicates the multivariate observations,<br />
and the second identifies the levels of time or some additional factor. Note<br />
that the input data set must still be constructed in “univariate” format; that<br />
is, all dependent observations are still listed observation-wise in one single<br />
variable. Although this construction provides for general modeling possibilities,<br />
it forces you to construct variables indicating both dimensions of<br />
the Kronecker product.<br />
For example, suppose your observed data consist of heights and weights of<br />
several children measured over several successive years. Your input data<br />
set should then contain variables similar to the following:<br />
Y, all of the heights and weights, with a separate observation for each<br />
Var, indicating whether the measurement is a height or a weight<br />
Year, indicating the year of measurement<br />
Child, indicating the child on which the measurement was taken<br />
Your PROC MIXED statements for a Kronecker AR(1) structure across<br />
years would then be as follows:<br />
proc mixed;<br />
   class Var Year Child;<br />
   model Y = Var Year Var*Year;<br />
   repeated Var Year / type=un@ar(1)<br />
                       subject=Child;<br />
run;<br />
You should nearly always want to model different means for the multivariate<br />
observations; hence the inclusion of Var in the MODEL statement. The<br />
preceding mean model consists of cell means for all combinations of VAR<br />
and YEAR.<br />
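The “univariate” layout required by the direct product structures can be built from a wide data set along the following lines. This is only a sketch: the data set wide, its column names (ht1-ht3, wt1-wt3), and the values are hypothetical, and three children are far too few to fit the model; the step merely shows how the Var and Year indicators are constructed.

```sas
/* Hypothetical wide layout: one row per child, three yearly */
/* heights (ht1-ht3) and weights (wt1-wt3).                   */
data wide;
   input Child ht1 ht2 ht3 wt1 wt2 wt3;
   datalines;
1 110 116 121 19.0 21.5 24.0
2 105 111 117 17.5 19.0 21.0
3 112 118 125 20.0 22.5 25.5
;

/* One output row per child, response type, and year. */
data long;
   set wide;
   array ht{3} ht1-ht3;
   array wt{3} wt1-wt3;
   length Var $6;
   do Year = 1 to 3;
      Var = 'Height'; Y = ht{Year}; output;
      Var = 'Weight'; Y = wt{Year}; output;
   end;
   keep Child Var Year Y;
run;

proc sort data=long;
   by Child Var Year;
run;
```

The resulting data set contains exactly the four variables Y, Var, Year, and Child listed above.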
TYPE=VC specifies standard variance components and is the default structure for<br />
both the RANDOM and REPEATED statements. In the RANDOM<br />
statement, a distinct variance component is assigned to each effect. In<br />
the REPEATED statement, this structure is usually used only with the<br />
GROUP= option to specify a heterogeneous variance model.<br />
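A heterogeneous variance model of the kind just mentioned can be sketched as follows. The data set trial and its variables are hypothetical and are simulated so that the two treatment groups have different residual variances.

```sas
/* Hypothetical two-group data with unequal residual variances. */
data trial;
   call streaminit(99);
   do trt = 'A', 'B';
      do rep = 1 to 30;
         sd = ifn(trt = 'A', 1, 3);     /* group-specific std dev */
         y  = 10 + sd*rand('normal');
         output;
      end;
   end;
   drop sd;
run;

/* TYPE=VC with GROUP= estimates one residual variance per level of trt. */
proc mixed data=trial;
   class trt;
   model y = trt;
   repeated / type=vc group=trt;
run;
```

The “Covariance Parameter Estimates” table then lists a separate residual variance for each treatment group.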
Jennrich and Schluchter (1986) provide general information about the use of covariance structures,<br />
and Wolfinger (1996) presents details about many of the heterogeneous structures.
Modeling with spatial covariance structures is discussed in many sources, for example, Marx<br />
and Thompson (1987), Zimmerman and Harville (1991), Cressie (1993), Brownie, Bowman,<br />
and Burton (1993), Stroup, Baenziger, and Mulitze (1994), Brownie and Gumpertz (1997),<br />
Gotway and Stroup (1997), Chilès and Delfiner (1999), Schabenberger and Gotway (2005),<br />
and Littell et al. (2006).<br />
WEIGHT Statement<br />
WEIGHT variable ;<br />
If you do not specify a REPEATED statement, the WEIGHT statement operates exactly like the<br />
one in PROC GLM. In this case PROC MIXED replaces $X'X$ and $Z'Z$ with $X'WX$ and $Z'WZ$,<br />
where $W$ is the diagonal weight matrix. If you specify a REPEATED statement, then the WEIGHT<br />
statement replaces $R$ with $LRL$, where $L$ is a diagonal matrix with elements $W^{-1/2}$. Observations<br />
with nonpositive or missing weights are not included in the PROC MIXED analysis.<br />
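As a small worked illustration of the $LRL$ replacement (the weights and the form of $R$ are chosen only for this example), take two observations with weights 4 and 1 and $R = \sigma^2 I_2$:

```latex
\[
W = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
L = W^{-1/2} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
LRL = \sigma^2 \begin{bmatrix} 1/4 & 0 \\ 0 & 1 \end{bmatrix} .
\]
```

The observation with weight 4 is therefore treated as having one quarter of the residual variance of the observation with weight 1.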
Details: MIXED Procedure<br />
Mixed Models Theory<br />
This section provides an overview of a likelihood-based approach to general linear mixed models.<br />
This approach simplifies and unifies many common statistical analyses, including those involving<br />
repeated measures, random effects, and random coefficients. The basic assumption is that the data<br />
are linearly related to unobserved multivariate normal random variables. For extensions to nonlinear<br />
and nonnormal situations see the documentation of the GLIMMIX and NLMIXED procedures.<br />
Additional theory and examples are provided in Littell et al. (2006), Verbeke and Molenberghs<br />
(1997, 2000), and Brown and Prescott (1999).<br />
Matrix Notation<br />
Suppose that you observe $n$ data points $y_1, \ldots, y_n$ and that you want to explain them by using<br />
$n$ values for each of $p$ explanatory variables $x_{11}, \ldots, x_{1p}$, $x_{21}, \ldots, x_{2p}$, $\ldots$, $x_{n1}, \ldots, x_{np}$. The<br />
$x_{ij}$ values can be either regression-type continuous variables or dummy variables indicating class<br />
membership. The standard linear model for this setup is<br />
\[ y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \epsilon_i, \qquad i = 1, \ldots, n \]<br />
where $\beta_1, \ldots, \beta_p$ are unknown fixed-effects parameters to be estimated and $\epsilon_1, \ldots, \epsilon_n$ are unknown<br />
independent and identically distributed normal (Gaussian) random variables with mean 0 and variance<br />
$\sigma^2$.<br />
The preceding equations can be written simultaneously by using vectors and a matrix, as follows:<br />
\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]<br />
For convenience, simplicity, and extendability, this entire system is written as<br />
\[ y = X\beta + \epsilon \]<br />
where $y$ denotes the vector of observed $y_i$’s, $X$ is the known matrix of $x_{ij}$’s, $\beta$ is the unknown fixed-effects<br />
parameter vector, and $\epsilon$ is the unobserved vector of independent and identically distributed<br />
Gaussian random errors.<br />
In addition to denoting data, random variables, and explanatory variables in the preceding fashion,<br />
the subsequent development makes use of basic matrix operators such as transpose ($'$), inverse ($^{-1}$),<br />
generalized inverse ($^{-}$), determinant ($\lvert \cdot \rvert$), and matrix multiplication. See Searle (1982) for details<br />
about these and other matrix techniques.<br />
Formulation of the Mixed Model<br />
The previous general linear model is certainly a useful one (Searle 1971), and it is the one fitted by<br />
the GLM procedure. However, many times the distributional assumption about $\epsilon$ is too restrictive.<br />
The mixed model extends the general linear model by allowing a more flexible specification of the<br />
covariance matrix of $\epsilon$. In other words, it allows for both correlation and heterogeneous variances,<br />
although you still assume normality.<br />
The mixed model is written as<br />
\[ y = X\beta + Z\gamma + \epsilon \]<br />
where everything is the same as in the general linear model except for the addition of the known<br />
design matrix, $Z$, and the vector of unknown random-effects parameters, $\gamma$. The matrix $Z$ can<br />
contain either continuous or dummy variables, just like $X$. The name mixed model comes from the<br />
fact that the model contains both fixed-effects parameters, $\beta$, and random-effects parameters, $\gamma$.<br />
See Henderson (1990) and Searle, Casella, and McCulloch (1992) for historical developments of<br />
the mixed model.<br />
A key assumption in the foregoing analysis is that $\gamma$ and $\epsilon$ are normally distributed with<br />
\[ \mathrm{E}\begin{bmatrix} \gamma \\ \epsilon \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix} \gamma \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \]<br />
The variance of $y$ is, therefore, $V = ZGZ' + R$. You can model $V$ by setting up the random-effects<br />
design matrix $Z$ and by specifying covariance structures for $G$ and $R$.<br />
Note that this is a general specification of the mixed model, in contrast to many texts and articles<br />
that discuss only simple random effects. Simple random effects are a special case of the general<br />
specification with $Z$ containing dummy variables, $G$ containing variance components in a diagonal<br />
structure, and $R = \sigma^2 I_n$, where $I_n$ denotes the $n \times n$ identity matrix. The general linear model is a<br />
further special case with $Z = 0$ and $R = \sigma^2 I_n$.<br />
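To make the simple random effects case concrete, the following sketch computes $V$ for a hypothetical random-intercept model with two subjects and two observations per subject ($n = 4$). The symbol $\sigma_b^2$ for the intercept variance is introduced only for this illustration.

```latex
\[
Z = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \qquad
G = \sigma_b^2 I_2, \qquad
R = \sigma^2 I_4,
\]
\[
V = ZGZ' + R =
\begin{bmatrix}
\sigma_b^2 + \sigma^2 & \sigma_b^2 & 0 & 0 \\
\sigma_b^2 & \sigma_b^2 + \sigma^2 & 0 & 0 \\
0 & 0 & \sigma_b^2 + \sigma^2 & \sigma_b^2 \\
0 & 0 & \sigma_b^2 & \sigma_b^2 + \sigma^2
\end{bmatrix} .
\]
```

Each diagonal block of $V$ has a common variance and a common covariance, which is the compound-symmetry form; observations from different subjects remain uncorrelated.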
The following two examples illustrate the most common formulations of the general linear mixed<br />
model.<br />
Example: Growth Curve with Compound Symmetry<br />
Suppose that you have three growth curve measurements for s individuals and that you want to fit<br />
an overall linear trend in time. Your X matrix is as follows:<br />
\[ X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \]<br />
<strong>The</strong> first column (coded entirely with 1s) fits an intercept, and the second column (coded with times<br />
of 1, 2, 3) fits a slope. Here, $n = 3s$ and $p = 2$.<br />
Suppose further that you want to introduce a common correlation among the observations from a<br />
single individual, with correlation being the same for all individuals. One way of setting this up in<br />
the general mixed model is to eliminate the Z and G matrices and let the R matrix be block diagonal<br />
with blocks corresponding to the individuals and with each block having the compound-symmetry<br />
structure. This structure has two unknown parameters, one modeling a common covariance and the<br />
other modeling a residual variance. <strong>The</strong> form for R would then be as follows:<br />
\[ R = \begin{bmatrix}
\sigma_1^2 + \sigma^2 & \sigma_1^2 & \sigma_1^2 & & & & \\
\sigma_1^2 & \sigma_1^2 + \sigma^2 & \sigma_1^2 & & & & \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2 + \sigma^2 & & & & \\
& & & \ddots & & & \\
& & & & \sigma_1^2 + \sigma^2 & \sigma_1^2 & \sigma_1^2 \\
& & & & \sigma_1^2 & \sigma_1^2 + \sigma^2 & \sigma_1^2 \\
& & & & \sigma_1^2 & \sigma_1^2 & \sigma_1^2 + \sigma^2
\end{bmatrix} \]<br />
where blanks denote zeros. <strong>The</strong>re are $3s$ rows and columns altogether, and the common correlation<br />
is $\sigma_1^2 / (\sigma_1^2 + \sigma^2)$.<br />
<strong>The</strong> PROC <strong>MIXED</strong> statements to fit this model are as follows:<br />
proc mixed;<br />
class indiv;<br />
model y = time;<br />
repeated / type=cs subject=indiv;<br />
run;<br />
Here, indiv is a classification variable indexing individuals. <strong>The</strong> MODEL statement fits a straight line<br />
for time; the intercept is fit by default just as in PROC GLM. <strong>The</strong> REPEATED statement models the<br />
R matrix: TYPE=CS specifies the compound symmetry structure, and SUBJECT=INDIV specifies<br />
the blocks of R.<br />
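As a hedged illustration, the same statements can be extended to display the fitted block of R. The data set name growth is a placeholder (it does not appear in the original example), and the data are assumed to hold one row per measurement with the variables indiv, time, and y. The R and RCORR options of the REPEATED statement request the estimated covariance and correlation block for the first subject:<br />

```sas
/* Sketch only: "growth" is a hypothetical data set name.            */
/* Assumes one row per measurement with variables indiv, time, y.    */
proc mixed data=growth method=reml;
   class indiv;
   model y = time / solution;                 /* intercept is fit by default      */
   repeated / type=cs subject=indiv r rcorr;  /* print the R block and its correlations */
run;
```

With a compound-symmetry fit, all off-diagonal entries of the printed correlation block should be equal, which is the common intra-individual correlation described above.<br />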
An alternative way of specifying the common intra-individual correlation is to let<br />
\[ Z = \begin{bmatrix} 1 & & & \\ 1 & & & \\ 1 & & & \\ & 1 & & \\ & 1 & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \\ & & & 1 \\ & & & 1 \end{bmatrix}, \qquad
G = \begin{bmatrix} \sigma_1^2 & & & \\ & \sigma_1^2 & & \\ & & \ddots & \\ & & & \sigma_1^2 \end{bmatrix} \]<br />
and $R = \sigma^2 I_n$. <strong>The</strong> Z matrix has $3s$ rows and $s$ columns, and G is $s \times s$.<br />
You can set up this model in PROC <strong>MIXED</strong> in two different but equivalent ways:<br />
proc mixed;<br />
class indiv;<br />
model y = time;<br />
random indiv;<br />
run;<br />
proc mixed;<br />
class indiv;<br />
model y = time;<br />
random intercept / subject=indiv;<br />
run;<br />
Both of these specifications fit the same model as the previous one that used the REPEATED statement;<br />
however, the RANDOM specifications constrain the correlation to be positive, whereas the<br />
REPEATED specification leaves the correlation unconstrained.
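<br />
The agreement between the RANDOM and REPEATED formulations can be checked for a single individual. In the following sketch, the subscript $i$ and the symbol $J_3$ (a $3 \times 3$ matrix of 1s) are notation introduced here only for illustration; $Z_i$, $G_i$, and $R_i$ denote the parts of Z, G, and R that belong to individual $i$, and $1_3$ is a column of three 1s:<br />

```latex
\[
V_i = Z_i G_i Z_i' + R_i
    = 1_3 \, \sigma_1^2 \, 1_3' + \sigma^2 I_3
    = \sigma_1^2 J_3 + \sigma^2 I_3
    = \begin{bmatrix}
        \sigma_1^2 + \sigma^2 & \sigma_1^2 & \sigma_1^2 \\
        \sigma_1^2 & \sigma_1^2 + \sigma^2 & \sigma_1^2 \\
        \sigma_1^2 & \sigma_1^2 & \sigma_1^2 + \sigma^2
      \end{bmatrix}
\]
```

This is one compound-symmetry block of R in the REPEATED formulation. In the RANDOM formulation $\sigma_1^2$ is a variance, so the implied correlation $\sigma_1^2 / (\sigma_1^2 + \sigma^2)$ cannot be negative, which is the constraint noted above.<br />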
Example: Split-Plot Design<br />
<strong>The</strong> split-plot design involves two experimental treatment factors, A and B, and two different sizes of<br />
experimental units to which they are applied (see Winer 1971, Snedecor and Cochran 1980, Milliken<br />
and Johnson 1992, and Steel, Torrie, and Dickey 1997). <strong>The</strong> levels of A are randomly assigned to<br />
the larger-sized experimental unit, called whole plots, whereas the levels of B are assigned to the<br />
smaller-sized experimental unit, the subplots. <strong>The</strong> subplots are assumed to be nested within the<br />
whole plots, so that a whole plot consists of a cluster of subplots and a level of A is applied to the<br />
entire cluster.<br />
Such an arrangement is often necessary by nature of the experiment, the classical example being<br />
the application of fertilizer to large plots of land and different crop varieties planted in subdivisions<br />
of the large plots. For this example, fertilizer is the whole-plot factor A and variety is the subplot<br />
factor B.<br />
<strong>The</strong> first example is a split-plot design for which the whole plots are arranged in a randomized block<br />
design. <strong>The</strong> appropriate PROC <strong>MIXED</strong> statements are as follows:<br />
proc mixed;<br />
class a b block;<br />
model y = a|b;<br />
random block a*block;<br />
run;<br />
Here $R = \sigma^2 I_{24}$, and X, Z, and G have the following form:<br />
Taking the 24 observations (4 blocks, 3 levels of A, 2 levels of B) to be ordered by block, then A, then B, X stacks the same $6 \times 12$ matrix $X_0$ once for each block:<br />
\[ X = \begin{bmatrix} X_0 \\ X_0 \\ X_0 \\ X_0 \end{bmatrix}, \qquad
X_0 = \left[\begin{array}{c|ccc|cc|cccccc}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right] \]<br />
<strong>The</strong> column groups of $X_0$ correspond to the intercept, A, B, and A*B, so every row contains four 1s.<br />
\[ Z = \begin{bmatrix} Z_B & Z_{AB} \end{bmatrix}, \qquad Z_B = I_4 \otimes 1_6 \]<br />
Here $1_6$ is a column of six 1s, so $Z_B$ ($24 \times 4$) indicates the block of each observation, and $Z_{AB}$ ($24 \times 12$) has one indicator column for each A*Block combination, with 1s in the two subplot rows of that whole plot. Every row of Z therefore contains two 1s.<br />
\[ G = \begin{bmatrix} \sigma_B^2 I_4 & \\ & \sigma_{AB}^2 I_{12} \end{bmatrix} \]<br />
where $\sigma_B^2$ is the variance component for Block and $\sigma_{AB}^2$ is the variance component for A*Block.<br />
Changing the RANDOM statement as follows fits the same model, but with Z and G sorted differently:<br />
random int a / subject=block;<br />
\[ Z = \begin{bmatrix} Z_0 & & & \\ & Z_0 & & \\ & & Z_0 & \\ & & & Z_0 \end{bmatrix}, \qquad
Z_0 = \begin{bmatrix} 1 & 1 & & \\ 1 & 1 & & \\ 1 & & 1 & \\ 1 & & 1 & \\ 1 & & & 1 \\ 1 & & & 1 \end{bmatrix} \]<br />
\[ G = \begin{bmatrix} G_0 & & & \\ & G_0 & & \\ & & G_0 & \\ & & & G_0 \end{bmatrix}, \qquad
G_0 = \begin{bmatrix} \sigma_B^2 & & & \\ & \sigma_{AB}^2 & & \\ & & \sigma_{AB}^2 & \\ & & & \sigma_{AB}^2 \end{bmatrix} \]<br />
Blanks denote zeros, and the blocks $Z_0$ and $G_0$ repeat once for each of the four levels of Block.<br />
Estimating Covariance Parameters in the Mixed Model<br />
Estimation is more difficult in the mixed model than in the general linear model. Not only do you<br />
have $\beta$ as in the general linear model, but you have unknown parameters in $\gamma$, G, and R as well.<br />
Least squares is no longer the best method. Generalized least squares (GLS) is more appropriate,<br />
minimizing<br />
\[ (y - X\beta)' V^{-1} (y - X\beta) \]<br />
However, it requires knowledge of V and, therefore, knowledge of G and R. Lacking such information,<br />
one approach is to use estimated GLS, in which you insert some reasonable estimate for V<br />
into the minimization problem. <strong>The</strong> goal thus becomes finding a reasonable estimate of G and R.<br />
In many situations, the best approach is to use likelihood-based methods, exploiting the assumption<br />
that $\gamma$ and $\epsilon$ are normally distributed (Hartley and Rao 1967; Patterson and Thompson 1971;<br />
Harville 1977; Laird and Ware 1982; Jennrich and Schluchter 1986). PROC <strong>MIXED</strong> implements<br />
two likelihood-based methods: maximum likelihood (ML) and restricted/residual maximum likelihood<br />
(REML). A favorable theoretical property of ML and REML is that they accommodate data<br />
that are missing at random (Rubin 1976; Little 1995).<br />
PROC <strong>MIXED</strong> constructs an objective function associated with ML or REML and maximizes it<br />
over all unknown parameters. Using calculus, it is possible to reduce this maximization problem<br />
to one over only the parameters in G and R. <strong>The</strong> corresponding log-likelihood functions are as<br />
follows:<br />
\[ \text{ML:} \quad l(G, R) = -\frac{1}{2} \log |V| - \frac{1}{2} r' V^{-1} r - \frac{n}{2} \log(2\pi) \]<br />
\[ \text{REML:} \quad l_R(G, R) = -\frac{1}{2} \log |V| - \frac{1}{2} \log |X' V^{-1} X| - \frac{1}{2} r' V^{-1} r - \frac{n - p}{2} \log(2\pi) \]<br />
where $r = y - X(X' V^{-1} X)^{-} X' V^{-1} y$ and $p$ is the rank of X. PROC <strong>MIXED</strong> actually minimizes<br />
$-2$ times these functions by using a ridge-stabilized Newton-Raphson algorithm. Lindstrom and<br />
Bates (1988) provide reasons for preferring Newton-Raphson to the expectation-maximization (EM)<br />
algorithm described in Dempster, Laird, and Rubin (1977) and Laird, Lange, and Stram (1987), as<br />
well as analytical details for implementing a QR-decomposition approach to the problem. Wolfinger,<br />
Tobias, and Sall (1994) present the sweep-based algorithms that are implemented in PROC<br />
<strong>MIXED</strong>.<br />
One advantage of using the Newton-Raphson algorithm is that the second derivative matrix of the<br />
objective function evaluated at the optima is available upon completion. Denoting this matrix H,<br />
the asymptotic theory of maximum likelihood (see Serfling 1980) shows that $2H^{-1}$ is an asymptotic<br />
variance-covariance matrix of the estimated parameters of G and R. Thus, tests and confidence<br />
intervals based on asymptotic normality can be obtained. However, these can be unreliable in small<br />
samples, especially for parameters such as variance components that have sampling distributions<br />
that tend to be skewed to the right.<br />
If a residual variance $\sigma^2$ is a part of your mixed model, it can usually be profiled out of the likelihood.<br />
This means solving analytically for the optimal $\sigma^2$ and plugging this expression back into the<br />
likelihood formula (see Wolfinger, Tobias, and Sall 1994). This reduces the number of optimization<br />
parameters by one and can improve convergence properties. PROC <strong>MIXED</strong> profiles the residual<br />
variance out of the log likelihood whenever it appears reasonable to do so. This includes the case<br />
when R equals $\sigma^2 I$ and when it has blocks with a compound symmetry, time series, or spatial structure.<br />
PROC <strong>MIXED</strong> does not profile the log likelihood when R has unstructured blocks, when you<br />
use the HOLD= or NOITER option in the PARMS statement, or when you use the NOPROFILE<br />
option in the PROC <strong>MIXED</strong> statement.<br />
Instead of ML or REML, you can use the noniterative MIVQUE0 method to estimate G and R (Rao<br />
1972; LaMotte 1973; Wolfinger, Tobias, and Sall 1994). In fact, by default PROC <strong>MIXED</strong> uses<br />
MIVQUE0 estimates as starting values for the ML and REML procedures. For variance component<br />
models, another estimation method involves equating Type 1, 2, or 3 expected mean squares to<br />
their observed values and solving the resulting system. However, Swallow and Monahan (1984)<br />
present simulation evidence favoring REML and ML over MIVQUE0 and other method-of-moment<br />
estimators.
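<br />
The estimation methods discussed in this section are selected with the METHOD= option in the PROC <strong>MIXED</strong> statement. The following is a minimal sketch rather than a recommended analysis; the data set name growth and its variables (indiv, time, y) are placeholders. The COVTEST and ASYCOV options request the asymptotic standard errors and the asymptotic covariance matrix of the covariance parameter estimates:<br />

```sas
/* Sketch only: "growth" is a hypothetical data set name. */

/* REML (the default method), with asymptotic standard errors and */
/* the asymptotic covariance matrix of the covariance parameters  */
proc mixed data=growth method=reml covtest asycov;
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;

/* Same model with the noniterative MIVQUE0 estimates of G and R */
proc mixed data=growth method=mivque0;
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;
```

Comparing the two sets of covariance parameter estimates gives a rough sense of how far the iterative fit moves from its MIVQUE0 starting values.<br />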
Estimating Fixed and Random Effects in the Mixed Model<br />
ML, REML, MIVQUE0, or Type1–Type3 provide estimates of G and R, which are denoted $\hat{G}$ and<br />
$\hat{R}$, respectively. To obtain estimates of $\beta$ and $\gamma$, the standard method is to solve the mixed model<br />
equations (Henderson 1984):<br />
\[ \begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} =
\begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix} \]<br />
<strong>The</strong> solutions can also be written as<br />
\[ \hat{\beta} = (X'\hat{V}^{-1}X)^{-} X'\hat{V}^{-1}y \]<br />
\[ \hat{\gamma} = \hat{G}Z'\hat{V}^{-1}(y - X\hat{\beta}) \]<br />
and have connections with empirical Bayes estimators (Laird and Ware 1982, Carlin and Louis<br />
1996).<br />
Note that the mixed model equations are extended normal equations and that the preceding expression<br />
assumes that $\hat{G}$ is nonsingular. For the extreme case where the eigenvalues of $\hat{G}$ are very large,<br />
$\hat{G}^{-1}$ contributes very little to the equations and $\hat{\gamma}$ is close to what it would be if $\gamma$ actually contained<br />
fixed-effects parameters. On the other hand, when the eigenvalues of $\hat{G}$ are very small, $\hat{G}^{-1}$ dominates<br />
the equations and $\hat{\gamma}$ is close to 0. For intermediate cases, $\hat{G}^{-1}$ can be viewed as shrinking the<br />
fixed-effects estimates of $\gamma$ toward 0 (Robinson 1991).<br />
If $\hat{G}$ is singular, then the mixed model equations are modified (Henderson 1984) as follows:<br />
\[ \begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z\hat{L} \\ \hat{L}'Z'\hat{R}^{-1}X & \hat{L}'Z'\hat{R}^{-1}Z\hat{L} + I \end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{\tau} \end{bmatrix} =
\begin{bmatrix} X'\hat{R}^{-1}y \\ \hat{L}'Z'\hat{R}^{-1}y \end{bmatrix} \]<br />
where $\hat{L}$ is the lower-triangular Cholesky root of $\hat{G}$, satisfying $\hat{G} = \hat{L}\hat{L}'$. Both $\hat{\tau}$ and a generalized<br />
inverse of the left-hand-side coefficient matrix are then transformed by using $\hat{L}$ to determine $\hat{\gamma}$.<br />
An example of when the singular form of the equations is necessary is when a variance component<br />
estimate falls on the boundary constraint of 0.<br />
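<br />
In practice the solutions of the mixed model equations are requested with the SOLUTION option. The following sketch assumes a hypothetical data set named growth with variables indiv, time, and y; the output data set names fixed_est and random_est are likewise illustrative:<br />

```sas
/* Sketch only: "growth", "fixed_est", and "random_est" are placeholder names. */
proc mixed data=growth;
   class indiv;
   model y = time / solution;                  /* print the fixed-effects estimates      */
   random intercept / subject=indiv solution;  /* print the predicted random effects     */
   ods output SolutionF=fixed_est              /* save the fixed-effects solution        */
              SolutionR=random_est;            /* save the random-effects solution       */
run;
```

The random-effects solutions are shrunken toward 0 relative to what separate fixed intercepts would give, in line with the role of $\hat{G}^{-1}$ described above.<br />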
Model Selection<br />
<strong>The</strong> previous section on estimation assumes the specification of a mixed model in terms of X, Z,<br />
G, and R. Even though X and Z have known elements, their specific form and construction are<br />
flexible, and several possibilities can present themselves for a particular data set. Likewise, several<br />
different covariance structures for G and R might be reasonable.<br />
Space does not permit a thorough discussion of model selection, but a few brief comments and<br />
references are in order. First, subject matter considerations and objectives are of great importance<br />
when selecting a model; see Diggle (1988) and Lindsey (1993).<br />
Second, when the data themselves are looked to for guidance, many of the graphical methods and<br />
diagnostics appropriate for the general linear model extend to the mixed model setting as well<br />
(Christensen, Pearson, and Johnson 1992).<br />
Finally, a likelihood-based approach to the mixed model provides several statistical measures for<br />
model adequacy as well. <strong>The</strong> most common of these are the likelihood ratio test and Akaike’s and<br />
Schwarz’s criteria (Bozdogan 1987; Wolfinger 1993; Keselman et al. 1998, 1999).<br />
Statistical Properties<br />
If G and R are known, $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of $\beta$, and $\hat{\gamma}$ is the best linear<br />
unbiased predictor (BLUP) of $\gamma$ (Searle 1971; Harville 1988, 1990; Robinson 1991; McLean,<br />
Sanders, and Stroup 1991). Here, “best” means minimum mean squared error. <strong>The</strong> covariance<br />
matrix of $(\hat{\beta} - \beta, \hat{\gamma} - \gamma)$ is<br />
\[ C = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}^{-} \]<br />
where the superscript $-$ denotes a generalized inverse (see Searle 1971).<br />
However, G and R are usually unknown and are estimated by using one of the aforementioned<br />
methods. <strong>The</strong>se estimates, $\hat{G}$ and $\hat{R}$, are therefore simply substituted into the preceding expression<br />
to obtain<br />
\[ \hat{C} = \begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix}^{-} \]<br />
as the approximate variance-covariance matrix of $(\hat{\beta} - \beta, \hat{\gamma} - \gamma)$. In this case, the BLUE and BLUP<br />
acronyms no longer apply, but the word empirical is often added to indicate such an approximation.<br />
<strong>The</strong> appropriate acronyms thus become EBLUE and EBLUP.<br />
McLean and Sanders (1988) show that $\hat{C}$ can also be written as<br />
\[ \hat{C} = \begin{bmatrix} \hat{C}_{11} & \hat{C}_{21}' \\ \hat{C}_{21} & \hat{C}_{22} \end{bmatrix} \]<br />
where<br />
\[ \hat{C}_{11} = (X'\hat{V}^{-1}X)^{-} \]<br />
\[ \hat{C}_{21} = -\hat{G}Z'\hat{V}^{-1}X\hat{C}_{11} \]<br />
\[ \hat{C}_{22} = (Z'\hat{R}^{-1}Z + \hat{G}^{-1})^{-1} - \hat{C}_{21}X'\hat{V}^{-1}Z\hat{G} \]<br />
Note that $\hat{C}_{11}$ is the familiar estimated generalized least squares formula for the variance-covariance<br />
matrix of $\hat{\beta}$.<br />
As a cautionary note, $\hat{C}$ tends to underestimate the true sampling variability of<br />
$(\hat{\beta}, \hat{\gamma})$ because no account is made for the uncertainty in estimating G and R. Although inflation<br />
factors have been proposed (Kackar and Harville 1984; Kass and Steffey 1989; Prasad and<br />
Rao 1990), they tend to be small for data sets that are fairly well balanced. PROC <strong>MIXED</strong> does not<br />
compute any inflation factors by default, but rather accounts for the downward bias by using the<br />
approximate t and F statistics described subsequently. <strong>The</strong> DDFM=KENWARDROGER option<br />
in the MODEL statement prompts PROC <strong>MIXED</strong> to compute a specific inflation factor along with<br />
Satterthwaite-based degrees of freedom.
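<br />
As a minimal sketch of the option just described, the split-plot model from the earlier example could request the adjustment as follows. The data set name splitplot is a placeholder and is not part of the original example:<br />

```sas
/* Sketch only: "splitplot" is a hypothetical data set name. */
proc mixed data=splitplot;
   class a b block;
   model y = a|b / ddfm=kenwardroger;  /* inflated covariance matrix and adjusted df */
   random block a*block;
run;
```

For fairly well balanced data the resulting tests are expected to differ little from the default ones, in keeping with the remark above that the inflation factors tend to be small in that case.<br />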
Inference and Test Statistics<br />
For inferences concerning the covariance parameters in your model, you can use likelihood-based<br />
statistics. One common likelihood-based statistic is the Wald Z, which is computed as the parameter<br />
estimate divided by its asymptotic standard error. <strong>The</strong> asymptotic standard errors are computed from<br />
the inverse of the second derivative matrix of the likelihood with respect to each of the covariance<br />
parameters. <strong>The</strong> Wald Z is valid for large samples, but it can be unreliable for small data sets<br />
and for parameters such as variance components, which are known to have a skewed or bounded<br />
sampling distribution.<br />
A better alternative is the likelihood ratio $\chi^2$ statistic. This statistic compares two covariance models,<br />
one a special case of the other. To compute it, you must run PROC <strong>MIXED</strong> twice, once for<br />
each of the two models, and then subtract the corresponding values of $-2$ times the log likelihoods.<br />
You can use either ML or REML to construct this statistic, which tests whether the full model is<br />
necessary beyond the reduced model.<br />
As long as the reduced model does not occur on the boundary of the covariance parameter space,<br />
the $\chi^2$ statistic computed in this fashion has a large-sample $\chi^2$ distribution with degrees<br />
of freedom equal to the difference in the number of covariance parameters between the two models.<br />
If the reduced model does occur on the boundary of the covariance parameter space, the asymptotic<br />
distribution becomes a mixture of $\chi^2$ distributions (Self and Liang 1987). A common example of<br />
this is when you are testing that a variance component equals its lower boundary constraint of 0.<br />
A final possibility for obtaining inferences concerning the covariance parameters is to simulate or<br />
resample data from your model and construct empirical sampling distributions of the parameters.<br />
<strong>The</strong> <strong>SAS</strong> macro language and the ODS system are useful tools in this regard.<br />
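<br />
The two-run likelihood ratio comparison described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive recipe: the data set growth (three measurements per individual) and the data set names fit_full, fit_red, and lrt are placeholders, and the DATA step assumes that the first row of the FitStatistics table holds $-2$ times the (restricted) log likelihood in a variable named Value. The reduced model (compound symmetry, 2 parameters) is a special case of the full model (unstructured, 6 parameters) and is not on the boundary, so 4 degrees of freedom are used:<br />

```sas
/* Sketch only: data set names are hypothetical. */

/* Full model: unstructured 3 x 3 blocks of R (6 covariance parameters) */
ods output FitStatistics=fit_full;
proc mixed data=growth method=reml;
   class indiv;
   model y = time;
   repeated / type=un subject=indiv;
run;

/* Reduced model: compound symmetry (2 covariance parameters) */
ods output FitStatistics=fit_red;
proc mixed data=growth method=reml;
   class indiv;
   model y = time;
   repeated / type=cs subject=indiv;
run;

/* Difference of the -2 log likelihood values and its chi-square p-value */
data lrt;
   merge fit_full(rename=(Value=m2ll_full))
         fit_red (rename=(Value=m2ll_red));
   if _n_ = 1;                      /* assumed: first row is -2 (Res) Log Likelihood */
   chisq = m2ll_red - m2ll_full;    /* reduced minus full is nonnegative             */
   df    = 6 - 2;                   /* difference in number of covariance parameters */
   p     = 1 - probchi(chisq, df);
run;

proc print data=lrt noobs;
   var m2ll_full m2ll_red chisq df p;
run;
```

Both runs use the same fixed effects, so REML likelihoods are comparable here. When the reduced model lies on the boundary, the p-value from this sketch is not appropriate and the mixture distribution mentioned above applies instead.<br />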
F and t Tests for Fixed- and Random-Effects Parameters<br />
For inferences concerning the fixed- and random-effects parameters in the mixed model, consider<br />
estimable linear combinations of the following form:<br />
\[ L \begin{bmatrix} \beta \\ \gamma \end{bmatrix} \]<br />
<strong>The</strong> estimability requirement (Searle 1971) applies only to the $\beta$ portion of L, because any linear<br />
combination of $\gamma$ is estimable. Such a formulation in terms of a general L matrix encompasses a<br />
wide variety of common inferential procedures such as those employed with Type 1–Type 3 tests<br />
and LS-means. <strong>The</strong> CONTRAST and ESTIMATE statements in PROC <strong>MIXED</strong> enable you to<br />
specify your own L matrices. Typically, inference on fixed effects is the focus, and, in this case, the<br />
$\gamma$ portion of L is assumed to contain all 0s.<br />
Statistical inferences are obtained by testing the hypothesis<br />
\[ H\colon L \begin{bmatrix} \beta \\ \gamma \end{bmatrix} = 0 \]<br />
or by constructing point and interval estimates.
When L consists of a single row, a general t statistic can be constructed as follows (see McLean and<br />
Sanders 1988, Stroup 1989a):<br />
\[ t = \frac{L \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}}{\sqrt{L\hat{C}L'}} \]<br />
Under the assumed normality of $\gamma$ and $\epsilon$, $t$ has an exact t distribution only for data exhibiting<br />
certain types of balance and for some special unbalanced cases. In general, $t$ is only approximately<br />
t-distributed, and its degrees of freedom must be estimated. See the DDFM= option for a description<br />
of the various degrees-of-freedom methods available in PROC <strong>MIXED</strong>.<br />
With $\hat{\nu}$ being the approximate degrees of freedom, the associated confidence interval is<br />
\[ L \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} \pm t_{\hat{\nu}, \alpha/2} \sqrt{L\hat{C}L'} \]<br />
where $t_{\hat{\nu}, \alpha/2}$ is the $(1 - \alpha/2)100$th percentile of the $t_{\hat{\nu}}$ distribution.<br />
When the rank of L is greater than 1, PROC <strong>MIXED</strong> constructs the following general F statistic:<br />
\[ F = \frac{\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}' L' (L\hat{C}L')^{-1} L \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}}{r} \]<br />
where $r = \mathrm{rank}(L\hat{C}L')$. Analogous to $t$, $F$ in general has an approximate F distribution with $r$<br />
numerator degrees of freedom and $\hat{\nu}$ denominator degrees of freedom.<br />
<strong>The</strong> t and F statistics enable you to make inferences about your fixed effects, which account for<br />
the variance-covariance model you select. An alternative is the $\chi^2$ statistic associated with the<br />
likelihood ratio test. This statistic compares two fixed-effects models, one a special case of the<br />
other. It is computed just as when comparing different covariance models, although you should use<br />
ML and not REML here because the penalty term associated with restricted likelihoods depends<br />
upon the fixed-effects specification.<br />
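<br />
The single-row and multiple-row cases of L described above correspond to the ESTIMATE and CONTRAST statements. The following sketch uses the growth curve model from earlier in this section; the data set name growth and the labels are illustrative assumptions:<br />

```sas
/* Sketch only: "growth" is a hypothetical data set name. */
proc mixed data=growth;
   class indiv;
   model y = time / solution ddfm=satterth;
   random intercept / subject=indiv;

   /* Single-row L = (1 2): approximate t test and confidence limits */
   estimate 'mean response at time 2' intercept 1 time 2 / cl alpha=0.05;

   /* Two-row L: approximate F test with 2 numerator degrees of freedom */
   contrast 'intercept = 0 and slope = 0' intercept 1,
                                          time 1;
run;
```

Both statements here involve only fixed effects, so the random-effects portion of L is all 0s, which is the typical case mentioned above.<br />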
F Tests With the ANOVAF Option<br />
<strong>The</strong> ANOVAF option computes F tests by the following method in models with a REPEATED statement<br />
and without a RANDOM statement. Let L denote the matrix of estimable functions for the<br />
hypothesis $H\colon L\beta = 0$, where $\beta$ are the fixed-effects parameters. Let $M = L'(LL')^{-}L$, and<br />
suppose that $\hat{C}$ denotes the estimated variance-covariance matrix of $\hat{\beta}$ (see the section “Statistical<br />
Properties” for the construction of $\hat{C}$).<br />
<strong>The</strong> ANOVAF F statistics are computed as<br />
\[ F_A = \frac{\hat{\beta}' L' (LL')^{-1} L \hat{\beta}}{t_1} = \frac{\hat{\beta}' M \hat{\beta}}{t_1} \]<br />
Notice that this is a modification of the usual F statistic where $(L\hat{C}L')^{-1}$ is replaced with $(LL')^{-1}$<br />
and rank(L) is replaced with $t_1 = \mathrm{trace}(M\hat{C})$; see, for example, Brunner, Domhof, and Langer
(2002, Sec. 5.4). <strong>The</strong> p-values for this statistic are computed from either an $F_{\nu_1, \nu_2}$ or an $F_{\nu_1, \infty}$<br />
distribution. <strong>The</strong> respective degrees of freedom are determined by the <strong>MIXED</strong> procedure as follows:<br />
\[ \nu_1 = \frac{t_1^2}{\mathrm{trace}(M\hat{C}M\hat{C})} \]<br />
\[ \nu_2^{*} = \frac{2 t_1^2}{g'Ag} \]<br />
\[ \nu_2 = \begin{cases} \max\{\min\{\nu_2^{*}, df_e\}, 1\} & g'Ag > \text{1E3} \times \text{MACEPS} \\ 1 & \text{otherwise} \end{cases} \]<br />
<strong>The</strong> term $g'Ag$ in the term $\nu_2^{*}$ for the denominator degrees of freedom is based on approximating<br />
$\mathrm{Var}[\mathrm{trace}(M\hat{C})]$ based on a first-order Taylor series about the true covariance parameters. This<br />
generalizes results in the appendix of Brunner, Dette, and Munk (1997) to a broader class of models.<br />
<strong>The</strong> vector $g = [g_1, \ldots, g_q]$ contains the partial derivatives<br />
\[ \mathrm{trace}\left( L'(LL')^{-1}L \, \frac{\partial \hat{C}}{\partial \theta_i} \right) \]<br />
and A is the asymptotic variance-covariance matrix of the covariance parameter estimates<br />
(ASYCOV option).<br />
PROC <strong>MIXED</strong> reports $\nu_1$ and $\nu_2$ as “NumDF” and “DenDF” under the “ANOVA F” heading in the<br />
output. <strong>The</strong> corresponding p-values are denoted as “Pr > F(DDF)” for $F_{\nu_1, \nu_2}$ and “Pr > F(infty)”<br />
for $F_{\nu_1, \infty}$, respectively.<br />
P-values computed with the ANOVAF option can be identical to the nonparametric tests in Akritas,<br />
Arnold, and Brunner (1997) and in Brunner, Domhof, and Langer (2002), provided that the<br />
response data consist of properly created (and sorted) ranks and that the covariance parameters are<br />
estimated by MIVQUE0 in models with a REPEATED statement and properly chosen SUBJECT=<br />
and/or GROUP= effects.<br />
If you model an unstructured covariance matrix in a longitudinal model with one or more repeated<br />
factors, the ANOVAF results are identical to a MANOVA where degrees of freedom<br />
are corrected with the Greenhouse-Geisser adjustment (Greenhouse and Geisser 1959). For example,<br />
suppose that factor A has 2 levels and factor B has 4 levels. <strong>The</strong> following two sets of statements<br />
produce the same p-values:<br />
proc mixed data=MyData anovaf method=mivque0;<br />
class id A B;<br />
model score = A | B / chisq;<br />
repeated / type=un subject=id;<br />
ods select Tests3;<br />
run;<br />
/* Reshape to one row per subject (col1, col2, ...) for PROC GLM */<br />
proc transpose data=MyData out=tdata;<br />
by id;<br />
var score;<br />
run;<br />
proc glm data=tdata;<br />
model col: = / nouni;<br />
repeated A 2, B 4;<br />
ods output ModelANOVA=maov epsilons=eps;<br />
run;<br />
/* Keep only the Greenhouse-Geisser epsilon estimates */<br />
proc transpose data=eps(where=(substr(statistic,1,3)='Gre')) out=teps;<br />
var cvalue1;<br />
run;<br />
/* Attach the epsilons and compute both sets of p-values */<br />
data aov; set maov;<br />
if (_n_ = 1) then merge teps;<br />
if (Source='A') then do;<br />
pFddf = ProbF;<br />
pFinf = 1 - probchi(df*Fvalue,df);<br />
output;<br />
end; else if (Source='B') then do;<br />
pFddf = ProbFGG;<br />
pFinf = 1 - probchi(df*col1*Fvalue,df*col1);<br />
output;<br />
end; else if (Source='A*B') then do;<br />
pFddf = ProbFGG;<br />
pFinf = 1 - probchi(df*col2*Fvalue,df*col2);<br />
output;<br />
end;<br />
run;<br />
proc print data=aov label noobs;<br />
label Source = 'Effect'<br />
df = 'NumDF'<br />
Fvalue = 'Value'<br />
pFddf = 'Pr > F(DDF)'<br />
pFinf = 'Pr > F(infty)';<br />
var Source df Fvalue pFddf pFinf;<br />
format pF: pvalue6.;<br />
run;<br />
<strong>The</strong> PROC GLM code produces p-values that correspond to the ANOVAF p-values shown as Pr ><br />
F(DDF) in the <strong>MIXED</strong> output. <strong>The</strong> subsequent DATA step computes the p-values that correspond<br />
to Pr > F(infty) in the PROC <strong>MIXED</strong> output.<br />
Parameterization of Mixed Models<br />
Recall that a mixed model is of the form<br />
\[ y = X\beta + Z\gamma + \epsilon \]<br />
where y represents univariate data, $\beta$ is an unknown vector of fixed effects with known model matrix<br />
X, $\gamma$ is an unknown vector of random effects with known model matrix Z, and $\epsilon$ is an unknown<br />
random error vector.<br />
PROC <strong>MIXED</strong> constructs a mixed model according to the specifications in the MODEL,<br />
RANDOM, and REPEATED statements. Each effect in the MODEL statement generates one or<br />
more columns in the model matrix X, and each effect in the RANDOM statement generates one or<br />
more columns in the model matrix Z. Effects in the REPEATED statement do not generate model
matrices; they serve only to index observations within subjects. This section shows precisely how<br />
PROC <strong>MIXED</strong> builds X and Z.<br />
Intercept<br />
By default, all models automatically include a column of 1s in X to estimate a fixed-effect intercept<br />
parameter $\mu$. You can use the NOINT option in the MODEL statement to suppress this intercept.<br />
<strong>The</strong> NOINT option is useful when you are specifying a classification effect in the MODEL statement<br />
and you want the parameter estimate to be in terms of the mean response for each level of that effect,<br />
rather than in terms of a deviation from an overall mean.<br />
By contrast, the intercept is not included by default in Z. To obtain a column of 1s in Z, you must<br />
specify in the RANDOM statement either the INTERCEPT effect or some effect that has only one<br />
level.<br />
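The two intercept conventions can be seen side by side in a short sketch. The data set name mydata and the variables y, A, and block are placeholders chosen for illustration:<br />

```sas
/* Sketch only: "mydata" and its variables are hypothetical. */
proc mixed data=mydata;
   class A block;
   /* NOINT: no column of 1s in X, so each A estimate is a level mean */
   model y = A / noint solution;
   /* INTERCEPT must be named explicitly to get a column of 1s in Z   */
   random intercept / subject=block;
run;
```

Without the NOINT option, the estimates for A would instead be expressed relative to the overall intercept.<br />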
Regression Effects<br />
Numeric variables, or polynomial terms involving them, can be included in the model as regression<br />
effects (covariates). <strong>The</strong> actual values of such terms are included as columns of the model matrices<br />
X and Z. You can use the bar operator with a regression effect to generate polynomial effects. For<br />
instance, X|X|X expands to X X*X X*X*X, a cubic model.<br />
Main Effects<br />
If a classification variable has m levels, PROC <strong>MIXED</strong> generates m columns in the model matrix<br />
for its main effect. Each column is an indicator variable for a given level. <strong>The</strong> order of the columns<br />
is the sort order of the values of their levels and can be controlled with the ORDER= option in the<br />
PROC <strong>MIXED</strong> statement. Table 56.16 is an example.<br />
Table 56.16 Example of Main Effects<br />
Data | I | A     | B<br />
A B  |   | A1 A2 | B1 B2 B3<br />
1 1  | 1 | 1 0   | 1 0 0<br />
1 2  | 1 | 1 0   | 0 1 0<br />
1 3  | 1 | 1 0   | 0 0 1<br />
2 1  | 1 | 0 1   | 1 0 0<br />
2 2  | 1 | 0 1   | 0 1 0<br />
2 3  | 1 | 0 1   | 0 0 1<br />
Typically, there are more columns for these effects than there are degrees of freedom for them. In<br />
other words, PROC <strong>MIXED</strong> uses an overparameterized model.
Interaction Effects<br />
Often a model includes interaction (crossed) effects. With an interaction, PROC <strong>MIXED</strong> first reorders<br />
the terms to correspond to the order of the variables in the CLASS statement. Thus, B*A<br />
becomes A*B if A precedes B in the CLASS statement. <strong>The</strong>n, PROC <strong>MIXED</strong> generates columns for<br />
all combinations of levels that occur in the data. <strong>The</strong> order of the columns is such that the rightmost<br />
variables in the cross index faster than the leftmost variables (Table 56.17). Empty columns (that<br />
would contain all 0s) are not generated for X, but they are for Z.<br />
Table 56.17 Example of Interaction Effects<br />
Data | I | A     | B        | A*B<br />
A B  |   | A1 A2 | B1 B2 B3 | A1B1 A1B2 A1B3 A2B1 A2B2 A2B3<br />
1 1  | 1 | 1 0   | 1 0 0    | 1 0 0 0 0 0<br />
1 2  | 1 | 1 0   | 0 1 0    | 0 1 0 0 0 0<br />
1 3  | 1 | 1 0   | 0 0 1    | 0 0 1 0 0 0<br />
2 1  | 1 | 0 1   | 1 0 0    | 0 0 0 1 0 0<br />
2 2  | 1 | 0 1   | 0 1 0    | 0 0 0 0 1 0<br />
2 3  | 1 | 0 1   | 0 0 1    | 0 0 0 0 0 1<br />
In the preceding matrix, main-effects columns are not linearly independent of crossed-effects<br />
columns; in fact, the column space for the crossed effects contains the space of the main effect.<br />
When your model contains many interaction effects, you might be able to code them more parsimoniously<br />
by using the bar operator ( | ). <strong>The</strong> bar operator generates all possible interaction effects.<br />
For example, A|B|C expands to A B A*B C A*C B*C A*B*C. To eliminate higher-order interaction<br />
effects, use the at sign (@) in conjunction with the bar operator. For instance, A|B|C|D @2 expands<br />
to A B A*B C A*C B*C D A*D B*D C*D.<br />
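<br />
The bar and at-sign operators described above can be used directly in a MODEL statement. In this minimal sketch, the data set name mydata and the response y are placeholders:<br />

```sas
/* Sketch only: "mydata" and "y" are hypothetical names. */
proc mixed data=mydata;
   class A B C D;
   /* Expands to A B A*B C A*C B*C D A*D B*D C*D:   */
   /* all main effects and two-factor interactions  */
   model y = A|B|C|D@2;
run;
```

Writing the ten effects out explicitly specifies the same model; the operators only shorten the specification.<br />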
Nested Effects<br />
Nested effects are generated in the same manner as crossed effects. Hence, the design columns generated<br />
by the following two statements are the same (but the ordering of the columns is different):<br />
model Y=A B(A);<br />
model Y=A A*B;<br />
<strong>The</strong> nesting operator in PROC <strong>MIXED</strong> is more a notational convenience than an operation distinct<br />
from crossing. Nested effects are typically characterized by the property that the nested variables<br />
never appear as main effects. <strong>The</strong> order of the variables within nesting parentheses is made to<br />
correspond to the order of these variables in the CLASS statement. <strong>The</strong> order of the columns is<br />
such that variables outside the parentheses index faster than those inside the parentheses, and the<br />
rightmost nested variables index faster than the leftmost variables (Table 56.18).
Table 56.18 Example of Nested Effects<br />
Data I A B(A)<br />
A B A1 A2 B1A1 B2A1 B3A1 B1A2 B2A2 B3A2<br />
1 1 1 1 0 1 0 0 0 0 0<br />
1 2 1 1 0 0 1 0 0 0 0<br />
1 3 1 1 0 0 0 1 0 0 0<br />
2 1 1 0 1 0 0 0 1 0 0<br />
2 2 1 0 1 0 0 0 0 1 0<br />
2 3 1 0 1 0 0 0 0 0 1<br />
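As a rough illustration of the column ordering in Table 56.18, the following Python sketch builds the labels and one design row for the model with A and B(A). The helper names are hypothetical and the code is not part of SAS; it only mirrors the rule that the variable outside the parentheses (B) indexes faster than the variable inside (A).

```python
def nested_labels(a_levels, b_levels):
    """Column labels for B(A): B, outside the parentheses, varies fastest."""
    return [f"B{b}A{a}" for a in a_levels for b in b_levels]


def nested_design_row(a, b, a_levels, b_levels):
    """Design row [I, A columns, B(A) columns] for one observation."""
    row = [1]                                           # intercept
    row += [int(a == lvl) for lvl in a_levels]          # A1, A2, ...
    row += [int((a, b) == (la, lb))                     # B(A) indicators
            for la in a_levels for lb in b_levels]
    return row


print(nested_labels([1, 2], [1, 2, 3]))
print(nested_design_row(2, 1, [1, 2], [1, 2, 3]))
```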
Note that nested effects are often distinguished from interaction effects by the implied randomization<br />
structure of the design. That is, they usually indicate random effects within a fixed-effects<br />
framework. <strong>The</strong> fact that random effects can be modeled directly in the RANDOM statement might<br />
make the specification of nested effects in the MODEL statement unnecessary.<br />
Continuous-Nesting-Class Effects<br />
When a continuous variable nests with a classification variable, the design columns are constructed<br />
by multiplying the continuous values into the design columns for the class effect (Table 56.19).<br />
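Table 56.19 Example of Continuous-Nesting-Class Effects<br />
Columns: data (X, A); intercept (I); effect A (A1, A2); effect X(A) (X(A1), X(A2))<br />
X | A | I | A1 | A2 | X(A1) | X(A2)<br />
21 | 1 | 1 | 1 | 0 | 21 | 0<br />
24 | 1 | 1 | 1 | 0 | 24 | 0<br />
22 | 1 | 1 | 1 | 0 | 22 | 0<br />
28 | 2 | 1 | 0 | 1 | 0 | 28<br />
19 | 2 | 1 | 0 | 1 | 0 | 19<br />
23 | 2 | 1 | 0 | 1 | 0 | 23<br />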
This model estimates a separate slope for X within each level of A.<br />
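The construction of the X(A) columns can be sketched as follows. This is a minimal Python illustration with a hypothetical helper name, assuming the column layout of Table 56.19 (intercept, class indicators, then the continuous value multiplied into each indicator).

```python
def continuous_nested_row(x, a, a_levels):
    """Design row [I, A columns, X(A) columns] for one observation."""
    indicators = [int(a == lvl) for lvl in a_levels]
    # Multiplying x into each class indicator gives one slope column per level.
    return [1] + indicators + [x * d for d in indicators]


print(continuous_nested_row(21, 1, [1, 2]))
print(continuous_nested_row(28, 2, [1, 2]))
```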
Continuous-by-Class Effects<br />
Continuous-by-class effects generate the same design columns as continuous-nesting-class effects.<br />
<strong>The</strong> two models are made different by the presence of the continuous variable as a regressor by<br />
itself, as well as a contributor to a compound effect. Table 56.20 shows an example.<br />
Table 56.20 Example of Continuous-by-Class Effects<br />
Columns: data (X, A); intercept (I); effect X; effect A (A1, A2); effect X*A (X*A1, X*A2)<br />
X | A | I | X | A1 | A2 | X*A1 | X*A2<br />
21 | 1 | 1 | 21 | 1 | 0 | 21 | 0<br />
24 | 1 | 1 | 24 | 1 | 0 | 24 | 0<br />
22 | 1 | 1 | 22 | 1 | 0 | 22 | 0<br />
28 | 2 | 1 | 28 | 0 | 1 | 0 | 28<br />
19 | 2 | 1 | 19 | 0 | 1 | 0 | 19<br />
23 | 2 | 1 | 23 | 0 | 1 | 0 | 23<br />
You can use continuous-by-class effects to test for homogeneity of slopes.<br />
General Effects<br />
An example that combines all the effects is X1*X2*A*B*C(D E). The continuous list comes first,<br />
followed by the crossed list, followed by the nested list in parentheses. You should be aware of<br />
the sequencing of parameters when you use the CONTRAST or ESTIMATE statement to compute<br />
some function of the parameter estimates.<br />
Effects might be renamed by PROC <strong>MIXED</strong> to correspond to ordering rules. For example, B*A(E<br />
D) might be renamed A*B(D E) to satisfy the following:<br />
• Classification variables that occur outside parentheses (crossed effects) are sorted in the order in which they appear in the CLASS statement.<br />
• Variables within parentheses (nested effects) are sorted in the order in which they appear in the CLASS statement.<br />
<strong>The</strong> sequencing of the parameters generated by an effect can be described by which variables have<br />
their levels indexed faster:<br />
• Variables in the crossed list index faster than variables in the nested list.<br />
• Within a crossed or nested list, variables to the right index faster than variables to the left.<br />
For example, suppose a model includes four effects—A, B, C, and D—each having two levels, 1 and<br />
2. Suppose the CLASS statement is as follows:<br />
class A B C D;<br />
Then the order of the parameters for the effect B*A(C D), which is renamed A*B(C D), is<br />
A1B1C1D1 → A1B2C1D1 → A2B1C1D1 → A2B2C1D1 →<br />
A1B1C1D2 → A1B2C1D2 → A2B1C1D2 → A2B2C1D2 →<br />
A1B1C2D1 → A1B2C2D1 → A2B1C2D1 → A2B2C2D1 →<br />
A1B1C2D2 → A1B2C2D2 → A2B1C2D2 → A2B2C2D2<br />
Note that first the crossed effects B and A are sorted in the order in which they appear in the CLASS<br />
statement so that A precedes B in the parameter list. <strong>The</strong>n, for each combination of the nested effects<br />
in turn, combinations of A and B appear. <strong>The</strong> B effect moves fastest because it is rightmost in the<br />
cross list. <strong>The</strong>n A moves next fastest, and D moves next fastest. <strong>The</strong> C effect is the slowest since it<br />
is leftmost in the nested list.<br />
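The indexing rule can be reproduced with a short Python sketch. It assumes the crossed and nested lists have already been sorted into CLASS-statement order; the function name `parameter_order` is hypothetical.

```python
from itertools import product


def parameter_order(crossed, nested, levels):
    """Parameter labels for an effect crossed(nested), slowest index first."""
    # itertools.product varies its rightmost argument fastest, so placing the
    # nested variables first makes them the slow indices.
    slow_to_fast = nested + crossed
    labels = []
    for combo in product(*(levels[v] for v in slow_to_fast)):
        value = dict(zip(slow_to_fast, combo))
        labels.append("".join(f"{v}{value[v]}" for v in crossed + nested))
    return labels


order = parameter_order(["A", "B"], ["C", "D"], {v: [1, 2] for v in "ABCD"})
print(" ".join(order[:5]))
```

The first five labels match the start of the sequence listed above: B changes first, then A, then D.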
When numeric levels are used, levels are sorted by their character format, which might not correspond<br />
to their numeric sort sequence (for example, noninteger levels). <strong>The</strong>refore, it is advisable to<br />
include a desired format for numeric levels or to use the ORDER=INTERNAL option in the PROC<br />
<strong>MIXED</strong> statement to ensure that levels are sorted by their internal values.<br />
Implications of the Non-Full-Rank Parameterization<br />
For models with fixed effects involving classification variables, there are more design columns in<br />
X constructed than there are degrees of freedom for the effect. Thus, there are linear dependencies<br />
among the columns of X. In this event, not all of the parameters are estimable; there is an infinite<br />
number of solutions to the mixed model equations. PROC MIXED uses a generalized inverse (a<br />
$g_2$-inverse; Pringle and Rayner 1971) to obtain values for the estimates (Searle 1971). The solution<br />
values are not displayed unless you specify the SOLUTION option in the MODEL statement. <strong>The</strong><br />
solution has the characteristic that estimates are 0 whenever the design column for that parameter<br />
is a linear combination of previous columns. With this parameterization, hypothesis tests are<br />
constructed to test linear functions of the parameters that are estimable.<br />
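To see which parameters receive a zero estimate under this convention, one can look for design columns that are linear combinations of earlier columns. The following Python sketch does this with exact rational arithmetic for the nested-effects design of Table 56.18. It illustrates the dependency pattern only; it is not the sweep algorithm that PROC MIXED uses, and the function name is hypothetical.

```python
from fractions import Fraction


def dependent_columns(X):
    """Indices of columns that are linear combinations of earlier columns."""
    basis = []       # (pivot row, reduced column) for each independent column
    dependent = []
    for j in range(len(X[0])):
        col = [Fraction(row[j]) for row in X]
        # Eliminate the component along each independent column found so far.
        for pivot, vec in basis:
            if col[pivot] != 0:
                factor = col[pivot] / vec[pivot]
                col = [c - factor * v for c, v in zip(col, vec)]
        pivot = next((i for i, c in enumerate(col) if c != 0), None)
        if pivot is None:
            dependent.append(j)          # nothing left: column is redundant
        else:
            basis.append((pivot, col))
    return dependent


# Columns: I, A1, A2, B1A1, B2A1, B3A1, B1A2, B2A2, B3A2
X = [
    [1, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 1],
]
redundant = dependent_columns(X)
print(redundant)
```

In this design the redundant columns are A2, B3A1, and B3A2, which correspond to the last level of each effect within its group.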
Some procedures (such as the CATMOD procedure) reparameterize models to full rank by using<br />
restrictions on the parameters. PROC GLM and PROC <strong>MIXED</strong> do not reparameterize, making the<br />
hypotheses that are commonly tested more understandable. See Goodnight (1978) for additional<br />
reasons for not reparameterizing.<br />
Missing Level Combinations<br />
PROC <strong>MIXED</strong> handles missing level combinations of classification variables similarly to the way<br />
PROC GLM does. Both procedures delete fixed-effects parameters corresponding to missing levels<br />
in order to preserve estimability. However, PROC <strong>MIXED</strong> does not delete missing level combinations<br />
for random-effects parameters because linear combinations of the random-effects parameters<br />
are always estimable. <strong>The</strong>se conventions can affect the way you specify your CONTRAST and<br />
ESTIMATE coefficients.<br />
Residuals and Influence Diagnostics<br />
Residual Diagnostics<br />
Consider a residual vector of the form $e = PY$, where $P$ is a projection matrix, possibly an oblique<br />
projector. A typical element $e_i$ with variance $v_i$ and estimated variance $\widehat{v}_i$ is said to be standardized as<br />
\[ \frac{e_i}{\sqrt{\mathrm{Var}[e_i]}} = \frac{e_i}{\sqrt{v_i}} \]<br />
and studentized as<br />
\[ \frac{e_i}{\sqrt{\widehat{v}_i}} \]<br />
External studentization uses an estimate of $\mathrm{Var}[e_i]$ that does not involve the $i$th observation. Externally<br />
studentized residuals are often preferred over studentized residuals because they have well-known<br />
distributional properties in standard linear models for independent data.<br />
Residuals that are scaled by the estimated standard deviation of the response, that is, $e_i / \sqrt{\widehat{\mathrm{Var}}[Y_i]}$, are referred<br />
to as Pearson-type residuals.<br />
Marginal and Conditional Residuals<br />
The marginal and conditional means in the linear mixed model are $E[Y] = X\beta$ and $E[Y \mid \gamma] = X\beta + Z\gamma$, respectively. Accordingly, the vector $r_m$ of marginal residuals is defined as<br />
\[ r_m = Y - X\widehat{\beta} \]<br />
and the vector $r_c$ of conditional residuals is<br />
\[ r_c = Y - X\widehat{\beta} - Z\widehat{\gamma} = r_m - Z\widehat{\gamma} \]<br />
Following Gregoire, Schabenberger, and Barrett (1995), let $Q = X(X'\widehat{V}^{-1}X)^{-}X'$ and $K = I - Z\widehat{G}Z'\widehat{V}^{-1}$. Then<br />
\[ \widehat{\mathrm{Var}}[r_m] = \widehat{V} - Q \]<br />
\[ \widehat{\mathrm{Var}}[r_c] = K(\widehat{V} - Q)K' \]<br />
For an individual observation the raw, studentized, and Pearson-type residuals computed by the<br />
<strong>MIXED</strong> procedure are given in Table 56.21.<br />
Table 56.21 Residual Types Computed by the MIXED Procedure<br />
Type of Residual | Marginal | Conditional<br />
Raw | $r_{mi} = Y_i - x_i'\widehat{\beta}$ | $r_{ci} = r_{mi} - z_i'\widehat{\gamma}$<br />
Studentized | $r_{mi}^{\mathrm{student}} = r_{mi} / \sqrt{\widehat{\mathrm{Var}}[r_{mi}]}$ | $r_{ci}^{\mathrm{student}} = r_{ci} / \sqrt{\widehat{\mathrm{Var}}[r_{ci}]}$<br />
Pearson | $r_{mi}^{\mathrm{pearson}} = r_{mi} / \sqrt{\widehat{\mathrm{Var}}[Y_i]}$ | $r_{ci}^{\mathrm{pearson}} = r_{ci} / \sqrt{\widehat{\mathrm{Var}}[Y_i \mid \gamma]}$<br />
When the OUTPM= option is specified in addition to the RESIDUAL option in the MODEL statement,<br />
$r_{mi}^{\mathrm{student}}$ and $r_{mi}^{\mathrm{pearson}}$ are added to the data set as variables StudentResid and PearsonResid,<br />
respectively. When the OUTP= option is specified, $r_{ci}^{\mathrm{student}}$ and $r_{ci}^{\mathrm{pearson}}$ are added to<br />
the data set. Raw residuals (variable Resid) are part of the OUTPM= and OUTP= data sets without the RESIDUAL<br />
option.<br />
Scaled Residuals<br />
For correlated data, a set of scaled quantities can be defined through the Cholesky decomposition<br />
of the variance-covariance matrix. Since fitted residuals in linear models are rank-deficient, it is<br />
customary to draw on the variance-covariance matrix of the data. If $\mathrm{Var}[Y] = V$ and $C'C = V$,<br />
then $(C')^{-1}Y$ has uniform dispersion and its elements are uncorrelated.<br />
Scaled residuals in a mixed model are meaningful for quantities based on the marginal distribution<br />
of the data. Let $\widehat{C}$ denote the Cholesky root of $\widehat{V}$, so that $\widehat{C}'\widehat{C} = \widehat{V}$, and define<br />
\[ Y_c = (\widehat{C}')^{-1} Y \]<br />
\[ r_{m(c)} = (\widehat{C}')^{-1} r_m \]<br />
By analogy with other scalings, the inverse Cholesky decomposition can also be applied to the<br />
residual vector, $(\widehat{C}')^{-1} r_m$, although $V$ is not the variance-covariance matrix of $r_m$.<br />
Diagnosing whether the covariance structure of the model has been specified correctly can be<br />
difficult based on $Y_c$, since the inverse Cholesky transformation affects the expected value of $Y_c$.<br />
You can draw on $r_{m(c)}$ as a vector of (approximately) uncorrelated data with constant mean.<br />
When the OUTPM= option in the MODEL statement is specified in addition to the VCIRY option,<br />
$Y_c$ is added as variable ScaledDep and $r_{m(c)}$ is added as ScaledResid to the data set.<br />
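The effect of the inverse Cholesky scaling can be checked numerically. The Python sketch below uses a small, made-up covariance matrix and hypothetical helper names; it works with the lower-triangular factor L (so that the Cholesky root in the text is C = L') and confirms that the scaled data have an identity covariance matrix.

```python
import math


def cholesky_lower(V):
    """Lower-triangular L with L L' = V, so that C = L' satisfies C'C = V."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L


def forward_solve(L, y):
    """Solve L z = y, that is, z = C'^{-1} y, by forward substitution."""
    z = []
    for i, yi in enumerate(y):
        z.append((yi - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def scaled_covariance(V):
    """Var[C'^{-1} Y] = C'^{-1} V C^{-1}, which should be the identity."""
    n = len(V)
    L = cholesky_lower(V)
    # Columns of L^{-1} V: one forward solve per column of V.
    cols = [forward_solve(L, [V[i][j] for i in range(n)]) for j in range(n)]
    # A second solve on row i of L^{-1} V gives row i of L^{-1} V L'^{-1}.
    return [forward_solve(L, [cols[j][i] for j in range(n)]) for i in range(n)]


V = [[4.0, 2.0, 0.4], [2.0, 3.0, 0.5], [0.4, 0.5, 2.0]]   # hypothetical values
D = scaled_covariance(V)
```

Applying `forward_solve` to a vector of marginal residuals would give the analogue of the ScaledResid values, under the assumption that the estimated covariance matrix is available as a nested list.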
Influence Diagnostics<br />
Basic Idea and Statistics<br />
<strong>The</strong> general idea of quantifying the influence of one or more observations relies on computing parameter<br />
estimates based on all data points, removing the cases in question from the data, refitting<br />
the model, and computing statistics based on the change between full-data and reduced-data estimation.<br />
Influence statistics can be coarsely grouped by the aspect of estimation that is their primary<br />
target:<br />
• overall measures that compare changes in objective functions: (restricted) likelihood distance (Cook and Weisberg 1982, Ch. 5.2)<br />
• influence on parameter estimates: Cook’s D (Cook 1977, 1979), MDFFITS (Belsley, Kuh, and Welsch 1980, p. 32)<br />
• influence on precision of estimates: CovRatio and CovTrace<br />
• influence on fitted and predicted values: PRESS residual, PRESS statistic (Allen 1974), DFFITS (Belsley, Kuh, and Welsch 1980, p. 15)<br />
• outlier properties: internally and externally studentized residuals, leverage<br />
For linear models for uncorrelated data, it is not necessary to refit the model after removing a<br />
data point in order to measure the impact of an observation on the model. <strong>The</strong> change in fixed<br />
effect estimates, residuals, residual sums of squares, and the variance-covariance matrix of the fixed<br />
effects can be computed based on the fit to the full data alone. By contrast, in mixed models<br />
several important complications arise. Data points can affect not only the fixed effects but also the<br />
covariance parameter estimates on which the fixed-effects estimates depend. Furthermore, closed-form<br />
expressions for computing the change in important model quantities might not be available.<br />
This section provides background material for the various influence diagnostics available with the<br />
MIXED procedure. See the section “Mixed Models Theory” for relevant expressions<br />
and definitions. The parameter vector $\theta$ denotes all unknown parameters in the $R$ and $G$ matrices.<br />
The observations whose influence is being ascertained are represented by the set $U$ and referred to<br />
simply as “the observations in $U$.” The estimate of a parameter vector, such as $\beta$, obtained from<br />
all observations except those in the set $U$ is denoted $\widehat{\beta}_{(U)}$. In case of a matrix $A$, the notation<br />
$A_{(U)}$ represents the matrix with the rows in $U$ removed; these rows are collected in $A_U$. If $A$ is<br />
symmetric, then notation $A_{(U)}$ implies removal of rows and columns. The vector $Y_U$ comprises<br />
the responses of the data points being removed, and $V_{(U)}$ is the variance-covariance matrix of<br />
the remaining observations. When $U$ contains a single observation ($k = 1$), lowercase notation emphasizes that single points are<br />
removed, such as $A_{(u)}$.<br />
Managing the Covariance Parameters<br />
An important component of influence diagnostics in the mixed model is the estimated variance-covariance<br />
matrix $V = ZGZ' + R$. To make the dependence on the vector of covariance parameters<br />
explicit, write it as $V(\theta)$. If one parameter, $\sigma^2$, is profiled or factored out of $V$, the remaining<br />
parameters are denoted as $\theta^*$. Notice that in a model where $G$ is diagonal and $R = \sigma^2 I$, the<br />
parameter vector $\theta^*$ contains the ratios of each variance component and $\sigma^2$ (see Wolfinger, Tobias,<br />
and Sall 1994). When ITER=0, two scenarios are distinguished:<br />
1. If the residual variance is not profiled, either because the model does not contain a residual<br />
variance or because it is part of the Newton-Raphson iterations, then $\widehat{\theta}_{(U)} \equiv \widehat{\theta}$.<br />
2. If the residual variance is profiled, then $\widehat{\theta}^*_{(U)} \equiv \widehat{\theta}^*$ and $\widehat{\sigma}^2_{(U)} \neq \widehat{\sigma}^2$. Influence statistics<br />
such as Cook’s D and internally studentized residuals are based on $V(\widehat{\theta})$, whereas externally<br />
studentized residuals and the DFFITS statistic are based on $V(\widehat{\theta}_U) = \widehat{\sigma}^2_{(U)} V(\widehat{\theta}^*)$. In a<br />
random components model with uncorrelated errors, for example, the computation of $V(\widehat{\theta}_U)$<br />
involves scaling of $\widehat{G}$ and $\widehat{R}$ by the full-data estimate $\widehat{\sigma}^2$ and multiplying the result with the<br />
reduced-data estimate $\widehat{\sigma}^2_{(U)}$.<br />
Certain statistics, such as MDFFITS, CovRatio, and CovTrace, require an estimate of the variance<br />
of the fixed effects that is based on the reduced number of observations. For example, $V(\widehat{\theta}_U)$ is<br />
evaluated at the reduced-data parameter estimates but computed for the entire data set. The matrix<br />
$V_{(U)}(\widehat{\theta}_{(U)})$, on the other hand, has rows and columns corresponding to the points in $U$ removed.<br />
The resulting matrix is evaluated at the delete-case estimates.<br />
When influence analysis is iterative, the entire vector $\theta$ is updated, whether the residual variance<br />
is profiled or not. The matrices to be distinguished here are $V(\widehat{\theta})$, $V(\widehat{\theta}_{(U)})$, and $V_{(U)}(\widehat{\theta}_{(U)})$, with<br />
unambiguous notation.<br />
Predicted Values, PRESS Residual, and PRESS Statistic<br />
An unconditional predicted value is $\widehat{y}_i = x_i'\widehat{\beta}$, where the vector $x_i$ is the $i$th row of $X$. The (raw)<br />
residual is given as $\widehat{\epsilon}_i = y_i - \widehat{y}_i$, and the PRESS residual is<br />
\[ \widehat{\epsilon}_{i(U)} = y_i - x_i'\widehat{\beta}_{(U)} \]<br />
The PRESS statistic is the sum of the squared PRESS residuals,<br />
\[ \mathrm{PRESS} = \sum_{i \in U} \widehat{\epsilon}^{\,2}_{i(U)} \]<br />
where the sum is over the observations in $U$.<br />
If EFFECT=, SIZE=, or KEEP= is not specified, PROC <strong>MIXED</strong> computes the PRESS residual<br />
for each observation selected through SELECT= (or all observations if SELECT= is not given). If<br />
EFFECT=, SIZE=, or KEEP= is specified, the procedure computes PRESS.<br />
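For intuition, the PRESS quantities can be computed by brute force in the simplest setting, a straight-line regression with uncorrelated errors. The Python sketch below refits the model without the observations in U and sums their squared prediction errors. The data and function names are made up for illustration, and the refitting approach is an assumption of this sketch rather than a description of how PROC MIXED obtains the values.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope (the V = sigma^2 I case)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def press_statistic(x, y, removed):
    """Refit without the observations in `removed`, then sum their squared
    prediction errors y_i - x_i' beta_(U)."""
    keep = [i for i in range(len(x)) if i not in removed]
    b0, b1 = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in removed)


# Hypothetical data, used only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

press_single = [press_statistic(x, y, [i]) for i in range(len(x))]  # one point at a time
press_pair = press_statistic(x, y, [0, 5])                          # a set of two points
```

For single-point deletion in this uncorrelated case, each squared PRESS residual equals the squared raw residual divided by (1 - h_ii) squared, which is why no refit is strictly needed there.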
Leverage<br />
For the general mixed model, leverage can be defined through the projection matrix that results from<br />
a transformation of the model with the inverse of the Cholesky decomposition of V, or through an<br />
oblique projector. <strong>The</strong> <strong>MIXED</strong> procedure follows the latter path in the computation of influence<br />
diagnostics. <strong>The</strong> leverage value reported for the ith observation is the ith diagonal entry of the<br />
matrix<br />
\[ H = X (X' V(\widehat{\theta})^{-1} X)^{-} X' V(\widehat{\theta})^{-1} \]<br />
which is the weight of the observation in contributing to its own predicted value, $H = d\widehat{Y}/dY$.<br />
While $H$ is idempotent, it is generally not symmetric and thus not a projection matrix in the narrow<br />
sense.<br />
The properties of these leverages are generalizations of the properties in models with diagonal<br />
variance-covariance matrices. For example, $\widehat{Y} = HY$, and in a model with intercept and $V = \sigma^2 I$,<br />
the leverage values<br />
\[ h_{ii} = x_i' (X'X)^{-} x_i \]<br />
are $h_{ii}^{l} = 1/n \le h_{ii} \le 1 = h_{ii}^{u}$ and $\sum_{i=1}^{n} h_{ii} = \mathrm{rank}(X)$. The lower bound for $h_{ii}$ is achieved<br />
in an intercept-only model, and the upper bound is achieved in a saturated model. The trace of $H$<br />
equals the rank of $X$.<br />
If $\nu_{ij}$ denotes the element in row $i$, column $j$ of $V^{-1}$, then for a model containing only an intercept<br />
the diagonal elements of $H$ are<br />
\[ h_{ii} = \frac{\sum_{j=1}^{n} \nu_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \nu_{ij}} \]<br />
Because $\sum_{j=1}^{n} \nu_{ij}$ is a sum of elements in the $i$th row of the inverse variance-covariance matrix, $h_{ii}$<br />
can be negative, even if the correlations among data points are nonnegative. In case of a saturated<br />
model with $X = I$, $h_{ii} = 1.0$.<br />
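A two-observation example makes the possibility of a negative leverage concrete. The Python sketch below evaluates the intercept-only formula with exact fractions for a made-up covariance matrix with variances 1 and 4 and correlation 0.8; the function name is hypothetical.

```python
from fractions import Fraction


def intercept_only_leverage(V_inverse):
    """h_ii = (row sum of V^{-1}) / (sum of all elements of V^{-1})."""
    row_sums = [sum(row) for row in V_inverse]
    total = sum(row_sums)
    return [r / total for r in row_sums]


# Hypothetical covariance matrix: variances 1 and 4, covariance 0.8 * 1 * 2.
V = [[Fraction(1), Fraction(8, 5)], [Fraction(8, 5), Fraction(4)]]
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
# Closed-form inverse of a 2x2 matrix.
V_inverse = [[V[1][1] / det, -V[0][1] / det],
             [-V[1][0] / det, V[0][0] / det]]

h = intercept_only_leverage(V_inverse)
print(h)
```

Here the second observation has leverage -1/3 although the correlation is positive, and the two leverages still sum to rank(X) = 1.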
Internally and Externally Studentized Residuals<br />
See the section “Residual Diagnostics” for the distinction between standardization,<br />
studentization, and scaling of residuals. Internally studentized marginal and conditional residuals<br />
are computed with the RESIDUAL option of the MODEL statement. <strong>The</strong> INFLUENCE option<br />
computes internally and externally studentized marginal residuals.<br />
The computation of internally studentized residuals relies on the diagonal entries of $V(\widehat{\theta}) - Q(\widehat{\theta})$,<br />
where $Q(\widehat{\theta}) = X(X' V(\widehat{\theta})^{-1} X)^{-} X'$. Externally studentized residuals require iterative influence<br />
analysis or a profiled residual variance. In the former case the studentization is based on $V(\widehat{\theta}_U)$; in<br />
the latter case it is based on $\widehat{\sigma}^2_{(U)} V(\widehat{\theta}^*)$.<br />
Cook’s D<br />
Cook’s D statistic is an invariant norm that measures the influence of observations in U on a vector<br />
of parameter estimates (Cook 1977). In case of the fixed-effects coefficients, let<br />
\[ \delta_{(U)} = \widehat{\beta} - \widehat{\beta}_{(U)} \]<br />
Then the MIXED procedure computes<br />
\[ D(\beta) = \delta_{(U)}' \, \widehat{\mathrm{Var}}[\widehat{\beta}]^{-} \, \delta_{(U)} / \mathrm{rank}(X) \]<br />
where $\widehat{\mathrm{Var}}[\widehat{\beta}]^{-}$ is the matrix that results from sweeping $(X' V(\widehat{\theta})^{-1} X)^{-}$.<br />
If $V$ is known, Cook’s D can be calibrated according to a chi-square distribution with degrees of<br />
freedom equal to the rank of $X$ (Christensen, Pearson, and Johnson 1992). For estimated $V$ the<br />
calibration can be carried out according to an $F(\mathrm{rank}(X), n - \mathrm{rank}(X))$ distribution. To interpret D<br />
on a familiar scale, Cook (1979) and Cook and Weisberg (1982, p. 116) refer to the 50th percentile<br />
of the reference distribution. If D is equal to that percentile, then removing the points in U moves<br />
the fixed-effects coefficient vector from the center of the confidence region to the 50% confidence<br />
ellipsoid (Myers 1990, p. 262).<br />
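The definition of D for the fixed effects can be illustrated in the simplest case, a straight-line regression with uncorrelated errors, where the estimated covariance matrix of the coefficients is the residual variance times the inverse of X'X. The Python sketch below refits the model without the chosen observations and evaluates the quadratic form directly. Data and function names are hypothetical, and the refit is a device of this sketch only.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope (the V = sigma^2 I case)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def cooks_d(x, y, removed):
    """D(beta) for deleting the observations whose indices are in `removed`."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    keep = [i for i in range(n) if i not in removed]
    r0, r1 = fit_line([x[i] for i in keep], [y[i] for i in keep])
    d0, d1 = b0 - r0, b1 - r1                       # delta_(U)
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    sx, sx2 = sum(x), sum(xi * xi for xi in x)
    # Var[beta-hat] = s2 (X'X)^{-1} here, so the quadratic form is
    # delta' (X'X) delta / s2 with X'X = [[n, sx], [sx, sx2]].
    quad = (n * d0 * d0 + 2 * sx * d0 * d1 + sx2 * d1 * d1) / s2
    return quad / 2                                 # rank(X) = 2


# Hypothetical data; the last point has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 14.0]
d_values = [cooks_d(x, y, [i]) for i in range(len(x))]
```

Each value could then be compared with the 50th percentile of the F reference distribution mentioned above.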
In the case of iterative influence analysis, the <strong>MIXED</strong> procedure also computes a D-type statistic<br />
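for the covariance parameters. If $\Gamma$ is the asymptotic variance-covariance matrix of $\widehat{\theta}$, then MIXED<br />
computes<br />
\[ D_\theta = (\widehat{\theta} - \widehat{\theta}_{(U)})' \, \widehat{\Gamma}^{-1} (\widehat{\theta} - \widehat{\theta}_{(U)}) \]<br />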
DFFITS and MDFFITS<br />
A DFFIT measures the change in predicted values due to removal of data points. If this change is<br />
standardized by the externally estimated standard error of the predicted value in the full data, the<br />
DFFITS statistic of Belsley, Kuh, and Welsch (1980, p. 15) results:<br />
\[ \mathrm{DFFITS}_i = (\widehat{y}_i - \widehat{y}_{i(u)}) / \mathrm{ese}(\widehat{y}_i) \]<br />
The MIXED procedure computes DFFITS when the EFFECT= or SIZE= modifier of the<br />
INFLUENCE option is not in effect. In general, an external estimate of the standard<br />
error is used. When ITER > 0, the estimate is<br />
\[ \mathrm{ese}(\widehat{y}_i) = \sqrt{x_i' (X' V(\widehat{\theta}_{(u)})^{-1} X)^{-} x_i} \]<br />
When ITER=0 and $\sigma^2$ is profiled, then<br />
\[ \mathrm{ese}(\widehat{y}_i) = \widehat{\sigma}_{(u)} \sqrt{x_i' (X' V(\widehat{\theta}^*)^{-1} X)^{-} x_i} \]<br />
When the EFFECT=, SIZE=, or KEEP= modifier is specified, the <strong>MIXED</strong> procedure computes a<br />
multivariate version suitable for the deletion of multiple data points. <strong>The</strong> statistic, termed MDFFITS<br />
after the MDFFIT statistic of Belsley, Kuh, and Welsch (1980, p. 32), is closely related to Cook’s<br />
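D. Consider the case $V = \sigma^2 V(\theta^*)$ so that<br />
\[ \mathrm{Var}[\widehat{\beta}] = \sigma^2 (X' V(\theta^*)^{-1} X)^{-} \]<br />
and let $\widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}]$ be an estimate of $\mathrm{Var}[\widehat{\beta}_{(U)}]$ that does not use the observations in $U$. The MDFFITS<br />
statistic is then computed as<br />
\[ \mathrm{MDFFITS}(\beta) = \delta_{(U)}' \, \widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}]^{-} \, \delta_{(U)} / \mathrm{rank}(X) \]<br />
If ITER=0 and $\sigma^2$ is profiled, then $\widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}]^{-}$ is obtained by sweeping<br />
\[ \widehat{\sigma}^{-2}_{(U)} (X_{(U)}' V_{(U)}(\widehat{\theta}^*)^{-} X_{(U)})^{-} \]<br />
The underlying idea is that if $\theta^*$ were known, then<br />
\[ (X_{(U)}' V_{(U)}(\theta^*)^{-1} X_{(U)})^{-} \]<br />
would be $\mathrm{Var}[\widehat{\beta}] / \sigma^2$ in a generalized least squares regression with all but the data in $U$.<br />
In the case of iterative influence analysis, $\widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}]$ is evaluated at $\widehat{\theta}_{(U)}$. Furthermore, a MDFFITS-type<br />
statistic is then computed for the covariance parameters:<br />
\[ \mathrm{MDFFITS}(\theta) = (\widehat{\theta} - \widehat{\theta}_{(U)})' \, \widehat{\mathrm{Var}}[\widehat{\theta}_{(U)}]^{-1} (\widehat{\theta} - \widehat{\theta}_{(U)}) \]<br />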
Covariance Ratio and Trace<br />
These statistics depend on the availability of an external estimate of $V$, or at least of $\sigma^2$. Whereas<br />
Cook’s D and MDFFITS measure the impact of data points on a vector of parameter estimates, the<br />
covariance-based statistics measure impact on their precision. Following Christensen, Pearson, and<br />
Johnson (1992), the MIXED procedure computes<br />
\[ \mathrm{CovTrace}(\beta) = \left| \mathrm{trace}\left( \widehat{\mathrm{Var}}[\widehat{\beta}]^{-} \, \widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}] \right) - \mathrm{rank}(X) \right| \]<br />
\[ \mathrm{CovRatio}(\beta) = \frac{\det_{ns}(\widetilde{\mathrm{Var}}[\widehat{\beta}_{(U)}])}{\det_{ns}(\widehat{\mathrm{Var}}[\widehat{\beta}])} \]<br />
where $\det_{ns}(M)$ denotes the determinant of the nonsingular part of matrix $M$.<br />
In the case of iterative influence analysis these statistics are also computed for the covariance parameter<br />
estimates. If $q$ denotes the rank of $\mathrm{Var}[\widehat{\theta}]$, then<br />
\[ \mathrm{CovTrace}(\theta) = \left| \mathrm{trace}\left( \widehat{\mathrm{Var}}[\widehat{\theta}]^{-} \, \widehat{\mathrm{Var}}[\widehat{\theta}_{(U)}] \right) - q \right| \]<br />
\[ \mathrm{CovRatio}(\theta) = \frac{\det_{ns}(\widehat{\mathrm{Var}}[\widehat{\theta}_{(U)}])}{\det_{ns}(\widehat{\mathrm{Var}}[\widehat{\theta}])} \]<br />
Likelihood Distances<br />
The log-likelihood function $l$ and restricted log-likelihood function $l_R$ of the linear mixed model<br />
are given in the section “Estimating Covariance Parameters in the Mixed Model.”<br />
Denote as $\psi$ the collection of all parameters, that is, the fixed effects $\beta$ and the covariance parameters<br />
$\theta$. Twice the difference between the (restricted) log likelihood evaluated at the full-data estimates<br />
$\widehat{\psi}$ and at the reduced-data estimates $\widehat{\psi}_{(U)}$ is known as the (restricted) likelihood distance:<br />
\[ \mathrm{RLD}_{(U)} = 2\{ l_R(\widehat{\psi}) - l_R(\widehat{\psi}_{(U)}) \} \]<br />
\[ \mathrm{LD}_{(U)} = 2\{ l(\widehat{\psi}) - l(\widehat{\psi}_{(U)}) \} \]<br />
Cook and Weisberg (1982, Ch. 5.2) refer to these differences as likelihood distances; Beckman,<br />
Nachtsheim, and Cook (1987) call the measures likelihood displacements. If the number of elements<br />
in $\psi$ that are subject to updating following point removal is $q$, then likelihood displacements can be<br />
compared against cutoffs from a chi-square distribution with $q$ degrees of freedom. Notice that this<br />
reference distribution does not depend on the number of observations removed from the analysis,<br />
but rather on the number of model parameters that are updated. <strong>The</strong> likelihood displacement gives<br />
twice the amount by which the log likelihood of the full data changes if one were to use an estimate<br />
based on fewer data points. It is thus a global, summary measure of the influence of the observations<br />
in U jointly on all parameters.<br />
Unless METHOD=ML, the <strong>MIXED</strong> procedure computes the likelihood displacement based on the<br />
residual (=restricted) log likelihood, even if METHOD=MIVQUE0 or METHOD=TYPE1, TYPE2,<br />
or TYPE3.<br />
Noniterative Update Formulas<br />
Update formulas that do not require refitting of the model are available for the cases where $V = \sigma^2 I$, $V$ is known, or $V^*$ is known. When ITER=0 and these update formulas can be invoked, the<br />
MIXED procedure uses the computational devices that are outlined in the following paragraphs. It<br />
is then assumed that the variance-covariance matrix of the fixed effects has the form $(X'V^{-1}X)^{-}$.<br />
When DDFM=KENWARDROGER, this is not the case; the estimated variance-covariance matrix<br />
is then inflated to better represent the uncertainty in the estimated covariance parameters. Influence<br />
statistics when DDFM=KENWARDROGER should iteratively update the covariance parameters<br />
(ITER > 0). The dependence of $V$ on $\theta$ is suppressed in the sequel for brevity.<br />
Updating the Fixed Effects Denote by $U$ the $(n \times k)$ matrix that is assembled from $k$ columns<br />
of the identity matrix. Each column of $U$ corresponds to the removal of one data point. The point<br />
being targeted by the $i$th column of $U$ corresponds to the row in which a 1 appears. Furthermore,<br />
define<br />
\[ \Omega = (X'V^{-1}X)^{-} \]<br />
\[ Q = X \Omega X' \]<br />
\[ P = V^{-1}(V - Q)V^{-1} \]<br />
The change in the fixed-effects estimates following removal of the observations in $U$ is<br />
\[ \widehat{\beta} - \widehat{\beta}_{(U)} = \Omega X'V^{-1}U (U'PU)^{-1} U'V^{-1}(y - X\widehat{\beta}) \]<br />
Using results in Cook and Weisberg (1982, A2) you can further compute<br />
\[ \widetilde{\Omega} = (X_{(U)}' V_{(U)}^{-1} X_{(U)})^{-} = \Omega + \Omega X'V^{-1}U (U'PU)^{-1} U'V^{-1}X \Omega \]<br />
If $X$ is $(n \times p)$ of rank $m < p$, then $\Omega$ is deficient in rank and the MIXED procedure computes<br />
needed quantities in $\widetilde{\Omega}$ by sweeping (Goodnight 1979). If the rank of the $(k \times k)$ matrix $U'PU$ is<br />
less than $k$, the removal of the observations introduces a new singularity, whether $X$ is of full rank or<br />
not. The solution vectors $\widehat{\beta}$ and $\widehat{\beta}_{(U)}$ then do not have the same expected values and should not be<br />
compared. When the MIXED procedure encounters this situation, influence diagnostics that depend<br />
on the choice of generalized inverse are not computed. The procedure also monitors the singularity<br />
criteria when sweeping the rows of $(X'V^{-1}X)^{-}$ and of $(X_{(U)}' V_{(U)}^{-1} X_{(U)})^{-}$. If a new singularity is<br />
encountered or a former singularity disappears, no influence statistics are computed.<br />
Residual Variance When $\sigma^2$ is profiled out of the marginal variance-covariance matrix, a closed-form<br />
estimate of $\sigma^2$ that is based on only the remaining observations can be computed provided<br />
$V^* = V(\widehat{\theta}^*)$ is known. Hurtado (1993, Thm. 5.2) shows that<br />
\[ (n - q - r)\, \widehat{\sigma}^2_{(U)} = (n - q)\, \widehat{\sigma}^2 - \widehat{\epsilon}_U' (\widehat{\sigma}^2 U'PU)^{-1} \widehat{\epsilon}_U \]<br />
and $\widehat{\epsilon}_U = U' (V^*)^{-1} (y - X\widehat{\beta})$. In the case of maximum likelihood estimation $q = 0$ and for REML<br />
estimation $q = \mathrm{rank}(X)$. The constant $r$ equals the rank of $(U'PU)$ for REML estimation and the<br />
number of effective observations that are removed if METHOD=ML.<br />
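In the special case of a straight-line regression with uncorrelated errors and a single deleted point (k = 1), the two update formulas above reduce to familiar delete-one identities. The Python sketch below evaluates them without refitting, under the assumptions V* = I and REML-type variance estimation (q = rank(X) = 2, r = 1). The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def residual_variance(x, y):
    """Residual sum of squares divided by n - rank(X), with rank(X) = 2."""
    b0, b1 = fit_line(x, y)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)


def updated_without_refit(x, y, i):
    """Delete-one estimates from full-data quantities only (V* = I, k = 1)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xj - xbar) ** 2 for xj in x)
    h = 1 / n + (x[i] - xbar) ** 2 / sxx     # sigma^2 * U'PU equals 1 - h here
    e = y[i] - (b0 + b1 * x[i])              # full-data residual of point i
    # Omega X' V^{-1} U reduces to (X'X)^{-1} x_i in this model.
    w0 = 1 / n - xbar * (x[i] - xbar) / sxx
    w1 = (x[i] - xbar) / sxx
    beta_reduced = (b0 - w0 * e / (1 - h), b1 - w1 * e / (1 - h))
    s2 = residual_variance(x, y)
    # (n - q - r) s2_(U) = (n - q) s2 - e^2 / (1 - h), with q = 2 and r = 1
    s2_reduced = ((n - 2) * s2 - e * e / (1 - h)) / (n - 2 - 1)
    return beta_reduced, s2_reduced


# Hypothetical data, used only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
```

Comparing `updated_without_refit(x, y, i)` with an explicit refit on the remaining five points gives the same coefficients and residual variance, up to rounding.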
Likelihood Distances For noniterative methods the following computational devices are used to<br />
compute (restricted) likelihood distances provided that the residual variance $\sigma^2$ is profiled.<br />
The log likelihood function $l(\widehat{\psi})$ evaluated at the full-data and reduced-data estimates can be written<br />
as<br />
\[ l(\widehat{\psi}) = -\frac{n}{2} \log(\widehat{\sigma}^2) - \frac{1}{2} \log|V^*| - \frac{1}{2} (y - X\widehat{\beta})' (V^*)^{-1} (y - X\widehat{\beta}) / \widehat{\sigma}^2 - \frac{n}{2} \log(2\pi) \]<br />
\[ l(\widehat{\psi}_{(U)}) = -\frac{n}{2} \log(\widehat{\sigma}^2_{(U)}) - \frac{1}{2} \log|V^*| - \frac{1}{2} (y - X\widehat{\beta}_{(U)})' (V^*)^{-1} (y - X\widehat{\beta}_{(U)}) / \widehat{\sigma}^2_{(U)} - \frac{n}{2} \log(2\pi) \]<br />
Notice that $l(\widehat{\psi}_{(U)})$ evaluates the log likelihood for $n$ data points at the reduced-data estimates. It is<br />
not the log likelihood obtained by fitting the model to the reduced data. The likelihood distance is<br />
then<br />
\[ \mathrm{LD}_{(U)} = n \log\left\{ \frac{\widehat{\sigma}^2_{(U)}}{\widehat{\sigma}^2} \right\} - n + (y - X\widehat{\beta}_{(U)})' (V^*)^{-1} (y - X\widehat{\beta}_{(U)}) / \widehat{\sigma}^2_{(U)} \]<br />
Expressions for $\mathrm{RLD}_{(U)}$ in noniterative influence analysis are derived along the same lines.<br />
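The closed-form likelihood distance can be checked against its definition in a very small case: an intercept-only model with V* = I fitted by maximum likelihood. In the Python sketch below the reduced-data estimates are obtained by simply refitting, which is an assumption made for brevity; the data and function names are hypothetical.

```python
import math


def log_likelihood(y, mu, sigma2):
    """Gaussian log likelihood of all n points (V* = I, so log|V*| = 0)."""
    n = len(y)
    rss = sum((yi - mu) ** 2 for yi in y)
    return (-0.5 * n * math.log(sigma2) - 0.5 * rss / sigma2
            - 0.5 * n * math.log(2 * math.pi))


def likelihood_distance(y, removed):
    """Return LD_(U) from the closed form and from the definition."""
    n = len(y)
    mu = sum(y) / n
    sigma2 = sum((yi - mu) ** 2 for yi in y) / n                  # full-data ML
    kept = [yi for i, yi in enumerate(y) if i not in removed]
    mu_u = sum(kept) / len(kept)
    sigma2_u = sum((yi - mu_u) ** 2 for yi in kept) / len(kept)   # reduced-data ML
    # Quadratic form over all n points, evaluated at the reduced-data mean.
    rss_u = sum((yi - mu_u) ** 2 for yi in y)
    closed_form = n * math.log(sigma2_u / sigma2) - n + rss_u / sigma2_u
    by_definition = 2 * (log_likelihood(y, mu, sigma2)
                         - log_likelihood(y, mu_u, sigma2_u))
    return closed_form, by_definition


y = [4.8, 5.1, 9.0, 5.3, 4.9, 5.2]      # hypothetical data with one unusual point
ld_closed, ld_definition = likelihood_distance(y, [2])
```

The two values agree, and the distance is nonnegative because the full-data estimates maximize the full-data likelihood.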
Default Output<br />
<strong>The</strong> following sections describe the output PROC <strong>MIXED</strong> produces by default. This output is<br />
organized into various tables, and they are discussed in order of appearance.<br />
Model Information<br />
<strong>The</strong> “Model Information” table describes the model, some of the variables it involves, and the<br />
method used in fitting it. It also lists the method (profile, factor, parameter, or none) for handling<br />
the residual variance in the model. <strong>The</strong> profile method concentrates the residual variance out of the<br />
optimization problem, whereas the parameter method retains it as a parameter in the optimization.<br />
<strong>The</strong> factor method keeps the residual fixed, and none is displayed when a residual variance is not<br />
part of the model.<br />
<strong>The</strong> “Model Information” table also has a row labeled Fixed Effects SE Method. This row describes<br />
the method used to compute the approximate standard errors for the fixed-effects parameter<br />
estimates and related functions of them. <strong>The</strong> two possibilities for this row are Model-Based, which<br />
is the default method, and Empirical, which results from using the EMPIRICAL option in the PROC<br />
<strong>MIXED</strong> statement.<br />
For ODS purposes, the name of the “Model Information” table is “ModelInfo.”<br />
Class Level Information<br />
<strong>The</strong> “Class Level Information” table lists the levels of every variable specified in the CLASS statement.<br />
You should check this information to make sure the data are correct. You can adjust the order<br />
of the CLASS variable levels with the ORDER= option in the PROC <strong>MIXED</strong> statement. For ODS<br />
purposes, the name of the “Class Level Information” table is “ClassLevels.”<br />
Dimensions<br />
<strong>The</strong> “Dimensions” table lists the sizes of relevant matrices. This table can be useful in determining<br />
CPU time and memory requirements. For ODS purposes, the name of the “Dimensions” table is<br />
“Dimensions.”<br />
Number of Observations<br />
<strong>The</strong> “Number of Observations” table shows the number of observations read from the data set and<br />
the number of observations used in fitting the model.<br />
Iteration History<br />
<strong>The</strong> “Iteration History” table describes the optimization of the residual log likelihood or log likelihood.<br />
The function to be minimized (the objective function) is $-2l$ for ML and $-2l_R$ for REML;<br />
the column name of the objective function in the “Iteration History” table is “-2 Log Like” for<br />
ML and “-2 Res Log Like” for REML. <strong>The</strong> minimization is performed by using a ridge-stabilized<br />
Newton-Raphson algorithm, and the rows of this table describe the iterations that this algorithm<br />
takes in order to minimize the objective function.<br />
<strong>The</strong> Evaluations column of the “Iteration History” table tells how many times the objective function<br />
is evaluated during each iteration.<br />
<strong>The</strong> Criterion column of the “Iteration History” table is, by default, a relative Hessian convergence<br />
quantity given by<br />
$$\frac{g_k' H_k^{-1} g_k}{|f_k|}$$<br />
where $f_k$ is the value of the objective function at iteration $k$, $g_k$ is the gradient (first derivative) of<br />
$f_k$, and $H_k$ is the Hessian (second derivative) of $f_k$. If $H_k$ is singular, then PROC MIXED uses the<br />
following relative quantity:<br />
$$\frac{g_k' g_k}{|f_k|}$$<br />
To prevent the division by $|f_k|$, use the ABSOLUTE option in the PROC MIXED statement. To<br />
use a relative function or gradient criterion, use the CONVF or CONVG option, respectively.<br />
<strong>The</strong> Hessian criterion is considered superior to function and gradient criteria because it measures<br />
orthogonality rather than lack of progress (Bates and Watts 1988). Provided the initial estimate is<br />
feasible and the maximum number of iterations is not exceeded, the Newton-Raphson algorithm<br />
is considered to have converged when the criterion is less than the tolerance specified with the<br />
CONVF, CONVG, or CONVH option in the PROC <strong>MIXED</strong> statement. <strong>The</strong> default tolerance is<br />
1E-8. If convergence is not achieved, PROC MIXED displays the estimates of the parameters at<br />
the last iteration.<br />
A convergence criterion that is missing indicates that a boundary constraint has been dropped; it is<br />
usually not a cause for concern.<br />
If you specify the ITDETAILS option in the PROC <strong>MIXED</strong> statement, then the covariance parameter<br />
estimates at each iteration are included as additional columns in the “Iteration History” table.<br />
For ODS purposes, the name of the “Iteration History” table is “IterHistory.”<br />
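The options discussed in this section can be combined as in the following sketch, which adds the covariance parameter estimates to the “Iteration History” table, sets an explicit relative Hessian tolerance, and saves the table. The tolerance, the iteration limit, and the data set name iter are illustrative assumptions; the model is the one from the “Getting Started” example.<br />

```sas
/* Sketch: show per-iteration covariance parameter estimates and save the history. */
proc mixed data=heights itdetails convh=1e-6 maxiter=100;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   ods output IterHistory=iter;   /* iter is a hypothetical data set name */
run;

/* The Criterion column holds the convergence quantity for each iteration */
proc print data=iter noobs;
run;
```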
Convergence Status<br />
The “Convergence Status” table reports the status of the iterative estimation process at the<br />
end of the Newton-Raphson optimization. It appears as a message in the listing, and this message<br />
is repeated in the log. <strong>The</strong> ODS object “ConvergenceStatus” also contains several nonprinting<br />
columns that can be helpful in checking the success of the iterative process, in particular during
batch processing or when analyzing BY groups. <strong>The</strong> Status variable takes on the value 0 for a<br />
successful convergence (even if the Hessian matrix might not be positive definite). <strong>The</strong> values 1<br />
and 2 of the Status variable indicate lack of convergence and infeasible initial parameter values,<br />
respectively. The variables pdG and pdH can be used to check whether the G matrix and the Hessian<br />
matrix H are positive definite.<br />
For models that are not fit iteratively, such as models without random effects or when the NOITER<br />
option is in effect, the “Convergence Status” is not produced.<br />
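For batch processing, the nonprinting columns can be checked programmatically. The following is a sketch only: the data set name cs is hypothetical, and it assumes that pdG and pdH are coded 0 when the corresponding matrix is not positive definite.<br />

```sas
/* Sketch: capture the ConvergenceStatus object and flag problem fits in the log. */
ods output ConvergenceStatus=cs;   /* cs is a hypothetical data set name */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

data _null_;
   set cs;
   /* Status 0 = converged, 1 = no convergence, 2 = infeasible starting values */
   if Status ne 0 then
      put 'Check fit: ' Status=;
   /* Assumed coding: 0 means the matrix is not positive definite */
   else if pdG = 0 or pdH = 0 then
      put 'Converged, but inspect: ' pdG= pdH=;
run;
```

With BY-group processing, the same DATA step would write one message per BY group that needs attention.<br />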
Covariance Parameter Estimates<br />
<strong>The</strong> “Covariance Parameter Estimates” table contains the estimates of the parameters in G and<br />
R (see the section “Estimating Covariance Parameters in the Mixed Model” on page 3968). <strong>The</strong>ir<br />
values are labeled in the table along with Subject and Group information if applicable. <strong>The</strong> estimates<br />
are displayed in the Estimate column and are the results of one of the following estimation methods:<br />
REML, ML, MIVQUE0, SSCP, Type1, Type2, or Type3.<br />
If you specify the RATIO option in the PROC <strong>MIXED</strong> statement, the Ratio column is added to the<br />
table listing the ratio of each parameter estimate to that of the residual variance.<br />
Specifying the COVTEST option in the PROC <strong>MIXED</strong> statement produces the “Std Error,” “Z<br />
Value,” and “Pr Z” columns. <strong>The</strong> “Std Error” column contains the approximate standard errors of<br />
the covariance parameter estimates. <strong>The</strong>se are the square roots of the diagonal elements of the observed<br />
inverse Fisher information matrix, which equals $2H^{-1}$, where $H$ is the Hessian matrix. The<br />
H matrix consists of the second derivatives of the objective function with respect to the covariance<br />
parameters; see Wolfinger, Tobias, and Sall (1994) for formulas. When you use the SCORING=<br />
option and PROC <strong>MIXED</strong> converges without stopping the scoring algorithm, PROC <strong>MIXED</strong> uses<br />
the expected Hessian matrix to compute the covariance matrix instead of the observed Hessian. <strong>The</strong><br />
observed or expected inverse Fisher information matrix can be viewed as an asymptotic covariance<br />
matrix of the estimates.<br />
<strong>The</strong> “Z Value” column is the estimate divided by its approximate standard error, and the “Pr Z”<br />
column is the one- or two-tailed area of the standard Gaussian density outside of the Z-value. <strong>The</strong><br />
<strong>MIXED</strong> procedure computes one-sided p-values for the residual variance and for covariance parameters<br />
with a lower bound of 0. <strong>The</strong> procedure computes two-sided p-values otherwise. <strong>The</strong>se<br />
statistics constitute Wald tests of the covariance parameters, and they are valid only asymptotically.<br />
CAUTION: Wald tests can be unreliable in small samples.<br />
For ODS purposes, the name of the “Covariance Parameter Estimates” table is “CovParms.”<br />
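A short sketch of requesting the additional columns described above follows. The model is the one from the “Getting Started” example, and the data set name cp is a hypothetical choice.<br />

```sas
/* Sketch: COVTEST adds Std Error, Z Value, and Pr Z; RATIO adds the Ratio column. */
proc mixed data=heights covtest ratio;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   ods output CovParms=cp;   /* cp is a hypothetical data set name */
run;

/* Print every saved column rather than assuming particular variable names */
proc print data=cp noobs;
run;
```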
Fit Statistics<br />
<strong>The</strong> “Fit Statistics” table provides some statistics about the estimated mixed model. Expressions<br />
for $-2$ times the log likelihood are provided in the section “Estimating Covariance Parameters<br />
in the Mixed Model” on page 3968. If the log likelihood is an extremely large number, then PROC<br />
<strong>MIXED</strong> has deemed the estimated V matrix to be singular. In this case, all subsequent results should<br />
be viewed with caution.
In addition, the “Fit Statistics” table lists three information criteria: AIC, AICC, and BIC, all in<br />
smaller-is-better form. Expressions for these criteria are described under the IC option.<br />
For ODS purposes, the name of the “Fit Statistics” table is “FitStatistics.”<br />
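The following sketch saves both the default fit statistics and the extended table of information criteria that the IC option produces. The data set names fit and crit are hypothetical.<br />

```sas
/* Sketch: IC requests the InfoCrit table in addition to the default FitStatistics table. */
proc mixed data=heights ic;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   ods output FitStatistics=fit InfoCrit=crit;   /* hypothetical data set names */
run;

proc print data=fit noobs;
run;

proc print data=crit noobs;
run;
```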
Null Model Likelihood Ratio Test<br />
If one covariance model is a submodel of another, you can carry out a likelihood ratio test for the<br />
significance of the more general model by computing $-2$ times the difference between their log<br />
likelihoods. Then compare this statistic to the $\chi^2$ distribution with degrees of freedom equal to the<br />
difference in the number of parameters for the two models.<br />
This test is reported in the “Null Model Likelihood Ratio Test” table to determine whether it is<br />
necessary to model the covariance structure of the data at all. The “Chi-Square” value is $-2$ times<br />
the log likelihood from the null model minus $-2$ times the log likelihood from the fitted model,<br />
where the null model is the one with only the fixed effects listed in the MODEL statement and<br />
$R = \sigma^2 I$. This statistic has an asymptotic $\chi^2$ distribution with $q - 1$ degrees of freedom, where $q$ is<br />
the effective number of covariance parameters (those not estimated to be on a boundary constraint).<br />
<strong>The</strong> “Pr > ChiSq” column contains the upper-tail area from this distribution. This p-value can be<br />
used to assess the significance of the model fit.<br />
This test is not produced for cases where the null hypothesis lies on the boundary of the parameter<br />
space, which is typically the case for variance component models. This is because the standard asymptotic<br />
theory does not apply in this case (Self and Liang 1987, Case 5).<br />
If you specify a PARMS statement, PROC <strong>MIXED</strong> constructs a likelihood ratio test between the<br />
best model from the grid search and the final fitted model and reports the results in the “Parameter<br />
Search” table.<br />
For ODS purposes, the name of the “Null Model Likelihood Ratio Test” table is “LRT.”<br />
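A likelihood ratio test between two nested covariance models can be computed by hand from the two fits, as in the following sketch. The numeric values are hypothetical placeholders, not results from any data set. The sketch also assumes that both models share the same fixed effects and that the null hypothesis does not lie on a boundary of the parameter space.<br />

```sas
/* Sketch: likelihood ratio test from two -2 (Res) Log Likelihood values. */
data lrt;
   m2ll_reduced = 1000.0;   /* hypothetical -2 log likelihood, simpler model */
   m2ll_full    = 990.0;    /* hypothetical -2 log likelihood, general model */
   df           = 2;        /* hypothetical difference in number of covariance parameters */
   chisq = m2ll_reduced - m2ll_full;      /* likelihood ratio statistic */
   p     = 1 - probchi(chisq, df);        /* upper-tail chi-square area */
   put chisq= df= p=;
run;
```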
Type 3 Tests of Fixed Effects<br />
<strong>The</strong> “Type 3 Tests of Fixed Effects” table contains hypothesis tests for the significance of each of<br />
the fixed effects—that is, those effects you specify in the MODEL statement. By default, PROC<br />
<strong>MIXED</strong> computes these tests by first constructing a Type 3 L matrix (see Chapter 15, “<strong>The</strong> Four<br />
Types of Estimable Functions”) for each effect. This L matrix is then used to compute the following<br />
F statistic:<br />
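$$F = \frac{\hat{\beta}' L' \left[ L \left( X' \hat{V}^{-1} X \right)^{-} L' \right]^{-1} L \hat{\beta}}{r}$$<br />
where $r = \mathrm{rank}\left( L \left( X' \hat{V}^{-1} X \right)^{-} L' \right)$. A p-value for the test is computed as the tail area beyond this<br />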
statistic from an F distribution with NDF and DDF degrees of freedom. <strong>The</strong> numerator degrees<br />
of freedom (NDF) are the row rank of L, and the denominator degrees of freedom are computed<br />
by using one of the methods described under the DDFM= option. Small values of the p-value<br />
(typically less than 0.05 or 0.01) indicate a significant effect.<br />
You can use the HTYPE= option in the MODEL statement to obtain tables of Type 1 (sequential)<br />
tests and Type 2 (adjusted) tests in addition to or instead of the table of Type 3 (partial) tests.
You can use the CHISQ option in the MODEL statement to obtain Wald $\chi^2$ tests of the fixed<br />
effects. These are carried out by using the numerator of the F statistic and comparing it with the $\chi^2$<br />
distribution with NDF degrees of freedom. This test is more liberal than the F test because it effectively<br />
assumes infinite denominator degrees of freedom.<br />
For ODS purposes, the names of the “Type 1 Tests of Fixed Effects” through the “Type 3 Tests of<br />
Fixed Effects” tables are “Tests1” through “Tests3,” respectively.<br />
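The HTYPE= and CHISQ options can be combined as in the following sketch, which requests Type 1 and Type 3 tests with Wald chi-square statistics and saves both tables. The data set names t1 and t3 are hypothetical.<br />

```sas
/* Sketch: Type 1 and Type 3 tests of fixed effects, with chi-square tests added. */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender / htype=1,3 chisq;
   random Family Family*Gender;
   ods output Tests1=t1 Tests3=t3;   /* hypothetical data set names */
run;

proc print data=t3 noobs;
run;
```

With a single fixed effect, as here, the Type 1 and Type 3 tests coincide; the distinction matters only when the MODEL statement lists several effects.<br />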
ODS Table Names<br />
Each table created by PROC <strong>MIXED</strong> has a name associated with it, and you must use this name to<br />
reference the table when using ODS statements. <strong>The</strong>se names are listed in Table 56.22.<br />
Table 56.22 ODS Tables Produced by PROC MIXED<br />
Table Name | Description | Required Statement / Option<br />
AccRates | acceptance rates for posterior sampling | PRIOR<br />
AsyCorr | asymptotic correlation matrix of covariance parameters | PROC MIXED ASYCORR<br />
AsyCov | asymptotic covariance matrix of covariance parameters | PROC MIXED ASYCOV<br />
Base | base densities used for posterior sampling | PRIOR<br />
Bound | computed bound for posterior rejection sampling | PRIOR<br />
CholG | Cholesky root of the estimated G matrix | RANDOM / GC<br />
CholR | Cholesky root of blocks of the estimated R matrix | REPEATED / RC<br />
CholV | Cholesky root of blocks of the estimated V matrix | RANDOM / VC<br />
ClassLevels | level information from the CLASS statement | default output<br />
Coef | L matrix coefficients | E option in MODEL, CONTRAST, ESTIMATE, or LSMEANS<br />
Contrasts | results from the CONTRAST statements | CONTRAST<br />
ConvergenceStatus | convergence status | default output<br />
CorrB | approximate correlation matrix of fixed-effects parameter estimates | MODEL / CORRB<br />
CovB | approximate covariance matrix of fixed-effects parameter estimates | MODEL / COVB<br />
CovParms | estimated covariance parameters | default output<br />
Diffs | differences of LS-means | LSMEANS / DIFF (or PDIFF)<br />
Dimensions | dimensions of the model | default output<br />
Estimates | results from ESTIMATE statements | ESTIMATE<br />
FitStatistics | fit statistics | default output<br />
G | estimated G matrix | RANDOM / G<br />
GCorr | correlation matrix from the estimated G matrix | RANDOM / GCORR<br />
HLM1 | Type 1 Hotelling-Lawley-McKeon tests of fixed effects | MODEL / HTYPE=1 and REPEATED / HLM TYPE=UN<br />
HLM2 | Type 2 Hotelling-Lawley-McKeon tests of fixed effects | MODEL / HTYPE=2 and REPEATED / HLM TYPE=UN<br />
HLM3 | Type 3 Hotelling-Lawley-McKeon tests of fixed effects | REPEATED / HLM TYPE=UN<br />
HLPS1 | Type 1 Hotelling-Lawley-Pillai-Samson tests of fixed effects | MODEL / HTYPE=1 and REPEATED / HLPS TYPE=UN<br />
HLPS2 | Type 2 Hotelling-Lawley-Pillai-Samson tests of fixed effects | MODEL / HTYPE=2 and REPEATED / HLPS TYPE=UN<br />
HLPS3 | Type 3 Hotelling-Lawley-Pillai-Samson tests of fixed effects | REPEATED / HLPS TYPE=UN<br />
Influence | influence diagnostics | MODEL / INFLUENCE<br />
InfoCrit | information criteria | PROC MIXED IC<br />
InvCholG | inverse Cholesky root of the estimated G matrix | RANDOM / GCI<br />
InvCholR | inverse Cholesky root of blocks of the estimated R matrix | REPEATED / RCI<br />
InvCholV | inverse Cholesky root of blocks of the estimated V matrix | RANDOM / VCI<br />
InvCovB | inverse of approximate covariance matrix of fixed-effects parameter estimates | MODEL / COVBI<br />
InvG | inverse of the estimated G matrix | RANDOM / GI<br />
InvR | inverse of blocks of the estimated R matrix | REPEATED / RI<br />
InvV | inverse of blocks of the estimated V matrix | RANDOM / VI<br />
IterHistory | iteration history | default output<br />
LComponents | single-degree-of-freedom estimates corresponding to rows of the L matrix for fixed effects | MODEL / LCOMPONENTS<br />
LRT | likelihood ratio test | default output<br />
LSMeans | LS-means | LSMEANS<br />
MMEq | mixed model equations | PROC MIXED MMEQ<br />
MMEqSol | mixed model equations solution | PROC MIXED MMEQSOL<br />
ModelInfo | model information | default output<br />
NObs | number of observations read and used | default output<br />
ParmSearch | parameter search values | PARMS<br />
Posterior | posterior sampling information | PRIOR<br />
R | blocks of the estimated R matrix | REPEATED / R<br />
RCorr | correlation matrix from blocks of the estimated R matrix | REPEATED / RCORR<br />
Search | posterior density search table | PRIOR / PSEARCH<br />
Slices | tests of LS-means slices | LSMEANS / SLICE=<br />
SolutionF | fixed-effects solution vector | MODEL / S<br />
SolutionR | random-effects solution vector | RANDOM / S<br />
Tests1 | Type 1 tests of fixed effects | MODEL / HTYPE=1<br />
Tests2 | Type 2 tests of fixed effects | MODEL / HTYPE=2<br />
Tests3 | Type 3 tests of fixed effects | default output<br />
Type1 | Type 1 analysis of variance | PROC MIXED METHOD=TYPE1<br />
Type2 | Type 2 analysis of variance | PROC MIXED METHOD=TYPE2<br />
Type3 | Type 3 analysis of variance | PROC MIXED METHOD=TYPE3<br />
Trans | transformation of covariance parameters | PRIOR / PTRANS<br />
V | blocks of the estimated V matrix | RANDOM / V<br />
VCorr | correlation matrix from blocks of the estimated V matrix | RANDOM / VCORR<br />
In Table 56.22, “Coef” refers to multiple tables produced by the E, E1, E2, or E3 option in the<br />
MODEL statement and the E option in the CONTRAST, ESTIMATE, and LSMEANS statements.<br />
You can create one large data set of these tables with a statement similar to the following:<br />
ods output Coef=c;<br />
To create separate data sets, use the following statement:<br />
ods output Coef(match_all)=c;<br />
Here the resulting data sets are named C, C1, C2, etc. <strong>The</strong> same principles apply to data sets created<br />
from the “R,” “CholR,” “InvCholR,” “RCorr,” “InvR,” “V,” “CholV,” “InvCholV,” “VCorr,” and<br />
“InvV” tables.<br />
In Table 56.22, the following changes have occurred from <strong>SAS</strong> 6. <strong>The</strong> “Predicted,” “PredMeans,”<br />
and “Sample” tables from <strong>SAS</strong> 6 no longer exist and have been replaced by output data sets; see<br />
descriptions of the MODEL statement options OUTPRED= and OUTPREDM= and the PRIOR<br />
statement option OUT= for more details. <strong>The</strong> “ML” and “REML” tables from <strong>SAS</strong> 6 have been<br />
replaced by the “IterHistory” table. <strong>The</strong> “Tests,” “HLM,” and “HLPS” tables from <strong>SAS</strong> 6 have<br />
been renamed “Tests3,” “HLM3,” and “HLPS3,” respectively.<br />
Table 56.23 lists the variable names associated with the data sets created when you use the ODS<br />
OUTPUT option in conjunction with the preceding tables. In Table 56.23, n is used to denote a
generic number that depends on the particular data set and model you select, and it can assume a<br />
different value each time it is used (even within the same table). <strong>The</strong> phrase model specific appears<br />
in rows of the affected tables to indicate that columns in these tables depend on the variables you<br />
specify in the model.<br />
CAUTION: <strong>The</strong>re is a danger of name collisions with the variables in the model specific tables in<br />
Table 56.23 and variables in your input data set. You should avoid using input variables with the<br />
same names as the variables in these tables.<br />
Table 56.23 Variable Names for the ODS Tables Produced in PROC MIXED<br />
Table Name | Variables<br />
AsyCorr | Row, CovParm, CovP1–CovPn<br />
AsyCov | Row, CovParm, CovP1–CovPn<br />
BaseDen | Type, Parm1–Parmn<br />
Bound | Technique, Converge, Iterations, Evaluations, LogBound, CovP1–CovPn, TCovP1–TCovPn<br />
CholG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
CholR | Index, Row, Col1–Coln<br />
CholV | Index, Row, Col1–Coln<br />
ClassLevels | Class, Levels, Values<br />
Coef | model specific, LMatrix, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row1–Rown<br />
Contrasts | Label, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
CorrB | model specific, Effect, Row, Col1–Coln<br />
CovB | model specific, Effect, Row, Col1–Coln<br />
CovParms | CovParm, Subject, Group, Estimate, StandardError, ZValue, ProbZ, Alpha, Lower, Upper<br />
Diffs | model specific, Effect, Margins, ByLevel, AT variables, Diff, StandardError, DF, tValue, Tails, Probt, Adjustment, Adjp, Alpha, Lower, Upper, AdjLow, AdjUpp<br />
Dimensions | Descr, Value<br />
Estimates | Label, Estimate, StandardError, DF, tValue, Tails, Probt, Alpha, Lower, Upper<br />
FitStatistics | Descr, Value<br />
G | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
GCorr | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
HLM1 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLM2 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLM3 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS1 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS2 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS3 | Effect, NumDF, DenDF, FValue, ProbF<br />
Influence | dependent on option modifiers, Effect, Tuple, Obs1–Obsk, Level, Iter, Index, Predicted, Residual, Leverage, PressRes, PRESS, Student, RMSE, RStudent, CookD, DFFITS, MDFFITS, CovRatio, CovTrace, CookDCP, MDFFITSCP, CovRatioCP, CovTraceCP, LD, RLD, Parm1–Parmp, CovP1–CovPq, Notes<br />
InfoCrit | Neg2LogLike, Parms, AIC, AICC, HQIC, BIC, CAIC<br />
InvCholG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
InvCholR | Index, Row, Col1–Coln<br />
InvCholV | Index, Row, Col1–Coln<br />
InvCovB | model specific, Effect, Row, Col1–Coln<br />
InvG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
InvR | Index, Row, Col1–Coln<br />
InvV | Index, Row, Col1–Coln<br />
IterHistory | CovP1–CovPn, Iteration, Evaluations, M2ResLogLike, M2LogLike, Criterion<br />
LComponents | Effect, TestType, LIndex, Estimate, StdErr, DF, tValue, Probt<br />
LRT | DF, ChiSquare, ProbChiSq<br />
LSMeans | model specific, Effect, Margins, ByLevel, AT variables, Estimate, StandardError, DF, tValue, Probt, Alpha, Lower, Upper, Cov1–Covn, Corr1–Corrn<br />
MMEq | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
MMEqSol | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
ModelInfo | Descr, Value<br />
Nobs | Label, N, NObsRead, NObsUsed, SumFreqsRead, SumFreqsUsed<br />
ParmSearch | CovP1–CovPn, Var, ResLogLike, M2ResLogLike2, LogLike, M2LogLike, LogDetH<br />
Posterior | Descr, Value<br />
R | Index, Row, Col1–Coln<br />
RCorr | Index, Row, Col1–Coln<br />
Search | Parm, TCovP1–TCovPn, Posterior<br />
Slices | model specific, Effect, Margins, ByLevel, AT variables, NumDF, DenDF, FValue, ProbF<br />
SolutionF | model specific, Effect, Estimate, StandardError, DF, tValue, Probt, Alpha, Lower, Upper<br />
SolutionR | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Estimate, StdErrPred, DF, tValue, Probt, Alpha, Lower, Upper<br />
Tests1 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Tests2 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Tests3 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Type1 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Type2 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Type3 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Trans | Prior, TCovP, CovP1–CovPn<br />
V | Index, Row, Col1–Coln<br />
VCorr | Index, Row, Col1–Coln<br />
Some of the variables listed in Table 56.23 are created only when you specify certain options in the<br />
relevant PROC <strong>MIXED</strong> statements.<br />
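Because the variables in a saved table depend on the options in effect, it can help to inspect a data set before relying on particular names. The following sketch saves the fixed-effects solution and lists its variables; the data set name sf is hypothetical.<br />

```sas
/* Sketch: save the fixed-effects solution and list the variables it contains. */
ods output SolutionF=sf;   /* sf is a hypothetical data set name */
proc mixed data=heights;
   class Family Gender;
   /* SOLUTION requests the table; CL adds confidence limit columns */
   model Height = Gender / solution cl;
   random Family Family*Gender;
run;

/* VARNUM lists the variables in their creation order */
proc contents data=sf varnum;
run;
```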
ODS Graphics<br />
This section describes the use of ODS for creating diagnostic plots with the <strong>MIXED</strong> procedure.<br />
To request these graphs you must specify the ODS GRAPHICS statement and the relevant options<br />
of the PROC <strong>MIXED</strong> or MODEL statement (Table 56.24). For more information about the ODS<br />
GRAPHICS statement, see Chapter 21, “Statistical Graphics Using ODS.” ODS names of the various<br />
graphics are given in the section “ODS Graph Names” on page 4002.<br />
Residual Plots<br />
<strong>The</strong> <strong>MIXED</strong> procedure can generate panels of residual diagnostics. Each panel consists of a plot<br />
of residuals versus predicted values, a histogram with normal density overlaid, a Q-Q plot, and<br />
summary residual and fit statistics (Figure 56.15). <strong>The</strong> plots are produced even if the OUTP= and<br />
OUTPM= options in the MODEL statement are not specified. Residual panels can be generated for<br />
marginal and conditional raw, studentized, and Pearson residuals as well as for scaled residuals (see<br />
the section “Residual Diagnostics” on page 3980).<br />
Recall the example in the section “Getting Started: <strong>MIXED</strong> <strong>Procedure</strong>” on page 3890. <strong>The</strong> following<br />
statements generate several 2 × 2 panels of residual graphs:<br />
ods graphics on;<br />
proc mixed data=heights;<br />
class Family Gender;<br />
model Height = Gender / residual;<br />
random Family Family*Gender;<br />
run;<br />
ods graphics off;
<strong>The</strong> graphical displays are requested by specifying the ODS GRAPHICS statement. <strong>The</strong> panel<br />
of the studentized marginal residuals is shown in Figure 56.15, and the panel of the studentized<br />
conditional residuals is shown in Figure 56.16.<br />
Figure 56.15 Panel of the Studentized (Marginal) Residuals<br />
Since the fixed-effects part of the model comprises only an intercept and the gender effect, the<br />
marginal mean takes on only two values, one for each gender. <strong>The</strong> “Residual Statistics” inset in the<br />
lower-right corner provides descriptive statistics for the set of residuals that is displayed. Note that<br />
residuals in a mixed model do not necessarily sum to zero, even if the model contains an intercept.
Figure 56.16 Panel of the Conditional Studentized Residuals<br />
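If individual graphs are preferred to a 2 × 2 panel, the PLOTS= option in the PROC MIXED statement can be used to unpack a panel. The following is a sketch under the assumption that only the studentized residual panel is of interest; the model is the one shown earlier in this section.<br />

```sas
/* Sketch: unpack the studentized residual panel into separate graphs. */
ods graphics on;

proc mixed data=heights plots=studentpanel(unpack);
   class Family Gender;
   model Height = Gender / residual;
   random Family Family*Gender;
run;

ods graphics off;
```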
Influence Plots<br />
<strong>The</strong> graphical features of the <strong>MIXED</strong> procedure enable you to generate plots of influence diagnostics<br />
and of deletion estimates. <strong>The</strong> type and number of plots produced depend on your modifiers of<br />
the INFLUENCE option in the MODEL statement and on the PLOTS= option in the PROC <strong>MIXED</strong><br />
statement. Plots related to covariance parameters are produced only when diagnostics are computed<br />
by iterative methods (ITER=). <strong>The</strong> estimates of the fixed effects—and covariance parameters when<br />
updates are iterative—are plotted when you specify the ESTIMATES modifier or when you request<br />
PLOTS=INFLUENCEESTPLOT.<br />
Two basic types of influence panels are shown in Figure 56.17 and Figure 56.18. <strong>The</strong> diagnostics<br />
panel shows Cook’s D and CovRatio statistics for the fixed effects and the covariance parameters.<br />
For the <strong>SAS</strong> statements that produce these influence panels, see Example 56.8. In this example, the<br />
impact of subjects (Person) on the analysis is assessed. <strong>The</strong> Cook’s D statistic measures a subject’s<br />
impact on the estimates, and the CovRatio statistic measures a subject’s impact on the precision of<br />
the estimates. Separate statistics are computed for the fixed effects and the covariance parameters.<br />
<strong>The</strong> CovRatio statistic has a threshold of 1.0. Values larger than 1.0 indicate that precision of the<br />
estimates is lost by exclusion of the observations in question. Values smaller than 1.0 indicate that
precision is gained by exclusion of the observations from the analysis. For example, it is evident<br />
from Figure 56.17 that person 20 has considerable impact on the covariance parameter estimates<br />
and moderate influence on the fixed-effects estimates. Furthermore, exclusion of this subject from<br />
the analysis increases the precision of the covariance parameters, whereas the effect on the precision<br />
of the fixed effects is minor.<br />
Figure 56.18 shows another type of influence plot, a panel of the deletion estimates. Each plot<br />
within the panel corresponds to one of the model parameters. A reference line is drawn at the<br />
estimate based on the full data.<br />
Figure 56.17 Influence Diagnostics
Figure 56.18 Deletion Estimates<br />
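The following sketch shows one way to request both types of influence panels. It is not the code of Example 56.8: it reuses the heights data from the “Getting Started” example, and the choice of Family as the deletion unit and of five iterations per deletion are assumptions made for illustration.<br />

```sas
/* Sketch: influence diagnostics with iterative updates of the covariance parameters. */
ods graphics on;

proc mixed data=heights plots=influenceestplot;
   class Family Gender;
   /* EFFECT= removes whole families; ITER= enables covariance parameter diagnostics */
   model Height = Gender / influence(effect=Family iter=5);
   random Family Family*Gender;
run;

ods graphics off;
```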
ODS Graph Names<br />
To request graphics with PROC <strong>MIXED</strong>, you must first enable ODS Graphics by specifying the ODS<br />
GRAPHICS ON statement. See Chapter 21, “Statistical Graphics Using ODS,” for more information.<br />
Some graphs are produced by default; other graphs are produced by using statements and options.<br />
You can reference every graph produced through ODS Graphics with a name. <strong>The</strong> names of the<br />
graphs that PROC <strong>MIXED</strong> generates are listed in Table 56.24, along with the required statements<br />
and options.<br />
Table 56.24 ODS Graphics Produced by PROC MIXED<br />
ODS Graph Name | Plot Description | Statement or Option<br />
Boxplot | Box plots | PLOTS=BOXPLOT<br />
CovRatioPlot | CovRatio statistics for fixed effects or covariance parameters | PLOTS=INFLUENCESTATPANEL(UNPACK) and MODEL / INFLUENCE<br />
CooksDPlot | Cook’s D for fixed effects or covariance parameters | PLOTS=INFLUENCESTATPANEL(UNPACK) and MODEL / INFLUENCE<br />
DistancePlot | Likelihood or restricted likelihood distance | MODEL / INFLUENCE<br />
InfluenceEstPlot | Panel of deletion estimates | MODEL / INFLUENCE(EST) or PLOTS=INFLUENCEESTPLOT and MODEL / INFLUENCE<br />
InfluenceEstPlot | Parameter estimates after removing observation or sets of observations | PLOTS=INFLUENCEESTPLOT(UNPACK) and MODEL / INFLUENCE<br />
InfluenceStatPanel | Panel of influence statistics | MODEL / INFLUENCE<br />
PearsonBoxPlot | Box plot of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK BOX)<br />
PearsonByPredicted | Pearson residuals vs. predicted | PLOTS=PEARSONPANEL(UNPACK)<br />
PearsonHistogram | Histogram of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK)<br />
PearsonPanel | Panel of Pearson residuals | MODEL / RESIDUAL<br />
PearsonQQplot | Q-Q plot of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK)<br />
PressPlot | Plot of PRESS residuals or PRESS statistic | PLOTS=PRESS and MODEL / INFLUENCE<br />
ResidualBoxplot | Box plot of (raw) residuals | PLOTS=RESIDUALPANEL(UNPACK BOX)<br />
ResidualByPredicted | Residuals vs. predicted | PLOTS=RESIDUALPANEL(UNPACK)<br />
ResidualHistogram | Histogram of raw residuals | PLOTS=RESIDUALPANEL(UNPACK)<br />
ResidualPanel | Panel of (raw) residuals | MODEL / RESIDUAL<br />
ResidualQQplot | Q-Q plot of raw residuals | PLOTS=RESIDUALPANEL(UNPACK)<br />
ScaledBoxplot | Box plot of scaled residuals | PLOTS=VCIRYPANEL(UNPACK BOX)<br />
ScaledByPredicted | Scaled residuals vs. predicted | PLOTS=VCIRYPANEL(UNPACK)<br />
ScaledHistogram | Histogram of scaled residuals | PLOTS=VCIRYPANEL(UNPACK)<br />
ScaledQQplot | Q-Q plot of scaled residuals | PLOTS=VCIRYPANEL(UNPACK)<br />
StudentBoxplot | Box plot of studentized residuals | PLOTS=STUDENTPANEL(UNPACK BOX)<br />
StudentByPredicted | Studentized residuals vs. predicted | PLOTS=STUDENTPANEL(UNPACK)<br />
StudentHistogram | Histogram of studentized residuals | PLOTS=STUDENTPANEL(UNPACK)<br />
StudentPanel | Panel of studentized residuals | MODEL / RESIDUAL<br />
StudentQQplot | Q-Q plot of studentized residuals | PLOTS=STUDENTPANEL(UNPACK)<br />
VCIRYPanel | Panel of scaled residuals | MODEL / VCIRY<br />
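The graph names in Table 56.24 can be used with ODS statements in the same way as table names. As a sketch, the following statements restrict the output of one PROC MIXED run to two of the residual panels; the choice of panels is arbitrary.<br />

```sas
/* Sketch: keep only two named graphs from the output of the next step. */
ods graphics on;
ods select StudentPanel PearsonPanel;

proc mixed data=heights;
   class Family Gender;
   model Height = Gender / residual;
   random Family Family*Gender;
run;

ods graphics off;
```

Because the ODS SELECT statement applies to all output objects, the default tables of this run are suppressed as well.<br />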
Computational Issues<br />
Computational Method<br />
In addition to numerous matrix-multiplication routines, PROC <strong>MIXED</strong> frequently uses the sweep<br />
operator (Goodnight 1979) and the Cholesky root (Golub and Van Loan 1989). <strong>The</strong> routines perform<br />
a modified W transformation (Goodnight and Hemmerle 1979) for G-side likelihood calculations<br />
and a direct method for R-side likelihood calculations. For the Type 3 F tests, PROC <strong>MIXED</strong><br />
uses the algorithm described in Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>.”<br />
PROC <strong>MIXED</strong> uses a ridge-stabilized Newton-Raphson algorithm to optimize either a full (ML)<br />
or residual (REML) likelihood function. <strong>The</strong> Newton-Raphson algorithm is preferred to the EM<br />
algorithm (Lindstrom and Bates 1988). PROC <strong>MIXED</strong> profiles the likelihood with respect to the<br />
fixed effects and also with respect to the residual variance whenever it appears reasonable to do<br />
so. <strong>The</strong> residual profiling can be avoided by using the NOPROFILE option of the PROC <strong>MIXED</strong><br />
statement. PROC <strong>MIXED</strong> uses the MIVQUE0 method (Rao 1972; Giesbrecht 1989) to compute<br />
initial values.<br />
<strong>The</strong> likelihoods that PROC <strong>MIXED</strong> optimizes are usually well-defined continuous functions with a<br />
single optimum. <strong>The</strong> Newton-Raphson algorithm typically performs well and finds the optimum in<br />
a few iterations. It is a quadratically converging algorithm, meaning that the error of the approximation<br />
near the optimum is squared at each iteration. <strong>The</strong> quadratic convergence property is evident<br />
when the convergence criterion drops to zero by factors of 10 or more.<br />
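The quadratic convergence property can be stated compactly. This is the generic textbook form of the property, not a description of PROC MIXED internals; the symbols $\theta_k$ (the covariance parameter vector at iteration $k$), $\theta^{*}$ (the optimum), and $C$ (a positive constant) are introduced here for illustration only.

```latex
\[
  \lVert \theta_{k+1} - \theta^{*} \rVert \;\le\; C \,\lVert \theta_{k} - \theta^{*} \rVert^{2}
  \quad \text{for } \theta_k \text{ sufficiently close to } \theta^{*}.
\]
% Illustration, assuming C = 1: an error of 10^{-2} is followed by errors of at most 10^{-4}, then 10^{-8}.
```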
Table 56.25 Notation for Order Calculations<br />
Symbol Number<br />
p columns of X<br />
g columns of Z<br />
N observations<br />
q covariance parameters<br />
t maximum observations per subject<br />
S subjects<br />
Using the notation from Table 56.25, the following are estimates of the computational speed of the<br />
algorithms used in PROC MIXED. For likelihood calculations, the crossproducts matrix construction<br />
is of order $N(p+g)^2$ and the sweep operations are of order $(p+g)^3$. The first derivative<br />
calculations for parameters in G are of order $qg^3$ for ML and $q(g^3 + pg^2 + p^2g)$ for REML. If you<br />
specify a subject effect in the RANDOM statement and if you are not using the REPEATED statement,<br />
then replace $g$ with $g/S$ and $q$ with $qS$ in these calculations. The first derivative calculations<br />
for parameters in R are of order $qS(t^3 + gt^2 + g^2t)$ for ML and $qS(t^3 + (p+g)t^2 + (p^2+g^2)t)$<br />
for REML. For the second derivatives, replace $q$ with $q(q+1)/2$ in the first derivative expressions.<br />
When you specify both G- and R-side parameters (that is, when you use both the RANDOM and<br />
REPEATED statements), then additional calculations are required of an order equal to the sum of<br />
the orders for G and R. Considerable execution times can result in this case.
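As a rough illustration, the order expressions can be evaluated for a small model. The values $N = 24$, $p = 12$, $g = 16$, and $q = 3$ are taken from the "Dimensions" and "Number of Observations" tables of Example 56.1 (one subject, no REPEATED statement). The results indicate relative magnitude only; they are not exact operation counts.

```latex
\begin{align*}
N(p+g)^2 &= 24 \cdot 28^2 = 18{,}816 && \text{(crossproducts construction)}\\
(p+g)^3 &= 28^3 = 21{,}952 && \text{(sweep operations)}\\
q\,g^3 &= 3 \cdot 16^3 = 12{,}288 && \text{(first derivatives for G, ML)}\\
q\,(g^3 + p g^2 + p^2 g) &= 3\,(4096 + 3072 + 2304) = 28{,}416 && \text{(first derivatives for G, REML)}
\end{align*}
```

For the second derivatives, $q = 3$ is replaced by $q(q+1)/2 = 6$, which doubles the last two figures.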
For further details about the computational techniques used in PROC <strong>MIXED</strong>, see Wolfinger, Tobias,<br />
and Sall (1994).<br />
Parameter Constraints<br />
By default, some covariance parameters are assumed to satisfy certain boundary constraints during<br />
the Newton-Raphson algorithm. For example, variance components are constrained to be nonnegative,<br />
and autoregressive parameters are constrained to be between -1 and 1. You can remove these<br />
constraints with the NOBOUND option in the PARMS statement (or with the NOBOUND option<br />
in the PROC <strong>MIXED</strong> statement), but this can lead to estimates that produce an infinite likelihood.<br />
You can also introduce or change boundary constraints with the LOWERB= and UPPERB= options<br />
in the PARMS statement.<br />
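The following sketch shows one way the boundary options might be used. It assumes a data set and model like the split-plot example (`sp`, with covariance parameters ordered Block, A*Block, Residual); the starting values and the bound of 1E-4 are illustrative choices, not recommendations.

```sas
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   /* Starting values (illustrative) for Block, A*Block, and Residual, in that order. */
   /* LOWERB= takes one value per covariance parameter, separated by commas;          */
   /* a missing value (.) keeps the default constraint for that parameter.            */
   parms (60) (15) (9) / lowerb=1e-4,1e-4,.;
run;
```

Replacing the LOWERB= option with NOBOUND in the same PARMS statement would instead remove the default constraints, with the risks described above.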
During the Newton-Raphson algorithm, a parameter might be set equal to one of its boundary<br />
constraints for a few iterations and then it might move away from the boundary. You see a missing<br />
value in the Criterion column of the “Iteration History” table whenever a boundary constraint is<br />
dropped.<br />
For some data sets the final estimate of a parameter might equal one of its boundary constraints.<br />
This is usually not a cause for concern, but it might lead you to consider a different model. For<br />
instance, a variance component estimate can equal zero; in this case, you might want to drop the<br />
corresponding random effect from the model. However, be aware that changing the model in this<br />
fashion can affect degrees-of-freedom calculations.<br />
Convergence Problems<br />
For some data sets, the Newton-Raphson algorithm can fail to converge. Nonconvergence can result<br />
from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data.<br />
It is also possible for PROC <strong>MIXED</strong> to converge to a point that is not the global optimum of the<br />
likelihood, although this usually occurs only with the spatial covariance structures.<br />
If you experience convergence problems, the following points might be helpful:<br />
• One useful tool is the PARMS statement, which lets you input initial values for the covariance<br />
parameters and performs a grid search over the likelihood surface.<br />
• Sometimes the Newton-Raphson algorithm does not perform well when two of the covariance<br />
parameters are on different scales, that is, when they are several orders of magnitude apart.<br />
This is because the Hessian matrix is processed jointly for the two parameters, and elements<br />
of it corresponding to one of the parameters can become close to internal tolerances in PROC<br />
MIXED. In this case, you can improve stability by rescaling the effects in the model so that<br />
the covariance parameters are on the same scale.<br />
• Data that are extremely large or extremely small can adversely affect results because of the<br />
internal tolerances in PROC MIXED. Rescaling the data can improve stability.<br />
• For stubborn problems, you might want to specify ODS OUTPUT COVPARMS=data-set-name<br />
to output the “Covariance Parameter Estimates” table as a precautionary measure. That<br />
way, if the problem does not converge, you can read the final parameter values back into a<br />
new run with the PARMSDATA= option in the PARMS statement.<br />
• Fisher scoring can be more robust than Newton-Raphson with poor MIVQUE(0) starting<br />
values. Specifying a SCORING= value of 5 or so might help to recover from poor starting<br />
values.<br />
• Tuning the singularity options SINGULAR=, SINGCHOL=, and SINGRES= in the MODEL<br />
statement can improve the stability of the optimization process.<br />
• Tuning the MAXITER= and MAXFUNC= options in the PROC MIXED statement can save<br />
resources. Also, the ITDETAILS option displays the values of all the parameters at each<br />
iteration.<br />
• Using the NOPROFILE and NOBOUND options in the PROC MIXED statement might help<br />
convergence, although they can produce unusual results.<br />
• Although the CONVH convergence criterion usually gives the best results, you might want<br />
to try CONVF or CONVG, possibly along with the ABSOLUTE option.<br />
• If the convergence criterion reaches a relatively small value such as 1E-7 but never gets<br />
lower than 1E-8, you might want to specify CONVH=1E-6 in the PROC MIXED statement<br />
to get results; however, interpret the results with caution.<br />
• An infinite likelihood during the iteration process means that the Newton-Raphson algorithm<br />
has stepped into a region where either the R or V matrix is nonpositive definite. This is<br />
usually no cause for concern as long as iterations continue. If PROC MIXED stops because<br />
of an infinite likelihood, recheck your model to make sure that no observations from the same<br />
subject are producing identical rows in R or V and that you have enough data to estimate the<br />
particular covariance structure you have selected. Any time that the final estimated likelihood<br />
is infinite, subsequent results should be interpreted with caution.<br />
• A nonpositive definite Hessian matrix can indicate a surface saddlepoint or linear dependencies<br />
among the parameters.<br />
• A warning message about the singularities of X changing indicates that there is some linear<br />
dependency in the estimate of $X'\hat{V}^{-1}X$ that is not found in $X'X$. This can adversely affect<br />
the likelihood calculations and optimization process. If you encounter this problem, make<br />
sure that your model specification is reasonable and that you have enough data to estimate<br />
the particular covariance structure you have selected. Rearranging effects in the MODEL<br />
statement so that the most significant ones are first can help, because PROC MIXED sweeps<br />
the estimate of $X'V^{-1}X$ in the order of the MODEL effects and the sweep is more stable<br />
if larger pivots are dealt with first. If this does not help, specifying starting values with the<br />
PARMS statement can place the optimization on a different and possibly more stable path.<br />
• Lack of convergence can indicate model misspecification or a violation of the normality assumption.<br />
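Several of the points above can be combined into a two-step retry. The sketch below is illustrative only: it reuses the split-plot data set `sp` and model from Example 56.1 (which converges without difficulty), the data set name `cp` is arbitrary, and it assumes that the saved "Covariance Parameter Estimates" table is in a form that PARMSDATA= accepts in your release.

```sas
/* First attempt: save the covariance parameter estimates as a precaution, */
/* cap the number of iterations, and print parameter values per iteration. */
ods output CovParms=cp;
proc mixed data=sp maxiter=100 itdetails;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
run;

/* Retry: start from the saved values, use Fisher scoring for the first */
/* 5 iterations, and relax the relative Hessian criterion to 1E-6.      */
proc mixed data=sp scoring=5 convh=1e-6;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   parms / parmsdata=cp;
run;
```

Results obtained with a relaxed CONVH= value should be interpreted with the caution noted above.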
Memory<br />
Let p be the number of columns in X, and let g be the number of columns in Z. For large models,<br />
most of the memory resources are required for holding symmetric matrices of order $p$, $g$, and $p + g$.<br />
The approximate memory requirement in bytes is<br />
$40(p^2 + g^2) + 32(p + g)^2$<br />
If you have a large model that exceeds the memory capacity of your computer, see the suggestions<br />
listed under “Computing Time.”<br />
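To give a sense of scale, the approximation $40(p^2 + g^2) + 32(p + g)^2$ can be evaluated for two cases. The first uses $p = 12$ and $g = 16$ from the "Dimensions" table of Example 56.1; the second uses hypothetical values ($p = 50$, $g = 5000$) chosen only to represent a large model.

```latex
\begin{align*}
p = 12,\ g = 16:\quad
  & 40\,(144 + 256) + 32 \cdot 28^2 = 16{,}000 + 25{,}088 = 41{,}088 \text{ bytes}\\
p = 50,\ g = 5000:\quad
  & 40\,(2{,}500 + 25{,}000{,}000) + 32 \cdot 5050^2\\
  & = 1{,}000{,}100{,}000 + 816{,}080{,}000 = 1{,}816{,}180{,}000 \text{ bytes} \approx 1.8 \text{ GB}
\end{align*}
```

Because the requirement grows quadratically in $g$, reducing the number of columns in Z has the largest effect for models of the second kind.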
Computing Time<br />
PROC <strong>MIXED</strong> is computationally intensive, and execution times can be long. In addition to the<br />
CPU time used in collecting sums and crossproducts and in solving the mixed model equations (as<br />
in PROC GLM), considerable CPU time is often required to compute the likelihood function and<br />
its derivatives. <strong>The</strong>se latter computations are performed for every Newton-Raphson iteration.<br />
If you have a model that takes too long to run, the following suggestions can be helpful:<br />
• Examine the “Dimensions” table to find out the number of columns in the X and Z<br />
matrices. A large number of columns in either matrix can greatly increase computing time.<br />
You might want to eliminate some higher-order effects if they are too large.<br />
• If you have a Z matrix with a lot of columns, use the DDFM=BW option in the MODEL<br />
statement to eliminate the time required for the containment method.<br />
• If possible, “factor out” a common effect from the effects in the RANDOM statement and<br />
make it the SUBJECT= effect. This creates a block-diagonal G matrix and can often speed<br />
calculations.<br />
• If possible, use the same or nested SUBJECT= effects in all RANDOM and REPEATED<br />
statements.<br />
• If your data set is very large, you might want to analyze it in pieces. The BY statement can<br />
help implement this strategy.<br />
• In general, specify random effects with a lot of levels in the REPEATED statement and those<br />
with a few levels in the RANDOM statement.<br />
• The METHOD=MIVQUE0 option runs faster than either the METHOD=REML or<br />
METHOD=ML option because it is noniterative.<br />
• You can specify known values for the covariance parameters by using the HOLD= or<br />
NOITER option in the PARMS statement or the GDATA= option in the RANDOM statement.<br />
This eliminates the need for iteration.<br />
• The LOGNOTE option in the PROC MIXED statement writes periodic messages to the SAS<br />
log concerning the status of the calculations. It can help you diagnose where the slowdown is<br />
occurring.<br />
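A few of these suggestions can be sketched with the split-plot model from Example 56.1. The data set `sp` is far too small to need any of them, so the statements only illustrate where each option goes; the values in the PARMS statement are the REML estimates reported for that example and are treated here as known.

```sas
/* Factor Block out as the SUBJECT= effect, use between-within degrees */
/* of freedom, and write progress notes to the SAS log.                */
proc mixed data=sp lognote;
   class A B Block;
   model Y = A B A*B / ddfm=bw;
   random intercept A / subject=Block;
run;

/* Hold the covariance parameters at supplied values (Block, A*Block, */
/* Residual) so that no iterations are performed.                     */
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   parms (62.3958) (15.3819) (9.3611) / noiter;
run;
```

Note that DDFM=BW changes the denominator degrees of freedom relative to the default containment method, so the test results can differ from those shown for the example.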
Examples: Mixed <strong>Procedure</strong><br />
<strong>The</strong> following are basic examples of the use of PROC <strong>MIXED</strong>. More examples and details can be<br />
found in Littell et al. (2006), Wolfinger (1997), Verbeke and Molenberghs (1997, 2000), Murray<br />
(1998), Singer (1998), Sullivan, Dukes, and Losina (1999), and Brown and Prescott (1999).<br />
Example 56.1: Split-Plot Design<br />
PROC <strong>MIXED</strong> can fit a variety of mixed models. One of the most common mixed models is the<br />
split-plot design. <strong>The</strong> split-plot design involves two experimental factors, A and B. Levels of A are<br />
randomly assigned to whole plots (main plots), and levels of B are randomly assigned to split plots<br />
(subplots) within each whole plot. <strong>The</strong> design provides more precise information about B than about<br />
A, and it often arises when A can be applied only to large experimental units. An example is where<br />
A represents irrigation levels for large plots of land and B represents different crop varieties planted<br />
in each large plot.<br />
Consider the following data from Stroup (1989a), which arise from a balanced split-plot design with<br />
the whole plots arranged in a randomized complete-block design. <strong>The</strong> variable A is the whole-plot<br />
factor, and the variable B is the subplot factor. A traditional analysis of these data involves the<br />
construction of the whole-plot error (A*Block) to test A and the pooled residual error (B*Block and<br />
A*B*Block) to test B and A*B. To carry out this analysis with PROC GLM, you must use a TEST<br />
statement to obtain the correct F test for A.<br />
Performing a mixed model analysis with PROC <strong>MIXED</strong> eliminates the need for the error term<br />
construction. PROC <strong>MIXED</strong> estimates variance components for Block, A*Block, and the residual,<br />
and it automatically incorporates the correct error terms into test statistics.<br />
The following statements create a data set for a split-plot design with four blocks, three whole-plot<br />
levels, and two subplot levels:<br />
data sp;<br />
input Block A B Y @@;<br />
datalines;<br />
1 1 1 56 1 1 2 41<br />
1 2 1 50 1 2 2 36<br />
1 3 1 39 1 3 2 35<br />
2 1 1 30 2 1 2 25<br />
2 2 1 36 2 2 2 28<br />
2 3 1 33 2 3 2 30<br />
3 1 1 32 3 1 2 24<br />
3 2 1 31 3 2 2 27<br />
3 3 1 15 3 3 2 19<br />
4 1 1 30 4 1 2 25<br />
4 2 1 35 4 2 2 30<br />
4 3 1 17 4 3 2 18<br />
;
<strong>The</strong> following statements fit the split-plot model assuming random block effects:<br />
proc mixed;<br />
class A B Block;<br />
model Y = A B A*B;<br />
random Block A*Block;<br />
run;<br />
<strong>The</strong> variables A, B, and Block are listed as classification variables in the CLASS statement. <strong>The</strong><br />
columns of model matrix X consist of indicator variables corresponding to the levels of the fixed<br />
effects A, B, and A*B listed on the right side of the MODEL statement. <strong>The</strong> dependent variable Y is<br />
listed on the left side of the MODEL statement.<br />
<strong>The</strong> columns of the model matrix Z consist of indicator variables corresponding to the levels of the<br />
random effects Block and A*Block. <strong>The</strong> G matrix is diagonal and contains the variance components<br />
of Block and A*Block. <strong>The</strong> R matrix is also diagonal and contains the residual variance.<br />
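In matrix form, the description above corresponds to the following structure. The symbols $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$, and $\boldsymbol{\epsilon}$ (fixed effects, random effects, and residuals) and the subscripted variance labels are notation assumed here for illustration; the block sizes follow from four blocks, twelve A*Block combinations, and 24 observations.

```latex
\[
  \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon},
  \qquad
  \mathbf{G} = \operatorname{Var}(\boldsymbol{\gamma}) =
  \begin{pmatrix}
    \sigma^2_{\mathrm{Block}}\,\mathbf{I}_{4} & \mathbf{0} \\
    \mathbf{0} & \sigma^2_{A*\mathrm{Block}}\,\mathbf{I}_{12}
  \end{pmatrix},
  \qquad
  \mathbf{R} = \operatorname{Var}(\boldsymbol{\epsilon}) = \sigma^2\,\mathbf{I}_{24}
\]
```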
<strong>The</strong> <strong>SAS</strong> statements produce Output 56.1.1–Output 56.1.8.<br />
<strong>The</strong> “Model Information” table in Output 56.1.1 lists basic information about the split-plot model.<br />
REML is used to estimate the variance components, and the residual variance is profiled from the<br />
optimization.<br />
Output 56.1.1 Results for Split-Plot Analysis<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.SP<br />
Dependent Variable Y<br />
Covariance Structure Variance Components<br />
Estimation Method REML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
<strong>The</strong> “Class Level Information” table in Output 56.1.2 lists the levels of all variables specified in the<br />
CLASS statement. You can check this table to make sure that the data are correct.<br />
Output 56.1.2 Split-Plot Example (continued)<br />
Class Level Information<br />
Class Levels Values<br />
A 3 1 2 3<br />
B 2 1 2<br />
Block 4 1 2 3 4<br />
The “Dimensions” table in Output 56.1.3 lists the magnitudes of various vectors and matrices. The<br />
X matrix is seen to be 24 × 12, and the Z matrix is 24 × 16.
Output 56.1.3 Split-Plot Example (continued)<br />
Dimensions<br />
Covariance Parameters 3<br />
Columns in X 12<br />
Columns in Z 16<br />
Subjects 1<br />
Max Obs Per Subject 24<br />
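The column counts in this table can be reproduced by counting indicator variables, assuming the usual less-than-full-rank coding in which every level of every effect receives its own column:

```latex
\begin{align*}
\text{Columns in X} &= \underbrace{1}_{\text{intercept}} + \underbrace{3}_{A} + \underbrace{2}_{B} + \underbrace{3 \cdot 2}_{A*B} = 12\\
\text{Columns in Z} &= \underbrace{4}_{\text{Block}} + \underbrace{3 \cdot 4}_{A*\text{Block}} = 16
\end{align*}
```

Under this coding X is not of full column rank; its rank is 6, the number of A-by-B cells.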
<strong>The</strong> “Number of Observations” table in Output 56.1.4 shows that all observations read from the data<br />
set are used in the analysis.<br />
Output 56.1.4 Split-Plot Example (continued)<br />
Number of Observations<br />
Number of Observations Read 24<br />
Number of Observations Used 24<br />
Number of Observations Not Used 0<br />
PROC <strong>MIXED</strong> estimates the variance components for Block, A*Block, and the residual by REML.<br />
<strong>The</strong> REML estimates are the values that maximize the likelihood of a set of linearly independent<br />
error contrasts, and they provide a correction for the downward bias found in the usual maximum<br />
likelihood estimates. The objective function is -2 times the logarithm of the restricted likelihood,<br />
and PROC <strong>MIXED</strong> minimizes this objective function to obtain the estimates.<br />
<strong>The</strong> minimization method is the Newton-Raphson algorithm, which uses the first and second derivatives<br />
of the objective function to iteratively find its minimum. <strong>The</strong> “Iteration History” table in<br />
Output 56.1.5 records the steps of that optimization process. For this example, only one iteration<br />
is required to obtain the estimates. <strong>The</strong> Evaluations column reveals that the restricted likelihood<br />
is evaluated once for each of the iterations. A criterion of 0 indicates that the Newton-Raphson<br />
algorithm has converged.<br />
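For reference, the objective function can be written out as it is commonly stated for the linear mixed model. In this sketch $n$ is the number of observations and $p$ denotes the rank of X (not the number of columns, as in Table 56.25); the notation is an assumption made here for illustration.

```latex
\[
  -2\,l_R(\mathbf{G},\mathbf{R}) =
  \log\lvert\mathbf{V}\rvert
  + \log\lvert\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\rvert
  + \mathbf{r}'\mathbf{V}^{-1}\mathbf{r}
  + (n-p)\log(2\pi)
\]
\[
  \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R},
  \qquad
  \mathbf{r} = \mathbf{y} - \mathbf{X}\,(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\,\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}
\]
```

The second term, $\log\lvert\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\rvert$, is what distinguishes the restricted likelihood from the full likelihood and accounts for the bias correction mentioned above.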
Output 56.1.5 Split-Plot Analysis (continued)<br />
Iteration History<br />
Iteration Evaluations -2 Res Log Like Criterion<br />
0 1 139.81461222<br />
1 1 119.76184570 0.00000000<br />
Convergence criteria met.<br />
<strong>The</strong> REML estimates for the variance components of Block, A*Block, and the residual are 62.40,<br />
15.38, and 9.36, respectively, as listed in the Estimate column of the “Covariance Parameter Estimates”<br />
table in Output 56.1.6.
Output 56.1.6 Split-Plot Analysis (continued)<br />
Covariance Parameter<br />
Estimates<br />
Cov Parm Estimate<br />
Block 62.3958<br />
A*Block 15.3819<br />
Residual 9.3611<br />
<strong>The</strong> “Fit Statistics” table in Output 56.1.7 lists several pieces of information about the fitted mixed<br />
model, including the residual log likelihood. <strong>The</strong> Akaike (AIC) and Bayesian (BIC) information<br />
criteria can be used to compare different models; the ones with smaller values are preferred. <strong>The</strong><br />
AICC information criterion is a small-sample, bias-adjusted form of the Akaike criterion (Hurvich<br />
and Tsai 1989).<br />
Output 56.1.7 Split-Plot Analysis (continued)<br />
Fit Statistics<br />
-2 Res Log Likelihood 119.8<br />
AIC (smaller is better) 125.8<br />
AICC (smaller is better) 127.5<br />
BIC (smaller is better) 123.9<br />
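The printed criteria can be checked by hand from the final objective value $-2\ell = 119.76$ in the "Iteration History" table. The counts used below are an interpretation that reproduces the printed values and should be read as assumptions: $d = 3$ covariance parameters (fixed effects are not counted under REML), $n^{*} = 24 - 6 = 18$ (observations minus the rank of X), and $s = 4$ (the number of levels of Block, the first random effect).

```latex
\begin{align*}
\mathrm{AIC}  &= -2\ell + 2d = 119.76 + 2(3) = 125.76 \approx 125.8\\
\mathrm{AICC} &= -2\ell + \frac{2\,d\,n^{*}}{n^{*} - d - 1} = 119.76 + \frac{2(3)(18)}{14} \approx 127.5\\
\mathrm{BIC}  &= -2\ell + d \log s = 119.76 + 3 \log 4 \approx 123.9
\end{align*}
```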
Finally, the fixed effects are tested by using Type 3 estimable functions (Output 56.1.8).<br />
Output 56.1.8 Split-Plot Analysis (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
A 2 6 4.07 0.0764<br />
B 1 9 19.39 0.0017<br />
A*B 2 9 4.02 0.0566<br />
The tests match those obtained from the following PROC GLM statements:<br />
proc glm data=sp;<br />
class A B Block;<br />
model Y = A B A*B Block A*Block;<br />
test h=A e=A*Block;<br />
run;<br />
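The denominator degrees of freedom in Output 56.1.8 follow from the usual split-plot partition of the 24 observations. As a worked check, with 3 levels of A, 2 levels of B, and 4 blocks:

```latex
\begin{align*}
\text{whole-plot error } (A*\text{Block}) &: (3-1)(4-1) = 6\\
\text{residual} &: 24 - \underbrace{1}_{\text{intercept}} - \underbrace{2}_{A} - \underbrace{1}_{B} - \underbrace{2}_{A*B} - \underbrace{3}_{\text{Block}} - \underbrace{6}_{A*\text{Block}} = 9
\end{align*}
```

The test for A therefore uses 6 denominator degrees of freedom, which is what the TEST statement with E=A*Block reproduces in PROC GLM, while B and A*B are tested against the residual with 9.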
You can continue this analysis by producing solutions for the fixed and random effects and then<br />
testing various linear combinations of them by using the CONTRAST and ESTIMATE statements.<br />
If you use the same CONTRAST and ESTIMATE statements with PROC GLM, the test statistics
correspond to the fixed-effects-only model. <strong>The</strong> test statistics from PROC <strong>MIXED</strong> incorporate the<br />
random effects.<br />
<strong>The</strong> various “inference space” contrasts given by Stroup (1989a) can be implemented via the<br />
ESTIMATE statement. Consider the following examples:<br />
proc mixed data=sp;<br />
class A B Block;<br />
model Y = A B A*B;<br />
random Block A*Block;<br />
estimate 'a1 mean narrow'<br />
intercept 1 A 1 B .5 .5 A*B .5 .5 |<br />
Block .25 .25 .25 .25<br />
A*Block .25 .25 .25 .25 0 0 0 0 0 0 0 0;<br />
estimate 'a1 mean intermed'<br />
intercept 1 A 1 B .5 .5 A*B .5 .5 |<br />
Block .25 .25 .25 .25;<br />
estimate 'a1 mean broad'<br />
intercept 1 A 1 B .5 .5 A*B .5 .5;<br />
run;<br />
<strong>The</strong>se statements result in Output 56.1.9.<br />
Output 56.1.9 Inference Space Results<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Estimates<br />
Standard<br />
Label Estimate Error DF t Value Pr > |t|<br />
a1 mean narrow 32.8750 1.0817 9 30.39
proc mixed;<br />
class A B Block;<br />
model Y = A B A*B;<br />
random Block A*Block;<br />
run;<br />
An equivalent way of specifying this model is as follows:<br />
proc mixed data=sp;<br />
class A B Block;<br />
model Y = A B A*B;<br />
random intercept A / subject=Block;<br />
run;<br />
In general, if all of the effects in the RANDOM statement can be nested within one effect, you<br />
can specify that one effect by using the SUBJECT= option. <strong>The</strong> subject effect is, in a sense, “factored<br />
out” of the random effects. <strong>The</strong> specification that uses the SUBJECT= effect can result in<br />
faster execution times for large problems because PROC <strong>MIXED</strong> is able to perform the likelihood<br />
calculations separately for each subject.<br />
Example 56.2: Repeated Measures<br />
The following data are from Potthoff and Roy (1964) and consist of growth measurements for 11<br />
girls and 16 boys at ages 8, 10, 12, and 14. Some of the observations are suspect (for example, the<br />
third observation for person 20); however, all of the data are used here for comparison purposes.<br />
<strong>The</strong> analysis strategy employs a linear growth curve model for the boys and girls as well as a<br />
variance-covariance model that incorporates correlations for all of the observations arising from<br />
the same person. <strong>The</strong> data are assumed to be Gaussian, and their likelihood is maximized to estimate<br />
the model parameters. See Jennrich and Schluchter (1986), Louis (1988), Crowder and Hand<br />
(1990), Diggle, Liang, and Zeger (1994), and Everitt (1995) for overviews of this approach to repeated<br />
measures. Jennrich and Schluchter present results for the Potthoff and Roy data from various<br />
covariance structures. <strong>The</strong> PROC <strong>MIXED</strong> statements to fit an unstructured variance matrix (their<br />
Model 2) are as follows:<br />
data pr;<br />
input Person Gender $ y1 y2 y3 y4;<br />
y=y1; Age=8; output;<br />
y=y2; Age=10; output;<br />
y=y3; Age=12; output;<br />
y=y4; Age=14; output;<br />
drop y1-y4;<br />
datalines;<br />
1 F 21.0 20.0 21.5 23.0<br />
2 F 21.0 21.5 24.0 25.5<br />
3 F 20.5 24.0 24.5 26.0<br />
4 F 23.5 24.5 25.0 26.5<br />
5 F 21.5 23.0 22.5 23.5
6 F 20.0 21.0 21.0 22.5<br />
7 F 21.5 22.5 23.0 25.0<br />
8 F 23.0 23.0 23.5 24.0<br />
9 F 20.0 21.0 22.0 21.5<br />
10 F 16.5 19.0 19.0 19.5<br />
11 F 24.5 25.0 28.0 28.0<br />
12 M 26.0 25.0 29.0 31.0<br />
13 M 21.5 22.5 23.0 26.5<br />
14 M 23.0 22.5 24.0 27.5<br />
15 M 25.5 27.5 26.5 27.0<br />
16 M 20.0 23.5 22.5 26.0<br />
17 M 24.5 25.5 27.0 28.5<br />
18 M 22.0 22.0 24.5 26.5<br />
19 M 24.0 21.5 24.5 25.5<br />
20 M 23.0 20.5 31.0 26.0<br />
21 M 27.5 28.0 31.0 31.5<br />
22 M 23.0 23.0 23.5 25.0<br />
23 M 21.5 23.5 24.0 28.0<br />
24 M 17.0 24.5 26.0 29.5<br />
25 M 22.5 25.5 25.5 26.0<br />
26 M 23.0 24.5 26.0 30.0<br />
27 M 22.0 21.5 23.5 25.0<br />
;<br />
proc mixed data=pr method=ml covtest;<br />
class Person Gender;<br />
model y = Gender Age Gender*Age / s;<br />
repeated / type=un subject=Person r;<br />
run;<br />
To follow Jennrich and Schluchter, this example uses maximum likelihood (METHOD=ML) instead<br />
of the default REML to estimate the unknown covariance parameters. <strong>The</strong> COVTEST option<br />
requests asymptotic tests of all the covariance parameters.<br />
<strong>The</strong> MODEL statement first lists the dependent variable Y. <strong>The</strong> fixed effects are then listed after the<br />
equal sign. <strong>The</strong> variable Gender requests a different intercept for the girls and boys, Age models<br />
an overall linear growth trend, and Gender*Age makes the slopes different over time. It is actually<br />
not necessary to specify Age separately, but doing so enables PROC <strong>MIXED</strong> to carry out a test for<br />
heterogeneous slopes. <strong>The</strong> S option requests the display of the fixed-effects solution vector.<br />
<strong>The</strong> REPEATED statement contains no effects, taking advantage of the default assumption that the<br />
observations are ordered similarly for each subject. <strong>The</strong> TYPE=UN option requests an unstructured<br />
block for each SUBJECT=Person. <strong>The</strong> R matrix is, therefore, block diagonal with 27 blocks, each<br />
block consisting of identical 4 × 4 unstructured matrices. The 10 parameters of these unstructured<br />
blocks make up the covariance parameters estimated by maximum likelihood. <strong>The</strong> R option requests<br />
that the first block of R be displayed.<br />
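Written out, each block has the following form. The symbol $\mathbf{R}_i$ for the block belonging to person $i$ and the labels $\sigma_{jk}$, which correspond to the UN(j,k) parameters in the output, are notation assumed here for illustration.

```latex
\[
\mathbf{R}_i =
\begin{pmatrix}
\sigma_{11} & \sigma_{21} & \sigma_{31} & \sigma_{41}\\
\sigma_{21} & \sigma_{22} & \sigma_{32} & \sigma_{42}\\
\sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{43}\\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44}
\end{pmatrix},
\qquad
\frac{4\,(4+1)}{2} = 10 \text{ distinct parameters}
\]
```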
<strong>The</strong> results from this analysis are shown in Output 56.2.1–Output 56.2.9.
Output 56.2.1 Repeated Measures Analysis with Unstructured Covariance Matrix<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.PR<br />
Dependent Variable y<br />
Covariance Structure Unstructured<br />
Subject Effect Person<br />
Estimation Method ML<br />
Residual Variance Method None<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Between-Within<br />
In Output 56.2.1, the covariance structure is listed as “Unstructured,” and no residual variance is<br />
used with this structure. <strong>The</strong> default degrees-of-freedom method here is “Between-Within.”<br />
Output 56.2.2 Repeated Measures Analysis (continued)<br />
Class Level Information<br />
Class Levels Values<br />
Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />
14 15 16 17 18 19 20 21 22 23<br />
24 25 26 27<br />
Gender 2 F M<br />
In Output 56.2.2, note that Person has 27 levels and Gender has 2.<br />
Output 56.2.3 Repeated Measures Analysis (continued)<br />
Dimensions<br />
Covariance Parameters 10<br />
Columns in X 6<br />
Columns in Z 0<br />
Subjects 27<br />
Max Obs Per Subject 4<br />
In Output 56.2.3, the 10 covariance parameters result from the 4 × 4 unstructured blocks of R. There<br />
is no Z matrix for this model, and each of the 27 subjects has a maximum of 4 observations.
Output 56.2.4 Repeated Measures Analysis (continued)<br />
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 108<br />
Number of Observations Not Used 0<br />
Iteration History<br />
Iteration Evaluations -2 Log Like Criterion<br />
0 1 478.24175986<br />
1 2 419.47721707 0.00000152<br />
2 1 419.47704812 0.00000000<br />
Convergence criteria met.<br />
Two Newton-Raphson iterations beyond the starting values are required to find the maximum likelihood estimates<br />
(Output 56.2.4). The default relative Hessian criterion has a final value less than 1E-8, indicating<br />
the convergence of the Newton-Raphson algorithm and the attainment of an optimum.<br />
Output 56.2.5 Repeated Measures Analysis (continued)<br />
Estimated R Matrix for Person 1<br />
Row Col1 Col2 Col3 Col4<br />
1 5.1192 2.4409 3.6105 2.5222<br />
2 2.4409 3.9279 2.7175 3.0624<br />
3 3.6105 2.7175 5.9798 3.8235<br />
4 2.5222 3.0624 3.8235 4.6180<br />
The 4 × 4 matrix in Output 56.2.5 is the estimated unstructured covariance matrix. It is the estimate<br />
of the first block of R, and the other 26 blocks all have the same estimate.
Output 56.2.6 Repeated Measures Analysis (continued)<br />
Covariance Parameter Estimates<br />
Standard Z<br />
Cov Parm Subject Estimate Error Value Pr Z<br />
UN(1,1) Person 5.1192 1.4169 3.61 0.0002<br />
UN(2,1) Person 2.4409 0.9835 2.48 0.0131<br />
UN(2,2) Person 3.9279 1.0824 3.63 0.0001<br />
UN(3,1) Person 3.6105 1.2767 2.83 0.0047<br />
UN(3,2) Person 2.7175 1.0740 2.53 0.0114<br />
UN(3,3) Person 5.9798 1.6279 3.67 0.0001<br />
UN(4,1) Person 2.5222 1.0649 2.37 0.0179<br />
UN(4,2) Person 3.0624 1.0135 3.02 0.0025<br />
UN(4,3) Person 3.8235 1.2508 3.06 0.0022<br />
UN(4,4) Person 4.6180 1.2573 3.67 0.0001<br />
<strong>The</strong> “Covariance Parameter Estimates” table in Output 56.2.6 lists the 10 estimated covariance parameters<br />
in order; note their correspondence to the first block of R displayed in Output 56.2.5. <strong>The</strong><br />
parameter estimates are labeled according to their location in the block in the Cov Parm column,<br />
and all of these estimates are associated with Person as the subject effect. <strong>The</strong> Std Error column lists<br />
approximate standard errors of the covariance parameters obtained from the inverse Hessian matrix.<br />
<strong>The</strong>se standard errors lead to approximate Wald Z statistics, which are compared with the standard<br />
normal distribution. The results of these tests indicate that all the parameters are significantly different<br />
from 0; however, the Wald test can be unreliable in small samples.<br />
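Each Wald statistic is the estimate divided by its approximate standard error. As a worked check against two rows of Output 56.2.6:

```latex
\begin{align*}
Z_{\mathrm{UN}(1,1)} &= \frac{5.1192}{1.4169} \approx 3.61\\
Z_{\mathrm{UN}(2,1)} &= \frac{2.4409}{0.9835} \approx 2.48
\end{align*}
```

The printed p-values are consistent with a one-sided test for the variances, such as UN(1,1), and a two-sided test for the covariances, such as UN(2,1).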
To carry out Wald tests of various linear combinations of these parameters, use the following procedure.<br />
First, run the statements again, adding the ASYCOV option and an ODS statement:<br />
ods output CovParms=cp AsyCov=asy;<br />
proc mixed data=pr method=ml covtest asycov;<br />
class Person Gender;<br />
model y = Gender Age Gender*Age / s;<br />
repeated / type=un subject=Person r;<br />
run;<br />
This creates two data sets, cp and asy, which contain the covariance parameter estimates and their<br />
asymptotic variance covariance matrix, respectively. Then read these data sets into the SAS/IML<br />
matrix programming language as follows:<br />
proc iml;<br />
use cp;<br />
read all var {Estimate} into est;<br />
use asy;<br />
read all var ('CovP1':'CovP10') into asy;<br />
You can then construct your desired linear combinations and corresponding quadratic forms with<br />
the asy matrix.
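A minimal sketch of such a quadratic form follows. The contrast is hypothetical and chosen only for illustration: it compares the first and last covariance parameters, UN(1,1) and UN(4,4). The names L, est_L, wald, df, and p are not part of the original example, and the chi-square reference distribution carries the same small-sample caveat as the Wald Z tests.

```sas
proc iml;
   use cp;
   read all var {Estimate} into est;           /* 10 x 1 vector of estimates */
   close cp;
   use asy;
   read all var ('CovP1':'CovP10') into asy;   /* 10 x 10 asymptotic covariance */
   close asy;

   /* Hypothetical contrast: UN(1,1) - UN(4,4) */
   L     = {1 0 0 0 0 0 0 0 0 -1};
   est_L = L * est;                            /* estimated linear combination */
   wald  = t(est_L) * inv(L * asy * t(L)) * est_L;   /* Wald quadratic form */
   df    = nrow(L);                            /* one row, so one degree of freedom */
   p     = 1 - probchi(wald, df);              /* chi-square p-value */
   print est_L wald df p;
quit;
```

To test several combinations jointly, add one row to L per combination; the degrees of freedom then equal the number of rows, provided the rows are linearly independent.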
Output 56.2.7 Repeated Measures Analysis (continued)<br />
Fit Statistics<br />
-2 Log Likelihood 419.5<br />
AIC (smaller is better) 447.5<br />
AICC (smaller is better) 452.0<br />
BIC (smaller is better) 465.6<br />
Null Model Likelihood Ratio Test<br />
DF Chi-Square Pr > ChiSq<br />
9 58.76 |t|<br />
Intercept 15.8423 0.9356 25 16.93
Output 56.2.9 Repeated Measures Analysis (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Gender 1 25 1.17 0.2904<br />
Age 1 25 110.54
This specifies an unstructured covariance matrix for the random intercept and slope. In mixed model<br />
notation, G is block diagonal with identical $2 \times 2$ unstructured blocks for each person. By default, R<br />
becomes $\sigma^2 I$. See Example 56.5 for further information about this model.<br />
Finally, you can fit a compound symmetry structure by using TYPE=CS, as follows:<br />
proc mixed data=pr method=ml covtest;<br />
class Person Gender;<br />
model y = Gender Age Gender*Age / s;<br />
repeated / type=cs subject=Person r;<br />
run;<br />
The results from this analysis are shown in Output 56.2.10–Output 56.2.17.<br />
The “Model Information” table in Output 56.2.10 is the same as before except for the change in<br />
“Covariance Structure.”<br />
Output 56.2.10 Repeated Measures Analysis with Compound Symmetry Structure<br />
The Mixed Procedure<br />
Model Information<br />
Data Set WORK.PR<br />
Dependent Variable y<br />
Covariance Structure Compound Symmetry<br />
Subject Effect Person<br />
Estimation Method ML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Between-Within<br />
The “Dimensions” table in Output 56.2.11 shows that there are only two covariance parameters<br />
in the compound symmetry model; this covariance structure has common variance and common<br />
covariance.<br />
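Written out for one person with four measurements, the structure can be sketched as follows. The symbols are chosen for this sketch: $\sigma_1$ stands for the common covariance (the CS parameter), $\sigma^2$ for the residual variance, $\mathbf{J}_4$ for the $4 \times 4$ matrix of ones, and $\mathbf{I}_4$ for the identity matrix.

```latex
\mathbf{R}_i = \sigma_1 \mathbf{J}_4 + \sigma^2 \mathbf{I}_4 =
\begin{bmatrix}
\sigma_1 + \sigma^2 & \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 + \sigma^2 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 + \sigma^2 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 + \sigma^2
\end{bmatrix}
```

Every diagonal entry is the common variance $\sigma_1 + \sigma^2$ and every off-diagonal entry is the common covariance $\sigma_1$, so two parameters describe the whole block.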
Output 56.2.11 Analysis with Compound Symmetry (continued)<br />
Class Level Information<br />
Class Levels Values<br />
Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />
14 15 16 17 18 19 20 21 22 23<br />
24 25 26 27<br />
Gender 2 F M
Output 56.2.11 continued<br />
Dimensions<br />
Covariance Parameters 2<br />
Columns in X 6<br />
Columns in Z 0<br />
Subjects 27<br />
Max Obs Per Subject 4<br />
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 108<br />
Number of Observations Not Used 0<br />
Since the data are balanced, only one step is required to find the estimates (Output 56.2.12).<br />
Output 56.2.12 Analysis with Compound Symmetry (continued)<br />
Iteration History<br />
Iteration Evaluations -2 Log Like Criterion<br />
0 1 478.24175986<br />
1 1 428.63905802 0.00000000<br />
Convergence criteria met.<br />
Output 56.2.13 displays the estimated R matrix for the first subject. Note the compound symmetry<br />
structure here, which consists of a common covariance with a diagonal enhancement.<br />
Output 56.2.13 Analysis with Compound Symmetry (continued)<br />
Estimated R Matrix for Person 1<br />
Row Col1 Col2 Col3 Col4<br />
1 4.9052 3.0306 3.0306 3.0306<br />
2 3.0306 4.9052 3.0306 3.0306<br />
3 3.0306 3.0306 4.9052 3.0306<br />
4 3.0306 3.0306 3.0306 4.9052<br />
The common covariance is estimated to be 3.0306, as listed in the CS row of the “Covariance Parameter<br />
Estimates” table in Output 56.2.14, and the residual variance is estimated to be 1.8746, as<br />
listed in the Residual row. You can use these two numbers to estimate the intraclass correlation coefficient<br />
(ICC) for this model. Here, the ICC estimate equals $3.0306/(3.0306 + 1.8746) = 0.6178$.<br />
You can also obtain this number by adding the RCORR option to the REPEATED statement.
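A minimal sketch of that change: the statements below repeat the compound symmetry fit and add RCORR next to the R option. Under compound symmetry, every off-diagonal entry of the displayed correlation matrix should equal the ICC, about 0.6178 here.

```sas
proc mixed data=pr method=ml covtest;
   class Person Gender;
   model y = Gender Age Gender*Age / s;
   /* R displays the covariance block for the first subject;
      RCORR displays the corresponding correlation block */
   repeated / type=cs subject=Person r rcorr;
run;
```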
Output 56.2.14 Analysis with Compound Symmetry (continued)<br />
Covariance Parameter Estimates<br />
Standard Z<br />
Cov Parm Subject Estimate Error Value Pr Z<br />
CS Person 3.0306 0.9552 3.17 0.0015<br />
Residual 1.8746 0.2946 6.36 ChiSq<br />
1 49.60 |t|<br />
Intercept 16.3406 0.9631 25 16.97
Output 56.2.17 Analysis with Compound Symmetry (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Gender 1 25 0.47 0.5003<br />
Age 1 79 111.10
Output 56.2.19 Analysis with Heterogeneous Structures (continued)<br />
Class Level Information<br />
Class Levels Values<br />
Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />
14 15 16 17 18 19 20 21 22 23<br />
24 25 26 27<br />
Gender 2 F M<br />
Dimensions<br />
Covariance Parameters 4<br />
Columns in X 6<br />
Columns in Z 0<br />
Subjects 27<br />
Max Obs Per Subject 4<br />
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 108<br />
Number of Observations Not Used 0<br />
As Output 56.2.20 shows, even with the heterogeneity, only one iteration is required for convergence.<br />
Output 56.2.20 Analysis with Heterogeneous Structures (continued)<br />
Iteration History<br />
Iteration Evaluations -2 Log Like Criterion<br />
0 1 478.24175986<br />
1 1 408.81297228 0.00000000<br />
Convergence criteria met.<br />
The “Covariance Parameter Estimates” table in Output 56.2.21 lists the heterogeneous estimates.<br />
Note that both the common covariance and the diagonal enhancement differ between girls and boys.<br />
Output 56.2.21 Analysis with Heterogeneous Structures (continued)<br />
Covariance Parameter Estimates<br />
Cov Parm Subject Group Estimate<br />
Variance Person Gender F 0.5900<br />
CS Person Gender F 3.8804<br />
Variance Person Gender M 2.7577<br />
CS Person Gender M 2.4463
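The statements that produce the heterogeneous fit do not appear in this excerpt. The following is a sketch of statements that would yield a table of this shape, assuming the GROUP=Gender option in the REPEATED statement (suggested by the Group column above) and no COVTEST option (the table has no standard errors). Treat it as a reconstruction, not as the original code.

```sas
proc mixed data=pr method=ml;
   class Person Gender;
   model y = Gender Age Gender*Age / s;
   /* GROUP=Gender (assumed) gives each gender its own Variance and CS parameters */
   repeated / type=cs subject=Person group=Gender;
run;
```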
As Output 56.2.22 shows, both Akaike’s information criterion (424.8) and Schwarz’s Bayesian<br />
information criterion (435.2) are smaller for this model than for the homogeneous compound symmetry<br />
model (440.6 and 448.4, respectively). This indicates that the heterogeneous model is more<br />
appropriate. To construct the likelihood ratio test between the two models, subtract the $-2$ log<br />
likelihood values: $428.6 - 408.8 = 19.8$. Comparing this value with the $\chi^2$ distribution with two<br />
degrees of freedom yields a p-value less than 0.0001, again favoring the heterogeneous model.<br />
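The arithmetic behind these comparisons can be sketched as follows. The two degrees of freedom are the difference in covariance parameters (4 versus 2). The information criteria are reproduced under the assumption that, with METHOD=ML, the penalty counts the 4 estimable fixed-effects parameters plus the covariance parameters (6 and 8 in total) and that BIC uses the 27 subjects as the sample size; these choices match the reported values.

```latex
\begin{align*}
\text{LRT:}\quad & 428.6 - 408.8 = 19.8, \qquad
  \Pr(\chi^2_2 > 19.8) = e^{-19.8/2} = e^{-9.9} \approx 5.0 \times 10^{-5} \\
\text{AIC:}\quad & 428.6 + 2(6) = 440.6, \qquad 408.8 + 2(8) = 424.8 \\
\text{BIC:}\quad & 428.6 + 6 \ln 27 \approx 448.4, \qquad 408.8 + 8 \ln 27 \approx 435.2
\end{align*}
```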
Output 56.2.22 Analysis with Heterogeneous Structures (continued)<br />
Fit Statistics<br />
-2 Log Likelihood 408.8<br />
AIC (smaller is better) 424.8<br />
AICC (smaller is better) 426.3<br />
BIC (smaller is better) 435.2<br />
Null Model Likelihood Ratio Test<br />
DF Chi-Square Pr > ChiSq<br />
3 69.43 |t|<br />
Intercept 16.3406 1.1130 25 14.68
Output 56.2.24 Analysis with Heterogeneous Structures (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Gender 1 25 0.55 0.4644<br />
Age 1 79 141.37
The results from this analysis are shown in Output 56.3.1–Output 56.3.13.<br />
The “Model Information” table in Output 56.3.1 lists details about this variance components model.<br />
Output 56.3.1 Model Information<br />
The Mixed Procedure<br />
Model Information<br />
Data Set WORK.HH<br />
Dependent Variable y<br />
Covariance Structure Variance Components<br />
Estimation Method REML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
The “Class Level Information” table in Output 56.3.2 lists the levels for A and B.<br />
Output 56.3.2 Class Level Information<br />
Class Level Information<br />
Class Levels Values<br />
a 3 1 2 3<br />
b 2 1 2<br />
The “Dimensions” table in Output 56.3.3 reveals that X is $16 \times 4$ and Z is $16 \times 8$. Since there are<br />
no SUBJECT= effects, PROC MIXED considers the data to be effectively from one subject with 16<br />
observations.<br />
Output 56.3.3 Model Dimensions and Number of Observations<br />
Dimensions<br />
Covariance Parameters 3<br />
Columns in X 4<br />
Columns in Z 8<br />
Subjects 1<br />
Max Obs Per Subject 16<br />
Number of Observations<br />
Number of Observations Read 16<br />
Number of Observations Used 16<br />
Number of Observations Not Used 0<br />
Only a portion of the “Parameter Search” table is shown in Output 56.3.4 because the full listing<br />
has 651 rows.
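The statements that request this grid are not part of this excerpt. The sketch below is a reconstruction inferred from the output tables, not the original code: the grid ranges come from Output 56.3.4, the step of 0.1 for the first parameter is inferred from 651 = 31 x 21 x 1 rows, and the procedure options are guessed from the tables that follow (asymptotic covariance, mixed model equations, least squares means, predicted values).

```sas
/* Save the "Parameter Search" table for later plotting */
ods output ParmSearch=parms;
proc mixed data=hh asycov mmeq mmeqsol covtest;
   class a b;
   model y = a / outp=predicted;
   random b a*b;
   lsmeans a;
   /* 31 values x 21 values x 1 value = 651 grid points (step sizes assumed) */
   parms (17 to 20 by 0.1) (0.3 to 0.4 by 0.005) (1.0);
run;
```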
Output 56.3.4 Selected Results of Parameter Search<br />
The Mixed Procedure<br />
-2 Res Log<br />
CovP1 CovP2 CovP3 Variance Res Log Like Like<br />
17.0000 0.3000 1.0000 80.1400 -52.4699 104.9399<br />
17.0000 0.3050 1.0000 80.0466 -52.4697 104.9393<br />
17.0000 0.3100 1.0000 79.9545 -52.4694 104.9388<br />
17.0000 0.3150 1.0000 79.8637 -52.4692 104.9384<br />
17.0000 0.3200 1.0000 79.7742 -52.4691 104.9381<br />
17.0000 0.3250 1.0000 79.6859 -52.4690 104.9379<br />
17.0000 0.3300 1.0000 79.5988 -52.4689 104.9378<br />
17.0000 0.3350 1.0000 79.5129 -52.4689 104.9377<br />
17.0000 0.3400 1.0000 79.4282 -52.4689 104.9377<br />
17.0000 0.3450 1.0000 79.3447 -52.4689 104.9378<br />
. . . . . .<br />
. . . . . .<br />
. . . . . .<br />
20.0000 0.3550 1.0000 78.2003 -52.4683 104.9366<br />
20.0000 0.3600 1.0000 78.1201 -52.4684 104.9368<br />
20.0000 0.3650 1.0000 78.0409 -52.4685 104.9370<br />
20.0000 0.3700 1.0000 77.9628 -52.4687 104.9373<br />
20.0000 0.3750 1.0000 77.8857 -52.4689 104.9377<br />
20.0000 0.3800 1.0000 77.8096 -52.4691 104.9382<br />
20.0000 0.3850 1.0000 77.7345 -52.4693 104.9387<br />
20.0000 0.3900 1.0000 77.6603 -52.4696 104.9392<br />
20.0000 0.3950 1.0000 77.5871 -52.4699 104.9399<br />
20.0000 0.4000 1.0000 77.5148 -52.4703 104.9406<br />
As Output 56.3.5 shows, convergence occurs quickly because PROC MIXED starts from the best<br />
value from the grid search.<br />
Output 56.3.5 Iteration History and Convergence Status<br />
Iteration History<br />
Iteration Evaluations -2 Res Log Like Criterion<br />
1 2 104.93416367 0.00000000<br />
Convergence criteria met.<br />
The “Covariance Parameter Estimates” table in Output 56.3.6 lists the variance components estimates.<br />
Note that B is much more variable than A*B.
Output 56.3.6 Estimated Covariance Parameters<br />
Covariance Parameter Estimates<br />
Standard Z<br />
Cov Parm Estimate Error Value Pr > Z<br />
b 1464.36 2098.01 0.70 0.2426<br />
a*b 26.9581 59.6570 0.45 0.3257<br />
Residual 78.8426 35.3512 2.23 0.0129<br />
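One way to relate these estimates to the grid in Output 56.3.4 is to divide each variance component by the residual variance. This assumes that, with the third grid value fixed at 1 and the residual variance profiled, CovP1 and CovP2 act as ratios to the residual variance; that reading is an interpretation of the output rather than something the tables state.

```latex
\frac{\hat\sigma^2_{b}}{\hat\sigma^2} = \frac{1464.36}{78.8426} \approx 18.57,
\qquad
\frac{\hat\sigma^2_{a \times b}}{\hat\sigma^2} = \frac{26.9581}{78.8426} \approx 0.342
```

Both ratios fall inside the searched ranges (17 to 20 and 0.3 to 0.4).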
The asymptotic covariance matrix in Output 56.3.7 also reflects the large variability of B relative to<br />
A*B.<br />
Output 56.3.7 Asymptotic Covariance Matrix of Covariance Parameters<br />
Asymptotic Covariance Matrix of Estimates<br />
Row Cov Parm CovP1 CovP2 CovP3<br />
1 b 4401640 1.2831 -273.32<br />
2 a*b 1.2831 3558.96 -502.84<br />
3 Residual -273.32 -502.84 1249.71<br />
As Output 56.3.8 shows, the PARMS likelihood ratio test (LRT) compares the best model from the<br />
grid search with the final fitted model. Since these models are nearly the same, the LRT is not<br />
significant.<br />
Output 56.3.8 Fit Statistics and Likelihood Ratio Test<br />
Fit Statistics<br />
-2 Res Log Likelihood 104.9<br />
AIC (smaller is better) 110.9<br />
AICC (smaller is better) 113.6<br />
BIC (smaller is better) 107.0<br />
PARMS Model Likelihood Ratio Test<br />
DF Chi-Square Pr > ChiSq<br />
2 0.00 1.0000<br />
The mixed model equations are analogous to the normal equations in the standard linear model.<br />
As Output 56.3.9 shows, for this example, rows 1–4 correspond to the fixed effects, rows 5–12<br />
correspond to the random effects, and Col13 corresponds to the dependent variable.
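For reference, the mixed model equations in their usual form are sketched below, with $\boldsymbol\beta$ for the fixed effects and $\boldsymbol\gamma$ for the random effects. The spot checks use the estimates in Output 56.3.6 (residual 78.8426, B variance 1464.36, A*B variance 26.9581) and the cell counts in the data: 16 observations in total, 8 at each level of B, and 3 in the A=1, B=1 cell.

```latex
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\boldsymbol\beta} \\ \hat{\boldsymbol\gamma} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}

\begin{align*}
\text{row 1, Col1:}\quad & \frac{16}{78.8426} \approx 0.2029 \\
\text{row 5, Col5:}\quad & \frac{8}{78.8426} + \frac{1}{1464.36} \approx 0.1022 \\
\text{row 7, Col7:}\quad & \frac{3}{78.8426} + \frac{1}{26.9581} \approx 0.07515
\end{align*}
```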
Output 56.3.9 Mixed Model Equations<br />
Mixed Model Equations<br />
Row Effect a b Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />
1 Intercept 0.2029 0.06342 0.07610 0.06342 0.1015 0.1015 0.03805<br />
2 a 1 0.06342 0.06342 0.03805 0.02537 0.03805<br />
3 a 2 0.07610 0.07610 0.03805 0.03805<br />
4 a 3 0.06342 0.06342 0.02537 0.03805<br />
5 b 1 0.1015 0.03805 0.03805 0.02537 0.1022 0.03805<br />
6 b 2 0.1015 0.02537 0.03805 0.03805 0.1022<br />
7 a*b 1 1 0.03805 0.03805 0.03805 0.07515<br />
8 a*b 1 2 0.02537 0.02537 0.02537<br />
9 a*b 2 1 0.03805 0.03805 0.03805<br />
10 a*b 2 2 0.03805 0.03805 0.03805<br />
11 a*b 3 1 0.02537 0.02537 0.02537<br />
12 a*b 3 2 0.03805 0.03805 0.03805<br />
Mixed Model Equations<br />
Row Col8 Col9 Col10 Col11 Col12 Col13<br />
1 0.02537 0.03805 0.03805 0.02537 0.03805 36.4143<br />
2 0.02537 13.8757<br />
3 0.03805 0.03805 12.7469<br />
4 0.02537 0.03805 9.7917<br />
5 0.03805 0.02537 21.2956<br />
6 0.02537 0.03805 0.03805 15.1187<br />
7 9.3477<br />
8 0.06246 4.5280<br />
9 0.07515 7.2676<br />
10 0.07515 5.4793<br />
11 0.06246 4.6802<br />
12 0.07515 5.1115<br />
The solution matrix in Output 56.3.10 results from sweeping all but the last row of the mixed model<br />
equations matrix. The final column contains a solution vector for the fixed and random effects. The<br />
first four rows correspond to fixed effects and the last eight correspond to random effects.
Output 56.3.10 Solutions of the Mixed Model Equations<br />
Mixed Model Equations Solution<br />
Row Effect a b Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />
1 Intercept 761.84 -29.7718 -29.6578 -731.14 -733.22 -0.4680<br />
2 a 1 -29.7718 59.5436 29.7718 -2.0764 2.0764 -14.0239<br />
3 a 2 -29.6578 29.7718 56.2773 -1.0382 1.0382 0.4680<br />
4 a 3<br />
5 b 1 -731.14 -2.0764 -1.0382 741.63 722.73 -4.2598<br />
6 b 2 -733.22 2.0764 1.0382 722.73 741.63 4.2598<br />
7 a*b 1 1 -0.4680 -14.0239 0.4680 -4.2598 4.2598 22.8027<br />
8 a*b 1 2 0.4680 -12.9342 -0.4680 4.2598 -4.2598 4.1555<br />
9 a*b 2 1 -0.5257 1.0514 -12.9534 -4.7855 4.7855 2.1570<br />
10 a*b 2 2 0.5257 -1.0514 -14.0048 4.7855 -4.7855 -2.1570<br />
11 a*b 3 1 -12.4663 12.9342 12.4663 -4.2598 4.2598 1.9200<br />
12 a*b 3 2 -14.4918 14.0239 14.4918 4.2598 -4.2598 -1.9200<br />
Mixed Model Equations Solution<br />
Row Col8 Col9 Col10 Col11 Col12 Col13<br />
1 0.4680 -0.5257 0.5257 -12.4663 -14.4918 159.61<br />
2 -12.9342 1.0514 -1.0514 12.9342 14.0239 53.2049<br />
3 -0.4680 -12.9534 -14.0048 12.4663 14.4918 7.8856<br />
4<br />
5 4.2598 -4.7855 4.7855 -4.2598 4.2598 26.8837<br />
6 -4.2598 4.7855 -4.7855 4.2598 -4.2598 -26.8837<br />
7 4.1555 2.1570 -2.1570 1.9200 -1.9200 3.0198<br />
8 22.8027 -2.1570 2.1570 -1.9200 1.9200 -3.0198<br />
9 -2.1570 22.5560 4.4021 2.1570 -2.1570 -1.7134<br />
10 2.1570 4.4021 22.5560 -2.1570 2.1570 1.7134<br />
11 -1.9200 2.1570 -2.1570 22.8027 4.1555 -0.8115<br />
12 1.9200 -2.1570 2.1570 4.1555 22.8027 0.8115<br />
The A factor is significant at the 5% level (Output 56.3.11).<br />
Output 56.3.11 Tests of Fixed Effects<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
a 2 2 28.00 0.0345<br />
Output 56.3.12 shows that the significance of A appears to be from the difference between its first<br />
level and its other two levels.
Output 56.3.12 Least Squares Means for A Effect<br />
Least Squares Means<br />
Standard<br />
Effect a Estimate Error DF t Value Pr > |t|<br />
a 1 212.82 27.6014 2 7.71 0.0164<br />
a 2 167.50 27.5463 2 6.08 0.0260<br />
a 3 159.61 27.6014 2 5.78 0.0286<br />
Output 56.3.13 lists the predicted values from the model. These values are the sum of the fixed-effects<br />
estimates and the empirical best linear unbiased predictors (EBLUPs) of the random effects.<br />
Output 56.3.13 Predicted Values<br />
StdErr<br />
Obs a b y Pred Pred DF Alpha Lower Upper Resid<br />
1 1 1 237 242.723 4.72563 10 0.05 232.193 253.252 -5.7228<br />
2 1 1 254 242.723 4.72563 10 0.05 232.193 253.252 11.2772<br />
3 1 1 246 242.723 4.72563 10 0.05 232.193 253.252 3.2772<br />
4 1 2 178 182.916 5.52589 10 0.05 170.603 195.228 -4.9159<br />
5 1 2 179 182.916 5.52589 10 0.05 170.603 195.228 -3.9159<br />
6 2 1 208 192.670 4.70076 10 0.05 182.196 203.144 15.3297<br />
7 2 1 178 192.670 4.70076 10 0.05 182.196 203.144 -14.6703<br />
8 2 1 187 192.670 4.70076 10 0.05 182.196 203.144 -5.6703<br />
9 2 2 146 142.330 4.70076 10 0.05 131.856 152.804 3.6703<br />
10 2 2 145 142.330 4.70076 10 0.05 131.856 152.804 2.6703<br />
11 2 2 141 142.330 4.70076 10 0.05 131.856 152.804 -1.3297<br />
12 3 1 186 185.687 5.52589 10 0.05 173.374 197.999 0.3134<br />
13 3 1 183 185.687 5.52589 10 0.05 173.374 197.999 -2.6866<br />
14 3 2 142 133.542 4.72563 10 0.05 123.013 144.072 8.4578<br />
15 3 2 125 133.542 4.72563 10 0.05 123.013 144.072 -8.5422<br />
16 3 2 136 133.542 4.72563 10 0.05 123.013 144.072 2.4578<br />
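As a check on how these predictions are assembled, the first observation (A=1, B=1) can be rebuilt from the last column of Output 56.3.10: the intercept, the A=1 effect, the B=1 EBLUP, and the A*B (1,1) EBLUP. The small gap from the tabled value is attributed here to rounding of the displayed solutions.

```latex
\hat{y}_1 = 159.61 + 53.2049 + 26.8837 + 3.0198 = 242.7184 \approx 242.72
\quad (\text{Output 56.3.13 reports } 242.723)
```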
To plot the likelihood surface by using ODS Graphics, use the following statements:<br />
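proc template;<br />
   /* Define a 3-D surface template over the grid of covariance parameters */<br />
   define statgraph surface;<br />
      begingraph;<br />
         layout overlay3d;<br />
            /* Height is the residual log likelihood at each (CovP1, CovP2) point */<br />
            surfaceplotparm x=CovP1 y=CovP2 z=ResLogLike;<br />
         endlayout;<br />
      endgraph;<br />
   end;<br />
run;<br />
/* Render the template with the parms data set, which holds the parameter search grid */<br />
proc sgrender data=parms template=surface;<br />
run;<br />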
The resulting plot is shown in Output 56.3.14. The peak of the surface corresponds to the REML<br />
estimates for the B and A*B variance components.
Output 56.3.14 Plot of Likelihood Surface<br />
Example 56.4: Known G and R<br />
This animal breeding example from Henderson (1984, p. 48) considers multiple traits. The data<br />
are artificial and consist of measurements of two traits on three animals, but the second trait of the<br />
third animal is missing. Assuming an additive genetic model, you can use PROC MIXED to predict<br />
the breeding value of both traits on all three animals and also to predict the second trait of the third<br />
animal. The data are as follows:<br />
data h;<br />
input Trait Animal Y;<br />
datalines;<br />
1 1 6<br />
1 2 8<br />
1 3 7<br />
2 1 9<br />
2 2 5<br />
2 3 .<br />
;
Both G and R are known.<br />
$$
\mathbf{G} = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 1 \\
1 & 2 & 0.5 & 1 & 2 & 0.5 \\
1 & 0.5 & 2 & 1 & 0.5 & 2 \\
2 & 1 & 1 & 3 & 1.5 & 1.5 \\
1 & 2 & 0.5 & 1.5 & 3 & 0.75 \\
1 & 0.5 & 2 & 1.5 & 0.75 & 3
\end{bmatrix}
$$<br />
$$
\mathbf{R} = \begin{bmatrix}
4 & 0 & 0 & 1 & 0 & 0 \\
0 & 4 & 0 & 0 & 1 & 0 \\
0 & 0 & 4 & 0 & 0 & 1 \\
1 & 0 & 0 & 5 & 0 & 0 \\
0 & 1 & 0 & 0 & 5 & 0 \\
0 & 0 & 1 & 0 & 0 & 5
\end{bmatrix}
$$<br />
In order to read G into PROC MIXED by using the GDATA= option in the RANDOM statement,<br />
perform the following DATA step:<br />
data g;<br />
input Row Col1-Col6;<br />
datalines;<br />
1 2 1 1 2 1 1<br />
2 1 2 .5 1 2 .5<br />
3 1 .5 2 1 .5 2<br />
4 2 1 1 3 1.5 1.5<br />
5 1 2 .5 1.5 3 .75<br />
6 1 .5 2 1.5 .75 3<br />
;<br />
The preceding data are in the dense representation for a GDATA= data set. You can also construct<br />
a data set with the sparse representation by using Row, Col, and Value variables, although this would<br />
require 21 observations instead of 6 for this example.<br />
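A sketch of how the sparse form could be derived from the dense data set g. It assumes that one triangle of the symmetric matrix is sufficient (here the lower triangle, including the diagonal), which is consistent with the count of 21 = 6 x 7 / 2 observations; the data set name gsparse and the array name c are illustrative.

```sas
data gsparse;
   set g;                          /* dense form: Row, Col1-Col6 */
   array c{6} Col1-Col6;
   keep Row Col Value;
   do Col = 1 to Row;              /* lower triangle, including the diagonal */
      Value = c{Col};
      output;                      /* one observation per (Row, Col) pair */
   end;
run;
```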
The PROC MIXED statements are as follows:<br />
proc mixed data=h mmeq mmeqsol;<br />
class Trait Animal;<br />
model Y = Trait / noint s outp=predicted;<br />
random Trait*Animal / type=un gdata=g g gi s;<br />
repeated / type=un sub=Animal r ri;<br />
parms (4) (1) (5) / noiter;<br />
run;<br />
proc print data=predicted;<br />
run;<br />
The MMEQ and MMEQSOL options request the mixed model equations and their solution. The<br />
variables Trait and Animal are classification variables, and Trait defines the entire X matrix for the<br />
fixed-effects portion of the model, since the intercept is omitted with the NOINT option. The fixed-effects<br />
solution vector and predicted values are also requested by using the S and OUTP= options,<br />
respectively.
The random effect Trait*Animal leads to a Z matrix with six columns, the first five corresponding to<br />
the identity matrix and the last consisting of 0s. An unstructured G matrix is specified by using the<br />
TYPE=UN option, and it is read into PROC MIXED from a SAS data set by using the GDATA=G<br />
specification. The G and GI options request the display of G and $G^{-1}$, respectively. The S option<br />
requests that the random-effects solution vector be displayed.<br />
Note that the preceding R matrix is block diagonal if the data are sorted by animals. The<br />
REPEATED statement exploits this fact by requesting R to have unstructured $2 \times 2$ blocks corresponding<br />
to animals, which are the subjects. The R and RI options request that the estimated $2 \times 2$<br />
block for the first animal and its inverse be displayed. The PARMS statement lists the parameters<br />
of this $2 \times 2$ matrix. Note that the parameters from G are not specified in the PARMS statement<br />
because they have already been assigned by using the GDATA= option in the RANDOM statement.<br />
The NOITER option prevents PROC MIXED from computing residual (restricted) maximum likelihood<br />
estimates; instead, the known values are used for inferences.<br />
The results from this analysis are shown in Output 56.4.1–Output 56.4.12.<br />
The “Unstructured” covariance structure (Output 56.4.1) applies to both G and R here. The levels<br />
of Trait and Animal have been specified correctly.<br />
Output 56.4.1 Model and Class Level Information<br />
The Mixed Procedure<br />
Model Information<br />
Data Set WORK.H<br />
Dependent Variable Y<br />
Covariance Structure Unstructured<br />
Subject Effect Animal<br />
Estimation Method REML<br />
Residual Variance Method None<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
Class Level Information<br />
Class Levels Values<br />
Trait 2 1 2<br />
Animal 3 1 2 3<br />
The three covariance parameters indicated in Output 56.4.2 correspond to those from the R matrix.<br />
Those from G are considered fixed and known because of the GDATA= option.
Output 56.4.2 Model Dimensions and Number of Observations<br />
Dimensions<br />
Covariance Parameters 3<br />
Columns in X 2<br />
Columns in Z 6<br />
Subjects 1<br />
Max Obs Per Subject 6<br />
Number of Observations<br />
Number of Observations Read 6<br />
Number of Observations Used 5<br />
Number of Observations Not Used 1<br />
Because starting values for the covariance parameters are specified in the PARMS statement, the<br />
MIXED procedure prints the residual (restricted) log likelihood at the starting values. Because of<br />
the NOITER option in the PARMS statement, this is also the final log likelihood in this analysis<br />
(Output 56.4.3).<br />
Output 56.4.3 REML Log Likelihood<br />
Parameter Search<br />
CovP1 CovP2 CovP3 Res Log Like -2 Res Log Like<br />
4.0000 1.0000 5.0000 -7.3731 14.7463<br />
The block of R corresponding to the first animal and the inverse of this block are shown in<br />
Output 56.4.4.<br />
Output 56.4.4 Inverse R Matrix<br />
Estimated R Matrix<br />
for Animal 1<br />
Row Col1 Col2<br />
1 4.0000 1.0000<br />
2 1.0000 5.0000<br />
Estimated Inv(R) Matrix<br />
for Animal 1<br />
Row Col1 Col2<br />
1 0.2632 -0.05263<br />
2 -0.05263 0.2105
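The displayed inverse can be verified by hand from the $2 \times 2$ block, whose determinant is $4 \cdot 5 - 1 \cdot 1 = 19$:

```latex
\begin{bmatrix} 4 & 1 \\ 1 & 5 \end{bmatrix}^{-1}
= \frac{1}{19} \begin{bmatrix} 5 & -1 \\ -1 & 4 \end{bmatrix}
\approx \begin{bmatrix} 0.2632 & -0.05263 \\ -0.05263 & 0.2105 \end{bmatrix}
```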
The G matrix as specified in the GDATA= data set and its inverse are shown in Output 56.4.5 and<br />
Output 56.4.6.<br />
Output 56.4.5 G Matrix<br />
Estimated G Matrix<br />
Row Effect Trait Animal Col1 Col2 Col3 Col4<br />
1 Trait*Animal 1 1 2.0000 1.0000 1.0000 2.0000<br />
2 Trait*Animal 1 2 1.0000 2.0000 0.5000 1.0000<br />
3 Trait*Animal 1 3 1.0000 0.5000 2.0000 1.0000<br />
4 Trait*Animal 2 1 2.0000 1.0000 1.0000 3.0000<br />
5 Trait*Animal 2 2 1.0000 2.0000 0.5000 1.5000<br />
6 Trait*Animal 2 3 1.0000 0.5000 2.0000 1.5000<br />
Estimated G Matrix<br />
Row Col5 Col6<br />
1 1.0000 1.0000<br />
2 2.0000 0.5000<br />
3 0.5000 2.0000<br />
4 1.5000 1.5000<br />
5 3.0000 0.7500<br />
6 0.7500 3.0000<br />
Output 56.4.6 Inverse G Matrix<br />
Estimated Inv(G) Matrix<br />
Row Effect Trait Animal Col1 Col2 Col3 Col4<br />
1 Trait*Animal 1 1 2.5000 -1.0000 -1.0000 -1.6667<br />
2 Trait*Animal 1 2 -1.0000 2.0000 0.6667<br />
3 Trait*Animal 1 3 -1.0000 2.0000 0.6667<br />
4 Trait*Animal 2 1 -1.6667 0.6667 0.6667 1.6667<br />
5 Trait*Animal 2 2 0.6667 -1.3333 -0.6667<br />
6 Trait*Animal 2 3 0.6667 -1.3333 -0.6667<br />
Estimated Inv(G) Matrix<br />
Row Col5 Col6<br />
1 0.6667 0.6667<br />
2 -1.3333<br />
3 -1.3333<br />
4 -0.6667 -0.6667<br />
5 1.3333<br />
6 1.3333<br />
The table of covariance parameter estimates in Output 56.4.7 displays only the parameters in R.<br />
Because of the GDATA= option in the RANDOM statement, the G-side parameters do not participate<br />
in the parameter estimation process. Because of the NOITER option in the PARMS statement,<br />
however, the R-side parameters in this output are identical to their starting values.<br />
Output 56.4.7 R-Side Covariance Parameters<br />
Covariance Parameter Estimates<br />
Cov Parm Subject Estimate<br />
UN(1,1) Animal 4.0000<br />
UN(2,1) Animal 1.0000<br />
UN(2,2) Animal 5.0000<br />
The coefficients of the mixed model equations in Output 56.4.8 agree with Henderson (1984, p. 55).<br />
Recall from Output 56.4.2 that there are 2 columns in X and 6 columns in Z. The first 8 columns<br />
of the mixed model equations correspond to the X and Z components. Column 9 represents the Y<br />
border.<br />
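The upper-left $2 \times 2$ corner of these equations is $\mathbf{X}'\mathbf{R}^{-1}\mathbf{X}$ and can be checked by hand. Animals 1 and 2 each contribute the inverse block shown in Output 56.4.4; animal 3 has only trait 1 observed, so its block is taken here to reduce to the scalar 4, with inverse $1/4$.

```latex
\begin{align*}
\text{row 1, Col1:}\quad & 2\left(\tfrac{5}{19}\right) + \tfrac{1}{4} \approx 0.7763 \\
\text{row 2, Col2:}\quad & 2\left(\tfrac{4}{19}\right) \approx 0.4211 \\
\text{row 1, Col2:}\quad & 2\left(-\tfrac{1}{19}\right) \approx -0.1053
\end{align*}
```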
Output 56.4.8 Mixed Model Equations with Y Border<br />
Mixed Model Equations<br />
Row Effect Trait Animal Col1 Col2 Col3 Col4<br />
1 Trait 1 0.7763 -0.1053 0.2632 0.2632<br />
2 Trait 2 -0.1053 0.4211 -0.05263 -0.05263<br />
3 Trait*Animal 1 1 0.2632 -0.05263 2.7632 -1.0000<br />
4 Trait*Animal 1 2 0.2632 -0.05263 -1.0000 2.2632<br />
5 Trait*Animal 1 3 0.2500 -1.0000<br />
6 Trait*Animal 2 1 -0.05263 0.2105 -1.7193 0.6667<br />
7 Trait*Animal 2 2 -0.05263 0.2105 0.6667 -1.3860<br />
8 Trait*Animal 2 3 0.6667<br />
Mixed Model Equations<br />
Row Col5 Col6 Col7 Col8 Col9<br />
1 0.2500 -0.05263 -0.05263 4.6974<br />
2 0.2105 0.2105 2.2105<br />
3 -1.0000 -1.7193 0.6667 0.6667 1.1053<br />
4 0.6667 -1.3860 1.8421<br />
5 2.2500 0.6667 -1.3333 1.7500<br />
6 0.6667 1.8772 -0.6667 -0.6667 1.5789<br />
7 -0.6667 1.5439 0.6316<br />
8 -1.3333 -0.6667 1.3333<br />
The solution to the mixed model equations also matches that given by Henderson (1984, p. 55).<br />
After solving the augmented mixed model equations, you can find the solutions for fixed and random<br />
effects in the last column (Output 56.4.9).
Output 56.4.9 Solutions of the Mixed Model Equations with Y Border<br />
Mixed Model Equations Solution<br />
Row Effect Trait Animal Col1 Col2 Col3 Col4<br />
1 Trait 1 2.5508 1.5685 -1.3047 -1.1775<br />
2 Trait 2 1.5685 4.5539 -1.4112 -1.3534<br />
3 Trait*Animal 1 1 -1.3047 -1.4112 1.8282 1.0652<br />
4 Trait*Animal 1 2 -1.1775 -1.3534 1.0652 1.7589<br />
5 Trait*Animal 1 3 -1.1701 -0.9410 1.0206 0.7085<br />
6 Trait*Animal 2 1 -1.3002 -2.1592 1.8010 1.0900<br />
7 Trait*Animal 2 2 -1.1821 -2.1055 1.0925 1.7341<br />
8 Trait*Animal 2 3 -1.1678 -1.3149 1.0070 0.7209<br />
Mixed Model Equations Solution<br />
Row Col5 Col6 Col7 Col8 Col9<br />
1 -1.1701 -1.3002 -1.1821 -1.1678 6.9909<br />
2 -0.9410 -2.1592 -2.1055 -1.3149 6.9959<br />
3 1.0206 1.8010 1.0925 1.0070 0.05450<br />
4 0.7085 1.0900 1.7341 0.7209 -0.04955<br />
5 1.7812 1.0095 0.7197 1.7756 0.02230<br />
6 1.0095 2.7518 1.6392 1.4849 0.2651<br />
7 0.7197 1.6392 2.6874 0.9930 -0.2601<br />
8 1.7756 1.4849 0.9930 2.7645 0.1276<br />
The solutions for the fixed and random effects in Output 56.4.10 correspond to the last column in<br />
Output 56.4.9. Note that the standard errors for the fixed effects and the prediction standard errors<br />
for the random effects are the square roots of the diagonal entries in the solution of the mixed<br />
model equations (Output 56.4.9).<br />
Output 56.4.10 Solutions for Fixed and Random Effects<br />
Solution for Fixed Effects<br />
Standard<br />
Effect Trait Estimate Error DF t Value Pr > |t|<br />
Trait 1 6.9909 1.5971 3 4.38 0.0221<br />
Trait 2 6.9959 2.1340 3 3.28 0.0465<br />
Solution for Random Effects<br />
Std Err<br />
Effect Trait Animal Estimate Pred DF t Value Pr > |t|<br />
Trait*Animal 1 1 0.05450 1.3521 0 0.04 .<br />
Trait*Animal 1 2 -0.04955 1.3262 0 -0.04 .<br />
Trait*Animal 1 3 0.02230 1.3346 0 0.02 .<br />
Trait*Animal 2 1 0.2651 1.6589 0 0.16 .<br />
Trait*Animal 2 2 -0.2601 1.6393 0 -0.16 .<br />
Trait*Animal 2 3 0.1276 1.6627 0 0.08 .
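The square-root relationship can be spot-checked against the diagonal of Output 56.4.9 (rows 1, 2, 3, and 8):

```latex
\begin{align*}
\sqrt{2.5508} &\approx 1.5971 & &\text{(Trait 1)} \\
\sqrt{4.5539} &\approx 2.1340 & &\text{(Trait 2)} \\
\sqrt{1.8282} &\approx 1.3521 & &\text{(Trait 1, Animal 1)} \\
\sqrt{2.7645} &\approx 1.6627 & &\text{(Trait 2, Animal 3)}
\end{align*}
```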
The estimates for the two traits are nearly identical, but the standard error of the second trait is<br />
larger because of the missing observation.<br />
The Estimate column in the “Solution for Random Effects” table lists the best linear unbiased predictions<br />
(BLUPs) of the breeding values of both traits for all three animals. The p-values are missing<br />
because the default containment method for computing degrees of freedom results in zero degrees<br />
of freedom for the random effects parameter tests.<br />
Output 56.4.11 Significance Test Comparing Traits<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Trait 2 3 10.59 0.0437<br />
The two trait estimates are jointly significantly different from zero at the 5% level (Output 56.4.11).<br />
Output 56.4.12 displays the predicted values of the observations based on the trait and breeding<br />
value estimates—that is, the fixed and random effects.<br />
Output 56.4.12 Predicted Observations<br />
StdErr<br />
Obs Trait Animal Y Pred Pred DF Alpha Lower Upper Resid<br />
1 1 1 6 7.04542 1.33027 0 0.05 . . -1.04542<br />
2 1 2 8 6.94137 1.39806 0 0.05 . . 1.05863<br />
3 1 3 7 7.01321 1.41129 0 0.05 . . -0.01321<br />
4 2 1 9 7.26094 1.72839 0 0.05 . . 1.73906<br />
5 2 2 5 6.73576 1.74077 0 0.05 . . -1.73576<br />
6 2 3 . 7.12015 2.99088 0 0.05 . . .<br />
<strong>The</strong> predicted values are not the predictions of future records in the sense that they do not contain<br />
a component corresponding to a new observational error. See Henderson (1984) for information<br />
about predicting future records. <strong>The</strong> Lower and Upper columns usually contain confidence limits<br />
for the predicted values; they are missing here because the random-effects parameter degrees of<br />
freedom equals 0.
Example 56.5: Random Coefficients<br />
This example comes from a pharmaceutical stability data simulation performed by Obenchain<br />
(1990). <strong>The</strong> observed responses are replicate assay results, expressed in percent of label claim,<br />
at various shelf ages, expressed in months. <strong>The</strong> desired mixed model involves three batches of<br />
product that differ randomly in intercept (initial potency) and slope (degradation rate). This type<br />
of model is also known as a hierarchical or multilevel model (Singer 1998; Sullivan, Dukes, and<br />
Losina 1999).<br />
<strong>The</strong> <strong>SAS</strong> statements are as follows:<br />
data rc;<br />
input Batch Month @@;<br />
Monthc = Month;<br />
do i = 1 to 6;<br />
input Y @@;<br />
output;<br />
end;<br />
datalines;<br />
1 0 101.2 103.3 103.3 102.1 104.4 102.4<br />
1 1 98.8 99.4 99.7 99.5 . .<br />
1 3 98.4 99.0 97.3 99.8 . .<br />
1 6 101.5 100.2 101.7 102.7 . .<br />
1 9 96.3 97.2 97.2 96.3 . .<br />
1 12 97.3 97.9 96.8 97.7 97.7 96.7<br />
2 0 102.6 102.7 102.4 102.1 102.9 102.6<br />
2 1 99.1 99.0 99.9 100.6 . .<br />
2 3 105.7 103.3 103.4 104.0 . .<br />
2 6 101.3 101.5 100.9 101.4 . .<br />
2 9 94.1 96.5 97.2 95.6 . .<br />
2 12 93.1 92.8 95.4 92.2 92.2 93.0<br />
3 0 105.1 103.9 106.1 104.1 103.7 104.6<br />
3 1 102.2 102.0 100.8 99.8 . .<br />
3 3 101.2 101.8 100.8 102.6 . .<br />
3 6 101.1 102.0 100.1 100.2 . .<br />
3 9 100.9 99.5 102.2 100.8 . .<br />
3 12 97.8 98.3 96.9 98.4 96.9 96.5<br />
;<br />
proc mixed data=rc;<br />
class Batch;<br />
model Y = Month / s;<br />
random Int Month / type=un sub=Batch s;<br />
run;<br />
In the DATA step, Monthc is created as a duplicate of Month in order to enable both a continuous and<br />
a classification version of the same variable. <strong>The</strong> variable Monthc is used in a subsequent analysis.<br />
In the PROC <strong>MIXED</strong> statements, Batch is listed as the only classification variable. <strong>The</strong> fixed effect<br />
Month in the MODEL statement is not declared as a classification variable; thus it models a linear<br />
trend in time. An intercept is included as a fixed effect by default, and the S option requests that the
fixed-effects parameter estimates be produced.<br />
<strong>The</strong> two random effects are Int and Month, modeling random intercepts and slopes, respectively.<br />
Note that Intercept and Month are used as both fixed and random effects. <strong>The</strong> TYPE=UN option in<br />
the RANDOM statement specifies an unstructured covariance matrix for the random intercept and<br />
slope effects. In mixed model notation, G is block diagonal with unstructured 2 × 2 blocks. Each<br />
block corresponds to a different level of Batch, which is the SUBJECT= effect. <strong>The</strong> unstructured<br />
type provides a mechanism for estimating the correlation between the random coefficients. <strong>The</strong> S<br />
option requests the production of the random-effects parameter estimates.<br />
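The model described in the preceding paragraphs can be written out as a sketch. The symbols below are illustrative notation, not taken from the original text: i indexes batches, j indexes shelf ages, k indexes replicate assays, and t_j is the shelf age in months.<br />

```latex
Y_{ijk} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_j + \epsilon_{ijk},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}
\sim N\!\left( \mathbf{0},
\begin{pmatrix} \sigma_{00} & \sigma_{01} \\ \sigma_{01} & \sigma_{11} \end{pmatrix} \right),
\qquad
\epsilon_{ijk} \sim N(0, \sigma^2)
```

Under this notation, the 2 × 2 matrix is one diagonal block of G, and the parameters labeled UN(1,1), UN(2,1), and UN(2,2) in the output correspond to σ00, σ01, and σ11, respectively.<br />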
<strong>The</strong> results from this analysis are shown in Output 56.5.1–Output 56.5.9. <strong>The</strong> “Unstructured” covariance<br />
structure in Output 56.5.1 applies to G here.<br />
Output 56.5.1 Model Information in Random Coefficients Analysis<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.RC<br />
Dependent Variable Y<br />
Covariance Structure Unstructured<br />
Subject Effect Batch<br />
Estimation Method REML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
Batch is the only classification variable in this analysis, and it has three levels (Output 56.5.2).<br />
Output 56.5.2 Random Coefficients Analysis (continued)<br />
Class Level Information<br />
Class Levels Values<br />
Batch 3 1 2 3<br />
<strong>The</strong> “Dimensions” table in Output 56.5.3 indicates that there are three subjects (corresponding to<br />
batches). <strong>The</strong> 24 observations not used correspond to the missing values of Y in the input data set.<br />
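The observation counts can be checked directly from the input data. The following step is a small sketch that assumes the rc data set created earlier in this example; it is not part of the original analysis.<br />

```sas
/* Count nonmissing (N) and missing (NMISS) assay values per batch. */
proc means data=rc n nmiss;
   class Batch;
   var Y;
run;
```

Each batch has two shelf ages with six recorded replicates and four shelf ages with four recorded replicates, so each batch contributes 28 nonmissing and 8 missing values of Y. Over three batches this gives the 84 observations used and the 24 observations not used.<br />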
Output 56.5.3 Random Coefficients Analysis (continued)<br />
Dimensions<br />
Covariance Parameters 4<br />
Columns in X 2<br />
Columns in Z Per Subject 2<br />
Subjects 3<br />
Max Obs Per Subject 36
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 84<br />
Number of Observations Not Used 24<br />
As Output 56.5.4 shows, only one iteration is required for convergence.<br />
Output 56.5.4 Random Coefficients Analysis (continued)<br />
Iteration History<br />
Iteration Evaluations -2 Res Log Like Criterion<br />
0 1 367.02768461<br />
1 1 350.32813577 0.00000000<br />
Convergence criteria met.<br />
<strong>The</strong> Estimate column in Output 56.5.5 lists the estimated elements of the unstructured 2 × 2 matrix<br />
comprising the blocks of G. Note that the random coefficients are negatively correlated.<br />
Output 56.5.5 Random Coefficients Analysis (continued)<br />
Covariance Parameter Estimates<br />
Cov Parm Subject Estimate<br />
UN(1,1) Batch 0.9768<br />
UN(2,1) Batch -0.1045<br />
UN(2,2) Batch 0.03717<br />
Residual 3.2932<br />
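The size of the negative correlation can be worked out from the estimates in Output 56.5.5: −0.1045 / sqrt(0.9768 × 0.03717) ≈ −0.55. As a sketch of how to have the procedure report this directly, the G and GCORR options can be added to the RANDOM statement; the rest of the step repeats the statements shown earlier.<br />

```sas
proc mixed data=rc;
   class Batch;
   model Y = Month / s;
   /* G displays the estimated G block for the first subject;       */
   /* GCORR displays the same block rescaled to a correlation matrix */
   random Int Month / type=un sub=Batch s g gcorr;
run;
```

The off-diagonal element of the displayed correlation matrix should agree with the hand calculation up to rounding of the printed estimates.<br />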
<strong>The</strong> null model likelihood ratio test indicates a significant improvement over the null model consisting<br />
of no random effects and a homogeneous residual error (Output 56.5.6).<br />
Output 56.5.6 Random Coefficients Analysis (continued)<br />
Fit Statistics<br />
-2 Res Log Likelihood 350.3<br />
AIC (smaller is better) 358.3<br />
AICC (smaller is better) 358.8<br />
BIC (smaller is better) 354.7
Null Model Likelihood Ratio Test<br />
DF Chi-Square Pr > ChiSq<br />
3 16.70 0.0008<br />
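The tabled values can be reproduced by hand as a check. In the sketch below, d = 4 is the number of covariance parameters, s = 3 is the number of subjects, and the −2 restricted log likelihoods come from the "Iteration History" table; the criterion formulas are the usual definitions and are assumed here rather than quoted from this section.<br />

```latex
\begin{aligned}
\chi^2 &= 367.0277 - 350.3281 = 16.70, \qquad \mathrm{DF} = 4 - 1 = 3 \\
\mathrm{AIC} &= -2\ell_R + 2d = 350.3 + 2(4) = 358.3 \\
\mathrm{BIC} &= -2\ell_R + d \log s = 350.3 + 4 \log 3 \approx 354.7
\end{aligned}
```

The chi-square statistic is the drop in −2 Res Log Like from iteration 0 (the null model) to convergence, and its three degrees of freedom are the covariance parameters beyond the residual variance.<br />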
<strong>The</strong> fixed-effects estimates represent the estimated means for the random intercept and slope, respectively<br />
(Output 56.5.7).<br />
Output 56.5.7 Random Coefficients Analysis (continued)<br />
Solution for Fixed Effects<br />
Standard<br />
Effect Estimate Error DF t Value Pr > |t|<br />
Intercept 102.70 0.6456 2 159.08 &lt;.0001<br />
…<br />
Output 56.5.8 Random Coefficients Analysis (continued)<br />
Solution for Random Effects<br />
Std Err<br />
Effect Batch Estimate Pred DF t Value Pr > |t|<br />
Intercept 1 -1.0010 0.6842 78 -1.46 0.1474<br />
Month 1 0.1287 0.1245 78 1.03 0.3047<br />
Intercept 2 0.3934 0.6842 78 0.58 0.5669<br />
Month 2 -0.2060 0.1245 78 -1.65 0.1021<br />
Intercept 3 0.6076 0.6842 78 0.89 0.3772<br />
Month 3 0.07731 0.1245 78 0.62 0.5365<br />
<strong>The</strong> F statistic in the “Type 3 Tests of Fixed Effects” table in Output 56.5.9 is the square of the<br />
t statistic used in the test of Month in the preceding “Solution for Fixed Effects” table (compare<br />
Output 56.5.7 and Output 56.5.9). Both statistics test the null hypothesis that the slope assigned to<br />
Month equals 0, and this hypothesis can barely be rejected at the 5% level.
Output 56.5.9 Random Coefficients Analysis (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Month 1 2 19.41 0.0478<br />
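Because the test of Month has one numerator degree of freedom, the relationship between the two statistics can be verified from the F value alone. This is a worked check, not additional output:<br />

```latex
|t| = \sqrt{F} = \sqrt{19.41} \approx 4.41, \qquad \text{DF} = 2, \qquad p = 0.0478
```

The t test and the F test share the same denominator degrees of freedom, so they yield the same p-value.<br />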
It is also possible to fit a random coefficients model with error terms that follow a nested structure<br />
(Fuller and Battese 1973). <strong>The</strong> following <strong>SAS</strong> statements represent one way of doing this:<br />
proc mixed data=rc;<br />
class Batch Monthc;<br />
model Y = Month / s;<br />
random Int Month Monthc / sub=Batch s;<br />
run;<br />
<strong>The</strong> variable Monthc is added to the CLASS and RANDOM statements, and it models the nested<br />
errors. Note that Month and Monthc are continuous and classification versions of the same variable.<br />
Also, the TYPE=UN option is dropped from the RANDOM statement, resulting in the default<br />
variance components model instead of correlated random coefficients. <strong>The</strong> results from this analysis<br />
are shown in Output 56.5.10.<br />
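One way to write the nested-error model is sketched below. The notation is illustrative and not from the original text: i indexes batches, j indexes the six shelf ages, k indexes replicates, t_j is the shelf age in months, and c_ij is the random effect contributed by Monthc within Batch.<br />

```latex
Y_{ijk} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_j + c_{ij} + \epsilon_{ijk},
\qquad
b_{0i} \sim N(0, \sigma_0^2), \quad
b_{1i} \sim N(0, \sigma_1^2), \quad
c_{ij} \sim N(0, \sigma_c^2), \quad
\epsilon_{ijk} \sim N(0, \sigma^2)
```

All random terms are taken to be independent, which is what the default variance components structure implies. Each subject then has 1 + 1 + 6 = 8 columns in Z, matching the "Dimensions" table.<br />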
Output 56.5.10 Random Coefficients with Nested Errors Analysis<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.RC<br />
Dependent Variable Y<br />
Covariance Structure Variance Components<br />
Subject Effect Batch<br />
Estimation Method REML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
Class Level Information<br />
Class Levels Values<br />
Batch 3 1 2 3<br />
Monthc 6 0 1 3 6 9 12<br />
Dimensions<br />
Covariance Parameters 4<br />
Columns in X 2<br />
Columns in Z Per Subject 8<br />
Subjects 3<br />
Max Obs Per Subject 36
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 84<br />
Number of Observations Not Used 24<br />
Iteration History<br />
Iteration Evaluations -2 Res Log Like Criterion<br />
0 1 367.02768461<br />
1 4 277.51945360 .<br />
2 1 276.97551718 0.00104208<br />
3 1 276.90304909 0.00003174<br />
4 1 276.90100316 0.00000004<br />
5 1 276.90100092 0.00000000<br />
Convergence criteria met.<br />
Covariance Parameter Estimates<br />
Cov Parm Subject Estimate<br />
Intercept Batch 0<br />
Month Batch 0.01243<br />
Monthc Batch 3.7411<br />
Residual 0.7969<br />
For this analysis, the Newton-Raphson algorithm requires five iterations and nine likelihood evaluations<br />
to achieve convergence. <strong>The</strong> missing value in the Criterion column in iteration 1 indicates<br />
that a boundary constraint has been dropped.<br />
<strong>The</strong> estimate for the Intercept variance component equals 0. This occurs frequently in practice and<br />
indicates that the restricted likelihood is maximized by setting this variance component equal to 0.<br />
Whenever a zero variance component estimate occurs, the following note appears in the <strong>SAS</strong> log:<br />
NOTE: Estimated G matrix is not positive definite.<br />
<strong>The</strong> remaining variance component estimates are positive, and the estimate corresponding to the<br />
nested errors (Monthc) is much larger than the other two.<br />
A comparison of AIC and BIC for this model with those of the previous model favors the nested<br />
error model (compare Output 56.5.11 and Output 56.5.6). Strictly speaking, a likelihood ratio test<br />
cannot be carried out between the two models because one is not contained in the other; however, a<br />
cautious comparison of likelihoods can be informative.
Output 56.5.11 Random Coefficients with Nested Errors Analysis (continued)<br />
Fit Statistics<br />
-2 Res Log Likelihood 276.9<br />
AIC (smaller is better) 282.9<br />
AICC (smaller is better) 283.2<br />
BIC (smaller is better) 280.2<br />
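The comparison with the previous model can be made explicit. The tabled values are consistent with counting d = 3 covariance parameters, that is, excluding the Intercept variance component whose estimate is 0; that count is an inference from the numbers rather than a statement in this section. As before, s = 3 is the number of subjects.<br />

```latex
\begin{aligned}
\mathrm{AIC} &= 276.9 + 2(3) = 282.9 \\
\mathrm{BIC} &= 276.9 + 3 \log 3 \approx 280.2 \\
\Delta\mathrm{AIC} &= 358.3 - 282.9 = 75.4
\end{aligned}
```

The difference of 75.4 in AIC is the margin by which the nested error model is favored over the unstructured random coefficients model.<br />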
<strong>The</strong> better-fitting covariance model affects the standard errors of the fixed-effects parameter estimates<br />
more than the estimates themselves (Output 56.5.12).<br />
Output 56.5.12 Random Coefficients with Nested Errors Analysis (continued)<br />
Solution for Fixed Effects<br />
Standard<br />
Effect Estimate Error DF t Value Pr > |t|<br />
Intercept 102.56 0.7287 2 140.74 &lt;.0001<br />
…<br />
Output 56.5.13 Random Coefficients with Nested Errors Analysis (continued)<br />
Solution for Random Effects<br />
Std Err<br />
Effect Batch Monthc Estimate Pred DF t Value Pr > |t|<br />
Intercept 1 0 . . . .<br />
Month 1 -0.00028 0.09268 66 -0.00 0.9976<br />
Monthc 1 0 0.2191 0.7896 66 0.28 0.7823<br />
Monthc 1 1 -2.5690 0.7571 66 -3.39 0.0012<br />
Monthc 1 3 -2.3067 0.6865 66 -3.36 0.0013<br />
Monthc 1 6 1.8726 0.7328 66 2.56 0.0129<br />
Monthc 1 9 -1.2350 0.9300 66 -1.33 0.1888<br />
Monthc 1 12 0.7736 1.1992 66 0.65 0.5211<br />
Intercept 2 0 . . . .<br />
Month 2 -0.07571 0.09268 66 -0.82 0.4169<br />
Monthc 2 0 -0.00621 0.7896 66 -0.01 0.9938<br />
Monthc 2 1 -2.2126 0.7571 66 -2.92 0.0048<br />
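Monthc 2 3 3.1063 0.6865 66 4.53 &lt;.0001<br />
…<br />
Output 56.5.14 Random Coefficients with Nested Errors Analysis (continued)<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />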
Month 1 2 15.78 0.0579<br />
<strong>The</strong> test of Month is similar to that from the previous model, although it is no longer significant at<br />
the 5% level (Output 56.5.14).
Example 56.6: Line-Source Sprinkler Irrigation<br />
<strong>The</strong>se data appear in Hanks et al. (1980), Johnson, Chaudhuri, and Kanemasu (1983), and Stroup<br />
(1989b). Three cultivars (Cult) of winter wheat are randomly assigned to rectangular plots within<br />
each of three blocks (Block). <strong>The</strong> nine plots are located side by side, and a line-source sprinkler is<br />
placed through the middle. Each plot is subdivided into twelve subplots: six to the north of the<br />
line source and six to the south (Dir). <strong>The</strong> two subplots closest to the line source represent the maximum<br />
irrigation level (Irrig=6), the two next-closest subplots represent the next-highest level (Irrig=5), and so<br />
forth.<br />
This example is a case where both G and R can be modeled. One of Stroup’s models specifies a<br />
diagonal G containing the variance components for Block, Block*Dir, and Block*Irrig, and a Toeplitz<br />
R with four bands. <strong>The</strong> <strong>SAS</strong> statements to fit this model and carry out some further analyses follow.<br />
CAUTION: This analysis can require considerable CPU time.<br />
data line;<br />
length Cult$ 8;<br />
input Block Cult$ @;<br />
row = _n_;<br />
do Sbplt=1 to 12;<br />
if Sbplt le 6 then do;<br />
Irrig = Sbplt;<br />
Dir = 'North';<br />
end; else do;<br />
Irrig = 13 - Sbplt;<br />
Dir = 'South';<br />
end;<br />
input Y @; output;<br />
end;<br />
datalines;<br />
1 Luke 2.4 2.7 5.6 7.5 7.9 7.1 6.1 7.3 7.4 6.7 3.8 1.8<br />
1 Nugaines 2.2 2.2 4.3 6.3 7.9 7.1 6.2 5.3 5.3 5.2 5.4 2.9<br />
1 Bridger 2.9 3.2 5.1 6.9 6.1 7.5 5.6 6.5 6.6 5.3 4.1 3.1<br />
2 Nugaines 2.4 2.2 4.0 5.8 6.1 6.2 7.0 6.4 6.7 6.4 3.7 2.2<br />
2 Bridger 2.6 3.1 5.7 6.4 7.7 6.8 6.3 6.2 6.6 6.5 4.2 2.7<br />
2 Luke 2.2 2.7 4.3 6.9 6.8 8.0 6.5 7.3 5.9 6.6 3.0 2.0<br />
3 Nugaines 1.8 1.9 3.7 4.9 5.4 5.1 5.7 5.0 5.6 5.1 4.2 2.2<br />
3 Luke 2.1 2.3 3.7 5.8 6.3 6.3 6.5 5.7 5.8 4.5 2.7 2.3<br />
3 Bridger 2.7 2.8 4.0 5.0 5.2 5.2 5.9 6.1 6.0 4.3 3.1 3.1<br />
;<br />
proc mixed;<br />
class Block Cult Dir Irrig;<br />
model Y = Cult|Dir|Irrig@2;<br />
random Block Block*Dir Block*Irrig;<br />
repeated / type=toep(4) sub=Block*Cult r;<br />
lsmeans Cult|Irrig;<br />
estimate 'Bridger vs Luke' Cult 1 -1 0;<br />
estimate 'Linear Irrig' Irrig -5 -3 -1 1 3 5;<br />
estimate 'B vs L x Linear Irrig' Cult*Irrig<br />
-5 -3 -1 1 3 5 5 3 1 -1 -3 -5;<br />
run;
<strong>The</strong> preceding statements use the bar operator ( | ) and the at sign (@) to specify all two-factor<br />
interactions between Cult, Dir, and Irrig as fixed effects.<br />
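To make the expansion concrete, the following sketch spells out the effects that Cult|Dir|Irrig@2 generates. It assumes the line data set created above and omits the LSMEANS and ESTIMATE statements for brevity.<br />

```sas
proc mixed data=line;
   class Block Cult Dir Irrig;
   /* Cult|Dir|Irrig expands to all main effects and interactions; */
   /* @2 drops the three-factor term Cult*Dir*Irrig                */
   model Y = Cult Dir Cult*Dir Irrig Cult*Irrig Dir*Irrig;
   random Block Block*Dir Block*Irrig;
   repeated / type=toep(4) sub=Block*Cult r;
run;
```

The effects are listed in the order that the bar operator produces them, which is also the order in which they appear in the "Type 3 Tests of Fixed Effects" table.<br />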
<strong>The</strong> RANDOM statement sets up the Z and G matrices corresponding to the random effects Block,<br />
Block*Dir, and Block*Irrig.<br />
In the REPEATED statement, the TYPE=TOEP(4) option sets up the blocks of the R matrix to be<br />
Toeplitz with four bands below and including the main diagonal. <strong>The</strong> subject effect is Block*Cult,<br />
and it produces nine 12 × 12 blocks. <strong>The</strong> R option requests that the first block of R be displayed.<br />
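The banded structure can be written element by element. In this sketch, j and k index the twelve subplots within one Block*Cult subject; the symbols σ² and σ_1, σ_2, σ_3 are illustrative names for the parameters that the output labels Residual, TOEP(2), TOEP(3), and TOEP(4).<br />

```latex
R_{jk} =
\begin{cases}
\sigma^2        & |j - k| = 0 \\
\sigma_{|j-k|}  & 1 \le |j - k| \le 3 \\
0               & |j - k| \ge 4
\end{cases}
\qquad j, k = 1, \ldots, 12
```

Subplots that are four or more positions apart within the same plot are therefore modeled as uncorrelated in R.<br />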
Least squares means (LSMEANS) are requested for Cult, Irrig, and Cult*Irrig, and a few ESTIMATE<br />
statements are specified to illustrate some linear combinations of the fixed effects.<br />
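The coefficients of the third ESTIMATE statement can be read as a Kronecker product of the first two contrasts. This assumes the levels of Cult are ordered Bridger, Luke, Nugaines and that Irrig varies fastest within Cult in the Cult*Irrig effect:<br />

```latex
(1,\ -1,\ 0) \otimes (-5,\ -3,\ -1,\ 1,\ 3,\ 5)
= (-5, -3, -1, 1, 3, 5,\;\; 5, 3, 1, -1, -3, -5,\;\; 0, 0, 0, 0, 0, 0)
```

The six trailing zeros, which belong to Nugaines, are omitted from the statement because unspecified trailing coefficients are treated as zero.<br />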
<strong>The</strong> results from this analysis are shown in Output 56.6.1.<br />
<strong>The</strong> “Covariance Structures” row in Output 56.6.1 reveals the two different structures assumed for<br />
G and R.<br />
Output 56.6.1 Model Information in Line-Source Sprinkler Analysis<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.LINE<br />
Dependent Variable Y<br />
Covariance Structures Variance Components,<br />
Toeplitz<br />
Subject Effect Block*Cult<br />
Estimation Method REML<br />
Residual Variance Method Profile<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Containment<br />
<strong>The</strong> levels of each classification variable are listed as a single string in the Values column, regardless<br />
of whether the levels are numeric or character (Output 56.6.2).<br />
Output 56.6.2 Class Level Information<br />
Class Level Information<br />
Class Levels Values<br />
Block 3 1 2 3<br />
Cult 3 Bridger Luke Nugaines<br />
Dir 2 North South<br />
Irrig 6 1 2 3 4 5 6<br />
Even though there is a SUBJECT= effect in the REPEATED statement, the analysis considers all<br />
of the data to be from one subject because there is no corresponding SUBJECT= effect in the<br />
RANDOM statement (Output 56.6.3).
Output 56.6.3 Model Dimensions and Number of Observations<br />
Dimensions<br />
Covariance Parameters 7<br />
Columns in X 48<br />
Columns in Z 27<br />
Subjects 1<br />
Max Obs Per Subject 108<br />
Number of Observations<br />
Number of Observations Read 108<br />
Number of Observations Used 108<br />
Number of Observations Not Used 0<br />
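The entries in the "Dimensions" table follow from the numbers of class levels (3 blocks, 3 cultivars, 2 directions, 6 irrigation levels). The following tally is a worked check:<br />

```latex
\begin{aligned}
\text{Columns in } X &= \underbrace{1}_{\text{Intercept}} + \underbrace{3}_{\text{Cult}} + \underbrace{2}_{\text{Dir}}
 + \underbrace{3 \cdot 2}_{\text{Cult*Dir}} + \underbrace{6}_{\text{Irrig}}
 + \underbrace{3 \cdot 6}_{\text{Cult*Irrig}} + \underbrace{2 \cdot 6}_{\text{Dir*Irrig}} = 48 \\
\text{Columns in } Z &= \underbrace{3}_{\text{Block}} + \underbrace{3 \cdot 2}_{\text{Block*Dir}} + \underbrace{3 \cdot 6}_{\text{Block*Irrig}} = 27 \\
\text{Covariance parameters} &= \underbrace{3}_{G} + \underbrace{4}_{R} = 7
\end{aligned}
```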
<strong>The</strong> Newton-Raphson algorithm converges successfully in seven iterations (Output 56.6.4).<br />
Output 56.6.4 Iteration History and Convergence Status<br />
Iteration History<br />
Iteration Evaluations -2 Res Log Like Criterion<br />
0 1 226.25427252<br />
1 4 187.99336173 .<br />
2 3 186.62579299 0.10431081<br />
3 1 184.38218213 0.04807260<br />
4 1 183.41836853 0.00886548<br />
5 1 183.25111475 0.00075353<br />
6 1 183.23809997 0.00000748<br />
7 1 183.23797748 0.00000000<br />
Convergence criteria met.<br />
<strong>The</strong> first block of the estimated R matrix has the TOEP(4) structure, and the observations that are<br />
three plots apart exhibit a negative correlation (Output 56.6.5).
Output 56.6.5 Estimated R Matrix for the First Subject<br />
Estimated R Matrix for Block*Cult 1 Bridger<br />
Row Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />
1 0.2850 0.007986 0.001452 -0.09253<br />
2 0.007986 0.2850 0.007986 0.001452 -0.09253<br />
3 0.001452 0.007986 0.2850 0.007986 0.001452 -0.09253<br />
4 -0.09253 0.001452 0.007986 0.2850 0.007986 0.001452 -0.09253<br />
5 -0.09253 0.001452 0.007986 0.2850 0.007986 0.001452<br />
6 -0.09253 0.001452 0.007986 0.2850 0.007986<br />
7 -0.09253 0.001452 0.007986 0.2850<br />
8 -0.09253 0.001452 0.007986<br />
9 -0.09253 0.001452<br />
10 -0.09253<br />
11<br />
12<br />
Estimated R Matrix for Block*Cult 1 Bridger<br />
Row Col8 Col9 Col10 Col11 Col12<br />
1<br />
2<br />
3<br />
4<br />
5 -0.09253<br />
6 0.001452 -0.09253<br />
7 0.007986 0.001452 -0.09253<br />
8 0.2850 0.007986 0.001452 -0.09253<br />
9 0.007986 0.2850 0.007986 0.001452 -0.09253<br />
10 0.001452 0.007986 0.2850 0.007986 0.001452<br />
11 -0.09253 0.001452 0.007986 0.2850 0.007986<br />
12 -0.09253 0.001452 0.007986 0.2850<br />
Output 56.6.6 lists the estimated covariance parameters from both G and R. <strong>The</strong> first three are the<br />
variance components making up the diagonal G, and the final four make up the Toeplitz structure<br />
in the blocks of R. <strong>The</strong> Residual row corresponds to the variance of the Toeplitz structure, and it<br />
represents the parameter profiled out during the optimization process.<br />
Output 56.6.6 Estimated Covariance Parameters<br />
Covariance Parameter Estimates<br />
Cov Parm Subject Estimate<br />
Block 0.2194<br />
Block*Dir 0.01768<br />
Block*Irrig 0.03539<br />
TOEP(2) Block*Cult 0.007986<br />
TOEP(3) Block*Cult 0.001452<br />
TOEP(4) Block*Cult -0.09253<br />
Residual 0.2850
<strong>The</strong> “-2 Res Log Likelihood” value in Output 56.6.7 is the same as the final value listed in the<br />
“Iteration History” table (Output 56.6.4).<br />
Output 56.6.7 Fit Statistics Based on the Residual Log Likelihood<br />
Fit Statistics<br />
-2 Res Log Likelihood 183.2<br />
AIC (smaller is better) 197.2<br />
AICC (smaller is better) 198.8<br />
BIC (smaller is better) 190.9<br />
Every fixed effect except for Dir and Cult*Irrig is significant at the 5% level (Output 56.6.8).<br />
Output 56.6.8 Tests for Fixed Effects<br />
Type 3 Tests of Fixed Effects<br />
Num Den<br />
Effect DF DF F Value Pr > F<br />
Cult 2 68 7.98 0.0008<br />
Dir 1 2 3.95 0.1852<br />
Cult*Dir 2 68 3.44 0.0379<br />
Irrig 5 10 102.60 &lt;.0001<br />
…<br />
Output 56.6.10 Least Squares Means for Cult, Irrig, and <strong>The</strong>ir Interaction<br />
Least Squares Means<br />
Standard<br />
Effect Cult Irrig Estimate Error DF t Value Pr > |t|<br />
Cult Bridger 5.0306 0.2874 68 17.51 &lt;.0001<br />
…<br />
Example 56.7: Influence in Heterogeneous Variance Model<br />
In this example from Snedecor and Cochran (1976, p. 256), a one-way classification model with<br />
heterogeneous variances is fit. <strong>The</strong> data, shown in the following DATA step, represent amounts of<br />
different types of fat absorbed by batches of doughnuts during cooking, measured in grams.<br />
data absorb;<br />
input FatType Absorbed @@;<br />
datalines;<br />
1 164 1 172 1 168 1 177 1 156 1 195<br />
2 178 2 191 2 197 2 182 2 185 2 177<br />
3 175 3 193 3 178 3 171 3 163 3 176<br />
4 155 4 166 4 149 4 164 4 170 4 168<br />
;<br />
<strong>The</strong> statistical model for these data can be written as<br />
$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, \ldots, t = 4, \quad j = 1, \ldots, r = 6, \qquad \epsilon_{ij} \sim N(0, \sigma_i^2)$$<br />
where $Y_{ij}$ is the amount of fat absorbed by the $j$th batch of the $i$th fat type, and $\tau_i$ denotes the<br />
fat-type effects. A quick glance at the data suggests that observations 6, 9, 14, and 21 might be<br />
influential on the analysis, because these are extreme observations for the respective fat types.<br />
<strong>The</strong> following <strong>SAS</strong> statements fit this model and request influence diagnostics for the fixed effects<br />
and covariance parameters. <strong>The</strong> ODS GRAPHICS statement requests plots of the influence diagnostics<br />
in addition to the tabular output. <strong>The</strong> ESTIMATES suboption requests plots of “leave-one-out”<br />
estimates for the fixed effects and group variances.<br />
ods graphics on;<br />
proc mixed data=absorb asycov;<br />
class FatType;<br />
model Absorbed = FatType / s<br />
influence(iter=10 estimates);<br />
repeated / group=FatType;<br />
ods output Influence=inf;<br />
run;<br />
ods graphics off;<br />
<strong>The</strong> “Influence” table is output to the <strong>SAS</strong> data set inf so that parameter estimates can be printed<br />
subsequently. Results from this analysis are shown in Output 56.7.1.
Output 56.7.1 Heterogeneous Variance Analysis<br />
<strong>The</strong> Mixed <strong>Procedure</strong><br />
Model Information<br />
Data Set WORK.ABSORB<br />
Dependent Variable Absorbed<br />
Covariance Structure Variance Components<br />
Group Effect FatType<br />
Estimation Method REML<br />
Residual Variance Method None<br />
Fixed Effects SE Method Model-Based<br />
Degrees of Freedom Method Between-Within<br />
Covariance Parameter Estimates<br />
Cov Parm Group Estimate<br />
Residual FatType 1 178.00<br />
Residual FatType 2 60.4000<br />
Residual FatType 3 97.6000<br />
Residual FatType 4 67.6000<br />
Solution for Fixed Effects<br />
Fat Standard<br />
Effect Type Estimate Error DF t Value Pr > |t|<br />
Intercept 162.00 3.3566 20 48.26 &lt;.0001<br />
…<br />
Output 56.7.2 Asymptotic Variances of Group Variance Estimates<br />
Asymptotic Covariance Matrix of Estimates<br />
Row Cov Parm CovP1 CovP2 CovP3 CovP4<br />
1 Residual 12674<br />
2 Residual 1459.26<br />
3 Residual 3810.30<br />
4 Residual 1827.90<br />
In groups where the residual variance estimate is large, the precision of that estimate is correspondingly low:<br />
its asymptotic variance is also large (Output 56.7.2).<br />
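The diagonal entries of Output 56.7.2 are consistent with the familiar large-sample variance of a sample variance computed from r = 6 normal observations. The formula below is offered as a check on the tabled numbers, not as a description of how the procedure computes them:<br />

```latex
\widehat{\mathrm{Var}}\left(\hat\sigma_i^2\right) \approx \frac{2\,\hat\sigma_i^4}{r - 1},
\qquad
\frac{2\,(178.00)^2}{5} = 12673.6, \quad
\frac{2\,(60.4)^2}{5} = 1459.26, \quad
\frac{2\,(97.6)^2}{5} = 3810.30, \quad
\frac{2\,(67.6)^2}{5} = 1827.90
```

Because the asymptotic variance grows with the square of the variance estimate, the first fat type, which has the largest residual variance, also has by far the least precisely estimated variance.<br />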
<strong>The</strong> following statements print the “leave-one-out” estimates for fixed effects and covariance parameters<br />
that were written to the inf data set with the ESTIMATES suboption (Output 56.7.3):<br />
proc print data=inf label;<br />
var parm1-parm5 covp1-covp4;<br />
run;<br />
Output 56.7.3 Leave-One-Out Estimates<br />
Residual Residual Residual Residual<br />
Fat Fat Fat Fat FatType FatType FatType FatType<br />
Obs Intercept Type 1 Type 2 Type 3 Type 4 1 2 3 4<br />
1 162.00 11.600 23.000 14.000 0 203.30 60.400 97.60 67.600<br />
2 162.00 10.000 23.000 14.000 0 222.47 60.400 97.60 67.600<br />
3 162.00 10.800 23.000 14.000 0 217.68 60.400 97.60 67.600<br />
4 162.00 9.000 23.000 14.000 0 214.99 60.400 97.60 67.600<br />
5 162.00 13.200 23.000 14.000 0 145.70 60.400 97.60 67.600<br />
6 162.00 5.400 23.000 14.000 0 63.80 60.400 97.60 67.600<br />
7 162.00 10.000 24.400 14.000 0 178.00 60.795 97.60 67.600<br />
8 162.00 10.000 21.800 14.000 0 178.00 64.691 97.60 67.600<br />
9 162.00 10.000 20.600 14.000 0 178.00 32.296 97.60 67.600<br />
10 162.00 10.000 23.600 14.000 0 178.00 72.797 97.60 67.600<br />
11 162.00 10.000 23.000 14.000 0 178.00 75.490 97.60 67.600<br />
12 162.00 10.000 24.600 14.000 0 178.00 56.285 97.60 67.600<br />
13 162.00 10.000 23.000 14.200 0 178.00 60.400 121.68 67.600<br />
14 162.00 10.000 23.000 10.600 0 178.00 60.400 35.30 67.600<br />
15 162.00 10.000 23.000 13.600 0 178.00 60.400 120.79 67.600<br />
16 162.00 10.000 23.000 15.000 0 178.00 60.400 114.50 67.600<br />
17 162.00 10.000 23.000 16.600 0 178.00 60.400 71.30 67.600<br />
18 162.00 10.000 23.000 14.000 0 178.00 60.400 121.98 67.600<br />
19 163.40 8.600 21.600 12.600 0 178.00 60.400 97.60 69.799<br />
20 161.20 10.800 23.800 14.800 0 178.00 60.400 97.60 79.698<br />
21 164.60 7.400 20.400 11.400 0 178.00 60.400 97.60 33.800<br />
22 161.60 10.400 23.400 14.400 0 178.00 60.400 97.60 83.292<br />
23 160.40 11.600 24.600 15.600 0 178.00 60.400 97.60 65.299<br />
24 160.80 11.200 24.200 15.200 0 178.00 60.400 97.60 73.677
<strong>The</strong> graphical displays in Output 56.7.4 and Output 56.7.5 are requested by specifying the ODS<br />
GRAPHICS statement. For general information about ODS Graphics, see Chapter 21, “Statistical<br />
Graphics Using ODS.” For specific information about the graphics available in the <strong>MIXED</strong> procedure,<br />
see the section “ODS Graphics” on page 3998.<br />
Output 56.7.4 Fixed-Effects Deletion Estimates
Output 56.7.5 Covariance Parameter Deletion Estimates<br />
<strong>The</strong> estimate of the intercept is affected only when observations from the last group are removed.<br />
<strong>The</strong> estimate of the “FatType 1” effect reacts to removal of observations in the first and last group<br />
(Output 56.7.4).<br />
While observations can affect one or more fixed-effects solutions in this model, they can affect only<br />
one covariance parameter, the variance in their group (Output 56.7.5). Removing any one of observations 6, 9, 14, and<br />
21, which are extreme in their groups, reduces the corresponding group variance estimate considerably.<br />
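A single leave-one-out row can be reproduced without the INFLUENCE option, because in this model the group means and group sample variances coincide with the fixed-effects and REML variance estimates. The following sketch assumes the absorb data set created above and deletes observation 6 (FatType 1, Absorbed = 195), which is the only record with that combination of values.<br />

```sas
/* Group means and variances with observation 6 excluded. */
proc means data=absorb mean var maxdec=2;
   where not (FatType = 1 and Absorbed = 195);
   class FatType;
   var Absorbed;
run;
```

For FatType 1, the mean drops to 167.40 and the variance to 63.80, in agreement with row 6 of Output 56.7.3 (a FatType 1 effect of 167.4 − 162.0 = 5.4 and a residual variance of 63.80). The other three groups are unchanged.<br />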
Diagnostics related to residuals and predicted values are printed with the following statements:<br />
proc print data=inf label;<br />
var observed predicted residual pressres<br />
student Rstudent;<br />
run;
Output 56.7.6 Residual Diagnostics<br />
Internally Externally<br />
Observed Predicted PRESS Studentized Studentized<br />
Obs Value Mean Residual Residual Residual Residual<br />
1 164 172.0 -8.000 -9.600 -0.6569 -0.6146<br />
2 172 172.0 0.000 0.000 0.0000 0.0000<br />
3 168 172.0 -4.000 -4.800 -0.3284 -0.2970<br />
4 177 172.0 5.000 6.000 0.4105 0.3736<br />
5 156 172.0 -16.000 -19.200 -1.3137 -1.4521<br />
6 195 172.0 23.000 27.600 1.8885 3.1544<br />
7 178 185.0 -7.000 -8.400 -0.9867 -0.9835<br />
8 191 185.0 6.000 7.200 0.8457 0.8172<br />
9 197 185.0 12.000 14.400 1.6914 2.3131<br />
10 182 185.0 -3.000 -3.600 -0.4229 -0.3852<br />
11 185 185.0 0.000 -0.000 0.0000 0.0000<br />
12 177 185.0 -8.000 -9.600 -1.1276 -1.1681<br />
13 175 176.0 -1.000 -1.200 -0.1109 -0.0993<br />
14 193 176.0 17.000 20.400 1.8850 3.1344<br />
15 178 176.0 2.000 2.400 0.2218 0.1993<br />
16 171 176.0 -5.000 -6.000 -0.5544 -0.5119<br />
17 163 176.0 -13.000 -15.600 -1.4415 -1.6865<br />
18 176 176.0 0.000 0.000 0.0000 0.0000<br />
19 155 162.0 -7.000 -8.400 -0.9326 -0.9178<br />
20 166 162.0 4.000 4.800 0.5329 0.4908<br />
21 149 162.0 -13.000 -15.600 -1.7321 -2.4495<br />
22 164 162.0 2.000 2.400 0.2665 0.2401<br />
23 170 162.0 8.000 9.600 1.0659 1.0845<br />
24 168 162.0 6.000 7.200 0.7994 0.7657<br />
Observations 6, 9, 14, and 21 have large studentized residuals (Output 56.7.6). That the externally<br />
studentized residuals are much larger than the internally studentized residuals for these observations<br />
indicates that the variance estimate in the group shrinks when the observation is removed. Also<br />
important to note is that comparisons based on raw residuals in models with heterogeneous variance<br />
can be misleading. Observation 5, for example, has a larger residual but a smaller studentized<br />
residual than observation 21. <strong>The</strong> variance for the first fat type is much larger than the variance in<br />
the fourth group. A “large” residual is more “surprising” in the groups with small variance.<br />
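The columns of Output 56.7.6 can be reproduced by hand in this balanced one-way layout. In the sketch below, e_ij is the raw residual, h_ij is the leverage, σ̂_i² is the group variance estimate, and σ̂²_i(−j) is the group variance estimate with the observation deleted (from Output 56.7.3); the formulas are the classical definitions and are assumed here because they match the tabled values.<br />

```latex
h_{ij} = \frac{1}{r} = \frac{1}{6}, \qquad
e_{ij}^{\mathrm{PRESS}} = \frac{e_{ij}}{1 - h_{ij}} = 1.2\, e_{ij}, \qquad
t_{ij}^{\mathrm{int}} = \frac{e_{ij}}{\sqrt{\hat\sigma_i^2\,(1 - h_{ij})}}, \qquad
t_{ij}^{\mathrm{ext}} = \frac{e_{ij}}{\sqrt{\hat\sigma_{i(-j)}^2\,(1 - h_{ij})}}
```

For observation 6, e = 23, so the PRESS residual is 1.2 × 23 = 27.6, the internally studentized residual is 23 / sqrt(178.00 × 5/6) ≈ 1.8885, and the externally studentized residual is 23 / sqrt(63.80 × 5/6) ≈ 3.1544. The jump between the last two values comes entirely from the drop in the group variance estimate when the observation is removed.<br />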
A measure of the overall influence on the analysis is the (restricted) likelihood distance, shown in<br />
Output 56.7.7. Observations 6, 9, 14, and 21 clearly displace the REML solution more than any<br />
other observations.
Output 56.7.7 Restricted Likelihood Distance<br />
<strong>The</strong> following statements list the restricted likelihood distance and various diagnostics related to the<br />
fixed-effects estimates (Output 56.7.8):<br />
proc print data=inf label;<br />
var leverage observed CookD DFFITS CovRatio RLD;<br />
run;
Output 56.7.8 Restricted Likelihood Distance and Fixed-Effects Diagnostics<br />
Obs Leverage Observed Value Cook’s D DFFITS COVRATIO Restr. Likelihood Distance<br />
1 0.167 164 0.02157 -0.27487 1.3706 0.1178<br />
2 0.167 172 0.00000 -0.00000 1.4998 0.1156<br />
3 0.167 168 0.00539 -0.13282 1.4675 0.1124<br />
4 0.167 177 0.00843 0.16706 1.4494 0.1117<br />
5 0.167 156 0.08629 -0.64938 0.9822 0.5290<br />
6 0.167 195 0.17831 1.41069 0.4301 5.8101<br />
7 0.167 178 0.04868 -0.43982 1.2078 0.1935<br />
8 0.167 191 0.03576 0.36546 1.2853 0.1451<br />
9 0.167 197 0.14305 1.03446 0.6416 2.2909<br />
10 0.167 182 0.00894 -0.17225 1.4463 0.1116<br />
11 0.167 185 0.00000 -0.00000 1.4998 0.1156<br />
12 0.167 177 0.06358 -0.52239 1.1183 0.2856<br />
13 0.167 175 0.00061 -0.04441 1.4961 0.1151<br />
14 0.167 193 0.17766 1.40175 0.4340 5.7044<br />
15 0.167 178 0.00246 0.08915 1.4851 0.1139<br />
16 0.167 171 0.01537 -0.22892 1.4078 0.1129<br />
17 0.167 163 0.10389 -0.75423 0.8766 0.8433<br />
18 0.167 176 0.00000 0.00000 1.4998 0.1156<br />
19 0.167 155 0.04349 -0.41047 1.2390 0.1710<br />
20 0.167 166 0.01420 0.21950 1.4148 0.1124<br />
21 0.167 149 0.15000 -1.09545 0.6000 2.7343<br />
22 0.167 164 0.00355 0.10736 1.4786 0.1133<br />
23 0.167 170 0.05680 0.48500 1.1592 0.2383<br />
24 0.167 168 0.03195 0.34245 1.3079 0.1353<br />
In this example, observations with large likelihood distances also have large values for Cook’s D<br />
and values of CovRatio far less than one (Output 56.7.8). The latter indicates that the fixed effects<br />
are estimated more precisely when these observations are removed from the analysis.<br />
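One way to bring the most influential observations to the top of the listing is to rank the influence data set by the restricted likelihood distance. The following is a minimal sketch that reuses the inf data set and the variable names from the preceding PROC PRINT step; the data set names inf_id and inf_ranked and the variable obsno are hypothetical and introduced only for illustration:<br />

```sas
/* Keep the original observation number before reordering. */
data inf_id;                      /* hypothetical data set name */
   set inf;
   obsno = _n_;                   /* hypothetical variable: original row number */
run;

/* Rank deletion sets by restricted likelihood distance, largest first. */
proc sort data=inf_id out=inf_ranked;
   by descending RLD;
run;

/* List the four most influential observations and their diagnostics. */
proc print data=inf_ranked(obs=4) label noobs;
   var obsno observed CookD CovRatio RLD;
run;
```

If inf holds the values shown in Output 56.7.8, this listing would contain observations 6, 14, 21, and 9, in that order.<br />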
The following statements print the number of iterations, Cook’s D statistic, and the CovRatio for the covariance<br />
parameters:<br />
proc print data=inf label;<br />
var iter CookDCP CovRatioCP;<br />
run;<br />
The same conclusions as for the fixed-effects estimates hold for the covariance parameter estimates.<br />
Observations 6, 9, 14, and 21 change the estimates and their precision considerably (Output 56.7.9,<br />
Output 56.7.10). All iterative updates converged within four iterations.
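To isolate the observations that move the covariance parameter estimates the most, the listing can be restricted with a WHERE statement. This is a sketch only: the cutoff of 0.5 is an arbitrary value chosen for illustration, not a recommended threshold.<br />

```sas
/* Show only deletion sets whose Cook's D for the covariance parameters */
/* exceeds an illustrative cutoff (0.5 is an assumption).               */
proc print data=inf label;
   where CookDCP > 0.5;
   var iter CookDCP CovRatioCP;
run;
```

With the values in Output 56.7.9, this cutoff selects observations 6, 9, 14, and 21; the next largest value of the statistic is 0.18154 for observation 17.<br />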
Output 56.7.9 Covariance Parameter Diagnostics<br />
Obs Iterations Cook’s D CovParms COVRATIO CovParms<br />
1 3 0.05050 1.6306<br />
2 3 0.15603 1.9520<br />
3 3 0.12426 1.8692<br />
4 3 0.10796 1.8233<br />
5 4 0.08232 0.8375<br />
6 4 1.02909 0.1606<br />
7 1 0.00011 1.2662<br />
8 2 0.01262 1.4335<br />
9 3 0.54126 0.3573<br />
10 3 0.10531 1.8156<br />
11 3 0.15603 1.9520<br />
12 2 0.01160 1.0849<br />
13 3 0.15223 1.9425<br />
14 4 1.01865 0.1635<br />
15 3 0.14111 1.9141<br />
16 3 0.07494 1.7203<br />
17 3 0.18154 0.6671<br />
18 3 0.15603 1.9520<br />
19 2 0.00265 1.3326<br />
20 3 0.08008 1.7374<br />
21 1 0.62500 0.3125<br />
22 3 0.13472 1.8974<br />
23 2 0.00290 1.1663<br />
24 2 0.02020 1.4839<br />
Output 56.7.10 displays the standard panel of influence diagnostics that is obtained when influence<br />
analysis is iterative. The Cook’s D and CovRatio statistics are displayed for each deletion set for<br />
both fixed-effects and covariance parameter estimates. This provides a convenient summary of<br />
the impact on the analysis for each deletion set, since Cook’s D statistic measures impact on the<br />
estimates and the CovRatio statistic measures impact on the precision of the estimates.
Output 56.7.10 Influence Diagnostics<br />
Observations 6, 9, 14, and 21 have considerable impact on estimates and precision of fixed effects<br />
and covariance parameters. Influential observations do not necessarily affect every aspect of an analysis in this way; observations can be influential on only<br />
some aspects of the analysis, as shown in the next example.<br />
Example 56.8: Influence Analysis for Repeated Measures Data<br />
This example revisits the repeated measures data of Pothoff and Roy (1964) that were analyzed<br />
in Example 56.2. Recall that the data consist of growth measurements at ages 8, 10, 12, and 14<br />
for 11 girls and 16 boys. The model being fit contains fixed effects for Gender and Age and their<br />
interaction.<br />
The earlier analysis of these data indicated some unusual observations in this data set. Because<br />
of the clustered data structure, it is of interest to study the influence of clusters (children) on the<br />
analysis rather than the influence of individual observations. A cluster comprises the repeated measurements<br />
for each child.
The repeated measures are first modeled with an unstructured within-child variance-covariance matrix.<br />
A residual variance is not profiled in this model. A noniterative influence analysis will update<br />
the fixed effects only. The following statements request this noniterative maximum likelihood analysis<br />
and produce Output 56.8.1:<br />
proc mixed data=pr method=ml;<br />
class person gender;<br />
model y = gender age gender*age /<br />
influence(effect=person);<br />
repeated / type=un subject=person;<br />
ods select influence;<br />
run;<br />
Output 56.8.1 Default Influence Statistics in Noniterative Analysis<br />
The Mixed Procedure<br />
Influence Diagnostics for Levels of Person<br />
Person Number of Observations in Level PRESS Statistic Cook’s D<br />
1 4 10.1716 0.01539<br />
2 4 3.8187 0.03988<br />
3 4 10.8448 0.02891<br />
4 4 24.0339 0.04515<br />
5 4 1.6900 0.01613<br />
6 4 11.8592 0.01634<br />
7 4 1.1887 0.00521<br />
8 4 4.6717 0.02742<br />
9 4 13.4244 0.03949<br />
10 4 85.1195 0.13848<br />
11 4 67.9397 0.09728<br />
12 4 40.6467 0.04438<br />
13 4 13.0304 0.00924<br />
14 4 6.1712 0.00411<br />
15 4 24.5702 0.12727<br />
16 4 20.5266 0.01026<br />
17 4 9.9917 0.01526<br />
18 4 7.9355 0.01070<br />
19 4 15.5955 0.01982<br />
20 4 42.6845 0.01973<br />
21 4 95.3282 0.10075<br />
22 4 13.9649 0.03778<br />
23 4 4.9656 0.01245<br />
24 4 37.2494 0.15094<br />
25 4 4.3756 0.03375<br />
26 4 8.1448 0.03470<br />
27 4 20.2913 0.02523<br />
Each row in the “Influence Diagnostics for Levels of Person” table in Output 56.8.1 represents<br />
the removal of four observations. Subjects 10, 15, and 24 have the greatest impact on the<br />
fixed effects (Cook’s D), and subjects 10 and 21 have the largest PRESS statistics. Although the 21st child has a<br />
large PRESS statistic, its Cook’s D statistic is not that extreme. This is an indication that the model fits<br />
rather poorly for this child, whether it is part of the data or not.<br />
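For further processing, such as sorting subjects by their PRESS statistic, the influence table can be saved to a data set with the ODS OUTPUT statement. The sketch below reuses the ODS table name from the ODS SELECT statement above; the data set name infl_person is hypothetical, and because the column names in the saved data set are not shown in this example, PROC CONTENTS is used to look them up rather than assuming them:<br />

```sas
proc mixed data=pr method=ml;
   class person gender;
   model y = gender age gender*age /
             influence(effect=person);
   repeated / type=un subject=person;
   /* Save the influence table; infl_person is a hypothetical name. */
   ods output influence=infl_person;
run;

/* Inspect the variable names before sorting or filtering on them. */
proc contents data=infl_person;
run;
```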
The previous analysis does not take into account the effect on the covariance parameters when a<br />
subject is removed from the analysis. If you also update the covariance parameters, the impact of<br />
observations on these can amplify or allay their effect on the fixed effects. To assess the overall<br />
influence of subjects on the analysis and to compute separate statistics for the fixed effects and covariance<br />
parameters, an iterative analysis is obtained by adding the INFLUENCE suboption ITER=,<br />
as follows:<br />
ods graphics on;<br />
proc mixed data=pr method=ml;<br />
class person gender;<br />
model y = gender age gender*age /<br />
influence(effect=person iter=5);<br />
repeated / type=un subject=person;<br />
run;<br />
The number of additional iterations following removal of the observations for a particular subject<br />
is limited to five. Graphical displays of influence diagnostics are requested by specifying the ODS<br />
GRAPHICS statement. For general information about ODS Graphics, see Chapter 21, “Statistical<br />
Graphics Using ODS.” For specific information about the graphics available in the MIXED procedure,<br />
see the section “ODS Graphics” on page 3998.<br />
The MIXED procedure produces a plot of the restricted likelihood distance (Output 56.8.2) and a<br />
panel of diagnostics for fixed effects and covariance parameters (Output 56.8.3).
Output 56.8.2 Restricted Likelihood Distance<br />
Output 56.8.3 Influence Diagnostics Panel<br />
As judged by the restricted likelihood distance, subjects 20 and 24 clearly have the most influence<br />
on the overall analysis (Output 56.8.2).<br />
Output 56.8.3 displays Cook’s D and CovRatio statistics for the fixed effects and covariance parameters.<br />
Clearly, subject 20 has a dramatic effect on the estimates of variances and covariances.<br />
This subject also affects the precision of the covariance parameter estimates more than any other<br />
subject in Output 56.8.3 (CovRatio near 0).<br />
The child who exerts the greatest influence on the fixed effects is subject 24. Perhaps surprisingly,<br />
this subject affects the variance-covariance matrix of the fixed effects more than subject 20 (small<br />
CovRatio in Output 56.8.3).<br />
The final model investigated for these data is a random coefficient model as in Stram and Lee (1994)<br />
with random effects for the intercept and age effect. The following statements examine the estimates<br />
for fixed effects and the entries of the unstructured 2 × 2 variance matrix of the random coefficients<br />
graphically:<br />
proc mixed data=pr method=ml<br />
plots(only)=InfluenceEstPlot;<br />
class person gender;<br />
model y = gender age gender*age /<br />
influence(iter=5 effect=person est);<br />
random intercept age / type=un subject=person;<br />
run;<br />
The PLOTS(ONLY)=INFLUENCEESTPLOT option restricts the graphical output from this PROC<br />
MIXED run to only the panels of deletion estimates (Output 56.8.4 and Output 56.8.5).<br />
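Written out for child $i$ at age $a_j$, the random coefficient model fit by these statements can be sketched as follows. The notation is introduced here for illustration and does not appear in the original example; $g(i)$ stands for the gender of child $i$, and the gender-specific intercepts and slopes are an equivalent reparameterization of the Gender, Age, and Gender*Age effects.<br />

```latex
\[
y_{ij} = \beta_{0,g(i)} + \beta_{1,g(i)}\, a_j + b_{0i} + b_{1i}\, a_j + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(0,\, G),
\quad
G = \begin{pmatrix} \sigma_{00} & \sigma_{01} \\ \sigma_{01} & \sigma_{11} \end{pmatrix},
\quad
\varepsilon_{ij} \sim N(0, \sigma^2)
\]
```

The three distinct entries of $G$ together with the residual variance $\sigma^2$ make up the four covariance parameters of this model.<br />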
Output 56.8.4 Fixed-Effects Deletion Estimates<br />
In Output 56.8.4 the graphs on the left side of the panel represent the intercept and slope estimate<br />
for boys; the graphs on the right side represent the difference in intercept and slope between boys<br />
and girls. Removing any one of the first eleven children, who are girls, does not alter the intercept<br />
or slope in the group of boys. The difference in these parameters between boys and girls is altered<br />
by the removal of any child. Subject 24 changes the fixed effects considerably, subject 20 much less<br />
so.
Output 56.8.5 Covariance Parameter Deletion Estimates<br />
The covariance parameter deletion estimates in Output 56.8.5 show several important features:<br />
The panels do not contain information about subject 24. Estimation of the G matrix following<br />
removal of that child did not yield a positive definite matrix. As a consequence, covariance<br />
parameter diagnostics are not produced for this subject.<br />
Subject 20 has great impact on the four covariance parameters. Removing this child from<br />
the analysis increases the variance of the random intercept and random slope and reduces<br />
the residual variance by almost 80%. The repeated measurements of this child exhibit an<br />
up-and-down behavior.<br />
The variances of the random intercept and slope are reduced when child 15 is removed from<br />
the analysis. This child’s growth measurements oscillate about 27.0 from age 10 on.<br />
Examining observed and residual values by levels of classification variables is also a useful tool to<br />
diagnose the adequacy of the model and unusual observations. Box plots for effects in the model that<br />
consist of only classification variables can be requested with the BOXPLOT option of the PLOT=<br />
option in the PROC MIXED statement. For example, the following statements produce box plots<br />
for the SUBJECT= effects in the model:
proc mixed data=pr method=ml<br />
plot=boxplot(observed marginal conditional subject);<br />
class person gender;<br />
model y = gender age gender*age;<br />
random intercept age / type=un subject=person;<br />
run;<br />
The specific boxplot options request a plot of the observed data (Output 56.8.6), the marginal residuals<br />
(Output 56.8.7), and the conditional residuals (Output 56.8.8). Box plots of the observed values<br />
show the variation within and between children clearly. The group of girls (subjects 1–11) is distinguishable<br />
from the group of boys by somewhat lesser average growth and lesser within-child<br />
variation (Output 56.8.6). After adjusting for overall (population-averaged) gender and age effects,<br />
the residual within-child variation is reduced but substantial differences in the means remain<br />
(Output 56.8.7). If child-specific inferences are desired, a model accounting for only Gender, Age,<br />
and Gender*Age effects is not adequate for these data.<br />
Output 56.8.6 Distribution of Observed Values
Output 56.8.7 Distribution of Marginal Residuals<br />
The conditional residuals incorporate the EBLUPs for each child and enable you to examine whether<br />
the subject-specific model is adequate (Output 56.8.8). By using each child “as its own control,”<br />
the residuals are now centered near zero. Subjects 20 and 24 stand out as unusual in all three sets of<br />
box plots.
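In notation commonly used for the linear mixed model (the symbols are an assumption here and are not taken from this example), the two kinds of residuals can be written as follows, where $\widehat{\beta}$ denotes the fixed-effects estimates and $\widehat{\gamma}$ the EBLUPs of the random effects:<br />

```latex
\[
r_m = y - X\widehat{\beta},
\qquad
r_c = y - X\widehat{\beta} - Z\widehat{\gamma} = r_m - Z\widehat{\gamma}
\]
```

The conditional residual additionally subtracts each child's predicted random intercept and slope contribution, which is what centers the box plots in Output 56.8.8 near zero.<br />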
Output 56.8.8 Distribution of Conditional Residuals<br />
Example 56.9: Examining Individual Test Components<br />
The LCOMPONENTS option in the MODEL statement enables you to perform single-degree-of-freedom<br />
tests for individual rows of the L matrix. Such tests are useful to identify interaction<br />
patterns. In a balanced layout, Type 3 components of L associated with A*B interactions correspond<br />
to simple contrasts of cell mean differences.<br />
The first example revisits the data from the split-plot design by Stroup (1989a) that was analyzed<br />
in Example 56.1. Recall that variables A and B in the following statements represent the whole-plot<br />
and subplot factors, respectively:<br />
proc mixed data=sp;<br />
class a b block;<br />
model y = a b a*b / LComponents e3;<br />
random block a*block;<br />
run;<br />
The MIXED procedure constructs a separate L matrix for each of the three fixed-effects components.<br />
The matrices are displayed in Output 56.9.1. The tests for fixed effects are shown in<br />
Output 56.9.2.<br />
Output 56.9.1 Coefficients of Type 3 Estimable Functions<br />
The Mixed Procedure<br />
Type 3 Coefficients for A<br />
Effect      A   B   Row1   Row2<br />
Intercept<br />
A           1       1<br />
A           2              1<br />
A           3       -1     -1<br />
B               1<br />
B               2<br />
A*B         1   1   0.5<br />
A*B         1   2   0.5<br />
A*B         2   1          0.5<br />
A*B         2   2          0.5<br />
A*B         3   1   -0.5   -0.5<br />
A*B         3   2   -0.5   -0.5<br />
Type 3 Coefficients for B<br />
Effect      A   B   Row1<br />
Intercept<br />
A           1<br />
A           2<br />
A           3<br />
B               1   1<br />
B               2   -1<br />
A*B         1   1   0.3333<br />
A*B         1   2   -0.333<br />
A*B         2   1   0.3333<br />
A*B         2   2   -0.333<br />
A*B         3   1   0.3333<br />
A*B         3   2   -0.333<br />
Type 3 Coefficients for A*B<br />
Effect      A   B   Row1   Row2<br />
Intercept<br />
A           1<br />
A           2<br />
A           3<br />
B               1<br />
B               2<br />
A*B         1   1   1<br />
A*B         1   2   -1<br />
A*B         2   1          1<br />
A*B         2   2          -1<br />
A*B         3   1   -1     -1<br />
A*B         3   2   1      1
Output 56.9.2 Type 3 Tests in Split-Plot Example<br />
Type 3 Tests of Fixed Effects<br />
Effect Num DF Den DF F Value Pr > F<br />
A 2 6 4.07 0.0764<br />
B 1 9 19.39 0.0017<br />
A*B 2 9 4.02 0.0566<br />
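If $\mu_{i\cdot}$ denotes a whole-plot main effect mean, $\mu_{\cdot j}$ denotes a subplot main effect mean, and $\mu_{ij}$ denotes<br />
a cell mean, the five components shown in Output 56.9.3 correspond to tests of the following:<br />
$H_0\colon \mu_{1\cdot} = \mu_{3\cdot}$<br />
$H_0\colon \mu_{2\cdot} = \mu_{3\cdot}$<br />
$H_0\colon \mu_{\cdot 1} = \mu_{\cdot 2}$<br />
$H_0\colon \mu_{11} - \mu_{12} = \mu_{31} - \mu_{32}$<br />
$H_0\colon \mu_{21} - \mu_{22} = \mu_{31} - \mu_{32}$<br />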
Output 56.9.3 Type 3 L Components Table<br />
L Components of Type 3 Tests of Fixed Effects<br />
Effect L Index Estimate Standard Error DF t Value Pr > |t|<br />
A 1 7.1250 3.1672 6 2.25 0.0655<br />
A 2 8.3750 3.1672 6 2.64 0.0383<br />
B 1 5.5000 1.2491 9 4.40 0.0017<br />
A*B 1 7.7500 3.0596 9 2.53 0.0321<br />
A*B 2 7.2500 3.0596 9 2.37 0.0419<br />
The first three components are comparisons of marginal means. The fourth component compares<br />
the effect of factor B at the first whole-plot level against the effect of B at the third whole-plot level.<br />
Finally, the last component tests whether the factor B effect changes between the second and third<br />
whole-plot levels.<br />
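A single-degree-of-freedom component can be cross-checked against the corresponding F test, because the square of a t statistic with $\nu$ degrees of freedom is an F statistic with 1 and $\nu$ degrees of freedom. For factor B, which has only one component, a quick check with the rounded values printed in Output 56.9.3 gives:<br />

```latex
\[
t^2 = \left(\frac{5.5000}{1.2491}\right)^2 \approx 4.403^2 \approx 19.39
\]
```

This agrees with the F value for B in Output 56.9.2 (19.39 with 1 and 9 degrees of freedom). The effects A and A*B each have two components, so their F tests are joint tests and do not reduce to a single squared t statistic.<br />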
The Type 3 component tests can also be produced with these corresponding ESTIMATE statements:<br />
proc mixed data=sp;<br />
class a b block;<br />
model y = a b a*b;<br />
random block a*block;<br />
/* a*b coefficients follow the cell order (1,1) (1,2) (2,1) (2,2) (3,1) (3,2) */<br />
estimate 'a 1' a 1 0 -1;<br />
estimate 'a 2' a 0 1 -1;<br />
estimate 'b 1' b 1 -1;<br />
estimate 'a*b 1' a*b 1 -1 0 0 -1 1;<br />
estimate 'a*b 2' a*b 0 0 1 -1 -1 1;<br />
ods select Estimates;<br />
run;<br />
The results are shown in Output 56.9.4.<br />
Output 56.9.4 Results from ESTIMATE Statements<br />
The Mixed Procedure<br />
Estimates<br />
Label Estimate Standard Error DF t Value Pr > |t|<br />
a 1 7.1250 3.1672 6 2.25 0.0655<br />
a 2 8.3750 3.1672 6 2.64 0.0383<br />
b 1 5.5000 1.2491 9 4.40 0.0017<br />
a*b 1 7.7500 3.0596 9 2.53 0.0321<br />
a*b 2 7.2500 3.0596 9 2.37 0.0419<br />
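The two interaction components can also be tested jointly by listing both coefficient rows in a single CONTRAST statement. The following is a sketch under the same model as above; the label 'a*b joint' is arbitrary. Because the two rows are the Type 3 L coefficients for A*B, the result would be expected to match the A*B line of the “Type 3 Tests of Fixed Effects” table.<br />

```sas
proc mixed data=sp;
   class a b block;
   model y = a b a*b;
   random block a*block;
   /* Rows separated by a comma are tested jointly (2 numerator df). */
   contrast 'a*b joint' a*b 1 -1  0  0 -1  1,
                        a*b 0  0  1 -1 -1  1;
   ods select Contrasts;
run;
```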
A second useful application of the LCOMPONENTS option is in polynomial models, where Type<br />
1 tests are often used to test the entry of model terms sequentially. The SOLUTION option in the<br />
MODEL statement displays the regression coefficients that correspond to a Type 3 analysis. That<br />
is, the coefficients represent the partial coefficients you would get by adding the regressor variable<br />
last in a model containing all other effects, and the tests are identical to those in the “Type 3 Tests<br />
of Fixed Effects” table.<br />
Consider the following DATA step and the fit of a third-order polynomial regression model.<br />
data polynomial;<br />
do x=1 to 20; input y@@; output; end;<br />
datalines;<br />
1.092 1.758 1.997 3.154 3.880<br />
3.810 4.921 4.573 6.029 6.032<br />
6.291 7.151 7.154 6.469 7.137<br />
6.374 5.860 4.866 4.155 2.711<br />
;<br />
proc mixed data=polynomial;<br />
model y = x x*x x*x*x / s lcomponents htype=1,3;<br />
run;<br />
The t tests displayed in the “Solution for Fixed Effects” table are Type 3 tests, sometimes referred<br />
to as partial tests. They measure the contribution of a regressor in the presence of all other regressor<br />
variables in the model.
Output 56.9.5 Parameter Estimates in Polynomial Model<br />
The Mixed Procedure<br />
Solution for Fixed Effects<br />
Effect Estimate Standard Error DF t Value Pr > |t|<br />
Intercept 0.7837 0.3545 16 2.21 0.0420<br />
x 0.3726 0.1426 16 2.61 0.0189<br />
x*x 0.04756 0.01558 16 3.05 0.0076<br />
x*x*x -0.00306 0.000489 16 -6.27 &lt;.0001<br />
x 1 0.1763 0.01259 16 14.01
References<br />
Akritas, M. G., Arnold, S. F., and Brunner, E. (1997), “Nonparametric Hypotheses and Rank Statistics<br />
for Unbalanced Factorial Designs,” Journal of the American Statistical Association, 92, 258–265.<br />
Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on<br />
Automatic Control, AC–19, 716–723.<br />
Allen, D. M. (1974), “The Relationship between Variable Selection and Data Augmentation and a<br />
Method of Prediction,” Technometrics, 16, 125–127.<br />
Bates, D. M. and Watts, D. G. (1988), Nonlinear Regression Analysis and Its Applications, New<br />
York: John Wiley & Sons.<br />
Beckman, R. J., Nachtsheim, C. J., and Cook, R. D. (1987), “Diagnostics for Mixed-Model Analysis<br />
of Variance,” Technometrics, 29, 413–426.<br />
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential<br />
Data and Sources of Collinearity, New York: John Wiley & Sons.<br />
Box, G. E. P. and Tiao, G. C. (1973), Bayesian Inference in Statistical Analysis, Wiley Classics<br />
Library Edition Published 1992, New York: John Wiley & Sons.<br />
Bozdogan, H. (1987), “Model Selection and Akaike’s Information Criterion (AIC): The General<br />
Theory and Its Analytical Extensions,” Psychometrika, 52, 345–370.<br />
Brown, H. and Prescott, R. (1999), Applied Mixed Models in Medicine, New York: John Wiley &<br />
Sons.<br />
Brownie, C., Bowman, D. T., and Burton, J. W. (1993), “Estimating Spatial Variation in Analysis<br />
of Data from Yield Trials: A Comparison of Methods,” Agronomy Journal, 85, 1244–1253.<br />
Brownie, C., and Gumpertz, M. L. (1997), “Validity of Spatial Analysis of Large Field Trials,”<br />
Journal of Agricultural, Biological, and Environmental Statistics, 2, 1–23.<br />
Brunner, E., Dette, H., and Munk, A. (1997), “Box-Type Approximations in Nonparametric Factorial<br />
Designs,” Journal of the American Statistical Association, 92, 1494–1502.<br />
Brunner, E., Domhof, S., and Langer, F. (2002), Nonparametric Analysis of Longitudinal Data in<br />
Factorial Experiments, New York: John Wiley & Sons.<br />
Burdick, R. K. and Graybill, F. A. (1992), Confidence Intervals on Variance Components, New<br />
York: Marcel Dekker.<br />
Burnham, K. P. and Anderson, D. R. (1998), Model Selection and Inference: A Practical<br />
Information-Theoretic Approach, New York: Springer-Verlag.<br />
Carlin, B. P. and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis,<br />
London: Chapman and Hall.
Carroll, R. J. and Ruppert, D. (1988), Transformation and Weighting in Regression, London: Chapman<br />
and Hall.<br />
Chilès, J. P. and Delfiner, P. (1999), Geostatistics. Modeling Spatial Uncertainty, New York: John<br />
Wiley & Sons.<br />
Christensen, R., Pearson, L. M., and Johnson, W. (1992), “Case-Deletion Diagnostics for Mixed<br />
Models,” Technometrics, 34, 38–45.<br />
Cook, R. D. (1977), “Detection of Influential Observations in Linear Regression,” Technometrics,<br />
19, 15–18.<br />
Cook, R. D. (1979), “Influential Observations in Linear Regression,” Journal of the American Statistical<br />
Association, 74, 169–174.<br />
Cook, R. D. and Weisberg, S. (1982), Residuals and Influence in Regression, New York: Chapman<br />
and Hall.<br />
Cressie, N. (1993), Statistics for Spatial Data, Revised Edition, New York: John Wiley & Sons.<br />
Crowder, M. J. and Hand, D. J. (1990), Analysis of Repeated Measures, New York: Chapman and<br />
Hall.<br />
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete<br />
Data via the EM Algorithm,” Journal of the Royal Statistical Society, Ser. B., 39, 1–38.<br />
Diggle, P. J. (1988), “An Approach to the Analysis of Repeated Measurements,” Biometrics, 44,<br />
959–971.<br />
Diggle, P. J., Liang, K. Y., and Zeger, S. L. (1994), Analysis of Longitudinal Data, Oxford: Clarendon<br />
Press.<br />
Dunnett, C. W. (1980), “Pairwise Multiple Comparisons in the Unequal Variance Case,” Journal of<br />
the American Statistical Association, 75, 796–800.<br />
Edwards, D. and Berry, J. J. (1987), “The Efficiency of Simulation-based Multiple Comparisons,”<br />
Biometrics, 43, 913–928.<br />
Everitt, B. S. (1995), “The Analysis of Repeated Measures: A Practical Review with Examples,”<br />
The Statistician, 44, 113–135.<br />
Fai, A. H. T. and Cornelius, P. L. (1996), “Approximate F-tests of Multiple Degree of Freedom<br />
Hypotheses in Generalized Least Squares Analyses of Unbalanced Split-plot Experiments,” Journal<br />
of Statistical Computation and Simulation, 54, 363–378.<br />
Federer, W. T. and Wolfinger, R. D. (1998), “SAS Code for Recovering Intereffect Information in<br />
Experiments with Incomplete Block and Lattice Rectangle Designs,” Agronomy Journal, 90, 545–<br />
551.<br />
Fuller, W. A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons.
Fuller, W. A. and Battese, G. E. (1973), “Transformations for Estimation of Linear Models with<br />
Nested Error Structure,” Journal of the American Statistical Association, 68, 626–632.<br />
Galecki, A. T. (1994), “General Class of Covariance Structures for Two or More Repeated Factors<br />
in Longitudinal Data Analysis,” Communications in Statistics–Theory and Methods, 23(11), 3105–3119.<br />
Games, P. A., and Howell, J. F. (1976), “Pairwise Multiple Comparison <strong>Procedure</strong>s With Unequal<br />
n’s and/or Variances: A Monte Carlo Study,” Journal of Educational Statistics, 1, 113–125.<br />
Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), “Illustration of Bayesian<br />
Inference in Normal Data Models Using Gibbs Sampling,” Journal of the American Statistical<br />
Association, 85, 972–985.<br />
Ghosh, M. (1992), Discussion of Schervish, M., “Bayesian Analysis of Linear Models,” Bayesian<br />
Statistics 4, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford: University<br />
Press, 432–433.<br />
Giesbrecht, F. G. (1989), “A General Structure for the Class of Mixed Linear Models,” Applications<br />
of Mixed Models in Agriculture and Related Disciplines, Southern Cooperative Series Bulletin No.<br />
343, Louisiana Agricultural Experiment Station, Baton Rouge, 183–201.<br />
Giesbrecht, F. G. and Burns, J. C. (1985), “Two-Stage Analysis Based on a Mixed Model: Large-sample<br />
Asymptotic Theory and Small-Sample Simulation Results,” Biometrics, 41, 477–486.<br />
Golub, G. H. and Van Loan, C. F. (1989), Matrix Computations, Second Edition, Baltimore: Johns<br />
Hopkins University Press.<br />
Goodnight, J. H. (1978), SAS Technical Report R-101, Tests of Hypotheses in Fixed-Effects Linear<br />
Models, Cary, NC: SAS Institute Inc.<br />
Goodnight, J. H. (1979), “A Tutorial on the Sweep Operator,” American Statistician, 33, 149–158.<br />
Goodnight, J. H. and Hemmerle, W. J. (1979), “A Simplified Algorithm for the W-Transformation<br />
in Variance Component Estimation,” Technometrics, 21, 265–268.<br />
Gotway, C. A. and Stroup, W. W. (1997), “A Generalized Linear Model Approach to Spatial Data<br />
and Prediction,” Journal of Agricultural, Biological, and Environmental Statistics, 2, 157–187.<br />
Greenhouse, S. W. and Geisser, S. (1959), “On Methods in the Analysis of Profile Data,” Psychometrika,<br />
24, 95–112.<br />
Gregoire, T. G., Schabenberger, O., and Barrett, J. P. (1995), “Linear Modelling of Irregularly<br />
Spaced, Unbalanced, Longitudinal Data from Permanent Plot Measurements,” Canadian Journal of<br />
Forest Research, 25, 137–156.<br />
Handcock, M. S. and Stein, M. L. (1993), “A Bayesian Analysis of Kriging,” Technometrics, 35(4),<br />
403–410.<br />
Handcock, M. S. and Wallis, J. R. (1994), “An Approach to Statistical Spatial-Temporal Modeling<br />
of Meteorological Fields (with Discussion),” Journal of the American Statistical Association, 89,<br />
368–390.
Hanks, R. J., Sisson, D. V., Hurst, R. L., and Hubbard, K. G. (1980), “Statistical Analysis of Results<br />
from Irrigation Experiments Using the Line-Source Sprinkler System,” Soil Science Society of America<br />
Journal, 44, 886–888.<br />
Hannan, E. J. and Quinn, B. G. (1979), “The Determination of the Order of an Autoregression,”<br />
Journal of the Royal Statistical Society, Series B, 41, 190–195.<br />
Hartley, H. O. and Rao, J. N. K. (1967), “Maximum-Likelihood Estimation for the Mixed Analysis<br />
of Variance Model,” Biometrika, 54, 93–108.<br />
Harville, D. A. (1977), “Maximum Likelihood Approaches to Variance Component Estimation and<br />
to Related Problems,” Journal of the American Statistical Association, 72, 320–338.<br />
Harville, D. A. (1988), “Mixed-Model Methodology: Theoretical Justifications and Future Directions,”<br />
Proceedings of the Statistical Computing Section, American Statistical Association, New<br />
Orleans, 41–49.<br />
Harville, D. A. (1990), “BLUP (Best Linear Unbiased Prediction), and Beyond,” in Advances in<br />
Statistical Methods for Genetic Improvement of Livestock, Springer-Verlag, 239–276.<br />
Harville, D. A. and Jeske, D. R. (1992), “Mean Squared Error of Estimation or Prediction under a<br />
General Linear Model,” Journal of the American Statistical Association, 87, 724–731.<br />
Hemmerle, W. J. and Hartley, H. O. (1973), “Computing Maximum Likelihood Estimates for the<br />
Mixed AOV Model Using the W-Transformation,” Technometrics, 15, 819–831.<br />
Henderson, C. R. (1984), Applications of Linear Models in Animal Breeding, University of Guelph.<br />
Henderson, C. R. (1990), “Statistical Method in Animal Improvement: Historical Overview,” in<br />
Advances in Statistical Methods for Genetic Improvement of Livestock, New York: Springer-Verlag,<br />
1–14.<br />
Hsu, J. C. (1992), “The Factor Analytic Approach to Simultaneous Inference in the General Linear<br />
Model,” Journal of Computational and Graphical Statistics, 1, 151–168.<br />
Huber, P. J. (1967), “The Behavior of Maximum Likelihood Estimates under Nonstandard Conditions,”<br />
Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1, 221–233.<br />
Hurtado, G. I. H. (1993), Detection of Influential Observations in Linear Mixed Models, Ph.D.<br />
dissertation, Department of Statistics, North Carolina State University, Raleigh, NC.<br />
Hurvich, C. M. and Tsai, C.-L. (1989), “Regression and Time Series Model Selection in Small<br />
Samples,” Biometrika, 76, 297–307.<br />
Huynh, H. and Feldt, L. S. (1970), “Conditions Under Which Mean Square Ratios in Repeated Measurements<br />
Designs Have Exact F-Distributions,” Journal of the American Statistical Association,<br />
65, 1582–1589.<br />
Jennrich, R. I. and Schluchter, M. D. (1986), “Unbalanced Repeated-Measures Models with Structured<br />
Covariance Matrices,” Biometrics, 42, 805–820.
4082 ✦ Chapter 56: <strong>The</strong> <strong>MIXED</strong> <strong>Procedure</strong><br />
Johnson, D. E., Chaudhuri, U. N., and Kanemasu, E. T. (1983), “Statistical Analysis of Line-Source<br />
Sprinkler Irrigation Experiments and Other Nonrandomized Experiments Using Multivariate Methods,”<br />
Soil Science Society of America Journal, 47, 309–312.<br />
Jones, R. H. and Boadi-Boateng, F. (1991), “Unequally Spaced Longitudinal Data with AR(1) Serial<br />
Correlation,” Biometrics, 47, 161–175.<br />
Kackar, R. N. and Harville, D. A. (1984), “Approximations for Standard Errors of Estimators of<br />
Fixed and Random Effects in Mixed Linear Models,” Journal of the American Statistical Association,<br />
79, 853–862.<br />
Kass, R. E. and Steffey, D. (1989), “Approximate Bayesian Inference in Conditionally Independent<br />
Hierarchical Models (Parametric Empirical Bayes Models),” Journal of the American Statistical<br />
Association, 84, 717–726.<br />
Kenward, M. G. (1987), “A Method for Comparing Profiles of Repeated Measurements,” Applied<br />
Statistics, 36, 296–308.<br />
Kenward, M. G. and Roger, J. H. (1997), “Small Sample Inference for Fixed Effects from Restricted<br />
Maximum Likelihood,” Biometrics, 53, 983–997.<br />
Keselman, H. J., Algina, J., Kowalchuk, R. K., and Wolfinger, R. D. (1998), “A Comparison of Two<br />
Approaches for Selecting Covariance Structures in the Analysis of Repeated Measures,” Communications<br />
in Statistics–Simulation and Computation, 27(3), 591–604.<br />
Keselman, H. J., Algina, J., Kowalchuk, R. K., and Wolfinger, R. D. (1999), “A Comparison of<br />
Recent Approaches to the Analysis of Repeated Measurements,” British Journal of Mathematical<br />
and Statistical Psychology, 52, 63–78.<br />
Kramer, C. Y. (1956), “Extension of Multiple Range Tests to Group Means with Unequal Numbers<br />
of Replications,” Biometrics, 12, 309–310.<br />
Laird, N. M. and Ware, J. H. (1982), “Random-Effects Models for Longitudinal Data,” Biometrics,<br />
38, 963–974.<br />
Laird, N. M., Lange, N., and Stram, D. (1987), “Maximum Likelihood Computations with Repeated<br />
Measures: Application of the EM Algorithm,” Journal of the American Statistical Association, 82,<br />
97–105.<br />
LaMotte, L. R. (1973), “Quadratic Estimation of Variance Components,” Biometrics, 29, 311–330.<br />
Liang, K. Y. and Zeger, S. L. (1986), “Longitudinal Data Analysis Using Generalized Linear Models,”<br />
Biometrika, 73, 13–22.<br />
Lindsey, J. K. (1993), Models for Repeated Measurements, Oxford: Clarendon Press.<br />
Lindstrom, M. J. and Bates, D. M. (1988), “Newton-Raphson and EM Algorithms for Linear Mixed-<br />
Effects Models for Repeated-Measures Data,” Journal of the American Statistical Association, 83,<br />
1014–1022.
Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., and Schabenberger, O. (2006), <strong>SAS</strong><br />
for Mixed Models, Second Edition, Cary, NC: <strong>SAS</strong> Institute Inc.<br />
Little, R. J. A. (1995), “Modeling the Drop-Out Mechanism in Repeated-Measures Studies,” Journal<br />
of the American Statistical Association, 90, 1112–1121.<br />
Louis, T. A. (1988), “General Methods for Analyzing Repeated Measures,” Statistics in Medicine,<br />
7, 29–45.<br />
Macchiavelli, R. E. and Arnold, S. F. (1994), “Variable Order Ante-dependence Models,” Communications<br />
in Statistics–<strong>The</strong>ory and Methods, 23(9), 2683–2699.<br />
Marx, D. and Thompson, K. (1987), “Practical Aspects of Agricultural Kriging,” Bulletin 903,<br />
Arkansas Agricultural Experiment Station, Fayetteville.<br />
Matérn, B. (1986), Spatial Variation, Second Edition, Lecture Notes in Statistics, New York:<br />
Springer-Verlag.<br />
McKeon, J. J. (1974), “F Approximations to the Distribution of Hotelling’s T<sub>0</sub><sup>2</sup>,” Biometrika, 61,<br />
381–383.<br />
McLean, R. A. and Sanders, W. L. (1988), “Approximating Degrees of Freedom for Standard Errors<br />
in Mixed Linear Models,” Proceedings of the Statistical Computing Section, American Statistical<br />
Association, New Orleans, 50–59.<br />
McLean, R. A., Sanders, W. L., and Stroup, W. W. (1991), “A Unified Approach to Mixed Linear<br />
Models,” <strong>The</strong> American Statistician, 45, 54–64.<br />
Milliken, G. A. and Johnson, D. E. (1992), Analysis of Messy Data, Volume 1: Designed Experiments,<br />
New York: Chapman and Hall.<br />
Murray, D. M. (1998), Design and Analysis of Group-Randomized Trials, New York: Oxford University<br />
Press.<br />
Myers, R. H. (1990), Classical and Modern Regression with Applications, Second Edition, Belmont,<br />
CA: PWS-Kent.<br />
Obenchain, R. L. (1990), STABLSIM.EXE, Version 9010, Eli Lilly and Company, Indianapolis,<br />
Indiana, unpublished C code.<br />
Patel, H. I. (1991), “Analysis of Incomplete Data from a Clinical Trial with Repeated Measurements,”<br />
Biometrika, 78, 609–619.<br />
Patterson, H. D. and Thompson, R. (1971), “Recovery of Inter-block Information When Block Sizes<br />
Are Unequal,” Biometrika, 58, 545–554.<br />
Pillai, K. C. and Samson, P. (1959), “On Hotelling’s Generalization of T<sup>2</sup>,” Biometrika, 46, 160–<br />
168.<br />
Potthoff, R. F. and Roy, S. N. (1964), “A Generalized Multivariate Analysis of Variance Model<br />
Useful Especially for Growth Curve Problems,” Biometrika, 51, 313–326.
Prasad, N. G. N. and Rao, J. N. K. (1990), “<strong>The</strong> Estimation of Mean Squared Error of Small-Area<br />
Estimators,” Journal of the American Statistical Association, 85, 163–171.<br />
Pringle, R. M. and Rayner, A. A. (1971), Generalized Inverse Matrices with Applications to Statistics,<br />
New York: Hafner Publishing Co.<br />
Rao, C. R. (1972), “Estimation of Variance and Covariance Components in Linear Models,” Journal<br />
of the American Statistical Association, 67, 112–115.<br />
Ripley, B. D. (1987), Stochastic Simulation, New York: John Wiley & Sons.<br />
Robinson, G. K. (1991), “That BLUP Is a Good Thing: <strong>The</strong> Estimation of Random Effects,” Statistical<br />
Science, 6, 15–51.<br />
Rubin, D. B. (1976), “Inference and Missing Data,” Biometrika, 63, 581–592.<br />
Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989), “Design and Analysis of Computer<br />
Experiments,” Statistical Science, 4, 409–435.<br />
Schabenberger, O. and Gotway, C. A. (2005), Statistical Methods for Spatial Data Analysis, Boca<br />
Raton, FL: CRC Press.<br />
Schluchter, M. D. and Elashoff, J. D. (1990), “Small-Sample Adjustments to Tests with Unbalanced<br />
Repeated Measures Assuming Several Covariance Structures,” Journal of Statistical Computation<br />
and Simulation, 37, 69–87.<br />
Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464.<br />
Schervish, M. J. (1992), “Bayesian Analysis of Linear Models,” in Bayesian Statistics 4, eds. J. M.<br />
Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, Oxford: Oxford University Press, 419–434 (with<br />
discussion).<br />
Searle, S. R. (1971), Linear Models, New York: John Wiley & Sons.<br />
Searle, S. R. (1982), Matrix Algebra Useful for Statisticians, New York: John Wiley & Sons.<br />
Searle, S. R. (1988), “Mixed Models and Unbalanced Data: Wherefrom, Whereat, and Whereto?”<br />
Communications in Statistics–<strong>The</strong>ory and Methods, 17(4), 935–968.<br />
Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New York: John<br />
Wiley & Sons.<br />
Self, S. G. and Liang, K. Y. (1987), “Asymptotic Properties of Maximum Likelihood Estimators<br />
and Likelihood Ratio Tests under Nonstandard Conditions,” Journal of the American Statistical<br />
Association, 82, 605–610.<br />
Serfling, R. J. (1980), Approximation <strong>The</strong>orems of Mathematical Statistics, New York: John Wiley<br />
& Sons.<br />
Singer, J. D. (1998), “Using <strong>SAS</strong> PROC <strong>MIXED</strong> to Fit Multilevel Models, Hierarchical Models,<br />
and Individual Growth Models,” Journal of Educational and Behavioral Statistics, 23(4), 323–355.
Smith, A. F. M. and Gelfand, A. E. (1992), “Bayesian Statistics without Tears: A Sampling-<br />
Resampling Perspective,” <strong>The</strong> American Statistician, 46, 84–88.<br />
Snedecor, G. W. and Cochran, W. G. (1976), Statistical Methods, Sixth Edition, Ames: Iowa State<br />
University Press.<br />
Snedecor, G. W. and Cochran, W. G. (1980), Statistical Methods, Ames: Iowa State University<br />
Press.<br />
Steel, R. G. D., Torrie, J. H., and Dickey, D. (1997), Principles and <strong>Procedure</strong>s of Statistics: A<br />
Biometrical Approach, Third Edition, New York: McGraw-Hill, Inc.<br />
Stram, D. O. and Lee, J. W. (1994), “Variance Components Testing in the Longitudinal Mixed<br />
Effects Model,” Biometrics, 50, 1171–1177.<br />
Stroup, W. W. (1989a), “Predictable Functions and Prediction Space in the Mixed Model <strong>Procedure</strong>,”<br />
in Applications of Mixed Models in Agriculture and Related Disciplines, Southern Cooperative<br />
Series Bulletin No. 343, Louisiana Agricultural Experiment Station, Baton Rouge, 39–48.<br />
Stroup, W. W. (1989b), “Use of Mixed Model <strong>Procedure</strong> to Analyze Spatially Correlated Data: An<br />
Example Applied to a Line-Source Sprinkler Irrigation Experiment,” in Applications of Mixed Models<br />
in Agriculture and Related Disciplines, Southern Cooperative Series Bulletin No. 343, Louisiana<br />
Agricultural Experiment Station, Baton Rouge, 104–122.<br />
Stroup, W. W., Baenziger, P. S., and Mulitze, D. K. (1994), “Removing Spatial Variation from<br />
Wheat Yield Trials: A Comparison of Methods,” Crop Science, 34, 62–66.<br />
Sullivan, L. M., Dukes, K. A., and Losina, E. (1999), “An Introduction to Hierarchical Linear<br />
Modelling,” Statistics in Medicine, 18, 855–888.<br />
Swallow, W. H. and Monahan, J. F. (1984), “Monte Carlo Comparison of ANOVA, MIVQUE,<br />
REML, and ML Estimators of Variance Components,” Technometrics, 26, 47–57.<br />
Tamhane, A. C. (1979), “A Comparison of <strong>Procedure</strong>s for Multiple Comparisons of Means With<br />
Unequal Variances,” Journal of the American Statistical Association, 74, 471–480.<br />
Tierney, L. (1994), “Markov Chains for Exploring Posterior Distributions” (with discussion), Annals<br />
of Statistics, 22, 1701–1762.<br />
Verbeke, G. and Molenberghs, G., eds. (1997), Linear Mixed Models in Practice: A <strong>SAS</strong>-Oriented<br />
Approach, New York: Springer.<br />
Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York:<br />
Springer.<br />
Westfall, P. H. and Young, S. S. (1993), Resampling-Based Multiple Testing, New York: John Wiley<br />
& Sons.<br />
Westfall, P. H., Tobias, R. D., Rom, D., Wolfinger, R. D., and Hochberg, Y. (1999), Multiple Comparisons<br />
and Multiple Tests Using the <strong>SAS</strong> System, Cary, NC: <strong>SAS</strong> Institute Inc.
White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test<br />
for Heteroskedasticity,” Econometrica, 48, 817–838.<br />
Whittle, P. (1954), “On Stationary Processes in the Plane,” Biometrika, 41, 434–449.<br />
Winer, B. J. (1971), Statistical Principles in Experimental Design, Second Edition, New York:<br />
McGraw-Hill, Inc.<br />
Wolfinger, R. D. (1993), “Covariance Structure Selection in General Mixed Models,” Communications<br />
in Statistics, Simulation and Computation, 22(4), 1079–1106.<br />
Wolfinger, R. D. (1996), “Heterogeneous Variance-Covariance Structures for Repeated Measures,”<br />
Journal of Agricultural, Biological, and Environmental Statistics, 1, 205–230.<br />
Wolfinger, R. D. (1997), “An Example of Using Mixed Models and PROC <strong>MIXED</strong> for Longitudinal<br />
Data,” Journal of Biopharmaceutical Statistics, 7(4), 481–500.<br />
Wolfinger, R. D. and Chang, M. (1995), “Comparing the <strong>SAS</strong> GLM and <strong>MIXED</strong> <strong>Procedure</strong>s for<br />
Repeated Measures,” Proceedings of the Twentieth Annual <strong>SAS</strong> Users Group Conference.<br />
Wolfinger, R. D., Tobias, R. D., and Sall, J. (1991), “Mixed Models: A Future Direction,” Proceedings<br />
of the Sixteenth Annual <strong>SAS</strong> Users Group Conference, 1380–1388.<br />
Wolfinger, R. D., Tobias, R. D., and Sall, J. (1994), “Computing Gaussian Likelihoods and <strong>The</strong>ir<br />
Derivatives for General Linear Mixed Models,” SIAM Journal on Scientific Computing, 15(6),<br />
1294–1310.<br />
Wright, P. S. (1994), “Adjusted F Tests for Repeated Measures with the <strong>MIXED</strong> <strong>Procedure</strong>,” 328<br />
SMC-Statistics Department, University of Tennessee.<br />
Zimmerman, D. L. and Harville, D. A. (1991), “A Random Field Approach to the Analysis of<br />
Field-Plot Experiments and Other Spatial Experiments,” Biometrics, 47, 223–239.
Subject Index<br />
2D geometric anisotropic structure<br />
<strong>MIXED</strong> procedure, 3953<br />
Akaike’s information criterion<br />
example (<strong>MIXED</strong>), 4011, 4025, 4054<br />
<strong>MIXED</strong> procedure, 3901, 3970, 3991<br />
Akaike’s information criterion (finite sample<br />
corrected version)<br />
<strong>MIXED</strong> procedure, 3901, 3991<br />
alpha level<br />
<strong>MIXED</strong> procedure, 3899, 3915, 3919, 3923,<br />
3944<br />
anisotropic power covariance structure<br />
<strong>MIXED</strong> procedure, 3954<br />
anisotropic spatial power structure<br />
<strong>MIXED</strong> procedure, 3954<br />
ANTE(1) structure<br />
<strong>MIXED</strong> procedure, 3953<br />
ante-dependence structure<br />
<strong>MIXED</strong> procedure, 3953<br />
AR(1) structure<br />
<strong>MIXED</strong> procedure, 3953<br />
asymptotic covariance<br />
<strong>MIXED</strong> procedure, 3899<br />
at sign (@) operator<br />
<strong>MIXED</strong> procedure, 3977, 4049<br />
autoregressive moving-average structure<br />
<strong>MIXED</strong> procedure, 3953<br />
autoregressive structure<br />
example (<strong>MIXED</strong>), 4019<br />
<strong>MIXED</strong> procedure, 3953<br />
banded Toeplitz structure<br />
<strong>MIXED</strong> procedure, 3953<br />
bar (|) operator<br />
<strong>MIXED</strong> procedure, 3976, 3977, 4049<br />
Bayesian analysis<br />
<strong>MIXED</strong> procedure, 3939<br />
BLUE<br />
<strong>MIXED</strong> procedure, 3971<br />
BLUP<br />
<strong>MIXED</strong> procedure, 3971<br />
Bonferroni adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
boundary constraints<br />
<strong>MIXED</strong> procedure, 3938, 3939, 4005<br />
CALIS procedure<br />
compared to <strong>MIXED</strong> procedure, 3889<br />
chi-square test<br />
<strong>MIXED</strong> procedure, 3913, 3923<br />
class level<br />
<strong>MIXED</strong> procedure, 3903, 3989<br />
classification variables<br />
<strong>MIXED</strong> procedure, 3910<br />
compound symmetry structure<br />
example (<strong>MIXED</strong>), 3964, 4020, 4025<br />
<strong>MIXED</strong> procedure, 3953<br />
computational details<br />
<strong>MIXED</strong> procedure, 4004<br />
computational problems<br />
convergence (<strong>MIXED</strong>), 4005<br />
conditional residuals<br />
<strong>MIXED</strong> procedure, 3981<br />
confidence limits<br />
<strong>MIXED</strong> procedure, 3900<br />
constraints<br />
boundary (<strong>MIXED</strong>), 3938, 3939<br />
containment method<br />
<strong>MIXED</strong> procedure, 3924, 3925<br />
continuous-by-class effects<br />
<strong>MIXED</strong> procedure, 3978<br />
continuous-nesting-class effects<br />
<strong>MIXED</strong> procedure, 3978<br />
contrasts<br />
<strong>MIXED</strong> procedure, 3911, 3914<br />
convergence criterion<br />
<strong>MIXED</strong> procedure, 3899, 3900, 3990, 4006<br />
convergence problems<br />
<strong>MIXED</strong> procedure, 4005<br />
convergence status<br />
<strong>MIXED</strong> procedure, 3990<br />
Cook’s D<br />
<strong>MIXED</strong> procedure, 3985<br />
Cook’s D for covariance parameters<br />
<strong>MIXED</strong> procedure, 3985<br />
correlation<br />
estimates (<strong>MIXED</strong>), 3945, 3947, 3952, 4021<br />
covariance<br />
parameter estimates (<strong>MIXED</strong>), 3900, 3901<br />
parameter estimates, ratio (<strong>MIXED</strong>), 3909<br />
parameters (<strong>MIXED</strong>), 3886<br />
covariance parameter estimates<br />
<strong>MIXED</strong> procedure, 3991<br />
covariance structure<br />
anisotropic power (<strong>MIXED</strong>), 3960<br />
ante-dependence (<strong>MIXED</strong>), 3956
autoregressive (<strong>MIXED</strong>), 3957<br />
autoregressive moving-average (<strong>MIXED</strong>),<br />
3957<br />
banded (<strong>MIXED</strong>), 3960<br />
compound symmetry (<strong>MIXED</strong>), 3957<br />
equi-correlation (<strong>MIXED</strong>), 3957<br />
examples (<strong>MIXED</strong>), 3954<br />
exponential anisotropic (<strong>MIXED</strong>), 3958<br />
factor-analytic (<strong>MIXED</strong>), 3957<br />
general linear (<strong>MIXED</strong>), 3958<br />
heterogeneous autoregressive (<strong>MIXED</strong>),<br />
3957<br />
heterogeneous compound symmetry<br />
(<strong>MIXED</strong>), 3957<br />
heterogeneous Toeplitz (<strong>MIXED</strong>), 3960<br />
Huynh-Feldt (<strong>MIXED</strong>), 3958<br />
Kronecker (<strong>MIXED</strong>), 3961<br />
Matérn (<strong>MIXED</strong>), 3959<br />
<strong>MIXED</strong> procedure, 3888, 3953<br />
power (<strong>MIXED</strong>), 3960<br />
simple (<strong>MIXED</strong>), 3958<br />
spatial geometric anisotropic (<strong>MIXED</strong>),<br />
3959<br />
Toeplitz (<strong>MIXED</strong>), 3960<br />
unstructured (<strong>MIXED</strong>), 3960<br />
unstructured, correlation (<strong>MIXED</strong>), 3960<br />
variance components (<strong>MIXED</strong>), 3961<br />
covariance structures<br />
examples (<strong>MIXED</strong>), 4013<br />
covariates<br />
<strong>MIXED</strong> procedure, 3976<br />
CovRatio<br />
<strong>MIXED</strong> procedure, 3986<br />
CovRatio for covariance parameters<br />
<strong>MIXED</strong> procedure, 3986<br />
CovTrace<br />
<strong>MIXED</strong> procedure, 3986<br />
CovTrace for covariance parameters<br />
<strong>MIXED</strong> procedure, 3986<br />
crossed effects<br />
<strong>MIXED</strong> procedure, 3977<br />
default output<br />
<strong>MIXED</strong> procedure, 3989<br />
degrees of freedom<br />
between-within method (<strong>MIXED</strong>), 3901,<br />
3925<br />
containment method (<strong>MIXED</strong>), 3924, 3925<br />
Kenward-Roger method (<strong>MIXED</strong>), 3927<br />
method (<strong>MIXED</strong>), 3924<br />
<strong>MIXED</strong> procedure, 3913, 3915, 3920, 3924<br />
residual method (<strong>MIXED</strong>), 3925<br />
Satterthwaite method (<strong>MIXED</strong>), 3925<br />
DFFITS<br />
<strong>MIXED</strong> procedure, 3985<br />
dimension information<br />
<strong>MIXED</strong> procedure, 3989<br />
dimensions<br />
<strong>MIXED</strong> procedure, 3903<br />
direct product structure<br />
<strong>MIXED</strong> procedure, 3953<br />
Dunnett’s adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
EBLUP<br />
<strong>MIXED</strong> procedure, 3934<br />
effect<br />
name length (<strong>MIXED</strong>), 3903<br />
empirical best linear unbiased prediction<br />
<strong>MIXED</strong> procedure, 3934<br />
empirical estimator<br />
<strong>MIXED</strong> procedure, 3901<br />
estimability<br />
<strong>MIXED</strong> procedure, 3911<br />
estimable functions<br />
<strong>MIXED</strong> procedure, 3933<br />
estimation<br />
mixed model (<strong>MIXED</strong>), 3968<br />
estimation methods<br />
<strong>MIXED</strong> procedure, 3902<br />
examples, <strong>MIXED</strong><br />
ASYCOV matrix, 4017<br />
asymptotic covariance of covariance<br />
parameters, 4017<br />
autoregressive structure, R-side, 4019<br />
box plots, 4070<br />
box plots, paneling, 3908<br />
broad inference space, 3912, 3914<br />
compound symmetry, G-side setup, 3965,<br />
4023<br />
compound symmetry, R-side setup, 3965,<br />
4020<br />
constrained anisotropic model, 3958<br />
covariates in LS-mean construction, 3919<br />
COVTEST option, 4014, 4026<br />
deletion estimates, 4055<br />
doubly repeated measure, 3961<br />
estimate, with subject, 3915<br />
fat absorption data, 4055<br />
fixed-effect solutions, 4041<br />
full-rank parameterization, 4018<br />
GDATA= option in RANDOM statement,<br />
4034<br />
geometrically anisotropic model, 3959<br />
getting started, 3890<br />
GLM procedure, split-plot design, 4011<br />
graphics, box plots, 4070<br />
graphics, influence diagnostics, 4055, 4066
graphics, residual panel, 3998<br />
graphics, studentized residual panel, 3998<br />
GROUP= effect in RANDOM statement,<br />
4023<br />
height data, 3890<br />
holding covariance parameters fixed, 3938,<br />
3958, 3959<br />
IML procedure, reading ASYCOV, 4017<br />
inference space, broad, 3912, 3914<br />
inference space, intermediate, 3914<br />
inference space, narrow, 3912, 3914<br />
inference spaces, 4012<br />
influence analysis, iterative, 4055, 4066<br />
influence analysis, non-iterative, 4065<br />
influence analysis, set deletion, 4065, 4066<br />
influence analysis, tuples, 3931<br />
intermediate inference space, 3914<br />
known covariance parameters, 3937<br />
known G and R matrix, 4034<br />
Kronecker covariance structure, 3961<br />
L-components, 3933, 4073, 4076<br />
least squares mean, slice, 3922<br />
least squares means, AT option, 3919<br />
least squares means, covariate, 3919<br />
least squares means, differences against<br />
control, 3920<br />
line-source sprinkler data, 4049<br />
local power-of-mean model, 3950<br />
maximum likelihood estimation, 4014<br />
mixed model equations, 4026, 4034<br />
mixed model equations, solution, 4026,<br />
4034<br />
multiple plot requests, 3909<br />
multiple traits data, 4033<br />
multivariate analysis, 3961<br />
narrow inference space, 3912, 3914<br />
nested error structure, 4045<br />
nested random effects, 3893<br />
NOITER option, 3937, 4034<br />
oven data (Hemmerle and Hartley, 1973),<br />
4026<br />
parameter grid search, 4026<br />
pharmaceutical stability data, 4041<br />
polynomial model, 4076<br />
POM data set, 3950<br />
POM fitting, iterated, 3951<br />
Potthoff and Roy growth measurements,<br />
4013, 4065<br />
random coefficient model, 4019, 4041, 4069<br />
random-effect solutions, 4041<br />
residual panel, 3998<br />
row-wise multiplicity adjustment, 3918<br />
Satterthwaite method, 3918<br />
set deletion, 4066<br />
SGRENDER procedure, 4032<br />
slice F test, 3922<br />
spatial power structure, 4054<br />
specifying lower bounds, 3938<br />
specifying values for degrees of freedom,<br />
3924<br />
split-plot design, 3966, 4009<br />
split-plot design, data, 4008, 4073<br />
split-plot design, equivalent model, 4013<br />
starting values, 4026<br />
studentized maximum modulus, 3918<br />
studentized residual panel, 3998<br />
subject and no-subject formulation, 3965<br />
subject contrasts, 3915<br />
subject v. no-subject formulation, 4013<br />
subject-specific R matrices, 3951<br />
subject-specific V matrices, 3947<br />
Toeplitz structure, 4049<br />
tuples, influence analysis, 3931<br />
two-way analysis of variance, 3890<br />
unstructured covariance, G-side, 3947<br />
unstructured covariance, R-side, 4014, 4065<br />
varying covariance parameters, 4023<br />
exponential covariance structure<br />
<strong>MIXED</strong> procedure, 3954<br />
external studentization<br />
<strong>MIXED</strong> procedure, 3981<br />
factor analytic structures<br />
<strong>MIXED</strong> procedure, 3953<br />
Fisher information matrix<br />
example (<strong>MIXED</strong>), 4026<br />
<strong>MIXED</strong> procedure, 3991<br />
Fisher’s scoring method<br />
<strong>MIXED</strong> procedure, 3899, 3909, 4006<br />
fixed effects<br />
<strong>MIXED</strong> procedure, 3888<br />
fixed-effects parameters<br />
<strong>MIXED</strong> procedure, 3886, 3963<br />
G matrix<br />
<strong>MIXED</strong> procedure, 3888, 3943, 3944, 3963,<br />
3964, 4046<br />
Gaussian covariance structure<br />
<strong>MIXED</strong> procedure, 3954<br />
general linear covariance structure<br />
<strong>MIXED</strong> procedure, 3953<br />
generalized inverse, 3971<br />
<strong>MIXED</strong> procedure, 3913<br />
GLM procedure<br />
compared to other procedures, 3889<br />
gradient<br />
<strong>MIXED</strong> procedure, 3900, 3990<br />
grid search
example (<strong>MIXED</strong>), 4026<br />
growth curve analysis<br />
example (<strong>MIXED</strong>), 3964<br />
Hannan-Quinn information criterion<br />
<strong>MIXED</strong> procedure, 3901<br />
Hessian matrix<br />
<strong>MIXED</strong> procedure, 3899, 3900, 3909, 3938,<br />
3990, 3991, 4005, 4006, 4017, 4026<br />
heterogeneity<br />
example (<strong>MIXED</strong>), 4023<br />
<strong>MIXED</strong> procedure, 3945, 3949<br />
heterogeneous<br />
AR(1) structure (<strong>MIXED</strong>), 3953<br />
compound-symmetry structure (<strong>MIXED</strong>),<br />
3953<br />
covariance structures (<strong>MIXED</strong>), 3962<br />
Toeplitz structure (<strong>MIXED</strong>), 3953<br />
hierarchical model<br />
example (<strong>MIXED</strong>), 4041<br />
Hotelling-Lawley-McKeon statistic<br />
<strong>MIXED</strong> procedure, 3949<br />
Hotelling-Lawley-Pillai-Samson statistic<br />
<strong>MIXED</strong> procedure, 3949<br />
Hsu’s adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
Huynh-Feldt<br />
structure (<strong>MIXED</strong>), 3953<br />
hypothesis tests<br />
mixed model (<strong>MIXED</strong>), 3972, 3992<br />
inference<br />
mixed model (<strong>MIXED</strong>), 3972<br />
space, mixed model (<strong>MIXED</strong>), 3911, 3912,<br />
3914, 4012<br />
infinite likelihood<br />
<strong>MIXED</strong> procedure, 3948, 4005, 4006<br />
influence diagnostics<br />
<strong>MIXED</strong> procedure, 3982<br />
influence diagnostics, details<br />
<strong>MIXED</strong> procedure, 3980<br />
influence plots<br />
<strong>MIXED</strong> procedure, 4000<br />
information criteria<br />
<strong>MIXED</strong> procedure, 3901<br />
initial values<br />
<strong>MIXED</strong> procedure, 3937<br />
interaction effects<br />
<strong>MIXED</strong> procedure, 3977<br />
intercept<br />
<strong>MIXED</strong> procedure, 3976<br />
internal studentization<br />
<strong>MIXED</strong> procedure, 3981<br />
intraclass correlation coefficient<br />
<strong>MIXED</strong> procedure, 4021<br />
iteration history<br />
<strong>MIXED</strong> procedure, 3990<br />
iterations<br />
history (<strong>MIXED</strong>), 3990<br />
Kenward-Roger method<br />
<strong>MIXED</strong> procedure, 3927<br />
Kronecker product structure<br />
<strong>MIXED</strong> procedure, 3953<br />
L matrices<br />
mixed model (<strong>MIXED</strong>), 3911, 3916, 3972<br />
<strong>MIXED</strong> procedure, 3911, 3916, 3972<br />
LATTICE procedure<br />
compared to <strong>MIXED</strong> procedure, 3889<br />
least squares means<br />
Bonferroni adjustment (<strong>MIXED</strong>), 3918<br />
BYLEVEL processing (<strong>MIXED</strong>), 3920<br />
comparison types (<strong>MIXED</strong>), 3920<br />
covariate values (<strong>MIXED</strong>), 3919<br />
Dunnett’s adjustment (<strong>MIXED</strong>), 3918<br />
examples (<strong>MIXED</strong>), 4026, 4050<br />
Hsu’s adjustment (<strong>MIXED</strong>), 3918<br />
mixed model (<strong>MIXED</strong>), 3916<br />
multiple comparison adjustment (<strong>MIXED</strong>),<br />
3917, 3918<br />
nonstandard weights (<strong>MIXED</strong>), 3921<br />
observed margins (<strong>MIXED</strong>), 3921<br />
Sidak’s adjustment (<strong>MIXED</strong>), 3918<br />
simple effects (<strong>MIXED</strong>), 3922<br />
simulation-based adjustment (<strong>MIXED</strong>),<br />
3918<br />
Tukey’s adjustment (<strong>MIXED</strong>), 3918<br />
leverage<br />
<strong>MIXED</strong> procedure, 3984<br />
likelihood distance<br />
<strong>MIXED</strong> procedure, 3987<br />
likelihood ratio test, 4011<br />
example (<strong>MIXED</strong>), 4025<br />
mixed model (<strong>MIXED</strong>), 3972, 3973<br />
<strong>MIXED</strong> procedure, 3992<br />
linear covariance structure<br />
<strong>MIXED</strong> procedure, 3953<br />
log-linear variance model<br />
<strong>MIXED</strong> procedure, 3950<br />
main effects<br />
<strong>MIXED</strong> procedure, 3976<br />
marginal residuals<br />
<strong>MIXED</strong> procedure, 3981<br />
Matérn covariance structure<br />
<strong>MIXED</strong> procedure, 3953<br />
matrix<br />
notation, theory (<strong>MIXED</strong>), 3962
maximum likelihood estimation<br />
mixed model (<strong>MIXED</strong>), 3969<br />
MDFFITS<br />
<strong>MIXED</strong> procedure, 3985<br />
MDFFITS for covariance parameters<br />
<strong>MIXED</strong> procedure, 3986<br />
memory requirements<br />
<strong>MIXED</strong> procedure, 4007<br />
missing level combinations<br />
<strong>MIXED</strong> procedure, 3980<br />
mixed model (<strong>MIXED</strong>), see also <strong>MIXED</strong><br />
procedure<br />
estimation, 3968<br />
formulation, 3963<br />
hypothesis tests, 3972, 3992<br />
inference, 3972<br />
inference space, 3911, 3912, 3914, 4012<br />
least squares means, 3916<br />
likelihood ratio test, 3972, 3973<br />
linear model, 3886<br />
maximum likelihood estimation, 3969<br />
notation, 3888<br />
objective function, 3990<br />
parameterization, 3975<br />
predicted values, 3916<br />
restricted maximum likelihood, 4010<br />
theory, 3962<br />
Wald test, 3972, 4017<br />
mixed model equations<br />
example (<strong>MIXED</strong>), 4026<br />
<strong>MIXED</strong> procedure, 3903, 3970<br />
<strong>MIXED</strong> <strong>Procedure</strong><br />
leverage, 3984<br />
PRESS Residual, 3984<br />
PRESS Statistic, 3984<br />
<strong>MIXED</strong> procedure, see also mixed model<br />
2D geometric anisotropic structure, 3953<br />
Akaike’s information criterion, 3901, 3970,<br />
3991<br />
Akaike’s information criterion (finite sample<br />
corrected version), 3901, 3991<br />
alpha level, 3899, 3915, 3919, 3923, 3944<br />
anisotropic power covariance structure, 3954<br />
anisotropic spatial power structure, 3954<br />
ANTE(1) structure, 3953<br />
ante-dependence structure, 3953<br />
AR(1) structure, 3953<br />
ARIMA procedure, compared, 3889<br />
ARMA structure, 3953<br />
assumptions, 3886<br />
asymptotic covariance, 3899<br />
AUTOREG procedure, compared, 3889<br />
autoregressive moving-average structure,<br />
3953<br />
autoregressive structure, 3953, 4019<br />
banded Toeplitz structure, 3953<br />
basic features, 3887<br />
Bayesian analysis, 3939<br />
between-within method, 3901, 3925<br />
BLUE, 3971<br />
BLUP, 3971, 4040<br />
Bonferroni adjustment, 3918<br />
boundary constraints, 3938, 3939, 4005<br />
BYLEVEL processing of LSMEANS, 3920<br />
CALIS procedure, compared, 3889<br />
chi-square test, 3913, 3923<br />
Cholesky root, 3935, 3982, 4004<br />
class level, 3903, 3989<br />
classification variables, 3910<br />
compound symmetry structure, 3953, 3964,<br />
4020, 4025<br />
computational details, 4004<br />
computational order, 4004<br />
conditional residuals, 3981<br />
confidence interval, 3915, 3944<br />
confidence limits, 3900, 3915, 3920, 3923,<br />
3944<br />
containment method, 3924, 3925<br />
continuous effects, 3945, 3946, 3949, 3952<br />
continuous-by-class effects, 3978<br />
continuous-nesting-class effects, 3978<br />
contrasted <strong>SAS</strong> procedures, 3889<br />
contrasts, 3911, 3914<br />
convergence criterion, 3899, 3900, 3990,<br />
4006<br />
convergence problems, 4005<br />
convergence status, 3990<br />
Cook’s D, 3985<br />
Cook’s D for covariance parameters, 3985<br />
correlation estimates, 3945, 3947, 3952,<br />
4021<br />
correlations of least squares means, 3920<br />
covariance parameter estimates, 3900, 3901,<br />
3991<br />
covariance parameter estimates, ratio, 3909<br />
covariance parameters, 3886<br />
covariance structure, 3888, 3953, 3954,<br />
4013<br />
covariances of least squares means, 3920<br />
covariate values for LSMEANS, 3919<br />
covariates, 3976<br />
CovRatio, 3986<br />
CovRatio for covariance parameters, 3986<br />
CovTrace, 3986<br />
CovTrace for covariance parameters, 3986<br />
CPU requirements, 4007<br />
crossed effects, 3977<br />
default output, 3989<br />
degrees of freedom, 3912–3916, 3920, 3924,<br />
3936, 3973, 3980, 3992, 4005, 4040<br />
DFFITS, 3985<br />
dimension information, 3989<br />
dimensions, 3902, 3903<br />
direct product structure, 3953<br />
Dunnett’s adjustment, 3918<br />
EBLUPs, 3946, 3971, 4032, 4047<br />
effect name length, 3903<br />
empirical best linear unbiased prediction,<br />
3934<br />
empirical estimator, 3901<br />
estimability, 3911, 3913–3916, 3921, 3936,<br />
3972, 3980<br />
estimable functions, 3933<br />
estimation methods, 3902<br />
exponential covariance structure, 3954<br />
factor analytic structures, 3953<br />
Fisher information matrix, 3991, 4026<br />
Fisher’s scoring method, 3899, 3909, 4006<br />
fitting information, 3991, 3992<br />
fixed effects, 3888<br />
fixed-effects parameters, 3886, 3936, 3963<br />
fixed-effects variance matrix, 3936<br />
function evaluations, 3902<br />
G matrix, 3888, 3943, 3944, 3963, 3964,<br />
4046<br />
Gaussian covariance structure, 3954<br />
general linear covariance structure, 3953<br />
generalized inverse, 3913, 3971<br />
GLIMMIX procedure, compared, 3890<br />
gradient, 3900, 3990<br />
grid search, 3937, 4026<br />
growth curve analysis, 3964<br />
Hannan-Quinn information criterion, 3901<br />
Hessian matrix, 3899, 3900, 3909, 3938,<br />
3990, 3991, 4005, 4006, 4017, 4026<br />
heterogeneity, 3945, 3949, 4023<br />
heterogeneous AR(1) structure, 3953<br />
heterogeneous compound-symmetry<br />
structure, 3953<br />
heterogeneous covariance structures, 3962<br />
heterogeneous Toeplitz structure, 3953<br />
hierarchical model, 4041<br />
Hotelling-Lawley-McKeon statistic, 3949<br />
Hotelling-Lawley-Pillai-Sampson statistic,<br />
3949<br />
Hsu’s adjustment, 3918<br />
Huynh-Feldt structure, 3953<br />
infinite likelihood, 3948, 4005, 4006<br />
influence diagnostics, 3932, 3982<br />
influence plots, 4000<br />
information criteria, 3901<br />
initial values, 3937<br />
input data sets, 3901<br />
interaction effects, 3977<br />
intercept, 3976<br />
intercept effect, 3934, 3943<br />
intraclass correlation coefficient, 4021<br />
introductory example, 3890<br />
iteration history, 3990<br />
iterations, 3902, 3990<br />
Kenward-Roger method, 3927<br />
Kronecker product structure, 3953<br />
LATTICE procedure, compared, 3889<br />
least squares means, 3920, 4026, 4050<br />
leave-one-out estimates, 4000<br />
likelihood distance, 3987<br />
likelihood ratio test, 3992<br />
linear covariance structure, 3953<br />
log-linear variance model, 3950<br />
main effects, 3976<br />
marginal residuals, 3981<br />
Matérn covariance structure, 3953<br />
matrix notation, 3962<br />
MDFFITS, 3985<br />
MDFFITS for covariance parameters, 3986<br />
memory requirements, 4007<br />
missing level combinations, 3980<br />
mixed linear model, 3886<br />
mixed model, 3963<br />
mixed model equations, 3903, 3970, 4026<br />
mixed model theory, 3962<br />
model information, 3903, 3989<br />
model selection, 3970<br />
multilevel model, 4041<br />
multiple comparisons of least squares<br />
means, 3917, 3918, 3920<br />
multiple tables, 3995<br />
multiplicity adjustment, 3917<br />
multivariate tests, 3949<br />
nested effects, 3977<br />
nested error structure, 4045<br />
NESTED procedure, compared, 3889<br />
Newton-Raphson algorithm, 3969<br />
non-full-rank parameterization, 3889, 3950,<br />
3980<br />
nonstandard weights for LSMEANS, 3921<br />
nugget effect, 3950<br />
number of observations, 3989<br />
oblique projector, 3984<br />
observed margins for LSMEANS, 3921<br />
ODS graph names, 4002<br />
ODS Graphics, 3905, 3998<br />
ODS table names, 3993<br />
ordering of effects, 3904, 3979<br />
over-parameterization, 3976<br />
parameter constraints, 3938, 4005<br />
parameterization, 3975<br />
Pearson residual, 3935<br />
pharmaceutical stability, example, 4041<br />
plotting the likelihood, 4032<br />
polynomial effects, 3976<br />
power-of-the-mean model, 3950<br />
predicted means, 3935<br />
predicted value confidence intervals, 3923<br />
predicted values, 3934, 4026<br />
prior density, 3940<br />
profiling residual variance, 3904, 3938,<br />
3950, 3969, 4004<br />
R matrix, 3888, 3948, 3951, 3963, 3964<br />
random coefficients, 4019, 4041<br />
random effects, 3888, 3943<br />
random-effects parameters, 3887, 3946,<br />
3963<br />
regression effects, 3976<br />
rejection sampling, 3941<br />
repeated measures, 3887, 3948, 4013<br />
residual diagnostics, details, 3980<br />
residual method, 3925<br />
residual plots, 3998<br />
residual variance tolerance, 3935<br />
restricted maximum likelihood (REML),<br />
3887<br />
ridging, 3909, 3969<br />
sandwich estimator, 3901<br />
Satterthwaite method, 3925<br />
scaled residual, 3936, 3982<br />
Schwarz’s Bayesian information criterion,<br />
3901, 3970, 3991<br />
scoring, 3899, 3909, 4006<br />
Sidak’s adjustment, 3918<br />
simple effects, 3922<br />
simulation-based adjustment, 3918<br />
singularities, 4006<br />
spatial anisotropic exponential structure,<br />
3953<br />
spatial covariance structure, 3954, 3962,<br />
4005<br />
split-plot design, 3966, 4008<br />
standard linear model, 3888<br />
statement positions, 3896<br />
studentized residual, 3935, 3985<br />
subject effect, 3912, 3946, 3952, 4007, 4013<br />
summary of commands, 3897<br />
sweep operator, 3985, 4004<br />
table names, 3993<br />
test components, 3933<br />
Toeplitz structure, 3953, 4050<br />
TSCSREG procedure, compared, 3889<br />
Tukey’s adjustment, 3918<br />
Type 1 estimation, 3902<br />
Type 1 testing, 3928<br />
Type 2 estimation, 3902<br />
Type 2 testing, 3928<br />
Type 3 estimation, 3902<br />
Type 3 testing, 3928, 3992<br />
unstructured correlations, 3953<br />
unstructured covariance matrix, 3953<br />
unstructured R matrix, 3952<br />
V matrix, 3947<br />
VARCOMP procedure, example, 4026<br />
variance components, 3887, 3953<br />
variance ratios, 3938, 3946<br />
Wald test, 3991, 3992<br />
weighted LSMEANS, 3921<br />
weighting, 3962<br />
zero design columns, 3927<br />
zero variance component estimates, 4005<br />
model<br />
information (<strong>MIXED</strong>), 3903<br />
model information<br />
<strong>MIXED</strong> procedure, 3989<br />
model selection<br />
<strong>MIXED</strong> procedure, 3970<br />
multilevel model<br />
example (<strong>MIXED</strong>), 4041<br />
multiple comparison adjustment (<strong>MIXED</strong>)<br />
least squares means, 3917, 3918<br />
multiple comparisons of least squares means<br />
<strong>MIXED</strong> procedure, 3917, 3918, 3920<br />
multiple tables<br />
<strong>MIXED</strong> procedure, 3995<br />
multiplicity adjustment<br />
<strong>MIXED</strong> procedure, 3917<br />
row-wise (<strong>MIXED</strong>), 3917<br />
multivariate tests<br />
<strong>MIXED</strong> procedure, 3949<br />
nested effects<br />
<strong>MIXED</strong> procedure, 3977<br />
nested error structure<br />
<strong>MIXED</strong> procedure, 4045<br />
NESTED procedure<br />
compared to other procedures, 3889<br />
Newton-Raphson algorithm<br />
<strong>MIXED</strong> procedure, 3969<br />
non-full-rank parameterization<br />
<strong>MIXED</strong> procedure, 3889, 3950, 3980<br />
nugget effect<br />
<strong>MIXED</strong> procedure, 3950<br />
number of observations<br />
<strong>MIXED</strong> procedure, 3989<br />
objective function<br />
mixed model (<strong>MIXED</strong>), 3990<br />
oblique projector<br />
<strong>MIXED</strong> procedure, 3984<br />
ODS graph names<br />
<strong>MIXED</strong> procedure, 4002<br />
ODS Graphics<br />
<strong>MIXED</strong> procedure, 3905, 3998<br />
options summary<br />
LSMEANS statement (<strong>MIXED</strong>), 3916<br />
MODEL statement (<strong>MIXED</strong>), 3922<br />
PROC <strong>MIXED</strong> statement, 3898<br />
RANDOM statement (<strong>MIXED</strong>), 3943<br />
REPEATED statement (<strong>MIXED</strong>), 3948<br />
over-parameterization<br />
<strong>MIXED</strong> procedure, 3976<br />
parameter constraints<br />
<strong>MIXED</strong> procedure, 3938, 4005<br />
parameterization<br />
mixed model (<strong>MIXED</strong>), 3975<br />
<strong>MIXED</strong> procedure, 3975<br />
Pearson residual<br />
<strong>MIXED</strong> procedure, 3935<br />
pharmaceutical stability<br />
example (<strong>MIXED</strong>), 4041<br />
plots<br />
likelihood (<strong>MIXED</strong>), 4032<br />
polynomial effects<br />
<strong>MIXED</strong> procedure, 3976<br />
power-of-the-mean model<br />
<strong>MIXED</strong> procedure, 3950<br />
predicted means<br />
<strong>MIXED</strong> procedure, 3935<br />
predicted value confidence intervals<br />
<strong>MIXED</strong> procedure, 3923<br />
predicted values<br />
example (<strong>MIXED</strong>), 4026<br />
mixed model (<strong>MIXED</strong>), 3916<br />
<strong>MIXED</strong> procedure, 3934<br />
PRESS residual<br />
<strong>MIXED</strong> <strong>Procedure</strong>, 3984<br />
PRESS statistic<br />
<strong>MIXED</strong> <strong>Procedure</strong>, 3984<br />
prior density<br />
<strong>MIXED</strong> procedure, 3940<br />
profiling residual variance<br />
<strong>MIXED</strong> procedure, 4004<br />
R matrix<br />
<strong>MIXED</strong> procedure, 3888, 3948, 3951, 3963,<br />
3964<br />
random coefficients<br />
example (<strong>MIXED</strong>), 4019, 4041<br />
random effects<br />
<strong>MIXED</strong> procedure, 3888, 3943<br />
random-effects parameters<br />
<strong>MIXED</strong> procedure, 3887, 3963<br />
regression effects<br />
<strong>MIXED</strong> procedure, 3976<br />
rejection sampling<br />
<strong>MIXED</strong> procedure, 3941<br />
REML, see restricted maximum likelihood<br />
repeated measures<br />
<strong>MIXED</strong> procedure, 3887, 3948, 4013<br />
residual maximum likelihood, see also restricted<br />
maximum likelihood<br />
residual maximum likelihood (REML)<br />
<strong>MIXED</strong> procedure, 3969, 4010<br />
residual plots<br />
<strong>MIXED</strong> procedure, 3998<br />
residuals, details<br />
<strong>MIXED</strong> procedure, 3980<br />
restricted maximum likelihood<br />
<strong>MIXED</strong> procedure, 3887, 3969, 4010<br />
ridging<br />
<strong>MIXED</strong> procedure, 3909, 3969<br />
sandwich estimator<br />
<strong>MIXED</strong> procedure, 3901<br />
Satterthwaite method<br />
<strong>MIXED</strong> procedure, 3925<br />
scaled residual<br />
<strong>MIXED</strong> procedure, 3936, 3982<br />
Schwarz’s Bayesian information criterion<br />
example (<strong>MIXED</strong>), 4011, 4025, 4054<br />
<strong>MIXED</strong> procedure, 3901, 3970, 3991<br />
scoring<br />
<strong>MIXED</strong> procedure, 3899, 3909, 4006<br />
Sidak’s adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
simple effects<br />
<strong>MIXED</strong> procedure, 3922<br />
simulation-based adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
singularities<br />
<strong>MIXED</strong> procedure, 4006<br />
spatial anisotropic exponential structure<br />
<strong>MIXED</strong> procedure, 3953<br />
spatial covariance structure<br />
examples (<strong>MIXED</strong>), 3954<br />
<strong>MIXED</strong> procedure, 3954, 3962, 4005<br />
split-plot design<br />
<strong>MIXED</strong> procedure, 3966, 4008<br />
standard linear model<br />
<strong>MIXED</strong> procedure, 3888<br />
studentized residual<br />
external, 3985<br />
internal, 3985<br />
<strong>MIXED</strong> procedure, 3935, 3985<br />
subject effect<br />
<strong>MIXED</strong> procedure, 3912, 3946, 3952, 4007,<br />
4013<br />
summary of commands<br />
<strong>MIXED</strong> procedure, 3897<br />
table names<br />
<strong>MIXED</strong> procedure, 3993<br />
test components<br />
<strong>MIXED</strong> procedure, 3933<br />
Toeplitz structure<br />
example (<strong>MIXED</strong>), 4050<br />
<strong>MIXED</strong> procedure, 3953<br />
Tukey’s adjustment<br />
<strong>MIXED</strong> procedure, 3918<br />
Type 1 estimation<br />
<strong>MIXED</strong> procedure, 3902<br />
Type 1 testing<br />
<strong>MIXED</strong> procedure, 3928<br />
Type 2 estimation<br />
<strong>MIXED</strong> procedure, 3902<br />
Type 2 testing<br />
<strong>MIXED</strong> procedure, 3928<br />
Type 3 estimation<br />
<strong>MIXED</strong> procedure, 3902<br />
Type 3 testing<br />
<strong>MIXED</strong> procedure, 3928, 3992<br />
unstructured correlations<br />
<strong>MIXED</strong> procedure, 3953<br />
unstructured covariance matrix<br />
<strong>MIXED</strong> procedure, 3953<br />
V matrix<br />
<strong>MIXED</strong> procedure, 3947<br />
VARCOMP procedure<br />
compared to <strong>MIXED</strong> procedure, 3889<br />
example (<strong>MIXED</strong>), 4026<br />
variance components<br />
<strong>MIXED</strong> procedure, 3887, 3953<br />
variance ratios<br />
<strong>MIXED</strong> procedure, 3938, 3946<br />
Wald test<br />
mixed model (<strong>MIXED</strong>), 3972, 4017<br />
<strong>MIXED</strong> procedure, 3991, 3992<br />
weighting<br />
<strong>MIXED</strong> procedure, 3962<br />
zero variance component estimates<br />
<strong>MIXED</strong> procedure, 4005<br />
Syntax Index<br />
ABSOLUTE option<br />
PROC <strong>MIXED</strong> statement, 3899, 3990<br />
ADJDFE= option<br />
LSMEANS statement (<strong>MIXED</strong>), 3917<br />
ADJUST= option<br />
LSMEANS statement (<strong>MIXED</strong>), 3918<br />
ALG= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
ALPHA= option<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement (<strong>MIXED</strong>), 3919<br />
PROC <strong>MIXED</strong> statement, 3899<br />
RANDOM statement (<strong>MIXED</strong>), 3944<br />
ALPHAP= option<br />
MODEL statement (<strong>MIXED</strong>), 3923<br />
ANOVAF option<br />
PROC <strong>MIXED</strong> statement, 3899<br />
ASYCORR option<br />
PROC <strong>MIXED</strong> statement, 3899<br />
ASYCOV option<br />
PROC <strong>MIXED</strong> statement, 3899, 4026<br />
AT MEANS option<br />
LSMEANS statement (<strong>MIXED</strong>), 3919<br />
AT option<br />
LSMEANS statement (<strong>MIXED</strong>), 3919, 3920<br />
BDATA= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
BY statement<br />
<strong>MIXED</strong> procedure, 3910<br />
BYLEVEL option<br />
LSMEANS statement (<strong>MIXED</strong>), 3920, 3921<br />
CHISQ option<br />
CONTRAST statement (<strong>MIXED</strong>), 3913<br />
MODEL statement (<strong>MIXED</strong>), 3923<br />
CL option<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement (<strong>MIXED</strong>), 3920<br />
MODEL statement (<strong>MIXED</strong>), 3923<br />
RANDOM statement (<strong>MIXED</strong>), 3944<br />
CL= option<br />
PROC <strong>MIXED</strong> statement, 3900<br />
CLASS statement<br />
<strong>MIXED</strong> procedure, 3910, 3989<br />
CONTAIN option<br />
MODEL statement (<strong>MIXED</strong>), 3924, 3925<br />
CONTRAST statement<br />
<strong>MIXED</strong> procedure, 3911<br />
CONVF option<br />
PROC <strong>MIXED</strong> statement, 3900, 3990<br />
CONVG option<br />
PROC <strong>MIXED</strong> statement, 3900, 3990<br />
CONVH option<br />
PROC <strong>MIXED</strong> statement, 3900, 3990<br />
CORR option<br />
LSMEANS statement (<strong>MIXED</strong>), 3920<br />
CORRB option<br />
MODEL statement (<strong>MIXED</strong>), 3924<br />
COV option<br />
LSMEANS statement (<strong>MIXED</strong>), 3920<br />
COVB option<br />
MODEL statement (<strong>MIXED</strong>), 3924<br />
COVBI option<br />
MODEL statement (<strong>MIXED</strong>), 3924<br />
COVTEST option<br />
PROC <strong>MIXED</strong> statement, 3901, 3991<br />
DATA= option<br />
PRIOR statement (<strong>MIXED</strong>), 3940<br />
PROC <strong>MIXED</strong> statement, 3901<br />
DDF= option<br />
MODEL statement (<strong>MIXED</strong>), 3924<br />
DDFM= option<br />
MODEL statement (<strong>MIXED</strong>), 3924<br />
DF= option<br />
CONTRAST statement (<strong>MIXED</strong>), 3913<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement (<strong>MIXED</strong>), 3920<br />
DFBW option<br />
PROC <strong>MIXED</strong> statement, 3901<br />
DIFF option<br />
LSMEANS statement (<strong>MIXED</strong>), 3920<br />
DIVISOR= option<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
E option<br />
CONTRAST statement (<strong>MIXED</strong>), 3913<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement (<strong>MIXED</strong>), 3921<br />
MODEL statement (<strong>MIXED</strong>), 3927<br />
E1 option<br />
MODEL statement (<strong>MIXED</strong>), 3927<br />
E2 option<br />
MODEL statement (<strong>MIXED</strong>), 3927<br />
E3 option<br />
MODEL statement (<strong>MIXED</strong>), 3927<br />
EFFECT= modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3929<br />
EMPIRICAL option<br />
PROC <strong>MIXED</strong> statement, 3901<br />
EQCONS= option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
ESTIMATE statement<br />
<strong>MIXED</strong> procedure, 3914<br />
ESTIMATES modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3929<br />
FLAT option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
FULLX option<br />
MODEL statement (<strong>MIXED</strong>), 3919, 3927<br />
G option<br />
RANDOM statement (<strong>MIXED</strong>), 3944<br />
GC option<br />
RANDOM statement (<strong>MIXED</strong>), 3944<br />
GCI option<br />
RANDOM statement (<strong>MIXED</strong>), 3944<br />
GCORR option<br />
RANDOM statement (<strong>MIXED</strong>), 3945<br />
GDATA= option<br />
RANDOM statement (<strong>MIXED</strong>), 3945<br />
GI option<br />
RANDOM statement (<strong>MIXED</strong>), 3945<br />
GRID= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
GRIDT= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
GROUP option<br />
CONTRAST statement (<strong>MIXED</strong>), 3913<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
GROUP= option<br />
RANDOM statement (<strong>MIXED</strong>), 3945<br />
REPEATED statement (<strong>MIXED</strong>), 3949<br />
HLM option<br />
REPEATED statement (<strong>MIXED</strong>), 3949<br />
HLPS option<br />
REPEATED statement (<strong>MIXED</strong>), 3949<br />
HOLD= option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
HTYPE= option<br />
MODEL statement (<strong>MIXED</strong>), 3928<br />
IC option<br />
PROC <strong>MIXED</strong> statement, 3901<br />
ID statement<br />
<strong>MIXED</strong> procedure, 3916<br />
IFACTOR= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
INFLUENCE option<br />
MODEL statement (<strong>MIXED</strong>), 3928<br />
INFO option<br />
PROC <strong>MIXED</strong> statement, 3902<br />
INTERCEPT option<br />
MODEL statement (<strong>MIXED</strong>), 3933<br />
ITDETAILS option<br />
PROC <strong>MIXED</strong> statement, 3902<br />
ITER= modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3929<br />
JEFFREYS option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
KEEP= modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3930<br />
LCOMPONENTS option<br />
MODEL statement (<strong>MIXED</strong>), 3933<br />
LDATA= option<br />
RANDOM statement (<strong>MIXED</strong>), 3945<br />
REPEATED statement (<strong>MIXED</strong>), 3950<br />
LOCAL= option<br />
REPEATED statement (<strong>MIXED</strong>), 3950<br />
LOCALW option<br />
REPEATED statement (<strong>MIXED</strong>), 3951<br />
LOGDETH option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
LOGNOTE option<br />
PROC <strong>MIXED</strong> statement, 3902<br />
LOGNOTE= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
LOGRBOUND= option<br />
PRIOR statement (<strong>MIXED</strong>), 3941<br />
LOWERB= option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
LOWERTAILED option<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement<br />
<strong>MIXED</strong> procedure, 3916<br />
MAXFUNC= option<br />
PROC <strong>MIXED</strong> statement, 3902<br />
MAXITER= option<br />
PROC <strong>MIXED</strong> statement, 3902<br />
METHOD= option<br />
PROC <strong>MIXED</strong> statement, 3902, 4014<br />
<strong>MIXED</strong> procedure, 3896<br />
INFLUENCE option, 3928<br />
syntax, 3896<br />
<strong>MIXED</strong> procedure, BY statement, 3910<br />
<strong>MIXED</strong> procedure, CLASS statement, 3910,<br />
3989<br />
TRUNCATE option, 3911<br />
<strong>MIXED</strong> procedure, CONTRAST statement, 3911<br />
CHISQ option, 3913<br />
DF= option, 3913<br />
E option, 3913<br />
GROUP option, 3913<br />
SINGULAR= option, 3914<br />
SUBJECT option, 3914<br />
<strong>MIXED</strong> procedure, ESTIMATE statement, 3914<br />
ALPHA= option, 3915<br />
CL option, 3915<br />
DF= option, 3915<br />
DIVISOR= option, 3915<br />
E option, 3915<br />
GROUP option, 3915<br />
LOWERTAILED option, 3915<br />
SINGULAR= option, 3915<br />
SUBJECT option, 3915<br />
UPPERTAILED option, 3916<br />
<strong>MIXED</strong> procedure, ID statement, 3916<br />
<strong>MIXED</strong> procedure, LSMEANS statement, 3916,<br />
4026<br />
ADJUST= option, 3918<br />
ALPHA= option, 3919<br />
AT MEANS option, 3919<br />
AT option, 3919, 3920<br />
BYLEVEL option, 3920, 3921<br />
CL option, 3920<br />
CORR option, 3920<br />
COV option, 3920<br />
DF= option, 3920<br />
DIFF option, 3920<br />
E option, 3921<br />
OBSMARGINS option, 3921<br />
PDIFF option, 3920, 3921<br />
SINGULAR= option, 3922<br />
SLICE= option, 3922<br />
<strong>MIXED</strong> procedure, MODEL statement, 3922<br />
ALPHAP= option, 3923<br />
CHISQ option, 3923<br />
CL option, 3923<br />
CONTAIN option, 3924, 3925<br />
CORRB option, 3924<br />
COVB option, 3924<br />
COVBI option, 3924<br />
DDF= option, 3924<br />
DDFM= option, 3924<br />
E option, 3927<br />
E1 option, 3927<br />
E2 option, 3927<br />
E3 option, 3927<br />
FULLX option, 3919, 3927<br />
HTYPE= option, 3928<br />
INFLUENCE option, 3928<br />
INTERCEPT option, 3933<br />
LCOMPONENTS option, 3933<br />
NOCONTAIN option, 3934<br />
NOINT option, 3934, 3976<br />
NOTEST option, 3934<br />
ORDER= option, 3980<br />
OUTP= option, 4026<br />
OUTPRED= option, 3934<br />
OUTPREDM= option, 3935<br />
RESIDUAL option, 3935, 3982<br />
SINGCHOL= option, 3935<br />
SINGRES= option, 3935<br />
SINGULAR= option, 3936<br />
SOLUTION option, 3936, 3980<br />
VCIRY option, 3936, 3982<br />
XPVIX option, 3936<br />
XPVIXI option, 3936<br />
ZETA= option, 3936<br />
<strong>MIXED</strong> procedure, MODEL statement,<br />
INFLUENCE option<br />
EFFECT=, 3929<br />
ESTIMATES, 3929<br />
ITER=, 3929<br />
KEEP=, 3930<br />
SELECT=, 3930<br />
SIZE=, 3931<br />
<strong>MIXED</strong> procedure, PARMS statement, 3937,<br />
4026<br />
EQCONS= option, 3938<br />
HOLD= option, 3938<br />
LOGDETH option, 3938<br />
LOWERB= option, 3938<br />
NOBOUND option, 3938<br />
NOITER option, 3938<br />
NOPROFILE option, 3938<br />
OLS option, 3939<br />
PARMSDATA= option, 3939<br />
PDATA= option, 3939<br />
RATIOS option, 3939<br />
UPPERB= option, 3939<br />
<strong>MIXED</strong> procedure, PRIOR statement, 3939<br />
ALG= option, 3941<br />
BDATA= option, 3941<br />
DATA= option, 3940<br />
FLAT option, 3941<br />
GRID= option, 3941<br />
GRIDT= option, 3941<br />
IFACTOR= option, 3941<br />
JEFFREYS option, 3941<br />
LOGNOTE= option, 3941<br />
LOGRBOUND= option, 3941<br />
NSAMPLE= option, 3942<br />
NSEARCH= option, 3942<br />
OUT= option, 3942<br />
OUTG= option, 3942<br />
OUTGT= option, 3942<br />
PSEARCH option, 3942<br />
PTRANS option, 3942<br />
SEED= option, 3942<br />
SFACTOR= option, 3942<br />
TDATA= option, 3943<br />
TRANS= option, 3943<br />
UPDATE= option, 3943<br />
<strong>MIXED</strong> procedure, PROC <strong>MIXED</strong> statement,<br />
3898<br />
ABSOLUTE option, 3899, 3990<br />
ALPHA= option, 3899<br />
ANOVAF option, 3899<br />
ASYCORR option, 3899<br />
ASYCOV option, 3899, 4026<br />
CL= option, 3900<br />
CONVF option, 3900, 3990<br />
CONVG option, 3900, 3990<br />
CONVH option, 3900, 3990<br />
COVTEST option, 3901, 3991<br />
DATA= option, 3901<br />
DFBW option, 3901<br />
IC option, 3901<br />
INFO option, 3902<br />
ITDETAILS option, 3902<br />
LOGNOTE option, 3902<br />
MAXFUNC= option, 3902<br />
MAXITER= option, 3902<br />
METHOD= option, 3902, 4014<br />
MMEQ option, 3903, 4026<br />
MMEQSOL option, 3903, 4026<br />
NAMELEN= option, 3903<br />
NOBOUND option, 3903<br />
NOCLPRINT option, 3903<br />
NOINFO option, 3903<br />
NOITPRINT option, 3903<br />
NOPROFILE option, 3904, 3969<br />
ORD option, 3904<br />
ORDER= option, 3904, 3976<br />
PLOT option, 3905<br />
PLOTS option, 3905<br />
RATIO option, 3909, 3991<br />
RIDGE= option, 3909<br />
SCORING= option, 3909<br />
SIGITER option, 3909<br />
UPDATE option, 3909<br />
<strong>MIXED</strong> procedure, RANDOM statement, 3889,<br />
3943, 4008<br />
ALPHA= option, 3944<br />
CL option, 3944<br />
G option, 3944<br />
GC option, 3944<br />
GCI option, 3944<br />
GCORR option, 3945<br />
GDATA= option, 3945<br />
GI option, 3945<br />
GROUP= option, 3945<br />
LDATA= option, 3945<br />
NOFULLZ option, 3946<br />
RATIOS option, 3946<br />
SOLUTION option, 3946<br />
SUBJECT= option, 3912, 3946<br />
TYPE= option, 3946<br />
V option, 3947<br />
VC option, 3947<br />
VCI option, 3947<br />
VCORR option, 3947<br />
VI option, 3947<br />
<strong>MIXED</strong> procedure, REPEATED statement, 3889,<br />
3948, 4013<br />
GROUP= option, 3949<br />
HLM option, 3949<br />
HLPS option, 3949<br />
LDATA= option, 3950<br />
LOCAL= option, 3950<br />
LOCALW option, 3951<br />
NONLOCALW option, 3951<br />
R option, 3951<br />
RC option, 3952<br />
RCI option, 3952<br />
RCORR option, 3952<br />
RI option, 3952<br />
SSCP option, 3952<br />
SUBJECT= option, 3952<br />
TYPE= option, 3953<br />
<strong>MIXED</strong> procedure, WEIGHT statement, 3962<br />
MMEQ option<br />
PROC <strong>MIXED</strong> statement, 3903, 4026<br />
MMEQSOL option<br />
PROC <strong>MIXED</strong> statement, 3903, 4026<br />
MODEL statement<br />
<strong>MIXED</strong> procedure, 3922<br />
Modifiers of INFLUENCE option<br />
MODEL statement (<strong>MIXED</strong>), 3928<br />
NAMELEN= option<br />
PROC <strong>MIXED</strong> statement, 3903<br />
NOBOUND option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
PROC <strong>MIXED</strong> statement, 3903<br />
NOCLPRINT option<br />
PROC <strong>MIXED</strong> statement, 3903<br />
NOCONTAIN option<br />
MODEL statement (<strong>MIXED</strong>), 3934<br />
NOFULLZ option<br />
RANDOM statement (<strong>MIXED</strong>), 3946<br />
NOINFO option<br />
PROC <strong>MIXED</strong> statement, 3903<br />
NOINT option<br />
MODEL statement (<strong>MIXED</strong>), 3934, 3976<br />
NOITER option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
NOITPRINT option<br />
PROC <strong>MIXED</strong> statement, 3903<br />
NONLOCALW option<br />
REPEATED statement (<strong>MIXED</strong>), 3951<br />
NOPROFILE option<br />
PARMS statement (<strong>MIXED</strong>), 3938<br />
PROC <strong>MIXED</strong> statement, 3904, 3969<br />
NOTEST option<br />
MODEL statement (<strong>MIXED</strong>), 3934<br />
NSAMPLE= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
NSEARCH= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
OBSMARGINS option<br />
LSMEANS statement (<strong>MIXED</strong>), 3921<br />
OLS option<br />
PARMS statement (<strong>MIXED</strong>), 3939<br />
ORD option<br />
PROC <strong>MIXED</strong> statement, 3904<br />
ORDER= option<br />
MODEL statement (<strong>MIXED</strong>), 3980<br />
PROC <strong>MIXED</strong> statement, 3904, 3976<br />
OUT= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
OUTG= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
OUTGT= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
OUTP= option<br />
MODEL statement (<strong>MIXED</strong>), 4026<br />
OUTPRED= option<br />
MODEL statement (<strong>MIXED</strong>), 3934<br />
OUTPREDM= option<br />
MODEL statement (<strong>MIXED</strong>), 3935<br />
PARMS statement<br />
<strong>MIXED</strong> procedure, 3937, 4026<br />
PARMSDATA= option<br />
PARMS statement (<strong>MIXED</strong>), 3939<br />
PDATA= option<br />
PARMS statement (<strong>MIXED</strong>), 3939<br />
PDIFF option<br />
LSMEANS statement (<strong>MIXED</strong>), 3920, 3921<br />
PLOT option<br />
PROC <strong>MIXED</strong> statement, 3905<br />
PLOTS option<br />
PROC <strong>MIXED</strong> statement, 3905<br />
PRIOR statement<br />
<strong>MIXED</strong> procedure, 3939<br />
PROC <strong>MIXED</strong> statement, see <strong>MIXED</strong> procedure<br />
PSEARCH option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
PTRANS option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
R option<br />
REPEATED statement (<strong>MIXED</strong>), 3951<br />
RANDOM statement<br />
<strong>MIXED</strong> procedure, 3943<br />
RATIO option<br />
PROC <strong>MIXED</strong> statement, 3909, 3991<br />
RATIOS option<br />
PARMS statement (<strong>MIXED</strong>), 3939<br />
RANDOM statement (<strong>MIXED</strong>), 3946<br />
RC option<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
RCI option<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
RCORR option<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
REPEATED statement<br />
<strong>MIXED</strong> procedure, 3948, 4013<br />
RESIDUAL option<br />
<strong>MIXED</strong> procedure, MODEL statement,<br />
3982<br />
MODEL statement (<strong>MIXED</strong>), 3935<br />
RI option<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
RIDGE= option<br />
PROC <strong>MIXED</strong> statement, 3909<br />
SCORING= option<br />
PROC <strong>MIXED</strong> statement, 3909<br />
SEED= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
SELECT= modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3930<br />
SFACTOR= option<br />
PRIOR statement (<strong>MIXED</strong>), 3942<br />
SIGITER option<br />
PROC <strong>MIXED</strong> statement, 3909<br />
SINGCHOL= option<br />
MODEL statement (<strong>MIXED</strong>), 3935<br />
SINGRES= option<br />
MODEL statement (<strong>MIXED</strong>), 3935<br />
SINGULAR= option<br />
CONTRAST statement (<strong>MIXED</strong>), 3914<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
LSMEANS statement (<strong>MIXED</strong>), 3922<br />
MODEL statement (<strong>MIXED</strong>), 3936<br />
SIZE= modifier<br />
INFLUENCE option, MODEL statement<br />
(<strong>MIXED</strong>), 3931<br />
SLICE= option<br />
LSMEANS statement (<strong>MIXED</strong>), 3922<br />
SOLUTION option<br />
MODEL statement (<strong>MIXED</strong>), 3936, 3980<br />
RANDOM statement (<strong>MIXED</strong>), 3946<br />
SSCP option<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
SUBJECT option<br />
CONTRAST statement (<strong>MIXED</strong>), 3914<br />
ESTIMATE statement (<strong>MIXED</strong>), 3915<br />
SUBJECT= option<br />
RANDOM statement (<strong>MIXED</strong>), 3912, 3946<br />
REPEATED statement (<strong>MIXED</strong>), 3952<br />
TDATA= option<br />
PRIOR statement (<strong>MIXED</strong>), 3943<br />
TRANS= option<br />
PRIOR statement (<strong>MIXED</strong>), 3943<br />
TYPE= option<br />
RANDOM statement (<strong>MIXED</strong>), 3946<br />
REPEATED statement (<strong>MIXED</strong>), 3953<br />
UPDATE option<br />
PROC <strong>MIXED</strong> statement, 3909<br />
UPDATE= option<br />
PRIOR statement (<strong>MIXED</strong>), 3943<br />
UPPERB= option<br />
PARMS statement (<strong>MIXED</strong>), 3939<br />
UPPERTAILED option<br />
ESTIMATE statement (<strong>MIXED</strong>), 3916<br />
V option<br />
RANDOM statement (<strong>MIXED</strong>), 3947<br />
VC option<br />
RANDOM statement (<strong>MIXED</strong>), 3947<br />
VCI option<br />
RANDOM statement (<strong>MIXED</strong>), 3947<br />
VCIRY option<br />
<strong>MIXED</strong> procedure, MODEL statement, 3982<br />
MODEL statement (<strong>MIXED</strong>), 3936<br />
VCORR option<br />
RANDOM statement (<strong>MIXED</strong>), 3947<br />
VI option<br />
RANDOM statement (<strong>MIXED</strong>), 3947<br />
WEIGHT statement<br />
<strong>MIXED</strong> procedure, 3962<br />
XPVIX option<br />
MODEL statement (<strong>MIXED</strong>), 3936<br />
XPVIXI option<br />
MODEL statement (<strong>MIXED</strong>), 3936<br />
ZETA= option<br />
MODEL statement (<strong>MIXED</strong>), 3936<br />
Your Turn<br />
We welcome your feedback.<br />
If you have comments about this book, please send them to<br />
yourturn@sas.com. Include the full title and page numbers (if<br />
applicable).<br />
If you have comments about the software, please send them to<br />
suggest@sas.com.<br />
<strong>SAS</strong> and all other <strong>SAS</strong> Institute Inc. product or service names are registered trademarks or trademarks of <strong>SAS</strong> Institute Inc. in the USA and other countries. ® indicates USA registration.<br />
Other brand and product names are trademarks of their respective companies. © 2009 <strong>SAS</strong> Institute Inc. All rights reserved.