SAS/STAT 9.2 User's Guide: The MIXED Procedure (Book Excerpt)


This document is an individual chapter from SAS/STAT® 9.2 User’s Guide.

The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2008. SAS/STAT® 9.2 User’s Guide. Cary, NC: SAS Institute Inc.

Copyright © 2008, SAS Institute Inc., Cary, NC, USA

All rights reserved. Produced in the United States of America.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st electronic book, March 2008
2nd electronic book, February 2009

SAS® Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.


Chapter 56

The MIXED Procedure

Contents

Overview: MIXED Procedure
    Basic Features
    Notation for the Mixed Model
    PROC MIXED Contrasted with Other SAS Procedures
Getting Started: MIXED Procedure
    Clustered Data Example
Syntax: MIXED Procedure
    PROC MIXED Statement
    BY Statement
    CLASS Statement
    CONTRAST Statement
    ESTIMATE Statement
    ID Statement
    LSMEANS Statement
    MODEL Statement
    PARMS Statement
    PRIOR Statement
    RANDOM Statement
    REPEATED Statement
    WEIGHT Statement
Details: MIXED Procedure
    Mixed Models Theory
    Parameterization of Mixed Models
    Residuals and Influence Diagnostics
    Default Output
    ODS Table Names
    ODS Graphics
    Computational Issues
Examples: Mixed Procedure
    Example 56.1: Split-Plot Design
    Example 56.2: Repeated Measures
    Example 56.3: Plotting the Likelihood
    Example 56.4: Known G and R
    Example 56.5: Random Coefficients
    Example 56.6: Line-Source Sprinkler Irrigation
    Example 56.7: Influence in Heterogeneous Variance Model
    Example 56.8: Influence Analysis for Repeated Measures Data
    Example 56.9: Examining Individual Test Components
References

Overview: MIXED Procedure

The MIXED procedure fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data. A mixed linear model is a generalization of the standard linear model used in the GLM procedure, the generalization being that the data are permitted to exhibit correlation and nonconstant variability. The mixed linear model, therefore, provides you with the flexibility of modeling not only the means of your data (as in the standard linear model) but their variances and covariances as well.

The primary assumptions underlying the analyses performed by PROC MIXED are as follows:

• The data are normally distributed (Gaussian).
• The means (expected values) of the data are linear in terms of a certain set of parameters.
• The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.

Since Gaussian data can be modeled entirely in terms of their means and variances/covariances, the two sets of parameters in a mixed linear model actually specify the complete probability distribution of the data. The parameters of the mean model are referred to as fixed-effects parameters, and the parameters of the variance-covariance model are referred to as covariance parameters.

The fixed-effects parameters are associated with known explanatory variables, as in the standard linear model. These variables can be either qualitative (as in the traditional analysis of variance) or quantitative (as in standard linear regression). However, the covariance parameters are what distinguishes the mixed linear model from the standard linear model.

The need for covariance parameters arises quite frequently in applications, the following being the two most typical scenarios:

• The experimental units on which the data are measured can be grouped into clusters, and the data from a common cluster are correlated.
• Repeated measurements are taken on the same experimental unit, and these repeated measurements are correlated or exhibit variability that changes.

The first scenario can be generalized to include one set of clusters nested within another. For example, if students are the experimental unit, they can be clustered into classes, which in turn can be clustered into schools. Each level of this hierarchy can introduce an additional source of variability and correlation. The second scenario occurs in longitudinal studies, where repeated measurements are taken over time. Alternatively, the repeated measures could be spatial or multivariate in nature.

PROC MIXED provides a variety of covariance structures to handle the previous two scenarios. The most common of these structures arises from the use of random-effects parameters, which are additional unknown random variables assumed to affect the variability of the data. The variances of the random-effects parameters, commonly known as variance components, become the covariance parameters for this particular structure. Traditional mixed linear models contain both fixed- and random-effects parameters, and, in fact, it is the combination of these two types of effects that led to the name mixed model. PROC MIXED fits not only these traditional variance component models but numerous other covariance structures as well.

PROC MIXED fits the structure you select to the data by using the method of restricted maximum likelihood (REML), also known as residual maximum likelihood. It is here that the Gaussian assumption for the data is exploited. Other estimation methods are also available, including maximum likelihood and MIVQUE0. The details behind these estimation methods are discussed in subsequent sections.
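As a minimal sketch of how an estimation method is chosen, the METHOD= option in the PROC MIXED statement can be set to REML (the default), ML, or MIVQUE0. The sketch assumes that the heights data set from the “Getting Started” example in this chapter has already been created; the two runs are illustrations only, not recommended analyses.

```sas
/* Assumes the heights data set from the "Getting Started" example exists. */

/* Maximum likelihood instead of the default REML */
proc mixed data=heights method=ml;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* MIVQUE0 estimates of the covariance parameters */
proc mixed data=heights method=mivque0;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```

Omitting METHOD= gives the REML fit that is used in the rest of this chapter's introductory example.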

After a model has been fit to your data, you can use it to draw statistical inferences via both the fixed-effects and covariance parameters. PROC MIXED computes several different statistics suitable for generating hypothesis tests and confidence intervals. The validity of these statistics depends upon the mean and variance-covariance model you select, so it is important to choose the model carefully. Some of the output from PROC MIXED helps you assess your model and compare it with others.

Basic Features

PROC MIXED provides easy accessibility to numerous mixed linear models that are useful in many common statistical analyses. In the style of the GLM procedure, PROC MIXED fits the specified mixed linear model and produces appropriate statistics.

Here are some basic features of PROC MIXED:

• covariance structures, including variance components, compound symmetry, unstructured, AR(1), Toeplitz, spatial, general linear, and factor analytic
• GLM-type grammar, by using MODEL, RANDOM, and REPEATED statements for model specification and CONTRAST, ESTIMATE, and LSMEANS statements for inferences
• appropriate standard errors for all specified estimable linear combinations of fixed and random effects, and corresponding t and F tests
• subject and group effects that enable blocking and heterogeneity, respectively
• REML and ML estimation methods implemented with a Newton-Raphson algorithm
• capacity to handle unbalanced data
• ability to create a SAS data set corresponding to any table


PROC MIXED uses the Output Delivery System (ODS), a SAS subsystem that provides capabilities for displaying and controlling the output from SAS procedures. ODS enables you to convert any of the output from PROC MIXED into a SAS data set. See the section “ODS Table Names.”
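The following sketch illustrates capturing PROC MIXED tables as data sets with the ODS OUTPUT statement. CovParms and Tests3 are the ODS table names for the covariance parameter estimates and the Type 3 tests of fixed effects. The output data set names work.cp and work.t3 are hypothetical, and the heights data set is assumed to exist as defined in the “Getting Started” example.

```sas
/* Assumes the heights data set from the "Getting Started" example exists. */
/* work.cp and work.t3 are illustrative names chosen for this sketch.      */
ods output CovParms=work.cp      /* covariance parameter estimates */
           Tests3=work.t3;       /* Type 3 tests of fixed effects  */

proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* Inspect the captured covariance parameter estimates */
proc print data=work.cp;
run;
```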

The MIXED procedure now uses ODS Graphics to create graphs as part of its output. For general information about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific information about the statistical graphics available with the MIXED procedure, see the PLOTS option in the PROC MIXED statement and the section “ODS Graphics.”
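A minimal sketch of requesting a graph follows. It assumes the heights data set from the “Getting Started” example and uses PLOTS=RESIDUALPANEL as one possible request; which plots are available, and any options they need, should be checked against the PLOTS= documentation for your release.

```sas
/* Assumes the heights data set from the "Getting Started" example exists. */
ods graphics on;

proc mixed data=heights plots=residualpanel;   /* panel of residual plots */
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

ods graphics off;
```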

Notation for the Mixed Model

This section introduces the mathematical notation used throughout this chapter to describe the mixed linear model. You should be familiar with basic matrix algebra (see Searle 1982). A more detailed description of the mixed model is contained in the section “Mixed Models Theory.”

A statistical model is a mathematical description of how data are generated. The standard linear model, as used by the GLM procedure, is one of the most common statistical models:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

In this expression, $\mathbf{y}$ represents a vector of observed data, $\boldsymbol{\beta}$ is an unknown vector of fixed-effects parameters with known design matrix $\mathbf{X}$, and $\boldsymbol{\epsilon}$ is an unknown random error vector modeling the statistical noise around $\mathbf{X}\boldsymbol{\beta}$. The focus of the standard linear model is to model the mean of $\mathbf{y}$ by using the fixed-effects parameters $\boldsymbol{\beta}$. The residual errors $\boldsymbol{\epsilon}$ are assumed to be independent and identically distributed Gaussian random variables with mean 0 and variance $\sigma^2$.

The mixed model generalizes the standard linear model as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}$$

Here, $\boldsymbol{\gamma}$ is an unknown vector of random-effects parameters with known design matrix $\mathbf{Z}$, and $\boldsymbol{\epsilon}$ is an unknown random error vector whose elements are no longer required to be independent and homogeneous.

To further develop this notion of variance modeling, assume that $\boldsymbol{\gamma}$ and $\boldsymbol{\epsilon}$ are Gaussian random variables that are uncorrelated and have expectations $\mathbf{0}$ and variances $\mathbf{G}$ and $\mathbf{R}$, respectively. The variance of $\mathbf{y}$ is thus

$$\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$$

Note that, when $\mathbf{R} = \sigma^2\mathbf{I}$ and $\mathbf{Z} = \mathbf{0}$, the mixed model reduces to the standard linear model.
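The expression for the variance of y follows from the stated assumptions in a few lines. The sketch below expands the variance term by term. The fixed part contributes nothing because it is not random, and the cross-covariance terms vanish because the random effects and the errors are assumed to be uncorrelated.

```latex
\begin{aligned}
\operatorname{Var}(\mathbf{y})
  &= \operatorname{Var}(\mathbf{X}\boldsymbol{\beta}
       + \mathbf{Z}\boldsymbol{\gamma} + \boldsymbol{\epsilon}) \\
  &= \mathbf{Z}\,\operatorname{Var}(\boldsymbol{\gamma})\,\mathbf{Z}'
     + \operatorname{Var}(\boldsymbol{\epsilon})
     + \mathbf{Z}\,\operatorname{Cov}(\boldsymbol{\gamma},\boldsymbol{\epsilon})
     + \operatorname{Cov}(\boldsymbol{\epsilon},\boldsymbol{\gamma})\,\mathbf{Z}' \\
  &= \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}
\end{aligned}
```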

You can model the variance of the data, y, by specifying the structure (or form) of Z, G, and R. The model matrix Z is set up in the same fashion as X, the model matrix for the fixed-effects parameters. For G and R, you must select some covariance structure. Possible covariance structures include the following:

• variance components
• compound symmetry (common covariance plus diagonal)
• unstructured (general covariance)
• autoregressive
• spatial
• general linear
• factor analytic

By appropriately defining the model matrices X and Z, as well as the covariance structure matrices G and R, you can perform numerous mixed model analyses.
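In practice, a covariance structure is selected with the TYPE= option in the RANDOM statement (for G) or the REPEATED statement (for R). The sketch below is an illustration under stated assumptions: the data set work.growth and the variables Person, Time, and y are hypothetical, with one row per person and time point.

```sas
/* Hypothetical long-format data: work.growth with variables Person, Time, y */
proc mixed data=work.growth;
   class Person Time;
   model y = Time;
   /* R is block-diagonal by Person; each block is first-order autoregressive */
   repeated Time / subject=Person type=ar(1);
run;
```

Replacing TYPE=AR(1) with, for example, TYPE=CS (compound symmetry) or TYPE=UN (unstructured) changes only the assumed form of each block, so competing structures can be compared on the same mean model.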

PROC MIXED Contrasted with Other SAS Procedures

PROC MIXED is a generalization of the GLM procedure in the sense that PROC GLM fits standard linear models, and PROC MIXED fits the wider class of mixed linear models. Both procedures have similar CLASS, MODEL, CONTRAST, ESTIMATE, and LSMEANS statements, but their RANDOM and REPEATED statements differ (see the following paragraphs). Both procedures use the non-full-rank model parameterization, although the sorting of classification levels can differ between the two. PROC MIXED computes only Type I–Type III tests of fixed effects, while PROC GLM computes Types I–IV.

The RANDOM statement in PROC MIXED incorporates random effects constituting the $\boldsymbol{\gamma}$ vector in the mixed model. However, in PROC GLM, effects specified in the RANDOM statement are still treated as fixed as far as the model fit is concerned, and they serve only to produce corresponding expected mean squares. These expected mean squares lead to the traditional ANOVA estimates of variance components. PROC MIXED computes REML and ML estimates of variance parameters, which are generally preferred to the ANOVA estimates (Searle 1988; Harville 1988; Searle, Casella, and McCulloch 1992). Optionally, PROC MIXED also computes MIVQUE0 estimates, which are similar to ANOVA estimates.
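The difference between the two RANDOM statements can be seen by writing the same design in both procedures. The sketch below assumes the heights data set from the “Getting Started” example; it is meant only to contrast the syntax, not to recommend either analysis.

```sas
/* Assumes the heights data set from the "Getting Started" example exists. */

/* PROC GLM: the Family terms are fit as fixed effects in the MODEL        */
/* statement; RANDOM only requests the expected mean squares.              */
proc glm data=heights;
   class Family Gender;
   model Height = Gender Family Family*Gender;
   random Family Family*Gender;
run;
quit;

/* PROC MIXED: the same terms are left out of the MODEL statement and      */
/* enter the fit as random effects through the RANDOM statement.           */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```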

The REPEATED statement in PROC MIXED is used to specify covariance structures for repeated measurements on subjects, while the REPEATED statement in PROC GLM is used to specify various transformations with which to conduct the traditional univariate or multivariate tests. In repeated measures situations, the mixed model approach used in PROC MIXED is more flexible and more widely applicable than either the univariate or multivariate approach. In particular, the mixed model approach provides a larger class of covariance structures and a better mechanism for handling missing values (Wolfinger and Chang 1995).

PROC MIXED subsumes the VARCOMP procedure. PROC MIXED provides a wide variety of covariance structures, while PROC VARCOMP estimates only simple random effects. PROC MIXED carries out several analyses that are absent in PROC VARCOMP, including the estimation and testing of linear combinations of fixed and random effects.

The ARIMA and AUTOREG procedures provide more time series structures than PROC MIXED, although they do not fit variance component models. The CALIS procedure fits general covariance matrices, but the fixed effects structure of the model is formed differently than in PROC MIXED. The LATTICE and NESTED procedures fit special types of mixed linear models that can also be handled in PROC MIXED, although PROC MIXED might run slower because of its more general algorithm. The TSCSREG procedure analyzes time series cross-sectional data, and it fits some structures not available in PROC MIXED.

The GLIMMIX procedure fits generalized linear mixed models (GLMMs). Linear mixed models (where the data are normally distributed, given the random effects) are in the class of GLMMs. The MIXED procedure can estimate covariance parameters with ANOVA methods that are not available in the GLIMMIX procedure (see METHOD=TYPE1, METHOD=TYPE2, and METHOD=TYPE3 in the PROC MIXED statement). Also, PROC MIXED can perform a sampling-based Bayesian analysis through the PRIOR statement, and the procedure supports certain Kronecker-type covariance structures. These features are not available in the GLIMMIX procedure. The GLIMMIX procedure, on the other hand, accommodates nonnormal data and offers a broader array of postprocessing features than the MIXED procedure.
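As a brief illustration of the ANOVA-type estimation mentioned above, the following sketch requests METHOD=TYPE3 for a variance component model. It assumes the heights data set from the “Getting Started” example; the choice of model is illustrative only.

```sas
/* Assumes the heights data set from the "Getting Started" example exists. */
/* METHOD=TYPE3 requests ANOVA-type estimates of the variance components   */
/* based on Type 3 sums of squares, in place of the default REML fit.      */
proc mixed data=heights method=type3;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```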

Getting Started: MIXED Procedure

Clustered Data Example

Consider the following SAS data set as an introductory example:

   data heights;
      input Family Gender$ Height @@;
      datalines;
   1 F 67 1 F 66 1 F 64 1 M 71 1 M 72 2 F 63
   2 F 63 2 F 67 2 M 69 2 M 68 2 M 70 3 F 63
   3 M 64 4 F 67 4 F 66 4 M 67 4 M 67 4 M 69
   ;

The response variable Height measures the heights (in inches) of 18 individuals. The individuals are classified according to Family and Gender. You can perform a traditional two-way analysis of variance of these data with the following PROC MIXED statements:

   proc mixed data=heights;
      class Family Gender;
      model Height = Gender Family Family*Gender;
   run;

The PROC MIXED statement invokes the procedure. The CLASS statement instructs PROC MIXED to consider both Family and Gender as classification variables. Dummy (indicator) variables are, as a result, created corresponding to all of the distinct levels of Family and Gender. For these data, Family has four levels and Gender has two levels.


The MODEL statement first specifies the response (dependent) variable Height. The explanatory (independent) variables are then listed after the equal (=) sign. Here, the two explanatory variables are Gender and Family, and these are the main effects of the design. The third explanatory term, Family*Gender, models an interaction between the two main effects.

PROC MIXED uses the dummy variables associated with Gender, Family, and Family*Gender to construct the X matrix for the linear model. A column of 1s is also included as the first column of X to model a global intercept. There are no Z or G matrices for this model, and R is assumed to equal $\sigma^2\mathbf{I}$, where $\mathbf{I}$ is an $18 \times 18$ identity matrix.

The RUN statement completes the specification. The coding is precisely the same as with the GLM procedure. However, much of the output from PROC MIXED is different from that produced by PROC GLM.

The output from PROC MIXED is shown in Figure 56.1–Figure 56.7.

The “Model Information” table in Figure 56.1 describes the model, some of the variables that it involves, and the method used in fitting it. This table also lists the method (profile, factor, parameter, or none) for handling the residual variance.

Figure 56.1 Model Information

The Mixed Procedure

Model Information

Data Set                     WORK.HEIGHTS
Dependent Variable           Height
Covariance Structure         Diagonal
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Residual

The “Class Level Information” table in Figure 56.2 lists the levels of all variables specified in the CLASS statement. You can check this table to make sure that the data are correct.

Figure 56.2 Class Level Information

Class Level Information

Class     Levels    Values
Family         4    1 2 3 4
Gender         2    F M

The “Dimensions” table in Figure 56.3 lists the sizes of relevant matrices. This table can be useful in determining CPU time and memory requirements.


Figure 56.3 Dimensions

Dimensions

Covariance Parameters     1
Columns in X             15
Columns in Z              0
Subjects                  1
Max Obs Per Subject      18

The “Number of Observations” table in Figure 56.4 displays information about the sample size being processed.

Figure 56.4 Number of Observations

Number of Observations

Number of Observations Read        18
Number of Observations Used        18
Number of Observations Not Used     0

The “Covariance Parameter Estimates” table in Figure 56.5 displays the estimate of $\sigma^2$ for the model.

Figure 56.5 Covariance Parameter Estimates

Covariance Parameter Estimates

Cov Parm    Estimate
Residual      2.1000

The “Fit Statistics” table in Figure 56.6 lists several pieces of information about the fitted mixed model, including values derived from the computed value of the restricted/residual likelihood.

Figure 56.6 Fit Statistics

Fit Statistics

-2 Res Log Likelihood        41.6
AIC (smaller is better)      43.6
AICC (smaller is better)     44.1
BIC (smaller is better)      43.9
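The AIC and AICC values in Figure 56.6 can be checked by hand. The sketch below assumes the usual definitions of the two criteria, with d = 1 covariance parameter and, under REML, an effective sample size n* = n − p. The value p = 8 is an assumption taken from the eight occupied Family-by-Gender cells in the data, which give X a rank of 8. The displayed −2 Res Log Likelihood is rounded, so the check is only approximate.

```latex
\begin{aligned}
\mathrm{AIC}  &= -2\ell_R + 2d = 41.6 + 2(1) = 43.6 \\
n^{*}         &= n - p = 18 - 8 = 10 \\
\mathrm{AICC} &= -2\ell_R + \frac{2\,d\,n^{*}}{n^{*} - d - 1}
               = 41.6 + \frac{2(1)(10)}{10 - 1 - 1}
               = 41.6 + 2.5 = 44.1
\end{aligned}
```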

The “Type 3 Tests of Fixed Effects” table in Figure 56.7 displays significance tests for the three effects listed in the MODEL statement. The Type 3 F statistics and p-values are the same as those produced by the GLM procedure. However, because PROC MIXED uses a likelihood-based estimation scheme, it does not directly compute or display sums of squares for this analysis.

Figure 56.7 Tests of Fixed Effects

Type 3 Tests of Fixed Effects

                  Num    Den
Effect             DF     DF    F Value    Pr > F
Gender              1     10      17.63    0.0018
Family              3     10       5.90    0.0139
Family*Gender       3     10       2.89    0.0889

The Type 3 test for the Family*Gender effect is not significant at the 5% level, but the tests for both main effects are significant.

The important assumptions behind this analysis are that the data are normally distributed and that they are independent with constant variance. For these data, the normality assumption is probably realistic since the data are observed heights. However, since the data occur in clusters (families), it is very likely that observations from the same family are statistically correlated (that is, not independent).

The methods implemented in PROC MIXED are still based on the assumption of normally distributed data, but you can drop the assumption of independence by modeling statistical correlation in a variety of ways. You can also model variances that are heterogeneous (that is, nonconstant).

For the height data, one of the simplest ways of modeling correlation is through the use of random effects. Here the family effect is assumed to be normally distributed with zero mean and some unknown variance. This is in contrast to the previous model in which the family effects are just constants, or fixed effects. Declaring Family as a random effect sets up a common correlation among all observations having the same level of Family.

Declaring Family*Gender as a random effect models an additional correlation between all observations that have the same level of both Family and Gender. One interpretation of this effect is that a female in a certain family exhibits more correlation with the other females in that family than with the other males, and likewise for a male. With the height data, this model seems reasonable.

The statements to fit this correlation model in PROC MIXED are as follows:

   proc mixed;
      class Family Gender;
      model Height = Gender;
      random Family Family*Gender;
   run;

Note that Family and Family*Gender are now listed in the RANDOM statement. The dummy variables associated with them are used to construct the Z matrix in the mixed model. The X matrix now consists of a column of 1s and the dummy variables for Gender.

The G matrix for this model is diagonal, and it contains the variance components for both Family and Family*Gender. The R matrix is still assumed to equal $\sigma^2\mathbf{I}$, where $\mathbf{I}$ is an identity matrix.


The output from this analysis is as follows.

Figure 56.8 Model Information

The Mixed Procedure

Model Information

Data Set                     WORK.HEIGHTS
Dependent Variable           Height
Covariance Structure         Variance Components
Estimation Method            REML
Residual Variance Method     Profile
Fixed Effects SE Method      Model-Based
Degrees of Freedom Method    Containment

The “Model Information” table in Figure 56.8 shows that the containment method is used to compute the degrees of freedom for this analysis. This is the default method when a RANDOM statement is used; see the description of the DDFM= option for more information.
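If a different degrees-of-freedom method is wanted, it is requested with the DDFM= option in the MODEL statement. The sketch below uses DDFM=SATTERTH (a Satterthwaite approximation) purely as an example of the syntax; it assumes the heights data set defined earlier and does not claim that this method is preferable for these data.

```sas
/* Assumes the heights data set defined earlier in this example exists.   */
/* DDFM=SATTERTH replaces the default containment method for this model.  */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender / ddfm=satterth;
   random Family Family*Gender;
run;
```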

Figure 56.9 Class Level Information

Class Level Information

Class     Levels    Values
Family         4    1 2 3 4
Gender         2    F M

The “Class Level Information” table in Figure 56.9 is the same as before. The “Dimensions” table in Figure 56.10 displays the new sizes of the X and Z matrices.

Figure 56.10 Dimensions and Number of Observations

Dimensions

Covariance Parameters     3
Columns in X              3
Columns in Z             12
Subjects                  1
Max Obs Per Subject      18

Number of Observations

Number of Observations Read        18
Number of Observations Used        18
Number of Observations Not Used     0
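The column counts in the two “Dimensions” tables (Figure 56.3 and Figure 56.10) can be reproduced by counting one dummy column per level of each effect, plus the intercept. Family has four levels, Gender has two, and Family*Gender has 4 × 2 = 8 combinations. This is a worked check of the reported values under the non-full-rank parameterization, nothing more.

```latex
\begin{aligned}
\text{Fixed-effects model:}\quad
  \text{columns in } \mathbf{X} &= 1 + 2 + 4 + 8 = 15,
  & \text{columns in } \mathbf{Z} &= 0 \\
\text{Random-effects model:}\quad
  \text{columns in } \mathbf{X} &= 1 + 2 = 3,
  & \text{columns in } \mathbf{Z} &= 4 + 8 = 12
\end{aligned}
```

The three covariance parameters of the second model are the two variance components (Family and Family*Gender) and the residual variance.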

The “Iteration History” table in Figure 56.11 displays the results of the numerical optimization of the restricted/residual likelihood. Six iterations are required to achieve the default convergence criterion of 1E-8.

Figure 56.11 REML Estimation Iteration History

Iteration History

Iteration    Evaluations    -2 Res Log Like     Criterion
        0              1        74.11074833
        1              2        71.51614003    0.01441208
        2              1        71.13845990    0.00412226
        3              1        71.03613556    0.00058188
        4              1        71.02281757    0.00001689
        5              1        71.02245904    0.00000002
        6              1        71.02245869    0.00000000

Convergence criteria met.

The “Covariance Parameter Estimates” table in Figure 56.12 displays the results of the REML fit. The Estimate column contains the estimates of the variance components for Family and Family*Gender, as well as the estimate of $\sigma^2$.

Figure 56.12 Covariance Parameter Estimates (REML)

Covariance Parameter Estimates

Cov Parm         Estimate
Family             2.4010
Family*Gender      1.7657
Residual           2.1668

<strong>The</strong> “Fit Statistics” table in Figure 56.13 contains basic information about the REML fit.<br />

Figure 56.13 Fit Statistics<br />

Fit Statistics<br />

-2 Res Log Likelihood 71.0<br />

AIC (smaller is better) 77.0<br />

AICC (smaller is better) 79.0<br />

BIC (smaller is better) 75.2<br />
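The entries in Figure 56.13 can be checked by hand against the smaller-is-better formulas for these criteria (see the IC option and Table 56.3). The following calculation is a sketch under stated assumptions: $d = 3$ covariance parameters, rank of $X$ equal to 2 (an inference from the three columns of X being an intercept plus two Gender indicators), so that $n^* = 18 - 2 = 16$ for AICC, and $n = 4$ for BIC because there is one subject and the first random effect, Family, has four levels.<br />

```latex
\begin{align*}
-2\ell &= 71.02 \\
\mathrm{AIC}  &= -2\ell + 2d = 71.02 + 2(3) = 77.02 \\
\mathrm{AICC} &= -2\ell + \frac{2dn^*}{n^* - d - 1}
               = 71.02 + \frac{2(3)(16)}{16 - 3 - 1} = 71.02 + 8 = 79.02 \\
\mathrm{BIC}  &= -2\ell + d\log n = 71.02 + 3\log 4 \approx 71.02 + 4.16 = 75.18
\end{align*}
```

Rounded to one decimal place, these agree with the displayed values 77.0, 79.0, and 75.2.<br />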

<strong>The</strong> “Type 3 Tests of Fixed Effects” table in Figure 56.14 contains a significance test for the lone<br />

fixed effect, Gender. Note that the associated p-value is not nearly as significant as in the previous<br />

analysis. This illustrates the importance of correctly modeling correlation in your data.



Figure 56.14 Type 3 Tests of Fixed Effects<br />

Type 3 Tests of Fixed Effects<br />

Effect Num DF Den DF F Value Pr > F<br />

Gender 1 3 7.95 0.0667<br />
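The p-value in Figure 56.14 is the upper-tail probability of an F distribution with 1 and 3 degrees of freedom. As a rough check, it can be recomputed in a DATA step with the PROBF function; because the F value is taken from the table as the rounded 7.95, the result should agree with 0.0667 only up to rounding.<br />

```sas
/* Upper-tail probability of F = 7.95 with 1 numerator and 3 denominator df */
data _null_;
   p = 1 - probf(7.95, 1, 3);   /* PROBF returns the cumulative F probability */
   put p= 6.4;                  /* write the result to the SAS log */
run;
```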

An additional benefit of the random effects analysis is that it enables you to make inferences about<br />

gender that apply to an entire population of families, whereas the inferences about gender from the<br />

analysis where Family and Family*Gender are fixed effects apply only to the particular families in the<br />

data set.<br />

PROC <strong>MIXED</strong> thus offers you the ability to model correlation directly and to make inferences about<br />

fixed effects that apply to entire populations of random effects.<br />
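The random-effects analysis summarized in Figures 56.8 through 56.14 corresponds to statements along the following lines. This is a sketch rather than a verbatim listing: it assumes that the family data are stored in a data set named heights with a response variable named Height (both names are assumptions here); Family and Gender are the CLASS variables shown in Figure 56.9.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;          /* Gender is the lone fixed effect      */
   random Family Family*Gender;    /* variance components for both effects */
run;
```

Placing Family and Family*Gender in the RANDOM statement rather than in the MODEL statement is what yields the three covariance parameters reported in Figure 56.10: two variance components plus the residual variance.<br />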

Syntax: <strong>MIXED</strong> <strong>Procedure</strong><br />

<strong>The</strong> following statements are available in PROC <strong>MIXED</strong>.<br />

PROC <strong>MIXED</strong> < options > ;<br />

BY variables ;<br />

CLASS variables ;<br />

ID variables ;<br />

MODEL dependent = < fixed-effects > < / options > ;<br />

RANDOM random-effects < / options > ;<br />

REPEATED < repeated-effect >< / options > ;<br />

PARMS (value-list) . . . < / options > ;<br />

PRIOR < distribution >< / options > ;<br />

CONTRAST ’label’ < fixed-effect values . . . ><br />

< | random-effect values . . . >, . . . < / options > ;<br />

ESTIMATE ’label’ < fixed-effect values . . . ><br />

< | random-effect values . . . >< / options > ;<br />

LSMEANS fixed-effects < / options > ;<br />

WEIGHT variable ;<br />

Items within angle brackets ( < > ) are optional. <strong>The</strong> CONTRAST, ESTIMATE, LSMEANS, and<br />

RANDOM statements can appear multiple times; all other statements can appear only once.<br />

<strong>The</strong> PROC <strong>MIXED</strong> and MODEL statements are required, and the MODEL statement must appear<br />

after the CLASS statement if a CLASS statement is included. <strong>The</strong> CONTRAST, ESTIMATE,<br />

LSMEANS, RANDOM, and REPEATED statements must follow the MODEL statement. <strong>The</strong><br />

CONTRAST and ESTIMATE statements must also follow any RANDOM statements.<br />
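The ordering rules above can be illustrated with a short skeleton. The data set name heights and the response variable Height are assumptions made for this sketch; the labels are arbitrary.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights;
   class Family Gender;                 /* CLASS precedes MODEL               */
   model Height = Gender / s;           /* MODEL is required                  */
   random Family Family*Gender;         /* RANDOM follows MODEL               */
   estimate 'F minus M' Gender 1 -1;    /* ESTIMATE follows RANDOM statements */
   lsmeans Gender / diff cl;            /* LSMEANS follows MODEL              */
run;
```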

Table 56.1 summarizes the basic functions and important options of each PROC <strong>MIXED</strong> statement.



<strong>The</strong> syntax of each statement in Table 56.1 is described in the following sections in alphabetical<br />

order after the description of the PROC <strong>MIXED</strong> statement.<br />

Table 56.1 Summary of PROC <strong>MIXED</strong> Statements<br />

Statement | Description | Important Options<br />

PROC <strong>MIXED</strong> | invokes the procedure | DATA= specifies input data set, METHOD= specifies estimation method<br />

BY | performs multiple PROC <strong>MIXED</strong> analyses in one invocation | none<br />

CLASS | declares qualitative variables that create indicator variables in design matrices | none<br />

ID | lists additional variables to be included in predicted values tables | none<br />

MODEL | specifies dependent variable and fixed effects, setting up X | S requests solution for fixed-effects parameters, DDFM= specifies denominator degrees of freedom method, OUTP= outputs predicted values to a data set, INFLUENCE computes influence diagnostics<br />

RANDOM | specifies random effects, setting up Z and G | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, S requests solution for random-effects parameters, G displays estimated G<br />

REPEATED | sets up R | SUBJECT= creates block-diagonality, TYPE= specifies covariance structure, R displays estimated blocks of R, GROUP= enables between-subject heterogeneity, LOCAL adds a diagonal matrix to R<br />

PARMS | specifies a grid of initial values for the covariance parameters | HOLD= and NOITER hold the covariance parameters or their ratios constant, PARMSDATA= reads the initial values from a <strong>SAS</strong> data set<br />

PRIOR | performs a sampling-based Bayesian analysis for variance component models | NSAMPLE= specifies the sample size, SEED= specifies the starting seed<br />

CONTRAST | constructs custom hypothesis tests | E displays the L matrix coefficients<br />

ESTIMATE | constructs custom scalar estimates | CL produces confidence limits<br />

LSMEANS | computes least squares means for classification fixed effects | DIFF computes differences of the least squares means, ADJUST= performs multiple comparisons adjustments, AT changes covariates, OM changes weighting, CL produces confidence limits, SLICE= tests simple effects<br />

WEIGHT | specifies a variable by which to weight R | none<br />

PROC <strong>MIXED</strong> Statement<br />

PROC <strong>MIXED</strong> < options > ;<br />

<strong>The</strong> PROC <strong>MIXED</strong> statement invokes the procedure. Table 56.2 summarizes important options in<br />

the PROC <strong>MIXED</strong> statement by function. <strong>The</strong>se and other options in the PROC <strong>MIXED</strong> statement<br />

are then described fully in alphabetical order.<br />

Table 56.2 PROC <strong>MIXED</strong> Statement Options<br />

Option Description<br />

Basic Options<br />

DATA= specifies input data set<br />

METHOD= specifies the estimation method<br />

NOPROFILE includes scale parameter in optimization<br />

ORDER= determines the sort order of CLASS variables<br />

Displayed Output<br />

ASYCORR displays asymptotic correlation matrix of covariance parameter estimates<br />

ASYCOV displays asymptotic covariance matrix of covariance parameter estimates<br />

CL requests confidence limits for covariance parameter estimates<br />

COVTEST displays asymptotic standard errors and Wald tests for covariance parameters<br />

IC displays a table of information criteria<br />

ITDETAILS displays estimates and gradients added to “Iteration History”<br />

LOGNOTE writes periodic status notes to the log<br />

MMEQ displays mixed model equations<br />

MMEQSOL displays the solution to the mixed model equations<br />

NOCLPRINT suppresses “Class Level Information” completely or in parts<br />

NOITPRINT suppresses “Iteration History” table<br />

PLOTS produces ODS statistical graphics<br />

RATIO produces ratio of covariance parameter estimates with residual variance<br />

Optimization Options<br />

MAXFUNC= specifies the maximum number of likelihood evaluations<br />

MAXITER= specifies the maximum number of iterations



Computational Options<br />

CONVF requests and tunes the relative function convergence criterion<br />

CONVG requests and tunes the relative gradient convergence criterion<br />

CONVH requests and tunes the relative Hessian convergence criterion<br />

DFBW selects between-within degree of freedom method<br />

EMPIRICAL computes empirical (“sandwich”) estimators<br />

NOBOUND unbounds covariance parameter estimates<br />

RIDGE= specifies starting value for minimum ridge value<br />

SCORING= applies Fisher scoring where applicable<br />

You can specify the following options.<br />

ABSOLUTE<br />

makes the convergence criterion absolute. By default, it is relative (divided by the current<br />

objective function value). See the CONVF, CONVG, and CONVH options in this section for<br />

a description of various convergence criteria.<br />

ALPHA=number<br />

requests that confidence limits be constructed for the covariance parameter estimates with<br />

confidence level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the default is<br />

0.05.<br />

ANOVAF<br />

computes F tests in models that have a REPEATED statement but no RANDOM statement,<br />

by a method similar to that of Brunner, Domhof, and Langer (2002).<br />

<strong>The</strong> method consists of computing special F statistics and adjusting their degrees of freedom.<br />

<strong>The</strong> technique is a generalization of the Greenhouse-Geisser adjustment in MANOVA models<br />

(Greenhouse and Geisser 1959). For more details, see the section “F Tests With the ANOVAF<br />

Option” on page 3973.<br />
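A minimal sketch of a model to which ANOVAF applies (a REPEATED statement and no RANDOM statement) follows. The data set growth and the variables Subject, Treatment, Time, and Response are hypothetical names chosen only for illustration.<br />

```sas
/* Hypothetical long-format data: one row per subject and time point */
proc mixed data=growth anovaf;
   class Subject Treatment Time;
   model Response = Treatment Time Treatment*Time;
   repeated Time / subject=Subject type=un;   /* unstructured within-subject covariance */
run;
```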

ASYCORR<br />

produces the asymptotic correlation matrix of the covariance parameter estimates. It is<br />

computed from the corresponding asymptotic covariance matrix (see the description of the<br />

ASYCOV option, which follows). For ODS purposes, the name of the “Asymptotic Correlation”<br />

table is “AsyCorr.”<br />

ASYCOV<br />

requests that the asymptotic covariance matrix of the covariance parameters be displayed. By<br />

default, this matrix is the observed inverse Fisher information matrix, which equals $2H^{-1}$,<br />

where H is the Hessian (second derivative) matrix of the objective function. See the section<br />

“Covariance Parameter Estimates” on page 3991 for more information about this matrix.<br />

When you use the SCORING= option and PROC <strong>MIXED</strong> converges without stopping the<br />

scoring algorithm, PROC <strong>MIXED</strong> uses the expected Hessian matrix to compute the covariance<br />

matrix instead of the observed Hessian. For ODS purposes, the name of the “Asymptotic<br />

Covariance” table is “AsyCov.”
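The two asymptotic matrices can be requested together and, if desired, captured through the ODS table names given above. In this sketch the data set heights, the response Height, and the output data set names acov and acorr are assumptions.<br />

```sas
/* Assumed names: HEIGHTS, HEIGHT; ACOV and ACORR are arbitrary output names */
proc mixed data=heights asycov asycorr;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   ods output AsyCov=acov AsyCorr=acorr;   /* ODS table names from the text */
run;
```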



CL< =WALD ><br />

requests confidence limits for the covariance parameter estimates. A Satterthwaite approximation<br />

is used to construct limits for all parameters that have a lower boundary constraint of<br />

zero. <strong>The</strong>se limits take the form<br />

$$\frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,1-\alpha/2}} \le \sigma^2 \le \frac{\nu\hat{\sigma}^2}{\chi^2_{\nu,\alpha/2}}$$

where $\nu = 2Z^2$, $Z$ is the Wald statistic $\hat{\sigma}^2/\mathrm{se}(\hat{\sigma}^2)$, and the denominators are quantiles of<br />

the $\chi^2$-distribution with $\nu$ degrees of freedom. See Milliken and Johnson (1992) and Burdick<br />

and Graybill (1992) for similar techniques.<br />

For all other parameters, Wald Z-scores and normal quantiles are used to construct the limits.<br />

Wald limits are also provided for variance components if you specify the NOBOUND option.<br />

<strong>The</strong> optional =WALD specification requests Wald limits for all parameters.<br />

<strong>The</strong> confidence limits are displayed as extra columns in the “Covariance Parameter Estimates”<br />

table. <strong>The</strong> confidence level is $1 - \alpha = 0.95$ by default; this can be changed with the ALPHA=<br />

option.<br />
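As an illustration, the following sketch requests 90% confidence limits together with the Wald tests from the COVTEST option. The data set heights and response Height are assumed names.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights covtest cl alpha=0.1;   /* ALPHA=0.1 gives 90% limits */
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```

Specifying CL=WALD in place of CL would request Wald limits for every covariance parameter, including the variance components.<br />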

CONVF< =number ><br />

requests the relative function convergence criterion with tolerance number. <strong>The</strong> relative function<br />

convergence criterion is<br />

$$\frac{|f_k - f_{k-1}|}{|f_k|} \le \text{number}$$

where $f_k$ is the value of the objective function at iteration $k$. To prevent the division by $|f_k|$,<br />

use the ABSOLUTE option. <strong>The</strong> default convergence criterion is CONVH, and the default<br />

tolerance is 1E-8.<br />
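As a rough illustration, the relative function criterion can be evaluated by hand from the objective function values listed in Figure 56.11. This is only a sketch: that analysis used the default CONVH criterion, so the Criterion column in the figure does not contain these quantities.<br />

```latex
\begin{align*}
k = 5:\quad \frac{|71.02245904 - 71.02281757|}{|71.02245904|}
      &\approx 5.0 \times 10^{-6} > 10^{-8} \\
k = 6:\quad \frac{|71.02245869 - 71.02245904|}{|71.02245869|}
      &\approx 4.9 \times 10^{-9} \le 10^{-8}
\end{align*}
```

With a tolerance of 1E-8, the relative function criterion would therefore first be satisfied at iteration 6 for these values.<br />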

CONVG < =number ><br />

requests the relative gradient convergence criterion with tolerance number. <strong>The</strong> relative gradient<br />

convergence criterion is<br />

$$\frac{\max_j |g_{jk}|}{|f_k|} \le \text{number}$$

where $f_k$ is the value of the objective function, and $g_{jk}$ is the $j$th element of the gradient<br />

(first derivative) of the objective function, both at iteration $k$. To prevent division by $|f_k|$,<br />

use the ABSOLUTE option. <strong>The</strong> default convergence criterion is CONVH, and the default<br />

tolerance is 1E-8.<br />

CONVH< =number ><br />

requests the relative Hessian convergence criterion with tolerance number. <strong>The</strong> relative Hessian<br />

convergence criterion is<br />

$$\frac{g_k' H_k^{-1} g_k}{|f_k|} \le \text{number}$$

where $f_k$ is the value of the objective function, $g_k$ is the gradient (first derivative) of the<br />

objective function, and $H_k$ is the Hessian (second derivative) of the objective function, all at<br />

iteration $k$.<br />

If $H_k$ is singular, then PROC <strong>MIXED</strong> uses the following relative criterion:<br />

$$\frac{g_k' g_k}{|f_k|} \le \text{number}$$

To prevent the division by $|f_k|$, use the ABSOLUTE option. <strong>The</strong> default convergence criterion<br />

is CONVH, and the default tolerance is 1E-8.<br />
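The convergence options are specified in the PROC <strong>MIXED</strong> statement. The following sketch replaces the default Hessian criterion with an absolute function criterion and a looser tolerance; the data set heights, the response Height, and the tolerance 1E-6 are illustrative assumptions, not recommendations.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights convf=1e-6 absolute;   /* |f_k - f_(k-1)| <= 1E-6 */
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```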

COVTEST<br />

produces asymptotic standard errors and Wald Z-tests for the covariance parameter estimates.<br />

DATA=<strong>SAS</strong>-data-set<br />

names the <strong>SAS</strong> data set to be used by PROC <strong>MIXED</strong>. <strong>The</strong> default is the most recently created<br />

data set.<br />

DFBW<br />

has the same effect as the DDFM=BW option in the MODEL statement.<br />

EMPIRICAL<br />

computes the estimated variance-covariance matrix of the fixed-effects parameters by using<br />

the asymptotically consistent estimator described in Huber (1967), White (1980), Liang and<br />

Zeger (1986), and Diggle, Liang, and Zeger (1994). This estimator is commonly referred to<br />

as the “sandwich” estimator, and it is computed as follows:<br />

$$(X'\hat{V}^{-1}X)^{-}\left(\sum_{i=1}^{S} X_i'\hat{V}_i^{-1}\hat{\epsilon}_i\hat{\epsilon}_i'\hat{V}_i^{-1}X_i\right)(X'\hat{V}^{-1}X)^{-}$$

Here, $\hat{\epsilon}_i = y_i - X_i\hat{\beta}$, $S$ is the number of subjects, and matrices with an $i$ subscript are<br />

those for the $i$th subject. You must include the SUBJECT= option in either a RANDOM or<br />

REPEATED statement for this option to take effect.<br />

When you specify the EMPIRICAL option, PROC <strong>MIXED</strong> adjusts all standard errors and test<br />

statistics involving the fixed-effects parameters. This changes output in the following tables<br />

(listed in Table 56.22): Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, Slices,<br />

SolutionF, Tests1–Tests3. <strong>The</strong> OUTP= and OUTPM= data sets are also affected. Finally,<br />

the Satterthwaite and Kenward-Roger degrees of freedom methods are not available if you<br />

specify the EMPIRICAL option.<br />

IC<br />

displays a table of various information criteria. <strong>The</strong> criteria are all in smaller-is-better form,<br />

and are described in Table 56.3.<br />

Table 56.3 Information Criteria<br />

Criterion | Formula | Reference<br />

AIC | $-2\ell + 2d$ | Akaike (1974)<br />

AICC | $-2\ell + 2dn^*/(n^* - d - 1)$ | Hurvich and Tsai (1989), Burnham and Anderson (1998)<br />

HQIC | $-2\ell + 2d \log\log n$ | Hannan and Quinn (1979)<br />

BIC | $-2\ell + d \log n$ | Schwarz (1978)<br />

CAIC | $-2\ell + d(\log n + 1)$ | Bozdogan (1987)<br />


Here $\ell$ denotes the maximum value of the (possibly restricted) log likelihood, $d$ the dimension<br />

of the model, and $n$ the number of observations. In <strong>SAS</strong> 6 of <strong>SAS</strong>/<strong>STAT</strong> software, $n$ equals<br />

the number of valid observations for maximum likelihood estimation and $n - p$ for restricted<br />

maximum likelihood estimation, where $p$ equals the rank of $X$. In later versions, $n$ equals the<br />

number of effective subjects as displayed in the “Dimensions” table, unless this value equals<br />

1, in which case $n$ equals the number of levels of the first random effect you specify in a<br />

RANDOM statement. If the number of effective subjects equals 1 and you have no RANDOM<br />

statements, then $n$ reverts to the <strong>SAS</strong> 6 values. For AICC (a finite-sample corrected version<br />

of AIC), $n^*$ equals the <strong>SAS</strong> 6 values of $n$, unless this number is less than $d + 2$, in which<br />

case it equals $d + 2$.<br />

For restricted likelihood estimation, $d$ equals $q$, the effective number of estimated covariance<br />

parameters. In <strong>SAS</strong> 6, when a parameter estimate lies on a boundary constraint, then it is still<br />

included in the calculation of $d$, but in later versions it is not. <strong>The</strong> most common example<br />

of this behavior is when a variance component is estimated to equal zero. For maximum<br />

likelihood estimation, $d$ equals $q + p$.<br />

For ODS purposes, the name of the “Information Criteria” table is “InfoCrit.”<br />

INFO<br />

is a default option. <strong>The</strong> creation of the “Model Information,” “Dimensions,” and “Number of<br />

Observations” tables can be suppressed by using the NOINFO option.<br />

Note that in <strong>SAS</strong> 6 this option displays the “Model Information” and “Dimensions” tables.<br />

ITDETAILS<br />

displays the parameter values at each iteration and enables the writing of notes to the <strong>SAS</strong> log<br />

pertaining to “infinite likelihood” and “singularities” during Newton-Raphson iterations.<br />

LOGNOTE<br />

writes periodic notes to the log describing the current status of computations. It is designed<br />

for use with analyses requiring extensive CPU resources.<br />

MAXFUNC=number<br />

specifies the maximum number of likelihood evaluations in the optimization process. <strong>The</strong><br />

default is 150.<br />

MAXITER=number<br />

specifies the maximum number of iterations. <strong>The</strong> default is 50.<br />

METHOD=REML<br />

METHOD=ML<br />

METHOD=MIVQUE0<br />

METHOD=TYPE1<br />

METHOD=TYPE2<br />

METHOD=TYPE3<br />

specifies the estimation method for the covariance parameters. <strong>The</strong> REML specification performs<br />

residual (restricted) maximum likelihood, and it is the default method. <strong>The</strong> ML specification<br />

performs maximum likelihood, and the MIVQUE0 specification performs minimum<br />

variance quadratic unbiased estimation of the covariance parameters.
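As a brief illustration, the following sketch requests maximum likelihood in place of the default REML. The data set heights and response Height are assumed names; the IC option is added only so that the information criteria of the two fits can be compared.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights method=ml ic;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```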


<strong>The</strong> METHOD=TYPEn specifications apply only to variance component models with no<br />

SUBJECT= effects and no REPEATED statement. An analysis of variance table is included<br />

in the output, and the expected mean squares are used to estimate the variance components<br />

(see Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>,” for further explanation). <strong>The</strong> resulting method-of-moment<br />

variance component estimates are used in subsequent calculations, including standard<br />

errors computed from ESTIMATE and LSMEANS statements. For ODS purposes, the<br />

new table names are “Type1,” “Type2,” and “Type3,” respectively.<br />

MMEQ<br />

requests that coefficients of the mixed model equations be displayed. <strong>The</strong>se are<br />

$$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\ Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1} \end{bmatrix}, \quad \begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}$$

assuming that $\hat{G}$ is nonsingular. If $\hat{G}$ is singular, PROC <strong>MIXED</strong> produces the following<br />

coefficients:<br />

$$\begin{bmatrix} X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z\hat{G} \\ \hat{G}Z'\hat{R}^{-1}X & \hat{G}Z'\hat{R}^{-1}Z\hat{G} + \hat{G} \end{bmatrix}, \quad \begin{bmatrix} X'\hat{R}^{-1}y \\ \hat{G}Z'\hat{R}^{-1}y \end{bmatrix}$$

See the section “Estimating Fixed and Random Effects in the Mixed Model” on page 3970<br />

for further information about these equations.<br />

MMEQSOL<br />

requests that a solution to the mixed model equations be produced, as well as the inverted<br />

coefficients matrix. Formulas for these equations are provided in the preceding description of<br />

the MMEQ option.<br />

When $\hat{G}$ is singular, $\hat{\tau}$ and a generalized inverse of the left-hand-side coefficient matrix are<br />

transformed by using $\hat{G}$ to produce $\hat{\gamma}$ and $\hat{C}$, respectively, where $\hat{C}$ is a generalized inverse<br />

of the left-hand-side coefficient matrix of the original equations.<br />
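Both displays can be requested in the same run, as in the following sketch. The data set heights and response Height are assumed names.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights mmeq mmeqsol;   /* equations and their solution */
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;
```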

NAMELEN< =number ><br />

specifies the length to which long effect names are shortened. <strong>The</strong> default and minimum value<br />

is 20.<br />

NOBOUND<br />

has the same effect as the NOBOUND option in the PARMS statement.<br />

NOCLPRINT< =number ><br />

suppresses the display of the “Class Level Information” table if you do not specify number.<br />

If you do specify number, only levels with totals that are less than number are listed in the<br />

table.<br />

NOINFO<br />

suppresses the display of the “Model Information,” “Dimensions,” and “Number of Observations”<br />

tables.<br />

NOITPRINT<br />

suppresses the display of the “Iteration History” table.<br />


NOPROFILE<br />

includes the residual variance as part of the Newton-Raphson iterations. This option applies<br />

only to models that have a residual variance parameter. By default, this parameter is profiled<br />

out of the likelihood calculations, except when you have specified the HOLD= option in the<br />

PARMS statement.<br />

ORD<br />

displays ordinates of the relevant distribution in addition to p-values. <strong>The</strong> ordinate can be<br />

viewed as an approximate odds ratio of hypothesis probabilities.<br />

ORDER=DATA<br />

ORDER=FORMATTED<br />

ORDER=FREQ<br />

ORDER=INTERNAL<br />

specifies the sorting order for the levels of all CLASS variables. This ordering determines<br />

which parameters in the model correspond to each level in the data, so the ORDER= option<br />

can be useful when you use CONTRAST or ESTIMATE statements.<br />

<strong>The</strong> default is ORDER=FORMATTED, and its behavior has been modified for <strong>SAS</strong> 8. When<br />

the default ORDER=FORMATTED is in effect for numeric variables for which you have supplied<br />

no explicit format, the levels are ordered by their internal values. In releases previous to<br />

<strong>SAS</strong> 8, numeric class levels with no explicit format were ordered by their BEST12. formatted<br />

values. In order to revert to the previous method you can specify this format explicitly for<br />

the CLASS variables. <strong>The</strong> change was implemented because the former default behavior for<br />

ORDER=FORMATTED often resulted in levels not being ordered numerically and required<br />

you to use an explicit format or ORDER=INTERNAL to get the more natural ordering.<br />

Table 56.4 shows how PROC <strong>MIXED</strong> interprets values of the ORDER= option.<br />

Table 56.4 Sort Order and Value of ORDER= Option<br />

Value of ORDER= | Levels Sorted By<br />

DATA | order of appearance in the input data set<br />

FORMATTED | external formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value<br />

FREQ | descending frequency count; levels with the most observations come first in the order<br />

INTERNAL | unformatted value<br />

For FORMATTED and INTERNAL, the sort order is machine dependent.<br />

For more information about sort order, see the chapter on the SORT procedure in the <strong>SAS</strong><br />

<strong>Procedure</strong>s <strong>Guide</strong> and the discussion of BY-group processing in <strong>SAS</strong> Language Reference:<br />

Concepts.
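Because the level order fixes the meaning of the coefficients in CONTRAST and ESTIMATE statements, it can help to set ORDER= explicitly whenever such statements are used. In the following sketch the data set heights and response Height are assumed names, and the label is arbitrary.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
proc mixed data=heights order=data;
   class Family Gender;
   model Height = Gender / s;
   random Family Family*Gender;
   /* The coefficients 1 and -1 apply to the Gender levels in the order */
   /* in which they first appear in the input data set                  */
   estimate 'Gender difference' Gender 1 -1;
run;
```

The “Class Level Information” table shows the resulting level order and is a convenient place to confirm which level each coefficient refers to.<br />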


PLOTS < (global-plot-options ) > < =plot-request < (options ) > ><br />


PLOTS < (global-plot-options ) > < = (plot-request< (options) >< . . . plot-request< (options) > >) ><br />

requests that the <strong>MIXED</strong> procedure produce statistical graphics via the Output Delivery System,<br />

provided that the ODS GRAPHICS statement has been specified. For general information<br />

about ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For examples<br />

of the basic statistical graphics produced by the <strong>MIXED</strong> procedure and aspects of their computation<br />

and interpretation, see the section “ODS Graphics” on page 3998.<br />

<strong>The</strong> global-plot-options apply to all relevant plots generated by the <strong>MIXED</strong> procedure. <strong>The</strong><br />

global-plot-options supported by the <strong>MIXED</strong> procedure follow.<br />

Global Plot Options<br />

OBSNO<br />

uses the data set observation number to identify observations in tooltips, provided that<br />

the observation number can be determined. Otherwise, the number displayed in tooltips<br />

is the index of the observation as it is used in the analysis within the BY group.<br />

ONLY<br />

suppresses the default plots. Only the plots specifically requested are produced.<br />

UNPACK<br />

breaks a graphic that is otherwise paneled into individual component plots.<br />
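A global plot option is placed in parentheses directly after the PLOTS keyword. The following sketch uses ONLY so that the residual panel is the sole graph produced; the data set heights and response Height are assumed names.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
ods graphics on;

proc mixed data=heights plots(only)=residualpanel;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

ods graphics off;
```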

Specific Plot Options<br />

<strong>The</strong> following listing describes the specific plots and their options.<br />

ALL<br />

requests that all plots appropriate for the particular analysis be produced.<br />

BOXPLOT < (boxplot-options) ><br />

requests box plots for the effects in your model that consist of classification effects only.<br />

Note that these effects can involve more than one classification variable (interaction<br />

and nested effects), but they cannot contain any continuous variables. By default, the<br />

BOXPLOT request produces box plots based on (conditional) raw residuals for the<br />

qualifying effects in the MODEL, RANDOM, and REPEATED statements. See the<br />

discussion of the boxplot-options in a later section for information about how to tune<br />

your box plot request.<br />

DISTANCE< (USEINDEX) ><br />

requests a plot of the likelihood or restricted likelihood distance. When influence diagnostics<br />

are requested with set selection according to an effect, the USEINDEX option<br />

enables you to replace the formatted tick values on the horizontal axis with integer indices<br />

of the effect levels in order to reduce the space taken up by the horizontal plot<br />

axis.



INFLUENCEESTPLOT< (options) ><br />

requests panels of the deletion estimates in an influence analysis, provided that the<br />

INFLUENCE option is specified in the MODEL statement. No plots are produced for<br />

fixed-effects parameters associated with singular columns in the X matrix or for covariance<br />

parameters associated with singularities in the ASYCOV matrix. By default,<br />

separate panels are produced for the deletion estimates of the fixed-effects parameters and of the covariance parameters.<br />

<strong>The</strong> FIXED and RANDOM options enable you to select these specific panels.<br />

<strong>The</strong> UNPACK option produces separate plots for each of the parameter estimates. <strong>The</strong><br />

USEINDEX option replaces formatted tick values for the horizontal axis with integer<br />

indices.<br />

INFLUENCESTATPANEL< (options) ><br />

requests panels of influence statistics. For iterative influence analysis (see the<br />

INFLUENCE option in the MODEL statement), the panel shows the Cook’s D and<br />

CovRatio statistics for fixed-effects and covariance parameters, enabling you to gauge<br />

impact on estimates and precision for both types of estimates. In noniterative analysis,<br />

only statistics for the fixed effects are plotted. <strong>The</strong> UNPACK option produces separate<br />

plots from the elements in the panel. <strong>The</strong> USEINDEX option replaces formatted tick<br />

values for the horizontal axis with integer indices.<br />
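The influence plots depend on the INFLUENCE option in the MODEL statement. The following sketch requests an iterative, set-deletion analysis by family; the data set heights, the response Height, and the choice of five iterations are assumptions made for illustration.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
ods graphics on;

proc mixed data=heights plots=(influencestatpanel influenceestplot);
   class Family Gender;
   /* Delete one family at a time, re-estimate with up to 5 iterations, */
   /* and keep the deletion estimates                                    */
   model Height = Gender / influence(effect=Family iter=5 est);
   random Family Family*Gender;
run;

ods graphics off;
```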

RESIDUALPANEL < (residualplot-options) ><br />

requests a panel of raw residuals. By default, the conditional residuals are produced.<br />

See the discussion of residualplot-options in a later section for information about how<br />

to tune this panel.<br />

STUDENTPANEL < (residualplot-options) ><br />

requests a panel of studentized residuals. By default, the conditional residuals are produced.<br />

See the discussion of residualplot-options in a later section for information<br />

about how to tune this panel.<br />

PEARSONPANEL < (residualplot-options) ><br />

requests a panel of Pearson residuals. By default, the conditional residuals are produced.<br />

See the discussion of residualplot-options in a later section for information<br />

about how to tune this panel.<br />

PRESS< (USEINDEX) ><br />

requests a plot of PRESS residuals or PRESS statistics. <strong>The</strong>se are based on “leave-one-out”<br />

or “leave-set-out” prediction of the marginal mean. When influence diagnostics<br />

are requested with set selection according to an effect, the USEINDEX option enables<br />

you to replace the formatted tick values on the horizontal axis with integer indices of<br />

the effect levels in order to reduce the space taken up by the horizontal plot axis.<br />

VCIRYPANEL < (residualplot-options) ><br />

requests a panel of residual graphics based on the scaled residuals. See the VCIRY<br />

option in the MODEL statement for details about these scaled residuals. Only the<br />

UNPACK and BOX options of the residualplot-options are available for this type of<br />

residual panel.<br />

NONE<br />

suppresses all plots.


Residual Plot Options<br />


<strong>The</strong> residualplot-options determine both the composition of the panels and the type of<br />

residuals being plotted.<br />

BOX<br />

BOXPLOT<br />

replaces the inset of summary statistics in the lower-right corner of the panel with<br />

a box plot of the residual (the “PROC GLIMMIX look”).<br />

CONDITIONAL<br />

BLUP<br />

constructs plots from conditional residuals.<br />

MARGINAL<br />

NOBLUP<br />

constructs plots from marginal residuals.<br />

UNPACK<br />

produces separate plots from the elements of the panel. <strong>The</strong> inset statistics are<br />

not part of the unpack operation.<br />
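Residual plot options are given in parentheses after the individual plot request. The following sketch asks for a panel of marginal raw residuals and a studentized panel in which a box plot replaces the inset; the data set heights and response Height are assumed names.<br />

```sas
/* Assumed names: data set HEIGHTS, response variable HEIGHT */
ods graphics on;

proc mixed data=heights plots=(residualpanel(marginal) studentpanel(box));
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

ods graphics off;
```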

Box Plot Options<br />

<strong>The</strong> boxplot-options determine whether box plots are produced for residuals or for<br />

residuals and observed values, and for which model effects the box plots are constructed.<br />

<strong>The</strong> available boxplot-options are as follows.<br />

CONDITIONAL<br />

BLUP<br />

constructs box plots from conditional residuals (that is, residuals using the estimated<br />

BLUPs of random effects).<br />

FIXED<br />

produces box plots for all fixed effects (MODEL statement) consisting entirely<br />

of classification variables.<br />

GROUP<br />

produces box plots for all GROUP= effects (RANDOM and REPEATED statements)<br />

consisting entirely of classification variables.<br />

MARGINAL<br />

NOBLUP<br />

constructs box plots from marginal residuals.



NPANEL=number<br />

provides the ability to break a box plot into multiple graphics. If number is<br />

negative, no balancing of the number of boxes takes place and number is the<br />

maximum number of boxes per graphic. If number is positive, the number of<br />

boxes per graphic is balanced. For example, suppose variable A has 125 levels,<br />

and consider the following statements:<br />

ods graphics on;<br />

proc mixed plots=boxplot(npanel=20);<br />

class A;<br />

model y = A;<br />

run;<br />

<strong>The</strong> box balancing results in six plots with 18 boxes each and one plot with<br />

17 boxes. If number is zero, and this is the default, all levels of the effect are<br />

displayed in a single plot.<br />

OBSERVED<br />

adds box plots of the observed data for the selected effects.<br />

RANDOM<br />

produces box plots for all random effects (RANDOM statement) consisting entirely<br />

of classification variables. This does not include effects specified in the<br />

GROUP= or SUBJECT= options of the RANDOM statement.<br />

REPEATED<br />

produces box plots for the repeated effects (REPEATED statement). This does<br />

not include effects specified in the GROUP= or SUBJECT= options of the<br />

REPEATED statement.<br />

STUDENT<br />

constructs box plots from studentized residuals rather than from raw residuals.<br />

SUBJECT<br />

produces box plots for all SUBJECT= effects (RANDOM and REPEATED statement)<br />

consisting entirely of classification variables.<br />

USEINDEX<br />

uses as the horizontal axis label the index of the effect level rather than the formatted<br />

value(s). For classification variables with many levels or model effects<br />

that involve multiple classification variables, the formatted values identifying the<br />

effect levels can take up too much space as axis tick values, leading to extensive<br />

thinning. <strong>The</strong> USEINDEX option replaces tick values constructed from formatted<br />

values with the internal level number.


Multiple Plot Request<br />

You can list a plot request one or more times with different options. For example, the following<br />

statements request a panel of marginal raw residuals, individual plots generated from a<br />

panel of the conditional raw residuals, and a panel of marginal studentized residuals:<br />

ods graphics on;<br />

proc mixed plots(only)=(<br />

ResidualPanel(marginal)<br />

ResidualPanel(unpack conditional)<br />

StudentPanel(marginal box));<br />

<strong>The</strong> inset of residual statistics is replaced in this last panel by a box plot of the studentized<br />

residuals. Similarly, if you specify the INFLUENCE option in the MODEL statement, then<br />

the following statements request statistical graphics of fixed-effects deletion estimates (in a<br />

panel), covariance parameter deletion estimates (unpacked in individual plots), and box plots<br />

for the SUBJECT= and fixed classification effects based on residuals and observed values:<br />

ods graphics on / imagefmt=staticmap;<br />

proc mixed plots(only)=(<br />

InfluenceEstPlot(fixed)<br />

InfluenceEstPlot(random unpack)<br />

BoxPlot(observed fixed subject));<br />

<strong>The</strong> STATICMAP image format enables tooltips that show, for example, values of influence<br />

diagnostics associated with a particular deletion estimate.<br />

This concludes the syntax section for the PLOTS= option in the PROC <strong>MIXED</strong> statement.<br />

RATIO<br />

produces the ratio of the covariance parameter estimates to the estimate of the residual variance<br />

when the latter exists in the model.<br />

RIDGE=number<br />

specifies the starting value for the minimum ridge value used in the Newton-Raphson algorithm.<br />

<strong>The</strong> default is 0.3125.<br />

SCORING< =number ><br />

requests that Fisher scoring be used in association with the estimation method up to iteration<br />

number, which is 0 by default. When you use the SCORING= option and PROC <strong>MIXED</strong><br />

converges without stopping the scoring algorithm, PROC <strong>MIXED</strong> uses the expected Hessian<br />

matrix to compute approximate standard errors for the covariance parameters instead of<br />

the observed Hessian. <strong>The</strong> output from the ASYCOV and ASYCORR options is similarly<br />

adjusted.<br />

SIGITER<br />

is an alias for the NOPROFILE option.<br />

UPDATE<br />

is an alias for the LOGNOTE option.
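As a brief illustration of how some of the preceding PROC <strong>MIXED</strong> statement options combine, the following sketch requests Fisher scoring for the first five iterations and displays the covariance parameter ratios and the asymptotic covariance matrix. It is not taken from the guide: the data set work.trial and the variables y, trt, and block are hypothetical names used only to make the statements complete.<br />

```sas
/* Sketch only: work.trial, y, trt, and block are hypothetical names. */
proc mixed data=work.trial scoring=5 ratio asycov;
   class trt block;
   model y = trt;
   random block;   /* RATIO reports the block variance relative to the residual variance */
run;
```

With SCORING=5, the expected Hessian is used for the standard errors only if the procedure converges while scoring is still in effect, as described for the SCORING= option.<br />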



BY Statement<br />

BY variables ;<br />

You can specify a BY statement with PROC <strong>MIXED</strong> to obtain separate analyses on observations in<br />

groups defined by the BY variables. When a BY statement appears, the procedure expects the input<br />

data set to be sorted in order of the BY variables. <strong>The</strong> variables are one or more variables in the<br />

input data set.<br />

If your input data set is not sorted in ascending order, use one of the following alternatives:<br />

Sort the data by using the SORT procedure with a similar BY statement.<br />

Specify the BY statement options NOTSORTED or DESCENDING in the BY statement for<br />

the <strong>MIXED</strong> procedure. <strong>The</strong> NOTSORTED option does not mean that the data are unsorted<br />

but rather that the data are arranged in groups (according to values of the BY variables) and<br />

that these groups are not necessarily in alphabetical or increasing numeric order.<br />

Create an index on the BY variables by using the DATASETS procedure (in Base <strong>SAS</strong> software).<br />

Because sorting the data changes the order in which PROC <strong>MIXED</strong> reads observations, the sorting<br />

order for the levels of the CLASS variable might be affected if you have specified ORDER=DATA in<br />

the PROC <strong>MIXED</strong> statement. This, in turn, affects specifications in the CONTRAST or ESTIMATE<br />

statement.<br />

For more information about the BY statement, see <strong>SAS</strong> Language Reference: Concepts. For more<br />

information about the DATASETS procedure, see the Base <strong>SAS</strong> <strong>Procedure</strong>s <strong>Guide</strong>.<br />
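The three alternatives listed above might look as follows in practice. This is a sketch under stated assumptions, not an example from the guide: the data set work.trial and the variables site, trt, and y are hypothetical.<br />

```sas
/* Sketch only: work.trial, site, trt, and y are hypothetical names. */

/* Alternative 1: sort by the BY variable, then analyze each site separately. */
proc sort data=work.trial out=work.trial_sorted;
   by site;
run;

proc mixed data=work.trial_sorted;
   by site;
   class trt;
   model y = trt;
run;

/* Alternative 2: the data are already grouped by site, */
/* but the groups are not in ascending order.           */
proc mixed data=work.trial;
   by site notsorted;
   class trt;
   model y = trt;
run;

/* Alternative 3: create an index on the BY variable instead of sorting. */
proc datasets library=work nolist;
   modify trial;
   index create site;
quit;
```

If ORDER=DATA is in effect, keep in mind that the first alternative changes the order in which observations are read, as noted above.<br />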

CLASS Statement<br />

CLASS variables ;<br />

<strong>The</strong> CLASS statement names the classification variables to be used in the analysis. If the CLASS<br />

statement is used, it must appear before the MODEL statement.<br />

Classification variables can be either character or numeric. By default, class levels are determined<br />

from the entire formatted values of the CLASS variables. Note that this represents a slight change<br />

from previous releases in the way in which class levels are determined. In releases prior to <strong>SAS</strong> ® 9,<br />

class levels were determined by using no more than the first 16 characters of the formatted values.<br />

If you want to revert to this previous behavior, you can use the TRUNCATE option in the CLASS<br />

statement. In any case, you can use formats to group values into levels. See the discussion of<br />

the FORMAT procedure in the Base <strong>SAS</strong> <strong>Procedure</strong>s <strong>Guide</strong> and the discussions of the FORMAT<br />

statement and <strong>SAS</strong> formats in <strong>SAS</strong> Language Reference: Dictionary. You can adjust the order of<br />

CLASS variable levels with the ORDER= option in the PROC <strong>MIXED</strong> statement.<br />

You can specify the following option in the CLASS statement after a slash (/):



TRUNCATE<br />

specifies that class levels should be determined by using no more than the first 16 characters<br />

of the formatted values of CLASS variables. When formatted values are longer than 16<br />

characters, you can use this option in order to revert to the levels as determined in releases<br />

previous to <strong>SAS</strong> ® 9.<br />
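A minimal sketch of the TRUNCATE option follows. The data set work.trial and the variables clinic_name and y are hypothetical; the example assumes that clinic_name has formatted values longer than 16 characters.<br />

```sas
/* Sketch only: work.trial, clinic_name, and y are hypothetical names. */
proc mixed data=work.trial;
   class clinic_name / truncate;   /* levels are formed from the first 16 formatted characters */
   model y = clinic_name;
run;
```

Values of clinic_name that agree in their first 16 formatted characters would then be treated as the same class level.<br />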

CONTRAST Statement<br />

CONTRAST 'label' < fixed-effect values . . . ><br />

< | random-effect values . . . >, . . . < / options > ;<br />

<strong>The</strong> CONTRAST statement provides a mechanism for obtaining custom hypothesis tests. It is<br />

patterned after the CONTRAST statement in PROC GLM, although it has been extended to include<br />

random effects. This enables you to select an appropriate inference space (McLean, Sanders, and<br />

Stroup 1991).<br />

You can test the hypothesis $L'\phi = 0$, where $L' = (K' \;\; M')$ and $\phi' = (\beta' \;\; \gamma')$, in several inference<br />

spaces. <strong>The</strong> inference space corresponds to the choice of M. When $M = 0$, your inferences apply<br />

to the entire population from which the random effects are sampled; this is known as the broad<br />

inference space. When all elements of M are nonzero, your inferences apply only to the observed<br />

levels of the random effects. This is known as the narrow inference space, and you can also choose<br />

it by specifying all of the random effects as fixed. <strong>The</strong> GLM procedure uses the narrow inference<br />

space. Finally, by setting to zero the portions of M corresponding to selected main effects and<br />

interactions, you can choose intermediate inference spaces. <strong>The</strong> broad inference space is usually<br />

the most appropriate, and it is used when you do not specify any random effects in the CONTRAST<br />

statement.<br />

<strong>The</strong> CONTRAST statement has the following arguments:<br />

label identifies the contrast in the table. A label is required for every contrast specified.<br />

Labels can be up to 200 characters and must be enclosed in quotes.<br />

fixed-effect identifies an effect that appears in the MODEL statement. <strong>The</strong> keyword INTERCEPT<br />

can be used as an effect when an intercept is fitted in the model. You do<br />

not need to include all effects that are in the MODEL statement.<br />

random-effect identifies an effect that appears in the RANDOM statement. <strong>The</strong> first random<br />

effect must follow a vertical bar (|); however, random effects do not have to be<br />

specified.<br />

values are constants that are elements of the L matrix associated with the fixed and<br />

random effects.<br />

<strong>The</strong> rows of $L'$ are specified in order and are separated by commas. <strong>The</strong> rows of the $K'$ component<br />

of $L'$ are specified on the left side of the vertical bars (|). These rows test the fixed effects and are,<br />

therefore, checked for estimability. <strong>The</strong> rows of the $M'$ component of $L'$ are specified on the right<br />

side of the vertical bars. They test the random effects, and no estimability checking is necessary.



If PROC <strong>MIXED</strong> finds the fixed-effects portion of the specified contrast to be nonestimable (see the<br />

SINGULAR= option), then it displays a message in the log.<br />

<strong>The</strong> following CONTRAST statement reproduces the F test for the effect A in the split-plot example<br />

(see Example 56.1):<br />

contrast 'A broad'<br />

A 1 -1 0 A*B .5 .5 -.5 -.5 0 0 ,<br />

A 1 0 -1 A*B .5 .5 0 0 -.5 -.5 / df=6;<br />

Note that no random effects are specified in the preceding contrast; thus, the inference space is<br />

broad. <strong>The</strong> resulting F test has two numerator degrees of freedom because $L'$ has two rows. <strong>The</strong><br />

denominator degrees of freedom is, by default, the residual degrees of freedom (9), but the DF=<br />

option changes the denominator degrees of freedom to 6.<br />

<strong>The</strong> following CONTRAST statement reproduces the F test for A when Block and A*Block are considered<br />

fixed effects (the narrow inference space):<br />

contrast 'A narrow'<br />

A 1 -1 0<br />

A*B .5 .5 -.5 -.5 0 0 |<br />

A*Block .25 .25 .25 .25<br />

-.25 -.25 -.25 -.25<br />

0 0 0 0 ,<br />

A 1 0 -1<br />

A*B .5 .5 0 0 -.5 -.5 |<br />

A*Block .25 .25 .25 .25<br />

0 0 0 0<br />

-.25 -.25 -.25 -.25 ;<br />

<strong>The</strong> preceding contrast does not contain coefficients for B and Block, because they cancel out in<br />

estimated differences between levels of A. Coefficients for B and Block are necessary to estimate the<br />

mean of one of the levels of A in the narrow inference space (see Example 56.1).<br />

If the elements of L are not specified for an effect that contains a specified effect, then the elements<br />

of the specified effect are automatically “filled in” over the levels of the higher-order effect. This<br />

feature is designed to preserve estimability for cases where there are complex higher-order effects.<br />

<strong>The</strong> coefficients for the higher-order effect are determined by equitably distributing the coefficients<br />

of the lower-level effect, as in the construction of least squares means. In addition, if the intercept<br />

is specified, it is distributed over all classification effects that are not contained by any other<br />

specified effect. If an effect is not specified and does not contain any specified effects, then all of<br />

its coefficients in L are set to 0. You can override this behavior by specifying coefficients for the<br />

higher-order effect.<br />

If too many values are specified for an effect, the extra ones are ignored; if too few are specified,<br />

the remaining ones are set to 0. If no random effects are specified, the vertical bar can be omitted;<br />

otherwise, it must be present. If a SUBJECT effect is used in the RANDOM statement, then the<br />

coefficients specified for the effects in the RANDOM statement are equitably distributed across the<br />

levels of the SUBJECT effect. You can use the E option to see exactly which L matrix is used.<br />
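The fill-in rule can be checked with the E option. The following sketch assumes the split-plot setup used in the preceding contrasts (three levels of A, two levels of B, four blocks); the data set name sp and the response name Y are assumptions. Only the A coefficients are given, so by the rule described above the A*B coefficients are expected to be filled in as .5 .5 -.5 -.5 0 0.<br />

```sas
/* Sketch only: the data set name sp and the response Y are assumed. */
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   /* A*B is not specified, so its coefficients are filled in from A. */
   /* The E option displays the L matrix that is actually used.       */
   contrast 'A1 vs A2' A 1 -1 0 / e;
run;
```

Comparing the displayed coefficients with the first row of the 'A broad' contrast is a quick way to confirm how an abbreviated specification was expanded.<br />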

<strong>The</strong> SUBJECT and GROUP options in the CONTRAST statement are useful for the case when a<br />

SUBJECT= or GROUP= variable appears in the RANDOM statement, and you want to contrast



different subjects or groups. By default, CONTRAST statement coefficients on random effects are<br />

distributed equally across subjects and groups.<br />

PROC <strong>MIXED</strong> handles missing level combinations of classification variables similarly to the way<br />

PROC GLM does. Both procedures delete fixed-effects parameters corresponding to missing levels<br />

in order to preserve estimability. However, PROC <strong>MIXED</strong> does not delete missing level combinations<br />

for random-effects parameters because linear combinations of the random-effects parameters<br />

are always estimable. <strong>The</strong>se conventions can affect the way you specify your CONTRAST coefficients.<br />

<strong>The</strong> CONTRAST statement computes the statistic<br />

$$F = \frac{\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}' L \, (L'\hat{C}L)^{-1} L' \begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}}{r}$$<br />

where $r = \mathrm{rank}(L'\hat{C}L)$, and approximates its distribution with an F distribution. In this expression,<br />

$\hat{C}$ is an estimate of the generalized inverse of the coefficient matrix in the mixed model equations.<br />

See the section “Inference and Test Statistics” on page 3972 for more information about this F<br />

statistic.<br />

<strong>The</strong> numerator degrees of freedom in the F approximation are $r = \mathrm{rank}(L'\hat{C}L)$, and the denominator<br />

degrees of freedom are taken from the “Tests of Fixed Effects” table and correspond to the final<br />

effect you list in the CONTRAST statement. You can change the denominator degrees of freedom<br />

by using the DF= option.<br />

You can specify the following options in the CONTRAST statement after a slash (/).<br />

CHISQ<br />

requests that chi-square tests be performed in addition to any F tests. A chi-square statistic<br />

equals its corresponding F statistic times the associated numerator degrees of freedom, and<br />

the same degrees of freedom are used to compute the p-value for the chi-square test. This<br />

p-value is always less than that for the F test, as it effectively corresponds to an F test with<br />

infinite denominator degrees of freedom.<br />

DF=number<br />

specifies the denominator degrees of freedom for the F test. <strong>The</strong> default is the denominator<br />

degrees of freedom taken from the “Tests of Fixed Effects” table and corresponds to the final<br />

effect you list in the CONTRAST statement.<br />

E<br />

requests that the L matrix coefficients for the contrast be displayed. For ODS purposes, the<br />

label of this “L Matrix Coefficients” table is “Coef.”<br />

GROUP coeffs<br />

GRP coeffs<br />

sets up random-effect contrasts between different groups when a GROUP= variable appears in<br />

the RANDOM statement. By default, CONTRAST statement coefficients on random effects<br />

are distributed equally across groups.



SINGULAR=number<br />

tunes the estimability checking. If v is a vector, define ABS(v) to be the absolute value of the<br />

element of v with the largest absolute value. If ABS($K' - K'T$) is greater than C*number for<br />

any row of $K'$ in the contrast, then $K$ is declared nonestimable. Here $T$ is the Hermite form<br />

matrix $(X'X)^{-}X'X$, and C is ABS($K'$) except when it equals 0, and then C is 1. <strong>The</strong> value for<br />

number must be between 0 and 1; the default is 1E-4.<br />

SUBJECT coeffs<br />

SUB coeffs<br />

sets up random-effect contrasts between different subjects when a SUBJECT= variable appears<br />

in the RANDOM statement. By default, CONTRAST statement coefficients on random<br />

effects are distributed equally across subjects.<br />
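As an illustration of the options just listed, the following sketch adds CHISQ and DF= to the broad-inference contrast shown earlier. The data set name sp and the response name Y are assumptions; the model statements follow the split-plot layout that the contrast coefficients imply.<br />

```sas
/* Sketch only: the data set name sp and the response Y are assumed. */
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   contrast 'A broad'
            A 1 -1 0  A*B .5 .5 -.5 -.5 0 0,
            A 1 0 -1  A*B .5 .5 0 0 -.5 -.5 / chisq df=6;
run;
```

Because this contrast has two rows, the chi-square statistic requested by CHISQ equals twice the F statistic, in line with the description of the CHISQ option.<br />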

ESTIMATE Statement<br />

ESTIMATE 'label' < fixed-effect values . . . ><br />

< | random-effect values . . . >< / options > ;<br />

<strong>The</strong> ESTIMATE statement is exactly like a CONTRAST statement, except only one-row L matrices<br />

are permitted. <strong>The</strong> actual estimate, $L'\hat{p}$, is displayed along with its approximate standard error. An<br />

approximate t test that $L'\hat{p} = 0$ is also produced.<br />

PROC <strong>MIXED</strong> selects the degrees of freedom to match those displayed in the “Tests of Fixed<br />

Effects” table for the final effect you list in the ESTIMATE statement. You can modify the degrees<br />

of freedom by using the DF= option.<br />

If PROC <strong>MIXED</strong> finds the fixed-effects portion of the specified estimate to be nonestimable, then<br />

it displays “Non-est” for the estimate entries.<br />

<strong>The</strong> following examples of ESTIMATE statements compute the mean of the first level of A in the<br />

split-plot example (see Example 56.1) for various inference spaces:<br />

estimate 'A1 mean narrow' intercept 1<br />

A 1 B .5 .5 A*B .5 .5 |<br />

block .25 .25 .25 .25<br />

A*Block .25 .25 .25 .25<br />

0 0 0 0<br />

0 0 0 0;<br />

estimate 'A1 mean intermed' intercept 1<br />

A 1 B .5 .5 A*B .5 .5 |<br />

Block .25 .25 .25 .25;<br />

estimate 'A1 mean broad' intercept 1<br />

A 1 B .5 .5 A*B .5 .5;<br />

<strong>The</strong> construction of the L vector for an ESTIMATE statement follows the same rules as listed under<br />

the CONTRAST statement.<br />

You can specify the following options in the ESTIMATE statement after a slash (/).



ALPHA=number<br />

requests that a t-type confidence interval be constructed with confidence level 1 − number.<br />

<strong>The</strong> value of number must be between 0 and 1; the default is 0.05.<br />

CL<br />

requests that t-type confidence limits be constructed. <strong>The</strong> confidence level is 0.95 by default;<br />

this can be changed with the ALPHA= option.<br />

DF=number<br />

specifies the degrees of freedom for the t test and confidence limits. <strong>The</strong> default is the denominator<br />

degrees of freedom taken from the “Tests of Fixed Effects” table and corresponds<br />

to the final effect you list in the ESTIMATE statement.<br />

DIVISOR=number<br />

specifies a value by which to divide all coefficients so that fractional coefficients can be entered<br />

as integer numerators.<br />

E<br />

requests that the L matrix coefficients be displayed. For ODS purposes, the name of this “L<br />

Matrix Coefficients” table is “Coef.”<br />

GROUP coeffs<br />

GRP coeffs<br />

sets up random-effect contrasts between different groups when a GROUP= variable appears<br />

in the RANDOM statement. By default, ESTIMATE statement coefficients on random effects<br />

are distributed equally across groups.<br />

LOWER<br />

LOWERTAILED<br />

requests that the p-value for the t test be based only on values less than the t statistic. A<br />

two-tailed test is the default. A lower-tailed confidence limit is also produced if you specify<br />

the CL option.<br />

SINGULAR=number<br />

tunes the estimability checking as documented for the SINGULAR= option in the<br />

CONTRAST statement.<br />

SUBJECT coeffs<br />

SUB coeffs<br />

sets up random-effect contrasts between different subjects when a SUBJECT= variable appears<br />

in the RANDOM statement. By default, ESTIMATE statement coefficients on random<br />

effects are distributed equally across subjects. For example, the ESTIMATE statement in the<br />

following code from Example 56.5 constructs the difference between the random slopes of<br />

the first two batches.<br />

proc mixed data=rc;<br />

class batch;<br />

model y = month / s;<br />

random int month / type=un sub=batch s;<br />

estimate 'slope b1 - slope b2' | month 1 / subject 1 -1;<br />

run;



UPPER<br />

UPPERTAILED<br />

requests that the p-value for the t test be based only on values greater than the t statistic. A<br />

two-tailed test is the default. An upper-tailed confidence limit is also produced if you specify<br />

the CL option.<br />
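The DIVISOR=, CL, and ALPHA= options can be combined in a single ESTIMATE statement. The following sketch restates the 'A1 mean broad' estimate with integer coefficients over a common divisor of 2 and requests 90% confidence limits. The data set name sp and the response name Y are assumptions.<br />

```sas
/* Sketch only: the data set name sp and the response Y are assumed. */
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   /* Same L vector as intercept 1  A 1  B .5 .5  A*B .5 .5, */
   /* entered as integers that are then divided by 2.         */
   estimate 'A1 mean broad' intercept 2
            A 2  B 1 1  A*B 1 1 / divisor=2 cl alpha=0.1;
run;
```

Entering integer numerators in this way avoids rounding when the intended coefficients are fractions such as 1/3 that have no exact decimal form.<br />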

ID Statement<br />

ID variables ;<br />

<strong>The</strong> ID statement specifies which variables from the input data set are to be included in the OUTP=<br />

and OUTPM= data sets from the MODEL statement. If you do not specify an ID statement, then<br />

all variables are included in these data sets. Otherwise, only the variables you list in the ID statement<br />

are included. Specifying an ID statement with no variables prevents any variables from being<br />

included in these data sets.<br />
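The following sketch shows one way the ID statement might be used to keep the prediction data sets small. The data set work.trial and the variables subj, trt, visit, and y are hypothetical.<br />

```sas
/* Sketch only: work.trial, subj, trt, visit, and y are hypothetical names. */
proc mixed data=work.trial;
   class trt subj;
   model y = trt / outp=work.pred outpm=work.predm;
   random subj;
   /* Only these input variables are carried into the OUTP= and OUTPM= data sets. */
   id subj visit;
run;
```

Listing the identifying variables explicitly is convenient when the input data set contains many columns that are not needed alongside the predicted values.<br />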

LSMEANS Statement<br />

LSMEANS fixed-effects < / options > ;<br />

<strong>The</strong> LSMEANS statement computes least squares means (LS-means) of fixed effects. As in the<br />

GLM procedure, LS-means are predicted population margins—that is, they estimate the marginal<br />

means over a balanced population. In a sense, LS-means are to unbalanced designs as class and<br />

subclass arithmetic means are to balanced designs. <strong>The</strong> L matrix constructed to compute them is<br />

the same as the L matrix formed in PROC GLM; however, the standard errors are adjusted for the<br />

covariance parameters in the model.<br />

Each LS-mean is computed as $L\hat{\beta}$, where $L$ is the coefficient matrix associated with the least squares<br />

mean and $\hat{\beta}$ is the estimate of the fixed-effects parameter vector (see the section “Estimating Fixed<br />

and Random Effects in the Mixed Model” on page 3970). <strong>The</strong> approximate standard error for the<br />

LS-mean is computed as the square root of $L(X'\hat{V}^{-1}X)^{-}L'$.<br />

LS-means can be computed for any effect in the MODEL statement that involves CLASS variables.<br />

You can specify multiple effects in one LSMEANS statement or in multiple LSMEANS statements,<br />

and all LSMEANS statements must appear after the MODEL statement. As in the ESTIMATE<br />

statement, the L matrix is tested for estimability, and if this test fails, PROC <strong>MIXED</strong> displays<br />

“Non-est” for the LS-means entries.<br />

Assuming the LS-mean is estimable, PROC <strong>MIXED</strong> constructs an approximate t test to test the null<br />

hypothesis that the associated population quantity equals zero. By default, the denominator degrees<br />

of freedom for this test are the same as those displayed for the effect in the “Tests of Fixed Effects”<br />

table (see the section “Default Output” on page 3989).<br />

Table 56.5 summarizes important options in the LSMEANS statement. All LSMEANS options are<br />

subsequently discussed in alphabetical order.


Table 56.5 Summary of Important LSMEANS Statement Options<br />

Option Description<br />

Construction and Computation of LS-Means<br />

AT modifies covariate value in computing LS-means<br />

BYLEVEL computes separate margins<br />

DIFF requests differences of LS-means<br />

OM specifies weighting scheme for LS-mean computation<br />

SINGULAR= tunes estimability checking<br />

SLICE= partitions F tests (simple effects)<br />

Degrees of Freedom and P-values<br />

ADJDFE= determines whether to compute row-wise denominator degrees of freedom with DDFM=SATTERTHWAITE or DDFM=KENWARDROGER<br />

ADJUST= determines the method for multiple comparison adjustment of LS-mean differences<br />

ALPHA=α determines the confidence level (1 − α)<br />

DF= assigns specific value to degrees of freedom for tests and confidence limits<br />

Statistical Output<br />

CL constructs confidence limits for means and/or mean differences<br />

CORR displays correlation matrix of LS-means<br />

COV displays covariance matrix of LS-means<br />

E prints the L matrix<br />

You can specify the following options in the LSMEANS statement after a slash (/).<br />

ADJDFE=SOURCE<br />

ADJDFE=ROW<br />

specifies how denominator degrees of freedom are determined when p-values and confidence<br />

limits are adjusted for multiple comparisons with the ADJUST= option. When you do not<br />

specify the ADJDFE= option, or when you specify ADJDFE=SOURCE, the denominator<br />

degrees of freedom for multiplicity-adjusted results are the denominator degrees of freedom<br />

for the LS-mean effect in the “Type 3 Tests of Fixed Effects” table. When you specify<br />

ADJDFE=ROW, the denominator degrees of freedom for multiplicity-adjusted results correspond<br />

to the degrees of freedom displayed in the DF column of the “Differences of Least Squares<br />

Means” table.<br />

<strong>The</strong> ADJDFE=ROW setting is particularly useful if you want multiplicity adjustments to<br />

take into account that denominator degrees of freedom are not constant across LS-mean<br />

differences. This can be the case, for example, when the DDFM=SATTERTHWAITE or<br />

DDFM=KENWARDROGER degrees-of-freedom method is in effect.<br />

In one-way models with heterogeneous variance, combining certain ADJUST= options with<br />

the ADJDFE=ROW option corresponds to particular methods of performing multiplicity adjustments<br />

in the presence of heteroscedasticity. For example, the following statements fit a


heteroscedastic one-way model and perform Dunnett’s T3 method (Dunnett 1980), which is<br />

based on the studentized maximum modulus (ADJUST=SMM):<br />

proc mixed;<br />

class A;<br />

model y = A / ddfm=satterth;<br />

repeated / group=A;<br />

lsmeans A / adjust=smm adjdfe=row;<br />

run;<br />

If you combine the ADJDFE=ROW option with ADJUST=SIDAK, the multiplicity adjustment<br />

corresponds to the T2 method of Tamhane (1979), while ADJUST=TUKEY<br />

corresponds to the method of Games-Howell (Games and Howell 1976). Note that<br />

ADJUST=TUKEY gives the exact results for the case of fractional degrees of freedom in<br />

the one-way model, but it does not take into account that the degrees of freedom are subject<br />

to variability. A more conservative method, such as ADJUST=SMM, might protect the<br />

overall error rate better.<br />

Unless the ADJUST= option of the LSMEANS statement is specified, the ADJDFE= option<br />

has no effect.<br />

ADJUST=BON<br />

ADJUST=DUNNETT<br />

ADJUST=SCHEFFE<br />

ADJUST=SIDAK<br />

ADJUST=SIMULATE< (sim-options) ><br />

ADJUST=SMM | GT2<br />

ADJUST=TUKEY<br />

requests a multiple comparison adjustment for the p-values and confidence limits for the<br />

differences of LS-means. By default, PROC <strong>MIXED</strong> adjusts all pairwise differences unless<br />

you specify ADJUST=DUNNETT, in which case PROC <strong>MIXED</strong> analyzes all differences<br />

with a control level. <strong>The</strong> ADJUST= option implies the DIFF option.<br />

<strong>The</strong> BON (Bonferroni) and SIDAK adjustments involve correction factors described in Chapter<br />

39, “<strong>The</strong> GLM <strong>Procedure</strong>,” and Chapter 58, “<strong>The</strong> MULTTEST <strong>Procedure</strong>;” also see Westfall<br />

and Young (1993) and Westfall et al. (1999). When you specify ADJUST=TUKEY<br />

and your data are unbalanced, PROC <strong>MIXED</strong> uses the approximation described in Kramer<br />

(1956). Similarly, when you specify ADJUST=DUNNETT and the LS-means are correlated,<br />

PROC <strong>MIXED</strong> uses the factor-analytic covariance approximation described in Hsu (1992).<br />

<strong>The</strong> preceding references also describe the SCHEFFE and SMM adjustments.<br />

<strong>The</strong> SIMULATE adjustment computes adjusted p-values and confidence limits from the simulated<br />

distribution of the maximum or maximum absolute value of a multivariate t random<br />

vector. All covariance parameters except the residual variance are fixed at their estimated<br />

values throughout the simulation, potentially resulting in some underdispersion. <strong>The</strong> simulation<br />

estimates $q$, the true $(1-\alpha)$th quantile, where $1-\alpha$ is the confidence coefficient. <strong>The</strong><br />

default $\alpha$ is 0.05, and you can change this value with the ALPHA= option in the LSMEANS<br />

statement.



<strong>The</strong> number of samples is set so that the tail area for the simulated $q$ is within $\gamma$ of $1-\alpha$ with<br />

$100(1-\epsilon)\%$ confidence. In equation form,<br />

$$P\left(|F(\hat{q}) - (1-\alpha)| \le \gamma\right) = 1 - \epsilon$$<br />

where $\hat{q}$ is the simulated $q$ and $F$ is the true distribution function of the maximum; see<br />

Edwards and Berry (1987) for details. By default, $\gamma$ = 0.005 and $\epsilon$ = 0.01, placing the tail<br />

area of $\hat{q}$ within 0.005 of 0.95 with 99% confidence. <strong>The</strong> ACC= and EPS= sim-options reset<br />

$\gamma$ and $\epsilon$, respectively; the NSAMP= sim-option sets the sample size directly; and the SEED=<br />

sim-option specifies an integer used to start the pseudo-random number generator for the<br />

simulation. If you do not specify a seed, or if you specify a value less than or equal to zero,<br />

the seed is generated from reading the time of day from the computer clock. For additional<br />

descriptions of these and other simulation options, see the section “LSMEANS Statement”<br />

on page 2456 in Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>.”<br />
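A sketch of a simulation-based adjustment with a fixed seed follows, so that repeated runs give the same adjusted p-values. The data set work.trial, the variables y, trt, and block, and the particular seed and accuracy values are all hypothetical choices made for illustration.<br />

```sas
/* Sketch only: work.trial, y, trt, and block are hypothetical names; */
/* the seed and ACC= values are arbitrary illustrative choices.       */
proc mixed data=work.trial;
   class trt block;
   model y = trt;
   random block;
   /* SEED= makes the simulation repeatable; ACC= tightens the target accuracy. */
   lsmeans trt / adjust=simulate(seed=20130430 acc=0.002 eps=0.01) cl;
run;
```

A smaller ACC= value generally requires more simulated samples and therefore more computing time, so the default is a reasonable starting point.<br />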

ALPHA=number<br />

requests that a t-type confidence interval be constructed for each of the LS-means with confidence<br />

level 1 − number. <strong>The</strong> value of number must be between 0 and 1; the default is 0.05.<br />

AT variable = value<br />

AT (variable-list) = (value-list)<br />

AT MEANS<br />

enables you to modify the values of the covariates used in computing LS-means. By default,<br />

all covariate effects are set equal to their mean values for computation of standard LS-means.<br />

<strong>The</strong> AT option enables you to assign arbitrary values to the covariates. Additional columns in<br />

the output table indicate the values of the covariates.<br />

If there is an effect containing two or more covariates, the AT option sets the effect equal<br />

to the product of the individual means rather than the mean of the product (as with standard<br />

LS-means calculations). <strong>The</strong> AT MEANS option sets covariates equal to their mean values<br />

(as with standard LS-means) and incorporates this adjustment to crossproducts of covariates.<br />

As an example, consider the following invocation of PROC <strong>MIXED</strong>:<br />

proc mixed;<br />

class A;<br />

model Y = A X1 X2 X1*X2;<br />

lsmeans A;<br />

lsmeans A / at means;<br />

lsmeans A / at X1=1.2;<br />

lsmeans A / at (X1 X2)=(1.2 0.3);<br />

run;<br />

For the first two LSMEANS statements, the LS-means coefficient for X1 is $\bar{x}_1$ (the mean of X1) and for X2 is $\bar{x}_2$ (the mean of X2). However, for the first LSMEANS statement, the coefficient for X1*X2 is $\overline{x_1 x_2}$ (the mean of the product), but for the second LSMEANS statement, the coefficient is $\bar{x}_1 \cdot \bar{x}_2$ (the product of the means). The third LSMEANS statement sets the coefficient for X1 equal to 1.2 and leaves it at $\bar{x}_2$ for X2, and the final LSMEANS statement sets these values to 1.2 and 0.3, respectively.<br />

If a WEIGHT variable is present, it is used in processing AT variables. Also, observations<br />

with missing dependent variables are included in computing the covariate means, unless these



observations form a missing cell and the FULLX option in the MODEL statement is not in<br />

effect. You can use the E option in conjunction with the AT option to check that the modified<br />

LS-means coefficients are the ones you want.<br />

<strong>The</strong> AT option is disabled if you specify the BYLEVEL option.<br />

BYLEVEL<br />

requests PROC <strong>MIXED</strong> to process the OM data set by each level of the LS-mean effect<br />

(LSMEANS effect) in question. For more details, see the OM option later in this section.<br />

CL<br />

requests that t-type confidence limits be constructed for each of the LS-means. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.<br />

CORR<br />

displays the estimated correlation matrix of the least squares means as part of the “Least Squares Means” table.<br />

COV<br />

displays the estimated covariance matrix of the least squares means as part of the “Least Squares Means” table.<br />

DF=number<br />

specifies the degrees of freedom for the t test and confidence limits. <strong>The</strong> default is the denominator<br />

degrees of freedom taken from the “Tests of Fixed Effects” table corresponding to<br />

the LS-means effect unless the DDFM=SATTERTHWAITE or DDFM=KENWARDROGER<br />

option is in effect in the MODEL statement. For these DDFM= methods, degrees of freedom<br />

are determined separately for each test; see the DDFM= option for more information.<br />

DIFF< =difftype ><br />

PDIFF< =difftype ><br />

requests that differences of the LS-means be displayed. <strong>The</strong> optional difftype specifies which<br />

differences to produce, with possible values being ALL, CONTROL, CONTROLL, and<br />

CONTROLU. <strong>The</strong> difftype ALL requests all pairwise differences, and it is the default. <strong>The</strong><br />

difftype CONTROL requests the differences with a control, which, by default, is the first level<br />

of each of the specified LSMEANS effects.<br />

To specify which levels of the effects are the controls, list the quoted formatted values in<br />

parentheses after the keyword CONTROL. For example, if the effects A, B, and C are classification<br />

variables, each having two levels, 1 and 2, the following LSMEANS statement<br />

specifies the (1,2) level of A*B and the (2,1) level of B*C as controls:<br />

lsmeans A*B B*C / diff=control('1' '2' '2' '1');<br />

For multiple effects, the results depend upon the order of the list, and so you should check the<br />

output to make sure that the controls are correct.<br />

Two-tailed tests and confidence limits are associated with the CONTROL difftype. For one-tailed results, use either the CONTROLL or CONTROLU difftype. The CONTROLL difftype tests whether the noncontrol levels are significantly smaller than the control; the upper confidence limits for the control minus the noncontrol levels are considered to be infinity and are displayed as missing. Conversely, the CONTROLU difftype tests whether the noncontrol levels are significantly larger than the control; the upper confidence limits for the noncontrol levels minus the control are considered to be infinity and are displayed as missing.<br />

If you want to perform multiple comparison adjustments on the differences of LS-means, you must specify the ADJUST= option.<br />

The differences of the LS-means are displayed in a table titled “Differences of Least Squares Means.” For ODS purposes, the table name is “Diffs.”<br />

E<br />

requests that the L matrix coefficients for all LSMEANS effects be displayed. For ODS purposes, the name of this “L Matrix Coefficients” table is “Coef.”<br />
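The DIFF=, ADJUST=, CL, and E options are often used together. The following sketch requests one-tailed comparisons of each treatment against a named control level and prints the L matrix coefficients so that the choice of control can be checked. The data set TRIAL, the variables Y, TRT, and BLOCK, and the formatted level 'placebo' are hypothetical.<br />

```sas
/* Hypothetical data set, variables, and control level */
proc mixed data=trial;
   class trt block;
   model y = trt;
   random block;
   /* CONTROLU: is each noncontrol level significantly larger than     */
   /* the control? ADJUST=DUNNETT adjusts for the comparisons          */
   /* against the control, and E displays the L matrix coefficients    */
   lsmeans trt / diff=controlu('placebo') adjust=dunnett cl e;
run;
```

Because the control is matched on its quoted formatted value, it is worth confirming in the “Diffs” table that the intended level was used.<br />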

OM< =OM-data-set ><br />

OBSMARGINS< =OM-data-set ><br />

specifies a potentially different weighting scheme for the computation of LS-means coefficients.<br />

<strong>The</strong> standard LS-means have equal coefficients across classification effects; however,<br />

the OM option changes these coefficients to be proportional to those found in OM-data-set.<br />

This adjustment is reasonable when you want your inferences to apply to a population that is<br />

not necessarily balanced but has the margins observed in OM-data-set.<br />

By default, OM-data-set is the same as the analysis data set. You can optionally specify another data set that describes the population for which you want to make inferences. This data set must contain all model variables except for the dependent variable (which is ignored if it is present). In addition, the levels of all CLASS variables must be the same as those occurring in the analysis data set. Specifying an OM-data-set enables you to construct arbitrarily weighted LS-means.<br />

In computing the observed margins, PROC MIXED uses all observations for which there are no missing or invalid independent variables, including those for which there are missing dependent variables. Also, if OM-data-set has a WEIGHT variable, PROC MIXED uses weighted margins to construct the LS-means coefficients. If OM-data-set is balanced, the LS-means are unchanged by the OM option.<br />

The BYLEVEL option modifies the observed-margins LS-means. Instead of computing the margins across all of the OM-data-set, PROC MIXED computes separate margins for each level of the LSMEANS effect in question. In this case the resulting LS-means are actually equal to raw means for fixed-effects models and certain balanced random-effects models, but their estimated standard errors account for the covariance structure that you have specified. If the AT option is specified, the BYLEVEL option disables it.<br />

You can use the E option in conjunction with either the OM or BYLEVEL option to check that the modified LS-means coefficients are the ones you want. It is possible that the modified LS-means are not estimable when the standard ones are, or vice versa. Nonestimable LS-means are noted as “Non-est” in the output.<br />

PDIFF<br />

is the same as the DIFF option.<br />
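To compare the weighting schemes produced by the OM and BYLEVEL options described earlier in this section, one possible approach is to request several sets of LS-means for the same effect and inspect their coefficients. The data sets TRIAL and POPULATION and the variables Y, A, and B are hypothetical; POPULATION is assumed to contain A and B with the same CLASS levels as TRIAL.<br />

```sas
/* Hypothetical data sets and variables, for illustration only */
proc mixed data=trial;
   class A B;
   model y = A B A*B;
   lsmeans A / e;                 /* standard: equal coefficients across B        */
   lsmeans A / om e;              /* margins observed in the analysis data set    */
   lsmeans A / om=population e;   /* margins observed in a separate data set      */
   lsmeans A / om bylevel e;      /* separate margins for each level of A         */
run;
```

If TRIAL is balanced, the first two requests should give the same LS-means; differences appear only when the observed margins are unequal.<br />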



SINGULAR=number<br />

tunes the estimability checking as documented for the SINGULAR= option in the<br />

CONTRAST statement.<br />

SLICE= fixed-effect<br />

SLICE= (fixed-effects)<br />

specifies effects by which to partition interaction LSMEANS effects. This can produce what<br />

are known as tests of simple effects (Winer 1971). For example, suppose that A*B is significant,<br />

and you want to test the effect of A for each level of B. <strong>The</strong> appropriate LSMEANS<br />

statement is as follows:<br />

lsmeans A*B / slice=B;<br />

This code tests for the simple main effects of A for B, which are calculated by extracting the<br />

appropriate rows from the coefficient matrix for the A*B LS-means and by using them to form<br />

an F test. See the section “Inference and Test Statistics” for more information about this F<br />

test.<br />

<strong>The</strong> SLICE option produces a table titled “Tests of Effect Slices.” For ODS purposes, the<br />

table name is “Slices.”<br />

MODEL Statement<br />

MODEL dependent = < fixed-effects >< / options > ;<br />

<strong>The</strong> MODEL statement names a single dependent variable and the fixed effects, which determine the<br />

X matrix of the mixed model (see the section “Parameterization of Mixed Models” on page 3975<br />

for details). <strong>The</strong> specification of effects is the same as in the GLM procedure; however, unlike<br />

PROC GLM, you do not specify random effects in the MODEL statement. <strong>The</strong> MODEL statement<br />

is required.<br />

An intercept is included in the fixed-effects model by default. If no fixed effects are specified, only<br />

this intercept term is fit. <strong>The</strong> intercept can be removed by using the NOINT option.<br />
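The following sketch illustrates the division of labor described above: fixed effects appear in the MODEL statement, random effects appear in a RANDOM statement, and an empty right-hand side fits only the intercept. The data set GROWTH and the variables HEIGHT, AGE, GENDER, and PERSON are hypothetical names.<br />

```sas
/* Hypothetical data set and variable names, for illustration only */
proc mixed data=growth;
   class person gender;
   /* fixed effects only; the intercept is included by default */
   model height = gender age gender*age / solution;
   /* random effects are specified here, not in the MODEL statement */
   random intercept / subject=person;
run;

/* Intercept-only fixed-effects model: nothing after the equal sign */
proc mixed data=growth;
   class person;
   model height = / solution;
   random intercept / subject=person;
run;
```

Adding NOINT after the slash in the first step would remove the fixed-effect intercept.<br />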

Table 56.6 summarizes options in the MODEL statement. <strong>The</strong>se are subsequently discussed in<br />

detail in alphabetical order.<br />

Table 56.6 Summary of Important MODEL Statement Options<br />

Option Description<br />

Model Building<br />

NOINT excludes fixed-effect intercept from model<br />

Statistical Computations<br />

ALPHA=$\alpha$ determines the confidence level ($1-\alpha$) for fixed effects<br />

ALPHAP=$\alpha$ determines the confidence level ($1-\alpha$) for predicted values<br />

CHISQ requests chi-square tests<br />

DDF= specifies denominator degrees of freedom (list)



DDFM= specifies the method for computing denominator degrees of freedom<br />

HTYPE= selects the type of hypothesis test<br />

INFLUENCE requests influence and case-deletion diagnostics<br />

NOTEST suppresses hypothesis tests for the fixed effects<br />

OUTP= specifies output data set for predicted values and related quantities<br />

OUTPM= specifies output data set for predicted means and related quantities<br />

RESIDUAL adds Pearson-type and studentized residuals to output data sets<br />

VCIRY adds scaled marginal residual to output data sets<br />

Statistical Output<br />

CL displays confidence limits for fixed-effects parameter estimates<br />

CORRB displays correlation matrix of fixed-effects parameter estimates<br />

COVB displays covariance matrix of fixed-effects parameter estimates<br />

COVBI displays inverse covariance matrix of fixed-effects parameter estimates<br />

E, E1, E2, E3 displays L matrix coefficients<br />

INTERCEPT adds a row for the intercept to test tables<br />

SOLUTION displays fixed-effects parameter estimates (and scale parameter in GLM models)<br />

Singularity Tolerances<br />

SINGCHOL= tunes sensitivity in computing Cholesky roots<br />

SINGRES= tunes singularity criterion for residual variance<br />

SINGULAR= tunes the sensitivity in sweeping<br />

ZETA= tunes the sensitivity in forming Type 3 functions<br />

You can specify the following options in the MODEL statement after a slash (/).<br />

ALPHA=number<br />

requests that a t-type confidence interval be constructed for each of the fixed-effects parameters with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.<br />

ALPHAP=number<br />

requests that a t-type confidence interval be constructed for the predicted values with confidence level 1 − number. The value of number must be between 0 and 1; the default is 0.05.<br />

CHISQ<br />

requests that chi-square tests be performed for all specified effects in addition to the F tests. Type 3 tests are the default; you can produce the Type 1 and Type 2 tests by using the HTYPE= option.<br />

CL<br />

requests that t-type confidence limits be constructed for each of the fixed-effects parameter estimates. The confidence level is 0.95 by default; this can be changed with the ALPHA= option.<br />

CONTAIN<br />

has the same effect as the DDFM=CONTAIN option.<br />

CORRB<br />

produces the approximate correlation matrix of the fixed-effects parameter estimates. For<br />

ODS purposes, the name of this table is “CorrB.”<br />

COVB<br />

produces the approximate variance-covariance matrix of the fixed-effects parameter estimates $\widehat{\boldsymbol{\beta}}$. By default, this matrix equals $(\mathbf{X}'\widehat{\mathbf{V}}^{-1}\mathbf{X})^{-}$ and results from sweeping $(\mathbf{X}\ \mathbf{y})'\widehat{\mathbf{V}}^{-1}(\mathbf{X}\ \mathbf{y})$ on all but its last pivot and removing the $\mathbf{y}$ border. The EMPIRICAL option in the PROC MIXED statement changes this matrix into “empirical sandwich” form. For ODS purposes, the name of this table is “CovB.” If the degrees-of-freedom method of Kenward and Roger (1997) is in effect (DDFM=KENWARDROGER), the COVB matrix changes because the method entails an adjustment of the variance-covariance matrix of the fixed effects by the method proposed by Prasad and Rao (1990) and Harville and Jeske (1992); see also Kackar and Harville (1984).<br />

COVBI<br />

produces the inverse of the approximate variance-covariance matrix of the fixed-effects parameter estimates. For ODS purposes, the name of this table is “InvCovB.”<br />

DDF=value-list<br />

enables you to specify your own denominator degrees of freedom for the fixed effects. <strong>The</strong><br />

value-list specification is a list of numbers or missing values (.) separated by commas. <strong>The</strong><br />

degrees of freedom should be listed in the order in which the effects appear in the “Tests of<br />

Fixed Effects” table. If you want to retain the default degrees of freedom for a particular<br />

effect, use a missing value for its location in the list. For example, the following statement<br />

assigns 3 denominator degrees of freedom to A and 4.7 to A*B, while those for B remain the<br />

same:<br />

model Y = A B A*B / ddf=3,.,4.7;<br />

If you specify DDFM=SATTERTHWAITE or DDFM=KENWARDROGER, the DDF= option<br />

has no effect.<br />

DDFM=CONTAIN<br />

DDFM=BETWITHIN<br />

DDFM=RESIDUAL<br />

DDFM=SATTERTHWAITE<br />

DDFM=KENWARDROGER< (FIRSTORDER) ><br />

specifies the method for computing the denominator degrees of freedom for the tests of fixed<br />

effects resulting from the MODEL, CONTRAST, ESTIMATE, and LSMEANS statements.<br />

Table 56.7 lists syntax aliases for the degrees-of-freedom methods.


Table 56.7 Aliases for DDFM= Option<br />

DDFM= Option Alias<br />

BETWITHIN BW<br />

CONTAIN CON<br />

KENWARDROGER KENROG, KR<br />

RESIDUAL RES<br />

SATTERTHWAITE SATTERTH, SAT<br />


<strong>The</strong> DDFM=CONTAIN option invokes the containment method to compute denominator degrees<br />

of freedom, and it is the default when you specify a RANDOM statement. <strong>The</strong> containment<br />

method is carried out as follows: Denote the fixed effect in question A, and search<br />

the RANDOM effect list for the effects that syntactically contain A. For example, the random<br />

effect B(A) contains A, but the random effect C does not, even if it has the same levels as B(A).<br />

Among the random effects that contain A, compute their rank contribution to the (X Z) matrix.<br />

<strong>The</strong> DDF assigned to A is the smallest of these rank contributions. If no effects are found,<br />

the DDF for A is set equal to the residual degrees of freedom, $N - \mathrm{rank}(\mathbf{X}\ \mathbf{Z})$. This choice of<br />

DDF matches the tests performed for balanced split-plot designs and should be adequate for<br />

moderately unbalanced designs.<br />

CAUTION: If you have a Z matrix with a large number of columns, the overall memory<br />

requirements and the computing time after convergence can be substantial for the containment<br />

method. If it is too large, you might want to use the DDFM=BETWITHIN option.<br />
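A split-plot layout gives a simple picture of the containment search. In the sketch below, the data set SPLITPLOT and the variables Y, BLOCK, A, and B are hypothetical; A is taken to be the whole-plot factor and B the subplot factor.<br />

```sas
/* Hypothetical split-plot data set, for illustration only */
proc mixed data=splitplot;
   class block A B;
   /* DDFM=CONTAIN is already the default because a RANDOM statement   */
   /* is present; it is written out here only for clarity              */
   model y = A B A*B / ddfm=contain;
   /* A*block syntactically contains A, so A takes its DDF from the    */
   /* rank contribution of A*block. No random effect contains B or     */
   /* A*B, so both receive the residual degrees of freedom.            */
   random block A*block;
run;
```

With many columns in Z, replacing DDFM=CONTAIN by DDFM=BETWITHIN in the same step is one way to reduce the memory and time cost noted in the caution above.<br />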

<strong>The</strong> DDFM=BETWITHIN option is the default for REPEATED statement specifications<br />

(with no RANDOM statements). It is computed by dividing the residual degrees of freedom<br />

into between-subject and within-subject portions. PROC <strong>MIXED</strong> then checks whether<br />

a fixed effect changes within any subject. If so, it assigns within-subject degrees of freedom<br />

to the effect; otherwise, it assigns the between-subject degrees of freedom to the effect (see<br />

Schluchter and Elashoff 1990). If there are multiple within-subject effects containing classification<br />

variables, the within-subject degrees of freedom are partitioned into components<br />

corresponding to the subject-by-effect interactions.<br />

One exception to the preceding method is the case where you have specified no RANDOM<br />

statements and a REPEATED statement with the TYPE=UN option. In this case, all effects<br />

are assigned the between-subject degrees of freedom to provide for better small-sample approximations<br />

to the relevant sampling distributions. DDFM=KENWARDROGER might be a<br />

better option to try for this case.<br />
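The two situations just described can be sketched as follows. The data set PHARMA and the variables Y, SUBJECT, DRUG, and TIME are hypothetical; DRUG is assumed to be constant within a subject, whereas TIME varies within a subject.<br />

```sas
/* Hypothetical repeated measures data set, for illustration only */

/* Between-within method: DRUG does not change within a subject and   */
/* gets between-subject DDF; TIME and DRUG*TIME change within a       */
/* subject and get within-subject DDF                                 */
proc mixed data=pharma;
   class subject drug time;
   model y = drug time drug*time / ddfm=bw;
   repeated time / subject=subject type=ar(1);
run;

/* With TYPE=UN and no RANDOM statement, every effect would receive   */
/* between-subject DDF by default; DDFM=KR is requested instead       */
proc mixed data=pharma;
   class subject drug time;
   model y = drug time drug*time / ddfm=kr;
   repeated time / subject=subject type=un;
run;
```

Comparing the “Tests of Fixed Effects” tables from the two steps shows how much the choice of method moves the denominator degrees of freedom for a given data set.<br />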

The DDFM=RESIDUAL option performs all tests by using the residual degrees of freedom, $n - \mathrm{rank}(\mathbf{X})$, where $n$ is the number of observations.<br />

The DDFM=SATTERTHWAITE option performs a general Satterthwaite approximation for the denominator degrees of freedom, computed as follows. Suppose $\boldsymbol{\theta}$ is the vector of unknown parameters in $\mathbf{V}$, and suppose $\mathbf{C} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}$, where $^{-}$ denotes a generalized inverse. Let $\widehat{\mathbf{C}}$ and $\widehat{\boldsymbol{\theta}}$ be the corresponding estimates.<br />

Consider the one-dimensional case, and consider $\boldsymbol{\ell}$ to be a vector defining an estimable linear combination of $\boldsymbol{\beta}$. The Satterthwaite degrees of freedom for the t statistic<br />

\[ t = \frac{\boldsymbol{\ell}\widehat{\boldsymbol{\beta}}}{\sqrt{\boldsymbol{\ell}\widehat{\mathbf{C}}\boldsymbol{\ell}'}} \]

is computed as<br />

\[ \nu = \frac{2\,(\boldsymbol{\ell}\widehat{\mathbf{C}}\boldsymbol{\ell}')^{2}}{\mathbf{g}'\mathbf{A}\mathbf{g}} \]

where $\mathbf{g}$ is the gradient of $\boldsymbol{\ell}\mathbf{C}\boldsymbol{\ell}'$ with respect to $\boldsymbol{\theta}$, evaluated at $\widehat{\boldsymbol{\theta}}$, and $\mathbf{A}$ is the asymptotic variance-covariance matrix of $\widehat{\boldsymbol{\theta}}$ obtained from the second derivative matrix of the likelihood equations.<br />

For the multidimensional case, let $\mathbf{L}$ be an estimable contrast matrix and denote the rank of $\mathbf{L}\widehat{\mathbf{C}}\mathbf{L}'$ as $q > 1$. The Satterthwaite denominator degrees of freedom for the F statistic<br />

\[ F = \frac{\widehat{\boldsymbol{\beta}}'\mathbf{L}'(\mathbf{L}\widehat{\mathbf{C}}\mathbf{L}')^{-1}\mathbf{L}\widehat{\boldsymbol{\beta}}}{q} \]

are computed by first performing the spectral decomposition $\mathbf{L}\widehat{\mathbf{C}}\mathbf{L}' = \mathbf{P}'\mathbf{D}\mathbf{P}$, where $\mathbf{P}$ is an orthogonal matrix of eigenvectors and $\mathbf{D}$ is a diagonal matrix of eigenvalues, both of dimension $q \times q$. Define $\boldsymbol{\ell}_m$ to be the $m$th row of $\mathbf{P}\mathbf{L}$, and let<br />

\[ \nu_m = \frac{2\,(D_m)^{2}}{\mathbf{g}_m'\mathbf{A}\mathbf{g}_m} \]

where $D_m$ is the $m$th diagonal element of $\mathbf{D}$ and $\mathbf{g}_m$ is the gradient of $\boldsymbol{\ell}_m\mathbf{C}\boldsymbol{\ell}_m'$ with respect to $\boldsymbol{\theta}$, evaluated at $\widehat{\boldsymbol{\theta}}$. Then let<br />

\[ E = \sum_{m=1}^{q} \frac{\nu_m}{\nu_m - 2}\, I(\nu_m > 2) \]

where the indicator function eliminates terms for which $\nu_m \le 2$. The degrees of freedom for $F$ are then computed as<br />

\[ \nu = \frac{2E}{E - q} \]

provided $E > q$; otherwise $\nu$ is set to zero.<br />
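As a small worked illustration of the final combination step, suppose a contrast of rank $q = 2$ has per-row Satterthwaite degrees of freedom $\nu_1 = 10$ and $\nu_2 = 6$. These values are invented for the arithmetic and do not come from any fitted model. With $E = \sum_m \frac{\nu_m}{\nu_m - 2} I(\nu_m > 2)$ and $\nu = 2E/(E - q)$ when $E > q$:<br />

```latex
\begin{align*}
E   &= \frac{10}{10-2} + \frac{6}{6-2} = 1.25 + 1.5 = 2.75 \\
\nu &= \frac{2E}{E-q} = \frac{2(2.75)}{2.75-2} = \frac{5.5}{0.75} \approx 7.33
\end{align*}
% If instead \nu_1 = 12 and \nu_2 = 1.5, the indicator drops the second term:
\begin{align*}
E   &= \frac{12}{12-2} = 1.2 \\
\nu &= 0 \quad \text{because } E = 1.2 \le q = 2
\end{align*}
```

The second case shows why a single poorly determined direction can drive the reported denominator degrees of freedom to zero.<br />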

This method is a generalization of the techniques described in Giesbrecht and Burns (1985), McLean and Sanders (1988), and Fai and Cornelius (1996). The method can also include estimated random effects. In this case, append $\widehat{\boldsymbol{\gamma}}$ to $\widehat{\boldsymbol{\beta}}$ and change $\widehat{\mathbf{C}}$ to be the inverse of the coefficient matrix in the mixed model equations. The calculations require extra memory to hold $c$ matrices that are the size of the mixed model equations, where $c$ is the number of covariance parameters. In the notation of Table 56.25, this is approximately $8q(p+g)(p+g)/2$ bytes.<br />

Extra computing time is also required to process these matrices. <strong>The</strong> Satterthwaite method<br />

implemented here is intended to produce an accurate F approximation; however, the results<br />

can differ from those produced by PROC GLM. Also, the small sample properties of this approximation have not been extensively investigated for the various models available with PROC MIXED.<br />

The DDFM=KENWARDROGER option performs the degrees of freedom calculations detailed by Kenward and Roger (1997). This approximation involves inflating the estimated variance-covariance matrix of the fixed and random effects by the method proposed by Prasad and Rao (1990) and Harville and Jeske (1992); see also Kackar and Harville (1984). Satterthwaite-type degrees of freedom are then computed based on this adjustment. By default, the observed information matrix of the covariance parameter estimates is used in the calculations. For covariance structures that have nonzero second derivatives with respect to the covariance parameters, the Kenward-Roger covariance matrix adjustment includes a second-order term. This term can result in standard error shrinkage. Also, the resulting adjusted covariance matrix can then be indefinite and is not invariant under reparameterization. The FIRSTORDER suboption of the DDFM=KENWARDROGER option eliminates the second derivatives from the calculation of the covariance matrix adjustment. For the case of scalar estimable functions, the resulting estimator is referred to as the Prasad-Rao estimator $\tilde{m}^{@}$ in Harville and Jeske (1992). The following are examples of covariance structures that generally lead to nonzero second derivatives: TYPE=ANTE(1), TYPE=AR(1), TYPE=ARH(1), TYPE=ARMA(1,1), TYPE=CSH, TYPE=FA, TYPE=FA0(q), TYPE=TOEPH, TYPE=UNR, and all TYPE=SP() structures.<br />

When the asymptotic variance matrix of the covariance parameters is found to be singular, a generalized inverse is used. Covariance parameters with zero variance then do not contribute to the degrees-of-freedom adjustment for DDFM=SATTERTHWAITE and DDFM=KENWARDROGER, and a message is written to the log.<br />

This method changes output in the following tables (listed in Table 56.22): Contrast, CorrB, CovB, Diffs, Estimates, InvCovB, LSMeans, Slices, SolutionF, SolutionR, Tests1–Tests3. The OUTP= and OUTPM= data sets are also affected.<br />

E<br />

requests that Type 1, Type 2, and Type 3 L matrix coefficients be displayed for all specified effects. For ODS purposes, the name of the table is “Coef.”<br />

E1<br />

requests that Type 1 L matrix coefficients be displayed for all specified effects. For ODS purposes, the name of the table is “Coef.”<br />

E2<br />

requests that Type 2 L matrix coefficients be displayed for all specified effects. For ODS purposes, the name of the table is “Coef.”<br />

E3<br />

requests that Type 3 L matrix coefficients be displayed for all specified effects. For ODS purposes, the name of the table is “Coef.”<br />

FULLX<br />

requests that columns of the X matrix that consist entirely of zeros not be eliminated from X; otherwise, they are eliminated by default. For a column corresponding to a missing cell to be added to X, its particular levels must be present in at least one observation in the analysis data set along with a missing dependent variable. The use of the FULLX option can affect coefficient specifications in the CONTRAST and ESTIMATE statements, as well as covariate coefficients from LSMEANS statements specified with the AT MEANS option.<br />

HTYPE=value-list<br />

indicates the type of hypothesis test to perform on the fixed effects. Valid entries for value<br />

are 1, 2, and 3; the default value is 3. You can specify several types by separating the values<br />

with a comma or a space. <strong>The</strong> ODS table names are “Tests1” for the Type 1 tests, “Tests2”<br />

for the Type 2 tests, and “Tests3” for the Type 3 tests.<br />
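As a brief sketch, the following step requests Type 1 and Type 3 tests together and captures both tables by their ODS names. The data set TRIAL, the variables Y, A, and B, and the output data set names are hypothetical.<br />

```sas
/* Hypothetical data set and names, for illustration only */
ods output Tests1=type1_tests Tests3=type3_tests;
proc mixed data=trial;
   class A B;
   /* values can be separated by a comma or a space */
   model y = A B A*B / htype=1,3;
run;
```

Type 1 tests depend on the order in which the effects are listed in the MODEL statement, so the two tables generally differ for unbalanced data.<br />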

INFLUENCE< (influence-options) ><br />

specifies that influence and case deletion diagnostics are to be computed.<br />

<strong>The</strong> INFLUENCE option computes influence diagnostics by noniterative or iterative methods.<br />

<strong>The</strong> noniterative diagnostics rely on recomputation formulas under the assumption that<br />

covariance parameters or their ratios remain fixed. With the possible exception of a profiled<br />

residual variance, no covariance parameters are updated. This is the default behavior because<br />

of its computational efficiency. However, the impact of an observation on the overall analysis<br />

can be underestimated if its effect on covariance parameters is not assessed. Toward this end,<br />

iterative methods can be applied to gauge the overall impact of observations and to obtain<br />

influence diagnostics for the covariance parameter estimates.<br />

If you specify the INFLUENCE option without further suboptions, PROC <strong>MIXED</strong> computes<br />

single-case deletion diagnostics and influence statistics for each observation in the data set by<br />

updating estimates for the fixed-effects parameter estimates, and also the residual variance, if<br />

it is profiled. <strong>The</strong> EFFECT=, SELECT=, ITER=, SIZE=, and KEEP= suboptions provide additional<br />

flexibility in the computation and reporting of influence statistics. Table 56.8 briefly<br />

describes important suboptions and their effect on the influence analysis.<br />

Table 56.8 Summary of INFLUENCE Default and Suboptions<br />

Description | Suboption<br />

Compute influence diagnostics for individual observations | default<br />

Measure influence of sets of observations chosen according to a classification variable or effect | EFFECT=<br />

Remove pairs of observations and report the results sorted by degree of influence | SIZE=2<br />

Remove triples, quadruples of observations, etc. | SIZE=<br />

Allow selection of individual observations, observations sharing specific levels of effects, and construction of tuples from specified subsets of observations | SELECT=<br />

Update fixed effects and covariance parameters by refitting the mixed model, adding up to n iterations | ITER=n > 0<br />

Compute influence diagnostics for the covariance parameters | ITER=n > 0<br />

Update only fixed effects and the residual variance, if it is profiled | ITER=0<br />

Add the reduced-data estimates to the data set created with ODS OUTPUT | ESTIMATES<br />



<strong>The</strong> modifiers and their default values are discussed in the following paragraphs. <strong>The</strong> set<br />

of computed influence diagnostics varies with the suboptions. <strong>The</strong> most extensive set of<br />

influence diagnostics is obtained when ITER=n with n > 0.<br />

You can produce statistical graphics of influence diagnostics when the ODS GRAPHICS<br />

statement is specified. For general information about ODS Graphics, see Chapter 21,<br />

“Statistical Graphics Using ODS.” For specific information about the graphics available in<br />

the <strong>MIXED</strong> procedure, see the section “ODS Graphics” on page 3998.<br />
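The following sketch shows one way an iterative, subject-level influence analysis might be requested, with the reduced-data estimates saved and the diagnostic plots enabled. The data set GROWTH, the variables HEIGHT, AGE, GENDER, and PERSON, and the output data set INFL are hypothetical.<br />

```sas
/* Hypothetical data set and names, for illustration only */
ods graphics on;
ods output Influence=infl;       /* capture the "Influence" table      */
proc mixed data=growth;
   class person gender;
   /* EFFECT=person deletes all observations of one person at a time; */
   /* ITER=5 allows up to five additional iterations per deletion;    */
   /* ESTIMATES adds the reduced-data estimates to the ODS data set   */
   model height = gender age gender*age /
         influence(effect=person iter=5 estimates);
   random intercept / subject=person;
run;
ods graphics off;
```

Dropping ITER=5 would fall back to the noniterative default, which is cheaper but does not reassess the covariance parameters.<br />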

You can specify the following influence-options in parentheses:<br />

EFFECT=effect<br />

specifies an effect according to which observations are grouped. Observations sharing<br />

the same level of the effect are removed from the analysis as a group. <strong>The</strong> effect must<br />

contain only classification variables, but they need not be contained in the model.<br />

Removing observations can change the rank of the $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}$ matrix. This is particularly likely to happen when multiple observations are eliminated from the analysis. If the rank of the estimated variance-covariance matrix of $\widehat{\boldsymbol{\beta}}$ changes or its singularity pattern is altered, no influence diagnostics are computed.<br />

ESTIMATES<br />

EST<br />

specifies that the updated parameter estimates should be written to the ODS output data set. The values are not displayed in the “Influence” table, but if you use ODS OUTPUT to create a data set from the listing, the estimates are added to the data set. If ITER=0, only the fixed-effects estimates are saved. In iterative influence analyses, fixed-effects and covariance parameters are stored. The p fixed-effects parameter estimates are named Parm1–Parmp, and the q covariance parameter estimates are named CovP1–CovPq. The order corresponds to that in the “Solution for Fixed Effects” and “Covariance Parameter Estimates” tables. If parameter updates fail (for example, because of a loss of rank or a nonpositive definite Hessian), missing values are reported.<br />

ITER=n<br />

controls the maximum number of additional iterations PROC <strong>MIXED</strong> performs to update<br />

the fixed-effects and covariance parameter estimates following data point removal.<br />

If you specify n > 0, then statistics such as DFFITS, MDFFITS, and the likelihood<br />

distances measure the impact of observation(s) on all aspects of the analysis. Typically,<br />

the influence will grow compared to values at ITER=0. In models without RANDOM<br />

or REPEATED effects, the ITER= option has no effect.<br />

This documentation refers to analyses when n > 0 simply as iterative influence analysis,<br />

even if final covariance parameter estimates can be updated in a single step (for<br />

example, when METHOD=MIVQUE0 or METHOD=TYPE3). This nomenclature reflects<br />

the fact that only if n > 0 are all model parameters updated, which can require<br />

additional iterations. If n > 0 and METHOD=REML (default) or METHOD=ML,<br />

the procedure updates fixed effects and variance-covariance parameters after removing<br />

the selected observations with additional Newton-Raphson iterations, starting from the<br />

converged estimates for the entire data. <strong>The</strong> process stops for each observation or set of<br />

observations if the convergence criterion is satisfied or the number of further iterations



exceeds n. If n > 0 and METHOD=TYPE1, TYPE2, or TYPE3, ANOVA estimates of<br />

the covariance parameters are recomputed in a single step.<br />

Compared to noniterative updates, the computations are more involved. In particular,<br />

for large data sets or a large number of random effects, iterative updates require<br />

considerably more resources. A one-step (ITER=1) or two-step update might be a<br />

good compromise. <strong>The</strong> output includes the number of iterations performed, which is<br />

less than n if the iteration converges. If the process does not converge in n iterations,<br />

you should be careful in interpreting the results, especially if n is fairly large.<br />

Bounds and other restrictions on the covariance parameters carry over from the full-data<br />

model. Covariance parameters that are not iterated in the model fit to the full<br />

data (the NOITER or HOLD option in the PARMS statement) are likewise not updated<br />

in the refit. In certain models, such as random-effects models, the ratios between the<br />

covariance parameters and the residual variance are maintained rather than the actual<br />

value of the covariance parameter estimate (see the section “Influence Diagnostics” on<br />

page 3982).<br />

KEEP=n<br />

determines how many observations are retained for display and in the output data set or<br />

how many tuples if you specify SIZE=. <strong>The</strong> output is sorted by an influence statistic as<br />

discussed for the SIZE= suboption.<br />

SELECT=value-list<br />

specifies which observations or effect levels are chosen for influence calculations. If<br />

the SELECT= suboption is not specified, diagnostics are computed as follows:<br />

for all observations, if neither EFFECT= nor SIZE= is given<br />

for all levels of the specified effect, if EFFECT= is specified<br />

for all tuples of size k formed from the observations in value-list, if SIZE=k is<br />

specified<br />

When you specify an effect with the EFFECT= option, the values in value-list represent<br />

indices of the levels in the order in which PROC <strong>MIXED</strong> builds classification effects.<br />

Which observations in the data set correspond to this index depends on the order of the<br />

variables in the CLASS statement, not the order in which the variables appear in the<br />

interaction effect. See the section “Parameterization of Mixed Models” on page 3975<br />

to understand precisely how the procedure indexes nested and crossed effects and how<br />

levels of classification variables are ordered. <strong>The</strong> actual values of the classification<br />

variables involved in the effect are shown in the output so you can determine which<br />

observations were removed.<br />

If the EFFECT= suboption is not specified, the SELECT= value list refers to the sequence<br />

in which observations are read from the input data set or from the current BY<br />

group if there is a BY statement. This indexing is not necessarily the same as the observation<br />

numbers in the input data set, for example, if a WHERE clause is specified or<br />

during BY processing.
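The difference between SELECT= positions and observation numbers can be sketched as follows. This is a minimal illustration, not PROC MIXED code: the helper names `read_sequence` and `resolve_select` are hypothetical, and the filter stands in for a WHERE clause that keeps only some rows.<br />

```python
def read_sequence(obs_numbers, keep):
    """Return the original observation numbers in the order they are read.

    `keep` mimics a WHERE clause: rows it rejects are never read.
    """
    return [obs for obs in obs_numbers if keep(obs)]


def resolve_select(select_positions, sequence):
    """Map 1-based SELECT= positions to original observation numbers."""
    return [sequence[pos - 1] for pos in select_positions]


obs_numbers = list(range(1, 11))                                  # observations 1..10
sequence = read_sequence(obs_numbers, lambda obs: obs % 2 == 0)   # keep even rows only
selected = resolve_select([1, 3], sequence)
print(selected)  # [2, 6]
```

In this sketch, positions 1 and 3 refer to the first and third observations that were actually read, which are observations 2 and 6 of the original data set.<br />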


SIZE=n<br />


instructs PROC <strong>MIXED</strong> to remove groups of observations formed as tuples of size<br />

n. For example, SIZE=2 specifies all $n(n-1)/2$ unique pairs of observations.<br />

<strong>The</strong> number of tuples for SIZE=k is $n!/(k!(n-k)!)$ and grows quickly with n and<br />

k. Using the SIZE= option can result in considerable computing time. <strong>The</strong> <strong>MIXED</strong><br />

procedure displays by default only the 50 tuples with the greatest influence. Use the<br />

KEEP= option to override this default and to retain a different number of tuples in the<br />

listing or ODS output data set. Regardless of the KEEP= specification, all tuples are<br />

evaluated and the results are ordered according to an influence statistic. This statistic<br />

is the (restricted) likelihood distance as a measure of overall influence if ITER= n > 0<br />

or when a residual variance is profiled. When likelihood distances are unavailable, the<br />

results are ordered by the PRESS statistic.<br />

To reduce computational burden, the SIZE= option can be combined with the<br />

SELECT=value-list modifier. For example, the following statements evaluate all<br />

$15 = 6 \times 5/2$ pairs formed from observations 13, 14, 18, 30, 31, and 33 and display<br />

the five pairs with the greatest influence:<br />

proc mixed;<br />

class a m f;<br />

model penetration = a m /<br />

influence(size=2 keep=5<br />

select=13,14,18,30,31,33);<br />

random f(m);<br />

run;<br />
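The tuple counts discussed above can be checked with a short sketch. It only enumerates index tuples and counts them; it does not compute any influence statistic, and the function name `tuple_count` is introduced here for illustration.<br />

```python
from itertools import combinations
from math import comb


def tuple_count(n, k):
    """Number of distinct tuples of size k that can be formed from n observations."""
    return comb(n, k)


# The six observations named in SELECT= above, taken two at a time (SIZE=2).
selected = [13, 14, 18, 30, 31, 33]
pairs = list(combinations(selected, 2))
print(len(pairs), tuple_count(len(selected), 2))  # 15 15

# Without SELECT=, the count grows quickly with the number of observations.
for n in (50, 100, 500):
    print(n, tuple_count(n, 2), tuple_count(n, 3))
```

Restricting the candidate observations before forming tuples is what keeps the number of refits manageable.<br />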

If any observation in a tuple contains missing values or has otherwise not contributed to<br />

the analysis, the tuple is not evaluated. This guarantees that the displayed results refer<br />

to the same number of observations, so that meaningful statistics are available by which<br />

to order the results. If computations fail for a particular tuple (for example, because<br />

the $(X'V^{-1}X)$ matrix changes rank or the G matrix is not positive definite), no results<br />

are produced. Results are retained when the maximum number of iterative updates is<br />

exceeded in iterative influence analyses.<br />

<strong>The</strong> SIZE= suboption cannot be combined with the EFFECT= suboption. As in the<br />

case of the EFFECT= suboption, the statistics being computed are those appropriate<br />

for removal of multiple data points, even if SIZE=1.<br />

For ODS purposes the name of the “Influence Diagnostics” table is “Influence.” <strong>The</strong> variables<br />

in this table depend on whether you specify the EFFECT=, SIZE=, or KEEP= suboption and<br />

whether covariance parameters are iteratively updated. When ITER=0 (the default), certain<br />

influence diagnostics are meaningful only if the residual variance is profiled. Table 56.9 and<br />

Table 56.10 summarize the statistics obtained depending on the model and modifiers. <strong>The</strong><br />

last column in these tables gives the variable name in the ODS OUTPUT INFLUENCE=<br />

data set. Restricted likelihood distances are reported instead of the likelihood distance unless<br />

METHOD=ML. See the section “Influence Diagnostics” on page 3982 for details about the<br />

individual statistics.



Table 56.9 Statistics Computed with INFLUENCE Option, Noniterative Analysis (ITER=0)<br />

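<table>
<tr><th>Suboption</th><th>$\sigma^2$ Profiled</th><th>Statistic</th><th>Variable Name</th></tr>
<tr><td>Default</td><td>Yes</td><td>Observed value</td><td>Observed</td></tr>
<tr><td></td><td></td><td>Predicted value</td><td>Predicted</td></tr>
<tr><td></td><td></td><td>Residual</td><td>Residual</td></tr>
<tr><td></td><td></td><td>Leverage</td><td>Leverage</td></tr>
<tr><td></td><td></td><td>PRESS residual</td><td>PRESSRes</td></tr>
<tr><td></td><td></td><td>Internally studentized residual</td><td>Student</td></tr>
<tr><td></td><td></td><td>Externally studentized residual</td><td>RStudent</td></tr>
<tr><td></td><td></td><td>RMSE without deleted obs</td><td>RMSE</td></tr>
<tr><td></td><td></td><td>Cook’s D</td><td>CookD</td></tr>
<tr><td></td><td></td><td>DFFITS</td><td>DFFITS</td></tr>
<tr><td></td><td></td><td>CovRatio</td><td>COVRATIO</td></tr>
<tr><td></td><td></td><td>(Restricted) likelihood distance</td><td>RLD, LD</td></tr>
<tr><td>Default</td><td>No</td><td>Observed value</td><td>Observed</td></tr>
<tr><td></td><td></td><td>Predicted value</td><td>Predicted</td></tr>
<tr><td></td><td></td><td>Residual</td><td>Residual</td></tr>
<tr><td></td><td></td><td>Leverage</td><td>Leverage</td></tr>
<tr><td></td><td></td><td>PRESS residual</td><td>PRESSRes</td></tr>
<tr><td></td><td></td><td>Internally studentized residual</td><td>Student</td></tr>
<tr><td></td><td></td><td>Cook’s D</td><td>CookD</td></tr>
<tr><td>EFFECT=, SIZE=, or KEEP=</td><td>Yes</td><td>Observations in level (tuple)</td><td>Nobs</td></tr>
<tr><td></td><td></td><td>PRESS statistic</td><td>PRESS</td></tr>
<tr><td></td><td></td><td>Cook’s D</td><td>CookD</td></tr>
<tr><td></td><td></td><td>MDFFITS</td><td>MDFFITS</td></tr>
<tr><td></td><td></td><td>CovRatio</td><td>COVRATIO</td></tr>
<tr><td></td><td></td><td>COVTRACE</td><td>COVTRACE</td></tr>
<tr><td></td><td></td><td>RMSE without deleted level (tuple)</td><td>RMSE</td></tr>
<tr><td></td><td></td><td>(Restricted) likelihood distance</td><td>RLD, LD</td></tr>
<tr><td>EFFECT=, SIZE=, or KEEP=</td><td>No</td><td>Observations in level (tuple)</td><td>Nobs</td></tr>
<tr><td></td><td></td><td>PRESS statistic</td><td>PRESS</td></tr>
<tr><td></td><td></td><td>Cook’s D</td><td>CookD</td></tr>
</table>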

Table 56.10 Statistics Computed with INFLUENCE Option, Iterative Analysis (ITER=n > 0)<br />

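<table>
<tr><th>Suboption</th><th>Statistic</th><th>Variable Name</th></tr>
<tr><td>Default</td><td>Number of iterations</td><td>Iter</td></tr>
<tr><td></td><td>Observed value</td><td>Observed</td></tr>
<tr><td></td><td>Predicted value</td><td>Predicted</td></tr>
<tr><td></td><td>Residual</td><td>Residual</td></tr>
<tr><td></td><td>Leverage</td><td>Leverage</td></tr>
<tr><td></td><td>PRESS residual</td><td>PRESSres</td></tr>
<tr><td></td><td>Internally studentized residual</td><td>Student</td></tr>
<tr><td></td><td>Externally studentized residual</td><td>RStudent</td></tr>
<tr><td></td><td>RMSE without deleted obs (if possible)</td><td>RMSE</td></tr>
<tr><td></td><td>Cook’s D</td><td>CookD</td></tr>
<tr><td></td><td>DFFITS</td><td>DFFITS</td></tr>
<tr><td></td><td>CovRatio</td><td>COVRATIO</td></tr>
<tr><td></td><td>Cook’s D CovParms</td><td>CookDCP</td></tr>
<tr><td></td><td>CovRatio CovParms</td><td>COVRATIOCP</td></tr>
<tr><td></td><td>MDFFITS CovParms</td><td>MDFFITSCP</td></tr>
<tr><td></td><td>(Restricted) likelihood distance</td><td>RLD, LD</td></tr>
<tr><td>EFFECT=, SIZE=, or KEEP=</td><td>Observations in level (tuple)</td><td>Nobs</td></tr>
<tr><td></td><td>Number of iterations</td><td>Iter</td></tr>
<tr><td></td><td>PRESS statistic</td><td>PRESS</td></tr>
<tr><td></td><td>RMSE without deleted level (tuple)</td><td>RMSE</td></tr>
<tr><td></td><td>Cook’s D</td><td>CookD</td></tr>
<tr><td></td><td>MDFFITS</td><td>MDFFITS</td></tr>
<tr><td></td><td>CovRatio</td><td>COVRATIO</td></tr>
<tr><td></td><td>COVTRACE</td><td>COVTRACE</td></tr>
<tr><td></td><td>Cook’s D CovParms</td><td>CookDCP</td></tr>
<tr><td></td><td>CovRatio CovParms</td><td>COVRATIOCP</td></tr>
<tr><td></td><td>MDFFITS CovParms</td><td>MDFFITSCP</td></tr>
<tr><td></td><td>(Restricted) likelihood distance</td><td>RLD, LD</td></tr>
</table>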

INTERCEPT<br />

adds a row to the tables for Type 1, 2, and 3 tests corresponding to the overall intercept.<br />

LCOMPONENTS<br />

requests an estimate for each row of the L matrix used to form tests of fixed effects. Components<br />

corresponding to Type 3 tests are the default; you can produce the Type 1 and Type 2<br />

component estimates with the HTYPE= option.<br />

Tests of fixed effects involve testing of linear hypotheses of the form $L\beta = 0$. <strong>The</strong> matrix<br />

L is constructed from Type 1, 2, or 3 estimable functions. By default the <strong>MIXED</strong> procedure<br />

constructs Type 3 tests. In many situations, the individual rows of the matrix L represent<br />

contrasts of interest. For example, in a one-way classification model, the Type 3 estimable<br />

functions define differences of factor-level means. In a balanced two-way layout, the rows of<br />

L correspond to differences of cell means.<br />
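As a hedged illustration of the one-way case, suppose factor A has three levels and the fixed-effects vector is ordered as $\beta = (\mu, \alpha_1, \alpha_2, \alpha_3)'$. One basis for the Type 3 estimable functions for A is sketched below; the parameter ordering and the use of the last level as the reference are assumptions made for this example.<br />

```latex
L = \begin{pmatrix} 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix},
\qquad
L\beta = \begin{pmatrix} \alpha_1 - \alpha_3 \\ \alpha_2 - \alpha_3 \end{pmatrix}
       = \begin{pmatrix} (\mu + \alpha_1) - (\mu + \alpha_3) \\ (\mu + \alpha_2) - (\mu + \alpha_3) \end{pmatrix}
```

Each row is a difference of two factor-level means, so in this setting there would be $a - 1 = 2$ single-degree-of-freedom components for A.<br />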

For example, suppose factors A and B have a and b levels, respectively. <strong>The</strong> following statements<br />

produce $(a-1)$ one-degree-of-freedom tests for the rows of L associated with the Type<br />

1 and Type 3 estimable functions for factor A, $(b-1)$ tests for the rows of L associated with<br />

factor B, and a single test for the Type 1 and Type 3 coefficients associated with regressor X:



class A B;<br />

model y = A B x / htype=1,3 lcomponents;<br />

<strong>The</strong> denominator degrees of freedom associated with a row of L are the same as those in the<br />

corresponding “Tests of Fixed Effects” table, except for DDFM=KENWARDROGER and<br />

DDFM=SATTERTHWAITE. For these degree of freedom methods, the denominator degrees<br />

of freedom are computed separately for each row of L.<br />

For ODS purposes, the name of the table containing all requested component tests is “LComponents.”<br />

See Example 56.9 for applications of the LCOMPONENTS option.<br />

NOCONTAIN<br />

has the same effect as the DDFM=RESIDUAL option.<br />

NOINT<br />

requests that no intercept be included in the model. An intercept is included by default.<br />

NOTEST<br />

specifies that no hypothesis tests be performed for the fixed effects.<br />

OUTP=<strong>SAS</strong>-data-set<br />

OUTPRED=<strong>SAS</strong>-data-set<br />

specifies an output data set containing predicted values and related quantities. This option<br />

replaces the P option from <strong>SAS</strong> 6.<br />

Predicted values are formed by using the rows from (X Z) as L matrices. Thus, predicted<br />

values from the original data are $X\hat{\beta} + Z\hat{\gamma}$. <strong>The</strong>ir approximate standard errors of prediction<br />

are formed from the quadratic form of L with $\hat{C}$ defined in the section “Statistical Properties”<br />

on page 3971. <strong>The</strong> L95 and U95 variables provide a t-type confidence interval for the<br />

predicted values, and they correspond to the L95M and U95M variables from the GLM and<br />

REG procedures for fixed-effects models. <strong>The</strong> residuals are the observed minus the predicted<br />

values. Predicted values for data points other than those observed can be obtained by using<br />

missing dependent variables in your input data set.<br />
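The t-type interval mentioned above can be written out as follows. This is a sketch in generic notation: $x_i'$ and $z_i'$ denote the rows of X and Z for observation $i$, $\nu$ is the degrees of freedom attached to the prediction, and $\alpha = 0.05$ is assumed for 95% limits.<br />

```latex
\hat{y}_i = x_i'\hat{\beta} + z_i'\hat{\gamma},
\qquad
\hat{y}_i \pm t_{\nu,\,1-\alpha/2}\;\widehat{\mathrm{se}}(\hat{y}_i)
```

The lower and upper limits correspond to the minus and plus signs, respectively.<br />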

Specifications that have a REPEATED statement with the SUBJECT= option and missing dependent<br />

variables compute predicted values by using empirical best linear unbiased prediction<br />

(EBLUP). Using hats ($\hat{\ }$) to denote estimates, the EBLUP formula is<br />

$$\hat{m} = X_m\hat{\beta} + \hat{C}_m\hat{V}^{-1}(y - X\hat{\beta})$$

where m represents a hypothetical realization of a missing data vector with associated design<br />

matrix $X_m$. <strong>The</strong> matrix $C_m$ is the model-based covariance matrix between m and the observed<br />

data y, and other notation is as presented in the section “Mixed Models <strong>The</strong>ory” on page 3962.<br />

<strong>The</strong> estimated prediction variance is as follows:<br />

$$\widehat{\mathrm{Var}}(\hat{m} - m) = \hat{V}_m - \hat{C}_m\hat{V}^{-1}\hat{C}_m^T + [X_m - \hat{C}_m\hat{V}^{-1}X](X^T\hat{V}^{-1}X)^{-}[X_m - \hat{C}_m\hat{V}^{-1}X]^T$$

where $V_m$ is the model-based variance matrix of m. For further details, see Henderson (1984)<br />

and Harville (1990). This feature can be useful for forecasting time series or for computing<br />

spatial predictions.<br />

By default, all variables from the input data set are included in the OUTP= data set. You can<br />

select a subset of these variables by using the ID statement.<br />

OUTPM=<strong>SAS</strong>-data-set<br />

OUTPREDM=<strong>SAS</strong>-data-set<br />

specifies an output data set containing predicted means and related quantities. This option<br />

replaces the PM option from <strong>SAS</strong> 6.<br />

<strong>The</strong> output data set is of the same form as that resulting from the OUTP= option, except<br />

that the predicted values do not incorporate the EBLUP values $Z\hat{\gamma}$. <strong>The</strong>y also do not use the<br />

EBLUPs for specifications that have a REPEATED statement with the SUBJECT= option and<br />

missing dependent variables. <strong>The</strong> predicted values are formed as $X\hat{\beta}$ in the OUTPM= data<br />

set, and standard errors are quadratic forms in the approximate variance-covariance matrix of<br />

$\hat{\beta}$ as displayed by the COVB option.<br />

By default, all variables from the input data set are included in the OUTPM= data set. You<br />

can select a subset of these variables by using the ID statement.<br />

RESIDUAL<br />

requests that Pearson-type and (internally) studentized residuals be added to the OUTP= and<br />

OUTPM= data sets. Studentized residuals are raw residuals standardized by their estimated<br />

standard error. When residuals are internally studentized, the data point in question has<br />

contributed to the estimation of the covariance parameter estimates on which the standard<br />

error of the residual is based. Externally studentized residuals can be computed with the<br />

INFLUENCE option. Pearson-type residuals scale the residual by the standard deviation of<br />

the response.<br />
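In symbols, the two scalings described above can be sketched as follows, where $r_i$ is the raw residual for observation $i$. The superscript labels are introduced here only for illustration.<br />

```latex
r_i = y_i - \hat{y}_i,
\qquad
r_i^{\text{student}} = \frac{r_i}{\sqrt{\widehat{\mathrm{Var}}[r_i]}},
\qquad
r_i^{\text{pearson}} = \frac{r_i}{\sqrt{\widehat{\mathrm{Var}}[y_i]}}
```

The two differ only in the denominator: the studentized form uses the estimated variance of the residual, whereas the Pearson-type form uses the estimated variance of the response.<br />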

<strong>The</strong> option has no effect unless the OUTP= or OUTPM= option is specified or unless you request<br />

statistical graphics with the ODS GRAPHICS statement. For general information about<br />

ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific information<br />

about the graphics available in the <strong>MIXED</strong> procedure, see the section “ODS Graphics” on<br />

page 3998. For computational details about studentized and Pearson residuals in <strong>MIXED</strong>,<br />

see the section “Residual Diagnostics” on page 3980.<br />

SINGCHOL=number<br />

tunes the sensitivity in computing Cholesky roots. If a diagonal pivot element is less than<br />

D*number as PROC <strong>MIXED</strong> performs the Cholesky decomposition on a matrix, the associated<br />

column is declared to be linearly dependent upon previous columns and is set to 0. <strong>The</strong><br />

value D is the original diagonal element of the matrix. <strong>The</strong> default for number is 1E4 times<br />

the machine epsilon; this product is approximately 1E-12 on most computers.<br />
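The pivot rule described for SINGCHOL= can be sketched with a small Cholesky routine. This is an illustrative implementation of the rule as stated above, not the code PROC MIXED uses; the function name and the extra guard against nonpositive pivots are assumptions of this sketch.<br />

```python
import math
import sys


def cholesky_with_tolerance(a, number=1e4 * sys.float_info.epsilon):
    """Lower Cholesky root of a symmetric matrix given as a list of lists.

    A column whose pivot falls below D * number, where D is the original
    diagonal element, is treated as linearly dependent and left at zero.
    """
    n = len(a)
    root = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j]
        pivot = d - sum(root[j][k] ** 2 for k in range(j))
        # The "pivot <= 0" guard is an addition of this sketch; it avoids
        # taking the square root of a nonpositive number.
        if pivot < d * number or pivot <= 0.0:
            continue  # column j stays at zero
        root[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(root[i][k] * root[j][k] for k in range(j))
            root[i][j] = (a[i][j] - partial) / root[j][j]
    return root


# The second row and column are half of the first, so the matrix is singular.
singular = [[4.0, 2.0, 2.0],
            [2.0, 1.0, 1.0],
            [2.0, 1.0, 5.0]]
root = cholesky_with_tolerance(singular)
print(root)  # [[2.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 2.0]]
```

With double-precision arithmetic the default tolerance in this sketch is about 2.2E-12, which is consistent with the order of magnitude quoted above.<br />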

SINGRES=number<br />

sets the tolerance below which the residual variance is considered to be zero. <strong>The</strong> default is 1E4<br />

times the machine epsilon; this product is approximately 1E-12 on most computers.<br />



SINGULAR=number<br />

tunes the sensitivity in sweeping. If a diagonal pivot element is less than D*number as<br />

PROC <strong>MIXED</strong> sweeps a matrix, the associated column is declared to be linearly dependent<br />

upon previous columns, and the associated parameter is set to 0. <strong>The</strong> value D is the original<br />

diagonal element of the matrix. <strong>The</strong> default is 1E4 times the machine epsilon; this product is<br />

approximately 1E-12 on most computers.<br />

SOLUTION<br />

S<br />

requests that a solution for the fixed-effects parameters be produced. Using notation from<br />

the section “Mixed Models <strong>The</strong>ory” on page 3962, the fixed-effects parameter estimates<br />

are $\hat{\beta}$ and their approximate standard errors are the square roots of the diagonal elements<br />

of $(X'\hat{V}^{-1}X)^{-}$. You can output this approximate variance matrix with the COVB option<br />

or modify it with the EMPIRICAL option in the PROC <strong>MIXED</strong> statement or the<br />

DDFM=KENWARDROGER option in the MODEL statement.<br />

Along with the estimates and their approximate standard errors, a t statistic is computed as the<br />

estimate divided by its standard error. <strong>The</strong> degrees of freedom for this t statistic match those<br />

appearing in the “Tests of Fixed Effects” table under the effect containing the parameter.<br />

<strong>The</strong> “Pr > |t|” column contains the two-tailed p-value corresponding to the t statistic and<br />

associated degrees of freedom. You can use the CL option to request confidence intervals<br />

for all of the parameters; they are constructed around the estimate by using a radius of the<br />

standard error times a percentage point from the t distribution.<br />

VCIRY<br />

requests that responses and marginal residuals be scaled by the inverse Cholesky root of the<br />

marginal variance-covariance matrix. <strong>The</strong> variables ScaledDep and ScaledResid are added to<br />

the OUTPM= data set. <strong>The</strong>se quantities can be important in bootstrapping of data or residuals.<br />

Examination of the scaled residuals is also helpful in diagnosing departures from normality.<br />

Notice that the results of this scaling operation can depend on the order in which the <strong>MIXED</strong><br />

procedure processes the data.<br />

<strong>The</strong> VCIRY option has no effect unless you also use the OUTPM= option or unless you request<br />

statistical graphics with the ODS GRAPHICS statement. For general information about<br />

ODS Graphics, see Chapter 21, “Statistical Graphics Using ODS.” For specific information<br />

about the graphics available in the <strong>MIXED</strong> procedure, see the section “ODS Graphics” on<br />

page 3998.<br />

XPVIX<br />

is an alias for the COVBI option.<br />

XPVIXI<br />

is an alias for the COVB option.<br />

ZETA=number<br />

tunes the sensitivity in forming Type 3 functions. Any element in the estimable function basis<br />

with an absolute value less than number is set to 0. <strong>The</strong> default is 1E-8.


PARMS Statement<br />

PARMS (value-list) . . . < / options > ;<br />


<strong>The</strong> PARMS statement specifies initial values for the covariance parameters, or it requests a grid<br />

search over several values of these parameters. You must specify the values in the order in which<br />

they appear in the “Covariance Parameter Estimates” table.<br />

<strong>The</strong> value-list specification can take any of several forms:<br />

<table>
<tr><th>Form</th><th>Meaning</th></tr>
<tr><td>m</td><td>a single value</td></tr>
<tr><td>m1, m2, ..., mn</td><td>several values</td></tr>
<tr><td>m to n</td><td>a sequence where m equals the starting value, n equals the ending value, and the increment equals 1</td></tr>
<tr><td>m to n by i</td><td>a sequence where m equals the starting value, n equals the ending value, and the increment equals i</td></tr>
<tr><td>m1, m2 to m3</td><td>mixed values and sequences</td></tr>
</table>

You can use the PARMS statement to input known parameters. Referring to the split-plot example<br />

(Example 56.1), suppose the three variance components are known to be 60, 20, and 6. <strong>The</strong> <strong>SAS</strong><br />

statements to fix the variance components at these values are as follows:<br />

proc mixed data=sp noprofile;<br />

class Block A B;<br />

model Y = A B A*B;<br />

random Block A*Block;<br />

parms (60) (20) (6) / noiter;<br />

run;<br />

<strong>The</strong> NOPROFILE option requests PROC <strong>MIXED</strong> to refrain from profiling the residual variance parameter<br />

during its calculations, thereby enabling its value to be held at 6 as specified in the PARMS<br />

statement. <strong>The</strong> NOITER option prevents any Newton-Raphson iterations so that the subsequent<br />

results are based on the given variance components. You can also specify known parameters of G<br />

by using the GDATA= option in the RANDOM statement.<br />

If you specify more than one set of initial values, PROC <strong>MIXED</strong> performs a grid search of the<br />

likelihood surface and uses the best point on the grid for subsequent analysis. Specifying a large<br />

number of grid points can result in long computing times. <strong>The</strong> grid search feature is also useful for<br />

exploring the likelihood surface. (See Example 56.3.)<br />
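How quickly a grid grows can be seen from a small sketch. It assumes that the grid consists of all combinations of the values in the parenthesized lists, and it uses a hypothetical `toy_objective` as a stand-in for the likelihood-based criterion; neither the helper names nor the objective come from PROC MIXED.<br />

```python
import math
from itertools import product


def sequence(start, stop, step=1.0):
    """Expand the 'm to n by i' form into an explicit list of values."""
    count = int(math.floor((stop - start) / step + 1e-9)) + 1
    return [start + index * step for index in range(count)]


def toy_objective(point):
    """Hypothetical stand-in for -2 times the (restricted) log likelihood."""
    first, second = point
    return (first - 2.0) ** 2 + (second - 1.0) ** 2


# Mimics a statement such as: parms (1 to 3) (0.5, 1.0);
value_lists = [sequence(1, 3), [0.5, 1.0]]
grid = list(product(*value_lists))       # every combination of the listed values
best = min(grid, key=toy_objective)      # the "best point on the grid"
print(len(grid), best)  # 6 (2.0, 1.0)
```

The number of grid points is the product of the list lengths, which is why long sequences for several parameters can lead to long computing times.<br />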

<strong>The</strong> results from the PARMS statement are the values of the parameters on the specified grid (denoted<br />

by CovP1–CovPn), the residual variance (possibly estimated) for models with a residual<br />

variance parameter, and various functions of the likelihood.<br />

For ODS purposes, the name of the “Parameter Search” table is “ParmSearch.”<br />

You can specify the following options in the PARMS statement after a slash (/).



HOLD=value-list<br />

EQCONS=value-list<br />

specifies which parameter values PROC <strong>MIXED</strong> should hold to equal the specified values.<br />

For example, the following statement constrains the first and third covariance parameters to<br />

equal 5 and 2, respectively:<br />

parms (5) (3) (2) (3) / hold=1,3;<br />

LOGDETH<br />

evaluates the log determinant of the Hessian matrix for each point specified in the PARMS<br />

statement. A Log Det H column is added to the “Parameter Search” table.<br />

LOWERB=value-list<br />

enables you to specify lower boundary constraints on the covariance parameters. <strong>The</strong> value-list<br />

specification is a list of numbers or missing values (.) separated by commas. You must<br />

list the numbers in the order that PROC <strong>MIXED</strong> uses for the covariance parameters, and<br />

each number corresponds to the lower boundary constraint. A missing value instructs PROC<br />

<strong>MIXED</strong> to use its default constraint, and if you do not specify numbers for all of the covariance<br />

parameters, PROC <strong>MIXED</strong> assumes the remaining ones are missing.<br />

This option is useful, for example, when you want to constrain the G matrix to<br />

be positive definite in order to avoid the more computationally intensive algorithms required<br />

when G becomes singular. <strong>The</strong> corresponding statements for a random coefficients model are<br />

as follows:<br />

proc mixed;<br />

class person;<br />

model y = time;<br />

random int time / type=fa0(2) sub=person;<br />

parms / lowerb=1e-4,.,1e-4;<br />

run;<br />

Here the TYPE=FA0(2) structure is used in order to specify a Cholesky root parameterization<br />

for the $2 \times 2$ unstructured blocks in G. This parameterization ensures that the G matrix is<br />

nonnegative definite, and the PARMS statement then ensures that it is positive definite by<br />

constraining the two diagonal terms to be greater than or equal to 1E-4.<br />
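The effect of bounding the diagonal terms can be sketched numerically. The example assumes that the three FA0(2) parameters are the entries of a lower-triangular factor L, ordered as (1,1), (2,1), (2,2), so that G = LL'; the function names are hypothetical.<br />

```python
def g_from_fa0_2(l11, l21, l22):
    """Form the 2 x 2 matrix G = L L' from the lower-triangular entries of L."""
    return [[l11 * l11, l11 * l21],
            [l11 * l21, l21 * l21 + l22 * l22]]


def determinant_2x2(g):
    """Determinant of a 2 x 2 matrix given as a list of lists."""
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


# A zero diagonal term gives a singular (only nonnegative definite) G.
singular_g = g_from_fa0_2(2.0, 1.0, 0.0)
# Keeping both diagonal terms away from zero gives a positive definite G.
regular_g = g_from_fa0_2(2.0, 1.0, 3.0)
print(determinant_2x2(singular_g), determinant_2x2(regular_g))  # 0.0 36.0
```

Because the determinant of G equals the squared product of the two diagonal terms, a positive lower bound on both keeps G away from singularity, while the off-diagonal term can remain unconstrained.<br />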

NOBOUND<br />

requests the removal of boundary constraints on covariance parameters. For example, variance<br />

components have a default lower boundary constraint of 0, and the NOBOUND option<br />

allows their estimates to be negative.<br />

NOITER<br />

requests that no Newton-Raphson iterations be performed and that PROC <strong>MIXED</strong> use the<br />

best value from the grid search to perform inferences. By default, iterations begin at the best<br />

value from the PARMS grid search.<br />

NOPROFILE<br />

specifies a different computational method for the residual variance during the grid search.<br />

By default, PROC <strong>MIXED</strong> estimates this parameter by using the profile likelihood when<br />

appropriate. This estimate is displayed in the Variance column of the “Parameter Search”<br />

table. <strong>The</strong> NOPROFILE option suppresses the profiling and uses the actual value of the<br />

specified variance in the likelihood calculations.<br />

OLS<br />

requests starting values corresponding to the usual general linear model. Specifically, all<br />

variances and covariances are set to zero except for the residual variance, which is set equal<br />

to its ordinary least squares (OLS) estimate. This option is useful when the default MIVQUE0<br />

procedure produces poor starting values for the optimization process.<br />

PARMSDATA=<strong>SAS</strong>-data-set<br />

PDATA=<strong>SAS</strong>-data-set<br />

reads in covariance parameter values from a <strong>SAS</strong> data set. <strong>The</strong> data set should contain the Est<br />

or Covp1–Covpn variables.<br />

RATIOS<br />

indicates that ratios with the residual variance are specified instead of the covariance parameters<br />

themselves. <strong>The</strong> default is to use the individual covariance parameters.<br />

UPPERB=value-list<br />

enables you to specify upper boundary constraints on the covariance parameters. <strong>The</strong> value-list<br />

specification is a list of numbers or missing values (.) separated by commas. You must<br />

list the numbers in the order that PROC <strong>MIXED</strong> uses for the covariance parameters, and<br />

each number corresponds to the upper boundary constraint. A missing value instructs PROC<br />

<strong>MIXED</strong> to use its default constraint, and if you do not specify numbers for all of the covariance<br />

parameters, PROC <strong>MIXED</strong> assumes that the remaining ones are missing.<br />

PRIOR Statement<br />

PRIOR < distribution >< / options > ;<br />

<strong>The</strong> PRIOR statement enables you to carry out a sampling-based Bayesian analysis in PROC<br />

<strong>MIXED</strong>. It currently operates only with variance component models. <strong>The</strong> analysis produces a<br />

<strong>SAS</strong> data set containing a pseudo-random sample from the joint posterior density of the variance<br />

components and other parameters in the mixed model.<br />

<strong>The</strong> posterior analysis is performed after all other PROC <strong>MIXED</strong> computations. It begins with the<br />

“Posterior Sampling Information” table, which provides basic information about the posterior sampling<br />

analysis, including the prior densities, sampling algorithm, sample size, and random number<br />

seed. For ODS purposes, the name of this table is “Posterior.”<br />

By default, PROC <strong>MIXED</strong> uses an independence chain algorithm in order to generate the posterior<br />

sample (Tierney 1994). This algorithm works by generating a pseudo-random proposal from a<br />

convenient base distribution, chosen to be as close as possible to the posterior. <strong>The</strong> proposal is then<br />

retained in the sample with probability proportional to the ratio of weights constructed by taking<br />

the ratio of the true posterior to the base density. If a proposal is not accepted, then a duplicate of<br />

the previous observation is added to the chain.
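The accept-or-duplicate mechanism can be sketched in a few lines. This is a generic independence chain in the sense of Tierney (1994), using the common acceptance probability min(1, w(proposal)/w(current)) with w equal to the ratio of target to base density. The target and base densities below are hypothetical choices for illustration and are not the densities PROC MIXED constructs.<br />

```python
import math
import random


def log_target(x):
    """Unnormalized log density of a hypothetical posterior (gamma-like shape)."""
    return 2.0 * math.log(x) - x if x > 0.0 else float("-inf")


def log_base(x, rate):
    """Log density of the base distribution: exponential with the given rate."""
    return math.log(rate) - rate * x


def independence_chain(n_samples, seed=1, rate=0.5):
    """Return the chain and the acceptance rate (accepted / generated)."""
    rng = random.Random(seed)
    current = rng.expovariate(rate)
    log_w_current = log_target(current) - log_base(current, rate)
    chain, accepted = [], 0
    for _ in range(n_samples):
        proposal = rng.expovariate(rate)  # drawn independently of the current state
        log_w_proposal = log_target(proposal) - log_base(proposal, rate)
        # Accept with probability min(1, w(proposal) / w(current)).
        if rng.random() < math.exp(min(0.0, log_w_proposal - log_w_current)):
            current, log_w_current = proposal, log_w_proposal
            accepted += 1
        chain.append(current)  # a rejected proposal duplicates the previous value
    return chain, accepted / n_samples


chain, acceptance_rate = independence_chain(5000)
print(len(chain), round(acceptance_rate, 2))
```

The closer the base density is to the target, the closer the weights are to a constant and the higher the acceptance rate, which is the motivation for choosing a base distribution that approximates the posterior well.<br />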



In selecting the base distribution, PROC <strong>MIXED</strong> makes use of the fact that the fixed-effects parameters<br />

can be analytically integrated out of the joint posterior, leaving the marginal posterior density<br />

of the variance components. In order to better approximate the marginal posterior density of the<br />

variance components, PROC <strong>MIXED</strong> transforms them by using the MIVQUE(0) equations. You<br />

can display the selected transformation with the PTRANS option or specify your own with the<br />

TDATA= option. <strong>The</strong> density of the transformed parameters is then approximated by a product of<br />

inverted gamma densities (see Gelfand et al. 1990).<br />

To determine the parameters for the inverted gamma densities, PROC <strong>MIXED</strong> evaluates the logarithm<br />

of the posterior density over a grid of points in each of the transformed parameters, and you<br />

can display the results of this search with the PSEARCH option. PROC <strong>MIXED</strong> then performs a<br />

linear regression of these values on the logarithm of the inverted gamma density. <strong>The</strong> resulting base<br />

densities are displayed in the “Base Densities” table; for ODS purposes, the name of this table is<br />

“BaseDen.” You can input different base densities with the BDATA= option.<br />

At the end of the sampling, the “Acceptance Rates” table displays the acceptance rate computed<br />

as the number of accepted samples divided by the total number of samples generated. For ODS<br />

purposes, the name of the “Acceptance Rates” table is “AcceptanceRates.”<br />

<strong>The</strong> OUT= option specifies the output data set containing the posterior sample. PROC <strong>MIXED</strong> automatically<br />

includes all variance component parameters in this data set (labeled COVP1–COVPn),<br />

the Type 3 F statistics constructed as in Ghosh (1992) discussing Schervish (1992) (labeled T3Fn),<br />

the log values of the posterior (labeled LOGF), the log of the base sampling density (labeled<br />

LOGG), and the log of their ratio (labeled LOGRATIO). If you specify the SOLUTION option<br />

in the MODEL statement, the data set also contains a random sample from the posterior density<br />

of the fixed-effects parameters (labeled BETAn); and if you specify the SOLUTION option in<br />

the RANDOM statement, the table contains a random sample from the posterior density of the<br />

random-effects parameters (labeled GAMn). PROC <strong>MIXED</strong> also generates additional variables<br />

corresponding to any CONTRAST, ESTIMATE, or LSMEANS statement that you specify.<br />

Subsequently, you can use <strong>SAS</strong>/INSIGHT or the UNIVARIATE, CAPABILITY, or KDE procedure<br />

to analyze the posterior sample.<br />

<strong>The</strong> prior density of the variance components is, by default, a noninformative version of Jeffreys’<br />

prior (Box and Tiao 1973). You can also specify informative priors with the DATA= option or a<br />

flat (equal to 1) prior for the variance components. <strong>The</strong> prior density of the fixed-effects parameters<br />

is assumed to be flat (equal to 1), and the resulting posterior is conditionally multivariate normal<br />

(conditioning on the variance component parameters) with mean $(X'V^{-1}X)^{-}X'V^{-1}y$ and variance<br />

$(X'V^{-1}X)^{-}$.<br />

<strong>The</strong> distribution argument in the PRIOR statement determines the prior density for the variance<br />

component parameters of your mixed model. Valid values are as follows.<br />

DATA=<br />

enables you to input the prior densities of the variance components used by the sampling<br />

algorithm. This data set must contain the Type and Parm1–Parmn variables, where n is the<br />

largest number of parameters among each of the base densities. <strong>The</strong> format of the DATA=<br />

data set matches that created by PROC <strong>MIXED</strong> in the “Base Densities” table, so you can<br />

output the densities from one run and use them as input for a subsequent run.



JEFFREYS<br />

specifies a noninformative reference version of Jeffreys’ prior constructed by using the square<br />

root of the determinant of the expected information matrix as in (1.3.92) of Box and Tiao<br />

(1973). This is the default prior.<br />

FLAT<br />

specifies a prior density equal to 1 everywhere, making the likelihood function the posterior.<br />

You can specify the following options in the PRIOR statement after a slash (/).<br />

ALG=IC | INDCHAIN<br />

ALG=IS | IMPSAMP<br />

ALG=RS | REJSAMP<br />

ALG=RWC | RWCHAIN<br />

specifies the algorithm used for generating the posterior sample. <strong>The</strong> ALG=IC option requests<br />

an independence chain algorithm, and it is the default. <strong>The</strong> option ALG=IS requests<br />

importance sampling, ALG=RS requests rejection sampling, and ALG=RWC requests a random<br />

walk chain. For more information about these techniques, see Ripley (1987), Smith and<br />

Gelfand (1992), and Tierney (1994).<br />

BDATA=<br />

enables you to input the base densities used by the sampling algorithm. This data set must<br />

contain the Type and Parm1–Parmn variables, where n is the largest number of parameters<br />

among each of the base densities. <strong>The</strong> format of the BDATA= data set matches that created<br />

by PROC <strong>MIXED</strong> in the “Base Densities” table, so you can output the densities from one run<br />

and use them as input for a subsequent run.<br />
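The reuse of base densities across runs might look like the following sketch. It assumes that the first run produces the “Base Densities” table (ODS name BaseDen); the data set mydata, the variables y, x, and block, and the names bd and post are hypothetical.<br />

```sas
/* First run: capture the "Base Densities" table (ODS name BaseDen) */
ods output BaseDen=bd;
proc mixed data=mydata;
   class block;
   model y = x;
   random block;
   prior / nsample=1000 seed=1;
run;

/* Second run: reuse the saved base densities as input */
proc mixed data=mydata;
   class block;
   model y = x;
   random block;
   prior / bdata=bd nsample=10000 seed=1 out=post;
run;
```

Because no distribution is named in either PRIOR statement, both runs use the default JEFFREYS prior.<br />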

GRID=(value-list)<br />

specifies a grid of values over which to evaluate the posterior density. <strong>The</strong> value-list syntax is<br />

the same as in the PARMS statement, and you must specify an output data set name with the<br />

OUTG= option.<br />

GRIDT=(value-list)<br />

specifies a transformed grid of values over which to evaluate the posterior density. <strong>The</strong> value-list<br />

syntax is the same as in the PARMS statement, and you must specify an output data set<br />

name with the OUTGT= option.<br />

IFACTOR=number<br />

is an alias for the SFACTOR= option.<br />

LOGNOTE=number<br />

instructs PROC <strong>MIXED</strong> to write a note to the <strong>SAS</strong> log after it generates the sample corresponding<br />

to each multiple of number. This is useful for monitoring the progress of CPU-intensive<br />

runs.<br />

LOGRBOUND=number<br />

specifies the bounding constant for rejection sampling. <strong>The</strong> value of number equals the maximum<br />

of $\log\{f/g\}$ over the variance component parameter space, where f is the posterior<br />

density and g is the product of inverted gamma densities used to perform rejection sampling.<br />

When performing the rejection sampling, you might encounter the following message:



WARNING: <strong>The</strong> log ratio bound of LL was violated at sample XX.<br />

When this occurs, PROC <strong>MIXED</strong> reruns an optimization algorithm to determine a new log<br />

upper bound and then restarts the rejection sampling. <strong>The</strong> resulting OUT= data set contains<br />

all observations that have been generated; therefore, assuming that you have requested N<br />

samples, you should retain only the final N observations in this data set for analysis purposes.<br />
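One way to keep only the final N observations is a DATA step that uses the NOBS= option of the SET statement. This is a sketch: the data set names post and post_final and the macro variable n are hypothetical, and N is assumed to equal the NSAMPLE= value that you requested.<br />

```sas
/* Keep only the last &n observations of a hypothetical OUT= data set */
%let n = 1000;                 /* assumed to match NSAMPLE= */

data post_final;
   set post nobs=total;        /* total = number of observations in post */
   if _n_ > total - &n;        /* subsetting IF keeps the final &n rows  */
run;
```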

NSAMPLE=number<br />

specifies the number of posterior samples to generate. <strong>The</strong> default is 1000, but more accurate<br />

results are obtained with larger samples such as 10000.<br />

NSEARCH=number<br />

specifies the number of posterior evaluations PROC <strong>MIXED</strong> makes for each transformed<br />

parameter in determining the parameters for the inverted gamma densities. <strong>The</strong> default is 20.<br />

OUT=<strong>SAS</strong>-data-set<br />

creates an output data set containing the sample from the posterior density.<br />

OUTG=<strong>SAS</strong>-data-set<br />

creates an output data set from the grid evaluations specified in the GRID= option.<br />

OUTGT=<strong>SAS</strong>-data-set<br />

creates an output data set from the transformed grid evaluations specified in the GRIDT=<br />

option.<br />

PSEARCH<br />

displays the search used to determine the parameters for the inverted gamma densities. For<br />

ODS purposes, the name of the table is “Search.”<br />

PTRANS<br />

displays the transformation of the variance components. For ODS purposes, the name of the<br />

table is “Trans.”<br />

SEED=number<br />

specifies an integer used to start the pseudo-random number generator for the simulation. If<br />

you do not specify a seed, or if you specify a value less than or equal to zero, the seed is by<br />

default generated from reading the time of day from the computer clock. You should use a<br />

positive seed (less than $2^{31}-1$) whenever you want to duplicate the sample in another run of<br />

PROC <strong>MIXED</strong>.<br />

SFACTOR=number<br />

enables you to adjust the range over which PROC <strong>MIXED</strong> searches the transformed parameters<br />

in order to determine the parameters for the inverted gamma densities. PROC <strong>MIXED</strong><br />

determines the range by first transforming the estimates from the standard PROC <strong>MIXED</strong><br />

analysis (REML, ML, or MIVQUE0, depending upon which estimation method you select).<br />

It then multiplies and divides the transformed estimates by $2^{\text{number}}$ to obtain upper and<br />

lower bounds, respectively. Transformed values that produce negative variance components<br />

in the original scale are not included in the search. <strong>The</strong> default value is 1; number must be<br />

greater than 0.5.
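The search range can be written out explicitly. In the following sketch the symbols $\hat{t}$ (a transformed estimate, assumed positive) and $s$ (the SFACTOR= value) are introduced only for illustration and do not appear elsewhere in this chapter.<br />

```latex
% \hat{t}: a transformed estimate (assumed positive); s: the SFACTOR= value
\hat{t}\,2^{-s} \;\le\; t \;\le\; \hat{t}\,2^{s},
\qquad s > 0.5
% Default s = 1 gives  \hat{t}/2 \le t \le 2\,\hat{t}
```

With the default value of 1, the search therefore runs from half the transformed estimate to twice the transformed estimate; larger values widen the range geometrically.<br />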



TDATA=<br />

enables you to input the transformation of the covariance parameters used by the sampling<br />

algorithm. This data set should contain the CovP1–CovPn variables. <strong>The</strong> format of the<br />

TDATA= data set matches that created by PROC <strong>MIXED</strong> in the “Trans” table, so you can<br />

output the transformation from one run and use it as input for a subsequent run.<br />

TRANS=EXPECTED<br />

TRANS=MIVQUE0<br />

TRANS=OBSERVED<br />

specifies the particular algorithm used to determine the transformation of the covariance parameters.<br />

<strong>The</strong> default is MIVQUE0, indicating a transformation based on the MIVQUE(0)<br />

equations. <strong>The</strong> other two options indicate the type of Hessian matrix used in constructing the<br />

transformation via a Cholesky root.<br />

UPDATE=number<br />

is an alias for the LOGNOTE= option.<br />

RANDOM Statement<br />

RANDOM random-effects < / options > ;<br />

<strong>The</strong> RANDOM statement defines the random effects constituting the $\gamma$ vector in the mixed model.<br />

It can be used to specify traditional variance component models (as in the VARCOMP procedure)<br />

and to specify random coefficients. <strong>The</strong> random effects can be classification or continuous, and<br />

multiple RANDOM statements are possible.<br />

Using notation from the section “Mixed Models <strong>The</strong>ory” on page 3962, the purpose of the RANDOM<br />

statement is to define the Z matrix of the mixed model, the random effects in the $\gamma$ vector,<br />

and the structure of G. <strong>The</strong> Z matrix is constructed exactly as the X matrix for the fixed effects,<br />

and the G matrix is constructed to correspond with the effects constituting Z. <strong>The</strong> structure of G is<br />

defined by using the TYPE= option.<br />

You can specify INTERCEPT (or INT) as a random effect to indicate the intercept. PROC <strong>MIXED</strong><br />

does not include the intercept in the RANDOM statement by default as it does in the MODEL<br />

statement.<br />
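The difference between the two statements can be seen in a short sketch; the data set mydata and the variables y, time, and person are hypothetical.<br />

```sas
proc mixed data=mydata;
   class person;
   model y = time;                      /* fixed intercept is included by default */
   random intercept / subject=person;   /* random intercept must be requested     */
run;
```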

Table 56.11 summarizes important options in the RANDOM statement. All options are subsequently<br />

discussed in alphabetical order.<br />

Table 56.11 Summary of Important RANDOM Statement Options<br />

Option Description<br />

Construction of Covariance Structure<br />

GDATA= requests that the G matrix be read from a <strong>SAS</strong> data set<br />

GROUP= varies covariance parameters by groups<br />

LDATA= specifies data set with coefficient matrices for TYPE=LIN<br />

NOFULLZ eliminates columns in Z corresponding to missing values<br />

RATIOS indicates that ratios are specified in the GDATA= data set<br />

SUBJECT= identifies the subjects in the model<br />

TYPE= specifies the covariance structure<br />

Statistical Output<br />

ALPHA=$\alpha$ determines the confidence level ($1-\alpha$)<br />

CL requests confidence limits for predictors of random effects<br />

G displays the estimated G matrix<br />

GC displays the Cholesky root (lower) of estimated G matrix<br />

GCI displays the inverse Cholesky root (lower) of estimated G matrix<br />

GCORR displays the correlation matrix corresponding to estimated G matrix<br />

GI displays the inverse of the estimated G matrix<br />

SOLUTION displays solutions $\hat{\gamma}$ of the G-side random effects<br />

V displays blocks of the estimated V matrix<br />

VC displays the lower-triangular Cholesky root of blocks of the estimated V matrix<br />

VCI displays the inverse Cholesky root of blocks of the estimated V matrix<br />

VCORR displays the correlation matrix corresponding to blocks of the estimated V matrix<br />

VI displays the inverse of the blocks of the estimated V matrix<br />

You can specify the following options in the RANDOM statement after a slash (/).<br />

ALPHA=number<br />

requests that a t-type confidence interval be constructed for each of the random-effect estimates<br />

with confidence level $1-\text{number}$. <strong>The</strong> value of number must be between 0 and 1; the<br />

default is 0.05.<br />

CL<br />

requests that t-type confidence limits be constructed for each of the random-effect estimates.<br />

<strong>The</strong> confidence level is 0.95 by default; this can be changed with the ALPHA= option.<br />

G<br />

requests that the estimated G matrix be displayed. PROC <strong>MIXED</strong> displays blanks for values<br />

that are 0. If you specify the SUBJECT= option, then the block of the G matrix corresponding<br />

to the first subject is displayed. For ODS purposes, the name of the table is “G.”<br />

GC<br />

displays the lower-triangular Cholesky root of the estimated G matrix according to the rules<br />

listed under the G option. For ODS purposes, the name of the table is “CholG.”<br />

GCI<br />

displays the inverse Cholesky root of the estimated G matrix according to the rules listed<br />

under the G option. For ODS purposes, the name of the table is “InvCholG.”<br />



GCORR<br />

displays the correlation matrix corresponding to the estimated G matrix according to the rules<br />

listed under the G option. For ODS purposes, the name of the table is “GCorr.”<br />

GDATA=<strong>SAS</strong>-data-set<br />

requests that the G matrix be read in from a <strong>SAS</strong> data set. This G matrix is assumed to<br />

be known; therefore, only R-side parameters from effects in the REPEATED statement are<br />

included in the Newton-Raphson iterations. If no REPEATED statement is specified, then<br />

only a residual variance is estimated.<br />

<strong>The</strong> information in the GDATA= data set can appear in one of two ways. <strong>The</strong> first is a<br />

sparse representation for which you include Row, Col, and Value variables to indicate the row,<br />

column, and value of G, respectively. All unspecified locations are assumed to be 0. <strong>The</strong><br />

second representation is for dense matrices. In it you include Row and Col1–Coln variables to<br />

indicate, respectively, the row and columns of G, which is a symmetric matrix of order n. For<br />

both representations, you must specify effects in the RANDOM statement that generate a Z<br />

matrix that contains n columns. (See Example 56.4.)<br />

If you have more than one RANDOM statement, only one GDATA= option is required in any<br />

one of them, and the data set you specify must contain the entire G matrix defined by all of<br />

the RANDOM statements.<br />

If the GDATA= data set contains variance ratios instead of the variances themselves, then use<br />

the RATIOS option.<br />

Known parameters of G can also be input by using the PARMS statement with the HOLD=<br />

option.<br />

GI<br />

displays the inverse of the estimated G matrix according to the rules listed under the G option.<br />

For ODS purposes, the name of the table is “InvG.”<br />

GROUP=effect<br />

GRP=effect<br />

defines an effect specifying heterogeneity in the covariance structure of G. All observations<br />

having the same level of the group effect have the same covariance parameters. Each new<br />

level of the group effect produces a new set of covariance parameters with the same structure<br />

as the original group. You should exercise caution in defining the group effect, because<br />

strange covariance patterns can result from its misuse. Also, the group effect can greatly<br />

increase the number of estimated covariance parameters, which can adversely affect the optimization<br />

process.<br />

Continuous variables are permitted as arguments to the GROUP= option. PROC <strong>MIXED</strong><br />

does not sort by the values of the continuous variable; rather, it considers the data to be<br />

from a new subject or group whenever the value of the continuous variable changes from the<br />

previous observation. Using a continuous variable decreases execution time for models with<br />

a large number of subjects or groups and also prevents the production of a large “Class Level<br />

Information” table.<br />
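As a sketch of the GROUP= option, the following statements fit a separate unstructured G block for each level of a grouping variable. The data set mydata and the variables y, age, person, and gender are hypothetical.<br />

```sas
proc mixed data=mydata;
   class person gender;
   model y = gender age;
   /* One 2 x 2 unstructured block (3 parameters) per level of gender */
   random intercept age / type=un subject=person group=gender;
run;
```

With two levels of gender, this specification estimates six G-side covariance parameters instead of three, which illustrates how quickly the parameter count grows with the number of groups.<br />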

LDATA=<strong>SAS</strong>-data-set<br />

reads the coefficient matrices associated with the TYPE=LIN(number) option. <strong>The</strong> data set



must contain the variables Parm, Row, Col1–Coln or Parm, Row, Col, Value. <strong>The</strong> Parm variable<br />

denotes which of the number coefficient matrices is currently being constructed, and the Row,<br />

Col1–Coln, or Row, Col, Value variables specify the matrix values, as they do with the GDATA=<br />

option. Unspecified values of these matrices are set equal to 0.<br />

NOFULLZ<br />

eliminates the columns in Z corresponding to missing levels of random effects involving<br />

CLASS variables. By default, these columns are included in Z.<br />

RATIOS<br />

indicates that ratios with the residual variance are specified in the GDATA= data set instead of<br />

the covariance parameters themselves. <strong>The</strong> default GDATA= data set contains the individual<br />

covariance parameters.<br />
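The sparse Row, Col, Value layout described under the GDATA= option might be set up as in the following sketch. The data set names g and mydata, the variables y, x, and block, and the numeric values are hypothetical; the example assumes that block has exactly three levels, so that Z has three columns and G is of order 3.<br />

```sas
/* Known G matrix in sparse form: only nonzero entries are listed */
data g;
   input Row Col Value;
   datalines;
1 1 2.0
2 2 2.0
3 3 2.0
;

proc mixed data=mydata;
   class block;                 /* assumed to have three levels */
   model y = x;
   random block / gdata=g;      /* G is read in, not estimated  */
run;
```

If the Value column held ratios to the residual variance instead of variances, the RATIOS option would be added after GDATA=g.<br />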

SOLUTION<br />

S<br />

requests that the solution for the random-effects parameters be produced. Using notation<br />

from the section “Mixed Models <strong>The</strong>ory” on page 3962, these estimates are the empirical<br />

best linear unbiased predictors (EBLUPs) $\hat{\gamma} = \hat{G}Z'\hat{V}^{-1}(y - X\hat{\beta})$. <strong>The</strong>y can be useful for<br />

comparing the random effects from different experimental units and can also be treated as<br />

residuals in performing diagnostics for your mixed model.<br />

<strong>The</strong> numbers displayed in the SE Pred column of the “Solution for Random Effects” table<br />

are not the standard errors of the $\hat{\gamma}$ displayed in the Estimate column; rather, they are the<br />

standard errors of predictions $\hat{\gamma}_i - \gamma_i$, where $\hat{\gamma}_i$ is the ith EBLUP and $\gamma_i$ is the ith random-effect<br />

parameter.<br />
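To work with the EBLUPs as data, for example to compare subjects or to plot them as residuals, you can capture the “Solution for Random Effects” table with ODS. This sketch assumes the ODS table name SolutionR; the data set mydata, the variables y, time, and person, and the output name eblups are hypothetical.<br />

```sas
ods output SolutionR=eblups;     /* assumed ODS name of the EBLUP table */
proc mixed data=mydata;
   class person;
   model y = time / solution;
   /* SOLUTION requests the EBLUPs; CL and ALPHA= add 90% limits */
   random intercept / subject=person solution cl alpha=0.10;
run;

proc print data=eblups;
run;
```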

SUBJECT=effect<br />

SUB=effect<br />

identifies the subjects in your mixed model. Complete independence is assumed across subjects;<br />

thus, for the RANDOM statement, the SUBJECT= option produces a block-diagonal<br />

structure in G with identical blocks. <strong>The</strong> Z matrix is modified to accommodate this block<br />

diagonality. In fact, specifying a subject effect is equivalent to nesting all other effects in the<br />

RANDOM statement within the subject effect.<br />

Continuous variables are permitted as arguments to the SUBJECT= option. PROC <strong>MIXED</strong><br />

does not sort by the values of the continuous variable; rather, it considers the data to be<br />

from a new subject or group whenever the value of the continuous variable changes from the<br />

previous observation. Using a continuous variable decreases execution time for models with<br />

a large number of subjects or groups and also prevents the production of a large “Class Level<br />

Information” table.<br />

When you specify the SUBJECT= option and a classification random effect, computations<br />

are usually much quicker if the levels of the random effect are duplicated within each level of<br />

the SUBJECT= effect.<br />
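The equivalence between a subject effect and nesting can be illustrated with a random intercept. In this sketch the data set mydata and the variables y, time, and person are hypothetical; both steps specify the same model, although the first form is usually processed more efficiently.<br />

```sas
/* Form 1: random intercept with a SUBJECT= effect */
proc mixed data=mydata;
   class person;
   model y = time;
   random intercept / subject=person;
run;

/* Form 2: the same model with person as the random effect */
proc mixed data=mydata;
   class person;
   model y = time;
   random person;
run;
```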

TYPE=covariance-structure<br />

specifies the covariance structure of G. Valid values for covariance-structure and their descriptions<br />

are listed in Table 56.13 and Table 56.14. Although a variety of structures are



available, most applications call for either TYPE=VC or TYPE=UN. <strong>The</strong> TYPE=VC (variance<br />

components) option is the default structure, and it models a different variance component<br />

for each random effect.<br />

<strong>The</strong> TYPE=UN (unstructured) option is useful for correlated random coefficient models. For<br />

example, the following statement specifies a random intercept-slope model that has different<br />

variances for the intercept and slope and a covariance between them:<br />

random intercept age / type=un subject=person;<br />

You can also use TYPE=FA0(2) here to request a G estimate that is constrained to be nonnegative<br />

definite.<br />

If you are constructing your own columns of Z with continuous variables, you can use the<br />

TYPE=TOEP(1) structure to group them together to have a common variance component. If<br />

you want to have different covariance structures in different parts of G, you must use multiple<br />

RANDOM statements with different TYPE= options.<br />
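A sketch that combines these points uses two RANDOM statements, each with its own structure. The data set mydata and the variables y, age, person, and clinic are hypothetical.<br />

```sas
proc mixed data=mydata;
   class person clinic;
   model y = age;
   /* Intercept-slope block constrained to be nonnegative definite */
   random intercept age / type=fa0(2) subject=person;
   /* Separate variance component for clinic (TYPE=VC is the default) */
   random clinic;
run;
```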

V< =value-list ><br />

requests that blocks of the estimated V matrix be displayed. <strong>The</strong> first block determined by<br />

the SUBJECT= effect is the default displayed block. PROC <strong>MIXED</strong> displays entries that are<br />

0 as blanks in the table.<br />

You can optionally use the value-list specification, which indicates the subjects for which<br />

blocks of V are to be displayed. For example, the following statement displays block matrices<br />

for the first, third, and seventh persons:<br />

random int time / type=un subject=person v=1,3,7;<br />

<strong>The</strong> table name for ODS purposes is “V.”<br />

VC< =value-list ><br />

displays the Cholesky root of the blocks of the estimated V matrix. <strong>The</strong> value-list specification<br />

is the same as in the V= option. <strong>The</strong> table name for ODS purposes is “CholV.”<br />

VCI< =value-list ><br />

displays the inverse of the Cholesky root of the blocks of the estimated V matrix. <strong>The</strong> value-list<br />

specification is the same as in the V= option. <strong>The</strong> table name for ODS purposes is “InvCholV.”<br />

VCORR< =value-list ><br />

displays the correlation matrix corresponding to the blocks of the estimated V matrix. <strong>The</strong><br />

value-list specification is the same as in the V= option. <strong>The</strong> table name for ODS purposes is<br />

“VCorr.”<br />

VI< =value-list ><br />

displays the inverse of the blocks of the estimated V matrix. <strong>The</strong> value-list specification is<br />

the same as in the V= option. <strong>The</strong> table name for ODS purposes is “InvV.”



REPEATED Statement<br />

REPEATED < repeated-effect >< / options > ;<br />

<strong>The</strong> REPEATED statement is used to specify the R matrix in the mixed model. Its syntax is different<br />

from that of the REPEATED statement in PROC GLM. If no REPEATED statement is specified, R<br />

is assumed to be equal to $\sigma^2 I$.<br />

For many repeated measures models, no repeated effect is required in the REPEATED statement.<br />

Simply use the SUBJECT= option to define the blocks of R and the TYPE= option to define their<br />

covariance structure. In this case, the repeated measures data must be similarly ordered for each<br />

subject, and you must indicate all missing response variables with periods in the input data set unless<br />

they all fall at the end of a subject’s repeated response profile. <strong>The</strong>se requirements are necessary in<br />

order to inform PROC <strong>MIXED</strong> of the proper location of the observed repeated responses.<br />

Specifying a repeated effect is useful when you do not want to indicate missing values with periods<br />

in the input data set. <strong>The</strong> repeated effect must contain only classification variables. Make sure that<br />

the levels of the repeated effect are different for each observation within a subject; otherwise, PROC<br />

<strong>MIXED</strong> constructs identical rows in R corresponding to the observations with the same level. This<br />

results in a singular R and an infinite likelihood.<br />

Whether you specify a REPEATED effect or not, the rows of R for each subject are constructed in<br />

the order in which they appear in the input data set.<br />
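The two styles of specification can be sketched as follows. The data set long (one observation per subject and occasion) and the variables y, time, visit, and person are hypothetical.<br />

```sas
/* No repeated effect: observations must be similarly ordered within   */
/* each subject, with missing responses entered as periods             */
proc mixed data=long;
   class person;
   model y = time;
   repeated / type=ar(1) subject=person;
run;

/* Repeated effect: visit identifies the position of each observation, */
/* so rows with missing responses can be omitted from the data set     */
proc mixed data=long;
   class person visit;
   model y = visit;
   repeated visit / type=un subject=person;
run;
```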

Table 56.12 summarizes important options in the REPEATED statement. All options are subsequently<br />

discussed in alphabetical order.<br />

Table 56.12 Summary of Important REPEATED Statement Options<br />

Option Description<br />

Construction of Covariance Structure<br />

GROUP= defines an effect specifying heterogeneity in the R-side covariance structure<br />

LDATA= specifies data set with coefficient matrices for TYPE=LIN<br />

LOCAL requests that a diagonal matrix be added to R<br />

LOCALW specifies that only the local effects are weighted<br />

NONLOCALW specifies that only the nonlocal effects are weighted<br />

SUBJECT= identifies the subjects in the R-side model<br />

TYPE= specifies the R-side covariance structure<br />

Statistical Output<br />

HLM produces a table of Hotelling-Lawley-McKeon statistics (McKeon 1974)<br />

HLPS produces a table of Hotelling-Lawley-Pillai-Samson statistics (Pillai and Samson 1959)<br />

R displays blocks of the estimated R matrix<br />

RC displays the Cholesky root (lower) of blocks of the estimated R matrix<br />

RCI displays the inverse Cholesky root (lower) of blocks of the estimated R matrix<br />

RCORR displays the correlation matrix corresponding to blocks of the estimated R matrix<br />

RI displays the inverse of blocks of the estimated R matrix<br />

You can specify the following options in the REPEATED statement after a slash (/).<br />

GROUP=effect<br />

GRP=effect<br />

defines an effect specifying heterogeneity in the covariance structure of R. All observations<br />

having the same level of the GROUP effect have the same covariance parameters. Each<br />

new level of the GROUP effect produces a new set of covariance parameters with the same<br />

structure as the original group. You should exercise caution in defining the GROUP<br />

effect, because strange covariance patterns can result from its misuse. Also, the GROUP effect<br />

can greatly increase the number of estimated covariance parameters, which can adversely<br />

affect the optimization process.<br />

Continuous variables are permitted as arguments to the GROUP= option. PROC <strong>MIXED</strong><br />

does not sort by the values of the continuous variable; rather, it considers the data to be<br />

from a new subject or group whenever the value of the continuous variable changes from the<br />

previous observation. Using a continuous variable decreases execution time for models with<br />

a large number of subjects or groups and also prevents the production of a large “Class Level<br />

Information” table.<br />

HLM<br />

produces a table of Hotelling-Lawley-McKeon statistics (McKeon 1974) for all fixed effects<br />

whose levels change across data having the same level of the SUBJECT= effect (the within-subject<br />

fixed effects). This option applies only when you specify a REPEATED statement<br />

with the TYPE=UN option and no RANDOM statements. For balanced data, this model is<br />

equivalent to the multivariate model for repeated measures in PROC GLM.<br />

<strong>The</strong> Hotelling-Lawley-McKeon statistic has a slightly better F approximation than the<br />

Hotelling-Lawley-Pillai-Samson statistic (see the description of the HLPS option, which follows).<br />

Both of the Hotelling-Lawley statistics can perform much better in small samples than<br />

the default F statistic (Wright 1994).<br />

Separate tables are produced for Type 1, 2, and 3 tests, according to the ones you select. For<br />

ODS purposes, the table names are “HLM1,” “HLM2,” and “HLM3,” respectively.<br />

HLPS<br />

produces a table of Hotelling-Lawley-Pillai-Samson statistics (Pillai and Samson 1959) for all<br />

fixed effects whose levels change across data having the same level of the SUBJECT= effect<br />

(the within-subject fixed effects). This option applies only when you specify a REPEATED<br />

statement with the TYPE=UN option and no RANDOM statements. For balanced data, this



model is equivalent to the multivariate model for repeated measures in PROC GLM, and this<br />

statistic is the same as the Hotelling-Lawley Trace statistic produced by PROC GLM.<br />

Separate tables are produced for Type 1, 2, and 3 tests, according to the ones you select. For<br />

ODS purposes, the table names are “HLPS1,” “HLPS2,” and “HLPS3,” respectively.<br />
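A specification that satisfies the conditions for the HLM and HLPS options (a REPEATED statement with TYPE=UN and no RANDOM statements) might look like the following sketch. The data set long and the variables y, trt, visit, and person are hypothetical.<br />

```sas
proc mixed data=long;
   class person trt visit;
   model y = trt visit trt*visit;
   /* TYPE=UN and no RANDOM statement, as both options require */
   repeated visit / type=un subject=person hlm hlps;
run;
```

Here visit and trt*visit are the within-subject fixed effects, so they are the effects for which the two statistics are tabulated.<br />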

LDATA=<strong>SAS</strong>-data-set<br />

reads the coefficient matrices associated with the TYPE=LIN(number) option. <strong>The</strong> data set<br />

must contain the variables Parm, Row, Col1–Coln or Parm, Row, Col, Value. <strong>The</strong> Parm variable<br />

denotes which of the number coefficient matrices is currently being constructed, and the<br />

Row, Col1–Coln, or Row, Col, Value variables specify the matrix values, as they do with the<br />

RANDOM statement option GDATA=. Unspecified values of these matrices are set equal to<br />

0.<br />

LOCAL<br />

LOCAL=EXP(effects)<br />

LOCAL=POM(POM-data-set)<br />

requests that a diagonal matrix be added to R. With just the LOCAL option, this diagonal<br />

matrix equals $\sigma^2 I$, and $\sigma^2$ becomes an additional variance parameter that PROC <strong>MIXED</strong><br />

profiles out of the likelihood provided that you do not specify the NOPROFILE option in the<br />

PROC <strong>MIXED</strong> statement. <strong>The</strong> LOCAL option is useful if you want to add an observational<br />

error to a time series structure (Jones and Boadi-Boateng 1991) or a nugget effect to a spatial<br />

structure (Cressie 1993).<br />

<strong>The</strong> LOCAL=EXP(effects) option produces exponential local effects, also known as dispersion<br />

effects, in a log-linear variance model. <strong>The</strong>se local effects have the form<br />

$\sigma^2 \, \mathrm{diag}[\exp(U\delta)]$<br />

where U is the full-rank design matrix corresponding to the effects that you specify and $\delta$ are<br />

the parameters that PROC <strong>MIXED</strong> estimates. An intercept is not included in U because it is<br />

accounted for by $\sigma^2$. PROC <strong>MIXED</strong> constructs the full-rank U in terms of 1s and −1s for<br />

classification effects. Be sure to scale continuous effects in U sensibly.<br />

<strong>The</strong> LOCAL=POM(POM-data-set) option specifies the power-of-the-mean structure. This<br />

structure possesses a variance of the form $\sigma^2 |x_i'\beta^*|^{\theta}$ for the ith observation, where $x_i$ is the<br />

ith row of X (the design matrix of the fixed effects) and $\beta^*$ is an estimate of the fixed-effects<br />

parameters that you specify in POM-data-set.<br />

<strong>The</strong> <strong>SAS</strong> data set specified by POM-data-set contains the numeric variable Estimate (in previous<br />

releases, the variable name was required to be EST), and it has at least as many observations<br />

as there are fixed-effects parameters. <strong>The</strong> first p observations of the Estimate variable<br />

in POM-data-set are taken to be the elements of $\beta^*$, where p is the number of columns of<br />

X. You must order these observations according to the non-full-rank parameterization of the<br />

<strong>MIXED</strong> procedure. One easy way to set up POM-data-set for a $\beta^*$ corresponding to ordinary<br />

least squares is illustrated by the following statements:<br />

ods output SolutionF=sf;<br />

proc mixed;<br />

class a;<br />

model y = a x / s;<br />

run;


proc mixed;<br />

class a;<br />

model y = a x;<br />

repeated / local=pom(sf);<br />

run;<br />


Note that the generalized least squares estimate of the fixed-effects parameters from the second<br />

PROC <strong>MIXED</strong> step usually is not the same as your specified $\beta^*$. However, you can<br />

iterate the POM fitting until the two estimates agree. Continuing from the previous example,<br />

the statements for performing one step of this iteration are as follows:<br />

ods output SolutionF=sf1;<br />

proc mixed;<br />

class a;<br />

model y = a x / s;<br />

repeated / local=pom(sf);<br />

run;<br />

proc compare brief data=sf compare=sf1;<br />

var estimate;<br />

run;<br />

data sf;<br />

set sf1;<br />

run;<br />

Unfortunately, this iterative process does not always converge. For further details, refer to the<br />

description of pseudo-likelihood in Chapter 3 of Carroll and Ruppert (1988).<br />
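The other two forms of the LOCAL option can be sketched in the same way. In the following statements the data sets spatial and mydata and the variables z, elev, east, north, y, a, and x are hypothetical, and the form of the effect list inside LOCAL=EXP( ) is an assumption based on the description of U given earlier in this entry.<br />

```sas
/* LOCAL alone: adds a nugget effect to a spatial exponential structure */
proc mixed data=spatial;
   model z = elev;
   repeated / type=sp(exp)(east north) subject=intercept local;
run;

/* LOCAL=EXP( ): log-linear variance model with dispersion effects for */
/* a and x (effect list syntax assumed, not confirmed by this section) */
proc mixed data=mydata;
   class a;
   model y = a x;
   repeated / local=exp(a x);
run;
```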

LOCALW<br />

specifies that only the local effects and no others be weighted. By default, all effects are<br />

weighted. <strong>The</strong> LOCALW option is used in connection with the WEIGHT statement and the<br />

LOCAL option in the REPEATED statement.<br />

NONLOCALW<br />

specifies that only the nonlocal effects and no others be weighted. By default, all effects are<br />

weighted. <strong>The</strong> NONLOCALW option is used in connection with the WEIGHT statement and<br />

the LOCAL option in the REPEATED statement.<br />

R< =value-list ><br />

requests that blocks of the estimated R matrix be displayed. <strong>The</strong> first block determined by<br />

the SUBJECT= effect is the default displayed block. PROC <strong>MIXED</strong> displays blanks for<br />

value-lists that are 0.<br />

<strong>The</strong> value-list indicates the subjects for which blocks of R are to be displayed. For example,<br />

the following statement displays block matrices for the first, third, and fifth persons:<br />

repeated / type=cs subject=person r=1,3,5;<br />

See the PARMS statement for the possible forms of value-list. <strong>The</strong> table name for ODS<br />

purposes is “R.”



RC< =value-list ><br />

produces the Cholesky root of blocks of the estimated R matrix. <strong>The</strong> value-list specification<br />

is the same as with the R option. <strong>The</strong> table name for ODS purposes is “CholR.”<br />

RCI< =value-list ><br />

produces the inverse Cholesky root of blocks of the estimated R matrix. <strong>The</strong> value-list specification<br />

is the same as with the R option. <strong>The</strong> table name for ODS purposes is “InvCholR.”<br />

RCORR< =value-list ><br />

produces the correlation matrix corresponding to blocks of the estimated R matrix. <strong>The</strong><br />

value-list specification is the same as with the R option. <strong>The</strong> table name for ODS purposes is<br />

“RCorr.”<br />

RI< =value-list ><br />

produces the inverse of blocks of the estimated R matrix. <strong>The</strong> value-list specification is the<br />

same as with the R option. <strong>The</strong> table name for ODS purposes is “InvR.”<br />

SSCP<br />

requests that an unstructured R matrix be estimated from the sum-of-squares-and-crossproducts<br />

matrix of the residuals. It applies only when you specify TYPE=UN and<br />

have no RANDOM statements. Also, you must have a sufficient number of subjects for the<br />

estimate to be positive definite.<br />

This option is useful when the size of the blocks of R is large (for example, greater than 10)<br />

and you want to use or inspect an unstructured estimate that is much quicker to compute than<br />

the default REML estimate. <strong>The</strong> two estimates will agree for certain balanced data sets when<br />

you have a classification fixed effect defined across all time points within a subject.<br />

SUBJECT=effect<br />

SUB=effect<br />

identifies the subjects in your mixed model. Complete independence is assumed across subjects;<br />

therefore, the SUBJECT= option produces a block-diagonal structure in R with identical<br />

blocks. When the SUBJECT= effect consists entirely of classification variables, the<br />

blocks of R correspond to observations sharing the same level of that effect. <strong>The</strong>se blocks are<br />

sorted according to this effect as well.<br />

Continuous variables are permitted as arguments to the SUBJECT= option. PROC <strong>MIXED</strong><br />

does not sort by the values of the continuous variable; rather, it considers the data to be<br />

from a new subject or group whenever the value of the continuous variable changes from the<br />

previous observation. Using a continuous variable decreases execution time for models with<br />

a large number of subjects or groups and also prevents the production of a large “Class Level<br />

Information” table.<br />

If you want to model nonzero covariance among all of the observations in your <strong>SAS</strong> data set,<br />

specify SUBJECT=INTERCEPT to treat the data as if they are all from one subject. However,<br />

be aware that in this case PROC <strong>MIXED</strong> manipulates an R matrix with dimensions equal to<br />

the number of observations. If no SUBJECT= effect is specified, then every observation is<br />

assumed to be from a different subject and R is assumed to be diagonal. For this reason, you<br />

usually want to use the SUBJECT= option in the REPEATED statement.
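
The block-diagonal form that the SUBJECT= option gives R can be sketched numerically. The following Python fragment is only an illustration and is not PROC MIXED code: the helper names `cs_block` and `block_diagonal_r` and all parameter values are hypothetical, the block is taken from the CS row of Table 56.13, and every subject is assumed to contribute the same number of observations.

```python
def cs_block(t, sigma2, sigma1):
    """t x t compound-symmetry block: sigma1 + sigma2 * 1(i = j)."""
    return [[sigma1 + (sigma2 if i == j else 0.0) for j in range(t)]
            for i in range(t)]


def block_diagonal_r(block, n_subjects):
    """Place one copy of `block` on the diagonal for each subject."""
    t = len(block)
    n = t * n_subjects
    r = [[0.0] * n for _ in range(n)]
    for s in range(n_subjects):
        for i in range(t):
            for j in range(t):
                r[s * t + i][s * t + j] = block[i][j]
    return r


# Three subjects with two observations each (illustrative values only).
R = block_diagonal_r(cs_block(2, sigma2=1.0, sigma1=0.5), n_subjects=3)
for row in R:
    print(row)
```

Entries that link observations from different subjects stay at zero, which is the independence across subjects described above.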



TYPE=covariance-structure<br />

specifies the covariance structure of the R matrix. <strong>The</strong> SUBJECT= option defines the<br />

blocks of R, and the TYPE= option specifies the structure of these blocks. Valid values<br />

for covariance-structure and their descriptions are provided in Table 56.13 and Table 56.14.<br />

<strong>The</strong> default structure is VC.<br />

Table 56.13 Covariance Structures<br />

Structure | Description | Parms | $(i,j)$th element<br />
ANTE(1) | Ante-dependence | $2t - 1$ | $\sigma_i \sigma_j \prod_{k=i}^{j-1} \rho_k$<br />
AR(1) | Autoregressive(1) | 2 | $\sigma^2 \rho^{|i-j|}$<br />
ARH(1) | Heterogeneous AR(1) | $t + 1$ | $\sigma_i \sigma_j \rho^{|i-j|}$<br />
ARMA(1,1) | ARMA(1,1) | 3 | $\sigma^2 [\gamma \rho^{|i-j|-1} 1(i \ne j) + 1(i = j)]$<br />
CS | Compound Symmetry | 2 | $\sigma_1 + \sigma^2 1(i = j)$<br />
CSH | Heterogeneous CS | $t + 1$ | $\sigma_i \sigma_j [\rho 1(i \ne j) + 1(i = j)]$<br />
FA(q) | Factor Analytic | $\frac{q}{2}(2t - q + 1) + t$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma_i^2 1(i = j)$<br />
FA0(q) | No Diagonal FA | $\frac{q}{2}(2t - q + 1)$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk}$<br />
FA1(q) | Equal Diagonal FA | $\frac{q}{2}(2t - q + 1) + 1$ | $\sum_{k=1}^{\min(i,j,q)} \lambda_{ik} \lambda_{jk} + \sigma^2 1(i = j)$<br />
HF | Huynh-Feldt | $t + 1$ | $(\sigma_i^2 + \sigma_j^2)/2 + \lambda 1(i \ne j)$<br />
LIN(q) | General Linear | $q$ | $\sum_{k=1}^{q} \theta_k A_{ij}$<br />
TOEP | Toeplitz | $t$ | $\sigma_{|i-j|+1}$<br />
TOEP(q) | Banded Toeplitz | $q$ | $\sigma_{|i-j|+1} 1(|i-j| < q)$<br />
TOEPH | Heterogeneous TOEP | $2t - 1$ | $\sigma_i \sigma_j \rho_{|i-j|}$<br />
TOEPH(q) | Banded Hetero TOEP | $t + q - 1$ | $\sigma_i \sigma_j \rho_{|i-j|} 1(|i-j| < q)$<br />
UN | Unstructured | $t(t + 1)/2$ | $\sigma_{ij}$<br />
UN(q) | Banded | $\frac{q}{2}(2t - q + 1)$ | $\sigma_{ij} 1(|i-j| < q)$<br />
UNR | Unstructured Corrs | $t(t + 1)/2$ | $\sigma_i \sigma_j \rho_{\max(i,j)\min(i,j)}$<br />
UNR(q) | Banded Correlations | $\frac{q}{2}(2t - q + 1)$ | $\sigma_i \sigma_j \rho_{\max(i,j)\min(i,j)}$<br />
UN@AR(1) | Direct Product AR(1) | $t_1(t_1 + 1)/2 + 1$ | $\sigma_{i_1 j_1} \rho^{|i_2 - j_2|}$<br />
UN@CS | Direct Product CS | $t_1(t_1 + 1)/2 + 1$ | $\sigma_{i_1 j_1}$ if $i_2 = j_2$; $\sigma^2 \sigma_{i_1 j_1}$ if $i_2 \ne j_2$; with $0 \le \sigma^2 \le 1$<br />
UN@UN | Direct Product UN | $t_1(t_1 + 1)/2 + t_2(t_2 + 1)/2 - 1$ | $\sigma_{1,i_1 j_1} \sigma_{2,i_2 j_2}$<br />
VC | Variance Components | $q$ | $\sigma_k^2 1(i = j)$ and i corresponds to kth effect<br />

In Table 56.13, “Parms” is the number of covariance parameters in the structure, t is the<br />

overall dimension of the covariance matrix, and $1(A)$ equals 1 when A is true and 0 otherwise.<br />

For example, $1(i = j)$ equals 1 when $i = j$ and 0 otherwise, and $1(|i - j| < q)$ equals<br />

1 when $|i - j| < q$ and 0 otherwise. For the TOEPH structures, $\rho_0 = 1$, and for the UNR<br />

structures, $\rho_{ii} = 1$ for all i. For the direct product structures, the subscripts “1” and “2” refer<br />

to the first and second structure in the direct product, respectively, and $i_1 = \mathrm{int}((i + t_2 - 1)/t_2)$,<br />

$j_1 = \mathrm{int}((j + t_2 - 1)/t_2)$, $i_2 = \mathrm{mod}(i - 1, t_2) + 1$, and $j_2 = \mathrm{mod}(j - 1, t_2) + 1$.<br />
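
The index arithmetic for the direct product structures can be checked with a short Python sketch. It is an illustration only: the function names and the numeric values are hypothetical, and the UN@AR(1) element is taken from Table 56.13. The sketch maps each overall index to its pair of sub-indices and compares the elementwise formula with an explicit Kronecker product.

```python
def split_index(i, t2):
    """Map a 1-based index i to (i1, i2) as in the note to Table 56.13."""
    i1 = (i + t2 - 1) // t2      # int((i + t2 - 1) / t2)
    i2 = (i - 1) % t2 + 1        # mod(i - 1, t2) + 1
    return i1, i2


def un_at_ar1_element(i, j, un, rho, t2):
    """(i, j)th element of UN@AR(1): sigma_{i1 j1} * rho ** |i2 - j2|."""
    i1, i2 = split_index(i, t2)
    j1, j2 = split_index(j, t2)
    return un[i1 - 1][j1 - 1] * rho ** abs(i2 - j2)


def kronecker(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    na, nb = len(a), len(b)
    return [[a[r // nb][c // nb] * b[r % nb][c % nb]
             for c in range(na * nb)] for r in range(na * nb)]


un = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical 2 x 2 unstructured matrix
rho, t2 = 0.5, 3                 # hypothetical AR(1) parameter, three times
ar1 = [[rho ** abs(r - c) for c in range(t2)] for r in range(t2)]
full = kronecker(un, ar1)
elementwise = [[un_at_ar1_element(i, j, un, rho, t2)
                for j in range(1, 7)] for i in range(1, 7)]
```

Here `full` and `elementwise` hold the same 6 x 6 matrix, built in two different ways.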

Table 56.14 Spatial Covariance Structures<br />

Structure | Description | Parms | $(i,j)$th element<br />
SP(EXP)(c-list) | Exponential | 2 | $\sigma^2 \exp\{-d_{ij}/\theta\}$<br />
SP(EXPA)(c-list) | Anisotropic Exponential | $2c + 1$ | $\sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k d(i,j,k)^{p_k}\}$<br />
SP(EXPGA)(c1 c2) | 2D Exponential, Geometrically Anisotropic | 4 | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)/\rho\}$<br />
SP(GAU)(c-list) | Gaussian | 2 | $\sigma^2 \exp\{-d_{ij}^2/\rho^2\}$<br />
SP(GAUGA)(c1 c2) | 2D Gaussian, Geometrically Anisotropic | 4 | $\sigma^2 \exp\{-d_{ij}(\theta,\lambda)^2/\rho^2\}$<br />
SP(LIN)(c-list) | Linear | 2 | $\sigma^2 (1 - \rho d_{ij}) 1(\rho d_{ij} \le 1)$<br />
SP(LINL)(c-list) | Linear Log | 2 | $\sigma^2 (1 - \rho \log(d_{ij})) 1(\rho \log(d_{ij}) \le 1)$<br />
SP(MATERN)(c-list) | Matérn | 3 | $\sigma^2 \frac{1}{\Gamma(\nu)} \left( \frac{d_{ij}}{2\rho} \right)^{\nu} 2K_{\nu}(d_{ij}/\rho)$<br />
SP(MATHSW)(c-list) | Matérn (Handcock-Stein-Wallis) | 3 | $\sigma^2 \frac{1}{\Gamma(\nu)} \left( \frac{d_{ij}\sqrt{\nu}}{\rho} \right)^{\nu} 2K_{\nu}\left( \frac{2 d_{ij}\sqrt{\nu}}{\rho} \right)$<br />
SP(POW)(c-list) | Power | 2 | $\sigma^2 \rho^{d_{ij}}$<br />
SP(POWA)(c-list) | Anisotropic Power | $c + 1$ | $\sigma^2 \rho_1^{d(i,j,1)} \rho_2^{d(i,j,2)} \cdots \rho_c^{d(i,j,c)}$<br />
SP(SPH)(c-list) | Spherical | 2 | $\sigma^2 \left[ 1 - \frac{3 d_{ij}}{2\rho} + \frac{d_{ij}^3}{2\rho^3} \right] 1(d_{ij} \le \rho)$<br />
SP(SPHGA)(c1 c2) | 2D Spherical, Geometrically Anisotropic | 4 | $\sigma^2 \left[ 1 - \frac{3 d_{ij}(\theta,\lambda)}{2\rho} + \frac{d_{ij}(\theta,\lambda)^3}{2\rho^3} \right] 1(d_{ij}(\theta,\lambda) \le \rho)$<br />

In Table 56.14, c-list contains the names of the numeric variables used as coordinates of the<br />

location of the observation in space, and $d_{ij}$ is the Euclidean distance between the ith and<br />

jth vectors of these coordinates, which correspond to the ith and jth observations in the input<br />

data set. For SP(POWA) and SP(EXPA), c is the number of coordinates, and $d(i,j,k)$ is the<br />

absolute distance between the kth coordinate, $k = 1, \ldots, c$, of the ith and jth observations in<br />

the input data set. For the geometrically anisotropic structures SP(EXPGA), SP(GAUGA),<br />

and SP(SPHGA), exactly two spatial coordinate variables must be specified as c1 and c2.<br />

Geometric anisotropy is corrected by applying a rotation $\theta$ and scaling $\lambda$ to the coordinate<br />

system, and $d_{ij}(\theta,\lambda)$ represents the Euclidean distance between two points in the transformed<br />

space. SP(MATERN) and SP(MATHSW) represent covariance structures in a class defined<br />

by Matérn (see Matérn 1986, Handcock and Stein 1993, Handcock and Wallis 1994). <strong>The</strong><br />

function $K_{\nu}$ is the modified Bessel function of the second kind of (real) order $\nu > 0$; the<br />

parameter $\nu$ governs the smoothness of the process (see below for more details).<br />

Table 56.15 lists some examples of the structures in Table 56.13 and Table 56.14.<br />

Table 56.15 Covariance Structure Examples<br />

Description | Structure | Example<br />
Variance Components | VC (default) | $\begin{bmatrix} \sigma_B^2 & 0 & 0 & 0 \\ 0 & \sigma_B^2 & 0 & 0 \\ 0 & 0 & \sigma_{AB}^2 & 0 \\ 0 & 0 & 0 & \sigma_{AB}^2 \end{bmatrix}$<br />
Compound Symmetry | CS | $\begin{bmatrix} \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 \end{bmatrix}$<br />
Unstructured | UN | $\begin{bmatrix} \sigma_1^2 & \sigma_{21} & \sigma_{31} & \sigma_{41} \\ \sigma_{21} & \sigma_2^2 & \sigma_{32} & \sigma_{42} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{43} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{bmatrix}$<br />
Banded Main Diagonal | UN(1) | $\begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}$<br />
First-Order Autoregressive | AR(1) | $\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}$<br />
Toeplitz | TOEP | $\begin{bmatrix} \sigma^2 & \sigma_1 & \sigma_2 & \sigma_3 \\ \sigma_1 & \sigma^2 & \sigma_1 & \sigma_2 \\ \sigma_2 & \sigma_1 & \sigma^2 & \sigma_1 \\ \sigma_3 & \sigma_2 & \sigma_1 & \sigma^2 \end{bmatrix}$<br />
Toeplitz with Two Bands | TOEP(2) | $\begin{bmatrix} \sigma^2 & \sigma_1 & 0 & 0 \\ \sigma_1 & \sigma^2 & \sigma_1 & 0 \\ 0 & \sigma_1 & \sigma^2 & \sigma_1 \\ 0 & 0 & \sigma_1 & \sigma^2 \end{bmatrix}$<br />
Spatial Power | SP(POW)(c) | $\sigma^2 \begin{bmatrix} 1 & \rho^{d_{12}} & \rho^{d_{13}} & \rho^{d_{14}} \\ \rho^{d_{21}} & 1 & \rho^{d_{23}} & \rho^{d_{24}} \\ \rho^{d_{31}} & \rho^{d_{32}} & 1 & \rho^{d_{34}} \\ \rho^{d_{41}} & \rho^{d_{42}} & \rho^{d_{43}} & 1 \end{bmatrix}$<br />
Heterogeneous AR(1) | ARH(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 & \sigma_1\sigma_4\rho^3 \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho^2 \\ \sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_3\sigma_4\rho \\ \sigma_4\sigma_1\rho^3 & \sigma_4\sigma_2\rho^2 & \sigma_4\sigma_3\rho & \sigma_4^2 \end{bmatrix}$<br />
First-Order Autoregressive Moving-Average | ARMA(1,1) | $\sigma^2 \begin{bmatrix} 1 & \gamma & \gamma\rho & \gamma\rho^2 \\ \gamma & 1 & \gamma & \gamma\rho \\ \gamma\rho & \gamma & 1 & \gamma \\ \gamma\rho^2 & \gamma\rho & \gamma & 1 \end{bmatrix}$<br />
Heterogeneous CS | CSH | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho & \sigma_1\sigma_4\rho \\ \sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho & \sigma_2\sigma_4\rho \\ \sigma_3\sigma_1\rho & \sigma_3\sigma_2\rho & \sigma_3^2 & \sigma_3\sigma_4\rho \\ \sigma_4\sigma_1\rho & \sigma_4\sigma_2\rho & \sigma_4\sigma_3\rho & \sigma_4^2 \end{bmatrix}$<br />
First-Order Factor Analytic | FA(1) | $\begin{bmatrix} \lambda_1^2 + d_1 & \lambda_1\lambda_2 & \lambda_1\lambda_3 & \lambda_1\lambda_4 \\ \lambda_2\lambda_1 & \lambda_2^2 + d_2 & \lambda_2\lambda_3 & \lambda_2\lambda_4 \\ \lambda_3\lambda_1 & \lambda_3\lambda_2 & \lambda_3^2 + d_3 & \lambda_3\lambda_4 \\ \lambda_4\lambda_1 & \lambda_4\lambda_2 & \lambda_4\lambda_3 & \lambda_4^2 + d_4 \end{bmatrix}$<br />
Huynh-Feldt | HF | $\begin{bmatrix} \sigma_1^2 & \frac{\sigma_1^2 + \sigma_2^2}{2} - \lambda & \frac{\sigma_1^2 + \sigma_3^2}{2} - \lambda \\ \frac{\sigma_2^2 + \sigma_1^2}{2} - \lambda & \sigma_2^2 & \frac{\sigma_2^2 + \sigma_3^2}{2} - \lambda \\ \frac{\sigma_3^2 + \sigma_1^2}{2} - \lambda & \frac{\sigma_3^2 + \sigma_2^2}{2} - \lambda & \sigma_3^2 \end{bmatrix}$<br />
First-Order Ante-dependence | ANTE(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_1\rho_2 \\ \sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_2 \\ \sigma_3\sigma_1\rho_2\rho_1 & \sigma_3\sigma_2\rho_2 & \sigma_3^2 \end{bmatrix}$<br />
Heterogeneous Toeplitz | TOEPH | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_2 & \sigma_1\sigma_4\rho_3 \\ \sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_1 & \sigma_2\sigma_4\rho_2 \\ \sigma_3\sigma_1\rho_2 & \sigma_3\sigma_2\rho_1 & \sigma_3^2 & \sigma_3\sigma_4\rho_1 \\ \sigma_4\sigma_1\rho_3 & \sigma_4\sigma_2\rho_2 & \sigma_4\sigma_3\rho_1 & \sigma_4^2 \end{bmatrix}$<br />
Unstructured Correlations | UNR | $\begin{bmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho_{21} & \sigma_1\sigma_3\rho_{31} & \sigma_1\sigma_4\rho_{41} \\ \sigma_2\sigma_1\rho_{21} & \sigma_2^2 & \sigma_2\sigma_3\rho_{32} & \sigma_2\sigma_4\rho_{42} \\ \sigma_3\sigma_1\rho_{31} & \sigma_3\sigma_2\rho_{32} & \sigma_3^2 & \sigma_3\sigma_4\rho_{43} \\ \sigma_4\sigma_1\rho_{41} & \sigma_4\sigma_2\rho_{42} & \sigma_4\sigma_3\rho_{43} & \sigma_4^2 \end{bmatrix}$<br />
Direct Product AR(1) | UN@AR(1) | $\begin{bmatrix} \sigma_1^2 & \sigma_{21} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} \otimes \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_1^2\rho & \sigma_1^2\rho^2 & \sigma_{21} & \sigma_{21}\rho & \sigma_{21}\rho^2 \\ \sigma_1^2\rho & \sigma_1^2 & \sigma_1^2\rho & \sigma_{21}\rho & \sigma_{21} & \sigma_{21}\rho \\ \sigma_1^2\rho^2 & \sigma_1^2\rho & \sigma_1^2 & \sigma_{21}\rho^2 & \sigma_{21}\rho & \sigma_{21} \\ \sigma_{21} & \sigma_{21}\rho & \sigma_{21}\rho^2 & \sigma_2^2 & \sigma_2^2\rho & \sigma_2^2\rho^2 \\ \sigma_{21}\rho & \sigma_{21} & \sigma_{21}\rho & \sigma_2^2\rho & \sigma_2^2 & \sigma_2^2\rho \\ \sigma_{21}\rho^2 & \sigma_{21}\rho & \sigma_{21} & \sigma_2^2\rho^2 & \sigma_2^2\rho & \sigma_2^2 \end{bmatrix}$<br />

<strong>The</strong> following provides some further information about these covariance structures:<br />

TYPE=ANTE(1) specifies the first-order antedependence structure (see Kenward 1987, Patel<br />

1991, and Macchiavelli and Arnold 1994). In Table 56.13, $\sigma_i^2$ is the ith<br />

variance parameter, and $\rho_k$ is the kth autocorrelation parameter satisfying<br />

$|\rho_k| < 1$.<br />

TYPE=AR(1) specifies a first-order autoregressive structure. PROC <strong>MIXED</strong> imposes the<br />

constraint $|\rho| < 1$ for stationarity.<br />

TYPE=ARH(1) specifies a heterogeneous first-order autoregressive structure. As with<br />

TYPE=AR(1), PROC <strong>MIXED</strong> imposes the constraint $|\rho| < 1$ for stationarity.<br />

TYPE=ARMA(1,1) specifies the first-order autoregressive moving-average structure. In<br />

Table 56.13, $\rho$ is the autoregressive parameter, $\gamma$ models a moving-average<br />

component, and $\sigma^2$ is the residual variance. In the notation of Fuller (1976,<br />

p. 68), $\rho = \theta_1$ and<br />

$$\gamma = \frac{(1 + b_1\theta_1)(\theta_1 + b_1)}{1 + b_1^2 + 2b_1\theta_1}$$<br />

<strong>The</strong> example in Table 56.15 and $|b_1| < 1$ imply that<br />

$$b_1 = \frac{\beta - \sqrt{\beta^2 - 4\alpha^2}}{2\alpha}$$<br />

where $\alpha = \gamma - \rho$ and $\beta = 1 + \rho^2 - 2\gamma\rho$. PROC <strong>MIXED</strong> imposes the<br />

constraints $|\rho| < 1$ and $|\gamma| < 1$ for stationarity, although for some values<br />

of $\rho$ and $\gamma$ in this region the resulting covariance matrix is not positive<br />

definite. When the estimated value of $\rho$ becomes negative, the computed<br />

covariance is multiplied by $\cos(\pi d_{ij})$ to account for the negativity.<br />
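
The relation between the ARMA(1,1) parameters and Fuller's notation can be checked numerically. The Python sketch below is illustrative only: the function names and the parameter values are hypothetical. It computes the moving-average component from the autoregressive parameter and b1, then recovers b1 by taking the root of the quadratic that lies inside the unit interval.

```python
import math


def gamma_from_fuller(theta1, b1):
    """Moving-average component implied by Fuller's theta1 and b1."""
    return (1 + b1 * theta1) * (theta1 + b1) / (1 + b1 ** 2 + 2 * b1 * theta1)


def b1_from_rho_gamma(rho, gamma):
    """Invert the relation for b1, taking the root with |b1| < 1."""
    alpha = gamma - rho
    beta = 1 + rho ** 2 - 2 * gamma * rho
    if alpha == 0:
        return 0.0  # gamma equals rho: no moving-average component
    return (beta - math.sqrt(beta ** 2 - 4 * alpha ** 2)) / (2 * alpha)


rho = 0.6                     # hypothetical autoregressive parameter
for b1 in (0.3, -0.4):        # hypothetical moving-average parameters
    gamma = gamma_from_fuller(rho, b1)
    print(b1, b1_from_rho_gamma(rho, gamma))
```

The two columns printed for each case agree up to floating-point rounding, which confirms that the minus sign in front of the square root selects the admissible root.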

TYPE=CS specifies the compound-symmetry structure, which has constant variance<br />

and constant covariance.<br />

TYPE=CSH specifies the heterogeneous compound-symmetry structure. This structure<br />

has a different variance parameter for each diagonal element, and it<br />

uses the square roots of these parameters in the off-diagonal entries. In<br />

Table 56.13, $\sigma_i^2$ is the ith variance parameter, and $\rho$ is the correlation parameter<br />

satisfying $|\rho| < 1$.<br />

TYPE=FA(q) specifies the factor-analytic structure with q factors (Jennrich and<br />

Schluchter 1986). This structure is of the form $\Lambda\Lambda' + D$, where $\Lambda$<br />

is a $t \times q$ rectangular matrix and D is a $t \times t$ diagonal matrix with t<br />

different parameters. When $q > 1$, the elements of $\Lambda$ in its upper-right<br />

corner (that is, the elements in the ith row and jth column for $j > i$) are<br />

set to zero to fix the rotation of the structure.<br />

TYPE=FA0(q) is similar to the FA(q) structure except that no diagonal matrix D is included.<br />

When q < t—that is, when the number of factors is less than<br />

the dimension of the matrix—this structure is nonnegative definite but not<br />

of full rank. In this situation, you can use it for approximating an unstructured<br />

G matrix in the RANDOM statement or for combining with the<br />

LOCAL option in the REPEATED statement. When $q = t$, you can use<br />

this structure to constrain G to be nonnegative definite in the RANDOM<br />

statement.<br />

TYPE=FA1(q) is similar to the FA(q) structure except that all of the elements in D are<br />

constrained to be equal. This offers a useful and more parsimonious alternative<br />

to the full factor-analytic structure.



TYPE=HF specifies the Huynh-Feldt covariance structure (Huynh and Feldt 1970).<br />

This structure is similar to the CSH structure in that it has the same number<br />

of parameters and heterogeneity along the main diagonal. However, it<br />

constructs the off-diagonal elements by taking arithmetic rather than geometric<br />

means.<br />

You can perform a likelihood ratio test of the Huynh-Feldt conditions<br />

by running PROC <strong>MIXED</strong> twice, once with TYPE=HF and once with<br />

TYPE=UN, and then subtracting their respective values of −2 times the<br />

maximized log likelihood.<br />

If PROC <strong>MIXED</strong> does not converge under your Huynh-Feldt model, you<br />

can specify your own starting values with the PARMS statement. <strong>The</strong><br />

default MIVQUE(0) starting values can sometimes be poor for this structure.<br />

A good choice for starting values is often the parameter estimates<br />

corresponding to an initial fit that uses TYPE=CS.<br />
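
The arithmetic of the likelihood ratio test described above can be sketched as follows. This Python fragment is an illustration, not PROC MIXED output: the function name and the two −2 log likelihood values are made up, and the parameter counts are taken from the HF and UN rows of Table 56.13. Both fits are assumed to share the same fixed effects and estimation method.

```python
def hf_lrt(neg2ll_hf, neg2ll_un, t):
    """Likelihood ratio statistic and degrees of freedom for HF versus UN.

    neg2ll_hf, neg2ll_un: -2 log likelihood from the TYPE=HF and TYPE=UN fits.
    t: dimension of each block of R.
    """
    statistic = neg2ll_hf - neg2ll_un
    df = t * (t + 1) // 2 - (t + 1)   # UN parameters minus HF parameters
    return statistic, df


# Made-up fit statistics for a block dimension of 4.
stat, df = hf_lrt(neg2ll_hf=420.5, neg2ll_un=412.3, t=4)
print(stat, df)
```

The statistic is usually referred to a chi-square distribution with `df` degrees of freedom. A large value suggests that the Huynh-Feldt conditions do not hold.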

TYPE=LIN(q) specifies the general linear covariance structure with q parameters. This<br />

structure consists of a linear combination of known matrices that are input<br />

with the LDATA= option. This structure is very general, and you need to<br />

make sure that the variance matrix is positive definite. By default, PROC<br />

<strong>MIXED</strong> sets the initial values of the parameters to 1. You can use the<br />

PARMS statement to specify other initial values.<br />

TYPE=SIMPLE is an alias for TYPE=VC.<br />

TYPE=SP(EXPA)(c-list) specifies the spatial anisotropic exponential structure, where c-list<br />

is a list of variables indicating the coordinates. This structure has $(i,j)$th<br />

element equal to<br />

$$\sigma^2 \prod_{k=1}^{c} \exp\{-\theta_k d(i,j,k)^{p_k}\}$$<br />

where c is the number of coordinates and $d(i,j,k)$ is the absolute distance<br />

between the kth coordinate ($k = 1, \ldots, c$) of the ith and jth observations<br />

in the input data set. <strong>The</strong>re are $2c + 1$ parameters to be estimated: $\theta_k$, $p_k$<br />

($k = 1, \ldots, c$), and $\sigma^2$.<br />

You might want to constrain some of the EXPA parameters to known values.<br />

For example, suppose you have three coordinate variables C1, C2,<br />

and C3 and you want to constrain the powers $p_k$ to equal 2, as in Sacks et<br />

al. (1989). Suppose further that you want to model covariance across the<br />

entire input data set and you suspect the $\theta_k$ and $\sigma^2$ estimates are close to<br />

3, 4, 5, and 1, respectively. <strong>The</strong>n specify the following statements:<br />

repeated / type=sp(expa)(c1 c2 c3)<br />

subject=intercept;<br />

parms (3) (4) (5) (2) (2) (2) (1) /<br />

hold=4,5,6;
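
To see what a single element of the SP(EXPA) structure looks like for those starting values, the following Python sketch evaluates the formula directly. It is only an illustration: the function name and the coordinate values are hypothetical, while the parameter values mirror the PARMS statement above.

```python
import math


def sp_expa_cov(coords_i, coords_j, theta, p, sigma2):
    """sigma2 * prod_k exp(-theta_k * d(i, j, k) ** p_k)."""
    product = 1.0
    for c_i, c_j, theta_k, p_k in zip(coords_i, coords_j, theta, p):
        distance = abs(c_i - c_j)          # absolute distance in coordinate k
        product *= math.exp(-theta_k * distance ** p_k)
    return sigma2 * product


theta = (3, 4, 5)     # starting values for the theta_k parameters
powers = (2, 2, 2)    # powers held fixed at 2
sigma2 = 1.0          # starting value for the variance

# Two hypothetical locations that differ only in the first coordinate.
print(sp_expa_cov((0.0, 0.0, 0.0), (0.1, 0.0, 0.0), theta, powers, sigma2))
```

Because the element is a product over coordinates, a coordinate in which two observations coincide contributes a factor of 1.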


TYPE=SP(EXPGA)(c1 c2)<br />

TYPE=SP(GAUGA)(c1 c2)<br />

TYPE=SP(SPHGA)(c1 c2) specify modifications of the isotropic SP(EXP), SP(SPH), and<br />

SP(GAU) covariance structures that allow for geometric anisotropy in two<br />

dimensions. <strong>The</strong> coordinates are specified by the variables c1 and c2.<br />

If the spatial process is geometrically anisotropic in $c = [c_{i1}, c_{i2}]$, then it<br />

is isotropic in the coordinate system<br />

$$Ac = \begin{bmatrix} 1 & 0 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} c = c^*$$<br />

for a properly chosen angle $\theta$ and scaling factor $\lambda$. Elliptical isocorrelation<br />

contours are thereby transformed to spherical contours, adding two parameters<br />

to the respective isotropic covariance structures. Euclidean distances<br />

(see Table 56.14) are expressed in terms of $c^*$.<br />

<strong>The</strong> angle $\theta$ of the clockwise rotation is reported in radians, $0 \le \theta \le 2\pi$.<br />

<strong>The</strong> scaling parameter $\lambda$ represents the ratio of the range parameters in the<br />

direction of the major and minor axis of the correlation contours. In other<br />

words, following a rotation of the coordinate system by angle $\theta$, isotropy<br />

is achieved by compressing or magnifying distances in one coordinate by<br />

the factor $\lambda$.<br />

Fixing $\lambda = 1.0$ reduces the models to isotropic ones for any angle of<br />

rotation. If the scaling parameter is held constant at 1.0, you should also<br />

hold constant the angle of rotation, as in the following statements:<br />

repeated / type=sp(expga)(gxc gyc)<br />

subject=intercept;<br />

parms (6) (1.0) (0.0) (1) / hold=2,3;<br />

If $\lambda$ is fixed at any other value than 1.0, the angle of rotation can be estimated.<br />

Specifying a starting grid of angles and scaling factors can considerably<br />

improve the convergence properties of the optimization algorithm<br />

for these models. Only a single random effect with geometrically<br />

anisotropic structure is permitted.<br />

TYPE=SP(MATERN)(c-list)<br />

TYPE=SP(MATHSW)(c-list) specifies covariance structures in the Matérn class of covariance<br />

functions (Matérn 1986). Two observations for the same subject<br />

(block of R) that are Euclidean distance $d_{ij}$ apart have covariance<br />

$$\sigma^2 \frac{1}{\Gamma(\nu)} \left( \frac{d_{ij}}{2\rho} \right)^{\nu} 2K_{\nu}(d_{ij}/\rho), \qquad \nu > 0,\ \rho > 0$$<br />

where $K_{\nu}$ is the modified Bessel function of the second kind of (real) order<br />

$\nu > 0$. <strong>The</strong> smoothness (continuity) of a stochastic process with covariance<br />

function in this class increases with $\nu$. <strong>The</strong> Matérn class thus enables<br />

data-driven estimation of the smoothness properties. <strong>The</strong> covariance<br />

is identical to the exponential model for $\nu = 0.5$ (TYPE=SP(EXP)(c-list)),<br />

while for $\nu = 1$ the model advocated by Whittle (1954) results.<br />


As $\nu \to \infty$ the model approaches the Gaussian covariance structure<br />

(TYPE=SP(GAU)(c-list)).<br />

<strong>The</strong> MATHSW structure represents the Matérn class in the parameterization<br />

of Handcock and Stein (1993) and Handcock and Wallis (1994),<br />

$$\sigma^2 \frac{1}{\Gamma(\nu)} \left( \frac{d_{ij}\sqrt{\nu}}{\rho} \right)^{\nu} 2K_{\nu}\left( \frac{2 d_{ij}\sqrt{\nu}}{\rho} \right)$$<br />

Since computation of the function $K_{\nu}$ and its derivatives is numerically<br />

very intensive, fitting models with Matérn covariance structures can be<br />

more time-consuming than with other spatial covariance structures. Good<br />

starting values are essential.<br />

TYPE=SP(POW)(c-list)<br />

TYPE=SP(POWA)(c-list) specifies the spatial power structures. When the estimated<br />

value of $\rho$ becomes negative, the computed covariance is multiplied by<br />

$\cos(\pi d_{ij})$ to account for the negativity.<br />

TYPE=TOEP< (q) > specifies a banded Toeplitz structure. This can be viewed as a moving-average<br />

structure with order equal to $q - 1$. <strong>The</strong> TYPE=TOEP option is<br />

a full Toeplitz matrix, which can be viewed as an autoregressive structure<br />

with order equal to the dimension of the matrix. <strong>The</strong> specification<br />

TYPE=TOEP(1) is the same as $\sigma^2 I$, where I is an identity matrix, and<br />

it can be useful for specifying the same variance component for several<br />

effects.<br />
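
The banding rule can be made concrete with a small Python sketch. It is illustrative only: the function name and the parameter values are hypothetical, and the element definition follows the TOEP(q) row of Table 56.13, with the list position standing in for the band index.

```python
def toeplitz_cov(sigma, t, q=None):
    """t x t matrix whose (i, j)th element is sigma[|i - j|] when |i - j| < q.

    sigma[0] is the common variance on the main diagonal, sigma[1] the value
    on the first off-diagonal band, and so on. q=None gives a full Toeplitz
    matrix (q = t).
    """
    q = t if q is None else q
    if len(sigma) < q:
        raise ValueError("need at least q band parameters")
    return [[sigma[abs(i - j)] if abs(i - j) < q else 0.0
             for j in range(t)] for i in range(t)]


# Two bands, as in the TOEP(2) example of Table 56.15 (values are made up).
for row in toeplitz_cov([2.0, 0.5], t=4, q=2):
    print(row)
```

With `q=1` only the main diagonal survives, which matches the statement that TYPE=TOEP(1) is a multiple of the identity matrix.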

TYPE=TOEPH< (q) > specifies a heterogeneous banded Toeplitz structure. In<br />

Table 56.13, $\sigma_i^2$ is the ith variance parameter and $\rho_j$ is the jth correlation<br />

parameter satisfying $|\rho_j| < 1$. If you specify the order parameter q,<br />

then PROC <strong>MIXED</strong> estimates only the first q bands of the matrix, setting<br />

all higher bands equal to 0. <strong>The</strong> option TOEPH(1) is equivalent to both<br />

the UN(1) and UNR(1) options.<br />

TYPE=UN< (q) > specifies a completely general (unstructured) covariance matrix parameterized<br />

directly in terms of variances and covariances. <strong>The</strong> variances are<br />

constrained to be nonnegative, and the covariances are unconstrained. This<br />

structure is not constrained to be nonnegative definite in order to avoid<br />

nonlinear constraints; however, you can use the FA0 structure if you want<br />

this constraint to be imposed by a Cholesky factorization. If you specify<br />

the order parameter q, then PROC <strong>MIXED</strong> estimates only the first q bands<br />

of the matrix, setting all higher bands equal to 0.<br />

TYPE=UNR< (q) > specifies a completely general (unstructured) covariance matrix parameterized<br />

in terms of variances and correlations. This structure fits the same<br />

model as the TYPE=UN(q) option but with a different parameterization.<br />

<strong>The</strong> ith variance parameter is $\sigma_i^2$. <strong>The</strong> parameter $\rho_{jk}$ is the correlation between<br />

the jth and kth measurements; it satisfies $|\rho_{jk}| < 1$. If you specify<br />

the order parameter q, then PROC <strong>MIXED</strong> estimates only the first q bands<br />

of the matrix, setting all higher bands equal to zero.<br />


TYPE=UN@AR(1)<br />

TYPE=UN@CS<br />

TYPE=UN@UN specify direct (Kronecker) product structures designed for multivariate repeated<br />

measures (see Galecki 1994). <strong>The</strong>se structures are constructed by<br />

taking the Kronecker product of an unstructured matrix (modeling covariance<br />

across the multivariate observations) with an additional covariance<br />

matrix (modeling covariance across time or another factor). <strong>The</strong> upper-left<br />

value in the second matrix is constrained to equal 1 to identify the model.<br />

See the <strong>SAS</strong>/IML User’s <strong>Guide</strong> for more details about direct products.<br />

To use these structures in the REPEATED statement, you must specify<br />

two distinct REPEATED effects, both of which must be included in the<br />

CLASS statement. <strong>The</strong> first effect indicates the multivariate observations,<br />

and the second identifies the levels of time or some additional factor. Note<br />

that the input data set must still be constructed in “univariate” format; that<br />

is, all dependent observations are still listed observation-wise in one single<br />

variable. Although this construction provides for general modeling possibilities,<br />

it forces you to construct variables indicating both dimensions of<br />

the Kronecker product.<br />

For example, suppose your observed data consist of heights and weights of<br />

several children measured over several successive years. Your input data<br />

set should then contain variables similar to the following:<br />

Y, all of the heights and weights, with a separate observation for each<br />

Var, indicating whether the measurement is a height or a weight<br />

Year, indicating the year of measurement<br />

Child, indicating the child on which the measurement was taken<br />

Your PROC <strong>MIXED</strong> statements for a Kronecker AR(1) structure across<br />

years would then be as follows:<br />

proc mixed;<br />

class Var Year Child;<br />

model Y = Var Year Var*Year;<br />

repeated Var Year / type=un@ar(1)<br />

subject=Child;<br />

run;<br />

You should nearly always want to model different means for the multivariate<br />

observations; hence the inclusion of Var in the MODEL statement. <strong>The</strong><br />

preceding mean model consists of cell means for all combinations of VAR<br />

and YEAR.<br />

TYPE=VC specifies standard variance components and is the default structure for<br />

both the RANDOM and REPEATED statements. In the RANDOM<br />

statement, a distinct variance component is assigned to each effect. In<br />

the REPEATED statement, this structure is usually used only with the<br />

GROUP= option to specify a heterogeneous variance model.<br />

Jennrich and Schluchter (1986) provide general information about the use of covariance structures,<br />

and Wolfinger (1996) presents details about many of the heterogeneous structures.



Modeling with spatial covariance structures is discussed in many sources, for example, Marx<br />

and Thompson (1987), Zimmerman and Harville (1991), Cressie (1993), Brownie, Bowman,<br />

and Burton (1993), Stroup, Baenziger, and Mulitze (1994), Brownie and Gumpertz (1997),<br />

Gotway and Stroup (1997), Chilès and Delfiner (1999), Schabenberger and Gotway (2005),<br />

and Littell et al. (2006).<br />

WEIGHT Statement<br />

WEIGHT variable ;<br />

If you do not specify a REPEATED statement, the WEIGHT statement operates exactly like the<br />

one in PROC GLM. In this case PROC <strong>MIXED</strong> replaces $X'X$ and $Z'Z$ with $X'WX$ and $Z'WZ$,<br />

where W is the diagonal weight matrix. If you specify a REPEATED statement, then the WEIGHT<br />

statement replaces R with LRL, where L is a diagonal matrix with elements $W^{-1/2}$. Observations<br />

with nonpositive or missing weights are not included in the PROC <strong>MIXED</strong> analysis.<br />
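
The effect of replacing R with LRL can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PROC MIXED code: the function name and all numbers are hypothetical, and every weight is assumed to be positive because observations with other weights are dropped before this step.

```python
def weighted_r(r, weights):
    """Return L R L, where L is diagonal with elements w ** -0.5."""
    n = len(r)
    l_diag = [w ** -0.5 for w in weights]
    return [[l_diag[i] * r[i][j] * l_diag[j] for j in range(n)]
            for i in range(n)]


# One subject with two observations; the first has weight 4 (made-up values).
R = [[4.0, 2.0], [2.0, 4.0]]
for row in weighted_r(R, weights=[4.0, 1.0]):
    print(row)
```

An observation with a larger weight therefore gets a smaller variance, and its covariances with other observations shrink by the square root of its weight.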

Details: <strong>MIXED</strong> <strong>Procedure</strong><br />

Mixed Models <strong>The</strong>ory<br />

This section provides an overview of a likelihood-based approach to general linear mixed models.<br />

This approach simplifies and unifies many common statistical analyses, including those involving<br />

repeated measures, random effects, and random coefficients. <strong>The</strong> basic assumption is that the data<br />

are linearly related to unobserved multivariate normal random variables. For extensions to nonlinear<br />

and nonnormal situations see the documentation of the GLIMMIX and NL<strong>MIXED</strong> procedures.<br />

Additional theory and examples are provided in Littell et al. (2006), Verbeke and Molenberghs<br />

(1997, 2000), and Brown and Prescott (1999).<br />

Matrix Notation<br />

Suppose that you observe n data points y1; : : : ; yn and that you want to explain them by using<br />

n values for each of p explanatory variables x11; : : : ; x1p, x21; : : : ; x2p, : : : ; xn1; : : : ; xnp. <strong>The</strong><br />

xij values can be either regression-type continuous variables or dummy variables indicating class<br />

membership. <strong>The</strong> standard linear model for this setup is<br />

yi D<br />

pX<br />

j D1<br />

xij ˇj C i<br />

i D 1; : : : ; n


Mixed Models <strong>The</strong>ory ✦ 3963<br />

where $\beta_1, \ldots, \beta_p$ are unknown fixed-effects parameters to be estimated and $\epsilon_1, \ldots, \epsilon_n$ are unknown<br />
independent and identically distributed normal (Gaussian) random variables with mean 0 and variance<br />
$\sigma^2$.<br />

The preceding equations can be written simultaneously by using vectors and a matrix, as follows:<br />
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}
+
\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}
$$<br />
For convenience, simplicity, and extendability, this entire system is written as<br />
$$ y = X\beta + \epsilon $$<br />

where $y$ denotes the vector of observed $y_i$'s, $X$ is the known matrix of $x_{ij}$'s, $\beta$ is the unknown fixed-effects<br />
parameter vector, and $\epsilon$ is the unobserved vector of independent and identically distributed<br />
Gaussian random errors.<br />

In addition to denoting data, random variables, and explanatory variables in the preceding fashion,<br />
the subsequent development makes use of basic matrix operators such as transpose ($'$), inverse ($^{-1}$),<br />
generalized inverse ($^{-}$), determinant ($|\cdot|$), and matrix multiplication. See Searle (1982) for details<br />
about these and other matrix techniques.<br />

Formulation of the Mixed Model<br />

The previous general linear model is certainly a useful one (Searle 1971), and it is the one fitted by<br />
the GLM procedure. However, many times the distributional assumption about $\epsilon$ is too restrictive.<br />
The mixed model extends the general linear model by allowing a more flexible specification of the<br />
covariance matrix of $\epsilon$. In other words, it allows for both correlation and heterogeneous variances,<br />
although you still assume normality.<br />

The mixed model is written as<br />
$$ y = X\beta + Z\gamma + \epsilon $$<br />
where everything is the same as in the general linear model except for the addition of the known<br />
design matrix, $Z$, and the vector of unknown random-effects parameters, $\gamma$. The matrix $Z$ can<br />
contain either continuous or dummy variables, just like $X$. The name mixed model comes from the<br />
fact that the model contains both fixed-effects parameters, $\beta$, and random-effects parameters, $\gamma$.<br />
See Henderson (1990) and Searle, Casella, and McCulloch (1992) for historical developments of<br />
the mixed model.<br />

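A key assumption in the foregoing analysis is that $\gamma$ and $\epsilon$ are normally distributed with<br />
$$
E\begin{bmatrix} \gamma \\ \epsilon \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} \gamma \\ \epsilon \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}
$$<br />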


3964 ✦ Chapter 56: <strong>The</strong> <strong>MIXED</strong> <strong>Procedure</strong><br />

The variance of $y$ is, therefore, $V = ZGZ' + R$. You can model $V$ by setting up the random-effects<br />
design matrix $Z$ and by specifying covariance structures for $G$ and $R$.<br />
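The step from the distributional assumption to the variance of $y$ can be written out explicitly. The display below is a standard variance calculation, offered as a worked illustration; it uses only the mixed model equation and the assumption that $\gamma$ and $\epsilon$ are uncorrelated with variances $G$ and $R$.<br />

```latex
\begin{aligned}
\mathrm{Var}(y) &= \mathrm{Var}(X\beta + Z\gamma + \epsilon) \\
 &= Z\,\mathrm{Var}(\gamma)\,Z' + \mathrm{Var}(\epsilon)
    + Z\,\mathrm{Cov}(\gamma,\epsilon) + \mathrm{Cov}(\epsilon,\gamma)\,Z' \\
 &= ZGZ' + R
\end{aligned}
```

The term $X\beta$ is fixed and contributes no variance, and the two covariance terms vanish because the off-diagonal blocks of the joint variance matrix are zero.<br />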

Note that this is a general specification of the mixed model, in contrast to many texts and articles<br />
that discuss only simple random effects. Simple random effects are a special case of the general<br />
specification with $Z$ containing dummy variables, $G$ containing variance components in a diagonal<br />
structure, and $R = \sigma^2 I_n$, where $I_n$ denotes the $n \times n$ identity matrix. The general linear model is a<br />
further special case with $Z = 0$ and $R = \sigma^2 I_n$.<br />

The following two examples illustrate the most common formulations of the general linear mixed<br />
model.<br />

Example: Growth Curve with Compound Symmetry<br />

Suppose that you have three growth curve measurements for $s$ individuals and that you want to fit<br />
an overall linear trend in time. Your $X$ matrix is as follows:<br />
$$
X = \begin{bmatrix}
1 & 1 \\
1 & 2 \\
1 & 3 \\
\vdots & \vdots \\
1 & 1 \\
1 & 2 \\
1 & 3
\end{bmatrix}
$$<br />
The first column (coded entirely with 1s) fits an intercept, and the second column (coded with times<br />
of 1, 2, 3) fits a slope. Here, $n = 3s$ and $p = 2$.<br />

Suppose further that you want to introduce a common correlation among the observations from a<br />

single individual, with correlation being the same for all individuals. One way of setting this up in<br />

the general mixed model is to eliminate the Z and G matrices and let the R matrix be block diagonal<br />

with blocks corresponding to the individuals and with each block having the compound-symmetry<br />

structure. This structure has two unknown parameters, one modeling a common covariance and the<br />

other modeling a residual variance. The form for $R$ would then be as follows:<br />
$$
R = \begin{bmatrix}
\sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 & & & & \\
\sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 & & & & \\
\sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2 & & & & \\
 & & & \ddots & & & \\
 & & & & \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 \\
 & & & & \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 \\
 & & & & \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2
\end{bmatrix}
$$<br />
where blanks denote zeros. There are $3s$ rows and columns altogether, and the common correlation<br />
is $\sigma_1^2 / (\sigma_1^2 + \sigma^2)$.<br />

The PROC MIXED statements to fit this model are as follows:<br />


proc mixed;<br />

class indiv;<br />

model y = time;<br />

repeated / type=cs subject=indiv;<br />

run;<br />


Here, indiv is a classification variable indexing individuals. The MODEL statement fits a straight line<br />
for time; the intercept is fit by default just as in PROC GLM. The REPEATED statement models the<br />
$R$ matrix: TYPE=CS specifies the compound symmetry structure, and SUBJECT=INDIV specifies<br />
the blocks of $R$.<br />
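To inspect the fitted blocks of R, one option is to request them in the output. The sketch below adds the R and RCORR options of the REPEATED statement to the preceding statements; the data set name growth is a hypothetical placeholder.<br />

```sas
/* Hypothetical data set name: growth */
proc mixed data=growth;
   class indiv;
   model y = time;
   /* R displays the estimated block of R for the first subject;  */
   /* RCORR displays the corresponding correlation matrix         */
   repeated / type=cs subject=indiv r rcorr;
run;
```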

An alternative way of specifying the common intra-individual correlation is to let<br />
$$
Z = \begin{bmatrix}
1 & & & \\
1 & & & \\
1 & & & \\
 & 1 & & \\
 & 1 & & \\
 & 1 & & \\
 & & \ddots & \\
 & & & 1 \\
 & & & 1 \\
 & & & 1
\end{bmatrix},
\qquad
G = \begin{bmatrix}
\sigma_1^2 & & & \\
 & \sigma_1^2 & & \\
 & & \ddots & \\
 & & & \sigma_1^2
\end{bmatrix}
$$<br />
and $R = \sigma^2 I_n$. The $Z$ matrix has $3s$ rows and $s$ columns, and $G$ is $s \times s$.<br />

You can set up this model in PROC MIXED in two different but equivalent ways:<br />

proc mixed;<br />

class indiv;<br />

model y = time;<br />

random indiv;<br />

run;<br />

proc mixed;<br />

class indiv;<br />

model y = time;<br />

random intercept / subject=indiv;<br />

run;<br />

Both of these specifications fit the same model as the previous one that used the REPEATED statement;<br />

however, the RANDOM specifications constrain the correlation to be positive, whereas the<br />

REPEATED specification leaves the correlation unconstrained.
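To see why the RANDOM and REPEATED formulations agree, the marginal covariance for a single individual can be worked out from $V = ZGZ' + R$. The display below is an illustrative calculation for one individual with three measurements; the symbols $\mathbf{1}_3$ (a column of three 1s) and $V_i$ (the block of $V$ for individual $i$) are notation introduced here, not in the guide.<br />

```latex
\begin{aligned}
V_i &= \mathbf{1}_3\,\sigma_1^2\,\mathbf{1}_3' + \sigma^2 I_3 \\
    &= \begin{bmatrix}
       \sigma_1^2+\sigma^2 & \sigma_1^2 & \sigma_1^2 \\
       \sigma_1^2 & \sigma_1^2+\sigma^2 & \sigma_1^2 \\
       \sigma_1^2 & \sigma_1^2 & \sigma_1^2+\sigma^2
       \end{bmatrix}
\end{aligned}
```

This is the compound-symmetry block. Because $\sigma_1^2$ is a variance in the RANDOM formulation, the implied correlation $\sigma_1^2/(\sigma_1^2+\sigma^2)$ cannot be negative there, whereas the REPEATED formulation treats the same quantity as a covariance that is free to take either sign.<br />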



Example: Split-Plot Design<br />

The split-plot design involves two experimental treatment factors, A and B, and two different sizes of<br />
experimental units to which they are applied (see Winer 1971, Snedecor and Cochran 1980, Milliken<br />
and Johnson 1992, and Steel, Torrie, and Dickey 1997). The levels of A are randomly assigned to<br />
the larger-sized experimental units, called whole plots, whereas the levels of B are assigned to the<br />
smaller-sized experimental units, the subplots. The subplots are assumed to be nested within the<br />
whole plots, so that a whole plot consists of a cluster of subplots and a level of A is applied to the<br />
entire cluster.<br />

Such an arrangement is often necessary by nature of the experiment, the classical example being<br />

the application of fertilizer to large plots of land and different crop varieties planted in subdivisions<br />

of the large plots. For this example, fertilizer is the whole-plot factor A and variety is the subplot<br />

factor B.<br />

The first example is a split-plot design for which the whole plots are arranged in a randomized block<br />
design. The appropriate PROC MIXED statements are as follows:<br />

proc mixed;<br />
class a b block;<br />
model y = a|b;<br />
random block a*block;<br />
run;<br />

Here $R = \sigma^2 I_{24}$, and $X$, $Z$, and $G$ have the following form:<br />

$$
X = \begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
\vdots & & & & & & & & & & & \vdots \\
1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
$$<br />
$Z$ has 24 rows, and each row contains exactly two 1s, with all other entries equal to zero: one 1 in the Block<br />
column for the block that contains the observation and one 1 in the A*Block column for its whole plot.<br />
$$
G = \begin{bmatrix}
\sigma_B^2 & & & & & & & \\
 & \sigma_B^2 & & & & & & \\
 & & \sigma_B^2 & & & & & \\
 & & & \sigma_B^2 & & & & \\
 & & & & \sigma_{AB}^2 & & & \\
 & & & & & \sigma_{AB}^2 & & \\
 & & & & & & \ddots & \\
 & & & & & & & \sigma_{AB}^2
\end{bmatrix}
$$<br />

where $\sigma_B^2$ is the variance component for Block and $\sigma_{AB}^2$ is the variance component for A*Block.<br />
Changing the RANDOM statement as follows fits the same model, but with $Z$ and $G$ sorted differently:<br />

random int a / subject=block;<br />



$$
Z = \begin{bmatrix}
1 & 1 & 0 & 0 & & & & & \\
1 & 1 & 0 & 0 & & & & & \\
1 & 0 & 1 & 0 & & & & & \\
1 & 0 & 1 & 0 & & & & & \\
1 & 0 & 0 & 1 & & & & & \\
1 & 0 & 0 & 1 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & 1 & 1 & 0 & 0 \\
 & & & & & 1 & 1 & 0 & 0 \\
 & & & & & 1 & 0 & 1 & 0 \\
 & & & & & 1 & 0 & 1 & 0 \\
 & & & & & 1 & 0 & 0 & 1 \\
 & & & & & 1 & 0 & 0 & 1
\end{bmatrix}
$$<br />
$$
G = \begin{bmatrix}
\sigma_B^2 & & & & & & & & \\
 & \sigma_{AB}^2 & & & & & & & \\
 & & \sigma_{AB}^2 & & & & & & \\
 & & & \sigma_{AB}^2 & & & & & \\
 & & & & \ddots & & & & \\
 & & & & & \sigma_B^2 & & & \\
 & & & & & & \sigma_{AB}^2 & & \\
 & & & & & & & \sigma_{AB}^2 & \\
 & & & & & & & & \sigma_{AB}^2
\end{bmatrix}
$$<br />

Estimating Covariance Parameters in the Mixed Model<br />

Estimation is more difficult in the mixed model than in the general linear model. Not only do you<br />
have $\beta$ as in the general linear model, but you have unknown parameters in $\gamma$, $G$, and $R$ as well.<br />
Least squares is no longer the best method. Generalized least squares (GLS) is more appropriate,<br />
minimizing<br />
$$ (y - X\beta)'V^{-1}(y - X\beta) $$<br />
However, it requires knowledge of $V$ and, therefore, knowledge of $G$ and $R$. Lacking such information,<br />
one approach is to use estimated GLS, in which you insert some reasonable estimate for $V$<br />
into the minimization problem. The goal thus becomes finding a reasonable estimate of $G$ and $R$.<br />
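For reference, when $V$ is known the GLS minimization has a closed-form solution. The display below is the standard result, stated here as a reminder rather than taken from this section; the same expression appears later in this chapter with an estimate substituted for $V$.<br />

```latex
\begin{aligned}
\frac{\partial}{\partial \beta}\,(y - X\beta)'V^{-1}(y - X\beta)
  &= -2\,X'V^{-1}(y - X\beta) = 0 \\
\Longrightarrow\quad X'V^{-1}X\,\hat{\beta} &= X'V^{-1}y \\
\Longrightarrow\quad \hat{\beta} &= (X'V^{-1}X)^{-}X'V^{-1}y
\end{aligned}
```

The generalized inverse is used because $X$ need not have full column rank.<br />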



In many situations, the best approach is to use likelihood-based methods, exploiting the assumption<br />
that $\gamma$ and $\epsilon$ are normally distributed (Hartley and Rao 1967; Patterson and Thompson 1971;<br />
Harville 1977; Laird and Ware 1982; Jennrich and Schluchter 1986). PROC MIXED implements<br />
two likelihood-based methods: maximum likelihood (ML) and restricted/residual maximum likelihood<br />
(REML). A favorable theoretical property of ML and REML is that they accommodate data<br />
that are missing at random (Rubin 1976; Little 1995).<br />

PROC MIXED constructs an objective function associated with ML or REML and maximizes it<br />
over all unknown parameters. Using calculus, it is possible to reduce this maximization problem<br />
to one over only the parameters in $G$ and $R$. The corresponding log-likelihood functions are as<br />
follows:<br />
$$ \text{ML:} \quad l(G,R) = -\frac{1}{2}\log|V| - \frac{1}{2}\,r'V^{-1}r - \frac{n}{2}\log(2\pi) $$<br />
$$ \text{REML:} \quad l_R(G,R) = -\frac{1}{2}\log|V| - \frac{1}{2}\log|X'V^{-1}X| - \frac{1}{2}\,r'V^{-1}r - \frac{n-p}{2}\log(2\pi) $$<br />
where $r = y - X(X'V^{-1}X)^{-}X'V^{-1}y$ and $p$ is the rank of $X$. PROC MIXED actually minimizes<br />
$-2$ times these functions by using a ridge-stabilized Newton-Raphson algorithm. Lindstrom and<br />
Bates (1988) provide reasons for preferring Newton-Raphson to the Expectation-Maximization (EM)<br />
algorithm described in Dempster, Laird, and Rubin (1977) and Laird, Lange, and Stram (1987), as<br />
well as analytical details for implementing a QR-decomposition approach to the problem. Wolfinger,<br />
Tobias, and Sall (1994) present the sweep-based algorithms that are implemented in PROC<br />
MIXED.<br />

One advantage of using the Newton-Raphson algorithm is that the second derivative matrix of the<br />
objective function evaluated at the optima is available upon completion. Denoting this matrix $H$,<br />
the asymptotic theory of maximum likelihood (see Serfling 1980) shows that $2H^{-1}$ is an asymptotic<br />
variance-covariance matrix of the estimated parameters of $G$ and $R$. Thus, tests and confidence<br />
intervals based on asymptotic normality can be obtained. However, these can be unreliable in small<br />
samples, especially for parameters such as variance components that have sampling distributions<br />
that tend to be skewed to the right.<br />

If a residual variance $\sigma^2$ is a part of your mixed model, it can usually be profiled out of the likelihood.<br />
This means solving analytically for the optimal $\sigma^2$ and plugging this expression back into the<br />
likelihood formula (see Wolfinger, Tobias, and Sall 1994). This reduces the number of optimization<br />
parameters by one and can improve convergence properties. PROC MIXED profiles the residual<br />
variance out of the log likelihood whenever it appears reasonable to do so. This includes the case<br />
when $R$ equals $\sigma^2 I$ and when it has blocks with a compound symmetry, time series, or spatial structure.<br />
PROC MIXED does not profile the log likelihood when $R$ has unstructured blocks, when you<br />
use the HOLD= or NOITER option in the PARMS statement, or when you use the NOPROFILE<br />
option in the PROC MIXED statement.<br />
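As an illustration of what profiling means, suppose that $V$ can be written as $V = \sigma^2 V_*$, where $V_*$ depends only on the remaining covariance parameters. The symbol $V_*$ is notation introduced for this sketch and is not part of the guide. The residual vector $r$ is unchanged by this rescaling because $\sigma^2$ cancels in its definition.<br />

```latex
\begin{aligned}
l(\sigma^2, V_*) &= -\frac{n}{2}\log\sigma^2 - \frac{1}{2}\log|V_*|
   - \frac{1}{2\sigma^2}\,r'V_*^{-1}r - \frac{n}{2}\log(2\pi) \\
\frac{\partial l}{\partial \sigma^2} &= -\frac{n}{2\sigma^2}
   + \frac{1}{2\sigma^4}\,r'V_*^{-1}r = 0
\quad\Longrightarrow\quad
\hat{\sigma}^2 = \frac{r'V_*^{-1}r}{n}
\end{aligned}
```

Substituting $\hat{\sigma}^2$ back into $l$ leaves a function of the parameters in $V_*$ alone. The same steps applied to the REML log likelihood give the divisor $n - p$ in place of $n$.<br />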

Instead of ML or REML, you can use the noniterative MIVQUE0 method to estimate G and R (Rao<br />

1972; LaMotte 1973; Wolfinger, Tobias, and Sall 1994). In fact, by default PROC MIXED uses<br />

MIVQUE0 estimates as starting values for the ML and REML procedures. For variance component<br />

models, another estimation method involves equating Type 1, 2, or 3 expected mean squares to<br />

their observed values and solving the resulting system. However, Swallow and Monahan (1984)<br />

present simulation evidence favoring REML and ML over MIVQUE0 and other method-of-moment<br />

estimators.
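The estimation method is chosen with the METHOD= option in the PROC MIXED statement. The following sketch fits the same random-intercept model by three of the methods discussed in this section; the data set and variable names (mydata, y, time, indiv) are hypothetical placeholders.<br />

```sas
/* Hypothetical names: mydata, y, time, indiv */
proc mixed data=mydata method=reml;      /* REML, the default method */
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;

proc mixed data=mydata method=ml;        /* maximum likelihood */
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;

proc mixed data=mydata method=mivque0;   /* noniterative MIVQUE0 */
   class indiv;
   model y = time;
   random intercept / subject=indiv;
run;
```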



Estimating Fixed and Random Effects in the Mixed Model<br />

ML, REML, MIVQUE0, or Type1–Type3 provide estimates of $G$ and $R$, which are denoted $\hat{G}$ and<br />
$\hat{R}$, respectively. To obtain estimates of $\beta$ and $\gamma$, the standard method is to solve the mixed model<br />
equations (Henderson 1984):<br />
$$
\begin{bmatrix}
X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\
Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}
=
\begin{bmatrix} X'\hat{R}^{-1}y \\ Z'\hat{R}^{-1}y \end{bmatrix}
$$<br />
The solutions can also be written as<br />
$$ \hat{\beta} = (X'\hat{V}^{-1}X)^{-}X'\hat{V}^{-1}y $$<br />
$$ \hat{\gamma} = \hat{G}Z'\hat{V}^{-1}(y - X\hat{\beta}) $$<br />
and have connections with empirical Bayes estimators (Laird and Ware 1982, Carlin and Louis<br />
1996).<br />
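In practice the solutions for the fixed and random effects are requested with the SOLUTION option. The following is a minimal sketch under assumed names (mydata, y, time, indiv), not an example from this guide.<br />

```sas
/* Hypothetical names: mydata, y, time, indiv */
proc mixed data=mydata;
   class indiv;
   /* SOLUTION in the MODEL statement displays the fixed-effects estimates */
   model y = time / solution;
   /* SOLUTION in the RANDOM statement displays the predicted random effects */
   random intercept / subject=indiv solution;
run;
```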

Note that the mixed model equations are extended normal equations and that the preceding expression<br />
assumes that $\hat{G}$ is nonsingular. For the extreme case where the eigenvalues of $\hat{G}$ are very large,<br />
$\hat{G}^{-1}$ contributes very little to the equations and $\hat{\gamma}$ is close to what it would be if $\gamma$ actually contained<br />
fixed-effects parameters. On the other hand, when the eigenvalues of $\hat{G}$ are very small, $\hat{G}^{-1}$ dominates<br />
the equations and $\hat{\gamma}$ is close to 0. For intermediate cases, $\hat{G}^{-1}$ can be viewed as shrinking the<br />
fixed-effects estimates of $\gamma$ toward 0 (Robinson 1991).<br />
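The shrinkage can be made concrete in the simplest case. The display below is an illustrative calculation for a random-intercept model with $G = \sigma_1^2 I$ and $R = \sigma^2 I$; the symbols $n_i$ (number of observations on subject $i$) and $\bar{e}_i$ (mean of the residuals $y - X\hat{\beta}$ for subject $i$) are introduced here and do not appear in the guide.<br />

```latex
\hat{\gamma}_i
  = \sigma_1^2\,\mathbf{1}'\bigl(\sigma^2 I + \sigma_1^2\,\mathbf{1}\mathbf{1}'\bigr)^{-1}(y_i - X_i\hat{\beta})
  = \frac{n_i\,\sigma_1^2}{\sigma^2 + n_i\,\sigma_1^2}\;\bar{e}_i
```

The multiplier lies between 0 and 1. It approaches 1 as $\sigma_1^2$ grows, so that the prediction approaches the unshrunken subject mean residual, and it approaches 0 as $\sigma_1^2$ goes to 0, which matches the two extreme cases described above.<br />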

If $\hat{G}$ is singular, then the mixed model equations are modified (Henderson 1984) as follows:<br />
$$
\begin{bmatrix}
X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z\hat{L} \\
\hat{L}'Z'\hat{R}^{-1}X & \hat{L}'Z'\hat{R}^{-1}Z\hat{L} + I
\end{bmatrix}
\begin{bmatrix} \hat{\beta} \\ \hat{\tau} \end{bmatrix}
=
\begin{bmatrix} X'\hat{R}^{-1}y \\ \hat{L}'Z'\hat{R}^{-1}y \end{bmatrix}
$$<br />
where $\hat{L}$ is the lower-triangular Cholesky root of $\hat{G}$, satisfying $\hat{G} = \hat{L}\hat{L}'$. Both $\hat{\tau}$ and a generalized<br />
inverse of the left-hand-side coefficient matrix are then transformed by using $\hat{L}$ to determine $\hat{\gamma}$.<br />

An example of when the singular form of the equations is necessary is when a variance component<br />
estimate falls on the boundary constraint of 0.<br />

Model Selection<br />

The previous section on estimation assumes the specification of a mixed model in terms of $X$, $Z$,<br />
$G$, and $R$. Even though $X$ and $Z$ have known elements, their specific form and construction are<br />
flexible, and several possibilities can present themselves for a particular data set. Likewise, several<br />
different covariance structures for $G$ and $R$ might be reasonable.<br />

Space does not permit a thorough discussion of model selection, but a few brief comments and<br />

references are in order. First, subject matter considerations and objectives are of great importance<br />

when selecting a model; see Diggle (1988) and Lindsey (1993).<br />

Second, when the data themselves are looked to for guidance, many of the graphical methods and<br />

diagnostics appropriate for the general linear model extend to the mixed model setting as well<br />

(Christensen, Pearson, and Johnson 1992).<br />


Finally, a likelihood-based approach to the mixed model provides several statistical measures for<br />
model adequacy as well. The most common of these are the likelihood ratio test and Akaike’s and<br />
Schwarz’s criteria (Bozdogan 1987; Wolfinger 1993; Keselman et al. 1998, 1999).<br />
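One way to compare candidate covariance structures by information criteria is to capture the fit statistics table from each run. The sketch below assumes a hypothetical data set mydata with variables y, time, and indiv, and it compares compound symmetry with a first-order autoregressive structure; the choice of structures is for illustration only.<br />

```sas
/* Hypothetical names: mydata, y, time, indiv */
ods output FitStatistics=fit_cs;
proc mixed data=mydata;
   class indiv;
   model y = time;
   repeated / type=cs subject=indiv;
run;

ods output FitStatistics=fit_ar1;
proc mixed data=mydata;
   class indiv;
   model y = time;
   repeated / type=ar(1) subject=indiv;
run;

/* Stack the two tables so the criteria can be read side by side */
data fit_compare;
   length model $8;
   set fit_cs(in=in_cs) fit_ar1;
   if in_cs then model = 'CS';
   else model = 'AR(1)';
run;

proc print data=fit_compare noobs;
run;
```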

Statistical Properties<br />

If $G$ and $R$ are known, $\hat{\beta}$ is the best linear unbiased estimator (BLUE) of $\beta$, and $\hat{\gamma}$ is the best linear<br />
unbiased predictor (BLUP) of $\gamma$ (Searle 1971; Harville 1988, 1990; Robinson 1991; McLean,<br />
Sanders, and Stroup 1991). Here, “best” means minimum mean squared error. The covariance<br />
matrix of $(\hat{\beta} - \beta, \hat{\gamma} - \gamma)$ is<br />
$$
C = \begin{bmatrix}
X'R^{-1}X & X'R^{-1}Z \\
Z'R^{-1}X & Z'R^{-1}Z + G^{-1}
\end{bmatrix}^{-}
$$<br />
where $^{-}$ denotes a generalized inverse (see Searle 1971).<br />

However, $G$ and $R$ are usually unknown and are estimated by using one of the aforementioned<br />
methods. These estimates, $\hat{G}$ and $\hat{R}$, are therefore simply substituted into the preceding expression<br />
to obtain<br />
$$
\hat{C} = \begin{bmatrix}
X'\hat{R}^{-1}X & X'\hat{R}^{-1}Z \\
Z'\hat{R}^{-1}X & Z'\hat{R}^{-1}Z + \hat{G}^{-1}
\end{bmatrix}^{-}
$$<br />
as the approximate variance-covariance matrix of $(\hat{\beta} - \beta, \hat{\gamma} - \gamma)$. In this case, the BLUE and BLUP<br />
acronyms no longer apply, but the word empirical is often added to indicate such an approximation.<br />
The appropriate acronyms thus become EBLUE and EBLUP.<br />

McLean and Sanders (1988) show that $\hat{C}$ can also be written as<br />
$$
\hat{C} = \begin{bmatrix} \hat{C}_{11} & \hat{C}_{21}' \\ \hat{C}_{21} & \hat{C}_{22} \end{bmatrix}
$$<br />
where<br />
$$ \hat{C}_{11} = (X'\hat{V}^{-1}X)^{-} $$<br />
$$ \hat{C}_{21} = -\hat{G}Z'\hat{V}^{-1}X\hat{C}_{11} $$<br />
$$ \hat{C}_{22} = (Z'\hat{R}^{-1}Z + \hat{G}^{-1})^{-1} - \hat{C}_{21}X'\hat{V}^{-1}Z\hat{G} $$<br />
Note that $\hat{C}_{11}$ is the familiar estimated generalized least squares formula for the variance-covariance<br />
matrix of $\hat{\beta}$.<br />

As a cautionary note, $\hat{C}$ tends to underestimate the true sampling variability of<br />
$(\hat{\beta}, \hat{\gamma})$ because no account is made for the uncertainty in estimating $G$ and $R$. Although inflation<br />
factors have been proposed (Kackar and Harville 1984; Kass and Steffey 1989; Prasad and<br />
Rao 1990), they tend to be small for data sets that are fairly well balanced. PROC MIXED does not<br />
compute any inflation factors by default, but rather accounts for the downward bias by using the<br />
approximate t and F statistics described subsequently. The DDFM=KENWARDROGER option<br />
in the MODEL statement prompts PROC MIXED to compute a specific inflation factor along with<br />
Satterthwaite-based degrees of freedom.<br />
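As a brief sketch, the option can be added to the split-plot statements shown earlier in this section as follows. The data set name mydata is a hypothetical placeholder.<br />

```sas
/* Hypothetical data set name: mydata */
proc mixed data=mydata;
   class a b block;
   /* requests the Kenward-Roger adjustment for fixed-effects inference */
   model y = a|b / ddfm=kenwardroger;
   random block a*block;
run;
```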



Inference and Test Statistics<br />

For inferences concerning the covariance parameters in your model, you can use likelihood-based<br />
statistics. One common likelihood-based statistic is the Wald Z, which is computed as the parameter<br />
estimate divided by its asymptotic standard error. The asymptotic standard errors are computed from<br />
the inverse of the second derivative matrix of the likelihood with respect to each of the covariance<br />
parameters. The Wald Z is valid for large samples, but it can be unreliable for small data sets<br />
and for parameters such as variance components, which are known to have a skewed or bounded<br />
sampling distribution.<br />

A better alternative is the likelihood ratio $\chi^2$ statistic. This statistic compares two covariance models,<br />
one a special case of the other. To compute it, you must run PROC MIXED twice, once for<br />
each of the two models, and then subtract the corresponding values of $-2$ times the log likelihoods.<br />
You can use either ML or REML to construct this statistic, which tests whether the full model is<br />
necessary beyond the reduced model.<br />

As long as the reduced model does not occur on the boundary of the covariance parameter space,<br />
the $\chi^2$ statistic computed in this fashion has a large-sample $\chi^2$ distribution with degrees<br />
of freedom equal to the difference in the number of covariance parameters between the two models.<br />
If the reduced model does occur on the boundary of the covariance parameter space, the asymptotic<br />
distribution becomes a mixture of $\chi^2$ distributions (Self and Liang 1987). A common example of<br />
this is when you are testing that a variance component equals its lower boundary constraint of 0.<br />
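The two-run procedure described above can be sketched as follows. The example compares an unstructured covariance matrix with compound symmetry for three repeated measurements, a case where the reduced model is not on the boundary. The data set and variable names (mydata, y, time, indiv) and the output data set names are hypothetical; the sketch also assumes that the first row of the fit statistics table holds $-2$ times the (restricted) log likelihood.<br />

```sas
/* Hypothetical names: mydata, y, time, indiv */
ods output FitStatistics=fit_full;
proc mixed data=mydata method=reml;
   class indiv;
   model y = time;
   repeated / type=un subject=indiv;   /* full model: 6 parameters */
run;

ods output FitStatistics=fit_red;
proc mixed data=mydata method=reml;
   class indiv;
   model y = time;
   repeated / type=cs subject=indiv;   /* reduced model: 2 parameters */
run;

data lrt;
   merge fit_full(rename=(Value=m2ll_full))
         fit_red (rename=(Value=m2ll_red));
   if _n_ = 1;                         /* keep the -2 log likelihood row */
   chisq = m2ll_red - m2ll_full;       /* likelihood ratio statistic */
   df    = 6 - 2;                      /* difference in parameter counts */
   p     = 1 - probchi(chisq, df);
run;

proc print data=lrt noobs;
   var chisq df p;
run;
```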

A final possibility for obtaining inferences concerning the covariance parameters is to simulate or<br />
resample data from your model and construct empirical sampling distributions of the parameters.<br />
The SAS macro language and the ODS system are useful tools in this regard.<br />

F and t Tests for Fixed- and Random-Effects Parameters<br />

For inferences concerning the fixed- and random-effects parameters in the mixed model, consider<br />
estimable linear combinations of the following form:<br />
$$ L\begin{bmatrix} \beta \\ \gamma \end{bmatrix} $$<br />
The estimability requirement (Searle 1971) applies only to the $\beta$ portion of $L$, because any linear<br />
combination of $\gamma$ is estimable. Such a formulation in terms of a general $L$ matrix encompasses a<br />
wide variety of common inferential procedures such as those employed with Type 1–Type 3 tests<br />
and LS-means. The CONTRAST and ESTIMATE statements in PROC MIXED enable you to<br />
specify your own $L$ matrices. Typically, inference on fixed effects is the focus, and, in this case, the<br />
$\gamma$ portion of $L$ is assumed to contain all 0s.<br />

Statistical inferences are obtained by testing the hypothesis<br />
$$ H\colon L\begin{bmatrix} \beta \\ \gamma \end{bmatrix} = 0 $$<br />
or by constructing point and interval estimates.<br />



When $L$ consists of a single row, a general t statistic can be constructed as follows (see McLean and<br />
Sanders 1988, Stroup 1989a):<br />
$$ t = \frac{L\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}}{\sqrt{L\hat{C}L'}} $$<br />
Under the assumed normality of $\gamma$ and $\epsilon$, $t$ has an exact t distribution only for data exhibiting<br />
certain types of balance and for some special unbalanced cases. In general, $t$ is only approximately<br />
t-distributed, and its degrees of freedom must be estimated. See the DDFM= option for a description<br />
of the various degrees-of-freedom methods available in PROC MIXED.<br />

With $\hat{\nu}$ being the approximate degrees of freedom, the associated confidence interval is<br />
$$ L\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} \pm t_{\hat{\nu},\alpha/2}\sqrt{L\hat{C}L'} $$<br />
where $t_{\hat{\nu},\alpha/2}$ is the $(1 - \alpha/2)100$th percentile of the $t_{\hat{\nu}}$ distribution.<br />

When the rank of $L$ is greater than 1, PROC MIXED constructs the following general F statistic:<br />
$$ F = \frac{\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}' L'(L\hat{C}L')^{-1}L\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix}}{r} $$<br />
where $r = \mathrm{rank}(L\hat{C}L')$. Analogous to $t$, $F$ in general has an approximate F distribution with $r$<br />
numerator degrees of freedom and $\hat{\nu}$ denominator degrees of freedom.<br />
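The following sketch shows how a single-row and a multi-row $L$ might be specified with the ESTIMATE and CONTRAST statements. It assumes a hypothetical data set mydata with a response y, a three-level treatment factor trt, and a random block effect; none of these names come from this guide.<br />

```sas
/* Hypothetical names: mydata, y, trt (three levels), block */
proc mixed data=mydata;
   class trt block;
   model y = trt;
   random block;
   /* single-row L: t statistic with confidence limits */
   estimate 'trt 1 vs trt 2' trt 1 -1 0 / cl alpha=0.05;
   /* two-row L: F statistic with two numerator degrees of freedom */
   contrast 'any trt difference' trt 1 -1  0,
                                 trt 1  0 -1;
run;
```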

The t and F statistics enable you to make inferences about your fixed effects, which account for<br />
the variance-covariance model you select. An alternative is the $\chi^2$ statistic associated with the<br />
likelihood ratio test. This statistic compares two fixed-effects models, one a special case of the<br />
other. It is computed just as when comparing different covariance models, although you should use<br />
ML and not REML here because the penalty term associated with restricted likelihoods depends<br />
upon the fixed-effects specification.<br />

F Tests With the ANOVAF Option<br />

The ANOVAF option computes F tests by the following method in models with a REPEATED statement<br />
and without a RANDOM statement. Let $L$ denote the matrix of estimable functions for the<br />
hypothesis $H\colon L\beta = 0$, where $\beta$ are the fixed-effects parameters. Let $M = L'(LL')^{-}L$, and<br />
suppose that $\hat{C}$ denotes the estimated variance-covariance matrix of $\hat{\beta}$ (see the section “Statistical<br />
Properties” for the construction of $\hat{C}$).<br />

The ANOVAF F statistics are computed as<br />
$$ F_A = \hat{\beta}'L'(LL')^{-1}L\hat{\beta} \,/\, t_1 = \hat{\beta}'M\hat{\beta} \,/\, t_1 $$<br />
Notice that this is a modification of the usual F statistic where $(L\hat{C}L')^{-1}$ is replaced with $(LL')^{-1}$<br />
and $\mathrm{rank}(L)$ is replaced with $t_1 = \mathrm{trace}(M\hat{C})$; see, for example, Brunner, Domhof, and Langer<br />
(2002, Sec. 5.4). The p-values for this statistic are computed from either an $F_{\nu_1,\nu_2}$ or an $F_{\nu_1,\infty}$<br />
distribution. The respective degrees of freedom are determined by the MIXED procedure as follows:<br />
$$ \nu_1 = \frac{t_1^2}{\mathrm{trace}(M\hat{C}M\hat{C})} $$<br />
$$ \nu_2^* = \frac{2t_1^2}{g'Ag} $$<br />
$$
\nu_2 = \begin{cases}
\max\{\min\{\nu_2^*, df_e\}, 1\} & g'Ag > 1\mathrm{E}3 \times \mathrm{MACEPS} \\
1 & \text{otherwise}
\end{cases}
$$<br />

The term $g'Ag$ in the term $\nu_2^*$ for the denominator degrees of freedom is based on approximating<br />
$\mathrm{Var}[\mathrm{trace}(M\hat{C})]$ based on a first-order Taylor series about the true covariance parameters. This<br />
generalizes results in the appendix of Brunner, Dette, and Munk (1997) to a broader class of models.<br />
The vector $g = [g_1, \ldots, g_q]$ contains the partial derivatives<br />
$$ \mathrm{trace}\left( L'(LL')^{-1}L\,\frac{\partial \hat{C}}{\partial \theta_i} \right) $$<br />
and $A$ is the asymptotic variance-covariance matrix of the covariance parameter estimates<br />
(ASYCOV option).<br />

PROC MIXED reports $\nu_1$ and $\nu_2$ as “NumDF” and “DenDF” under the “ANOVA F” heading in the<br />
output. The corresponding p-values are denoted as “Pr > F(DDF)” for $F_{\nu_1,\nu_2}$ and “Pr > F(infty)”<br />
for $F_{\nu_1,\infty}$, respectively.<br />

P-values computed with the ANOVAF option can be identical to the nonparametric tests in Akritas,<br />
Arnold, and Brunner (1997) and in Brunner, Domhof, and Langer (2002), provided that the<br />
response data consist of properly created (and sorted) ranks and that the covariance parameters are<br />
estimated by MIVQUE0 in models with a REPEATED statement and properly chosen SUBJECT=<br />
and/or GROUP= effects.<br />

If you model an unstructured covariance matrix in a longitudinal model with one or more repeated<br />
factors, the ANOVAF results are identical to a multivariate analysis of variance (MANOVA) where degrees of freedom<br />
are corrected with the Greenhouse-Geisser adjustment (Greenhouse and Geisser 1959). For example,<br />
suppose that factor A has 2 levels and factor B has 4 levels. The following two sets of statements<br />
produce the same p-values:<br />

proc mixed data=MyData anovaf method=mivque0;<br />
class id A B;<br />
model score = A | B / chisq;<br />
repeated / type=un subject=id;<br />
ods select Tests3;<br />
run;<br />

/* Transpose to one row per subject for the repeated measures analysis */<br />
proc transpose data=MyData out=tdata;<br />
by id;<br />
var score;<br />
run;<br />

proc glm data=tdata;<br />
model col: = / nouni;<br />
repeated A 2, B 4;<br />
ods output ModelANOVA=maov epsilons=eps;<br />
run;<br />

/* Keep only the Greenhouse-Geisser epsilons, one column per effect */<br />
proc transpose data=eps(where=(substr(statistic,1,3)='Gre')) out=teps;<br />
var cvalue1;<br />
run;<br />

/* Compute the p-values that correspond to Pr > F(DDF) and Pr > F(infty) */<br />
data aov; set maov;<br />
if (_n_ = 1) then merge teps;<br />
if (Source='A') then do;<br />
pFddf = ProbF;<br />
pFinf = 1 - probchi(df*Fvalue,df);<br />
output;<br />
end; else if (Source='B') then do;<br />
pFddf = ProbFGG;<br />
pFinf = 1 - probchi(df*col1*Fvalue,df*col1);<br />
output;<br />
end; else if (Source='A*B') then do;<br />
pFddf = ProbFGG;<br />
pFinf = 1 - probchi(df*col2*Fvalue,df*col2);<br />
output;<br />
end;<br />
run;<br />

proc print data=aov label noobs;<br />
label Source = 'Effect'<br />
df = 'NumDF'<br />
Fvalue = 'Value'<br />
pFddf = 'Pr > F(DDF)'<br />
pFinf = 'Pr > F(infty)';<br />
var Source df Fvalue pFddf pFinf;<br />
format pF: pvalue6.;<br />
run;<br />

The PROC GLM code produces p-values that correspond to the ANOVAF p-values shown as Pr ><br />
F(DDF) in the PROC MIXED output. The subsequent DATA step computes the p-values that correspond<br />
to Pr > F(infty) in the PROC MIXED output.<br />

Parameterization of Mixed Models<br />

Recall that a mixed model is of the form<br />
$$ y = X\beta + Z\gamma + \epsilon $$<br />
where $y$ represents univariate data, $\beta$ is an unknown vector of fixed effects with known model matrix<br />
$X$, $\gamma$ is an unknown vector of random effects with known model matrix $Z$, and $\epsilon$ is an unknown<br />
random error vector.<br />

PROC MIXED constructs a mixed model according to the specifications in the MODEL,<br />
RANDOM, and REPEATED statements. Each effect in the MODEL statement generates one or<br />
more columns in the model matrix $X$, and each effect in the RANDOM statement generates one or<br />
more columns in the model matrix $Z$. Effects in the REPEATED statement do not generate model<br />
matrices; they serve only to index observations within subjects. This section shows precisely how<br />
PROC MIXED builds $X$ and $Z$.<br />

Intercept<br />

By default, all models automatically include a column of 1s in $X$ to estimate a fixed-effect intercept<br />
parameter $\mu$. You can use the NOINT option in the MODEL statement to suppress this intercept.<br />
The NOINT option is useful when you are specifying a classification effect in the MODEL statement<br />
and you want the parameter estimate to be in terms of the mean response for each level of that effect,<br />
rather than in terms of a deviation from an overall mean.<br />

By contrast, the intercept is not included by default in Z. To obtain a column of 1s in Z, you must<br />

specify in the RANDOM statement either the INTERCEPT effect or some effect that has only one<br />

level.<br />
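As a minimal sketch of both points, the following statements fit level means for a fixed classification effect and add a random intercept. The data set TRIAL and the variables y, trt, and clinic are hypothetical names chosen for illustration.<br />

```sas
/* Hypothetical data set TRIAL: response y, treatment trt, clinic identifier clinic */
proc mixed data=trial;
   class trt clinic;
   /* NOINT drops the column of 1s from X, so each trt estimate is a level mean */
   model y = trt / noint solution;
   /* INTERCEPT adds a column of 1s to Z for each clinic (a random intercept) */
   random intercept / subject=clinic;
run;
```

Without NOINT, the trt estimates would instead be expressed relative to the intercept.<br />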

Regression Effects<br />

Numeric variables, or polynomial terms involving them, can be included in the model as regression<br />

effects (covariates). <strong>The</strong> actual values of such terms are included as columns of the model matrices<br />

X and Z. You can use the bar operator with a regression effect to generate polynomial effects. For<br />

instance, X|X|X expands to X X*X X*X*X, a cubic model.<br />
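The two MODEL statements below should request the same cubic regression, once with the bar operator and once in expanded form. The data set GROWTH and the variables height and age are assumed names used only for this sketch.<br />

```sas
/* Hypothetical data set GROWTH with numeric variables height and age */
proc mixed data=growth;
   model height = age|age|age / solution;               /* bar-operator form */
run;

proc mixed data=growth;
   model height = age age*age age*age*age / solution;   /* expanded form */
run;
```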

Main Effects<br />

If a classification variable has m levels, PROC <strong>MIXED</strong> generates m columns in the model matrix<br />

for its main effect. Each column is an indicator variable for a given level. <strong>The</strong> order of the columns<br />

is the sort order of the values of their levels and can be controlled with the ORDER= option in the<br />

PROC <strong>MIXED</strong> statement. Table 56.16 is an example.<br />

Table 56.16 Example of Main Effects<br />

Data | I | A | B<br />

A B | | A1 A2 | B1 B2 B3<br />

1 1 | 1 | 1 0 | 1 0 0<br />

1 2 | 1 | 1 0 | 0 1 0<br />

1 3 | 1 | 1 0 | 0 0 1<br />

2 1 | 1 | 0 1 | 1 0 0<br />

2 2 | 1 | 0 1 | 0 1 0<br />

2 3 | 1 | 0 1 | 0 0 1<br />

Typically, there are more columns for these effects than there are degrees of freedom for them. In<br />

other words, PROC <strong>MIXED</strong> uses an overparameterized model.


Interaction Effects<br />


Often a model includes interaction (crossed) effects. With an interaction, PROC <strong>MIXED</strong> first reorders<br />

the terms to correspond to the order of the variables in the CLASS statement. Thus, B*A<br />

becomes A*B if A precedes B in the CLASS statement. <strong>The</strong>n, PROC <strong>MIXED</strong> generates columns for<br />

all combinations of levels that occur in the data. <strong>The</strong> order of the columns is such that the rightmost<br />

variables in the cross index faster than the leftmost variables (Table 56.17). Empty columns (that<br />

would contain all 0s) are not generated for X, but they are for Z.<br />

Table 56.17 Example of Interaction Effects<br />

Data | I | A | B | A*B<br />

A B | | A1 A2 | B1 B2 B3 | A1B1 A1B2 A1B3 A2B1 A2B2 A2B3<br />

1 1 | 1 | 1 0 | 1 0 0 | 1 0 0 0 0 0<br />

1 2 | 1 | 1 0 | 0 1 0 | 0 1 0 0 0 0<br />

1 3 | 1 | 1 0 | 0 0 1 | 0 0 1 0 0 0<br />

2 1 | 1 | 0 1 | 1 0 0 | 0 0 0 1 0 0<br />

2 2 | 1 | 0 1 | 0 1 0 | 0 0 0 0 1 0<br />

2 3 | 1 | 0 1 | 0 0 1 | 0 0 0 0 0 1<br />

In the preceding matrix, main-effects columns are not linearly independent of crossed-effects<br />

columns; in fact, the column space for the crossed effects contains the space of the main effect.<br />

When your model contains many interaction effects, you might be able to code them more parsimoniously<br />

by using the bar operator ( | ). <strong>The</strong> bar operator generates all possible interaction effects.<br />

For example, A|B|C expands to A B A*B C A*C B*C A*B*C. To eliminate higher-order interaction<br />

effects, use the at sign (@) in conjunction with the bar operator. For instance, A|B|C|D @2 expands<br />

to A B A*B C A*C B*C D A*D B*D C*D.<br />
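A short sketch of the at-sign notation in a MODEL statement follows. The data set EXPT and the response y are hypothetical; the class variables A through D match the example above.<br />

```sas
/* Hypothetical data set EXPT with response y and class variables A, B, C, D */
proc mixed data=expt;
   class A B C D;
   /* main effects and two-way interactions only:
      A B A*B C A*C B*C D A*D B*D C*D */
   model y = A|B|C|D@2;
run;
```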

Nested Effects<br />

Nested effects are generated in the same manner as crossed effects. Hence, the design columns generated<br />

by the following two statements are the same (but the ordering of the columns is different):<br />

model Y=A B(A);<br />

model Y=A A*B;<br />

<strong>The</strong> nesting operator in PROC <strong>MIXED</strong> is more a notational convenience than an operation distinct<br />

from crossing. Nested effects are typically characterized by the property that the nested variables<br />

never appear as main effects. <strong>The</strong> order of the variables within nesting parentheses is made to<br />

correspond to the order of these variables in the CLASS statement. <strong>The</strong> order of the columns is<br />

such that variables outside the parentheses index faster than those inside the parentheses, and the<br />

rightmost nested variables index faster than the leftmost variables (Table 56.18).



Table 56.18 Example of Nested Effects<br />

Data | I | A | B(A)<br />

A B | | A1 A2 | B1A1 B2A1 B3A1 B1A2 B2A2 B3A2<br />

1 1 | 1 | 1 0 | 1 0 0 0 0 0<br />

1 2 | 1 | 1 0 | 0 1 0 0 0 0<br />

1 3 | 1 | 1 0 | 0 0 1 0 0 0<br />

2 1 | 1 | 0 1 | 0 0 0 1 0 0<br />

2 2 | 1 | 0 1 | 0 0 0 0 1 0<br />

2 3 | 1 | 0 1 | 0 0 0 0 0 1<br />

Note that nested effects are often distinguished from interaction effects by the implied randomization<br />

structure of the design. That is, they usually indicate random effects within a fixed-effects<br />

framework. <strong>The</strong> fact that random effects can be modeled directly in the RANDOM statement might<br />

make the specification of nested effects in the MODEL statement unnecessary.<br />
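For instance, a nested effect that represents a random source of variation can be placed in the RANDOM statement instead of the MODEL statement. The sketch below assumes a hypothetical data set ASSAY in which A is a fixed treatment and B is a batch nested within A.<br />

```sas
/* Hypothetical data set ASSAY: response y, fixed factor A, batch B within A */
proc mixed data=assay;
   class A B;
   model y = A;      /* fixed effect only; contributes columns to X */
   random B(A);      /* nested effect modeled as random; contributes columns to Z */
run;
```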

Continuous-Nesting-Class Effects<br />

When a continuous variable nests with a classification variable, the design columns are constructed<br />

by multiplying the continuous values into the design columns for the class effect (Table 56.19).<br />

Table 56.19 Example of Continuous-Nesting-Class Effects<br />

Data | I | A | X(A)<br />

X A | | A1 A2 | X(A1) X(A2)<br />

21 1 | 1 | 1 0 | 21 0<br />

24 1 | 1 | 1 0 | 24 0<br />

22 1 | 1 | 1 0 | 22 0<br />

28 2 | 1 | 0 1 | 0 28<br />

19 2 | 1 | 0 1 | 0 19<br />

23 2 | 1 | 0 1 | 0 23<br />

This model estimates a separate slope for X within each level of A.<br />

Continuous-by-Class Effects<br />

Continuous-by-class effects generate the same design columns as continuous-nesting-class effects.<br />

<strong>The</strong> two models are made different by the presence of the continuous variable as a regressor by<br />

itself, as well as a contributor to a compound effect. Table 56.20 shows an example.


Table 56.20 Example of Continuous-by-Class Effects<br />

Data | I | X | A | X*A<br />

X A | | X | A1 A2 | X*A1 X*A2<br />

21 1 | 1 | 21 | 1 0 | 21 0<br />

24 1 | 1 | 24 | 1 0 | 24 0<br />

22 1 | 1 | 22 | 1 0 | 22 0<br />

28 2 | 1 | 28 | 0 1 | 0 28<br />

19 2 | 1 | 19 | 0 1 | 0 19<br />

23 2 | 1 | 23 | 0 1 | 0 23<br />

You can use continuous-by-class effects to test for homogeneity of slopes.<br />
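A minimal sketch of such a test follows, assuming a hypothetical data set DRUG with response y, covariate X, and class variable A. The test of the X*A effect in the Type 3 tests of fixed effects then addresses whether the slopes differ across levels of A.<br />

```sas
/* Hypothetical data set DRUG: response y, covariate X, class variable A */
proc mixed data=drug;
   class A;
   /* X is the common slope; X*A captures departures from it by level of A */
   model y = A X X*A / solution;
run;
```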

General Effects<br />

An example that combines all the effects is X1*X2*A*B*C(D E). <strong>The</strong> continuous list comes first,<br />

followed by the crossed list, followed by the nested list in parentheses. You should be aware of<br />

the sequencing of parameters when you use the CONTRAST or ESTIMATE statement to compute<br />

some function of the parameter estimates.<br />

Effects might be renamed by PROC <strong>MIXED</strong> to correspond to ordering rules. For example, B*A(E<br />

D) might be renamed A*B(D E) to satisfy the following:<br />

Classification variables that occur outside parentheses (crossed effects) are sorted in the order<br />

in which they appear in the CLASS statement.<br />

Variables within parentheses (nested effects) are sorted in the order in which they appear in<br />

the CLASS statement.<br />

<strong>The</strong> sequencing of the parameters generated by an effect can be described by which variables have<br />

their levels indexed faster:<br />

Variables in the crossed list index faster than variables in the nested list.<br />

Within a crossed or nested list, variables to the right index faster than variables to the left.<br />

For example, suppose a model includes four effects—A, B, C, and D—each having two levels, 1 and<br />

2. Suppose the CLASS statement is as follows:<br />

class A B C D;<br />

<strong>The</strong>n the order of the parameters for the effect B*A(C D), which is renamed A*B(C D), is<br />

A1B1C1D1 → A1B2C1D1 → A2B1C1D1 → A2B2C1D1 →<br />

A1B1C1D2 → A1B2C1D2 → A2B1C1D2 → A2B2C1D2 →<br />

A1B1C2D1 → A1B2C2D1 → A2B1C2D1 → A2B2C2D1 →<br />

A1B1C2D2 → A1B2C2D2 → A2B1C2D2 → A2B2C2D2



Note that first the crossed effects B and A are sorted in the order in which they appear in the CLASS<br />

statement so that A precedes B in the parameter list. <strong>The</strong>n, for each combination of the nested effects<br />

in turn, combinations of A and B appear. <strong>The</strong> B effect moves fastest because it is rightmost in the<br />

cross list. <strong>The</strong>n A moves next fastest, and D moves next fastest. <strong>The</strong> C effect is the slowest since it<br />

is leftmost in the nested list.<br />

When numeric levels are used, levels are sorted by their character format, which might not correspond<br />

to their numeric sort sequence (for example, noninteger levels). <strong>The</strong>refore, it is advisable to<br />

include a desired format for numeric levels or to use the ORDER=INTERNAL option in the PROC<br />

<strong>MIXED</strong> statement to ensure that levels are sorted by their internal values.<br />
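The following sketch shows the second remedy. The data set DOSES and the numeric class variable dose are hypothetical.<br />

```sas
/* Hypothetical data set DOSES: response y, numeric class variable dose */
proc mixed data=doses order=internal;   /* sort levels by unformatted numeric value */
   class dose;
   model y = dose / solution;
run;
```

With ORDER=INTERNAL, the columns for dose should follow the numeric order of the levels rather than the order of their formatted values.<br />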

Implications of the Non-Full-Rank Parameterization<br />

For models with fixed effects involving classification variables, there are more design columns in<br />

X constructed than there are degrees of freedom for the effect. Thus, there are linear dependencies<br />

among the columns of X. In this event, not all of the parameters are estimable; there is an infinite<br />

number of solutions to the mixed model equations. PROC <strong>MIXED</strong> uses a generalized inverse (a<br />

g2-inverse, Pringle and Rayner, 1971) to obtain values for the estimates (Searle 1971). <strong>The</strong> solution<br />

values are not displayed unless you specify the SOLUTION option in the MODEL statement. <strong>The</strong><br />

solution has the characteristic that estimates are 0 whenever the design column for that parameter<br />

is a linear combination of previous columns. With this parameterization, hypothesis tests are<br />

constructed to test linear functions of the parameters that are estimable.<br />
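To illustrate, the sketch below requests the solution vector and two estimable functions. It assumes a hypothetical data set TRIAL in which the class variable A has exactly two levels; the labels in quotes are arbitrary.<br />

```sas
/* Hypothetical data set TRIAL: response y, class variable A with two levels */
proc mixed data=trial;
   class A;
   model y = A / solution;               /* the estimate for the last level of A is set to 0 */
   estimate 'A1 mean' intercept 1 A 1 0; /* estimable: intercept plus first A parameter */
   estimate 'A1 - A2' A 1 -1;            /* estimable: difference of the two levels */
run;
```

The individual A parameters are not estimable on their own, but both linear functions above are.<br />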

Some procedures (such as the CATMOD procedure) reparameterize models to full rank by using<br />

restrictions on the parameters. PROC GLM and PROC <strong>MIXED</strong> do not reparameterize, making the<br />

hypotheses that are commonly tested more understandable. See Goodnight (1978) for additional<br />

reasons for not reparameterizing.<br />

Missing Level Combinations<br />

PROC <strong>MIXED</strong> handles missing level combinations of classification variables similarly to the way<br />

PROC GLM does. Both procedures delete fixed-effects parameters corresponding to missing levels<br />

in order to preserve estimability. However, PROC <strong>MIXED</strong> does not delete missing level combinations<br />

for random-effects parameters because linear combinations of the random-effects parameters<br />

are always estimable. <strong>The</strong>se conventions can affect the way you specify your CONTRAST and<br />

ESTIMATE coefficients.<br />

Residuals and Influence Diagnostics<br />

Residual Diagnostics<br />

Consider a residual vector of the form $e = PY$, where P is a projection matrix, possibly an oblique<br />

projector. A typical element $e_i$ with variance $v_i$ and estimated variance $\hat{v}_i$ is said to be standardized as<br />

$$\frac{e_i}{\sqrt{\mathrm{Var}[e_i]}} = \frac{e_i}{\sqrt{v_i}}$$<br />

and studentized as<br />

$$\frac{e_i}{\sqrt{\hat{v}_i}}$$<br />

External studentization uses an estimate of $\mathrm{Var}[e_i]$ that does not involve the ith observation. Externally<br />

studentized residuals are often preferred over studentized residuals because they have well-known<br />

distributional properties in standard linear models for independent data.<br />

Residuals that are scaled by the square root of the estimated variance of the response, i.e., $e_i / \sqrt{\widehat{\mathrm{Var}}[Y_i]}$, are referred<br />

to as Pearson-type residuals.<br />

Marginal and Conditional Residuals<br />

<strong>The</strong> marginal and conditional means in the linear mixed model are $E[Y] = X\beta$ and $E[Y|\gamma] = X\beta + Z\gamma$,<br />

respectively. Accordingly, the vector $r_m$ of marginal residuals is defined as<br />

$$r_m = Y - X\hat\beta$$<br />

and the vector $r_c$ of conditional residuals is<br />

$$r_c = Y - X\hat\beta - Z\hat\gamma = r_m - Z\hat\gamma$$<br />

Following Gregoire, Schabenberger, and Barrett (1995), let $Q = X(X'\hat{V}^{-1}X)^{-}X'$ and<br />

$K = I - Z\hat{G}Z'\hat{V}^{-1}$. <strong>The</strong>n<br />

$$\widehat{\mathrm{Var}}[r_m] = \hat{V} - Q$$<br />

$$\widehat{\mathrm{Var}}[r_c] = K(\hat{V} - Q)K'$$<br />

For an individual observation the raw, studentized, and Pearson-type residuals computed by the<br />

<strong>MIXED</strong> procedure are given in Table 56.21.<br />

Table 56.21 Residual Types Computed by the <strong>MIXED</strong> <strong>Procedure</strong><br />

Type of Residual | Marginal | Conditional<br />

Raw | $r_{mi} = Y_i - x_i'\hat\beta$ | $r_{ci} = r_{mi} - z_i'\hat\gamma$<br />

Studentized | $r^{student}_{mi} = r_{mi} / \sqrt{\widehat{\mathrm{Var}}[r_{mi}]}$ | $r^{student}_{ci} = r_{ci} / \sqrt{\widehat{\mathrm{Var}}[r_{ci}]}$<br />

Pearson | $r^{pearson}_{mi} = r_{mi} / \sqrt{\widehat{\mathrm{Var}}[Y_i]}$ | $r^{pearson}_{ci} = r_{ci} / \sqrt{\widehat{\mathrm{Var}}[Y_i|\gamma]}$



When the OUTPM= option is specified in addition to the RESIDUAL option in the MODEL statement,<br />

$r^{student}_{mi}$ and $r^{pearson}_{mi}$ are added to the data set as variables StudentResid and PearsonResid,<br />

respectively. When the OUTP= option is specified, $r^{student}_{ci}$ and $r^{pearson}_{ci}$ are added to<br />

the data set. Raw residuals (variable Resid) are part of the OUTPM= and OUTP= data sets without the RESIDUAL<br />

option.<br />
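A minimal sketch that requests both sets of residuals follows. The data set GROWTH, the variables y, age, and person, and the output data set names MARG and COND are assumptions made for illustration.<br />

```sas
/* Hypothetical data set GROWTH: response y, covariate age, subject identifier person */
proc mixed data=growth;
   class person;
   /* RESIDUAL adds studentized and Pearson residuals to the output data sets;
      OUTPM= holds marginal quantities, OUTP= holds conditional quantities */
   model y = age / solution residual outpm=marg outp=cond;
   random intercept / subject=person;
run;
```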

Scaled Residuals<br />

For correlated data, a set of scaled quantities can be defined through the Cholesky decomposition<br />

of the variance-covariance matrix. Since fitted residuals in linear models are rank-deficient, it is<br />

customary to draw on the variance-covariance matrix of the data. If $\mathrm{Var}[Y] = V$ and $C'C = V$,<br />

then $C'^{-1}Y$ has uniform dispersion and its elements are uncorrelated.<br />

Scaled residuals in a mixed model are meaningful for quantities based on the marginal distribution<br />

of the data. Let $\hat{C}$ denote the Cholesky root of $\hat{V}$, so that $\hat{C}'\hat{C} = \hat{V}$, and define<br />

$$Y_c = \hat{C}'^{-1}Y$$<br />

$$r_{m(c)} = \hat{C}'^{-1}r_m$$<br />

By analogy with other scalings, the inverse Cholesky decomposition can also be applied to the<br />

residual vector, $\hat{C}'^{-1}r_m$, although V is not the variance-covariance matrix of $r_m$.<br />

Diagnosing whether the covariance structure of the model has been specified correctly can be<br />

difficult based on $Y_c$, since the inverse Cholesky transformation affects the expected value of $Y_c$.<br />

You can draw on $r_{m(c)}$ as a vector of (approximately) uncorrelated data with constant mean.<br />

When the OUTPM= option in the MODEL statement is specified in addition to the VCIRY option,<br />

$Y_c$ is added as variable ScaledDep and $r_{m(c)}$ is added as ScaledResid to the data set.<br />
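As a small worked illustration of the Cholesky scaling (the numbers are made up for this sketch and do not come from any data set), take a 2 × 2 variance matrix V and its upper-triangular Cholesky root C with $C'C = V$:<br />

$$V = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}, \qquad C = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \qquad C'^{-1} = \begin{pmatrix} 1/2 & 0 \\ -1/4 & 1/2 \end{pmatrix}$$

$$\mathrm{Var}[C'^{-1}Y] = C'^{-1} V C^{-1} = C'^{-1} C' C C^{-1} = I$$

so the two elements of $C'^{-1}Y$ have unit variance and zero covariance, which is the sense in which the scaled data have uniform dispersion.<br />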

Influence Diagnostics<br />

Basic Idea and Statistics<br />

<strong>The</strong> general idea of quantifying the influence of one or more observations relies on computing parameter<br />

estimates based on all data points, removing the cases in question from the data, refitting<br />

the model, and computing statistics based on the change between full-data and reduced-data estimation.<br />

Influence statistics can be coarsely grouped by the aspect of estimation that is their primary<br />

target:<br />

• overall measures compare changes in objective functions: (restricted) likelihood distance (Cook and Weisberg 1982, Ch. 5.2)<br />

• influence on parameter estimates: Cook’s D (Cook 1977, 1979), MDFFITS (Belsley, Kuh, and Welsch 1980, p. 32)<br />

• influence on precision of estimates: CovRatio and CovTrace<br />

• influence on fitted and predicted values: PRESS residual, PRESS statistic (Allen 1974), DFFITS (Belsley, Kuh, and Welsch 1980, p. 15)<br />

• outlier properties: internally and externally studentized residuals, leverage<br />

For linear models for uncorrelated data, it is not necessary to refit the model after removing a<br />

data point in order to measure the impact of an observation on the model. <strong>The</strong> change in fixed<br />

effect estimates, residuals, residual sums of squares, and the variance-covariance matrix of the fixed<br />

effects can be computed based on the fit to the full data alone. By contrast, in mixed models<br />

several important complications arise. Data points can affect not only the fixed effects but also the<br />

covariance parameter estimates on which the fixed-effects estimates depend. Furthermore, closed-form<br />

expressions for computing the change in important model quantities might not be available.<br />

This section provides background material for the various influence diagnostics available with the<br />

<strong>MIXED</strong> procedure. See the section “Mixed Models <strong>The</strong>ory” on page 3962 for relevant expressions<br />

and definitions. <strong>The</strong> parameter vector $\theta$ denotes all unknown parameters in the R and G matrices.<br />

<strong>The</strong> observations whose influence is being ascertained are represented by the set U and referred to<br />

simply as “the observations in U.” <strong>The</strong> estimate of a parameter vector, such as $\beta$, obtained from<br />

all observations except those in the set U is denoted $\hat\beta_{(U)}$. In case of a matrix A, the notation<br />

$A_{(U)}$ represents the matrix with the rows in U removed; these rows are collected in $A_U$. If A is<br />

symmetric, then notation $A_{(U)}$ implies removal of rows and columns. <strong>The</strong> vector $Y_U$ comprises<br />

the responses of the data points being removed, and $V_{(U)}$ is the variance-covariance matrix of<br />

the remaining observations. When U contains a single observation (k = 1), lowercase notation emphasizes that a single point is<br />

removed, such as $A_{(u)}$.<br />

Managing the Covariance Parameters<br />

An important component of influence diagnostics in the mixed model is the estimated variance-covariance<br />

matrix $V = ZGZ' + R$. To make the dependence on the vector of covariance parameters<br />

explicit, write it as $V(\theta)$. If one parameter, $\sigma^2$, is profiled or factored out of V, the remaining<br />

parameters are denoted as $\theta^*$. Notice that in a model where G is diagonal and $R = \sigma^2 I$, the<br />

parameter vector $\theta^*$ contains the ratios of each variance component and $\sigma^2$ (see Wolfinger, Tobias,<br />

and Sall 1994). When ITER=0, two scenarios are distinguished:<br />

1. If the residual variance is not profiled, either because the model does not contain a residual<br />

variance or because it is part of the Newton-Raphson iterations, then $\hat\theta_{(U)} \equiv \hat\theta$.<br />

2. If the residual variance is profiled, then $\hat\theta^*_{(U)} \equiv \hat\theta^*$ and $\hat\sigma^2_{(U)} \neq \hat\sigma^2$. Influence statistics<br />

such as Cook’s D and internally studentized residuals are based on $V(\hat\theta)$, whereas externally<br />

studentized residuals and the DFFITS statistic are based on $V(\hat\theta_U) = \hat\sigma^2_{(U)} V(\hat\theta^*)$. In a<br />

random components model with uncorrelated errors, for example, the computation of $V(\hat\theta_U)$<br />

involves scaling of $\hat{G}$ and $\hat{R}$ by the full-data estimate $\hat\sigma^2$ and multiplying the result with the<br />

reduced-data estimate $\hat\sigma^2_{(U)}$.<br />

Certain statistics, such as MDFFITS, CovRatio, and CovTrace, require an estimate of the variance<br />

of the fixed effects that is based on the reduced number of observations. For example, $V(\hat\theta_U)$ is<br />

evaluated at the reduced-data parameter estimates but computed for the entire data set. <strong>The</strong> matrix<br />

$V_{(U)}(\hat\theta_{(U)})$, on the other hand, has rows and columns corresponding to the points in U removed.<br />

<strong>The</strong> resulting matrix is evaluated at the delete-case estimates.



When influence analysis is iterative, the entire vector $\theta$ is updated, whether the residual variance<br />

is profiled or not. <strong>The</strong> matrices to be distinguished here are $V(\hat\theta)$, $V(\hat\theta_{(U)})$, and $V_{(U)}(\hat\theta_{(U)})$, with<br />

unambiguous notation.<br />

Predicted Values, PRESS Residual, and PRESS Statistic<br />

An unconditional predicted value is $\hat{y}_i = x_i'\hat\beta$, where the vector $x_i$ is the ith row of X. <strong>The</strong> (raw)<br />

residual is given as $\hat\epsilon_i = y_i - \hat{y}_i$, and the PRESS residual is<br />

$$\hat\epsilon_{i(U)} = y_i - x_i'\hat\beta_{(U)}$$<br />

<strong>The</strong> PRESS statistic is the sum of the squared PRESS residuals,<br />

$$\mathrm{PRESS} = \sum_{i \in U} \hat\epsilon^2_{i(U)}$$<br />

where the sum is over the observations in U.<br />

If EFFECT=, SIZE=, or KEEP= is not specified, PROC <strong>MIXED</strong> computes the PRESS residual<br />

for each observation selected through SELECT= (or all observations if SELECT= is not given). If<br />

EFFECT=, SIZE=, or KEEP= is specified, the procedure computes PRESS.<br />
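The two cases can be sketched as follows. The data set GROWTH and the variables y, age, and person are hypothetical names.<br />

```sas
/* Hypothetical data set GROWTH: response y, covariate age, subject identifier person */

/* Observations are removed one at a time: PRESS residuals are reported */
proc mixed data=growth;
   class person;
   model y = age / influence;
   random intercept / subject=person;
run;

/* All observations of a subject are removed together: the PRESS statistic is reported */
proc mixed data=growth;
   class person;
   model y = age / influence(effect=person);
   random intercept / subject=person;
run;
```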

Leverage<br />

For the general mixed model, leverage can be defined through the projection matrix that results from<br />

a transformation of the model with the inverse of the Cholesky decomposition of V, or through an<br />

oblique projector. <strong>The</strong> <strong>MIXED</strong> procedure follows the latter path in the computation of influence<br />

diagnostics. <strong>The</strong> leverage value reported for the ith observation is the ith diagonal entry of the<br />

matrix<br />

$$H = X(X'V(\hat\theta)^{-1}X)^{-}X'V(\hat\theta)^{-1}$$<br />

which is the weight of the observation in contributing to its own predicted value, $H = d\hat{Y}/dY$.<br />

While H is idempotent, it is generally not symmetric and thus not a projection matrix in the narrow<br />

sense.<br />

<strong>The</strong> properties of these leverages are generalizations of the properties in models with diagonal<br />

variance-covariance matrices. For example, $\hat{Y} = HY$, and in a model with intercept and $V = \sigma^2 I$,<br />

the leverage values<br />

$$h_{ii} = x_i'(X'X)^{-}x_i$$<br />

satisfy $h^l_{ii} = 1/n \le h_{ii} \le 1 = h^u_{ii}$ and $\sum_{i=1}^{n} h_{ii} = \mathrm{rank}(X)$. <strong>The</strong> lower bound for $h_{ii}$ is achieved<br />

in an intercept-only model, and the upper bound is achieved in a saturated model. <strong>The</strong> trace of H<br />

equals the rank of X.<br />

If $\nu_{ij}$ denotes the element in row i, column j of $V^{-1}$, then for a model containing only an intercept<br />

the diagonal elements of H are<br />

$$h_{ii} = \frac{\sum_{j=1}^{n} \nu_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} \nu_{ij}}$$<br />

Because $\sum_{j=1}^{n} \nu_{ij}$ is a sum of elements in the ith row of the inverse variance-covariance matrix, $h_{ii}$<br />

can be negative, even if the correlations among data points are nonnegative. In case of a saturated<br />

model with $X = I$, $h_{ii} = 1.0$.<br />
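As a worked illustration of the intercept-only formula, assume (purely for this sketch) three observations with an AR(1) correlation structure and parameter $\rho$. Writing $\nu_{ij}$ for the elements of $V^{-1}$, as an assumed notation, the inverse is tridiagonal:<br />

$$V = \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}, \qquad V^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix} 1 & -\rho & 0 \\ -\rho & 1+\rho^2 & -\rho \\ 0 & -\rho & 1 \end{pmatrix}$$

The row sums of $V^{-1}$ are proportional to $(1-\rho)$, $(1-\rho)^2$, and $(1-\rho)$, so that<br />

$$h_{11} = h_{33} = \frac{1}{3-\rho}, \qquad h_{22} = \frac{1-\rho}{3-\rho}, \qquad h_{11} + h_{22} + h_{33} = 1 = \mathrm{rank}(X)$$

With $\rho = 0.5$ the leverages are 0.4, 0.2, and 0.4: the middle observation, which is correlated with both neighbors, carries less weight than the two end points, whereas with $\rho = 0$ all three equal 1/3.<br />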

Internally and Externally Studentized Residuals<br />

See the section “Residual Diagnostics” on page 3980 for the distinction between standardization,<br />

studentization, and scaling of residuals. Internally studentized marginal and conditional residuals<br />

are computed with the RESIDUAL option of the MODEL statement. <strong>The</strong> INFLUENCE option<br />

computes internally and externally studentized marginal residuals.<br />

<strong>The</strong> computation of internally studentized residuals relies on the diagonal entries of $V(\hat\theta) - Q(\hat\theta)$,<br />

where $Q(\hat\theta) = X(X'V(\hat\theta)^{-1}X)^{-}X'$. Externally studentized residuals require iterative influence<br />

analysis or a profiled residual variance. In the former case the studentization is based on $V(\hat\theta_U)$; in<br />

the latter case it is based on $\hat\sigma^2_{(U)} V(\hat\theta^*)$.<br />

Cook’s D<br />

Cook’s D statistic is an invariant norm that measures the influence of observations in U on a vector<br />

of parameter estimates (Cook 1977). In case of the fixed-effects coefficients, let<br />

$$\delta_{(U)} = \hat\beta - \hat\beta_{(U)}$$<br />

<strong>The</strong>n the <strong>MIXED</strong> procedure computes<br />

$$D(\beta) = \delta_{(U)}' \, \widehat{\mathrm{Var}}[\hat\beta]^{-} \, \delta_{(U)} / \mathrm{rank}(X)$$<br />

where $\widehat{\mathrm{Var}}[\hat\beta]^{-}$ is the matrix that results from sweeping $(X'V(\hat\theta)^{-1}X)^{-}$.<br />

If V is known, Cook’s D can be calibrated according to a chi-square distribution with degrees of<br />

freedom equal to the rank of X (Christensen, Pearson, and Johnson 1992). For estimated V the<br />

calibration can be carried out according to an $F(\mathrm{rank}(X), n - \mathrm{rank}(X))$ distribution. To interpret D<br />

on a familiar scale, Cook (1979) and Cook and Weisberg (1982, p. 116) refer to the 50th percentile<br />

of the reference distribution. If D is equal to that percentile, then removing the points in U moves<br />

the fixed-effects coefficient vector from the center of the confidence region to the 50% confidence<br />

ellipsoid (Myers 1990, p. 262).<br />

In the case of iterative influence analysis, the <strong>MIXED</strong> procedure also computes a D-type statistic<br />

for the covariance parameters. If $\Upsilon$ is the asymptotic variance-covariance matrix of $\hat\theta$, then <strong>MIXED</strong><br />

computes<br />

$$D_\theta = (\hat\theta - \hat\theta_{(U)})' \, \hat\Upsilon^{-1} \, (\hat\theta - \hat\theta_{(U)})$$<br />

DFFITS and MDFFITS<br />

A DFFIT measures the change in predicted values due to removal of data points. If this change is<br />

standardized by the externally estimated standard error of the predicted value in the full data, the<br />

DFFITS statistic of Belsley, Kuh, and Welsch (1980, p. 15) results:<br />

$$\mathrm{DFFITS}_i = (\hat{y}_i - \hat{y}_{i(u)}) / \mathrm{ese}(\hat{y}_i)$$<br />

<strong>The</strong> <strong>MIXED</strong> procedure computes DFFITS when the EFFECT= or SIZE= modifier of the<br />

INFLUENCE option is not in effect. In general, an external estimate of the standard<br />

error is used. When ITER > 0, the estimate is<br />

$$\mathrm{ese}(\hat{y}_i) = \sqrt{x_i'(X'V(\hat\theta_{(u)})^{-1}X)^{-}x_i}$$<br />

When ITER=0 and $\sigma^2$ is profiled, then<br />

$$\mathrm{ese}(\hat{y}_i) = \hat\sigma_{(u)} \sqrt{x_i'(X'V(\hat\theta^*)^{-1}X)^{-}x_i}$$<br />

When the EFFECT=, SIZE=, or KEEP= modifier is specified, the <strong>MIXED</strong> procedure computes a<br />

multivariate version suitable for the deletion of multiple data points. <strong>The</strong> statistic, termed MDFFITS<br />

after the MDFFIT statistic of Belsley, Kuh, and Welsch (1980, p. 32), is closely related to Cook’s<br />

D. Consider the case $V = \sigma^2 V(\theta^*)$ so that<br />

$$\mathrm{Var}[\hat\beta] = \sigma^2 (X'V(\theta^*)^{-1}X)^{-}$$<br />

and let $\widetilde{\mathrm{Var}}[\hat\beta_{(U)}]$ be an estimate of $\mathrm{Var}[\hat\beta_{(U)}]$ that does not use the observations in U. <strong>The</strong><br />

MDFFITS statistic is then computed as<br />

$$\mathrm{MDFFITS}(\beta) = \delta_{(U)}' \, \widetilde{\mathrm{Var}}[\hat\beta_{(U)}]^{-} \, \delta_{(U)} / \mathrm{rank}(X)$$<br />

If ITER=0 and $\sigma^2$ is profiled, then $\widetilde{\mathrm{Var}}[\hat\beta_{(U)}]^{-}$ is obtained by sweeping<br />

$$\hat\sigma^2_{(U)} (X_{(U)}' V_{(U)}(\hat\theta^*)^{-1} X_{(U)})^{-}$$<br />

<strong>The</strong> underlying idea is that if $\theta^*$ were known, then<br />

$$(X_{(U)}' V_{(U)}(\theta^*)^{-1} X_{(U)})^{-}$$<br />

would be $\mathrm{Var}[\hat\beta]/\sigma^2$ in a generalized least squares regression with all but the data in U.<br />

In the case of iterative influence analysis, $\widetilde{\mathrm{Var}}[\hat\beta_{(U)}]$ is evaluated at $\hat\theta_{(U)}$. Furthermore, a MDFFITS-type<br />

statistic is then computed for the covariance parameters:<br />

$$\mathrm{MDFFITS}(\theta) = (\hat\theta - \hat\theta_{(U)})' \, \widehat{\mathrm{Var}}[\hat\theta_{(U)}]^{-1} \, (\hat\theta - \hat\theta_{(U)})$$<br />

Covariance Ratio and Trace<br />

<strong>The</strong>se statistics depend on the availability of an external estimate of V, or at least of $\sigma^2$. Whereas<br />

Cook’s D and MDFFITS measure the impact of data points on a vector of parameter estimates, the<br />

covariance-based statistics measure impact on their precision. Following Christensen, Pearson, and<br />

Johnson (1992), the <strong>MIXED</strong> procedure computes<br />

$$\mathrm{CovTrace}(\beta) = \left| \mathrm{trace}\left( \widehat{\mathrm{Var}}[\hat\beta]^{-} \, \widetilde{\mathrm{Var}}[\hat\beta_{(U)}] \right) - \mathrm{rank}(X) \right|$$<br />

$$\mathrm{CovRatio}(\beta) = \frac{\det_{ns}(\widetilde{\mathrm{Var}}[\hat\beta_{(U)}])}{\det_{ns}(\widehat{\mathrm{Var}}[\hat\beta])}$$<br />

where $\det_{ns}(M)$ denotes the determinant of the nonsingular part of matrix M.<br />

In the case of iterative influence analysis these statistics are also computed for the covariance parameter<br />

estimates. If q denotes the rank of $\mathrm{Var}[\hat\theta]$, then<br />

$$\mathrm{CovTrace}(\theta) = \left| \mathrm{trace}\left( \widehat{\mathrm{Var}}[\hat\theta]^{-} \, \widehat{\mathrm{Var}}[\hat\theta_{(U)}] \right) - q \right|$$<br />

$$\mathrm{CovRatio}(\theta) = \frac{\det_{ns}(\widehat{\mathrm{Var}}[\hat\theta_{(U)}])}{\det_{ns}(\widehat{\mathrm{Var}}[\hat\theta])}$$<br />

Likelihood Distances<br />

<strong>The</strong> log-likelihood function l and restricted log-likelihood function lR of the linear mixed model<br />

are given in the section “Estimating Covariance Parameters in the Mixed Model” on page 3968.<br />

Denote as $\psi$ the collection of all parameters, i.e., the fixed effects $\beta$ and the covariance parameters<br />

$\theta$. Twice the difference between the (restricted) log-likelihood evaluated at the full-data estimates<br />

$\hat\psi$ and at the reduced-data estimates $\hat\psi_{(U)}$ is known as the (restricted) likelihood distance:<br />

$$RLD_{(U)} = 2\{l_R(\hat\psi) - l_R(\hat\psi_{(U)})\}$$<br />

$$LD_{(U)} = 2\{l(\hat\psi) - l(\hat\psi_{(U)})\}$$<br />

Cook and Weisberg (1982, Ch. 5.2) refer to these differences as likelihood distances; Beckman,<br />

Nachtsheim, and Cook (1987) call the measures likelihood displacements. If the number of elements<br />

in $\psi$ that are subject to updating following point removal is q, then likelihood displacements can be<br />

compared against cutoffs from a chi-square distribution with q degrees of freedom. Notice that this<br />

reference distribution does not depend on the number of observations removed from the analysis,<br />

but rather on the number of model parameters that are updated. <strong>The</strong> likelihood displacement gives<br />

twice the amount by which the log likelihood of the full data changes if one were to use an estimate<br />

based on fewer data points. It is thus a global, summary measure of the influence of the observations<br />

in U jointly on all parameters.<br />

Unless METHOD=ML, the <strong>MIXED</strong> procedure computes the likelihood displacement based on the<br />

residual (=restricted) log likelihood, even if METHOD=MIVQUE0 or METHOD=TYPE1, TYPE2,<br />

or TYPE3.<br />

Noniterative Update Formulas<br />

Update formulas that do not require refitting of the model are available for the cases where<br />

$V = \sigma^2 I$, V is known, or $V^*$ is known. When ITER=0 and these update formulas can be invoked, the<br />

<strong>MIXED</strong> procedure uses the computational devices that are outlined in the following paragraphs. It<br />

is then assumed that the variance-covariance matrix of the fixed effects has the form $(X'V^{-1}X)^{-}$.<br />

When DDFM=KENWARDROGER, this is not the case; the estimated variance-covariance matrix<br />

is then inflated to better represent the uncertainty in the estimated covariance parameters. Influence<br />

statistics when DDFM=KENWARDROGER should iteratively update the covariance parameters<br />

(ITER > 0). <strong>The</strong> dependence of V on $\theta$ is suppressed in the sequel for brevity.<br />
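A sketch of iterative influence analysis combined with the Kenward-Roger adjustment follows. The data set GROWTH, the variables y, age, and person, and the choice of five iterations are assumptions made for illustration.<br />

```sas
/* Hypothetical data set GROWTH: response y, covariate age, subject identifier person */
proc mixed data=growth;
   class person;
   /* ITER=5 allows up to five additional iterations per deletion set, so the
      covariance parameters are re-estimated after each subject is removed */
   model y = age / ddfm=kenwardroger influence(effect=person iter=5);
   random intercept / subject=person;
run;
```

Iterative analysis refits the model for every deletion set, so run time grows with the number of subjects.<br />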

Updating the Fixed Effects Denote by $U$ the $(n \times k)$ matrix that is assembled from $k$ columns<br />
of the identity matrix. Each column of $U$ corresponds to the removal of one data point. <strong>The</strong> point<br />
being targeted by the $i$th column of $U$ corresponds to the row in which a 1 appears. Furthermore,<br />
define<br />

$$\Omega = (X'V^{-1}X)^{-}$$
$$Q = X \Omega X'$$
$$P = V^{-1}(V - Q)V^{-1}$$

<strong>The</strong> change in the fixed-effects estimates following removal of the observations in $U$ is<br />

$$\hat\beta - \hat\beta_{(U)} = \Omega X'V^{-1}U(U'PU)^{-1}U'V^{-1}(y - X\hat\beta)$$

Using results in Cook and Weisberg (1982, A2) you can further compute<br />

$$\tilde\Omega = (X_{(U)}'V_{(U)}^{-1}X_{(U)})^{-} = \Omega + \Omega X'V^{-1}U(U'PU)^{-1}U'V^{-1}X\Omega$$

If $X$ is $(n \times p)$ of rank $m < p$, then $\Omega$ is deficient in rank and the <strong>MIXED</strong> procedure computes<br />
needed quantities in $\tilde\Omega$ by sweeping (Goodnight 1979). If the rank of the $(k \times k)$ matrix $U'PU$ is<br />
less than $k$, the removal of the observations introduces a new singularity, whether $X$ is of full rank or<br />
not. <strong>The</strong> solution vectors $\hat\beta$ and $\hat\beta_{(U)}$ then do not have the same expected values and should not be<br />
compared. When the <strong>MIXED</strong> procedure encounters this situation, influence diagnostics that depend<br />
on the choice of generalized inverse are not computed. <strong>The</strong> procedure also monitors the singularity<br />
criteria when sweeping the rows of $(X'V^{-1}X)^{-}$ and of $(X_{(U)}'V_{(U)}^{-1}X_{(U)})^{-}$. If a new singularity is<br />
encountered or a former singularity disappears, no influence statistics are computed.<br />
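The update formula for the fixed effects can be checked numerically. The following SAS/IML statements are a minimal sketch, not part of the procedure itself: the toy data, the choice $V = I$, and all matrix names are assumptions made for illustration, and SAS/IML must be available. The sketch removes observation 3 and compares the closed-form change in the estimates with a brute-force refit.<br />

```sas
proc iml;
   /* Hypothetical toy data: intercept plus one covariate, n = 5 */
   X = {1 1, 1 2, 1 3, 1 4, 1 5};
   y = {2, 3, 5, 4, 7};
   n = nrow(X);

   V     = I(n);                    /* assumed known V; here V = I      */
   Vinv  = inv(V);
   Omega = ginv(X` * Vinv * X);     /* generalized inverse of X'V^{-1}X */
   beta  = Omega * X` * Vinv * y;   /* full-data fixed-effects estimate */
   Q     = X * Omega * X`;
   P     = Vinv * (V - Q) * Vinv;

   Id = I(n);
   U  = Id[, 3];                    /* one column: remove observation 3 */

   /* Closed-form change beta - beta_(U), no refitting */
   delta = Omega * X` * Vinv * U * inv(U` * P * U) * U` * Vinv * (y - X * beta);

   /* Brute-force refit on the remaining observations for comparison */
   keep  = {1 2 4 5};
   Xr    = X[keep, ];
   yr    = y[keep, ];
   betaU = ginv(Xr` * Xr) * Xr` * yr;
   refit = beta - betaU;

   print delta refit;               /* the two columns should agree */
quit;
```

With $V = I$ the formula reduces to the familiar single-case deletion result for ordinary least squares, which is why the refit in the sketch can use the unweighted normal equations.<br />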

Residual Variance When $\sigma^2$ is profiled out of the marginal variance-covariance matrix (so that $V = \sigma^2 V^*$), a closed-form<br />
estimate of $\sigma^2$ that is based on only the remaining observations can be computed provided<br />
$V^* = V(\hat\theta^*)$ is known. Hurtado (1993, Thm. 5.2) shows that<br />

$$(n - q - r)\,\hat\sigma^2_{(U)} = (n - q)\,\hat\sigma^2 - \hat\epsilon_U'(\hat\sigma^2 U'PU)^{-1}\hat\epsilon_U$$

and $\hat\epsilon_U = U'V^{*-1}(y - X\hat\beta)$. In the case of maximum likelihood estimation $q = 0$ and for REML<br />
estimation $q = \mathrm{rank}(X)$. <strong>The</strong> constant $r$ equals the rank of $(U'PU)$ for REML estimation and the<br />
number of effective observations that are removed if METHOD=ML.<br />

Likelihood Distances For noniterative methods the following computational devices are used to<br />
compute (restricted) likelihood distances provided that the residual variance $\sigma^2$ is profiled.<br />

<strong>The</strong> log likelihood function $l(\hat\psi)$ evaluated at the full-data and reduced-data estimates can be written<br />
as<br />

$$l(\hat\psi) = -\frac{n}{2}\log(\hat\sigma^2) - \frac{1}{2}\log|V^*| - \frac{1}{2}(y - X\hat\beta)'V^{*-1}(y - X\hat\beta)/\hat\sigma^2 - \frac{n}{2}\log(2\pi)$$

$$l(\hat\psi_{(U)}) = -\frac{n}{2}\log(\hat\sigma^2_{(U)}) - \frac{1}{2}\log|V^*| - \frac{1}{2}(y - X\hat\beta_{(U)})'V^{*-1}(y - X\hat\beta_{(U)})/\hat\sigma^2_{(U)} - \frac{n}{2}\log(2\pi)$$

Notice that $l(\hat\psi_{(U)})$ evaluates the log likelihood for $n$ data points at the reduced-data estimates. It is<br />
not the log likelihood obtained by fitting the model to the reduced data. <strong>The</strong> likelihood distance is<br />
then<br />

$$\mathrm{LD}_{(U)} = n\log\left\{\frac{\hat\sigma^2_{(U)}}{\hat\sigma^2}\right\} - n + (y - X\hat\beta_{(U)})'V^{*-1}(y - X\hat\beta_{(U)})/\hat\sigma^2_{(U)}$$

Expressions for $\mathrm{RLD}_{(U)}$ in noniterative influence analysis are derived along the same lines.
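The step from the two log likelihoods to the likelihood distance can be sketched as follows. This is a reader's derivation under stated assumptions, not text from the source: it takes the likelihood distance to be twice the drop in log likelihood, assumes ML estimation with $\sigma^2$ profiled so that the full-data quadratic form equals $n\hat\sigma^2$, and introduces the shorthand $S(b) = (y - Xb)'V^{*-1}(y - Xb)$ purely for illustration.<br />

```latex
\[
\begin{aligned}
\mathrm{LD}_{(U)}
  &= 2\left\{ l(\hat\psi) - l(\hat\psi_{(U)}) \right\} \\
  &= -n\log(\hat\sigma^2) - \frac{S(\hat\beta)}{\hat\sigma^2}
     + n\log(\hat\sigma^2_{(U)}) + \frac{S(\hat\beta_{(U)})}{\hat\sigma^2_{(U)}} \\
  &= n\log\left\{ \frac{\hat\sigma^2_{(U)}}{\hat\sigma^2} \right\} - n
     + \frac{S(\hat\beta_{(U)})}{\hat\sigma^2_{(U)}},
  \qquad \text{since } S(\hat\beta) = n\hat\sigma^2 .
\end{aligned}
\]
```

The terms in $\log|V^*|$ and $\log(2\pi)$ cancel because both log likelihoods are evaluated for the same $n$ data points and the same $V^*$.<br />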


Default Output<br />

<strong>The</strong> following sections describe the output PROC <strong>MIXED</strong> produces by default. This output is<br />

organized into various tables, and they are discussed in order of appearance.<br />

Model Information<br />

<strong>The</strong> “Model Information” table describes the model, some of the variables it involves, and the<br />

method used in fitting it. It also lists the method (profile, factor, parameter, or none) for handling<br />

the residual variance in the model. <strong>The</strong> profile method concentrates the residual variance out of the<br />

optimization problem, whereas the parameter method retains it as a parameter in the optimization.<br />

<strong>The</strong> factor method keeps the residual fixed, and none is displayed when a residual variance is not<br />

part of the model.<br />

<strong>The</strong> “Model Information” table also has a row labeled Fixed Effects SE Method. This row describes<br />

the method used to compute the approximate standard errors for the fixed-effects parameter<br />

estimates and related functions of them. <strong>The</strong> two possibilities for this row are Model-Based, which<br />

is the default method, and Empirical, which results from using the EMPIRICAL option in the PROC<br />

<strong>MIXED</strong> statement.<br />

For ODS purposes, the name of the “Model Information” table is “ModelInfo.”<br />

Class Level Information<br />

<strong>The</strong> “Class Level Information” table lists the levels of every variable specified in the CLASS statement.<br />

You should check this information to make sure the data are correct. You can adjust the order<br />

of the CLASS variable levels with the ORDER= option in the PROC <strong>MIXED</strong> statement. For ODS<br />

purposes, the name of the “Class Level Information” table is “ClassLevels.”<br />

Dimensions<br />

<strong>The</strong> “Dimensions” table lists the sizes of relevant matrices. This table can be useful in determining<br />

CPU time and memory requirements. For ODS purposes, the name of the “Dimensions” table is<br />

“Dimensions.”<br />

Number of Observations<br />

<strong>The</strong> “Number of Observations” table shows the number of observations read from the data set and<br />

the number of observations used in fitting the model.


Iteration History<br />

<strong>The</strong> “Iteration History” table describes the optimization of the residual log likelihood or log likelihood.<br />

<strong>The</strong> function to be minimized (the objective function) is $-2l$ for ML and $-2l_R$ for REML;<br />

the column name of the objective function in the “Iteration History” table is “-2 Log Like” for<br />

ML and “-2 Res Log Like” for REML. <strong>The</strong> minimization is performed by using a ridge-stabilized<br />

Newton-Raphson algorithm, and the rows of this table describe the iterations that this algorithm<br />

takes in order to minimize the objective function.<br />

<strong>The</strong> Evaluations column of the “Iteration History” table tells how many times the objective function<br />

is evaluated during each iteration.<br />

<strong>The</strong> Criterion column of the “Iteration History” table is, by default, a relative Hessian convergence<br />
quantity given by<br />

$$\frac{g_k' H_k^{-1} g_k}{|f_k|}$$

where $f_k$ is the value of the objective function at iteration $k$, $g_k$ is the gradient (first derivative) of<br />
$f_k$, and $H_k$ is the Hessian (second derivative) of $f_k$. If $H_k$ is singular, then PROC <strong>MIXED</strong> uses the<br />
following relative quantity:<br />

$$\frac{g_k' g_k}{|f_k|}$$

To prevent the division by $|f_k|$, use the ABSOLUTE option in the PROC <strong>MIXED</strong> statement. To<br />

use a relative function or gradient criterion, use the CONVF or CONVG option, respectively.<br />

<strong>The</strong> Hessian criterion is considered superior to function and gradient criteria because it measures<br />

orthogonality rather than lack of progress (Bates and Watts 1988). Provided the initial estimate is<br />

feasible and the maximum number of iterations is not exceeded, the Newton-Raphson algorithm<br />

is considered to have converged when the criterion is less than the tolerance specified with the<br />

CONVF, CONVG, or CONVH option in the PROC <strong>MIXED</strong> statement. <strong>The</strong> default tolerance is<br />

1E-8. If convergence is not achieved, PROC <strong>MIXED</strong> displays the estimates of the parameters at<br />

the last iteration.<br />

A convergence criterion that is missing indicates that a boundary constraint has been dropped; it is<br />

usually not a cause for concern.<br />

If you specify the ITDETAILS option in the PROC <strong>MIXED</strong> statement, then the covariance parameter<br />

estimates at each iteration are included as additional columns in the “Iteration History” table.<br />

For ODS purposes, the name of the “Iteration History” table is “IterHistory.”<br />
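The options discussed in this section can be combined as in the following sketch. It assumes that the heights data set from the “Getting Started” example (variables Family, Gender, and Height) already exists; the output data set name iterhist is arbitrary. CONVH=1E-8 simply restates the default tolerance.<br />

```sas
/* Assumes the heights data set from the Getting Started example exists. */
/* ITDETAILS adds the covariance parameter estimates at each iteration.  */
ods output IterHistory=iterhist;

proc mixed data=heights method=reml itdetails convh=1e-8;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* One row per iteration: Iteration, Evaluations, objective, Criterion */
proc print data=iterhist noobs;
run;
```

Saving the table as a data set makes it easy to inspect how quickly the Criterion column approaches the tolerance.<br />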

Convergence Status<br />

<strong>The</strong> “Convergence Status” table reports the status of the iterative estimation process at the<br />

end of the Newton-Raphson optimization. It appears as a message in the listing, and this message<br />

is repeated in the log. <strong>The</strong> ODS object “ConvergenceStatus” also contains several nonprinting<br />

columns that can be helpful in checking the success of the iterative process, in particular during


batch processing or when analyzing BY groups. <strong>The</strong> Status variable takes on the value 0 for a<br />

successful convergence (even if the Hessian matrix might not be positive definite). <strong>The</strong> values 1<br />

and 2 of the Status variable indicate lack of convergence and infeasible initial parameter values,<br />

respectively. <strong>The</strong> variables pdG and pdH can be used to check whether the G matrix and the Hessian<br />
matrix (H), respectively, are positive definite.<br />

For models that are not fit iteratively, such as models without random effects or when the NOITER<br />

option is in effect, the “Convergence Status” table is not produced.<br />
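A programmatic check of the nonprinting columns might look like the following sketch. It assumes the heights data set from the “Getting Started” example, an arbitrary output data set name (cs), and that pdG and pdH are coded 1 when the corresponding matrix is positive definite and 0 otherwise.<br />

```sas
/* Assumes the heights data set from the Getting Started example exists */
ods output ConvergenceStatus=cs;

proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* Write a short message to the log based on the Status, pdG, and pdH columns */
data _null_;
   set cs;
   if Status ne 0 then
      put "WARNING: PROC MIXED did not converge cleanly. Status=" Status;
   else if pdG = 0 or pdH = 0 then
      put "NOTE: Converged, but check positive definiteness: pdG=" pdG "pdH=" pdH;
   else
      put "NOTE: Converged; G and Hessian are flagged as positive definite.";
run;
```

With BY-group processing the same DATA step runs once per BY group, because the ConvergenceStatus data set then contains one row for each group.<br />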

Covariance Parameter Estimates<br />

<strong>The</strong> “Covariance Parameter Estimates” table contains the estimates of the parameters in G and<br />

R (see the section “Estimating Covariance Parameters in the Mixed Model” on page 3968). <strong>The</strong>ir<br />

values are labeled in the table along with Subject and Group information if applicable. <strong>The</strong> estimates<br />

are displayed in the Estimate column and are the results of one of the following estimation methods:<br />

REML, ML, MIVQUE0, SSCP, Type1, Type2, or Type3.<br />

If you specify the RATIO option in the PROC <strong>MIXED</strong> statement, the Ratio column is added to the<br />

table listing the ratio of each parameter estimate to that of the residual variance.<br />

Specifying the COVTEST option in the PROC <strong>MIXED</strong> statement produces the “Std Error,” “Z<br />

Value,” and “Pr Z” columns. <strong>The</strong> “Std Error” column contains the approximate standard errors of<br />

the covariance parameter estimates. <strong>The</strong>se are the square roots of the diagonal elements of the observed<br />

inverse Fisher information matrix, which equals $2H^{-1}$, where $H$ is the Hessian matrix. <strong>The</strong><br />

H matrix consists of the second derivatives of the objective function with respect to the covariance<br />

parameters; see Wolfinger, Tobias, and Sall (1994) for formulas. When you use the SCORING=<br />

option and PROC <strong>MIXED</strong> converges without stopping the scoring algorithm, PROC <strong>MIXED</strong> uses<br />

the expected Hessian matrix to compute the covariance matrix instead of the observed Hessian. <strong>The</strong><br />

observed or expected inverse Fisher information matrix can be viewed as an asymptotic covariance<br />

matrix of the estimates.<br />

<strong>The</strong> “Z Value” column is the estimate divided by its approximate standard error, and the “Pr Z”<br />

column is the one- or two-tailed area of the standard Gaussian density outside of the Z-value. <strong>The</strong><br />

<strong>MIXED</strong> procedure computes one-sided p-values for the residual variance and for covariance parameters<br />

with a lower bound of 0. <strong>The</strong> procedure computes two-sided p-values otherwise. <strong>The</strong>se<br />

statistics constitute Wald tests of the covariance parameters, and they are valid only asymptotically.<br />

CAUTION: Wald tests can be unreliable in small samples.<br />

For ODS purposes, the name of the “Covariance Parameter Estimates” table is “CovParms.”<br />
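The arithmetic behind the “Z Value” and “Pr Z” columns can be reproduced in a DATA step. The estimate and standard error below are hypothetical values chosen for illustration, and the variable names are not those of the CovParms table.<br />

```sas
data wald;
   /* Hypothetical covariance parameter estimate and standard error */
   Estimate = 2.40;
   StdErr   = 1.50;

   ZValue    = Estimate / StdErr;                 /* Wald Z statistic         */
   ProbZ_one = 1 - probnorm(ZValue);              /* one-sided upper-tail area */
   ProbZ_two = 2 * (1 - probnorm(abs(ZValue)));   /* two-sided area            */
run;

proc print data=wald noobs;
run;
```

Here the Z value is 1.6, so the one-sided p-value is about 0.055 and the two-sided p-value about 0.110. The one-sided value is the one that applies to a parameter with a lower bound of 0.<br />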

Fit Statistics<br />

<strong>The</strong> “Fit Statistics” table provides some statistics about the estimated mixed model. Expressions<br />

for $-2$ times the log likelihood are provided in the section “Estimating Covariance Parameters<br />

in the Mixed Model” on page 3968. If the log likelihood is an extremely large number, then PROC<br />

<strong>MIXED</strong> has deemed the estimated V matrix to be singular. In this case, all subsequent results should<br />

be viewed with caution.


In addition, the “Fit Statistics” table lists three information criteria: AIC, AICC, and BIC, all in<br />

smaller-is-better form. Expressions for these criteria are described under the IC option.<br />

For ODS purposes, the name of the “Fit Statistics” table is “FitStatistics.”<br />

Null Model Likelihood Ratio Test<br />

If one covariance model is a submodel of another, you can carry out a likelihood ratio test for the<br />
significance of the more general model by computing $-2$ times the difference between their log<br />
likelihoods. <strong>The</strong>n compare this statistic to the $\chi^2$ distribution with degrees of freedom equal to the<br />
difference in the number of parameters for the two models.<br />

This test is reported in the “Null Model Likelihood Ratio Test” table to determine whether it is<br />
necessary to model the covariance structure of the data at all. <strong>The</strong> “Chi-Square” value is $-2$ times<br />
the log likelihood from the null model minus $-2$ times the log likelihood from the fitted model,<br />
where the null model is the one with only the fixed effects listed in the MODEL statement and<br />
$R = \sigma^2 I$. This statistic has an asymptotic $\chi^2$ distribution with $q - 1$ degrees of freedom, where $q$ is<br />

the effective number of covariance parameters (those not estimated to be on a boundary constraint).<br />

<strong>The</strong> “Pr > ChiSq” column contains the upper-tail area from this distribution. This p-value can be<br />

used to assess the significance of the model fit.<br />

This test is not produced for cases where the null hypothesis lies on the boundary of the parameter<br />

space, which is typically for variance component models. This is because the standard asymptotic<br />

theory does not apply in this case (Self and Liang 1987, Case 5).<br />

If you specify a PARMS statement, PROC <strong>MIXED</strong> constructs a likelihood ratio test between the<br />

best model from the grid search and the final fitted model and reports the results in the “Parameter<br />

Search” table.<br />

For ODS purposes, the name of the “Null Model Likelihood Ratio Test” table is “LRT.”<br />
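The computation behind the “Null Model Likelihood Ratio Test” table can be sketched in a DATA step. All three input values below are hypothetical; in practice the two objective function values would come from fitting the null model and the model of interest.<br />

```sas
data lrt;
   /* Hypothetical inputs for illustration only */
   m2ll_null  = 87.2;   /* -2 (Res) Log Like of the null model       */
   m2ll_model = 80.1;   /* -2 (Res) Log Like of the fitted model     */
   q          = 3;      /* effective number of covariance parameters */

   ChiSquare = m2ll_null - m2ll_model;        /* likelihood ratio statistic */
   DF        = q - 1;
   ProbChiSq = 1 - probchi(ChiSquare, DF);    /* upper-tail area            */
run;

proc print data=lrt noobs;
run;
```

With these values the statistic is 7.1 on 2 degrees of freedom, and the upper-tail area is about 0.029.<br />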

Type 3 Tests of Fixed Effects<br />

<strong>The</strong> “Type 3 Tests of Fixed Effects” table contains hypothesis tests for the significance of each of<br />

the fixed effects—that is, those effects you specify in the MODEL statement. By default, PROC<br />

<strong>MIXED</strong> computes these tests by first constructing a Type 3 L matrix (see Chapter 15, “<strong>The</strong> Four<br />

Types of Estimable Functions”) for each effect. This L matrix is then used to compute the following<br />

F statistic:<br />

$$F = \frac{\hat\beta' L' [L(X'\hat V^{-1}X)^{-}L']^{-} L\hat\beta}{r}$$

where $r = \mathrm{rank}(L(X'\hat V^{-1}X)^{-}L')$. A p-value for the test is computed as the tail area beyond this<br />

statistic from an F distribution with NDF and DDF degrees of freedom. <strong>The</strong> numerator degrees<br />

of freedom (NDF) are the row rank of L, and the denominator degrees of freedom are computed<br />

by using one of the methods described under the DDFM= option. Small values of the p-value<br />

(typically less than 0.05 or 0.01) indicate a significant effect.<br />

You can use the HTYPE= option in the MODEL statement to obtain tables of Type 1 (sequential)<br />

tests and Type 2 (adjusted) tests in addition to or instead of the table of Type 3 (partial) tests.


You can use the CHISQ option in the MODEL statement to obtain Wald $\chi^2$ tests of the fixed<br />
effects. <strong>The</strong>se are carried out by using the numerator of the F statistic and comparing it with the $\chi^2$<br />
distribution with NDF degrees of freedom. It is more liberal than the F test because it effectively<br />

assumes infinite denominator degrees of freedom.<br />

For ODS purposes, the names of the “Type 1 Tests of Fixed Effects” through the “Type 3 Tests of<br />

Fixed Effects” tables are “Tests1” through “Tests3,” respectively.<br />
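The difference between the F test and the Wald $\chi^2$ test can be illustrated with a small DATA step. The F value and degrees of freedom are hypothetical, and the sketch assumes that the numerator of the F statistic equals NumDF times the F value.<br />

```sas
data ftest;
   /* Hypothetical test results for one fixed effect */
   FValue = 3.20;
   NumDF  = 2;
   DenDF  = 12;

   ProbF     = 1 - probf(FValue, NumDF, DenDF);   /* F test p-value        */
   ChiSquare = NumDF * FValue;                    /* numerator of F        */
   ProbChiSq = 1 - probchi(ChiSquare, NumDF);     /* Wald chi-square value */
run;

proc print data=ftest noobs;
run;
```

With these inputs the F test gives a p-value of about 0.077, whereas the $\chi^2$ test gives about 0.041, which shows how the $\chi^2$ test can cross a 0.05 threshold that the F test does not.<br />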

ODS Table Names<br />

Each table created by PROC <strong>MIXED</strong> has a name associated with it, and you must use this name to<br />

reference the table when using ODS statements. <strong>The</strong>se names are listed in Table 56.22.<br />

Table 56.22 ODS Tables Produced by PROC <strong>MIXED</strong><br />

Table Name | Description | Required Statement / Option<br />
AccRates | acceptance rates for posterior sampling | PRIOR<br />
AsyCorr | asymptotic correlation matrix of covariance parameters | PROC <strong>MIXED</strong> ASYCORR<br />
AsyCov | asymptotic covariance matrix of covariance parameters | PROC <strong>MIXED</strong> ASYCOV<br />
Base | base densities used for posterior sampling | PRIOR<br />
Bound | computed bound for posterior rejection sampling | PRIOR<br />
CholG | Cholesky root of the estimated G matrix | RANDOM / GC<br />
CholR | Cholesky root of blocks of the estimated R matrix | REPEATED / RC<br />
CholV | Cholesky root of blocks of the estimated V matrix | RANDOM / VC<br />
ClassLevels | level information from the CLASS statement | default output<br />
Coef | L matrix coefficients | E option in MODEL, CONTRAST, ESTIMATE, or LSMEANS<br />
Contrasts | results from the CONTRAST statements | CONTRAST<br />
ConvergenceStatus | convergence status | default<br />
CorrB | approximate correlation matrix of fixed-effects parameter estimates | MODEL / CORRB<br />
CovB | approximate covariance matrix of fixed-effects parameter estimates | MODEL / COVB<br />
CovParms | estimated covariance parameters | default output<br />
Diffs | differences of LS-means | LSMEANS / DIFF (or PDIFF)<br />
Dimensions | dimensions of the model | default output<br />
Estimates | results from ESTIMATE statements | ESTIMATE<br />
FitStatistics | fit statistics | default<br />
G | estimated G matrix | RANDOM / G<br />
GCorr | correlation matrix from the estimated G matrix | RANDOM / GCORR<br />
HLM1 | Type 1 Hotelling-Lawley-McKeon tests of fixed effects | MODEL / HTYPE=1 and REPEATED / HLM TYPE=UN<br />
HLM2 | Type 2 Hotelling-Lawley-McKeon tests of fixed effects | MODEL / HTYPE=2 and REPEATED / HLM TYPE=UN<br />
HLM3 | Type 3 Hotelling-Lawley-McKeon tests of fixed effects | REPEATED / HLM TYPE=UN<br />
HLPS1 | Type 1 Hotelling-Lawley-Pillai-Samson tests of fixed effects | MODEL / HTYPE=1 and REPEATED / HLPS TYPE=UN<br />
HLPS2 | Type 2 Hotelling-Lawley-Pillai-Samson tests of fixed effects | MODEL / HTYPE=2 and REPEATED / HLPS TYPE=UN<br />
HLPS3 | Type 3 Hotelling-Lawley-Pillai-Samson tests of fixed effects | REPEATED / HLPS TYPE=UN<br />
Influence | influence diagnostics | MODEL / INFLUENCE<br />
InfoCrit | information criteria | PROC <strong>MIXED</strong> IC<br />
InvCholG | inverse Cholesky root of the estimated G matrix | RANDOM / GCI<br />
InvCholR | inverse Cholesky root of blocks of the estimated R matrix | REPEATED / RCI<br />
InvCholV | inverse Cholesky root of blocks of the estimated V matrix | RANDOM / VCI<br />
InvCovB | inverse of approximate covariance matrix of fixed-effects parameter estimates | MODEL / COVBI<br />
InvG | inverse of the estimated G matrix | RANDOM / GI<br />
InvR | inverse of blocks of the estimated R matrix | REPEATED / RI<br />
InvV | inverse of blocks of the estimated V matrix | RANDOM / VI<br />
IterHistory | iteration history | default output<br />
LComponents | single-degree-of-freedom estimates corresponding to rows of the L matrix for fixed effects | MODEL / LCOMPONENTS<br />
LRT | likelihood ratio test | default output<br />
LSMeans | LS-means | LSMEANS<br />
MMEq | mixed model equations | PROC <strong>MIXED</strong> MMEQ<br />
MMEqSol | mixed model equations solution | PROC <strong>MIXED</strong> MMEQSOL<br />
ModelInfo | model information | default output<br />
NObs | number of observations read and used | default output<br />
ParmSearch | parameter search values | PARMS<br />
Posterior | posterior sampling information | PRIOR<br />
R | blocks of the estimated R matrix | REPEATED / R<br />
RCorr | correlation matrix from blocks of the estimated R matrix | REPEATED / RCORR<br />
Search | posterior density search table | PRIOR / PSEARCH<br />
Slices | tests of LS-means slices | LSMEANS / SLICE=<br />
SolutionF | fixed-effects solution vector | MODEL / S<br />
SolutionR | random-effects solution vector | RANDOM / S<br />
Tests1 | Type 1 tests of fixed effects | MODEL / HTYPE=1<br />
Tests2 | Type 2 tests of fixed effects | MODEL / HTYPE=2<br />
Tests3 | Type 3 tests of fixed effects | default output<br />
Type1 | Type 1 analysis of variance | PROC <strong>MIXED</strong> METHOD=TYPE1<br />
Type2 | Type 2 analysis of variance | PROC <strong>MIXED</strong> METHOD=TYPE2<br />
Type3 | Type 3 analysis of variance | PROC <strong>MIXED</strong> METHOD=TYPE3<br />
Trans | transformation of covariance parameters | PRIOR / PTRANS<br />
V | blocks of the estimated V matrix | RANDOM / V<br />
VCorr | correlation matrix from blocks of the estimated V matrix | RANDOM / VCORR<br />

In Table 56.22, “Coef” refers to multiple tables produced by the E, E1, E2, or E3 option in the<br />

MODEL statement and the E option in the CONTRAST, ESTIMATE, and LSMEANS statements.<br />

You can create one large data set of these tables with a statement similar to the following:<br />

ods output Coef=c;<br />

To create separate data sets, use the following statement:<br />

ods output Coef(match_all)=c;<br />

Here the resulting data sets are named C, C1, C2, etc. <strong>The</strong> same principles apply to data sets created<br />

from the “R,” “CholR,” “InvCholR,” “RCorr,” “InvR,” “V,” “CholV,” “InvCholV,” “VCorr,” and<br />

“InvV” tables.<br />
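Several tables can be captured in a single ODS OUTPUT statement. The following sketch assumes the heights data set from the “Getting Started” example; the output data set names cp, fit, and t3 are arbitrary.<br />

```sas
/* Assumes the heights data set from the Getting Started example exists */
ods output CovParms=cp FitStatistics=fit Tests3=t3;

proc mixed data=heights covtest;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* Each ODS table is now an ordinary SAS data set */
proc print data=cp  noobs; run;
proc print data=fit noobs; run;
proc print data=t3  noobs; run;
```

Because COVTEST is specified, the cp data set is expected to carry the standard error and Wald test columns in addition to the estimates.<br />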

In Table 56.22, the following changes have occurred from <strong>SAS</strong> 6. <strong>The</strong> “Predicted,” “PredMeans,”<br />

and “Sample” tables from <strong>SAS</strong> 6 no longer exist and have been replaced by output data sets; see<br />

descriptions of the MODEL statement options OUTPRED= and OUTPREDM= and the PRIOR<br />

statement option OUT= for more details. <strong>The</strong> “ML” and “REML” tables from <strong>SAS</strong> 6 have been<br />

replaced by the “IterHistory” table. <strong>The</strong> “Tests,” “HLM,” and “HLPS” tables from <strong>SAS</strong> 6 have<br />

been renamed “Tests3,” “HLM3,” and “HLPS3,” respectively.<br />

Table 56.23 lists the variable names associated with the data sets created when you use the ODS<br />

OUTPUT option in conjunction with the preceding tables. In Table 56.23, n is used to denote a


generic number that depends on the particular data set and model you select, and it can assume a<br />

different value each time it is used (even within the same table). <strong>The</strong> phrase model specific appears<br />

in rows of the affected tables to indicate that columns in these tables depend on the variables you<br />

specify in the model.<br />

CAUTION: <strong>The</strong>re is a danger of name collisions with the variables in the model specific tables in<br />

Table 56.23 and variables in your input data set. You should avoid using input variables with the<br />

same names as the variables in these tables.<br />

Table 56.23 Variable Names for the ODS Tables Produced in PROC <strong>MIXED</strong><br />

Table Name | Variables<br />
AsyCorr | Row, CovParm, CovP1–CovPn<br />
AsyCov | Row, CovParm, CovP1–CovPn<br />
BaseDen | Type, Parm1–Parmn<br />
Bound | Technique, Converge, Iterations, Evaluations, LogBound, CovP1–CovPn, TCovP1–TCovPn<br />
CholG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
CholR | Index, Row, Col1–Coln<br />
CholV | Index, Row, Col1–Coln<br />
ClassLevels | Class, Levels, Values<br />
Coef | model specific, LMatrix, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row1–Rown<br />
Contrasts | Label, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
CorrB | model specific, Effect, Row, Col1–Coln<br />
CovB | model specific, Effect, Row, Col1–Coln<br />
CovParms | CovParm, Subject, Group, Estimate, StandardError, ZValue, ProbZ, Alpha, Lower, Upper<br />
Diffs | model specific, Effect, Margins, ByLevel, AT variables, Diff, StandardError, DF, tValue, Tails, Probt, Adjustment, Adjp, Alpha, Lower, Upper, AdjLow, AdjUpp<br />
Dimensions | Descr, Value<br />
Estimates | Label, Estimate, StandardError, DF, tValue, Tails, Probt, Alpha, Lower, Upper<br />
FitStatistics | Descr, Value<br />
G | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
GCorr | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
HLM1 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLM2 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLM3 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS1 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS2 | Effect, NumDF, DenDF, FValue, ProbF<br />
HLPS3 | Effect, NumDF, DenDF, FValue, ProbF<br />
Influence | dependent on option modifiers, Effect, Tuple, Obs1–Obsk, Level, Iter, Index, Predicted, Residual, Leverage, PressRes, PRESS, Student, RMSE, RStudent, CookD, DFFITS, MDFFITS, CovRatio, CovTrace, CookDCP, MDFFITSCP, CovRatioCP, CovTraceCP, LD, RLD, Parm1–Parmp, CovP1–CovPq, Notes<br />
InfoCrit | Neg2LogLike, Parms, AIC, AICC, HQIC, BIC, CAIC<br />
InvCholG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
InvCholR | Index, Row, Col1–Coln<br />
InvCholV | Index, Row, Col1–Coln<br />
InvCovB | model specific, Effect, Row, Col1–Coln<br />
InvG | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
InvR | Index, Row, Col1–Coln<br />
InvV | Index, Row, Col1–Coln<br />
IterHistory | CovP1–CovPn, Iteration, Evaluations, M2ResLogLike, M2LogLike, Criterion<br />
LComponents | Effect, TestType, LIndex, Estimate, StdErr, DF, tValue, Probt<br />
LRT | DF, ChiSquare, ProbChiSq<br />
LSMeans | model specific, Effect, Margins, ByLevel, AT variables, Estimate, StandardError, DF, tValue, Probt, Alpha, Lower, Upper, Cov1–Covn, Corr1–Corrn<br />
MMEq | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
MMEqSol | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Row, Col1–Coln<br />
ModelInfo | Descr, Value<br />
Nobs | Label, N, NObsRead, NObsUsed, SumFreqsRead, SumFreqsUsed<br />
ParmSearch | CovP1–CovPn, Var, ResLogLike, M2ResLogLike2, LogLike, M2LogLike, LogDetH<br />
Posterior | Descr, Value<br />
R | Index, Row, Col1–Coln<br />
RCorr | Index, Row, Col1–Coln<br />
Search | Parm, TCovP1–TCovPn, Posterior<br />
Slices | model specific, Effect, Margins, ByLevel, AT variables, NumDF, DenDF, FValue, ProbF<br />
SolutionF | model specific, Effect, Estimate, StandardError, DF, tValue, Probt, Alpha, Lower, Upper<br />
SolutionR | model specific, Effect, Subject, Sub1–Subn, Group, Group1–Groupn, Estimate, StdErrPred, DF, tValue, Probt, Alpha, Lower, Upper<br />
Tests1 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Tests2 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Tests3 | Effect, NumDF, DenDF, ChiSquare, FValue, ProbChiSq, ProbF<br />
Type1 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Type2 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Type3 | Source, DF, SS, MS, EMS, ErrorTerm, ErrorDF, FValue, ProbF<br />
Trans | Prior, TCovP, CovP1–CovPn<br />
V | Index, Row, Col1–Coln<br />
VCorr | Index, Row, Col1–Coln<br />

Some of the variables listed in Table 56.23 are created only when you specify certain options in the<br />

relevant PROC <strong>MIXED</strong> statements.<br />

ODS Graphics<br />

This section describes the use of ODS for creating diagnostic plots with the <strong>MIXED</strong> procedure.<br />

To request these graphs you must specify the ODS GRAPHICS statement and the relevant options<br />

of the PROC <strong>MIXED</strong> or MODEL statement (Table 56.24). For more information about the ODS<br />

GRAPHICS statement, see Chapter 21, “Statistical Graphics Using ODS.” ODS names of the various<br />

graphics are given in the section “ODS Graph Names” on page 4002.<br />

Residual Plots<br />

<strong>The</strong> <strong>MIXED</strong> procedure can generate panels of residual diagnostics. Each panel consists of a plot<br />

of residuals versus predicted values, a histogram with normal density overlaid, a Q-Q plot, and<br />

summary residual and fit statistics (Figure 56.15). <strong>The</strong> plots are produced even if the OUTP= and<br />

OUTPM= options in the MODEL statement are not specified. Residual panels can be generated for<br />

marginal and conditional raw, studentized, and Pearson residuals as well as for scaled residuals (see<br />

the section “Residual Diagnostics” on page 3980).<br />

Recall the example in the section “Getting Started: <strong>MIXED</strong> <strong>Procedure</strong>” on page 3890. <strong>The</strong> following<br />

statements generate several 2 × 2 panels of residual graphs:<br />

ods graphics on;<br />

proc mixed data=heights;<br />

class Family Gender;<br />

model Height = Gender / residual;<br />

random Family Family*Gender;<br />

run;<br />

ods graphics off;


<strong>The</strong> graphical displays are requested by specifying the ODS GRAPHICS statement. <strong>The</strong> panel<br />

of the studentized marginal residuals is shown in Figure 56.15, and the panel of the studentized<br />

conditional residuals is shown in Figure 56.16.<br />

Figure 56.15 Panel of the Studentized (Marginal) Residuals<br />

Since the fixed-effects part of the model comprises only an intercept and the gender effect, the<br />

marginal mean takes on only two values, one for each gender. <strong>The</strong> “Residual Statistics” inset in the<br />

lower-right corner provides descriptive statistics for the set of residuals that is displayed. Note that<br />

residuals in a mixed model do not necessarily sum to zero, even if the model contains an intercept.


Figure 56.16 Panel of the Conditional Studentized Residuals<br />

Influence Plots<br />

<strong>The</strong> graphical features of the <strong>MIXED</strong> procedure enable you to generate plots of influence diagnostics<br />

and of deletion estimates. <strong>The</strong> type and number of plots produced depend on your modifiers of<br />

the INFLUENCE option in the MODEL statement and on the PLOTS= option in the PROC <strong>MIXED</strong><br />

statement. Plots related to covariance parameters are produced only when diagnostics are computed<br />

by iterative methods (ITER=). <strong>The</strong> estimates of the fixed effects—and covariance parameters when<br />

updates are iterative—are plotted when you specify the ESTIMATES modifier or when you request<br />

PLOTS=INFLUENCEESTPLOT.<br />

Two basic types of influence panels are shown in Figure 56.17 and Figure 56.18. <strong>The</strong> diagnostics<br />

panel shows Cook’s D and CovRatio statistics for the fixed effects and the covariance parameters.<br />

For the <strong>SAS</strong> statements that produce these influence panels, see Example 56.8. In this example, the<br />

impact of subjects (Person) on the analysis is assessed. <strong>The</strong> Cook’s D statistic measures a subject’s<br />

impact on the estimates, and the CovRatio statistic measures a subject’s impact on the precision of<br />

the estimates. Separate statistics are computed for the fixed effects and the covariance parameters.<br />

<strong>The</strong> CovRatio statistic has a threshold of 1.0. Values larger than 1.0 indicate that precision of the<br />

estimates is lost by exclusion of the observations in question. Values smaller than 1.0 indicate that



precision is gained by exclusion of the observations from the analysis. For example, it is evident<br />

from Figure 56.17 that person 20 has considerable impact on the covariance parameter estimates<br />

and moderate influence on the fixed-effects estimates. Furthermore, exclusion of this subject from<br />

the analysis increases the precision of the covariance parameters, whereas the effect on the precision<br />

of the fixed effects is minor.<br />

Figure 56.18 shows another type of influence plot, a panel of the deletion estimates. Each plot<br />

within the panel corresponds to one of the model parameters. A reference line is drawn at the<br />

estimate based on the full data.<br />
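As a rough sketch of how such influence panels might be requested (the statements in Example 56.8 remain the authoritative version): the data set pr and the variables Person, Gender, Age, and y are assumed here to be those created in Example 56.2, and ITER=5 is an arbitrary illustrative choice.<br />

```sas
ods graphics on;

/* Sketch only: subject-level (Person) deletion diagnostics.            */
/* ITER=5 requests iterative updates, so diagnostics for the covariance */
/* parameters are computed; EST requests the deletion estimates.        */
proc mixed data=pr method=ml;
   class Person Gender;
   model y = Gender Age Gender*Age / influence(effect=Person iter=5 est);
   repeated / type=un subject=Person;
run;

ods graphics off;
```

Without the ITER= modifier, the updates are noniterative and plots related to the covariance parameters are not produced.<br />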

Figure 56.17 Influence Diagnostics



Figure 56.18 Deletion Estimates<br />

ODS Graph Names<br />

To request graphics with PROC <strong>MIXED</strong>, you must first enable ODS Graphics by specifying the ODS<br />

GRAPHICS ON statement. See Chapter 21, “Statistical Graphics Using ODS,” for more information.<br />

Some graphs are produced by default; other graphs are produced by using statements and options.<br />

You can reference every graph produced through ODS Graphics with a name. <strong>The</strong> names of the<br />

graphs that PROC <strong>MIXED</strong> generates are listed in Table 56.24, along with the required statements<br />

and options.<br />

Table 56.24 ODS Graphics Produced by PROC <strong>MIXED</strong><br />

ODS Graph Name | Plot Description | Statement or Option<br />

Boxplot | Box plots | PLOTS=BOXPLOT<br />

CovRatioPlot | CovRatio statistics for fixed effects or covariance parameters | PLOTS=INFLUENCESTATPANEL(UNPACK) and MODEL / INFLUENCE<br />

CooksDPlot | Cook’s D for fixed effects or covariance parameters | PLOTS=INFLUENCESTATPANEL(UNPACK) and MODEL / INFLUENCE<br />

DistancePlot | Likelihood or restricted likelihood distance | MODEL / INFLUENCE<br />

InfluenceEstPlot | Panel of deletion estimates | MODEL / INFLUENCE(EST) or PLOTS=INFLUENCEESTPLOT and MODEL / INFLUENCE<br />

InfluenceEstPlot | Parameter estimates after removing observation or sets of observations | PLOTS=INFLUENCEESTPLOT(UNPACK) and MODEL / INFLUENCE<br />

InfluenceStatPanel | Panel of influence statistics | MODEL / INFLUENCE<br />

PearsonBoxPlot | Box plot of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK BOX)<br />

PearsonByPredicted | Pearson residuals vs. predicted | PLOTS=PEARSONPANEL(UNPACK)<br />

PearsonHistogram | Histogram of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK)<br />

PearsonPanel | Panel of Pearson residuals | MODEL / RESIDUAL<br />

PearsonQQplot | Q-Q plot of Pearson residuals | PLOTS=PEARSONPANEL(UNPACK)<br />

PressPlot | Plot of PRESS residuals or PRESS statistic | PLOTS=PRESS and MODEL / INFLUENCE<br />

ResidualBoxplot | Box plot of (raw) residuals | PLOTS=RESIDUALPANEL(UNPACK BOX)<br />

ResidualByPredicted | Residuals vs. predicted | PLOTS=RESIDUALPANEL(UNPACK)<br />

ResidualHistogram | Histogram of raw residuals | PLOTS=RESIDUALPANEL(UNPACK)<br />

ResidualPanel | Panel of (raw) residuals | MODEL / RESIDUAL<br />

ResidualQQplot | Q-Q plot of raw residuals | PLOTS=RESIDUALPANEL(UNPACK)<br />

ScaledBoxplot | Box plot of scaled residuals | PLOTS=VCIRYPANEL(UNPACK BOX)<br />

ScaledByPredicted | Scaled residuals vs. predicted | PLOTS=VCIRYPANEL(UNPACK)<br />

ScaledHistogram | Histogram of scaled residuals | PLOTS=VCIRYPANEL(UNPACK)<br />

ScaledQQplot | Q-Q plot of scaled residuals | PLOTS=VCIRYPANEL(UNPACK)<br />

StudentBoxplot | Box plot of studentized residuals | PLOTS=STUDENTPANEL(UNPACK BOX)<br />

StudentByPredicted | Studentized residuals vs. predicted | PLOTS=STUDENTPANEL(UNPACK)<br />

StudentHistogram | Histogram of studentized residuals | PLOTS=STUDENTPANEL(UNPACK)<br />

StudentPanel | Panel of studentized residuals | MODEL / RESIDUAL<br />

StudentQQplot | Q-Q plot of studentized residuals | PLOTS=STUDENTPANEL(UNPACK)<br />

VCIRYPanel | Panel of scaled residuals | MODEL / VCIRY<br />
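The graph names in Table 56.24 can be used to select individual graphs. The following is a minimal sketch, assuming the heights data set from the “Getting Started” section; the choice of which two graphs to keep is purely illustrative.<br />

```sas
ods graphics on;

/* Keep only two of the unpacked studentized-residual graphs,      */
/* referring to them by the names listed in Table 56.24.           */
/* Note that ODS SELECT also suppresses the tabular output here.   */
ods select StudentHistogram StudentQQplot;

proc mixed data=heights plots=studentpanel(unpack);
   class Family Gender;
   model Height = Gender / residual;
   random Family Family*Gender;
run;

ods graphics off;
```

The UNPACK suboption breaks the panel into its component graphs, which is why the individual graph names become available.<br />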



Computational Issues<br />

Computational Method<br />

In addition to numerous matrix-multiplication routines, PROC <strong>MIXED</strong> frequently uses the sweep<br />

operator (Goodnight 1979) and the Cholesky root (Golub and Van Loan 1989). <strong>The</strong> routines perform<br />

a modified W transformation (Goodnight and Hemmerle 1979) for G-side likelihood calculations<br />

and a direct method for R-side likelihood calculations. For the Type 3 F tests, PROC <strong>MIXED</strong><br />

uses the algorithm described in Chapter 39, “<strong>The</strong> GLM <strong>Procedure</strong>.”<br />

PROC <strong>MIXED</strong> uses a ridge-stabilized Newton-Raphson algorithm to optimize either a full (ML)<br />

or residual (REML) likelihood function. <strong>The</strong> Newton-Raphson algorithm is preferred to the EM<br />

algorithm (Lindstrom and Bates 1988). PROC <strong>MIXED</strong> profiles the likelihood with respect to the<br />

fixed effects and also with respect to the residual variance whenever it appears reasonable to do<br />

so. <strong>The</strong> residual profiling can be avoided by using the NOPROFILE option of the PROC <strong>MIXED</strong><br />

statement. PROC <strong>MIXED</strong> uses the MIVQUE0 method (Rao 1972; Giesbrecht 1989) to compute<br />

initial values.<br />

<strong>The</strong> likelihoods that PROC <strong>MIXED</strong> optimizes are usually well-defined continuous functions with a<br />

single optimum. <strong>The</strong> Newton-Raphson algorithm typically performs well and finds the optimum in<br />

a few iterations. It is a quadratically converging algorithm, meaning that the error of the approximation<br />

near the optimum is squared at each iteration. <strong>The</strong> quadratic convergence property is evident<br />

when the convergence criterion drops to zero by factors of 10 or more.<br />
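To make the convergence statement concrete, quadratic convergence can be written as follows. The symbols $\theta_k$ (the parameter estimate at iteration $k$), $\theta^*$ (the optimum), and the constant $C$ are notation introduced here for illustration, not taken from the text above.<br />

```latex
\[
  \lVert \theta_{k+1} - \theta^{*} \rVert \;\le\; C \,\lVert \theta_{k} - \theta^{*} \rVert^{2}
\]
% Illustration with C = 1: successive errors of
% 10^{-1}, 10^{-2}, 10^{-4}, 10^{-8}
% roughly double the number of correct digits at each iteration.
```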

Table 56.25 Notation for Order Calculations<br />

Symbol Number<br />

p columns of X<br />

g columns of Z<br />

N observations<br />

q covariance parameters<br />

t maximum observations per subject<br />

S subjects<br />

Using the notation from Table 56.25, the following are estimates of the computational speed of the<br />

algorithms used in PROC MIXED. For likelihood calculations, the crossproducts matrix construction<br />

is of order $N(p+g)^2$ and the sweep operations are of order $(p+g)^3$. The first derivative<br />

calculations for parameters in G are of order $qg^3$ for ML and $q(g^3 + pg^2 + p^2g)$ for REML. If you<br />

specify a subject effect in the RANDOM statement and if you are not using the REPEATED statement,<br />

then replace $g$ with $g/S$ and $q$ with $qS$ in these calculations. The first derivative calculations<br />

for parameters in R are of order $qS(t^3 + gt^2 + g^2t)$ for ML and $qS(t^3 + (p+g)t^2 + (p^2+g^2)t)$<br />

for REML. For the second derivatives, replace $q$ with $q(q+1)/2$ in the first derivative expressions.<br />

When you specify both G- and R-side parameters (that is, when you use both the RANDOM and<br />

REPEATED statements), then additional calculations are required of an order equal to the sum of<br />

the orders for G and R. Considerable execution times can result in this case.
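As a small worked illustration of these orders, take the dimensions reported for the split-plot analysis in Example 56.1 ($N = 24$, $p = 12$, $g = 16$). The figures below are operation-count orders only, not timings.<br />

```latex
\[
  N(p+g)^2 = 24 \cdot 28^2 = 18816,
  \qquad
  (p+g)^3 = 28^3 = 21952 .
\]
```

For a problem this small both terms are negligible; the cubic sweep term dominates once $p+g$ grows faster than $N$.<br />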



For further details about the computational techniques used in PROC <strong>MIXED</strong>, see Wolfinger, Tobias,<br />

and Sall (1994).<br />

Parameter Constraints<br />

By default, some covariance parameters are assumed to satisfy certain boundary constraints during<br />

the Newton-Raphson algorithm. For example, variance components are constrained to be nonnegative,<br />

and autoregressive parameters are constrained to be between −1 and 1. You can remove these<br />

constraints with the NOBOUND option in the PARMS statement (or with the NOBOUND option<br />

in the PROC <strong>MIXED</strong> statement), but this can lead to estimates that produce an infinite likelihood.<br />

You can also introduce or change boundary constraints with the LOWERB= and UPPERB= options<br />

in the PARMS statement.<br />
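A minimal sketch of changing a boundary constraint, assuming the heights data set from the “Getting Started” section. The starting values (2, 2, 2) and the bound 0.5 are illustrative assumptions only; a missing value (.) in the list keeps the default constraint for that parameter.<br />

```sas
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   /* Covariance parameters in order: Family, Family*Gender, Residual. */
   /* Raise the lower bound of the first parameter to 0.5 and keep     */
   /* the default bounds for the other two.                            */
   parms (2) (2) (2) / lowerb=0.5,.,.;
run;
```

Replacing the LOWERB= list with the NOBOUND option would instead remove the default constraints altogether.<br />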

During the Newton-Raphson algorithm, a parameter might be set equal to one of its boundary<br />

constraints for a few iterations and then it might move away from the boundary. You see a missing<br />

value in the Criterion column of the “Iteration History” table whenever a boundary constraint is<br />

dropped.<br />

For some data sets the final estimate of a parameter might equal one of its boundary constraints.<br />

This is usually not a cause for concern, but it might lead you to consider a different model. For<br />

instance, a variance component estimate can equal zero; in this case, you might want to drop the<br />

corresponding random effect from the model. However, be aware that changing the model in this<br />

fashion can affect degrees-of-freedom calculations.<br />

Convergence Problems<br />

For some data sets, the Newton-Raphson algorithm can fail to converge. Nonconvergence can result<br />

from a number of causes, including flat or ridged likelihood surfaces and ill-conditioned data.<br />

It is also possible for PROC <strong>MIXED</strong> to converge to a point that is not the global optimum of the<br />

likelihood, although this usually occurs only with the spatial covariance structures.<br />

If you experience convergence problems, the following points might be helpful:<br />

• One useful tool is the PARMS statement, which lets you input initial values for the covariance parameters and performs a grid search over the likelihood surface.<br />

• Sometimes the Newton-Raphson algorithm does not perform well when two of the covariance parameters are on a different scale; that is, when they are several orders of magnitude apart. This is because the Hessian matrix is processed jointly for the two parameters, and elements of it corresponding to one of the parameters can become close to internal tolerances in PROC MIXED. In this case, you can improve stability by rescaling the effects in the model so that the covariance parameters are on the same scale.<br />

• Data that are extremely large or extremely small can adversely affect results because of the internal tolerances in PROC MIXED. Rescaling the data can improve stability.<br />

• For stubborn problems, you might want to specify ODS OUTPUT COVPARMS=data-set-name to output the “Covariance Parameter Estimates” table as a precautionary measure. That way, if the problem does not converge, you can read the final parameter values back into a new run with the PARMSDATA= option in the PARMS statement.<br />

• Fisher scoring can be more robust than Newton-Raphson with poor MIVQUE(0) starting values. Specifying a SCORING= value of 5 or so might help to recover from poor starting values.<br />

• Tuning the singularity options SINGULAR=, SINGCHOL=, and SINGRES= in the MODEL statement can improve the stability of the optimization process.<br />

• Tuning the MAXITER= and MAXFUNC= options in the PROC MIXED statement can save resources. Also, the ITDETAILS option displays the values of all the parameters at each iteration.<br />

• Using the NOPROFILE and NOBOUND options in the PROC MIXED statement might help convergence, although they can produce unusual results.<br />

• Although the CONVH convergence criterion usually gives the best results, you might want to try CONVF or CONVG, possibly along with the ABSOLUTE option.<br />

• If the convergence criterion reaches a relatively small value such as 1E-7 but never gets lower than 1E-8, you might want to specify CONVH=1E-6 in the PROC MIXED statement to get results; however, interpret the results with caution.<br />

• An infinite likelihood during the iteration process means that the Newton-Raphson algorithm has stepped into a region where either the R or V matrix is nonpositive definite. This is usually no cause for concern as long as iterations continue. If PROC MIXED stops because of an infinite likelihood, recheck your model to make sure that no observations from the same subject are producing identical rows in R or V and that you have enough data to estimate the particular covariance structure you have selected. Any time that the final estimated likelihood is infinite, subsequent results should be interpreted with caution.<br />

• A nonpositive definite Hessian matrix can indicate a surface saddlepoint or linear dependencies among the parameters.<br />

• A warning message about the singularities of X changing indicates that there is some linear dependency in the estimate of $X'\hat{V}^{-1}X$ that is not found in $X'X$. This can adversely affect the likelihood calculations and optimization process. If you encounter this problem, make sure that your model specification is reasonable and that you have enough data to estimate the particular covariance structure you have selected. Rearranging effects in the MODEL statement so that the most significant ones are first can help, because PROC MIXED sweeps the estimate of $X'V^{-1}X$ in the order of the MODEL effects and the sweep is more stable if larger pivots are dealt with first. If this does not help, specifying starting values with the PARMS statement can place the optimization on a different and possibly more stable path.<br />

• Lack of convergence can indicate model misspecification or a violation of the normality assumption.<br />
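Several of these suggestions can be combined in a two-step run. The following is a sketch under stated assumptions: the heights data set from the “Getting Started” section stands in for a problematic model, the data set name cp is hypothetical, and the values SCORING=5 and MAXITER=200 are illustrative rather than recommended settings.<br />

```sas
/* Step 1: use Fisher scoring for the first 5 iterations, allow more  */
/* iterations, print parameter values at each iteration, and save the */
/* "Covariance Parameter Estimates" table as a precaution.            */
ods output CovParms=cp;
proc mixed data=heights scoring=5 maxiter=200 itdetails;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
run;

/* Step 2: restart from the saved values if step 1 did not converge.  */
proc mixed data=heights;
   class Family Gender;
   model Height = Gender;
   random Family Family*Gender;
   parms / parmsdata=cp;
run;
```

The saved table supplies its estimates in the order that PROC MIXED uses for the covariance parameters, so the second run needs no explicit starting values.<br />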


Memory<br />


Let p be the number of columns in X, and let g be the number of columns in Z. For large models,<br />

most of the memory resources are required for holding symmetric matrices of order $p$, $g$, and $p+g$.<br />

The approximate memory requirement in bytes is<br />

$$40(p^2 + g^2) + 32(p+g)^2$$<br />

If you have a large model that exceeds the memory capacity of your computer, see the suggestions<br />

listed under “Computing Time.”<br />
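As a quick worked check of the memory formula, take the dimensions reported for the split-plot analysis in Example 56.1 ($p = 12$ columns in X, $g = 16$ columns in Z):<br />

```latex
\[
  40\,(12^2 + 16^2) + 32\,(12 + 16)^2
  = 40 \cdot 400 + 32 \cdot 784
  = 16000 + 25088
  = 41088 \text{ bytes (about 40 KB).}
\]
```

Because both terms are quadratic, multiplying $p$ and $g$ by 10 multiplies the approximate requirement by 100.<br />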

Computing Time<br />

PROC <strong>MIXED</strong> is computationally intensive, and execution times can be long. In addition to the<br />

CPU time used in collecting sums and crossproducts and in solving the mixed model equations (as<br />

in PROC GLM), considerable CPU time is often required to compute the likelihood function and<br />

its derivatives. <strong>The</strong>se latter computations are performed for every Newton-Raphson iteration.<br />

If you have a model that takes too long to run, the following suggestions can be helpful:<br />

• Examine the “Dimensions” table to find out the number of columns in the X and Z matrices. A large number of columns in either matrix can greatly increase computing time. You might want to eliminate some higher-order effects if they are too large.<br />

• If you have a Z matrix with a lot of columns, use the DDFM=BW option in the MODEL statement to eliminate the time required for the containment method.<br />

• If possible, “factor out” a common effect from the effects in the RANDOM statement and make it the SUBJECT= effect. This creates a block-diagonal G matrix and can often speed calculations.<br />

• If possible, use the same or nested SUBJECT= effects in all RANDOM and REPEATED statements.<br />

• If your data set is very large, you might want to analyze it in pieces. The BY statement can help implement this strategy.<br />

• In general, specify random effects with a lot of levels in the REPEATED statement and those with a few levels in the RANDOM statement.<br />

• The METHOD=MIVQUE0 option runs faster than either the METHOD=REML or METHOD=ML option because it is noniterative.<br />

• You can specify known values for the covariance parameters by using the HOLD= or NOITER option in the PARMS statement or the GDATA= option in the RANDOM statement. This eliminates the need for iteration.<br />

• The LOGNOTE option in the PROC MIXED statement writes periodic messages to the SAS log concerning the status of the calculations. It can help you diagnose where the slowdown is occurring.<br />
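A short sketch of how a few of these suggestions look in practice, assuming the sp data set created in Example 56.1. The covariance parameter values in the second step are the REML estimates reported in Output 56.1.6, reused here only to illustrate the NOITER option.<br />

```sas
/* Factor Block out as the SUBJECT= effect, use the between-within   */
/* degrees-of-freedom method, and write progress notes to the log.   */
proc mixed data=sp lognote;
   class A B Block;
   model Y = A B A*B / ddfm=bw;
   random intercept A / subject=Block;
run;

/* Skip iteration by supplying known covariance parameter values     */
/* (order: Block, A*Block, Residual).                                */
proc mixed data=sp;
   class A B Block;
   model Y = A B A*B;
   random Block A*Block;
   parms (62.3958) (15.3819) (9.3611) / noiter;
run;
```

Note that DDFM=BW changes the denominator degrees of freedom, so the tests need not match those obtained with the default containment method.<br />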



Examples: Mixed <strong>Procedure</strong><br />

<strong>The</strong> following are basic examples of the use of PROC <strong>MIXED</strong>. More examples and details can be<br />

found in Littell et al. (2006), Wolfinger (1997), Verbeke and Molenberghs (1997, 2000), Murray<br />

(1998), Singer (1998), Sullivan, Dukes, and Losina (1999), and Brown and Prescott (1999).<br />

Example 56.1: Split-Plot Design<br />

PROC <strong>MIXED</strong> can fit a variety of mixed models. One of the most common mixed models is the<br />

split-plot design. <strong>The</strong> split-plot design involves two experimental factors, A and B. Levels of A are<br />

randomly assigned to whole plots (main plots), and levels of B are randomly assigned to split plots<br />

(subplots) within each whole plot. <strong>The</strong> design provides more precise information about B than about<br />

A, and it often arises when A can be applied only to large experimental units. An example is where<br />

A represents irrigation levels for large plots of land and B represents different crop varieties planted<br />

in each large plot.<br />

Consider the following data from Stroup (1989a), which arise from a balanced split-plot design with<br />

the whole plots arranged in a randomized complete-block design. <strong>The</strong> variable A is the whole-plot<br />

factor, and the variable B is the subplot factor. A traditional analysis of these data involves the<br />

construction of the whole-plot error (A*Block) to test A and the pooled residual error (B*Block and<br />

A*B*Block) to test B and A*B. To carry out this analysis with PROC GLM, you must use a TEST<br />

statement to obtain the correct F test for A.<br />

Performing a mixed model analysis with PROC <strong>MIXED</strong> eliminates the need for the error term<br />

construction. PROC <strong>MIXED</strong> estimates variance components for Block, A*Block, and the residual,<br />

and it automatically incorporates the correct error terms into test statistics.<br />

<strong>The</strong> following statements create a DATA set for a split-plot design with four blocks, three whole-plot<br />

levels, and two subplot levels:<br />

data sp;<br />

input Block A B Y @@;<br />

datalines;<br />

1 1 1 56 1 1 2 41<br />

1 2 1 50 1 2 2 36<br />

1 3 1 39 1 3 2 35<br />

2 1 1 30 2 1 2 25<br />

2 2 1 36 2 2 2 28<br />

2 3 1 33 2 3 2 30<br />

3 1 1 32 3 1 2 24<br />

3 2 1 31 3 2 2 27<br />

3 3 1 15 3 3 2 19<br />

4 1 1 30 4 1 2 25<br />

4 2 1 35 4 2 2 30<br />

4 3 1 17 4 3 2 18<br />

;


<strong>The</strong> following statements fit the split-plot model assuming random block effects:<br />

proc mixed;<br />

class A B Block;<br />

model Y = A B A*B;<br />

random Block A*Block;<br />

run;<br />


<strong>The</strong> variables A, B, and Block are listed as classification variables in the CLASS statement. <strong>The</strong><br />

columns of model matrix X consist of indicator variables corresponding to the levels of the fixed<br />

effects A, B, and A*B listed on the right side of the MODEL statement. <strong>The</strong> dependent variable Y is<br />

listed on the left side of the MODEL statement.<br />

<strong>The</strong> columns of the model matrix Z consist of indicator variables corresponding to the levels of the<br />

random effects Block and A*Block. <strong>The</strong> G matrix is diagonal and contains the variance components<br />

of Block and A*Block. <strong>The</strong> R matrix is also diagonal and contains the residual variance.<br />
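In matrix form, the model described above can be sketched as follows. The symbols $\beta$, $\gamma$, $\epsilon$, and the variance-component names $\sigma^2_{B}$, $\sigma^2_{AB}$, and $\sigma^2$ are notation assumed here for illustration.<br />

```latex
\[
  y = X\beta + Z\gamma + \epsilon,
  \qquad
  \gamma \sim N(0, G), \quad \epsilon \sim N(0, R),
\]
\[
  G = \begin{pmatrix} \sigma^2_{B}\, I_{4} & 0 \\ 0 & \sigma^2_{AB}\, I_{12} \end{pmatrix},
  \qquad
  R = \sigma^2 I_{24},
  \qquad
  \operatorname{Var}(y) = ZGZ' + R .
\]
% Z has 4 columns for Block and 3 x 4 = 12 columns for A*Block (16 in total);
% X has 1 + 3 + 2 + 6 = 12 columns for the intercept, A, B, and A*B.
```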

<strong>The</strong> <strong>SAS</strong> statements produce Output 56.1.1–Output 56.1.8.<br />

<strong>The</strong> “Model Information” table in Output 56.1.1 lists basic information about the split-plot model.<br />

REML is used to estimate the variance components, and the residual variance is profiled from the<br />

optimization.<br />

Output 56.1.1 Results for Split-Plot Analysis<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Model Information<br />

Data Set WORK.SP<br />

Dependent Variable Y<br />

Covariance Structure Variance Components<br />

Estimation Method REML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

<strong>The</strong> “Class Level Information” table in Output 56.1.2 lists the levels of all variables specified in the<br />

CLASS statement. You can check this table to make sure that the data are correct.<br />

Output 56.1.2 Split-Plot Example (continued)<br />

Class Level Information<br />

Class Levels Values<br />

A 3 1 2 3<br />

B 2 1 2<br />

Block 4 1 2 3 4<br />

<strong>The</strong> “Dimensions” table in Output 56.1.3 lists the magnitudes of various vectors and matrices. <strong>The</strong><br />

X matrix is seen to be 24 × 12, and the Z matrix is 24 × 16.



Output 56.1.3 Split-Plot Example (continued)<br />

Dimensions<br />

Covariance Parameters 3<br />

Columns in X 12<br />

Columns in Z 16<br />

Subjects 1<br />

Max Obs Per Subject 24<br />

<strong>The</strong> “Number of Observations” table in Output 56.1.4 shows that all observations read from the data<br />

set are used in the analysis.<br />

Output 56.1.4 Split-Plot Example (continued)<br />

Number of Observations<br />

Number of Observations Read 24<br />

Number of Observations Used 24<br />

Number of Observations Not Used 0<br />

PROC <strong>MIXED</strong> estimates the variance components for Block, A*Block, and the residual by REML.<br />

<strong>The</strong> REML estimates are the values that maximize the likelihood of a set of linearly independent<br />

error contrasts, and they provide a correction for the downward bias found in the usual maximum<br />

likelihood estimates. The objective function is −2 times the logarithm of the restricted likelihood,<br />

and PROC <strong>MIXED</strong> minimizes this objective function to obtain the estimates.<br />
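For reference, the objective function has the following standard form for a Gaussian mixed model. This is stated as background rather than taken from this section; $V = ZGZ' + R$, $n$ is the number of observations, and $p$ is the rank of X.<br />

```latex
\[
  -2\,\ell_R(G, R) = \log\lvert V\rvert + \log\lvert X'V^{-1}X\rvert
                     + r'V^{-1}r + (n - p)\log(2\pi),
\]
\[
  \text{where } r = y - X\,(X'V^{-1}X)^{-}X'V^{-1}y .
\]
```

The second term, $\log\lvert X'V^{-1}X\rvert$, is what distinguishes the restricted likelihood from the full likelihood.<br />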

<strong>The</strong> minimization method is the Newton-Raphson algorithm, which uses the first and second derivatives<br />

of the objective function to iteratively find its minimum. <strong>The</strong> “Iteration History” table in<br />

Output 56.1.5 records the steps of that optimization process. For this example, only one iteration<br />

is required to obtain the estimates. <strong>The</strong> Evaluations column reveals that the restricted likelihood<br />

is evaluated once for each of the iterations. A criterion of 0 indicates that the Newton-Raphson<br />

algorithm has converged.<br />

Output 56.1.5 Split-Plot Analysis (continued)<br />

Iteration History<br />

Iteration Evaluations -2 Res Log Like Criterion<br />

0 1 139.81461222<br />

1 1 119.76184570 0.00000000<br />

Convergence criteria met.<br />

<strong>The</strong> REML estimates for the variance components of Block, A*Block, and the residual are 62.40,<br />

15.38, and 9.36, respectively, as listed in the Estimate column of the “Covariance Parameter Estimates”<br />

table in Output 56.1.6.


Output 56.1.6 Split-Plot Analysis (continued)<br />

Covariance Parameter<br />

Estimates<br />

Cov Parm Estimate<br />

Block 62.3958<br />

A*Block 15.3819<br />

Residual 9.3611<br />


<strong>The</strong> “Fit Statistics” table in Output 56.1.7 lists several pieces of information about the fitted mixed<br />

model, including the residual log likelihood. <strong>The</strong> Akaike (AIC) and Bayesian (BIC) information<br />

criteria can be used to compare different models; the ones with smaller values are preferred. <strong>The</strong><br />

AICC information criterion is a small-sample bias-adjusted form of the Akaike criterion (Hurvich<br />

and Tsai 1989).<br />

Output 56.1.7 Split-Plot Analysis (continued)<br />

Fit Statistics<br />

-2 Res Log Likelihood 119.8<br />

AIC (smaller is better) 125.8<br />

AICC (smaller is better) 127.5<br />

BIC (smaller is better) 123.9<br />
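The printed fit statistics can be checked by hand. The sketch below assumes the usual definitions of the criteria, with $d = 3$ covariance parameters, $n^* = 24 - 6 = 18$ (observations minus the rank of X under REML), and $n = 4$ (the number of Block levels) in the BIC penalty. These choices are assumptions that reproduce the printed values; they are not stated in this section.<br />

```latex
\begin{align*}
  \mathrm{AIC}  &= -2\ell_R + 2d
                 = 119.76 + 6 \approx 125.8 \\
  \mathrm{AICC} &= -2\ell_R + \frac{2\,d\,n^*}{n^* - d - 1}
                 = 119.76 + \frac{2 \cdot 3 \cdot 18}{14} \approx 127.5 \\
  \mathrm{BIC}  &= -2\ell_R + d \log n
                 = 119.76 + 3 \log 4 \approx 123.9
\end{align*}
```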

Finally, the fixed effects are tested by using Type 3 estimable functions (Output 56.1.8).<br />

Output 56.1.8 Split-Plot Analysis (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

A 2 6 4.07 0.0764<br />

B 1 9 19.39 0.0017<br />

A*B 2 9 4.02 0.0566<br />

The tests match those obtained from the following PROC GLM statements:<br />

proc glm data=sp;<br />

class A B Block;<br />

model Y = A B A*B Block A*Block;<br />

test h=A e=A*Block;<br />

run;<br />

You can continue this analysis by producing solutions for the fixed and random effects and then<br />

testing various linear combinations of them by using the CONTRAST and ESTIMATE statements.<br />

If you use the same CONTRAST and ESTIMATE statements with PROC GLM, the test statistics



correspond to the fixed-effects-only model. <strong>The</strong> test statistics from PROC <strong>MIXED</strong> incorporate the<br />

random effects.<br />

<strong>The</strong> various “inference space” contrasts given by Stroup (1989a) can be implemented via the<br />

ESTIMATE statement. Consider the following examples:<br />

proc mixed data=sp;<br />

class A B Block;<br />

model Y = A B A*B;<br />

random Block A*Block;<br />

estimate 'a1 mean narrow'<br />

intercept 1 A 1 B .5 .5 A*B .5 .5 |<br />

Block .25 .25 .25 .25<br />

A*Block .25 .25 .25 .25 0 0 0 0 0 0 0 0;<br />

estimate 'a1 mean intermed'<br />

intercept 1 A 1 B .5 .5 A*B .5 .5 |<br />

Block .25 .25 .25 .25;<br />

estimate 'a1 mean broad'<br />

intercept 1 A 1 B .5 .5 A*B .5 .5;<br />

run;<br />

<strong>The</strong>se statements result in Output 56.1.9.<br />

Output 56.1.9 Inference Space Results<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Estimates<br />

Standard<br />

Label Estimate Error DF t Value Pr > |t|<br />

a1 mean narrow 32.8750 1.0817 9 30.39


proc mixed;<br />

class A B Block;<br />

model Y = A B A*B;<br />

random Block A*Block;<br />

run;<br />

An equivalent way of specifying this model is as follows:<br />

proc mixed data=sp;<br />

class A B Block;<br />

model Y = A B A*B;<br />

random intercept A / subject=Block;<br />

run;<br />


In general, if all of the effects in the RANDOM statement can be nested within one effect, you<br />

can specify that one effect by using the SUBJECT= option. <strong>The</strong> subject effect is, in a sense, “factored<br />

out” of the random effects. <strong>The</strong> specification that uses the SUBJECT= effect can result in<br />

faster execution times for large problems because PROC <strong>MIXED</strong> is able to perform the likelihood<br />

calculations separately for each subject.<br />

Example 56.2: Repeated Measures<br />

The following data are from Potthoff and Roy (1964) and consist of growth measurements for 11<br />

girls and 16 boys at ages 8, 10, 12, and 14. Some of the observations are suspect (for example, the<br />

third observation for person 20); however, all of the data are used here for comparison purposes.<br />

<strong>The</strong> analysis strategy employs a linear growth curve model for the boys and girls as well as a<br />

variance-covariance model that incorporates correlations for all of the observations arising from<br />

the same person. <strong>The</strong> data are assumed to be Gaussian, and their likelihood is maximized to estimate<br />

the model parameters. See Jennrich and Schluchter (1986), Louis (1988), Crowder and Hand<br />

(1990), Diggle, Liang, and Zeger (1994), and Everitt (1995) for overviews of this approach to repeated<br />

measures. Jennrich and Schluchter present results for the Potthoff and Roy data from various<br />

covariance structures. <strong>The</strong> PROC <strong>MIXED</strong> statements to fit an unstructured variance matrix (their<br />

Model 2) are as follows:<br />

data pr;<br />

input Person Gender $ y1 y2 y3 y4;<br />

y=y1; Age=8; output;<br />

y=y2; Age=10; output;<br />

y=y3; Age=12; output;<br />

y=y4; Age=14; output;<br />

drop y1-y4;<br />

datalines;<br />

1 F 21.0 20.0 21.5 23.0<br />

2 F 21.0 21.5 24.0 25.5<br />

3 F 20.5 24.0 24.5 26.0<br />

4 F 23.5 24.5 25.0 26.5<br />

5 F 21.5 23.0 22.5 23.5



6 F 20.0 21.0 21.0 22.5<br />

7 F 21.5 22.5 23.0 25.0<br />

8 F 23.0 23.0 23.5 24.0<br />

9 F 20.0 21.0 22.0 21.5<br />

10 F 16.5 19.0 19.0 19.5<br />

11 F 24.5 25.0 28.0 28.0<br />

12 M 26.0 25.0 29.0 31.0<br />

13 M 21.5 22.5 23.0 26.5<br />

14 M 23.0 22.5 24.0 27.5<br />

15 M 25.5 27.5 26.5 27.0<br />

16 M 20.0 23.5 22.5 26.0<br />

17 M 24.5 25.5 27.0 28.5<br />

18 M 22.0 22.0 24.5 26.5<br />

19 M 24.0 21.5 24.5 25.5<br />

20 M 23.0 20.5 31.0 26.0<br />

21 M 27.5 28.0 31.0 31.5<br />

22 M 23.0 23.0 23.5 25.0<br />

23 M 21.5 23.5 24.0 28.0<br />

24 M 17.0 24.5 26.0 29.5<br />

25 M 22.5 25.5 25.5 26.0<br />

26 M 23.0 24.5 26.0 30.0<br />

27 M 22.0 21.5 23.5 25.0<br />

;<br />

proc mixed data=pr method=ml covtest;<br />

class Person Gender;<br />

model y = Gender Age Gender*Age / s;<br />

repeated / type=un subject=Person r;<br />

run;<br />

To follow Jennrich and Schluchter, this example uses maximum likelihood (METHOD=ML) instead<br />

of the default REML to estimate the unknown covariance parameters. <strong>The</strong> COVTEST option<br />

requests asymptotic tests of all the covariance parameters.<br />

<strong>The</strong> MODEL statement first lists the dependent variable Y. <strong>The</strong> fixed effects are then listed after the<br />

equal sign. <strong>The</strong> variable Gender requests a different intercept for the girls and boys, Age models<br />

an overall linear growth trend, and Gender*Age makes the slopes different over time. It is actually<br />

not necessary to specify Age separately, but doing so enables PROC <strong>MIXED</strong> to carry out a test for<br />

heterogeneous slopes. <strong>The</strong> S option requests the display of the fixed-effects solution vector.<br />

<strong>The</strong> REPEATED statement contains no effects, taking advantage of the default assumption that the<br />

observations are ordered similarly for each subject. <strong>The</strong> TYPE=UN option requests an unstructured<br />

block for each SUBJECT=Person. <strong>The</strong> R matrix is, therefore, block diagonal with 27 blocks, each<br />

block consisting of identical 4 × 4 unstructured matrices. The 10 parameters of these unstructured<br />

blocks make up the covariance parameters estimated by maximum likelihood. <strong>The</strong> R option requests<br />

that the first block of R be displayed.<br />
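When the observations are not guaranteed to appear in the same order for every subject (for example, if some ages are missing or the rows have been shuffled), the REPEATED statement can name a classification effect that identifies the position of each measurement. The following is a minimal sketch under that assumption. The data set name pr2 and the variable AgeClass are hypothetical; they are introduced only so that Age can remain a continuous covariate in the MODEL statement.

```sas
/* Copy Age into a second variable so that it can be used as a CLASS effect */
data pr2;
   set pr;
   AgeClass = Age;
run;

proc mixed data=pr2 method=ml covtest;
   class Person Gender AgeClass;
   model y = Gender Age Gender*Age / s;
   /* AgeClass identifies the row and column of the 4 x 4 block */
   /* that each observation occupies                            */
   repeated AgeClass / type=un subject=Person r;
run;
```

For the complete, consistently ordered data in this example, this specification should give the same fit as the REPEATED statement with no effect.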

<strong>The</strong> results from this analysis are shown in Output 56.2.1–Output 56.2.9.


Example 56.2: Repeated Measures ✦ 4015<br />

Output 56.2.1 Repeated Measures Analysis with Unstructured Covariance Matrix<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Model Information<br />

Data Set WORK.PR<br />

Dependent Variable y<br />

Covariance Structure Unstructured<br />

Subject Effect Person<br />

Estimation Method ML<br />

Residual Variance Method None<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Between-Within<br />

In Output 56.2.1, the covariance structure is listed as “Unstructured,” and no residual variance is<br />

used with this structure. <strong>The</strong> default degrees-of-freedom method here is “Between-Within.”<br />

Output 56.2.2 Repeated Measures Analysis (continued)<br />

Class Level Information<br />

Class Levels Values<br />

Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />

14 15 16 17 18 19 20 21 22 23<br />

24 25 26 27<br />

Gender 2 F M<br />

In Output 56.2.2, note that Person has 27 levels and Gender has 2.<br />

Output 56.2.3 Repeated Measures Analysis (continued)<br />

Dimensions<br />

Covariance Parameters 10<br />

Columns in X 6<br />

Columns in Z 0<br />

Subjects 27<br />

Max Obs Per Subject 4<br />

In Output 56.2.3, the 10 covariance parameters result from the 4 × 4 unstructured blocks of R. There<br />

is no Z matrix for this model, and each of the 27 subjects has a maximum of 4 observations.



Output 56.2.4 Repeated Measures Analysis (continued)<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 108<br />

Number of Observations Not Used 0<br />

Iteration History<br />

Iteration Evaluations -2 Log Like Criterion<br />

0 1 478.24175986<br />

1 2 419.47721707 0.00000152<br />

2 1 419.47704812 0.00000000<br />

Convergence criteria met.<br />

Three Newton-Raphson iterations are required to find the maximum likelihood estimates<br />

(Output 56.2.4). The default relative Hessian criterion has a final value less than 1E−8, indicating<br />

the convergence of the Newton-Raphson algorithm and the attainment of an optimum.<br />

Output 56.2.5 Repeated Measures Analysis (continued)<br />

Estimated R Matrix for Person 1<br />

Row Col1 Col2 Col3 Col4<br />

1 5.1192 2.4409 3.6105 2.5222<br />

2 2.4409 3.9279 2.7175 3.0624<br />

3 3.6105 2.7175 5.9798 3.8235<br />

4 2.5222 3.0624 3.8235 4.6180<br />

The 4 × 4 matrix in Output 56.2.5 is the estimated unstructured covariance matrix. It is the estimate<br />

of the first block of R, and the other 26 blocks all have the same estimate.


Output 56.2.6 Repeated Measures Analysis (continued)<br />

Covariance Parameter Estimates<br />


Standard Z<br />

Cov Parm Subject Estimate Error Value Pr Z<br />

UN(1,1) Person 5.1192 1.4169 3.61 0.0002<br />

UN(2,1) Person 2.4409 0.9835 2.48 0.0131<br />

UN(2,2) Person 3.9279 1.0824 3.63 0.0001<br />

UN(3,1) Person 3.6105 1.2767 2.83 0.0047<br />

UN(3,2) Person 2.7175 1.0740 2.53 0.0114<br />

UN(3,3) Person 5.9798 1.6279 3.67 0.0001<br />

UN(4,1) Person 2.5222 1.0649 2.37 0.0179<br />

UN(4,2) Person 3.0624 1.0135 3.02 0.0025<br />

UN(4,3) Person 3.8235 1.2508 3.06 0.0022<br />

UN(4,4) Person 4.6180 1.2573 3.67 0.0001<br />

<strong>The</strong> “Covariance Parameter Estimates” table in Output 56.2.6 lists the 10 estimated covariance parameters<br />

in order; note their correspondence to the first block of R displayed in Output 56.2.5. <strong>The</strong><br />

parameter estimates are labeled according to their location in the block in the Cov Parm column,<br />

and all of these estimates are associated with Person as the subject effect. <strong>The</strong> Std Error column lists<br />

approximate standard errors of the covariance parameters obtained from the inverse Hessian matrix.<br />

<strong>The</strong>se standard errors lead to approximate Wald Z statistics, which are compared with the standard<br />

normal distribution. The results of these tests indicate that all the parameters are significantly different<br />

from 0; however, the Wald test can be unreliable in small samples.<br />

To carry out Wald tests of various linear combinations of these parameters, use the following procedure.<br />

First, run the statements again, adding the ASYCOV option and an ODS statement:<br />

ods output CovParms=cp AsyCov=asy;<br />

proc mixed data=pr method=ml covtest asycov;<br />

class Person Gender;<br />

model y = Gender Age Gender*Age / s;<br />

repeated / type=un subject=Person r;<br />

run;<br />

This creates two data sets, cp and asy, which contain the covariance parameter estimates and their<br />

asymptotic variance covariance matrix, respectively. <strong>The</strong>n read these data sets into the <strong>SAS</strong>/IML<br />

matrix programming language as follows:<br />

proc iml;<br />

use cp;<br />

read all var {Estimate} into est;<br />

use asy;<br />

read all var ('CovP1':'CovP10') into asy;<br />

You can then construct your desired linear combinations and corresponding quadratic forms with<br />

the asy matrix.
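As one illustration of such a quadratic form, the following sketch tests whether the variance at age 8, UN(1,1), equals the variance at age 14, UN(4,4). The contrast vector L and the names d, v, wald, and p are hypothetical choices for this illustration; the ordering of the 10 parameters is assumed to match the “Covariance Parameter Estimates” table in Output 56.2.6. The block repeats the READ statements so that it can be run on its own.

```sas
proc iml;
   use cp;
   read all var {Estimate} into est;        /* 10 x 1 vector of estimates       */
   close cp;
   use asy;
   read all var ('CovP1':'CovP10') into asy; /* 10 x 10 asymptotic covariance   */
   close asy;

   /* Hypothetical contrast: UN(1,1) - UN(4,4) */
   L = {1 0 0 0 0 0 0 0 0 -1};

   d    = L*est;                 /* estimated difference                */
   v    = L*asy*t(L);            /* its approximate variance            */
   wald = (d##2)/v;              /* Wald chi-square statistic with 1 DF */
   p    = 1 - probchi(wald, 1);  /* approximate p-value                 */
   print d v wald p;
quit;
```

Other linear combinations require only a different row vector L. Like the Wald Z tests in the COVTEST output, this test relies on large-sample theory and can be unreliable in small samples.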



Output 56.2.7 Repeated Measures Analysis (continued)<br />

Fit Statistics<br />

-2 Log Likelihood 419.5<br />

AIC (smaller is better) 447.5<br />

AICC (smaller is better) 452.0<br />

BIC (smaller is better) 465.6<br />

Null Model Likelihood Ratio Test<br />

DF Chi-Square Pr > ChiSq<br />

9 58.76 |t|<br />

Intercept 15.8423 0.9356 25 16.93


Output 56.2.9 Repeated Measures Analysis (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Gender 1 25 1.17 0.2904<br />

Age 1 25 110.54



This specifies an unstructured covariance matrix for the random intercept and slope. In mixed model<br />

notation, G is block diagonal with identical 2 × 2 unstructured blocks for each person. By default, R<br />

becomes σ²I. See Example 56.5 for further information about this model.<br />
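The statements that this paragraph describes are not part of this excerpt. A random coefficients specification that is consistent with the description (a random intercept and a random slope for Age within each person) might look like the following sketch; the options used in the original statements may differ.

```sas
proc mixed data=pr method=ml;
   class Person Gender;
   model y = Gender Age Gender*Age / s;
   /* Random intercept and random Age slope for each person;      */
   /* TYPE=UN requests a 2 x 2 unstructured block of G per person */
   random intercept Age / type=un subject=Person g;
run;
```

Because no REPEATED statement appears in this sketch, R takes its default form, and the G option displays the estimated G matrix.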

Finally, you can fit a compound symmetry structure by using TYPE=CS, as follows:<br />

proc mixed data=pr method=ml covtest;<br />

class Person Gender;<br />

model y = Gender Age Gender*Age / s;<br />

repeated / type=cs subject=Person r;<br />

run;<br />

<strong>The</strong> results from this analysis are shown in Output 56.2.10–Output 56.2.17.<br />

<strong>The</strong> “Model Information” table in Output 56.2.10 is the same as before except for the change in<br />

“Covariance Structure.”<br />

Output 56.2.10 Repeated Measures Analysis with Compound Symmetry Structure<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Model Information<br />

Data Set WORK.PR<br />

Dependent Variable y<br />

Covariance Structure Compound Symmetry<br />

Subject Effect Person<br />

Estimation Method ML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Between-Within<br />

<strong>The</strong> “Dimensions” table in Output 56.2.11 shows that there are only two covariance parameters<br />

in the compound symmetry model; this covariance structure has common variance and common<br />

covariance.<br />

Output 56.2.11 Analysis with Compound Symmetry (continued)<br />

Class Level Information<br />

Class Levels Values<br />

Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />

14 15 16 17 18 19 20 21 22 23<br />

24 25 26 27<br />

Gender 2 F M


Output 56.2.11 continued<br />

Dimensions<br />

Covariance Parameters 2<br />

Columns in X 6<br />

Columns in Z 0<br />

Subjects 27<br />

Max Obs Per Subject 4<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 108<br />

Number of Observations Not Used 0<br />


Since the data are balanced, only one step is required to find the estimates (Output 56.2.12).<br />

Output 56.2.12 Analysis with Compound Symmetry (continued)<br />

Iteration History<br />

Iteration Evaluations -2 Log Like Criterion<br />

0 1 478.24175986<br />

1 1 428.63905802 0.00000000<br />

Convergence criteria met.<br />

Output 56.2.13 displays the estimated R matrix for the first subject. Note the compound symmetry<br />

structure here, which consists of a common covariance with a diagonal enhancement.<br />

Output 56.2.13 Analysis with Compound Symmetry (continued)<br />

Estimated R Matrix for Person 1<br />

Row Col1 Col2 Col3 Col4<br />

1 4.9052 3.0306 3.0306 3.0306<br />

2 3.0306 4.9052 3.0306 3.0306<br />

3 3.0306 3.0306 4.9052 3.0306<br />

4 3.0306 3.0306 3.0306 4.9052<br />

The common covariance is estimated to be 3.0306, as listed in the CS row of the “Covariance Parameter<br />

Estimates” table in Output 56.2.14, and the residual variance is estimated to be 1.8746, as<br />

listed in the Residual row. You can use these two numbers to estimate the intraclass correlation coefficient<br />

(ICC) for this model. Here, the ICC estimate equals 3.0306/(3.0306 + 1.8746) = 0.6178.<br />

You can also obtain this number by adding the RCORR option to the REPEATED statement.
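A minimal sketch of that request follows. It repeats the compound symmetry statements and adds only the RCORR option.

```sas
proc mixed data=pr method=ml covtest;
   class Person Gender;
   model y = Gender Age Gender*Age / s;
   /* RCORR displays the correlation matrix that corresponds */
   /* to the first block of R                                */
   repeated / type=cs subject=Person r rcorr;
run;
```

Under compound symmetry every off-diagonal element of this correlation matrix is the same, so it should agree with the ICC estimate of about 0.6178 given in the text.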



Output 56.2.14 Analysis with Compound Symmetry (continued)<br />

Covariance Parameter Estimates<br />

Standard Z<br />

Cov Parm Subject Estimate Error Value Pr Z<br />

CS Person 3.0306 0.9552 3.17 0.0015<br />

Residual 1.8746 0.2946 6.36 ChiSq<br />

1 49.60 |t|<br />

Intercept 16.3406 0.9631 25 16.97


Output 56.2.17 Analysis with Compound Symmetry (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Gender 1 25 0.47 0.5003<br />

Age 1 79 111.10



Output 56.2.19 Analysis with Heterogeneous Structures (continued)<br />

Class Level Information<br />

Class Levels Values<br />

Person 27 1 2 3 4 5 6 7 8 9 10 11 12 13<br />

14 15 16 17 18 19 20 21 22 23<br />

24 25 26 27<br />

Gender 2 F M<br />

Dimensions<br />

Covariance Parameters 4<br />

Columns in X 6<br />

Columns in Z 0<br />

Subjects 27<br />

Max Obs Per Subject 4<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 108<br />

Number of Observations Not Used 0<br />

As Output 56.2.20 shows, even with the heterogeneity, only one iteration is required for convergence.<br />

Output 56.2.20 Analysis with Heterogeneous Structures (continued)<br />

Iteration History<br />

Iteration Evaluations -2 Log Like Criterion<br />

0 1 478.24175986<br />

1 1 408.81297228 0.00000000<br />

Convergence criteria met.<br />

<strong>The</strong> “Covariance Parameter Estimates” table in Output 56.2.21 lists the heterogeneous estimates.<br />

Note that both the common covariance and the diagonal enhancement differ between girls and boys.<br />

Output 56.2.21 Analysis with Heterogeneous Structures (continued)<br />

Covariance Parameter Estimates<br />

Cov Parm Subject Group Estimate<br />

Variance Person Gender F 0.5900<br />

CS Person Gender F 3.8804<br />

Variance Person Gender M 2.7577<br />

CS Person Gender M 2.4463
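The statements that produced this heterogeneous fit are not part of this excerpt. The Group column in Output 56.2.21 suggests that the GROUP= option of the REPEATED statement was used, so a specification along the following lines is a reasonable sketch; treat it as an assumption rather than as the original statements.

```sas
proc mixed data=pr method=ml;
   class Person Gender;
   model y = Gender Age Gender*Age / s;
   /* GROUP=Gender requests a separate set of compound symmetry */
   /* parameters for each level of Gender                       */
   repeated / type=cs subject=Person group=Gender;
run;
```

With two levels of Gender and two compound symmetry parameters per level, this specification has four covariance parameters, which agrees with the “Dimensions” table in Output 56.2.19.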



As Output 56.2.22 shows, both Akaike’s information criterion (424.8) and Schwarz’s Bayesian<br />

information criterion (435.2) are smaller for this model than for the homogeneous compound symmetry<br />

model (440.6 and 448.4, respectively). This indicates that the heterogeneous model is more<br />

appropriate. To construct the likelihood ratio test between the two models, subtract the −2 log<br />

likelihood values: 428.6 − 408.8 = 19.8. Comparing this value with the χ² distribution with two<br />

degrees of freedom yields a p-value less than 0.0001, again favoring the heterogeneous model.<br />
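The arithmetic behind these comparisons can be reproduced in a short DATA step. The following sketch is illustrative only: the data set and variable names are hypothetical, and it assumes that the information criteria under ML count the rank of X (4 here) plus the number of covariance parameters, and that BIC uses the number of subjects (27).

```sas
data compare;
   length Model $ 12;
   input Model $ Neg2LogLike NParms;
   /* NParms = rank of X + number of covariance parameters (assumption) */
   AIC = Neg2LogLike + 2*NParms;
   BIC = Neg2LogLike + NParms*log(27);   /* 27 subjects */
   datalines;
CS        428.6 6
CS_Gender 408.8 8
;

data lrt;
   ChiSq     = 428.6 - 408.8;            /* difference in -2 log likelihood */
   DF        = 4 - 2;                    /* difference in covariance parms  */
   ProbChiSq = 1 - probchi(ChiSq, DF);   /* upper-tail chi-square p-value   */
run;

proc print data=compare; run;
proc print data=lrt; run;
```

Up to rounding, the AIC and BIC values in the first listing should match those quoted in the text, and the p-value in the second listing should fall below 0.0001.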

Output 56.2.22 Analysis with Heterogeneous Structures (continued)<br />

Fit Statistics<br />

-2 Log Likelihood 408.8<br />

AIC (smaller is better) 424.8<br />

AICC (smaller is better) 426.3<br />

BIC (smaller is better) 435.2<br />

Null Model Likelihood Ratio Test<br />

DF Chi-Square Pr > ChiSq<br />

3 69.43 |t|<br />

Intercept 16.3406 1.1130 25 14.68



Output 56.2.24 Analysis with Heterogeneous Structures (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Gender 1 25 0.55 0.4644<br />

Age 1 79 141.37


<strong>The</strong> results from this analysis are shown in Output 56.3.1–Output 56.3.13.<br />

Example 56.3: Plotting the Likelihood ✦ 4027<br />

<strong>The</strong> “Model Information” table in Output 56.3.1 lists details about this variance components model.<br />

Output 56.3.1 Model Information<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Model Information<br />

Data Set WORK.HH<br />

Dependent Variable y<br />

Covariance Structure Variance Components<br />

Estimation Method REML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

<strong>The</strong> “Class Level Information” table in Output 56.3.2 lists the levels for A and B.<br />

Output 56.3.2 Class Level Information<br />

Class Level Information<br />

Class Levels Values<br />

a 3 1 2 3<br />

b 2 1 2<br />

The “Dimensions” table in Output 56.3.3 reveals that X is 16 × 4 and Z is 16 × 8. Since there are<br />

no SUBJECT= effects, PROC <strong>MIXED</strong> considers the data to be effectively from one subject with 16<br />

observations.<br />

Output 56.3.3 Model Dimensions and Number of Observations<br />

Dimensions<br />

Covariance Parameters 3<br />

Columns in X 4<br />

Columns in Z 8<br />

Subjects 1<br />

Max Obs Per Subject 16<br />

Number of Observations<br />

Number of Observations Read 16<br />

Number of Observations Used 16<br />

Number of Observations Not Used 0<br />

Only a portion of the “Parameter Search” table is shown in Output 56.3.4 because the full listing<br />

has 651 rows.



Output 56.3.4 Selected Results of Parameter Search<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

-2 Res Log<br />

CovP1 CovP2 CovP3 Variance Res Log Like Like<br />

17.0000 0.3000 1.0000 80.1400 -52.4699 104.9399<br />

17.0000 0.3050 1.0000 80.0466 -52.4697 104.9393<br />

17.0000 0.3100 1.0000 79.9545 -52.4694 104.9388<br />

17.0000 0.3150 1.0000 79.8637 -52.4692 104.9384<br />

17.0000 0.3200 1.0000 79.7742 -52.4691 104.9381<br />

17.0000 0.3250 1.0000 79.6859 -52.4690 104.9379<br />

17.0000 0.3300 1.0000 79.5988 -52.4689 104.9378<br />

17.0000 0.3350 1.0000 79.5129 -52.4689 104.9377<br />

17.0000 0.3400 1.0000 79.4282 -52.4689 104.9377<br />

17.0000 0.3450 1.0000 79.3447 -52.4689 104.9378<br />

. . . . . .<br />

. . . . . .<br />

. . . . . .<br />

20.0000 0.3550 1.0000 78.2003 -52.4683 104.9366<br />

20.0000 0.3600 1.0000 78.1201 -52.4684 104.9368<br />

20.0000 0.3650 1.0000 78.0409 -52.4685 104.9370<br />

20.0000 0.3700 1.0000 77.9628 -52.4687 104.9373<br />

20.0000 0.3750 1.0000 77.8857 -52.4689 104.9377<br />

20.0000 0.3800 1.0000 77.8096 -52.4691 104.9382<br />

20.0000 0.3850 1.0000 77.7345 -52.4693 104.9387<br />

20.0000 0.3900 1.0000 77.6603 -52.4696 104.9392<br />

20.0000 0.3950 1.0000 77.5871 -52.4699 104.9399<br />

20.0000 0.4000 1.0000 77.5148 -52.4703 104.9406<br />

As Output 56.3.5 shows, convergence occurs quickly because PROC <strong>MIXED</strong> starts from the best<br />

value from the grid search.<br />

Output 56.3.5 Iteration History and Convergence Status<br />

Iteration History<br />

Iteration Evaluations -2 Res Log Like Criterion<br />

1 2 104.93416367 0.00000000<br />

Convergence criteria met.<br />

<strong>The</strong> “Covariance Parameter Estimates” table in Output 56.3.6 lists the variance components estimates.<br />

Note that B is much more variable than A*B.


Output 56.3.6 Estimated Covariance Parameters<br />

Covariance Parameter Estimates<br />


Standard Z<br />

Cov Parm Estimate Error Value Pr > Z<br />

b 1464.36 2098.01 0.70 0.2426<br />

a*b 26.9581 59.6570 0.45 0.3257<br />

Residual 78.8426 35.3512 2.23 0.0129<br />

<strong>The</strong> asymptotic covariance matrix in Output 56.3.7 also reflects the large variability of B relative to<br />

A*B.<br />

Output 56.3.7 Asymptotic Covariance Matrix of Covariance Parameters<br />

Asymptotic Covariance Matrix of Estimates<br />

Row Cov Parm CovP1 CovP2 CovP3<br />

1 b 4401640 1.2831 -273.32<br />

2 a*b 1.2831 3558.96 -502.84<br />

3 Residual -273.32 -502.84 1249.71<br />

As Output 56.3.8 shows, the PARMS likelihood ratio test (LRT) compares the best model from the<br />

grid search with the final fitted model. Since these models are nearly the same, the LRT is not<br />

significant.<br />

Output 56.3.8 Fit Statistics and Likelihood Ratio Test<br />

Fit Statistics<br />

-2 Res Log Likelihood 104.9<br />

AIC (smaller is better) 110.9<br />

AICC (smaller is better) 113.6<br />

BIC (smaller is better) 107.0<br />

PARMS Model Likelihood Ratio Test<br />

DF Chi-Square Pr > ChiSq<br />

2 0.00 1.0000<br />

<strong>The</strong> mixed model equations are analogous to the normal equations in the standard linear model.<br />

As Output 56.3.9 shows, for this example, rows 1–4 correspond to the fixed effects, rows 5–12<br />

correspond to the random effects, and Col13 corresponds to the dependent variable.
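For reference, the mixed model equations are usually written in the following form. The notation is an assumption of this sketch: the hats denote the estimated covariance matrices, β the fixed effects, and γ the random effects.

```latex
\begin{bmatrix}
X'\widehat{R}^{-1}X & X'\widehat{R}^{-1}Z \\
Z'\widehat{R}^{-1}X & Z'\widehat{R}^{-1}Z + \widehat{G}^{-1}
\end{bmatrix}
\begin{bmatrix} \widehat{\beta} \\ \widehat{\gamma} \end{bmatrix}
=
\begin{bmatrix} X'\widehat{R}^{-1}y \\ Z'\widehat{R}^{-1}y \end{bmatrix}

% Spot checks against Output 56.3.9, using the estimates in Output 56.3.6
% (residual 78.8426, variance of b 1464.36) and \widehat{R} = 78.8426 I:
\text{(Intercept, Intercept)}: \quad \frac{16}{78.8426} \approx 0.2029
\qquad
(b_1, b_1): \quad \frac{8}{78.8426} + \frac{1}{1464.36} \approx 0.1015 + 0.0007 = 0.1022
```

The first check uses all 16 observations, and the second uses the 8 observations at the first level of B plus the corresponding diagonal element of the inverse of G. Both values agree with the displayed matrix.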



Output 56.3.9 Mixed Model Equations<br />

Mixed Model Equations<br />

Row Effect a b Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />

1 Intercept 0.2029 0.06342 0.07610 0.06342 0.1015 0.1015 0.03805<br />

2 a 1 0.06342 0.06342 0.03805 0.02537 0.03805<br />

3 a 2 0.07610 0.07610 0.03805 0.03805<br />

4 a 3 0.06342 0.06342 0.02537 0.03805<br />

5 b 1 0.1015 0.03805 0.03805 0.02537 0.1022 0.03805<br />

6 b 2 0.1015 0.02537 0.03805 0.03805 0.1022<br />

7 a*b 1 1 0.03805 0.03805 0.03805 0.07515<br />

8 a*b 1 2 0.02537 0.02537 0.02537<br />

9 a*b 2 1 0.03805 0.03805 0.03805<br />

10 a*b 2 2 0.03805 0.03805 0.03805<br />

11 a*b 3 1 0.02537 0.02537 0.02537<br />

12 a*b 3 2 0.03805 0.03805 0.03805<br />

Mixed Model Equations<br />

Row Col8 Col9 Col10 Col11 Col12 Col13<br />

1 0.02537 0.03805 0.03805 0.02537 0.03805 36.4143<br />

2 0.02537 13.8757<br />

3 0.03805 0.03805 12.7469<br />

4 0.02537 0.03805 9.7917<br />

5 0.03805 0.02537 21.2956<br />

6 0.02537 0.03805 0.03805 15.1187<br />

7 9.3477<br />

8 0.06246 4.5280<br />

9 0.07515 7.2676<br />

10 0.07515 5.4793<br />

11 0.06246 4.6802<br />

12 0.07515 5.1115<br />

<strong>The</strong> solution matrix in Output 56.3.10 results from sweeping all but the last row of the mixed model<br />

equations matrix. <strong>The</strong> final column contains a solution vector for the fixed and random effects. <strong>The</strong><br />

first four rows correspond to fixed effects and the last eight correspond to random effects.


Output 56.3.10 Solutions of the Mixed Model Equations<br />

Mixed Model Equations Solution<br />


Row Effect a b Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />

1 Intercept 761.84 -29.7718 -29.6578 -731.14 -733.22 -0.4680<br />

2 a 1 -29.7718 59.5436 29.7718 -2.0764 2.0764 -14.0239<br />

3 a 2 -29.6578 29.7718 56.2773 -1.0382 1.0382 0.4680<br />

4 a 3<br />

5 b 1 -731.14 -2.0764 -1.0382 741.63 722.73 -4.2598<br />

6 b 2 -733.22 2.0764 1.0382 722.73 741.63 4.2598<br />

7 a*b 1 1 -0.4680 -14.0239 0.4680 -4.2598 4.2598 22.8027<br />

8 a*b 1 2 0.4680 -12.9342 -0.4680 4.2598 -4.2598 4.1555<br />

9 a*b 2 1 -0.5257 1.0514 -12.9534 -4.7855 4.7855 2.1570<br />

10 a*b 2 2 0.5257 -1.0514 -14.0048 4.7855 -4.7855 -2.1570<br />

11 a*b 3 1 -12.4663 12.9342 12.4663 -4.2598 4.2598 1.9200<br />

12 a*b 3 2 -14.4918 14.0239 14.4918 4.2598 -4.2598 -1.9200<br />

Mixed Model Equations Solution<br />

Row Col8 Col9 Col10 Col11 Col12 Col13<br />

1 0.4680 -0.5257 0.5257 -12.4663 -14.4918 159.61<br />

2 -12.9342 1.0514 -1.0514 12.9342 14.0239 53.2049<br />

3 -0.4680 -12.9534 -14.0048 12.4663 14.4918 7.8856<br />

4<br />

5 4.2598 -4.7855 4.7855 -4.2598 4.2598 26.8837<br />

6 -4.2598 4.7855 -4.7855 4.2598 -4.2598 -26.8837<br />

7 4.1555 2.1570 -2.1570 1.9200 -1.9200 3.0198<br />

8 22.8027 -2.1570 2.1570 -1.9200 1.9200 -3.0198<br />

9 -2.1570 22.5560 4.4021 2.1570 -2.1570 -1.7134<br />

10 2.1570 4.4021 22.5560 -2.1570 2.1570 1.7134<br />

11 -1.9200 2.1570 -2.1570 22.8027 4.1555 -0.8115<br />

12 1.9200 -2.1570 2.1570 4.1555 22.8027 0.8115<br />

<strong>The</strong> A factor is significant at the 5% level (Output 56.3.11).<br />

Output 56.3.11 Tests of Fixed Effects<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

a 2 2 28.00 0.0345<br />

Output 56.3.12 shows that the significance of A appears to be from the difference between its first<br />

level and its other two levels.



Output 56.3.12 Least Squares Means for A Effect<br />

Least Squares Means<br />

Standard<br />

Effect a Estimate Error DF t Value Pr > |t|<br />

a 1 212.82 27.6014 2 7.71 0.0164<br />

a 2 167.50 27.5463 2 6.08 0.0260<br />

a 3 159.61 27.6014 2 5.78 0.0286<br />

Output 56.3.13 lists the predicted values from the model. These values are the sum of the fixed-effects<br />

estimates and the empirical best linear unbiased predictors (EBLUPs) of the random effects.<br />

Output 56.3.13 Predicted Values<br />

StdErr<br />

Obs a b y Pred Pred DF Alpha Lower Upper Resid<br />

1 1 1 237 242.723 4.72563 10 0.05 232.193 253.252 -5.7228<br />

2 1 1 254 242.723 4.72563 10 0.05 232.193 253.252 11.2772<br />

3 1 1 246 242.723 4.72563 10 0.05 232.193 253.252 3.2772<br />

4 1 2 178 182.916 5.52589 10 0.05 170.603 195.228 -4.9159<br />

5 1 2 179 182.916 5.52589 10 0.05 170.603 195.228 -3.9159<br />

6 2 1 208 192.670 4.70076 10 0.05 182.196 203.144 15.3297<br />

7 2 1 178 192.670 4.70076 10 0.05 182.196 203.144 -14.6703<br />

8 2 1 187 192.670 4.70076 10 0.05 182.196 203.144 -5.6703<br />

9 2 2 146 142.330 4.70076 10 0.05 131.856 152.804 3.6703<br />

10 2 2 145 142.330 4.70076 10 0.05 131.856 152.804 2.6703<br />

11 2 2 141 142.330 4.70076 10 0.05 131.856 152.804 -1.3297<br />

12 3 1 186 185.687 5.52589 10 0.05 173.374 197.999 0.3134<br />

13 3 1 183 185.687 5.52589 10 0.05 173.374 197.999 -2.6866<br />

14 3 2 142 133.542 4.72563 10 0.05 123.013 144.072 8.4578<br />

15 3 2 125 133.542 4.72563 10 0.05 123.013 144.072 -8.5422<br />

16 3 2 136 133.542 4.72563 10 0.05 123.013 144.072 2.4578<br />

To plot the likelihood surface by using ODS Graphics, use the following statements:<br />

proc template;<br />

define statgraph surface;<br />

begingraph;<br />

layout overlay3d;<br />

surfaceplotparm x=CovP1 y=CovP2 z=ResLogLike;<br />

endlayout;<br />

endgraph;<br />

end;<br />

run;<br />

proc sgrender data=parms template=surface;<br />

run;<br />

The results from this plot are shown in Output 56.3.14. The peak of the surface corresponds to the REML<br />

estimates for the B and A*B variance components.


Output 56.3.14 Plot of Likelihood Surface<br />

Example 56.4: Known G and R<br />


This animal breeding example from Henderson (1984, p. 48) considers multiple traits. <strong>The</strong> data<br />

are artificial and consist of measurements of two traits on three animals, but the second trait of the<br />

third animal is missing. Assuming an additive genetic model, you can use PROC <strong>MIXED</strong> to predict<br />

the breeding value of both traits on all three animals and also to predict the second trait of the third<br />

animal. <strong>The</strong> data are as follows:<br />

data h;<br />

input Trait Animal Y;<br />

datalines;<br />

1 1 6<br />

1 2 8<br />

1 3 7<br />

2 1 9<br />

2 2 5<br />

2 3 .<br />

;



Both G and R are known:<br />

$$
\mathbf{G} = \begin{bmatrix}
2 & 1 & 1 & 2 & 1 & 1 \\
1 & 2 & 0.5 & 1 & 2 & 0.5 \\
1 & 0.5 & 2 & 1 & 0.5 & 2 \\
2 & 1 & 1 & 3 & 1.5 & 1.5 \\
1 & 2 & 0.5 & 1.5 & 3 & 0.75 \\
1 & 0.5 & 2 & 1.5 & 0.75 & 3
\end{bmatrix}
$$

$$
\mathbf{R} = \begin{bmatrix}
4 & 0 & 0 & 1 & 0 & 0 \\
0 & 4 & 0 & 0 & 1 & 0 \\
0 & 0 & 4 & 0 & 0 & 1 \\
1 & 0 & 0 & 5 & 0 & 0 \\
0 & 1 & 0 & 0 & 5 & 0 \\
0 & 0 & 1 & 0 & 0 & 5
\end{bmatrix}
$$

In order to read G into PROC <strong>MIXED</strong> by using the GDATA= option in the RANDOM statement,<br />

perform the following DATA step:<br />

data g;<br />

input Row Col1-Col6;<br />

datalines;<br />

1 2 1 1 2 1 1<br />

2 1 2 .5 1 2 .5<br />

3 1 .5 2 1 .5 2<br />

4 2 1 1 3 1.5 1.5<br />

5 1 2 .5 1.5 3 .75<br />

6 1 .5 2 1.5 .75 3<br />

;<br />

<strong>The</strong> preceding data are in the dense representation for a GDATA= data set. You can also construct<br />

a data set with the sparse representation by using Row, Col, and Value variables, although this would<br />

require 21 observations instead of 6 for this example.<br />
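The sparse form can be derived from the dense data set with a short DATA step. The following sketch is illustrative: the data set name gsparse and the array name dense are hypothetical, and it assumes that the 21 observations mentioned in the text are the elements on or below the diagonal of the symmetric matrix.

```sas
data gsparse;
   set g;
   array dense{6} Col1-Col6;
   /* Assumption: write one observation for each element */
   /* on or below the diagonal (Col <= Row)              */
   do Col = 1 to Row;
      Value = dense{Col};
      output;
   end;
   keep Row Col Value;
run;
```

The loop writes 1 + 2 + ... + 6 = 21 observations, which matches the count given in the text.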

<strong>The</strong> PROC <strong>MIXED</strong> statements are as follows:<br />

proc mixed data=h mmeq mmeqsol;<br />

class Trait Animal;<br />

model Y = Trait / noint s outp=predicted;<br />

random Trait*Animal / type=un gdata=g g gi s;<br />

repeated / type=un sub=Animal r ri;<br />

parms (4) (1) (5) / noiter;<br />

run;<br />

proc print data=predicted;<br />

run;<br />

<strong>The</strong> MMEQ and MMEQSOL options request the mixed model equations and their solution. <strong>The</strong><br />

variables Trait and Animal are classification variables, and Trait defines the entire X matrix for the<br />

fixed-effects portion of the model, since the intercept is omitted with the NOINT option. The fixed-effects<br />

solution vector and predicted values are also requested by using the S and OUTP= options,<br />

respectively.



<strong>The</strong> random effect Trait*Animal leads to a Z matrix with six columns, the first five corresponding to<br />

the identity matrix and the last consisting of 0s. An unstructured G matrix is specified by using the<br />

TYPE=UN option, and it is read into PROC <strong>MIXED</strong> from a <strong>SAS</strong> data set by using the GDATA=G<br />

specification. The G and GI options request the display of G and G⁻¹, respectively. The S option<br />

requests that the random-effects solution vector be displayed.<br />

Note that the preceding R matrix is block diagonal if the data are sorted by animals. <strong>The</strong><br />

REPEATED statement exploits this fact by requesting R to have unstructured 2 × 2 blocks corresponding<br />

to animals, which are the subjects. The R and RI options request that the estimated 2 × 2<br />

blocks for the first animal and its inverse be displayed. The PARMS statement lists the parameters<br />

of this 2 × 2 matrix. Note that the parameters from G are not specified in the PARMS statement<br />

because they have already been assigned by using the GDATA= option in the RANDOM statement.<br />

<strong>The</strong> NOITER option prevents PROC <strong>MIXED</strong> from computing residual (restricted) maximum likelihood<br />

estimates; instead, the known values are used for inferences.<br />

<strong>The</strong> results from this analysis are shown in Output 56.4.1–Output 56.4.12.<br />

<strong>The</strong> “Unstructured” covariance structure (Output 56.4.1) applies to both G and R here. <strong>The</strong> levels<br />

of Trait and Animal have been specified correctly.<br />

Output 56.4.1 Model and Class Level Information<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Model Information<br />

Data Set WORK.H<br />

Dependent Variable Y<br />

Covariance Structure Unstructured<br />

Subject Effect Animal<br />

Estimation Method REML<br />

Residual Variance Method None<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

Class Level Information<br />

Class Levels Values<br />

Trait 2 1 2<br />

Animal 3 1 2 3<br />

<strong>The</strong> three covariance parameters indicated in Output 56.4.2 correspond to those from the R matrix.<br />

Those from G are considered fixed and known because of the GDATA= option.



Output 56.4.2 Model Dimensions and Number of Observations<br />

Dimensions<br />

Covariance Parameters 3<br />

Columns in X 2<br />

Columns in Z 6<br />

Subjects 1<br />

Max Obs Per Subject 6<br />

Number of Observations<br />

Number of Observations Read 6<br />

Number of Observations Used 5<br />

Number of Observations Not Used 1<br />

Because starting values for the covariance parameters are specified in the PARMS statement, the<br />

<strong>MIXED</strong> procedure prints the residual (restricted) log likelihood at the starting values. Because of<br />

the NOITER option in the PARMS statement, this is also the final log likelihood in this analysis<br />

(Output 56.4.3).<br />

Output 56.4.3 REML Log Likelihood<br />

Parameter Search<br />

CovP1 CovP2 CovP3 Res Log Like -2 Res Log Like<br />

4.0000 1.0000 5.0000 -7.3731 14.7463<br />

<strong>The</strong> block of R corresponding to the first animal and the inverse of this block are shown in<br />

Output 56.4.4.<br />

Output 56.4.4 Inverse R Matrix<br />

Estimated R Matrix<br />

for Animal 1<br />

Row Col1 Col2<br />

1 4.0000 1.0000<br />

2 1.0000 5.0000<br />

Estimated Inv(R) Matrix<br />

for Animal 1<br />

Row Col1 Col2<br />

1 0.2632 -0.05263<br />

2 -0.05263 0.2105
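The inverse in Output 56.4.4 can be checked by hand with the closed-form inverse of a 2 × 2 matrix. The following Python sketch is an illustration only and is not part of the SAS analysis; the helper name `inverse_2x2` is hypothetical.

```python
def inverse_2x2(matrix):
    """Invert [[a, b], [c, d]] with the adjugate formula (hypothetical helper)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# R block for Animal 1, as printed in Output 56.4.4.
r_block = [[4.0, 1.0], [1.0, 5.0]]
r_inv = inverse_2x2(r_block)

for row in r_inv:
    print("  ".join(f"{value:.4g}" for value in row))
```

The determinant is 4 × 5 − 1 × 1 = 19, so the entries are 5/19, −1/19, and 4/19, which round to the values in the Inv(R) listing.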


Example 56.4: Known G and R ✦ 4037<br />

The G matrix as specified in the GDATA= data set and its inverse are shown in Output 56.4.5 and Output 56.4.6. Cells that the listing leaves blank are zeros and are written as 0.<br />

Output 56.4.5 G Matrix<br />

Estimated G Matrix<br />

Row Effect Trait Animal Col1 Col2 Col3 Col4 Col5 Col6<br />

1 Trait*Animal 1 1 2.0000 1.0000 1.0000 2.0000 1.0000 1.0000<br />

2 Trait*Animal 1 2 1.0000 2.0000 0.5000 1.0000 2.0000 0.5000<br />

3 Trait*Animal 1 3 1.0000 0.5000 2.0000 1.0000 0.5000 2.0000<br />

4 Trait*Animal 2 1 2.0000 1.0000 1.0000 3.0000 1.5000 1.5000<br />

5 Trait*Animal 2 2 1.0000 2.0000 0.5000 1.5000 3.0000 0.7500<br />

6 Trait*Animal 2 3 1.0000 0.5000 2.0000 1.5000 0.7500 3.0000<br />

Output 56.4.6 Inverse G Matrix<br />

Estimated Inv(G) Matrix<br />

Row Effect Trait Animal Col1 Col2 Col3 Col4 Col5 Col6<br />

1 Trait*Animal 1 1 2.5000 -1.0000 -1.0000 -1.6667 0.6667 0.6667<br />

2 Trait*Animal 1 2 -1.0000 2.0000 0 0.6667 -1.3333 0<br />

3 Trait*Animal 1 3 -1.0000 0 2.0000 0.6667 0 -1.3333<br />

4 Trait*Animal 2 1 -1.6667 0.6667 0.6667 1.6667 -0.6667 -0.6667<br />

5 Trait*Animal 2 2 0.6667 -1.3333 0 -0.6667 1.3333 0<br />

6 Trait*Animal 2 3 0.6667 0 -1.3333 -0.6667 0 1.3333<br />
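As a consistency check on the two listings, the product of G and Inv(G) should be close to the identity matrix. The Python sketch below assumes that the entries of Inv(G) that PROC MIXED prints as blanks are exact zeros; the matrix layout and the helper `matmul` are illustrative and do not come from the source.

```python
# G from Output 56.4.5 (order: Trait 1 Animals 1-3, then Trait 2 Animals 1-3).
G = [
    [2.0, 1.0, 1.0, 2.0, 1.0, 1.0],
    [1.0, 2.0, 0.5, 1.0, 2.0, 0.5],
    [1.0, 0.5, 2.0, 1.0, 0.5, 2.0],
    [2.0, 1.0, 1.0, 3.0, 1.5, 1.5],
    [1.0, 2.0, 0.5, 1.5, 3.0, 0.75],
    [1.0, 0.5, 2.0, 1.5, 0.75, 3.0],
]

# Inv(G) from Output 56.4.6; blank cells are assumed to be zeros.
G_INV = [
    [2.5, -1.0, -1.0, -1.6667, 0.6667, 0.6667],
    [-1.0, 2.0, 0.0, 0.6667, -1.3333, 0.0],
    [-1.0, 0.0, 2.0, 0.6667, 0.0, -1.3333],
    [-1.6667, 0.6667, 0.6667, 1.6667, -0.6667, -0.6667],
    [0.6667, -1.3333, 0.0, -0.6667, 1.3333, 0.0],
    [0.6667, 0.0, -1.3333, -0.6667, 0.0, 1.3333],
]


def matmul(a, b):
    """Plain nested-list matrix product (hypothetical helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


product = matmul(G, G_INV)
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(6) for j in range(6))
print(f"largest deviation from the identity: {max_error:.1e}")
```

The remaining deviation comes only from the four-decimal rounding of the printed entries.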

The table of covariance parameter estimates in Output 56.4.7 displays only the parameters in R. Because of the GDATA= option in the RANDOM statement, the G-side parameters do not participate in the parameter estimation process. Because of the NOITER option in the PARMS statement, however, the R-side parameters in this output are identical to their starting values.<br />

Output 56.4.7 R-Side Covariance Parameters<br />

Covariance Parameter Estimates<br />

Cov Parm Subject Estimate<br />

UN(1,1) Animal 4.0000<br />

UN(2,1) Animal 1.0000<br />

UN(2,2) Animal 5.0000<br />

The coefficients of the mixed model equations in Output 56.4.8 agree with Henderson (1984, p. 55). Recall from Output 56.4.2 that there are 2 columns in X and 6 columns in Z. The first 8 columns of the mixed model equations correspond to the X and Z components. Column 9 represents the Y border.<br />

Output 56.4.8 Mixed Model Equations with Y Border<br />

Mixed Model Equations<br />

Row Effect Trait Animal Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9<br />

1 Trait 1 0.7763 -0.1053 0.2632 0.2632 0.2500 -0.05263 -0.05263 0 4.6974<br />

2 Trait 2 -0.1053 0.4211 -0.05263 -0.05263 0 0.2105 0.2105 0 2.2105<br />

3 Trait*Animal 1 1 0.2632 -0.05263 2.7632 -1.0000 -1.0000 -1.7193 0.6667 0.6667 1.1053<br />

4 Trait*Animal 1 2 0.2632 -0.05263 -1.0000 2.2632 0 0.6667 -1.3860 0 1.8421<br />

5 Trait*Animal 1 3 0.2500 0 -1.0000 0 2.2500 0.6667 0 -1.3333 1.7500<br />

6 Trait*Animal 2 1 -0.05263 0.2105 -1.7193 0.6667 0.6667 1.8772 -0.6667 -0.6667 1.5789<br />

7 Trait*Animal 2 2 -0.05263 0.2105 0.6667 -1.3860 0 -0.6667 1.5439 0 0.6316<br />

8 Trait*Animal 2 3 0 0 0.6667 0 -1.3333 -0.6667 0 1.3333 0<br />

The solution to the mixed model equations also matches that given by Henderson (1984, p. 55). After solving the augmented mixed model equations, you can find the solutions for fixed and random effects in the last column (Output 56.4.9).<br />


Output 56.4.9 Solutions of the Mixed Model Equations with Y Border<br />

Mixed Model Equations Solution<br />


Row Effect Trait Animal Col1 Col2 Col3 Col4<br />

1 Trait 1 2.5508 1.5685 -1.3047 -1.1775<br />

2 Trait 2 1.5685 4.5539 -1.4112 -1.3534<br />

3 Trait*Animal 1 1 -1.3047 -1.4112 1.8282 1.0652<br />

4 Trait*Animal 1 2 -1.1775 -1.3534 1.0652 1.7589<br />

5 Trait*Animal 1 3 -1.1701 -0.9410 1.0206 0.7085<br />

6 Trait*Animal 2 1 -1.3002 -2.1592 1.8010 1.0900<br />

7 Trait*Animal 2 2 -1.1821 -2.1055 1.0925 1.7341<br />

8 Trait*Animal 2 3 -1.1678 -1.3149 1.0070 0.7209<br />

Mixed Model Equations Solution<br />

Row Col5 Col6 Col7 Col8 Col9<br />

1 -1.1701 -1.3002 -1.1821 -1.1678 6.9909<br />

2 -0.9410 -2.1592 -2.1055 -1.3149 6.9959<br />

3 1.0206 1.8010 1.0925 1.0070 0.05450<br />

4 0.7085 1.0900 1.7341 0.7209 -0.04955<br />

5 1.7812 1.0095 0.7197 1.7756 0.02230<br />

6 1.0095 2.7518 1.6392 1.4849 0.2651<br />

7 0.7197 1.6392 2.6874 0.9930 -0.2601<br />

8 1.7756 1.4849 0.9930 2.7645 0.1276<br />

The solutions for the fixed and random effects in Output 56.4.10 correspond to the last column in Output 56.4.9. Note that the standard errors for the fixed effects and the prediction standard errors for the random effects are the square roots of the diagonal entries in the solution of the mixed model equations (Output 56.4.9).<br />

Output 56.4.10 Solutions for Fixed and Random Effects<br />

Solution for Fixed Effects<br />

Standard<br />

Effect Trait Estimate Error DF t Value Pr > |t|<br />

Trait 1 6.9909 1.5971 3 4.38 0.0221<br />

Trait 2 6.9959 2.1340 3 3.28 0.0465<br />

Solution for Random Effects<br />

Std Err<br />

Effect Trait Animal Estimate Pred DF t Value Pr > |t|<br />

Trait*Animal 1 1 0.05450 1.3521 0 0.04 .<br />

Trait*Animal 1 2 -0.04955 1.3262 0 -0.04 .<br />

Trait*Animal 1 3 0.02230 1.3346 0 0.02 .<br />

Trait*Animal 2 1 0.2651 1.6589 0 0.16 .<br />

Trait*Animal 2 2 -0.2601 1.6393 0 -0.16 .<br />

Trait*Animal 2 3 0.1276 1.6627 0 0.08 .
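The relationship between Output 56.4.9 and Output 56.4.10 can be verified numerically. The Python sketch below takes the eight diagonal entries of the solution matrix and compares their square roots with the reported standard errors; it also recomputes the two t values as estimate divided by standard error. The variable names are illustrative.

```python
import math

# Diagonal entries (Row k, Col k) of the solution in Output 56.4.9.
diagonal = [2.5508, 4.5539, 1.8282, 1.7589, 1.7812, 2.7518, 2.6874, 2.7645]

# Standard errors reported in Output 56.4.10, in the same order.
reported_se = [1.5971, 2.1340, 1.3521, 1.3262, 1.3346, 1.6589, 1.6393, 1.6627]

computed_se = [math.sqrt(value) for value in diagonal]
for computed, reported in zip(computed_se, reported_se):
    print(f"sqrt of diagonal = {computed:.4f}   reported = {reported:.4f}")

# t value = estimate / standard error for the two fixed effects.
t_values = [6.9909 / 1.5971, 6.9959 / 2.1340]
print([f"{t:.2f}" for t in t_values])
```

Small differences in the last digit are expected because the diagonal entries are themselves rounded to four decimals.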


The estimates for the two traits are nearly identical, but the standard error of the second trait is larger because of the missing observation.<br />

The Estimate column in the “Solution for Random Effects” table lists the best linear unbiased predictions (BLUPs) of the breeding values of both traits for all three animals. The p-values are missing because the default containment method for computing degrees of freedom results in zero degrees of freedom for the random effects parameter tests.<br />

Output 56.4.11 Significance Test Comparing Traits<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Trait 2 3 10.59 0.0437<br />

The two trait estimates are jointly significantly different from zero at the 5% level (Output 56.4.11).<br />

Output 56.4.12 displays the predicted values of the observations based on the trait and breeding<br />

value estimates—that is, the fixed and random effects.<br />

Output 56.4.12 Predicted Observations<br />

StdErr<br />

Obs Trait Animal Y Pred Pred DF Alpha Lower Upper Resid<br />

1 1 1 6 7.04542 1.33027 0 0.05 . . -1.04542<br />

2 1 2 8 6.94137 1.39806 0 0.05 . . 1.05863<br />

3 1 3 7 7.01321 1.41129 0 0.05 . . -0.01321<br />

4 2 1 9 7.26094 1.72839 0 0.05 . . 1.73906<br />

5 2 2 5 6.73576 1.74077 0 0.05 . . -1.73576<br />

6 2 3 . 7.12015 2.99088 0 0.05 . . .<br />
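For the five observations that are used in the analysis, each predicted value is the trait estimate plus the corresponding BLUP. The Python sketch below checks this against Output 56.4.12, using the rounded estimates from Output 56.4.10; observation 6, whose response is missing, is deliberately left out, and the dictionary layout is an assumption made for illustration.

```python
# Fixed-effect estimates by trait (Output 56.4.10).
trait_estimate = {1: 6.9909, 2: 6.9959}

# BLUPs by (trait, animal) for the combinations with an observed response.
blup = {(1, 1): 0.05450, (1, 2): -0.04955, (1, 3): 0.02230,
        (2, 1): 0.2651, (2, 2): -0.2601}

# (trait, animal, observed Y, Pred reported in Output 56.4.12).
observations = [
    (1, 1, 6.0, 7.04542),
    (1, 2, 8.0, 6.94137),
    (1, 3, 7.0, 7.01321),
    (2, 1, 9.0, 7.26094),
    (2, 2, 5.0, 6.73576),
]

predictions = []
for trait, animal, y, reported in observations:
    predicted = trait_estimate[trait] + blup[(trait, animal)]
    predictions.append((predicted, reported, y - predicted))
    print(f"trait {trait} animal {animal}: pred {predicted:.4f} "
          f"(reported {reported:.5f}), residual {y - predicted:.4f}")
```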

The predicted values are not the predictions of future records in the sense that they do not contain a component corresponding to a new observational error. See Henderson (1984) for information about predicting future records. The Lower and Upper columns usually contain confidence limits for the predicted values; they are missing here because the random-effects parameter degrees of freedom equal 0.<br />


Example 56.5: Random Coefficients<br />

This example comes from a pharmaceutical stability data simulation performed by Obenchain (1990). The observed responses are replicate assay results, expressed in percent of label claim, at various shelf ages, expressed in months. The desired mixed model involves three batches of product that differ randomly in intercept (initial potency) and slope (degradation rate). This type of model is also known as a hierarchical or multilevel model (Singer 1998; Sullivan, Dukes, and Losina 1999).<br />
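One way to write the model just described is sketched below. The notation is assumed here for illustration and is not taken from the source: $\beta_0$ and $\beta_1$ are the fixed intercept and slope, $b_{0i}$ and $b_{1i}$ are the random deviations of batch $i$, and $e_{ij}$ is the residual of the $j$th assay result in batch $i$.

```latex
\[
Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\mathrm{Month}_{ij} + e_{ij},
\qquad i = 1, 2, 3,
\]
\[
(b_{0i},\, b_{1i})' \sim N(\mathbf{0}, \mathbf{G}_i),
\qquad
e_{ij} \sim N(0, \sigma^2).
\]
```

Under this reading, the fixed effects describe the average initial potency and degradation rate across batches, and the 2 × 2 matrix $\mathbf{G}_i$ holds the variances and the covariance of the batch-specific deviations.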

The SAS statements are as follows:<br />

data rc;<br />

input Batch Month @@;<br />

Monthc = Month;<br />

do i = 1 to 6;<br />

input Y @@;<br />

output;<br />

end;<br />

datalines;<br />

1 0 101.2 103.3 103.3 102.1 104.4 102.4<br />

1 1 98.8 99.4 99.7 99.5 . .<br />

1 3 98.4 99.0 97.3 99.8 . .<br />

1 6 101.5 100.2 101.7 102.7 . .<br />

1 9 96.3 97.2 97.2 96.3 . .<br />

1 12 97.3 97.9 96.8 97.7 97.7 96.7<br />

2 0 102.6 102.7 102.4 102.1 102.9 102.6<br />

2 1 99.1 99.0 99.9 100.6 . .<br />

2 3 105.7 103.3 103.4 104.0 . .<br />

2 6 101.3 101.5 100.9 101.4 . .<br />

2 9 94.1 96.5 97.2 95.6 . .<br />

2 12 93.1 92.8 95.4 92.2 92.2 93.0<br />

3 0 105.1 103.9 106.1 104.1 103.7 104.6<br />

3 1 102.2 102.0 100.8 99.8 . .<br />

3 3 101.2 101.8 100.8 102.6 . .<br />

3 6 101.1 102.0 100.1 100.2 . .<br />

3 9 100.9 99.5 102.2 100.8 . .<br />

3 12 97.8 98.3 96.9 98.4 96.9 96.5<br />

;<br />

proc mixed data=rc;<br />

class Batch;<br />

model Y = Month / s;<br />

random Int Month / type=un sub=Batch s;<br />

run;<br />

In the DATA step, Monthc is created as a duplicate of Month in order to enable both a continuous and a classification version of the same variable. The variable Monthc is used in a subsequent analysis.<br />

In the PROC MIXED statements, Batch is listed as the only classification variable. The fixed effect Month in the MODEL statement is not declared as a classification variable; thus it models a linear trend in time. An intercept is included as a fixed effect by default, and the S option requests that the fixed-effects parameter estimates be produced.<br />

The two random effects are Int and Month, modeling random intercepts and slopes, respectively. Note that Intercept and Month are used as both fixed and random effects. The TYPE=UN option in the RANDOM statement specifies an unstructured covariance matrix for the random intercept and slope effects. In mixed model notation, G is block diagonal with unstructured 2 × 2 blocks. Each block corresponds to a different level of Batch, which is the SUBJECT= effect. The unstructured type provides a mechanism for estimating the correlation between the random coefficients. The S option requests the production of the random-effects parameter estimates.<br />

The results from this analysis are shown in Output 56.5.1–Output 56.5.9. The “Unstructured” covariance structure in Output 56.5.1 applies to G here.<br />

Output 56.5.1 Model Information in Random Coefficients Analysis<br />

The Mixed Procedure<br />

Model Information<br />

Data Set WORK.RC<br />

Dependent Variable Y<br />

Covariance Structure Unstructured<br />

Subject Effect Batch<br />

Estimation Method REML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

Batch is the only classification variable in this analysis, and it has three levels (Output 56.5.2).<br />

Output 56.5.2 Random Coefficients Analysis (continued)<br />

Class Level Information<br />

Class Levels Values<br />

Batch 3 1 2 3<br />

The “Dimensions” table in Output 56.5.3 indicates that there are three subjects (corresponding to batches). The 24 observations not used correspond to the missing values of Y in the input data set.<br />

Output 56.5.3 Random Coefficients Analysis (continued)<br />

Dimensions<br />

Covariance Parameters 4<br />

Columns in X 2<br />

Columns in Z Per Subject 2<br />

Subjects 3<br />

Max Obs Per Subject 36


Output 56.5.3 continued<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 84<br />

Number of Observations Not Used 24<br />
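The observation counts in Output 56.5.3 follow directly from the way the DATA step reshapes the input: every data line supplies a Batch, a Month, and six Y values, and one record is written per Y value, including the missing ones. The Python sketch below mimics that reshaping to recount the records; it is an illustration of the bookkeeping, not a replacement for the DATA step.

```python
RAW = """\
1 0 101.2 103.3 103.3 102.1 104.4 102.4
1 1 98.8 99.4 99.7 99.5 . .
1 3 98.4 99.0 97.3 99.8 . .
1 6 101.5 100.2 101.7 102.7 . .
1 9 96.3 97.2 97.2 96.3 . .
1 12 97.3 97.9 96.8 97.7 97.7 96.7
2 0 102.6 102.7 102.4 102.1 102.9 102.6
2 1 99.1 99.0 99.9 100.6 . .
2 3 105.7 103.3 103.4 104.0 . .
2 6 101.3 101.5 100.9 101.4 . .
2 9 94.1 96.5 97.2 95.6 . .
2 12 93.1 92.8 95.4 92.2 92.2 93.0
3 0 105.1 103.9 106.1 104.1 103.7 104.6
3 1 102.2 102.0 100.8 99.8 . .
3 3 101.2 101.8 100.8 102.6 . .
3 6 101.1 102.0 100.1 100.2 . .
3 9 100.9 99.5 102.2 100.8 . .
3 12 97.8 98.3 96.9 98.4 96.9 96.5
"""

records = []
for line in RAW.strip().splitlines():
    fields = line.split()
    batch, month = int(fields[0]), int(fields[1])
    for token in fields[2:8]:
        # A period is the SAS missing value; keep the record but flag Y as None.
        y = None if token == "." else float(token)
        records.append((batch, month, y))

n_read = len(records)
n_used = sum(1 for _, _, y in records if y is not None)
per_batch = {b: sum(1 for batch, _, _ in records if batch == b) for b in (1, 2, 3)}
print(n_read, n_used, n_read - n_used, per_batch)
```

Eighteen data lines with six values each give 108 records, 36 per batch, and the twelve lines with two missing values account for the 24 records that are not used.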

As Output 56.5.4 shows, only one iteration is required for convergence.<br />

Output 56.5.4 Random Coefficients Analysis (continued)<br />

Iteration History<br />


Iteration Evaluations -2 Res Log Like Criterion<br />

0 1 367.02768461<br />

1 1 350.32813577 0.00000000<br />

Convergence criteria met.<br />

The Estimate column in Output 56.5.5 lists the estimated elements of the unstructured 2 × 2 matrix comprising the blocks of G. Note that the random coefficients are negatively correlated.<br />

Output 56.5.5 Random Coefficients Analysis (continued)<br />

Covariance Parameter Estimates<br />

Cov Parm Subject Estimate<br />

UN(1,1) Batch 0.9768<br />

UN(2,1) Batch -0.1045<br />

UN(2,2) Batch 0.03717<br />

Residual 3.2932<br />
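The size of the negative correlation can be read off the three G-side estimates by dividing the covariance by the product of the two standard deviations. The short Python sketch below does this with the rounded values from Output 56.5.5; the variable names are illustrative.

```python
import math

un11 = 0.9768    # variance of the random intercepts, UN(1,1)
un21 = -0.1045   # covariance of intercept and slope, UN(2,1)
un22 = 0.03717   # variance of the random slopes, UN(2,2)

correlation = un21 / math.sqrt(un11 * un22)
print(f"estimated correlation of random intercept and slope: {correlation:.2f}")
```

With these inputs the correlation is roughly −0.55, so batches with higher initial potency tend to have steeper estimated degradation.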

The null model likelihood ratio test indicates a significant improvement over the null model consisting of no random effects and a homogeneous residual error (Output 56.5.6).<br />

Output 56.5.6 Random Coefficients Analysis (continued)<br />

Fit Statistics<br />

-2 Res Log Likelihood 350.3<br />

AIC (smaller is better) 358.3<br />

AICC (smaller is better) 358.8<br />

BIC (smaller is better) 354.7



Output 56.5.6 continued<br />

Null Model Likelihood Ratio Test<br />

DF Chi-Square Pr > ChiSq<br />

3 16.70 0.0008<br />
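The chi-square statistic of the null model likelihood ratio test is the drop in −2 Res Log Like between iteration 0 and the final iteration in Output 56.5.4. The Python sketch below recomputes it and its p-value; the closed-form tail probability used for 3 degrees of freedom is a standard identity that is supplied here, not something stated in the source.

```python
import math


def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


neg2_res_ll_null = 367.02768461    # iteration 0 in Output 56.5.4
neg2_res_ll_final = 350.32813577   # iteration 1 in Output 56.5.4

chi_square = neg2_res_ll_null - neg2_res_ll_final
p_value = chi2_upper_tail_3df(chi_square)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The 3 degrees of freedom correspond to the three G-side parameters that the fitted model adds to the single residual variance of the null model.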

The fixed-effects estimates represent the estimated means for the random intercept and slope, respectively (Output 56.5.7).<br />

Output 56.5.7 Random Coefficients Analysis (continued)<br />

Solution for Fixed Effects<br />

Standard<br />

Effect Estimate Error DF t Value Pr > |t|<br />

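Intercept 102.70 0.6456 2 159.08 <.0001<br />

…<br />

Output 56.5.8 Random Coefficients Analysis (continued)<br />

Solution for Random Effects<br />

Std Err<br />

Effect Batch Estimate Pred DF t Value Pr > |t|<br />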

Intercept 1 -1.0010 0.6842 78 -1.46 0.1474<br />

Month 1 0.1287 0.1245 78 1.03 0.3047<br />

Intercept 2 0.3934 0.6842 78 0.58 0.5669<br />

Month 2 -0.2060 0.1245 78 -1.65 0.1021<br />

Intercept 3 0.6076 0.6842 78 0.89 0.3772<br />

Month 3 0.07731 0.1245 78 0.62 0.5365<br />

The F statistic in the “Type 3 Tests of Fixed Effects” table in Output 56.5.9 is the square of the t statistic used in the test of Month in the preceding “Solution for Fixed Effects” table (compare Output 56.5.7 and Output 56.5.9). Both statistics test the null hypothesis that the slope assigned to Month equals 0, and this hypothesis can barely be rejected at the 5% level.<br />


Output 56.5.9 Random Coefficients Analysis (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Month 1 2 19.41 0.0478<br />
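The link between the F and t statistics can be illustrated with the values in Output 56.5.9. With 2 denominator degrees of freedom the t distribution has a closed-form distribution function, which the Python sketch below uses to recover the p-value from the F statistic; this identity and the function name are supplied here as assumptions and are not part of the source.

```python
import math


def two_sided_p_t2(t):
    """Two-sided p-value of a t statistic with exactly 2 degrees of freedom."""
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)


f_value = 19.41                 # F Value for Month in Output 56.5.9
t_abs = math.sqrt(f_value)      # magnitude of the matching t statistic
p_value = two_sided_p_t2(t_abs)
print(f"|t| = {t_abs:.2f}, p = {p_value:.4f}")
```

Because the F value is rounded to two decimals, the recomputed p-value agrees with the printed 0.0478 only to about four decimal places.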


It is also possible to fit a random coefficients model with error terms that follow a nested structure (Fuller and Battese 1973). The following SAS statements represent one way of doing this:<br />

proc mixed data=rc;<br />

class Batch Monthc;<br />

model Y = Month / s;<br />

random Int Month Monthc / sub=Batch s;<br />

run;<br />

The variable Monthc is added to the CLASS and RANDOM statements, and it models the nested errors. Note that Month and Monthc are continuous and classification versions of the same variable. Also, the TYPE=UN option is dropped from the RANDOM statement, resulting in the default variance components model instead of correlated random coefficients. The results from this analysis are shown in Output 56.5.10.<br />

Output 56.5.10 Random Coefficients with Nested Errors Analysis<br />

The Mixed Procedure<br />

Model Information<br />

Data Set WORK.RC<br />

Dependent Variable Y<br />

Covariance Structure Variance Components<br />

Subject Effect Batch<br />

Estimation Method REML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

Class Level Information<br />

Class Levels Values<br />

Batch 3 1 2 3<br />

Monthc 6 0 1 3 6 9 12<br />

Dimensions<br />

Covariance Parameters 4<br />

Columns in X 2<br />

Columns in Z Per Subject 8<br />

Subjects 3<br />

Max Obs Per Subject 36



Output 56.5.10 continued<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 84<br />

Number of Observations Not Used 24<br />

Iteration History<br />

Iteration Evaluations -2 Res Log Like Criterion<br />

0 1 367.02768461<br />

1 4 277.51945360 .<br />

2 1 276.97551718 0.00104208<br />

3 1 276.90304909 0.00003174<br />

4 1 276.90100316 0.00000004<br />

5 1 276.90100092 0.00000000<br />

Convergence criteria met.<br />

Covariance Parameter Estimates<br />

Cov Parm Subject Estimate<br />

Intercept Batch 0<br />

Month Batch 0.01243<br />

Monthc Batch 3.7411<br />

Residual 0.7969<br />

For this analysis, the Newton-Raphson algorithm requires five iterations and nine likelihood evaluations to achieve convergence. The missing value in the Criterion column in iteration 1 indicates that a boundary constraint has been dropped.<br />

The estimate for the Intercept variance component equals 0. This occurs frequently in practice and indicates that the restricted likelihood is maximized by setting this variance component equal to 0. Whenever a zero variance component estimate occurs, the following note appears in the SAS log:<br />

NOTE: Estimated G matrix is not positive definite.<br />

The remaining variance component estimates are positive, and the estimate corresponding to the nested errors (Monthc) is much larger than the other two.<br />

A comparison of AIC and BIC for this model with those of the previous model favors the nested<br />

error model (compare Output 56.5.11 and Output 56.5.6). Strictly speaking, a likelihood ratio test<br />

cannot be carried out between the two models because one is not contained in the other; however, a<br />

cautious comparison of likelihoods can be informative.



Output 56.5.11 Random Coefficients with Nested Errors Analysis (continued)<br />

Fit Statistics<br />

-2 Res Log Likelihood 276.9<br />

AIC (smaller is better) 282.9<br />

AICC (smaller is better) 283.2<br />

BIC (smaller is better) 280.2<br />
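The fit statistics of the two random coefficient models can be reproduced from their −2 residual log likelihoods. The Python sketch below uses the usual REML forms AIC = −2ℓ + 2d, AICC = −2ℓ + 2dn*/(n* − d − 1), and BIC = −2ℓ + d log(s), where d is the number of covariance parameters counted, n* is the number of observations used minus the number of columns in X, and s is the number of subjects. Two points are assumptions made here rather than statements from the source: that these are the formulas behind the tables, and that the nested-errors model counts d = 3 because the Intercept variance is estimated as 0 on the boundary.

```python
import math


def fit_statistics(neg2_res_ll, n_cov_parms, n_obs_used, n_x_columns, n_subjects):
    """Return (AIC, AICC, BIC) in smaller-is-better form for a REML fit."""
    d = n_cov_parms
    n_star = n_obs_used - n_x_columns
    aic = neg2_res_ll + 2 * d
    aicc = neg2_res_ll + 2 * d * n_star / (n_star - d - 1)
    bic = neg2_res_ll + d * math.log(n_subjects)
    return aic, aicc, bic


# Unstructured random coefficients model (Output 56.5.4 and Output 56.5.6).
unstructured = fit_statistics(350.32813577, 4, 84, 2, 3)

# Nested errors model; d = 3 is an assumption (zero Intercept variance not counted).
nested = fit_statistics(276.90100092, 3, 84, 2, 3)

for label, stats in (("unstructured", unstructured), ("nested errors", nested)):
    print(label, " ".join(f"{value:.1f}" for value in stats))
```

Under these assumptions the results match the printed AIC, AICC, and BIC of both models to one decimal place, and all three criteria are smaller for the nested errors model.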

The better-fitting covariance model affects the standard errors of the fixed-effects parameter estimates more than the estimates themselves (Output 56.5.12).<br />

Output 56.5.12 Random Coefficients with Nested Errors Analysis (continued)<br />

Solution for Fixed Effects<br />

Standard<br />

Effect Estimate Error DF t Value Pr > |t|<br />

Intercept 102.56 0.7287 2 140.74 <.0001<br />

…<br />

Output 56.5.13 Random Coefficients with Nested Errors Analysis (continued)<br />

Solution for Random Effects<br />

Std Err<br />

Effect Batch Monthc Estimate Pred DF t Value Pr > |t|<br />

Intercept 1 0 . . . .<br />

Month 1 -0.00028 0.09268 66 -0.00 0.9976<br />

Monthc 1 0 0.2191 0.7896 66 0.28 0.7823<br />

Monthc 1 1 -2.5690 0.7571 66 -3.39 0.0012<br />

Monthc 1 3 -2.3067 0.6865 66 -3.36 0.0013<br />

Monthc 1 6 1.8726 0.7328 66 2.56 0.0129<br />

Monthc 1 9 -1.2350 0.9300 66 -1.33 0.1888<br />

Monthc 1 12 0.7736 1.1992 66 0.65 0.5211<br />

Intercept 2 0 . . . .<br />

Month 2 -0.07571 0.09268 66 -0.82 0.4169<br />

Monthc 2 0 -0.00621 0.7896 66 -0.01 0.9938<br />

Monthc 2 1 -2.2126 0.7571 66 -2.92 0.0048<br />

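Monthc 2 3 3.1063 0.6865 66 4.53 <.0001<br />

…<br />

Output 56.5.14 Random Coefficients with Nested Errors Analysis (continued)<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />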

Month 1 2 15.78 0.0579<br />

The test of Month is similar to that from the previous model, although it is no longer significant at the 5% level (Output 56.5.14).<br />


Example 56.6: Line-Source Sprinkler Irrigation<br />

These data appear in Hanks et al. (1980), Johnson, Chaudhuri, and Kanemasu (1983), and Stroup (1989b). Three cultivars (Cult) of winter wheat are randomly assigned to rectangular plots within each of three blocks (Block). The nine plots are located side by side, and a line-source sprinkler is placed through the middle. Each plot is subdivided into twelve subplots: six to the north of the line source and six to the south (Dir). The two subplots closest to the line source represent the maximum irrigation level (Irrig=6), the two next-closest subplots represent the next-highest level (Irrig=5), and so forth.<br />
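The layout just described amounts to a simple mapping from subplot position to irrigation level and direction. The Python sketch below mirrors the rule that the DATA step for this example applies to subplots 1 through 12; the function name `subplot_layout` is hypothetical.

```python
def subplot_layout(sbplt):
    """Map a subplot position (1-12, north to south) to (Irrig, Dir)."""
    if not 1 <= sbplt <= 12:
        raise ValueError("subplot position must be between 1 and 12")
    if sbplt <= 6:
        return sbplt, "North"      # irrigation rises toward the line source
    return 13 - sbplt, "South"     # and falls again on the far side


layout = {sbplt: subplot_layout(sbplt) for sbplt in range(1, 13)}
for sbplt, (irrig, direction) in layout.items():
    print(sbplt, irrig, direction)
```

Subplots 6 and 7 sit on either side of the line source and both receive level 6, while subplots 1 and 12 at the outer edges receive level 1.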

This example is a case where both G and R can be modeled. One of Stroup’s models specifies a diagonal G containing the variance components for Block, Block*Dir, and Block*Irrig, and a Toeplitz R with four bands. The SAS statements to fit this model and carry out some further analyses follow.<br />

CAUTION: This analysis can require considerable CPU time.<br />

data line;<br />
   length Cult$ 8;<br />
   /* hold the input line so the 12 yields can be read in the loop */<br />
   input Block Cult$ @;<br />
   row = _n_;<br />
   do Sbplt=1 to 12;<br />
      /* subplots 1-6 lie north of the line source, 7-12 south */<br />
      if Sbplt le 6 then do;<br />
         Irrig = Sbplt;<br />
         Dir = 'North';<br />
      end; else do;<br />
         Irrig = 13 - Sbplt;<br />
         Dir = 'South';<br />
      end;<br />
      input Y @; output;<br />
   end;<br />

datalines;<br />

1 Luke 2.4 2.7 5.6 7.5 7.9 7.1 6.1 7.3 7.4 6.7 3.8 1.8<br />

1 Nugaines 2.2 2.2 4.3 6.3 7.9 7.1 6.2 5.3 5.3 5.2 5.4 2.9<br />

1 Bridger 2.9 3.2 5.1 6.9 6.1 7.5 5.6 6.5 6.6 5.3 4.1 3.1<br />

2 Nugaines 2.4 2.2 4.0 5.8 6.1 6.2 7.0 6.4 6.7 6.4 3.7 2.2<br />

2 Bridger 2.6 3.1 5.7 6.4 7.7 6.8 6.3 6.2 6.6 6.5 4.2 2.7<br />

2 Luke 2.2 2.7 4.3 6.9 6.8 8.0 6.5 7.3 5.9 6.6 3.0 2.0<br />

3 Nugaines 1.8 1.9 3.7 4.9 5.4 5.1 5.7 5.0 5.6 5.1 4.2 2.2<br />

3 Luke 2.1 2.3 3.7 5.8 6.3 6.3 6.5 5.7 5.8 4.5 2.7 2.3<br />

3 Bridger 2.7 2.8 4.0 5.0 5.2 5.2 5.9 6.1 6.0 4.3 3.1 3.1<br />

;<br />

proc mixed;<br />
   class Block Cult Dir Irrig;<br />
   model Y = Cult|Dir|Irrig@2;<br />
   random Block Block*Dir Block*Irrig;<br />
   repeated / type=toep(4) sub=Block*Cult r;<br />
   lsmeans Cult|Irrig;<br />
   estimate 'Bridger vs Luke' Cult 1 -1 0;<br />
   estimate 'Linear Irrig' Irrig -5 -3 -1 1 3 5;<br />
   estimate 'B vs L x Linear Irrig' Cult*Irrig<br />
            -5 -3 -1 1 3 5 5 3 1 -1 -3 -5;<br />
run;<br />


The preceding statements use the bar operator ( | ) and the at sign (@) to specify all two-factor interactions between Cult, Dir, and Irrig as fixed effects.<br />
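The effect of the bar operator and the @2 limit can be sketched outside SAS. The Python function below is a hypothetical illustration of the expansion rule (each new factor is appended, followed by its crossing with every term generated so far, and terms above the maximum order are dropped); it also counts the columns of X that the resulting effects generate, given the class level counts of 3, 2, and 6.

```python
def expand_bar(factors, max_order=None):
    """Expand A|B|C into main effects and interactions, optionally capped at @max_order."""
    terms = []
    for factor in factors:
        crossed = [term + (factor,) for term in terms]
        terms = terms + [(factor,)] + crossed
    if max_order is not None:
        terms = [term for term in terms if len(term) <= max_order]
    return terms


levels = {"Cult": 3, "Dir": 2, "Irrig": 6}
terms = expand_bar(["Cult", "Dir", "Irrig"], max_order=2)
effects = ["*".join(term) for term in terms]
print(effects)

# One column for the intercept plus one column per level combination of each effect.
x_columns = 1
for term in terms:
    columns = 1
    for factor in term:
        columns *= levels[factor]
    x_columns += columns
print(x_columns)
```

The count of 48 agrees with the "Columns in X" entry that the Dimensions table reports for this model.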

The RANDOM statement sets up the Z and G matrices corresponding to the random effects Block, Block*Dir, and Block*Irrig.<br />

In the REPEATED statement, the TYPE=TOEP(4) option sets up the blocks of the R matrix to be Toeplitz with four bands below and including the main diagonal. The subject effect is Block*Cult, and it produces nine 12 × 12 blocks. The R option requests that the first block of R be displayed.<br />

Least squares means (LSMEANS) are requested for Cult, Irrig, and Cult*Irrig, and a few ESTIMATE statements are specified to illustrate some linear combinations of the fixed effects.<br />

The results from this analysis are shown in Output 56.6.1.<br />

The “Covariance Structures” row in Output 56.6.1 reveals the two different structures assumed for G and R.<br />

Output 56.6.1 Model Information in Line-Source Sprinkler Analysis<br />

The Mixed Procedure<br />

Model Information<br />

Data Set WORK.LINE<br />

Dependent Variable Y<br />

Covariance Structures Variance Components,<br />

Toeplitz<br />

Subject Effect Block*Cult<br />

Estimation Method REML<br />

Residual Variance Method Profile<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Containment<br />

The levels of each classification variable are listed as a single string in the Values column, regardless of whether the levels are numeric or character (Output 56.6.2).<br />

Output 56.6.2 Class Level Information<br />

Class Level Information<br />

Class Levels Values<br />

Block 3 1 2 3<br />

Cult 3 Bridger Luke Nugaines<br />

Dir 2 North South<br />

Irrig 6 1 2 3 4 5 6<br />

Even though there is a SUBJECT= effect in the REPEATED statement, the analysis considers all<br />

of the data to be from one subject because there is no corresponding SUBJECT= effect in the<br />

RANDOM statement (Output 56.6.3).


Output 56.6.3 Model Dimensions and Number of Observations<br />


Dimensions<br />

Covariance Parameters 7<br />

Columns in X 48<br />

Columns in Z 27<br />

Subjects 1<br />

Max Obs Per Subject 108<br />

Number of Observations<br />

Number of Observations Read 108<br />

Number of Observations Used 108<br />

Number of Observations Not Used 0<br />

The Newton-Raphson algorithm converges successfully in seven iterations (Output 56.6.4).<br />

Output 56.6.4 Iteration History and Convergence Status<br />

Iteration History<br />

Iteration Evaluations -2 Res Log Like Criterion<br />

0 1 226.25427252<br />

1 4 187.99336173 .<br />

2 3 186.62579299 0.10431081<br />

3 1 184.38218213 0.04807260<br />

4 1 183.41836853 0.00886548<br />

5 1 183.25111475 0.00075353<br />

6 1 183.23809997 0.00000748<br />

7 1 183.23797748 0.00000000<br />

Convergence criteria met.<br />

The first block of the estimated R matrix has the TOEP(4) structure, and the observations that are three subplots apart exhibit a negative correlation (Output 56.6.5).<br />

Output 56.6.5 Estimated R Matrix for the First Subject<br />

Estimated R Matrix for Block*Cult 1 Bridger<br />

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7<br />

1 0.2850 0.007986 0.001452 -0.09253<br />

2 0.007986 0.2850 0.007986 0.001452 -0.09253<br />

3 0.001452 0.007986 0.2850 0.007986 0.001452 -0.09253<br />

4 -0.09253 0.001452 0.007986 0.2850 0.007986 0.001452 -0.09253<br />

5 -0.09253 0.001452 0.007986 0.2850 0.007986 0.001452<br />

6 -0.09253 0.001452 0.007986 0.2850 0.007986<br />

7 -0.09253 0.001452 0.007986 0.2850<br />

8 -0.09253 0.001452 0.007986<br />

9 -0.09253 0.001452<br />

10 -0.09253<br />

11<br />

12<br />

Estimated R Matrix for Block*Cult 1 Bridger<br />

Row Col8 Col9 Col10 Col11 Col12<br />

1<br />

2<br />

3<br />

4<br />

5 -0.09253<br />

6 0.001452 -0.09253<br />

7 0.007986 0.001452 -0.09253<br />

8 0.2850 0.007986 0.001452 -0.09253<br />

9 0.007986 0.2850 0.007986 0.001452 -0.09253<br />

10 0.001452 0.007986 0.2850 0.007986 0.001452<br />

11 -0.09253 0.001452 0.007986 0.2850 0.007986<br />

12 -0.09253 0.001452 0.007986 0.2850<br />

Output 56.6.6 lists the estimated covariance parameters from both G and R. The first three are the variance components making up the diagonal G, and the final four make up the Toeplitz structure in the blocks of R. The Residual row corresponds to the variance of the Toeplitz structure, and it represents the parameter profiled out during the optimization process.<br />

Output 56.6.6 Estimated Covariance Parameters<br />

Covariance Parameter Estimates<br />

Cov Parm Subject Estimate<br />

Block 0.2194<br />

Block*Dir 0.01768<br />

Block*Irrig 0.03539<br />

TOEP(2) Block*Cult 0.007986<br />

TOEP(3) Block*Cult 0.001452<br />

TOEP(4) Block*Cult -0.09253<br />

Residual 0.2850
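The four R-side estimates determine every entry of a 12 × 12 block of R: the entry in row i and column j depends only on the lag |i − j| and is zero beyond lag 3. The Python sketch below rebuilds one block from the estimates in Output 56.6.6 and converts the lag-3 covariance to a correlation; the helper name `toeplitz_block` is hypothetical.

```python
def toeplitz_block(bands, size):
    """Build a banded Toeplitz matrix; bands[k] is the covariance at lag k."""
    return [[bands[abs(i - j)] if abs(i - j) < len(bands) else 0.0
             for j in range(size)] for i in range(size)]


# Residual (lag 0), TOEP(2), TOEP(3), TOEP(4) from Output 56.6.6.
bands = [0.2850, 0.007986, 0.001452, -0.09253]
r_block = toeplitz_block(bands, 12)

print(r_block[0][:5])   # first row: four nonzero bands, then zeros
lag3_correlation = bands[3] / bands[0]
print(f"correlation at lag 3: {lag3_correlation:.3f}")
```

The lag-3 correlation of about −0.32 is the negative association between observations three subplots apart that the estimated R matrix shows.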


The “-2 Res Log Likelihood” value in Output 56.6.7 is the same as the final value listed in the “Iteration History” table (Output 56.6.4).<br />

Output 56.6.7 Fit Statistics Based on the Residual Log Likelihood<br />

Fit Statistics<br />

-2 Res Log Likelihood 183.2<br />

AIC (smaller is better) 197.2<br />

AICC (smaller is better) 198.8<br />

BIC (smaller is better) 190.9<br />

Every fixed effect except for Dir and Cult*Irrig is significant at the 5% level (Output 56.6.8).<br />

Output 56.6.8 Tests for Fixed Effects<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

Cult 2 68 7.98 0.0008<br />

Dir 1 2 3.95 0.1852<br />

Cult*Dir 2 68 3.44 0.0379<br />

Irrig 5 10 102.60 <.0001<br />

…<br />

Output 56.6.10 Least Squares Means for Cult, Irrig, and Their Interaction<br />

Least Squares Means<br />

Standard<br />

Effect Cult Irrig Estimate Error DF t Value Pr > |t|<br />

Cult Bridger 5.0306 0.2874 68 17.51 <.0001<br />

…<br />

Example 56.7: Influence in Heterogeneous Variance Model<br />

In this example from Snedecor and Cochran (1976, p. 256), a one-way classification model with heterogeneous variances is fit. The data, shown in the following DATA step, represent amounts of different types of fat absorbed by batches of doughnuts during cooking, measured in grams.<br />

data absorb;<br />

input FatType Absorbed @@;<br />

datalines;<br />

1 164 1 172 1 168 1 177 1 156 1 195<br />

2 178 2 191 2 197 2 182 2 185 2 177<br />

3 175 3 193 3 178 3 171 3 163 3 176<br />

4 155 4 166 4 149 4 164 4 170 4 168<br />

;<br />

The statistical model for these data can be written as<br />

\[
\begin{aligned}
Y_{ij} &= \mu + \tau_i + \epsilon_{ij} \\
i &= 1, \ldots, t = 4 \\
j &= 1, \ldots, r = 6 \\
\epsilon_{ij} &\sim N(0, \sigma_i^2)
\end{aligned}
\]

where $Y_{ij}$ is the amount of fat absorbed by the $j$th batch of the $i$th fat type, and $\tau_i$ denotes the fat-type effects. A quick glance at the data suggests that observations 6, 9, 14, and 21 might be influential on the analysis, because these are extreme observations for the respective fat types.<br />

The following SAS statements fit this model and request influence diagnostics for the fixed effects and covariance parameters. The ODS GRAPHICS statement requests plots of the influence diagnostics in addition to the tabular output. The ESTIMATES suboption requests plots of “leave-one-out” estimates for the fixed effects and group variances.<br />

ods graphics on;<br />

proc mixed data=absorb asycov;<br />

class FatType;<br />

model Absorbed = FatType / s<br />

influence(iter=10 estimates);<br />

repeated / group=FatType;<br />

ods output Influence=inf;<br />

run;<br />

ods graphics off;<br />

The “Influence” table is output to the SAS data set inf so that parameter estimates can be printed subsequently. Results from this analysis are shown in Output 56.7.1.<br />

Output 56.7.1 Heterogeneous Variance Analysis<br />

The Mixed Procedure<br />

Model Information<br />

Data Set WORK.ABSORB<br />

Dependent Variable Absorbed<br />

Covariance Structure Variance Components<br />

Group Effect FatType<br />

Estimation Method REML<br />

Residual Variance Method None<br />

Fixed Effects SE Method Model-Based<br />

Degrees of Freedom Method Between-Within<br />

Covariance Parameter Estimates<br />

Cov Parm Group Estimate<br />

Residual FatType 1 178.00<br />

Residual FatType 2 60.4000<br />

Residual FatType 3 97.6000<br />

Residual FatType 4 67.6000<br />

Solution for Fixed Effects<br />

Fat Standard<br />

Effect Type Estimate Error DF t Value Pr > |t|<br />

Intercept 162.00 3.3566 20 48.26



Output 56.7.2 Asymptotic Variances of Group Variance Estimates<br />

Asymptotic Covariance Matrix of Estimates<br />

Row  Cov Parm      CovP1      CovP2      CovP3      CovP4<br />
  1  Residual      12674<br />
  2  Residual                1459.26<br />
  3  Residual                           3810.30<br />
  4  Residual                                      1827.90<br />

In groups where the residual variance estimate is large, the estimate is also less precise; that is, its<br />

asymptotic variance is large (Output 56.7.2).<br />
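The entries of Output 56.7.2 can be reproduced by hand. The sketch below assumes (this is not stated in the text) that each group variance behaves like a REML estimate from an independent normal sample with $n_i - 1 = 5$ degrees of freedom, so that the usual large-sample variance formula applies.

```latex
\[
\widehat{\mathrm{Var}}\bigl[\hat\sigma_i^2\bigr] \approx \frac{2\,\hat\sigma_i^4}{n_i - 1}
\]
\[
i = 1:\quad \frac{2\,(178.0)^2}{5} = 12673.6 \approx 12674,
\qquad
i = 2:\quad \frac{2\,(60.4)^2}{5} = 1459.26
\]
```

Under this assumption the variance of the estimate grows with the square of the estimate itself, which is why the first fat type has by far the largest entry.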

<strong>The</strong> following statements print the “leave-one-out” estimates for fixed effects and covariance parameters<br />

that were written to the inf data set with the ESTIMATES suboption (Output 56.7.3):<br />

proc print data=inf label;<br />

var parm1-parm5 covp1-covp4;<br />

run;<br />

Output 56.7.3 Leave-One-Out Estimates<br />

Residual Residual Residual Residual<br />

Fat Fat Fat Fat FatType FatType FatType FatType<br />

Obs Intercept Type 1 Type 2 Type 3 Type 4 1 2 3 4<br />

1 162.00 11.600 23.000 14.000 0 203.30 60.400 97.60 67.600<br />

2 162.00 10.000 23.000 14.000 0 222.47 60.400 97.60 67.600<br />

3 162.00 10.800 23.000 14.000 0 217.68 60.400 97.60 67.600<br />

4 162.00 9.000 23.000 14.000 0 214.99 60.400 97.60 67.600<br />

5 162.00 13.200 23.000 14.000 0 145.70 60.400 97.60 67.600<br />

6 162.00 5.400 23.000 14.000 0 63.80 60.400 97.60 67.600<br />

7 162.00 10.000 24.400 14.000 0 178.00 60.795 97.60 67.600<br />

8 162.00 10.000 21.800 14.000 0 178.00 64.691 97.60 67.600<br />

9 162.00 10.000 20.600 14.000 0 178.00 32.296 97.60 67.600<br />

10 162.00 10.000 23.600 14.000 0 178.00 72.797 97.60 67.600<br />

11 162.00 10.000 23.000 14.000 0 178.00 75.490 97.60 67.600<br />

12 162.00 10.000 24.600 14.000 0 178.00 56.285 97.60 67.600<br />

13 162.00 10.000 23.000 14.200 0 178.00 60.400 121.68 67.600<br />

14 162.00 10.000 23.000 10.600 0 178.00 60.400 35.30 67.600<br />

15 162.00 10.000 23.000 13.600 0 178.00 60.400 120.79 67.600<br />

16 162.00 10.000 23.000 15.000 0 178.00 60.400 114.50 67.600<br />

17 162.00 10.000 23.000 16.600 0 178.00 60.400 71.30 67.600<br />

18 162.00 10.000 23.000 14.000 0 178.00 60.400 121.98 67.600<br />

19 163.40 8.600 21.600 12.600 0 178.00 60.400 97.60 69.799<br />

20 161.20 10.800 23.800 14.800 0 178.00 60.400 97.60 79.698<br />

21 164.60 7.400 20.400 11.400 0 178.00 60.400 97.60 33.800<br />

22 161.60 10.400 23.400 14.400 0 178.00 60.400 97.60 83.292<br />

23 160.40 11.600 24.600 15.600 0 178.00 60.400 97.60 65.299<br />

24 160.80 11.200 24.200 15.200 0 178.00 60.400 97.60 73.677



<strong>The</strong> graphical displays in Output 56.7.4 and Output 56.7.5 are requested by specifying the ODS<br />

GRAPHICS statement. For general information about ODS Graphics, see Chapter 21, “Statistical<br />

Graphics Using ODS.” For specific information about the graphics available in the <strong>MIXED</strong> procedure,<br />

see the section “ODS Graphics” on page 3998.<br />

Output 56.7.4 Fixed-Effects Deletion Estimates


Output 56.7.5 Covariance Parameter Deletion Estimates<br />


<strong>The</strong> estimate of the intercept is affected only when observations from the last group are removed.<br />

<strong>The</strong> estimate of the “FatType 1” effect reacts to removal of observations in the first and last group<br />

(Output 56.7.4).<br />

While observations can affect one or more fixed-effects solutions in this model, they can affect only<br />

one covariance parameter, the variance in their group (Output 56.7.5). Removing observation 6, 9, 14,<br />

or 21, each of which is extreme in its group, reduces the corresponding group variance considerably.<br />
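The size of this reduction can be checked with the standard deletion identity for a sample variance. This is a sketch under the assumption that the group variance estimate equals the sample variance $s_i^2$ of the $n = 6$ batches in the group; $e_{ij}$ denotes the raw residual of the deleted observation.

```latex
\[
s_{i(-j)}^2 = \frac{(n-1)\,s_i^2 - \dfrac{n}{n-1}\,e_{ij}^2}{n-2}
\]
\[
\text{Observation 6 } (e = 23):\quad
\frac{5\,(178.0) - \tfrac{6}{5}\,(23)^2}{4} = \frac{890 - 634.8}{4} = 63.8
\]
\[
\text{Observation 21 } (e = -13):\quad
\frac{5\,(67.6) - \tfrac{6}{5}\,(13)^2}{4} = \frac{338 - 202.8}{4} = 33.8
\]
```

Both values agree with the corresponding rows of Output 56.7.3 (63.80 and 33.800).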

Diagnostics related to residuals and predicted values are printed with the following statements:<br />

proc print data=inf label;<br />

var observed predicted residual pressres<br />

student Rstudent;<br />

run;



Output 56.7.6 Residual Diagnostics<br />

Internally Externally<br />

Observed Predicted PRESS Studentized Studentized<br />

Obs Value Mean Residual Residual Residual Residual<br />

1 164 172.0 -8.000 -9.600 -0.6569 -0.6146<br />

2 172 172.0 0.000 0.000 0.0000 0.0000<br />

3 168 172.0 -4.000 -4.800 -0.3284 -0.2970<br />

4 177 172.0 5.000 6.000 0.4105 0.3736<br />

5 156 172.0 -16.000 -19.200 -1.3137 -1.4521<br />

6 195 172.0 23.000 27.600 1.8885 3.1544<br />

7 178 185.0 -7.000 -8.400 -0.9867 -0.9835<br />

8 191 185.0 6.000 7.200 0.8457 0.8172<br />

9 197 185.0 12.000 14.400 1.6914 2.3131<br />

10 182 185.0 -3.000 -3.600 -0.4229 -0.3852<br />

11 185 185.0 0.000 -0.000 0.0000 0.0000<br />

12 177 185.0 -8.000 -9.600 -1.1276 -1.1681<br />

13 175 176.0 -1.000 -1.200 -0.1109 -0.0993<br />

14 193 176.0 17.000 20.400 1.8850 3.1344<br />

15 178 176.0 2.000 2.400 0.2218 0.1993<br />

16 171 176.0 -5.000 -6.000 -0.5544 -0.5119<br />

17 163 176.0 -13.000 -15.600 -1.4415 -1.6865<br />

18 176 176.0 0.000 0.000 0.0000 0.0000<br />

19 155 162.0 -7.000 -8.400 -0.9326 -0.9178<br />

20 166 162.0 4.000 4.800 0.5329 0.4908<br />

21 149 162.0 -13.000 -15.600 -1.7321 -2.4495<br />

22 164 162.0 2.000 2.400 0.2665 0.2401<br />

23 170 162.0 8.000 9.600 1.0659 1.0845<br />

24 168 162.0 6.000 7.200 0.7994 0.7657<br />

Observations 6, 9, 14, and 21 have large studentized residuals (Output 56.7.6). For these observations<br />

the externally studentized residuals are much larger than the internally studentized residuals, which<br />

indicates that the variance estimate in the group shrinks when the observation is removed. Note also<br />

that comparisons based on raw residuals can be misleading in models with heterogeneous variance.<br />

Observation 5, for example, has a larger raw residual in absolute value (-16 versus -13) but a smaller<br />

studentized residual (-1.31 versus -1.73) than observation 21, because the variance for the first fat type<br />

is much larger than the variance in the fourth group. A “large” residual is more “surprising” in the groups with small variance.<br />
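The residual columns of Output 56.7.6 can be reproduced for individual rows. The closed forms below are a sketch that assumes the leverage $h = 1/6$ reported in Output 56.7.8 and the group variance estimates $\hat\sigma_i^2$ and deletion estimates $\hat\sigma_{i(-j)}^2$ listed in Output 56.7.3; they are specific to this balanced one-way layout.

```latex
\[
e_{\text{PRESS}} = \frac{e}{1-h}, \qquad
r_{\text{int}} = \frac{e}{\sqrt{\hat\sigma_i^2\,(1-h)}}, \qquad
r_{\text{ext}} = \frac{e}{\sqrt{\hat\sigma_{i(-j)}^2\,(1-h)}}
\]
\[
\text{Observation 6:}\quad
\frac{23}{5/6} = 27.6, \qquad
\frac{23}{\sqrt{178.0 \cdot 5/6}} \approx 1.8885, \qquad
\frac{23}{\sqrt{63.8 \cdot 5/6}} \approx 3.154
\]
\[
\text{Observations 5 and 21:}\quad
\frac{-16}{\sqrt{178.0 \cdot 5/6}} \approx -1.3137, \qquad
\frac{-13}{\sqrt{67.6 \cdot 5/6}} \approx -1.7321
\]
```

The last line shows the effect described in the text: the larger raw residual is divided by the larger group standard deviation and ends up with the smaller studentized value.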

A measure of the overall influence on the analysis is the (restricted) likelihood distance, shown in<br />

Output 56.7.7. Observations 6, 9, 14, and 21 clearly displace the REML solution more than any<br />

other observations.


Output 56.7.7 Restricted Likelihood Distance<br />


<strong>The</strong> following statements list the restricted likelihood distance and various diagnostics related to the<br />

fixed-effects estimates (Output 56.7.8):<br />

proc print data=inf label;<br />

var leverage observed CookD DFFITS CovRatio RLD;<br />

run;



Output 56.7.8 Restricted Likelihood Distance and Fixed-Effects Diagnostics<br />

Restr.<br />

Observed Cook’s Likelihood<br />

Obs Leverage Value D DFFITS COVRATIO Distance<br />

1 0.167 164 0.02157 -0.27487 1.3706 0.1178<br />

2 0.167 172 0.00000 -0.00000 1.4998 0.1156<br />

3 0.167 168 0.00539 -0.13282 1.4675 0.1124<br />

4 0.167 177 0.00843 0.16706 1.4494 0.1117<br />

5 0.167 156 0.08629 -0.64938 0.9822 0.5290<br />

6 0.167 195 0.17831 1.41069 0.4301 5.8101<br />

7 0.167 178 0.04868 -0.43982 1.2078 0.1935<br />

8 0.167 191 0.03576 0.36546 1.2853 0.1451<br />

9 0.167 197 0.14305 1.03446 0.6416 2.2909<br />

10 0.167 182 0.00894 -0.17225 1.4463 0.1116<br />

11 0.167 185 0.00000 -0.00000 1.4998 0.1156<br />

12 0.167 177 0.06358 -0.52239 1.1183 0.2856<br />

13 0.167 175 0.00061 -0.04441 1.4961 0.1151<br />

14 0.167 193 0.17766 1.40175 0.4340 5.7044<br />

15 0.167 178 0.00246 0.08915 1.4851 0.1139<br />

16 0.167 171 0.01537 -0.22892 1.4078 0.1129<br />

17 0.167 163 0.10389 -0.75423 0.8766 0.8433<br />

18 0.167 176 0.00000 0.00000 1.4998 0.1156<br />

19 0.167 155 0.04349 -0.41047 1.2390 0.1710<br />

20 0.167 166 0.01420 0.21950 1.4148 0.1124<br />

21 0.167 149 0.15000 -1.09545 0.6000 2.7343<br />

22 0.167 164 0.00355 0.10736 1.4786 0.1133<br />

23 0.167 170 0.05680 0.48500 1.1592 0.2383<br />

24 0.167 168 0.03195 0.34245 1.3079 0.1353<br />

In this example, observations with large likelihood distances also have large values for Cook’s D<br />

and values of CovRatio far less than one (Output 56.7.8). <strong>The</strong> latter indicates that the fixed effects<br />

are estimated more precisely when these observations are removed from the analysis.<br />
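For a single row, these fixed-effects diagnostics can be traced back to the residual quantities. The expressions below are a sketch for this balanced one-way layout only, not the general definitions used by the procedure; they assume $h = 1/6$, $p = 4$ for the rank of the fixed-effects design, and a group mean whose estimated variance is $\hat\sigma_i^2/6$ with all observations and $\hat\sigma_{i(-j)}^2/5$ after deletion.

```latex
\[
D = \frac{r_{\text{int}}^2}{p}\,\frac{h}{1-h}, \qquad
\text{DFFITS} = r_{\text{ext}}\sqrt{\frac{h}{1-h}}, \qquad
\text{COVRATIO} = \frac{\hat\sigma_{i(-j)}^2 / 5}{\hat\sigma_i^2 / 6}
\]
\[
\text{Observation 6:}\quad
D = \frac{(23)^2/148.33}{4}\cdot\frac{1}{5} \approx 0.1783, \qquad
\text{DFFITS} = 3.1544\,\sqrt{0.2} \approx 1.4107, \qquad
\text{COVRATIO} = \frac{63.8/5}{178.0/6} \approx 0.4301
\]
```

These agree with the row for observation 6 in Output 56.7.8. The CovRatio falls below one precisely because the deletion estimate of the group variance is so much smaller than the full-data estimate.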

<strong>The</strong> following statements print the values of the D statistic and the CovRatio for the covariance<br />

parameters:<br />

proc print data=inf label;<br />

var iter CookDCP CovRatioCP;<br />

run;<br />

<strong>The</strong> same conclusions as for the fixed-effects estimates hold for the covariance parameter estimates.<br />

Observations 6, 9, 14, and 21 change the estimates and their precision considerably (Output 56.7.9,<br />

Output 56.7.10). All iterative updates converged within at most four iterations.


Output 56.7.9 Covariance Parameter Diagnostics<br />


Cook’s D COVRATIO<br />

Obs Iterations CovParms CovParms<br />

1 3 0.05050 1.6306<br />

2 3 0.15603 1.9520<br />

3 3 0.12426 1.8692<br />

4 3 0.10796 1.8233<br />

5 4 0.08232 0.8375<br />

6 4 1.02909 0.1606<br />

7 1 0.00011 1.2662<br />

8 2 0.01262 1.4335<br />

9 3 0.54126 0.3573<br />

10 3 0.10531 1.8156<br />

11 3 0.15603 1.9520<br />

12 2 0.01160 1.0849<br />

13 3 0.15223 1.9425<br />

14 4 1.01865 0.1635<br />

15 3 0.14111 1.9141<br />

16 3 0.07494 1.7203<br />

17 3 0.18154 0.6671<br />

18 3 0.15603 1.9520<br />

19 2 0.00265 1.3326<br />

20 3 0.08008 1.7374<br />

21 1 0.62500 0.3125<br />

22 3 0.13472 1.8974<br />

23 2 0.00290 1.1663<br />

24 2 0.02020 1.4839<br />

Output 56.7.10 displays the standard panel of influence diagnostics that is obtained when influence<br />

analysis is iterative. <strong>The</strong> Cook’s D and CovRatio statistics are displayed for each deletion set for<br />

both fixed-effects and covariance parameter estimates. This provides a convenient summary of<br />

the impact on the analysis for each deletion set, since Cook’s D statistic measures impact on the<br />

estimates and the CovRatio statistic measures impact on the precision of the estimates.
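The same reading applies to the covariance parameter columns of Output 56.7.9. As a sketch for this example only, where deleting an observation changes a single group variance, the two statistics reduce to scalar ratios; the asymptotic variance after deletion is assumed to follow the form $2\hat\sigma^4/(n-1)$ with $n - 1 = 4$.

```latex
\[
D_{\text{CP}} = \frac{\bigl(\hat\sigma_i^2 - \hat\sigma_{i(-j)}^2\bigr)^2}{\widehat{\mathrm{Var}}\bigl[\hat\sigma_i^2\bigr]},
\qquad
\text{COVRATIO}_{\text{CP}} = \frac{2\,\hat\sigma_{i(-j)}^4 / 4}{2\,\hat\sigma_i^4 / 5}
\]
\[
\text{Observation 21:}\quad
D_{\text{CP}} = \frac{(67.6 - 33.8)^2}{1827.90} = \frac{1142.44}{1827.90} \approx 0.6250,
\qquad
\text{COVRATIO}_{\text{CP}} = \frac{571.22}{1827.90} \approx 0.3125
\]
```

Both values match the row for observation 21 in Output 56.7.9.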



Output 56.7.10 Influence Diagnostics<br />

Observations 6, 9, 14, and 21 have considerable impact on estimates and precision of fixed effects<br />

and covariance parameters. An influential observation need not affect all of these quantities. Observations<br />

can be influential on only some aspects of the analysis, as shown in the next example.<br />

Example 56.8: Influence Analysis for Repeated Measures Data<br />

This example revisits the repeated measures data of Potthoff and Roy (1964) that were analyzed<br />

in Example 56.2. Recall that the data consist of growth measurements at ages 8, 10, 12, and 14<br />

for 11 girls and 16 boys. <strong>The</strong> model being fit contains fixed effects for Gender and Age and their<br />

interaction.<br />

<strong>The</strong> earlier analysis of these data indicated some unusual observations in this data set. Because<br />

of the clustered data structure, it is of interest to study the influence of clusters (children) on the<br />

analysis rather than the influence of individual observations. A cluster comprises the repeated measurements<br />

for each child.



<strong>The</strong> repeated measures are first modeled with an unstructured within-child variance-covariance matrix.<br />

A residual variance is not profiled in this model. A noniterative influence analysis will update<br />

the fixed effects only. <strong>The</strong> following statements request this noniterative maximum likelihood analysis<br />

and produce Output 56.8.1:<br />

proc mixed data=pr method=ml;<br />

class person gender;<br />

model y = gender age gender*age /<br />

influence(effect=person);<br />

repeated / type=un subject=person;<br />

ods select influence;<br />

run;<br />

Output 56.8.1 Default Influence Statistics in Noniterative Analysis<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Influence Diagnostics for Levels of Person<br />

Number of<br />

Observations PRESS Cook’s<br />

Person in Level Statistic D<br />

1 4 10.1716 0.01539<br />

2 4 3.8187 0.03988<br />

3 4 10.8448 0.02891<br />

4 4 24.0339 0.04515<br />

5 4 1.6900 0.01613<br />

6 4 11.8592 0.01634<br />

7 4 1.1887 0.00521<br />

8 4 4.6717 0.02742<br />

9 4 13.4244 0.03949<br />

10 4 85.1195 0.13848<br />

11 4 67.9397 0.09728<br />

12 4 40.6467 0.04438<br />

13 4 13.0304 0.00924<br />

14 4 6.1712 0.00411<br />

15 4 24.5702 0.12727<br />

16 4 20.5266 0.01026<br />

17 4 9.9917 0.01526<br />

18 4 7.9355 0.01070<br />

19 4 15.5955 0.01982<br />

20 4 42.6845 0.01973<br />

21 4 95.3282 0.10075<br />

22 4 13.9649 0.03778<br />

23 4 4.9656 0.01245<br />

24 4 37.2494 0.15094<br />

25 4 4.3756 0.03375<br />

26 4 8.1448 0.03470<br />

27 4 20.2913 0.02523<br />

Each row in the “Influence Diagnostics for Levels of Person” table in Output 56.8.1 represents<br />

the removal of the four observations for one child. Subjects 10, 15, and 24 have the greatest impact on the<br />

fixed effects (Cook’s D), and subjects 10 and 21 have large PRESS statistics. The 21st child has a<br />

large PRESS statistic, but its D statistic is not as extreme. This is an indication that the model fits<br />

rather poorly for this child, whether it is part of the data or not.<br />
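With 27 subjects, it can be easier to rank the rows than to scan them. The following sketch is not part of the original example: it captures the “Influence” table with ODS OUTPUT, as was done for the inf data set in Example 56.7, and sorts it. The data set name infl_person is hypothetical, and the column name CookD is an assumption carried over from the observation-level table in Example 56.7; check the names with PROC CONTENTS before relying on them.

```sas
/* Sketch: save the subject-level influence table and list the top rows. */
proc mixed data=pr method=ml;
   class person gender;
   model y = gender age gender*age / influence(effect=person);
   repeated / type=un subject=person;
   ods output Influence=infl_person;   /* hypothetical data set name */
run;

/* CookD is an assumed column name; verify with PROC CONTENTS. */
proc sort data=infl_person;
   by descending CookD;
run;

proc print data=infl_person(obs=5) label;
run;
```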

<strong>The</strong> previous analysis does not take into account the effect on the covariance parameters when a<br />

subject is removed from the analysis. If you also update the covariance parameters, the impact of<br />

observations on these can amplify or allay their effect on the fixed effects. To assess the overall<br />

influence of subjects on the analysis and to compute separate statistics for the fixed effects and covariance<br />

parameters, an iterative analysis is obtained by adding the INFLUENCE suboption ITER=,<br />

as follows:<br />

ods graphics on;<br />

proc mixed data=pr method=ml;<br />

class person gender;<br />

model y = gender age gender*age /<br />

influence(effect=person iter=5);<br />

repeated / type=un subject=person;<br />

run;<br />

<strong>The</strong> number of additional iterations following removal of the observations for a particular subject<br />

is limited to five. Graphical displays of influence diagnostics are requested by specifying the ODS<br />

GRAPHICS statement. For general information about ODS Graphics, see Chapter 21, “Statistical<br />

Graphics Using ODS.” For specific information about the graphics available in the <strong>MIXED</strong> procedure,<br />

see the section “ODS Graphics” on page 3998.<br />

<strong>The</strong> <strong>MIXED</strong> procedure produces a plot of the restricted likelihood distance (Output 56.8.2) and a<br />

panel of diagnostics for fixed effects and covariance parameters (Output 56.8.3).


Output 56.8.2 Restricted Likelihood Distance<br />


Output 56.8.3 Influence Diagnostics Panel<br />

As judged by the restricted likelihood distance, subjects 20 and 24 clearly have the most influence<br />

on the overall analysis (Output 56.8.2).<br />

Output 56.8.3 displays Cook’s D and CovRatio statistics for the fixed effects and covariance parameters.<br />

Clearly, subject 20 has a dramatic effect on the estimates of variances and covariances.<br />

This subject also affects the precision of the covariance parameter estimates more than any other<br />

subject in Output 56.8.3 (CovRatio near 0).<br />

The child who exerts the greatest influence on the fixed effects is subject 24. Perhaps surprisingly,<br />

this subject affects the variance-covariance matrix of the fixed effects more than subject 20 (small<br />

CovRatio in Output 56.8.3).<br />

<strong>The</strong> final model investigated for these data is a random coefficient model as in Stram and Lee (1994)<br />

with random effects for the intercept and age effect. <strong>The</strong> following statements examine the estimates<br />

for fixed effects and the entries of the unstructured 2 × 2 variance matrix of the random coefficients<br />

graphically:



proc mixed data=pr method=ml<br />

plots(only)=InfluenceEstPlot;<br />

class person gender;<br />

model y = gender age gender*age /<br />

influence(iter=5 effect=person est);<br />

random intercept age / type=un subject=person;<br />

run;<br />

<strong>The</strong> PLOTS(ONLY)=INFLUENCEESTPLOT option restricts the graphical output from this PROC<br />

<strong>MIXED</strong> run to only the panels of deletion estimates (Output 56.8.4 and Output 56.8.5).<br />

Output 56.8.4 Fixed-Effects Deletion Estimates<br />

In Output 56.8.4 the graphs on the left side of the panel represent the intercept and slope estimate<br />

for boys; the graphs on the right side represent the difference in intercept and slope between boys<br />

and girls. Removing any one of the first eleven children, who are girls, does not alter the intercept<br />

or slope in the group of boys. <strong>The</strong> difference in these parameters between boys and girls is altered<br />

by the removal of any child. Subject 24 changes the fixed effects considerably, subject 20 much less<br />

so.



Output 56.8.5 Covariance Parameter Deletion Estimates<br />

The covariance parameter deletion estimates in Output 56.8.5 show several important features.<br />

• The panels do not contain information about subject 24. Estimation of the G matrix following<br />
removal of that child did not yield a positive definite matrix. As a consequence, covariance<br />
parameter diagnostics are not produced for this subject.<br />

• Subject 20 has great impact on the four covariance parameters. Removing this child from<br />
the analysis increases the variance of the random intercept and random slope and reduces<br />
the residual variance by almost 80%. The repeated measurements of this child exhibit an<br />
up-and-down behavior.<br />

• The variances of the random intercept and slope are reduced when child 15 is removed from<br />
the analysis. This child’s growth measurements oscillate about 27.0 from age 10 on.<br />

Examining observed and residual values by levels of classification variables is also a useful tool to<br />

diagnose the adequacy of the model and unusual observations. Box plots for effects in the model that<br />

consist of only classification variables can be requested with the BOXPLOT option of the PLOT=<br />

option in the PROC <strong>MIXED</strong> statement. For example, the following statements produce box plots<br />

for the SUBJECT= effects in the model:



proc mixed data=pr method=ml<br />

plot=boxplot(observed marginal conditional subject);<br />

class person gender;<br />

model y = gender age gender*age;<br />

random intercept age / type=un subject=person;<br />

run;<br />

<strong>The</strong> specific boxplot options request a plot of the observed data (Output 56.8.6), the marginal residuals<br />

(Output 56.8.7), and the conditional residuals (Output 56.8.8). Box plots of the observed values<br />

show the variation within and between children clearly. <strong>The</strong> group of girls (subjects 1–11) is distinguishable<br />

from the group of boys by somewhat lesser average growth and lesser within-child<br />

variation (Output 56.8.6). After adjusting for overall (population-averaged) gender and age effects,<br />

the residual within-child variation is reduced but substantial differences in the means remain<br />

(Output 56.8.7). If child-specific inferences are desired, a model accounting for only Gender, Age,<br />

and Gender*Age effects is not adequate for these data.<br />

Output 56.8.6 Distribution of Observed Values



Output 56.8.7 Distribution of Marginal Residuals<br />

<strong>The</strong> conditional residuals incorporate the EBLUPs for each child and enable you to examine whether<br />

the subject-specific model is adequate (Output 56.8.8). By using each child “as its own control,”<br />

the residuals are now centered near zero. Subjects 20 and 24 stand out as unusual in all three sets of<br />

box plots.


Output 56.8.8 Distribution of Conditional Residuals<br />


Example 56.9: Examining Individual Test Components<br />

The LCOMPONENTS option in the MODEL statement enables you to perform single-degree-of-freedom<br />

tests for individual rows of the L matrix. Such tests are useful to identify interaction<br />

patterns. In a balanced layout, Type 3 components of L associated with A*B interactions correspond<br />

to simple contrasts of cell mean differences.<br />

<strong>The</strong> first example revisits the data from the split-plot design by Stroup (1989a) that was analyzed<br />

in Example 56.1. Recall that variables A and B in the following statements represent the whole-plot<br />

and subplot factors, respectively:<br />

proc mixed data=sp;<br />

class a b block;<br />

model y = a b a*b / LComponents e3;<br />

random block a*block;<br />

run;<br />

<strong>The</strong> <strong>MIXED</strong> procedure constructs a separate L matrix for each of the three fixed-effects components.<br />

The matrices are displayed in Output 56.9.1. The tests for fixed effects are shown in Output 56.9.2.<br />

Output 56.9.1 Coefficients of Type 3 Estimable Functions<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Type 3 Coefficients for A<br />

Effect     A  B    Row1    Row2<br />
Intercept<br />
A          1          1<br />
A          2                  1<br />
A          3         -1      -1<br />
B             1<br />
B             2<br />
A*B        1  1     0.5<br />
A*B        1  2     0.5<br />
A*B        2  1             0.5<br />
A*B        2  2             0.5<br />
A*B        3  1    -0.5    -0.5<br />
A*B        3  2    -0.5    -0.5<br />

Type 3 Coefficients for B<br />

Effect     A  B    Row1<br />
Intercept<br />
A          1<br />
A          2<br />
A          3<br />
B             1       1<br />
B             2      -1<br />
A*B        1  1  0.3333<br />
A*B        1  2  -0.333<br />
A*B        2  1  0.3333<br />
A*B        2  2  -0.333<br />
A*B        3  1  0.3333<br />
A*B        3  2  -0.333<br />

Type 3 Coefficients for A*B<br />

Effect     A  B    Row1    Row2<br />
Intercept<br />
A          1<br />
A          2<br />
A          3<br />
B             1<br />
B             2<br />
A*B        1  1       1<br />
A*B        1  2      -1<br />
A*B        2  1               1<br />
A*B        2  2              -1<br />
A*B        3  1      -1      -1<br />
A*B        3  2       1       1<br />


Output 56.9.2 Type 3 Tests in Split-Plot Example<br />

Type 3 Tests of Fixed Effects<br />

Num Den<br />

Effect DF DF F Value Pr > F<br />

A 2 6 4.07 0.0764<br />

B 1 9 19.39 0.0017<br />

A*B 2 9 4.02 0.0566<br />

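If $\mu_{i\cdot}$ denotes a whole-plot main effect mean, $\mu_{\cdot j}$ denotes a subplot main effect mean, and $\mu_{ij}$ denotes<br />

a cell mean, the five components shown in Output 56.9.3 correspond to tests of the following:<br />

$$H_0\colon \mu_{1\cdot} = \mu_{3\cdot}$$<br />

$$H_0\colon \mu_{2\cdot} = \mu_{3\cdot}$$<br />

$$H_0\colon \mu_{\cdot 1} = \mu_{\cdot 2}$$<br />

$$H_0\colon \mu_{11} - \mu_{12} = \mu_{31} - \mu_{32}$$<br />

$$H_0\colon \mu_{21} - \mu_{22} = \mu_{31} - \mu_{32}$$<br />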

Output 56.9.3 Type 3 L Components Table<br />

L Components of Type 3 Tests of Fixed Effects<br />

L Standard<br />

Effect Index Estimate Error DF t Value Pr > |t|<br />

A 1 7.1250 3.1672 6 2.25 0.0655<br />

A 2 8.3750 3.1672 6 2.64 0.0383<br />

B 1 5.5000 1.2491 9 4.40 0.0017<br />

A*B 1 7.7500 3.0596 9 2.53 0.0321<br />

A*B 2 7.2500 3.0596 9 2.37 0.0419<br />

<strong>The</strong> first three components are comparisons of marginal means. <strong>The</strong> fourth component compares<br />

the effect of factor B at the first whole-plot level against the effect of B at the third whole-plot level.<br />

Finally, the last component tests whether the factor B effect changes between the second and third<br />

whole-plot level.<br />
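The link between the coefficient rows in Output 56.9.1 and comparisons of means can be written out explicitly. The sketch below assumes the usual effects parameterization of the cell means; the symbols $\mu$, $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\mu_{ij} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij},
\qquad
\mu_{i\cdot} = \tfrac{1}{2}\,(\mu_{i1} + \mu_{i2}) \\[4pt]
% Row1 of "Type 3 Coefficients for A": 1 and -1 on A; 0.5, 0.5, -0.5, -0.5 on A*B
\mu_{1\cdot} - \mu_{3\cdot}
&= \alpha_1 - \alpha_3
 + \tfrac{1}{2}\bigl[(\alpha\beta)_{11} + (\alpha\beta)_{12}\bigr]
 - \tfrac{1}{2}\bigl[(\alpha\beta)_{31} + (\alpha\beta)_{32}\bigr] \\[4pt]
% Row1 of "Type 3 Coefficients for A*B": 1, -1, -1, 1
(\mu_{11} - \mu_{12}) - (\mu_{31} - \mu_{32})
&= (\alpha\beta)_{11} - (\alpha\beta)_{12} - (\alpha\beta)_{31} + (\alpha\beta)_{32}
\end{align*}
```

The coefficients on the right-hand sides are the nonzero entries of Row1 in the “Type 3 Coefficients for A” and “Type 3 Coefficients for A*B” tables, so each row is a contrast of marginal means or of cell mean differences.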

<strong>The</strong> Type 3 component tests can also be produced with these corresponding ESTIMATE statements:<br />

proc mixed data=sp;<br />

class a b block ;<br />

model y = a b a*b;<br />

random block a*block;<br />

estimate 'a 1' a 1 0 -1;<br />

estimate 'a 2' a 0 1 -1;<br />

estimate 'b 1' b 1 -1;<br />

estimate 'a*b 1' a*b 1 -1 0 0 -1 1;<br />

estimate 'a*b 2' a*b 0 0 1 -1 -1 1;<br />

ods select Estimates;<br />

run;<br />

<strong>The</strong> results are shown in Output 56.9.4.<br />

Output 56.9.4 Results from ESTIMATE Statements<br />

<strong>The</strong> Mixed <strong>Procedure</strong><br />

Estimates<br />

Standard<br />

Label Estimate Error DF t Value Pr > |t|<br />

a 1 7.1250 3.1672 6 2.25 0.0655<br />

a 2 8.3750 3.1672 6 2.64 0.0383<br />

b 1 5.5000 1.2491 9 4.40 0.0017<br />

a*b 1 7.7500 3.0596 9 2.53 0.0321<br />

a*b 2 7.2500 3.0596 9 2.37 0.0419<br />
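For an effect with a single numerator degree of freedom, the component t test and the Type 3 F test carry the same information. As a worked check using the printed (rounded) values for factor B:

```latex
\[
t = \frac{5.5000}{1.2491} \approx 4.403,
\qquad
t^2 \approx 19.39 = F_{1,\,9} \ \text{for B in the Type 3 tests}
\]
```

For A and A*B, which have two numerator degrees of freedom, the F statistic is a joint test of both rows and is not recovered from either t value alone.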

A second useful application of the LCOMPONENTS option is in polynomial models, where Type<br />

1 tests are often used to test the entry of model terms sequentially. <strong>The</strong> SOLUTION option in the<br />

MODEL statement displays the regression coefficients that correspond to a Type 3 analysis. That<br />

is, the coefficients represent the partial coefficients you would get by adding the regressor variable<br />

last in a model containing all other effects, and the tests are identical to those in the “Type 3 Tests<br />

of Fixed Effects” table.<br />

Consider the following DATA step and the fit of a third-order polynomial regression model.<br />

data polynomial;<br />

do x=1 to 20; input y@@; output; end;<br />

datalines;<br />

1.092 1.758 1.997 3.154 3.880<br />

3.810 4.921 4.573 6.029 6.032<br />

6.291 7.151 7.154 6.469 7.137<br />

6.374 5.860 4.866 4.155 2.711<br />

;<br />

proc mixed data=polynomial;<br />

model y = x x*x x*x*x / s lcomponents htype=1,3;<br />

run;<br />

<strong>The</strong> t tests displayed in the “Solution for Fixed Effects” table are Type 3 tests, sometimes referred<br />

to as partial tests. <strong>The</strong>y measure the contribution of a regressor in the presence of all other regressor<br />

variables in the model.
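To see what a sequential (Type 1) component measures in contrast to these partial coefficients, one can fit the model that contains only the first term. This sketch is an illustration rather than part of the original example; it assumes the polynomial data set created above. The slope in this reduced model is $\sum (x - \bar{x})\,y \,/ \sum (x - \bar{x})^2 = 117.233/665 \approx 0.176$ for these data, compared with the partial coefficient 0.3726 for x in the cubic model.

```sas
/* Sketch: first-order model only; the slope for x is the sequential  */
/* contribution of x, adjusted for the intercept but not for x*x or   */
/* x*x*x.                                                             */
proc mixed data=polynomial;
   model y = x / s;
run;
```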


Output 56.9.5 Parameter Estimates in Polynomial Model<br />


<strong>The</strong> Mixed <strong>Procedure</strong><br />

Solution for Fixed Effects<br />

Standard<br />

Effect Estimate Error DF t Value Pr > |t|<br />

Intercept 0.7837 0.3545 16 2.21 0.0420<br />

x 0.3726 0.1426 16 2.61 0.0189<br />

x*x 0.04756 0.01558 16 3.05 0.0076<br />

x*x*x -0.00306 0.000489 16 -6.27 |t|<br />

x 1 0.1763 0.01259 16 14.01



References<br />

Akritas, M. G., Arnold, S. F., and Brunner, E. (1997), “Nonparametric Hypotheses and Rank Statistics<br />

for Unbalanced Factorial Designs,” Journal of the American Statistical Association, 92, 258–265.<br />

Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on<br />

Automatic Control, AC–19, 716–723.<br />

Allen, D. M. (1974), “<strong>The</strong> Relationship between Variable Selection and Data Augmentation and a<br />

Method of Prediction,” Technometrics, 16, 125–127.<br />

Bates, D. M. and Watts, D. G. (1988), Nonlinear Regression Analysis and Its Applications, New<br />

York: John Wiley & Sons.<br />

Beckman, R. J., Nachtsheim, C. J., and Cook, R. D. (1987), “Diagnostics for Mixed-Model Analysis<br />

of Variance,” Technometrics, 29, 413–426.<br />

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics: Identifying Influential<br />

Data and Sources of Collinearity, New York: John Wiley & Sons.<br />

Box, G. E. P. and Tiao, G. C. (1973), Bayesian Inference in Statistical Analysis, Wiley Classics<br />

Library Edition Published 1992, New York: John Wiley & Sons.<br />

Bozdogan, H. (1987), “Model Selection and Akaike’s Information Criterion (AIC): <strong>The</strong> General<br />

<strong>The</strong>ory and Its Analytical Extensions,” Psychometrika, 52, 345–370.<br />

Brown, H. and Prescott, R. (1999), Applied Mixed Models in Medicine, New York: John Wiley &<br />

Sons.<br />

Brownie, C., Bowman, D. T., and Burton, J. W. (1993), “Estimating Spatial Variation in Analysis<br />

of Data from Yield Trials: A Comparison of Methods,” Agronomy Journal, 85, 1244–1253.<br />

Brownie, C., and Gumpertz, M. L. (1997), “Validity of Spatial Analysis of Large Field Trials,”<br />

Journal of Agricultural, Biological, and Environmental Statistics, 2, 1–23.<br />

Brunner, E., Dette, H., and Munk, A. (1997), “Box-Type Approximations in Nonparametric Factorial<br />

Designs,” Journal of the American Statistical Association, 92, 1494–1502.<br />

Brunner, E., Domhof, S., and Langer, F. (2002), Nonparametric Analysis of Longitudinal Data in<br />

Factorial Experiments, New York: John Wiley & Sons.<br />

Burdick, R. K. and Graybill, F. A. (1992), Confidence Intervals on Variance Components, New<br />

York: Marcel Dekker.<br />

Burnham, K. P. and Anderson, D. R. (1998), Model Selection and Inference: A Practical<br />

Information-<strong>The</strong>oretic Approach, New York: Springer-Verlag.<br />

Carlin, B. P. and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis,<br />

London: Chapman and Hall.



Carroll, R. J. and Ruppert, D. (1988), Transformation and Weighting in Regression, London: Chapman<br />

and Hall.<br />

Chilès, J. P. and Delfiner, P. (1999), Geostatistics: Modeling Spatial Uncertainty, New York: John<br />

Wiley & Sons.<br />

Christensen, R., Pearson, L. M., and Johnson, W. (1992), “Case-Deletion Diagnostics for Mixed<br />

Models,” Technometrics, 34, 38–45.<br />

Cook, R. D. (1977), “Detection of Influential Observations in Linear Regression,” Technometrics,<br />

19, 15–18.<br />

Cook, R. D. (1979), “Influential Observations in Linear Regression,” Journal of the American Statistical<br />

Association, 74, 169–174.<br />

Cook, R. D. and Weisberg, S. (1982), Residuals and Influence in Regression, New York: Chapman<br />

and Hall.<br />

Cressie, N. (1993), Statistics for Spatial Data, Revised Edition, New York: John Wiley & Sons.<br />

Crowder, M. J. and Hand, D. J. (1990), Analysis of Repeated Measures, New York: Chapman and<br />

Hall.<br />

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum Likelihood from Incomplete<br />

Data via the EM Algorithm,” Journal of the Royal Statistical Society, Ser. B., 39, 1–38.<br />

Diggle, P. J. (1988), “An Approach to the Analysis of Repeated Measurements,” Biometrics, 44,<br />

959–971.<br />

Diggle, P. J., Liang, K. Y., and Zeger, S. L. (1994), Analysis of Longitudinal Data, Oxford: Clarendon<br />

Press.<br />

Dunnett, C. W. (1980), “Pairwise Multiple Comparisons in the Unequal Variance Case,” Journal of<br />

the American Statistical Association, 75, 796–800.<br />

Edwards, D. and Berry, J. J. (1987), “<strong>The</strong> Efficiency of Simulation-based Multiple Comparisons,”<br />

Biometrics, 43, 913–928.<br />

Everitt, B. S. (1995), “<strong>The</strong> Analysis of Repeated Measures: A Practical Review with Examples,”<br />

<strong>The</strong> Statistician, 44, 113–135.<br />

Fai, A. H. T. and Cornelius, P. L. (1996), “Approximate F-tests of Multiple Degree of Freedom<br />

Hypotheses in Generalized Least Squares Analyses of Unbalanced Split-plot Experiments,” Journal<br />

of Statistical Computation and Simulation, 54, 363–378.<br />

Federer, W. T. and Wolfinger, R. D. (1998), “<strong>SAS</strong> Code for Recovering Intereffect Information in<br />

Experiments with Incomplete Block and Lattice Rectangle Designs,” Agronomy Journal, 90, 545–<br />

551.<br />

Fuller, W. A. (1976), Introduction to Statistical Time Series, New York: John Wiley & Sons.



Fuller, W. A. and Battese, G. E. (1973), “Transformations for Estimation of Linear Models with<br />

Nested Error Structure,” Journal of the American Statistical Association, 68, 626–632.<br />

Galecki, A. T. (1994), “General Class of Covariance Structures for Two or More Repeated Factors<br />

in Longitudinal Data Analysis,” Communications in Statistics–<strong>The</strong>ory and Methods, 23(11), 3105–<br />

3119.<br />

Games, P. A., and Howell, J. F. (1976), “Pairwise Multiple Comparison <strong>Procedure</strong>s With Unequal<br />

n’s and/or Variances: A Monte Carlo Study,” Journal of Educational Statistics, 1, 113–125.<br />

Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990), “Illustration of Bayesian<br />

Inference in Normal Data Models Using Gibbs Sampling,” Journal of the American Statistical<br />

Association, 85, 972–985.<br />

Ghosh, M. (1992), Discussion of Schervish, M., “Bayesian Analysis of Linear Models,” Bayesian<br />

Statistics 4, eds. J.M. Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford: University<br />

Press, 432–433.<br />

Giesbrecht, F. G. (1989), “A General Structure for the Class of Mixed Linear Models,” Applications<br />

of Mixed Models in Agriculture and Related Disciplines, Southern Cooperative Series Bulletin No.<br />

343, Louisiana Agricultural Experiment Station, Baton Rouge, 183–201.<br />

Giesbrecht, F. G. and Burns, J. C. (1985), “Two-Stage Analysis Based on a Mixed Model: Large-Sample<br />

Asymptotic <strong>The</strong>ory and Small-Sample Simulation Results,” Biometrics, 41, 477–486.<br />

Golub, G. H. and Van Loan, C. F. (1989), Matrix Computations, Second Edition, Baltimore: Johns<br />

Hopkins University Press.<br />

Goodnight, J. H. (1978), <strong>SAS</strong> Technical Report R-101, Tests of Hypotheses in Fixed-Effects Linear<br />

Models, Cary, NC: <strong>SAS</strong> Institute Inc.<br />

Goodnight, J. H. (1979), “A Tutorial on the Sweep Operator,” American Statistician, 33, 149–158.<br />

Goodnight, J. H. and Hemmerle, W. J. (1979), “A Simplified Algorithm for the W-Transformation<br />

in Variance Component Estimation,” Technometrics, 21, 265–268.<br />

Gotway, C. A. and Stroup, W. W. (1997), “A Generalized Linear Model Approach to Spatial Data<br />

and Prediction,” Journal of Agricultural, Biological, and Environmental Statistics, 2, 157–187.<br />

Greenhouse, S. W. and Geisser, S. (1959), “On Methods in the Analysis of Profile Data,” Psychometrika,<br />

24, 95–112.<br />

Gregoire, T. G., Schabenberger, O., and Barrett, J. P. (1995), “Linear Modelling of Irregularly<br />

Spaced, Unbalanced, Longitudinal Data from Permanent Plot Measurements,” Canadian Journal of<br />

Forest Research, 25, 137–156.<br />

Handcock, M. S. and Stein, M. L. (1993), “A Bayesian Analysis of Kriging,” Technometrics, 35(4),<br />

403–410.<br />

Handcock, M. S. and Wallis, J. R. (1994), “An Approach to Statistical Spatial-Temporal Modeling<br />

of Meteorological Fields (with Discussion),” Journal of the American Statistical Association, 89,<br />

368–390.



Hanks, R. J., Sisson, D. V., Hurst, R. L., and Hubbard, K. G. (1980), “Statistical Analysis of Results<br />

from Irrigation Experiments Using the Line-Source Sprinkler System,” Soil Science Society of America<br />

Journal, 44, 886–888.<br />

Hannan, E. J. and Quinn, B. G. (1979), “<strong>The</strong> Determination of the Order of an Autoregression,”<br />

Journal of the Royal Statistical Society, Series B, 41, 190–195.<br />

Hartley, H. O. and Rao, J. N. K. (1967), “Maximum-Likelihood Estimation for the Mixed Analysis<br />

of Variance Model,” Biometrika, 54, 93–108.<br />

Harville, D. A. (1977), “Maximum Likelihood Approaches to Variance Component Estimation and<br />

to Related Problems,” Journal of the American Statistical Association, 72, 320–338.<br />

Harville, D. A. (1988), “Mixed-Model Methodology: <strong>The</strong>oretical Justifications and Future Directions,”<br />

Proceedings of the Statistical Computing Section, American Statistical Association, New<br />

Orleans, 41–49.<br />

Harville, D. A. (1990), “BLUP (Best Linear Unbiased Prediction), and Beyond,” in Advances in<br />

Statistical Methods for Genetic Improvement of Livestock, Springer-Verlag, 239–276.<br />

Harville, D. A. and Jeske, D. R. (1992), “Mean Squared Error of Estimation or Prediction under a<br />

General Linear Model,” Journal of the American Statistical Association, 87, 724–731.<br />

Hemmerle, W. J. and Hartley, H. O. (1973), “Computing Maximum Likelihood Estimates for the<br />

Mixed AOV Model Using the W-Transformation,” Technometrics, 15, 819–831.<br />

Henderson, C. R. (1984), Applications of Linear Models in Animal Breeding, University of Guelph.<br />

Henderson, C. R. (1990), “Statistical Method in Animal Improvement: Historical Overview,” in<br />

Advances in Statistical Methods for Genetic Improvement of Livestock, New York: Springer-Verlag,<br />

1–14.<br />

Hsu, J. C. (1992), “<strong>The</strong> Factor Analytic Approach to Simultaneous Inference in the General Linear<br />

Model,” Journal of Computational and Graphical Statistics, 1, 151–168.<br />

Huber, P. J. (1967), “<strong>The</strong> Behavior of Maximum Likelihood Estimates under Nonstandard Conditions,”<br />

Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1, 221–233.<br />

Hurtado, G. I. H. (1993), Detection of Influential Observations in Linear Mixed Models, Ph.D.<br />

dissertation, Department of Statistics, North Carolina State University, Raleigh, NC.<br />

Hurvich, C. M. and Tsai, C.-L. (1989), “Regression and Time Series Model Selection in Small<br />

Samples,” Biometrika, 76, 297–307.<br />

Huynh, H. and Feldt, L. S. (1970), “Conditions Under Which Mean Square Ratios in Repeated Measurements<br />

Designs Have Exact F-Distributions,” Journal of the American Statistical Association,<br />

65, 1582–1589.<br />

Jennrich, R. I. and Schluchter, M. D. (1986), “Unbalanced Repeated-Measures Models with Structured<br />

Covariance Matrices,” Biometrics, 42, 805–820.



Johnson, D. E., Chaudhuri, U. N., and Kanemasu, E. T. (1983), “Statistical Analysis of Line-Source<br />

Sprinkler Irrigation Experiments and Other Nonrandomized Experiments Using Multivariate Methods,”<br />

Soil Science Society of America Journal, 47, 309–312.<br />

Jones, R. H. and Boadi-Boateng, F. (1991), “Unequally Spaced Longitudinal Data with AR(1) Serial<br />

Correlation,” Biometrics, 47, 161–175.<br />

Kackar, R. N. and Harville, D. A. (1984), “Approximations for Standard Errors of Estimators of<br />

Fixed and Random Effects in Mixed Linear Models,” Journal of the American Statistical Association,<br />

79, 853–862.<br />

Kass, R. E. and Steffey, D. (1989), “Approximate Bayesian Inference in Conditionally Independent<br />

Hierarchical Models (Parametric Empirical Bayes Models),” Journal of the American Statistical<br />

Association, 84, 717–726.<br />

Kenward, M. G. (1987), “A Method for Comparing Profiles of Repeated Measurements,” Applied<br />

Statistics, 36, 296–308.<br />

Kenward, M. G. and Roger, J. H. (1997), “Small Sample Inference for Fixed Effects from Restricted<br />

Maximum Likelihood,” Biometrics, 53, 983–997.<br />

Keselman, H. J., Algina, J., Kowalchuk, R. K., and Wolfinger, R. D. (1998), “A Comparison of Two<br />

Approaches for Selecting Covariance Structures in the Analysis of Repeated Measures,” Communications<br />

in Statistics–Computation and Simulation, 27(3), 591–604.<br />

Keselman, H. J., Algina, J., Kowalchuk, R. K., and Wolfinger, R. D. (1999), “A Comparison of<br />

Recent Approaches to the Analysis of Repeated Measurements,” British Journal of Mathematical<br />

and Statistical Psychology, 52, 63–78.<br />

Kramer, C. Y. (1956), “Extension of Multiple Range Tests to Group Means with Unequal Numbers<br />

of Replications,” Biometrics, 12, 309–310.<br />

Laird, N. M. and Ware, J. H. (1982), “Random-Effects Models for Longitudinal Data,” Biometrics,<br />

38, 963–974.<br />

Laird, N. M., Lange, N., and Stram, D. (1987), “Maximum Likelihood Computations with Repeated<br />

Measures: Application of the EM Algorithm,” Journal of the American Statistical Association, 82,<br />

97–105.<br />

LaMotte, L. R. (1973), “Quadratic Estimation of Variance Components,” Biometrics, 29, 311–330.<br />

Liang, K. Y. and Zeger, S. L. (1986), “Longitudinal Data Analysis Using Generalized Linear Models,”<br />

Biometrika, 73, 13–22.<br />

Lindsey, J. K. (1993), Models for Repeated Measurements, Oxford: Clarendon Press.<br />

Lindstrom, M. J. and Bates, D. M. (1988), “Newton-Raphson and EM Algorithms for Linear Mixed-<br />

Effects Models for Repeated-Measures Data,” Journal of the American Statistical Association, 83,<br />

1014–1022.



Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., and Schabenberger, O. (2006), <strong>SAS</strong><br />

for Mixed Models, Second Edition, Cary, NC: <strong>SAS</strong> Institute Inc.<br />

Little, R. J. A. (1995), “Modeling the Drop-Out Mechanism in Repeated-Measures Studies,” Journal<br />

of the American Statistical Association, 90, 1112–1121.<br />

Louis, T. A. (1988), “General Methods for Analyzing Repeated Measures,” Statistics in Medicine,<br />

7, 29–45.<br />

Macchiavelli, R. E. and Arnold, S. F. (1994), “Variable Order Ante-dependence Models,” Communications<br />

in Statistics–<strong>The</strong>ory and Methods, 23(9), 2683–2699.<br />

Marx, D. and Thompson, K. (1987), “Practical Aspects of Agricultural Kriging,” Bulletin 903,<br />

Arkansas Agricultural Experiment Station, Fayetteville.<br />

Matérn, B. (1986), Spatial Variation, Second Edition, Lecture Notes in Statistics, New York:<br />

Springer-Verlag.<br />

McKeon, J. J. (1974), “F Approximations to the Distribution of Hotelling’s T<sub>0</sub><sup>2</sup>,” Biometrika, 61,<br />

381–383.<br />

McLean, R. A. and Sanders, W. L. (1988), “Approximating Degrees of Freedom for Standard Errors<br />

in Mixed Linear Models,” Proceedings of the Statistical Computing Section, American Statistical<br />

Association, New Orleans, 50–59.<br />

McLean, R. A., Sanders, W. L., and Stroup, W. W. (1991), “A Unified Approach to Mixed Linear<br />

Models,” <strong>The</strong> American Statistician, 45, 54–64.<br />

Milliken, G. A. and Johnson, D. E. (1992), Analysis of Messy Data, Volume 1: Designed Experiments,<br />

New York: Chapman and Hall.<br />

Murray, D. M. (1998), Design and Analysis of Group-Randomized Trials, New York: Oxford University<br />

Press.<br />

Myers, R. H. (1990), Classical and Modern Regression with Applications, Second Edition, Belmont,<br />

CA: PWS-Kent.<br />

Obenchain, R. L. (1990), STABLSIM.EXE, Version 9010, Eli Lilly and Company, Indianapolis,<br />

Indiana, unpublished C code.<br />

Patel, H. I. (1991), “Analysis of Incomplete Data from a Clinical Trial with Repeated Measurements,”<br />

Biometrika, 78, 609–619.<br />

Patterson, H. D. and Thompson, R. (1971), “Recovery of Inter-block Information When Block Sizes<br />

Are Unequal,” Biometrika, 58, 545–554.<br />

Pillai, K. C. and Samson, P. (1959), “On Hotelling’s Generalization of T<sup>2</sup>,” Biometrika, 46, 160–<br />

168.<br />

Potthoff, R. F. and Roy, S. N. (1964), “A Generalized Multivariate Analysis of Variance Model<br />

Useful Especially for Growth Curve Problems,” Biometrika, 51, 313–326.



Prasad, N. G. N. and Rao, J. N. K. (1990), “<strong>The</strong> Estimation of Mean Squared Error of Small-Area<br />

Estimators,” Journal of the American Statistical Association, 85, 163–171.<br />

Pringle, R. M. and Rayner, A. A. (1971), Generalized Inverse Matrices with Applications to Statistics,<br />

New York: Hafner Publishing Co.<br />

Rao, C. R. (1972), “Estimation of Variance and Covariance Components in Linear Models,” Journal<br />

of the American Statistical Association, 67, 112–115.<br />

Ripley, B. D. (1987), Stochastic Simulation, New York: John Wiley & Sons.<br />

Robinson, G. K. (1991), “That BLUP Is a Good Thing: <strong>The</strong> Estimation of Random Effects,” Statistical<br />

Science, 6, 15–51.<br />

Rubin, D. B. (1976), “Inference and Missing Data,” Biometrika, 63, 581–592.<br />

Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989), “Design and Analysis of Computer<br />

Experiments,” Statistical Science, 4, 409–435.<br />

Schabenberger, O. and Gotway, C. A. (2005), Statistical Methods for Spatial Data Analysis, Boca<br />

Raton, FL: CRC Press.<br />

Schluchter, M. D. and Elashoff, J. D. (1990), “Small-Sample Adjustments to Tests with Unbalanced<br />

Repeated Measures Assuming Several Covariance Structures,” Journal of Statistical Computation<br />

and Simulation, 37, 69–87.<br />

Schwarz, G. (1978), “Estimating the Dimension of a Model,” Annals of Statistics, 6, 461–464.<br />

Schervish, M. J. (1992), “Bayesian Analysis of Linear Models,” Bayesian Statistics 4, eds. J.M.<br />

Bernardo, J.O. Berger, A.P. Dawid, and A.F.M. Smith, Oxford: University Press, 419–434 (with<br />

discussion).<br />

Searle, S. R. (1971), Linear Models, New York: John Wiley & Sons.<br />

Searle, S. R. (1982), Matrix Algebra Useful for Statisticians, New York: John Wiley & Sons.<br />

Searle, S. R. (1988), “Mixed Models and Unbalanced Data: Wherefrom, Whereat, and Whereto?”<br />

Communications in Statistics–<strong>The</strong>ory and Methods, 17(4), 935–968.<br />

Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New York: John<br />

Wiley & Sons.<br />

Self, S. G. and Liang, K. Y. (1987), “Asymptotic Properties of Maximum Likelihood Estimators<br />

and Likelihood Ratio Tests under Nonstandard Conditions,” Journal of the American Statistical<br />

Association, 82, 605–610.<br />

Serfling, R. J. (1980), Approximation <strong>The</strong>orems of Mathematical Statistics, New York: John Wiley<br />

& Sons.<br />

Singer, J. D. (1998), “Using <strong>SAS</strong> PROC <strong>MIXED</strong> to Fit Multilevel Models, Hierarchical Models,<br />

and Individual Growth Models,” Journal of Educational and Behavioral Statistics, 23(4), 323–355.



Smith, A. F. M. and Gelfand, A. E. (1992), “Bayesian Statistics without Tears: A Sampling-<br />

Resampling Perspective,” American Statistician, 46, 84–88.<br />

Snedecor, G. W. and Cochran, W. G. (1976), Statistical Methods, Sixth Edition, Ames: Iowa State<br />

University Press.<br />

Snedecor, G. W. and Cochran, W. G. (1980), Statistical Methods, Ames: Iowa State University<br />

Press.<br />

Steel, R. G. D., Torrie, J. H., and Dickey, D. (1997), Principles and <strong>Procedure</strong>s of Statistics: A<br />

Biometrical Approach, Third Edition, New York: McGraw-Hill, Inc.<br />

Stram, D. O. and Lee, J. W. (1994), “Variance Components Testing in the Longitudinal Mixed<br />

Effects Model,” Biometrics, 50, 1171–1177.<br />

Stroup, W. W. (1989a), “Predictable Functions and Prediction Space in the Mixed Model <strong>Procedure</strong>,”<br />

in Applications of Mixed Models in Agriculture and Related Disciplines, Southern Cooperative<br />

Series Bulletin No. 343, Louisiana Agricultural Experiment Station, Baton Rouge, 39–48.<br />

Stroup, W. W. (1989b), “Use of Mixed Model <strong>Procedure</strong> to Analyze Spatially Correlated Data: An<br />

Example Applied to a Line-Source Sprinkler Irrigation Experiment,” Applications of Mixed Models<br />

in Agriculture and Related Disciplines, Southern Cooperative Series Bulletin No. 343, Louisiana<br />

Agricultural Experiment Station, Baton Rouge, 104–122.<br />

Stroup, W. W., Baenziger, P. S., and Mulitze, D. K. (1994), “Removing Spatial Variation from<br />

Wheat Yield Trials: A Comparison of Methods,” Crop Science, 34, 62–66.<br />

Sullivan, L. M., Dukes, K. A., and Losina, E. (1999), “An Introduction to Hierarchical Linear<br />

Modelling,” Statistics in Medicine, 18, 855–888.<br />

Swallow, W. H. and Monahan, J. F. (1984), “Monte Carlo Comparison of ANOVA, MIVQUE,<br />

REML, and ML Estimators of Variance Components,” Technometrics, 26, 47–57.<br />

Tamhane, A. C. (1979), “A Comparison of <strong>Procedure</strong>s for Multiple Comparisons of Means With<br />

Unequal Variances,” Journal of the American Statistical Association, 74, 471–480.<br />

Tierney, L. (1994), “Markov Chains for Exploring Posterior Distributions” (with discussion), Annals<br />

of Statistics, 22, 1701–1762.<br />

Verbeke, G. and Molenberghs, G., eds. (1997), Linear Mixed Models in Practice: A <strong>SAS</strong>-Oriented<br />

Approach, New York: Springer.<br />

Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York:<br />

Springer.<br />

Westfall, P. H. and Young, S. S. (1993), Resampling-based Multiple Testing, New York: John Wiley<br />

& Sons.<br />

Westfall, P. H., Tobias, R. D., Rom, D., Wolfinger, R. D., and Hochberg, Y. (1999), Multiple Comparisons<br />

and Multiple Tests Using the <strong>SAS</strong> System, Cary, NC: <strong>SAS</strong> Institute Inc.



White, H. (1980), “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test<br />

for Heteroskedasticity,” Econometrica, 48, 817–838.<br />

Whittle, P. (1954), “On Stationary Processes in the Plane,” Biometrika, 41, 434–449.<br />

Winer, B. J. (1971), Statistical Principles in Experimental Design, Second Edition, New York:<br />

McGraw-Hill, Inc.<br />

Wolfinger, R. D. (1993), “Covariance Structure Selection in General Mixed Models,” Communications<br />

in Statistics, Simulation and Computation, 22(4), 1079–1106.<br />

Wolfinger, R. D. (1996), “Heterogeneous Variance-Covariance Structures for Repeated Measures,”<br />

Journal of Agricultural, Biological, and Environmental Statistics, 1, 205–230.<br />

Wolfinger, R. D. (1997), “An Example of Using Mixed Models and PROC <strong>MIXED</strong> for Longitudinal<br />

Data,” Journal of Biopharmaceutical Statistics, 7(4), 481–500.<br />

Wolfinger, R. D. and Chang, M. (1995), “Comparing the <strong>SAS</strong> GLM and <strong>MIXED</strong> <strong>Procedure</strong>s for<br />

Repeated Measures,” Proceedings of the Twentieth Annual <strong>SAS</strong> Users Group Conference.<br />

Wolfinger, R. D., Tobias, R. D., and Sall, J. (1991), “Mixed Models: A Future Direction,” Proceedings<br />

of the Sixteenth Annual <strong>SAS</strong> Users Group Conference, 1380–1388.<br />

Wolfinger, R. D., Tobias, R. D., and Sall, J. (1994), “Computing Gaussian Likelihoods and <strong>The</strong>ir<br />

Derivatives for General Linear Mixed Models,” SIAM Journal on Scientific Computing, 15(6),<br />

1294–1310.<br />

Wright, P. S. (1994), “Adjusted F Tests for Repeated Measures with the <strong>MIXED</strong> <strong>Procedure</strong>,” 328<br />

SMC-Statistics Department, University of Tennessee.<br />

Zimmerman, D. L. and Harville, D. A. (1991), “A Random Field Approach to the Analysis of<br />

Field-Plot Experiments and Other Spatial Experiments,” Biometrics, 47, 223–239.


Subject Index<br />

2D geometric anisotropic structure<br />

<strong>MIXED</strong> procedure, 3953<br />

Akaike’s information criterion<br />

example (<strong>MIXED</strong>), 4011, 4025, 4054<br />

<strong>MIXED</strong> procedure, 3901, 3970, 3991<br />

Akaike’s information criterion (finite sample<br />

corrected version)<br />

<strong>MIXED</strong> procedure, 3901, 3991<br />

alpha level<br />

<strong>MIXED</strong> procedure, 3899, 3915, 3919, 3923,<br />

3944<br />

anisotropic power covariance structure<br />

<strong>MIXED</strong> procedure, 3954<br />

anisotropic spatial power structure<br />

<strong>MIXED</strong> procedure, 3954<br />

ANTE(1) structure<br />

<strong>MIXED</strong> procedure, 3953<br />

ante-dependence structure<br />

<strong>MIXED</strong> procedure, 3953<br />

AR(1) structure<br />

<strong>MIXED</strong> procedure, 3953<br />

asymptotic covariance<br />

<strong>MIXED</strong> procedure, 3899<br />

at sign (@) operator<br />

<strong>MIXED</strong> procedure, 3977, 4049<br />

autoregressive moving-average structure<br />

<strong>MIXED</strong> procedure, 3953<br />

autoregressive structure<br />

example (<strong>MIXED</strong>), 4019<br />

<strong>MIXED</strong> procedure, 3953<br />

banded Toeplitz structure<br />

<strong>MIXED</strong> procedure, 3953<br />

bar (|) operator<br />

<strong>MIXED</strong> procedure, 3976, 3977, 4049<br />

Bayesian analysis<br />

<strong>MIXED</strong> procedure, 3939<br />

BLUE<br />

<strong>MIXED</strong> procedure, 3971<br />

BLUP<br />

<strong>MIXED</strong> procedure, 3971<br />

Bonferroni adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

boundary constraints<br />

<strong>MIXED</strong> procedure, 3938, 3939, 4005<br />

CALIS procedure<br />

compared to <strong>MIXED</strong> procedure, 3889<br />

chi-square test<br />

<strong>MIXED</strong> procedure, 3913, 3923<br />

class level<br />

<strong>MIXED</strong> procedure, 3903, 3989<br />

classification variables<br />

<strong>MIXED</strong> procedure, 3910<br />

compound symmetry structure<br />

example (<strong>MIXED</strong>), 3964, 4020, 4025<br />

<strong>MIXED</strong> procedure, 3953<br />

computational details<br />

<strong>MIXED</strong> procedure, 4004<br />

computational problems<br />

convergence (<strong>MIXED</strong>), 4005<br />

conditional residuals<br />

<strong>MIXED</strong> procedure, 3981<br />

confidence limits<br />

<strong>MIXED</strong> procedure, 3900<br />

constraints<br />

boundary (<strong>MIXED</strong>), 3938, 3939<br />

containment method<br />

<strong>MIXED</strong> procedure, 3924, 3925<br />

continuous-by-class effects<br />

<strong>MIXED</strong> procedure, 3978<br />

continuous-nesting-class effects<br />

<strong>MIXED</strong> procedure, 3978<br />

contrasts<br />

<strong>MIXED</strong> procedure, 3911, 3914<br />

convergence criterion<br />

<strong>MIXED</strong> procedure, 3899, 3900, 3990, 4006<br />

convergence problems<br />

<strong>MIXED</strong> procedure, 4005<br />

convergence status<br />

<strong>MIXED</strong> procedure, 3990<br />

Cook’s D<br />

<strong>MIXED</strong> procedure, 3985<br />

Cook’s D for covariance parameters<br />

<strong>MIXED</strong> procedure, 3985<br />

correlation<br />

estimates (<strong>MIXED</strong>), 3945, 3947, 3952, 4021<br />

covariance<br />

parameter estimates (<strong>MIXED</strong>), 3900, 3901<br />

parameter estimates, ratio (<strong>MIXED</strong>), 3909<br />

parameters (<strong>MIXED</strong>), 3886<br />

covariance parameter estimates<br />

<strong>MIXED</strong> procedure, 3991<br />

covariance structure<br />

anisotropic power (<strong>MIXED</strong>), 3960<br />

ante-dependence (<strong>MIXED</strong>), 3956


autoregressive (<strong>MIXED</strong>), 3957<br />

autoregressive moving-average (<strong>MIXED</strong>),<br />

3957<br />

banded (<strong>MIXED</strong>), 3960<br />

compound symmetry (<strong>MIXED</strong>), 3957<br />

equi-correlation (<strong>MIXED</strong>), 3957<br />

examples (<strong>MIXED</strong>), 3954<br />

exponential anisotropic (<strong>MIXED</strong>), 3958<br />

factor-analytic (<strong>MIXED</strong>), 3957<br />

general linear (<strong>MIXED</strong>), 3958<br />

heterogeneous autoregressive (<strong>MIXED</strong>),<br />

3957<br />

heterogeneous compound symmetry<br />

(<strong>MIXED</strong>), 3957<br />

heterogeneous Toeplitz (<strong>MIXED</strong>), 3960<br />

Huynh-Feldt (<strong>MIXED</strong>), 3958<br />

Kronecker (<strong>MIXED</strong>), 3961<br />

Matérn (<strong>MIXED</strong>), 3959<br />

<strong>MIXED</strong> procedure, 3888, 3953<br />

power (<strong>MIXED</strong>), 3960<br />

simple (<strong>MIXED</strong>), 3958<br />

spatial geometric anisotropic (<strong>MIXED</strong>),<br />

3959<br />

Toeplitz (<strong>MIXED</strong>), 3960<br />

unstructured (<strong>MIXED</strong>), 3960<br />

unstructured, correlation (<strong>MIXED</strong>), 3960<br />

variance components (<strong>MIXED</strong>), 3961<br />

covariance structures<br />

examples (<strong>MIXED</strong>), 4013<br />

covariates<br />

<strong>MIXED</strong> procedure, 3976<br />

CovRatio<br />

<strong>MIXED</strong> procedure, 3986<br />

CovRatio for covariance parameters<br />

<strong>MIXED</strong> procedure, 3986<br />

CovTrace<br />

<strong>MIXED</strong> procedure, 3986<br />

CovTrace for covariance parameters<br />

<strong>MIXED</strong> procedure, 3986<br />

crossed effects<br />

<strong>MIXED</strong> procedure, 3977<br />

default output<br />

<strong>MIXED</strong> procedure, 3989<br />

degrees of freedom<br />

between-within method (<strong>MIXED</strong>), 3901,<br />

3925<br />

containment method (<strong>MIXED</strong>), 3924, 3925<br />

Kenward-Roger method (<strong>MIXED</strong>), 3927<br />

method (<strong>MIXED</strong>), 3924<br />

<strong>MIXED</strong> procedure, 3913, 3915, 3920, 3924<br />

residual method (<strong>MIXED</strong>), 3925<br />

Satterthwaite method (<strong>MIXED</strong>), 3925<br />

DFFITS<br />

<strong>MIXED</strong> procedure, 3985<br />

dimension information<br />

<strong>MIXED</strong> procedure, 3989<br />

dimensions<br />

<strong>MIXED</strong> procedure, 3903<br />

direct product structure<br />

<strong>MIXED</strong> procedure, 3953<br />

Dunnett’s adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

EBLUP<br />

<strong>MIXED</strong> procedure, 3934<br />

effect<br />

name length (<strong>MIXED</strong>), 3903<br />

empirical best linear unbiased prediction<br />

<strong>MIXED</strong> procedure, 3934<br />

empirical estimator<br />

<strong>MIXED</strong> procedure, 3901<br />

estimability<br />

<strong>MIXED</strong> procedure, 3911<br />

estimable functions<br />

<strong>MIXED</strong> procedure, 3933<br />

estimation<br />

mixed model (<strong>MIXED</strong>), 3968<br />

estimation methods<br />

<strong>MIXED</strong> procedure, 3902<br />

examples, <strong>MIXED</strong><br />

ASYCOV matrix, 4017<br />

asymptotic covariance of covariance<br />

parameters, 4017<br />

autoregressive structure, R-side, 4019<br />

box plots, 4070<br />

box plots, paneling, 3908<br />

broad inference space, 3912, 3914<br />

compound symmetry, G-side setup, 3965,<br />

4023<br />

compound symmetry, R-side setup, 3965,<br />

4020<br />

constrained anisotropic model, 3958<br />

covariates in LS-mean construction, 3919<br />

COVTEST option, 4014, 4026<br />

deletion estimates, 4055<br />

doubly repeated measure, 3961<br />

estimate, with subject, 3915<br />

fat absorption data, 4055<br />

fixed-effect solutions, 4041<br />

full-rank parameterization, 4018<br />

GDATA= option in RANDOM statement,<br />

4034<br />

geometrically anisotropic model, 3959<br />

getting started, 3890<br />

GLM procedure, split-plot design, 4011<br />

graphics, box plots, 4070<br />

graphics, influence diagnostics, 4055, 4066


graphics, residual panel, 3998<br />

graphics, studentized residual panel, 3998<br />

GROUP= effect in RANDOM statement,<br />

4023<br />

height data, 3890<br />

holding covariance parameters fixed, 3938,<br />

3958, 3959<br />

IML procedure, reading ASYCOV, 4017<br />

inference space, broad, 3912, 3914<br />

inference space, intermediate, 3914<br />

inference space, narrow, 3912, 3914<br />

inference spaces, 4012<br />

influence analysis, iterative, 4055, 4066<br />

influence analysis, non-iterative, 4065<br />

influence analysis, set deletion, 4065, 4066<br />

influence analysis, tuples, 3931<br />

intermediate inference space, 3914<br />

known covariance parameters, 3937<br />

known G and R matrix, 4034<br />

Kronecker covariance structure, 3961<br />

L-components, 3933, 4073, 4076<br />

least squares mean, slice, 3922<br />

least squares means, AT option, 3919<br />

least squares means, covariate, 3919<br />

least squares means, differences against<br />

control, 3920<br />

line-source sprinkler data, 4049<br />

local power-of-mean model, 3950<br />

maximum likelihood estimation, 4014<br />

mixed model equations, 4026, 4034<br />

mixed model equations, solution, 4026,<br />

4034<br />

multiple plot requests, 3909<br />

multiple traits data, 4033<br />

multivariate analysis, 3961<br />

narrow inference space, 3912, 3914<br />

nested error structure, 4045<br />

nested random effects, 3893<br />

NOITER option, 3937, 4034<br />

oven data (Hemmerle and Hartley, 1973),<br />

4026<br />

parameter grid search, 4026<br />

pharmaceutical stability data, 4041<br />

polynomial model, 4076<br />

POM data set, 3950<br />

POM fitting, iterated, 3951<br />

Pothoff and Roy growth measurements,<br />

4013, 4065<br />

random coefficient model, 4019, 4041, 4069<br />

random-effect solutions, 4041<br />

residual panel, 3998<br />

row-wise multiplicity adjustment, 3918<br />

Satterthwaite method, 3918<br />

set deletion, 4066<br />

SGRENDER procedure, 4032<br />

slice F test, 3922<br />

spatial power structure, 4054<br />

specifying lower bounds, 3938<br />

specifying values for degrees of freedom,<br />

3924<br />

split-plot design, 3966, 4009<br />

split-plot design, data, 4008, 4073<br />

split-plot design, equivalent model, 4013<br />

starting values, 4026<br />

studentized maximum modulus, 3918<br />

studentized residual panel, 3998<br />

subject and no-subject formulation, 3965<br />

subject contrasts, 3915<br />

subject v. no-subject formulation, 4013<br />

subject-specific R matrices, 3951<br />

subject-specific V matrices, 3947<br />

Toeplitz structure, 4049<br />

tuples, influence analysis, 3931<br />

two-way analysis of variance, 3890<br />

unstructured covariance, G-side, 3947<br />

unstructured covariance, R-side, 4014, 4065<br />

varying covariance parameters, 4023<br />

exponential covariance structure<br />

<strong>MIXED</strong> procedure, 3954<br />

external studentization<br />

<strong>MIXED</strong> procedure, 3981<br />

factor analytic structures<br />

<strong>MIXED</strong> procedure, 3953<br />

Fisher information matrix<br />

example (<strong>MIXED</strong>), 4026<br />

<strong>MIXED</strong> procedure, 3991<br />

Fisher’s scoring method<br />

<strong>MIXED</strong> procedure, 3899, 3909, 4006<br />

fixed effects<br />

<strong>MIXED</strong> procedure, 3888<br />

fixed-effects parameters<br />

<strong>MIXED</strong> procedure, 3886, 3963<br />

G matrix<br />

<strong>MIXED</strong> procedure, 3888, 3943, 3944, 3963, 3964, 4046<br />

Gaussian covariance structure<br />

<strong>MIXED</strong> procedure, 3954<br />

general linear covariance structure<br />

<strong>MIXED</strong> procedure, 3953<br />

generalized inverse, 3971<br />

<strong>MIXED</strong> procedure, 3913<br />

GLM procedure<br />

compared to other procedures, 3889<br />

gradient<br />

<strong>MIXED</strong> procedure, 3900, 3990<br />

grid search<br />

example (<strong>MIXED</strong>), 4026<br />

growth curve analysis<br />

example (<strong>MIXED</strong>), 3964<br />

Hannan-Quinn information criterion<br />

<strong>MIXED</strong> procedure, 3901<br />

Hessian matrix<br />

<strong>MIXED</strong> procedure, 3899, 3900, 3909, 3938, 3990, 3991, 4005, 4006, 4017, 4026<br />

heterogeneity<br />

example (<strong>MIXED</strong>), 4023<br />

<strong>MIXED</strong> procedure, 3945, 3949<br />

heterogeneous<br />

AR(1) structure (<strong>MIXED</strong>), 3953<br />

compound-symmetry structure (<strong>MIXED</strong>), 3953<br />

covariance structures (<strong>MIXED</strong>), 3962<br />

Toeplitz structure (<strong>MIXED</strong>), 3953<br />

hierarchical model<br />

example (<strong>MIXED</strong>), 4041<br />

Hotelling-Lawley-McKeon statistic<br />

<strong>MIXED</strong> procedure, 3949<br />

Hotelling-Lawley-Pillai-Samson statistic<br />

<strong>MIXED</strong> procedure, 3949<br />

Hsu’s adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

Huynh-Feldt<br />

structure (<strong>MIXED</strong>), 3953<br />

hypothesis tests<br />

mixed model (<strong>MIXED</strong>), 3972, 3992<br />

inference<br />

mixed model (<strong>MIXED</strong>), 3972<br />

space, mixed model (<strong>MIXED</strong>), 3911, 3912, 3914, 4012<br />

infinite likelihood<br />

<strong>MIXED</strong> procedure, 3948, 4005, 4006<br />

influence diagnostics<br />

<strong>MIXED</strong> procedure, 3982<br />

influence diagnostics, details<br />

<strong>MIXED</strong> procedure, 3980<br />

influence plots<br />

<strong>MIXED</strong> procedure, 4000<br />

information criteria<br />

<strong>MIXED</strong> procedure, 3901<br />

initial values<br />

<strong>MIXED</strong> procedure, 3937<br />

interaction effects<br />

<strong>MIXED</strong> procedure, 3977<br />

intercept<br />

<strong>MIXED</strong> procedure, 3976<br />

internal studentization<br />

<strong>MIXED</strong> procedure, 3981<br />

intraclass correlation coefficient<br />

<strong>MIXED</strong> procedure, 4021<br />

iteration history<br />

<strong>MIXED</strong> procedure, 3990<br />

iterations<br />

history (<strong>MIXED</strong>), 3990<br />

Kenward-Roger method<br />

<strong>MIXED</strong> procedure, 3927<br />

Kronecker product structure<br />

<strong>MIXED</strong> procedure, 3953<br />

L matrices<br />

mixed model (<strong>MIXED</strong>), 3911, 3916, 3972<br />

<strong>MIXED</strong> procedure, 3911, 3916, 3972<br />

LATTICE procedure<br />

compared to <strong>MIXED</strong> procedure, 3889<br />

least squares means<br />

Bonferroni adjustment (<strong>MIXED</strong>), 3918<br />

BYLEVEL processing (<strong>MIXED</strong>), 3920<br />

comparison types (<strong>MIXED</strong>), 3920<br />

covariate values (<strong>MIXED</strong>), 3919<br />

Dunnett’s adjustment (<strong>MIXED</strong>), 3918<br />

examples (<strong>MIXED</strong>), 4026, 4050<br />

Hsu’s adjustment (<strong>MIXED</strong>), 3918<br />

mixed model (<strong>MIXED</strong>), 3916<br />

multiple comparison adjustment (<strong>MIXED</strong>), 3917, 3918<br />

nonstandard weights (<strong>MIXED</strong>), 3921<br />

observed margins (<strong>MIXED</strong>), 3921<br />

Sidak’s adjustment (<strong>MIXED</strong>), 3918<br />

simple effects (<strong>MIXED</strong>), 3922<br />

simulation-based adjustment (<strong>MIXED</strong>), 3918<br />

Tukey’s adjustment (<strong>MIXED</strong>), 3918<br />

leverage<br />

<strong>MIXED</strong> <strong>Procedure</strong>, 3984<br />

likelihood distance<br />

<strong>MIXED</strong> procedure, 3987<br />

likelihood ratio test, 4011<br />

example (<strong>MIXED</strong>), 4025<br />

mixed model (<strong>MIXED</strong>), 3972, 3973<br />

<strong>MIXED</strong> procedure, 3992<br />

linear covariance structure<br />

<strong>MIXED</strong> procedure, 3953<br />

log-linear variance model<br />

<strong>MIXED</strong> procedure, 3950<br />

main effects<br />

<strong>MIXED</strong> procedure, 3976<br />

marginal residuals<br />

<strong>MIXED</strong> procedure, 3981<br />

Matérn covariance structure<br />

<strong>MIXED</strong> procedure, 3953<br />

matrix<br />

notation, theory (<strong>MIXED</strong>), 3962<br />

maximum likelihood estimation<br />

mixed model (<strong>MIXED</strong>), 3969<br />

MDFFITS<br />

<strong>MIXED</strong> procedure, 3985<br />

MDFFITS for covariance parameters<br />

<strong>MIXED</strong> procedure, 3986<br />

memory requirements<br />

<strong>MIXED</strong> procedure, 4007<br />

missing level combinations<br />

<strong>MIXED</strong> procedure, 3980<br />

mixed model (<strong>MIXED</strong>), see also <strong>MIXED</strong> procedure<br />

estimation, 3968<br />

formulation, 3963<br />

hypothesis tests, 3972, 3992<br />

inference, 3972<br />

inference space, 3911, 3912, 3914, 4012<br />

least squares means, 3916<br />

likelihood ratio test, 3972, 3973<br />

linear model, 3886<br />

maximum likelihood estimation, 3969<br />

notation, 3888<br />

objective function, 3990<br />

parameterization, 3975<br />

predicted values, 3916<br />

restricted maximum likelihood, 4010<br />

theory, 3962<br />

Wald test, 3972, 4017<br />

mixed model equations<br />

example (<strong>MIXED</strong>), 4026<br />

<strong>MIXED</strong> procedure, 3903, 3970<br />

<strong>MIXED</strong> <strong>Procedure</strong><br />

leverage, 3984<br />

PRESS Residual, 3984<br />

PRESS Statistic, 3984<br />

<strong>MIXED</strong> procedure, see also mixed model<br />

2D geometric anisotropic structure, 3953<br />

Akaike’s information criterion, 3901, 3970, 3991<br />

Akaike’s information criterion (finite sample corrected version), 3901, 3991<br />

alpha level, 3899, 3915, 3919, 3923, 3944<br />

anisotropic power covariance structure, 3954<br />

anisotropic spatial power structure, 3954<br />

ANTE(1) structure, 3953<br />

ante-dependence structure, 3953<br />

AR(1) structure, 3953<br />

ARIMA procedure, compared, 3889<br />

ARMA structure, 3953<br />

assumptions, 3886<br />

asymptotic covariance, 3899<br />

AUTOREG procedure, compared, 3889<br />

autoregressive moving-average structure, 3953<br />

autoregressive structure, 3953, 4019<br />

banded Toeplitz structure, 3953<br />

basic features, 3887<br />

Bayesian analysis, 3939<br />

between-within method, 3901, 3925<br />

BLUE, 3971<br />

BLUP, 3971, 4040<br />

Bonferroni adjustment, 3918<br />

boundary constraints, 3938, 3939, 4005<br />

BYLEVEL processing of LSMEANS, 3920<br />

CALIS procedure, compared, 3889<br />

chi-square test, 3913, 3923<br />

Cholesky root, 3935, 3982, 4004<br />

class level, 3903, 3989<br />

classification variables, 3910<br />

compound symmetry structure, 3953, 3964, 4020, 4025<br />

computational details, 4004<br />

computational order, 4004<br />

conditional residuals, 3981<br />

confidence interval, 3915, 3944<br />

confidence limits, 3900, 3915, 3920, 3923, 3944<br />

containment method, 3924, 3925<br />

continuous effects, 3945, 3946, 3949, 3952<br />

continuous-by-class effects, 3978<br />

continuous-nesting-class effects, 3978<br />

contrasted <strong>SAS</strong> procedures, 3889<br />

contrasts, 3911, 3914<br />

convergence criterion, 3899, 3900, 3990, 4006<br />

convergence problems, 4005<br />

convergence status, 3990<br />

Cook’s D, 3985<br />

Cook’s D for covariance parameters, 3985<br />

correlation estimates, 3945, 3947, 3952, 4021<br />

correlations of least squares means, 3920<br />

covariance parameter estimates, 3900, 3901, 3991<br />

covariance parameter estimates, ratio, 3909<br />

covariance parameters, 3886<br />

covariance structure, 3888, 3953, 3954, 4013<br />

covariances of least squares means, 3920<br />

covariate values for LSMEANS, 3919<br />

covariates, 3976<br />

CovRatio, 3986<br />

CovRatio for covariance parameters, 3986<br />

CovTrace, 3986<br />

CovTrace for covariance parameters, 3986<br />

CPU requirements, 4007<br />

crossed effects, 3977<br />

default output, 3989<br />

degrees of freedom, 3912–3916, 3920, 3924, 3936, 3973, 3980, 3992, 4005, 4040<br />

DFFITS, 3985<br />

dimension information, 3989<br />

dimensions, 3902, 3903<br />

direct product structure, 3953<br />

Dunnett’s adjustment, 3918<br />

EBLUPs, 3946, 3971, 4032, 4047<br />

effect name length, 3903<br />

empirical best linear unbiased prediction, 3934<br />

empirical estimator, 3901<br />

estimability, 3911, 3913–3916, 3921, 3936, 3972, 3980<br />

estimable functions, 3933<br />

estimation methods, 3902<br />

exponential covariance structure, 3954<br />

factor analytic structures, 3953<br />

Fisher information matrix, 3991, 4026<br />

Fisher’s scoring method, 3899, 3909, 4006<br />

fitting information, 3991, 3992<br />

fixed effects, 3888<br />

fixed-effects parameters, 3886, 3936, 3963<br />

fixed-effects variance matrix, 3936<br />

function evaluations, 3902<br />

G matrix, 3888, 3943, 3944, 3963, 3964, 4046<br />

Gaussian covariance structure, 3954<br />

general linear covariance structure, 3953<br />

generalized inverse, 3913, 3971<br />

GLIMMIX procedure, compared, 3890<br />

gradient, 3900, 3990<br />

grid search, 3937, 4026<br />

growth curve analysis, 3964<br />

Hannan-Quinn information criterion, 3901<br />

Hessian matrix, 3899, 3900, 3909, 3938, 3990, 3991, 4005, 4006, 4017, 4026<br />

heterogeneity, 3945, 3949, 4023<br />

heterogeneous AR(1) structure, 3953<br />

heterogeneous compound-symmetry structure, 3953<br />

heterogeneous covariance structures, 3962<br />

heterogeneous Toeplitz structure, 3953<br />

hierarchical model, 4041<br />

Hotelling-Lawley-McKeon statistic, 3949<br />

Hotelling-Lawley-Pillai-Samson statistic, 3949<br />

Hsu’s adjustment, 3918<br />

Huynh-Feldt structure, 3953<br />

infinite likelihood, 3948, 4005, 4006<br />

influence diagnostics, 3932, 3982<br />

influence plots, 4000<br />

information criteria, 3901<br />

initial values, 3937<br />

input data sets, 3901<br />

interaction effects, 3977<br />

intercept, 3976<br />

intercept effect, 3934, 3943<br />

intraclass correlation coefficient, 4021<br />

introductory example, 3890<br />

iteration history, 3990<br />

iterations, 3902, 3990<br />

Kenward-Roger method, 3927<br />

Kronecker product structure, 3953<br />

LATTICE procedure, compared, 3889<br />

least squares means, 3920, 4026, 4050<br />

leave-one-out estimates, 4000<br />

likelihood distance, 3987<br />

likelihood ratio test, 3992<br />

linear covariance structure, 3953<br />

log-linear variance model, 3950<br />

main effects, 3976<br />

marginal residuals, 3981<br />

Matérn covariance structure, 3953<br />

matrix notation, 3962<br />

MDFFITS, 3985<br />

MDFFITS for covariance parameters, 3986<br />

memory requirements, 4007<br />

missing level combinations, 3980<br />

mixed linear model, 3886<br />

mixed model, 3963<br />

mixed model equations, 3903, 3970, 4026<br />

mixed model theory, 3962<br />

model information, 3903, 3989<br />

model selection, 3970<br />

multilevel model, 4041<br />

multiple comparisons of least squares means, 3917, 3918, 3920<br />

multiple tables, 3995<br />

multiplicity adjustment, 3917<br />

multivariate tests, 3949<br />

nested effects, 3977<br />

nested error structure, 4045<br />

NESTED procedure, compared, 3889<br />

Newton-Raphson algorithm, 3969<br />

non-full-rank parameterization, 3889, 3950, 3980<br />

nonstandard weights for LSMEANS, 3921<br />

nugget effect, 3950<br />

number of observations, 3989<br />

oblique projector, 3984<br />

observed margins for LSMEANS, 3921<br />

ODS graph names, 4002<br />

ODS Graphics, 3905, 3998<br />

ODS table names, 3993<br />

ordering of effects, 3904, 3979<br />

over-parameterization, 3976<br />

parameter constraints, 3938, 4005<br />

parameterization, 3975<br />

Pearson residual, 3935<br />

pharmaceutical stability, example, 4041<br />

plotting the likelihood, 4032<br />

polynomial effects, 3976<br />

power-of-the-mean model, 3950<br />

predicted means, 3935<br />

predicted value confidence intervals, 3923<br />

predicted values, 3934, 4026<br />

prior density, 3940<br />

profiling residual variance, 3904, 3938, 3950, 3969, 4004<br />

R matrix, 3888, 3948, 3951, 3963, 3964<br />

random coefficients, 4019, 4041<br />

random effects, 3888, 3943<br />

random-effects parameters, 3887, 3946, 3963<br />

regression effects, 3976<br />

rejection sampling, 3941<br />

repeated measures, 3887, 3948, 4013<br />

residual diagnostics, details, 3980<br />

residual method, 3925<br />

residual plots, 3998<br />

residual variance tolerance, 3935<br />

restricted maximum likelihood (REML), 3887<br />

ridging, 3909, 3969<br />

sandwich estimator, 3901<br />

Satterthwaite method, 3925<br />

scaled residual, 3936, 3982<br />

Schwarz’s Bayesian information criterion, 3901, 3970, 3991<br />

scoring, 3899, 3909, 4006<br />

Sidak’s adjustment, 3918<br />

simple effects, 3922<br />

simulation-based adjustment, 3918<br />

singularities, 4006<br />

spatial anisotropic exponential structure, 3953<br />

spatial covariance structure, 3954, 3962, 4005<br />

split-plot design, 3966, 4008<br />

standard linear model, 3888<br />

statement positions, 3896<br />

studentized residual, 3935, 3985<br />

subject effect, 3912, 3946, 3952, 4007, 4013<br />

summary of commands, 3897<br />

sweep operator, 3985, 4004<br />

table names, 3993<br />

test components, 3933<br />

Toeplitz structure, 3953, 4050<br />

TSCSREG procedure, compared, 3889<br />

Tukey’s adjustment, 3918<br />

Type 1 estimation, 3902<br />

Type 1 testing, 3928<br />

Type 2 estimation, 3902<br />

Type 2 testing, 3928<br />

Type 3 estimation, 3902<br />

Type 3 testing, 3928, 3992<br />

unstructured correlations, 3953<br />

unstructured covariance matrix, 3953<br />

unstructured R matrix, 3952<br />

V matrix, 3947<br />

VARCOMP procedure, example, 4026<br />

variance components, 3887, 3953<br />

variance ratios, 3938, 3946<br />

Wald test, 3991, 3992<br />

weighted LSMEANS, 3921<br />

weighting, 3962<br />

zero design columns, 3927<br />

zero variance component estimates, 4005<br />

model<br />

information (<strong>MIXED</strong>), 3903<br />

model information<br />

<strong>MIXED</strong> procedure, 3989<br />

model selection<br />

<strong>MIXED</strong> procedure, 3970<br />

multilevel model<br />

example (<strong>MIXED</strong>), 4041<br />

multiple comparison adjustment (<strong>MIXED</strong>)<br />

least squares means, 3917, 3918<br />

multiple comparisons of least squares means<br />

<strong>MIXED</strong> procedure, 3917, 3918, 3920<br />

multiple tables<br />

<strong>MIXED</strong> procedure, 3995<br />

multiplicity adjustment<br />

<strong>MIXED</strong> procedure, 3917<br />

row-wise (<strong>MIXED</strong>), 3917<br />

multivariate tests<br />

<strong>MIXED</strong> procedure, 3949<br />

nested effects<br />

<strong>MIXED</strong> procedure, 3977<br />

nested error structure<br />

<strong>MIXED</strong> procedure, 4045<br />

NESTED procedure<br />

compared to other procedures, 3889<br />

Newton-Raphson algorithm<br />

<strong>MIXED</strong> procedure, 3969<br />

non-full-rank parameterization<br />

<strong>MIXED</strong> procedure, 3889, 3950, 3980<br />

nugget effect<br />

<strong>MIXED</strong> procedure, 3950<br />

number of observations<br />

<strong>MIXED</strong> procedure, 3989<br />

objective function<br />

mixed model (<strong>MIXED</strong>), 3990<br />

oblique projector<br />

<strong>MIXED</strong> procedure, 3984<br />

ODS graph names<br />

<strong>MIXED</strong> procedure, 4002<br />

ODS Graphics<br />

<strong>MIXED</strong> procedure, 3905, 3998<br />

options summary<br />

LSMEANS statement (<strong>MIXED</strong>), 3916<br />

MODEL statement (<strong>MIXED</strong>), 3922<br />

PROC <strong>MIXED</strong> statement, 3898<br />

RANDOM statement (<strong>MIXED</strong>), 3943<br />

REPEATED statement (<strong>MIXED</strong>), 3948<br />

over-parameterization<br />

<strong>MIXED</strong> procedure, 3976<br />

parameter constraints<br />

<strong>MIXED</strong> procedure, 3938, 4005<br />

parameterization<br />

mixed model (<strong>MIXED</strong>), 3975<br />

<strong>MIXED</strong> procedure, 3975<br />

Pearson residual<br />

<strong>MIXED</strong> procedure, 3935<br />

pharmaceutical stability<br />

example (<strong>MIXED</strong>), 4041<br />

plots<br />

likelihood (<strong>MIXED</strong>), 4032<br />

polynomial effects<br />

<strong>MIXED</strong> procedure, 3976<br />

power-of-the-mean model<br />

<strong>MIXED</strong> procedure, 3950<br />

predicted means<br />

<strong>MIXED</strong> procedure, 3935<br />

predicted value confidence intervals<br />

<strong>MIXED</strong> procedure, 3923<br />

predicted values<br />

example (<strong>MIXED</strong>), 4026<br />

mixed model (<strong>MIXED</strong>), 3916<br />

<strong>MIXED</strong> procedure, 3934<br />

PRESS residual<br />

<strong>MIXED</strong> <strong>Procedure</strong>, 3984<br />

PRESS statistic<br />

<strong>MIXED</strong> <strong>Procedure</strong>, 3984<br />

prior density<br />

<strong>MIXED</strong> procedure, 3940<br />

profiling residual variance<br />

<strong>MIXED</strong> procedure, 4004<br />

R matrix<br />

<strong>MIXED</strong> procedure, 3888, 3948, 3951, 3963, 3964<br />

random coefficients<br />

example (<strong>MIXED</strong>), 4019, 4041<br />

random effects<br />

<strong>MIXED</strong> procedure, 3888, 3943<br />

random-effects parameters<br />

<strong>MIXED</strong> procedure, 3887, 3963<br />

regression effects<br />

<strong>MIXED</strong> procedure, 3976<br />

rejection sampling<br />

<strong>MIXED</strong> procedure, 3941<br />

REML, see restricted maximum likelihood<br />

repeated measures<br />

<strong>MIXED</strong> procedure, 3887, 3948, 4013<br />

residual maximum likelihood, see also restricted maximum likelihood<br />

residual maximum likelihood (REML)<br />

<strong>MIXED</strong> procedure, 3969, 4010<br />

residual plots<br />

<strong>MIXED</strong> procedure, 3998<br />

residuals, details<br />

<strong>MIXED</strong> procedure, 3980<br />

restricted maximum likelihood<br />

<strong>MIXED</strong> procedure, 3887, 3969, 4010<br />

ridging<br />

<strong>MIXED</strong> procedure, 3909, 3969<br />

sandwich estimator<br />

<strong>MIXED</strong> procedure, 3901<br />

Satterthwaite method<br />

<strong>MIXED</strong> procedure, 3925<br />

scaled residual<br />

<strong>MIXED</strong> procedure, 3936, 3982<br />

Schwarz’s Bayesian information criterion<br />

example (<strong>MIXED</strong>), 4011, 4025, 4054<br />

<strong>MIXED</strong> procedure, 3901, 3970, 3991<br />

scoring<br />

<strong>MIXED</strong> procedure, 3899, 3909, 4006<br />

Sidak’s adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

simple effects<br />

<strong>MIXED</strong> procedure, 3922<br />

simulation-based adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

singularities<br />

<strong>MIXED</strong> procedure, 4006<br />

spatial anisotropic exponential structure<br />

<strong>MIXED</strong> procedure, 3953<br />

spatial covariance structure<br />

examples (<strong>MIXED</strong>), 3954<br />

<strong>MIXED</strong> procedure, 3954, 3962, 4005<br />

split-plot design<br />

<strong>MIXED</strong> procedure, 3966, 4008<br />

standard linear model<br />

<strong>MIXED</strong> procedure, 3888<br />

studentized residual<br />

external, 3985<br />

internal, 3985<br />

<strong>MIXED</strong> procedure, 3935, 3985<br />

subject effect<br />

<strong>MIXED</strong> procedure, 3912, 3946, 3952, 4007, 4013<br />

summary of commands<br />

<strong>MIXED</strong> procedure, 3897<br />

table names<br />

<strong>MIXED</strong> procedure, 3993<br />

test components<br />

<strong>MIXED</strong> procedure, 3933<br />

Toeplitz structure<br />

example (<strong>MIXED</strong>), 4050<br />

<strong>MIXED</strong> procedure, 3953<br />

Tukey’s adjustment<br />

<strong>MIXED</strong> procedure, 3918<br />

Type 1 estimation<br />

<strong>MIXED</strong> procedure, 3902<br />

Type 1 testing<br />

<strong>MIXED</strong> procedure, 3928<br />

Type 2 estimation<br />

<strong>MIXED</strong> procedure, 3902<br />

Type 2 testing<br />

<strong>MIXED</strong> procedure, 3928<br />

Type 3 estimation<br />

<strong>MIXED</strong> procedure, 3902<br />

Type 3 testing<br />

<strong>MIXED</strong> procedure, 3928, 3992<br />

unstructured correlations<br />

<strong>MIXED</strong> procedure, 3953<br />

unstructured covariance matrix<br />

<strong>MIXED</strong> procedure, 3953<br />

V matrix<br />

<strong>MIXED</strong> procedure, 3947<br />

VARCOMP procedure<br />

compared to <strong>MIXED</strong> procedure, 3889<br />

example (<strong>MIXED</strong>), 4026<br />

variance components<br />

<strong>MIXED</strong> procedure, 3887, 3953<br />

variance ratios<br />

<strong>MIXED</strong> procedure, 3938, 3946<br />

Wald test<br />

mixed model (<strong>MIXED</strong>), 3972, 4017<br />

<strong>MIXED</strong> procedure, 3991, 3992<br />

weighting<br />

<strong>MIXED</strong> procedure, 3962<br />

zero variance component estimates<br />

<strong>MIXED</strong> procedure, 4005<br />

Syntax Index<br />

ABSOLUTE option<br />

PROC <strong>MIXED</strong> statement, 3899, 3990<br />

ADJDFE= option<br />

LSMEANS statement (<strong>MIXED</strong>), 3917<br />

ADJUST= option<br />

LSMEANS statement (<strong>MIXED</strong>), 3918<br />

ALG= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

ALPHA= option<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement (<strong>MIXED</strong>), 3919<br />

PROC <strong>MIXED</strong> statement, 3899<br />

RANDOM statement (<strong>MIXED</strong>), 3944<br />

ALPHAP= option<br />

MODEL statement (<strong>MIXED</strong>), 3923<br />

ANOVAF option<br />

PROC <strong>MIXED</strong> statement, 3899<br />

ASYCORR option<br />

PROC <strong>MIXED</strong> statement, 3899<br />

ASYCOV option<br />

PROC <strong>MIXED</strong> statement, 3899, 4026<br />

AT MEANS option<br />

LSMEANS statement (<strong>MIXED</strong>), 3919<br />

AT option<br />

LSMEANS statement (<strong>MIXED</strong>), 3919, 3920<br />

BDATA= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

BY statement<br />

<strong>MIXED</strong> procedure, 3910<br />

BYLEVEL option<br />

LSMEANS statement (<strong>MIXED</strong>), 3920, 3921<br />

CHISQ option<br />

CONTRAST statement (<strong>MIXED</strong>), 3913<br />

MODEL statement (<strong>MIXED</strong>), 3923<br />

CL option<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement (<strong>MIXED</strong>), 3920<br />

MODEL statement (<strong>MIXED</strong>), 3923<br />

RANDOM statement (<strong>MIXED</strong>), 3944<br />

CL= option<br />

PROC <strong>MIXED</strong> statement, 3900<br />

CLASS statement<br />

<strong>MIXED</strong> procedure, 3910, 3989<br />

CONTAIN option<br />

MODEL statement (<strong>MIXED</strong>), 3924, 3925<br />

CONTRAST statement<br />

<strong>MIXED</strong> procedure, 3911<br />

CONVF option<br />

PROC <strong>MIXED</strong> statement, 3900, 3990<br />

CONVG option<br />

PROC <strong>MIXED</strong> statement, 3900, 3990<br />

CONVH option<br />

PROC <strong>MIXED</strong> statement, 3900, 3990<br />

CORR option<br />

LSMEANS statement (<strong>MIXED</strong>), 3920<br />

CORRB option<br />

MODEL statement (<strong>MIXED</strong>), 3924<br />

COV option<br />

LSMEANS statement (<strong>MIXED</strong>), 3920<br />

COVB option<br />

MODEL statement (<strong>MIXED</strong>), 3924<br />

COVBI option<br />

MODEL statement (<strong>MIXED</strong>), 3924<br />

COVTEST option<br />

PROC <strong>MIXED</strong> statement, 3901, 3991<br />

DATA= option<br />

PRIOR statement (<strong>MIXED</strong>), 3940<br />

PROC <strong>MIXED</strong> statement, 3901<br />

DDF= option<br />

MODEL statement (<strong>MIXED</strong>), 3924<br />

DDFM= option<br />

MODEL statement (<strong>MIXED</strong>), 3924<br />

DF= option<br />

CONTRAST statement (<strong>MIXED</strong>), 3913<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement (<strong>MIXED</strong>), 3920<br />

DFBW option<br />

PROC <strong>MIXED</strong> statement, 3901<br />

DIFF option<br />

LSMEANS statement (<strong>MIXED</strong>), 3920<br />

DIVISOR= option<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

E option<br />

CONTRAST statement (<strong>MIXED</strong>), 3913<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement (<strong>MIXED</strong>), 3921<br />

MODEL statement (<strong>MIXED</strong>), 3927<br />

E1 option<br />

MODEL statement (<strong>MIXED</strong>), 3927<br />

E2 option<br />

MODEL statement (<strong>MIXED</strong>), 3927<br />

E3 option<br />

MODEL statement (<strong>MIXED</strong>), 3927<br />

EFFECT= modifier<br />

INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3929<br />

EMPIRICAL option<br />

PROC <strong>MIXED</strong> statement, 3901<br />

EQCONS= option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

ESTIMATE statement<br />

<strong>MIXED</strong> procedure, 3914<br />

ESTIMATES modifier<br />

INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3929<br />

FLAT option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

FULLX option<br />

MODEL statement (<strong>MIXED</strong>), 3919, 3927<br />

G option<br />

RANDOM statement (<strong>MIXED</strong>), 3944<br />

GC option<br />

RANDOM statement (<strong>MIXED</strong>), 3944<br />

GCI option<br />

RANDOM statement (<strong>MIXED</strong>), 3944<br />

GCORR option<br />

RANDOM statement (<strong>MIXED</strong>), 3945<br />

GDATA= option<br />

RANDOM statement (<strong>MIXED</strong>), 3945<br />

GI option<br />

RANDOM statement (<strong>MIXED</strong>), 3945<br />

GRID= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

GRIDT= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

GROUP option<br />

CONTRAST statement (<strong>MIXED</strong>), 3913<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

GROUP= option<br />

RANDOM statement (<strong>MIXED</strong>), 3945<br />

REPEATED statement (<strong>MIXED</strong>), 3949<br />

HLM option<br />

REPEATED statement (<strong>MIXED</strong>), 3949<br />

HLPS option<br />

REPEATED statement (<strong>MIXED</strong>), 3949<br />

HOLD= option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

HTYPE= option<br />

MODEL statement (<strong>MIXED</strong>), 3928<br />

IC option<br />

PROC <strong>MIXED</strong> statement, 3901<br />

ID statement<br />

<strong>MIXED</strong> procedure, 3916<br />

IFACTOR= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

INFLUENCE option<br />

MODEL statement (<strong>MIXED</strong>), 3928<br />

INFO option<br />

PROC <strong>MIXED</strong> statement, 3902<br />

INTERCEPT option<br />

MODEL statement (<strong>MIXED</strong>), 3933<br />

ITDETAILS option<br />

PROC <strong>MIXED</strong> statement, 3902<br />

ITER= modifier<br />

INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3929<br />

JEFFREYS option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

KEEP= modifier<br />

INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3930<br />

LCOMPONENTS option<br />

MODEL statement (<strong>MIXED</strong>), 3933<br />

LDATA= option<br />

RANDOM statement (<strong>MIXED</strong>), 3945<br />

REPEATED statement (<strong>MIXED</strong>), 3950<br />

LOCAL= option<br />

REPEATED statement (<strong>MIXED</strong>), 3950<br />

LOCALW option<br />

REPEATED statement (<strong>MIXED</strong>), 3951<br />

LOGDETH option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

LOGNOTE option<br />

PROC <strong>MIXED</strong> statement, 3902<br />

LOGNOTE= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

LOGRBOUND= option<br />

PRIOR statement (<strong>MIXED</strong>), 3941<br />

LOWERB= option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

LOWERTAILED option<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement<br />

<strong>MIXED</strong> procedure, 3916<br />

MAXFUNC= option<br />

PROC <strong>MIXED</strong> statement, 3902<br />

MAXITER= option<br />

PROC <strong>MIXED</strong> statement, 3902<br />

METHOD= option<br />

PROC <strong>MIXED</strong> statement, 3902, 4014<br />

<strong>MIXED</strong> procedure, 3896<br />

INFLUENCE option, 3928<br />

syntax, 3896<br />

<strong>MIXED</strong> procedure, BY statement, 3910<br />

<strong>MIXED</strong> procedure, CLASS statement, 3910, 3989<br />

TRUNCATE option, 3911<br />

<strong>MIXED</strong> procedure, CONTRAST statement, 3911<br />

CHISQ option, 3913<br />

DF= option, 3913<br />

E option, 3913<br />

GROUP option, 3913<br />

SINGULAR= option, 3914<br />

SUBJECT option, 3914<br />

<strong>MIXED</strong> procedure, ESTIMATE statement, 3914<br />

ALPHA= option, 3915<br />

CL option, 3915<br />

DF= option, 3915<br />

DIVISOR= option, 3915<br />

E option, 3915<br />

GROUP option, 3915<br />

LOWERTAILED option, 3915<br />

SINGULAR= option, 3915<br />

SUBJECT option, 3915<br />

UPPERTAILED option, 3916<br />

<strong>MIXED</strong> procedure, ID statement, 3916<br />

<strong>MIXED</strong> procedure, LSMEANS statement, 3916, 4026<br />

ADJUST= option, 3918<br />

ALPHA= option, 3919<br />

AT MEANS option, 3919<br />

AT option, 3919, 3920<br />

BYLEVEL option, 3920, 3921<br />

CL option, 3920<br />

CORR option, 3920<br />

COV option, 3920<br />

DF= option, 3920<br />

DIFF option, 3920<br />

E option, 3921<br />

OBSMARGINS option, 3921<br />

PDIFF option, 3920, 3921<br />

SINGULAR= option, 3922<br />

SLICE= option, 3922<br />

<strong>MIXED</strong> procedure, MODEL statement, 3922<br />

ALPHAP= option, 3923<br />

CHISQ option, 3923<br />

CL option, 3923<br />

CONTAIN option, 3924, 3925<br />

CORRB option, 3924<br />

COVB option, 3924<br />

COVBI option, 3924<br />

DDF= option, 3924<br />

DDFM= option, 3924<br />

E option, 3927<br />

E1 option, 3927<br />

E2 option, 3927<br />

E3 option, 3927<br />

FULLX option, 3919, 3927<br />

HTYPE= option, 3928<br />

INFLUENCE option, 3928<br />

INTERCEPT option, 3933<br />

LCOMPONENTS option, 3933<br />

NOCONTAIN option, 3934<br />

NOINT option, 3934, 3976<br />

NOTEST option, 3934<br />

ORDER= option, 3980<br />

OUTP= option, 4026<br />

OUTPRED= option, 3934<br />

OUTPREDM= option, 3935<br />

RESIDUAL option, 3935, 3982<br />

SINGCHOL= option, 3935<br />

SINGRES= option, 3935<br />

SINGULAR= option, 3936<br />

SOLUTION option, 3936, 3980<br />

VCIRY option, 3936, 3982<br />

XPVIX option, 3936<br />

XPVIXI option, 3936<br />

ZETA= option, 3936<br />

<strong>MIXED</strong> procedure, MODEL statement, INFLUENCE option<br />

EFFECT=, 3929<br />

ESTIMATES, 3929<br />

ITER=, 3929<br />

KEEP=, 3930<br />

SELECT=, 3930<br />

SIZE=, 3931<br />

<strong>MIXED</strong> procedure, PARMS statement, 3937, 4026<br />

EQCONS= option, 3938<br />

HOLD= option, 3938<br />

LOGDETH option, 3938<br />

LOWERB= option, 3938<br />

NOBOUND option, 3938<br />

NOITER option, 3938<br />

NOPROFILE option, 3938<br />

OLS option, 3939<br />

PARMSDATA= option, 3939<br />

PDATA= option, 3939<br />

RATIOS option, 3939<br />

UPPERB= option, 3939<br />

<strong>MIXED</strong> procedure, PRIOR statement, 3939<br />

ALG= option, 3941<br />

BDATA= option, 3941<br />

DATA= option, 3940<br />

FLAT option, 3941<br />

GRID= option, 3941<br />

GRIDT= option, 3941<br />

IFACTOR= option, 3941<br />

JEFFREYS option, 3941<br />

LOGNOTE= option, 3941<br />

LOGRBOUND= option, 3941<br />

NSAMPLE= option, 3942<br />

NSEARCH= option, 3942<br />

OUT= option, 3942


OUTG= option, 3942<br />

OUTGT= option, 3942<br />

PSEARCH option, 3942<br />

PTRANS option, 3942<br />

SEED= option, 3942<br />

SFACTOR= option, 3942<br />

TDATA= option, 3943<br />

TRANS= option, 3943<br />

UPDATE= option, 3943<br />

<strong>MIXED</strong> procedure, PROC <strong>MIXED</strong> statement, 3898<br />

ABSOLUTE option, 3899, 3990<br />

ALPHA= option, 3899<br />

ANOVAF option, 3899<br />

ASYCORR option, 3899<br />

ASYCOV option, 3899, 4026<br />

CL= option, 3900<br />

CONVF option, 3900, 3990<br />

CONVG option, 3900, 3990<br />

CONVH option, 3900, 3990<br />

COVTEST option, 3901, 3991<br />

DATA= option, 3901<br />

DFBW option, 3901<br />

IC option, 3901<br />

INFO option, 3902<br />

ITDETAILS option, 3902<br />

LOGNOTE option, 3902<br />

MAXFUNC= option, 3902<br />

MAXITER= option, 3902<br />

METHOD= option, 3902, 4014<br />

MMEQ option, 3903, 4026<br />

MMEQSOL option, 3903, 4026<br />

NAMELEN= option, 3903<br />

NOBOUND option, 3903<br />

NOCLPRINT option, 3903<br />

NOINFO option, 3903<br />

NOITPRINT option, 3903<br />

NOPROFILE option, 3904, 3969<br />

ORD option, 3904<br />

ORDER= option, 3904, 3976<br />

PLOT option, 3905<br />

PLOTS option, 3905<br />

RATIO option, 3909, 3991<br />

RIDGE= option, 3909<br />

SCORING= option, 3909<br />

SIGITER option, 3909<br />

UPDATE option, 3909<br />

<strong>MIXED</strong> procedure, RANDOM statement, 3889, 3943, 4008<br />

ALPHA= option, 3944<br />

CL option, 3944<br />

G option, 3944<br />

GC option, 3944<br />

GCI option, 3944<br />

GCORR option, 3945<br />

GDATA= option, 3945<br />

GI option, 3945<br />

GROUP= option, 3945<br />

LDATA= option, 3945<br />

NOFULLZ option, 3946<br />

RATIOS option, 3946<br />

SOLUTION option, 3946<br />

SUBJECT= option, 3912, 3946<br />

TYPE= option, 3946<br />

V option, 3947<br />

VC option, 3947<br />

VCI option, 3947<br />

VCORR option, 3947<br />

VI option, 3947<br />

<strong>MIXED</strong> procedure, REPEATED statement, 3889, 3948, 4013<br />

GROUP= option, 3949<br />

HLM option, 3949<br />

HLPS option, 3949<br />

LDATA= option, 3950<br />

LOCAL= option, 3950<br />

LOCALW option, 3951<br />

NONLOCALW option, 3951<br />

R option, 3951<br />

RC option, 3952<br />

RCI option, 3952<br />

RCORR option, 3952<br />

RI option, 3952<br />

SSCP option, 3952<br />

SUBJECT= option, 3952<br />

TYPE= option, 3953<br />

<strong>MIXED</strong> procedure, WEIGHT statement, 3962<br />

MMEQ option<br />

PROC <strong>MIXED</strong> statement, 3903, 4026<br />

MMEQSOL option<br />

PROC <strong>MIXED</strong> statement, 3903, 4026<br />

MODEL statement<br />

<strong>MIXED</strong> procedure, 3922<br />

Modifiers of INFLUENCE option<br />

MODEL statement (<strong>MIXED</strong>), 3928<br />

NAMELEN= option<br />

PROC <strong>MIXED</strong> statement, 3903<br />

NOBOUND option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

PROC <strong>MIXED</strong> statement, 3903<br />

NOCLPRINT option<br />

PROC <strong>MIXED</strong> statement, 3903<br />

NOCONTAIN option<br />

MODEL statement (<strong>MIXED</strong>), 3934<br />

NOFULLZ option<br />

RANDOM statement (<strong>MIXED</strong>), 3946<br />

NOINFO option


PROC <strong>MIXED</strong> statement, 3903<br />

NOINT option<br />

MODEL statement (<strong>MIXED</strong>), 3934, 3976<br />

NOITER option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

NOITPRINT option<br />

PROC <strong>MIXED</strong> statement, 3903<br />

NONLOCALW option<br />

REPEATED statement (<strong>MIXED</strong>), 3951<br />

NOPROFILE option<br />

PARMS statement (<strong>MIXED</strong>), 3938<br />

PROC <strong>MIXED</strong> statement, 3904, 3969<br />

NOTEST option<br />

MODEL statement (<strong>MIXED</strong>), 3934<br />

NSAMPLE= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

NSEARCH= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

OBSMARGINS option<br />

LSMEANS statement (<strong>MIXED</strong>), 3921<br />

OLS option<br />

PARMS statement (<strong>MIXED</strong>), 3939<br />

ORD option<br />

PROC <strong>MIXED</strong> statement, 3904<br />

ORDER= option<br />

MODEL statement (<strong>MIXED</strong>), 3980<br />

PROC <strong>MIXED</strong> statement, 3904, 3976<br />

OUT= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

OUTG= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

OUTGT= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

OUTP= option<br />

MODEL statement (<strong>MIXED</strong>), 4026<br />

OUTPRED= option<br />

MODEL statement (<strong>MIXED</strong>), 3934<br />

OUTPREDM= option<br />

MODEL statement (<strong>MIXED</strong>), 3935<br />

PARMS statement<br />

<strong>MIXED</strong> procedure, 3937, 4026<br />

PARMSDATA= option<br />

PARMS statement (<strong>MIXED</strong>), 3939<br />

PDATA= option<br />

PARMS statement (<strong>MIXED</strong>), 3939<br />

PDIFF option<br />

LSMEANS statement (<strong>MIXED</strong>), 3920, 3921<br />

PLOT option<br />

PROC <strong>MIXED</strong> statement, 3905<br />

PLOTS option<br />

PROC <strong>MIXED</strong> statement, 3905<br />

PRIOR statement<br />

<strong>MIXED</strong> procedure, 3939<br />

PROC <strong>MIXED</strong> statement, see <strong>MIXED</strong> procedure<br />

PSEARCH option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

PTRANS option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

R option<br />

REPEATED statement (<strong>MIXED</strong>), 3951<br />

RANDOM statement<br />

<strong>MIXED</strong> procedure, 3943<br />

RATIO option<br />

PROC <strong>MIXED</strong> statement, 3909, 3991<br />

RATIOS option<br />

PARMS statement (<strong>MIXED</strong>), 3939<br />

RANDOM statement (<strong>MIXED</strong>), 3946<br />

RC option<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

RCI option<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

RCORR option<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

REPEATED statement<br />

<strong>MIXED</strong> procedure, 3948, 4013<br />

RESIDUAL option<br />

<strong>MIXED</strong> procedure, MODEL statement, 3982<br />

MODEL statement (<strong>MIXED</strong>), 3935<br />

RI option<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

RIDGE= option<br />

PROC <strong>MIXED</strong> statement, 3909<br />

SCORING= option<br />

PROC <strong>MIXED</strong> statement, 3909<br />

SEED= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

SELECT= modifier<br />

INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3930<br />

SFACTOR= option<br />

PRIOR statement (<strong>MIXED</strong>), 3942<br />

SIGITER option<br />

PROC <strong>MIXED</strong> statement, 3909<br />

SINGCHOL= option<br />

MODEL statement (<strong>MIXED</strong>), 3935<br />

SINGRES= option<br />

MODEL statement (<strong>MIXED</strong>), 3935<br />

SINGULAR= option<br />

CONTRAST statement (<strong>MIXED</strong>), 3914<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

LSMEANS statement (<strong>MIXED</strong>), 3922<br />

MODEL statement (<strong>MIXED</strong>), 3936<br />

SIZE= modifier


INFLUENCE option, MODEL statement (<strong>MIXED</strong>), 3931<br />

SLICE= option<br />

LSMEANS statement (<strong>MIXED</strong>), 3922<br />

SOLUTION option<br />

MODEL statement (<strong>MIXED</strong>), 3936, 3980<br />

RANDOM statement (<strong>MIXED</strong>), 3946<br />

SSCP option<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

SUBJECT option<br />

CONTRAST statement (<strong>MIXED</strong>), 3914<br />

ESTIMATE statement (<strong>MIXED</strong>), 3915<br />

SUBJECT= option<br />

RANDOM statement (<strong>MIXED</strong>), 3912, 3946<br />

REPEATED statement (<strong>MIXED</strong>), 3952<br />

TDATA= option<br />

PRIOR statement (<strong>MIXED</strong>), 3943<br />

TRANS= option<br />

PRIOR statement (<strong>MIXED</strong>), 3943<br />

TYPE= option<br />

RANDOM statement (<strong>MIXED</strong>), 3946<br />

REPEATED statement (<strong>MIXED</strong>), 3953<br />

UPDATE option<br />

PROC <strong>MIXED</strong> statement, 3909<br />

UPDATE= option<br />

PRIOR statement (<strong>MIXED</strong>), 3943<br />

UPPERB= option<br />

PARMS statement (<strong>MIXED</strong>), 3939<br />

UPPERTAILED option<br />

ESTIMATE statement (<strong>MIXED</strong>), 3916<br />

V option<br />

RANDOM statement (<strong>MIXED</strong>), 3947<br />

VC option<br />

RANDOM statement (<strong>MIXED</strong>), 3947<br />

VCI option<br />

RANDOM statement (<strong>MIXED</strong>), 3947<br />

VCIRY option<br />

<strong>MIXED</strong> procedure, MODEL statement, 3982<br />

MODEL statement (<strong>MIXED</strong>), 3936<br />

VCORR option<br />

RANDOM statement (<strong>MIXED</strong>), 3947<br />

VI option<br />

RANDOM statement (<strong>MIXED</strong>), 3947<br />

WEIGHT statement<br />

<strong>MIXED</strong> procedure, 3962<br />

XPVIX option<br />

MODEL statement (<strong>MIXED</strong>), 3936<br />

XPVIXI option<br />

MODEL statement (<strong>MIXED</strong>), 3936<br />

ZETA= option<br />

MODEL statement (<strong>MIXED</strong>), 3936


Your Turn<br />

We welcome your feedback.<br />

If you have comments about this book, please send them to yourturn@sas.com. Include the full title and page numbers (if applicable).<br />

If you have comments about the software, please send them to suggest@sas.com.



<strong>SAS</strong> and all other <strong>SAS</strong> Institute Inc. product or service names are registered trademarks or trademarks of <strong>SAS</strong> Institute Inc. in the USA and other countries. ® indicates USA registration.<br />

Other brand and product names are trademarks of their respective companies. © 2009 <strong>SAS</strong> Institute Inc. All rights reserved. 518177_1US.0109
