The use of panel data models has grown rapidly over the past 10 years as analysts increasingly need to analyze data with a richer structure. Some panel datasets are cross-sectional: the dataset contains several observations within each of a number of groups. An example is counties (the replications) within states (the panel identifier). Other panel datasets are longitudinal: we have multiple observations (the replications) on the same experimental unit (the panel identifier) over time. The xtgee command accepts either type of panel data.
With xtgee, Stata fits extensions of generalized linear models (GLMs) in which the user is free to model the structure of the within-panel correlation, so GLM-type models can be fit to panel data. These models correspond to population-averaged (or marginal) models in the panel data literature. What makes xtgee useful is the number of statistical models that it generalizes for use with panel data, a richer set of correlation structures than is available in other commands, and the availability of robust standard errors (which are not always offered by the equivalent command).
In this example, we fit a probit model of whether a worker belongs to a union as a function of the person's age and whether the person lives outside an SMSA (standard metropolitan statistical area). Each person appears multiple times in the dataset (this type of panel dataset is commonly called a longitudinal dataset), and we assume that observations on the same person are more highly correlated than observations on different persons.
. xtgee union age not_smsa, fam(binomial) link(probit) corr(exchangeable) nolog
GEE population-averaged model                   Number of obs      =     26200
Group variable:                     idcode      Number of groups   =      4434
Link:                               probit      Obs per group: min =         1
Family:                           binomial                     avg =       5.9
Correlation:                  exchangeable                     max =        12
                                                Wald chi2(2)       =     52.10
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0066116   .0011484     5.76   0.000     .0043608    .0088623
    not_smsa |  -.1218662   .0269488    -4.52   0.000    -.1746849   -.0690474
       _cons |  -.9730766    .039148   -24.86   0.000    -1.049805   -.8963479
------------------------------------------------------------------------------
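The command above presumes that the panel identifier has already been declared. As a minimal sketch, assuming a current release of Stata and assuming the data are the union example dataset available through webuse (the fact sheet does not name its data source, so this is a guess based on the variable names and sample size), the setup and an equivalent fit might look like this:

```stata
* Assumption: the example uses the union dataset shipped with Stata's examples
webuse union, clear

* Declare the panel identifier (idcode) and the time variable (year)
xtset idcode year

* Same model as above, with option names written out in full
xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) nolog

* Population-averaged probit; this should reproduce the xtgee fit
xtprobit union age not_smsa, pa nolog
```

Releases contemporary with this fact sheet declared the panel variable differently, so the xtset line in particular should be read as present-day syntax rather than what was typed to produce the output shown.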
The allowable options for the xtgee command are
Assuming an independent correlation structure amounts to ignoring the panel structure of the data. Under this assumption, xtgee will produce answers that are already provided by Stata's nonpanel estimation commands. Examples of when xtgee provides answers that are the same as an existing command are given in the following table.
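For instance, a binomial family with a probit link and an independent correlation structure should give the same point estimates as the ordinary probit command. A minimal sketch, assuming the union data are in memory and the panel identifier has been declared:

```stata
* Pooled probit that ignores the panel structure
probit union age not_smsa, nolog

* GEE with an independent working correlation; coefficients should match probit
xtgee union age not_smsa, family(binomial) link(probit) corr(independent) nolog
```

Comparing the two coefficient tables side by side is a quick check that the independence assumption really does discard the panel information.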
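If you choose to model the intracluster correlation as an identity matrix (by specifying the name of an existing identity matrix in the option corr or, equivalently, by specifying corr(independent)), GEE estimation reduces to a generalized linear model. The coefficient estimates are identical to those from glm; the standard errors differ only in the last reported digits because glm divides the Pearson statistic by the residual degrees of freedom to obtain the scale parameter, whereas xtgee divides by the number of observations.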
. glm union age not_smsa, link(identity) family(gauss)
Iteration 0:   log likelihood = -14095.328

Generalized linear models                          No. of obs      =     26200
Optimization     : ML: Newton-Raphson              Residual df     =     26197
                                                   Scale param     =  .1717383
Deviance         =  4499.028831                    (1/df) Deviance =  .1717383
Pearson          =  4499.028831                    (1/df) Pearson  =  .1717383

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -14095.3277                     AIC             =  1.076208
BIC              =  4468.508287

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0021565   .0003948     5.46   0.000     .0013828    .0029303
    not_smsa |  -.0591923   .0056826   -10.42   0.000    -.0703301   -.0480546
       _cons |   .1729596   .0123365    14.02   0.000     .1487806    .1971386
------------------------------------------------------------------------------
. xtgee union age not_smsa, link(identity) family(gauss) corr(indep)
Iteration 1: tolerance = 9.073e-15

GEE population-averaged model                   Number of obs      =     26200
Group variable:                     idcode      Number of groups   =      4434
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       5.9
Correlation:                   independent                     max =        12
                                                Wald chi2(2)       =    134.68
Scale parameter:                  .1717187      Prob > chi2        =    0.0000

Pearson chi2(26200):               4499.03      Deviance           =   4499.03
Dispersion (Pearson):             .1717187      Dispersion         =  .1717187

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0021565   .0003948     5.46   0.000     .0013828    .0029302
    not_smsa |  -.0591923   .0056823   -10.42   0.000    -.0703295   -.0480552
       _cons |   .1729596   .0123357    14.02   0.000      .148782    .1971372
------------------------------------------------------------------------------
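The two fits report slightly different scale parameters (.1717383 from glm, .1717187 from xtgee), which is what moves the standard errors in the last digits. The sketch below checks that both values come from the same Pearson statistic with different divisors, and then refits the GEE model with the nmp option, which, as far as we know, requests the N - P divisor; treat that last line as an assumption to verify against your release's documentation.

```stata
* Pearson statistic over the residual degrees of freedom (N - P), as glm reports
display 4499.028831/26197

* Pearson statistic over the number of observations (N), the xtgee default
display 4499.028831/26200

* Assumption: nmp switches xtgee to the N - P divisor, so its standard
* errors should line up with the glm results
xtgee union age not_smsa, link(identity) family(gaussian) corr(independent) nmp
```

The two display results should round to the scale parameters shown in the glm and xtgee output above.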
We could fill many fact sheets demonstrating other ways in which the xtgee command is equivalent to other commands, but its real power lies in its intended use: modeling the correlation that exists within panels.
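As a brief illustration of that intended use, the sketch below refits the probit example with an exchangeable working correlation and robust standard errors, then displays the estimated within-panel correlation matrix. It assumes a current release of Stata with the union data in memory and the panel identifier declared; older releases used different option and command names for the robust variance and the correlation display.

```stata
* Exchangeable working correlation with robust (sandwich) standard errors
xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) vce(robust) nolog

* Show the estimated within-panel working correlation matrix
estat wcorrelation
```

With an exchangeable structure, every off-diagonal entry of the displayed matrix is the same single estimated correlation.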
© Copyright 2005 Stata Corporation.