  • probit model with panel-correlation structure
  • Poisson model with panel-correlation structure

Panel-data models have become far more widely used in the past 10 years as analysts increasingly need to work with richer data structures. Some panel datasets are cross-sectional: several observations are grouped within a higher-level unit, for example, counties (the replications) within states (the panel identifier). Others are longitudinal: there are multiple observations (the replications) on the same experimental unit (the panel identifier) over time. The xtgee command handles either type of panel data.

Stata estimates extensions to generalized linear models in which the user is free to model the structure of the within-panel correlation. This extension makes it possible to fit GLM-type models to panel data.

The xtgee command offers a rich collection of models for the analyst. These models correspond to population-averaged (or marginal) models in the panel data literature.

What makes xtgee useful is the number of statistical models that it generalizes for use with panel data, the richer set of correlation structures it offers for models that are also available in other commands, and the availability of robust standard errors (which are not always available in the equivalent command).

In this example, we consider a probit model of whether a worker belongs to a union, based on the person's age and whether the person lives outside a standard metropolitan statistical area (SMSA). The people in the study appear multiple times in the dataset (this type of panel dataset is commonly referred to as a longitudinal dataset), and we assume that observations on the same person are more correlated with one another than observations on different persons.
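
The command below presumes that the data are already in memory and that Stata knows which variable identifies the panels. A minimal setup sketch follows. It assumes the union example dataset that is distributed with Stata (the text does not name the dataset, so this is an assumption) and a recent release, in which xtset declares the panel and time variables. The time variable year is likewise assumed.

. * load the example data (assumed dataset) and declare the panel structure
. webuse union
. xtset idcode year

Declaring idcode as the panel variable matches the "Group variable" line in the output below.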

. xtgee union age not_smsa, fam(binomial) link(probit) corr(exchangeable) nolog

GEE population-averaged model                   Number of obs      =     26200
Group variable:                     idcode      Number of groups   =      4434
Link:                               probit      Obs per group: min =         1
Family:                           binomial                     avg =       5.9
Correlation:                  exchangeable                     max =        12
                                                Wald chi2(2)       =     52.10
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0066116   .0011484     5.76   0.000     .0043608    .0088623
    not_smsa |  -.1218662   .0269488    -4.52   0.000    -.1746849   -.0690474
       _cons |  -.9730766    .039148   -24.86   0.000    -1.049805   -.8963479
------------------------------------------------------------------------------

The families, links, and correlation structures allowed by the xtgee command are

Families

  • Bernoulli/binomial
  • Gamma
  • Gaussian
  • Inverse Gaussian
  • Negative binomial
  • Poisson

Links

  • Cloglog
  • Identity
  • Log
  • Logit
  • Negative binomial
  • Odds power
  • Power
  • Probit
  • Reciprocal

Correlation Structures

  • Independent
  • Exchangeable
  • Autoregressive
  • Stationary
  • Nonstationary
  • Unstructured
  • User-specified
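
One way to see what a chosen correlation structure implies is to fit the model and then display the estimated working correlation matrix. The sketch below reuses the union example and assumes a recent Stata release, in which estat wcorrelation is available after xtgee. No output is shown because it depends on the data and release in use.

. * fit the exchangeable probit model, then show the within-panel correlation it estimated
. xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) nolog
. estat wcorrelation

Under corr(exchangeable), every off-diagonal entry of the displayed matrix is the same single estimated correlation.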

Assuming an independent correlation structure amounts to ignoring the panel structure of the data. Under this assumption, xtgee will produce answers that are already provided by Stata's nonpanel estimation commands. Examples of when xtgee provides answers that are the same as an existing command are given in the following table.

Family      Link        Correlation     Equivalent Stata command
-----------------------------------------------------------------
gaussian    identity    independent     regress
gaussian    identity    exchangeable    xtreg, re (see note 1)
gaussian    identity    exchangeable    xtreg, pa
binomial    cloglog     independent     cloglog (see note 2)
binomial    cloglog     exchangeable    xtclog, pa
binomial    logit       independent     logit or logistic
binomial    logit       exchangeable    xtlogit, pa
binomial    probit      independent     probit (see note 3)
binomial    probit      exchangeable    xtprobit, pa
nbinomial   nbinomial   independent     nbreg (see note 4)
poisson     log         independent     poisson
poisson     log         exchangeable    xtpois, pa
gamma       log         independent     ereg (see note 5)
family      link        independent     glm, irls (see note 6)
-----------------------------------------------------------------
Note 1 These methods produce the same results only in the case of balanced panels.
Note 2 For cloglog estimation, xtgee with corr(independent) and cloglog will produce the same coefficients, but the standard errors will be only asymptotically equivalent because cloglog is not the canonical link for the binomial family.
Note 3 For probit estimation, xtgee with corr(independent) and probit will produce the same coefficients, but the standard errors will be only asymptotically equivalent because probit is not the canonical link for the binomial family. If the binomial denominator is not 1, the equivalent maximum-likelihood command is bprobit.
Note 4 Fitting a negative binomial model using xtgee (or using glm) will yield results conditional on the specified value of alpha. The nbreg command, however, fits that parameter and provides unconditional estimates.
Note 5 xtgee with corr(independent) can be used to estimate exponential regressions, but this requires specifying scale(1). As with probit, the xtgee-reported standard errors will be only asymptotically equivalent to those produced by ereg because log is not the canonical link for the gamma family. xtgee cannot be used to estimate exponential regressions on censored data.

Note 6 Using the independent correlation structure, the xtgee command will estimate the same model as estimated with the glm, irls command if the family-link combination is the same.

If the xtgee command is equivalent to another command, then using corr(independent) and the robust option with xtgee corresponds to using both the robust option and the cluster(varname) option in the equivalent command, where varname corresponds to the i() group variable.
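
The equivalences in the table can be checked directly by fitting the same model both ways and comparing the output. The sketch below again assumes that the union data are loaded and that idcode has been declared as the panel variable. The options vce(robust) and vce(cluster idcode) are the spellings used in recent Stata releases for the robust and cluster() options named in the remark above on robust standard errors; older releases would use the older spellings.

. * exchangeable probit: xtgee versus the population-averaged xtprobit
. xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) nolog
. xtprobit union age not_smsa, pa

. * independence with robust standard errors versus clustering on the panel variable
. xtgee union age not_smsa, family(binomial) link(logit) corr(independent) vce(robust) nolog
. logit union age not_smsa, vce(cluster idcode) nolog

In each pair, the coefficient tables are the place to look first; the notes above describe the cases in which the standard errors are expected to agree only asymptotically.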

If you choose to model the intracluster correlation as an identity matrix (by specifying corr(independent) or the name of an existing identity matrix in the corr() option), GEE estimation reduces to a generalized linear model, and the results will match those from glm.

. glm union age not_smsa, link(identity) family(gauss)

Iteration 0:   log likelihood = -14095.328

Generalized linear models                          No. of obs      =     26200
Optimization     : ML: Newton-Raphson              Residual df     =     26197
                                                   Scale param     =  .1717383
Deviance         =  4499.028831                    (1/df) Deviance =  .1717383
Pearson          =  4499.028831                    (1/df) Pearson  =  .1717383

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   =  -14095.3277                    AIC             =  1.076208
BIC              =  4468.508287

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0021565   .0003948     5.46   0.000     .0013828    .0029303
    not_smsa |  -.0591923   .0056826   -10.42   0.000    -.0703301   -.0480546
       _cons |   .1729596   .0123365    14.02   0.000     .1487806    .1971386
------------------------------------------------------------------------------

. xtgee union age not_smsa, link(identity) family(gauss) corr(indep)

Iteration 1: tolerance = 9.073e-15

GEE population-averaged model                   Number of obs      =     26200
Group variable:                     idcode      Number of groups   =      4434
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       5.9
Correlation:                   independent                     max =        12
                                                Wald chi2(2)       =    134.68
Scale parameter:                  .1717187      Prob > chi2        =    0.0000

Pearson chi2(26200):               4499.03      Deviance           =   4499.03
Dispersion (Pearson):             .1717187      Dispersion         =  .1717187

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0021565   .0003948     5.46   0.000     .0013828    .0029302
    not_smsa |  -.0591923   .0056823   -10.42   0.000    -.0703295   -.0480552
       _cons |   .1729596   .0123357    14.02   0.000      .148782    .1971372
------------------------------------------------------------------------------
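
In the two sets of output above, the coefficients agree exactly. The standard errors differ only in the last reported digits because the two commands estimate the scale parameter with different divisors: glm divides the Pearson statistic by the residual degrees of freedom (26,197), whereas xtgee divides it by the number of observations (26,200).

We could fill up lots of fact sheets demonstrating other ways in which the xtgee command is equivalent to other commands, but its real power lies in its intended use: modeling the correlation that exists within the panels.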

© Copyright 2005 Stata Corporation.