Stata has a set of commands for dealing with 2 x 2 tables, including stratified tables, known collectively as the epitab commands. To calculate appropriate statistics and suppress inappropriate ones, these commands are organized in the way epidemiologists conceptualize data.

Stata's ir command is used with incidence rate (incidence density or person-time) data; point estimates and confidence intervals for the incidence rate ratio and difference are calculated, along with the attributable or prevented fractions for the exposed and total populations. Stata's cs command is used with cohort study data with equal follow-up time per subject. Risk is then the proportion of subjects who become cases. Point estimates and confidence intervals for the risk difference, risk ratio, and (optionally) the odds ratio are calculated, along with attributable or prevented fractions for the exposed and total population. Stata's cc command is used with case-control and cross-sectional data. Point estimates and confidence intervals for the odds ratio are calculated, along with attributable or prevented fractions for the exposed and total population. Stata's mcc command is used with matched case-control data. McNemar's chi-squared is calculated, along with point estimates and confidence intervals for the difference, ratio, and relative difference of the proportion with the factor, and for the odds ratio.

All of these commands come in two flavors: a normal form and an "immediate" form. In their normal forms, the commands form counts by summing over the dataset in use. In their immediate forms, the data are specified on the command line.

For instance, Boice and Monson (1977; reprinted in Rothman and Greenland 1998, 238) reported on breast cancer cases and person-years of observation for women with tuberculosis who were repeatedly exposed to multiple X-ray fluoroscopies and for those not so exposed:
                           X-ray fluoroscopy
                         Exposed    Unexposed
-------------------------------------------
Breast cancer cases           41           15
Person-years              28,010       19,017
Using the immediate form of ir you specify the values in the table following the command:
. iri 41 15 28010 19017
                 |   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
           Cases |        41          15  |        56
     Person-time |     28010       19017  |     47027
-----------------+------------------------+----------
                 |                        |
  Incidence Rate |  .0014638    .0007888  |  .0011908
                 |                        |
                 |      Point estimate    |  [95% Conf. Interval]
                 |------------------------+----------------------
 Inc. rate diff. |         .000675        |  .0000749   .0012751
 Inc. rate ratio |         1.855759       |  1.005722    3.60942 (exact)
 Attr. frac. ex. |         .4611368       |   .005689   .7229472 (exact)
 Attr. frac. pop |          .337618       |
                 +-----------------------------------------------
                    (midp)   Pr(k>=41) =                  0.0177 (exact)
                    (midp) 2*Pr(k>=41) =                  0.0355 (exact)
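The point estimates in this output follow directly from the four numbers in the table. As a cross-check outside Stata, the following Python sketch recomputes them. The variable names are illustrative, and the formulas are assumed to be the usual ones: (IRR - 1)/IRR for the attributable fraction among the exposed, and that value multiplied by the proportion of cases that are exposed for the population attributable fraction. The exact confidence limits and mid-p values are not reproduced here.

```python
# Cross-check of the iri point estimates (a sketch; names are illustrative).
cases_exposed, cases_unexposed = 41, 15
pt_exposed, pt_unexposed = 28010, 19017  # person-years

# Incidence rates per person-year in each group
rate_exposed = cases_exposed / pt_exposed
rate_unexposed = cases_unexposed / pt_unexposed

# Incidence rate difference and ratio
rate_diff = rate_exposed - rate_unexposed
rate_ratio = rate_exposed / rate_unexposed

# Attributable fractions (assumed standard formulas, see the text above)
attr_frac_exposed = (rate_ratio - 1) / rate_ratio
share_of_cases_exposed = cases_exposed / (cases_exposed + cases_unexposed)
attr_frac_pop = attr_frac_exposed * share_of_cases_exposed

print(f"Inc. rate diff.  {rate_diff:.7f}")
print(f"Inc. rate ratio  {rate_ratio:.6f}")
print(f"Attr. frac. ex.  {attr_frac_exposed:.7f}")
print(f"Attr. frac. pop  {attr_frac_pop:.6f}")
```

The printed values should agree with the "Point estimate" column above to the precision shown.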
The ir command itself can work with individual-level or aggregate data and can also handle stratified data. Rothman and Greenland (1998, 259) report results from Doll and Hill (1966) on age-specific coronary disease deaths among British male doctors by cigarette smoking:
                 Smokers                  Nonsmokers
Age       Deaths   Person-years    Deaths   Person-years
-------------------------------------------------------
35-44         32         52,407         2         18,790
45-54        104         43,248        12         10,673
55-64        206         28,612        28          5,710
65-74        186         12,663        28          2,585
75-84        102          5,317        31          1,462
We have entered these data into Stata:
. list

          age   smokes   deaths   pyears
  1.    35-44        1       32    52407
  2.    35-44        0        2    18790
  3.    45-54        1      104    43248
  4.    45-54        0       12    10673
  5.    55-64        1      206    28612
  6.    55-64        0       28     5710
  7.    65-74        1      186    12663
  8.    65-74        0       28     2585
  9.    75-84        1      102     5317
 10.    75-84        0       31     1462
We can obtain the Mantel-Haenszel combined estimate of the incidence rate ratio, along with 90% confidence intervals, by typing
. ir deaths smokes pyears, by(age) level(90)
             age |      IRR       [90% Conf. Interval]    M-H Weight
-----------------+-------------------------------------------------
           35-44 |   5.736638      1.704242   33.62016     1.472169 (exact)
           45-54 |   2.138812      1.274529   3.813215     9.624747 (exact)
           55-64 |    1.46824      1.044925   2.110463     23.34176 (exact)
           65-74 |    1.35606      .9625995   1.953472     23.25315 (exact)
           75-84 |   .9047304      .6375086   1.305422     24.31435 (exact)
-----------------+-------------------------------------------------
           Crude |   1.719823      1.437554   2.068803              (exact)
    M-H combined |   1.424682      1.194375   1.699399
-------------------------------------------------------------------
 Test of homogeneity (M-H)      chi2(4) =    10.41  Pr>chi2 = 0.0340
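The crude and Mantel-Haenszel point estimates in this table can be checked by hand. The Python sketch below is a cross-check outside Stata, assuming the usual Mantel-Haenszel estimator for person-time data: the sum over strata of a*N0/T divided by the sum of b*N1/T, where a and b are deaths among smokers and nonsmokers, N1 and N0 are their person-years, and T = N1 + N0. The second term, b*N1/T, is what the "M-H Weight" column appears to report. Variable names are illustrative, and the confidence limits and homogeneity test are not reproduced.

```python
# Mantel-Haenszel combined incidence rate ratio (a sketch; names are illustrative).
# Each tuple: (age band, deaths in smokers, person-years in smokers,
#              deaths in nonsmokers, person-years in nonsmokers)
strata = [
    ("35-44",  32, 52407,  2, 18790),
    ("45-54", 104, 43248, 12, 10673),
    ("55-64", 206, 28612, 28,  5710),
    ("65-74", 186, 12663, 28,  2585),
    ("75-84", 102,  5317, 31,  1462),
]

numerator = 0.0
mh_weights = {}
for age, a, n1, b, n0 in strata:
    total = n1 + n0
    numerator += a * n0 / total
    mh_weights[age] = b * n1 / total  # compare with the "M-H Weight" column

mh_irr = numerator / sum(mh_weights.values())

# Crude ratio: pool deaths and person-years over age before dividing
deaths_smokers = sum(s[1] for s in strata)
pyears_smokers = sum(s[2] for s in strata)
deaths_nonsmokers = sum(s[3] for s in strata)
pyears_nonsmokers = sum(s[4] for s in strata)
crude_irr = (deaths_smokers / pyears_smokers) / (deaths_nonsmokers / pyears_nonsmokers)

for age, weight in mh_weights.items():
    print(f"{age}  weight {weight:.6f}")
print(f"Crude         {crude_irr:.6f}")
print(f"M-H combined  {mh_irr:.6f}")
```

Under these assumptions the weights and both summary ratios should match the Stata output to the digits displayed.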
Rothman and Greenland (1998, 264) obtain the standardized incidence rate ratio and 90% confidence intervals by weighting each age category by the population of the exposed group, thus producing the standardized mortality ratio (SMR). This calculation can be reproduced by specifying by(age) to indicate that the table is stratified and istandard to request the internally standardized rate ratio:
. ir deaths smokes pyears, by(age) level(90) istandard
             age |      IRR       [90% Conf. Interval]        Weight
-----------------+-------------------------------------------------
           35-44 |   5.736638      1.704242   33.62016        52407 (exact)
           45-54 |   2.138812      1.274529   3.813215        43248 (exact)
           55-64 |    1.46824      1.044925   2.110463        28612 (exact)
           65-74 |    1.35606      .9625995   1.953472        12663 (exact)
           75-84 |   .9047304      .6375086   1.305422         5317 (exact)
-----------------+-------------------------------------------------
           Crude |   1.719823      1.437554   2.068803              (exact)
 I. Standardized |   1.417609      1.186541   1.693676
If we wanted the externally standardized ratio (weights proportional to the population of the unexposed group), we would substitute estandard for istandard in the command above.
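The two standardizations differ only in which group's person-years serve as the weights. The Python sketch below is a cross-check outside Stata, assuming the standardized ratio is the weighted sum of stratum rates among smokers divided by the same weighted sum among nonsmokers. The function name and its argument are hypothetical, introduced only for illustration, and confidence limits are not computed.

```python
# Internally vs. externally standardized rate ratios (a sketch; names are illustrative).
# Each tuple: (age band, deaths in smokers, person-years in smokers,
#              deaths in nonsmokers, person-years in nonsmokers)
strata = [
    ("35-44",  32, 52407,  2, 18790),
    ("45-54", 104, 43248, 12, 10673),
    ("55-64", 206, 28612, 28,  5710),
    ("65-74", 186, 12663, 28,  2585),
    ("75-84", 102,  5317, 31,  1462),
]


def standardized_irr(strata, use_exposed_weights):
    """Weighted rate among the exposed over weighted rate among the unexposed.

    use_exposed_weights=True  -> weights are exposed person-years (internal)
    use_exposed_weights=False -> weights are unexposed person-years (external)
    """
    weighted_exposed = 0.0
    weighted_unexposed = 0.0
    for _, a, n1, b, n0 in strata:
        w = n1 if use_exposed_weights else n0
        weighted_exposed += w * a / n1
        weighted_unexposed += w * b / n0
    return weighted_exposed / weighted_unexposed


internal_irr = standardized_irr(strata, use_exposed_weights=True)
external_irr = standardized_irr(strata, use_exposed_weights=False)

print(f"Internally standardized  {internal_irr:.6f}")
print(f"Externally standardized  {external_irr:.6f}")
```

With internal weights the numerator reduces to the observed deaths among smokers and the denominator to the deaths expected at nonsmokers' rates, which is why the result is an SMR; it should match the "I. Standardized" row above.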
References