Below is Chapter 14, A Sample Session, from the Getting Started with Stata for Windows manual.
Chapter 14: A Sample Session
In this chapter we will use the auto.dta file shipped with Stata. If you wish to follow along, you must load this dataset. Launch Stata and choose Open from the File menu. Select the auto.dta file from the directory in which you installed Stata.
. use c:\stata\auto, clear
(1978 Automobile Data)
The data that we loaded contains the following:
. describe
Contains data from auto.dta
obs: 74 1978 Automobile Data
vars: 12 14 Oct 2002 09:02
size: 3,478 (99.7% of memory free) (_dta has notes)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
make str18 %-18s Make and Model
price int %8.0gc Price
mpg int %8.0g Mileage (mpg)
rep78 int %8.0g Repair Record 1978
headroom float %6.1f Headroom (in.)
trunk int %8.0g Trunk space (cu. ft.)
weight int %8.0gc Weight (lbs.)
length int %8.0g Length (in.)
turn int %8.0g Turn Circle (ft.)
displacement int %8.0g Displacement (cu. in.)
gear_ratio float %6.2f Gear Ratio
foreign byte %8.0g origin Car type
-------------------------------------------------------------------------------
Sorted by: foreign
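If the installation directory differs from c:\stata, the dataset can usually be loaded without a path. The following is a minimal sketch, assuming a standard installation in which auto.dta is available to the sysuse command; the short option of describe is used only to keep the output compact.

```stata
* Load the example dataset from Stata's system directories
* (assumes a standard installation that ships auto.dta)
sysuse auto, clear
* Compact description: dataset size and sort order only
describe, short
```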
Listing can be informative
Here are some of our data:
. list make mpg in 1/10
+---------------------+
| make mpg |
|---------------------|
1. | AMC Concord 22 |
2. | AMC Pacer 17 |
3. | AMC Spirit 22 |
4. | Buick Century 20 |
5. | Buick Electra 15 |
|---------------------|
6. | Buick LeSabre 18 |
7. | Buick Opel 26 |
8. | Buick Regal 20 |
9. | Buick Riviera 16 |
10. | Buick Skylark 19 |
+---------------------+
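Question: Which 5 cars yield the lowest gas mileage? (list with in 1/5 shows the first five observations in the current sort order, so the data must first be sorted by mpg for the listing to answer this question.)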
. l make mpg in 1/5
+---------------------+
| make mpg |
|---------------------|
1. | AMC Concord 22 |
2. | AMC Pacer 17 |
3. | AMC Spirit 22 |
4. | Buick Century 20 |
5. | Buick Electra 15 |
+---------------------+
Which 5 cars yield the highest gas mileage?
. l make mpg in -5/-1
+-------------------+
| make mpg |
|-------------------|
70. | VW Dasher 23 |
71. | VW Diesel 41 |
72. | VW Rabbit 25 |
73. | VW Scirocco 25 |
74. | Volvo 260 17 |
+-------------------+
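The in range selects observations by position, so the two questions above are answered only when the data are ordered by mpg. A minimal sketch, assuming auto.dta is available through sysuse:

```stata
sysuse auto, clear
* Ascending sort: the lowest-mileage cars come first
sort mpg
list make mpg in 1/5
* After the sort, the highest-mileage cars are the last five observations
list make mpg in -5/-1
```

Negative numbers in an in range count from the end of the data, so -5/-1 means the fifth-from-last through the last observation.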
Descriptive statistics
Question: We are not familiar with 1978 prices; what is the average price of a car in these data?
. summarize price
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
Aside: summarize works like list; without arguments, it provides a summary of all of the data:
. summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
make | 0
price | 74 6165.257 2949.496 3291 15906
mpg | 74 21.2973 5.785503 12 41
rep78 | 69 3.405797 .9899323 1 5
headroom | 74 2.993243 .8459948 1.5 5
-------------+--------------------------------------------------------
trunk | 74 13.75676 4.277404 5 23
weight | 74 3019.459 777.1936 1760 4840
length | 74 187.9324 22.26634 142 233
turn | 74 39.64865 4.399354 31 51
displacement | 74 197.2973 91.83722 79 425
-------------+--------------------------------------------------------
gear_ratio | 74 3.014865 .4562871 2.19 3.89
foreign | 74 .2972973 .4601885 0 1
Note: make has 0 observations because it is a string variable; calculating its mean is undefined, but this is not an error. rep78 has only 69 observations because it is missing for five cars.
Descriptive statistics, continued
Question: What is the average price of cars that are below and above the mean MPG?
. summarize price if mpg<21.3
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 43 7091.86 3425.019 3291 15906
. summarize price if mpg>=21.3
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 31 4879.968 1344.659 3299 9735
Aside: if can be suffixed to almost all commands. This is one of Stata's more useful features.
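Rather than typing the mean as 21.3, the cutoff can be taken from the result that summarize leaves behind. This is a sketch, assuming auto.dta is available through sysuse; mean_mpg is a scalar name chosen here for illustration.

```stata
sysuse auto, clear
* summarize leaves the mean behind in r(mean); copy it before it is overwritten
quietly summarize mpg
scalar mean_mpg = r(mean)
* Condition on the stored mean instead of a hand-typed value
summarize price if mpg < scalar(mean_mpg)
summarize price if mpg >= scalar(mean_mpg)
```

Because mpg takes only whole-number values here, this selects the same two groups as the 21.3 cutoff used above.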
Question: What is the median MPG?
. summarize mpg, detail
Mileage (mpg)
-------------------------------------------------------------
Percentiles Smallest
1% 12 12
5% 14 12
10% 14 14 Obs 74
25% 18 14 Sum of Wgt. 74
50% 20 Mean 21.2973
Largest Std. Dev. 5.785503
75% 25 34
90% 29 35 Variance 33.47205
95% 34 35 Skewness .9487176
99% 41 41 Kurtosis 3.975005
Answer: 20.
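The median can also be picked out of the stored results rather than read from the table. A minimal sketch, assuming auto.dta is available through sysuse and that summarize with the detail option stores the 50th percentile in r(p50):

```stata
sysuse auto, clear
* The detail option makes summarize compute percentiles
quietly summarize mpg, detail
* r(p50) holds the 50th percentile, that is, the median
display "Median MPG = " r(p50)
```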
Descriptive statistics, continued
Our dataset contains the variable foreign, which is 0 if the car was manufactured in the United States or Canada and 1 otherwise.
Problem: Obtain summary statistics for price and MPG for each value of foreign. There are two solutions to this problem; the one shown here uses the by prefix:
. sort foreign
. by foreign: summarize price mpg
-> foreign = Domestic
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 52 6072.423 3097.104 3291 15906
mpg | 52 19.82692 4.743297 12 34
_______________________________________________________________________________
-> foreign = Foreign
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 22 6384.682 2621.915 3748 12990
mpg | 22 24.77273 6.611187 14 41
Explain the by prefix
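One way to read the by prefix is that it repeats the command once for each value of the grouping variable, which is what the if qualifier does one group at a time. The sketch below assumes auto.dta is available through sysuse and uses bysort, which combines the sort and by steps.

```stata
sysuse auto, clear
* bysort sorts on foreign and then repeats summarize within each group
bysort foreign: summarize price mpg
* The same results, one group at a time, using the if qualifier
summarize price mpg if foreign == 0
summarize price mpg if foreign == 1
```

The by form scales better: it needs no change when the grouping variable has more than two values.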
More on by
Problem: It appears that the average MPG of domestic and foreign cars differs. Test the hypothesis that the means are equal.
. ttest mpg, by(foreign)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
Domestic | 52 19.82692 .657777 4.743297 18.50638 21.14747
Foreign | 22 24.77273 1.40951 6.611187 21.84149 27.70396
---------+--------------------------------------------------------------------
combined | 74 21.2973 .6725511 5.785503 19.9569 22.63769
---------+--------------------------------------------------------------------
diff | -4.945804 1.362162 -7.661225 -2.230384
------------------------------------------------------------------------------
Degrees of freedom: 72
Ho: mean(Domestic) - mean(Foreign) = diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
t = -3.6308 t = -3.6308 t = -3.6308
P < t = 0.0003 P > |t| = 0.0005 P > t = 0.9997
Explain by versus by()
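The distinction can be sketched as follows: the by prefix runs a command separately for each group, whereas by() is an option that the command itself interprets. ttest uses it to define the two groups being compared. The sketch assumes auto.dta is available through sysuse and that ttest stores r(mu_1), r(mu_2), r(se), and r(t).

```stata
sysuse auto, clear
* Prefix form: summarize is run twice, once per group
bysort foreign: summarize mpg
* Option form: ttest is run once and compares the two groups
ttest mpg, by(foreign)
* The t statistic is the difference in means divided by its standard error
display (r(mu_1) - r(mu_2)) / r(se)
display r(t)
```

With the figures in the table above, this ratio is -4.945804/1.362162, or roughly -3.63.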
Analysis note: We have established that in 1978 domestic cars had poorer gas mileage than foreign cars.
Descriptive statistics, making tables
Problem: Obtain counts of the number of domestic and foreign cars.
. tabulate foreign
Car type | Freq. Percent Cum.
------------+-----------------------------------
Domestic | 52 70.27 70.27
Foreign | 22 29.73 100.00
------------+-----------------------------------
Total | 74 100.00
Problem: The data contains variable rep78 recording each car's frequency-of-repair record (1 = poor, ..., 5 = excellent). Obtain frequency counts.
. tabulate rep78
Repair |
Record 1978 | Freq. Percent Cum.
------------+-----------------------------------
1 | 2 2.90 2.90
2 | 8 11.59 14.49
3 | 30 43.48 57.97
4 | 18 26.09 84.06
5 | 11 15.94 100.00
------------+-----------------------------------
Total | 69 100.00
Problem: We have 74 cars; only 69 have frequency-of-repair records recorded. List the cars for which data is missing.
. list make if missing(rep78)
+---------------+
| make |
|---------------|
3. | AMC Spirit |
7. | Buick Opel |
45. | Plym. Sapporo |
51. | Pont. Phoenix |
64. | Peugeot 604 |
+---------------+
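Two related commands may help when checking missing values. This is a sketch, assuming auto.dta is available through sysuse: count reports how many observations satisfy a condition, and the missing option of tabulate adds a row for missing values to the frequency table.

```stata
sysuse auto, clear
* Number of cars with no repair record
count if missing(rep78)
* Frequency table that treats missing as its own category
tabulate rep78, missing
```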
Descriptive statistics, making tables, continued
Problem: Compare frequency-of-repair records for domestic and foreign cars (i.e., make a two-way table).
. tabulate rep78 foreign
Repair |
Record | Car type
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 2 0 | 2
2 | 8 0 | 8
3 | 27 3 | 30
4 | 9 9 | 18
5 | 2 9 | 11
-----------+----------------------+----------
Total | 48 21 | 69
Problem: Domestic cars appear to have poorer frequency-of-repair records. Is the difference statistically significant? Obtain a chi-square (even though there are not at least 5 cars expected in each cell):
. tabulate rep78 foreign, chi2
Repair |
Record | Car type
1978 | Domestic Foreign | Total
-----------+----------------------+----------
1 | 2 0 | 2
2 | 8 0 | 8
3 | 27 3 | 30
4 | 9 9 | 18
5 | 2 9 | 11
-----------+----------------------+----------
Total | 48 21 | 69
Pearson chi2(4) = 27.2640 Pr = 0.000
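Because some expected cell counts are small, it can be useful to look at them directly and to compare the result with an exact test. The following is a sketch, assuming auto.dta is available through sysuse; expected and exact are options of tabulate for two-way tables.

```stata
sysuse auto, clear
* Show expected frequencies under independence beside the observed counts
tabulate rep78 foreign, chi2 expected
* Fisher's exact test does not rely on the large-sample approximation
tabulate rep78 foreign, exact
```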
Analysis note: We find that frequency-of-repair records differ between domestic and foreign cars. In 1978, domestic cars appear poorer in this regard.
Descriptive statistics, correlation matrices
Question: What is the correlation between MPG and weight of car?
. correlate mpg weight
(obs=74)
| mpg weight
-------------+------------------
mpg | 1.0000
weight | -0.8072 1.0000
Problem: Compare the correlation for domestic and foreign cars.
. correlate mpg weight if foreign==0
(obs=52)
| mpg weight
-------------+------------------
mpg | 1.0000
weight | -0.8759 1.0000
. correlate mpg weight if foreign==1
(obs=22)
| mpg weight
-------------+------------------
mpg | 1.0000
weight | -0.6829 1.0000
Note: We could have obtained this by typing by foreign: correlate mpg weight instead.
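The by form of this comparison can be sketched as follows, assuming auto.dta is available through sysuse. The last two lines rely on correlate storing the correlation of the first two variables in r(rho).

```stata
sysuse auto, clear
* One correlation table per value of foreign
bysort foreign: correlate mpg weight
* The correlation itself is stored in r(rho) after correlate
quietly correlate mpg weight if foreign == 0
display "Domestic correlation = " r(rho)
```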
Descriptive statistics, correlation matrices, continued
Aside: We can produce correlation matrices containing as many variables as we wish.
. correlate mpg weight price length displacement
(obs=74)
| mpg weight price length displa~t
-------------+---------------------------------------------
mpg | 1.0000
weight | -0.8072 1.0000
price | -0.4686 0.5386 1.0000
length | -0.7958 0.9460 0.4318 1.0000
displacement | -0.7056 0.8949 0.4949 0.8351 1.0000
Graphing data
Problem: We know the average MPG of domestic and foreign cars differs. We have learned that domestic and foreign cars differ in other ways as well, such as in frequency-of-repair record. We found a negative correlation of MPG and weight, as we would expect, but the correlation appears stronger for domestic cars. Examine, with an eye toward modeling, the relationship between MPG and weight. Begin with a graph.
. scatter mpg weight
Comment: scatter is explained in the Graphics Reference Manual, but typing scatter y x draws a graph of y against x. The relationship, we note, appears to be nonlinear.
Note: When you draw a graph, the Graph window appears, probably covering up your Results window. Click on the Results button to put your Results window back on top. Want to see the graph again? Click on the Graph button.
Graphing data, continued
Next, we draw separate graphs for foreign and domestic cars.
. sort foreign
. scatter mpg weight, by(foreign, total row(1))
Syntax note: by() is on the right of the command; therefore, scatter did whatever it is that it does with the grouping information. What scatter did was draw separate graphs for domestic and foreign cars in a single image. We have only two groups, but scatter will allow any number; the individual graphs just get smaller. The total suboption added an overall graph to the image, and the row(1) suboption presented the graphs in a single row.
Analysis note: The relationship is not only nonlinear; the domestic-car relationship appears to differ from that of foreign cars.
Model estimation: linear regression
Restatement of problem: We are to model the relationship between MPG and weight.
Plan of attack: Based on the graphs, we judge the relationship nonlinear and will model MPG as a quadratic in weight. Also based on the graphs, we judge the relationship to be different for domestic and foreign cars. We will include an indicator (dummy) variable for foreign and evaluate afterwards whether this adequately describes the difference. Thus, we will fit the model:
$\text{mpg} = \beta_0 + \beta_1\,\text{weight} + \beta_2\,\text{weight}^2 + \beta_3\,\text{foreign} + \epsilon$
foreign is already a 0/1 variable, so we only need to create the weight-squared variable:
. gen wtsq = weight^2
. regress mpg weight wtsq foreign
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 3, 70) = 52.25
Model | 1689.15372 3 563.05124 Prob > F = 0.0000
Residual | 754.30574 70 10.7757963 R-squared = 0.6913
-------------+------------------------------ Adj R-squared = 0.6781
Total | 2443.45946 73 33.4720474 Root MSE = 3.2827
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | -.0165729 .0039692 -4.18 0.000 -.0244892 -.0086567
wtsq | 1.59e-06 6.25e-07 2.55 0.013 3.45e-07 2.84e-06
foreign | -2.2035 1.059246 -2.08 0.041 -4.3161 -.0909002
_cons | 56.53884 6.197383 9.12 0.000 44.17855 68.89913
------------------------------------------------------------------------------
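After a regression, the estimated coefficients remain available as _b[varname], so a fitted value can be worked out by hand. The sketch below assumes auto.dta is available through sysuse; the 3,000 lb. domestic car is a hypothetical example, not an observation discussed in the text.

```stata
sysuse auto, clear
generate wtsq = weight^2
quietly regress mpg weight wtsq foreign
* Fitted mileage for a hypothetical 3,000 lb. domestic car (foreign = 0)
display _b[_cons] + _b[weight]*3000 + _b[wtsq]*3000^2
```

With the rounded coefficients printed above, this is 56.53884 - .0165729(3000) + 1.59e-06(3000^2), or roughly 21 miles per gallon.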
Model estimation: linear regression, continued
Aside: Stata can estimate many kinds of models, including logistic regression, Cox proportional hazards, etc. Click on Help, choose Search..., and enter estimation for a complete list, or look up estimation in the index of the Stata Base Reference Manual.
Continuation of attack: We obtain the predicted values:
. predict mpghat
(option xb assumed; fitted values)
Comment: Be sure to read [U] 23 Estimation and post-estimation commands. There are a number of features available to you after estimation; one is calculation of predicted values. predict just created a new variable called mpghat equal to
$-.0165729\,\text{weight} + 1.59 \times 10^{-6}\,\text{wtsq} - 2.2035\,\text{foreign} + 56.53884$
Model estimation: linear regression, continued
We can now graph the data and the predicted curve.
Continuation of attack: We just created mpghat with predict. We could graph the fit and data, but we want to evaluate the fit on the foreign and domestic data separately to determine if our shift parameter is adequate. Thus, we will draw the graphs separately:
. sort weight
. scatter mpg weight || line mpghat weight || if foreign==0
. scatter mpg weight || line mpghat weight || if foreign==1
"scatter mpg weight" says to graph mpg vs. weight as a scatterplot. "line mpghat weight" says to graph mpghat vs. weight as a line plot. The || in between says to join the two commands (overlay the two graphs).
Model estimation: linear regression, continued
Problem: You show your results to an engineer. "No," he says. "It should take twice as much energy to move 2,000 pounds 1 mile compared with moving 1,000 pounds, and therefore twice as much gasoline. Miles per gallon is not a quadratic in weight; gallons per mile is a linear function of weight."
You go back to the computer:
. gen gpm = 1/mpg
. label var gpm "Gallons per mile"
. sort foreign
. scatter gpm weight, by(foreign, total row(1))
Satisfied the engineer is indeed correct, you rerun the regression:
. regress gpm weight foreign
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 2, 71) = 113.97
Model | .009117618 2 .004558809 Prob > F = 0.0000
Residual | .00284001 71 .00004 R-squared = 0.7625
-------------+------------------------------ Adj R-squared = 0.7558
Total | .011957628 73 .000163803 Root MSE = .00632
------------------------------------------------------------------------------
gpm | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | .0000163 1.18e-06 13.74 0.000 .0000139 .0000186
foreign | .0062205 .0019974 3.11 0.003 .0022379 .0102032
_cons | -.0007348 .0040199 -0.18 0.855 -.0087504 .0072807
------------------------------------------------------------------------------
You find that, holding weight constant, foreign cars in 1978 were less efficient. Foreign cars may have yielded better gas mileage than domestic cars in 1978, but this was only because they were so light.
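To make the size of this effect concrete, the fitted gallons per mile can be converted back to miles per gallon for two cars of the same weight. This is a sketch, assuming auto.dta is available through sysuse; the 2,500 lb. weight is a hypothetical value chosen for illustration.

```stata
sysuse auto, clear
generate gpm = 1/mpg
quietly regress gpm weight foreign
* Fitted MPG for hypothetical 2,500 lb. cars: invert the fitted gallons per mile
display "Domestic: " 1/(_b[_cons] + _b[weight]*2500)
display "Foreign:  " 1/(_b[_cons] + _b[weight]*2500 + _b[foreign])
```

With the rounded coefficients printed above, the domestic car comes out at about 25 miles per gallon and the foreign car at a little under 22.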
© Copyright 2005 Stata Corporation.