Data Analysis Using Stata
Ulrich Kohler and Frauke Kreuter
Preface
1 "The first time"
- 1.1 Starting Stata
- 1.2 Setting up your screen
- 1.3 Your first analysis
- 1.3.1 Inputting commands
- 1.3.2 Files and the working memory
- 1.3.3 Loading data
- 1.3.4 Variables and observations
- 1.3.5 Looking at data
- 1.3.6 Interrupting a command and repeating a command
- 1.3.7 The variable list
- 1.3.8 The in qualifier
- 1.3.9 Summary statistics
- 1.3.10 The if qualifier
- 1.3.11 Define missing values
- 1.3.12 The by prefix
- 1.3.13 Command options
- 1.3.14 Frequency tables
- 1.3.15 Variable labels and value labels
- 1.3.16 Graphs
- 1.3.17 Getting help
- 1.3.18 Recoding of variables
- 1.3.19 Linear regression
- 1.4 Do-files
- 1.5 Exiting Stata
- 1.6 Exercises
2 Working with do-files
- 2.1 From interactive work to working with a do-file
- 2.1.1 Alternative 1
- 2.1.2 Alternative 2
- 2.2 Designing do-files
- 2.2.1 Comments
- 2.2.2 Line breaks
- 2.2.3 Some crucial commands
- 2.3 Organizing your work
- 2.4 Exercises
3 The grammar of Stata
- 3.1 The elements of Stata commands
- 3.1.1 Stata commands
- 3.1.2 The variable list
- List of variables: required or optional
- Abbreviation rules
- Special listings
- 3.1.3 Options
- 3.1.4 The in qualifier
- 3.1.5 The if qualifier
- 3.1.6 Expressions
- 3.1.7 Lists of numbers
- 3.1.8 Using filenames
- 3.2 Repeating similar commands
- 3.2.1 The by prefix
- 3.2.2 The foreach loop
- The types of foreach lists
- Several commands within a foreach loop
- 3.2.3 The forvalues loop
- 3.3 Weights
- Frequency weights
- Analytic weights
- Probability weights
- 3.4 Exercises
4 Some general comments on the statistical commands
5 Creating and changing variables
- 5.1 The commands generate and replace
- 5.1.1 Variable names
- 5.1.2 Some examples
- 5.1.3 Changing codes with by, _n, and _N
- 5.1.4 Subscripts
- 5.2 Specialized recoding commands
- 5.2.1 The recode command
- 5.2.2 The egen command
- 5.3 Additional tools for recoding data
- 5.3.1 String functions
- 5.3.2 Date functions
- 5.4 Commands for dealing with missing values
- 5.5 Labels
- 5.6 Storage types, or, the ghost in the machine
- 5.7 Exercises
6 Creating and changing graphs
- 6.1 A primer on graph syntax
- 6.2 Graph types
- 6.2.1 Examples
- 6.2.2 Specialized graphs
- 6.3 Graph elements
- 6.3.1 Appearance of data
- Choice of marker
- Marker colors
- Marker size
- Lines
- 6.3.2 Graphs and plot regions
- Graph size
- Plot region
- Scaling the axes
- 6.3.3 Information inside the plot region
- Reference lines
- Labeling inside the plot region
- 6.3.4 Information outside the plot region
- Labeling the axes
- Tick lines
- Axis titles
- The legend
- Graph titles
- 6.4 Multiple graphs
- 6.4.1 Overlaying numerous twoway graphs
- 6.4.2 Option by()
- 6.4.3 Combining graphs
- 6.5 Saving and printing graphs
- 6.6 Exercises
7 Describing and comparing distributions
- 7.1 Categories: Few or many?
- 7.2 Variables with few categories
- 7.2.1 Tables
- Frequency tables
- More than one frequency table
- Comparing distributions
- Summary statistics
- More than one contingency table
- 7.2.2 Graphs
- Histograms
- Bar charts
- Pie charts
- Dot chart
- 7.3 Variables with many categories
- 7.3.1 Frequencies of grouped data
- Some remarks on grouping data
- Special techniques for grouping data
- 7.3.2 Describing data using statistics
- Important summary statistics
- The summarize command
- The tabstat command
- Comparing distributions using statistics
- 7.3.3 Graphs
- Box plots
- Histograms
- Kernel density estimation
- Quantile plot
- Comparing distributions with QQ plots
- 7.4 Exercises
8 Introduction to linear regression
- 8.1 Simple linear regression
- 8.1.1 The basic principle
- 8.1.2 Linear regression using Stata
- The table of coefficients
- Standard errors
- The table of ANOVA results
- The model fit table
- 8.2 Multiple regression
- 8.2.1 Multiple regression using Stata
- 8.2.2 More computations
- Adjusted R²
- Standardized regression coefficients
- 8.2.3 What does "under control" mean?
- 8.3 Regression diagnostics
- 8.3.1 Violation of E(εᵢ) = 0
- Linearity
- Influential cases
- Omitted variables
- Multicollinearity
- 8.3.2 Violation of Var(εᵢ) = σ²
- 8.3.3 Violation of Cov(εᵢ, εⱼ) = 0, i ≠ j
- 8.4 Model extensions
- 8.4.1 Categorical independent variables
- 8.4.2 Interaction terms
- 8.4.3 Regression models using transformed variables
- Nonlinear relations
- Eliminating heteroskedasticity
- 8.5 More on standard errors
- 8.5.1 Bootstrap techniques
- 8.5.2 Confidence intervals on cluster samples
- 8.6 Advanced techniques
- 8.6.1 Median regression
- 8.6.2 Regression models for panel data
- From wide to long format
- Fixed-effects models
- 8.6.3 Error-component models
- 8.7 Exercises
9 Regression models for categorical dependent variables
- 9.1 The linear probability model
- 9.2 Basic concepts
- 9.2.1 Odds, log odds, and odds ratios
- 9.2.2 Excursion: The maximum likelihood principle
- 9.3 Logistic regression with Stata
- 9.3.1 The coefficient table
- Sign interpretation
- Interpretation with odds ratios
- Probability interpretation
- 9.3.2 The iteration block
- 9.3.3 The model fit block
- Classification tables
- Pearson chi-squared
- 9.4 Logistic regression diagnostics
- 9.4.1 Linearity
- 9.4.2 Influential cases
- 9.5 Likelihood-ratio test
- 9.6 Refined models
- 9.6.1 Nonlinear relationships
- 9.6.2 Categorical independent variables
- 9.6.3 Interaction effects
- 9.7 Advanced techniques
- 9.7.1 Probit models
- 9.7.2 Multinomial logistic regression
- 9.7.3 Models for ordinal data
- 9.8 Exercises
10 Reading and writing data
- 10.1 The goal: The data matrix
- 10.2 Importing machine-readable data
- 10.2.1 Reading system files from other packages
- 10.2.2 Reading ASCII text files
- Reading data in spreadsheet format
- Reading data in free format
- Reading data in fixed format
- 10.3 Inputting data
- 10.3.1 Input data using the editor
- 10.3.2 The input command
- 10.4 Combining data
- 10.4.1 The GSOEP database
- 10.4.2 The merge command
- The merge procedure
- Keeping track of observations
- Merging more than two files
- Merging data on different levels
- 10.4.3 The append command
- 10.5 Saving and exporting data
- 10.6 Handling big datasets
- 10.6.1 Rules for handling the working memory
- 10.6.2 Using oversized datasets
- 10.7 Exercises
11 Do-files for advanced users and user-written programs
- 11.1 Two examples of usage
- 11.2 Four programming tools
- 11.2.1 Local macros
- Calculating with local macros
- Combining local macros
- Changing local macros
- 11.2.2 Do-files
- 11.2.3 Programs
- The problem of redefinition
- The problem of naming
- The problem of error checking
- 11.2.4 Programs in do-files and ado-files
- 11.3 User-written Stata commands
- 11.3.1 Parsing variable lists
- 11.3.2 Parsing options
- 11.3.3 Parsing if and in qualifiers
- 11.3.4 Generating an unknown number of variables
- 11.3.5 Default values
- 11.3.6 Extended macro functions
- 11.3.7 Avoiding changes in the dataset
- 11.3.8 Help files
- 11.4 Exercises
12 Around Stata
- 12.1 Resources and information
- 12.2 Taking care of Stata
- 12.3 Additional procedures
- 12.3.1 SJ and STB ado-files
- 12.3.2 SSC ado-files
- 12.3.3 Other ado-files
- 12.4 Exercises
References
Author Index
Subject Index