List of figures
List of examples
Preface
A word about fonts, files, commands, and examples
1 Introduction
- 1.1 Replication: The guiding principle for workflow
- 1.2 Steps in the workflow
- 1.3 Tasks within each step
- 1.4 Criteria for choosing a workflow
- 1.4.1 Accuracy
- 1.4.2 Efficiency
- 1.4.3 Simplicity
- 1.4.4 Standardization
- 1.4.5 Automation
- 1.4.6 Usability
- 1.4.7 Scalability
- 1.5 Changing your workflow
- 1.6 How the book is organized
2 Planning, organizing, and documenting
- 2.1 The cycle of data analysis
- 2.2 Planning
- 2.3 Organization
- 2.3.1 Principles for organization
- 2.3.2 Organizing files and directories
- 2.3.3 Creating your directory structure
- A directory structure for a small project
- A directory structure for a large, one-person project
- Directories for collaborative projects
- Special-purpose directories
- Remembering what directories contain
- Planning your directory structure
- Naming files
- Batch files
- 2.3.4 Moving into a new directory structure (advanced topic)
- 2.4 Documentation
- 2.4.1 What should you document?
- 2.4.2 Levels of documentation
- 2.4.3 Suggestions for writing documentation
- 2.4.4 The research log
- 2.4.5 Codebooks
- 2.4.6 Dataset documentation
- 2.5 Conclusions
3 Writing and debugging do-files
- 3.1 Three ways to execute commands
- 3.2 Writing effective do-files
- 3.2.1 Making do-files robust
- Make do-files self-contained
- Use version control
- Exclude directory information
- Include seeds for random numbers
- 3.2.2 Making do-files legible
- Use lots of comments
- Use alignment and indentation
- Use short lines
- Limit your abbreviations
- Be consistent
- 3.2.3 Templates for do-files
- Commands that belong in every do-file
- A template for simple do-files
- A more complex do-file template
- 3.3 Debugging do-files
- 3.3.1 Simple errors and how to fix them
- Log file is open
- Log file already exists
- Incorrect command name
- Incorrect variable name
- Incorrect option
- Missing comma before options
- 3.3.2 Steps for resolving errors
- Step 1: Update Stata and user-written programs
- Step 2: Start with a clean slate
- Step 3: Try other data
- Step 4: Assume everything could be wrong
- Step 5: Run the program in steps
- Step 6: Exclude parts of the do-file
- Step 7: Starting over
- Step 8: Sometimes it is not your mistake
- 3.3.3 Example 1: Debugging a subtle syntax error
- 3.3.4 Example 2: Debugging unanticipated results
- 3.3.5 Advanced methods for debugging
- 3.4 How to get help
- 3.5 Conclusions
4 Automating your work
- 4.1 Macros
- 4.1.1 Local and global macros
- 4.1.2 Specifying groups of variables and nested models
- 4.1.3 Setting options with locals
- 4.2 Information returned by Stata commands
- 4.3 Loops: foreach and forvalues
- The foreach command
- The forvalues command
- 4.3.1 Ways to use loops
- Loop example 1: Listing variable and value labels
- Loop example 2: Creating interaction variables
- Loop example 3: Fitting models with alternative measures of education
- Loop example 4: Recoding multiple variables the same way
- Loop example 5: Creating a macro that holds accumulated information
- Loop example 6: Retrieving information returned by Stata
- 4.3.2 Counters in loops
- 4.3.3 Nested loops
- 4.3.4 Debugging loops
- 4.4 The include command
- 4.4.1 Specifying the analysis sample with an include file
- 4.4.2 Recoding data using include files
- 4.4.3 Caution when using include files
- 4.5 Ado-files
- 4.5.1 A simple program to change directories
- 4.5.2 Loading and deleting ado-files
- 4.5.3 Listing variable names and labels
- 4.5.4 A general program to change your working directory
- 4.5.5 Words of caution
- 4.6 Help files
- 4.7 Conclusions
5 Names, notes, and labels
- 5.1 Posting files
- 5.2 The dual workflow of data management and statistical analysis
- 5.3 Names, notes, and labels
- 5.4 Naming do-files
- 5.4.1 Naming do-files to re-create datasets
- 5.4.2 Naming do-files to reproduce statistical analysis
- 5.4.3 Using master do-files
- 5.4.4 A template for naming do-files
- 5.5 Naming and internally documenting datasets
- Never name it final!
- 5.5.1 One time only and temporary datasets
- 5.5.2 Datasets for larger projects
- 5.5.3 Labels and notes for datasets
- 5.5.4 The datasignature command
- 5.6 Naming variables
- 5.6.1 The fundamental principle for creating and naming variables
- 5.6.2 Systems for naming variables
- 5.6.3 Planning names
- 5.6.4 Principles for selecting names
- Anticipate looking for variables
- Use simple, unambiguous names
- Try names before you decide
- 5.7 Labeling variables
- 5.7.1 Listing variable labels and other information
- 5.7.2 Syntax for label variable
- 5.7.3 Principles for variable labels
- 5.7.4 Temporarily changing variable labels
- 5.7.5 Creating variable labels that include the variable name
- 5.8 Adding notes to variables
- 5.8.1 Commands for working with notes
- 5.8.2 Using macros and loops with notes
- 5.9 Value labels
- 5.9.1 Creating value labels is a two-step process
- 5.9.2 Principles for constructing value labels
- 1) Keep labels short
- 2) Include the category number
- 3) Avoid special characters
- 4) Keeping track of where labels are used
- 5.9.3 Cleaning value labels
- 5.9.4 Consistent value labels for missing values
- 5.9.5 Using loops when assigning value labels
- 5.10 Using multiple languages
- 5.10.1 Using label language for different written languages
- 5.10.2 Using label language for short and long labels
- 5.11 A workflow for names and labels
- Step 1: Plan the changes
- Step 2: Archive, clone, and rename
- Step 3: Revise variable labels
- Step 4: Revise value labels
- Step 5: Verify the changes
- 5.11.1 Step 1: Check the source data
- 5.11.2 Step 2: Create clones and rename variables
- 5.11.3 Step 3: Revise variable labels
- 5.11.4 Step 4: Revise value labels
- Step 4a: List the current labels
- Step 4b: Create label define commands to edit
- Step 4c: Revise labels and add them to dataset
- 5.11.5 Step 5: Check the new names and labels
- 5.12 Conclusions
6 Cleaning your data
- 6.1 Importing data
- 6.1.1 Data formats
- 6.1.2 Ways to import data
- Stata commands to import data
- Using other statistical packages to export data
- Using a data conversion program
- 6.1.3 Verifying data conversion
- 6.2 Verifying variables
- 6.2.1 Values review
- 6.2.2 Substantive review
- What does time to degree measure?
- Examining high-frequency values
- Links among variables
- Changes in survey questions
- 6.2.3 Missing-data review
- Comparisons and missing values
- Creating indicators of whether cases are missing
- Using extended missing values
- Verifying and expanding missing-data codes
- Using include files
- 6.2.4 Internal consistency review
- 6.2.5 Principles for fixing data inconsistencies
- 6.3 Creating variables for analysis
- 6.3.1 Principles for creating new variables
- New variables get new names
- Verify that new variables are correct
- Document new variables
- Keep the source variables
- 6.3.2 Core commands for creating variables
- 6.3.3 Creating variables with missing values
- 6.3.4 Additional commands for creating variables
- The recode command
- The egen command
- The tabulate, generate() command
- 6.3.5 Labeling variables created by Stata
- 6.3.6 Verifying that variables are correct
- 6.4 Saving datasets
- 6.4.1 Selecting observations
- 6.4.2 Dropping variables
- 6.4.3 Ordering variables
- 6.4.4 Internal documentation
- 6.4.5 Compressing variables
- 6.4.6 Running diagnostics
- The codebook, problem command
- Checking for unique ID variables
- 6.4.7 Adding a data signature
- 6.4.8 Saving the file
- 6.4.9 After a file is saved
- 6.5 Extended example of preparing data for analysis
- Creating control variables
- Creating binary indicators of positive attitudes
- Creating four-category scales of positive attitudes
- 6.6 Merging files
- 6.6.1 Match-merging
- 6.6.2 One-to-one merging
- 6.6.3 Forgetting to match-merge
- 6.7 Conclusions
7 Analyzing data and presenting results
- 7.1 Planning and organizing statistical analysis
- 7.2 Organizing do-files
- 7.3 Documentation for statistical analysis
- 7.4 Analyzing data using automation
- 7.4.1 Locals to define sets of variables
- 7.4.2 Loops for repeated analyses
- 7.4.3 Matrices to collect and print results
- Collecting results of t tests
- Saving results from nested regressions
- Saving results from different transformations of articles
- 7.4.4 Creating a graph from a matrix
- 7.4.5 Include files to load data and select your sample
- 7.5 Baseline statistics
- 7.6 Replication
- 7.6.1 Lost or forgotten files
- 7.6.2 Software and version control
- 7.6.3 Unknown seed for random numbers
- 7.6.4 Using a global that is not in your do-file
- 7.7 Presenting results
- 7.7.1 Creating tables
- 7.7.2 Creating graphs
- Colors, black, and white
- Font size
- 7.7.3 Tips for papers and presentations
- 7.8 A project checklist
- 7.9 Conclusions
8 Protecting your files
- 8.1 Levels of protection and types of files
- 8.2 Causes of data loss and issues in recovering a file
- 8.3 Murphy’s law and rules for copying files
- 8.4 A workflow for file protection
- 8.5 Archival preservation
- 8.6 Conclusions
9 Conclusions
A How Stata works
- A.1 How Stata works
- A.2 Working on a network
- A.3 Customizing Stata
- A.3.1 Fonts and window locations
- A.3.2 Commands to change preferences
- A.3.3 profile.do
- A.4 Additional resources