Ritme Informatique
specialist in scientific software

The Workflow of Data Analysis Using Stata

J. Scott Long

Table of contents

List of tables
List of figures
List of examples
Preface (pdf)
A word about fonts, files, commands, and examples

1 Introduction

  • 1.1 Replication: The guiding principle for workflow
  • 1.2 Steps in the workflow
    • 1.2.1 Cleaning data
    • 1.2.2 Running analysis
    • 1.2.3 Presenting results
    • 1.2.4 Protecting files
  • 1.3 Tasks within each step
    • 1.3.1 Planning
    • 1.3.2 Organization
    • 1.3.3 Documentation
    • 1.3.4 Execution
  • 1.4 Criteria for choosing a workflow
    • 1.4.1 Accuracy
    • 1.4.2 Efficiency
    • 1.4.3 Simplicity
    • 1.4.4 Standardization
    • 1.4.5 Automation
    • 1.4.6 Usability
    • 1.4.7 Scalability
  • 1.5 Changing your workflow
  • 1.6 How the book is organized

2 Planning, organizing, and documenting

  • 2.1 The cycle of data analysis
  • 2.2 Planning
  • 2.3 Organization
    • 2.3.1 Principles for organization
    • 2.3.2 Organizing files and directories
    • 2.3.3 Creating your directory structure
      • A directory structure for a small project
      • A directory structure for a large, one-person project
      • Directories for collaborative projects
      • Special-purpose directories
      • Remembering what directories contain
      • Planning your directory structure
      • Naming files
      • Batch files
    • 2.3.4 Moving into a new directory structure (advanced topic)
      • Example of moving into a new directory structure
  • 2.4 Documentation
    • 2.4.1 What should you document?
    • 2.4.2 Levels of documentation
    • 2.4.3 Suggestions for writing documentation
      • Evaluating your documentation
    • 2.4.4 The research log
      • A sample page from a research log
      • A template for research logs
    • 2.4.5 Codebooks
      • A codebook based on the survey instrument
    • 2.4.6 Dataset documentation
  • 2.5 Conclusions

3 Writing and debugging do-files

  • 3.1 Three ways to execute commands
    • 3.1.1 The Command window
    • 3.1.2 Dialog boxes
    • 3.1.3 Do-files
  • 3.2 Writing effective do-files
    • 3.2.1 Making do-files robust
      • Make do-files self-contained
      • Use version control
      • Exclude directory information
      • Include seeds for random numbers
    • 3.2.2 Making do-files legible
      • Use lots of comments
      • Use alignment and indentation
      • Use short lines
      • Limit your abbreviations
      • Be consistent
    • 3.2.3 Templates for do-files
      • Commands that belong in every do-file
      • A template for simple do-files
      • A more complex do-file template
  • 3.3 Debugging do-files
    • 3.3.1 Simple errors and how to fix them
      • Log file is open
      • Log file already exists
      • Incorrect command name
      • Incorrect variable name
      • Incorrect option
      • Missing comma before options
    • 3.3.2 Steps for resolving errors
      • Step 1: Update Stata and user-written programs
      • Step 2: Start with a clean slate
      • Step 3: Try other data
      • Step 4: Assume everything could be wrong
      • Step 5: Run the program in steps
      • Step 6: Exclude parts of the do-file
      • Step 7: Starting over
      • Step 8: Sometimes it is not your mistake
    • 3.3.3 Example 1: Debugging a subtle syntax error
    • 3.3.4 Example 2: Debugging unanticipated results
    • 3.3.5 Advanced methods for debugging
  • 3.4 How to get help
  • 3.5 Conclusions

4 Automating your work

  • 4.1 Macros
    • 4.1.1 Local and global macros
      • Local macros
      • Global macros
      • Using double quotes when defining macros
      • Creating long strings
    • 4.1.2 Specifying groups of variables and nested models
    • 4.1.3 Setting options with locals
  • 4.2 Information returned by Stata commands
    • Using returned results with local macros
  • 4.3 Loops: foreach and forvalues
    • The foreach command
    • The forvalues command
    • 4.3.1 Ways to use loops
      • Loop example 1: Listing variable and value labels
      • Loop example 2: Creating interaction variables
      • Loop example 3: Fitting models with alternative measures of education
      • Loop example 4: Recoding multiple variables the same way
      • Loop example 5: Creating a macro that holds accumulated information
      • Loop example 6: Retrieving information returned by Stata
    • 4.3.2 Counters in loops
      • Using loops to save results to a matrix
    • 4.3.3 Nested loops
    • 4.3.4 Debugging loops
  • 4.4 The include command
    • 4.4.1 Specifying the analysis sample with an include file
    • 4.4.2 Recoding data using include files
    • 4.4.3 Caution when using include files
  • 4.5 Ado-files
    • 4.5.1 A simple program to change directories
    • 4.5.2 Loading and deleting ado-files
    • 4.5.3 Listing variable names and labels
    • 4.5.4 A general program to change your working directory
    • 4.5.5 Words of caution
  • 4.6 Help files
    • 4.6.1 nmlabel.hlp
    • 4.6.2 help me
  • 4.7 Conclusions

5 Names, notes, and labels

  • 5.1 Posting files
  • 5.2 The dual workflow of data management and statistical analysis
  • 5.3 Names, notes, and labels
  • 5.4 Naming do-files
    • 5.4.1 Naming do-files to re-create datasets
    • 5.4.2 Naming do-files to reproduce statistical analysis
    • 5.4.3 Using master do-files
      • Master log files
    • 5.4.4 A template for naming do-files
      • Using subdirectories for complex analysis
  • 5.5 Naming and internally documenting datasets
    • Never name it final!
    • 5.5.1 One time only and temporary datasets
    • 5.5.2 Datasets for larger projects
    • 5.5.3 Labels and notes for datasets
    • 5.5.4 The datasignature command
      • A workflow using the datasignature command
      • Changes datasignature does not detect
  • 5.6 Naming variables
    • 5.6.1 The fundamental principle for creating and naming variables
    • 5.6.2 Systems for naming variables
      • Sequential naming systems
      • Source naming systems
      • Mnemonic naming systems
    • 5.6.3 Planning names
    • 5.6.4 Principles for selecting names
      • Anticipate looking for variables
      • Use simple, unambiguous names
      • Try names before you decide
  • 5.7 Labeling variables
    • 5.7.1 Listing variable labels and other information
      • Changing the order of variables in your dataset
    • 5.7.2 Syntax for label variable
    • 5.7.3 Principles for variable labels
      • Beware of truncation
      • Test labels before you post the file
    • 5.7.4 Temporarily changing variable labels
    • 5.7.5 Creating variable labels that include the variable name
  • 5.8 Adding notes to variables
    • 5.8.1 Commands for working with notes
      • Listing notes
      • Removing notes
      • Searching notes
    • 5.8.2 Using macros and loops with notes
  • 5.9 Value labels
    • 5.9.1 Creating value labels is a two-step process
      • Step 1: Defining labels
      • Step 2: Assigning labels
      • Why a two-step system?
      • Removing labels
    • 5.9.2 Principles for constructing value labels
      • 1) Keep labels short
      • 2) Include the category number
      • 3) Avoid special characters
      • 4) Keeping track of where labels are used
    • 5.9.3 Cleaning value labels
    • 5.9.4 Consistent value labels for missing values
    • 5.9.5 Using loops when assigning value labels
  • 5.10 Using multiple languages
    • 5.10.1 Using label language for different written languages
    • 5.10.2 Using label language for short and long labels
  • 5.11 A workflow for names and labels
    • Step 1: Plan the changes
    • Step 2: Archive, clone, and rename
    • Step 3: Revise variable labels
    • Step 4: Revise value labels
    • Step 5: Verify the changes
    • 5.11.1 Step 1: Check the source data
      • Step 1a: List the current names and labels
      • Step 1b: Try the current names and labels
    • 5.11.2 Step 2: Create clones and rename variables
      • Step 2a: Create clones
      • Step 2b: Create rename commands
      • Step 2c: Rename variables
    • 5.11.3 Step 3: Revise variable labels
      • Step 3a: Create variable-label commands
      • Step 3b: Revise variable labels
    • 5.11.4 Step 4: Revise value labels
      • Step 4a: List the current labels
      • Step 4b: Create label define commands to edit
      • Step 4c: Revise labels and add them to dataset
    • 5.11.5 Step 5: Check the new names and labels
  • 5.12 Conclusions

6 Cleaning your data

  • 6.1 Importing data
    • 6.1.1 Data formats
      • ASCII data formats
      • Binary-data formats
    • 6.1.2 Ways to import data
      • Stata commands to import data
      • Using other statistical packages to export data
      • Using a data conversion program
    • 6.1.3 Verifying data conversion
      • Converting the ISSP 2002 data from Russia
  • 6.2 Verifying variables
    • 6.2.1 Values review
      • Values review of data about the scientific career
      • Values review of data on family values
    • 6.2.2 Substantive review
      • What does time to degree measure?
      • Examining high-frequency values
      • Links among variables
      • Changes in survey questions
    • 6.2.3 Missing-data review
      • Comparisons and missing values
      • Creating indicators of whether cases are missing
      • Using extended missing values
      • Verifying and expanding missing-data codes
      • Using include files
    • 6.2.4 Internal consistency review
      • Consistency in data on the scientific career
    • 6.2.5 Principles for fixing data inconsistencies
  • 6.3 Creating variables for analysis
    • 6.3.1 Principles for creating new variables
      • New variables get new names
      • Verify that new variables are correct
      • Document new variables
      • Keep the source variables
    • 6.3.2 Core commands for creating variables
      • The generate command
      • The clonevar command
      • The replace command
    • 6.3.3 Creating variables with missing values
    • 6.3.4 Additional commands for creating variables
      • The recode command
      • The egen command
      • The tabulate, generate() command
    • 6.3.5 Labeling variables created by Stata
    • 6.3.6 Verifying that variables are correct
      • Checking the code
      • Listing variables
      • Plotting continuous variables
      • Tabulating variables
      • Constructing variables multiple ways
  • 6.4 Saving datasets
    • 6.4.1 Selecting observations
      • Deleting cases versus creating selection variables
    • 6.4.2 Dropping variables
      • Selecting variables for the ISSP 2002 Russian data
    • 6.4.3 Ordering variables
    • 6.4.4 Internal documentation
    • 6.4.5 Compressing variables
    • 6.4.6 Running diagnostics
      • The codebook, problem command
      • Checking for unique ID variables
    • 6.4.7 Adding a data signature
    • 6.4.8 Saving the file
    • 6.4.9 After a file is saved
  • 6.5 Extended example of preparing data for analysis
    • Creating control variables
    • Creating binary indicators of positive attitudes
    • Creating four-category scales of positive attitudes
  • 6.6 Merging files
    • 6.6.1 Match-merging
      • Sorting the ID variable
    • 6.6.2 One-to-one merging
      • Combining unrelated datasets
    • 6.6.3 Forgetting to match-merge
  • 6.7 Conclusions

7 Analyzing data and presenting results

  • 7.1 Planning and organizing statistical analysis
    • 7.1.1 Planning in the large
    • 7.1.2 Planning in the middle
    • 7.1.3 Planning in the small
  • 7.2 Organizing do-files
    • 7.2.1 Using master do-files
    • 7.2.2 What belongs in your do-file?
  • 7.3 Documentation for statistical analysis
    • 7.3.1 The research log and comments in do-files
    • 7.3.2 Documenting the provenance of results
      • Captions on graphs
  • 7.4 Analyzing data using automation
    • 7.4.1 Locals to define sets of variables
    • 7.4.2 Loops for repeated analyses
      • Computing t tests using loops
      • Loops for alternative model specifications
    • 7.4.3 Matrices to collect and print results
      • Collecting results of t tests
      • Saving results from nested regressions
      • Saving results from different transformations of articles
    • 7.4.4 Creating a graph from a matrix
    • 7.4.5 Include files to load data and select your sample
  • 7.5 Baseline statistics
  • 7.6 Replication
    • 7.6.1 Lost or forgotten files
    • 7.6.2 Software and version control
    • 7.6.3 Unknown seed for random numbers
      • Bootstrap standard errors
      • Letting Stata set the seed
      • Training and confirmation samples
    • 7.6.4 Using a global that is not in your do-file
  • 7.7 Presenting results
    • 7.7.1 Creating tables
      • Using spreadsheets
      • Regression tables with esttab
    • 7.7.2 Creating graphs
      • Colors, black, and white
      • Font size
    • 7.7.3 Tips for papers and presentations
      • Papers
      • Presentations
  • 7.8 A project checklist
  • 7.9 Conclusions

8 Protecting your files

  • 8.1 Levels of protection and types of files
  • 8.2 Causes of data loss and issues in recovering a file
  • 8.3 Murphy’s law and rules for copying files
  • 8.4 A workflow for file protection
    • Part 1: Mirroring active storage
    • Part 2: Offline backups
  • 8.5 Archival preservation
  • 8.6 Conclusions

9 Conclusions

A How Stata works

  • A.1 How Stata works
    • Stata directories
    • The working directory
  • A.2 Working on a network
  • A.3 Customizing Stata
    • A.3.1 Fonts and window locations
    • A.3.2 Commands to change preferences
      • Options that can be set permanently
      • Options that need to be set each session
    • A.3.3 profile.do
      • Function keys
  • A.4 Additional resources

References
Author index (pdf)
Subject index (pdf)