* based on mus03p1reg.do Stata version 11 --- chapter 3, slightly modified for this class
cap log close // "capture log close" closes the log if one is open and does nothing if none is open
********** OVERVIEW OF mus03p1reg.do **********
* Stata program
* copyright C 2010 by A. Colin Cameron and Pravin K. Trivedi
* used for "Microeconometrics Using Stata, Revised Edition"
* by A. Colin Cameron and Pravin K. Trivedi (2010)
* Stata Press
* Chapter 3
* 3.2: DATA: SUMMARY STATISTICS
* 3.4: BASIC REGRESSION ANALYSIS
* 3.5: SPECIFICATION ANALYSIS
* 3.6: PREDICTION
* 3.7: SAMPLING WEIGHTS
* 3.8: OLS USING MATA
* To run you need files
* mus03data.dta
* in your directory
* Stata user-written commands esttab and estadd are used
********** SETUP **********
clear all
set scheme s1mono
/* Stata provides a number of schemes that define the overall look of graphs.
s1mono is the monochrome version of s1color: s1color connects points with solid lines
of different colors, whereas s1mono varies the line pattern instead. */
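* A minimal sketch, not part of the original program: check that the user-written
* packages used below are available and install them from SSC if they are not.
* Assumes an internet connection. The estout package supplies esttab, estadd, and
* estpost; shellout, used in the commented-out export steps, is also user-written
* and may need to be installed separately.
foreach pkg in estout outreg2 {
    capture which `pkg'
    if _rc != 0 {
        ssc install `pkg'
    }
}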
********** DATA DESCRIPTION **********
* File mus03data is an extract from MEPS, the Medical Expenditure Panel Survey
************ 3.2: DATA SUMMARY STATISTICS
* Variable description for medical expenditure dataset
clear
cd c:/dba2020
use mus03data.dta
/*
* Preliminary look at the data
describe totexp ltotexp posexp suppins phylim actlim totchr age female income
* Numbers can be stored in one of five variable types: byte, int, long, float (the default),
* or double. A byte takes 1 byte of storage, an int 2 bytes, a long or float 4 bytes,
* and a double 8 bytes.
*/
*More about the data
* Summary statistics for medical expenditure dataset; mean std min max.
summarize totexp ltotexp posexp suppins phylim actlim totchr age female income
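* A short check one might add here (a sketch, not from the original program).
* ltotexp is the log of totexp, so it is presumably missing whenever totexp is
* not positive; the two counts let you confirm that before running regressions.
* misstable summarize then reports missing values for any of the variables.
count if totexp <= 0
count if missing(ltotexp)
misstable summarize totexp ltotexp posexp suppins phylim actlim totchr age female income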
/*
outreg2 using myfile.doc, sum(log) replace keep( totexp ltotexp posexp suppins phylim actlim totchr age female income)
* writes the univariate results directly to a Word file
* Summary statistics for medical expenditure dataset with more detail, such as skewness and kurtosis
summarize totexp ltotexp posexp suppins phylim actlim totchr age female income, detail
set matsize 800, permanently // required here: set matsize sets the maximum number of variables that can be included in any of Stata's estimation commands
outreg2 using myfile_detail.doc, sum(detail) replace keep( totexp ltotexp posexp suppins phylim actlim totchr age female income) eqkeep (N mean sknew kurt)
* writes a full univariate report to a Word file
* Tabulate variable: one-way table of frequencies, used here to check the frequency of zero or negative incomes
tabulate income if income <= 0
* lists incomes at or below 0 (income is measured in thousands)
summarize income, detail
tabulate income if income >= 100
* lists incomes of 100 or more, that is, at least 100*1000
* Detailed summary statistics of a single variable
summarize totexp, detail
* Two-way table of frequencies
table female totchr
* Two-way table with row and column percentages and Pearson chi-squared
tabulate female suppins, row col chi2
* compares men and women by insurance status (suppins):
* row percentages give the insured and uninsured shares within each sex;
* column percentages give the shares of men and women within each insurance group
* Three-way table of frequencies
table female totchr suppins
* reports counts of women and men, with and without insurance, by number of chronic conditions (totchr)
* One-way table of summary statistics
table female, contents(N totchr mean totchr sd totchr p50 totchr)
* Two-way table of summary statistics
table female suppins, contents(N totchr mean totchr)
* reports, for each sex and insurance status, the number of observations and the mean of totchr
* Summary statistics obtained using command tabstat
tabstat totexp ltotexp, stat (count mean p50 sd skew kurt) col(stat)
* tabstat puts the requested statistics in one compact table;
* summarize with the detail option reports the same statistics in a longer layout
* 3.2.6: Two-sample t test
ttest ltotexp, by (female) unequal
* tests the hypothesis that mean log expenditures are equal for men and women;
* the unequal option allows the two groups to have different variances.
* A test of this kind usually needs to be reported in research.
*3.2.7 density plots - distribution of data
* Kernel density plots with adjustment for highly skewed data
* refer to https://www.stata.com/manuals13/rkdensity.pdf
kdensity totexp if posexp==1, generate (kx1 kd1) n(500) /*this is total expenses density function*/
graph twoway (line kd1 kx1) if kx1 < 40000, name(levels)
kdensity ltotexp if posexp==1, generate (kx2 kd2) n(500) /*this is for log of total expenses*/
graph twoway (line kd2 kx2) if kx2 < ln(40000), name(logs)
graph combine levels logs, iscale(1.0)
graph export chapter3_fig1.eps, replace /*the file is stored in the folder*/
* Kernel density estimation (KDE) is a nonparametric way to estimate the probability density function of a random variable.
* It is similar to a histogram, but it produces a smooth curve.
***********************************************************************
*********** 3.4: BASIC REGRESSION ANALYSIS ****************************
***********************************************************************
*/
* Pairwise correlations for dependent variable and regressor variables
correlate ltotexp suppins phylim actlim totchr age female income
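* correlate drops any observation with a missing value in any listed variable
* (casewise deletion). A sketch of the pairwise alternative: pwcorr uses all
* observations available for each pair of variables; obs shows each pair's
* sample size and sig shows the p-value of each correlation.
pwcorr ltotexp suppins phylim actlim totchr age female income, obs sig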
/*
* Export the correlation matrix to an RTF file that Word can open
estpost correl ltotexp suppins phylim actlim totchr age female income , matrix
esttab . using "correlation_1.rtf", replace notype unstack compress noobs nogaps nostar ///
title({\b Table 1:} {\i Correlations Matrix}) ///
label varwidth(6) modelwidth(7)
shellout using `"correlation_1.rtf"'
* The file above stores the correlation matrix, which is convenient for reporting.
*/
* OLS regression with heteroskedasticity-robust standard errors
regress ltotexp suppins phylim actlim totchr age female income, vce(robust)
*vce(robust) is the "Huber/White/sandwich estimator"
* https://www.stata.com/manuals13/xtvce_options.pdf
* or refer to textbook equation 3.3 for the robust variance equation
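* A sketch comparing default and robust standard errors side by side; the stored
* names ols_default and ols_robust are illustrative, not from the original program.
* The coefficients are identical in both fits; only the standard errors differ.
* The robust fit is run last so that e() still holds it for the predict step below.
quietly regress ltotexp suppins phylim actlim totchr age female income
estimates store ols_default
quietly regress ltotexp suppins phylim actlim totchr age female income, vce(robust)
estimates store ols_robust
estimates table ols_default ols_robust, b(%9.4f) se(%9.4f) stats(N r2)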
/*
outreg2 using ols.doc, replace
shellout using `"ols.doc"'
* The two commands above export the regression results to a Word file and open it.
* Display stored results and list available postestimation commands
ereturn list
help regress postestimation
*/
* Obtain the predicted values of y (the linear prediction xb) using predict; adjust is an alternative
predict yhat, xb
tabstat yhat, statistics(mean) by(female)
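* adjust was superseded by margins in Stata 11. A sketch of the margins
* counterpart: the average linear prediction within each level of female, with
* standard errors. margins uses the estimation sample only, so its means can
* differ from the tabstat above if yhat was also predicted for observations
* outside that sample.
margins, over(female)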
/*
adjust, by (female)