Randomization code

Stata function for download: rndm.ado

Notes: This is a preliminary version of the function. Graphing is not yet implemented. Time-series operators cause errors, so lags, leads and differences must be stored as new variables before they are used. To use the file, place it in your ado sub-directory. If you use this code, please email me to let me know, and send any suggestions. Also, please cite Hsiang and Jina (2014).
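The two practical points in these notes (installing the file and avoiding time-series operators) can be sketched as follows. This is only an illustration under stated assumptions: it assumes rndm.ado has been downloaded to the current working directory and that the PERSONAL ado directory already exists, and the variable names tenure_L1 and tenure_D1 are hypothetical.

```stata
* List Stata's system directories; PERSONAL is one place Stata searches for ado-files
sysdir

* Copy the downloaded file into PERSONAL
* (assumes rndm.ado is in the working directory and the PERSONAL directory exists)
copy rndm.ado "`c(sysdir_personal)'rndm.ado", replace

* Check that Stata can now find the command
which rndm

* Store lags and differences as ordinary variables instead of using L. or D.
* inside the regression command (tenure_L1 and tenure_D1 are hypothetical names)
webuse nlswork, clear
xtset idcode year
gen tenure_L1 = L.tenure
gen tenure_D1 = D.tenure
```

The regression command passed to the function would then refer to tenure_L1 rather than L.tenure.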

This function performs a series of randomization tests for exploring model-generated bias in panel data models. It randomizes a treatment variable in three different ways, re-estimates the same model after each randomization, and stores the resulting coefficients. The coefficients can then be plotted to see how the randomized distribution compares to the original point estimates. Randomization is performed in three ways (shown graphically in the figure below):

1. Entire sample – Randomly re-assign the value of the “treatment” variable across the whole sample.

2. Between panel variable – Randomly re-assign each panel unit’s complete time series to another panel unit while preserving the time-ordering. This preserves the time structure within the data, thereby testing whether temporal trends might generate spurious correlations.

3. Within panel variable – Randomly re-order each panel unit’s time-series while keeping it assigned to the original panel unit. This alters only the time structure of the data, thereby testing whether time-invariant cross-sectional patterns might generate spurious correlations.
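The three schemes above can be illustrated on a toy balanced panel. This is a minimal sketch and not the code inside rndm.ado: every variable name (id, t, x, x_all, x_between, x_within) is hypothetical, and the index arithmetic assumes a balanced panel sorted by unit and period. The toy treatment x = 100*id + t encodes the unit and period it came from, so the effect of each shuffle is easy to read.

```stata
clear
set seed 12345
local T = 5
set obs 20

* Toy balanced panel: 4 units observed for `T' periods each
gen id = ceil(_n/`T')
gen t  = _n - `T'*(id - 1)
gen x  = 100*id + t
sort id t

* 1. Entire sample: draw a random permutation of row positions
gen double u1 = runiform()
sort u1
gen long draw1 = _n
sort id t
gen x_all = x[draw1]

* 2. Between panel variable: each unit receives another unit's whole series
bysort id (t): gen double u2 = runiform() if _n == 1
by id: replace u2 = u2[1]
egen long donor = group(u2)
sort id t
gen x_between = x[(donor - 1)*`T' + t]
* Time ordering is preserved: the period encoded in x still matches t
assert mod(x_between, 100) == t

* 3. Within panel variable: re-order periods inside each unit
gen double u3 = runiform()
bysort id (u3): gen long draw3 = _n
sort id t
gen x_within = x[(id - 1)*`T' + draw3]
* Unit assignment is preserved: the unit encoded in x still matches id
assert floor(x_within/100) == id

list id t x x_all x_between x_within, sepby(id)
```

In the listing, x_between keeps the last digit equal to t, while x_within keeps the leading digit equal to id. These are the properties the second and third schemes are meant to preserve.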

[Figure: cartoon of the three randomization schemes]

To run the function, create the variables listed below and pass them as arguments in this order:

. rndm data pvar tvar regcmd outlab rndmvar strvar iter

This will output three datasets in a new folder (called “rndm_workdata”, in the working directory) which store coefficients using the prefix specified in variable “outlab”.

“data” is the location of the dataset that will be used (file path)
“pvar” is the panel variable in the data
“tvar” is the time variable in the data
“regcmd” is the regression model that will be randomized
“outlab” is the name that prefixes the output files
“rndmvar” is the variable that will be randomized
“strvar” is the variable whose coefficient will be stored and plotted (may differ from rndmvar)
“iter” is the number of randomization iterations

Example output (from Hsiang and Jina, 2013) shows a randomization of the effect of lagged tropical cyclone intensity on the level of GDP per capita after 15 years. The three randomization schemes are shown, with vertical lines marking the coefficient values from the original model. Exact p-values are calculated. The randomizations produce distributions of coefficients that are normal and centered at zero, allowing us to reject the presence of model-generated bias.
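The comparison described above can be sketched as follows. This is an illustration under stated assumptions: the coefficients are simulated stand-ins, not output from rndm, the names b_rand and b_orig are hypothetical, and the p-value uses one common convention (the share of randomized coefficients at least as large in absolute value as the original estimate), which may differ from the calculation used for the figure.

```stata
clear
set seed 2014
set obs 100

* Stand-in for the stored coefficients from 100 randomizations (made-up values)
gen double b_rand = rnormal(0, 0.01)

* Stand-in for the coefficient from the original, non-randomized model (made up)
scalar b_orig = 0.035

* Two-sided randomization p-value: share of draws at least as extreme as b_orig
count if abs(b_rand) >= abs(scalar(b_orig))
scalar p_exact = r(N) / _N
display "Two-sided randomization p-value: " scalar(p_exact)

* Histogram of randomized coefficients with the original estimate as a vertical line
histogram b_rand, xline(`=scalar(b_orig)')
```

With real output, b_rand would be replaced by the coefficient variable in one of the datasets stored in rndm_workdata.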

[Figure: example output showing the coefficient distributions under the three randomization schemes]

EXAMPLE USING A STATA WEBUSE DATASET

*Load NLS dataset
webuse nlswork, clear
save nlswork.dta, replace

*Generate argument inputs
gen data = "nlswork.dta"
gen regcmd = "xtreg ln_w grade age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure 2.race not_smsa south, fe"
gen rndmvar = "tenure"
gen pvar = "idcode"
gen tvar = "year"
gen outlab = "nlswork"
gen strvar = "tenure"
gen iter = "100"

*Run randomization code. Output stored in rndm_workdata folder
rndm data pvar tvar regcmd outlab rndmvar strvar iter