Check whether variable exists in if-conditions

In some applications, e.g. when you want to save coefficient estimates from a regression with many dummies (such as fixed effects), you might want to store coefficients as estimates. In this example, we are interested in storing the estimates of the GROUPVAR dummies, but not those of the OTHERVAR dummies. While this is usually straightforward by writing

Continue reading
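
For reference, one common way to test whether a variable exists before using it in an if-condition is capture confirm variable followed by a check of the return code _rc. The sketch below is only an illustration under that assumption and is not necessarily the approach taken in the full post; the auto dataset and the name not_a_variable are placeholders.

```stata
* Illustrative sketch: branch on whether a variable exists
sysuse auto, clear

foreach v in price gear_ratio not_a_variable {
    * confirm exits with a nonzero return code if the variable is missing;
    * capture suppresses the error and stores the code in _rc
    capture confirm variable `v'
    if _rc == 0 {
        display "`v' exists"
    }
    else {
        display "`v' not found"
    }
}
```

Note that confirm variable accepts abbreviated names by default, so a short stub can match an existing variable.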

Standardize variables by group

Unfortunately, the otherwise great Stata command egen does not allow standardizing variables by group, e.g. for each year separately. A small workaround is to calculate the mean and SD first and then create the standardized variable manually (and then you really wonder why this is not implemented in Stata).

* Standardize VAR by year (bysort sorts the data by year first)
bysort year: egen VAR_mean = mean(VAR)
bysort year: egen VAR_sd = sd(VAR)
gen VAR_std = (VAR - VAR_mean) / VAR_sd

Or, if you need to do it for several variables at once:

* Standardize VAR1 VAR2 VAR3 by year
foreach var of varlist VAR1 VAR2 VAR3 {
    bysort year: egen `var'_mean = mean(`var')
    bysort year: egen `var'_sd = sd(`var')
    gen `var'_std = (`var' - `var'_mean) / `var'_sd
}
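
To check that the workaround does what it should, you can look at the group-wise mean and SD of the new variable. The sketch below uses Stata's bundled auto dataset, with foreign standing in for year and price standing in for VAR; both are stand-ins chosen only for illustration.

```stata
* Illustrative check on the bundled auto dataset
sysuse auto, clear

* Group-wise mean and SD, then the standardized variable
bysort foreign: egen price_mean = mean(price)
bysort foreign: egen price_sd = sd(price)
gen price_std = (price - price_mean) / price_sd

* Within each group the mean should be close to 0 and the SD close to 1
tabstat price_std, by(foreign) statistics(mean sd)
```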

Manipulating variable labels

Occasionally we want to use variable labels in loops, or we want to apply them to other variables. A very simple extended macro function for that is local localname: variable label varname.
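
As a minimal sketch of that syntax, the following reads the label of one variable into a local and reuses it for a new variable. It runs on the bundled auto dataset; the names lbl and price_k are made up for this illustration.

```stata
* Illustrative sketch: read a variable label into a local and reuse it
sysuse auto, clear

* Store the label of price in the local macro lbl
local lbl : variable label price
display "The label of price is: `lbl'"

* Create a new variable and give it a label built from the old one
gen price_k = price / 1000
label variable price_k "`lbl' (in thousands)"
describe price price_k
```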

As an example, suppose that I have three variables (x, y, z) with their respective labels. I want to create three other variables with their mean and I want to label those variables using the labels from x, y and z. I can type:

global varlist1 x y z

Continue reading

Recording the time Stata needs to run a file

Running do-files with a large number of simulations or iterations often takes a lot of time. Either just for fun or in order to make your do-file more efficient, you may want to record the time Stata needs in order to run the whole file or just a part of it. The solution is –timer–. Just specify in the beginning

Continue reading
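
A minimal sketch of how timer can wrap a block of commands is shown below. It is only one possible setup and not necessarily the one described in the full post; the timer number 1 and the regression loop are arbitrary placeholders for the part of the do-file you want to time.

```stata
* Illustrative sketch: time a block of commands with timer number 1
timer clear 1
timer on 1

* Placeholder workload standing in for the code to be timed
sysuse auto, clear
forvalues i = 1/100 {
    quietly regress price mpg weight
}

timer off 1

* Show the elapsed time; timer list also stores it in r(t1)
timer list 1
display "Elapsed seconds: " r(t1)
```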

Repetitive tasks … let Stata do the work

Especially in the process of data preparation, but also when running whole sets of analyses, we start repeating commands and sets of commands for similar variables. For example, in one of my projects I had to process salary information that was stored monthly in wide format, and for several reasons I could not use reshape:

I could have typed:
gen str10 v201_1=""
replace v201_1=c2 if c1=="201"
gen str10 v201_2=""
replace v201_2=c3 if c1=="201"
gen str10 v201_12=""
replace v201_12=c13 if c1=="201"
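
The same pattern can be written once inside a loop. The sketch below is one way to do it with forvalues, assuming the twelve monthly values sit in c2 to c13 as the typed-out lines suggest; the toy data at the top is invented only so that the example runs on its own.

```stata
* Toy data (invented for illustration): c1 holds a code, c2-c13 hold monthly values
clear
set obs 3
gen str3 c1 = "201"
replace c1 = "305" in 3
forvalues j = 2/13 {
    gen str10 c`j' = "value`j'"
}

* One pass per month m: v201_m takes its value from column c(m+1)
forvalues m = 1/12 {
    local c = `m' + 1
    gen str10 v201_`m' = ""
    replace v201_`m' = c`c' if c1 == "201"
}

list c1 v201_1 v201_2 v201_12
```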

Continue reading

Using Stata to randomise vignettes in NetQuestionnaire

One huge disadvantage of NetQuestionnaire (NetQ) is that randomisation is only possible for the order of sub-items of questions. There is, however, a way to use Stata to make anything random (e.g. the order of questions or the content of questions). Below, I describe a way to randomly generate sets of vignettes. At the end of the do-file, the text is saved to an Excel file. Merged with an address list, this information can be imported into NetQ and used in questions via the NetQ variables.

Continue reading
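
To give a rough idea of the general approach, the sketch below draws random vignette dimensions per respondent, assembles the vignette text and writes it to an Excel file. Everything specific in it is an assumption made for illustration (the number of respondents, the two dimensions, the wording and the file name vignettes.xlsx), and it does not reproduce the do-file from the full post.

```stata
* Illustrative sketch: random vignette text per respondent, saved to Excel
clear
set seed 20240101            // fixed seed so the draw can be reproduced
set obs 50                   // hypothetical number of respondents
gen long id = _n

* Draw each vignette dimension at random (levels 1-2 and 1-3)
gen byte dim_gender = floor(2 * runiform()) + 1
gen byte dim_income = floor(3 * runiform()) + 1

* Translate the drawn levels into text fragments
gen str6 gender_txt = cond(dim_gender == 1, "man", "woman")
gen str8 income_txt = cond(dim_income == 1, "low", cond(dim_income == 2, "medium", "high"))

* Assemble the vignette shown to each respondent
gen str100 vignette1 = "A " + gender_txt + " with a " + income_txt + " income applies for the job."

* Save id and text; the file can then be merged with an address list
export excel id vignette1 using "vignettes.xlsx", firstrow(variables) replace
```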