Preamble in do-files

When you write many do-files during a research project, it is hard to keep track of what each do-file was for, what input it needs, and what output it generates. This is especially true when your paper comes back from the (journal) referees with comments on what you should change, and you want to re-run part of the analysis a year after you did it: it is hard to remember exactly what you need to do.

I use a preamble in my do-files to document some of this information, and also to set a few standard pointers that make my work easier …
Continue reading “Preamble in do-files”
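A minimal sketch of what such a preamble could look like. The header fields, the `version` number and the global paths (`$project`, `$data`, `$logs`) are my assumptions for illustration, not the exact preamble from the full post.

```stata
* ==============================================================
* Do-file : analysis_01.do   (hypothetical name)
* Purpose : what this do-file does and for which paper/section
* Input   : data files this do-file reads
* Output  : data files, logs and tables this do-file writes
* Date    : last change
* ==============================================================

version 14                       // assumption: set to the Stata version you use
clear all                        // start from an empty memory
set more off                     // do not pause output during long runs
capture log close                // close a log left open by an earlier run

* Standard pointers (hypothetical folder structure)
global project "~/myproject"
global data    "$project/data"
global logs    "$project/logs"
```

Because all paths hang off `$project`, moving the project to another computer only requires changing one line.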

Log-files

Log-files are important in the workflow for two reasons:

  1. Most importantly, they keep track of all “non-fatal” messages, i.e. messages that do not stop the do-file from running. Quite often you want to ignore these messages, but when you suspect that an error has occurred, you search through your log-files.
  2. It is also convenient to use a log-file to collect only the part of the output (results) that you will actually need for your research project.

Continue reading “Log-files”
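One way to serve both purposes at once is to keep two logs open in parallel: a full log that records every message, and a second log that is switched on only around the results you need. The sketch below assumes a Stata version that supports named logs (the `name()` option of `log using`); the file names are hypothetical and the auto data are only an example.

```stata
capture log close _all                                   // close any open logs
log using "full_log.smcl", replace name(full)            // everything, incl. warnings
log using "results.log", text replace name(results)      // results only
log off results                                          // pause the results log

* Data preparation: recorded in the full log only
sysuse auto, clear
generate price_k = price/1000
label variable price_k "Price (1,000 USD)"

* Results: recorded in both logs
log on results
regress price_k weight
log off results

log close _all
```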

Putting a spell on data

For those of you working with spell data (this could be panel data, but I am thinking more of event-history data), there is a great tool you should know about. You can get it from SSC by typing the following command:


ssc install tsspell

What can it do for you? Well, as I said, it puts a spell on your data. By giving the command
Continue reading “Putting a spell on data”
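To give an idea of the output, here is a small sketch on made-up data: two persons observed monthly, with an indicator for being unemployed. I am assuming the default behaviour as I understand it: with `cond()`, tsspell marks runs of observations in which the condition holds and creates `_spell` (spell number), `_seq` (position within the spell) and `_end` (last observation of a spell). Please check `help tsspell` for the exact options.

```stata
clear
input id month unemp
1 1 0
1 2 1
1 3 1
1 4 0
1 5 1
2 1 1
2 2 1
2 3 0
end

tsset id month                    // tsspell works on tsset (panel) data
tsspell, cond(unemp == 1)         // spells = runs of unemployment months
list id month unemp _spell _seq _end, sepby(id) noobs
```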

capture and nostop

In some instances, Stata's default behaviour of stopping whenever an error occurs is a (minor) annoyance. In one of my projects I was running the same set of regressions over several groups in loops. However, whenever Stata hit a group for which the regressions could not be run, it stopped with the error “no observations”. Similar things can happen when you select (sub)groups to run commands such as summarize or tabulate.

One solution is to “capture” the command, so that any error it returns does not stop the do-file. My favourite example of capture is the following statement, which can be found in almost all of my do-files…:

Continue reading “capture and nostop”
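A minimal sketch of the loop problem, using the auto data that ship with Stata (the grouping is my choice for illustration): some combinations of `rep78` and `foreign` are empty, so `regress` fails with error r(2000), “no observations”. `capture noisily` still shows the output but keeps the loop going, and any other error is passed on with `exit _rc`.

```stata
sysuse auto, clear
levelsof rep78, local(groups)              // 1 2 3 4 5

foreach g of local groups {
    capture noisily regress price weight if rep78 == `g' & foreign == 1
    if _rc == 2000 {
        * empty group: note it and move on
        display as text "rep78 = `g': no observations, skipped"
    }
    else if _rc {
        * any other error should still stop the do-file
        exit _rc
    }
}
```

Alternatively, the `nostop` option of `do` (e.g. `do myfile.do, nostop`, with a hypothetical file name) keeps the whole do-file running after errors; capturing individual commands is more targeted, because unexpected errors still stop the run.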

Running regressions with similar sets of variables

Quite often we run variations of a regression, including or excluding (sets of) variables. Copy-pasting the regression and deleting the variables to be excluded is one way, but since we are talking about sets of variables, why not let locals do the work for you:
Continue reading “Running regressions with similar sets of variables”
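A minimal sketch with the auto data; the grouping of variables into `size` and `engine` is only an illustration:

```stata
sysuse auto, clear

* Define the sets of variables once ...
local size   weight length
local engine displacement gear_ratio

* ... and combine them in the regressions
regress price `size'
regress price `engine'
regress price `size' `engine'
```

Note that locals only live within one run of a do-file: if you execute the regression lines on their own from the Do-file Editor, the locals are not defined and the regressions run without those variables.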

Repetitive tasks … let STATA do the work

Especially during data preparation, but also when running whole sets of analyses, we end up repeating commands and sets of commands for similar variables. For example, in one of my projects I had to process salary information that was monthly and in wide format, and for several reasons I could not use reshape:

I could have typed:
gen str10 v201_1=""
replace v201_1=c2 if c1=="201"
gen str10 v201_2=""
replace v201_2=c3 if c1=="201"
[...]
gen str10 v201_12=""
replace v201_12=c13 if c1=="201"

Continue reading “Repetitive tasks … let STATA do the work”
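A `forvalues` loop can do the same in a few lines. Below is a sketch on made-up data, assuming (as in the typed version above) that column c2 holds month 1, c3 month 2, and so on up to c13 for month 12; it is not necessarily the loop from the full post.

```stata
* Hypothetical example data: c1 holds a code, c2-c13 the 12 monthly values
clear
set obs 3
gen str3 c1 = cond(_n == 2, "202", "201")
forvalues j = 2/13 {
    gen str10 c`j' = string(1000 + `j')
}

* Month m is stored in column c(m+1)
forvalues m = 1/12 {
    local col = `m' + 1
    gen str10 v201_`m' = ""
    replace v201_`m' = c`col' if c1 == "201"
}
```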

IMR based on Logit FE

This post shows a way to calculate a logistic inverse Mills ratio (IMR).

The logistic IMR has some benefits when estimating a model (including a correction for selection) on panel data. Because of the incidental parameters problem, a probit FE model cannot be estimated consistently. Hence, many researchers use a probit RE model for the selection equation and then estimate the main FE model including the retrieved IMR. A problem with this approach is that its assumptions are usually not plausible: the two equations assume different things about the correlation between the regressors and the unobserved heterogeneity terms (the RE selection equation rules it out, while the FE equation of interest allows it). Continue reading “IMR based on Logit FE”
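For reference, the probit IMR and one common logit-based analogue are given below. The analogue is Lee's (1983) transformation, which first maps the logistic probability into a standard normal quantile; it is a swapped-in illustration and not necessarily the method of the full post. Here z denotes the linear index of the selection equation, and φ and Φ are the standard normal density and distribution function.

```latex
\[
  \lambda_{\text{probit}}(z) = \frac{\phi(z)}{\Phi(z)},
  \qquad
  \lambda_{\text{logit}}(z) = \frac{\phi\!\left(\Phi^{-1}\big(F(z)\big)\right)}{F(z)},
  \qquad
  F(z) = \frac{e^{z}}{1 + e^{z}}
\]
```

Setting F = Φ in the second expression gives back the first. In Stata, with the index stored in a variable `xb`, the logit version would be `normalden(invnormal(invlogit(xb)))/invlogit(xb)`; how z is obtained from a logit FE model is not shown here.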

Using Stata to randomise vignettes in NetQuestionnaire

One huge disadvantage of NetQuestionnaire (NetQ) is that randomisation is only possible for the order of the sub-items of a question. There is, however, a way to use Stata to randomise anything (e.g. the order of questions or their content). Below, I describe how to generate random sets of vignettes. At the end of the do-file, the text is saved to an Excel file. Merged with an address list, this information can be imported into NetQ and used in questions via NetQ variables. Continue reading “Using Stata to randomise vignettes in NetQuestionnaire”
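A minimal sketch of the randomisation step, under my own assumptions: 100 respondents, 4 vignettes each, and the result written to a hypothetical file `vignettes.xlsx` (`export excel` requires Stata 12 or later). Each row ends up with the vignette numbers in random order; in a real application you would attach the vignette text and the respondent identifiers from the address list before exporting.

```stata
clear
set seed 12345                 // fixed seed: the randomisation is reproducible
set obs 100                    // hypothetical: 100 respondents
gen resp_id = _n

expand 4                       // hypothetical: 4 vignettes per respondent
bysort resp_id: gen vignette = _n

* Draw a random number and use it to shuffle the vignettes per respondent
gen double u = runiform()
bysort resp_id (u): gen position = _n
drop u

* One row per respondent: vignette1 is shown first, vignette2 second, ...
reshape wide vignette, i(resp_id) j(position)
export excel using "vignettes.xlsx", firstrow(variables) replace
```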

Why doesn’t this do-file run through …?

Quite often a do-file is written to run on various data files that all seem to be the same. Say you have done the analysis for one year of a data set and want to repeat it for the subsequent years for which you have data. In that case it is often a good idea to check whether the data actually contain observations (given your selection). Here is how I did it in a recent project:


count
assert `r(N)'>1

I am using the stored result r(N), which Stata automatically returns after count. This is a neat feature that you should consider for many other commands (try return list after your favourite command).

Continue reading “Why doesn’t this do-file run through …?”
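A small sketch with the auto data: `return list` shows what a command has stored, and the same r(N) can be used to skip an empty selection instead of stopping with `assert`. The selection (foreign cars with a repair record below 3) is my example and happens to be empty in this data set.

```stata
sysuse auto, clear
quietly summarize price
return list                    // r(N), r(mean), r(sd), r(min), r(max), ...
display "Mean price: " %9.2f r(mean) " (N = " r(N) ")"

* Guard against an empty selection before running the analysis
quietly count if foreign == 1 & rep78 < 3
capture assert r(N) > 0
if _rc {
    display as error "No observations in the selection, skipping this step"
}
```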

Export single numbers to LaTeX or MS Word

In a text, you often refer to a number (e.g. the number of observations in the estimation sample). There is a simple way to automate the export of this number from a Stata do-file to a LaTeX document. Continue reading “Export single numbers to LaTeX or MS Word”
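A minimal sketch using Stata's `file` commands: it writes e(N) after a regression into a small text file, here hypothetically named `nobs.tex`, which a LaTeX document can include with `\input{nobs}` wherever the number should appear. The regression on the auto data is only an example.

```stata
sysuse auto, clear
regress price weight length

* Write the number of observations in the estimation sample to a file
tempname fh
file open `fh' using "nobs.tex", write text replace
file write `fh' "`e(N)'"
file close `fh'
```

Re-running the do-file after a change in the sample updates the number in the paper automatically the next time the LaTeX document is compiled.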