When you write a lot of do-files during a research project, it is hard to keep track of what each do-file was for, what it needs as input, and what it generates as output. This is especially true when your paper comes back from the (journal) referees with comments on what you should change, and you want to re-run some part of the analysis a year after you first did it: by then it is hard to remember exactly what you need to do.
I use a preamble in my do-files to document this information (somewhat), but also to set a couple of standard pointers that make my work easier …
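A minimal sketch of what such a preamble could look like. The header fields, the version number, the global macro names and the log-file name are all assumptions made for illustration, not the layout actually used in these do-files.

```stata
* ---------------------------------------------------------------
* Purpose : (what this do-file is for)
* Input   : (data files it reads)
* Output  : (data files, logs and tables it writes)
* ---------------------------------------------------------------
version 12                      // assumed; use the version you work with
clear all
set more off
capture log close               // close a log left open by an earlier run

* Hypothetical pointers, defined once so that paths are not repeated.
* The current working directory stands in for a real project folder.
global project "`c(pwd)'"
global data    "$project/data"

log using "$project/example_preamble.log", replace text
display "data are expected in: $data"
log close
```

Defining the folders once at the top means that moving the project to another machine only requires changing a single line.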
Log-files are important in the workflow for two reasons:
- Most importantly, they keep track of any messages that are “non-fatal”, i.e. that do not stop the progress of the do-file. Quite often you will want to ignore those messages; only when you suspect that an error has occurred do you search through your log-files.
- It is also convenient to use the log-file to collect only that part of the output (results) that you will actually need for your research project.
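One possible way to keep only the interesting output in a log-file is to suspend and resume the log with log off and log on. The sketch below uses the auto data set that ships with STATA; the log-file name is hypothetical.

```stata
sysuse auto, clear
capture log close
log using "results_example.log", replace text   // hypothetical file name

log off                         // suspend the log: this output is not recorded
summarize price mpg weight
log on                          // resume the log for the results worth keeping

regress price mpg weight
log close
```

Prefixing commands with quietly is an alternative, but that also hides the output on screen, whereas log off only keeps it out of the log-file.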
In some instances STATA's default behaviour of stopping whenever an error occurs is a (minor) annoyance. In one of my projects I was running the same set of regressions over several groups in a loop. However, whenever STATA reached a group for which the regressions could not be run, it would stop with the error “no observations”. Similar things can happen when you select (sub)groups to run commands like summarize, tabulate etc.
One solution is to “capture” the command, so that any error that is returned does not stop the do-file. My favourite example for capture is the following statement that can be found in almost all of my do-files…:
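The statement itself is not shown here. A very common use of capture at the top of a do-file is capture log close, so the sketch below assumes that one, and then applies capture to a loop over groups. The variables and groups come from the auto data set and are chosen only for illustration.

```stata
* Assumed example: close any log left open by an earlier run.
* Without capture this line stops the do-file when no log is open.
capture log close

sysuse auto, clear

* rep78 takes the values 1 to 5, so group 6 has no observations
forvalues g = 1/6 {
    * noisily keeps the output on screen; capture keeps the loop running
    capture noisily regress price mpg if rep78 == `g'
    if _rc != 0 {
        display as text "group `g' skipped, return code " _rc
    }
}
```

After a captured command the return code is stored in _rc, so the do-file can still decide what to do when something went wrong instead of silently ignoring every error.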
Quite often a do-file is written to run on various data-files that all seem to be the same. Say you have run it on one year of a data-set and want to repeat the same for the subsequent years for which you have data. A check whether the data-file actually contains observations (given your selection) is often a good idea; here is how I did it in a recent project:
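A minimal sketch of such a check, not necessarily the code from that project. It assumes the auto data set and a hypothetical selection (rep78 == 6) that matches no observations.

```stata
sysuse auto, clear

* Hypothetical selection: no car in auto.dta has rep78 equal to 6
count if rep78 == 6

if r(N) == 0 {
    display as text "no observations for this selection, skipping"
}
else {
    summarize price if rep78 == 6
}
```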
I am using the saved result r(N) that STATA automatically generates after count. This is a neat feature that you should consider for many other commands (try return list after your favourite command).
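As a small illustration, again assuming the auto data set, this is how the saved results of summarize can be listed and then reused:

```stata
sysuse auto, clear
summarize price

* Show everything summarize left behind in r()
return list

* Reuse a saved result in a later command
display "mean price: " r(mean)
generate price_centered = price - r(mean)
```

Saved results are overwritten by the next command that stores results in r(), so copy them into a local macro or scalar if you need them further down in the do-file.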