Data storage type matters

Although most sources say that the storage type in Stata should not matter, it is worth checking whether this holds for your dataset. I just came across a situation where two identically constructed datasets (one stored in the default type, float, and one stored as double) generated different output. Before that, I had also run into a problem with person identifiers in the GSOEP when using the default storage type. If your dataset is not huge (with the GSOEP it still works quite well), it may be worth playing it safe and using

 set type double 

before you assemble your dataset. New variables are then stored as double, the most precise numeric type Stata offers.
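
The identifier problem is easy to reproduce. The following is a minimal sketch with a made-up nine-digit identifier (not an actual GSOEP value): a float holds only about seven significant digits, so large integers get rounded, while a double keeps them exact.

```stata
* Minimal sketch: a made-up 9-digit identifier stored as float vs. double
clear
set obs 1
generate float  id_float  = 123456789
generate double id_double = 123456789
format id_float id_double %12.0f
list id_float id_double        // id_float shows 123456792, id_double 123456789

* The two variables no longer agree, so matching on id_float could go wrong
assert id_float != id_double
```

If file size is a concern, running compress afterwards lets Stata shrink each variable to the smallest type that holds its values without loss.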

Preamble when switching between OS X and Windows

A problem when working on one and the same project on different platforms (here: Windows and Mac/OS X) is that path names differ. There are two straightforward solutions to this:

1) When defining a number of different paths (e.g. one path where data is stored, one where results/output are stored), it is handy to define the paths as globals and to add an “if” condition. The platform can be detected with the system value `c(os)': Continue reading
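
A minimal sketch of this first approach could look as follows. The folder and global names are placeholders, not the author's actual set-up; c(os) returns "Windows", "MacOSX", or "Unix".

```stata
* Hypothetical project folders; adjust to your own machines
if "`c(os)'" == "Windows" {
    global datapath   "C:/projects/myproject/data"
    global outputpath "C:/projects/myproject/output"
}
else if "`c(os)'" == "MacOSX" {
    global datapath   "/Users/yourname/projects/myproject/data"
    global outputpath "/Users/yourname/projects/myproject/output"
}
else {
    display as error "No paths configured for platform `c(os)'"
    exit 198
}

* The rest of the do-file refers only to the globals, for example:
* use "$datapath/mydata.dta", clear
display "Data folder:   $datapath"
display "Output folder: $outputpath"
```

Forward slashes are used on both platforms because Stata accepts them on Windows as well, which avoids trouble with backslashes next to macro names.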

Why I don’t need SPSS any more …

Most of you know that I don’t like to have SPSS on my computer, let alone use it. Nevertheless, over time I have had several versions of SPSS installed, because SPSS allowed me to open an SPSS file and then save it in … Stata format. Now there is no reason to do that anymore, and that is not due to StatTransfer!
Continue reading

Transform string variable to categorical integer variable

When cleaning datasets, one often has string variables containing categories (e.g. country names). A simple way of transforming such a variable into a numeric variable containing the same information is encode. encode assigns the integer values 1, 2, … to newvar, while the original values (e.g. country names) are kept as value labels. Continue reading
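
As a small illustration on toy data (the variable names country and country_id are made up for this sketch), encode numbers the categories in alphabetical order and stores the strings as value labels:

```stata
* Toy data with a string variable holding categories
clear
input str10 country
"Germany"
"France"
"Germany"
"Spain"
end

* Create an integer variable; the strings become its value labels
encode country, generate(country_id)

list country country_id, nolabel   // France = 1, Germany = 2, Spain = 3
label list country_id              // shows the mapping from integers to names
```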

Divide and conquer

Working scientifically with statistics software implies that the analysis should be run from batch files, in Stata terms from do-files. This is important so that results can be reproduced and, if errors are found, the analysis can be run anew. I have been using a set-up in which I divide the empirical research into several steps that allow me to reproduce my work, save time, and ensure that I can easily back up my research without too much hassle.
Continue reading
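
One common way to organise such steps, shown here only as a sketch and not necessarily the author's own set-up, is a master do-file that calls one do-file per step. All file names below are hypothetical.

```stata
* master.do: runs the whole analysis from raw data to results
* (all file names below are hypothetical)
clear all
set more off

do "01_import.do"      // read raw data, save an unmodified Stata copy
do "02_clean.do"       // recode, label, and merge
do "03_analysis.do"    // estimate models
do "04_output.do"      // export tables and graphs
```

Each step saves its own intermediate dataset, so a later step can be rerun without repeating the earlier ones.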

Putting a spell on data

For those of you working with spell-data (could be panel, but I am thinking more of event-history data), there is a great tool that you should be aware of. You can get it at SSC by typing the following command:


ssc install tsspell

What can it do for you? Well, as I said, it puts a spell on your data. By giving the command
Continue reading
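
The excerpt breaks off before the author's command, so what follows is only a generic illustration on made-up data, assuming tsspell has been installed from SSC. The data must be tsset first; with the cond() option, tsspell marks runs of consecutive observations for which the condition holds.

```stata
* Toy panel: employment status by person and period (made-up data)
clear
input id time employed
1 1 1
1 2 1
1 3 0
1 4 1
2 1 0
2 2 1
2 3 1
end

tsset id time

* Identify spells of employment; by default this creates
* _spell (spell number), _seq (position within spell), _end (last obs of spell)
tsspell, cond(employed == 1)

list, sepby(id)
```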

StatTransfer … who needs it?

Once more I am working on some standardized data to generate some standard output and graphs in * yuck * Excel. Well, we have to learn to live with the fact that some people and organizations don’t want Stata graphs and tables. So we have to export to Excel. In my case these were several tables and cross tables that were then linked to Excel graphs. So far I have done this by copy-pasting or using StatTransfer; today I found a different way …
Continue reading