Stata – Data Management

Useful string functions in Stata (updated list)

When I search the internet for help with Stata, it is most often because I need to work with string variables (such as names). There are some very good summaries that cover aspects of string variables (e.g., this page). In this post, which will be continuously updated, we present an assortment of string functions that we think are extremely useful for Stata users.

Remove blanks from string variables in Stata

Identify and remove blank space in Stata’s string variables

When dealing with string variables in Stata, blank spaces can make it difficult to identify values. For example, if a variable contains " Arizona", a command with an if qualifier such as ... if state == "Arizona" won’t detect this observation.
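A minimal sketch of how such blanks might be found and removed, assuming Stata 14 or newer (older versions call the same functions trim() and itrim()). The toy data and the variable name has_blank are made up for illustration:

```stata
clear
input str12 state
" Arizona"
"Arizona "
"New  York"
end

* flag observations that carry leading or trailing blanks
generate byte has_blank = state != strtrim(state)

* strip leading and trailing blanks
replace state = strtrim(state)

* collapse repeated internal blanks into a single blank
replace state = stritrim(state)

* both Arizona observations are now matched by the if qualifier
count if state == "Arizona"
```

Keeping the flag variable around makes it easy to inspect which observations were affected before the cleaned values are used any further.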

Create duplicate observations in Stata

For certain cleaning jobs, it can be useful to duplicate an observation (often only temporarily). To create an identical copy of an observation, just type
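One way to do this, assuming the built-in expand command is what is meant here (the toy data and the variable name is_copy are made up for illustration):

```stata
clear
input id str10 name
1 "Anna"
2 "Ben"
end

* expand 2 keeps each selected observation and appends one identical copy;
* generate() marks the added copies with 1 and the originals with 0
expand 2 if id == 2, generate(is_copy)

sort id is_copy
list

* when the copy is no longer needed, it can be removed again with:
* drop if is_copy == 1
```

Marking the copies with generate() is a design choice that makes the "only temporarily" part easy, since the duplicates can be dropped again without touching the originals.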

Data storage type matters

Although most sources say that the storage type in Stata should not matter, it is worth checking whether this is the case for your dataset. I just came across a situation where two identically constructed datasets (one stored in the default type, float, and one stored in double) generated different output. Before that, I had also run into a problem with person identifiers in the GSOEP when using the default storage type. If your dataset is not huge (with the GSOEP it still works quite well), it may be worth playing it safe and using

 set type double

before you assemble your dataset. New variables are then stored in the most precise numeric type Stata offers.
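The identifier problem can be reproduced with a small sketch. A float holds integers exactly only up to 16,777,216, so a longer ID gets rounded. The nine-digit ID below is an invented value, not an actual GSOEP identifier:

```stata
clear
set obs 1

* the same nine-digit identifier stored in two different types
generate float  id_float  = 123456789
generate double id_double = 123456789

format id_float id_double %12.0f
list

* the float copy has been rounded, the double copy is exact
assert id_float != id_double
assert id_double == 123456789
```

For purely integer identifiers of this size, the long type would also be exact and takes less memory than double.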

Preamble when switching between OS X and Windows

A problem when working on the same project across different platforms (here: Windows and Mac/OS X) is that path names differ. There are two straightforward solutions to this:

1) When defining a number of different paths (e.g. one path where data is stored, one where results/output are stored), it is handy to define the paths as globals and to add an “if” condition. The platform can be detected via the c-class value `c(os)'.
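A minimal sketch of such a preamble follows. All paths and the global names data and output are hypothetical placeholders; c(os) returns "Windows", "MacOSX", or "Unix":

```stata
* set project paths depending on the operating system
if "`c(os)'" == "Windows" {
    global data   "C:/projects/myproject/data"
    global output "C:/projects/myproject/output"
}
else if "`c(os)'" == "MacOSX" {
    global data   "/Users/me/projects/myproject/data"
    global output "/Users/me/projects/myproject/output"
}
else {
    * fallback for Unix/Linux
    global data   "/home/me/projects/myproject/data"
    global output "/home/me/projects/myproject/output"
}

display "Data path:   $data"
display "Output path: $output"
```

Later commands can then refer to the globals, for example use "$data/myfile.dta", and the same do-file runs unchanged on either machine. Forward slashes also work in Windows paths inside Stata, which avoids problems with backslashes next to macro names.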

Transform string variable to categorical integer variable

When cleaning datasets, one often has string variables containing categories (e.g. country names). A simple way of transforming such a variable into a numeric variable containing the same information is encode. Encode assigns the numerical values 1, 2, … to newvar in alphabetical order of the strings, while the original values (e.g. country names) are kept as value labels.
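A short sketch of encode on invented data (the variable names country and country_id are chosen for illustration):

```stata
clear
input str10 country
"Germany"
"France"
"Germany"
"Italy"
end

* create an integer variable with the country names attached as value labels
encode country, generate(country_id)

* nolabel shows the underlying integer codes instead of the labels
list country country_id, nolabel

* show the mapping from codes to names
label list country_id
```

Because the codes follow alphabetical order, France is coded 1, Germany 2, and Italy 3 in this example.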

Putting a spell on data

For those of you working with spell data (it could be panel data, but I am thinking more of event-history data), there is a great tool that you should be aware of. You can get it from SSC by typing the following command:

ssc install tsspell

What can it do for you? Well, as I said it puts a spell on your data. By giving the command
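As a rough sketch of the kind of call meant here, assuming tsspell is installed from SSC, the data have been tsset, and the command's default output variables _spell, _seq and _end are used. The panel below is invented for illustration:

```stata
clear
input id year employed
1 2000 1
1 2001 1
1 2002 0
1 2003 1
2 2000 0
2 2001 0
end

* tsspell requires time-series or panel data declared with tsset
tsset id year

* a new spell starts whenever employed changes within a panel
tsspell employed

* _spell numbers the spells, _seq counts within a spell, _end marks its last observation
list, sepby(id)
```

With the spell identifiers in place, spell lengths or the number of spells per person can be derived with ordinary by-group commands.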