Transform string variable to categorical integer variable

When cleaning datasets, one often has string variables containing categories (e.g. country names). A simple way of converting such a variable into a numeric variable that carries the same information is encode. encode assigns the integer values 1, 2, … to newvar, following the alphabetical order of the strings, while the original values (e.g. country names) are kept as value labels. The basic syntax is:

encode varname, generate(newvar)
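
As a minimal sketch of how this might look in practice, the snippet below builds a small toy dataset and encodes it. The variable names country and country_id and the data values are made up for illustration only.

```stata
* Hypothetical toy data: a string variable named country
clear
input str10 country
"Norway"
"Sweden"
"Denmark"
"Norway"
end

* Create a labelled integer variable from the string variable
encode country, generate(country_id)

* Show the labelled values, then the underlying integer codes
list country country_id
list country country_id, nolabel

* Inspect the value label that encode attached to country_id
label list country_id
```

Because the codes follow the sort order of the strings, Denmark should receive 1, Norway 2 and Sweden 3 in this toy example.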

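A related command that generates one dummy variable for each value of the original variable is tab varname, generate(newvar). Here newvar acts as a stub, so the dummies are named newvar1, newvar2, and so on. When there are many values, it is handy to put the quietly prefix in front of the command to suppress the tabulation output.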
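
A short sketch of the dummy-variable approach, again on made-up data. The variable name country and the stub c_ are assumptions chosen for the example.

```stata
* Hypothetical toy data: a string variable named country
clear
input str10 country
"Norway"
"Sweden"
"Denmark"
"Norway"
end

* Create one 0/1 dummy per category (c_1, c_2, c_3) without printing the table
quietly tabulate country, generate(c_)

* Check which dummy corresponds to which category
describe c_*
list country c_1 c_2 c_3
```

The describe output should show a variable label on each dummy indicating the category it stands for, which helps when the stub numbering alone is not informative.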
