Weekly data: calendar weeks vs. Stata weeks

In my dataset, I have information on a number of workers for each week. The raw data I receive (not in Stata format) contains the year and the week (1, 2, …, 52). Here, week 1 is defined as the first week of a year that has at least 4 days in January. For example, week 1/2009 already starts on December 29, 2008. As a result, some years have 53 calendar weeks.
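To make the mismatch concrete, here is a minimal Python sketch (not Stata code, and not the approach from the full post). It assumes that the calendar weeks in the raw data follow the ISO 8601 rule, which matches the "at least 4 days in January" definition and the December 29, 2008 example. It also assumes the Stata-style rule that week 1 always starts on January 1 and week 52 absorbs the leftover days. The function names are made up for illustration.

```python
from datetime import date


def iso_week_start(year: int, week: int) -> date:
    """Monday of the given calendar week (assumes ISO 8601 weeks)."""
    return date.fromisocalendar(year, week, 1)


def weeks_in_iso_year(year: int) -> int:
    """Number of ISO calendar weeks in a year (52 or 53)."""
    # December 28 always falls in the last ISO week of its year.
    return date(year, 12, 28).isocalendar()[1]


def stata_style_week(d: date) -> int:
    """Week number under the assumed Stata-style rule:
    week 1 starts on January 1, week 52 takes the remaining days."""
    day_of_year = d.timetuple().tm_yday
    return min((day_of_year - 1) // 7 + 1, 52)


print(iso_week_start(2009, 1))             # 2008-12-29
print(weeks_in_iso_year(2009))             # 53
print(stata_style_week(date(2008, 12, 29)))  # 52
```

Under these assumptions, December 29, 2008 belongs to calendar week 1 of 2009 but to week 52 of 2008 in the Stata-style count, so the two week variables cannot simply be matched by (year, week).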

Cleaning up messy (string) variables

Working on firm-level data (again), I face the task of cleaning up hundreds of different spellings of occupations. Eventually, these should be grouped into a set of categories that differ only when the occupations are actually different. Let me call the variable occupation.

34. slesar po remontu la
38. slesar po rem. la
44. slesar po rem. i obsluzh. vent. i kondicionirovaniya
54. slesar po rem. i obsluzh. ven. i kondicionirovaniya
146. slesar po rem. la
205. slesar po remontu agregatov
259. slesar po rem.agregatov
313. slesar po remontu kompressornyh ustanovok i oborudovaniya
343. slesar po remontu oborud

A wonderful mess, and actually only a small part of the full dataset.
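One possible way to start on such a list is sketched below in Python. This is an illustration only, not the method from the full post: the abbreviation map is hypothetical (it assumes "rem." stands for "remontu" and that "ven." and "vent." mean the same thing), and a real map would have to be built by inspecting the data.

```python
import re

# Hypothetical abbreviation map, assumed for illustration only.
ABBREVIATIONS = {"rem.": "remontu", "ven.": "vent."}


def normalize(occupation: str) -> str:
    """Return a cleaned spelling of one occupation string."""
    text = occupation.strip().lower()
    # Insert a space after a period glued to the next word ("rem.agregatov").
    text = re.sub(r"\.(?=\S)", ". ", text)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Replace known abbreviations word by word.
    words = [ABBREVIATIONS.get(word, word) for word in text.split(" ")]
    return " ".join(words)


raw = [
    "slesar po remontu la",
    "slesar po rem. la",
    "slesar po rem. i obsluzh. vent. i kondicionirovaniya",
    "slesar po rem. i obsluzh. ven. i kondicionirovaniya",
    "slesar po rem. la",
    "slesar po remontu agregatov",
    "slesar po rem.agregatov",
    "slesar po remontu kompressornyh ustanovok i oborudovaniya",
    "slesar po remontu oborud",
]

# Group the raw spellings by their normalized form.
groups = {}
for spelling in raw:
    groups.setdefault(normalize(spelling), []).append(spelling)

for cleaned, spellings in groups.items():
    print(cleaned, "<-", len(spellings), "raw spelling(s)")
```

With this map, the nine spellings listed above collapse into five groups. Truncated entries such as "oborud" are left alone on purpose, since guessing what they abbreviate would need a look at the full data.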