“There are two things that you should never see being made, sausage and demographics”
— Jim Stone, Geovue, 1998.
Today we tour the sausage factory, at least as it relates to our recent foray into weekly small area unemployment estimates.
Typically, our models are run twice a year and rely on sources that at least have normal and predictable properties. In the case of our labor force module, that means a hybrid of two survey sources: one that is geographically detailed but not particularly timely (the American Community Survey, ACS), and another that is timely but not particularly detailed (the Current Population Survey, CPS). Life is about trade-offs, after all.
To estimate a table, such as employment by occupation, we force the cells of a geographically detailed matrix to fit the totals from the more recent data with minimal disruption to the “structure” of the detailed data. To GIS-type people, this is basically a rubber-sheeting exercise in which you stitch together a set of maps that are each distorted in different ways.
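The post doesn't name the fitting procedure it uses. A standard technique that does what is described, rescaling a detailed table to new row and column totals while preserving its internal structure (its cross-product ratios), is iterative proportional fitting, also known as raking. Below is a minimal sketch in Python with NumPy, assuming a strictly positive seed table and row and column targets that share the same grand total; the function name `rake` and the example numbers are hypothetical.

```python
import numpy as np


def rake(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting (raking) of a 2-D seed table.

    Alternately rescales rows and columns so the table's margins match the
    target totals. Each rescaling preserves the seed's cross-product ratios,
    which is the "structure" being kept. Assumes every seed cell is > 0.
    """
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    if not np.isclose(row_targets.sum(), col_targets.sum()):
        raise ValueError("row and column targets must share the same grand total")

    for _ in range(max_iter):
        # Scale each row so it sums to its target.
        table *= (row_targets / table.sum(axis=1))[:, None]
        # Scale each column so it sums to its target.
        table *= (col_targets / table.sum(axis=0))[None, :]
        # Columns now match exactly; stop once the rows also match.
        if np.allclose(table.sum(axis=1), row_targets, rtol=0, atol=tol):
            break
    return table


# Hypothetical 2x2 seed (areas x occupations) fitted to newer totals.
seed = [[40, 10], [20, 30]]
fitted = rake(seed, row_targets=[60, 60], col_targets=[70, 50])
print(fitted.round(2))
```

The fitted table hits the new totals exactly (to tolerance), while the ratio of diagonal to off-diagonal products stays what it was in the seed, which is one concrete reading of "minimal disruption to the structure."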
For weekly unemployment estimates, we clearly needed data that is updated weekly. For example, the Department of Labor (DOL) release of September 3 included the following elements by state:
- Advance UI claims, week ending August 29, and finalized UI claims, week ending August 22
- Advance insured unemployment, week ending August 22, and finalized insured unemployment, week ending August 15
Unlike the sampled CPS, the UI claims should be a pretty simple deal. It is just counting, right? And tallying the total insured unemployed should be an equally simple task. The mathematics should be pretty easy: the total unemployed this week should equal the previous week's unemployed plus the new claimants, less those coming off the system. Sadly…
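In symbols (our own notation, not DOL's), write $U_t$ for the insured unemployed at the end of week $t$, $C_t$ for new claimants during the week, and $X_t$ for those coming off the system. The expectation above is a simple stock-flow identity, and because $X_t$ counts people, rearranging it yields a check that every week's figures ought to pass. This sketch ignores real complications such as claims and insured unemployment referring to different weeks, or revisions from advance to finalized figures.

```latex
\begin{aligned}
U_t &= U_{t-1} + C_t - X_t \\
X_t &= U_{t-1} + C_t - U_t \;\ge\; 0
\end{aligned}
```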
But first, to rekindle your recurrent nightmares about high school algebra, a word problem:
On Saturday, Johnny has 108 cookies. Amazon delivers 14 cookies during the week. On the next Saturday, he has 246 cookies. If we know that Johnny ate none of the newly delivered cookies but ate at least one cookie (witness the crumbs on his face), how many cookies did Johnny eat that week?
And for the bonus question, even though you did not get the first one right:
The following week, Johnny starts with his 246 cookies and Amazon delivers 640 more. Again, Johnny eats an undisclosed number, but none of the newly delivered ones. On Saturday, his cookie inventory records 256.
Both are equally impossible, unless you are recording unemployment statistics for the state of California. We found equally disturbing arithmetic sins in the numbers for several other states: New York, New Jersey, Georgia and Kentucky. There were undoubtedly other violations that did not result in the eating of negative cookies.
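The check behind "equally impossible" is just the stock-flow identity solved for the outflow. A minimal sketch in Python, using the numbers from the word problem; the function names are hypothetical. The upper bound (outflow cannot exceed the starting stock) mirrors the puzzle's condition that Johnny ate none of the new cookies. Applied to real claims data, where new claimants can presumably leave the rolls in the same week, only the non-negativity test would be safe to enforce.

```python
def implied_outflow(start, inflow, end):
    """Solve end = start + inflow - outflow for the outflow."""
    return start + inflow - end


def check_week(start, inflow, end, min_outflow=0):
    """Return (outflow, problems) for one week of a stock that can only be
    drawn down from its starting level (new arrivals are left untouched)."""
    outflow = implied_outflow(start, inflow, end)
    problems = []
    if outflow < min_outflow:
        problems.append(f"outflow {outflow} is below the minimum {min_outflow}")
    if outflow > start:
        problems.append(f"outflow {outflow} exceeds the starting stock {start}")
    return outflow, problems


# Week 1: Johnny ate at least one cookie (the crumbs), so min_outflow=1.
week1 = check_week(start=108, inflow=14, end=246, min_outflow=1)
# Week 2: an undisclosed number, none of them newly delivered.
week2 = check_week(start=246, inflow=640, end=256)
print(week1)  # (-124, ['outflow -124 is below the minimum 1'])
print(week2)  # (630, ['outflow 630 exceeds the starting stock 246'])
```

Week one implies Johnny ate 124 negative cookies; week two implies he ate 630 cookies from a pre-existing stash of only 246.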
So what is a modeler to do? There are really only three options:
- Make up data that doesn’t violate the known rules of mathematics. Usually, it will take the form of something that fits your theory perfectly, so it will be of no real benefit to anybody.
- Discard the otherwise valuable weekly data, use the less viable monthly data and, well, invent the data for the intervening weeks. Again, it will fit your theories quite beautifully but will be of no particular benefit.
- Use the stable monthly dataset to smooth out the wrinkles in the weekly material, removing as many of the offending mathematical impossibilities as possible.
We chose door number three on this one, using a set of time-lagged multipliers to force the weekly data closer to the monthly data and atone for the worst of the mathematical sins. Having our estimate lie between the CPS and UI numbers seemed like a desirable property and a good goal, and it is one we have achieved.
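The post doesn't spell out how its multipliers are built, so what follows is only a sketch of one way time-lagged benchmarking could work, under assumptions of our own: each month's multiplier is the monthly benchmark divided by the average weekly figure for that month; a week may only use the multiplier from `lag` months earlier (monthly figures arrive later than weekly ones); and only a fraction `weight` of the adjustment is applied. With `weight` between 0 and 1, each adjusted week lands between the raw weekly figure and its fully benchmarked counterpart, an analogue of lying between the UI and CPS numbers. All names and numbers here are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def benchmark_ratios(weekly, monthly):
    """Ratio of each month's benchmark to the average weekly level that month.

    weekly:  list of (month, value) pairs, one per week (hypothetical layout).
    monthly: dict mapping month -> benchmark level (e.g. a CPS-style figure).
    """
    by_month = defaultdict(list)
    for month, value in weekly:
        by_month[month].append(value)
    return {m: monthly[m] / mean(vals) for m, vals in by_month.items() if m in monthly}


def adjust_weekly(weekly, monthly, lag=1, weight=0.5):
    """Scale each week partway toward the benchmark, using a lagged ratio."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    ratios = benchmark_ratios(weekly, monthly)
    adjusted = []
    for month, value in weekly:
        # No benchmark from `lag` months back yet: leave the week unadjusted.
        ratio = ratios.get(month - lag, 1.0)
        # Multiplier lies between 1 and ratio, so the result lies between
        # the raw weekly value and the fully benchmarked value.
        multiplier = 1.0 + weight * (ratio - 1.0)
        adjusted.append(value * multiplier)
    return adjusted


# Four weeks in month 1, two in month 2; only month 1 has a benchmark so far.
weekly = [(1, 100), (1, 110), (1, 90), (1, 100), (2, 120), (2, 130)]
print(adjust_weekly(weekly, {1: 150}))
```

Month 1's weeks average 100 against a benchmark of 150, so month 2's weeks are moved halfway toward a 1.5x scaling (a multiplier of 1.25), while month 1's weeks, lacking an earlier benchmark, pass through unchanged.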
Of course, the question was immediately asked: why don’t your numbers for Georgia match the monthly CPS number? At this point, most of you will run screaming from the sausage factory, never to return. We don’t blame you, but we hope that you will continue to enjoy sausage, especially when served with chili on a nice fresh bun.