Education Is Expensive: Can You Really Afford Machine Learning for Site Selection?
Gary Menger, July 2019
The other night, I was watching a 60 Minutes piece on the growing role of Chinese software companies in the AI world. Specifically, the piece was on the growing use of surveillance cameras to identify individuals and their characteristics. This is what governments who are afraid of their people do so that they can intimidate them into submission, but that’s not the point here.
By way of example, the system correctly identified the interviewer, Scott Pelley, as male, got his age right, and knew that he was wearing a suit, but said the suit was black rather than a darker shade of gray. The fix was simple: show the system a million or so pictures of gray suits of various hues, and all would be well.
This got me to thinking about the current crop of companies out there touting “machine learning” and “artificial intelligence” for site location. The claims of these companies are brash to say the least. They claim that their systems will learn to identify not just good sites, but the best or optimal sites. Wonderful if it is true.
And yet it struck me that when my children were young, they learned their colors effortlessly. Show them two red objects of completely different shapes, and immediately they would set off around the room pointing out red objects and saying “red”. The fun stopped only temporarily when they hit an orange object, asking “what color is that?”. Inherently, they knew that their classification system did not have enough groups to handle this particular object, and they waited to be taught the next color.
Similarly, my dogs know the voices of many people and will get excited upon hearing a familiar voice on speakerphone. More than that, they can distinguish certain cars from others. They hear my son’s Dodge Hellcat – from a half mile away – and head for the front door to greet him. Moreover, they can tell that my neighbor’s nearly identical car is not his, even though I personally lack this magical power. Distinguishing between two engines that are nearly identical in sound is simply something they have managed to learn by trial and error.
The brain – human or otherwise – is adept at identifying the salient features that distinguish one object from another. The machine, on the other hand, was satisfied with misclassifying the object (the gray suit) and moving on. It needed intervention to be made aware of its error, even though both human and computer were effectively doing the same task: classifying an object on the basis of the measured characteristics it has been instructed to use.
What on earth has this to do with site location you ask? Well, a couple of points immediately come to mind.
First, the ability to correctly describe an object requires that the object has been introduced properly. Absent data, the machine cannot know what to do with an unfamiliar object: it must be told, indeed shown by example, how to classify the object correctly, and that it should, in some cases, create new classifications altogether. Otherwise, it will simply group the object with those it is already familiar with. If color is not something the machine has been “trained” to recognize, color will happily be ignored.
Translate this into the real world of site location. The identification of a “good” site is based on a range of complex components, which interact with each other in surprising ways. We can easily list off the characteristics of the site itself – for example – accessibility, visibility, parking capabilities, nearby complementary and competitive facilities – and the trade area of the site – demographics, traffic volumes, competing opportunities, and so on. In some cases, such as traffic volumes, both too little and too much can be a bad thing. Demographics as well are complex and multidimensional in determining the effective demand at a location. This is several orders of magnitude beyond identifying subtle hues of colors.
The “throw all the data you have and see what sticks” philosophy is all the rage. Gather up as much data as you can muster, and let the system sort out what criteria are effective discriminators to build your classification system. All well and good, if your data covers the range of possible influencers on performance. But what if you have no data that adequately measures the visibility of signage? Happily, this will not be a problem for the learning algorithm. Ignorance is bliss in this world. Problem number one, therefore: for a learning system to be effective, its designer must thoroughly understand what characteristics might have an impact on site performance and must have collected accurate and meaningful data describing those characteristics. I know of very few people who can sort this stuff all out, although I do know a few who instinctively can look at a site and forecast its sales (they tend to be the grizzled old-timers in this business, not the slick twenty-something applied mathematics doctorate holders). Letting the machine learn from whatever data you throw at it is dangerous – it will confidently predict performance based solely on what it now knows.
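To see why an unmeasured characteristic simply vanishes from the machine’s view, consider a minimal sketch. Everything here is invented for illustration – the feature names, the coefficients, and the two sites – but the point holds for any model: two sites that differ only in something you never measured are, to the algorithm, the same site.

```python
# Two hypothetical sites, described only by the features we happened to collect.
# Neither record says anything about signage visibility, so no model fit on
# these features can either.
site_a = {"traffic": 15000, "parking_spaces": 120, "median_income": 58000}
site_b = {"traffic": 15000, "parking_spaces": 120, "median_income": 58000}
# In reality, suppose site_b's sign is hidden behind a freeway overpass.
# That fact was never measured, so the two sites are indistinguishable.

def predict_weekly_sales(site):
    """Toy linear model over the collected features (coefficients invented)."""
    return (0.5 * site["traffic"]
            + 20 * site["parking_spaces"]
            + 0.05 * site["median_income"])

# Identical inputs, identical forecast -- the unmeasured difference is invisible.
print(predict_weekly_sales(site_a) == predict_weekly_sales(site_b))  # -> True
```

No amount of extra iterations fixes this; only collecting the missing characteristic does.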
And here is where it gets really tricky. A small retailer with aggressive growth plans almost inevitably has only “good” sites. If they had bad sites, they would be out of business, right? Yes, there are shades of good, and the algorithm can and will work with whatever it has. But in all probability it has never been shown a “bad” site, and it must be taught what one looks like.
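The one-sided training problem can be sketched in a few lines. This is a deliberately crude stand-in for the real thing – a nearest-neighbor rule over two made-up features – but it shows the mechanism: a system trained only on “good” examples has no choice but to call everything good.

```python
import math

# Hypothetical training data: every site a small, growing chain has is "good".
# Features: (traffic volume in thousands, median income in thousands).
training = [
    ((12.0, 55.0), "good"),
    ((18.0, 62.0), "good"),
    ((15.0, 48.0), "good"),
]

def classify(site):
    """1-nearest-neighbor: label a site with the label of its closest example."""
    nearest = min(training, key=lambda ex: math.dist(ex[0], site))
    return nearest[1]

# A site with almost no traffic and low income -- plainly a poor prospect --
# still comes back "good", because "good" is the only label the system has seen.
print(classify((1.0, 20.0)))  # -> good
```

A fancier algorithm does not escape this: with one class in the data, there is only one answer it can give.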
This brings us to the second problem – these are very data-hungry applications, and in order to correctly classify “dark gray” as a separate color from “black” or “light gray”, the system must be shown oodles of examples to capture the fine differentiation between them. By contrast, a child learning colors usually need not be shown more than a few examples, and inherently seems to know that red comes in many, many subtle shades.
So, for the system to be an effective identifier of sites for a particular chain, we must show it lots and lots of examples, and we must know what to show it in order for it to learn. Again, we have to know ourselves what ingredients go into making a site good or bad, and in what relative quantities. Yes, I know: given sufficient unbiased and comprehensive data, the machine learning algorithms will sort this out.
And herein lies the third, and probably fatal, problem. In the case of the child, or even the computer, learning its colors (that is, building a classification system of hue and intensity), the cost of an error is trivial in relation to the number of times the experiment must be repeated. Please notice that in order to avoid “bad” sites, the system must first have been shown what “bad” actually means. So, before the system gets adept at finding good sites, let alone the best sites, it must first make mistakes. And lots of them.
This is why, for most retail chains, especially small ones, the approach just doesn’t make sense. In order to be at a point where you want to expand, you already have pretty much only “good” sites. So, you turn your data over to a machine, which, given enough data and iterations, will no doubt find better sites for you. Sadly, the cost of making an error is slightly greater here than misidentifying dark gray as black, and since it must make mistakes in order to learn and education is very expensive, you shall in all probability find yourself out of business, or at least out of cash, long before the machine learns enough to be effective.
I am not against using mathematical techniques to improve site location decisions, but that said, trusting a black-box algorithm that must learn by its mistakes is not exactly a recipe for success. While the same could be said about the models of mainline site location companies, any system is only as good as the skills of those who created it, and those who must utilize it. An applied mathematics wizard combined with nearly zero real world real estate/site selection experience is most certainly no magic formula, and those companies which suggest otherwise are kidding both themselves and their customers. They too shall go the way of the many “trust us, it’s too complicated for your little head” modeling companies which went down this path over the four decades I have been in this business.
Maybe the day will indeed arrive where machines have learned sufficiently to replace the human side of analytics, but in all probability the perfection of AI techniques on simple tasks like differentiating between Bob and his brother John will have long allowed the nanny state to thoroughly enslave us. In which case, we will be told what to buy and where to buy it anyway, making all this a rather moot point.