How Do We Know Which Decisions and Policies Will Work?

In 1999, Jim Manzi started a software company called Applied Predictive Technologies (APT) that pioneered the development of experimental methods now used by dozens of companies to set prices, offer different products, and identify and market to customers.

July 19 2012 by JP Donlon

In 1999, Jim Manzi started a software company called Applied Predictive Technologies (APT) that pioneered the development of experimental methods now used by dozens of companies to set prices, offer different products, and identify and market to customers. Capital One founders Rich Fairbank and Nigel Morris, for example, used this analytics-intensive approach to experiment with widely differing credit offers in building Capital One into a giant business. Rather than debate which offers were best for which segments, the company tested a wide variety of solicitations on randomly selected households, then measured the relative profitability of the resulting customer relationships. The key insight was making relentless experimentation the core methodology for understanding consumer response, which directly contradicted the existing industry paradigm of focusing on pattern-finding models and using testing as a secondary method.

Soon non-financial companies started randomized experiments, using APT’s predictive analytics software. Executing a randomized experiment, say, to determine whether a pop-up ad should appear in the upper-left or upper-right corner of a web page, became routine. Google, Amazon, and eBay are inveterate experimenters. Google claims to run some 12,000 randomized experiments a year, with about 10 percent of these leading to business changes. By 2011, APT had become large enough to provide the software technology used to automate the design and measurement of experiments for many large corporations. For example, 30 to 40 percent of the largest retailers, hotel chains, restaurant groups, and retail banks in the U.S. are executing tests on an APT platform.
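The mechanics of such a test are simple to sketch. Here is a minimal Python illustration of a randomized ad-placement experiment evaluated with a two-proportion z-test; the traffic volume, click rates, and variant names are invented for illustration and are not drawn from Google or APT:

```python
import math
import random

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test: does variant B's click rate differ from A's?"""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Randomly assign each visitor to one of the two pop-up placements.
random.seed(42)
N = 20_000
results = {"left": [0, 0], "right": [0, 0]}  # [clicks, impressions]
for _ in range(N):
    arm = random.choice(["left", "right"])
    # Simulated true click rates (hypothetical numbers).
    clicked = random.random() < (0.030 if arm == "left" else 0.036)
    results[arm][0] += clicked
    results[arm][1] += 1

z = two_proportion_z(results["left"][0], results["left"][1],
                     results["right"][0], results["right"][1])
print(f"z = {z:.2f}")  # |z| > 1.96 suggests significance at the 5% level
```

The key design choice is the random assignment itself: because placement is decided by a coin flip rather than by any property of the visitor, any persistent difference in click rates can be attributed to the placement.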

Recently Manzi wrote Uncontrolled: The Surprising Payoff of Trial and Error for Business, Politics and Society, a book that outlines these methods and shows how they can be highly effective in addressing both business and social issues, such as improving schools, increasing efficiency in government programs, and even community policing. Manzi, who holds an SB in mathematics from MIT and is now a senior fellow at The Manhattan Institute, was a corporate strategy consultant with Strategic Planning Associates and earlier a staff member at AT&T Laboratories. Chief Executive caught up with him during a recent visit to New York, where we asked him how his approach to making decisions might be applied to wider business and economic issues.

In Uncontrolled, you argue that iterative experimentation, as practiced by Google with its customers or Capital One with its cardholder solicitations, can greatly improve decision outcomes in a variety of areas. Where else in business could this use of information technology be put to practical use where it may not yet be applied? For example, if Jamie Dimon, CEO of JPMorgan Chase, had been able to tell his traders to use this tool, could he have possibly saved himself and his bank the embarrassment of the $5.8 billion trading loss?

I don’t know enough about trading operations to say for sure, but I’m skeptical that controlled experiments (as opposed to back-testing) are highly applicable here. On the other hand, the general attitude of humility about how hard it is to make useful, reliable and non-obvious predictions proposed in the book would likely have led to a more “Taleb-like” recognition of model risk, and therefore less willingness to risk massive positions. I accept the explanation that this specific trading loss was created by a failure of controls, and some non-zero failure rate is inevitable in any real-world system, but the greater our respect for model risk, the greater the investment we should make in such controls.

One place that I’m confident that JP Morgan could use more structured experimentation is in its retail branch banking business. In general, Test & Learn has become a plain vanilla capability in direct channels for financial services companies, but the smaller sample sizes required for tests in physical distribution channels (e.g., I can test a new credit card solicitation with 50,000 randomly chosen names from my customer database, but I can test a new branch redesign in perhaps 50 branches) pose difficult analytical challenges that have only recently been overcome. Distribution businesses ranging from branch banking to retail stores to hotels and restaurants to large sales forces from pharma to insurance represent probably the biggest untapped opportunities for business experimentation of the type I described in the book.

The productivity of information systems is at the forefront of most business leaders’ minds. How can using the methods you describe help leaders with such issues as investment in social media?

We are deep in the hype phase of the innovation cycle for social media. All the classic signs are there: You could dine out on buffet dinners 52 weeks per year at social media conferences, industry analysts use it in their email tag lines, and even the investment bankers are getting bored with it. Any experienced businessperson has seen this movie before with earlier technologies ranging from the World Wide Web to CRM. As with those other innovations, however, there is real substance at the root of the hype. And, like CRM and the Web, social media is very likely to be a big part of running almost any large corporation in the future.

Most early movers among the users of these prior technologies lost a lot of money, but a small number created enormous shareholder value. By definition, all of the early movers were willing to take risks. But three characteristics distinguished the winners from the losers. First was an unwillingness to be snowed by conventional wisdom, technical jargon or the fairy tales of universal knowledge that abound when everything is still mostly talk and potential. Second was a ruthless focus on profits in excess of capital costs within the foreseeable future as the success criterion for proposed investments of time or money. Third was the role of experimentation: a strong bias to act quickly at low cost, learn what works from experience, and then reinforce strength. The ultimate goal was always to exploit the opportunity to pour cash into successful innovations before the competition, but these companies recognized that trial-and-error learning usually uncovers opportunities faster than master plans.

A good practical example of this for social media is the report in Fast Company magazine last year of a series of controlled experiments that major restaurant chains did using my company’s software, which showed, despite all the hype, that most Groupon and similar promotions did not create significant value. They could often generate a temporary sales increase, but usually not close to break-even. This doesn’t mean that this model will never create value, just that significant tweaking is required before it can justify adoption at scale.

Every year we publish a survey of CEO opinion of the best and worst states in which to do business. Officials from those states that do not rank particularly well, such as California, New York, and Illinois, claim that their state policies are not harmful to business. What kind of experimentation might they use to test whether or not they are right about their states’ policies?

The application here is straightforward, and it is described in the book in some detail. State governments can grant policy waivers to cities or counties that want to deviate from state policies. The quid pro quo can be that these lower-level jurisdictions must execute randomized controlled experiments, according to state specifications, that test the effect of the policy change. For example, if New York City believes that it would be better off with more school choice for K-12 students than is allowed under state law, the State of New York should grant a waiver to allow this, but require that some randomly chosen students continue to operate under the current law, and then compare the results over several years for the students under the two systems.
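The design described here, randomly holding out a control group that stays under current law and comparing outcomes later, can be sketched in a few lines. A hypothetical Python illustration; the group sizes, outcome values, and function names are my own for illustration, not taken from the book:

```python
import random
import statistics

def assign_waiver_groups(student_ids, control_fraction=0.2, seed=0):
    """Randomly hold out a control group that stays under current law;
    the rest enroll under the waiver policy (e.g., expanded school choice)."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)
    k = int(len(ids) * control_fraction)
    return set(ids[k:]), set(ids[:k])  # (treatment, control)

def effect_estimate(outcomes, treatment, control):
    """Difference in mean outcomes (e.g., test scores) measured years later."""
    t = [outcomes[s] for s in treatment]
    c = [outcomes[s] for s in control]
    return statistics.mean(t) - statistics.mean(c)

# Example: a district with 1,000 students requests a waiver.
treatment, control = assign_waiver_groups(range(1000), control_fraction=0.2)
print(len(treatment), len(control))  # 800 under the waiver, 200 under current law
```

Because assignment is random, the two groups are statistically comparable at the start, so the later difference in means is an unbiased estimate of the policy's effect.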

The basic concept is simple, and can be explained to any audience: When some place wants to try something new, we should be biased strongly in favor of at least letting them try it out, as long as we carefully measure its real results before making a final decision. This is not theory; it was done, for example, at the federal and state level in a series of experiments in the late 1980s and early 1990s that had a significant impact on the extremely effective conversion of welfare to workfare.

A great many generalizations have been put forward about income inequality and its purported effects on economic growth and social well-being. Can the methods and thinking presented in Uncontrolled address this problem?

Yes. If we develop specific proposals that can be applied to individuals or jurisdictions that are intended to ameliorate these effects, for example, increases in the minimum wage, or worker retraining, then we could use experiments to evaluate their effectiveness. Based on the history of such programs, I would be very skeptical that any of them would do much good.

Regulation at all levels of government has reached levels (some argue that its total cost to the U.S. economy is now $1.3 trillion) where it is stunting economic growth and job creation. Yet others argue that costs need not be weighed against benefits, as no one can place a true value on clean air and clean water in the environment. How can your experimental testing methods be employed to evaluate which regulations may be necessary and which ones are a waste of everyone’s time?

Experiments can never determine for us what we should care about; they only identify the effects of various interventions on a battery of outcomes. Whether we should give up X% growth for cleaner air is a value judgment. Given that, experiments can easily be designed and executed that can measure these trade-offs for any such program that can be run at a local level.