Reinforcement learning is a way of making a computer learn through experience to make a series of decisions that yield positive outcomes, even without any prior knowledge of how its actions will affect its immediate environment. A software-based tutor, for example, would alter its activities in response to how students perform on tests after using it. (Will Knight, 'Reinforcement Learning', MIT Technology Review, March/April.) It's the same approach used by DeepMind's AlphaGo program, which 'mastered the impossibly complex board game Go and beat one of the best human players in the world in a high-profile match last year.' It's also, at least in theory, the basis of our economic system: the learning process that is supposed to take place through the creative destruction of business enterprises that fail to adapt to changing circumstances. As applied to driverless cars, 'the software governing the cars' behavior wasn't programmed in the conventional sense at all.' Instead, the software learns through trial and error.
Large corporations and their friends in government have seen to it that competitive markets play a lesser role than they should in determining our economic prospects. Corporations succeed today not so much by competing in markets as by manipulating the legislative and regulatory environment, and by participating in effective cartels with other powerful players, both private- and public-sector. But the principle of survival of the most adapted, and the quickest to adapt, has generated enormous material wealth, and it continues to operate, though more to the benefit of the already-wealthy than to that of ordinary people.
Given its record of success, why isn't reinforcement learning, or competition to find the best approaches, allowed to operate in the public sector? That is, why don't we allow competition to solve our social problems? It's partly for historical reasons. When governments began to take an interest in the welfare of ordinary people, solutions to their basic problems were fairly easy to identify. Requirements for things like sanitation, elementary education, a police force, fire engines and hospitals were, and are, difficult to argue against. But as society has grown more complex, so too have our social and environmental problems. How, for example, do we go about reducing crime, improving our (already relatively high) levels of physical health, improving our mental well-being, or ending war? No obvious solutions leap to mind.
This is where Social Policy Bonds could enter the picture. Rather than let public-sector organisations have a monopoly on trying to deal with these problems, a bond regime would, in effect, contract out the search for the best solutions to the private sector. A goal such as eliminating war will need the exploration, deployment and refinement of a multitude of potential approaches. More than that, though, it will need to reward the most effective of these approaches and to terminate the useless ones. There are well-meaning organisations working to end war, but the people working for them are not rewarded for their success. There are no built-in incentives for these organisations to find optimal solutions, nor for inefficient organisations to be dissolved if their efforts prove futile or counterproductive. The result is that the challenges humanity faces, such as environmental calamities and nuclear proliferation, are nowhere near being effectively met. Unlike computers learning how to drive cars, or businesses operating in competitive markets, our failed approaches and ineffectual organisations aren't terminated. Sometimes, in fact, our politicians pump even more money into them.
Social Policy Bonds would change that. They would channel the market's incentives and efficiencies into the discovery and implementation of diverse, adaptive and, above all, efficient approaches to our social and environmental problems, including those, like war, that many of us have concluded have no solution. Our current system places responsibility for solving our problems in the hands of large, usually monopolistic, organisations that face little competition and have no incentive to try diverse approaches. There's certainly no reinforcement learning. It is because of its ability to stimulate diverse, adaptive approaches that a Social Policy Bond regime could succeed where existing policies have failed.