Annabel Manley discusses potential pitfalls – including the illusion of objectivity – in the growing use of Machine Learning in making policy decisions.
Machine Learning (ML) algorithms increasingly affect individuals on a day-to-day basis. In most cases the impact on a person’s life is small, such as changing the route someone takes to avoid traffic, or the advert they see when scrolling through social media. However, ML is increasingly being used to make decisions that can have a large impact, such as whether someone gets a job, qualifies for a mortgage, or makes bail.
Typically, the organisations adopting ML decision procedures justify them as having lower costs and fewer inconsistencies, as well as generating a ‘neutral’ decision-making process that avoids claims of bias. However, the impartiality of these procedures is increasingly being questioned, as is the absence of opportunities to challenge decisions made by complex algorithmic models. The limited literature exploring these questions does indicate that machine learning methods are less biased than human decision makers, but bias is far from eradicated. Since such problems are unavoidable, ML methods should be used with care when the outcomes have large implications, and particularly in areas of policy where accountability is essential.
Subjective decisions are unavoidable
One major problem is the perception that using a machine learning model avoids the need for subjective policy decisions, either by a public body or private company. Machine learning algorithms work by optimising an objective function with respect to a dataset, e.g. finding the model that minimises the absolute number of errors made in predicting outcomes.
However, the objective function is heavily reliant on policy decisions and highly subjective, even aside from the well-known issue of biased data. A decision-making process often has numerous normative goals which do not have explicit mathematical definitions. For example, even when explicit efforts are made to design a ‘fair’ model, there are several mathematical definitions of ‘fairness’ the designer could choose, many of which are impossible to achieve simultaneously. Often, fairness is claimed to be achieved when overall error rates are similar across different demographics, but this ignores that false positive and false negative rates could still differ between demographics, which would be reflected in alternative ‘fairness’ definitions.
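To make the last point concrete, here is a minimal sketch in Python. The two groups, their outcomes and the predictions are entirely hypothetical and chosen only for illustration; `error_rates` is a helper written for this example, not part of any library. It shows two groups with identical overall error rates whose errors nonetheless fall in opposite directions.

```python
def error_rates(y_true, y_pred):
    """Return the overall error rate, false positive rate and false negative rate."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return {
        "error_rate": (false_pos + false_neg) / len(y_true),
        "false_positive_rate": false_pos / negatives,
        "false_negative_rate": false_neg / positives,
    }


# Hypothetical data: 1 = the outcome occurs (or is predicted), 0 = it does not.
group_a_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
group_a_pred = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]  # two false positives, no false negatives

group_b_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
group_b_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # no false positives, two false negatives

rates_a = error_rates(group_a_true, group_a_pred)
rates_b = error_rates(group_b_true, group_b_pred)

print("Group A:", rates_a)
print("Group B:", rates_b)
```

Both groups have the same overall error rate, so a definition of fairness based on that measure alone would be satisfied. A definition based on equal false positive rates, or on equal false negative rates, would not be, and which of these matters most is a policy judgement rather than a technical one.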
The other major subjective policy decision is the weighting given in the model to each type of error, which is often brushed aside by weighting all errors equally. Certain policy tools, such as the one developed by the Durham Constabulary and the University of Cambridge, explicitly weight “dangerous errors”, such as where the model grants someone bail who then goes on to violently reoffend, more heavily than “cautious errors”, where the model does not grant bail to someone who would not have reoffended. The resulting model therefore generates around twice as many “cautious errors” as “dangerous errors”. Explicitly setting and justifying the relative weightings of errors is good practice, even if they are ultimately set to be equal.
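The effect of error weightings can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the risk scores, the outcomes, the weight of 3 and the simple threshold rule are hypothetical and are not taken from the Durham tool or any other real system. The sketch picks the decision threshold that minimises the weighted count of errors, first with equal weights and then with “dangerous errors” weighted more heavily.

```python
def count_errors(scores, outcomes, threshold):
    """Bail is denied when score >= threshold.

    A dangerous error grants bail to someone who reoffends (outcome 1);
    a cautious error denies bail to someone who would not have (outcome 0).
    """
    dangerous = sum(1 for s, y in zip(scores, outcomes) if s < threshold and y == 1)
    cautious = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
    return dangerous, cautious


def best_threshold(scores, outcomes, dangerous_weight, cautious_weight=1.0):
    """Return the candidate threshold with the lowest weighted error count."""
    def cost(threshold):
        dangerous, cautious = count_errors(scores, outcomes, threshold)
        return dangerous_weight * dangerous + cautious_weight * cautious

    # Candidate thresholds are simply the observed scores.
    return min(sorted(set(scores)), key=cost)


# Hypothetical risk scores and outcomes (1 = reoffends, 0 = does not).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
outcomes = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

equal = best_threshold(scores, outcomes, dangerous_weight=1.0)
weighted = best_threshold(scores, outcomes, dangerous_weight=3.0)

print("Equal weights:", equal, count_errors(scores, outcomes, equal))
print("Dangerous errors weighted 3x:", weighted, count_errors(scores, outcomes, weighted))
```

With equal weights the chosen threshold produces one dangerous error and no cautious errors. Weighting dangerous errors three times as heavily lowers the threshold, removing the dangerous error at the price of two cautious ones. The data and the model are unchanged; only the value judgement encoded in the weights differs.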
Complex does not mean better
Machine learning represents an advance on conventional methods of data analysis, including the standard methods of econometrics, in two main ways. First, it has made huge contributions to computational methods which allow for the analysis of huge datasets with large numbers of related variables. Second, the final model developed by a machine learning algorithm is data-driven, rather than theoretically driven as in other forms of data analysis.
This latter contribution has several benefits: these methods can detect more subtle patterns in the data than conventional methods, and they create more reproducible models without arbitrary decisions by researchers. However, these are often incremental gains, and conventional analysis can often achieve the vast majority of the explanatory and predictive power of a machine learning method by accounting for the variables with the largest impact. The relative effect of machine learning analysis is, in effect, to fine-tune the models by also accounting for variables and interactions with smaller impacts.
In addition, there is a very limited ability to predict outcomes in social contexts, particularly when people’s behaviour changes. This calls into question how much there is to gain from using more complex models. It also raises the question of how to trade off the benefits of greater predictive power against the potential costs: a lower ability to understand and interrogate decisions, weaker theoretical justification for a model and therefore its decisions, and higher cognitive barriers to people understanding the process. A paper by Salganik et al. (2020) finds that, for predicting life outcomes for children, predictive models built by 160 teams of scientists were, at best, not very accurate, and only marginally better than a simple four-variable linear model designed by a domain expert, even though the teams were using a rich and high-quality dataset.
It is also important to note that machine learning analysis is subject to many of the same problems as conventional data analysis. There remain issues of unobserved counterfactuals, subjective variable measurement, and external validity. The large quantity of data used in machine learning does not automatically make up for issues in the quality of data. In many of the controversial contexts to which machine learning is being applied, the key factors, such as those that determine whether someone will be a good choice for a job, default on their mortgage, or reoffend after being released on bail, are so numerous and often so intangible that even a vast dataset will not contain all the relevant information.
What should policy-makers do?
The rather dull conclusion of all this is that, rather than creating a utopian (or dystopian) world, the most desirable outcome will be that machine learning will just become another tool that public and private sector decision-makers can use, sometimes providing large accuracy benefits and allowing models to encompass all types of new data, but sometimes being ignored in favour of simple analysis in cases where complexity causes more harm than good. Machine learning will not negate the importance of domain or statistical expertise, but instead will rely on these to ensure that more complex methods are only used when appropriate, and to highlight when alternative methods may be preferable. Ultimately, it remains the case that, in the well-known saying, “All models are wrong, but some are useful.”
To achieve this mundane ideal, one necessary starting point will be to update policy guidelines across the public sector, and in areas of private decision-making subject to legal requirements such as employment or competition law, with specifics on best practice for machine learning methods. These should include the explicit statement and justification of the objective function and the relative weightings of errors; a description of the efforts made to achieve normative aims such as fairness; and a process by which any individual decision can be challenged. These efforts should make it easier for political decisions such as weightings to be made transparently, and for problems in the process to be noticed before they become widespread and have a damaging and unfair impact on people’s lives.
The views and opinions expressed in this post are those of the author(s) and not necessarily those of the Bennett Institute for Public Policy.