Using machine learning for research and official statistics – how are we supporting ethically appropriate innovation?

The UK Statistics Authority’s Centre for Applied Data Ethics has today released its new ethics guidance focused on the use of machine learning for research and official statistics. Tom Smith, Managing Director at the Data Science Campus and member of the Centre for Applied Data Ethics’ independent Advisory Committee, considers why this guidance is important for supporting the analytical community.

Machine learning techniques have the potential to substantially advance the production of official statistics and analysis, providing an efficient way to rapidly assess high volumes of data. Data scientists, statisticians and analysts across the public sector are exploring the data innovation opportunities opened up by machine learning and data science techniques to strengthen the evidence base that we are able to provide to policy makers. Examples from the ONS Data Science Campus include analysing traffic cameras to understand mobility and activity levels in our towns and cities, using satellite data to assess economic activity, and supporting global work on understanding trade levels using shipping data.

But alongside the potential improvements in the way that we use data, we need to keep a strong focus on data ethics. Innovative and ambitious approaches need to be developed, used and managed in ethically appropriate ways.

That is why I am delighted to see the draft guidance published today by the UK Statistics Authority’s Centre for Applied Data Ethics, and why I consider it so useful, important and timely.

The guidance provides an overview of key ethical issues for analysts to consider and a foundation from which to apply these concepts to current projects. This is particularly useful in scenarios where analysts, data scientists and statisticians are developing experimental approaches and analysing data sources that are more novel or innovative in their application and use; in these cases, additional ethical issues may emerge that have not been encountered or considered before.

The critical point for me is that data ethics should be considered as a fundamental part of innovation and data use. It is not an add-on to be considered at the end of a piece of work, but something that should run right through delivery from the very first idea.

The guidance focuses on four main areas. First, the importance of understanding, minimising and mitigating social bias when using machine learning approaches, which is vital for ensuring that the analysis that we produce benefits the whole of society. Second, the need to adequately address both transparency and explainability in machine learning projects. Third, the importance of maintaining accountability within machine learning processes, effectively ensuring that there is a human in the loop across the various elements of a machine learning project. Fourth and finally, the need to give sufficient consideration to confidentiality and to any privacy risks that may arise from the data used in these projects.

This guidance has emerged from a collective effort with colleagues in the international community. There is growing international interest in the ethical use of data science techniques more generally, and this interest is reflected within the international statistical and research community.

This interest has resulted in the inclusion of a workstream dedicated to exploring data ethics in the use of machine learning to produce official statistics within the wider ONS-UNECE Machine Learning 2021 group programme. This workstream is led by the UK Statistics Authority’s Centre for Applied Data Ethics and the draft guidance published today represents an output from this work, developed in collaboration with colleagues from both the international community and the Data Science Campus.

Fundamentally, this guidance provides a practical resource that the analytical community can apply to current machine learning projects. It helps analysts to identify potential ethical issues in their work and, most importantly, to consider how these issues can be appropriately mitigated, so that we can continue to apply innovative data science methods in a responsible way.

I hope this guidance supports your work. I look forward to seeing how it is used by teams across the public sector and beyond.

Feedback on this guidance: This guidance has been published as an open draft for comment and feedback from the analytical community. If you have any feedback on this guidance, particularly on how useful you have found it in the context of the work that you do, please contact the UK Statistics Authority’s data ethics team.