Covid and the Use and Abuse of Statistics: Sir David Norgrove

Part of the Policy and Practice series delivered by the UCL Department of Political Science

28 October 2021 (via Zoom)

Note: this is the speech as drafted and may differ from the delivered version. A recording of the speech is available from the UCL Political Science podcast.

Introduction

It’s nearly three years since I last spoke at one of these seminars, and what a three years it has been.

Then I discussed our statistical system – and the two executive arms of my organisation: ONS the producer, which you will be familiar with, and the Office for Statistics Regulation, OSR, which impartially assesses the statistical system (including ministers, their departments, and ONS itself) and finds areas for improvement.

I talked through how it works, how we protect our independence and try to stop statistics being misused, and how we were on the verge of being able to use administrative data to revolutionise our understanding of the economy and society.

Now, three years on I can look back on a period when everyone has been talking numbers, the sitting rooms of the country have been full of armchair epidemiologists, and the demand to understand what is going on has never been greater. On top of that our politicians have been under huge pressure to deliver, increasing the temptation to push the numbers beyond what they can bear.

I think our statistical system, our statisticians, and in statistical terms our politicians too have met the challenge. Nails were hit on the head even though there was some fumbling for the hammer and most of the problems, though not all, have been a result of the pressure and the novelty of the issues, cock-up rather than conspiracy if you like. But there are of course lessons to be learned.

I’m going to talk first about data on Covid, then data on the economy and society, including the exams fiasco, and finally some thoughts on what all this means for the future and what we might do differently in another crisis.

Covid

So, first Covid.

According to the charity National Numeracy, most of the working-age population are only numerate at the level of a primary school student. So understanding an epidemic would be a challenge you might think.

But I remember my father, who left school at 14: his facility with numbers on a dart board or with betting odds was amazing. People can get to grips with numbers better than we think, if they really want to.

The daily press conferences though didn’t always help them, especially at the beginning. On March 23, during the Prime Minister’s ‘Stay at Home’ address, Number 10 showed us a fairly amazing equation, I think you would agree, and another chart of the national outbreak which was as clear as mud.

David Spiegelhalter has called the No. 10 press conferences number theatre. But I think they arose out of an admirable urge to inform and persuade people, for their own sake, of the need to take action, and once they’d started it would anyway have been very difficult to cut them back. So ONS seconded people to the centre to work on improving the presentations and I hope you’d agree that they did indeed improve. Contrast those with the graphs on newer slides from 2021, which are so much clearer, are properly labelled and include any caveats at the bottom.

Improving the presentation was one concern. The other, and more important, was how to understand what was going on. This led to the Covid Infection Survey, a random sample based on oral swabs and blood samples that allowed us all to see what was happening to the rate of infections not just in total but by area as well. It was set up and rolling in under a week. It costs ONS and the taxpayer around a quarter of a billion a year, so it’s no small thing. But it sets the standard and is respected around the world.

It was fundamental too to our understanding of the spread of variants. And an expanded version of the survey (by now we have done more than 6 million swabs and a million blood tests) gives us understanding of rates at younger ages compared to older ages.

We very much hope that this kind of survey can continue even after the pandemic is over, to give us the basis for tracking population health, but also to allow the kind of analysis we carried out on varying mortality.

This needed ONS to bring together infection data, census data, and data from GP health records. Without that work we wouldn’t have been able to confirm what we all suspected, the vulnerability caused by deprivation and related to ethnicity, a huge piece of work that changed attitudes across the whole of government in dealing with the pandemic and dealing with inequalities in general.

And it was also the basis for our work on long Covid. That’s why I get animated about how this kind of analysis is jeopardised by the campaign for opt-out from sharing of health records. Access to linked data by ethically-approved researchers working in a secure environment is an enormously promising resource to tackle some of the more intractable public health issues we face as a society. I’ll come back to that.

The past 3 years have seen a huge effort to generate the data about the infection and how we are handling it, by people working in demanding circumstances both personally and professionally. There were and are issues around the number of different health bodies, but one of the main issues that arose in an English context related to testing data.

You may remember the Government announced an ambitious target of 100,000 tests per day by the end of April 2020, to track the spread of the virus in the country. Here and now, when yesterday nearly 900,000 tests were conducted, we can stand back and see a system central to our efforts to fight the virus, underpinned by robust, timely and transparent data.

I can’t say that we grabbed that hammer last April, however. The then Health Secretary announced, on the last day of the month, that his target had been met, but only by adding tests sent out to tests completed.

I told the Health Secretary publicly in a letter that this fell well short of expectations in the Code of Practice for Statistics – that guide I spoke about last time I was here, which exists to ensure the trustworthiness, quality, and value of statistics, as well as their protection from political interference.

I’m sorry to say the response to my letter was not fast. It was only in August that the double-counting was removed – taking more than 1.3 million tests away from a total of around 15 million tests.

This was poor practice, to say the least. But looking at the bigger picture now on the Government’s coronavirus dashboard we get to see statistics on testing daily, by type and by Pillar (the different arms of the Government’s testing operations), and broken down into every nation, region and local authority.

For positive cases, the data is cut again into tiny areas of around 7,000 people, giving us the ability to observe how the virus is behaving in our local area. 300,000 people check the dashboard every day. 10,000 of them in fact, at 4:01 pm, just after the team update it.

The economy and society

Covid itself has been a central focus. But its effects too have been profound across our economy and our society generally, and we needed to know what was going on at a time when a lockdown brought its own difficulties.

In the narrowest sense there was the question of how people were responding to the restrictions.

For some things we turned to existing sources, both public and private, using data from traffic cameras, or a private source, Google’s mobile phone location data. Then there were new surveys or new questions in an existing survey, to track social distancing.

These data were needed fast, in days not weeks. And the same was and is true for the economy. It’s been suggested, with some calculation, that had the Bank of England known a month or two earlier about the economic downturn in 2008, a faster response would have saved many billions of pounds in lost output. New sources of data are helping to meet that need, like activity on credit and debit cards, or foot traffic in shops, or job adverts on a leading jobs website.

Then there were the conceptual challenges. What do you do about measuring prices when so much of what people usually buy, they are no longer buying? How do you measure employment when people are furloughed? When you ask people whether they are employed what do they think the question means? How do you measure non-market output, like the output of the health service or teachers, when the normal patterns of work are so greatly disrupted?

In these areas we were and are greatly helped by having access to HMRC real time data as well as private sector data. We would really have struggled without those.

Despite the difficulties the team have continued to produce monthly GDP data and figures on what is happening in the nations and regions.

What stands out in all these measures of the economy and society is the urgency of the response: we needed not just to compile historical data but rather present a picture of what was happening right now, day by day.

It’s true that a crisis in some ways forced this pace. But in other ways it was really an acceleration of an existing ambition. The statistical system is not going to carry on publishing data only in bulletins that come out infrequently, and long after the facts that they describe. Statisticians want to be able to provide the information that matters in real time.

It hasn’t all been rosy. I talked earlier about the testing data. Outside Covid the biggest issue in regulatory terms was the exams fiasco. It’s impossible for everyone to be happy on results day. But it’s fair to say that almost no one was happy on results day last year, in Scotland, Wales, Northern Ireland and England.

People felt that a faceless, black-box algorithm (rather than decisions under difficult circumstances) had taken students’ futures right out of their hands. And on top of that the algorithms appeared to further disadvantage those from a lower socio-economic background.

In their review the Office for Statistics Regulation found that the algorithms themselves did their job (and had successfully been tested to prevent discriminatory outcomes). Yes, some of their construction was arguable. But there was nothing fundamentally wrong with any of them.

What the OSR uncovered in their detailed and incisive review is that the technical propriety of statistics is not enough on its own to guarantee a sense of legitimacy. It is a question, also, of (and I quote) ‘overall organisational approach, including factors like equality, public communication, and quality assurance’.

Algorithms and artificial intelligence are essentially statistical models. There’s a challenge here that I’ll come back to.

But I can’t leave this part of my talk without mentioning the census.

Census day was March 21 2021. The first census was taken in 1801 during the Napoleonic War. I don’t know whether that was a more difficult and unusual time. But in any event I feel proud that ONS has taken through a £900 million programme, involving huge IT investment and putting 30,000 people into the field with minimum fuss and minimal controversy. The plan is for the results to begin to appear in spring next year.

I said before that we want to provide more timely information and the census, with its decennial schedule, is no exception. Indeed we’ve seen the weaknesses of censuses during the pandemic, in the difficulty for example of finding out what has been happening to migration and our student population.

In a couple of years, ONS is going to present its plans for the future of population statistics and the National Statistician has spoken of his ambition that we should have estimates as high-quality as the census and published not every ten years but every month. So there is work underway that may overtake this great survey. But if this is to be the last census, and I hope it will be the last one, it bids fair to have been the most successful ever.

How did we do, and the future

So how did we do and what are the lessons for the future?

I think first we owe huge thanks to statisticians and analysts around government. They aren’t nurses and doctors, and they wouldn’t want to be applauded on our doorsteps. But without their efforts we would have been in the dark about the spread of the pandemic, the extent of the variants, the effect of the social restrictions, and how the economy was developing. The overwhelming need in any crisis is to know what’s happening and our statisticians have given us that.

And that’s not just relevant to policy makers. If there’d been no data I just don’t believe that people would have stuck to the rules.

There would have been no trust. Either that or there would have been much more anxiety than there was; you can even envisage panic. Trustworthy data are essential to allow trust.

And the data need not just to be trustworthy. People need to believe that they are being given all the data, that nothing is being held back.

I can think of only one case where data were deliberately distorted, and that’s the issue on testing data that I’ve already discussed. But there have been too many cases where Ministers have used numbers in public that had not been published and it then took too long for them to be published with their source and their caveats.

This lack of transparency is damaging. The public needs to see what the government is seeing.

That’s one lesson.

My second is the importance of sharing data within government and the benefit of connecting government data with private data where it meets the test that it serves the public good. I’ve shown some examples today of how that work has helped us all and 3 years ago I described how the lack of joined-up data prevents us understanding how best to help children who are taken into care – the issue that led me to take the job I now have. You can’t have joined-up government without joined-up data.

The pandemic has accelerated the work to join up our data, and ONS is in the process of developing a platform to allow this to happen more easily and even more securely. But it needs continued buy-in from departments and from private sector organisations. And above all we need to carry out the work in a way that shows people that their data are secure and being used for the public good.

The third lesson I draw is against the probability of another crisis of whatever kind. It took too long for us to get enough coherence and breadth in our understanding of the pandemic. There were all sorts of reasons for that, some inevitable given the speed of its development.

Some were not inevitable, coming as they did from the multiplicity of organisations involved in health provision in this country, and the lack of a structure led by a person who could direct coherence and delivery. I would urge that in a future crisis the National Statistician should be put firmly in charge of data and their delivery.

Finally, the pandemic has shown the increasing complexity of the data and statistics we rely on. That is only going to grow. The exams algorithm is just the start of the kinds of issues we are going to face. The remit and capability of statistics regulation need to grow alongside them.

And that’s almost where I’ll finish. But I’ve mentioned the public good a couple of times and that’s actually where I’ll finish. Our strategy published last year is called “Statistics for the Public Good”, and we took the title from our statutory objective, which is to deliver just that. It runs through ONS and the Government Statistical Service like a stick of rock. I hope I’ve shown that to be the case and we should all be so very pleased that it is.

Thank you.

 

Related links: 

Sir David Norgrove letter to Matt Hancock regarding COVID-19 testing

UK Statistics Authority and Office for Statistics Regulation written evidence to the Public Administration and Constitutional Affairs Committee’s inquiry on data transparency and accountability: COVID-19

UKSA Strategy 2020-25: Statistics for the Public Good