Dear William,
I write in response to the Committee’s call for evidence for its new inquiry, Transforming the UK’s Evidence Base. I very much welcome this inquiry with its focus on the future of data, statistics and analysis in government, as we in the UK Statistics Authority look to the future in a variety of ways.
As you will be aware, the Office for National Statistics (ONS) launched a consultation on the future of population and migration statistics in England and Wales in June. I enclose written evidence from the National Statistician, Sir Ian Diamond, within this submission, which highlights not only this consultation but also the progress of data sharing in government so far, how the Authority uses data ethically and protects users’ privacy, and how the ONS understands and responds to user needs.
Meanwhile, the Office for Statistics Regulation published their review into data sharing and linkage for the public good in July. I also attach written evidence from their Head of Regulation, Ed Humpherson, within this submission, where they discuss their findings from this report in more detail. In addition, they will soon be launching a review and will be seeking feedback on the Code of Practice for Statistics to ensure it remains relevant for today’s world of data and statistics production. We will provide more detail to the Committee on this soon.
Sir Ian Diamond, Ed Humpherson and I stand ready to engage with the Committee to expand on any of these points if helpful, and indeed will follow all oral evidence sessions of the inquiry with interest.
Yours sincerely,
Sir Robert Chote
Chair, UK Statistics Authority
Office for National Statistics response
Data and analysis in government
How are official statistics and analysis currently produced?
- Official statistics are defined as those produced by organisations named in the Statistics and Registration Service Act 2007 (the 2007 Act) or in the Official Statistics Order (SI 878 of 2023). The Code of Practice for Statistics (the Code) sets the standards that producers of official statistics should follow. The Office for Statistics Regulation (OSR) sets this statutory Code, assesses compliance with the Code, and awards the National Statistics designation to official statistics that comply with the highest standards of the Code.
- The majority of official statistics are produced by statisticians operating under the umbrella of the GSS, working in either the Office for National Statistics, UK government departments and agencies, or one of the three devolved administrations in Northern Ireland, Scotland and Wales. Every public body with a significant GSS presence has its own designated Head of Profession for Statistics. Each of the devolved administrations has its own Chief Statistician. The Concordat on Statistics sets out an agreed framework for statistical collaboration between the UK Statistics Authority, UK Government, and the Northern Ireland, Scottish and Welsh Governments.
- The Analysis Function brings together the 16,000 analysts across 7 professions. The strategic aim of the Analysis Function is to integrate analysis into all facets of Government, building on the strengths of professions. The Analysis Function supports analysis in Government through capability building, sharing good practice, championing innovation, and building a strong analytical community. The National Statistician is head of the Analysis Function, as well as the Government Statistical Service (GSS). Each Government department has a Departmental Director of Analysis, who is responsible for analytical standards in their department. The network of Departmental Directors of Analysis (DDANs) forms the leadership of the Analysis Function in order to drive and deliver the functional aims.
- A range of analytical techniques and various sources of evidence, combined or individually, including official statistics, can be used to provide insights for key questions for the public and decision makers. Such analytical processes and products are also supported by guidance such as the Green Book (appraisal of options), the Magenta Book (evaluation), and the Aqua Book (quality assurance), which set the highest standards for government analysis.
- Official statistics and analysis across government are currently produced in line with the Code and its three pillars: trustworthiness, quality, and value.
How successfully do Government Departments share data?
- Successful data sharing across government departments is critical to operating and transforming the statistical system and producing high quality, trustworthy, and valuable analyses. The importance of sharing, and linking, data and putting data at the heart of statistics is set out in our current consultation on the future of population and migration statistics in England and Wales.
- There are some good examples of effective data sharing across government. The COVID-19 pandemic illustrated the ability of government and public services to use and share data to help and protect people. When data are shared effectively, the speed at which analysis can be done means time-critical policy issues can be understood and addressed quickly. For example, we created the Public Health Data Asset (PHDA), a unique population-level dataset that combines, at individual level, data from the 2011 census, mortality data, primary care records, hospital records, vaccination data and Test and Trace data. This allowed us to link across these data sources to provide new insights.
- Cross Government networks and the Analysis Function have a critical role in the success of cross department data sharing. For example, the Data Liaison Officer network, the National Situation Centre, the ONS and the Analysis Function recently collaborated to produce guidance on data sharing for crisis response. This built on the principles developed and utilised during the COVID-19 pandemic.
- However, there are challenges to building on this success and maintaining the momentum that occurred during the pandemic. Data sharing between departments continues to carry asymmetric risk, with the perceived risks – either legal, operational or reputational – falling on the supplier or department sharing the data, while the benefits are diffuse across the system, or perceived to accrue to others. There are several common challenges to data sharing by suppliers, for example their level of risk appetite and differing interpretations of the law, their data preparation (and accordingly, data quality) and engineering capacity, and the governance within their own organisations.
- In terms of risk appetite, even when the environment is judged to be safe and secure by internal and external parties, there is still too much weight being placed on the risks of data sharing as opposed to the very real risk to the public of policy harm and loss of opportunity where valuable data is not being actively used and shared. The OSR notes this point in their report on data sharing and linking.
- Another challenge is that agreements to share data are often narrow: from one department to another for a specific purpose, for example a piece of analysis specific to a policy area or statistical output. It is often challenging to broaden these agreements, which limits the extent to which data can be reused or shared more broadly across multiple departmental boundaries. This creates inefficiency where the value of data is not fully realised and causes government departments to incur unnecessary and duplicative costs in implementing numerous bilateral arrangements, often with the same party.
- The level of data maturity across departments is also varied, which leads to a multitude of different approaches and interpretations to agreeing data sharing, a wide range of people being involved in approving data ownership and stewardship, and a myriad of different templates to formalise agreements. This contributes to a complex and burdensome system, which leads to long lead-in times to agree data shares. This issue is particularly acute when data is brought together and integrated from multiple departments, necessitating different governance processes to be engaged each time a change is required.
- The ONS provides an ‘Acquisition Service’, which proactively supports data suppliers and collaborates to put in place mechanisms which support the sharing of data, reducing the burden on the supplier as far as possible. For example, this could involve seconding analysts into a department, drafting Memoranda of Understanding on behalf of suppliers, and agreeing to undertake significant improvement work on the data to make it of high and usable quality.
- The ONS is the lead delivery partner on behalf of government to deliver a cross government major programme, the Integrated Data Service (IDS), a Trusted Research Environment (TRE) which seeks to build on the success of the Secure Research Service (SRS), to bring together ready-to-use data to enable faster and wider collaborative analysis for the public good. The IDS intends to transform the way that data users share and access government data. Firstly, the IDS is a fully cloud-native system which will further enable connectivity across a federated data environment, reducing the friction caused by sharing data multiple times. Secondly, it will provide the facility to fully exploit the opportunities for safe and secure access to data provided for in the Digital Economy Act 2017 (DEA). Thirdly, it will apply a common linkage approach to enable analysts to join data from different departments, repeatedly, to meet diverse analytical requirements.
- The ambition is that the IDS will help overcome some of the existing challenges, costs and delays to effective data sharing across government. Its success, however, will depend on the extent to which government departments can embrace a common approach to sharing, stewarding, linking and accessing data.
How do other nations collect and produce statistics?
- The UK is connected with other National Statistical Organisations (NSOs) on both a multilateral and bilateral basis, to learn and to share best practice. The UK is represented at the highest levels in multilateral fora such as the Organisation for Economic Co-operation and Development (OECD) and the United Nations Economic Commission for Europe (UNECE) and participates in a number of working groups advancing the use of administrative data and new forms of data. For example, we recently presented our work on nowcasting to the UNECE Group of Experts on National Accounts which gathered interest from other countries such as Austria, Canada, Indonesia and the United States.
- The UK chairs the UNECE Expert Group on Modernising Statistical Legislation, ensuring that statistical legislation and frameworks are equipped to deal with a changing data ecosystem. This has led to further work on data ethics, social acceptability, and access to privately held data. We also sit on the UNECE Task Force on Data Stewardship, which aims to develop a common understanding of the concept of ‘data stewardship’ and define the role of NSOs as data stewards.
- We are regularly contacted by other NSOs to share our experiences of utilising data science techniques to produce high quality data in near real time to inform decision making. For example, the ONS recently hosted the German-led Future of Statistics Commission to share our experiences of utilising new data and new methodology to keep up with societal needs and respond efficiently to emerging crises. We have hosted colleagues from Statistics Finland to discuss the data landscape in the UK and colleagues from Statistics New Zealand to discuss challenges faced with data collection. We have also shared experiences on our real-time economic indicator suite, particularly on new sources of data such as card data.
- The ONS has been involved with the World Health Organisation’s Pandemic Preparedness toolkit project. This project calls upon the Authority’s pandemic response experience and expertise to develop a toolkit containing practical guidance, statistical methods, knowledge products, case studies and training materials for other NSOs, particularly for data sharing.
- As part of the International Census Forum, the ONS has had ongoing conversations with the US Census Bureau, Statistics Canada, Australian Bureau of Statistics, Statistics New Zealand and CSO Ireland about the use of administrative data for population statistics. All participants have benefited from these conversations over the last few years, covering subjects such as census collection planning and efficiencies, quality assurance, processing improvements and contingency plans.
- The UK is a member of the UN Statistics Division (UNSD) Collaborative on the Use of Administrative Data for Statistics. This group of countries and regional and international agencies was convened by the UNSD and the Global Partnership for Sustainable Development Data (GPSDD), with the aim of strengthening countries’ capacity to use administrative data sources for statistical purposes, including replicating census variables.
- The Collaborative provides a platform to share resources, best practices and experiences. This includes a self-assessment tool, a draft toolkit for quality assessment of administrative data sources, and an inventory of resources which contains recommendations and practical examples on the use of administrative data in different contexts.
The changing data landscape
Is the age of the survey, and the decennial Census, over?
- The ONS’s vision is to improve its statistics so that they can respond more effectively to society’s rapidly changing needs. The ONS is proposing to create a sustainable system for producing essential, up-to-date statistics about the population. To do this, the system would primarily use administrative data like tax, benefit, health and border data, complemented by survey data and a wider range of data sources. This could radically improve the statistics that the ONS produces each year and could replace the current reliance on, and need for, a census every ten years.
- Producing high-quality, timely population statistics is essential to ensure people get the services and support they need, both within their communities and nationwide. Population statistics provide evidence for policies and public services, as well as helping businesses and investors to deliver economic growth across the country. It is important that these statistics are up to date and reliable, so that they can accurately reflect the needs of everyone in society. Currently, the census provides the backbone of these statistics, offering a rich picture of our society at national and local levels every ten years. Every year, the ONS brings together census data with survey and administrative data to reflect changes in society. As a result of this approach of ‘rolling forward’ estimates year-on-year, statistics become less accurate over the ten years between censuses and local detail on important topics becomes increasingly out of date. After each census, the previous decade’s mid-year population estimates are “rebased” to ensure they are consistent with the baseline estimate from the new census. This makes the previous decade’s mid-year population estimates as accurate as possible.
- There has also been a well-documented global trend of declining response rates to surveys and censuses, which can impact representativeness of the data and, as a result, data quality. While Census 2021 in England and Wales enjoyed a high level of public engagement and response, this is an outlier in a wider trend of population censuses and social surveys across the world.
- Data collection is costly, and these costs can be elevated when the need arises to incentivise survey response (often monetarily), chase response many times (particularly in the case of the census) or adjust collection operations. In recent examples where response rates to censuses have been below targets, mitigations have included extending the deadline for census responses, boosting communication campaigns, and making greater use of administrative data to enable the production of robust estimates, all of which can contribute to elevated cost.
- Building on recent advances in technology and statistical methods, and legislative facilitation of data-sharing across government for statistical purposes, the ONS has for several years been researching the use of administrative data as a primary source for meeting user needs for some statistics. For population statistics specifically, this work is responding to the Government’s ambition, set out in 2014, that censuses after 2021 “be conducted using other sources of data and [provide] more timely statistical information.”
- This research has shown that the ONS can produce population estimates with a more consistent level of detail and accuracy over time, and migration estimates based on observed travel patterns rather than respondents’ stated intentions, using administrative data to respond to the difficulties of estimating internal and international migration. The ONS has also developed methods for producing information about the population more often and more quickly. These methods will offer insights into our rapidly changing society as administrative data reach their full potential over the next decade.
- In June 2023, the ONS launched a consultation on its proposals for the future of population and migration statistics in England and Wales, responses to which will inform a recommendation to Government. The consultation’s proposals emphasise that surveys may continue to play an important role whilst ONS works with partners to widen and develop the range of administrative data sources that are collected and used. However, the ONS believes it has reached a point where a serious question can be asked about the role the census plays in its statistical system.
- If implemented, the proposed system would respond more effectively to society’s changing needs by giving users high-quality population statistics each year. It would also offer new and additional insights into the changes and movement of our population across different seasons or times of day. For many topics, it would provide much more local information not just once a decade but every year, exploring them in new detail and covering areas not recorded by the census, such as income. These are ambitious changes, and decisions on the next phase of this work will set the direction for the ONS’s work programme over the coming years, as the ONS continues to improve its population and migration statistics.
- It is worth noting that fast-paced, qualitative surveys have had, and will continue to have, a place in the statistical system. For example, the Opinions and Lifestyle Survey and the Business Insights and Conditions Survey illustrate the worth of flexible surveys. Both have been adapted from a source of rapid intelligence during the pandemic into useful tools for understanding, at pace, current issues ranging from the cost of living to the adoption of AI.
What new sources of data are available to government statisticians and analysts?
- The ONS has been a heavy user of Census and survey data and, more recently, government administrative data. Given the new global data landscape and increasing society-wide and economy-wide digitisation, new, massive, high-frequency data (‘Big Data’) are being generated, some of which are proprietary and some of which are openly available. These new data sources offer the ONS an opportunity to be radical and ambitious in what data it uses and how it uses them, in line with the Authority’s strategy, Statistics for the Public Good. The aim is to provide better, highly trusted information and statistics to the public, the private sector and the public sector (including government), at national and regional level, across a wide range of issues.
- A key part of the current consultation on the future of population and migration statistics seeks input on transforming population and migration statistics using alternative data, as opposed to using predominantly survey-based sources. The ONS has now published a suite of evidence demonstrating the opportunity to use alternative data sources to deliver more timely and granular statistics as well as provide value for money.
- To support statistical transformation and the strategy, the ONS works with hundreds of independent data suppliers including central, devolved and local government, and the private sector, to share data for public good research. The ONS’ most recent data transparency publication demonstrates the breadth of data sources containing personal data, and the broad opportunity to use novel data sources to support statistics.
- The ONS uses around 700 data sources, including health, tax, benefits, education and demographic information from other government departments and public bodies. We also work with a wide range of data from commercial providers including retail scanner data, where coverage is around 60-70% of the grocery market, alongside financial transaction data, domestic energy usage and others.
- A recent example of the ONS harnessing the power of new data sources is published data on UK Direct Debits, developed with our partners at Vocalink and Pay.UK. This new, fast and anonymised data source provides insight on consumer payments to most of the UK’s largest companies. It gives the ONS, for the first time, an opportunity to rapidly analyse price movements such as changes in the average Direct Debit amount for bills, subscriptions, loans, or mortgages as well as overall payment behaviour via failure to make these Direct Debit payments. This provides extremely useful and timely insights into the state of the UK economy and in time, could feed into wider national accounts estimates.
- In addition, new sources being acquired as part of the transformation of consumer statistics are already being incorporated into headline measures of the Consumer Prices Index (CPI). This has started with the incorporation of rail fares, enabling a far greater level of detail to be accessed within our published indices.
- Alongside the opportunities presented by novel data sources, there is also huge potential through continued broader integration of data. Integrated data assets, made up of multiple constituent data sources, provide the potential for greater depth and breadth of research, and for the creation of insight which is not possible from analysing data sources in isolation.
- A good example of the value of integrated data is the ONS’s development of the PHDA, which has enabled the ONS to produce novel analyses including Ethnic contrasts in COVID-19 mortality and other high impact pandemic related statistics. The ONS is now using the PHDA to explore more indirect impacts of COVID-19 and wider non-COVID research questions. These include the impact of health on labour market participation by linking in Department for Work and Pensions (DWP) and HM Revenue and Customs (HMRC) data.
- The ONS has also made use of ‘open’ data to better inform the public. For example, it uses Automatic Identification System (AIS) shipping data to help monitor trade (shipping) flows. This data science work feeds into the regular monthly ONS publication on Economic Activity and Social Change in the UK, Real Time Indicators. We are looking at other forms of open data that can be used to produce statistical outputs and products that deliver on statistics for the public good.
- As well as using linked data, the ONS brings multiple types of data together and uses advanced data science methods to deliver statistics for the public good. The ONS collects travel to work data from the census every ten years, with the most recent being 2021. Travel to work matrices show the movement of people from their home location (origin) to their place of work (destination) at an aggregated level. Information on travel to work provides a basis for transport planning, for example, whether new public transport routes or changes to existing routes are needed. Additionally, it allows the measurement of environmental impacts of commuting, for example traffic congestion and pollution, and how these might change over time, for example because of changes in commuting modes, such as a shift from car to bicycle. Census travel to work data allows us to generate travel to work matrices for census years only, that is, at 10-year intervals with no updates for the years in between. However, using Census 2011 travel to work data, Census 2021 population data, National Travel Survey data (collected by the Department for Transport (DfT)), National Trip Ends Model data (produced by DfT), and ONS geography products such as MSOA boundaries and Population Weighted Centroids, we can produce estimates of travel to work matrices at more regular intervals than once every 10 years.
- The approach to integrated data is being taken further as part of the IDS, with data ‘indexed by default,’ enabling common deidentified identifiers to be consistently applied to enable a common linkage approach. Data deidentified in this way can be grouped thematically to create Integrated Data Assets around the themes of health, levelling up and net zero. The value of this approach is that it retains the core value of the source data, while being supported by Privacy Enhancing Technology, and facilitates access to a much broader range of data enabling analysts to exponentially grow the value of their analysis.
What are the strengths and weaknesses of new sources of data?
- Use of new administrative and commercial data sources is transforming the way we produce statistics (a trend seen across the world). The sources provide timely, frequent and granular data about the population that is not possible through survey collection. Through the linkage of different data sources, we can provide coverage across the population down to local levels of geography. Innovations in this area include a new approach to produce more timely and frequent high-quality estimates of the size and composition of the population down to local level and how this changes due to international migration. However, administrative data are not collected for statistical purposes and, as a result, there are both strengths and weaknesses with such data sources.
- Relevance and conceptual fit: the content of administrative data is determined by the services they support. This includes the topics collected, but also the precise definitions of the items that are measured. While surveys can be designed so that the questions they ask capture the statistical concepts we want to measure as accurately as possible, this is rarely possible for administrative sources, especially well-established ones. In practice, that means we need to adjust the data so it fits with statistical definitions using additional data from elsewhere (such as a survey) or can only approximate those definitions if adjustment is not possible. An important area to help with this is collaboration across government, to improve the collection of data in administrative systems, particularly around protected characteristics.
- Coverage: the strength of many of these large data sources is their granularity, which gives us the ability to analyse data for small groups or for small areas. This is not always possible with surveys, particularly surveys with small sample sizes. However, the coverage of most of these datasets will be incomplete. Parts of the population will not interact with the administrative source and therefore they will be missed from the dataset. In other cases, the data may not cover everybody or everything. The power, from an analytical point of view, comes from linking different datasets together to improve the coverage and enable analysis at a local level. However, sometimes surveys are also needed to fill the coverage gaps. There can also be over coverage in data sources, where individuals appear on the dataset who aren’t within a target population, for example the inclusion of emigrants and short-term residents who have recent administrative data activity but are either no longer resident or are not resident for sufficient time to meet the definition for inclusion in our estimates.
- Linkage: new sources of data need to be integrated to improve coverage and allow analysis. There are often no unique identifiers to enable this linkage, which adds to the complexity, time, and cost of processing data to allow analysis. The complexity of linkage without common unique identifiers means that it is never perfect. Moreover, the quality of linkage may vary across the population. Understanding and quantifying linkage quality is critical, as issues that arise (such as under-representation) will feed through into statistical analysis and may affect results if not properly mitigated.
- Timeliness, for both collection and delivery (although new data sources are often relatively more timely than survey data):
  - Collection lags: there is often a lag between an event occurring and the data for that event becoming available, for example moving address, then registering at a doctor’s surgery or making a profit and filing a tax return. There are different time lags for different datasets. Real time analysis is often not, at this point in time, possible from the new data sources; there is always a time lag.
  - Data delivery: timelines can impact on the timeliness of the statistical and analytical outputs. Data takes time to be processed and be delivered to statisticians who analyse it. Often data can be delayed in the delivery or not shared in line with analytical requirements due to the nature of data sharing agreements. Within the ONS we are working with departments to build mature data transfer systems supported by robust Data Sharing Agreements (DSAs).
- Coherence, harmonised standards, and metadata: different operational policies lead to associated administrative data being collected in different ways. One dataset may be at quite a high level, while others could have more detailed information – even datasets that appear to collect information on the same metric may not be comparable, or not wholly comparable. This makes analysis more difficult and can mean that data is not available at the relevant level of granularity. Metadata and detailed information about the data collected is also often lacking, making using the data more difficult. We are using secondments into departments to better understand the data and build the metadata, and thus improve this situation.
- Stability and accuracy: with survey data we have control over the questions asked and the stability of those questions. Administrative or commercial data can change, for example in what and how data is collected because of changes to the operational process. This is both a strength and a weakness: a strength, as it allows the data to adapt more to changing requirements and needs; a weakness, as it brings in a possibility of breaking series, which is not ideal for statistical analysis of trends. An example of this was the removal of universal Child Benefit, which caused a big drop in the coverage of children in DWP/HMRC data. To future proof statistics, we need to make sure that we are not reliant on just one source of data. In addition, the accuracy of a variable will often depend on whether it is critical to the administrative function. When it is not, quality drops, as the data may often be missing (if voluntary) or may not undergo robust checks on collection.
- Design: sometimes administrative data collection processes were designed a few decades ago and can rely on legacy IT to be analysed. This also implies that design of questions or forms doesn’t fit new needs or requirements, and does not follow user centred design principles, affecting the quality of the data collected. However, some of the major sources that we are currently using have undergone improvement in this area.
- Supplier restrictions: some suppliers place restrictions on the data that is shared, for example by applying techniques to data to enhance privacy (such as hashing, perturbing, aggregating data). This can damage the usefulness of the data and how it can be used within statistical outputs.
Protecting privacy and acting ethically
Who seeks to protect the privacy of UK citizens in the production of statistics and analysis? How?
- All producers of official statistics are legal entities and data controllers in their own right, and therefore are responsible for protecting the data they hold and use. Data protection legislation, including the UK GDPR and the Data Protection Act 2018, provides the statutory framework all producers of official statistics must adhere to, and makes specific reference to personal data processed for statistics and research purposes. The Information Commissioner’s Office (ICO) is the independent authority that ensures compliance with data protection legislation and upholds information rights in the public interest. As part of its role the ICO provides advice and guidance, including, for example, a Data Sharing Code of Practice.
- As the executive office of the Authority, the ONS collects and processes information, both directly from individuals and from other organisations, and does so using a variety of methods. The Authority has the statutory objective to promote and safeguard the production of official statistics that serve the public good. Any personal data collected by the Authority can only ever be used to produce statistics or undertake statistical research.
- In addition to data protection legislation, personal information held by the Authority is further protected by the 2007 Act, which makes disclosure of personal information a criminal offence, except in limited prescribed circumstances, for example where disclosure is required by law.
- The DEA provides the Authority with permissive and mandatory gateways to receive data from all public authorities, crown bodies and businesses. These data sharing powers can only be used for the statistical functions of the Authority and sharing can only take place if it is compliant with data protection legislation.
- The 2007 Act requires the Authority to produce and publish the Code, governing the production and publication of official statistics. One of the core principles of the Code is around data governance and this states that organisations should ensure that personal information is kept safe and secure.
- The ONS provides guidance, support, and training on matters across the GSS, including on data protection and privacy.
Data protection
- The Authority has a dedicated Data Protection Officer and teams and colleagues that manage data protection and legal compliance and the security of data. These teams provide advice and guidance across the organisation on data protection matters; deliver training sessions on the protection of data; and engage regularly with the ICO.
- The Authority takes a data protection by design approach when processing data for statistical purposes. Privacy and data protection issues are considered at the design phase of systems or projects. The Authority has published extensive material regarding privacy for members of the public including privacy information for those taking part in surveys and a Data Protection Policy. For new projects that involve the processing of personal data, colleagues are advised to complete Data Protection Impact Assessments, that enable the Authority to identify any risks of processing to data subjects and to mitigate those risks.
Statistical confidentiality
- The Authority collects a vast range of information from survey respondents, as well as administrative data, such as registration information on births, deaths and other vital events. The ONS publishes statistics and outputs from this information, and statistical disclosure methods are applied so that the confidentiality of data subjects, including individuals, households, and corporate bodies, is protected. All statistical outputs are checked for disclosure risk, and disclosure control techniques are applied as required.
- The DEA facilitates the linking and sharing of de-identified data by public authorities for accredited research purposes to support valuable new research insights about UK society and the economy. The Authority is the statutory accrediting body for the accreditation of processors, researchers and their projects under the DEA.
- The Authority allows access to de-identified data within its trusted research environments. To ensure the security of the data and individual privacy, the Authority uses the Five Safes Framework:
- Safe People: trained and accredited researchers trusted to use data appropriately.
- Safe Projects: data that are only used for valuable, ethical research that delivers clear public benefits.
- Safe Settings: settings in which access to data is only possible using our secure technology systems.
- Safe Data: data that have been de-identified.
- Safe Outputs: all research outputs that are checked to ensure they cannot identify data subjects.
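To illustrate the kind of disclosure control that can be applied when outputs are checked, the following minimal Python sketch shows two common generic techniques: suppressing small counts and rounding counts to a base. The threshold of 10, the base of 5 and the function names are assumptions chosen for illustration; they are not the Authority's actual rules or tools.

```python
from typing import Dict, Optional


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = 10
) -> Dict[str, Optional[int]]:
    """Replace any count below the (assumed) threshold with None.

    Zero counts are also suppressed in this sketch; real rules may differ.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


def round_to_base(counts: Dict[str, int], base: int = 5) -> Dict[str, int]:
    """Round each count to the nearest multiple of the (assumed) base."""
    return {
        category: base * round(count / base)
        for category, count in counts.items()
    }


# Hypothetical table of counts by area.
table = {"Area A": 3, "Area B": 25, "Area C": 12}

print(suppress_small_cells(table))
print(round_to_base(table))
```

Suppressing a single cell is not always sufficient on its own: if row or column totals are also published, a suppressed value may be recoverable by subtraction, which is why output checking considers the table as a whole.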
Security controls
- The protection of data is a top priority for the Authority, and we implement and operate substantial security measures for our staff, data and services. This security focus ensures that the Authority operates and continues to develop secure options that meet its objectives for data use and maintains public trust in how we access, use, process, store, and make available data for statistics and research purposes.
- To ensure the confidentiality, integrity and availability of our data are protected at all times, we operate a security management framework, which continuously evaluates the threat landscape, assesses the security risks and ensures that the appropriate controls are in place, so that we are operating within corporate risk appetite, maintaining a strong security posture and complying with the relevant legislation, Codes of Practice and industry best practice. This is underpinned by a robust secure by design approach, comprehensive protective monitoring, internal and external assurance and the training of our staff.
What does it mean to use data ethically, in the context of statistics and analysis?
- The Authority owns a set of six ethical principles relating to the use of data for research and statistics. These principles cover: public good, confidentiality & data security, methods & quality, legal compliance, public views & engagement, and transparency. The production, maintenance and review of these principles are conducted by the National Statistician’s Data Ethics Advisory Committee (NSDEC). The NSDEC was established to advise the National Statistician that the access, use and sharing of public data, for research and statistical purposes, is ethical and for the public good. The NSDEC consider project and policy proposals, which make use of innovative and novel data, from the ONS, the GSS and beyond, and advise the National Statistician on the ethical appropriateness of these. The NSDEC meet quarterly and have a key role in ensuring transparency around the access, use and sharing of data for statistical purposes.
- In 2021 the Centre for Applied Data Ethics (CADE) was established within the Authority. CADE provide practical support and thought leadership in the application of data ethics by the research and statistical community. The Centre provides a world leading resource that addresses the current and emerging needs of user communities, collaborating with partners in the UK and internationally to develop user-friendly, practical guidance, training and advice in the effective use of data for the public good. In addition to providing the secretariat to the NSDEC, the CADE mobilise the Authority’s six ethical principles via a self-assessment tool that is available to the entire research and statistics system. This tool supports researchers and analysts to identify ethical concerns in their work and then to engage with CADE to ensure mitigations and solutions are in place. Since January 2021, this tool has enabled nearly 900 pieces of ethical research and statistics and is growing in use by hundreds a year.
- Complementing the independent advice and guidance of the NSDEC and the self-assessment ethics services of CADE, the Centre also produces several bespoke ethics guidance pieces each year. These guidance pieces are typically produced in collaboration with an area of the ONS or the wider statistical system and focus on key concerns, such as identifying and promoting public good, considering public views and engagement, and specific ethical considerations in inclusive data, machine learning and geospatial data. Finally, the CADE also offer bespoke ethics support with specific projects, workstreams and teams and tailor their services to one-off events and longitudinal engagement work. This includes our international development programme where we work to support the work of various other National Statistical Institutes.
- The focus of the CADE’s activities is to ensure that the Authority’s ethical principles are promoted and accessible and that tools to ensure the principles are put into practice are effective and easy to use. We achieve this through promoting CADE at internal and external events, providing secretariat to the independent NSDEC, operating and providing oversight of the CADE self-assessment tool, producing specific, collaborative guidance pieces and providing bespoke ethics advice and support. By engaging with the CADE, researchers and analysts can ensure ethical practice, in line with the Authority’s ethical principles, in the production of research and statistics.
Are current processes and protections sufficient?
- The Authority has well-established processes and procedures in place to ensure the protection of the data of UK citizens. As the statistical and analytical landscape around data changes, as it did during the COVID-19 pandemic, the Authority ensures that it remains up to date with any changes to privacy legislation, regulatory guidance or cross-government good practice that could impact data subjects. This ensures a robust statistical system that produces public-good statistics that are trusted by the public.
- The ONS security management framework incorporates and references appropriate recognised security standards and guidance from within Government (Cabinet Office, National Cyber Security Centre (NCSC), Centre for the Protection of National Infrastructure (CPNI)) and international standards and best practice from international security organisations including ISO 27001, the US National Institute of Standards and Technology (NIST) and the Information Security Forum (ISF).
- From a data ethics perspective, there are dozens of organisations in the statistical system who display their ethical framework and commit to using it. CADE goes beyond this by evidencing, transparently, the impact that engaging with CADE is having on the production of research and statistics. It does so numerically, through the number of projects that CADE and the NSDEC see each year, and in more detail, through the production of case studies, publicly displayed meeting minutes and audits of projects that have been signed off. Where researchers and analysts engage with CADE and its services, ethical practice can be assured and evidenced.
Understanding and responding to evolving user needs
Who should official data and analyses serve?
- Our data, statistics and analysis serve the public through our statutory duty to “promote and safeguard the production and publication of official statistics that serve the public good” (as set out in the 2007 Act). Everyone is a user or potential user of our statistics and can use data to inform their decision making: from policy makers to enquiring citizens, including local businesses, charities and community groups.
- Within the ONS, we have established an Engagement Hub to enable us to coordinate our engagement with users, understand user needs, reach new audiences and evaluate our engagement.
- Users are at the heart of everything we do. When identifying priorities for analysis, we do so through:
- discussions with other government departments and the devolved administrations.
- local engagement: our new ONS Local service works with analytical communities locally and with the wider civic society to support further analysis to target local questions to address local issues.
- citizen focus groups with members of the public and the ONS Assembly with charities and bodies representing the interests of underrepresented groups of the population.
- drawing on external advice, for example the National Statistician’s Expert User Advisory Committee (who advise on cross-cutting issues).
- regular engagement activities with businesses and third sector organisations.
- A good example of the ONS reflecting the needs of users is the COVID-19 Latest Insights Tool, developed so that members of the public could find reliable, easy to understand information about the COVID-19 pandemic in one place. We engaged in user testing at key stages to make sure it met user need, and it became the most widely read product in the history of the ONS website.
- We also undertake annual stakeholder deep-dive research, which explores stakeholder needs, and a stakeholder satisfaction survey, which supports the evaluation of progress against our strategic objectives.
- According to the Public Confidence in Official Statistics (PCOS) 2021 report, a very high proportion of respondents trusted the ONS (89% of those able to express a view) and its statistics (87%). This was very encouraging to see. The survey also asked respondents about their level of trust in the ONS compared to other institutions in British public life. Of the institutions listed on the survey, the ONS had the highest levels of trust, similar to that of the Bank of England and the courts system.
- In terms of analysis, the Analysis Function strategy explains how we bring analysts across government together to deliver better outcomes for the public by providing the best analysis to inform decision making. The Function serves policy makers across Government and has regular conversations with stakeholders to ensure that our data, statistics and analyses are relevant to public policy priorities and delivered in a timely way.
- Our Analytical Hub aims to provide capacity and capability to deliver radical cross-cutting analysis that supports Government, civil society, and the public to understand the key questions of the day, responding flexibly and in a timely fashion to the ongoing economic and public policy priorities.
How are demands for data changing?
- Changes in society, technology, and legislation mean that more data are available, in richer and more complex forms, than ever before, with the COVID-19 pandemic shifting the expectations of users towards receiving more insights more rapidly. The pandemic and its impact on our society and the economy have led to more complex questions, which means that the need for data is accompanied by a need for growth in expertise and support to use data that reflects the intersectional nature of policy enquiry. Our statistics need to be quick, relevant, trusted and reliable to withstand public prominence and scrutiny, respond to a rapidly changing environment, and inform critical policy.
- The ONS aims to respond to the needs of the public, decision makers and society, including by providing data and insight on the topics and priorities of the day. We have already evolved our approach to respond to increasing demand for data, statistics, and analysis to be:
- More timely, through more rapid surveys, such as the Opinions and Lifestyle Survey, and the use of new data sources, like financial transaction and mobility data.
- More local, with the production of more granular and hyper-local data, such as Gross Value Added (GVA) at lower super output area level, which allows users to build up the bespoke geographies that matter to them, and greater support for local users and decision makers using our ONS Local service.
- More inclusive, through making our data more accessible and reflective of our users, allowing people to see themselves in our statistics and analysis, such as through our shopping prices comparison tool; and
- More relevant, both in terms of topics, for example looking beyond Gross Domestic Product (GDP) to consider multi-dimensional wellbeing alongside improved measures of economic performance, and in terms of how we disseminate our data and statistics, for example through application programming interfaces (APIs), to empower users to do their own analysis.
- We will continue to build on this progress as demands change, for example through the increasing availability and evolving possibilities for artificial intelligence (AI). We did this particularly well during the pandemic, and have since focused on new policy priorities, such as the rising cost of living, the changing nature of the labour market and the experiences of Ukrainian nationals arriving in the UK having been displaced through the conflict.
- As well as responding to emerging issues, we are making it easier for the public to find and consume insights on topics of interest by pulling together our many different data sets in the form of dashboards and data explorers. These include the Health Index, which brings together different datasets at local levels, subnational data explorers considering the economy, society, environment and more across local areas, the new UK measures of wellbeing providing 60 indicators across 10 domains, and the latest data & insights on the cost-of-living. In addition, data collected through the 2021 Census is being made available through our flexible table builder, articles of interest and interactive maps.
- Demands for data are also changing among expert users, with the rise of big data and the need for more data linkage across Government. This is why we are investing in the IDS to provide a secure environment for trusted researchers to analyse new, granular and timely data sources for the public good.
How do users of official statistics and analysis wish to access data?
- We have developed a deep understanding of our diverse user audiences and their unique needs and requirements. We have grouped website users into 5 persona groups:
- Technical User: Someone who only wants data and will create their own datasets and customise their own geography boundaries. Data from the ONS are frequently used in conjunction with data from other government departments. They may be expert at what they do with statistics but can be less expert at looking for base data. There is not the urgency we see from the expert analyst. They do not tend to use written publications.
- Expert Analyst: Someone who creates their own analysis from data. This user downloads spreadsheets into their own statistical models to create personal datasets. Access to the data for analysis is more important to them than its presentation.
- Policy Influencer: Someone who uses data for benchmarking and comparison. For some policy influencers, this requires data and analysis at a regional or local level. They rely on official government statistics, trusted by decision makers, for their reports.
- Information Forager: Someone who wants local data and keeps up to date with the latest economic and population trends to help them make practical, strategic business decisions. They often do not know exactly what to search for, until they come to it.
- Inquiring Citizen: Infrequent visitors to our site who search for unbiased facts about topical issues. They want simply worded, visually engaging summaries, charts and infographics. Data can help make informal decisions about pensions and investments. They engage on social media and browse with smartphones or tablets.
- We have found that citizen type users want ways to get data on their local area or to fact check data by using interactive tools, summaries, dashboards, visualisations and maps, whereas more data literate users are interested in the data itself and the associated metadata and methodology. Technically advanced analysts are also interested in being able to access data via APIs and for data to be easily used in tools such as Python and R. These technical users prefer not to have heavily formatted Excel spreadsheets with multiple tabs.
- We know many of our local users are keen to understand a place by many topics rather than go to several publications with multiple datasets for one datum. As such, we are developing our Explore Subnational Statistics service, that will allow users to select a geography and see metrics across a range of themes. ONS Local also helps local government users to bring together evidence across their area, alongside local intelligence and data and analysis to create greater insights.
- Our search engine optimisation strategy recognises that not all users need to or want to come to our website and that Google and the major search engines often represent our data directly in their search results. This is particularly applicable to those with accessibility needs, since many of these ways of representing our data can be returned via voice search on a variety of platforms.
- Citizen users want data communicated in a way that is easy for them to digest and research has shown that there is a degree of education required about our key topics such as inflation and GDP. Users may also be interested in their local areas as much as national level data.
- Within the ONS Engagement Hub, there are dedicated teams focussing on building relationships with different audience segments. The External Affairs team supports stakeholder engagement with key government stakeholders, business and industry groups, consumer bodies and think-tanks. The dedicated Outreach and Engagement team is focused on engaging with local authorities, and building sustainable relationships with community and faith groups, voluntary sector organisations and others representing the interests of those audiences traditionally less well represented in official statistics and government data.
How can we ensure that official data and analyses have impact?
- Ensuring what we are doing focuses on the topics that matter most to people and ensuring that it is disseminated in a way that is easy to understand, engaging and relevant to the audience, is key to achieving impact.
- For example, we worked with colleagues across government to establish new data collection on Ukrainian refugees and visa sponsors in the UK. This allowed us to provide invaluable insights on the experience of Ukrainians coming to the UK, and the impact on service provision in areas. The publication of this in English and Ukrainian both supported policy decisions around the humanitarian response and provided those impacted with the ability to read the findings in their own language.
- On cost of living, we delivered a broad information, engagement and communications programme that included promoting cost of living insights and data products to a wide range of users; diversifying engagement with non-expert users (for example 99 civil society and community groups attended a session on how they could benefit from our insights); and seeking user feedback to further improve our cost of living products and analysis. Impact from this includes a continued increase in the use of our insights tool with the personal inflation calculator being embedded into Guardian and BBC websites and the shopping prices comparison tool reaching over 700,000 uses in its first week.
- In June we launched a public consultation on the future of population and migration statistics in England and Wales. We engaged extensively with stakeholders before the consultation launch through sector specific round tables. The consultation launch itself was widely promoted across stakeholders in all sectors and around 500 people attended launch events and webinars. Engagement will continue throughout the consultation period to maximise awareness, understanding and response.
- We regularly review the impact of our work, providing impact reports to the Strategic Outputs Committee (formerly the Analysis and Evaluation Committee) on a quarterly basis, along with deep dives on priority topics.
- The ONS has access to a number of metrics that can be used to assess the impact of our outputs, specifically reach and awareness to understand the importance and relevance of the insight and content. We also test engagement levels to understand how well our content performs to achieve cut-through and add value to a debate. In addition, we use our own surveys to test appetite for future topics and outputs.
- Reach / awareness – to test importance and relevance of topic and insights.
- Web page views: unique sessions in which a page was viewed at least once, within seven days.
- Social media impressions: number of views per post on social media platforms, within 48 hours.
- Print, digital, broadcast: number of views / listens of ONS insight from outputs.
- Engagement – to test content cut-through, clarity of messages and the value added to a debate or discussion.
- Time spent on web page: time users spent viewing a specified page or screen, taken after seven days.
- Social media engagement: shares, favourites, replies and comments, URL click throughs, hashtag clicks, mention clicks and media views, taken after 48 hours.
- Print, digital, broadcast: cut through of ONS comments / main points within coverage.
- Online pop-up survey for targeted releases to test user satisfaction and help set continued improvements.
- An annual stakeholder survey and in-depth interviews targeted at government departments, charities, public institutions and businesses, tests satisfaction and use of statistics and analysis, and future needs.
- We bring these insights together, alongside granular stakeholder engagement to understand the impact of our work, at both a topic level, and our individual outputs to support ongoing decision making around both what we focus on, and how we can best maximise the impact of our work.
User engagement
- User engagement is key to making an impact. Our User Engagement Strategy for Statistics promotes a theme-based approach to user engagement. This allows all users of government data and statistics to interact with the GSS by their area of interest or by cross-cutting theme. This approach also aims to support collaboration with producers of official statistics to develop work programmes, address data gaps and help improve GSS products and services.
- We have created the ONS Assembly to support regular dialogue and feedback on delivering inclusive data with charities and bodies representing the interests of underrepresented groups of the population. The Assembly aims to be:
- A forum for the ONS to engage and have an open dialogue with charities and member bodies on a range of key topics.
- A space to build trusted, long-lasting relationships between members and the ONS.
- An opportunity for members to share insight, advice and feedback on behalf of their interests and audiences.
- A space to exchange news and move collaboratively toward the future of data.
- A route to help ensure vital themes, such as inclusivity, accessibility, wellbeing etc. are fully explored.
- Alongside working with users in local government, wider civic society, and the public, we build and maintain strong relationships with key policy makers in central government. These relationships with both local and central policy makers allow the ONS to understand the challenges they face. We can then help build their understanding of our statistics and analysis, and the wider evidence base, enabling greater insight into the topics that matter to our users and maximising the ONS’s impact on decisions that affect the whole country.
Communication
- The way we communicate our statistics has much improved in recent years, having a direct influence on our impact. Statisticians speaking directly to the public via television and radio helps the transparent communication of statistics, assisted by our amendment to publication times, which ensures parity of communication (from 26 March 2020 we moved release times for market-sensitive releases to 7:00, rather than 9:30, as agreed with OSR).
- This includes 226 broadcast media interviews undertaken by ONS spokespeople during the 2022/23 financial year, generating an average of 2.5k pieces of quoted coverage in the broadcast/online media each month, as well as our solid presence on Twitter (354.2k followers), which achieves good comparable engagement and reach, with threads created to support outputs and to respond to specific trends on social media.
- The ONS’s ‘Statistically Speaking’ podcast takes a deep dive into hot data topics and explains what’s behind the numbers. Between April 2022 and March 2023, the podcast had more than 18k downloads, including our most popular episode, ‘The R Word: Decoding ‘recession’ and looking beyond GDP’, which achieved 1,891 downloads in its first 30 days. In total since the podcast started in January 2022, it has achieved almost 30k unique downloads.
Dissemination
- Our approach to dissemination plays a pivotal role in maximising the insight and impact our data have. A prime example is our award-winning Census dissemination portfolio, with our Census maps offering users the ability to explore spatial patterns down to the neighbourhood level, empowering planners and policymakers to precisely target interventions, and helping any user to better understand their community. Since then, we have developed interactive and highly localised content to encourage audiences to engage with the more granular data, producing data visualisation tools and innovative content so citizens can explore the data that is important to them. Users responded positively, saying they were “visually excellent”, “personalisable, visual and really well presented”.
- To promote widespread reuse of our insights and thus amplify their reach and impact, we designed tools to encourage users to embed custom views in their websites and publications. The results have been remarkable, with Census maps accounting for an impressive 24% of total views on the ONS site, garnering around 30 million views since its launch in November 2022.
- We also released our custom area profiles product. We recognised that user needs often extend beyond predefined geographic areas. Users can now draw specific areas of interest and generate tailored profiles with indicators and comparisons that match their unique use cases. The outputs are also exportable for use in websites and presentations, and the product has already reached tens of thousands of users, bridging gaps in specialised expertise.
- To cater to time-constrained users and unlock the potential of data hidden in spreadsheets, we introduced semi-automated localised reporting. With algorithms generating approximately 350 customised reports, one for each local authority, key trends in respective areas are efficiently explained, making insight more accessible and impactful. These reports are extensively accessed and widely referenced by local authorities and other local users.
- Additionally, we enhanced the reach of these reports by making their content crawlable by search engines. Snippets from these reports now appear directly in search results and voice-based queries, further bolstering the impact of our data and analyses.
- In tandem with our work on the Census, we have goals to transform the presentation of ONS’s day-to-day insights, with a particular focus on enhancing offerings for the general public. We have a digital content team comprised of data visualisation specialists, data journalists, and designers focussed on collaborating with analytical teams to achieve this.
- This approach centres on addressing the most pertinent questions for our users, often focussed on creating more personalised and localised experiences. By empowering users to see themselves within our data, we establish a meaningful connection with our audience.
- Through these collaborations with analytical teams, we are reaching a much-expanded user base, with audiences engaging with our content 40% longer than typical offerings. Our commitment to delivering impactful, accessible, data-driven insights ensures our offerings resonate with diverse audiences and have a lasting impression.
- We gather a range of feedback on our digital products to develop and improve the usability of our content, including our interactive online content and tools. We also analyse how different user groups access our content and what they need: users of data, users of statistics and trends, and those who want a deeper understanding of topics.
- There were 6.4m users of the ONS website in 2022/23. Most users (4.3m) use a mobile device to access our website, with 1.9m on desktop and 0.2m on tablets. Engagement levels remain highest with desktop users. There were nearly 26m pageviews on ONS.gov.uk over 2022/23. These figures only represent users who accept cookies, whom we estimate to be approximately 30% of all users.
- Peak demand on the website was driven by census releases, with 8x higher daily page views on 29 November for the ethnic group, national identity, language and religion census data than average, and nearly 6x higher daily page views for the demography and migration releases for census on 2 November. The census first release on 28 June saw roughly double the daily average for the year.
- The most popular topics on ONS.gov.uk across the year were census (3.6m pageviews), covid (3.6m pageviews) and inflation (3.2m pageviews).
Analysis
- Ensuring that analysis is good quality will also help ensure that it has impact. The Analysis Function Standard sets expectations for the planning and undertaking of analysis to support well-informed decision making. It provides clear direction and guidance for all users and producers of government analysis.
- The Analysis Function also shares best practice through the Analysis in Government awards, which include an impact award.
- Maximising the impact of analysis across government and for the ONS depends on understanding the priorities of the day, both for the citizen and for decision makers at the heart of local and central government, and flexing at pace as new priorities emerge. This often means the evidence base may be less robust or that data do not exist, but the ONS’s Analytical Hub is constantly adapting to produce the best analysis at pace to support decision making. The ONS also scans the horizon to anticipate emerging issues.
How do we ensure that users, in the Civil Service, Parliament and beyond, have the skills they need to make effective use of data?
- There are a range of initiatives aimed at improving the analytical skills of civil servants and beyond.
The Analysis Function
- The Analysis Function is a network for all civil servants working in government analysis and aims to help improve the capability of all analysts across government. The Analysis Function website hosts the dedicated analysis function curriculum webpages alongside a range of technical analytical learning for all, as well as a guidance hub providing access to key analysis guidance. The Function also hosts regular information sharing events and webinars.
- The Function works with the policy profession and other teams across government to ensure we are building a level of analytical capability specifically for non-analysts. The Analysis Function has also developed a learning pathway specifically for non-analysts in line with wider government reform priorities.
- The Analysis Function conducted a review in 2022 of the analytical capability of policy officials. Since then, we have been working closely with the policy profession unit through a dedicated implementation working group looking to address the recommendations from the review. Progress has been made against several actions, including the launch of the analytical literacy course, the data master class and the policy to delivery pilot.
The Methodology Advisory Service
- The Methodology Advisory Service (MAS), based within the ONS, offers advice, guidance and support to the public sector, nationally and internationally, using teams of experts covering different areas of statistical and survey methodology. We offer an advisory service for:
- methodological advice on production and analysis of data
- development of surveys or outputs
- feasibility studies
- methodological research to answer complex problems
- quality assurance of methods or outputs
- cross-cutting reviews of processes and methods across a department’s statistical work
- evaluation of competing sources
- health checks before an OSR assessment
- The ONS’s methodologists and researchers receive their own methodological advice from the Methodological Assurance Review Panel (MARP). The panel provides external, independent assurance and guidance on the statistical methodology underpinning ONS statistical production and research.
The Data Science Campus
- The Data Science Campus is at the heart of leading-edge data science capacity building with public sector bodies in the UK and abroad. We equip analysts with the latest tools and techniques, giving them the capability to perform effectively in their roles. We also work in partnership with organisations to ensure they have the capacity to develop their own data science skills in the long-term.
- Our evolving range of programmes reflects our focus on using data to drive innovation for public good, and provides analysts across the ONS, the UK public sector and international partners with a developmental framework to build capacity and enhance analytical capability:
- Data Science Accelerator
- Data Science Graduate Programme
- Degree Data Science Apprenticeship
- Masters in Data Analytics for Government
- Cross-government and Public Sector Data Science Community
ONS Local
- The ONS Local service provides peer-to-peer forums and platforms for local, regional, and national analytical communities to share best practice, and helps local users navigate around the extensive subnational offer from the ONS, both what is already available and what’s in development, and wider UK government data. For example, “ONS Local Presents” webinars allow ONS teams and analysts from local or central government to present analysis on a topic to a wide audience for feedback and challenge or to showcase innovation in techniques or data that may be useful to others. We have also held the first in a series of “ONS Local How to” workshops, aimed at a similar audience and run jointly with the Data Science Campus, to support local government analysts in creating dashboards and using APIs.
ONS Outreach and Engagement
- Finally, the ONS Outreach and Engagement Team are piloting and developing a programme of online engagement activities to help improve data literacy among underrepresented groups, non-expert users and those less likely to engage with data. The sessions vary in topic across the range of statistical production and collection themes at ONS and include a range of engagement formats. Topics and activities so far have included an introduction to the ONS and census webinars, Q&As on how to use census data, and show and tells, demonstrations or learn-ins on data tools such as the Cost of Living Insights Tool, the Census Maps Tool and the Build a Custom Data Set Tool.
- These sessions can be tailored to the audience, including civil service colleagues who may be less confident or engaged with data, and aim to improve awareness and understanding of the foundations of data use and production.
Professor Sir Ian Diamond, National Statistician
Office for National Statistics
August 2023
Office for Statistics Regulation response
Introduction
About us
- The Office for Statistics Regulation (OSR) is the independent regulatory arm of the UK Statistics Authority. In line with the Statistics and Registration Service Act 2007, our principal roles are to:
- set the statutory Code of Practice for Statistics (the Code).
- assess compliance with the Code to ensure statistics serve the public, in line with the pillars of Trustworthiness, Quality and Value. We do this through our regulatory work that includes assessments, systemic reviews, compliance checks and casework.
- award the National Statistics designation to official statistics that comply fully with the Code.
- report any concerns on the quality, good practice and comprehensiveness of official statistics.
- While our formal remit covers official statistics, we also encourage organisations to voluntarily apply the Code to demonstrate their commitment to trustworthy, high quality and valuable statistics. Our 5-year plan sets out our vision and priorities for 2020-2025 and how we will contribute to fostering the Authority’s ambitions for the statistics system. Our annual business plan shares our focus for the current year.
Data and analysis in Government
How successfully do Government Departments share data?
- For the last five years, OSR has been monitoring and commenting on data sharing and linkage across government, producing reports to understand issues and identify opportunities to move the wider system forward. We are an advocate and a champion for data sharing and linkage, when this is done in a secure way that maintains public trust. It is our ambition that sharing and linking datasets, and using them for research and evaluation, will become the norm across the UK statistical system.
- Our latest data sharing and linkage report takes stock of data sharing and linkage across government. There has been some excellent progress in creating linked datasets and making them available for research, analysis and statistics.
- The Office for National Statistics (ONS) recently published statistics on sociodemographic inequalities in suicides. These linked demographic and socioeconomic data about individuals from the 2011 Census with death registration data and, for the first time, showed estimates for rates of suicide across a wide range of different demographic groups. The ONS believes this analysis will support the development of more effective suicide prevention strategies.
- Data First aims to unlock the potential of Ministry of Justice (MoJ) data by linking administrative datasets from across the justice system and enabling accredited researchers, from within government and academia, to access the data. Data First is also enhancing the linking of justice data with data from other government departments, such as the Department for Education (DfE), where linking data has unlocked a wealth of information for researchers about young people who interact with the criminal justice system.
- Better Outcomes through Linked Data (BOLD), led by the MoJ, is a three-year cross-government data-linking programme which aims to improve the connectedness of government data in England and Wales. It was created to demonstrate how people with complex needs can be better supported by linking and improving the government data held on them in a safe and secure way.
- Our report highlights an emerging theme on the overall willingness to share and link data across government and public bodies. The benefits and value of doing so are widely recognised, with the COVID-19 pandemic helping to change mindsets and highlight opportunities that exist for greater collaboration and sharing.
- However, through speaking with stakeholders across the data sharing and linkage landscape during our review, we also found there is still uncertainty about how to share and link data in a legal and ethical way, and about public perception of data sharing and linkage. There is also a lack of clarity about data access processes and data availability and standards across government. Together, these factors can lead to a nervousness to share and link data, which can cause blockages or delays.
- The picture is not the same in every area of government. Some areas have moved faster than others and we have found that culture and people are key determinants of progress.
- In the report, we summarise and discuss our findings within four themes in the context of both barriers and opportunities:
- Public engagement and social licence: The importance of obtaining a social licence for data sharing and linkage and how public engagement can help build understanding of whether/how much social licence exists and how it could be strengthened. We also explore the role data security plays here.
- People: The risk appetite and leadership of key decision makers, and the skills and availability of staff.
- Processes: The non-technical processes that govern how data sharing and linkage happens across government.
- Technical: The technical specifics of datasets, as well as the infrastructure to support data sharing and linkage.
- Overall, data sharing and linkage in government stands at a crossroads. Great work has been done and there is the potential to build on this. However, there is also the possibility that, should current barriers not be resolved, progress will be lost.
- Our review makes 16 recommendations that, if realised, will enable government to confront ingrained challenges, and ultimately to move towards greater data sharing and linkage for the public good. OSR will follow up with the organisations named in our recommendations to monitor how they are being taken forward.
The changing data landscape
Is the age of the survey, and the decennial Census, over?
- Statistics producers are increasingly turning to alternative data sources in the production of official statistics, in light of challenges with survey data collection and increased recognition of the potential of alternative data sources. Administrative data (that is, data that are primarily collected for administrative or operational purposes) are increasingly used to produce official statistics across a range of topics including health, such as waiting times data; crime, such as police recorded crime data; and international migration, such as borders and immigration data. Challenges faced during the COVID-19 pandemic highlighted society’s need for timely statistics and further demonstrated the potential of administrative data.
- However, such methods are unlikely to be able to capture all aspects of our population and society and therefore surveys are likely to play an ongoing but changing role in the statistical system. For instance, many crimes are not reported to the police, and data quality for some crime types is poor, so users cannot rely exclusively on administrative datasets of police recorded crime. To get a full picture of crime, both police recorded crime and the Crime Survey for England and Wales will always need to be used alongside each other.
- Moreover, there is strong interest in opinion and perception data, such as those from the successful ONS Business Insights and Conditions Survey. Our Visibility, Vulnerability and Voice report on statistics on children and young people also demonstrated the strong user demand for, and importance of, data that include children’s voice about their experiences and see the child holistically. These insights would not be available through administrative sources.
What new sources of data are available to government statisticians and analysts?
- We highlight in our State of the Statistical System 2022/23 report that the increasing availability of new data sources such as administrative data, management information and growing use of artificial intelligence should be seen as an opportunity for the statistical system.
- Administrative data are helping to provide new insights and improve the quality of statistics. For example, the Department for Work and Pensions (DWP) is exploring the integration of administrative data into the Family Resources Survey (FRS) and related outputs through its FRS Administrative Data Transformation Project.
- The ONS has developed experimental measures of inflation using new data sources, including scanner and web-scraped data, and has published experimental analysis using web-scraped data to look at the lowest cost grocery items. Their Consumer Prices Development Plan details the new sources of data that can be used and the insights they can bring.
- Technology can also provide opportunities to collect data in different ways, such as DfE pupil attendance data that is automatically submitted from participating schools’ management systems and allows for more timely analysis of attendance in schools in England. This data collection won the RSS Campion Award for Excellence in Official Statistics.
What are the strengths and weaknesses of new sources of data?
- In the wider context of technological advances, statistics need to remain relevant, accurate and reliable, and new data sources support this ambition. However, with the use of these new and innovative data sources in the production of official statistics, producers need to manage risks around quality. Moreover, with more use of data science and statistical models in the production of official statistics it is crucial that producers ensure that any development of models is explainable and interpretable to meet the transparency requirements of the Code.
- To maximise the opportunities from new data sources, the role of the statistician has to evolve and keep pace with the increasing use of data science techniques. Our latest State of the Statistical System report highlights the difficulties producers have getting people with the right skills in post; these challenges are not being consistently felt across the whole UK statistical system. There is a concerning risk that continued financial and resource pressures will hinder future progress and evolution of the system to keep pace with increasing demand. A successful statistical system that is able to utilise new data sources depends on having a workforce that is sufficiently resourced and skilled to deliver.
- New data sources often provide insights in a timelier manner (in some instances this can be near real time, such as England’s school attendance data) and provide better coverage (such as the web-scraped and supermarket prices data often including all transactions or prices). On the other hand, there is a risk that the data may not measure what people want to measure, and there is no option to amend or edit the data or the questions being asked. Producers also have little control over the coherence and comparability of the data; there may be differences in how organisations record their data as well as between datasets on a similar topic. Data could also be missing for some observations and variables, and the data could be biased by only covering certain groups of people or transactions.
Protecting privacy and acting ethically
What does it mean to use data ethically, in the context of statistics and analysis?
- As the regulator of official statistics in the UK, it is our role to uphold public confidence in statistics. In our view, an oft-neglected question of data ethics concerns not so much how data are collected and processed, but how the resulting statistics are used in public debate. As a result, we consider the question of whether a particular use is misleading to be intrinsically an ethical one.
- Misleadingness is one of the areas on which we continue to develop our thinking: we published a think piece on misleadingness in May 2020 and followed up on our initial thinking in May 2021. The latter focuses on feedback to the first think piece that it is important to distinguish between the production of statistics and the use of statistics, as well as identifying areas not covered in the original think piece, like the risk of incomplete evidence. Based on our findings, our thinking has evolved to be clearer on the circumstances in which it is relevant to consider misleadingness: “We are concerned when, on a question of significant public interest, the way statistics are used is likely to leave audiences believing something which the relevant statistical evidence would not support.”
- We are launching a review of the Code of Practice for Statistics in September. As part of it, we will be asking the question “what are the key ethical issues in the age of AI: how do we balance serving public good with the potential for individualised harms?”. The review will run until December, and we will be highlighting how people can engage and contribute, including a planned panel session on this topic.
Understanding and responding to evolving user needs
Who should official data and analyses serve? How do users of official statistics and analysis wish to access data?
- OSR’s vision, based on our founding legislation, is that statistics serve the public good. In 2022 we worked in partnership with ADR UK to explore what the term ‘public good’ means to the public. We found that research and statistics should aim to address real-world needs, including those that may impact future generations and those that only impact a small number of people. There was also clear evidence that members of the public want to be involved in making decisions about whether public good is being served, through meaningful public engagement and full, transparent and easy access to the decision-making process of Data Access Committees (which evaluate applications from trained and accredited researchers for the use of de-identified data for research).
- In 2021, we published a report looking at Defining the Public Good in Applications to Access Public Data. The report highlights how researchers see their research as serving the public good or providing public benefits, and how this differed between organisations. For example, the most frequently mentioned public benefits in National Statistician’s Data Ethics Advisory Committee (NSDEC) applications were to improve statistics and service delivery, whereas Research Accreditation Panel (RAP) applications mentioned policy decisions and societal benefit more.
How are demands for data changing?
- There continues to be a significant shift in government and public demand for statistics and data from COVID-19 to other key issues. The statistical system has demonstrated its responsiveness to meet these data needs. However, as mentioned at paragraph 17, pressure on resources and finances poses a significant threat to the ability of government analysts to produce the insight government and the wider population needs to make well-informed decisions.
- Working in an efficient way will help address one part of this problem: it will help ensure maximum value is achieved with the resources that are available, which will in turn help others across government appreciate the benefit of having analysts at the table. Our blog on smart statistics: what can the Code tell us about working efficiently highlights ways to support efficiency based on the Code.
- Users of statistics and data should always be at the centre of statistical production; their needs should be understood, their views sought and acted on, and their use of statistics supported. We encourage producers of statistics to have conversations with a wide range of users to identify where statistics can be ceased, or reduced in frequency or detail, to save resources if appropriate. This can free up resource, while helping producers to fulfil their commitment to producing statistics of public value that meet user needs. Ofsted has recently done this to great effect.
- The UK statistical system should maintain the brilliant responsive and proactive approach taken during the COVID-19 pandemic and look to do this in a sustainable way. Improvements to data infrastructure, processes, and systems could all help. For example, the use of technology and data science principles, such as those set out in our 2021 Reproducible Analytical Pipelines (RAP) review, supports the more efficient and sustainable delivery of statistics. This review includes several case studies of producers using RAP principles to reduce manual effort and save time, alongside other benefits. The recent Analysis Function RAP strategy sets out the ambition to embed RAP across government, and the Analysis Function can offer RAP support through its online pages, its Analysis Standards and Pipelines Team and the cross-government RAP champion network.
- Statistics and data should be published in forms that enable their reuse, and opportunities for data sharing, data linkage, cross-analysis of sources, and the reuse of data should be acted on. The visualisations and insights generated by individuals, from outside the statistical system, using easily downloadable data from the COVID-19 dashboard nicely demonstrate the benefits of making data available for others to do their own analysis, which can add value without additional resource from producers.
- Promoting data sharing and linkage, in a secure way, is one of OSR’s priorities and we are currently engaging with key stakeholders involved in data to gather examples of good practice, and to better understand the current barriers to sharing and linking. This will be used to champion successes, support positive change, and provide opportunities for learning to be shared.
- Overall success requires:
-
- independent decision making and leadership, in particular Chief Statisticians and Heads of Profession for Statistics having authority to uphold and advocate the standards of the Code.
- professional capability, again demonstrating the benefit of investing in training and skills, even when resources are scarce.
How can we ensure that official data and analyses have impact?
- To have impact, official data and analysis need to serve the public good (by being high quality, trusted and valued) and be well communicated.
- This is reflected in the three pillars of our Code: Quality sits between Trustworthiness, representing the confidence users can have in the people and organisations that produce data and statistics, and Value, ensuring that statistics support society’s needs for information. All three pillars are essential for achieving statistics that serve the public good. They each provide a particular lens on key areas of statistical practice that complement each other and help to ensure the data are being used as intended.
- Quality is not independent of Trustworthiness and Value. A producer cannot deliver high quality statistics without well-built and functioning systems and skilled staff. It cannot produce statistics that are fit for their intended uses without first understanding the uses and the needs of users. This interface between quality, its institutional context and statistical purpose is also reflected in quality assurance frameworks (QAF), including the European Statistical System’s QAF and the International Monetary Fund’s DQAF. The Code is consistent with these frameworks and with the UN Fundamental Principles of Official Statistics.
- We use assessments and compliance checks to judge compliance with the Code for individual sets of statistics or small groups of related statistics and data (for example, covering the same topics across the UK). Whether we use an assessment or compliance check will often be determined by balancing the value of investigating a specific issue (through a compliance check) versus the need to cover the full scope of the Code (through an assessment).
- There is no ‘typical’ assessment or compliance check – each project is scoped and designed to reflect its needs. An assessment will always be used when it concerns a new National Statistics designation and will also be used to undertake in-depth reviews of the highest profile, highest value statistics, especially where potentially critical issues have been identified.
- We have some useful guidance that can assist producers in their quality management. We published a guide to thinking about quality when producing statistics following our in-depth review of quality management in HMRC, and released a blog to accompany our uncertainty report. It highlights some important resources, top among them the Data Quality Hub guidance on presenting uncertainty. Our quality assurance of administrative data (QAAD) framework is a useful tool to reassure users about the quality of the data sources.
- To support statistics leaders in developing a strategic approach to applying the Code pillars and a quality culture, we have developed a maturity model, ‘Improving Practice’. It provides a business tool to evaluate the statistical organisation against the three Code pillars and helps producers identify the current level of practice achievement and their desired level, and to formulate an action plan to address the priority areas for improvement for the year ahead.
- We are also continuing to promote a Code culture that supports producers opening themselves to check and challenge as they embed Trustworthiness, Quality and Value, because in combination, the three pillars provide the most effective means to deliver relevant and robust statistics that the public can use with confidence when trying to shine a light on important issues in society.
- In our report on presenting uncertainty in the statistical system we found that presenting uncertainty in a meaningful, succinct way that delivers the key messages can be challenging for producers. We found that typically, uncertainty is better depicted and described in statistical bulletins and methodological documents than it is in data tables, data dashboards and downloadable datasets.
- We also found that there is a wide and increasing range of guidance and advice to help producers think about how to best present uncertainty. OSR will do more to promote and support good practice and consider what this means for our regulatory work. We will focus on the judgements that we make and the guidance we produce to help producers to improve the presentation of uncertainty.
- In our report, we concluded that showing uncertainty in estimates, for example through data visualisation, is essential in improving the interpretation of statistics and in bringing clarity to users about what the statistics can and cannot be used for. At the same time, however, we recognise that this is not always a straightforward task. With support from us and those at the centre of the Government Statistical Service (GSS), we encourage Heads of Profession for Statistics to review whether uncertainty is being assessed appropriately in their data sources, and to review how this is presented in all statistical outputs.
- We will continue to review the communication of uncertainty in our regulatory projects. We already have a good range of experience and effective guidance to help review uncertainty presented in statistical bulletins and methodology documents.
How do we ensure that users in the Civil Service, Parliament and beyond, have the skills they need to make effective use of data?
- Intelligent transparency is fundamental in supporting public trust in statistics. Our campaign and guidance aim to ensure an open and accessible approach to communicating numbers.
- In our blog What is intelligent transparency and how you can help?, we highlight our expectation that at its heart intelligent transparency is about proactively taking an open, clear and accessible approach to the release and use of data, statistics and wider analysis. We also recognise that whilst we will continue to champion intelligent transparency and equal access to data, statistics and wider analysis, it isn’t something we can do on our own. Our expectations for transparency apply regardless of how data are categorised. For many who see numbers used by governments, the distinction between official statistics and other data, such as management information or research, may seem artificial. Therefore, any data which are quoted publicly or where there is significant public interest should be released and communicated in a transparent way.
- We need users of data to continue to question where data comes from and if it is being used appropriately. We also need those based in a department or a public body to champion intelligent transparency in their team, their department and their individual work, to build networks to promote our intelligent transparency guidance across all colleagues and senior leaders, and to engage with users to understand what information they need for their work, so as to inform the case for publishing it.
- Parliamentarians also have a role to play in ensuring intelligent transparency in debate. This includes advocating for best practice around the use of statistics and calling out misuse of statistics where it occurs. Following the principles of intelligent transparency allows the topic discussed to remain the focus of conversation, rather than the provenance of the data.
- We have launched a communicating statistics programme that will in part look to understand how users want to access data and help support producers to communicate their data through those different means. This will include reviewing our existing guidance to understand what more we can do to support the use and range of communication methods while preventing and combatting misuse.
Statistical literacy
- In our regulatory work, when people talk to us about statistical literacy it is often in the context of it being something in which the public has a deficit. For example, ‘statistical literacy’ may be cited to us as a factor in a general discussion on why the public has a poor understanding of economic statistics. OSR commissioned a review of published research on this topic area and published an accompanying article to investigate if this was indeed the case.
- We found wide variability across the general public in the skills and abilities that are linked to statistical literacy. Our review highlights that a substantial proportion of the population display basic levels of foundational skills and statistical knowledge, and that skill level is influenced by demographic factors such as age, gender, education and socioeconomic status.
- Given this, we think that it is important that statistical literacy is not viewed as a deficit that needs to be fixed, but instead as something that is varied and dependent on the context of the statistics and factors that are important in that context. Therefore, rather than address deficits in skills or abilities, we recommend that producers of statistics focus on how best to publish and communicate statistics that can be understood by audiences with varying skill levels and abilities.
- Our review identified good evidence on how best to communicate statistics to non-specialist audiences in the following areas:
- Target audience: Our evidence endorses the widely recognised importance of understanding audiences. The evidence highlights that the best approach to communicating information (including data visualisations) can vary substantially depending on the characteristics of the audience for the statistics. Considering the target audience’s characteristics is, therefore, an important factor when designing communication materials.
- Contextual information: Contextual information helps audiences to understand the significance of the statistics. Our evidence highlights the importance of providing narrative aids, and also that providing statistical context can help to establish trust in the statistics. Again, this supports and reflects existing notions of best practice.
- Establishing trust: As well as providing context, we found evidence that highlighting the independent nature of the statistical body and, when needed, providing sufficient information so that the reasons for unexpected results are understood, can increase trust in the statistics. This finding aligns with the Trustworthiness pillar of the Code.
- Language: In the statistical system, statistics producers recognise that they should aim for simple, easy-to-understand language. We found evidence to endorse this recognition and, in particular, that the level of any technical language used should be dictated by the intended target audience.
- Format and framing of statistical information: We found evidence that different formats (e.g., probability, percentage or natural frequency) and/or framing (e.g., positive or negative) in wording can lead to unintended bias or affect perceptions of the statistics and both need to be considered. This finding is probably the one which is least widely recognised in current best practice in official statistics, and we consider it is an area that would benefit from further thinking.
- Communicating uncertainty: Communicating uncertainty is important and may need to be tailored depending on the information needs and interest levels of the audience. This topic is a particular focus area for OSR, and we discussed our report on communicating uncertainty at paragraph 39.