What are the critical data gaps?

Across the UK data infrastructure, a considerable amount of data exists to explore the experiences and outcomes of a range of groups of people with different characteristics. As a core demographic variable, sex is collected in most administrative and survey data. The UK collects, reports and produces considerable analyses of data on disability. Furthermore, the UK has, compared with most other European countries, rich data on ethnicity based on detailed consultation and piloting. Ethnic group data are available across many official data sources, including administrative data, surveys and the UK censuses, though data quality and granularity vary across sources. The UK also has good data on religion from the censuses and some government surveys, although the religion question is voluntary and is therefore not answered by all respondents. The harmonised question response options for both religion and ethnicity at the most granular level are also not the same for all the countries of the UK.

Nevertheless, our consultation activities identified a number of areas where participants feel there are critical data gaps, with 70% of organisations and 61% of respondents to the online consultation stating that data gaps had impacted their ability to answer the questions of most importance to them. These critical data gaps can be broadly separated into groups or characteristics that are missing completely from the data and those where insufficient data are available or are of insufficient quality or granularity to meet user needs.

Groups or characteristics missing from the data

Across the consultation activities, a number of groups were repeatedly identified for whom even basic demographic information is missing. These included:

  • non-household populations (for example, members of residential establishments such as care homes or prisons, and homeless people, particularly those who do not access any services for rough sleepers)
  • transgender, non-binary, and gender-diverse people
  • groups often deemed “harder to reach” (for example, Gypsy, Roma and Traveller groups, ex-prisoners, asylum seekers, victims of domestic violence and undocumented migrants or victims of human trafficking)

Some of these groups include the most vulnerable and disadvantaged people in the UK, which makes the absence of data reflecting their lives and experiences especially critical.

Children are another group that many identified as missing from the data. Where we do have data for them, these are often collected from people other than the children themselves, and therefore children’s own voices may not be heard. The Nuffield Foundation has identified a number of critical gaps in the data on children. These include a lack of information on all areas of life for looked-after children, as well as under-representation of children who have experienced abuse or neglect in early childhood and a lack of information on their outcomes.

A report from the London School of Economics also noted the lack of data to understand child poverty and multidimensional disadvantage among children, specifically identifying young carers, migrant children, Gypsy, Roma and Traveller children, and children at risk of abuse or neglect as groups that are “missing” from or “invisible” in existing data. These groups were also highlighted by participants in our consultation activities.

Various governmental participants described gaps in understanding the digitally excluded population, the reasons why, and the circumstances in which, people may be at risk of digital exclusion, and the extent to which they are not represented in routine data collection. Participants felt that this has likely been exacerbated by measures put in place in response to the coronavirus (COVID-19) pandemic, for example with many surveys now being adapted to online platforms, potentially generating charges for participants and restricting access.

While there will be many variables and types of data that could potentially be useful in understanding inequalities across the population, participants in our consultation activities identified two key variables as having policy and explanatory relevance. The first was income data, which are rarely collected alongside personal characteristic information; their absence from census data was identified as a critical gap because income is seen as essential to understanding disadvantage. The second was socio-economic background, an important variable for understanding topics such as educational inequalities, though rarely available in government data and not included in the censuses; it is often only available in the form of rough approximations, such as the binary Free School Meals eligibility measure in the Department for Education administrative data. Participants suggested that these should be included more regularly within data that are collected, alongside personal characteristics.

Groups for whom there are insufficient data

Even for those who are included to some extent in the data infrastructure, there are gaps in the information that is collected.

Although data on sexual orientation are collected in several UK data sources, there is a scarcity of information on the differing experiences and outcomes of people in terms of their sexual orientation.

In addition, despite pregnancy and maternity being protected characteristics under equalities legislation in Great Britain, information on inequalities in pregnancy and pregnancy outcomes is partial. For example, recognition of racialised differences in maternity outcomes was only made possible by the collection of data by a charitable organisation over the last few years.

Gaps in the data on religion were noted both by academic and learned society participants and by respondents to the online consultation. Specifically, a number of participants said that religion is often not collected in surveys and, when collected, is not routinely reported or is often conflated with beliefs and practices, which can obscure inequalities.

Across several consultation activities, participants also described a lack of data on relevant personal characteristics in administrative data sources to help with understanding equalities issues. To our knowledge, no government data at all are collected on the operation of caste in the UK, despite qualitative evidence on caste discrimination.

Groups for whom data are of insufficient quality

Even where relevant groups are included in survey or administrative data, there are risks that the quality of these data is poor. First, as noted in the section on groups or characteristics missing from the data, information on children’s characteristics may not be collected directly from the children themselves but provided by their parents, carers, teachers or others responsible for children. The same may apply to household residents who are temporarily absent, for example in a communal establishment. Information provided by proxies may be inaccurate. In the case of administrative data, it is often unclear whether the data were reported by the individual concerned or not.

Second, there can be potential problems of missingness in data on items relevant to inclusion. For example, census questions on religion are voluntary and therefore have lower response levels than compulsory questions. There also appear to be relatively high levels of non-response on ethnic group in some administrative data sources. The government’s Ethnicity Facts and Figures website shows that for some administrative data, ethnic group was not available for over 20% of cases. This may lead to bias in the data but, without clarity on who this information is missing for, it is not possible to ascertain the direction of any bias.
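
The point about the direction of bias can be illustrated with a small worked sketch. The figures below are hypothetical and chosen only for illustration (a group making up 10% of the cases with a recorded ethnic group, and 20% of cases with no ethnic group recorded); the function name `share_bounds` is likewise an assumption, not part of any official method. The sketch computes the lowest and highest share the group could have among all cases, depending on who the missing cases are.

```python
def share_bounds(observed_share, missing_rate):
    """Return the (lowest, highest) possible share of a group among ALL cases.

    observed_share: the group's share among cases where the item was recorded.
    missing_rate:   the proportion of all cases where the item is missing.
    Both inputs are hypothetical illustration values between 0 and 1.
    """
    known = 1.0 - missing_rate
    # Lowest share: none of the cases with missing data belong to the group.
    lowest = observed_share * known
    # Highest share: every case with missing data belongs to the group.
    highest = observed_share * known + missing_rate
    return lowest, highest


low, high = share_bounds(observed_share=0.10, missing_rate=0.20)
print(f"True share lies between {low:.0%} and {high:.0%}")
```

Under these assumed figures, the group's true share could be anywhere from 8% to 28%, so the 10% seen in the recorded data could be an overestimate or an underestimate. Without information on who the missing cases are, the data alone cannot say which.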

Third, information from previously collected data sources (for example in a panel study or when data from different sources are linked) can become inaccurate over time, as few characteristics are permanently fixed. Characteristics can change as groups express preferences for particular categorical groupings or because different groupings are suitable for different purposes. There are, for example, occasions when the use of “Black” or “Asian” categories may be appropriate, but more granular terms are often necessary. In addition, many important characteristics, such as disability status (for example as a result of a stroke or an accident) or socio-economic position (for example as a result of redundancy), can change unpredictably.

These three issues all potentially lead to greater “noise” in the data, leading to a more blurred picture, or to bias, which can produce a misleading picture. Sub-optimal data quality and the under-representation of certain populations or groups within data could result in a range of impacts, including discrimination, misrepresentation, reduced life chances, hidden harm and potentially even loss of life for those in highly vulnerable circumstances. As a result, addressing these gaps is viewed as a priority.
