How inclusive are the UK’s current approaches to data collection?

Data are crucial for understanding the needs and circumstances of different groups of people, enabling the translation of information into insights, from which action can be taken. This is only achievable if we collect complete and appropriate data, to ensure that everyone counts and is counted, and no one is left behind. Our consultation activities identified a range of issues that need to be addressed for us to be able to make this happen.

Addressing participation in data collection activities

Research with Civil Society Organisations (CSOs) and individuals from relevant groups and populations indicated that a wide range of practical, cultural and emotional factors affect people’s willingness, ability and opportunity to provide their personal information and participate in formal research exercises. These can broadly be separated into issues around trust and trustworthiness, willingness to participate and the accessibility of data collection. However, these issues are not mutually exclusive, and a combination of factors may affect people’s ability to engage with data collection activities.

Trust emerged across all of our consultation activities as a barrier to participation in data collection. This included a perception among several participants that there is a general sense of distrust in the government, as well as in government statistics, particularly, though not exclusively, among under-represented groups (specifically described as affecting those from Gypsy, Roma and Traveller communities, other minority ethnic groups and documented and undocumented migrants). Participants described how this could result in some groups being under-represented or effectively invisible, ultimately leading to policy decisions which may not adequately reflect these populations and increasing their distrust.

Participants noted a degree of uncertainty and apprehension among relevant groups and populations about how their data may be used by the government. Many identified fears that participating in data collection or accessing certain services could lead to unequal treatment or discrimination, or could worsen their situation. This was described by groups involved in collecting data, by individuals from relevant groups and populations and by CSOs. Individuals described not feeling able to report violent crimes because they feared that disclosing their immigration status could lead to them being detained. Respondents to the online consultation felt that the onus for reaching and reassuring these members of society was on those collecting data, to ensure their safety and that the needs of these groups are understood and reflected.

Our consultation activities also identified additional barriers to individuals’ participation in data collection which may affect their willingness to participate. These included:

  • being unable to identify themselves in the options included within data collection tools, and feeling excluded by the use of inappropriate wording (for example, individuals not having the opportunity to express transgender, non-binary or gender-diverse identities, dual nationality or multiple ethnic categories)

    “I like to say, ‘I’m from an African Caribbean [background], my mum’s Ghanaian and my dad’s from Barbados,’ but there’s no form that can get me to say that. Then I just feel outcasted.” Individual

  • exhaustion from over-research (for example, those with mental health issues and racialised and disadvantaged ethnic groups)
  • experiencing competing pressures in their day-to-day lives (for example, managing with a physical disability, managing paid and unpaid work)
  • there being little or no perceived personal or community benefit from participation, especially where previous consultation activities had not led to action or where tangible results from participating were not seen

Academics and learned society participants similarly noted that some people may not want to participate in data collection activities. This results in a “survivorship bias” whereby researchers focus their efforts on groups who have already participated and neglect those who are under-represented, though this problem is not specific to vulnerable or marginalised groups.

Even where there are no issues with trust or trustworthiness and the individual is willing to participate, we found that people may still be prevented from doing so due to lack of accessibility of data collection exercises.

First, online data collection instruments can exclude those who have no or limited digital access or lack the necessary digital skills. Digital exclusion was described by government participants as reflecting technology and skills gaps, and costs associated with participation. Lack of Wi-Fi availability and costs incurred through telephone engagement were reported as barriers to participation for digitally excluded groups and those experiencing financial hardship. Organisations that participated in the online consultation raised concerns that the move from face-to-face approaches to online surveys during the pandemic may have excluded specific sub-groups. Respondents to the paper consultation expressed concerns about the growing number of surveys being hosted online and were aware of the implications of their own lack of digital skills. Respondents highlighted that decision-makers must recognise and respond to the difficulties relating to digital exclusion to avoid further exclusionary practices.

Second, methods may not consider the language, literacy or comprehension needs of different population groups. For example, participants from government stressed that older people, disabled people and those facing language or interpretation barriers are at greater risk of exclusion from research. Those who participated in the online consultation also raised concerns that individuals deemed unable to provide informed consent, such as older people or those lacking mental capacity, are often excluded from surveys.

Third, the personal identity and characteristics of those responsible for designing and carrying out the data collection and analysis were a concern for some groups. They perceived a lack of diversity within data collection organisations and felt that greater representation across relevant groups and populations within the research community would ensure better understanding of different cultures, address barriers to participation and reduce the risk of burdening participants with duplication of research.

Ensuring that the data collected meet respondent and user needs

The need for data collectors to provide meaningful categorisations that respondents can recognise and use to describe themselves and their circumstances emerged consistently from our consultation activities.

The labels used to capture individual characteristics within data collection were perceived as critically important. They enable people to select categories in surveys and on forms that reflect their personal characteristics and circumstances, and they ensure that the data allow for an accurate understanding on which action can be taken. Restricting the presentation of data under labels which could homogenise diverse and distinct groups was viewed as highly problematic, misleading and potentially offensive.

If the value of data is to enable people to be better understood and represented in services and policies, data must accurately represent people’s circumstances and identities. Within survey data in particular, the number of respondents selecting an “other” or “mixed” ethnic group category has increased, restricting effective understanding of an individual’s identity.

Questions were often considered outdated by CSOs and individuals from relevant groups and populations. This was particularly noted for questions and definitions around ethnicity, which may conflate race, ethnicity and nationality and provide broad categories that cover a range of ethnic groups and national origins. It was also noted for questions on disability that are based on outdated, deficit-based concepts and do not sufficiently capture the experiences of individuals (for example, their needs, the structural barriers they encounter, their experiences of overcoming barriers, or the diversity of disabilities).

The organisations and individuals who took part in the online consultation specifically called for disability data collection to shift from a focus on the medical model, which looks at an individual’s impairments or differences, towards the social model of disability, exploring the individual’s needs and perspectives and viewing society as a major contributor to incapacity. This could better address the organisational and structural barriers which limit people’s participation in society.

Ensuring data collected are of sufficient quality to accurately count everyone in society and monitor their outcomes

Those who participated in our consultation activities identified various quality issues in relation to data collection, particularly in terms of conceptual challenges and lack of harmonisation and coherence.

The need to harmonise the data that are collected on personal characteristics was stressed so that the characteristics and circumstances of minority groups are reflected in all UK administrative and national survey data. Participants also highlighted the need for the definitions, categories and types of questions used to collect data on personal characteristics to be more inclusive. A lack of harmonisation was seen to hinder the ability to disaggregate as well as to compare data across different countries of the UK. Multiple definitions, classifications and response options for ethnicity, disability, sex and gender were highlighted by those who participated in the consultation activities as presenting particularly challenging conceptual issues.

CSOs stressed that the lack of harmonisation in the administrative data collected from public services (for example, schools, police forces and health services) has resulted in an inconsistent picture of particular subgroups (notably relating to faith and ethnicity) and misalignment between “official” data and those collected by CSOs on the ground. Government participants and online respondents mentioned that the 2001 census categories for ethnicity remain widely used in data collection, even though it has been recognised that these do not always adequately reflect the ethnic diversity of the present-day population.

The lack of consistency in the use of disability definitions across the UK was highlighted, resulting in disability information being captured in different ways. Inconsistencies in definitions for the term “disadvantaged” were also mentioned as a significant issue for analysts, which was particularly problematic when trying to undertake analyses for specific local areas.

The issues arising from small sample sizes in household surveys were raised across almost all of our consultation activities. Small samples create a lack of granularity within the data, which undermines understanding of specific sub-groups of the population, whether by sector, geography or characteristic, and can render entire groups invisible in data. Achieving local level information on specific populations or group characteristics often involves aggregating smaller groups into larger categories. Such larger categories may not adequately reflect populations of interest and individuals may not identify with larger aggregations; the continued use of the broad “BAME” (Black, Asian and Minority Ethnic) category was said to have the potential to marginalise and alienate relevant groups and populations.

Participants noted that small sample sizes also hinder our ability to undertake intersectional analyses. For example, even though information on age and sex distributions may be available at small geographies, analysing the experience of older, migrant women in specific areas will be inhibited by small sample sizes. Respondents to the online consultation suggested that the issues with sample sizes within household surveys could be overcome through the use of qualitative approaches for specific populations, noting that quantitative approaches are limited for understanding the lived experiences of different groups of people.
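
The effect of small sample sizes on intersectional analysis can be illustrated with simple arithmetic. The sketch below is not drawn from any survey discussed in the consultation: the sample size and population shares are hypothetical, the characteristics are assumed to be independent of one another, and the margin of error uses the standard normal approximation for a proportion under simple random sampling, which is itself only rough for very small groups.

```python
import math


def expected_cell_size(sample_size, shares):
    """Expected number of respondents who have every listed characteristic.

    Assumes (for illustration only) that the characteristics are independent.
    """
    n = float(sample_size)
    for share in shares:
        n *= share
    return n


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for an estimated proportion p.

    Uses the normal approximation under simple random sampling.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical figures, chosen only to show the mechanism.
sample_size = 40_000
shares = [0.20, 0.10, 0.50, 0.05]  # older, migrant, women, in one local area

cell = expected_cell_size(sample_size, shares)
print(f"Expected respondents in the subgroup: {cell:.0f}")
print(f"Margin of error, full sample: +/-{margin_of_error(sample_size):.1%}")
print(f"Margin of error, subgroup:    +/-{margin_of_error(cell):.1%}")
```

Under these assumed figures, a survey that is very precise for the population as a whole yields only a handful of respondents in the intersectional subgroup, so any estimate for that subgroup is too uncertain to be useful.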

Frequency of data collection was also raised as an important issue. It was acknowledged that the UK censuses are a valuable source of inclusive data, providing insights not achievable with other data sources, but the 10-year gap between censuses means that the resulting data are often several years out of date. The contextual data provided by the census on local area characteristics can help in understanding the experience of disadvantage but are more valuable when supplemented by the small-area Indices of Multiple Deprivation (IMD).

However, the IMDs are country-specific and updated at different intervals so cannot provide a harmonised understanding of local area deprivation across the UK. Additionally, there is frustration that publicly funded deep-dives on key equality areas (such as sexual orientation) are undertaken infrequently and seemingly with no long-term strategic intent.

Finally, academics and learned society participants raised a concern about the use of existing data as training data for machine learning algorithms. Any biases in those existing data, for example where they misrepresent certain parts of the population, will result in unrepresentative or biased predictions. Participants said that this could perpetuate biases in future decision-making.
