Findings on the inclusivity of methodological practices 

Data collection, research design and analysis procedures

Participants reported a general awareness of the impact of the data collection methodology on the inclusivity of data. When data collection is carried out with an effort to include everyone, this was said to help people feel seen, respected, and valued for who they are. It was seen as important to take a collaborative and thorough approach when collecting and analysing data, with sensitivity to diversity. Although the need for standardised data collection was acknowledged, there was concern around a one-size-fits-all inclusion model which may unintentionally reinforce practices that exclude the most under-represented minority communities from being counted.

Linked to this, a lack of diversity within the research profession was raised as an issue. For example, the personal identities and characteristics of those designing and carrying out data collection and analysis were raised as a concern by some groups, as was a lack of awareness of issues affecting particular groups.

“There exists a current lack of queer data competence among those responsible for the design and execution of data practices.” (Individual).

On the other hand, concerns were also raised regarding research conducted by advocacy groups, with some participants asserting that these groups may have political agendas which do not necessarily reflect the views of all the individuals they represent. It was instead advised that research about under-represented groups, including research carried out by public bodies, should directly involve people from those groups.

“Employ a deaf-blind individual, don’t presume your data from Sense or Deaf-blind UK defines a community.” (Individual).

For more detail on how this links to trust in those collecting the data and the need for better representation, see the findings on trust.

Involving people from affected communities or population groups in all aspects of the research process, referred to as data co-production, was also suggested as a means to improve the appropriateness and relevance of research. For example, this could make data collection methods more accessible and avoid possible confusion and inaccurate recording.

“It is concerning that we have heard a number of reports that people are confused about how to complete questions relating to disability in the new census, as these are framed using the term ill-health. Many people have stated that they are disabled, and not suffering from ill-health. Developing measures and data collection methods in coproduction with disabled people can help to mitigate such issues and support appropriate and accessible data and research.” (Scope).

Participants highlighted the benefits of using qualitative and mixed research methods to provide more context and insight about inclusion and personal characteristics. Additionally, where quantitative analysis is limited due to small sample sizes, several participants suggested that more weight be given to qualitative approaches. This is also described in the section on extending outreach and representation.

“Limitations of survey evidence can sometimes be countered with qualitative research that can explore the rich detail missed in quantitative data collection. For example, when collecting data on other related features of social identity and participation, such as religion, more nuanced information on religiosity and religious practice is not collected, meaning that the entire picture is not captured in survey data.” (Individual).

However, it was advised that qualitative approaches should be fully examined for forms of structural exclusion or power dynamics such as unconscious bias in the assumptions and framing of research objectives. This links back to the issues around lack of diversity within the research profession, and the need for better representation and cultural awareness within the social research workforce.

Longitudinal data and data linkage were said to provide a fuller picture of the dynamic nature of inequalities, although wider topic coverage is required in existing data sets to facilitate this.

“To study inequalities among subgroups of the population (such as women and men, people with disability, ethnic groups, etc.) in a comprehensive and meaningful way, we need data which contain longitudinal information on individuals’ characteristics (such as sex, ethnicity, disability), family background characteristics (parental education, occupation, income), education (qualifications and training), labour market outcomes (employment status, occupations, earnings) and other life outcomes (such as health).” (Individual).

Participants raised concerns around survey sampling, with solutions to under-representation within sampling frames offered, such as boosting and oversampling.

“We acknowledge the inherent limitations around population size which lead to the limitations of sample size in national surveys. However, we think that there is a strong case for much more frequent uses of over-sampling and boosts for different ethnic groups, which will give us a richer and more accurate image of the situation across the country.” (Centre for Ageing Better).

It was noted that guidance was needed on how to deal with both over- and under-representation.

“All survey methods are likely to over represent certain groups and underrepresent others. It would be really useful to have clear guidance from the UK Statistics Authority of how limitations of representativeness can be managed if using online surveys. For example, should a new standard be being set in over representing traditionally under-represented groups?” (Natural England).
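
As an illustration of how a boosted sample can be used without distorting overall estimates, the sketch below applies simple design weights (each group's population share divided by its sample share). The group names, population shares, sample counts and outcome values are all hypothetical. This is one common weighting approach shown under those assumptions, not guidance from any participant or from the UK Statistics Authority.

```python
# All figures below are hypothetical and for illustration only.
population_share = {"group_a": 0.90, "group_b": 0.10}  # assumed true shares
sample_counts = {"group_a": 600, "group_b": 400}       # group_b has been boosted
group_means = {"group_a": 0.50, "group_b": 0.30}       # assumed outcome per group


def design_weights(population_share, sample_counts):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (sample_counts[group] / total)
        for group in sample_counts
    }


def weighted_mean(group_means, sample_counts, weights):
    """Overall mean with each respondent counted by their group's weight."""
    numerator = sum(
        group_means[g] * sample_counts[g] * weights[g] for g in group_means
    )
    denominator = sum(sample_counts[g] * weights[g] for g in group_means)
    return numerator / denominator


weights = design_weights(population_share, sample_counts)

# Unweighted estimate: the boosted group pulls the overall figure towards itself.
unweighted = sum(
    group_means[g] * sample_counts[g] for g in group_means
) / sum(sample_counts.values())

# Weighted estimate: each group contributes in line with its population share.
weighted = weighted_mean(group_means, sample_counts, weights)

print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

In this hypothetical case the boost gives 400 responses for the smaller group, enough for separate analysis, while the weights down-weight those responses when estimating a figure for the whole population.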

Participants also shared perceptions of exclusionary data collection processes and practices. Some organisations noted that common modes of data collection and survey sampling tend to exclude specific groups. The shift to mixed-method and online approaches to research due to the COVID-19 pandemic was also believed to miss out those who are digitally excluded.

“If data are collected online only this will exclude older people who are not online, who are disproportionately older and more socioeconomically disadvantaged.” (Age UK).

“[Online] data collection modes have lower response rates and may result in increased biases due to non-response, thus the views of specific population groups may be even harder to ascertain. Online surveys, by design, exclude the offline population, so should always be combined with another survey mode, to ensure that the data collected is fully inclusive.” (Local authority).

Participants raised further issues relating to survey sampling and data collection which may result in the exclusion of disadvantaged groups. For example, survey samples often focus on private households, excluding those with different living circumstances, such as people in temporary accommodation, residential care facilities, or without a fixed abode. The Refugee and Migrant Children’s Consortium called for further efforts to address this, as counting all children, including undocumented children, is vital. Household surveys were also viewed as potentially generating inaccurate data, as some people might not wish to disclose aspects of their identities, such as their sexual orientation, when other members of their household are present. Additional sampling concerns were raised around people deemed unable to provide informed consent, such as those lacking mental capacity, who are often excluded from surveys. It was advised that further efforts be made to obtain proxy consent wherever possible.

“All these restrictions can lead to underestimates of the levels of disadvantage, poor health, disability and need for care among older people.” (Age UK).

Participants also mentioned inclusivity issues around data collection for administrative purposes. Administrative data from service use was said to exclude those who do not, or cannot, access these services, rendering them invisible in the data. The voluntary nature of some administrative data response categories was also flagged as problematic by some participants. They noted that when people are given the opportunity to choose whether they disclose demographic information, response levels decline and sample sizes become too small to be used in analysis.

“This undermines the potential benefits that the information could provide. For example, response rates for data on ethnicity collected for claims for Universal Credit sometimes fall to under 50%.” (Charity organisation).

Question design, categorisation, concepts, and definitions

Individuals and organisations outlined specific issues around survey questions and response options which restrict the collection of accurate, inclusive data.

“The current Personal Wellbeing measures (ONS4) do not work well for people with learning disabilities, and more validity and cognitive testing is needed to address this.” (What Works Centre for Wellbeing).

Problems with question design and response categorisation were said to cause exclusion of certain groups and render others invisible. Specific concerns were raised around the ethnic group categories being too broad, and therefore excluding groups, or limiting their response options. Some organisations were reported to still rely on the 2001 Census classifications, despite this classification framework excluding key ethnic groups.

“Ethnicity classifications using older Census classifications miss out some groups. COVID-19 analyses of NHS data and Hospital Episode Statistics are based on 2001 Census classifications that do not include Gypsy, Roma and Traveller, and Arab ethnic groups.” (Government department).

The use of “mixed” and “other” ethnic group categories was also highlighted as problematic, particularly with regard to ethnicity and health outcomes. As each category covers a broad range of individuals, there is a risk of masking any similarities and differences, both within and across categories.

“The tension rests in being unable to fully understand ‘who’ this category covers and which sections within it are disproportionately affected by COVID-19.” (Individual).

“Review of the continued suitability of the standard ethnic group classifications [is needed], from the starting point of how ethnicity is conceptualised.” (Centre on the Dynamics of Ethnicity).

There were calls among some participants to include additional categories within the ethnic group responses, particularly where surveys do not include a religion question.

“Over the last several years and increasingly over the last year, different services are using ONS ethnic group lists as part of their criteria for funding or planning. Without having an opportunity for Jews to tick a box for Jewish, we become invisible.” (Individual).

Related to this, a general lack of data on religion was highlighted, with expanded collection of religion data viewed as crucial for understanding specific groups.

“UKSA and ONS should strongly encourage the collection of data on religion wherever possible, as this is likely to be highly relevant to understanding the needs, preferences and identity of the population being surveyed.” (The Board of Deputies of British Jews).

With regards to the collection of data on disability, concern was raised about the conflation of disability with ill health within surveys.

“The basic problem is that disability is currently wrapped up within the definition of the Disability Discrimination Act (DDA). Typically, questions are along the lines of ‘Do you have a physical or mental impairment that impact your capacity to have a normal life for the next 12 months’. What this does is conflate disability with health issues. […] As a disabled person myself, there is a world of difference between someone who is disabled and someone with health issues.” (Individual).

It was suggested that use of the word “impairment” in tools which collect data on disability should be clarified and used only for describing either disability or ill-health. This also links back to the importance of co-production in question development to ensure relevance and appropriateness, as discussed in Data collection, research design and analysis procedures. Organisations called for disability data collection to shift from the medical model, which focuses on an individual’s impairments or differences, to the social model of disability focusing on barriers which may limit their participation in society.

“Data on disability should also go beyond looking at just functioning and exploring it further to focus on participation or barriers to participation.” (Charity and voluntary organisation).

Additionally, some participants felt that the definition used in the Measuring Disability for the Equality Act 2010 Harmonisation Guidance, often adopted across government surveys, may have led to an increase in disability prevalence.

“The socio-legal nature of the GSS harmonised definition of disability means that it has been subject to expansion. In order to unpick the effects of changing disability disadvantage from the effects of increased prevalence, a supplementary measure which is based more on a functional definition of disability, and less subject to the effects of changing social norms and values, is needed.” (Disability at Work).

The collection of data which separated out whether a person was deaf from whether a person was blind, rather than collecting information on whether they were deaf and blind, was noted by an individual participant. In addition, they suggested that disability should be further sub-divided into four categories:

  1. Physical – those with physical impairments, including many wheelchair users
  2. Sensory – those with sensory impairments, for example, deaf, blind
  3. Mental – those with mental or learning impairments
  4. Emotional – those with emotional or social impairments, for example, autism


Looking at concepts used to measure other protected characteristic groups, some participants noted that definitions of sex, gender and gender identity should also be clearly distinguished. There was concern that these different concepts may be conflated in data collection, impacting the utility and meaning of the data. Some noted that making distinctions between these concepts clearer would better enable monitoring of any inequalities facing those who are cisgender and those who are transgender.

“We are very worried about the trend to conflating sex with gender identity. It is important for us to have data separately about, for example, the employment experience of females or trans women.” (Fair Play South West).

There were differing views on the issue of self-identification of individual characteristics. For example, some individuals and organisations raised concerns about self-identification of sex. This was linked to the view that the focus should be on collecting biological sex information (sex at birth). They felt that conflating biological sex with other concepts such as gender or gender identity could lead to potentially inaccurate data (for example, crime rates by sex), a reduced ability to carry out statistical analyses to answer research questions on sex discrimination (for example, within sport), and an overall inability to “accurately reflect outcomes for males and females, and for transgender women and transgender men”, as any differences between these groups would be masked.

In contrast to this position, others felt that self-identification is fundamental to collection of data on individual characteristics, and that this should be acknowledged.

“All data about an individual’s identity characteristics is self-identified. The UK Statistics Authority has a role to play in addressing the spread of misinformation about the concept of ‘self-identification’ and the incorrect view that this only relates to data about trans women and the collection of data on gender and sex.” (Individual).

One participant commented on the fluidity of gender and how self-identification was important for inclusivity.

“Gender is fluid in everyone. We should be breaking down societal gender bias, not reinforcing it.” (Individual).

In relation to the collection of sex data, there was also concern about the lack of harmonised measures for certain characteristics, especially variations of sex characteristics (sometimes referred to as intersex). The lack of standard data collection approaches contributed to the view that these populations may be overlooked. This is further discussed in Findings on the inclusivity of existing data and evidence. Linked to this was the idea that trust and engagement in data collection may also be affected when participants are not given the opportunity to select response options which align with their identity.

Another issue raised with measurement of personal characteristics relates to age definitions and categorisations. Some described the inconsistent grouping of age bands across organisations as negatively impacting the utility of the data as it obscures differences between age groups and results in difficulties producing relevant and effective analyses. This was noted as particularly affecting data for children and young people, due to inconsistent definitions and large age groupings.

“We are often unable to answer questions in a satisfactory way due to the ways in which current data are presented. Notably in inappropriate age bandings. This relates to a general inconsistency in definitions of a ‘child’ or ‘young person’ within agencies and organisations. […] young people are grouped into large bands (such as 0-18 or 16-24), which obscures nuances between different age groups. We recommend all data are reported in quinary age bands (0-4, 5-9, 10-14, etc.).” (Association for Young People’s Health).
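
The quinary banding recommended in the quote above can be sketched as follows. This is a minimal illustration only: the function name, the open-ended top band of "90+" and the example ages are assumptions introduced here, not part of the recommendation.

```python
from collections import Counter


def quinary_band(age, top=90):
    """Return the five-year band label for a whole-number age.

    `top` is a hypothetical cut-off above which ages share one open band.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // 5) * 5  # round down to the nearest multiple of five
    return f"{lower}-{lower + 4}"


# Hypothetical ages for illustration only.
ages = [3, 7, 12, 16, 17, 19, 24]
counts = Counter(quinary_band(a) for a in ages)

for band, n in counts.items():
    print(band, n)
```

With these example ages, a single broad band such as 16-24 would group four respondents together, whereas quinary bands separate the three aged 15-19 from the one aged 20-24.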

Finally, the limited scope of “Protected Characteristics” was another issue raised in relation to classifications. Many groups who may also experience discrimination, exclusion, or greater risk of disadvantage are not included in the Equality Act 2010 and therefore may neither be protected by this equality legislation in Great Britain, nor be a significant focus of research on inequalities. Examples given included socioeconomic status, immigration, employment, and housing status.

“This may mean that data about them is not collected when ‘protected characteristics’ set the boundaries of societally acceptable and unacceptable discrimination.” (Charity organisation).
