Findings on the inclusivity of methods for data collection and analysis


Issues resulting from data collection methods were discussed in depth. The approaches used to collect data and to involve individuals in research could reportedly affect the extent to which certain groups are reflected in the data. For example, several participants highlighted that, due to the coronavirus (COVID-19) pandemic, many surveys have been adapted to telephone or online data collection. However, many people do not have access to a landline or internet connection, and these modes could generate charges for the research participant. Additionally, surveys were described as inaccessible to some due to language, wording or format. People who are older, live in rural areas, have learning needs, are not well off, or are not fluent in English were identified as being at greater risk of being unable to participate in online research for these reasons. Digital exclusion was highlighted as an important consideration for data collection and the presentation of findings. Participants highlighted the consequences of a limited understanding of the digitally excluded population, in terms of the areas that are more likely to be excluded and the contextual factors surrounding this.

“We don’t know whether or not we’re getting a sensible view on our policies, on what we’re doing, on whether we’re even targeting the right people.” (Combined authority participant)

“I just don’t know whether we’re really on top of what this means for getting the digitally excluded in our surveys and in our data.” (Academic participant)

Participants noted the neglect of certain populations in official data collection pathways. A Northern Irish Executive participant highlighted that many groups who have dropped out of official channels are already facing marginalisation, such as young people not in education, employment, or training, or non-private household populations, including individuals in institutions or experiencing housing difficulties.

“The bias is just reproduced, because if you don’t get people into the data on which then other decisions are made, it reinforces the exclusion.” (Central government participant)

Reaching under-represented groups

Additionally, a lack of focus on the primary issues facing these groups, and consequently not asking the right questions, was identified by a research funding organisation participant as a major problem in survey design. This was said to result in higher non-response rates and data gaps, as well as potentially affecting the way communities respond to research and the way that they are involved in research going forward.

Retention and attrition rates among under-represented groups were highlighted by academic participants as particularly problematic.

“Inclusion and retention of hard-to-reach populations in national, nationally representative datasets, and nationally representative longitudinal datasets is a major goal.” (Academic participant)

Attrition rates were reported to often be high among under-represented groups, and a lack of statistically sound methods to deal with the biases that this dropout causes could reportedly result in inaccurate assumptions being made. A research funding organisation participant highlighted that longitudinal studies, such as birth cohort studies, are particularly susceptible to attrition, and that under-represented groups are also more likely to drop out.

“We have those kinds of challenges of making sure that we’re maintaining continuity over time but while also wanting to be able to address some of the issues around [attrition].” (Research funding organisation participant)
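One widely used way of adjusting for differential dropout is to weight each retained respondent by the inverse of the retention rate in their group. Participants did not name a specific method, so the following is only a minimal sketch of that general idea. The function name `attrition_weights`, the record fields `id` and `group`, and the figures in the example are all hypothetical.

```python
def attrition_weights(wave1, retained_ids, group_key):
    """Weight each retained respondent by 1 / (retention rate of their group)."""
    counts, kept = {}, {}
    for person in wave1:
        group = person[group_key]
        counts[group] = counts.get(group, 0) + 1
        if person["id"] in retained_ids:
            kept[group] = kept.get(group, 0) + 1
    weights = {}
    for person in wave1:
        if person["id"] in retained_ids:
            group = person[group_key]
            # counts / kept is the inverse of the group's retention rate
            weights[person["id"]] = counts[group] / kept[group]
    return weights


# Hypothetical first wave: 80 people in group A and 20 in group B.
wave1 = [{"id": i, "group": "A"} for i in range(80)]
wave1 += [{"id": 80 + i, "group": "B"} for i in range(20)]

# Hypothetical follow-up: 64 of group A and 8 of group B are retained.
retained_ids = set(range(64)) | set(range(80, 88))

weights = attrition_weights(wave1, retained_ids, "group")
```

In this illustration, retained members of group B receive a larger weight than those in group A, so the weighted follow-up sample has the same group balance as the first wave. This only corrects for dropout related to the grouping variable used. It does not address dropout driven by unmeasured factors.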

Religion was mentioned as a characteristic which is rarely captured in data collection practices. It was highlighted that not collecting religion data could render some groups, such as Jewish and Sikh communities, invisible within data. Participants from the academic and government groups described the voluntary nature of the census religion question as problematic, as the totals reported for various religious groups cannot be relied upon.

“The last census said there were about 430,000 Sikhs for example, but that is how many answered the voluntary questions, so how many did not? We do not know.” (Learned society participant)

Data linkage

Data linkage was proposed as a strategy for capturing the relationship between different factors and characteristics, and overcoming challenges arising from a lack of intersectional data. This was described as particularly important due to the multidimensional nature of many personal characteristics, which policy makers seek to capture in data. However, academic participants described barriers to attempts at data linkage, including pushback from organisations on sharing and access agreements for administrative data, which could prevent particularly useful analyses from being undertaken. An example was provided of a previous research project which linked 15 years’ worth of data on homeless individuals from local authorities to significant health records over the same period. While this process was said to be extremely useful for their specific research objectives at the time, there were challenges with wider access and use.

“The data could only be put together for that one project and then it would be destroyed or nobody else could access it to look at anything else.” (Academic participant)

Local government participants described the resource constraints that they faced when processing data, given their limited analytical capacities. Participants stated that reasonably affluent councils and those with strong research teams may be resourced to acquire and analyse the local level data needed themselves, while others cannot. This was said to be particularly challenging for multivariate analyses, where datasets are overlaid to capture intersectional characteristics.

“It was easier when you had regional statisticians in place, because you could go to them, they understood the area, they’d built up that tacit knowledge and also, they knew who to talk to when things had to be checked…that function has gone almost completely now.” (Combined authority participant)

Solutions to methodological issues

To address methodological issues, participants suggested boosting sample sizes by oversampling to specifically target under-represented groups, enabling personal characteristics to be better captured. Aggregating years together to increase sample sizes was also considered as a potentially useful solution, although there were concerns that this might require a trade-off between timeliness and accuracy of data.
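The oversampling suggestion can be illustrated with a minimal sketch. It assumes a simple stratified design in which a smaller group is sampled at a higher rate and each respondent then carries a design weight equal to the inverse of their sampling fraction. The function names, the `group` and `online` fields, the population sizes and the sampling fractions are all hypothetical.

```python
import random


def oversample_with_weights(population, group_key, sampling_fractions, seed=0):
    """Draw a stratified sample and attach a design weight to each respondent."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(person[group_key], []).append(person)
    sample = []
    for group, members in strata.items():
        n = max(1, round(len(members) * sampling_fractions[group]))
        for person in rng.sample(members, n):
            # Design weight: how many people in the population each respondent represents
            sample.append({**person, "weight": len(members) / n})
    return sample


def weighted_mean(sample, value_key):
    """Population estimate that undoes the deliberate over-representation."""
    total_weight = sum(person["weight"] for person in sample)
    return sum(person[value_key] * person["weight"] for person in sample) / total_weight


# Hypothetical population: 9,000 people in a majority group, 1,000 in a minority group.
population = [{"group": "majority", "online": 1} for _ in range(9000)]
population += [{"group": "minority", "online": 0} for _ in range(1000)]

# Sample the minority group at ten times the rate of the majority group.
sample = oversample_with_weights(
    population, "group", {"majority": 0.02, "minority": 0.2}
)
estimate = weighted_mean(sample, "online")
```

The design choice here is that oversampling gives enough minority respondents (200 rather than 20 at the majority rate) to analyse that group separately, while the weights keep whole-population estimates from being distorted by the deliberately uneven sample.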

Participants recommended identifying what is important to communities and then putting specific mechanisms in place to measure these, so that the data that are produced focus on the questions these populations are interested in. Qualitative research was suggested to better define the parameters that are used within data collection tools.

“[Introducing an] advisory group to actually pose the questions that the data would answer, so that we know what we know, and we know what we don’t know.” (Welsh Government participant)

Several participants suggested that efforts be made to ensure that questions are asked in a manner that allows all sectors of the community to understand and respond to them, to avoid excluding people or producing distorted results through the use of inaccessible language or data collection modes. Research funding organisation participants called for consistent guidance across the board for data collectors on what questions to ask and how to ask them, so that approaches are appropriate, consistent and sustained.

“To generate good sustainable data, that is not just relevant for now, but could be relevant for questions we need to ask in the future.” (Research funding organisation participant)

Improving accessibility for people routinely excluded from data collection was said to be a necessity for inclusive practices. Examples of potential strategies include:

  • using printed materials and providing internet connection as part of a survey incentive for digitally excluded populations
  • translating survey instruments into different languages
  • distributing paper questionnaires in day centres to reach older populations
  • conducting systematic surveys of people living in institutions

However, academic participants noted that different best practice approaches would be required across settings, due to the differing issues faced by different institutional populations.

“I think you need to develop strategies that are about how you engage with [under-represented groups] on their terms, in a way that they can engage with, which may be less efficient from our point of view but is necessary.” (Welsh Government participant)

Finally, participants suggested reducing silos across government and organisations to improve data sharing and enable more data linkage, so that the intersectionality of personal characteristics can be better captured. An academic participant outlined that if researchers were better able to access administrative datasets and link data, the gains would be major.
