Findings on the appropriateness, availability and quality of data

Timeliness of data

Interview and roundtable participants discussed a range of problems with existing and future data that make it challenging to meet their needs for research, policy and decision-making. Timeliness of data availability was one of the issues identified.

“Because quite often a lot of data is reported on a quarterly basis, so you were then looking for markers that you could look to try and understand, was there a change where there were more people likely to become redundant. […] The data was not frequent enough to be able to tell if there was a change in trend in the timeframe that we needed it.” (Northern Ireland Executive participant)

Census data were said to be widely used among government participants; however, they quickly become out of date. Additionally, concerns were raised around the timing of the 2021 Census and how the data will be affected by the circumstances of the coronavirus (COVID-19) pandemic.

“We know people’s lives have changed in a huge way since the pandemic, but we can’t even tell if that census data will in any way represent the people that live in London at any other time, other than March this year.” (Central government participant)

“The basis for so much of the work, and if it’s been done at a particularly unusual time in the labour market, and not just the labour market, socially and everything as well, then I do have some concerns about that.” (Non-metropolitan local authority participant)

Data access

Inadequate ease of access to, and availability of, existing data were said to undermine inclusivity, as indicated in the methods theme, with reference to data linkage. Local government participants highlighted the issue of data being dispersed among various organisations, with differing access requirements. Hurdles put in place by some organisations to gain access to data were described as being more complicated than information governance guidelines require. National public databases, such as those provided by the Office for National Statistics (ONS) Secure Research Service, were described as difficult to access due to the limitations imposed, such as the need for a specific research proposal and the time required to determine whether data are suitable for use.

“You have to have the funding in order to get the access, but then it takes ages once you’ve got the funding, in order to get the access, and it’s not guaranteed that you will.” (Academic participant)

This resource-intensive process was reported to place severe restrictions on the amount of research that can be done. Participants from the combined authorities mentioned the potential usefulness of numerous HM Revenue and Customs (HMRC) administrative datasets, which could be used at local levels to consider issues surrounding wealth inequalities and long-term unemployed people with disabilities. However, the lengthy access process was described as counterproductive within a policy-responsive environment.

“But the toing and froing over what you could do with it, how you could use it, and that process is quite lengthy, and every chart has to get signed-off.” (Combined authority participant)

Additionally, data formatting was highlighted as a hindrance.

“It’s not a matter of data not existing, but a matter of the researcher not being able to access it in the format they need.” (Research funding organisation participant)

Sample sizes and granularity

Small sample sizes within survey datasets were identified as a barrier to data being sufficiently granular to meet user needs. This was described as particularly challenging when trying to understand characteristics of a population, such as ethnicity, at a local level. For example, disclosure issues can occur due to small sample sizes when trying to break down data by personal characteristics. Participants found that they were unable to get the granular breakdowns they needed without grouping multiple minority groups together, which was said to hinder intersectional analyses and understanding, particularly when attempting to explore meaningful ethnic breakdowns in data.

“When you want to look at issues around intersectionality, that’s almost impossible to do in surveys.” (Welsh Government participant)

“If you’re doing white, non-white, you might be okay”, but “if you wanted to look at within London versus outside of London, let alone England versus Wales or Scotland… for ethnic minority groups… there’s no chance of doing it in the surveys… it would have to be census.” (Academic participant)

Issues around granularity in household survey data were also highlighted.

“How the cost of living is different for a household with a disabled person versus a severely disabled person or bringing in things around region and ethnicity and being able to look at combinations of those factors.” (Learned society participant)

Additionally, local government participants explained that local policy decisions were often made based on national level data, due to the quality and small sample sizes of local level data.

“The data that we’re able to pull down from national sources…. isn’t representative of what’s happening in those areas.” (Combined authority participant)

For the devolved administrations, concerns were raised regarding the available data for each country and how these can be used. Scottish Government participants questioned the extent to which data published for England and Wales are relevant to Scotland, while Welsh Government participants noted difficulties finding data that solely represent Wales, as these are often combined with data for England. For example, data from the England and Wales Longitudinal Study were said to have sampling fractions too small to undertake effective analysis on issues relating to Wales. One academic stated that there is not enough information available to make informed policy decisions. Participants from Northern Ireland emphasised the need for their unique political and legal structures to be recognised within UK data and statistics, and for sensitivity towards Northern Ireland’s history, described as a very contested space on human rights and equality.

Administrative data

Although administrative sources were generally seen as beneficial and useful for more timely data, certain participants noted that they often do not capture the information needed and advised that they be used with caution.

“There is a fine line capacity to what you can get out of administrative records.” (Learned society participant)

Administrative data were seen as not always enabling an understanding of background characteristics or of what is happening at the micro-data level. Some protected characteristics, such as ethnicity and disability status (as defined by the Equality Act 2010), were identified as lacking within important administrative datasets such as death registers and GP healthcare records. This was said to make it difficult to answer questions, draw any conclusions, or understand differing impacts or experiences from the data.

“There is no data on the victim characteristics in Police Recorded Crime… you can get down to the level of offence, but we can’t tell whether or not those are different for protected characteristics.” (Central government participant)

For these reasons, academic participants suggested that there may be general over-reliance on census data. They further highlighted that while census and administrative data are valuable, they lack important subjective perspectives about how people feel or how they view the world. It was therefore recommended that administrative records be more effectively used when linked to other data sources, such as surveys.

Data gaps

A wide range of existing data gaps were identified, which were said to undermine inclusivity in data and evidence. Academic participants highlighted income as a key gap in census data, resulting in analysts having to use bank account data and other creative measures, which were described as less than ideal. The lack of longitudinal data on income was said to prevent greater understanding of social mobility.

“So you could have 10% of the population poor in two years. And [whether] it’s a different 10% of the people or the same 10% of the people and the policy implications of that are hugely different. […] Like what’s keeping people in these groups and what’s stopping them from getting out. I think that’s a really important agenda to economists.” (Academic participant)

Additionally, learned society participants noted that, with incomplete understanding of issues such as changes to the cost of living in different areas, it is “harder to make a policy in a reactive and appropriate way.” Concerns were raised around the lack of available data on income levels among caregivers and the scale and scope of children acting as carers. Data gaps were also identified in relation to informal care.

“When you see the evidence that is available on the scale of informal care, it is absolutely massive, and if you compare that to what we actually know about that care in terms of who’s doing it and what types of care they’re providing and so on, it’s just a huge mismatch there.” (Research funding organisation participant)

Local government participants highlighted recurring gaps in local data, which were said to inhibit understanding of “gateway communities” (people moving between different local areas such as between home and work). These communities were seen as particularly vital to capture to effectively represent the dynamics of local areas.

Data bias

Bias in training data and the effect this has on machine learning algorithms was also highlighted as an issue by one learned society participant, who noted that machine learning systems have been trained on unrepresentative data.

“There are challenges when machine learning systems are trained on data [which] perhaps may be not representative and might have biases in it, and therefore may have gaps in terms of not covering the needs of certain groups or may represent certain groups unfairly due to existing structural biases in society affecting the data that’s collected through different processes.” (Learned society participant)
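
To make the concern concrete, the sketch below is a minimal, purely illustrative example (not drawn from the report or its participants) of how a model trained on data in which one group is heavily under-represented can look accurate overall while failing for that group. The data, group labels and parameters are entirely synthetic.

```python
# Illustrative sketch only (not from the report): a classifier trained on
# data where one group is heavily under-represented. All data, groups and
# parameters below are synthetic and hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, flipped):
    """Simulate one group: a single feature x, with the feature-outcome
    relationship reversed for the minority group when flipped=True."""
    x = rng.normal(size=(n, 1))
    y = (x[:, 0] > 0).astype(int)
    return x, (1 - y) if flipped else y

# Training data: group A supplies 95% of records, group B only 5%.
x_a, y_a = make_group(1900, flipped=False)
x_b, y_b = make_group(100, flipped=True)
model = LogisticRegression().fit(np.vstack([x_a, x_b]),
                                 np.concatenate([y_a, y_b]))

# Evaluate on balanced test sets: the model fits group A well but
# systematically misclassifies group B, whose pattern it never learned.
for name, flipped in [("group A", False), ("group B", True)]:
    x_test, y_test = make_group(1000, flipped)
    print(f"{name} accuracy: {(model.predict(x_test) == y_test).mean():.2f}")
```

In this toy example, overall performance looks healthy because group A dominates the training data, while group B is systematically misclassified; more representative training data is the remedy the participant points towards.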

Strategies for addressing data issues

To address data issues, participants proposed establishing a data source comparable to the census but collected more frequently, as census data are often not recent enough for policy analysis.

“There is a fundamental need to have something that looks like census data more often.” (London borough participant)

Participants from non-metropolitan local authorities requested that timely small area datasets covering social and labour market statistics be produced and made accessible.

“So more small area data, which I know is hugely expensive, but that’s the wish list… so that we’re not relying on ten-year census data.” (Non-metropolitan local authority participant)

Participants recommended developing and maintaining accessible and user-friendly interfaces for extracting local data, such as NOMIS (an ONS service providing access to official UK labour market statistics), and removing or easing the requirements for local government and civil society organisations to access data from central government. It was said that the ONS could play a role in bridging the gap between central government and local government, reducing reliance on individual central government departments for specific datasets.

“Creating a place where people could at least learn to see how their datasets relate to a set of other datasets somewhere else, and how they’ve built up and developed, that would be a step forward.” (Central government participant)

It was also suggested that investing to increase sample sizes in surveys would help to improve the geographical granularity and quality of the data at local levels to accommodate local decision-making.

“The small sample sizes will require a big investment to be able to collect more data.” (Learned society participant)

A big step change was said to be necessary to better coordinate and combine locally held datasets to build a national picture. An academic participant suggested that, while research on vulnerable populations, such as people with mental health challenges, has been possible at a local level due to collaboration between academics and individual trusts, this did not provide a national picture.

“Being able to do that [research] in terms of understanding national trends and what’s really happening rather than sort of pockets of deprivation and deep kind of relations, we really need a step change in a lot of those areas.” (Academic participant)

Improving the quality, accessibility and use of administrative data sources was advised.

“There is a need to really optimise the administrative data sources. We need to streamline them. We need to make them transparent in the way they’re constructed and the way they are used.” (Central government participant)

It was noted that, despite access issues, these datasets have great potential. Academic participants also suggested creating new sources of administrative data, such as financial data and mobile phone data. However, it was warned that holding these data may raise ethical and privacy issues and may not capture certain groups.

“Administrative data may capture more of some of the groups that we’re interested in than the surveys do.” (Academic participant)

“Particularly relevant in something like a pandemic and when you’re trying to track the impacts of policy interventions and trying to look at things like how people’s behaviours have changed, how people move around, [what the data] tells you about things like transmission patterns, etcetera.” (Academic participant)

It was suggested that developing a greater understanding of data gaps, and prioritising how to fill them, is increasingly important. A comprehensive “data gaps exercise” was advised.

“Undertaking a comprehensive gaps exercise to assess the statistical requirements…of the government’s equality agenda, the Equality Act for example, and all the protected characteristics.” (Learned society participant)

A learned society participant recommended reviewing biases in the historical data fed into machine learning algorithms used for decision-making, and improving privacy enhancing technologies to lower the exposure risk for certain minority groups and better enable the use of data. An example of this would be using differential privacy techniques, such as creating synthetic data, which would help create an accurate representation of society without using personal information.
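
As an illustration of the kind of approach described, the sketch below shows one simple differential privacy building block, the Laplace mechanism applied to small-area counts. It is a hypothetical example rather than any method used by the participants: the group names, counts and the epsilon value are invented, and real applications (including synthetic data generation) involve considerably more machinery.

```python
# Illustrative sketch only (not from the report): the Laplace mechanism,
# one basic differential privacy building block, applied to small counts.
# The group names, counts and epsilon value below are hypothetical.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(true_count, epsilon):
    """Release a count with Laplace noise calibrated to sensitivity 1,
    since adding or removing one person changes a count by at most 1."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return max(0, int(round(true_count + noise)))

# Hypothetical small-area counts that would risk disclosure if published raw.
true_counts = {"Group 1": 12, "Group 2": 7, "Group 3": 430}
epsilon = 1.0  # smaller epsilon means stronger privacy but noisier counts

for group, count in true_counts.items():
    print(f"{group}: published count {dp_count(count, epsilon)}")
```

Smaller epsilon values add more noise, trading accuracy for privacy; in practice a privacy budget would be managed across all published breakdowns, and differentially private synthetic data extends the same idea to whole datasets.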
