  • 12:30 – Introduction (Sir Bernard Silverman)
  • 12:35 – Secretariat update (Tom Tarling)
  • 12:45 – EAP225: Research to evaluate and explore the implementation of an indexing first approach to data linkage (Sarah Cummins, Nick Mavron, Esther Lewis)
  • 13:30 – EAP226: Developing Quality Methods to Identify and Measure Error in the Demographic Index: Trialling a Random Forest model to estimate False Positive Cluster (FPC) error (Rosalind Archer, Peshali Diyasena, Tom Hunter)
  • 14:30 – Break
  • 15:00 – EAP228: Producing disability estimates for England using predictive modelling and administrative data (Charlotte Standeven, Jesse Ransley, Sarah Wood)
  • 16:00 – EAP229: Social Survey Strategy scoping (Neil Bannister)
  • 16:40 – Any other business and close

Panel members

  • Sir Bernard Silverman (Chair)
  • Ana Basiri
  • Oliver Duke-Williams
  • Nik Lomax
  • Natalie Shlomo

Other attendees

  • Jakob Schneebacher (Observer)
  • Sarah Cummins (Presenter)
  • Nick Mavron (Presenter)
  • Rosalind Archer (Presenter)
  • Peshali Diyasena (Presenter)
  • Tom Hunter (Presenter)
  • Charlotte Standeven (Presenter)
  • Sarah Wood (Presenter)
  • Jesse Ransley (Presenter)
  • Neil Bannister (Presenter)
  • Owen Abbott
  • Tom Tarling (Secretariat)
  • Dale Williams (Secretariat)
  • Susan Williams (Secretariat)
  • Dana Seman-Bobulska
  • Yinka Lawal
  • Charlotte Bradley
  • Alani Odunlami
  • Emma Grant-Holt
  • Lois Garang
  • Matthew Minifie
  • Wajiha Munir
  • Emma Sharland
  • Tolu Adedire

1. Introduction

  1. The chair introduced a new MARP member, Jakob Schneebacher, who would be observing the meeting ahead of joining the panel for future meetings.

2. Secretariat update 

  1. ONS presented updates on actions and on subgroup activities since the previous meeting.
  2. Action 120: The chair requested the panel discuss the ONS Survey Strategy, adding this topic to the forward agenda.
    • ONS provided a paper in December via correspondence, which was to be discussed further in this meeting. This action can be closed.
  3. Action 121: The panel suggested that the 2027 test online questionnaire for the Census be made available and kept for research purposes.
    • ONS passed this on to the relevant business areas. This action can be closed.
  4. Action 122: Following MARP40’s statistical disclosure control item, the panel recommended adding the risk discussed to the risk register.
    • This risk has been logged, and the statistical disclosure control team continues to review future risks of high-powered computing. This action can be closed.
  5. Updates were given for MARP subgroups on Migration and the Dynamic Population Model. The panel discussed the structure and operations of the subgroups, agreeing they were functioning well.
  6. The panel discussed the activity of the MARP subgroup on Labour Market Statistics. ONS stated further updates on the Transformed Labour Force Survey (TLFS) will follow in future.
  7. ONS provided a general update, including the recruitment campaign for the National Statistician and the next steps planned by the Census Task Force. Further updates will follow.

3. EAP225: Research to evaluate and explore the implementation of an indexing first approach to data linkage (Sarah Cummins, Nick Mavron, Esther Lewis)

  1. ONS presented the paper. The panel advised that the proposed research questions should also consider changes in the indexes across time.
  2. The panel discussed the benefits and limitations of linking data using indexing-first and bespoke models, stating it would be helpful to set a provisional minimum quality requirement and measure approaches against this. ONS replied it was a challenge to set a quality target, as there would be different quality needs across different users and applications. The panel suggested publishing error rates.
  3. The discussion agreed that bespoke models would produce higher quality linkages than indexing-first approaches, but that they are resource intensive and take a long time. The panel commented that scalability and meeting minimum quality requirements were key focuses. The panel commented that some minority populations may need bespoke approaches, but more research and refinement was needed.
  4. The panel commented that linkage performance on poor quality datasets needed to be considered. The panel therefore recommended ONS add further examples of such scenarios to the work.
  5. Concluding, the panel agreed that ONS could use case studies and small simulation studies with synthetic data. It also stressed the importance of understanding how linkage quality filters through to final statistical products. It suggested theoretical work in this area would be beneficial, but noted this was a wider academic problem.
  6. The panel thanked ONS for the paper and discussion.
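
As an illustration of the panel's suggestion to publish error rates and measure linkage approaches against a provisional minimum quality requirement, the following is a minimal sketch only. It is not an ONS method: the record identifiers, the gold-standard links, the 0.30 threshold and the function name `linkage_error_rates` are all hypothetical.

```python
def linkage_error_rates(predicted_links, true_links):
    """Compare links made by a linkage method against a gold standard.

    Both arguments are iterables of (record_id_a, record_id_b) pairs.
    """
    predicted = set(predicted_links)
    truth = set(true_links)
    false_links = predicted - truth      # links made that are not true matches
    missed_matches = truth - predicted   # true matches the method did not make
    return {
        # Share of the links made that are wrong
        "false_link_rate": len(false_links) / len(predicted) if predicted else 0.0,
        # Share of the true matches that were missed
        "missed_match_rate": len(missed_matches) / len(truth) if truth else 0.0,
    }


# Hypothetical gold standard (for example, from clerical review) and method output
true_links = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4"), ("a5", "b5")]
predicted_links = [("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")]

rates = linkage_error_rates(predicted_links, true_links)

# Hypothetical provisional requirement: neither error rate may exceed 0.30
MAX_ERROR_RATE = 0.30
meets_requirement = all(rate <= MAX_ERROR_RATE for rate in rates.values())

print(rates)
print("Meets provisional requirement:", meets_requirement)
```

Reporting the two rates separately reflects the point ONS raised in the discussion: different users and applications may tolerate false links and missed matches to different degrees.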

4. EAP226: Developing Quality Methods to Identify and Measure Error in the Demographic Index (DI): Trialling a Random Forest model to estimate False Positive Cluster (FPC) error (Rosalind Archer, Peshali Diyasena, Tom Hunter)

  1. Before the presentation, the panel suggested it may be beneficial for an academic referee to review the specific Random Forest method used in the paper.
  2. ONS presented the paper.
  3. The panel asked how the DI was created. ONS responded that it links data from various sources to create clusters, each representing an individual. Each cluster is given a unique ID. The panel questioned if there were better variables to cluster with, perhaps using algorithms to identify them. The panel also questioned what blocking variables were used. ONS responded that there was some blocking within the DI but that it would examine this further.
  4. Discussing the Random Forest method, the panel questioned the removal of variables via logistic regression. ONS commented that the model struggled with the quantity of variables prior to removal. The panel suggested this was a problem with the data rather than the model, and that removing variables removed more than just noise. ONS clarified that some variables were removed because of collinearity. The panel suggested alternatives such as LASSO, XGBoost and examining AI Large Language Model capabilities.
  5. The panel highlighted as a problem that the labelled data were not a random sample, as some circular selection can result. The panel recommended ONS test the model using out-of-sample data.
  6. Finally, the ONS presented a case study showing false positive clustering was correlated with ethnicity, hence the desire for this work. These false positives could result in coverage issues and downstream errors. The panel agreed.
  7. The panel thanked ONS for the paper and discussion.

5. EAP228: Producing disability estimates for England using predictive modelling and administrative data (Charlotte Standeven, Jesse Ransley, Sarah Wood)

  1. ONS presented the paper.
  2. The panel discussed the social model of disability. It stated an important consideration for the work was what the data would be used for.
  3. The panel observed that the number of variables ONS used should be cut down. It recommended sensitivity testing to understand the impact of removing variables, and developing an approach to justify the inclusion of variables.
  4. The panel noted there would be a mismatch between self-reported disability, as in the Census, and estimated disability from administrative data. The panel also noted data gathered during the Coronavirus pandemic could cause an issue, as observations in this time might well be outliers.
  5. The panel observed that the choice of data would come back to definition, and that there would be biases in the coverage of certain administrative data.
  6. ONS presented methods for quantifying uncertainty, and invited further suggestions from the panel. The panel said it was worth reviewing novel research ongoing in academia and at the Italian statistics office. Discussions noted Bayesian approaches would be useful to explore.
  7. The panel noted that the bootstrapping methods ONS presented were not suitable for non-random administrative data, because the error arises not from sampling but from other sources such as linkage and definitional bias. More widely, the panel stated that confidence intervals would be beneficial.
  8. When considering uses of survey data, the panel said that changing the phrasing of disability questions to align with the measurement being sought would be important, echoing earlier considerations of what the data were to be used for. ONS agreed, and would examine the impact of question phrasing by comparing Census years.
  9. The panel thanked ONS for the paper and discussions.
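
The panel's point that bootstrapping reflects only sampling error can be illustrated with a small simulation. This is a sketch under invented assumptions, not the method ONS presented: the population, the true prevalence of 20% and the coverage rates (people with the characteristic appearing in the administrative data 90% of the time, others 50% of the time) are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical population: exactly 20% have the characteristic of interest
TRUE_PREVALENCE = 0.20
population = [True] * 2000 + [False] * 8000

# Hypothetical non-random coverage: inclusion depends on the characteristic
COVERAGE = {True: 0.9, False: 0.5}
admin_records = [flag for flag in population if random.random() < COVERAGE[flag]]


def bootstrap_interval(records, n_replicates=200, level=0.95):
    """Percentile bootstrap interval for a proportion."""
    estimates = []
    for _ in range(n_replicates):
        # Resample the observed records with replacement
        resample = random.choices(records, k=len(records))
        estimates.append(sum(resample) / len(resample))
    estimates.sort()
    lower_index = int((1 - level) / 2 * n_replicates)
    upper_index = int((1 + level) / 2 * n_replicates) - 1
    return estimates[lower_index], estimates[upper_index]


observed = sum(admin_records) / len(admin_records)
lower, upper = bootstrap_interval(admin_records)

print(f"True prevalence:    {TRUE_PREVALENCE:.3f}")
print(f"Observed in admin:  {observed:.3f}")
print(f"Bootstrap interval: ({lower:.3f}, {upper:.3f})")
```

In this set-up the interval is narrow but sits well away from the true value, because resampling the observed records cannot reveal the coverage bias that produced them.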

6. EAP229: Social Surveys Discussion

  1. Following submission and comment on this paper in December, ONS followed up on panel comments and discussion items.
  2. The panel commented that the long-term plan's horizon of 5-10 years was a very long time, and that challenges such as survey response rates and opportunities such as emerging technologies would evolve significantly in this timeframe.
  3. The panel discussed potential benefits of mandatory surveys, or surveys as part of the existing census delivery.
  4. Discussing strategy, the panel observed that ONS would, regardless of intent, influence matters it observed and measured. As such, a focus on important topics may be better than expanding widely. The panel suggested ONS consider the range of government statistics being collected, and whether ONS was gathering too many. Conversely, it noted the advantages of large panel surveys with a wide range of topics that could not easily be replicated in random sample surveys.
  5. Harmonisation was discussed, with the panel noting surveys impacted UK policy and not just England and Wales where much data was gathered. Comparability and harmonisation were important topics.

7. Actions

  1. No actions were raised at the meeting.

The papers that informed this meeting are attached as a PDF document for transparency. If you would like an accessible version of the attached papers, please contact us at authority.enquiries@statistics.gov.uk