- 10:00 – 10:05 – Introduction – Sir Bernard Silverman
- 10:05 – 10:20 – Census update – Jon Wroth-Smith
- 10:20 – 10:30 – Actions Update – Gareth Powell
- 10:30 – 11:10 – Census Benefits Methods Review – EAP180 – Emily Knipe
- 11:10 – 11:30 – Social Statistics/Characteristics Research Update – Presentation – Michael Cole
- 11:30 – 11:45 – Break
- 11:45 – 12:25 – Ethnicity statistics using GSPREE – EAP181 – Zoe Sargent, Alison Morgan
- 12:25 – 13:05 – Methods for producing multivariate population statistics using administrative and survey sources – EAP182 – Michael Cole/Valentina Gribanova/ Alison Whitworth
- 13:05 – 13:50 – Lunch
- 13:50 – 14:40 – Quality work for Demographic Index (MDQA) – EAP182 – Rosalind Archer
- 14:40 – 15:00 – Update – Statistical Population Dataset (SPD) – Presentation – Justine McNally/Ann Blake
- 15:00 – 15:10 – Break
- 15:10 – 15:50 – International migration estimation – EAP183 – Brendan Georgeson/ Dominic Webber
- 15:50 – 16:30 – SPD Estimation options paper – EAP184 – Eleanor Law/Katie O’Farrell
- 16:30 – 16:40 – AOB – Sir Bernard Silverman
Panel Members
(In person)
- Sir Bernard Silverman (Chair)
(via teleconference)
- Prof Ana Basiri
- Dr Nik Lomax
- Dr Oliver Duke-Williams
- Prof David Martin
- Prof Natalie Shlomo
Office for National Statistics
(In person)
- Gareth Powell (ONS Secretariat and Presenter)
- Owen Abbott (ONS Lead)
- Emily Knipe
- Kerry Gladstone
- Alison Morgan
- Zoe Sargent
- Rosalind Archer
- Justine McNally
- Ann Blake
- Elizabeth Pereira
- Brendan Georgeson
- Eleanor Law
- Ceejay Hammond
- Katie O’Farrell
- Matt Plummer
- Hannah Willmett
- Valentina Gribanova
(via teleconference)
- Becky Tinsley
- Justine McNally
- Cal Ghee
- Charlie Wroth-Smith
- Michael Cole
- Alison Whitworth
- Dominic Webber
- Laura Cheatham
- Rob North
- Cristina Spoiala
- Stephan Tietz
- Joanna Harkrader
- Lesley Brennan
1. Introduction
- The chair welcomed attendees to the session.
2. Census Update
- ONS provided an update on recent Census releases and those coming up.
3. Actions Update
- Action 94, Michael Cole et al. to present a paper to complement the “Producing Social Statistics from Admin Data” slides presented at MARP26. This was brought to this meeting, so item closed.
- Action 93, MARP secretariat to make the “Producing Social Statistics from Admin Data” slides available for panel review via correspondence. This was complete, so action closed.
- Action 92, Sir Bernard Silverman to draft a letter outlining the need for an assessment of the impact of admin data supply issues on ONS producing statistical research towards and beyond the 2023 Recommendation. This was complete, so action closed.
- Action 91, ONS to investigate and evidence the public cost of receiving hashed DWP data. Work is ongoing on this, and the panel will be updated as it progresses. Action closed.
- NEW ACTION: Sir Bernard Silverman and Owen Abbott to review the MARP terms of reference by the next meeting.
4. Census Benefits Methods Review – EAP185
- ONS gave an overview of the paper, focussing on the key questions.
- The panel noted that each of them would tend to focus on their own area, but it would be important to ensure that double counting was avoided as benefits could be cross-sector or interdependent.
- The panel suggested using data on use of the Census 2011 data, for example UKRI could give information on users from that source. The size and number of projects could therefore be looked at.
- ONS noted that they were looking at benefits to the voluntary and community sector and academia for the first time, and so suggestions on wider stakeholders were appreciated.
- The panel as it currently stands had reservations about its expertise in this area, especially with respect to whether best practice had been followed.
- In particular, ONS sought the panel’s view on valuing non-monetised parts. Recognising that international examples where this was attempted were rather arbitrary, the panel suggested that ONS could provide one figure for benefits that are more readily quantifiable, which would be comparable to the forecast. A separate value could be estimated for the less quantifiable areas, acknowledging the challenges in calculating this.
- The panel noted that the method for assessing benefits to the private sector involved scaling up from a single respondent for each industry, which could mean that the range of views within an industry is not well represented. However, the use of additional surveys (as planned) will help validate this.
- It was noted that it was particularly difficult to assess the value of Census 2021 given that the future of population statistics was currently being decided. How do we reflect the decay of the Census when it’s being potentially replaced and/or supplemented by the future work? It was felt that the answer was unclear given the number of possibilities, and that the issue was best noted without quantification.
- ONS thanked the panel for their discussion, and the need for further expertise and guidance was noted.
5. Social Statistics/Characteristics Research Update – Presentation
- ONS gave a presentation, giving an update of work in the area since the presentation in May 22, including the first release by ONS of multivariate statistics based on administrative data.
- The panel asked ONS to think about being able to affect the data collected – for example, ensuring that ethnicity is collected, minimising missingness, and preferably in a consistent way. It was noted that this was a challenge when using administrative data.
- The panel noted that the EPC data on floorspace was said to have larger error, but it could be that the EPC is correct. ONS responded that the model validates preconceptions and anecdotal knowledge, so this finding was not surprising given that knowledge.
- The panel expressed support for this research, noting that models and surveys will always be needed to fill the holes in admin data.
6. Ethnicity statistics using GSPREE – EAP181
- ONS gave an overview of the work in this area and the paper.
- The key findings were that the admin-based ethnicity statistics are closer to Census 2021 than the GSPREE estimates are for the majority of local authorities for the Asian, Black, Mixed and White ethnic groups. The GSPREE estimates are closer to the Census 2021 estimates for the majority of local authorities for the ethnic group ‘Other’.
- Overall, it was found that GSPREE does not appear to improve population estimates by ethnic group at LA level compared with the admin-based statistics.
- The panel suggested that some of the diagnostics may need thinking about, as the subtypes add up to 100%. In particular, the ‘White’ group is very large, and relatively small uncertainty there can lead to relatively large uncertainty around the smaller groups.
- The panel asked why the White group was not more broken down, and ONS clarified that the various sources did not have consistent sub-groups that could be used. The panel noted this as a potential issue.
- ONS agreed, and further noted that the harmonised standard was being looked at for change as well, and the panel noted that change in the categories collected across various sources would be an ongoing issue.
- There was some discussion about using other data sources as the national benchmarks, in particular Census 2021 estimates and ETHPOP from Leeds, to help with the distribution across ethnic groups. ONS will examine these and consider their use.
- In response to panel comments, ONS had produced estimates using 3 years of APS data. This did not change the conclusions.
- It was noted that there were some strange outliers in the Mixed category, where Census was low and both the admin-based statistics and GSPREE estimates higher. ONS will take this away and look into the data.
- The panel suggested that bias was a bigger issue than variance here. They also asked ONS to check the meaning of the coefficient of variation in this context, as they felt it was unclear.
- The panel asked about ONS using SPD totals rather than APS or other totals, and ONS noted that the approach was intended to represent what the method would look like going forward, rather than maximising the quality of the outputs as a one-off. ONS need to determine what the realistic quality of results would be, and then consider whether surveys need further improvement.
- After discussion, the conclusion was that GSPREE as a method was viable, but the use-case here (with ethnicity, using APS totals) was not helpful. It was agreed that this research should be paused, at least until new data is available.
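The panel's point about group shares summing to 100% can be illustrated with a small sketch. The shares below are invented for illustration and are not ONS or Census figures, and the function names are hypothetical. The sketch assumes that an error in the largest group's share is offset proportionally across the other groups, and it takes the coefficient of variation to be the standard error divided by the estimate, which is the usual definition but may not be the one used in the paper.

```python
# Illustrative only: the shares are made-up numbers, not ONS or Census figures.

def relative_errors(shares, big_group, abs_error):
    """Add `abs_error` (as a share) to `big_group` and remove it from the
    other groups in proportion to their size, so shares still sum to 1.
    Returns the relative change in each group's share."""
    others_total = sum(v for k, v in shares.items() if k != big_group)
    result = {}
    for group, share in shares.items():
        if group == big_group:
            new_share = share + abs_error
        else:
            new_share = share - abs_error * share / others_total
        result[group] = (new_share - share) / share
    return result


def coefficient_of_variation(estimate, standard_error):
    """Usual definition: standard error relative to the size of the estimate."""
    return standard_error / estimate


# Hypothetical shares for one local authority.
shares = {"White": 0.80, "Asian": 0.10, "Black": 0.05, "Mixed": 0.03, "Other": 0.02}

# A 2 percentage point error in the largest group.
errors = relative_errors(shares, "White", 0.02)
for group, rel in errors.items():
    print(f"{group}: {rel:+.1%}")
```

Under these assumptions, a 2.5% relative error in the largest group corresponds to a 10% relative error in each of the smaller groups, which is the effect the panel described.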
7. Methods for producing multivariate population statistics using administrative and survey sources – EAP186
- ONS gave an overview, particularly on issues of interest to the panel.
- The panel raised the issue of solving each requirement in isolation, rather than taking an overall approach to creating estimates. Solving each need separately makes later reconciliation very difficult.
- The panel also noted that there did not appear to be a plan for a periodic large-sample survey to allow benchmarking. APS is unlikely to help much and a large supplementary survey is needed. A further problem is that the covariates are not available on administrative data. The potential for a partial or rolling census was discussed, as well as a large survey.
- ONS confirmed that the different data collection options were still possibilities, and that the plan was to work out what administrative data was capable of and where there were gaps. Determining the resulting survey data requirements was the next stage of work.
- The panel thought that a periodic check of some sort would be needed, both to fill in gaps in the data and as a quality check.
- The panel queried whether combinatorial optimisation was the most appropriate choice of method to account for undercoverage of the admin population base or whether a logistic regression and weighting approach would provide more information about the quality of the estimates. The panel commented that combinatorial optimisation performs well when lots of benchmark constraints are used but it could be difficult to obtain these benchmarks from the administrative data alone.
- The panel thought that any MNAR (missing not at random) data would be an issue with the approaches put forward, especially if using machine learning, which could propagate bias. It was not clear where an optimisation approach would get good enough constraints. Again, a survey could help with this issue.
8. Quality work for Demographic Index (MDQA) – EAP182
- ONS introduced this paper with a brief presentation.
- The panel raised some questions around the UPRN. ONS assured the panel that households with no UPRN link were not dropped. If records and linkage information were stored in a graph database, this would allow links between records to be used or dropped at later stages, with different thresholds set as required. The panel commented that while downstream processes may want this ability, users should probably not need to tailor networks as this could cause difficulties.
- ONS raised that as data is added, the clustering may become more complex and that graph databases might prove useful for holding and reconciling linkage information, to support clustering.
- The panel queried whether ethnicity could be used for clustering, but ONS said that it is not available on all input sources. However, in current work on the Refugee Cohort Study, ONS are looking at different algorithms to support better use of name fields for linkage, as it is known that non-Western names pose problems for accurate data linkage.
- The panel asked whether the order in which data are added to DI was fixed – ONS said that it was not, and that the current ordering is a preliminary approach. PDS2016 was used as the base, as it has the broadest coverage, and other sources are applied chronologically. Simulation work will hopefully help to guide this approach.
- The possibility of fractional counting was raised, in order to deal with clusters where membership is unclear, as well as potential audit-sampling. ONS will keep both in mind.
- The panel queried whether economic activity could be better used, but ONS reported practical difficulties in doing so. It will continue to be considered as the data is improved.
- The panel raised the question of whether activity data might be used to support the build of the DI. ONS agreed that findings from other projects could be useful for developing the DI build.
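One possible reading of the fractional counting idea raised above is sketched below. It assumes that each candidate cluster carries a weight between 0 and 1 for how likely it is to represent a distinct person, and that an area's count is the sum of those weights. The weights, area labels and function name are hypothetical and are not taken from the paper or from the DI build.

```python
from collections import defaultdict


def fractional_counts(clusters):
    """clusters: iterable of (area, weight) pairs, where weight is an assumed
    probability that the cluster is a distinct person. Returns area totals."""
    totals = defaultdict(float)
    for area, weight in clusters:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie between 0 and 1")
        totals[area] += weight
    return dict(totals)


# Hypothetical example: one clear cluster and two ambiguous clusters in Area A,
# and one clear cluster in Area B.
clusters = [
    ("Area A", 1.0),
    ("Area A", 0.6),
    ("Area A", 0.4),
    ("Area B", 1.0),
]
counts = fractional_counts(clusters)
print(counts)
```

A whole-number approach would have to count the two ambiguous clusters as either zero, one or two people. The fractional version lets them contribute one person between them.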
9. Update – EAP180 Statistical Population Dataset (SPD) – updated paper
- ONS presented on the ongoing work in this area and showed some early results from the latest version of the SPD. The panel were interested to see the changes and updates and suggested some areas for particular analysis.
10. International migration estimation – EAP183
- ONS introduced this paper with a brief presentation.
- The panel queried how uncertainty could be assessed here. ONS explained that they were currently using Office for Statistical Regulation guidance to develop uncertainty measures. This includes looking at secondary sources and working with ONS methodologists on error estimation.
- The panel liked that the data allowed use of different definitions and being able to compare across those definitions, rather than only using the international standard as before.
- The panel suggested that the ‘challenge’ of users becoming used to revisions and adjustments was a feature, rather than a limitation. The panel also suggested some other potential data sources, and ONS intends to look at them.
- The panel asked whether the modelled/administrative data approach is consistent with the IPS estimates where they are both available. ONS said that lots of coherence work had been done. It appeared that IPS generally showed fewer migrants, but that there was long-term consistency with the Census estimates.
- The panel also queried what assumption was used for whether students were working; was it assumed that migrant students worked at the same rate as other students? ONS assured the panel that the rate of immigrant students currently working was used for the rate of new migrants.
- The panel asked whether use of record-level RAPID data was likely to be possible. ONS reported that while there were difficulties, there are applications for the data across ONS and so they are attempting to make it available.
- The panel also asked ONS to think about the representativeness of the IPS ports coverage, especially for particular groups. ONS said they would continue to think about this.
11. SPD Estimation options paper – EAP184
- ONS introduced this paper with a brief presentation and put forward the options for the panel to advise on. These were:
- Option 1 – a register with under-coverage and over-coverage error
- Option 2 – a register with over-coverage and negligible under-coverage
- Option 3 – census-like estimates using a high-quality list of usual residents within England and Wales with reference to any time point. This could use a small survey to audit the CPR.
- On the third option, the panel responded that this option may or may not be realistic, but this was not for methodological reasons and so that would be difficult to comment on.
- The panel suggested that a cross between a ratio estimate and weighting class approach may be feasible, and that other data sources – such as the Linked Consumer Register – may be worth investigating.
- It was agreed that while Option 1 should not be dropped, Option 2 has had less investigation and so should be the focus for now.
- The panel suggested that it is important to define the success criteria for estimation. The paper refers to the aspects of cost and sustainability as well as quality, but clarity would be helpful here.
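As a rough illustration of what a ratio estimate within weighting classes could look like, the sketch below scales register counts by a class-level ratio taken from sampled areas. This is only one way to combine the two ideas and is not the method in the paper. The class labels, counts and function name are hypothetical, and sampling weights and variance estimation are ignored.

```python
def class_ratio_estimates(sample, register_totals):
    """sample: list of (weighting_class, survey_count, register_count) tuples,
    one per sampled area. register_totals: register count per class over all
    areas. Returns the register total scaled by the class ratio."""
    survey_sum = {}
    register_sum = {}
    for cls, survey_count, register_count in sample:
        survey_sum[cls] = survey_sum.get(cls, 0) + survey_count
        register_sum[cls] = register_sum.get(cls, 0) + register_count
    return {
        cls: total * survey_sum[cls] / register_sum[cls]
        for cls, total in register_totals.items()
    }


# Hypothetical sampled areas: in the younger class the register holds more
# records than the survey finds usual residents (over-coverage).
sample = [
    ("16-24", 90, 100),
    ("16-24", 85, 100),
    ("65+", 99, 100),
    ("65+", 101, 100),
]
register_totals = {"16-24": 10000, "65+": 5000}

estimates = class_ratio_estimates(sample, register_totals)
print(estimates)
```

Here the younger class is scaled down by a ratio of 0.875 while the older class is left unchanged, so the adjustment differs by class rather than being applied uniformly to the register.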
12. Any Other Business
- The chair thanked everyone for making the effort to attend in person, especially given the transport difficulties.