3 – EAP168 – Statistical Disclosure Control for Census
ONS presented on statistical disclosure control. The discussion focussed on four areas: record swapping, cell-key perturbation, disclosure rules, and transparency.
The panel questioned why record swapping occurred by proximity rather than to achieve more statistically similar results. ONS clarified that this is because records are only swapped within geographically based delivery groups. They further highlighted that this helps to prevent issues with origin/destination questions, as the place of work variable is not affected when a record is swapped, regardless of the distance from the initial address.
The panel suggested that the cell-key perturbation approach proposed by ONS would be similar to differential privacy. ONS clarified that the method does not strictly fit the definition due to the treatment of zero counts, which are perturbed less than other low counts and only positively. The panel commended ONS for the creation of noise at the moment of tabulation but noted that record swapping meant that perturbation itself was not as critical as it otherwise would be. One suggestion from the panel was the use of a global privacy budget, whereby cell-key perturbation would be used only when disclosive tables are still present following record swapping.
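As an illustration of the general cell-key idea discussed above, the following is a minimal sketch only and not the ONS implementation. It assumes each record carries a pre-assigned integer key, that the keys of the records in a cell are summed modulo a fixed range to give a cell key, and that the cell key selects the noise from a lookup. The names `KEY_RANGE`, `cell_key`, `noise_for` and `perturbed_count`, and all thresholds, are hypothetical. The special treatment of zero counts mentioned by ONS is not modelled.

```python
KEY_RANGE = 256  # hypothetical size of the key space, not an ONS parameter


def cell_key(record_keys):
    """Combine the keys of the records in a cell into a single cell key."""
    return sum(record_keys) % KEY_RANGE


def noise_for(ckey):
    """Hypothetical lookup: most cell keys give no change, a few give -1 or +1."""
    if ckey < 32:
        return -1
    if ckey >= 224:
        return 1
    return 0


def perturbed_count(record_keys):
    """Return the perturbed count for a non-empty cell.

    An empty cell has no record keys to sum, so zero counts are
    deliberately left out of this sketch.
    """
    if not record_keys:
        raise ValueError("zero counts are not modelled in this sketch")
    return len(record_keys) + noise_for(cell_key(record_keys))


if __name__ == "__main__":
    # The same set of records gives the same result in any table or order.
    print(perturbed_count([10, 20]), perturbed_count([20, 10]))
```

Because the noise depends only on the records that fall in the cell, the same cell appearing in two different tables receives the same perturbation, which is what allows noise to be applied at the moment of tabulation.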
The panel discussed the proposed disclosure rules. One comment highlighted the possibility of finding zero counts in published data but a non-zero result in microdata – something that would confirm the application of SDC in a particular record.
The panel suggested publication of multiple papers with varying levels of complexity, something ONS confirmed was already intended. The principle of disclosing parameters was further discussed, in particular the potential for their release in several decades' time if appropriate. The panel endorsed this as an approach for future ONS Census staff.
The panel expressed concerns that much of the SDC depends on the table builder software itself, something which is likely to require updating over the coming years as technology advances. ONS explained that this issue is mitigated by the recreation of the results in two further methods (R and Python).
4 – EAP169 – Geography Maintenance Methodology and Plans for the 2021 Census
ONS presented on geography maintenance, one of the key deliverables being an update to Output Areas (OAs). Changes from this are intended to be minimal, in the region of 5% of OAs, to ensure comparability with 2011 results at OA level. ONS explained that one of the criteria in redrawing OAs is social homogeneity, albeit one that is only applied when the OA exceeds limits on population and number of households. The panel observed that this would indirectly increase the heterogeneity between OAs.
The panel expressed a concern about the ONS proposal to not have any OAs made up of a single Communal Establishment (CE). In these cases, the suggestion would be to aggregate the CE with at least 40 households in the surrounding area to prevent the possible identification of an OA with the particular CE. The panel suggested that, in the rare cases where this occurred, the addition of households would not itself provide much disclosure control for the CE, with disaggregation of outputs especially problematic for CEs with readily identifiable characteristics, such as all-male prisons. The panel identified both pros and cons of allowing CEs above a certain population size to be their own OA and acknowledged the reasons why this was not currently done.
The panel highlighted the importance of geography changes to the 2023 recommendation work currently ongoing. They raised the concern that it might be challenging to have output areas containing CEs and households if both are collected administratively.
The panel discussed a point implicit in the paper – the copying into their term-time address of students not at university due to the pandemic. In particular, the panel questioned whether information on copied students is recorded so that the impact of the pandemic on student populations could be studied directly. ONS clarified that they intend to be transparent on the number of students copied into their term-time addresses. The treatment of international students by this process was also discussed, with ONS clarifying that these will not be copied in the same way unless they were resident in the country on Census Day. This issue is partially addressed through imputation of residents where the student halls survey indicated an international student had a contract to stay there. The panel questioned whether any other groups would be treated similarly to students, and suggested comparison of 2011 and 2021 results to identify potential differences caused by this.
5 – EAP170 – Proposed Duplication Calibration method for the 2021 Census of England and Wales
ONS presented on the Duplication Calibration Strategy for Census 2021.
The panel sought clarification on the treatment of students who are not at their term-time address, as they would not be present for the Census Coverage Survey (CCS). The point of international students was raised specifically, with many universities instructing students to respond remotely from abroad, who would then not be present for the CCS. ONS clarified that the work does not focus on those that left the country between Census and CCS but rather the converse. As Census to CCS matching does not give many observations, the proportion of duplicates within Census-Census matching is instead used.
The panel reflected on the variance shown before and after duplicate calibration and noted that in some cases this got worse. ONS relayed that duplication calibration decreased total bias even if it increased relative bias in some cases, adding that model selection work will further improve variance across the domains.
The panel questioned the use of inverse sampling to achieve a required number of matches rather than automatically matching all Census records to each other. ONS explained that, even using an automated checking algorithm, there remain nearly one million candidate pairs to be clerically checked. The panel suggested that a way of circumventing this would be to use the outcome of clerical checks to train a logistic regression model on the match score output by the automatic matching process. This would yield an estimate of the number of matches across all Census records with each other. The panel highlighted that the output of this process could be used as input in exactly the same way as proposed in the paper for duplication calibration. ONS agreed that this would likely increase the quality of the estimated number of duplicates.
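The panel's suggestion could be sketched along the following lines. This is a minimal illustration under stated assumptions, not the method in the paper: it assumes a single numeric match score per candidate pair, a clerically checked sample labelled 1 (match) or 0 (non-match), and that the estimated number of matches is taken as the sum of predicted match probabilities over all candidate pairs. The function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(scores, labels, lr=0.1, epochs=5000):
    """Fit P(match | score) = sigmoid(a + b * score) by gradient ascent
    on the log-likelihood of the clerically checked sample."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = y - sigmoid(a + b * s)
            grad_a += err
            grad_b += err * s
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def estimated_matches(all_scores, a, b):
    """Sum predicted match probabilities over every candidate pair."""
    return sum(sigmoid(a + b * s) for s in all_scores)


if __name__ == "__main__":
    # Hypothetical clerically checked sample: match score and outcome.
    checked_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    checked_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
    a, b = fit_logistic(checked_scores, checked_labels)

    # Hypothetical scores for candidate pairs that were not clerically checked.
    unchecked_scores = [0.05, 0.35, 0.55, 0.95]
    print(estimated_matches(unchecked_scores, a, b))
```

Summing probabilities rather than counting pairs above a threshold keeps the estimate sensitive to borderline scores, at the cost of relying on the checked sample being representative of the unchecked pairs.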
Despite suggested improvements for how to better estimate the number of duplicates, the panel agreed with the overall approach proposed.
6 – Actions Update
The panel agreed to close action 82 on the basis that 2023 progress reports are now on the MARP forward agenda.
7 – Any other business
The panel discussed the draft annual report and agreed for this to be submitted by correspondence before the next meeting.