A weighting classes approach for estimating population size under ‘scenario 3’ (no field follow-up or CCS field collection)
Post MARP note: This paper represents initial thinking on COVID-19 impact when presented to the panel in May 2020. Further information on impact and the statistical design has subsequently been published on the ONS website.
This note summarises a potential method for producing Census population estimates in the scenario that field activities are restricted around the time of the 2021 Census as a consequence of the government’s response to coronavirus.
This is being referred to as ‘scenario 3’ and is an extension to work already being explored to make use of admin records in low census response scenarios. The key difference in scenario 3 is that we would not be able to rely on a Census Coverage Survey (CCS) field exercise.
Assumptions
- We assume that online Census collection will proceed as planned, with an anticipated response rate of 75-80%.
- This paper focuses on three main sources of coverage error that would be incurred in this scenario: Census household non-response, Census within-household non-response, and admin data over-coverage.
- Other sources of bias in the estimation framework would need to be evaluated before making any decisions about the viability of these options.
Options considered
The first option we considered was to substitute for non-responding households using admin data. This would entail identifying non-responding addresses on the Census frame and looking for strong evidence of individuals living at those addresses in the admin records. There would be no ‘estimation framework’ as such, only a combining of Census and admin records to populate the address register. This option has been ruled out as it would likely under-estimate population size considerably, due to (a) missed enumerations within households that do respond, and (b) lags in admin records, meaning some people have not yet registered at the addresses where they currently reside.
The second option considered was to use administrative records as a second listing for dual system estimation (DSE). Under this option, administrative data would assume the role of the coverage survey on a much larger scale. The precision of the estimates would be high. However, the work of Population and Migration Statistics Transformation (PMST) has shown that over-coverage in administrative records remains persistent despite best efforts to remove erroneous records. Based on research already undertaken, this option has been ruled out due to the risk of over-estimation in Census population estimates.
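The over-estimation risk can be illustrated with the standard Lincoln-Petersen form of the dual system estimator. The sketch below is illustrative only: the function name and all figures are hypothetical, and the estimator actually used in Census coverage estimation involves further stratification and adjustment not shown here.

```python
def dual_system_estimate(census_count: int, admin_count: int, matched: int) -> float:
    """Lincoln-Petersen dual system estimate: (list 1 x list 2) / matched."""
    if matched <= 0:
        raise ValueError("at least one matched record is needed")
    return census_count * admin_count / matched


# Hypothetical area: 800 census records, 900 genuine admin records, 720 matched.
clean = dual_system_estimate(800, 900, 720)

# Same area with 50 erroneous admin records (for example, people who have moved
# away). These inflate the admin list but can never match a census record.
inflated = dual_system_estimate(800, 950, 720)

print(round(clean, 1), round(inflated, 1))
```

Because erroneous admin records increase the second list without increasing the number of matches, the estimate rises with every over-covered record that is not removed.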
The third option, and our proposed option for discussion, is an alternative method of capture/recapture based on weighting classes. Under this method, census counts obtained from responding households are compared with counts from administrative data at the same addresses. Assuming that the Census counts are correct, an adjustment weight can be derived from comparison with the equivalent admin data counts. This adjustment weight can then be applied to the full listing of addresses recorded in admin data to produce a population estimate. While this option is better suited to the administrative data we currently hold, there are additional requirements that would need to be met in preparation to use this method.
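A minimal sketch of the weighting classes calculation follows, assuming a simple ratio weight (census count divided by admin count among responding addresses) within each class. The Address structure, the class labels and the figures are hypothetical; how weighting classes would actually be defined is not specified in this note.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Address(NamedTuple):
    weighting_class: str         # hypothetical grouping, e.g. area by address type
    admin_count: int             # people recorded on admin data at this address
    census_count: Optional[int]  # None if the household did not respond


def class_weights(addresses):
    """Ratio of census to admin counts among responding addresses, per class."""
    census_total = defaultdict(int)
    admin_total = defaultdict(int)
    for a in addresses:
        if a.census_count is not None:
            census_total[a.weighting_class] += a.census_count
            admin_total[a.weighting_class] += a.admin_count
    return {c: census_total[c] / admin_total[c]
            for c in census_total if admin_total[c] > 0}


def population_estimate(addresses):
    """Apply each class weight to the admin counts at every address in the class.

    Raises KeyError if a class has no responding addresses with admin records.
    """
    weights = class_weights(addresses)
    return sum(weights[a.weighting_class] * a.admin_count for a in addresses)


addresses = [
    Address("A", 3, 2), Address("A", 2, 2), Address("A", 5, 4), Address("A", 4, None),
    Address("B", 1, 1), Address("B", 2, 1), Address("B", 3, None),
]
weights = class_weights(addresses)
estimate = population_estimate(addresses)
```

With a ratio weight of this form, applying the weight to the responding addresses in a class reproduces their census total exactly, so the weighted admin counts only add new information for the non-responding addresses.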
Weighting classes – requirements
To tackle potential sources of error with the weighting classes approach, additional steps need to be taken for: over-coverage of occupied addresses on admin data, incomplete census address frame coverage, and within-household non-response.
- Administrative Data Requirements (to tackle over-coverage of occupied addresses):
Since most of our work on admin data has been targeted at a DSE approach, data acquisition prioritisation up until now has focused on sources with potential to provide ‘activity’ type information. The weighting classes approach instead relies on counts from record level administrative data being used as auxiliary information to adjust for census non-responding households. This has some advantages regarding the current availability of administrative data, since (a) a single source with high coverage of occupied addresses might be optimal, and (b) over-coverage at the person level can be tolerated by the weighting classes estimator if the over-coverage propensity is similar between responding and non-responding households.
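Point (b) can be checked with a small deterministic example for a single weighting class. It assumes a ratio weight of census to admin counts among responding households and works with expected admin counts rather than simulated data; the function name, population sizes and over-coverage rates are all hypothetical.

```python
def weighted_estimate(resp_true: float, nonresp_true: float,
                      resp_overcov: float, nonresp_overcov: float) -> float:
    """Weighting classes estimate for one class, using expected admin counts.

    resp_true / nonresp_true: true residents in responding / non-responding
    households. *_overcov: admin over-coverage rate in each group.
    """
    admin_resp = resp_true * (1 + resp_overcov)
    admin_nonresp = nonresp_true * (1 + nonresp_overcov)
    weight = resp_true / admin_resp  # census counts assumed correct
    return weight * (admin_resp + admin_nonresp)


# True population of 10,000 with 75% in responding households.
equal_rates = weighted_estimate(7500, 2500, 0.10, 0.10)
unequal_rates = weighted_estimate(7500, 2500, 0.10, 0.20)

print(round(equal_rates), round(unequal_rates))
```

When the over-coverage rate is the same in both groups, the weight cancels it and the true total is recovered. When non-responding households have higher over-coverage, the weight under-corrects and the estimate is biased upwards.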
While some of the broad coverage datasets we already have access to (PDS and CIS for example) may meet our requirements for residential address coverage, it is vital to have information confirming whether an address is occupied around Census day. For this reason, utilities data (at UPRN level) would be the most important data to acquire to support this approach, as it would help protect against overestimation by removing admin records in addresses that are no longer occupied.
- Address Listing Requirements (to tackle missing addresses on frame):
One of the key functions of the CCS is to identify households which were not on the address frame and adjust in the estimation process accordingly. We have quality targets for the address frame and do not anticipate it being 100% accurate despite best endeavours. In simple terms the target is +/-1% (there will be both over-count and under-count).
To proceed with the weighting classes approach without any additional adjustment for frame under-coverage would require us to be satisfied that the administrative data compiled does in fact cover all (or the vast majority) of residential addresses missing from the census address frame. This is unlikely to be true – while we expect that a proportion of these addresses may be captured in administrative data, we cannot assume it will be all of them.
A separate address checking exercise is therefore likely to be needed, with consideration for all areas across the country. Potential options are:
- Full property listings in selected areas (similar to the pre-interview stage of the CCS)
- Sampling administrative records not found on the frame and either verifying by desktop research or visiting in the field.
The latter of these assumes that there is a reasonable degree of independence between addresses being captured on the frame and those captured on admin data. Any fieldwork undertaken would not require contact with residents. However, we assume that there would be constraints on undertaking any field activities around Census time if a lockdown scenario were in place. We would therefore require that any address checking activity that needs to be undertaken in the field should take place at available opportunities between now and spring next year.
An approach based on desktop research would strengthen the requirement for utilities data, as utilities data is more likely to differentiate between addresses which were occupied by usual residents and those which were not (either vacant properties or second homes). Council tax data and other admin datasets we have at our disposal may struggle in some areas to identify second homes.
- Survey requirements (to tackle within household non-response):
The use of admin data in weighting classes is only viable for estimating population in Census non-responding households. There is still a need to adjust for ‘within-household non-response’, which relates to individuals that are missed within households that have completed Census forms.
A second listing from a sample of census responding households is needed to do this. The sample size of this survey can be relatively small, as it is only intended to adjust for within-household non-response, which does not have high variability.
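This note does not specify the estimator for the within-household adjustment. One plausible form, shown purely as an illustration, is a dual system estimate computed over the sampled responding households and expressed as an adjustment factor on the census count. The function name and all figures below are hypothetical, and the sketch assumes the two listings are independent and that records are matched without error.

```python
def within_household_factor(census_persons: int, second_listing_persons: int,
                            matched_persons: int) -> float:
    """Adjustment factor for people missed within responding households.

    Computes a dual system estimate over the sampled households and divides
    by the census count (this equals second_listing_persons / matched_persons).
    """
    if matched_persons <= 0:
        raise ValueError("at least one matched person is needed")
    dse = census_persons * second_listing_persons / matched_persons
    return dse / census_persons


# Hypothetical sample of responding households.
factor = within_household_factor(census_persons=2400,
                                 second_listing_persons=2380,
                                 matched_persons=2352)

# Apply the factor to a hypothetical census count from all responding households.
adjusted_count = factor * 750_000

print(round(factor, 4), round(adjusted_count))
```

In practice such a factor would probably be estimated separately within strata, but with within-household non-response showing low variability, a small sample may be enough to estimate it with reasonable precision.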
The second listing does need to be independent – it cannot be an online survey as this would likely result in the same capture failures. Under a lockdown scenario, telephone interviewing is the only collection mode that would offer a viable alternative to CCS doorstep interviewing. In order to carry out telephone interviews, phone numbers need to be collected from potential sample households. We’ve considered two ways of doing this:
- Collect telephone numbers from the household reference person on the online Census collection.
- Initiate a boost to the LMS online survey 3 months prior to Census day, and then recontact for a second listing exercise shortly after Census day.
Phone numbers are already collected from the online LMS, with reasonably high completion rates (80-90%). Phone number collection is not currently included in the Census collection, and it is not viable to undertake sufficient research to implement it, given that there may be an impact on response rates.
Research and Delivery Timelines
Research teams are currently committed to delivering the standard design for the Census in 2021 and preparing for low response scenarios already identified if the Census goes ahead as planned. To understand the likely quality of the weighting class approach outlined here would require simulation studies to be undertaken by MDR, the results of which would need to be available by September 2020 to make a decision regarding the suitability of this approach as a contingency option.
If, in the event of lockdown, there were a requirement to operationalise this method, it is unlikely that estimates of population size could be produced within a year of Census day. The estimation framework is not as well understood as DSE and will require additional methodological development and quality assurance beyond the standard design.