Orlaith Fraser, Cal Ghee. 2021 Census Statistical Design, August 2021
Purpose of this paper
We took an introduction to the design for Communal Establishments (CEs) to the external methodological assurance panel (MARP) in October 2019. The panel wanted an update specifically on imputation and use of admin data. Extract from the minutes:
2.8 – The use of donors for imputation was also discussed, with the panel satisfied with the selection within the same CE process but noted known issues regarding small donor pools and repeated use of donors. It was recommended that the ONS continue to research potential overuse of donors.
A52 – The panel would like a more detailed update on Communal Establishments in the future, including imputation and use of administrative data.
This paper addresses those comments as far as we are currently able, and adds in context on the end-to-end design, from creation of the address frame and collection operation through to coverage estimation and adjustment. It does not cover quality assurance/validation of the census estimates.
CRAG and MARP are asked for their views on the current status of the CE elements, and to feed in any ideas on aspects that are still in progress.
According to the 2011 Census, 1.7% of residents of England and Wales lived in managed residential accommodation known as Communal Establishments (CEs), such as care homes, student halls of residence, hospitals or prisons. While this is a relatively small proportion of the total population, CE residents are likely to be clustered in particular locations and share certain characteristics. It is essential that we capture sufficient responses from these groups from a statistical quality, inclusivity and outputs perspective. A tailored enumeration approach is needed to enable those in communal establishments to respond. A similarly bespoke enumeration approach may also be required for a small number of households because of addressing challenges (transient population groups, temporary accommodation sites), access restrictions (e.g. royal households, embassies) and/or because additional engagement with the community may be required (such as Gypsy and Roma travellers), which makes them unsuitable for the traditional household collection model. These are known as Special Population Groups (SPGs) and are often grouped with CEs for operational purposes.
Although the Census Coverage Survey (CCS) caters for CEs with up to 50 bed spaces, larger CEs are not included. A tailored approach to the processing of the data is therefore also required to ensure that estimates can be published to a standard that meets user needs.
Table 1 below summarises the main CE populations in 2011, showing the proportion they make of the whole population, the proportion of the CE population, and the response rate in 2011. It also summarises the quality concerns we have with each of these main types. Appendix D Table D1 gives a more detailed breakdown of the CE populations from the 2011 Census.
Table 1: Quality concerns with CE population collection
|Type (share of 50+ bed space CEs)||Proportion of 2011 E&W total pop||Proportion of 2011 CE pop||Response rate in 2011 (proportion)||Quality concerns||Comments / alternative sources for QA and estimation|
|Care home residents (31% of the 50+ bed space CEs)||0.007||0.38||0.94||· Population numbers, but not geographically clustered · Quality of proxy responses · Manager/staff respondent burden||· 97% in ‘small CEs’ in 2011 so in CCS and estimation – more in ‘large CEs’ in 2021 · 2011 data 70-87% were by proxy. Likely higher in 2021, but managers aware of respondent burden · Have to accept quality of proxy responses due to nature of residents’ situation|
|Students in halls of residence (37% of the 50+ bed space CEs)||0.005||0.3||0.87||· Population numbers and geographic clustering · Young adult non-response · Access to individuals - receipt of initial contact & follow-up||· Large increase in private halls since 2011 · Info direct from estab? (CE officer liaison, phone) · HESA data too lagged for current year – use patterns from previous years? · Other sources tbc|
|Hotel, B&B, guest house usual residents (<1% of 50+ bed space CEs)||0.0005||0.025||0.91||· Some geographic clustering · Understanding ‘usual resident’||· Area-specific validation · 99% were ‘small CEs’ so included in CCS and coverage estimation|
|Holiday accomm. (caravan parks etc) (<<1% of 50+ bed space CEs)||0.00005||0.003||0.92||· Geographic clustering · Access (time of year) · Definitions of usual residence||· Area-specific validation, investigating what sources are available|
|Prison population (7% of 50+ bed space CEs)||0.001||0.066||0.82||· Geographic clustering · Definitions, poor response rates||· MoJ data on basic demographics|
|School boarders (13% of 50+ bed space CEs)||0.001||0.064||0.95||· Geographic clustering · Duplication of response (parental home v term time)||· Alternative data sources being investigated|
|Armed Forces bases (6% of 50+ bed space CEs)||0.0008||0.043||0.85||· Geographic clustering · Families included in comparator data or not||· Use of MoD/USAF data – but to note that these sources can’t separate out base v residence address|
An accurate address frame is essential to enable us to identify the communal establishments and target those living in CEs and SPGs with a tailored approach. The frame is based on AddressBase Premium (ABP), which is widely used across both the public and private sectors and is continually updated by GeoPlace. It uses Local Land and Property Gazetteers (LLPGs) in conjunction with a range of address intelligence sources, such as those from the Valuation Office Agency, Royal Mail and Ordnance Survey. Address type classifications from ABP are initially used to establish the type of address (whether a CE or household address, and if a CE whether it is a prison or a care home, for example).
Because communal establishments present particular challenges for addressing, the frame initially created from AddressBase is supplemented with further information, which enables us to validate the address frame classifications, establish completeness (to make sure no CEs are missing from the frame) and also add in additional information where needed, such as the number of bed spaces. The following administrative and commercial sources are used for these purposes:
- Cushman and Wakefield (Student Halls)
- Care Quality Commission and Care Inspectorate Wales (Care Homes)
- Ministry of Defence and US Armed Forces (Armed Forces Bases)
- Ministry of Justice (Prisons)
- Edubase (Boarding Schools)
- Ministry of Housing, Communities and Local Government Survey of Traveller Sites (Travelling Persons)
The initial frame is based on an AddressBase extract taken in summer 2020. A further extract is then taken from AddressBase in December 2020 to account for any new CE addresses and changes in type from household to CE and vice versa. The administrative and commercial sources listed above will be used to identify changes in communal establishments at this time, so that we have the most up-to-date and accurate frame to underpin the 2021 Census, all CEs in England and Wales receive invitations to take part in the census, and the appropriate number of forms are delivered.
Where AddressBase and other data sources are either of uncertain quality, or do not provide enough information for addressing or bed spaces, there has been concerted desk-based research and clerical work to enhance the address frame. The Communal Establishment Address Resolution Team (CEART) has undertaken this clerical resolution work from May 2020 to confirm address types and capacity information. This additional enhancement to the address frame will assist with the field operations during live collection and ensure there is appropriate coverage.
A particular challenge to addressing when it comes to communal establishments is obtaining accurate unit-level (room-level) addresses. While initial contact material may be distributed without the need for unit-level addresses, identifying during follow-up visits which residents have, or have not, responded is much more challenging and considerably less effective when unit-level addresses are not available. For some establishments, such as care homes, unit-level addresses may simply not exist or it may be inappropriate to follow up at this level. For smaller CEs, such as hotels and hostels, unit-level follow-up may not be necessary as there are usually only a small number of usual residents at that address. However, for student halls of residence, which may contain a very large number of students who are unlikely to interact with the CE manager, unit-level follow-up is essential for an effective follow-up process. A decision has been made, therefore, to invest significant effort in obtaining unit-level addresses for student halls of residence through engagement with universities and local authorities and through commercial data sources. Only establishment-level addresses will be used for other CE types.
Processes have been developed so that any new CEs (including those misclassified as households in the initial frame) identified in the field, or at any time between the delivery of the address frame and the start of field operations, can be added to the address list, with unit-level addresses where appropriate, and so that paper forms or access codes are delivered as appropriate.
All managers of CEs are asked (and are legally obliged) to complete a ‘CE1’ form – a short questionnaire asking about the type of establishment, the type of resident it caters for and the count of usual residents and visitors on Census Day (21st of March 2021). All usual residents are requested to complete an individual form, containing the same individual questions as the individual portion of the household form.
Usual residents are defined as anyone who usually lives at the address (including staff) and is expecting to stay there for 6 months or more, or anyone who has no other usual address in the UK. Those expecting to stay for less than 6 months, who also have another address where they are usually resident in the UK, are counted as visitors and should not complete an individual form at the CE address but should be included on a form at their usual address.
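The definition above reduces to a two-condition rule, which the short sketch below illustrates. It is an illustration only: the function name and its arguments are hypothetical and are not taken from any census system, and the expected length of stay is represented simply as a number of months.

```python
def classify_resident(expected_stay_months: float, has_other_uk_address: bool) -> str:
    """Classify a person at a CE address as a usual resident or a visitor.

    Assumed reading of the definition above: a person is a usual resident if
    they expect to stay 6 months or more, or if they have no other usual
    address in the UK. Everyone else is a visitor.
    """
    if expected_stay_months >= 6 or not has_other_uk_address:
        return "usual resident"
    return "visitor"


# A student in halls for the academic year who also has a parental home
long_stay = classify_resident(expected_stay_months=9, has_other_uk_address=True)

# A hotel guest staying two weeks who lives elsewhere in the UK
short_stay = classify_resident(expected_stay_months=0.5, has_other_uk_address=True)

# A short-stay hostel resident with no other UK address
no_other_address = classify_resident(expected_stay_months=1, has_other_uk_address=False)
```

Under this reading, only the second person is a visitor and would be expected to appear on a form at their usual address rather than at the CE.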
Households classified as Special Population Groups (SPGs) complete a standard household form.
The type of establishment or SPG is used to determine whether residents receive an invitation letter with an individual access code to enable them to respond online (with the option of requesting a paper questionnaire if they wish) or a paper questionnaire (including an access code to enable them to respond online if they prefer). The decision around the mode of initial contact is determined by respondent need and the likely impact on response, whilst encouraging people to respond online if they have the means to do so.
In contrast to standard household addresses, where initial contact letters or paper questionnaires are posted out, initial contact materials are all hand delivered to CEs by trained communal establishment census officers. Some SPGs have their initial contact materials hand delivered where additional engagement with the group may be necessary or where no permanent address is available to post out to, but other SPGs, such as royal households and embassies, receive their initial contact letters through the post. See Appendix A for a summary of each of the CE and SPG types and the method of delivery and type of initial contact.
Wave of Contact for CEs and SPGs
The timeline of interactions with the public, known as the ‘wave of contact’, for CEs and SPGs follows as far as possible the household wave of contact (see Appendix B). All residential addresses, including CEs and SPGs, receive an unaddressed postcard, delivered from 4 weeks before Census Day, to raise awareness of the census. Hand delivery of initial contact material starts from 4 weeks before Census Day, with priority delivery to university halls of residence to ensure delivery before the end of term and start of Easter holidays. Communal establishment officers ‘own’ a specific set of SPG and CE addresses and make return follow-up visits after delivery of initial contact material throughout the field operation, which ceases three weeks after Census Day. Residents in communal establishments do not get reminder letters posted out, as this would not be consistent with the hand-delivery approach, but some SPG addresses do receive a reminder letter in the first week after Census Day if they have not yet responded.
Response Rate Targets
Targets for response rates to be achieved by the end of the collection period enable us to drive the operation to maximise response and track progress towards achieving sufficient response to enable production of estimates that meet user needs. Tracking progress towards targets also enables us to direct interventions where needed and ensure that resources are placed where they are needed most. However, for such targets to be useful, they need to be operationally feasible and need to take into account some of the challenges faced by census officers when following up non-responding CE residents. They also need to balance this against the need for sufficient response within each establishment to enable individual records to be imputed without bias. The imputation process only ever adds individual records to a CE, based on the characteristics of those who have responded from within that establishment, and entire CEs are never imputed. To enable imputation, it is vitally important that data are collected from each CE on the number of usual residents, either through completion of the CE1 form by the CE manager, or through completion of a ‘dummy form’ by a CE officer where a CE1 form has not been received. A low response rate from an individual CE would increase the risk of non-response bias and could therefore impact the quality of the imputation and subsequent published estimates. A target has been set to achieve establishment-level information for 100% of CEs via either a CE1 form (preferably), or via a CE dummy form if no CE1 form has been returned.
To take into account the varying availability of unit-level addresses and administrative data sources as well as differing operational challenges across establishment types, four priority levels and associated targets have been set for CEs and SPGs.
Category 1 (low priority) addresses include CEs and SPGs where follow-up visits are likely to be less effective, access is likely to be restricted, or where higher quality admin data mean that this group is a low priority for follow-up. CEs in this group include military bases, prisons and secure establishments. No specific response rate target is set for this group, but one initial visit must be made to successfully deliver initial contact materials and at least one follow-up visit conducted including contact with the CE manager.
Category 2 (medium priority) addresses include CEs and SPGs where a high level of response is desirable but where multiple follow-up visits may not be appropriate. These include royal households, embassies, hospitals, hospices, children’s homes and boarding schools. Similarly to category 1, no hard target is set for these CEs or SPGs, but at least two follow-up visits (including contact with the CE manager) must be made after delivery of initial contact material.
Category 3 (high priority) addresses include the majority of CEs and is where the majority of resources will be invested to achieve a high level of response. These include student halls of residence, care homes, staff accommodation, hotels, hostels and religious establishments. For these CEs, a minimum of 75% of residents at each establishment must have returned a census questionnaire, with an overall response rate across all CEs in this category of 80%. Follow-up visits must be made continuously throughout the field operational period and should not cease before the end of the field operation period unless all responses from all residents have been received. However, as follow-up activities are contingent on being granted access to the establishment, in those cases where access is refused and the manager refuses to complete a CE1 form, these will be escalated for separate management and will not be included in the response rate calculations for this category. This will avoid resources being diverted from other areas to tackle what could seem like poor overall response in one area, but which is in fact caused by low or no response from a single CE for which under these circumstances additional resources would not be helpful and could be detrimental to other areas.
Category 4 is used to denote SPGs where a bespoke approach is required and no specific targets are set for either response rates or the number of follow-up visits. These include day and night centres for the homeless and rough sleepers and other transient groups such as continuous cruisers, fairs and circuses and transient Gypsy and Roma travellers.
Note that all response rate targets are for valid responses as opposed to any return from an address, which may include blank or incomplete questionnaires. However, for operational purposes, return rates will be used as a proxy for response rates to enable timely tracking of progress towards achieving those targets during the live operation.
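The Category 3 targets described above (at least 75% of residents responding at each establishment, 80% overall, with refused-access CEs escalated and excluded) can be sketched as a simple check. This is a minimal illustration under stated assumptions: the `Establishment` record and `category3_summary` function are hypothetical names, and the overall rate is assumed to be pooled responses divided by pooled usual residents, which the paper does not specify.

```python
from dataclasses import dataclass

CAT3_MIN_PER_CE = 0.75   # minimum response rate at each establishment
CAT3_OVERALL = 0.80      # overall response rate across Category 3


@dataclass
class Establishment:
    ce_id: str
    usual_residents: int          # count from the CE1 form or dummy form
    valid_responses: int          # valid individual responses received
    access_refused: bool = False  # access refused and CE1 form not completed


def category3_summary(ces):
    """Summarise progress against the Category 3 targets.

    CEs where access is refused are escalated separately, so they are
    excluded from the rate calculations here.
    """
    in_scope = [ce for ce in ces if not ce.access_refused and ce.usual_residents > 0]
    below_minimum = [
        ce.ce_id
        for ce in in_scope
        if ce.valid_responses / ce.usual_residents < CAT3_MIN_PER_CE
    ]
    total_residents = sum(ce.usual_residents for ce in in_scope)
    total_responses = sum(ce.valid_responses for ce in in_scope)
    # Assumption: overall rate is pooled across CEs, not an average of CE rates
    overall = total_responses / total_residents if total_residents else 0.0
    return {
        "overall_rate": overall,
        "overall_met": overall >= CAT3_OVERALL,
        "below_minimum": below_minimum,
    }


example = [
    Establishment("A", usual_residents=100, valid_responses=90),
    Establishment("B", usual_residents=40, valid_responses=28),   # 70%, below minimum
    Establishment("C", usual_residents=60, valid_responses=0, access_refused=True),
]
summary = category3_summary(example)  # overall rate is 118/140, about 0.84
```

In this example the overall target is met while establishment B still falls below the per-establishment minimum, which is the situation in which follow-up visits would continue.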
CE & SPG Return Profiles
Return profiles have been created to enable the tracking of progress towards targets and to enable identification of any problems early in the operation so that interventions can be put in place to mitigate any shortfalls in expected levels of response.
The proposed 2021 CE & SPG return profiles are presented in Appendix C. The profiles are built from the 2011 CE observed returns but have been created and adjusted to reflect the current CE operational design, in particular the switch to online first for some CE types.
Three profiles are given:
- Student return profile – representing university halls of residence. These CEs will receive invitation letters containing access codes and are expected to be majority online returns.
- Care home return profile – representing care home residents. Care homes will receive paper questionnaires and are not expected to make online returns.
- Overall profile for high priority CEs – incorporating returns from all CEs and SPGs in ‘Category 3’ (including halls of residence, care homes, hotels, hostels, staff accommodation, religious establishments, marinas and caravan parks)
It is important to understand that the profiles link to the agreed 2021 operational targets for CEs and SPGs, and these effectively present the predicted route towards achieving the minimum acceptable level of returns. However, the collection operation will strive to maximise response as far as possible and will not cease once the targets are met. Students in halls of residence and care home residents together made up approximately 77% of the CE population in 2011. The size of the population in these two groups and their divergent population characteristics make it necessary to differentiate the profiles for these groups. Response patterns for other individual CE types, however, are likely to be too volatile due to their smaller numbers and dependence on timing of interactions with CE officers to enable us to produce separate profiles with sufficient confidence to be of practical value.
Making use of the CE & SPG return profiles in the collection operation requires a strategy for monitoring progress against them. Key to this proposal is having a dedicated analytical function for CEs and SPGs, resourced to monitor and appraise the progress of CE and SPG collection during live operations. This will provide capability to evaluate the operation in the context of the CE operational and statistical design. Monitoring CEs & SPGs during live operations will be done through analysis of MI data by the CE analytical function within the Census Statistical Design team. This will comprise tasks to identify areas for concern that need to be flagged for further consideration:
- Ranking student CE returns by LA – to highlight LA differences in return rates for students in CEs, picking out areas where returns are lower than expected. Student CEs have a greater impact on the population base in some LAs.
- Ranking care homes by LA – to highlight LAs with lower return rates.
- All CE and SPG types individually nationally – to provide an overview of the effectiveness of the collection process, again to highlight any issues specific to a type of CE or SPG which may require a change of approach.
- Category 1 CEs and SPGs: to monitor the number of visits against expected, and highlight action where needed if visits are insufficient. Reporting return rates for information (no action expected).
- Category 2 CEs and SPGs: to monitor the number of visits, and action where needed. Report return rates (no action expected).
- Category 3 CEs and SPGs: to monitor return rates and action these further where needed.
- Category 4 SPGs: to report return rates, action only by exception.
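The ranking tasks in the list above could be carried out along the following lines. This is a sketch only: the record layout, the function name and the expected rate are all hypothetical, and in practice the expected value would come from the return profile for the relevant day of the operation.

```python
from collections import defaultdict


def rank_las_by_return_rate(records, expected_rate):
    """Rank local authorities from lowest to highest return rate.

    records: iterable of (la_name, forms_issued, forms_returned), one per CE.
    Returns a list of (la_name, return_rate, below_expected) tuples.
    """
    issued = defaultdict(int)
    returned = defaultdict(int)
    for la, n_issued, n_returned in records:
        issued[la] += n_issued
        returned[la] += n_returned

    rows = []
    for la in issued:
        rate = returned[la] / issued[la] if issued[la] else 0.0
        rows.append((la, rate, rate < expected_rate))

    # Lowest return rates first, so areas of concern appear at the top
    rows.sort(key=lambda row: row[1])
    return rows


# Hypothetical MI extract for student halls: (LA, forms issued, forms returned)
student_returns = [
    ("LA A", 200, 120),
    ("LA B", 150, 60),
    ("LA A", 100, 90),
    ("LA C", 80, 72),
]
ranked = rank_las_by_return_rate(student_returns, expected_rate=0.6)
```

With these illustrative figures, LA B (60 of 150 returned) is ranked first and is the only area flagged as below the expected rate.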
Monitoring is proposed to be done on a weekly MI cycle that allows for receipt of paper responses to be reflected. Where the monitoring activity shows that further action is necessary, an intervention may be made. Interventions available to the CE collection are:
- Increase field staff hours – for CE field officers, to tackle specific issues where CE returns don’t match against expected progress, by local area and CE type.
- Performance management – where outcomes are not as expected against CE returns by local area and/or CE type, and comparison with other CE officers.
- Additional staff from household – to boost field hours beyond that possible with original CE officers, in reaction to a specific challenge in a local area.
The statistical design for CEs in the collection operation, including the response rate targets and return profiles, and the operational design developed to meet those targets have been developed in line with the 2021 Census quality objectives to ensure that sufficient data of high enough quality are obtained during the collection operation to enable the production of high quality census estimates to meet user needs.
Small CEs (up to 49 bedspaces in 2021)
In 2011 these were processed region by region, but even so, we had to collapse categories (age, sex and often type of CE) to get enough sample for the dual-system estimation process to be robust enough. For 2021, we are investigating using the modelling approach being used for household estimation.
For information, Appendix D Table D2 contains the counts, estimates and adjustment of usual residents in small CEs, with response rates, by establishment type, extracted from ONS(2012).
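To show why sparse categories had to be collapsed for dual-system estimation, the sketch below uses the basic (Lincoln-Petersen) form of the estimator: census count multiplied by CCS count, divided by the number of records matched between the two. This is the textbook form only, not the production method used in 2011 or planned for 2021, and all figures and names are hypothetical.

```python
def dual_system_estimate(census_count, ccs_count, matched):
    """Basic dual-system estimate of the true population in a cell."""
    if matched <= 0:
        raise ValueError("no matched records: collapse categories before estimating")
    return census_count * ccs_count / matched


# A well-populated cell: 90 * 80 / 75 = 96.0
estimate = dual_system_estimate(census_count=90, ccs_count=80, matched=75)

# Two sparse cells, each given as (census count, CCS count, matched).
# The second has no matches, so it cannot be estimated on its own.
sparse_cells = {
    "males 20-24": (12, 10, 9),
    "females 20-24": (8, 7, 0),
}

# Collapsing the categories pools the counts before estimating
pooled = tuple(sum(values) for values in zip(*sparse_cells.values()))
collapsed_estimate = dual_system_estimate(*pooled)  # 20 * 17 / 9
```

The estimate becomes unstable or undefined as the matched count approaches zero, which is the practical reason for pooling across age, sex or CE type when the sample in a cell is small.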
Notes from Oct 19 panel session on CEs:
Small CE geographic distribution was discussed, leading to the panel recommending ONS consider how the non-uniform nature of this impacts outcomes when using the current CCS sampling methodology
This is one of the things the modelling should help with. We’re not doing a specific sample for CEs – it is too problematic as they tend to be geographically clustered
Large CEs (50+ bedspaces)
In 2011 the Large CE estimates were for 100+ bed space establishments. We have now lowered this threshold to 50+, because:
- The CCS is too difficult to administer (face to face interviews) for 50+ residents
- The number of CEs in the 50-99 bed space group, that may also be in the sample, is not large enough to warrant the additional field effort
- Applying the weights to all 50-99 bed space CEs from calculations mostly done on 1-49 bed space CEs isn’t as robust as for the smaller CEs
- The benefit gained from these additional CEs in the Estimation calculations is not enough to warrant the additional field effort, and
- We anticipate having more admin data on these establishments anyway, to help with the estimation of undercount.
The threshold itself is to a certain extent arbitrary, and bed spaces only predict number of residents, but Field needed a round number to work with in the operation.
Expected number of 50+ bed space CEs (See Table 2 below)
- In 2011, there were 2,157 100+ bed space CEs, 917 of which were flagged for checks, and we made a change for 728 of those
- There were an additional 2,300 50-99 bed space CEs, which would now be in scope of this assessment
- Currently estimating that approximately 4,600 care homes and halls of residence have 50+ bed space capacity (up from 3,000 in 2011) – these are the main groups of large CEs
- Estimates of the capacity of other types are still in progress
Table 2: Number of CEs by bed spaces, 2011 Census and estimated for 2021
|Establishment Type||100+ CEs added to in 2011||100+ assessed in 2011||100+ bed spaces 2011||50-99 bed spaces 2011||50+ bed spaces 2011||likely 50+ bed spaces 2021|
|Armed forces base accom.||38||84||201||59||260||Tbc|
|Other Ed. Establishments||-||-||-||36||39||Tbc|
Small cell sizes have been suppressed. For context, Appendix D Table D3 shows the counts, estimates and adjustments made in large CEs in 2011.
Prioritisation for investigating and adjusting
In 2011, we prioritised which CEs to investigate, and to add to, based on the percentage of returns received (if we received fewer than 75%), and the number received (if we were more than 50 short of those issued, and when we had an alternative figure, if that was 50+ higher than the number received). In most cases, we phoned up the establishments to get their estimate of residents, and in some we used the available administrative comparator data. More detail is available in ONS(2012).
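The 2011 prioritisation criteria described above can be written as a small rule. This is a sketch under an explicit assumption: the text does not say whether the criteria were combined with "and" or "or", and here a CE is flagged if any one of them is met. The function and argument names are hypothetical.

```python
from typing import Optional


def flag_for_investigation(issued: int, received: int,
                           alternative: Optional[int] = None) -> bool:
    """Apply the 2011-style thresholds to one large CE.

    issued:      number of individual forms issued to the CE
    received:    number of individual forms received back
    alternative: an alternative resident count (for example from an
                 administrative comparator), if one is available
    """
    # Fewer than 75% of issued forms received
    low_rate = issued > 0 and received / issued < 0.75
    # More than 50 short of the number issued
    large_deficit = issued - received > 50
    # Alternative figure is 50 or more above the number received
    alt_gap = alternative is not None and alternative - received >= 50
    # Assumption: any single criterion is enough to flag the CE
    return low_rate or large_deficit or alt_gap


flag_a = flag_for_investigation(issued=200, received=140)                   # 70% received
flag_b = flag_for_investigation(issued=400, received=340)                   # 60 short
flag_c = flag_for_investigation(issued=200, received=160, alternative=215)  # 55 gap
flag_d = flag_for_investigation(issued=200, received=160)                   # none met
```

Writing the thresholds as named conditions makes it straightforward to re-run the 2011 data with different cut-offs when the thresholds are re-evaluated.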
We are planning to re-evaluate the prioritisation thresholds used in 2011, namely the arbitrary cut-offs of 75% of expected returns and a deficit of 50. In the first instance this will be done by re-examining the 2011 data, but we will also make the assessment based on the live 2021 situation. In 2021 there will be more information more readily available about the collection operation from the electronic Response Management and Fieldwork Management Tool that should feed into this.
Appendix D Table D4 shows the breakdown of large CE resolution in 2011: although there was some indication of undercount in all of these, there wasn’t the evidence to back up an adjustment in all of them: 31% of those investigated had no adjustment applied. This will be investigated further, but it is likely to be due to an assessment of the definitional differences in the sources.
Use of alternative sources to estimate large CE population
For 2021, we are investigating how well administrative sources cover the establishments. If we can’t get appropriate sources our contingency plan is to ask CE managers for their count of residents, or to phone the establishments ourselves as we did in 2011. Table 3 lists the types of establishments and potential data sources we are pursuing. This work is still in progress.
Table 3: Type of establishment v admin source
|Type||Potential data sources||Comments / work in progress|
|Other medical establishments; Children’s care homes||NHS Personal Demographics Service (PDS); Care Quality Commission (CQC), Care Inspectorate Wales (CIW); Public Health England (PHE)||We know there are issues with patient data and care homes, from when people go into hospital and stay a long time, or die there – where does death registration note them as living, does it link up with PDS record if that’s still at care home? What if PDS record at family home rather than care home? CQC, CIW, PHE – don’t have age/sex, just bed spaces|
|School boarders||English and Welsh School Censuses (ESC, WSC), School level annual school census (SLASC), Stats Wales; Indep. School Census||What quality assessment has been done on these? Can we compare against PDS? Will boarders be registered on PDS at boarding location? Independent School Census will not release data at detailed (establishment) geography|
| ||HESES and patterns of term-time accommodation / halls bed spaces||HESA data too lagged, asking for special delivery for 2021, but this only likely to be useful if we’re very behind with processing. Uncertain of quality beyond 1st year undergraduates. Possibility of getting aggregate HESES data and applying distributions of where students are usually located when at term addresses – this still to be investigated.|
|Home armed forces||Ministry of Defence (MoD)||data by base – how to separate out those living elsewhere? Dependence on field info|
|Foreign armed forces||United States Armed Forces (USAF)||data by base – how to separate out those living elsewhere? Dependence on field info|
|Immigration removal centres||Ministry of Justice (MoJ) – data by prison, postcode, age/sex, length of sentence||MoJ data on prisoners going ahead ok, able to replicate census definitions ok. Does MoJ include probation bail and any other detention centres?|
|Caravans/Travellers||Gypsy and Traveller Caravan by Stats Wales, sources from Ministry of Housing, Communities and Local Government (MHCLG)||Investigating sources of data|
Use of donors in Coverage Adjustment
The Coverage Adjustment team have investigated the over-use of donors in 2011:
- Confirmation that there was over-use of donors in 2011, especially for large CEs (less of an issue for small CEs)
- This was usually caused by the constraints in the estimates – given the age/sex and type targets, there simply weren’t sufficient donors available
- For 2021, we are considering using nearest-neighbour groups:
  - a slightly different age group within the CE,
  - or a similar CE type geographically close,
  - or if it’s a mixed-sex CE, to use a female when a male is needed (and vice versa)
- Hoping to use combinatorial optimisation for CE imputation. When we’ve used CO for households, we’ve found that it doesn’t duplicate on donors as much as the 2011 method.
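The nearest-neighbour options listed above can be pictured as a tiered donor search that widens only when the stricter pool is empty. The sketch below is illustrative and makes several assumptions not stated in the paper: the order of the fallback tiers, the cap on how many times a donor may be used (`max_uses`), the preference for the least-used donor, and all class, function and field names. It is not the combinatorial optimisation approach, which is a different technique.

```python
from dataclasses import dataclass


@dataclass
class Donor:
    donor_id: str
    ce_id: str
    ce_type: str
    age_band: int   # ordinal index of the age group
    sex: str        # "M" or "F"
    uses: int = 0   # times this record has already been used as a donor


def find_donor(donors, ce_id, ce_type, age_band, sex,
               nearby_ce_ids=(), mixed_sex=False, max_uses=3):
    """Return (donor, tier) for the first tier with an available donor.

    Tier 0: same CE, same age band and sex
    Tier 1: same CE, adjacent age band, same sex
    Tier 2: nearby CE of the same type, same age band and sex
    Tier 3: same CE, same age band, other sex (mixed-sex CEs only)
    """
    available = [d for d in donors if d.uses < max_uses]
    tiers = [
        lambda d: d.ce_id == ce_id and d.age_band == age_band and d.sex == sex,
        lambda d: d.ce_id == ce_id and abs(d.age_band - age_band) == 1 and d.sex == sex,
        lambda d: (d.ce_id in nearby_ce_ids and d.ce_type == ce_type
                   and d.age_band == age_band and d.sex == sex),
        lambda d: (mixed_sex and d.ce_id == ce_id
                   and d.age_band == age_band and d.sex != sex),
    ]
    for tier, matches in enumerate(tiers):
        pool = [d for d in available if matches(d)]
        if pool:
            chosen = min(pool, key=lambda d: d.uses)  # prefer the least-used donor
            chosen.uses += 1
            return chosen, tier
    return None, None


donor_pool = [
    Donor("d1", "ce1", "hall", age_band=2, sex="F"),
    Donor("d2", "ce1", "hall", age_band=3, sex="M"),
    Donor("d3", "ce2", "hall", age_band=2, sex="M"),
]
# A male in age band 2 is needed in ce1, which has no exact match
donor, tier = find_donor(donor_pool, "ce1", "hall", age_band=2, sex="M",
                         nearby_ce_ids=("ce2",), mixed_sex=True, max_uses=1)
```

Recording the tier alongside each donor gives a direct measure of how often the search had to relax its constraints, which may be useful when assessing donor over-use.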
Coverage Adjustment will be coming back to MARP in September, so will be able to expand on this then.
2021 Census: Estimation and Adjustment for Communal Establishments (ONS, 2012)