Confidentiality and data security
It is important that researchers and statisticians consider and understand how they will maintain the confidentiality and security of data that they are accessing and using. Data may relate to protected characteristics or could be considered sensitive in some way. The sensitivity of data is likely to depend not only on the type of data that is captured, but also the wider social, historical and cultural context surrounding the project.
The potential impact of disclosing any data that is collected or accessed should be considered at both the individual and group level, to ensure that the information is adequately protected. This is a particularly important concern when collecting or analysing data from minority or under-represented groups, as they may be more easily identifiable from the data. The risk of identification may increase when administrative data sources are linked, which may put individuals from minority groups at increased risk of harm. Existing guidance on data confidentiality and appropriate statistical disclosure control may be useful here.
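As a rough illustration of why linkage can raise identification risk, the sketch below counts how many records are unique on a set of quasi-identifying fields, before and after an extra characteristic is added from a linked source. The records, field names and helper functions are hypothetical and made up for this example; this is a minimal sketch, not a substitute for a full disclosure risk assessment.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def unique_records(records, quasi_identifiers):
    """Number of records whose combination of values appears only once."""
    sizes = group_sizes(records, quasi_identifiers)
    return sum(1 for n in sizes.values() if n == 1)

# Toy, made-up records (assumption: not drawn from any real dataset).
records = [
    {"age_band": "30-39", "region": "North", "ethnic_group": "A"},
    {"age_band": "30-39", "region": "North", "ethnic_group": "A"},
    {"age_band": "30-39", "region": "North", "ethnic_group": "B"},
    {"age_band": "40-49", "region": "South", "ethnic_group": "A"},
    {"age_band": "40-49", "region": "South", "ethnic_group": "A"},
]

# Before linkage: only age band and region are available.
before = unique_records(records, ["age_band", "region"])

# After linkage: an additional characteristic is attached to each record.
after = unique_records(records, ["age_band", "region", "ethnic_group"])

print(before, after)
```

In this toy data no record is unique on age band and region alone, but one record becomes unique once the additional characteristic is attached, and that record belongs to the smaller group.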
If researchers are transparent in their approaches to data security and confidentiality, this is likely to have positive implications for the quality and validity of the data collected. For instance, research has shown that communicating these processes to research participants decreases the likelihood that they will modify their responses because of concerns that they may be identifiable or that the information may be passed on to third parties. The importance of confidentiality and data security in maintaining trust is also set out in the Code of Practice for Statistics.
Aspects to consider for confidentiality, data security, and inclusivity
- The privacy and confidentiality of data subjects should be maintained.
- Where possible, the amount of sensitive or re-identifiable information collected and stored should be minimised, with data reviewed at the earliest opportunity. It may be beneficial to use secure research access schemes to protect sensitive data and ensure that trust is maintained in these processes.
- Data security and confidentiality mitigations should be documented for transparency and communicated clearly to data subjects so that they can have confidence in these processes.
- When using administrative data, being transparent about the steps taken to ensure confidentiality can increase trust in the use of public data. These steps may include:
  - Access restrictions, such as safeguarded licensing agreements and approved researcher access in safe settings (e.g., the Secure Research Service).
  - Measures to prevent identification in outputs (e.g., statistical disclosure control and publishing aggregate results).
  - Reviewing the risk of identification throughout the research process, as the linkage of administrative data sources can increase the risk of re-identification.
- Researchers and statisticians should consider the provenance of the data that they are using, including the stated purpose and method by which the data has been collected and stored, and any limitations or potential harms that may arise from this for different groups. In particular:
  - Were people asked to self-identify their characteristics and in what way? Or did someone assign this information to the individual (possibly erroneously)? This can be a particular issue in administrative data sources where data has been collected by a professional rather than the data subject themselves. It is important that measures are taken to fully understand the collection methods of secondary data so inclusivity issues can be identified and mitigated appropriately.
  - How long ago was the data collected and has it been updated since? It is important to recognise that people can change the way they identify themselves throughout their lives.
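One of the output measures listed above, statistical disclosure control applied to aggregate results, can be sketched as simple small-count suppression. The threshold of 10, the function name and the response data below are assumptions chosen for illustration only; the appropriate rule depends on the data source and the guidance that applies to it.

```python
from collections import Counter

# Hypothetical threshold: counts below this value are withheld.
SUPPRESSION_THRESHOLD = 10

def suppressed_counts(values, threshold=SUPPRESSION_THRESHOLD):
    """Aggregate values into counts, replacing small counts with None."""
    counts = Counter(values)
    return {
        category: (n if n >= threshold else None)
        for category, n in counts.items()
    }

# Made-up responses for illustration (assumption: not real data).
responses = ["Group A"] * 25 + ["Group B"] * 12 + ["Group C"] * 3

table = suppressed_counts(responses)
for category, n in table.items():
    print(category, "suppressed" if n is None else n)
```

Suppressing small cells on its own may not be enough: if a total is published alongside the table, a suppressed count can sometimes be recovered by subtraction, so further protection may be needed.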