Ethical considerations relating to the creation and use of synthetic data

19 October 2022
Last updated: 21 October 2022

Ethics Checklist

Work through the questions in each of the sections below. They will help you think through some of the key ethical considerations in the creation and use of synthetic data for research and statistics.

1. Think

  • Think about whether you should undertake this piece of work at all. Just because you can does not mean that you should!
  • Consider why you are using or creating this data. What is its use case, and how will it provide benefit?
  • Is it necessary to use or create this synthetic data? If so, why?

Try to think about what might go wrong and take steps to avoid it, using the advice provided in our guidance and seeking further support if necessary. In particular, consider the UK Statistics Authority’s general ethical principles below:

2. Public Good

  • Have the benefits of using synthetic data for a project been clearly documented? For further guidance on communicating public good, see our Public Good guidance.

3. Methods and Quality

  • Have the limitations of the data, methods and technologies been considered? How will you document these? The methods and quality section of the UK Statistics Authority’s ethics self-assessment tool guidance, as well as recent journal articles, may be useful.
  • Have you considered the potential for bias in your data, or even in your choice of study? Are you sure that you are not mirroring or reinforcing an unfair bias? Revisit the bias section for help.
  • Has the potential for individuals or groups to be excluded from datasets been considered? If the data cannot be complete or representative, you must take account of this in your analysis and document it clearly when reporting the results.

4. Transparency

  • Might the data or your results be reused in the future, outside their original context and purpose, to the disadvantage of individuals or communities? What can you do to protect against this possibility?
  • Have you clearly documented and communicated the aims, benefits, and limitations of the research to relevant stakeholders?
  • If you have created a synthetic dataset, have you clearly communicated its intended purpose(s) and limitations?

5. Public Views and Engagement

  • Have potential public and stakeholder views regarding particular uses of synthetic data across different contexts been considered? Staying up to date with current research and initiatives on public views regarding geospatial data use may be helpful.

6. Confidentiality and Data Security

  • Have appropriate mechanisms to maintain confidentiality of datasets been considered? How will the security of the data be maintained? See the advice provided in the confidentiality and disclosure risk section of our guidance and the ethics self-assessment tool guidance for help here.
