Ethical considerations in the use of geospatial data for research and statistics

Published:
18 May 2021
Last updated:
6 September 2021

Ensuring inclusivity

The issue: Inclusivity

Depending on the data source used, geospatial data has the potential to exclude individuals or whole groups from analysis.

The clearest examples of this arise where a sample is drawn that is not representative of the whole population – or where the source of the data itself is inherently unrepresentative because of the collection method.

If mobile phone location data is used, for example, this is likely to exclude individuals who have reduced access to, or engagement with, mobile devices for financial or other reasons. Similarly, although online surveys offer huge advantages in terms of cost and the ability to code responses at source, enabling or encouraging respondents to complete surveys online can exclude those who cannot or do not wish to engage with technology.

These concerns may also be relevant when using datasets curated by third parties, where it may not be clear how data has been collected, processed or cleaned. It may not be obvious whether any data has been omitted, or where data may reflect access to, or use of, particular services by different groups.

Note that excluded and hard-to-reach populations are very often geographically concentrated, with potential negative consequences for spatial analysis. This applies, for example, to the elderly, the homeless and some ethnic populations. Analysis that might inform policies affecting such groups needs to include particular measures to ensure good levels of coverage as an aspect of inclusion.

On the face of it, new digital data gained from sources such as online completion, mobile phone GPS, cell masts, computer IP addresses, or vehicle telemetry offer great solutions for gathering information. These methods can produce huge amounts of data that are up-to-date, frequent, easy to quality assure and code, and inexpensive to collect. However, there is a real chance that they do not reflect the whole population.

In the census, great care is taken to ensure that coverage is nationally consistent and as far as possible includes all populations. Inconsistency would lead to biased outputs and an unfair allocation of resources.

In addition to online completion (perhaps with support from family or friends), opportunities are provided for completion via a phone contact centre, and support is provided via translation leaflets and language services, accessible videos, large print and braille materials, and help on the doorstep.

Those who might be digitally excluded are provided with support in completing the questionnaire and help at drop-in centres – or can complete a paper questionnaire as in the past. Communities where language or culture might be a barrier are supported via special events in the community.

Traveller populations are often poorly reflected in official statistics and special steps are taken to engage with these populations through local authority liaison.

All of this is further supported by additional high quality surveys (of around 1% of the national population) to assess coverage and quality of the census. The results of these surveys are used to adjust estimates to take account of any bias or under-coverage in the census.
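As an illustration only, the idea of using a coverage survey to adjust a count can be sketched with a dual-system (capture-recapture) estimate. This guidance does not state which estimation method is actually used, so the approach, the function name `dual_system_estimate` and all figures below are assumptions made for the example. The sketch also assumes that being counted in the census and being counted in the survey are independent.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Estimate the true population of an area from two independent counts.

    census_count  -- people counted by the census in the area
    survey_count  -- people counted by the coverage survey in the same area
    matched_count -- people found in BOTH the census and the survey
    """
    if matched_count <= 0:
        raise ValueError("matched_count must be positive")
    if matched_count > min(census_count, survey_count):
        raise ValueError("matched_count cannot exceed either count")
    # Lincoln-Petersen estimator: N = n1 * n2 / m
    return census_count * survey_count / matched_count


# Hypothetical figures for one small area (not real data).
census_count = 940
survey_count = 100
matched_count = 94

estimate = dual_system_estimate(census_count, survey_count, matched_count)
coverage_rate = census_count / estimate      # share of the estimated population the census reached
adjustment_factor = estimate / census_count  # scale-up applied to the census count

print(f"Estimated population: {estimate:.0f}")
print(f"Census coverage rate: {coverage_rate:.1%}")
print(f"Adjustment factor:    {adjustment_factor:.3f}")
```

In this made-up example the survey finds that 94 of the 100 people it counted were also in the census, which suggests the census reached about 94% of the area's population. If coverage differs between groups or areas, a single factor like this would hide the very bias you are trying to correct, so the calculation would need to be repeated for each group or area of interest.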

The census, discussed above, is so fundamental to local and national planning and resource allocation that it is an extreme case in terms of needing to be of high quality and inclusive. Nonetheless, whatever your area of study, you should always consider where any gaps might lie in your data, the implications of these gaps and what you can do to fill them.

If you do not have control over data collection, you should certainly take account of any bias, both in your analysis and in how it is reported.
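One common way to take account of known bias in analysis is to reweight the data so that each group counts in proportion to its share of the population (post-stratification). The following is a minimal sketch, not a recommended method for any particular dataset: the function name `post_stratification_weights`, the age bands and all figures are hypothetical, and it assumes you have trustworthy population shares from another source.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Return one weight per group: population share / sample share."""
    sample_total = sum(sample_counts.values())
    weights = {}
    for group, population_share in population_shares.items():
        n = sample_counts.get(group, 0)
        if n == 0:
            # Weighting cannot recover a group that is entirely missing.
            raise ValueError(f"no observations for group {group!r}")
        sample_share = n / sample_total
        weights[group] = population_share / sample_share
    return weights


# Hypothetical sample that under-represents older people (not real data).
sample_counts = {"16-34": 500, "35-64": 400, "65+": 100}
population_shares = {"16-34": 0.30, "35-64": 0.45, "65+": 0.25}

weights = post_stratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.3f}")

# After weighting, the groups sum back to the original sample size.
weighted_total = sum(sample_counts[g] * weights[g] for g in weights)
print(f"Weighted total: {weighted_total:.0f}")
```

Weights only rebalance the people you have observed. They assume that those observed in an under-represented group resemble those who were missed, which may not hold, and very large weights make estimates unstable. Both limitations should be reported alongside the results.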

Advice and possible mitigations:

  • Always consider whether your sources are likely to be fully representative. Is the sample truly representative of the population? Does the method of collection exclude any groups?
  • The potential for particular groups to be excluded from geospatial data should be documented and communicated in the early stages of a project, alongside the actions that have been both considered and taken to address this.
  • Engagement with organisations that represent particular populations or communities may help to ensure understanding of how best to include or engage with these groups.
  • Additional data sources may be required to minimise the exclusion of any groups. If these are not available, consideration should be given as to whether the original sources should be used at all.
  • If the data cannot be complete or representative, you MUST take account of this in your analysis and document it clearly.
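
The first check in the list above, whether a source is likely to be representative, can be sketched as a comparison between each area's share of the sample and its share of a reference population. This is a minimal sketch under stated assumptions: the function name `flag_under_represented`, the 0.8 threshold, the area names and all counts are hypothetical, and a suitable reference population count must be available for each area.

```python
def flag_under_represented(sample_counts, population_counts, threshold=0.8):
    """Return areas whose sample share is below `threshold` times their population share."""
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    if sample_total == 0 or population_total == 0:
        raise ValueError("sample and population totals must be positive")

    flagged = {}
    for area, population in population_counts.items():
        if population == 0:
            continue  # nothing to represent in this area
        sample_share = sample_counts.get(area, 0) / sample_total
        population_share = population / population_total
        ratio = sample_share / population_share
        if ratio < threshold:
            flagged[area] = round(ratio, 2)
    return flagged


# Hypothetical counts of observed devices and residents per area (not real data).
sample_counts = {"Area A": 600, "Area B": 350, "Area C": 50}
population_counts = {"Area A": 5000, "Area B": 3000, "Area C": 2000}

flagged = flag_under_represented(sample_counts, population_counts)
for area, ratio in flagged.items():
    print(f"{area}: represented at {ratio:.0%} of its population share")
```

A ratio well below 1 indicates that an area, and the groups concentrated within it, contributes far less to the data than its population would suggest. The threshold is a judgement call, and a check like this identifies where gaps lie rather than explaining why they occur, so any flagged areas should be documented and investigated.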