National Statistician’s Independent Review of the Measurement of Public Services Productivity

Published:
13 March 2025
Last updated:
14 March 2025

Chapter 3: General Methods Improvements

Some of the methods challenges raise wider strategic questions which apply across multiple services and require a consistent treatment. These are considered here and are also reflected in the following chapters specific to service areas.

The topics considered in this chapter are:

  • Alternatives to cost weighting.
  • Application of these principles.
  • The treatment of preventative services.
  • The treatment of latent capability.
  • The treatment of complex baskets of quality metrics.
  • Deflation.
  • The treatment of labour inputs undergoing induction training.
  • The treatment of purchased services.
  • The impact of devolution.

3.1 Alternatives to cost weighting

The absence of prices has long presented the main difficulty in valuing different non-market services, including for the purposes of weighting them into a single output measure. In line with System of National Accounts guidelines, weighting is conventionally done using cost weights. Cost weighting offers a feasible and internally consistent alternative in the absence of prices. In addition, cost weights set a consistent ‘floor’ for the value of public services as determined by the provider, if not the consumer, of these services: a provider would not provide a service at a higher cost than the value they assign to it.
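The mechanics of cost weighting can be sketched as follows. The services, costs and activity counts here are hypothetical, chosen only to show how base-period cost shares weight activity growth into a single output index:

```python
# Illustrative cost-weighted output index (Laspeyres-style).
# All services and figures are hypothetical, not taken from the Review.

def cost_weighted_index(base_costs, base_activity, current_activity):
    """Weight each service's activity growth by its share of base-period cost."""
    total_cost = sum(base_costs.values())
    index = 0.0
    for service, cost in base_costs.items():
        weight = cost / total_cost                              # cost share
        growth = current_activity[service] / base_activity[service]
        index += weight * growth
    return index

costs = {"hip operations": 400_000, "outpatient visits": 100_000}
base = {"hip operations": 100, "outpatient visits": 1_000}
curr = {"hip operations": 110, "outpatient visits": 1_050}
print(round(cost_weighted_index(costs, base, curr), 4))  # prints 1.09
```

Here the costlier service carries 80% of the weight, so the 10% growth in hip operations dominates the aggregate even though outpatient visits are far more numerous.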

However, as explored in depth in paragraphs 6.17 to 6.22 of the Atkinson Review, it is the feasibility of cost weights, rather than their representation of value, that results in their choice:

‘…we may not be able at present to apply estimates of (value) … So, for the present, the only feasible approach appears to be to continue to use cost weights.’ (para 6.19)

The disadvantage is that costs may be biased in terms of representing the value of the service to citizens. The Review considered this issue as a priority: in terms of the key challenges described in Chapter 2, biased weights are a significant driver of all of them save, arguably, the challenge of collective services.

Cost weights are not a strong metric, but in many cases there are no practical alternatives. The Review audited alternatives from the perspective that any proposed change had to deliver superior benefits, at least in terms of being more accurate and being able to interact consistently with cost weights. At a minimum, any alternative would still need to be expressed in money weights at market-equivalent prices.

Recommendation 2:

There are instances where cost provides a weaker weighting metric than alternative approaches. These alternatives can and should be applied, but only where a clear case can be made that the alternative metric better reflects the value of the service than the relevant cost data.

The principles guiding the application of this recommendation are:

  1. Introduction of alternative weighting methodologies should not permit the subjective or arbitrary selection of which method to apply.
  2. Services should be selected based on a clear assessment that cost weights are either inappropriate or inferior to an alternative method.
  3. Selection of an alternative method should be dependent on existing strong evidence on which to base any necessary assumptions or factors. Cost weighting has the advantage of consistency: if alternative methods are not more robust it does not make sense to remove that consistency without gain elsewhere.
  4. Methods should be piloted in experimental series for consultation with users before being introduced into core statistics.

3.2 Application of these principles

The two services where feasible alternatives to cost weights exist are Tax and Social Security Administration. Payments between government and citizens are viewed by the national accounts as ‘transfers’ rather than additional output. That is, they move money from one agent to another (from taxpayers to the government, or from the government to benefit recipients) without changing the overall value of goods and services produced. Transfers are, therefore, excluded by convention from measurement of output of private and public industries, otherwise Gross Domestic Product (GDP) could double simply by transferring all earnings to a different agent, even if all agents end up with the same funds.

The challenge, in productivity terms, is that ideally the minimal inputs would be used to deliver each pound of tax collected or each pound of benefits distributed. Weighting different case files by cost fails to take into account the amount of tax raised, or the amount of benefits disbursed, by different taxes or benefits.

The key example the Review examined was in the social security system: the replacement of six benefits with Universal Credit meant that multiple case files were replaced by a single claimant file. A new benefit which delivers an equivalent value of benefits to several older ones may have a lower value in a cost-weighted output measure than the benefits it replaces, if the cost of processing one file rather than six has fallen. If the Office for National Statistics (ONS) used the number of case files as the measure of output, and each cost the same, then with inputs held constant this would appear to deliver an approximately 83% reduction in productivity (one case file where there were six), with the apparent fall depending on how inputs actually varied. This would be counter-intuitive, given the purpose of rationalising the benefits system is to improve productivity by removing duplicative activities.
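The arithmetic behind this example can be sketched directly: with six case files replaced by one, all costing the same and inputs held constant, measured output falls by five-sixths.

```python
# Hypothetical arithmetic for the Universal Credit example: six legacy
# case files replaced by one, all at equal cost, inputs held constant.
files_before, files_after = 6, 1
apparent_fall = 1 - files_after / files_before   # fall in case-file-counted output
print(f"apparent productivity fall: {apparent_fall:.0%}")  # prints 83%
```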

For this reason, in 2018, the ONS deactivated the Social Security Administration output metric it used at that time and reverted to ‘inputs = outputs’ so as not to present a misleading picture of productivity. The coronavirus (COVID-19) pandemic then delayed plans to revisit this; the Review therefore treated this as a priority.

The Review identified that case-file measures of output inadequately reflect the fact that not all case files are equal in value terms, and thus some method of taking into account the amount of taxes collected or benefits disbursed, other than cost, needs to be considered to address this disparity.

In the case of Universal Credit the Review recommends the use of ‘benefit-weights’ rather than cost weights to reflect that the relative value of Universal Credit is higher if it consolidates various benefits into a single case file. In the case of Taxation, the Review recommends an intermediate ‘revenue-adjustment’ should be applied in lieu of a fully developed quality adjustment, which reflects that the tax-take should be bounded – the fiscal authority determines the amount of tax it wishes to be collected as a social optimum and Tax Administration should not aim to gather more tax than this, even though normally more output would be considered beneficial to society.

Whilst the Review continues to adhere to the principle of excluding transfers from the direct measure of output, given that the function of the output is to collect or disburse transfers, these data can be used as a weight to reflect the value of different case-file types in the measure of output, or within a quality adjustment. For further detail see Chapters 13 and 14.
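The benefit-weighting idea can be sketched as follows, with hypothetical benefit values rather than Review data: each case-file type is weighted by the value of benefits it disburses, so consolidating six files into one Universal Credit file does not register as a fall in output if the value disbursed is unchanged.

```python
# Sketch of 'benefit-weighting' for Social Security Administration.
# Caseloads and benefit values are hypothetical, for illustration only.

def benefit_weighted_output(caseloads, benefit_per_case):
    """Output as case files weighted by the benefit value each disburses."""
    return sum(n * benefit_per_case[t] for t, n in caseloads.items())

# One claimant: six legacy benefit files of £1,000 each...
before = benefit_weighted_output(
    {f"legacy_benefit_{i}": 1 for i in range(6)},
    {f"legacy_benefit_{i}": 1_000 for i in range(6)},
)
# ...replaced by a single Universal Credit file worth £6,000.
after = benefit_weighted_output({"universal_credit": 1},
                                {"universal_credit": 6_000})
print(before == after)  # prints True: same value disbursed, output unchanged
```

Under a pure case-file count the same transition would show output falling from 6 to 1.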

Recommendation 3:

The value of transfers can be used to weight together individual components within a service or in a quality adjustment but should not be a direct measure of output.

Using benefit-weights for those parts of Social Security Administration affected by the transition to Universal Credit does not mean every aspect of this service receives this treatment, so cost weights need to be retained. Chapter 14 provides more detail.

Recommendation 4:

Where different weights are used within a service, the cost weighted activity index will still be used to weight in any elements which cannot be addressed with the new weighting approaches. When service level statistics are aggregated, a cost weighted activity index should be used to produce the national aggregates.

In some cases, specific adjustments should be applied to cost weighting to avoid treating similar outputs delivered via different channels at different costs in a way that makes a shift from higher-cost to lower-cost delivery appear as lower output. For example, the Healthcare output measure sets the weights equal where the same hospital procedure is carried out with or without an overnight stay, when patient characteristics and needs are constant.

Recommendation 5:

Adjustments should be considered to equivalise cost weights between services of equivalent value but different cost.

It is important to note this is a different issue from the substitution of lower-cost services for higher-cost services where this does not represent a reduction in inputs. The classic example is the replacement of branded drugs and pharmaceuticals with ‘generic’ versions, which are generally lower cost but identical in impact. This question is addressed in detail in the Healthcare chapter (Chapter 7).

3.3 The treatment of preventative services

Preventative services are those that, by definition, efficiently reduce demand for future, higher-cost services, and hence overall cost. The measurement challenge is that preventative services are only commissioned if the cost of delivery is lower than the cost avoided, so their cost understates their value. This is made more complex if citizens receive additional benefits from avoiding the high-cost service. There are numerous examples:

  • Healthcare interventions which reduce demand for higher-cost interventions that citizens may find invasive or painful, or that have a wider impact on their life. They may also improve patients’ quality of life before and after the point at which they would otherwise have received treatment. Avoiding treatment may also deliver wider benefits, such as not losing time at work, or avoiding pain arising from treatment or subsequent complications.
  • Children’s Social Care services, such as family support and similar policies, which reduce demand for high-cost services such as placements in residential children’s homes.
  • Enforcement activities around Tax Administration, which aim to deter non-payment and therefore reduce the cost of future enforcement activity by incentivising payment. Chapter 13 discusses this within the issue of fraud and error.
  • Defence services, which act as a deterrent reducing the risk of future military action.

The Review commissioned a report from Professor Martin Weale, King’s College London, which used diabetes programmes as an exemplar in this area. Prof. Weale identified two problems: how to value services which are already in operation, and how to value new preventative activities (akin to the classic ‘new good’ pricing challenge in national accounting).

Professor Weale argues that the volume effects of introducing preventative services should be derived from the implicit price consequences. For new services, a ‘reservation price’ is derived for the period before introduction: the price at which demand falls to zero because the costs of the service exceed its benefits. In deriving such a reservation price, the impact on final outcomes, such as quality-adjusted life years (QALYs), would be taken into account.

In the first instance, the Review focussed its attention on existing services, and methods to implement Prof. Weale’s proposals, which imply the identification of an imputed valuation for the preventative service which better represents its true value. Once a decision is reached on which services are in scope, the Review considers that an ‘imputed avoided cost’ should be used as a substitute for the actual cost where:

Imputed avoided cost of preventative service A =

(probability of a reduction in use of service B) × (actual cost of service B)
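The formula can be expressed as a function. The diabetes figures below are hypothetical, used only to show the calculation:

```python
# The Review's imputed avoided cost formula as a function.
def imputed_avoided_cost(prob_reduction_in_b, cost_of_b):
    """Imputed value of preventative service A: probability that use of
    downstream service B is reduced, multiplied by B's actual cost."""
    return prob_reduction_in_b * cost_of_b

# e.g. a diabetes-prevention programme assumed to avert a course of
# treatment costing £20,000 with probability 0.15 (hypothetical figures)
print(imputed_avoided_cost(0.15, 20_000))  # prints 3000.0
```

This imputed value then replaces the actual delivery cost of the preventative service within the cost-weighting framework.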

Recommendation 6:

Further consideration should be given to which services are treated as preventative services. An imputed valuation for preventative service A should then be used, where this is the product of the probability of a reduction in use of service B and the actual discounted cost of service B.

Note that this is not the only feasible method. Prof. Weale argued that the aggregate of avoided costs and increases in the quality of life, as measured in QALYs, could be used as the measure of value. The Review acknowledges this proposal but recognises that health improvements of this nature have generally not been used intensively in the quality adjustment of healthcare services. This is because not all health outcomes are a result of public health interventions: for example, people’s health improves if they give up smoking, reduce their alcohol consumption, take up exercise, or eat more healthily.

Weale, however, acknowledges that while this proposed approach provides a conceptual basis for accounting for preventative programmes, there are practical challenges in its implementation: in particular, the need to define a total amount for the gain in QALYs attributable to Healthcare Services overall. For example, the approach in Cutler et al (2022) would require estimates of the expected QALYs of the population.

Atkinson’s key principle of attribution is also relevant: the output of public services cannot be increased for value not created by them. Prof. Weale (2024) makes the point that if properly established evaluation data are used which have addressed causality, the issue of attribution is no longer a binding constraint.

The Review agreed, but concluded further research is needed on how many such studies exist and are applicable before proceeding. If the ONS applied such an adjustment for diabetes, but not other health issues, would that incentivise investment into diabetes which may not be reflective of true relative value? Methods should be tailored to each service so they are considered fairly, even if that means applying different methods to different questions to arrive at a suitable and consistent level of accuracy.

The Review therefore proposes the methods applied in the short-term focus on existing cost data for avoided activities in the first instance. This should start by agreeing a defined list of preventative services so other items cannot be treated in this way inappropriately. They would require a number of data items:

  • Each preventative service needs twinning to a particular service which is being prevented (or a basket of such services). This should be defined before calculation to prevent any appearance of arbitrary decisions.
  • The cost of the prevented service needs to be available, including relevant changes through time and deflators as appropriate. As these data should already be available within the system, this should be the easiest data to access.
  • The availability of robust probability data around the causal relationship between the delivery of the preventative service and the service avoided. The new Cabinet Office Evaluation Registry, which is accessible to civil servants, provides a single venue for all government evaluations which can form the basis for this dataset. The key issue will be whether these probabilities are constant through time. Most UK National Accounts ‘fixed factors’ of this type would be subject to five yearly revision, so this will need to be carefully considered, including deflation.

Importantly, this methodology does not deviate from the principle that the cost of services is relevant and comparable. By utilising cost data for avoided services, the principle of cost weighting is adapted rather than wholly replaced: the preventative services, however priced, are still cost weighted. As potentially one of the most substantial areas for methods revision, this continuity is valuable, and is intuitively in line with the way public servants think about this type of activity.

Recommendation 7:

For pre-selected preventative services where high quality data on the impact on downstream services can be found, the probability-weighted cost of these downstream services can be used as a proxy valuation of the preventative services in the cost weighting methodology.

3.4 The treatment of latent capability

The section on preventative services relates most to the delivery of public services, but there is a distinct sub-category which relates to capital investment, and hence a type of input, rather than output. This is when additional capital investment is undertaken not for immediate use but to cope with periods of peak demand in the future. This capital expenditure therefore counts toward the inputs denominator of the productivity equation, but without impacting on the outputs numerator, hence depressing productivity.

The treatment of such ‘latent capability’ is particularly relevant given the coronavirus pandemic in 2020. The construction of ultimately unused ‘Nightingale hospitals’ demonstrated that latent capability could bias productivity calculations. If such institutions had been maintained through 2021 and required for use in 2022, how would the ONS reflect this expenditure in 2020 and 2021? The normal treatment of capital as an input is to take account of the consumption of capital (the rate at which it is worn out), but unused facilities neither depreciate at the normal rate nor contribute to the productivity of the Health Services until they come into active use.

Whilst this example did not come to pass, more routine examples exist in Defence, where stores of ammunition, weapons and other materiel are held as contingency against future demand. Whilst defence output measurement is difficult because of its collective nature (see Chapter 9), this is compounded by latent capability, on top of defence being a preventative service. The Review noted the argument that to activate latent capital, staffing would often be necessary.

In the case of latent capability, extending the logic of the proposed methodology for preventative services would suggest the ONS needs to consider whether capacity which is not intended for final use in the delivery of the service at that time should be regarded as a separate final output as the investment is made, on an ‘inputs = outputs’ basis. This would smooth output through demand peaks, and appears similar to the approach taken by many comparable countries to events such as the coronavirus pandemic (see ONS 2022). It is therefore a compromise that attempts to reflect the value of the asset as created and ‘used’ as a latent capability.

The other alternative would be to consider the latent capability as an insurance service, where again the ONS would need to define what the resultant output was. The pros and cons of all options would require further consideration.

The principles the Review has applied for preventative services appear to also apply here. The proposed model requires, in particular, robust data on the probability of peak demand events occurring to justify moving away from the traditional approach. The number and value of instances of latent capability are few, and relatively small outside of defence, so this is not a priority, but provides a sensible model for consideration as public service measures become more sophisticated.

Recommendation 8:

For latent capability, further research is required to identify instances where this method could be piloted using high quality data.

3.5 The treatment of complex baskets of quality metrics

One of the most common challenges identified by the Review was that some services deliver more than one type of outcome. The classic example is Policing, as discussed in Chapter 2, where crime prevention, crime solution, crowd management, finding missing persons, and fighting organised crime, alongside counter-terrorism work, are outcomes which are balanced in terms of delivery. Another example is schools, which primarily deliver educational qualifications but also develop citizens as functioning members of society.

Weighting together such baskets of outcomes into a single quality adjustment necessitates identifying a mechanism to expose the relative value that people as a collective place on different outcomes or outputs in an unbiased fashion. The Review considered that an objective rather than subjective method to aggregate different outcomes is preferred.

There are two ways to consider this: identify a metric that reveals people’s views through their actions (revealed preference), or one in which views are directly reported through some elicitation methodology (stated preference), and which can act as a weight within the calculations. The Review recognised the five possible mechanisms identified in Heys (2024) across these two approaches:

  • The use of prices as a weight, noting that for public services the challenge is that prices are not available.
  • The use of time as a weight, on the basis that people allocate their time as though each hour has equal value, so an activity to which people devote on average two hours a week is valued twice as much as one to which they devote on average one hour a week.
  • The use of voting or surveys to directly report preferences.
  • The use of legislative and regulatory decisions as a proxy for social preferences.
  • The use of a value of a wellbeing year (WELLBY) (as per Layard and De Neve (2023) and HM Treasury’s wellbeing supplementary guidance to the Green Book (2022) using the change in WELLBYs multiplied by the value as the weights).

When the ONS developed new metrics for Public Order and Safety in 2018, the relative weight of topics within the inspectorate regime was used to allocate weights between the different components of quality (reducing re-offending, safety and decency in prison, keeping the public safe etc). The logic was that the inspectorate regime set out by Ministers in Parliament served as the best approximation of social preferences across this set of outcomes that the ONS could feasibly access, as Ministers are elected by society.

This demonstrates that feasible weighting approaches can be identified and implemented from existing legislative and regulatory structures. Another example is the legal ‘purposes of sentencing’, which give potential weights for the courts service within the criminal justice system. Services will face different challenges in this regard, but the principle is clear.
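The inspectorate-weight approach can be sketched as a weighted average of quality-component growth. Component names follow the Public Order and Safety example in the text; the weights and growth rates are hypothetical:

```python
# Sketch of weighting a basket of quality metrics into one adjustment.
# The weights stand in for whichever preference basis is chosen
# (inspectorate weights, time use, WELLBYs, ...); figures are hypothetical.

def quality_adjustment(component_growth, weights):
    """Weighted average of quality-component growth rates."""
    total = sum(weights.values())
    return sum(weights[c] * g for c, g in component_growth.items()) / total

growth = {"reducing_reoffending": 1.02, "safety_and_decency": 0.99,
          "keeping_public_safe": 1.01}
w = {"reducing_reoffending": 0.5, "safety_and_decency": 0.3,
     "keeping_public_safe": 0.2}
print(round(quality_adjustment(growth, w), 4))  # prints 1.009
```

The point of Recommendation 9 is that the weights `w` should come from an objective reflection of societal preferences, not from analyst judgement.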

Recommendation 9:

The weights used to bring together quality adjustment components need to reflect societal preferences as closely and as objectively as possible.

This recommendation applies to the weighting together of quality adjustments, but there could also be relevance to the weighting together of inputs or outputs.

Recommendation 10:

Further research should be undertaken to consider the potential to use alternative weighting regimes proposed for quality adjustment in replacing cost weights, as per recommendations 3 and 9.

3.6 Deflation

Inputs are a challenging measurement area, particularly when the costs of different inputs change over time, in part because the quality of those inputs is changing. Deflators are used to measure the price change of like-for-like products over time, controlling for quality change in inputs. This is difficult when the quality of an input is changing, as volume change may be misinterpreted as price change, which directly affects productivity calculations.
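A minimal sketch of the mechanics, with hypothetical figures: nominal input expenditure is converted to a volume measure by dividing through by a like-for-like price deflator, so that only the real change in inputs feeds the productivity calculation.

```python
# Deflating nominal input spend to a volume measure (hypothetical figures).
nominal_spend = {2023: 100.0, 2024: 110.0}   # £m, current prices
deflator = {2023: 1.00, 2024: 1.05}          # like-for-like price index, 2023 = 1

real_spend = {y: nominal_spend[y] / deflator[y] for y in nominal_spend}
volume_growth = real_spend[2024] / real_spend[2023] - 1
print(f"{volume_growth:.2%}")  # prints 4.76%: the rest of the 10% rise is price
```

If the deflator failed to account for quality improvement in the input, part of that 4.76% volume growth would wrongly be recorded as price change.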

The derivation of appropriate, accurate deflators is an important part of the compilation process across national accounts. In line with recent developments the Review has focussed on where more granular data allows deflation to occur at lower levels of disaggregation. This is particularly important for the improvements proposed in Defence and Policing. More detail is provided in the relevant chapters.

Again in common with the wider national accounts, the degree to which deflators for technology products keep pace with rapid moving technology change is important for public services, and will increasingly be so as technology such as artificial intelligence and cloud computing are adopted. This is a wider area of methods and data improvement which the ONS is undertaking alongside the Review.

Recommendation 11:

The ONS should exploit methods developments around technology and other deflators to improve the measurement of volume input and output in public services, and continue to seek out methods improvements.

How to deflate output volumes is also an important issue, going back to at least Fisher and Shell (1972). Whilst there is a case for using the final consumption deflator, the GDP deflator is generally applied where necessary and more specific deflators are not available.

The Review also considered the valuation of quality adjustments. This has been debated since the Atkinson Review: does the volume of health services output increase if the value of a QALY goes up? The Review has considered a price change of this nature to be a price effect which increases the value of output but not the volume, in the same way that in the private sector the volume of gas consumed does not go up if the price changes; the Review is, however, open to academic feedback on this point.

3.7 The treatment of labour inputs undergoing induction training

An emerging issue across a number of services is where new entrants are brought in as trainees, with salaries which reflect long-term potential, but may over-estimate the value of their contribution whilst they are still undergoing their induction training. Where the volume of trainees is relatively constant this should not affect growth rates observed in the measure of total inputs, but in a period where a substantial additional recruitment regime for trainees is implemented, this may over-estimate the value of the inputs being used in the service, and hence under-estimate productivity.

How to resolve this is dependent on a number of factors: should an adjustment be made to scale-back the inputs delivered by trainees relative to more experienced staff? If so, how big should this be; for how long should it apply; and should the adjustment decrease over time reflecting skills acquisition? Alternatively, an approach similar to how capital inputs are treated might make sense: the labour inputs of new entrants who are undergoing intensive training programmes could be accrued over later accounting periods.

The Atkinson Review (2005) recommends weighting labour by skill level, hence the Review looked to use salary bands in the compilation of labour inputs. Where sliding pay scales are available this weighting will give a reflection of staff experience. The detail and the availability of these pay scales dictates whether these are sufficient to account for the level of staff experience, including new entrants undergoing a training phase.

This approach allows the most consistency across services. However, where these data are not sufficient, or there are more extreme changes to recruitment or training practices, further adjustment could be considered. It will be more difficult to capture the quality effects on inputs of training for fully qualified staff. Here there are examples of dedicated time given by employers for training, and of on-the-job training conducted by staff during working hours that may not result in a change in pay. Where this remains constant between periods there will be minimal impact on growth rates, but the ONS should consider the impacts of larger changes in training practices. Atkinson noted that a better way to measure staff skill levels would be to incorporate an estimate of human capital into the input measure; however, there is insufficient information to do this at present.
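Salary-band weighting of labour inputs can be sketched as follows. The bands, salaries and staff counts are hypothetical; the example shows how a surge in trainee recruitment raises measured inputs by less than a raw headcount would, because trainees carry a lower salary weight:

```python
# Sketch of labour input weighted by salary band, following the Atkinson
# principle of weighting labour by skill level. All figures hypothetical.

def labour_input_index(base_fte, curr_fte, band_salary):
    """Growth in salary-weighted full-time-equivalent staff."""
    base = sum(base_fte[b] * band_salary[b] for b in band_salary)
    curr = sum(curr_fte[b] * band_salary[b] for b in band_salary)
    return curr / base

salary = {"trainee": 25_000, "qualified": 40_000, "senior": 55_000}
base = {"trainee": 10, "qualified": 60, "senior": 30}
curr = {"trainee": 30, "qualified": 60, "senior": 30}  # recruitment drive
print(round(labour_input_index(base, curr, salary), 4))  # prints 1.1163
```

A raw headcount would show inputs rising 20% (100 to 120 FTE); the salary-weighted measure rises about 11.6%, partially reflecting that trainees contribute less than fully qualified staff during induction.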

The important principles to be applied are:

  • If such an adjustment is made to one service for staff induction, it should be applied similarly across all areas.
  • Treatment needs to be symmetric – if an intervention is made when the share of trainees is above average and likely to bias productivity down, it should intervene in the opposite direction if the share of trainees is below average.
  • Any such adjustment would need to be evidenced-based in terms of determining its scale.

Recommendation 12:

Further research should be undertaken to consider the potential for, and the value or necessity of, applying an adjustment to account for labour inputs undergoing training.

3.8 The treatment of purchased services

The Review refers to ‘public service’ rather than ‘public sector’ because the UK uses a mixed-market in delivery. Public services can be delivered by private as well as third sector and public entities. The public service productivity data takes account of this by recognising private and third sector provision as inputs and often utilising ‘inputs = outputs’ to derive their output.

This may pose similar problems to those identified elsewhere: changes in delivery design may result in improvements in allocative efficiency, whereas the measures only capture technical efficiency, that is, changes in the delivery of an existing service. It would be anticipated that part of the reason for using an alternative provider is to secure productivity gains, or the potential for future productivity gains.

Given the interest in the relative productivity performance of the public and private sectors, how to reflect this so that the productivity of public services is correctly estimated is clearly an important question. Given the productivity of private sector delivery of services can be derived from other ONS statistics, there is merit in further research to apply appropriate Gross Value Added growth factors to publicly procured private sector delivery for inclusion in estimates of public service productivity. This would not, however, feed into public sector output in the national accounts, as that would double-count the private sector’s contribution to GDP.

Recommendation 13:

The ONS should consider how best to reflect private sector productivity growth within the measurement of public service productivity, where this captures private sector delivery of services.

3.9 The impact of devolution

The published ONS public service productivity statistics are UK-wide. Across the public sector some aspects are reserved (decisions are taken by the UK Parliament) whilst others are devolved (decisions are taken by the devolved governments) and in some cases services are partially devolved. In devolved services differences in policy and subsequent legislation can lead to differences in data availability and its definition.

In compiling the UK-wide estimate of public service productivity, the ONS currently estimates some components for the devolved nations. The Review has worked with the devolved governments to map data availability for Wales, Scotland and Northern Ireland, as well as England. This has enabled identification of data gaps and investigation of potential data sources which would lead to improved coherence across the UK measure.

Methodological and data improvements for each service are outlined in Chapters 7 to 16. Information is also provided on whether the improvements are being made or proposed across the UK, highlighting which sectors are devolved or reserved. Moving forward, the ONS will continue to build relationships with the devolved governments to share updates and understand differences across countries.

Recommendation 14:

The ONS should continue to work with the devolved governments to understand the devolved service-delivery landscape and improve data coverage, quality and consistency in the UK measure of public service productivity.

The Review looked at the feasibility of creating metrics at a devolved level, using education as a test case. This work is still in very early development, and to move forward further investigation is needed on whether data are suitably comparable, the nature of the user need for devolved public service productivity statistics, and handling of potential misinterpretations.

Regardless of whether the ONS publishes the estimates at a devolved level, it is a useful exercise to complete to better understand the data and differences across the four countries of the UK.

Recommendation 15:

The ONS should further investigate the feasibility and user need for devolved metrics on public services productivity, particularly the education sector, working with the devolved governments.