Creative use of existing clinical and health outcomes data to assess NHS performance in England: Part 2—more challenging aspects of monitori
1 National Centre for Health Outcomes Development, London School of Hygiene and Tropical Medicine, London WC1E 6AZ, 2 CASPE Research, London W1G 0AN, 3 Northgate Information Solutions, Hemel Hempstead HP2 7HU, 4 Department of Public Health and Policy, London School of Hygiene and Tropical Medicine
In the second of their two articles about using existing routine data to assess performance in the NHS, the authors make practical suggestions about using data for mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes, and raise issues about assumptions and technical aspects for discussion
Introduction
There have been recent calls for better data on NHS outputs and outcomes in England.1-4 However, this will require new data collection that could take several years. In the meantime, creative and informed use of existing data, with clear admission of the known shortcomings, may give some indication of how outcomes are changing.5 The main challenges are to measure health validly and to judge how much any improvements are due to NHS interventions.
In this, the second of our two articles, we explore some of the technical issues involved and make practical illustrative suggestions about how best to use existing data regarding mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes.6 Many other quality indicators could be produced along similar lines.7
Suggestions for indicators illustrating a range of methodological issues
Patterns of mental health care
Monitoring the quality of mental health care is problematic. Case fatality rates are low, so indicators based on numbers of deaths are inadequate. Much of mental health care occurs outside hospital without direct data on activity or outcomes. Also, there are few explicit standards. The challenge is to find a way of using data from hospitals to make some inferences about both hospital and community care and to identify aspects of care that are materially below optimum.
The national service framework for mental health highlights the preference for community over hospital care.8 Because this policy is expected to produce better outcomes, the way the mental health service is delivered may be used as a proxy for quality of care. Mentally ill patients vary in numbers of readmissions to hospital and cumulative lengths of stay, and these may reflect variations in the quality and availability of care and support in the community. Too many readmissions and long cumulative lengths of stay may reflect inadequate community care, but too few readmissions and short cumulative lengths of stay may reflect inadequate provision of necessary hospital care.
A combination of the total number of admissions and the total time spent in hospital by patients during a year could be used to assess this. There may well be a trade-off between time spent in hospital and frequency of admission, and it is therefore important to consider the balance between the two variables as well as the individual measures. Studies of observed variation between populations could be used to derive target ranges for acceptable patterns of care.
To test this approach, we followed individual patients aged 17-64 admitted to hospital in April of each year in mental health specialties throughout the financial year, using continuous inpatient spells and the linkage methods described in our first article.6 We calculated the total length of inpatient stay per person during the year and the total number of admissions per person during the year. Readmissions could be to any NHS hospital in England and for any condition, such as injury, not just mental illness. We attributed the values to the primary care organisation that covered the patient's place of residence at the time of first admission.
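The aggregation described above can be illustrated with a minimal sketch. It assumes the spells have already been linked and restricted to the April cohort; the record layout (the keys patient_id, pco, admit_day and los_days) and the function names are hypothetical and are not taken from hospital episode statistics.

```python
from collections import defaultdict

def per_person_totals(spells):
    """Aggregate continuous inpatient spells into per-patient totals.

    Each spell is a dict with hypothetical keys: 'patient_id',
    'pco' (primary care organisation of residence at that admission),
    'admit_day' (day of the financial year) and 'los_days'.
    """
    totals = {}
    for spell in sorted(spells, key=lambda s: s["admit_day"]):
        rec = totals.setdefault(
            spell["patient_id"],
            # The organisation is fixed at the first admission of the year.
            {"pco": spell["pco"], "admissions": 0, "stay_days": 0},
        )
        rec["admissions"] += 1
        rec["stay_days"] += spell["los_days"]
    return totals

def pco_means(totals):
    """Mean admissions and mean total stay per person, by organisation."""
    sums = defaultdict(lambda: [0, 0, 0])  # admissions, stay days, patients
    for rec in totals.values():
        acc = sums[rec["pco"]]
        acc[0] += rec["admissions"]
        acc[1] += rec["stay_days"]
        acc[2] += 1
    return {pco: (a / n, d / n) for pco, (a, d, n) in sums.items()}

# Illustrative records only, not real data.
spells = [
    {"patient_id": "p1", "pco": "A", "admit_day": 3, "los_days": 20},
    {"patient_id": "p1", "pco": "B", "admit_day": 120, "los_days": 10},
    {"patient_id": "p2", "pco": "A", "admit_day": 10, "los_days": 40},
]
totals = per_person_totals(spells)
means = pco_means(totals)
```

In this sketch patient p1 is counted under organisation A for the whole year even though the second spell is recorded against B, mirroring the attribution to place of residence at first admission.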
We found substantial variation between primary care organisations for each of the two variables. To give each component equal weight, we transformed them into z scores (by measuring distance from their reference points and dividing by the standard deviations of the individual distributions). Conventionally, the mean of a distribution is the reference point for z scores. However, the mean is not necessarily a suitable target for a performance score because it is partly a reflection of "poor performance" at the tail end of the distribution.9 We used modified z scores in which the reference points were 1.65 admissions per person and 45.0 days total stay in hospital during the year. We chose these points as a realistic joint target because they had been achieved by a strategic health authority in 2000-1 (the year selected as standard). This "best achieved combination" of a low rate of admissions per person and a low total stay per person, giving each the same level of importance, equates to the lowest composite z score at the level of strategic health authority (see Year 3 in table 1), although at the level of primary care organisation, some trusts achieved even better z scores. We used these reference points to calculate modified z scores across all five years.
The z scores measure how far each primary care organisation is from the defined optimum for total admissions and for total stay. We then produced a composite score for each organisation by adding its two z scores, equating to a summary of that organisation's mix of experience on two fronts. A high composite score would be considered undesirable as it would indicate more hospitalisation than expected. Weighting the z scores before adding them would be a refinement if there was particular concern about the relative importance of the two variables.
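As an illustration of the modified z scores and the composite, the following sketch uses the two reference points quoted in the text (1.65 admissions and 45.0 days). The choice of the population standard deviation, the function names, and the example values are assumptions made for illustration; the text does not specify which estimate of the standard deviation was used.

```python
from statistics import pstdev

REF_ADMISSIONS = 1.65  # admissions per person (reference point from the text)
REF_STAY_DAYS = 45.0   # total days in hospital per person (from the text)

def modified_z(values, reference):
    """Distance from a chosen reference point, in standard deviation units."""
    sd = pstdev(values)  # assumption: population SD of the organisation values
    return [(v - reference) / sd for v in values]

def composite_scores(admissions, stay_days, w_adm=1.0, w_stay=1.0):
    """Sum of the two modified z scores; weights default to equal importance."""
    z_adm = modified_z(admissions, REF_ADMISSIONS)
    z_stay = modified_z(stay_days, REF_STAY_DAYS)
    return [w_adm * a + w_stay * s for a, s in zip(z_adm, z_stay)]

# Illustrative values for three organisations, not real data.
admissions = [1.65, 1.85, 2.05]
stay_days = [45.0, 55.0, 65.0]
scores = composite_scores(admissions, stay_days)
```

An organisation sitting exactly on both reference points scores zero, and scores rise as either admissions or total stay move above the reference. Passing unequal values for w_adm and w_stay gives the weighted refinement mentioned above.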
A possible cause of variation between primary care organisations might be differences in the prevalence of illness and in the level of support provided in the community, both by professional bodies and informal networks such as families. A full assessment of the effect of such factors is beyond the scope of this article, but we examined whether variation might be reduced when looking at similar geographical areas. For example, the populations of big cities can be more transient, have a greater incidence of some mental illnesses, and have fewer informal networks. The Office for National Statistics has used cluster analyses to create an area classification for grouping primary care organisations that are most similar in terms of 42 demographic, socioeconomic, housing, and other Census 2001 variables.10 Figure 2 shows the composite modified z score for each primary care organisation grouped within its area group. Although it shows some differences between groups (a slowly increasing z score across the chart), there is much greater residual variation within each group, suggesting that influences other than demography and socioeconomic conditions are largely responsible for the variation, such as service availability and clinical practice.
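One simple way to compare variation between area groups with residual variation within them is a sum of squares decomposition. The sketch below is an assumption about how such a comparison could be made, not the analysis behind the figure; the group labels and scores are invented for illustration.

```python
from statistics import fmean

def variance_decomposition(groups):
    """Split the total sum of squares into between-group and within-group parts.

    `groups` maps an area-group label to the list of composite scores of
    the organisations in that group (hypothetical input layout).
    """
    all_scores = [s for scores in groups.values() for s in scores]
    grand = fmean(all_scores)
    between = sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())
    within = sum((s - fmean(v)) ** 2 for v in groups.values() for s in v)
    return between, within

# Illustrative scores only, not real data.
groups = {"cities": [1.0, 3.0, 5.0], "rural": [0.0, 2.0, 4.0]}
between, within = variance_decomposition(groups)
share_within = within / (between + within)
```

A within-group share close to 1, as in this invented example, would correspond to the pattern described above: small differences between group means and much larger spread inside each group.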
A measure of wider population mortality attributable to health care
Attempts to assess the contribution of health services to the entire population (not just those using health services) have relied on population based indicators of potentially avoidable mortality. Causes of death are included if there is evidence that they are amenable to healthcare interventions and—given timely, appropriate, and high quality care—death rates should be low among the age groups specified.11 Healthcare intervention includes preventing disease onset as well as treating existing disease.
Two such indicators based on potentially avoidable mortality are published annually for the NHS in the Compendium of Clinical and Health Indicators.7 Nolte and McKee reviewed the use of this concept and proposed an updated list of conditions and age bands for international comparisons, based on more recent evidence of amenability to healthcare interventions.12 We have used their list but have added asthma at ages 0-44 years, which they excluded because of lack of comparability in international studies.
In England 138 346 such deaths occurred in people aged less than 75 during 2001 and 2002, of which 48% were from ischaemic heart disease, 16% from cerebrovascular disease, 9% from colorectal cancer, 9% from female breast cancer, and 6% from pneumonia.
Forecasting future outcomes attributable to current investments
The observation that today's survival and death rates are at least partly a reflection of the quality of earlier health care applies particularly to primary and secondary prevention of conditions such as heart disease, stroke, diabetes, some cancers, and diseases of childhood. The converse of this is that many of the benefits to health from improved care today will not be seen for many years. One of the implications of this is that a comprehensive assessment of the quality of a healthcare system should include formal forecasts of the longer term effects of recent changes in provision and activity.
Several mathematical models have been and are being developed for doing this. For example, long term relative survival can be predicted for patients with recently diagnosed cancer.13 Another example is a microsimulation model that provides estimates of the annual benefits and costs over the middle and longer term (up to 20 years) of different patterns of healthcare provision and use for coronary heart disease.14 In terms of primary prevention, the model allows exploration of the population effect of improvements in the control of blood pressure and cholesterol and of changes in rates of cigarette smoking. In terms of treatment, it can explore the effects of changing ambulance response times, thrombolysis, and revascularisation rates. The model can produce, for example, estimates of the likely impact of meeting national service framework activity targets for coronary heart disease. Tables 2 and 3 show some illustrative examples of this, extracted from a report on this developmental work to the Department of Health,14 to demonstrate how simulation could be used to inform policy, subject to various assumptions and constraints.14 Currently, the model is being extended to other clinical conditions such as stroke and diabetes. Such models could be used to show the likely impact of new investments in prevention, incremental shifts from treatment to prevention, or alternative mixes of interventions.
Models of this kind are inevitably very demanding of data and assumptions, and there may be a trade-off between rigour and transparency. Their requirements include estimates of baseline levels of risk factors, disease prevalence, and healthcare use; estimates of trends over the forecasting period in exogenous factors (those not determined within the model); and, for cost effectiveness analyses, estimates of how treatment costs vary as levels of activity change. As well as modelling relationships between risk factors and outcomes, they have to be able to deal with combinations of changes in risk factors (such as reducing blood pressure and cholesterol concentrations) and interactions between risk factors (such as the effect of stopping smoking on blood pressure). Management of heart disease is one of the best researched aspects of health care, and, as well as the scientific literature, this model is based on new analyses of data from the health survey for England, the Framingham cohort study, and the British heart survey.14 However, different studies define variables in different ways, and substantial gaps in the literature remain, such as the effects of stopping treatment. Also, such models need maintenance, with new research findings needing to be incorporated regularly.
For discussion and debate
Our two articles are confined to health outcome measures and their proxies. There are, however, many other types of outputs that could be included in assessments of productivity. The following issues and assumptions require further discussion.
Selection of indicators and targets
Attribution of changes in health status to healthcare activity would normally require experimentation such as randomised controlled trials. Since this is not feasible as part of routine delivery of health services, judgment must be used, based on three criteria:
Research evidence or consensus (expressed in policies) suggest that health services (including public health, health partnerships, health advocacy) can have a significant influence on the outcome being measured
Variation between organisations in current performance suggests scope for improvement, with the best showing what is realistically achievable given optimum circumstances9
Variation between organisations in changes in performance over time suggests scope for improvement, with the greatest improvement showing what is realistically achievable given optimum circumstances.
The first criterion is essential for selection of indicators. The other two may not be, as the services may already be performing at an optimal level. Even if all three criteria are met, the outcomes may still reflect interventions not attributable to health services.
Aspects such as quality of life may be of more concern to patients than clinical measures, and therefore more appropriate as measures of outcome, albeit with greater problems of attribution. Absence of routine data on health related quality of life is a serious gap in our knowledge.
For annual cross sectional monitoring, it is important to select indicators that reflect short term impact unless long term impact is clear or can be forecast. Indicators such as incidence of stroke and deaths may reflect the cumulative effect of several natural events and interventions or resource use in the past. Some of these, such as prevalence of obesity and high blood pressure, may also act as proxies for future adverse outcomes and may therefore have a dual role in annual cross sectional monitoring.
Where there is clear evidence of the relation between intervention and health, such evidence may be used to create explicit standards for performance audit, and measures of the level of intervention may be used as proxies for future outputs or health outcomes.
Methodology
Ideally, numerators and denominators should match—for example, case fatality rates for stroke should be based on all deaths among all patients with stroke, including those not admitted to hospital and who may have either mild disease with lower case fatality or severe disease and death before admission. This is not always possible, and the limitations of what is feasible must be acknowledged. Some indicators measure what happens to known patients, with a risk that those needing care but not receiving it, possibly with poorer outcomes, are excluded.
Any measure of geographical variation or time trends needs to ensure comparability of numerator and denominator data. This may require adjustment of indicators for differences in age, sex, case mix (mix or severity of conditions), etc. A major constraint with existing routinely collected national data is the lack of grouping systems for case mix that are based on prognosis. Grouping systems, such as healthcare resource groups, were designed to create subgroups for comparison that are homogeneous with respect to resource use but not necessarily outcomes. Standardisation also raises questions about what adjustment is legitimate. People in deprived populations might have relatively poor outcomes because of relatively intractable health problems or because of substandard care, or both. Standardisation is undiscriminating and would "protect" the providers against both kinds of effects. Likewise, where there is sex variation, there is a choice between using sex standardised person rates or sex specific rates.
When standardising rates for age (and other variables) the choice of method (direct or indirect) and of the standard population used may affect the results, particularly when comparing sub-national rates. We tested this for hospital case fatality, calculating trends at the England level using both direct and indirect methods and using various years as the standard, and found little difference (table 2, part 16). However, this should be monitored in any new approach to measuring performance. For the correct analysis of trends, data for all years should be adjusted with the same standard and time period.
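For readers less familiar with the two methods, the following sketch shows the textbook forms of direct and indirect age standardisation. The function names and the two-band example are illustrative assumptions and are not the calculations used for the published indicators.

```python
def direct_standardised_rate(local_events, local_pop, standard_pop):
    """Apply local age-specific rates to a standard population."""
    expected = sum(
        (e / n) * s for e, n, s in zip(local_events, local_pop, standard_pop)
    )
    return expected / sum(standard_pop)

def indirect_standardised_ratio(local_events, local_pop, standard_rates):
    """Observed events divided by the events expected at standard rates."""
    expected = sum(r * n for r, n in zip(standard_rates, local_pop))
    return sum(local_events) / expected

# Illustrative numbers only: two age bands, younger then older.
local_events = [10, 40]          # for example, deaths
local_pop = [1000, 1000]         # for example, admissions at risk
standard_pop = [3000, 1000]      # age structure of the chosen standard
standard_rates = [0.005, 0.05]   # age-specific rates in the chosen standard

dsr = direct_standardised_rate(local_events, local_pop, standard_pop)
smr = indirect_standardised_ratio(local_events, local_pop, standard_rates)
```

Changing standard_pop or standard_rates changes the result, which is why the same standard has to be applied to every year in a trend.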
The stability of the indicator needs to be taken into account. For example, data on strategic health authorities are less prone to yearly fluctuations in rankings than data on primary care organisations because of their larger populations.
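The effect of population size on stability can be illustrated with the standard error of a proportion. This is a minimal sketch under a simple binomial assumption that is not stated in the text; the rate and the two population sizes are hypothetical.

```python
from math import sqrt

def rate_standard_error(rate, population):
    """Binomial standard error of an observed proportion."""
    return sqrt(rate * (1 - rate) / population)

# Hypothetical sizes: a small organisation and one 100 times larger.
se_small = rate_standard_error(0.02, 2_500)
se_large = rate_standard_error(0.02, 250_000)
```

Under this assumption the standard error falls with the square root of the population, so a hundredfold increase in size reduces it tenfold, and rankings based on the larger units are correspondingly less volatile from year to year.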
Interpretation of data
Variation in data quality (in levels of missing records and missing or invalid codes) could influence trends, particularly if there were biases in such records compared with the rest. In the extra technical material on bmj.com, table 1.1 shows that the levels of incompleteness for indicators based on hospital episode statistics are too small to affect England indicator values and do not vary much between years. Within each year, however, completeness varies by strategic health authority, requiring caution in interpreting comparative strategic health authority data. The accuracy of seemingly valid diagnostic codes has been a source of concern.15 There are now local routine audits of the quality of clinical coding (personal communication, NHS Information Authority) but no national reporting system, which remains a serious gap.
National aggregate values may mask variation in component parts that could be important for productivity assessment. Table 2.2 in the extra technical material on bmj.com shows that there are age and sex specific variations in hospital case fatality and varying time trends. For example, there is convergence between sexes in the 0-5 year old age group over the five years but persistent sex differences in the 60-64 year age group. There are falling trends in deaths in the 75-79 year group but not in the 45-49 year group.
The potential for competition for resources between types of care and conditions needs to be acknowledged at national and local level, because the "best" achieved in one locality for one indicator may have been at the expense of poorer performance in other aspects of health care, reflecting local priorities.
Most service based indicators are incomplete, as data from the independent healthcare sector are missing.
Geographical monitoring is useful, as data can then be interpreted in the context of the strategic roles of strategic health authorities, the commissioning roles of primary care organisations, and local demographic and socioeconomic conditions. Local conditions may explain (although should not justify) poorer outcomes if there are known effective interventions. We found variations (some statistically significant) both between and within the area groups created by the Office for National Statistics for grouping healthcare organisations that are most similar in terms of a range of demographic and socioeconomic conditions. Significant variation within these groups probably reflects influences other than demography and socioeconomic conditions, such as quality of health care.
Summary points
More rigorous analysis of existing routine clinical data would allow assessment of NHS performance across a wide range of services
Examples of such performance indicators include mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes
Various assumptions and technical issues need discussion and debate—that is, the selection of indicators and targets, methods, interpretation of data, and application in productivity measurement
Application in productivity assessment
Practical ways need to be found to incorporate multiple indicators in productivity assessment: they may overlap or interact; some may be more important or relevant than others and may need to be weighted; some may reflect mismatching performance for a given time (see the above discussion on stroke). Techniques for dealing with these issues, such as weighted scores, profiles, etc, are beyond the scope of our two articles but need consideration.
High levels of activity in treatment, rehabilitation, and long term care may show desirable high productivity and improvement, but would be considered an undesirable or negative output of preventive activity for a preventable condition such as stroke.
A cross sectional approach does not take account of sequentially linked events over time, such as patients with myocardial infarction having further infarcts in due course.
Reality is even more complex than the approach taken here, and this should be acknowledged explicitly in any output to avoid sweeping simplistic generalisations during interpretation.
Conclusions
We have shown the feasibility of a variety of ways of measuring health related outputs and outcomes. Data from initiatives such as the new mental health minimum dataset and the new general practice contract should lead to better measurement. Any assessment of productivity requires careful matching of outcomes to the inputs used to achieve them, and this brings in a separate set of issues and assumptions that are beyond the scope of our articles.
Extra technical details of the methods described appear on bmj.com
AL's contributions to the study were made within his role at the Oxford branch of the National Centre for Health Outcomes Development, based at Oxford University, Headington, Oxford.
Contributors: AL conceived of the study, drafted the article, and produced the hospital episode statistics based indicators. JC helped draft the article and produced the mental health indicators. DE helped draft the article and produced the population mortality indicators. CSp analysed hospital episode statistics data. CSa helped draft the article and produced the forecasting models. Lee Mellers helped analyse the hospital episode statistics data. Bernard Rachet contributed information on the forecasting models (cancer survival). David Rudrum provided editorial support. AL is guarantor for the study.
Competing interests: All authors are involved in the work of the National Centre for Health Outcomes Development, either directly or via subcontracts. The centre is funded by the Department of Health and commissioned by it and the Healthcare Commission to develop and produce clinical and health indicators for them and the NHS. The views expressed here are those of the authors and not necessarily of the commissioners.
Ethical approval: Not needed.
References
Smith R. Is the NHS getting better or worse? [editorial]. BMJ 2003;327: 1239-41.
Department of Health. Chief executive's report to the NHS. London: DoH, 2004.
Atkinson A. Atkinson review: interim report-measurement of government output and productivity for the national accounts. London: Stationery Office, 2004.
Rudd A, Goldacre M, Amess M, Fletcher J, Wilkinson E, Mason A, et al, eds. Health outcome indicators: stroke. Report of a working group to the Department of Health. Oxford: National Centre for Health Outcomes Development, 1999.
Lakhani A. Assessment of clinical and health outcomes within the National Health Service in England. In: Leadbeter D, ed. Harnessing official statistics. Abingdon: Radcliffe Medical Press, 2000.
Lakhani A, Coles J, Eayres D, Spence C, Rachet B. Creative use of existing clinical and health outcomes data to assess NHS performance in England: Part 1—performance indicators closely linked to clinical care. BMJ 2005;330: 1426-31.
Lakhani A, Olearnik H, Eayres D, eds. Compendium of clinical and health indicators. London: Department of Health, National Centre For Health Outcomes Development, 2003.
Department of Health. National service framework for mental health: modern standards and service models. London: DoH, 1999.
Keppel KG, Pearcy JN, Klein RJ. Measuring progress in healthy people 2010. Statistical notes No 25. Hyattsville, MD: National Center for Health Statistics, 2004.
Office for National Statistics. National statistics 2001 area classification for health areas. www.statistics.gov.uk/about/methodology_by_theme/area_classification/ha/default.asp (accessed 1 Mar 2005).
Charlton JRH, Bauer R, Lakhani A. Outcome measures for district and regional health care planners. Commun Med 1984;6: 306-15.
Nolte E, McKee M. Does healthcare save lives? Avoidable mortality revisited. London: Nuffield Trust, 2004.
Brenner H, Gefeller O. An alternative approach to monitoring cancer patient survival. Cancer 1996;78: 2004-10.
Davies R, Normand C, Raftery J, Roderick P, Sanderson C. Policy analysis for coronary heart disease: a simulation model of interventions, costs and outcomes. Report to the Department of Health, July 2003 (revised April 2004). London: London School of Hygiene and Tropical Medicine, 2004.
Dixon J, Sanderson C, Elliott P, Walls P, Jones J, Petticrew M. Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals. J Public Health Med 1998;20: 63-9.
(Azim Lakhani, James Coles)
In the second of their two articles about using existing routine data to assess performance in the NHS, the authors make practical suggestions about using data for mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes, and raise issues about assumptions and technical aspects for discussion
, http://www.100md.com
Introduction
There have been recent calls for better data on NHS outputs and outcomes in England.1-4 However, this will require new data collection that could take several years. In the meantime, creative and informed use of existing data, with clear admission of the known shortcomings, may give some indication of how outcomes are changing.5 The main challenges are to measure health validly and to judge how much any improvements are due to NHS interventions.
, 百拇医药
In this, the second of our two articles, we explore some of the technical issues involved and make practical illustrative suggestions about how best to use existing data regarding mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes.6 Many other quality indicators could be produced along similar lines.7
Suggestions for indicators illustrating a range of methodological issues
Patterns of mental health care
, 百拇医药
Monitoring the quality of mental health care is problematic. Case fatality rates are low, so indicators based on numbers of deaths are inadequate. Much of mental health care occurs outside hospital without direct data on activity or outcomes. Also, there are few explicit standards. The challenge is to find a way of using data from hospitals to make some inferences about both hospital and community care and to identify aspects of care that are materially below optimum.
, 百拇医药
The national service framework for mental health highlights the preference for community over hospital care.8 With this policy being expected to produce better outcomes, assessing the way the service for mental health is delivered may be used as a proxy for quality of care. Mentally ill patients vary in numbers of readmissions to hospital and cumulative lengths of stay, and these may reflect variations in the quality and availability of care and support in the community. Too many readmissions and long cumulative lengths of stay may reflect inadequate community care, but too few readmissions and short cumulative lengths of stay may reflect inadequate provision of necessary hospital care.
, http://www.100md.com
A combination of the total number of admissions and the total time spent in hospital by patients during a year could be used to assess this. There may well be a trade-off between time spent in hospital and frequency of admission, and it is therefore important to consider the balance between the two variables as well as the individual measures. Studies of observed variation between populations could be used to derive target ranges for acceptable patterns of care.
, http://www.100md.com To test this approach, we followed individual patients aged 17-64 admitted to hospital in April of each year in mental health specialties throughout the financial year, using continuous inpatient spells and the linkage methods described in our first article.6 We calculated the total length of inpatient stay per person during the year and the total number of admissions per person during the year. Readmissions could be to any NHS hospital in England and for any condition, such as injury, not just mental illness. We attributed the values to the primary care organisation that covered the patient's place of residence at the time of first admission.
, 百拇医药
We found substantial variation between primary care organisations for each of the two variables. To give each component equal weight, we transformed them into z scores (by measuring distance from their reference points and dividing by the standard deviations of the individual distributions). Conventionally, the mean of a distribution is the reference point for z scores. However, the mean is not necessarily a suitable target for a performance score because it is partly a reflection of "poor performance" at the tail end of the distribution.9 We used modified z scores in which the reference points were 1.65 admissions per person and 45.0 days total stay in hospital during the year. We chose these points as a realistic joint target because they had been achieved by a strategic health authority in 2000-1 (the year selected as standard). This "best achieved combination" of a low rate of admissions per person and a low total stay per person, giving each the same level of importance, equates to the lowest composite z score at the level of strategic health authority (see Year 3 in table 1), although at the level of primary care organisation, some trusts achieved even better z scores. We used these reference points to calculate modified z scores across all five years.
, http://www.100md.com
The z scores measure how far each primary care organisation is from the defined optimum for total admissions and for total stay. We then produced a composite score for each organisation by adding its two z scores, equating to a summary of that organisation's mix of experience on two fronts. A high composite score would be considered undesirable as it would indicate more hospitalisation than expected. Weighting the z scores before adding them would be a refinement if there was particular concern about the relative importance of the two variables.
, http://www.100md.com
A possible cause of variation between primary care organisations might be differences in the prevalence of illness and in the level of support provided in the community, both by professional bodies and informal networks such as families. A full assessment of the effect of such factors is beyond the scope of this article, but we examined whether variation might be reduced when looking at similar geographical areas. For example, the populations of big cities can be more transient, have a greater incidence of some mental illnesses, and have fewer informal networks. The Office for National Statistics has used cluster analyses to create an area classification for grouping primary care organisations that are most similar in terms of 42 demographic, socioeconomic, housing, and other Census 2001 variables.10 Figure 2 shows the composite modified z score for each primary care organisation grouped within its area group. Although it shows some differences between groups (a slowly increasing z score across the chart), there is much greater residual variation within each group, suggesting that influences other than demography and socioeconomic conditions are largely responsible for the variation, such as service availability and clinical practice.
A measure of wider population mortality attributable to health care
Attempts to assess the contribution of health services to the entire population (not just those using health services) have relied on population based indicators of potentially avoidable mortality. Causes of death are included if there is evidence that they are amenable to healthcare interventions and if, given timely, appropriate, and high quality care, death rates should be low among the age groups specified.11 Healthcare intervention includes preventing disease onset as well as treating existing disease.
Two such indicators based on potentially avoidable mortality are published annually for the NHS in the Compendium of Clinical and Health Indicators.7 Nolte and McKee reviewed the use of this concept and proposed an updated list of conditions and age bands for international comparisons, based on more recent evidence of amenability to healthcare interventions.12 We have used their list but have added asthma at ages 0-44 years, which they excluded because of lack of comparability in international studies.
In England 138 346 such deaths occurred in people aged less than 75 during 2001 and 2002, of which 48% were from ischaemic heart disease, 16% from cerebrovascular disease, 9% from colorectal cancer, 9% from female breast cancer, and 6% from pneumonia.
Forecasting future outcomes attributable to current investments
The observation that today's survival and death rates are at least partly a reflection of the quality of earlier health care applies particularly to primary and secondary prevention of conditions such as heart disease, stroke, diabetes, some cancers, and diseases of childhood. The converse of this is that many of the benefits to health from improved care today will not be seen for many years. One of the implications of this is that a comprehensive assessment of the quality of a healthcare system should include formal forecasts of the longer term effects of recent changes in provision and activity.
Several mathematical models have been and are being developed for doing this. For example, long term relative survival can be predicted for patients with recently diagnosed cancer.13 Another example is a microsimulation model that provides estimates of the annual benefits and costs over the middle and longer term (up to 20 years) of different patterns of healthcare provision and use for coronary heart disease.14 In terms of primary prevention, the model allows exploration of the population effect of improvements in the control of blood pressure and cholesterol and of changes in rates of cigarette smoking. In terms of treatment, it can explore the effects of changing ambulance response times, thrombolysis, and revascularisation rates. The model can produce, for example, estimates of the likely impact of meeting national service framework activity targets for coronary heart disease. Tables 2 and 3 show some illustrative examples of this, extracted from a report on this developmental work to the Department of Health,14 to demonstrate how simulation could be used to inform policy, subject to various assumptions and constraints.14 Currently, the model is being extended to other clinical conditions such as stroke and diabetes. Such models could be used to show the likely impact of new investments in prevention, incremental shifts from treatment to prevention, or alternative mixes of interventions.
Models of this kind are inevitably very demanding of data and assumptions, and there may be a trade-off between rigour and transparency. Their requirements include estimates of baseline levels of risk factors, disease prevalence, and healthcare use; estimates of trends over the forecasting period in exogenous factors (those not determined within the model); and, for cost effectiveness analyses, estimates of how treatment costs vary as levels of activity change. As well as modelling relationships between risk factors and outcomes, they have to be able to deal with combinations of changes in risk factors (such as reducing blood pressure and cholesterol concentrations) and interactions between risk factors (such as the effect of stopping smoking on blood pressure). Management of heart disease is one of the best researched aspects of health care, and, as well as the scientific literature, this model is based on new analyses of data from the health survey for England, the Framingham cohort study, and the British heart survey.14 However, different studies define variables in different ways, and substantial gaps in the literature remain, such as the effects of stopping treatment. Also, such models need maintenance, with new research findings needing to be incorporated regularly.
For discussion and debate
Our two articles are confined to health outcome measures and their proxies. There are, however, many other types of outputs that could be included in assessments of productivity. The following issues and assumptions require further discussion.
Selection of indicators and targets
Attribution of changes in health status to healthcare activity would normally require experimentation such as randomised controlled trials. Since this is not feasible as part of routine delivery of health services, judgment must be used, based on three criteria:
Research evidence or consensus (expressed in policies) suggests that health services (including public health, health partnerships, health advocacy) can have a significant influence on the outcome being measured
Variation between organisations in current performance suggests scope for improvement, with the best showing what is realistically achievable given optimum circumstances9
Variation between organisations in changes in performance over time suggests scope for improvement, with the greatest improvement showing what is realistically achievable given optimum circumstances.
The first criterion is essential for selection of indicators. The other two may not be, as the services may already be performing at an optimal level. Even if all three criteria are met, the outcomes may still reflect interventions not attributable to health services.
Aspects such as quality of life may be of more concern to patients than clinical measures, and therefore more appropriate as measures of outcome, albeit with greater problems of attribution. Absence of routine data on health related quality of life is a serious gap in our knowledge.
For annual cross sectional monitoring, it is important to select indicators that reflect short term impact unless long term impact is clear or can be forecast. Indicators such as incidence of stroke and deaths may reflect the cumulative effect of several natural events and interventions or resource use in the past. Some of these, such as prevalence of obesity and high blood pressure, may also act as proxies for future adverse outcomes and may therefore have a dual role in annual cross sectional monitoring.
Where there is clear evidence of the relation between intervention and health, such evidence may be used to create explicit standards for performance audit, and measures of the level of intervention may be used as proxies for future outputs or health outcomes.
Methodology
Ideally, numerators and denominators should match—for example, case fatality rates for stroke should be based on all deaths among all patients with stroke, including those not admitted to hospital and who may have either mild disease with lower case fatality or severe disease and death before admission. This is not always possible, and the limitations of what is feasible must be acknowledged. Some indicators measure what happens to known patients, with a risk that those needing care but not receiving it, possibly with poorer outcomes, are excluded.
Any measure of geographical variation or time trends needs to ensure comparability of numerator and denominator data. This may require adjustment of indicators for differences in age, sex, case mix (mix or severity of conditions), etc. A major constraint with existing routinely collected national data is the lack of grouping systems for case mix that are based on prognosis. Grouping systems, such as healthcare resource groups, were designed to create subgroups for comparison that are homogeneous with respect to resource use but not necessarily outcomes. Standardisation also raises questions about what adjustment is legitimate. People in deprived populations might have relatively poor outcomes because of relatively intractable health problems or because of substandard care, or both. Standardisation is undiscriminating and would "protect" the providers against both kinds of effects. Likewise, where there is sex variation, there is a choice between using sex standardised person rates or sex specific rates.
When standardising rates for age (and other variables) the choice of method (direct or indirect) and of the standard population used may affect the results, particularly when comparing sub-national rates. We tested this for hospital case fatality, calculating trends at the England level using both direct and indirect methods and using various years as the standard, and found little difference (table 2, part 16). However, this should be monitored in any new approach to measuring performance. For the correct analysis of trends, data for all years should be adjusted with the same standard and time period.
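To make the distinction between the two methods concrete, the following Python sketch computes a directly standardised rate and an indirectly standardised ratio for two age bands. It is illustrative only: the counts, the standard populations, and the function names are hypothetical and are not taken from the authors' analysis.

```python
def direct_rate(events, at_risk, std_pop):
    """Directly standardised rate.

    Local age specific rates are weighted by a standard population.
    """
    weighted = sum((e / n) * s for e, n, s in zip(events, at_risk, std_pop))
    return weighted / sum(std_pop)


def indirect_ratio(events, at_risk, std_rates):
    """Indirectly standardised ratio (observed / expected).

    Expected events come from applying standard age specific rates
    to the local numbers at risk.
    """
    expected = sum(r * n for r, n in zip(std_rates, at_risk))
    return sum(events) / expected


# Hypothetical data for two age bands (younger, older).
events = [10, 90]
at_risk = [1000, 1000]

# Two hypothetical standard populations with different age structures.
young_standard = [3000, 1000]
old_standard = [1000, 3000]

rate_young = direct_rate(events, at_risk, young_standard)
rate_old = direct_rate(events, at_risk, old_standard)

# Hypothetical standard age specific rates for the indirect method.
ratio = indirect_ratio(events, at_risk, std_rates=[0.01, 0.09])
```

With the same local data, the directly standardised rate differs according to which standard population is used, which is why data for all years should be adjusted with the same standard when trends are compared.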
The stability of the indicator needs to be taken into account. For example, data on strategic health authorities are less prone to yearly fluctuations in rankings than data on primary care organisations because of their larger populations.
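A rough way to see why larger populations give more stable indicators is the binomial standard error of an observed proportion, which shrinks with the square root of the denominator. The sketch below uses hypothetical denominators and a hypothetical underlying rate; it treats the indicator as a simple proportion, which is a simplification for illustration.

```python
import math


def rate_standard_error(rate, n):
    """Binomial standard error of an observed proportion with denominator n."""
    return math.sqrt(rate * (1 - rate) / n)


# Same hypothetical underlying rate, two hypothetical denominators:
# a smaller organisation and a larger one.
se_small = rate_standard_error(0.05, 2_000)
se_large = rate_standard_error(0.05, 50_000)

# The denominators differ by a factor of 25, so the standard errors
# differ by a factor of sqrt(25) = 5.
se_ratio = se_small / se_large
```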
Interpretation of data
Variation in data quality (in levels of missing records and missing or invalid codes) could influence trends, particularly if there were biases in such records compared with the rest. In the extra technical material on bmj.com, table 1.1 shows that the levels of incompleteness for indicators based on hospital episode statistics are too small to affect England indicator values and do not vary much between years. Within each year, however, completeness varies by strategic health authority, requiring caution in interpreting comparative strategic health authority data. The accuracy of seemingly valid diagnostic codes has been a source of concern.15 There are now local routine audits of the quality of clinical coding (personal communication, NHS Information Authority) but no national reporting system, which remains a serious gap.
National aggregate values may mask variation in component parts that could be important for productivity assessment. Table 2.2 in the extra technical material on bmj.com shows that there are age and sex specific variations in hospital case fatality and varying time trends. For example, there is convergence between sexes in the 0-5 year old age group over the five years but persistent sex differences in the 60-64 year age group. There are falling trends in deaths in the 75-79 year group but not in the 45-49 year group.
The potential for competition for resources between types of care and conditions needs to be acknowledged at national and local level, because the "best" achieved in one locality for one indicator may have been at the expense of poorer performance in other aspects of health care, reflecting local priorities.
Most service based indicators are incomplete, as data from the independent healthcare sector are missing.
Geographical monitoring is useful, as data can then be interpreted in the context of the strategic roles of strategic health authorities, the commissioning roles of primary care organisations, and local demographic and socioeconomic conditions. Local conditions may explain (although should not justify) poorer outcomes if there are known effective interventions. We found variations (some statistically significant) both between and within the area groups created by the Office for National Statistics for grouping healthcare organisations that are most similar in terms of a range of demographic and socioeconomic conditions. Significant variation within these groups probably reflects influences other than demography and socioeconomic conditions, such as quality of health care.
Summary points
More rigorous analysis of existing routine clinical data would allow assessment of NHS performance across a wide range of services
Examples of such performance indicators include mental health care, potentially avoidable deaths, and forecasting coronary heart disease outcomes
Various assumptions and technical issues need discussion and debate—that is, the selection of indicators and targets, methods, interpretation of data, and application in productivity measurement
Application in productivity assessment
Practical ways need to be found to incorporate multiple indicators in productivity assessment: they may overlap or interact; some may be more important or relevant than others and may need to be weighted; some may reflect mismatching performance for a given time (see the above discussion on stroke). Techniques for dealing with these issues, such as weighted scores, profiles, etc, are beyond the scope of our two articles but need consideration.
High levels of activity in treatment, rehabilitation, and long term care may show desirable high productivity and improvement, but would be considered an undesirable or negative output of preventive activity for a preventable condition such as stroke.
A cross sectional approach does not take account of sequentially linked events over time, such as patients with myocardial infarction having further infarcts in due course.
Reality is even more complex than the approach taken here, and this should be acknowledged explicitly in any output to avoid sweeping simplistic generalisations during interpretation.
Conclusions
We have shown the feasibility of a variety of ways of measuring health related outputs and outcomes. Data from initiatives such as the new mental health minimum dataset and the new general practice contract should lead to better measurement. Any assessment of productivity requires careful matching of outcomes to the inputs used to achieve them, and this brings in a separate set of issues and assumptions that are beyond the scope of our articles.
Extra technical details of the methods described appear on bmj.com
AL's contributions to the study were made within his role at the Oxford branch of the National Centre for Health Outcomes Development, based at Oxford University, Headington, Oxford.
Contributors: AL conceived of the study, drafted the article, and produced the hospital episode statistics based indicators. JC helped draft the article and produced the mental health indicators. DE helped draft the article and produced the population mortality indicators. CSp analysed hospital episode statistics data. CSa helped draft the article and produced the forecasting models. Lee Mellers helped analyse the hospital episode statistics data. Bernard Rachet contributed information on the forecasting models (cancer survival). David Rudrum provided editorial support. AL is guarantor for the study.
Competing interests: All authors are involved in the work of the National Centre for Health Outcomes Development, either directly or via subcontracts. The centre is funded by the Department of Health and commissioned by it and the Healthcare Commission to develop and produce clinical and health indicators for them and the NHS. The views expressed here are those of the authors and not necessarily of the commissioners.
Ethical approval: Not needed.
References
Smith R. Is the NHS getting better or worse? [editorial]. BMJ 2003;327: 1239-41.
Department of Health. Chief executive's report to the NHS. London: DoH, 2004.
Atkinson A. Atkinson review: interim report-measurement of government output and productivity for the national accounts. London: Stationery Office, 2004.
Rudd A, Goldacre M, Amess M, Fletcher J, Wilkinson E, Mason A, et al, eds. Health outcome indicators: stroke. Report of a working group to the Department of Health. Oxford: National Centre for Health Outcomes Development, 1999.
Lakhani A. Assessment of clinical and health outcomes within the National Health Service in England. In: Leadbeter D, ed. Harnessing official statistics. Abingdon: Radcliffe Medical Press, 2000.
Lakhani A, Coles J, Eayres D, Spence C, Rachet B. Creative use of existing clinical and health outcomes data to assess NHS performance in England: Part 1—performance indicators closely linked to clinical care. BMJ 2005;330: 1426-31.
Lakhani A, Olearnik H, Eayres D, eds. Compendium of clinical and health indicators. London: Department of Health, National Centre For Health Outcomes Development, 2003.
Department of Health. National service framework for mental health: modern standards and service models. London: DoH, 1999.
Keppel KG, Pearcy JN, Klein RJ. Measuring progress in healthy people 2010. Statistical notes No 25. Hyattsville, MD: National Center for Health Statistics, 2004.
Office for National Statistics. National statistics 2001 area classification for health areas. www.statistics.gov.uk/about/methodology_by_theme/area_classification/ha/default.asp (accessed 1 Mar 2005).
Charlton JRH, Bauer R, Lakhani A. Outcome measures for district and regional health care planners. Commun Med 1984;6: 306-15.
Nolte E, McKee M. Does healthcare save lives? Avoidable mortality revisited. London: Nuffield Trust, 2004.
Brenner H, Gefeller O. An alternative approach to monitoring cancer patient survival. Cancer 1996;78: 2004-10.
Davies R, Normand C, Raftery J, Roderick P, Sanderson C. Policy analysis for coronary heart disease: a simulation model of interventions, costs and outcomes—Report to the Department of Health, July 2003 (revised April 2004). London: London School of Hygiene and Tropical Medicine, 2004.
Dixon J, Sanderson C, Elliott P, Walls P, Jones J, Petticrew M. Assessment of the reproducibility of clinical coding in routinely collected hospital activity data: a study in two hospitals. J Public Health Med 1998;20: 63-9.
(Azim Lakhani, James Coles)