Was Rodney Ledward a statistical outlier? Retrospective analysis using routine data
     1 Inter-Authority Comparisons and Consultancy, Health Services Management Centre, University of Birmingham, Birmingham B15 2RT, 2 Department of Public Health and Epidemiology, University of Birmingham, Birmingham B15 2TT, 3 Department of Primary Care and General Practice, University of Birmingham

    Correspondence to: M Harley M.J.Harley@bham.ac.uk

    Objectives To investigate whether routinely collected data from hospital episode statistics could be used to identify the gynaecologist Rodney Ledward, who was suspended in 1996 and was the subject of the Ritchie inquiry into quality and practice within the NHS.

    Design A mixed scanning approach was used to identify seven variables from hospital episode statistics that were likely to be associated with potentially poor performance. A blinded multivariate analysis was undertaken to determine how far each consultant lay from the average consultant in the seven-indicator multidimensional space (a distance known as the Mahalanobis distance) in each year. The change in Mahalanobis distance over time was also investigated by using a mixed effects model.

    Setting NHS hospital trusts in two English regions, in the five years from 1991-2 to 1995-6.

    Population Gynaecology consultants (n = 143) and their hospital episode statistics data.

    Main outcome measure Whether Ledward was a statistical outlier at the 95% level.

    Results The proportion of consultants who were outliers in any one year (at the 95% significance level) ranged from 9% to 20%. Ledward appeared as an outlier in three of the five years. Our mixed effects (multi-year) model identified nine high outlier consultants, including Ledward.

    Conclusion It was possible to identify Ledward as an outlier by using hospital episode statistics data. Although our method found other outlier consultants, we strongly caution that these outliers should not be overinterpreted as indicative of "poor" performance. Instead, a scientific search for a credible explanation should be undertaken, but this was outside the remit of our study. The set of indicators used means that cancer specialists, for example, are likely to have high values for several indicators, and the approach needs to be refined to deal with case mix variation. Even after allowing for that, the interpretation of outlier status remains unclear. Further prospective evaluation of our method is warranted, but our overall approach may prove useful in other settings, especially where performance entails several indicator variables.

    The Ritchie report was based on one of the most detailed inquiries yet undertaken into the clinical practice of an individual gynaecologist, Rodney Ledward.1 It focused on the clinical work of Ledward in the NHS and the private sector and examined allegations about failings in his practice. The criticisms made, and subsequently substantiated, against Ledward included lack of care and judgment preoperatively, failings in surgical skills, inappropriate delegation to junior staff, and poor postoperative care and judgment.

    In common with many other external and internal inquiries, little use was made of comparative data regarding the performance of individual consultants or surgical teams. For over 20 years, routine data sources such as the hospital episode statistics have been widely perceived as being of little value because of problems with completeness and accuracy, and it has been assumed that the type of information required to identify poor performance would necessitate a new data collection system. The Department of Health proposed the introduction of a "near miss" reporting system and dismissed the use of hospital episode statistics for identifying poor clinical quality, observing that, historically, the uses of these data have concentrated on recording and assessing activity levels and on performance, including technical efficiency, and that much of the data is of variable quality and equally variable relevance to the quality and outcomes of the care that the NHS provides.2

    Despite these concerns, hospital episode statistics data were used in the Bristol inquiry,3 albeit not to study the work of individual surgeons or teams. The conclusion of the subsequent Kennedy report regarding hospital episode statistics was unequivocal: hospital episode statistics "was not recognised as a valuable tool for analysing the performance of hospitals. It is now, belatedly." This paper explores this theme by comparing the performance of 142 gynaecology consultants with that of Ledward over a period of five years, to determine whether Ledward was a statistical outlier according to hospital episode statistics data.

    Methods

    Disaster theory4 5 proposes that poor performance in an organisation usually manifests itself in several ways. Applying a mixed scanning approach,6 7 we sought to identify several measurable characteristics that might imply consistent failure in performance. Using a review of the Ritchie report, other reports of alleged malpractice, a general review of the literature on performance failures, and discussions with a practising gynaecologist, we compiled a provisional list of 11 variables that could be indicative of poor performance and could be derived from hospital episode statistics. We refined this list by eliminating any variable that had a high inter-correlation (for example, a multiple correlation coefficient R² of over 0.2) with another variable; the selection of which indicator to retain was based on face validity. Furthermore, we did not use mortality as one of our indicator variables, because death in gynaecology is a rare event and we were scanning for overall poor quality of care. This produced a list of seven indicator variables (table 1), chosen largely on the grounds that they were clinically relevant (face validity) and seemed to have some directional properties, in that high values were in general likely to indicate poor performance. Nevertheless, we emphasise that, for each indicator, valid reasons may exist that could credibly explain performance in the high end of that indicator's distribution. What is considered much less likely, however, is that the same team would display extreme performance across a basket of indicators. A team in this context refers to a single consultant and the junior doctors who deal with his or her patients.
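
    A minimal sketch of this screening step, shown here in Python for illustration (the analyses themselves were run in S-PLUS; the DataFrame, its column names, and the keep-first tie-break below are hypothetical, since the retained indicator in each correlated pair was actually chosen on face validity):

        # Sketch: drop any candidate indicator whose squared correlation with an
        # already retained indicator exceeds the 0.2 cutoff.
        import pandas as pd

        R2_CUTOFF = 0.2

        def screen_indicators(candidates: pd.DataFrame, cutoff: float = R2_CUTOFF) -> list:
            r2 = candidates.corr() ** 2  # pairwise squared Pearson correlations
            kept = []
            for col in candidates.columns:
                # retain `col` only if it is not highly correlated with a kept variable
                if all(r2.loc[col, k] <= cutoff for k in kept):
                    kept.append(col)
            return kept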

    Table 1 Seven clinically relevant indicator variables from hospital episode statistics

    We obtained complications by scanning all seven diagnostic fields of hospital episode statistics for International Classification of Diseases, 9th edition (ICD-9) codes 996-999 and ICD-10 codes T80-T88: "Complications of surgical and medical care not elsewhere classified."
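
    A minimal sketch of this scan, assuming one row per episode with seven diagnosis columns diag1 to diag7 holding ICD codes as strings (the field names and DataFrame layout are hypothetical):

        import pandas as pd

        DIAG_COLS = [f"diag{i}" for i in range(1, 8)]  # the seven diagnostic fields

        def is_complication(code) -> bool:
            """True if an ICD code lies in ICD-9 996-999 or ICD-10 T80-T88."""
            if not isinstance(code, str) or not code:
                return False
            if code[0].isdigit():  # ICD-9 numeric codes
                return code[:3] in {"996", "997", "998", "999"}
            if code[0].upper() == "T":  # ICD-10 T codes
                try:
                    return 80 <= int(code[1:3]) <= 88
                except ValueError:
                    return False
            return False

        def flag_complications(episodes: pd.DataFrame) -> pd.Series:
            """Flag an episode if any of its seven diagnosis fields is a complication code."""
            return episodes[DIAG_COLS].apply(lambda col: col.map(is_complication)).any(axis=1)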

    We then calculated each indicator for each of the years from 1991-2 to 1995-6 for Ledward, his three colleagues in the same hospital, and all the gynaecologists in one other region, the West Midlands. The West Midlands data contained only anonymised consultant codes. At the time of our study, reliable data were not readily available for the whole of the region in which Ledward practised, so we were able to use the data only for Ledward's own hospital.

    We undertook a retrospective desktop statistical analysis to determine whether Ledward could be identified as a statistical outlier. We assigned a study code to all consultants. Throughout the analysis, the analysts (SH and MAM) were blinded to the code of Ledward. The analysis proceeded in three stages.

    Stage 1

    Exploratory data analysis—In all, 143 consultants (coded 1-143) were in our data set, of whom 68 appeared in all five years. Table 2 shows the number of consultants in each year and the numbers excluded because of any missing data item. According to Little's d² statistic for missing data in multivariate data sets,8 the pattern of missing data was consistent with data missing at random (P < 0.0005).

    Table 2 Numbers of consultants who were outliers at the 95% cut-off each year

    Stage 2

    We carried out a multivariate analysis to detect outliers, based on the computation of a robust Mahalanobis distance9 for each consultant in each year. The statistical details are provided in the appendix on bmj.com. For each year we computed, from the variable space of the seven indicators, a Mahalanobis distance for each consultant. The Mahalanobis distance is in essence a measure of the "distance" between the origin in the seven indicator variable space and a given data point. So a consultant with average values for each variable will have a Mahalanobis distance of zero, and this represents the origin. Consultants who are furthest away from the origin will have relatively larger distances. For each Mahalanobis distance we also derived an approximate 95% confidence interval, using computer simulation techniques. We randomly simulated each variable, for each consultant, 1000 times from an underlying binomial or normal distribution (the parameters of which were based on the observed data and the sample size). We used this simulated data set to derive 1000 simulated Mahalanobis distances for each consultant, which in turn were used to determine the approximate 95% confidence intervals for each consultant's distance.
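
    The robust distances were computed with the S-PLUS Robust Library; a rough Python analogue (an illustration under assumptions, not the original code) can be built on scikit-learn's minimum covariance determinant estimator, with a deliberately simplified normal perturbation standing in for the binomial or normal sampling described above:

        import numpy as np
        from sklearn.covariance import MinCovDet

        rng = np.random.default_rng(0)

        def robust_sqrt_md(X):
            """Square root of the robust squared Mahalanobis distance of each
            consultant's 7-indicator row from the robust centre of all consultants."""
            mcd = MinCovDet(random_state=0).fit(X)
            return np.sqrt(mcd.mahalanobis(X))  # mahalanobis() returns squared distances

        def simulated_intervals(X, n_sim=1000):
            """Approximate 95% intervals for each consultant's distance, refitting
            on each simulated data set. The crude per-cell normal noise below is
            an assumption; the original simulation drew from binomial or normal
            distributions parameterised by the observed data and sample sizes."""
            sims = np.empty((n_sim, X.shape[0]))
            col_sd = X.std(axis=0, ddof=1)
            for s in range(n_sim):
                sims[s] = robust_sqrt_md(rng.normal(loc=X, scale=col_sd))
            return np.percentile(sims, [2.5, 97.5], axis=0)  # rows: lower, upper limits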

    The square root of the Mahalanobis distance (√MD) is known to follow approximately a χ distribution with k degrees of freedom (k being equal to the number of indicator variables, seven in our case),9 and so we used the mean of the χ distribution, given approximately by √k (√7 = 2.66), to define outliers.9 Consultants with 95% intervals above the 2.66 threshold were deemed to be outliers. We report the number of outlier consultants for each year.
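
    In symbols (a restatement of this rule, with x_i a consultant's seven-indicator vector and \hat{\mu}, \hat{\Sigma} the robust centre and covariance estimates):

        MD_i = (x_i - \hat{\mu})^{\top} \hat{\Sigma}^{-1} (x_i - \hat{\mu}), \qquad
        \sqrt{MD_i} \;\dot{\sim}\; \chi_k, \quad k = 7,

    and consultant i is flagged as an outlier when the lower limit of the simulated 95% interval for \sqrt{MD_i} exceeds the mean of the \chi_k distribution, taken here as \sqrt{k} (\sqrt{7} = 2.66).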

    Stage 3

    We also investigated the change in MD over the five years, using hierarchical analyses for repeated measurements. We constructed a two level hierarchical model, with consultants at the higher level and their respective yearly Mahalanobis distances at the lower level. We used the standardised residual output from this model (see figure 2) to identify outliers beyond 2 standard deviations.
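
    A rough Python analogue of this repeated measures model (the analysis itself used MLwiN; statsmodels' MixedLM, the column names, and the use of standardised consultant-level effects below are assumptions for illustration):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        def consultant_outliers(md: pd.DataFrame):
            """md holds one row per consultant-year with columns 'consultant',
            'year' (numeric, 1-5) and 'sqrt_md' (that year's square-root
            Mahalanobis distance). Fits a random-intercept model with yearly
            distances nested within consultants, then flags consultants whose
            standardised consultant-level effect lies beyond +/-2."""
            fit = smf.mixedlm("sqrt_md ~ year", md, groups=md["consultant"]).fit()
            effects = pd.Series({g: re["Group"] for g, re in fit.random_effects.items()})
            z = (effects - effects.mean()) / effects.std(ddof=1)
            return fit, list(z.index[np.abs(z) > 2])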

    Fig 2 Fitted values versus the standardised residuals from the statistical model. Consultants with standardised residuals outside the ±2 envelope are deemed to be outliers. Ledward is the larger filled circle

    We used S-PLUS, version 6.1 (Insightful Corporation, Seattle, USA), with the Robust Library, version 1 (Beta II),10 and MLwiN, version 2.1c (University of London, London), for our analyses.

    Results

    Figure 1 shows the robust MD for each consultant for each year, and table 2 summarises the number of outlier consultants.

    Fig 1 Plots showing the square root of the robust Mahalanobis distance (on the loge scale to aid visualisation) for each consultant in each year (from 1991-2, top left panel, to 1995-6, lower left panel). The horizontal line in each panel is the expected mean. Ledward is indicated by a filled circle. Vertical bars around each point are approximate, simulated, 95% intervals of uncertainty. Note that the ordering of the data in each panel is according to the y axis values, so a given consultant will not necessarily appear at the same x axis position in each panel; this is illustrated by the filled circle for Ledward. To avoid confusion, we have therefore omitted the consultant codes from each plot

    We also constructed a model to investigate the variation in MD over time (see bmj.com for further details), which reached significance (P = 0.0043). Figure 2 shows standardised residuals from the model. From this figure, we identified nine high outlier consultants and three low outlier consultants.

    After these two analyses, MH revealed the consultant code and confirmed that Ledward was a statistical outlier (in three of the five years of figure 1 and in figure 2). Figure 3 shows the variable values for Ledward. Several other consultants were outliers: two consultants were outliers in all five years, two in four years, and seven (including Ledward) in three years. Exploratory visual examination of the variable values for all these outlier consultants, also using figure 3 (results not shown), did not show any consultant as having consistently low values in all seven indicators.

    Fig 3 Histograms for the seven indicator variables, the total number of episodes per consultant, and the square root of the Mahalanobis distance for all years combined. Coloured boxes show the values for Ledward for each of the five years (1991-2 to 1995-6), respectively

    Discussion

    References

    1 Department of Health. The report of the inquiry into quality and practice within the National Health Service arising from the actions of Rodney Ledward. (The Ritchie report.) London: Stationery Office, 2000.

    2 Department of Health. An organisation with a memory: report of an expert group on learning from adverse events in the NHS chaired by the chief medical officer. London: Stationery Office, 2000.

    3 Department of Health. Report of the public inquiry into children's heart surgery at the Bristol Royal Infirmary 1984-1995: learning from Bristol. (The Kennedy report.) London: Stationery Office, 2001.

    4 Bignell V. Catastrophic failures. Oxford: Oxford University Press, 1977.

    5 Turner BA. The organisational and interorganisational developments of disasters. Admin Sci Q 1976;21:378-97.

    6 Etzioni A. Mixed-scanning: a "third" approach to decision-making. Public Admin Rev 1967;27:385-92.

    7 Yates JM. The use of routinely collected information in the measurement of performance in the NHS. Birmingham: University of Birmingham, 1986.

    8 Little RJA. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 1988;83:1198-202.

    9 Rousseeuw PJ, Leroy AM. Robust regression and outlier detection. New York: Wiley, 1987.

    10 Insightful Corporation. S-PLUS 6 robust library user's guide, version 1.0. Seattle: Insightful Corporation, 2002.

    11 Lilford RJ, Mohammed MA, Spiegelhalter D, Thomson R. Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma. Lancet 2004;363:1147-54.

    12 Campbell SE, Campbell MK, Grimshaw JM, Walker AE. A systematic review of discharge coding. J Public Health Med 2001;23:205-11.

    13 Aylin P, Alves B, Best N, Cook A, Elliot P, Evans SJ, et al. Comparison of UK paediatric cardiac surgical performance by analysis of routinely collected data 1984-96: was Bristol an outlier? Lancet 2001;358:181-7.

    14 Spiegelhalter D, Murray G, McPherson K, Macfarlane A, Evans S, Curnow R, et al. Monitoring clinical performance: a statistical perspective. Submission to the Bristol Inquiry, 2002.

    15 Jarman B, Gault S, Alves B, Hider A, Dolan S, Cook A, et al. Explaining differences in English hospital death rates using routinely collected data. BMJ 1999;318:1515-20.

    16 Aylin P, Tanna S, Bottle A, Jarman B. Dr Foster's case notes: how often are adverse events reported in English hospital statistics? BMJ 2004;329:369.

    17 Mohammed MA, Rathbone A, Myers P, Patel D, Onions H, Stevens A. An investigation into general practitioners associated with high patient mortality flagged up through the Shipman inquiry: retrospective analysis of routine data. BMJ 2004;328:1474-7.

    18 Dunn PM. The Wisheart affair: paediatric cardiological services in Bristol, 1990-5. BMJ 1998;317:1144-5.

    19 BBC News Online. Wisheart: callous or caring? http://news.bbc.co.uk/1/hi/health/1124755.stm (accessed July 2004).

    20 BBC News Online. The second surgeon: Janardan Dhasmana. http://news.bbc.co.uk/1/hi/health/1136419.stm (accessed July 2004).

    21 Mason S, Nicholl J, Lilford R. What to do about poor clinical performance in clinical trials. BMJ 2003;324:419-20.