Appropriateness of VATS and bedside thoracostomy talc pleurodesis as judged by a panel using the RAND/UCLA appropriateness method (RAM)

http://www.100md.com 《血管的通路杂志》

     a Thoracic Unit, Guy's Hospital, St Thomas' Street, London SE1 9RT, UK

    b Health Services Research Unit, London School of Hygiene and Tropical Medicine and Clinical Effectiveness Unit, Royal College of Surgeons of England, Keppel Street, London WC1E 7HT, UK

    c Clinical Operational Research Unit, UCL, London WC1E 6BT, UK

    d Royal Berkshire Hospital, Reading, Berkshire, RG1 5AN, UK

    e Department of Epidemiology, UCL, London WC1E 6BT, UK

    Abstract

    We sought formal consensus on the appropriateness of video-assisted thoracoscopic surgery (VATS) talc pleurodesis and bedside thoracostomy talc slurry using a well-established method, the RAND/UCLA appropriateness method (RAM). We recruited an expert panel of respiratory physicians, oncologists, and surgeons under the leadership of experts in health services research. The panellists were provided with evidence from a systematic review and were then taken through two rounds of opinion gathering, the first individually, the second as a group. The purpose is not to force consensus, but to find scenarios where there is agreement on the appropriateness or inappropriateness of a treatment and scenarios where there is disagreement. In scenarios where the diagnosis was proven and life expectancy exceeded six months, pleurodesis was deemed appropriate. If there was no tissue diagnosis, VATS was preferred. The response to a trial aspiration played a major part in the recommendation for or against pleurodesis. The attitude to breathlessness was incongruous: it is the target of palliation, yet some panellists interpreted it as a marker of performance status and thus as a contraindication. Although the RAM is well developed and in widespread use, we found it worryingly unreliable and advise that it be used with caution.

    Key Words: Malignant effusion; Pleurodesis; Talc; Rand appropriateness method; Consensus method

    1. Introduction

    Malignant pleural effusion is common. Removing the fluid relieves dyspnoea and maintains quality of life, and can be achieved by simple thoracocentesis or chest tube drainage. When fluid re-accumulates, pleurodesis may prevent recurrent symptoms. Taken together, the Cochrane review of the evidence for pleurodesis in this context [1] and our own more extensive review of the plentiful literature [2,3] provide randomised controlled trial (RCT) evidence to support the following statements:

    Talc is the most effective available agent.

    VATS does not offer a significant advantage over bedside tube pleurodesis in achieving pleurodesis in RCTs.

    Duration of tube drainage is not a significant determinant of success.

    Rolling the patient after instillation of the agent confers no benefit.

    There is no evidence to support the use of larger chest tubes.

    The UK Thoracic Surgery Register reports about 1000 patients each year treated by VATS pleurodesis. We do not know how many cases are managed by bedside tube pleurodesis. Practice varies as to which method to use in which circumstances. Where RCT evidence is lacking, means of reaching explicit agreement, including ‘consensus methods’, have been developed to identify the collective opinions of experts so that advice on best practice can be given. The nominal group technique and Delphi method, described by Jones [4], and the consensus development conference are methods commonly employed in medical, nursing and health services research. The objective of the RAND/UCLA appropriateness method (RAM) [5] is to find areas of agreement on overuse and underuse (unmet need) of procedures. Coronary interventions, for example, have been the subject of both a consensus conference [6] and RAM [7]. In considering pleurodesis, the decision might be influenced by many factors as we strike a balance between considerations of technical success and the suitability of a palliative procedure at the end of life. RAM helps incorporate numerous factors into a practical tool to aid decision making [8,9].

    Our objectives were:

    To use a consensus process to develop appropriateness criteria for pleurodesis in a range of indications.

    To identify agreement, disagreement and uncertainty.

    To identify relative indications for the VATS versus bedside methods.

    2. Method

    We composed a list of individual clinical attributes, which might be taken into account in deciding whether talc pleurodesis was appropriate or inappropriate in any given clinical presentation. These are set out in a matrix. The clinical presentations were divided according to:

    life expectancy (4 grades);

    severity of breathlessness (5 grades);

    symptomatic response to pleural aspiration (4 grades);

    radiological response to pleural aspiration (3 grades);

    presence or absence (2 grades) of pleural thickening on chest radiograph or CT scan.

    This yielded 480 (4 × 5 × 4 × 3 × 2) combinations. There were two procedures to consider, surgical or bedside talc pleurodesis, giving a total of 960 theoretical ‘indications’.
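
    The size of the matrix can be checked with a short enumeration. This is a minimal sketch, not the authors' tooling: the attribute keys and the integer grade labels (1..n) are hypothetical placeholders, and only the number of grades per attribute is taken from the text.

```python
from itertools import product

# Number of grades per clinical attribute, as listed in the text.
# The key names and the 1..n grade labels are illustrative assumptions.
grades = {
    "life_expectancy": 4,
    "breathlessness": 5,
    "symptomatic_response": 4,
    "radiological_response": 3,
    "pleural_thickening": 2,
}
procedures = ("bedside", "VATS")

# Every combination of one grade per attribute is one clinical presentation.
presentations = list(product(*(range(1, n + 1) for n in grades.values())))

# Each presentation is rated once per procedure, giving the 'indications'.
indications = [(proc, p) for proc in procedures for p in presentations]

print(len(presentations), len(indications))  # prints "480 960"
```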

    We required an expert panel of 7–9 individuals to rate the indications on paper and to then meet to review their opinions in the light of discussion. We approached 24 clinicians from relevant specialties. The eventual panel comprised 3 respiratory physicians, 3 thoracic surgeons and 2 oncologists.

    Three documents were produced and mailed to each panellist for the first round of the process.

    the systematic review of the evidence for pleurodesis in malignant effusion

    a matrix containing the 960 permutations

    definitions of terms

    For the first round, each panellist was asked to rate each indication on a nine-point scale, where 9 meant that treatment was very appropriate (defined as the expected benefit greatly outweighing the expected harm), 1 that treatment was very inappropriate (the harm outweighs any likely benefit) and 5 that benefit and harm were thought to be about equal or that the panellist was unable to make a judgement for the situation described. The score sheets were then returned by mail to the project coordinator (CT) for the first round of data entry and analysis.

    The eight panellists then met for a second round of rating and discussion, led by a moderator experienced in the method (HH). Each panellist was provided with their own ratings and the frequency of the responses amongst the panel. For each indication the median score was given, interpreted as appropriate (6.5–9), inappropriate (1–3) or uncertain (3.5–6). In addition, there was an assessment of whether the panel agreed or its views were dispersed. The panellists agreed to make minor changes to the list of indications where useful distinctions could not be made, with the result that the list was shortened to 600 indications (300 for each of the two procedures).
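
    The banding of the median can be sketched as follows. This is an illustration under stated assumptions rather than the software used in the study: the function name is hypothetical, and ratings are assumed to be integers from 1 to 9 from an even-sized panel, so that the median is always a multiple of 0.5 and falls in exactly one of the three bands.

```python
from statistics import median

def appropriateness_band(ratings):
    """Map a panel's ratings for one indication to a band via the median.

    Bands follow the text: appropriate (6.5-9), uncertain (3.5-6),
    inappropriate (1-3). Assumes integer ratings on the 1-9 scale.
    """
    m = median(ratings)
    if m >= 6.5:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    return "uncertain"

# Hypothetical ratings from an eight-member panel (not study data).
print(appropriateness_band([7, 8, 9, 7, 6, 8, 9, 7]))  # median 7.5
```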

    The final appropriateness criteria were based on the median panel rating and level of agreement for each indication in the second round of rating using the following definitions:

    ‘Agreement’ – no more than two panellists rated the indication outside the 3-point region (1–3; 3.5–6; 6.5–9) containing the median.

    ‘Disagreement’ – at least three panellists rated the indication in the 1–3 region, and at least three rated it in the 6.5–9 region.

    ‘Indeterminate’ covers all other eventualities.
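
    The three definitions above can be expressed as a small classifier. This is a hedged sketch: the function names are hypothetical, ratings are assumed to be integers from 1 to 9, and the order in which the rules are tested is an assumption (for an eight-member panel the ‘agreement’ and ‘disagreement’ conditions cannot both hold, so the order does not change the result).

```python
from statistics import median

def region(score):
    """Return the 3-point region of a score: low (1-3), mid (3.5-6), high (6.5-9)."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "mid"
    return "high"

def agreement_level(ratings):
    """Classify one indication as agreement, disagreement or indeterminate."""
    regions = [region(r) for r in ratings]
    median_region = region(median(ratings))
    # Count panellists whose rating falls outside the median's region.
    outside = sum(1 for reg in regions if reg != median_region)

    if regions.count("low") >= 3 and regions.count("high") >= 3:
        return "disagreement"
    if outside <= 2:
        return "agreement"
    return "indeterminate"

# Hypothetical eight-member panels (not study data).
print(agreement_level([8, 8, 9, 7, 8, 9, 5, 2]))
print(agreement_level([1, 2, 3, 5, 5, 7, 8, 9]))
print(agreement_level([1, 2, 4, 5, 5, 6, 8, 9]))
```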

    3. Results

    Based on the above definitions, bedside slurry and VATS were deemed appropriate in 27/300 and 78/300 scenarios, respectively. There were 99/600 scenarios where appropriateness was uncertain. Beyond that it is not useful to use the 600 scenarios as a denominator because the combinations do not occur with equal frequency in clinical circumstances, and some are clinically extremely unlikely.

    There was a high rating of appropriateness (scores 8.5 and 9) with good agreement in only 13/600 scenarios. These were characterised by having

    an open-ended prognosis;

    a favourable symptomatic response to trial aspiration;

    a lung that was not trapped.

    The further results are displayed in a series of figures in which the appropriateness ratings have been grey scale coded in these three bands (appropriate, uncertain, inappropriate) for bedside method above and VATS below, to allow visual comparison to be made. They show the panel's rating depending on prognosis (Fig. 1), trapped lung (Fig. 2), dyspnoea score (Fig. 3), response to aspiration (Fig. 4) and pleural thickening (Fig. 5).

    To generate further insight into how the different attributes combined to affect the judgements of the panel, we constructed a multivariate model. The appropriateness ratings for VATS were transformed to a binary variable (1 or 0) for appropriate or not. Although the statistical considerations that underpin regression techniques clearly do not apply to the analysis of ratings data, logistic regression techniques were used as a starting point for the model building. An initial model was constructed and the factors were ranked by the potential contribution to the logistic model. Further model building was guided by the performance and characteristics of this initial model.

    Using the rankings of the first logistic model constructed, a classification tree was constructed to summarise the structure of the 300-row table of ratings for VATS. This approach is similar to that described by Ridley [10]. A logistic model based on prognosis, response, symptoms and trapped lung correctly reproduced the dichotomised judgement of ‘appropriate’ or ‘inappropriate or uncertain’ for 290 of the 300 (97%) indications. As the presence or absence of thickened pleura did not affect the final appropriateness rating for any of the indications, the model building ignored this variable. The ten indications for which the dichotomised rating was not reproduced by the initial logistic model consisted of four unlikely indications where the hypothetical patient had trapped lung but also had a good previous response (wrongly classified as appropriate for VATS by the model) and six indications where the patient had trapped lung but no tissue diagnosis (wrongly classified as ‘inappropriate or uncertain’ for VATS by the model).

    Ranking the factors considered in the model by their potential contribution to the logistic score gave a hierarchy:

    Prognosis

    Response

    Symptoms

    Trapped lung

    Thickened pleura.

    Prognosis had the biggest impact on the judgements of the panel and thickened pleura the least. Applying this hierarchy gave the classification tree for the appropriateness of VATS shown in Fig. 6.

    On studying the performance of the initial logistic model, an ‘interaction’ variable was introduced to reflect the finding that, where the lung is trapped, VATS is appropriate only if no tissue diagnosis is currently available. A further model that incorporated this variable in addition to prognosis, response and symptoms correctly reproduced all of the dichotomised ratings for VATS.
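
    The text does not say how the ‘interaction’ variable was coded. One conventional coding, shown here purely as an assumption, is the product of two 0/1 indicators, so that the term is switched on only for the combination of trapped lung and an existing tissue diagnosis (the combination in which VATS was not rated appropriate). The function and argument names are hypothetical.

```python
def trapped_with_diagnosis(trapped_lung: bool, tissue_diagnosis: bool) -> int:
    """Interaction term: 1 only when the lung is trapped AND a tissue
    diagnosis is already available; 0 for every other combination.

    Assumed coding (product of two 0/1 indicators), not the authors' own.
    """
    return int(trapped_lung) * int(tissue_diagnosis)

# Tabulate the term for all four combinations of the two indicators.
for trapped in (False, True):
    for diagnosed in (False, True):
        print(trapped, diagnosed, trapped_with_diagnosis(trapped, diagnosed))
```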

    4. Discussion

    4.1. Prognosis

    The bedside method (which does not permit simultaneous biopsy) was regarded as inappropriate in the absence of a tissue diagnosis. Where the prognosis was believed to be short (arbitrarily set at under three months), pleurodesis was infrequently considered appropriate and VATS was never appropriate. Pleurodesis was more likely to be deemed appropriate when the prognosis was thought to be longer, and there was then an equal preference for the VATS and bedside techniques. This outcome is somewhat surprising because three months is quite a long time in the terminal palliative care of cancer patients. Survival gains from chemotherapy, for example, are measured in weeks.

    4.2. Trapped lung

    Where the lung is trapped (Fig. 2) the prospects of success are much lower, but VATS is indicated for diagnosis and may give an opportunity to free loculations. With trapped lung a bedside technique was never judged appropriate. Another option is the use of a shunt, but this was outside the scope of the study.

    4.3. Dyspnoea score

    The degree of breathlessness did not appreciably influence the appropriateness rating for bedside talc slurry pleurodesis (Fig. 3). This was a surprising result and reflected different perceptions of this symptom in decision making. Because the purpose of talc pleurodesis is to relieve breathlessness, and it is effective in doing so, surgeons argue for its use in the more breathless patients. In this exercise, non-surgeons used the dyspnoea score as an indicator of performance status, thus treating the very same symptom as a contraindication. Breathlessness that will be relieved is not a contraindication. This divergence illustrates a shortcoming of the method when used as a one-off exercise. The problem of mixed objectives exists in decision making in clinical teams and might merit more critical analysis in future studies.

    4.4. Response to aspiration

    In the scenario where aspiration of fluid does not improve breathlessness or makes it worse, the bedside pleurodesis technique was never declared appropriate (Fig. 4). This is logical: if aspiration gives no relief of breathlessness, pleurodesis will not help either. In some instances VATS was still indicated, reflecting the opportunity to make a tissue diagnosis or to make a better technical job of drainage.

    4.5. Pleural thickening

    Pleural thickening did not affect the judgement as to whether to intervene or not (Fig. 5). For example, a patient with a good prognosis who gains symptomatic improvement from aspiration is still likely to benefit from a pleurodesis regardless of the pleural thickness, provided the lung does not remain trapped.

    One interesting observation in using the RAM was that the panel members preferred to recreate clinical descriptions of patients in their mind's eye and found it difficult to disaggregate the factors that would lead them to one decision rather than another. Seeing just one page of the exercise by which the clinicians were expected to adjudicate on appropriateness makes this unsurprising (Table 1). The fact that simplified ratings were successfully reproduced by a relatively simple logistic model with just four terms suggests a strong hierarchy in the factors that combined to determine the panel's judgement of appropriateness. The hierarchy indicated by the simple logistic model differs slightly from that used initially to present the indications to the panel and ranks the contributions of the individual factors in the order of prognosis, response, symptoms, trapped lung and thickened pleura.

    We give one final word of caution. An informed insider could see how the method had failed in one important respect: ‘breathlessness’ was interpreted both as an indication for an intervention and, because of its simplistic use as an indicator of risk, as a contraindication. Others have found incongruities in the conclusions reached [11,12].

    Acknowledgements

    We are grateful to colleagues Willie Fountain, Robert Cameron, Bob Davies and Bernie Foran.

    References

    [1] Shaw P, Agarwal R. Pleurodesis for malignant pleural effusions. Cochrane Database Syst Rev 2004; CD002916.

    [2] Tan C, Swift S, Sedrakyan A, Browne J, Treasure T. A systematic review of the management of malignant pleural effusion. Accepted EJCTS December 2005.

    [3] Tan C. Pleurodesis for malignant effusion. In: Treasure T, Keogh B, Pagano D, Hunt I (eds). The evidence for cardiothoracic surgery. Shrewsbury: tfm Publishing; 2005:119–130.

    [4] Jones J, Hunter D. Consensus methods for medical and health services research. Br Med J 1995; 311:376–380.

    [5] Fitch K, Bernstein SJ, Aguilar MD, Burnand B, LaCalle JR, Lazaro P, Van Het Loo M, McDonnell J, Vader JP, Kahan JP. The RAND/UCLA Appropriateness Method User's Manual. 2001.

    [6] Stocking B. First consensus development conference in United Kingdom: on coronary artery bypass grafting. I. Views of audience, panel, and speakers. Br Med J (Clin Res Ed) 1985; 291:713–716.

    [7] Hemingway H, Crook AM, Feder G, Banerjee S, Dawson JR, Magee P, Philpott S, Sanders J, Wood A, Timmis AD. Underuse of coronary revascularization procedures in patients considered appropriate candidates for revascularization. N Engl J Med 2001; 344:645–654.

    [8] Wietlisbach V, Vader JP, Porchet F, Costanza MC, Burnand B. Statistical approaches in the development of clinical practice guidelines from expert panels: the case of laminectomy in sciatica patients. Med Care 1999; 37:785–797.

    [9] Stoevelaar HJ, McDonnell J, Stals H, Smets L. Gastro-protective treatment in patients using NSAIDs. Development of appropriateness criteria by a multidisciplinary expert panel. Scand J Rheumatol 2003; 32:162–167.

    [10] Ridley S, Jones S, Shahani A, Brampton W, Nielsen M, Rowan K. Classification trees. A possible method for iso-resource grouping in intensive care. Anaesthesia 1998; 53:833–840.

    [11] Raine R, Sanderson C, Hutchings A, Carter S, Larkin K, Black N. An experimental study of determinants of group judgments in clinical guideline development. Lancet 2004; 364:429–437.

    [12] Raine R, Sanderson C, Black N. Developing clinical guidelines: a challenge to current methods. Br Med J 2005; 331:631–633.

    (Carol Tan, Tom Treasure)
