Assessment of methodological quality of primary studies by systematic
BMJ (British Medical Journal)
1 Centro Cochrane Italiano, Istituto Mario Negri, Via Eritrea 62, 20157 Milan, Italy, 2 Università degli Studi di Modena e Reggio Emilia, Modena, Italy
Correspondence to: A Liberati, Università degli Studi di Modena e Reggio Emilia, Modena, Italy alesslib@tin.it
Objectives To describe how the methodological quality of primary studies is assessed in systematic reviews and whether the quality assessment is taken into account in the interpretation of results.
Data sources Cochrane systematic reviews and systematic reviews in paper based journals.
Study selection 965 systematic reviews (809 Cochrane reviews and 156 paper based reviews) published between 1995 and 2002.
Data synthesis The methodological quality of primary studies was assessed in 854 of the 965 systematic reviews (88.5%). This occurred more often in Cochrane reviews than in paper based reviews (93.9% v 60.3%, P < 0.0001). Overall, only 496 (51.4%) used the quality assessment in the analysis and interpretation of the results or in their discussion, with no significant differences between Cochrane reviews and paper based reviews (52% v 49%, P = 0.58). The tools and methods used for quality assessment varied widely.
Conclusions Cochrane reviews fared better than systematic reviews published in paper based journals in terms of assessment of methodological quality of primary studies, although they both largely failed to take it into account in the interpretation of results. Methods for assessment of methodological quality by systematic reviews are still in their infancy and there is substantial room for improvement.
Critical appraisal of the methodological quality of primary studies is an essential feature of systematic reviews.1-3 A recent review showed that lack of adherence to a priori defined validity criteria may help explain why primary studies on the same topic provide different results.4 Some key issues still remain unresolved: which checklists and scales are the ideal approaches5 and how the results of quality assessment in a systematic review should be handled in the analysis and interpretation of results.6-10
We compared the approaches used for quality assessment of primary studies by Cochrane systematic reviews with systematic reviews published in paper based journals. We determined how quality assessment is used and whether systematic reviews consider quality assessment in their results.
Methods
We sampled systematic reviews from two databases in the Cochrane Library: the Cochrane Database of Systematic Reviews, which includes reviews prepared by review groups of the Cochrane Collaboration; and the Database of Abstracts of Reviews of Effectiveness (DARE), which selects systematic reviews published in peer reviewed journals on the basis of their adherence to a few methodological requirements.11
We stratified all 1297 Cochrane systematic reviews published in issue 1, 2002, of the Cochrane Library by type of intervention (six levels: drugs; rehabilitation or psychosocial; prevention or screening; surgery or radiotherapy intervention; communication, organisational, or educational; other) and by Cochrane review group (50 levels). We used a computer generated randomisation scheme to select at least 50% of the systematic reviews in each cell. Our final sample represented 62.4% (n = 809) of the Cochrane reviews. The paper based systematic reviews were extracted from DARE, including all systematic reviews published in 2001 registered up to November 2002.
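The stratified selection described above can be sketched as follows. This is a minimal illustration, not the authors' randomisation scheme: the record fields (`intervention`, `group`), the seed, the toy data, and the choice to round up within each cell are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def stratified_sample(reviews, fraction=0.5, seed=2002):
    """Randomly select at least `fraction` of the reviews in every stratum cell.

    A cell is one (type of intervention, review group) combination.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    cells = defaultdict(list)
    for review in reviews:
        cells[(review["intervention"], review["group"])].append(review)

    selected = []
    for key in sorted(cells):
        members = cells[key]
        # Round up so that even a cell with one review meets the minimum.
        k = math.ceil(fraction * len(members))
        selected.extend(rng.sample(members, k))
    return selected


# Hypothetical toy records; the real sampling frame held 1297 reviews.
reviews = [
    {"id": i, "intervention": "drugs" if i % 3 else "surgery", "group": f"group{i % 4}"}
    for i in range(25)
]
sample = stratified_sample(reviews)
print(len(sample), "of", len(reviews), "reviews selected")
```

Rounding up inside small cells is one way an overall sample can end up above 50%; the text does not say how the reported 62.4% arose.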
Data extraction form
We assessed the systematic reviews by using an ad hoc data extraction form. We developed this form by taking into account published reports on the quality assessment of trials included in systematic reviews.1 2 4-10 12-28
We did not aim at standardised operational definitions of the quality measures but accepted at face value what was reported by the authors of individual studies. As a common taxonomy for quality assessment does not exist, we used a large number of descriptive quality components to capture as many of the different definitions as possible.
For each systematic review we sought general information (title, authors, publication date, type of intervention, and presence of a meta-analysis). We then evaluated what authors reported in the methods section of their review for quality assessment. In particular, we tried to ascertain whether authors stated that they would assess quality and how (scale or checklist, components studied, composite score), and in what way they planned to use the quality assessment (for example, as exclusion criteria or for sensitivity analysis). See bmj.com for a summary version of the data extraction form.
We then evaluated how authors assessed quality. We recorded if trials were combined in a quantitative meta-analysis; if the quality was evaluated; if scales, checklists, and scores were used; and how the quality was formally incorporated. Assessors judged whether an attempt had been made to incorporate the quality assessment in the results, either qualitatively or quantitatively.
We purposely did not make our operational definition of qualitative too stringent. If authors commented on or discussed the results with reference to the quality of trials, we considered this sufficient to classify the systematic review as having incorporated the quality assessment qualitatively.
Our definition of quantitative was more stringent and included the carrying out of a sensitivity or subgroup analysis (with quality as a stratifying factor) and use of a quality score as a weight or factor for cumulative meta-analysis or metaregression.
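As an illustration of what a quantitative use of quality can look like, the sketch below runs a subgroup analysis with one quality component as the stratifying factor. It assumes a fixed effect, inverse variance pooling of log odds ratios; the trial data, the field names, and the choice of allocation concealment as the component are hypothetical and are not taken from any review in the sample.

```python
import math


def pooled_fixed_effect(studies):
    """Inverse variance fixed effect pooled estimate and its standard error."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    total = sum(weights)
    estimate = sum(w * s["log_or"] for w, s in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


# Hypothetical trials: log odds ratio, its standard error, and one quality component.
trials = [
    {"log_or": -0.40, "se": 0.20, "adequate_concealment": True},
    {"log_or": -0.10, "se": 0.10, "adequate_concealment": True},
    {"log_or": -0.90, "se": 0.30, "adequate_concealment": False},
    {"log_or": -0.70, "se": 0.25, "adequate_concealment": False},
]

overall = pooled_fixed_effect(trials)
adequate = pooled_fixed_effect([t for t in trials if t["adequate_concealment"]])
inadequate = pooled_fixed_effect([t for t in trials if not t["adequate_concealment"]])

for label, (est, se) in [("all", overall), ("adequate", adequate), ("inadequate", inadequate)]:
    print(f"{label:>10}: log OR {est:.2f} (SE {se:.2f})")
```

In this made-up data set the trials with inadequate concealment pool to a larger apparent effect than those with adequate concealment, which is the kind of contrast a quality stratified analysis is meant to expose.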
Data extraction
We developed a draft of an extraction checklist and piloted it on 40 randomly selected Cochrane reviews. The checklist was revised and an instruction manual prepared. The checklist was further tested on another random sample of 130 systematic reviews, and further refinements were incorporated. Inter-rater agreement, based on a random sample of 5% of the Cochrane reviews and paper based reviews (48 reviews), was high. Inter-rater reliability was moderate to perfect (percentage mean agreement 94, range 71.1-100; prevalence and bias adjusted statistic mean 0.80, range 0.40-1.00).
Twelve pairs of investigators independently extracted the data. Disagreements were resolved through discussion and, when necessary, centrally reviewed.
Statistical analysis
Owing to the frequent imbalance of marginals in our contingency matrix, we used a prevalence and bias adjusted statistic to assess inter-rater reliability.29 30 We report confidence intervals of differences between proportions and P values of χ² tests.
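The effect of imbalanced marginals on agreement statistics can be shown with a small sketch. It assumes the adjusted statistic takes the usual prevalence and bias adjusted kappa form, (k × Po − 1) / (k − 1) for k categories, which reduces to 2 × Po − 1 for yes/no items. The ratings are hypothetical and are not the study's data.

```python
def observed_agreement(a, b):
    """Proportion of items on which the two raters agree (Po)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement is estimated from each rater's marginals."""
    n = len(a)
    po = observed_agreement(a, b)
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (po - pe) / (1 - pe)


def pabak(a, b, n_categories=2):
    """Prevalence and bias adjusted kappa: chance agreement is fixed at 1/k."""
    po = observed_agreement(a, b)
    return (n_categories * po - 1) / (n_categories - 1)


# Hypothetical ratings of 48 reviews on one yes/no item with a very common "yes".
rater_a = ["yes"] * 44 + ["no"] * 1 + ["yes"] * 2 + ["no"] * 1
rater_b = ["yes"] * 44 + ["no"] * 1 + ["no"] * 2 + ["yes"] * 1

print(f"observed agreement: {observed_agreement(rater_a, rater_b):.3f}")
print(f"Cohen's kappa:      {cohen_kappa(rater_a, rater_b):.3f}")
print(f"PABAK:              {pabak(rater_a, rater_b):.3f}")
```

With 45 agreements out of 48, Cohen's kappa is pulled down by the skewed marginals while the adjusted statistic stays close to the raw agreement, which is the motivation given in the text for preferring it.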
Results
We analysed 965 systematic reviews: 809 Cochrane reviews and 156 reviews published in paper based journals (figure). Quality was assessed in 854 (88.5%) of the reviews, more often in Cochrane reviews than in paper based reviews (93.9% v 60.3%, P < 0.0001; table 1). The same was true when we compared the proportions of reviews using quality assessment in an informal fashion (90.5% v 51.9%; P < 0.0001; table 2). The formal approaches most used by both types of review were exclusion criteria (12%), sensitivity analysis (10%), exploration of heterogeneity (8%), and subgroup analysis (4%).
Figure Flow of systematic reviews through trial
Table 1 Distribution of three main quality assessment related variables investigated in study according to Cochrane systematic reviews and systematic reviews published in paper based journals. Values are numbers (percentages) unless stated otherwise
Table 2 Summary of approaches to quality assessment and formal quantitative analyses related to quality assessment used in Cochrane systematic reviews and systematic reviews published in paper journals. Values are numbers (percentages) unless stated otherwise
The quality components most frequently assessed were, in decreasing order, allocation concealment, blinding, and losses to follow-up. The difference between Cochrane reviews and paper based reviews for these and intention to treat analysis significantly favoured Cochrane reviews (table 3).
Table 3 Summary of quality components and quality scales most used in Cochrane systematic reviews and systematic reviews published in paper based journals analysed in study. Values are numbers (percentages) unless stated otherwise
The most commonly used quality scale was the Jadad scale (n = 113, 11.7%; table 3). Cochrane reviews used the scale less often than paper based reviews. In 65.0% (n = 526) of Cochrane reviews and 48.1% (n = 75) of paper based reviews, authors carried out the quality assessment using single components rather than a formal scale.
No significant differences emerged when Cochrane reviews and paper based reviews were analysed separately by type of intervention assessed—for example, drug compared with non-drug interventions.
Utilisation of quality assessment
We found that 496 systematic reviews (51.4%) linked quality to the interpretation of results, with no difference in the proportions of Cochrane reviews and paper based reviews (51.8% v 49.4%, P = 0.58; table 1). This also held true for the subgroup analysis of drug compared with non-drug interventions.
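The comparison above can be checked with a short sketch. The counts 419 of 809 and 77 of 156 are reconstructed from the reported percentages (they sum to the reported 496) and are not quoted from table 1. The sketch assumes a Pearson χ² test without continuity correction and a Wald confidence interval for the difference; the text does not state which variants the authors used.

```python
import math


def compare_proportions(x1, n1, x2, n2, z=1.96):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table,
    plus a Wald confidence interval for the difference p1 - p2."""
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return chi2, p_value, (diff - z * se, diff + z * se)


# Reviews linking quality to interpretation: Cochrane v paper based (reconstructed counts).
chi2, p_value, ci = compare_proportions(419, 809, 77, 156)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2f}")
print(f"difference = {419 / 809 - 77 / 156:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")
```

Under these assumptions the test gives P of about 0.58, in line with the reported value, and the confidence interval for the difference spans zero.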
Is quality assessment carried out as stated?
The authors of Cochrane reviews were more likely than those of paper based reviews to state that they would assess quality (93.7% v 63.5%) yet did not always do so (table 1). About 5% of systematic reviews in each group carried out quality assessment without having stated this explicitly in the methods. Finally, only 328 (33.9%) of the systematic reviews formally specified in the methods how they planned to use the quality assessment (for example, for sensitivity analysis or as exclusion criteria): 36.0% (n = 291) of Cochrane reviews and 23.7% (n = 37) of paper based reviews (P = 0.79).
(Lorenzo P Moja, researcher1, Elena Telar)