Optimal search strategies for retrieving scientifically strong studies
1 Health Information Research Unit, McMaster University, Hamilton, ON, Canada L8N 3Z5, 2 School of Graduate Studies, McMaster University, 3 Department of Clinical Epidemiology and Biostatistics, McMaster University, 4 Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
Correspondence to: R B Haynes bhaynes@mcmaster.ca
Objective To develop and test optimal Medline search strategies for retrieving sound clinical studies on prevention or treatment of health disorders.
Design Analytical survey.
Data sources 161 clinical journals indexed in Medline for the year 2000.
Main outcome measures Sensitivity, specificity, precision, and accuracy of 4862 unique terms in 18 404 combinations.
Results Only 1587 (24.2%) of 6568 articles on treatment met criteria for testing clinical interventions. Combinations of search terms reached peak sensitivities of 99.3% (95% confidence interval 98.7% to 99.8%) at a specificity of 70.4% (69.8% to 70.9%). Compared with best single terms, best multiple terms increased sensitivity for sound studies by 4.1% (absolute increase), but with substantial loss of specificity (absolute difference 23.7%) when sensitivity was maximised. When terms were combined to maximise specificity, 97.4% (97.3% to 97.6%) was achieved, about the same as that achieved by the best single term (97.6%, 97.4% to 97.7%). The strategies newly reported in this paper outperformed other validated search strategies except for two strategies that had slightly higher specificity (98.1% and 97.6% v 97.4%) but lower sensitivity (42.0% and 92.8% v 93.1%).
Conclusion New empirical search strategies have been validated to optimise retrieval from Medline of articles reporting high quality clinical studies on prevention or treatment of health disorders.
Free worldwide internet access to the US National Library of Medicine's Medline service in early 1997 was followed by a 300-fold increase in searches (from 163 000 searches per month in January 1997 to 51.5 million searches per month in December 2004),1 with direct use by clinicians, students, and the general public growing faster than use mediated by librarians.
If large electronic bibliographic databases such as Medline are to be helpful to clinical users, clinicians must be able to retrieve articles that are scientifically sound and directly relevant to the health problem they are trying to solve, without missing key studies or retrieving excessive numbers of preliminary, irrelevant, outdated, or misleading reports. Few clinicians, however, are trained in search techniques. One approach to enhance the effectiveness of searches by clinical users is to develop search filters ("hedges") to improve the retrieval of clinically relevant and scientifically sound reports of studies from Medline and similar bibliographic databases.2-7 Hedges can be created with appropriate disease content terms combined ("ANDed") with medical subject headings (MeSH), explosions (px), publication types (pt), subheadings (sh), and textwords (tw) that detect research design features indicating methodological rigour for applied healthcare research. For instance, combining clinical trial (pt) AND myocardial infarction in PubMed brings the retrieval for myocardial infarction down by a factor of 13 (from 116 199 to 8956 articles) and effectively removes case reports, laboratory and animal studies, and other less rigorous and extraneous reports.
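For readers who query PubMed programmatically, a hedge such as the one above can be submitted through the National Library of Medicine's E-utilities esearch service. The short Python sketch below is an illustration only; the helper function is ours, and current retrieval counts will differ from the 2005 figures quoted above.

```python
# Illustrative only: count PubMed citations for a content term alone and
# ANDed with a methodological filter, via the NCBI E-utilities esearch
# service. Counts will differ from the figures quoted in the text.
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(query: str) -> int:
    """Return the number of PubMed citations matching a query string."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "retmax": 0, "retmode": "json"}
    )
    with urllib.request.urlopen(f"{EUTILS}?{params}") as response:
        return int(json.load(response)["esearchresult"]["count"])

print(pubmed_count("myocardial infarction"))                          # content term alone
print(pubmed_count("myocardial infarction AND clinical trial[pt]"))   # hedged
```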
In the early 1990s, our group developed Medline search filters for studies of the cause, course, diagnosis, or treatment of health problems, based on a small subset of 10 clinical journals.8 These strategies were adapted for use in the Clinical Queries feature in PubMed and other services. In this paper we report improved hedges for retrieving studies on prevention and treatment, developed on a larger number of journals (n = 161) in a more current era (2000) than previously reported.9
Methods
Our methods are detailed elsewhere.10 11 Briefly, research staff hand searched each issue of 161 clinical journals indexed in Medline for the year 2000 to find studies on treatment that met the following criteria: random allocation of participants to comparison groups, outcome assessment for at least 80% of those entering the investigation accounted for in one major analysis for at least one follow-up assessment, and analysis consistent with study design. Search strategies were then created and tested for their ability to retrieve articles in Medline that met these criteria while excluding articles that did not.
We determined the sensitivity, specificity, precision, and accuracy of single term and multiple term Medline search strategies (table 1 shows the formulas). The sensitivity of a given strategy is defined as the proportion of high quality articles (those meeting criteria) that are retrieved; specificity is the proportion of lower quality articles (those not meeting criteria) that are not retrieved; precision is the proportion of retrieved articles that meet criteria (equivalent to positive predictive value in diagnostic test terminology); and accuracy is the proportion of all articles that the strategy handles correctly (articles that met criteria and were retrieved, plus articles that did not meet criteria and were not retrieved, divided by all articles in the database).
Table 1 Formula for calculating sensitivity, specificity, precision, and accuracy of Medline searches for detecting sound clinical studies
Despite extensive matching attempts, a small fraction (n = 968, 2%) of citations downloaded from Medline could not be matched to the handsearched data. As a conservative approach, unmatched citations that were detected by a given search strategy were counted in cell b of the analysis in table 1 (leading to slight underestimates of the precision, specificity, and accuracy of the search strategy). Similarly, unmatched citations that were not detected by a search strategy were counted in cell d (leading to slight overestimates of the specificity and accuracy of the strategy).
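A short sketch may make the table 1 arithmetic concrete. Using the conventional 2 x 2 cell labels, the four measures can be computed as follows; the cell counts in the example are illustrative, not the study's data.

```python
# Sketch of the table 1 calculations, with the conventional 2 x 2 cells:
#   a = meets criteria, retrieved      b = does not meet criteria, retrieved
#   c = meets criteria, not retrieved  d = does not meet criteria, not retrieved
# Unmatched citations are assigned conservatively, as described above:
# retrieved ones to cell b, the remainder to cell d.

def operating_characteristics(a: int, b: int, c: int, d: int) -> dict:
    return {
        "sensitivity": a / (a + c),             # sound articles that are retrieved
        "specificity": d / (b + d),             # unsound articles that are not retrieved
        "precision": a / (a + b),               # retrieved articles that are sound
        "accuracy": (a + d) / (a + b + c + d),  # articles handled correctly
    }

print(operating_characteristics(a=1500, b=5000, c=87, d=42441))  # illustrative counts
```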
Steps in data collection to determine optimal retrieval of articles on treatment from Medline
Manual review of the literature
The figure illustrates the steps involved in the data collection and analysis stages. Six research assistants completed rigorous calibration exercises in applying the methodological criteria used to determine whether an article was methodologically sound. Inter-rater agreement for the classification of articles, corrected for chance agreement, exceeded 80%.12
Collecting search terms
To construct a comprehensive set of possible search terms, we listed MeSH terms and textwords related to the study criteria; sought input from clinicians and librarians through interviews, requests at meetings and conferences, and electronic mail; reviewed published and unpublished search strategies from other groups; and made requests to the National Library of Medicine. We compiled a list of 4862 unique terms (data not shown). All terms were tested using the Ovid Technologies searching system. Search strategies developed in Ovid were subsequently translated by the National Library of Medicine for use in the Clinical Queries interface of PubMed and reviewed by RBH.
Data collection
Manual ratings of articles were recorded on data collection forms along with bibliographic information and database specific unique identifiers. Each journal title was searched in Medline for 2000, and the full Medline records (including citation, abstract, MeSH terms, and publication types) were captured for all articles. Medline data were then linked with the manual review data.
Testing strategies
We randomly divided the articles in the manual review database into development and validation datasets (60% and 40% respectively). Sensitivity, specificity, precision, and accuracy were calculated for each term in the development subset and then validated in the remainder of the database. For a given purpose category, individual search terms with sensitivity greater than 25% and specificity greater than 75% were carried forward into the development of search strategies combining two or more terms. All combinations of terms used the boolean OR (for example, "random OR controlled"); the boolean AND was not used because it invariably compromised sensitivity.
For the development of multiple term search strategies to optimise either sensitivity or specificity, we tested all two term search strategies with sensitivity at least 75% and specificity at least 50%. For optimising accuracy, two term search strategies with accuracy greater than 75% were considered for multiple term development. Overall, we tested 18 404 multiple term search strategies. Search strategies were also developed that optimised combined sensitivity and specificity (by keeping the absolute difference between sensitivity and specificity less than 1%, if possible).
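In outline, this screening can be pictured as set operations on retrieval sets, as in the Python sketch below; the data structures, term names, and toy counts are illustrative only and do not reproduce the study's software.

```python
# Illustrative screening of two term OR strategies (not the study's code).
# Each term maps to the set of citation identifiers it retrieves; OR is union.
from itertools import combinations

def sens_spec(retrieved, sound, unsound):
    return (len(retrieved & sound) / len(sound),
            len(unsound - retrieved) / len(unsound))

def screen_two_term(terms, sound, unsound):
    """Yield OR pairs meeting the stated thresholds (sensitivity >= 75%, specificity >= 50%)."""
    for (t1, s1), (t2, s2) in combinations(terms.items(), 2):
        sens, spec = sens_spec(s1 | s2, sound, unsound)
        if sens >= 0.75 and spec >= 0.50:
            yield f"{t1} OR {t2}", sens, spec

# Toy data: three articles, two sound and one not; term names are hypothetical.
sound, unsound = {"a1", "a2"}, {"a3"}
terms = {"randomized.mp.": {"a1"}, "double-blind:.tw.": {"a2"}, "trial.tw.": {"a3"}}
for strategy in screen_two_term(terms, sound, unsound):
    print(strategy)
```

Representing each term by its retrieval set makes OR, AND, and NOT one-line set operations, which is all the strategy arithmetic requires.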
To attempt to increase specificity without compromising sensitivity, we used terms with low sensitivity but appreciable specificity to NOT out citations (for example, randomized controlled trial.pt. OR randomized.mp. OR placebo.mp. NOT retrospective studies.mp., where pt = publication type and mp = multiple posting, meaning the term appears in the title, abstract, or MeSH heading). We also used logistic regression models that entered terms in a stepwise manner and NOTed out terms with a regression coefficient below -2.0.
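In the same set terms as the sketch above, NOTing a term out of an ORed strategy is a set difference, as this toy fragment (hypothetical identifiers and term groupings) illustrates.

```python
# NOTing a term out of an ORed strategy is a set difference on retrieval sets.
# Identifiers and term groupings are hypothetical.
rct_pt = {"a1", "a2", "a4"}            # randomized controlled trial.pt.
placebo_mp = {"a2", "a5"}              # placebo.mp.
retrospective_mp = {"a5", "a6"}        # retrospective studies.mp.

retrieved = (rct_pt | placebo_mp) - retrospective_mp   # (A OR B) NOT C
print(sorted(retrieved))  # ['a1', 'a2', 'a4']
```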
We compared strategies that maximised each of sensitivity, specificity, precision, and accuracy for both development and validation datasets with 19 previously published strategies. We chose strategies that had been tested against an ideal method such as a hand search of the published literature and for which most Medline records were from 1990 forward, to reflect major changes in the classification of clinical trials by the National Library of Medicine. These changes included new MeSH definitions (for example, "cohort studies" was introduced in 1989 and "single-blind method" in 1990) and publication types (for example, "clinical trial (pt)" and "randomized controlled trial (pt)", which were instituted in 1991). Six papers2-7 and one library website13 provided a total of 19 strategies to test, including the strategy advocated by the Cochrane Collaboration in their handbook (www.cochrane.dk/cochrane/handbook/hbookAPPENDIX_5C_OPTIMAL_SEARCH_STRAT.htm).2
Results
We included 49 028 articles in the analysis; 6568 articles (13.4%) were classified as original studies evaluating a treatment, of which 1587 (24.2%) met our methodological criteria. Overall, 3807 of the 4862 proposed unique terms retrieved citations from Medline and could therefore be assessed. The development and validation datasets for assessing retrieval strategies each included articles that passed and did not pass the treatment criteria (930 and 29 397 articles, respectively, for the development dataset; 657 and 19 631 articles for the validation dataset). Performance differences between the development and validation datasets were statistically significant in only three of 36 comparisons, the largest being 1.1% for one set of specificities (data not shown).
Table 2 shows the operating characteristics for the single terms with the highest sensitivity and the highest specificity. Accuracy is driven by specificity, so the term with the best accuracy while keeping sensitivity above 50% was "randomized controlled trial.pt.". The same term yielded the best precision at sensitivity above 50% and gave the best balance of sensitivity and specificity.
Table 2 Best single terms for high sensitivity searches, high specificity searches, and searches that optimise balance between sensitivity and specificity for retrieving studies of treatment
For strategies combining up to three terms, those yielding the highest sensitivity, specificity, and accuracy are shown in tables 3, 4, and 5. Some two term strategies outperformed both single term strategies and strategies with more terms (table 5). Table 6 shows the top three search strategies for optimising the trade-off between sensitivity and specificity.
Table 3 Top three search strategies yielding highest sensitivity (keeping specificity >50%) with combinations of terms
Table 4 Top three search strategies yielding highest specificity (keeping sensitivity >50%) based on combinations of up to three terms
Table 5 Top three search strategies yielding highest accuracy (keeping sensitivity >50%) based on combinations of up to three terms
Table 6 Top three search strategies for optimising sensitivity and specificity (based on absolute difference (sensitivity - specificity) <1%)
Table 7 shows the best combination of terms for optimising the trade-off between sensitivity and specificity when using the boolean NOT to eliminate terms with the lowest sensitivity. Differences were nonsignificant when citations retrieved by the three terms "review tutorial.pt.", "review academic.pt.", and "selection criteri:.tw." were removed from the strategy that optimised sensitivity and specificity.
Table 7 Best combination of terms for optimising the trade-off between sensitivity and specificity in Medline when adding the boolean AND NOT
After the two term and three term computations, search strategies with sensitivity above 50% and specificity above 95% were evaluated further by adding search terms selected through logistic regression modelling. Candidate terms for addition to the base strategy were ordered by significance using stepwise logistic regression and then added to the model sequentially; the resulting logistic function (data not shown) related the predicted probabilities to the observed responses. We selected the best one term, two term, three term, and four term strategies. Two had already been evaluated ("randomized controlled trial.mp." OR "randomized controlled trial.pt." in table 4 and "randomized controlled trial.mp." OR "randomized controlled trial.pt." OR "double-blind:.tw." in table 5). The other two strategies are listed in table 8; both performed well. We then took the 13 terms with regression coefficients below -2.0 ("predict.tw.", "predict.mp.", "economic.tw.", "economic.mp.", "survey.tw.", "survey.mp.", "hospital mortality.mp,tw.", "hospital mortalit:.mp.", "accuracy:.tw.", "accuracy.tw.", "accuracy.mp.", "explode bias (epidemiology)", and "longitudinal.tw.") and NOTed them out of the four term search strategy to determine whether this would improve the operating characteristics (table 8, last row). We found a small, nonsignificant decrease in sensitivity and increases in specificity, precision, and accuracy.
Table 8 Top three term and four term search strategies using logistic regression techniques
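For readers who want to reproduce the flavour of this step, the sketch below implements a simple forward stepwise selection with logistic regression; statsmodels is our choice of library (the paper does not name its software), and the indicator matrix layout is an assumption of the sketch.

```python
# Sketch of forward stepwise term selection with logistic regression
# (statsmodels assumed; the paper does not name its software).
# X: one 0/1 column per candidate term (1 = term retrieves the citation);
# y: 1 if the article met the methodological criteria, else 0.
import numpy as np
import statsmodels.api as sm

def forward_select(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Greedily add the k term columns that most improve the log likelihood."""
    chosen = []
    while len(chosen) < k:
        best_j, best_llf = None, -np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            fit = sm.Logit(y, sm.add_constant(X[:, chosen + [j]])).fit(disp=0)
            if fit.llf > best_llf:
                best_j, best_llf = j, fit.llf
        chosen.append(best_j)
    return chosen

# Terms whose fitted coefficients fall below -2.0 mark citations that are
# unlikely to be sound; as in the text, such terms are candidates for NOTing out.
```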
We compared our best strategy for maximising sensitivity (sensitivity > 99% and specificity > 70%) with our best strategy for maximising specificity while maintaining high sensitivity (sensitivity > 94% and specificity > 97%). To ascertain whether the less sensitive strategy (which had much greater specificity) would miss important articles, we assessed the methodologically sound articles it failed to retrieve, concentrating on studies from four major medical journals (BMJ, JAMA, Lancet, and New England Journal of Medicine). In total, 32 articles were missed by the less sensitive search, of which four were from these journals. A practising clinician trained in health research methods judged only one of the four articles to be of substantial clinical importance.14 The indexing terms for this randomised controlled trial did not include "randomized controlled trial (pt)". After we drew this to the attention of the National Library of Medicine, the article was reindexed, and the "missing" article would now be retrieved.
We used our data to test 19 published strategies2-7 13 and compared them with our best strategies for optimising sensitivity and specificity. On the basis of our handsearched data, the published strategies had sensitivities ranging from 1.3% to 98.8%, all lower than our best sensitivity of 99.3%. The specificities for the published strategies ranged from 63.3% to 96.6%. Two strategies from Dumbrigue6 outperformed our most specific strategy (specificity of 98.1% and 97.6% v our 97.4%), but both had lower sensitivity than our search strategy with the best specificity (42.0% and 92.8% v 93.1%).
Discussion
References

1. National Library of Medicine, Bibliographic Services Division. Number of Medline searches. www.nlm.nih.gov/bsd/medline_growth_508.html (accessed 3 Jul 2003).
2. Robinson KA, Dickersin K. Development of a highly sensitive search strategy for the retrieval of reports of controlled trials using PubMed. Int J Epidemiol 2002;31:150-3.
3. Nwosu CR, Khan KS, Chien PF. A two-term Medline search strategy for identifying randomized trials in obstetrics and gynecology. Obstet Gynecol 1998;91:618-22.
4. Marson AG, Chadwick DW. How easy are randomized controlled trials in epilepsy to find on Medline? The sensitivity and precision of two Medline searches. Epilepsia 1996;37:377-80.
5. Adams CE, Power A, Frederick K, LeFebvre C. An investigation of the adequacy of Medline searches for randomized controlled trials (RCTs) of the effects of mental health care. Psychol Med 1994;24:741-8.
6. Dumbrigue HB, Esquivel JF, Jones JS. Assessment of Medline search strategies for randomized controlled trials in prosthodontics. J Prosthodont 2000;9:8-13.
7. Jadad AR, McQuay HJ. A high-yield strategy to identify randomized controlled trials for systematic reviews. Online J Curr Clin Trials 1993;No 33.
8. Haynes RB, Wilczynski N, McKibbon KA, Walker CJ, Sinclair JC. Developing optimal search strategies for detecting clinically sound studies in Medline. J Am Med Inform Assoc 1994;1:447-58.
9. Wilczynski NL, Haynes RB. Robustness of empirical search strategies for clinical content in Medline. Proc AMIA Symp 2002:904-8.
10. Haynes RB, Wilczynski NL, for the Hedges Team. Optimal search strategies for retrieving scientifically strong studies of diagnosis from Medline. BMJ 2004;328:1040-2.
11. Montori VM, Wilczynski NL, Morgan D, Haynes RB, for the Hedges Team. Optimal search strategies for retrieving systematic reviews from Medline: an analytical survey. BMJ 2005;330:68-73.
12. Wilczynski NL, McKibbon KA, Haynes RB. Enhancing retrieval of best evidence for health care from bibliographic databases: calibration of the hand search of the literature. Medinfo 2001;10(Pt 1):390-3.
13. University of Rochester Medical Center, Edward G. Miner Library. Evidence-based filters for Ovid Medline. www.urmc.rochester.edu/Miner/links/eBMlinks.html#TOOLS (accessed 22 May 2003).
14. Julien JP, Bijker N, Fentiman IS, Peterse JL, Delledonne V, Rouanet P, et al. Radiotherapy in breast-conserving treatment for ductal carcinoma in situ: first results of the EORTC randomised phase III trial 10853. EORTC Breast Cancer Cooperative Group and EORTC Radiotherapy Group. Lancet 2000;355:528-33.