    BMJ (British Medical Journal), 2004, No 16
Issues in the reporting of epidemiological studies: a survey of recent practice
     1 Department of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT, 2 New England Research Institutes, 9 Galen St, Watertown, MA 02472, USA, 3 Department of Obstetrics and Gynaecology, Harvard Medical School and Beth Israel Deaconess Medical Centre, Boston, MA 02215, USA, 4 Clinical Research Program, Children's Hospital, Boston, MA 02115, USA, 5 Prometrika LLC, Cambridge, MA 02138, USA

    Correspondence to: S Pocock stuart.pocock@lshtm.ac.uk

    Introduction

    Observational epidemiology generates a plethora of publications across numerous epidemiological and medical journals. Many texts tackle the quality of epidemiological studies,1-8 but few directly focus on epidemiological publications.9-14 We reviewed the quality and methodological acceptability of research epidemiology published in January 2001. We concentrated on analytical epidemiology—that is, studies that used observational data on people from the general population to quantify relations between exposures and disease.

    Discussion

    Our survey into the current state of epidemiological publications in high impact journals raises concerns regarding aspects of study design, analysis, and reporting that could lead to misleading results in some publications.

    Our focus on high impact epidemiological and general medical journals has by design under-represented epidemiology in the many specialist medical journals. Our sampled articles may be of better quality, as journals that publish only occasional epidemiological articles may be less discriminating.23

    We focused on epidemiological studies in general populations. We excluded studies on clinical epidemiology in people with disease and studies in pharmacoepidemiology, although they raise similar issues. The quality of published randomised controlled trials24 25 and non-randomised intervention studies26 27 has been evaluated. The corresponding paucity of surveys into epidemiological studies motivated our work.

    Types of study

    We have confirmed that research on cancers and cardiovascular diseases dominates published epidemiology. The originality of some such efforts has been questioned,28 and epidemiological research is lacking in many other diseases. We found few articles concerned with developing countries, though they may be published in tropical medicine journals instead.

    Cohort studies were common, especially regarding cardiovascular disease and all cause mortality. Major cohort studies produce many publications: our one month survey unsurprisingly captured the nurses' health study, the Framingham study, the national health and nutrition examination survey (NHANES), and the multiple risk factor intervention trial (MRFIT). Case-control studies were, appropriately, the design chosen for rarer outcomes—for example, cancers. Other specialties (such as mental health and diabetes) used cross sectional designs. We found only one case-cohort study.16 In such studies29 30 cases are identified from a cohort during follow up, and controls are sampled from the whole cohort, including people who later become cases. Sampling controls is logistically simpler than in a case-control study, although the analysis must handle the potential appearance of cases among the controls. The design's popularity may increase, especially when several outcomes are investigated in large databases.
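
The sampling step that distinguishes the case-cohort design can be sketched in a few lines of Python. The cohort and subcohort sizes below are invented for illustration; the point is that the comparison group is drawn from the whole cohort at baseline, so people who later become cases are eligible too.

```python
import random

def draw_subcohort(cohort_ids, size, seed=0):
    """Draw the case-cohort comparison group ('subcohort') by simple
    random sampling from the WHOLE cohort at baseline, so that people
    who later become cases remain eligible for selection."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(cohort_ids), size))

# Hypothetical cohort of 1000 people; 50 become cases during follow up
cohort = range(1000)
cases = set(range(0, 1000, 20))

subcohort = draw_subcohort(cohort, size=100)

# Unlike a classical case-control study, some cases may legitimately
# appear in the comparison group; the analysis must account for this
overlap = subcohort & cases
```

The analysis then weights the subcohort appropriately (for example, by the methods cited at references 29 and 30); the sketch only illustrates the sampling scheme itself.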

    One important question is whether a study's design is appropriate for the topic addressed. We have concentrated here on analysis and reporting, but we encourage subsequent enquiry into this key design concern.

    Exposure variables

    The most commonly investigated exposures were lifestyle and behavioural factors. Genetic studies were few, though genetic epidemiology is a growing discipline.31 Most exposures were quantitative, usually grouped into ordered categories rather than analysed as continuous variables. Methodologists have emphasised the importance of appropriate selection of categories and presentational methods,32-35 but few articles gave reasons for their choice of categories and analyses, raising suspicions that alternative groupings might also have been explored. Furthermore, articles generally did not discuss the quality of the data. Ad hoc categorisations and measurement errors might explain many inconsistencies in published results.14

    Measures of association and inferences

    Overall, authors presented appropriate estimates of their associations. Case-control studies used odds ratios, and most cohort studies used some form of rate ratio. Hazard ratios from proportional hazards models appeared more often than rate ratios from Poisson models, which are appropriate only when rates stay constant over time.7 Nomenclature and methods did not always match—for example, we had to check the results and methods sections carefully to identify what authors actually meant by "relative risk" or "risk ratio".
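
Both kinds of estimate can be computed directly from summary counts. The sketch below uses invented numbers and textbook formulas (a Woolf-type confidence interval for the odds ratio, and the usual large-sample interval for a rate ratio from events and person-time); it is not taken from any of the studies surveyed.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table (a exposed cases, b exposed controls,
    c unexposed cases, d unexposed controls) with a Woolf-type 95% CI."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of the log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

def rate_ratio_ci(events1, pyrs1, events0, pyrs0, z=1.96):
    """Rate ratio from events and person-years in exposed (1) versus
    unexposed (0) groups; appropriate when rates are roughly constant."""
    rr = (events1 / pyrs1) / (events0 / pyrs0)
    se = math.sqrt(1/events1 + 1/events0)   # SE of the log rate ratio
    lo = math.exp(math.log(rr) - z * se)
    hi = math.exp(math.log(rr) + z * se)
    return rr, lo, hi
```

Keeping the two estimators (and their names) distinct in the methods section avoids exactly the "relative risk" ambiguity noted above.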

    Confidence intervals were usually presented as appropriate expressions of statistical uncertainty, but in some papers text and tables were made unwieldy by their excessive use. Hypothesis testing appeared in about half of articles, indicating rehabilitation of P values in observational studies.20 36 37 None the less, conclusions should not rely on arbitrary cut offs such as P < 0.05.

    The distribution of P values in the figure has a peak around 0.01 < P < 0.05, suggesting that publication bias affects epidemiology, as such significant findings are presumably more publishable.38 Randomised clinical trials and observational epidemiology have different research philosophies, which may affect publication bias. Trials are more decision oriented, often studying a single primary hypothesis with a (hopefully) unbiased design. As authors of epidemiological studies have more options on what to publish, publication bias is more complex and of potentially greater concern.

    Adjustment for confounders

    Most authors adjusted for potential confounders, though the extent varied greatly. Although techniques for such adjustment are well established, their implementation seems inconsistent. For some topics—for example, coronary heart disease—past experience aids the choice of variables, but how confounding is tackled depends on the authors' disposition and the extent of the data. Few explained how and why they chose variables for adjustment. Some were overenthusiastic and included too many variables in small studies. Others used stepwise regression to reduce the set of adjustment variables, a practice not without problems.39 40 Such procedures do not consider whether a variable's inclusion in the model affects the estimated effect of the exposure—that is, whether the variable is actually a confounder.

    Some reported both unadjusted analyses and analyses adjusted for covariates, which appropriately informs readers of the influence of the confounders.
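
The value of reporting both estimates side by side is easy to demonstrate with a stratified 2x2 example. The counts below are invented so that the odds ratio within each stratum is exactly 1, yet the crude (pooled) odds ratio is 1.75; the classical Mantel-Haenszel estimate, one standard adjustment technique, recovers the null.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel odds ratio over strata of (a, b, c, d) tables,
    where a = exposed cases, b = exposed controls, c = unexposed cases,
    d = unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Invented example: within each stratum the odds ratio is exactly 1
strata = [(60, 40, 30, 20), (10, 40, 20, 80)]

# Pooling the strata ignores the confounder and inflates the estimate
a, b, c, d = (sum(t[i] for t in strata) for i in range(4))
crude_or = a * d / (b * c)            # confounded estimate
adjusted_or = mh_odds_ratio(strata)   # confounding removed
```

Presenting both numbers, as the better articles in the survey did, tells the reader at a glance how much work the adjustment is doing.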

    Effect modifiers

    Subgroup analyses were common, and half of the articles claimed some effect modification. In clinical trials41 42 and epidemiology22 overinterpretation of subgroup analyses presents three problems: increased risk of false claims of effect modification when several subgroup analyses are explored; insufficient use of statistical tests of interaction, which more directly assess the evidence for an effect modifier, compared with misleading uses of subgroup P values or confidence intervals; and the need to exercise restraint, viewing subgroup findings as exploratory and hypothesis generating rather than definitive.
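
A test of interaction compares the subgroup estimates directly, rather than contrasting their separate P values. A minimal sketch, with invented subgroup estimates and the standard large-sample z test on the log odds ratio scale:

```python
import math

def interaction_p(or1, se1, or2, se2):
    """Two sided large-sample test of whether two subgroup odds ratios
    differ; se1 and se2 are standard errors of the LOG odds ratios."""
    z = (math.log(or1) - math.log(or2)) / math.sqrt(se1**2 + se2**2)
    # standard normal cdf via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p

# Invented example: subgroup 1 looks 'significant' on its own
# (log(2.0)/0.3 is about 2.3 > 1.96) while subgroup 2 does not,
# yet the interaction test finds little evidence of a real difference
p_interaction = interaction_p(2.0, 0.3, 1.1, 0.3)
```

With these numbers the interaction P value is roughly 0.16, so claiming effect modification from the subgroup P values alone would be exactly the overinterpretation described above.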

    Multiplicity

    Some studies explore many associations without considering the consequent increased risk of false positive findings.10 22 Such "data dredging"14 biases publications towards exaggerated claims. Investigators often focus on the most significant associations. This is accentuated in cohort studies with multiple publications, where what gets published can be highly selective. Particularly in small studies, apparently strong associations may be spurious and not supported by subsequent studies.
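
The arithmetic behind this concern is simple: testing k independent null associations at a 0.05 level gives a familywise false positive probability of 1 − 0.95^k. A sketch with an invented k:

```python
alpha, k = 0.05, 20   # e.g. 20 exposure-outcome associations examined

# Probability of at least one false positive if every null is true
p_any_false = 1 - (1 - alpha) ** k    # roughly 0.64 for 20 tests

# Simplest (conservative) correction: the Bonferroni threshold
bonferroni_alpha = alpha / k          # 0.0025
```

Even without formal correction, simply reporting how many associations were examined lets readers make this calculation for themselves.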

    Study size

    Few studies gave any power calculation to justify their size. One proposal is that cohort studies, specifically in coronary heart disease, require over 400 events to achieve sufficiently precise estimation.43 This is around the median number of events in our cohort studies, suggesting that many are underpowered, unless the associations with risk are pronounced. For instance, a cohort study relating bone mass to risk of colon cancer had only 44 incident cases.44 With authors seeking positive findings, small studies need inflated associations between exposure and outcome to achieve significance and get published. Selective timing of publication may also increase the risk of false positives.
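
The precision argument can be made concrete. As an illustrative sketch (an even split of events is assumed here; the numbers are not from the paper), the multiplicative width of a 95% confidence interval for a rate ratio follows from the usual large-sample standard error of its logarithm:

```python
import math

def rr_ci_multiplier(events_exposed, events_unexposed, z=1.96):
    """Multiplicative half-width of a 95% CI for a rate ratio, from the
    usual large-sample SE of the log rate ratio."""
    se = math.sqrt(1/events_exposed + 1/events_unexposed)
    return math.exp(z * se)

# With about 400 events split evenly, the rate ratio is pinned down to
# within a factor of roughly 1.2 either way
m400 = rr_ci_multiplier(200, 200)

# With only 44 events (as in the bone mass example), the interval is
# far wider, so only a pronounced association could reach significance
m44 = rr_ci_multiplier(22, 22)
```

This is one way to see why a threshold of the order of 400 events has been proposed for cohort studies of moderate associations.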

    The methods of power calculation for case-control studies are well established.45 Because case-control studies typically have fewer controls per case than the ratio of non-cases to cases in cohort studies, the number of cases required is just as large, except when only strong associations are of interest. Our case-control studies had a median of 347 cases, suggesting that many could detect only large effects. For instance, one study with 90 cases and controls needed to observe a steep gradient of risk of breast cancer with birth weight to reach significance.46
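
For reference, the standard two-proportion approximation of the kind given by Schlesselman can be sketched in a few lines. The exposure prevalence and odds ratios below are invented, not taken from the studies surveyed.

```python
import math

def cases_needed(p0, odds_ratio, z_alpha=1.96, z_beta=0.84):
    """Approximate number of cases (with one control per case) needed to
    detect a given odds ratio with about 80% power at two sided
    alpha = 0.05; p0 is the exposure prevalence among controls."""
    # exposure prevalence among cases implied by the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    pbar = (p0 + p1) / 2
    n = ((z_alpha * math.sqrt(2 * pbar * (1 - pbar))
          + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
         / (p1 - p0) ** 2)
    return math.ceil(n)

# With 30% exposure among controls, a moderate odds ratio of 1.5
# demands several hundred cases, whereas OR = 2.0 needs far fewer
n_moderate = cases_needed(0.3, 1.5)
n_strong = cases_needed(0.3, 2.0)
```

Under these assumptions the moderate effect requires upwards of 400 cases, consistent with the observation that studies around the median of 347 cases could reliably detect only large effects.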

    Sample selection

    A study's representativeness depends on the source of participants and the proportion participating.47 Information on refusals and drop outs is often lacking. Authors should document the sample selection process and participation rate.

    Conclusions and key findings

    We have identified issues of concern surrounding the design, analysis, and reporting of epidemiological research. We think primary responsibility for improvement rests with authors, though journals and peer reviewers need to be vigilant to enhance the quality of articles.

    The following limitations merit particular attention:

    The participant selection process—for example, information on exclusions and refusals—often lacks details

    The quality of data collected, and any problems therein, are often insufficiently described

    Some studies are too small and may be prone to exaggerated claims, while few give power calculations to justify their size

    Quantitative exposure variables are commonly grouped into ordered categories, but few state the rationale for choice of grouping and analyses

    The terminology for estimates of association—for example, the term "relative risk"—is used inconsistently

    Confidence intervals are appropriately in widespread use but were presented excessively in some articles

    P values are used more sparingly, but there is a tendency to overinterpret arbitrary cut offs such as P < 0.05

    The selection of and adjustment for potential confounders needs greater clarity, consistency, and explanation

    Subgroup analyses to identify effect modifiers mostly lack appropriate methods—for example, interaction tests—and are often overinterpreted

    Studies exploring many associations tend not to consider the increased risk of false positive findings

    The epidemiological literature seems prone to publication bias

    There are insufficient epidemiological publications in diseases other than cancer and cardiovascular diseases and in developing countries

    Overall, there is a serious risk that some epidemiological publications reach misleading conclusions.

    What is already known on this topic

    Papers in observational epidemiology vary greatly in quality, content, and style

    There are no generally accepted reporting guidelines for epidemiological studies

    What this study adds

    This study presents a survey of recent epidemiological publications.

    Critical evaluation covers types of study design, study size, sample selection, disease outcomes investigated, types of exposure variable, handling of confounders, methods of statistical inference, claims of effect modification, the multiplicity of outcome-exposure associations explored, and publication bias

    There is a serious risk that some epidemiological publications reach misleading conclusions

    We are grateful to Nicole Leong for her valuable contributions to getting the study underway. We thank Diana Elbourne, Stephen Evans, and John McKinlay for helpful comments on the draft manuscript.

    Contributors: All authors jointly conceived the project, undertook the survey, and contributed to writing and revising the manuscript. SJP drafted and coordinated the article's content, TJC coordinated the survey's conduct and BLdeS substantially revised the article. SJP is guarantor.

    Funding: None.

    Competing interests: None declared.

    Ethical approval: Not required.

    References

    1. Rothman KJ, Greenland S. Modern epidemiology. Philadelphia: Lippincott-Raven, 1998.

    2. Breslow NE, Day NE. Statistical methods in cancer research. Vol 1. The analysis of case-control studies. Lyons: International Agency for Research on Cancer, 1980.

    3. Breslow NE, Day NE. Statistical methods in cancer research. Vol 2. The analysis of cohort studies. Lyons: International Agency for Research on Cancer, 1980.

    4. dos Santos Silva I. Cancer epidemiology: principles and methods. Lyons: International Agency for Research on Cancer, 1999.

    5. Hennekens CH, Buring JE. Epidemiology in medicine. Boston: Little, Brown, 1987.

    6. Schlesselman JJ. Case-control studies: design, conduct, analysis. New York: Oxford University Press, 1982.

    7. Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press, 1993.

    8. Grimes D, Schulz KF. Epidemiology series. Lancet 2002;359: 57-61, 145-9, 248-52, 341-5, 431-4.

    9. Epidemiology Work Group. Guidelines for documentation of epidemiologic studies. Am J Epidemiol 1981;114: 609-18.

    10. Rushton L. Reporting of occupational and environmental research: use and misuse of statistical and epidemiological methods. Occup Environ Med 2000;57: 1-9.

    11. Blettner M, Heuer C, Razum O. Critical reading of epidemiological papers: a guide. Eur J Pub Health 2001;11: 97-101.

    12. Horwitz RI, Feinstein AR. Methodologic standards and contradictory results in case-control research. Am J Med 1979;66: 556-63.

    13. Downs SH, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health 1998;52: 377-84.

    14. Davey Smith G, Ebrahim S. Data dredging, bias, or confounding. BMJ 2002;325: 1437-8.

    15. Journal Citation Reports. http://wos.mimas.ac.uk/jcrweb (accessed May 2001).

    16. Zeegers MPA, Volovics A, Dorant E, Goldbohm RA, van den Brandt PA. Alcohol consumption and bladder cancer risk: results from the Netherlands cohort study. Am J Epidemiol 2001;153: 38-41.

    17. D'Agostino RB, Lee M-L, Belanger AJ, Cupples LA, Anderson K, Kannel WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham heart study. Stat Med 1990;9: 1501-15.

    18. Rothman KJ. Writing for epidemiology. Epidemiology 1998;9: 333-7.

    19. Lang JM, Rothman KJ, Cann CI. That confounded P-value. Epidemiology 1998;9: 7-8.

    20. Weinberg CR. It's time to rehabilitate the P-value. Epidemiology 2001;12: 288-90.

    21. Su LJ, Arab L. Nutritional status of folate and colon cancer risk: evidence from NHANES I epidemiologic follow-up study. Ann Epidemiol 2001;11: 65-72.

    22. Ottenbacher KJ. Quantitative evaluation of multiplicity in epidemiology and public health research. Am J Epidemiol 1998;147: 615-9.

    23. Lee KP, Schotland BA, Bacchetti P, Bero LA. Association of journal quality indicators with methodological quality of clinical research articles. JAMA 2002;287: 2805-8.

    24. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel group randomised trials. The CONSORT group. Lancet 2001;357: 1191-4.

    25. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomised trials: explanation and elaboration. Ann Intern Med 2001;134: 663-94.

    26. Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess 2003;7: 1-173.

    27. Concato J, Shah N, Horwitz RI. Randomised, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 2000;342: 1887-92.

    28. Kuller LH. Invited commentary: circular epidemiology. Am J Epidemiol 1999;150: 897-903.

    29. Langholz B. Case-cohort study. In: Gail M, Benichou J, eds. Encyclopedia of epidemiologic methods. Chichester: Wiley, 2000: 139-45.

    30. Wacholder S. Practical considerations in choosing between the case-cohort and nested case-control design. Epidemiology 1991;2: 155-8.

    31. Colhoun HM, McKeigue PM, Davey Smith G. Problems of reporting genetic associations with complex outcomes: can we avoid being swamped by spurious findings? Lancet 2003;361: 865-72.

    32. Greenland S. Analysis of polytomous exposures and outcomes. In: Rothman KJ, Greenland S, eds. Modern epidemiology. Philadelphia: Lippincott-Raven, 1998: 301-28.

    33. Figueiras A, Cadarso-Suarez C. Application of nonparametric models for calculating odds ratios and their confidence intervals for continuous exposures. Am J Epidemiol 2001;154: 264-75.

    34. Zhao LP, Kolonel LN. Efficiency loss from categorizing quantitative exposures into qualitative exposures in case-control studies. Am J Epidemiol 1992;136: 464-74.

    35. Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using "optimal" cutpoints in the evaluation of prognostic factors. J Natl Cancer Inst 1994;86: 1798-9.

    36. Goodman SN. Of P-values and Bayes: a modest proposal. Epidemiology 2001;12: 295-7.

    37. Sterne J, Davey Smith G. Sifting the evidence—what's wrong with significance tests? BMJ 2001;322: 226-31.

    38. Higginson J. Publication of "negative" epidemiology studies. J Chron Dis 1987;40: 371-2.

    39. Greenland S, Rothman KJ. Introduction to stratified analysis. In: Rothman KJ, Greenland S, eds. Modern epidemiology. Philadelphia: Lippincott-Raven, 1998: 253-80.

    40. Greenland S. Modelling and variable selection in epidemiological analysis. Am J Pub Health 1989;79: 340-9.

    41. Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000;355: 1064-9.

    42. Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991;266: 93-8.

    43. Phillips AN, Pocock SJ. Sample size requirements for prospective studies, with examples for coronary heart disease. J Clin Epidemiol 1989;42: 639-48.

    44. Zhang Y, Felson DT, Curtis Ellison R, Kreger BE, Schatzkin A, Dorgan JF, et al. Bone mass and the risk of colon cancer among postmenopausal women: the Framingham study. Am J Epidemiol 2001;153: 31-7.

    45. Schlesselman JJ. Sample size. In: Case-control studies: design, conduct, analysis. New York: Oxford University Press, 1982: 144-70.

    46. Kaijser M, Lichtenstein P, Granath F, Erlandsson G, Cnattingius S, Ekbom A. In utero exposures and breast cancer: a study of opposite-sex twins. J Natl Cancer Inst 2001;93: 60-2.

    47. Olson SH, Voigt LF, Begg CB, Weiss NS. Reporting participation in case-control studies. Epidemiology 2002;13: 123-6.

    (Stuart J Pocock, professor)