Research for Newborn Screening: Developing a National Framework
Pediatrics
Department of Pediatrics and Medical Ethics, University of Utah, Salt Lake City, Utah
ABSTRACT
Newborn metabolic screening represents the largest application of genetic testing in medicine. As new technologies are developed, the number of conditions amenable to newborn screening (NBS) will continue to expand. Despite the scope of these programs, the evidence base for a number of NBS applications remains relatively weak. This article briefly reviews the evidence base for several conditions. The article then develops a proposal for a structured sequence of research protocols to evaluate potential applications for NBS before their formal implementation in public health programs. Such a framework for research will require collaboration between states and the federal government, a collaboration that is emerging through recent federal legislation and funding.
Key Words: newborn screening; ethics; research
Abbreviations: NBS, newborn screening; PKU, phenylketonuria; MCAD, medium-chain acyl-coenzyme A dehydrogenase deficiency; MS/MS, tandem mass spectrometry; RCT, randomized, controlled trial; SCD, sickle cell disease; CF, cystic fibrosis
Newborn metabolic screening is conducted for 4 million infants per year and represents the largest single application of genetic testing in medicine. Newborn screening (NBS) programs traditionally are run by state public health departments, although there is an emerging commercial sector for the provision of these services. Screening for phenylketonuria (PKU) was initiated in the 1960s, and subsequently the number of conditions on the NBS panels increased considerably. However, there is a broad range among states in the number of conditions targeted, from 4 to >40. With the advent of new technology such as tandem mass spectrometry (MS/MS) and the recognition of the substantial variability between programs, an active national discussion has emerged to support states in bringing to children high-quality services that are effective and efficient.1
Unfortunately, there are significant barriers to conducting research on the efficacy of NBS programs. The basic question relevant to efficacy is whether morbidity and/or mortality rates are reduced for affected children identified through a universal screening program, compared with outcomes after clinical diagnosis or selective screening. Assessing the efficacy of universal screening requires a basis on which to make this comparison, with both short-term and long-term outcomes in mind. However, state departments of health often do not have funds to conduct evaluations of established programs beyond counts of true-positive, false-positive, and true-negative results and laboratory quality assessments. Programs typically do not make systematic attempts to identify affected children who had false-negative results or to evaluate formally the longer-term health benefits for affected children. Also, many of the conditions targeted in NBS programs are rare, meaning that most states identify only a few affected children with each condition per year. This makes outcome studies with sufficient statistical power through state-based projects virtually impossible in all except the largest states.
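The arithmetic behind this sample-size problem can be sketched briefly. The national birth count and the galactosemia incidence (1 case per 60000 neonates) are figures quoted in this article; the state birth count and the target cohort size are hypothetical values chosen only for illustration, and the calculation says nothing about formal statistical power.

```python
# Sketch: expected affected infants per year and years needed to accrue a cohort.
# Births and incidence are figures quoted in the article.
# The state size and the cohort target are HYPOTHETICAL illustration values.

US_BIRTHS_PER_YEAR = 4_000_000        # infants screened per year (from the article)
GALACTOSEMIA_INCIDENCE = 1 / 60_000   # cases per birth (from the article)

HYPOTHETICAL_STATE_BIRTHS = 50_000    # assumption: births per year in one state
HYPOTHETICAL_TARGET_CASES = 100       # assumption: cohort size wanted for a study


def expected_cases_per_year(births_per_year, incidence):
    """Expected number of affected infants identified per year."""
    return births_per_year * incidence


def years_to_accrue(target_cases, births_per_year, incidence):
    """Years of screening needed to identify the target number of cases."""
    return target_cases / expected_cases_per_year(births_per_year, incidence)


national_cases = expected_cases_per_year(US_BIRTHS_PER_YEAR, GALACTOSEMIA_INCIDENCE)
state_cases = expected_cases_per_year(HYPOTHETICAL_STATE_BIRTHS, GALACTOSEMIA_INCIDENCE)
national_years = years_to_accrue(
    HYPOTHETICAL_TARGET_CASES, US_BIRTHS_PER_YEAR, GALACTOSEMIA_INCIDENCE
)
state_years = years_to_accrue(
    HYPOTHETICAL_TARGET_CASES, HYPOTHETICAL_STATE_BIRTHS, GALACTOSEMIA_INCIDENCE
)

print(f"National cases per year: {national_cases:.1f}")
print(f"Hypothetical state cases per year: {state_cases:.2f}")
print(f"Years to accrue cohort nationally: {national_years:.1f}")
print(f"Years to accrue cohort in one state: {state_years:.0f}")
```

Under these assumed inputs, the hypothetical state would identify fewer than 1 affected infant per year and would need about 120 years to accrue the cohort, whereas pooled national data would need about 1.5 years.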
A more fundamental barrier to research in NBS is ethical concern regarding the use of randomized, controlled trials (RCTs), which are usually considered the standard in research design. The ethical concern arises when an apparently clinically beneficial intervention for affected children is proposed as a component of a population-based screening program. It becomes ethically problematic to propose a control arm for a study in which screening is not provided to a segment of the population, although the efficacy of the screening approach is unproven. This is a question of scale: can we be confident that interventions that are effective on a smaller project scale will be effective when implemented on a population basis? To date, the only RCT of NBS in the United States is the Wisconsin cystic fibrosis (CF) project. The Wisconsin CF project has been valuable in addressing the efficacy of CF NBS, but the project design, involving randomization, has been the focus of criticism in the lay press and ethical discussion in the professional literature.2, 3 This project is discussed in more detail below. In the absence of randomized designs, research on NBS often is observational after implementation of screening, with either historical control data or control through comparisons with similar populations without screening.
These barriers to research are formidable. Despite the use of this technology for 4 million infants per year in the United States and many more internationally in the past 3 decades, the research basis remains relatively poor. The New York State Task Force on Life and the Law stated, in its 2000 publication on genetic testing, "In fact, only a minority of newborn screening tests that are currently performed have been demonstrated formally to have both clinical validity and utility."4 Wilcken et al,5 in a 2003 publication, concluded more broadly, "Formal evidence of the clinical effectiveness of newborn screening is lacking." Currently, many states are adopting MS/MS for NBS programs, despite uncertainties regarding the sensitivities and specificities of the tests and the natural history and treatability of many conditions identified. Of the 30 conditions detectable with MS/MS, medium-chain acyl-coenzyme A dehydrogenase deficiency (MCAD) shows the greatest promise in terms of screening efficacy for children. Nevertheless, Elliman et al6 observed in 2002, "Despite international experience of screening well over a million newborn infants [for MCAD] ... there has been no report of a systematic follow-up of longer term outcome in affected infants detected by screening." Therefore, although we know that MS/MS can detect affected children and that early intervention can be lifesaving, we remain uncertain about the nature and magnitude of the longer-term benefits of actual population screening programs.
The implication of these concerns is not only that some modalities of NBS may prove to be ineffective when evaluated formally. Experience over several decades and a body of observational research lend support to many of these programs. In this era of evidence-based medicine, however, a less-than-rigorous approach to research on these large, expensive, and important public health programs is no longer appropriate.7–9 The recent history of medicine illustrates how research on popular screening programs can reveal limited efficacy; hospital admission chest films10 and breast self-examinations11 are prominent examples. Furthermore, not only does research identify screening programs that are ineffective and/or harmful,12 but formal evaluation can identify aspects of valuable programs that reduce efficacy in critical ways. Therefore, the goals of research are not only to make policy decisions to adopt or forgo population screening but also to design programs to maximize benefits and to minimize harm. This article reviews several examples of NBS that illustrate the strengths and weaknesses of the empirical foundation for screening. A proposal to develop a national framework for research on NBS is then outlined.
EXAMPLES
Hemoglobinopathies
Screening for hemoglobinopathies is a component of NBS programs in 49 states and the District of Columbia. Sickle cell disease (SCD) is the primary condition of interest, although other hemoglobinopathies are also detected.13 SCD occurs most commonly in the United States among African Americans, with an incidence at birth of 1 case per 375 infants. Other population groups are affected with incidences of 1 case per 3000 Native American infants, 1 case per 20000 Hispanic infants, and 1 case per 60000 white infants. Young children with SCD are susceptible to systemic infections with Streptococcus pneumoniae at a rate of 8 episodes per 100 person-years, with a case fatality rate of 35%.14 Early detection of SCD for an infant permits the prophylactic administration of penicillin to prevent pneumococcal infections, in addition to vaccination with Haemophilus influenzae type b and pneumococcal vaccines.
The seminal study demonstrating the efficacy of preventive therapy was published in 1986 by Gaston et al.15 This multicenter clinical trial randomized children <3 years of age with SCD to either penicillin or placebo. The study was terminated after 15 months of follow-up monitoring when results indicated substantial reductions in infection and mortality rates in the treatment group. These impressive results led to a federally sponsored consensus conference in 1987.16 The conference concluded, "The benefits of screening are so compelling that universal screening should be provided. State law should mandate the availability of these services while permitting parental refusal." Furthermore, the conference concluded, "To be effective, neonatal screening must be part of a comprehensive program for the care of sickle cell patients and their families."
The study by Gaston et al15 clearly demonstrated the efficacy of penicillin prophylaxis in reducing morbidity and mortality rates for young children with SCD who were monitored in a longitudinal research environment. However, a key question is whether the efficacy of a preventive treatment can be maintained when expanded to a population level as part of a routine public health program. It is worth emphasizing that the impressive benefits of penicillin prophylaxis demonstrated by Gaston et al15 were for children diagnosed clinically, not through NBS. Therefore, the benefits added by NBS are for the subsets of affected children who die or become seriously ill before a clinical diagnosis.
NBS is more than a test and an intervention; it must be viewed as a system involving a chain of decisions and actions from the heelstick of the infant through the laboratory, the health department, the primary care provider, and the parents to the effective delivery and maintenance of long-term treatment for the child. Any system is only as good as its weakest link, and the efficacy of all NBS programs is contingent on the integrity of this chain.
Gaston et al15 recognized that compliance with the penicillin regimen was critical to the success of prophylaxis, and one of the valuable aspects of their study was the administration of penicillin by mouth rather than injection. Prophylaxis with penicillin administered through injection is painful and requires frequent visits to the clinic, leading to poor compliance. However, compliance is also poor for many orally administered medicines or diets among both adults and children.17 A parallel problem involves physicians who are not compliant with standard-of-care measures. A report by the Centers for Disease Control and Prevention in 2000 presented data from 1998 on compliance in NBS programs for SCD in California, Illinois, and New York.18 Parents reported that 93% of the children received regular penicillin therapy and 75% had received the pneumococcal vaccine. However, 76% of physicians reported providing penicillin prophylaxis to their patients, and they estimated that only 44% of parents were compliant. Only 25% of patients had received the pneumococcal vaccine. A recent study of children with SCD receiving Medicaid in Washington and Tennessee found that enough prophylactic antibiotic was dispensed to cover only 40% of the year-long study period.19 Teach et al20 found a penicillin compliance rate of 43% among children with SCD, as measured with urine assays. Other reports also illustrated compliance problems with the SCD prophylactic regimen.21
The implication of these data is that it is difficult to know the magnitude of the benefit of NBS for SCD. The general consensus in the literature is that mortality and morbidity rates for young children are decreased with NBS,22 but acquiring definitive data to draw this conclusion is challenging for several reasons. First, there has been no formal controlled trial of NBS for SCD. Comparison with historical mortality rates can provide useful information, but historical control data may be biased because of changes in health care with time. Second, the adverse outcomes preventable with screening for SCD occur for a minority of affected children, whether or not prophylactic interventions are used; therefore, it is difficult to identify the benefits of screening without carefully tracking large populations of affected children over time. The ability to track the health outcomes of a large cohort of children is not a feature of our health care system. Third, the almost-universal use of SCD NBS makes it impossible to compare otherwise comparable states that use and do not use NBS for this condition.
A recent Cochrane review identified no RCTs of NBS for SCD.23 The reviewers concluded, "There is however evidence of benefit from early commencement of treatment in SCD, which is made possible by screening in the neonatal period. ... Information from a well designed prospective RCT of neonatal screening is desirable to make recommendations for practice. However such trials may now be considered unethical in view of the proven benefit of early prophylactic treatment with penicillin."23
The conclusion here is that NBS for SCD probably is effective in saving many lives per year, but we do not have solid data to demonstrate this efficacy or to define the magnitude of the benefits. It is too late to conduct an efficacy trial of population screening, but additional work on enhancing compliance is warranted. This is a frustrating state of affairs for an intervention that has been adopted for virtually all infants born in the United States and its territories.
Galactosemia
NBS for galactosemia is performed in every state and the District of Columbia. This condition is attributable to a genetic defect in an enzyme responsible for breaking down sugars present in milk and occurs at a rate of 1 case per 60000 neonates. Affected infants appear normal at birth but within 2 weeks can develop vomiting, irritability, hepatomegaly, jaundice, and sepsis. In the absence of early detection, death in the neonatal period is thought to occur for 20% to 30% of patients. Galactosemia among survivors is associated with developmental delays. Treatment consists of a diet low in lactose/galactose.
Enthusiasm for NBS for galactosemia developed in the 1960s and 1970s, with the identification of a valid test using dried blood spots. Clinical observations demonstrated that affected children experienced prompt resolution of symptoms with initiation of the appropriate diet. However, an important feature of galactosemia is that symptoms develop rapidly in the first 2 weeks of life, which means that the NBS system must be efficient to identify affected children before death or serious illness occurs. Evidence indicates that approximately two thirds of infants are symptomatic at the time of the report of a positive NBS result.13
As the technology developed to screen for this devastating condition, there was a strong push to initiate universal screening. Levy, an effective early advocate of NBS, wrote an article with Hammersen in 1978, in which they stated: "Galactosemia screening should be routine for all newborn infants. It is a disorder with definite and severe complications, but one in which the complications can be prevented with simple and inexpensive treatment."24 Subsequently, outcome studies showed that the situation is more complicated. In a study of 350 affected children (mean age: 9 years) published in 1990, Waggoner et al25 compared the outcomes of children diagnosed before the advent of NBS on the basis of clinical symptoms alone and children diagnosed shortly after birth by virtue of having an affected sibling. In this context, early detection on the basis of family history is a surrogate for early detection through population screening. The children diagnosed on the basis of clinical symptoms had a mean age of diagnosis of 63 days, whereas those diagnosed on the basis of family history had a mean age of diagnosis of 1 day. If early detection and treatment are effective in reducing morbidity rates, then we would expect that children diagnosed at birth would have better outcomes than children diagnosed late on the basis of clinical symptoms. Unfortunately, the results reported by Waggoner et al25 showed no statistical differences in intellectual function between these groups. Waggoner et al25 concluded, "It is clear that current methods of treatment, even if carefully followed, do little to ameliorate the long-term complications which occur in the majority of cases regardless of when treatment was begun or how successfully galactose intake was restricted." Other authors also raised concerns about our current understanding and treatment of galactosemia.26–28
The study by Waggoner et al25 did not address the efficacy of NBS for galactosemia in terms of reduced infant mortality rates. It may be that the goals of NBS for galactosemia should be stated only in terms of saving lives and not in terms of protecting intellectual function. However, because of the lack of relevant research trials, it is difficult to determine the reduction in mortality rates resulting from NBS. In an Irish study published in 1996, the authors reported 9 deaths among 62 affected children (15%) identified previously through screening, over a 20-year period.29 Eight of the deaths occurred in the first 10 years of life. This mortality rate compares with 7 unexplained infant deaths among 84 siblings of affected infants before the era of screening. With the assumption that 25% of the siblings were affected with galactosemia (an autosomal recessive condition), 21 of the 84 siblings would have been affected with galactosemia. Therefore, a mortality rate of 7 (33%) of 21 affected siblings can be estimated. This evidence suggests a reduction in mortality rates from 33% to 15% with screening, although there is potential for historical bias as well as uncertainty about the affected status of the siblings. In addition, advances in neonatal care over the past 30 years might have produced a lower contemporary mortality rate among affected infants in the absence of screening. Comparable data from the United States are not available, but a reduction in mortality rates of this magnitude would result in 12 fewer infant deaths resulting from galactosemia per year nationwide with screening, or 3 lives saved per 1 million children screened. By comparison, the sixth leading cause of infant death in the United States in 2002 was injuries, with a rate of 235 deaths per 1 million children.30
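The mortality estimate in this paragraph can be retraced step by step. The following is a minimal sketch that uses only figures quoted in the text (the Irish cohort, the sibling data, the 25% recurrence assumption, the incidence of 1 case per 60000 neonates, and 4 million births per year); the function and variable names are illustrative, not part of the original analysis.

```python
# Sketch: retrace the galactosemia mortality estimate from the figures in the text.
# All inputs are quoted in the article; names are illustrative only.

BIRTHS_PER_YEAR = 4_000_000   # US infants screened per year
INCIDENCE = 1 / 60_000        # galactosemia cases per birth
RECURRENCE_RISK = 0.25        # autosomal recessive: assumed share of siblings affected


def galactosemia_estimate():
    # Screened cohort (Irish study): 9 deaths among 62 affected children.
    rate_screened = 9 / 62

    # Pre-screening siblings: 7 unexplained deaths among 84 siblings,
    # of whom 25% (21) are assumed to have been affected.
    affected_siblings = 84 * RECURRENCE_RISK
    rate_unscreened = 7 / affected_siblings

    # Apply the difference in rates to the expected US cases per year.
    cases_per_year = BIRTHS_PER_YEAR * INCIDENCE
    deaths_averted = cases_per_year * (rate_unscreened - rate_screened)
    per_million_screened = deaths_averted / (BIRTHS_PER_YEAR / 1_000_000)

    return {
        "rate_screened": rate_screened,
        "rate_unscreened": rate_unscreened,
        "cases_per_year": cases_per_year,
        "deaths_averted_per_year": deaths_averted,
        "per_million_screened": per_million_screened,
    }


result = galactosemia_estimate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With unrounded rates the sketch gives between 12 and 13 deaths averted per year and slightly more than 3 per 1 million children screened; using the rounded rates of 33% and 15% gives the 12 deaths and 3 per 1 million quoted above. The estimate inherits every caveat stated in the paragraph, including historical bias and uncertainty about the affected status of the siblings.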
This brief analysis suggests several conclusions. The early enthusiasm for the efficacy of NBS for galactosemia has not been supported by subsequent data, with respect to the preservation of cognitive function among affected children. These data on the relative efficacy of NBS were acquired 2 decades after some states initiated screening. Early intervention seems to reduce infant mortality rates for galactosemia, but the magnitude of this benefit remains uncertain. Some children still die as a result of galactosemia, despite NBS, and clinical diagnosis can be achieved in the absence of screening. Approaches other than universal NBS have been evaluated, with promising results.31 Again, the purpose of this discussion is not to suggest that NBS for galactosemia does not have value but to highlight the limited knowledge on which this enormous public health effort is based.
Neuroblastoma
Neuroblastoma is the most common extracranial tumor among young children, with an incidence of 1 case per 7000 children.32 Better prognoses are associated with younger age and earlier stages of the disease. These features of the condition suggested that presymptomatic diagnosis and early treatment might improve the mortality rate. In addition, the tumor secretes a characteristic pattern of catecholamines, which enables detection through blood testing before the emergence of clinical symptoms. Enthusiasm for a screening approach to neuroblastoma led to the development of programs in Japan in the early 1970s. However, there was sufficient uncertainty about the efficacy of screening that 2 large screening trials were conducted, one in Germany by Schilling et al33 and the other in Canada by Woods et al.34
In the German study, almost 2.6 million children were screened for neuroblastoma in 6 of 16 German states from 1995 to 2000. There were 2.1 million children who served as control subjects in the other German states. The incidence and outcomes of neuroblastoma cases were compared between the screened and control populations over the same time period. In the Canadian study, 476654 children were screened in Quebec Province between 1989 and 1994, and the results were compared with those for children in separate control populations in Ontario, Minnesota, Florida, and the Greater Delaware Valley.
The results of both studies demonstrated no benefit from population screening, in terms of mortality rates. Of particular interest was the finding that screening identified many more children than would have been predicted on the basis of the clinical incidence of the disease. This confirmed other observations that neuroblastomas can arise and then resolve spontaneously without producing symptoms. These children might be accurately labeled as having the condition, but they represent false-positive results in the sense that they are not destined to be ill with their neuroblastomas. However, children identified as having neuroblastomas are considered for treatment because physicians may not be able to discriminate between children who will become ill and those who have tumors that will resolve spontaneously. In this situation, screening may seem to lead to improved survival rates for children with neuroblastomas, compared with historical control subjects, but this is only because screening identifies a subset of asymptomatic children who would have fared well anyway.
To illustrate this point, imagine that there are 20 children in a population with neuroblastomas identified clinically. Assume treatment cures 10 children, and 10 children die as a result of their disease. Therefore, the cure rate is 50%. After the introduction of screening, 40 children with neuroblastomas are identified but, unbeknownst to the screeners, 20 cases would have resolved spontaneously. Forty children are treated for their cancer and 10 die, as observed previously. The apparent cure rate is now 75% (30 of 40), an increase of 25 percentage points that might be falsely attributed to the benefits of the screening program.
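The same hypothetical numbers can be written out as a short calculation. This is a sketch of the illustration only, with the counts taken from the paragraph above; the function name is illustrative.

```python
# Sketch: apparent cure rate before and after screening, using the
# hypothetical counts from the illustration in the text.

def cure_rate(children_identified, deaths):
    """Fraction of identified children who do not die of the disease."""
    return (children_identified - deaths) / children_identified


# Before screening: 20 children identified clinically, 10 die.
rate_before = cure_rate(20, 10)

# After screening: 40 children identified, the same 10 die.
rate_after = cure_rate(40, 10)

# Of the 40, 20 tumors would have resolved spontaneously. Among the
# remaining 20 children, outcomes are unchanged.
rate_excluding_spontaneous = cure_rate(40 - 20, 10)

print(f"Cure rate before screening: {rate_before:.0%}")
print(f"Apparent cure rate with screening: {rate_after:.0%}")
print(f"Cure rate excluding spontaneous resolution: {rate_excluding_spontaneous:.0%}")
```

The number of deaths is identical in both scenarios; only the denominator changes, which is why the apparent improvement does not reflect any benefit from screening.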
This problem is directly relevant to screening for metabolic diseases, because metabolic conditions usually entail a spectrum of severity and the spectrum may include a proportion of subjects with "abnormal" biochemical test results who will never become sick with the disease.5, 35 These neuroblastoma studies are excellent illustrations of the value of population-based research for assessment of the efficacy of screening approaches.
Another aspect of these studies worth mentioning is the use of separate but relevantly similar populations as control groups. Rather than randomize children within a region to screening versus clinical diagnosis, these studies screened an entire population and compared the outcomes with those for a comparable unscreened population during the same time period. This approach eliminates the problems with historical control data and avoids the complexities of randomizing children to 2 different groups within a population.
The final aspect of the neuroblastoma studies worth emphasizing is the ability to conduct large-scale, population-wide studies within a reasonable time frame. The German study required the collaboration of 6 of 16 states for the screening intervention and that of the remaining states for clinical data only as control populations. With uncommon conditions, no individual state could generate a sufficient number of cases to conduct such a study. Obviously this situation pertains to the United States, in which collaboration between multiple states would be essential to obtain a sufficient number of cases in a reasonable time with a population that is representative of the national population. The complexity of this interstate collaboration should not be underestimated, but the obstacles should be confronted to generate high-quality data on population-based screening programs. These examples illustrate the need for a more consistent and comprehensive approach to evaluating screening tests and programs before implementation on a population-wide basis.
COLLABORATIVE RESEARCH AGENDA
A number of commentators, professional bodies, and state programs have developed criteria for deciding when a condition should be added to NBS programs.36–39 These criteria typically address the nature of the disease, the availability of a valid test, evidence for the benefits of screening, and the presence of all necessary service elements for a complete screening program. Here we are concerned primarily about the evidence for the benefits of screening. The criteria for what constitutes adequate evidence of benefit have not been established at the national level, leaving this determination up to individual state programs. The lack of established criteria and sufficient data on benefits is a central reason why there is substantial variation between states and countries regarding the conditions targeted in NBS programs.
We can imagine the confusion if drugs and devices were regulated and funded at the state level. Fortunately we have a national system of drug evaluation and approval through the Food and Drug Administration, by which drugs and devices proposed for human medical use are evaluated through a standard series of research protocols.40 Generally, human studies are pursued only after collection of data on safety in animals, when feasible. In phase I human studies, a small number of participants are involved, primarily for evaluation of safety and pharmacokinetic features. If the drug seems safe, then phase II studies involving up to several hundred participants are pursued to evaluate effectiveness. If these results are promising, then phase III studies are conducted with several hundred to thousands of individuals to assess safety, effectiveness, and dosage. Phase II studies may be performed with or without a control group, and phase III studies often use a randomized, double-blind, controlled protocol to maximize the quality of the data. With the results of these studies, the Food and Drug Administration is in a position to determine whether a drug should be licensed on a national basis for specific indications for specific population groups (such as adults or children). After approval, phase IV studies may be conducted for postmarketing evaluations of safety and efficacy in new or larger patient populations. The method is long, expensive, and by no means foolproof in terms of safety or efficacy, but it is a remarkably robust approach to the scientific assessment of drugs for medical applications.
A similar framework for the methodical evaluation of screening tests is necessary. The Institute of Medicine Committee on Assessing Genetic Risks concluded, in 1994, "The committee recommends the systematic development of basic data on the full range of genetic testing and screening services that is needed to provide a sound basis for policy development in the future."38(p306) Other authors also support a standardized approach to genetic test evaluation.41 The following is a preliminary proposal for a framework to study NBS tests and NBS programs.
There are 3 basic questions for research to address. First, does early detection and treatment of affected infants or children reduce morbidity and/or mortality rates? Second, if early detection seems beneficial, does a population-based screening approach result in net benefits to affected children, compared with alternative methods of detection? Third, if there are net benefits from population screening, are these benefits sufficient to warrant the use of public health resources for this purpose? The proposed research framework is designed to answer these questions in sequence.
Does early detection produce better outcomes? There is strong public confidence in the ability of medical science to identify signs of future disease and to act decisively to save lives.42 Screening tests have become quite prevalent in medicine, including mammograms, Pap tests, digital rectal examinations, sigmoidoscopies, amniocentesis, and measurements of blood pressure and blood glucose, cholesterol, and prostate-specific antigen levels, to name only a few. Commercial providers are now prominently advertising full-body computed tomography to the public as a method for early detection of a host of potential problems.43
However, early detection is not beneficial if medicine does not have the ability to affect the course of the disease. This is more common than popularly thought. The US Preventive Services Task Force conducts exhaustive analyses of preventive measures. The US Preventive Services Task Force supports screening for breast cancer, colon cancer, and cervical cancer, but it does not advocate population screening for cancers of the prostate, bladder, pancreas, ovaries, or lung. These decisions are based in large measure on the absence of data indicating that early detection improves outcomes.44
An inability to improve outcomes may mean that there is no ability to treat the condition at all or that there is no net benefit to early detection, as measured in a population of individuals. For some conditions, certain individuals may benefit from early detection whereas others are harmed. This can occur particularly when diagnostic or treatment procedures carry substantial risk. As noted above, this is also a concern when there is a broad spectrum of disease severity. In these situations, it may be that the individuals who are most severely affected do not benefit from early detection, those with mild or subclinical disease may be harmed by unnecessary interventions, and those with intermediate severity can obtain benefit from early detection. If clinicians cannot discriminate between these degrees of severity at the time of diagnosis, then an affected individual may experience burdensome or harmful interventions as often as an improved outcome resulting from a screening program.
STAGE I RESEARCH
For the purposes of this discussion, stage I research refers to projects that seek to determine whether early detection and intervention can improve clinical outcomes. This kind of research can be performed in a variety of ways that do not require population screening. For genetic conditions (the majority of NBS conditions), significant information can be obtained by comparing the outcomes of second affected siblings versus first affected siblings when there are discrepancies in the time of diagnosis. A first affected sibling is diagnosed often only after clinical symptoms emerge and frequently much later, after parents have pursued a "diagnostic odyssey." Once parents have been alerted to the risk for subsequent siblings, the second affected child can be diagnosed prenatally or in early infancy. If a proposed early treatment or preventive strategy is available, then a comparison of the outcomes for the first versus second (and subsequent) affected siblings provides evidence for the efficacy of the intervention. This approach can be used retrospectively, if an intervention is in use for the condition, or prospectively, through enrollment of sibling pairs at the time of diagnosis of the second affected child. The galactosemia study by Waggoner et al25 noted above is an example of this method.
A second option for stage I research is a RCT of the intervention among children diagnosed clinically. This approach is useful when the initial presentation of the condition is not devastating for the majority of children. Stated differently, it is more useful when only a subset of affected children experience the serious adverse outcome to be prevented. This is because investigators need to know which children are affected before they can be randomized, and the children cannot have already experienced the adverse outcome at the time of randomization. A good example here is penicillin prophylaxis for children with SCD. As discussed above, the study by Gaston et al15 demonstrated that children with SCD fare much better with penicillin prophylaxis, and it is an excellent example of stage I research.
A third approach to stage I research is a small-scale screening project. If there is a high-risk group that can be targeted for screening to produce a sufficient number of affected children, then a RCT of screening for the proposed intervention can be conducted. However, most conditions considered appropriate for population-wide NBS are rare enough in the general population, and not strongly associated with a particular racial or ethnic group, that targeted screening is not feasible.
An approach that is not as useful for stage I research is the use of historical data comparing children identified at a younger versus older age. Particularly when there is an association between an earlier era and the later age of diagnosis, there are many factors that may bias the comparison. More specifically, an earlier age of diagnosis in more recent eras may occur in conjunction with many other improvements in care.
The purpose of stage I research is to provide definitive data on the efficacy of early intervention. The move to population-wide screening should be made only when there is solid evidence that early detection and intervention can lead to improved morbidity and/or mortality rates.
STAGE II RESEARCH
Stage II research addresses the second question in sequence: does a population-based screening approach result in net benefits to affected children, compared with alternative methods of detection? The central point here is that improvements in clinical outcomes that are demonstrable through stage I research may not be achievable in population-wide programs. Conversely, the benefits of early detection may be brought to affected children through clinical detection schemes in the absence of population screening. After stage I research, the question is how best to bring the benefits of early detection to affected children.
As noted above, NBS programs entail a series of activities from the heelstick through the laboratory to the physician, the family, and ultimately a sustained intervention. Systematic weaknesses in any of these components can seriously hamper the efficacy of the program. The benefits of the SCD program may be significantly reduced by poor compliance of physicians and parents with prophylaxis and vaccination. In galactosemia and congenital adrenal hyperplasia screening programs, the primary value of screening is largely contingent on the ability of the program to provide a rapid test result before the decline and death of some infants at 2 weeks of age. Even for PKU, which represents the paradigm program for NBS, the efficacy of the program may be impaired substantially by the inability of families to obtain the special foods or to comply with the diet over time.45
An ideal design, from a scientific perspective, is the RCT. Newborns can be randomized to receive either screening for the target condition or no screening. The morbidity and/or mortality rates for the condition can be compared between affected children identified through screening and those identified clinically. To date, the only RCTs of NBS are the Wisconsin CF project initiated in 198446 and a CF screening project in the United Kingdom.47 All newborns in Wisconsin were screened after parental permission was obtained. Tests were run on all samples, but results were reviewed and disclosed for only one half of the newborns. For infants in the "unscreened" control group, results were disclosed at 4 years of age. Outcomes have been compared for the screened and unscreened groups over the past 20 years. Although the magnitude and nature of the benefits of CF NBS remain controversial, the Wisconsin trial has been critical in providing data for policy development.48
There are a number of potential problems with a randomized, controlled design from methodologic and ethical perspectives. First, if screening itself is randomized, then it may be difficult or impossible to identify all cases in the unscreened group. This creates a significant potential for bias. In the unscreened group, those who come to medical attention by virtue of clinical symptoms, or who do so at a younger age (within the window of a research project), tend to be those who are more severely affected with the condition. In contrast, a screened population would include children across the full spectrum of severity, including those who are mildly affected and those who may never become ill with the condition. Given this difference in sensitivity for detection, a comparison of outcomes for the screened and unscreened populations would show improved outcomes for the screened group even if the intervention confers no benefit. This problem is similar to "length bias"49 and is primarily a concern for studies that calculate outcome data in terms of number of deaths per affected population. This is because the denominator is expanded through screening to include mildly affected individuals, thereby decreasing the apparent death rate, compared with a group composed of only severely affected individuals.
There are at least 2 ways to address this problem. The neuroblastoma studies measured their outcomes in terms of deaths per 100000 population in the screened and unscreened populations. This eliminates the bias created by calculating death rates for the affected population. Another approach to eliminating this source of bias is to follow the approach used in the Wisconsin CF screening trial, in which blood samples were obtained for all infants but screening test results were disclosed on a randomized basis. As noted, the unscreened group had results reviewed and disclosed at 4 years of age. This allowed the research team to obtain outcome data for all affected members of the unscreened group as if they had been screened at the outset. Through this approach, subclinical cases could be identified in the unscreened group although they were never identified clinically.
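The denominator effect described above can be illustrated with a minimal sketch in Python. All figures are hypothetical and chosen only for illustration; they are not drawn from the neuroblastoma studies or the Wisconsin trial. The sketch assumes 2 cohorts of equal size in which the intervention confers no benefit, so the same number of children die in each, and the function name `death_rates` is an assumption of this example.

```python
# Hypothetical illustration: every number below is invented for this sketch.
def death_rates(population, cases_detected, deaths):
    """Return (deaths per detected case, deaths per 100,000 population)."""
    per_case = deaths / cases_detected
    per_100k = deaths * 100_000 / population
    return per_case, per_100k

# Two cohorts of equal size; the intervention is assumed to confer NO benefit,
# so the same 10 children die in each cohort.
POPULATION = 1_000_000
DEATHS = 10

# Unscreened: only the 50 more severely affected children are diagnosed clinically.
unscreened = death_rates(POPULATION, cases_detected=50, deaths=DEATHS)
# Screened: the same 50 children plus 150 mild or subclinical cases are detected.
screened = death_rates(POPULATION, cases_detected=200, deaths=DEATHS)

print(f"Unscreened: {unscreened[0]:.0%} of cases, {unscreened[1]:.1f} per 100,000")
print(f"Screened:   {screened[0]:.0%} of cases, {screened[1]:.1f} per 100,000")
```

Under these assumptions, the death rate per affected child falls from 20% to 5% solely because screening enlarges the denominator, whereas the rate per 100,000 population is 1.0 in both cohorts and correctly shows no difference.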
A second practical problem with RCTs arises from the low incidence of most conditions targeted by NBS. Because RCTs require at least 2 approximately equivalent groups for comparison and the groups must be of a size to allow determination with sufficient power that a significant difference exists in the outcome measure, trials must be quite large for most NBS conditions. This issue is discussed in greater detail below.
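A rough sense of scale can be obtained from the standard normal-approximation formula for comparing 2 proportions. The following Python sketch is illustrative only: the event rates (20% versus 5%), the incidence (1 case per 15,000 births), the significance level, and the power are assumptions made for this example, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def affected_per_arm(p_control, p_screened, alpha=0.05, power=0.80):
    """Affected children needed per arm to detect p_control vs p_screened
    (two-sided test, normal approximation for two proportions)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_screened * (1 - p_screened)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_screened) ** 2
    return math.ceil(n)


def newborns_per_arm(affected, births_per_case):
    """Newborns to enroll per arm so that `affected` cases are expected."""
    return affected * births_per_case


# Assumed values: adverse outcome in 20% of unscreened vs 5% of screened cases,
# and an incidence of 1 affected child per 15,000 births.
cases = affected_per_arm(p_control=0.20, p_screened=0.05)
newborns = newborns_per_arm(cases, births_per_case=15_000)
print(f"{cases} affected children and about {newborns:,} newborns per arm")
```

Even with a large assumed treatment effect, the number of newborns per arm is driven almost entirely by the rarity of the condition, which is why single-state trials are rarely adequate.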
The more fundamental challenges to the performance of RCTs in stage II research are ethical concerns. If early intervention has been shown to be effective in stage I research, is it ethical to randomize infants to an unscreened group?50 In addressing this question, a standard approach in research ethics asks whether there is equipoise between the 2 study groups.51, 52 That is, is there genuine uncertainty in the professional community about whether an intervention under study is preferable to an alternative? If there is general consensus that one option is preferable to another on the basis of solid scientific evidence, then randomization is not ethically acceptable. Conversely, if there is legitimate uncertainty about the best approach, then randomization is acceptable.
The NBS context and the proposed stage I/stage II approach offer a different level of complexity than most questions over equipoise. If stage I research demonstrates benefit, then there is no longer equipoise with respect to earlier intervention versus later intervention. However, equipoise may still exist with respect to whether the benefits of earlier intervention can be achieved through the complex mechanism of a NBS program. Therefore, the "test article" is the program, ie, the method of delivering the key intervention, rather than the intervention itself.
Let us look at the issue as if the research were to address the efficacy of a delivery method for an intervention that we know to be effective. Would it be ethical to compare a particular delivery method with no delivery at all? This is analogous to comparing an intervention of known efficacy with a placebo in a trial. This is generally considered unethical, unless the risks to the placebo group are minor or there are other compelling scientific reasons to consider a placebo group.53 In the context of NBS, however, population-wide screening may be only one approach to early detection. For example, neonatal deaths resulting from congenital adrenal hyperplasia or galactosemia usually occur after symptoms have been present for several days. This symptomatic period offers the opportunity for clinical diagnosis. The more effective parents and the health care system are in recognizing and responding to characteristic symptoms in individual cases, the smaller the marginal benefit of a population screening approach would be. Therefore, for these conditions, screening is an alternative not to nothing (as with a placebo) but to the health care system that is designed to respond to sick infants. When there is no ability to detect an affected child before the time when permanent damage has been done, as with PKU and congenital hypothyroidism, then an unscreened group in a RCT would be analogous to a placebo group; this study design would not be appropriate. Therefore, depending on the condition, randomization need not be framed in terms of screening versus nothing but can be regarded as diagnosis through screening versus diagnosis through clinical care or selective screening.
Attempts can be made to promote efficient diagnosis through clinical care, with education programs for clinicians and perhaps parents, or through selective screening. A recent study in Toronto evaluated the possibility of screening for galactosemia by testing every infant <2 weeks of age who presented to the hospital for any reason and infants >2 weeks of age with clinical suspicion of galactosemia.31 The authors suggested that this selective approach to screening identifies severely affected infants with galactosemia as rapidly as does population screening.
The conclusion is that stage II screening RCTs are ethical in the context of NBS when population screening is compared with a potentially effective method of delivering a timely clinical diagnosis or with a more selective screening approach. RCTs of screening for conditions similar in their presentation to congenital adrenal hyperplasia, CF, or SCD can be justified, particularly in conjunction with efforts to enhance provider education about early clinical detection. In contrast, for conditions for which there is no prospect of early clinical detection before significant morbidity or death, a stage II RCT would not be justified.
If a RCT is not deemed ethical or feasible, an alternative is a cohort design. A prospective cohort design compares the outcomes of 2 groups that differ by virtue of the intervention in question. In this context, a screened cohort of children is compared with an unscreened cohort with respect to morbidity and mortality rates over time. For NBS, cohorts could consist of whole state newborn populations or populations of multiple states. There are several significant advantages to a cohort design for stage II screening research. Logistically, it is easier to implement a screening program in a population in a uniform manner. From an ethical perspective, the cohort design avoids explicitly assigning children to an unscreened group when screening could have been made available. Of course, infants in the unscreened cohort are not provided screening, but this is already the situation in the absence of research. After a new screening modality is introduced, some states take years to consider or to implement the program, whereas others are more rapid adopters, providing the opportunity for a comparison of cohorts according to state.
There are 2 principal drawbacks to the cohort design. The first is the potential bias created by comparing populations that may differ with respect to a number of variables in addition to the variable in question (screening). State populations may differ with respect to factors such as socioeconomic status, racial mixture, disease prevalence, health care services, insurance coverage, and efficacy of the NBS programs. If differences in morbidity or mortality rates are found between cohorts, there may be residual concern that the explanation does not depend on screening. The second problem inherent in the cohort design in this context involves the ability to identify and to monitor affected individuals in the unscreened cohort. A screening program establishes the population prevalence at an individual level and creates the infrastructure for tracking. Without a screening program, there is unlikely to be a comprehensive registry of affected children. Furthermore, children who might have died at a young age as a result of the condition might not have been identified as affected or their condition might not have been recorded in a retrievable manner. Many affected children may be known to subspecialty physicians in regional referral centers, but these are likely to be more severely affected children.
This latter problem of defining the affected group in the unscreened population is a fundamental challenge. If a cohort design is used for stage II research, then the unscreened cohort must be evaluated as thoroughly as possible to identify affected children. One way to address this problem adds a retrospective component to the project. In many states, residual NBS samples are stored for variable lengths of time, from months to decades.54, 55 If the analyte is stable with time, then stored NBS samples can be screened for the condition in question at a time when differences in morbidity or mortality rates between the screened and unscreened groups would be expected. Children identified as affected through retrospective screening of residual samples could be traced and their health status measured and compared with that of children identified prospectively through screening. Children who died before a diagnosis was made also could be identified through this approach. Furthermore, children who were mildly affected and never came to clinical recognition would be identified. Identification and tracking would not be 100% with this method, but this approach is likely to be much more comprehensive than other forms of identification. This approach would require the retention and availability of residual NBS samples. A discussion of the extent and content of parental information or permission for this kind of research would be important.3
For stage II research, the best approach from a scientific perspective is a RCT. However, this approach is likely to be expensive, and ethical concerns may be prominent. Nevertheless, a randomized design is justifiable in some circumstances. In other circumstances, a cohort design with retrospective screening of the initially unscreened cohort is most appropriate from both scientific and ethical perspectives.
STAGE III AND STAGE IV RESEARCH
Stage III research addresses the relative costs of a population-wide screening program. Stage II research may demonstrate benefits of screening, but decisions about implementation will be dependent on estimates of the costs necessary to achieve the benefits. Cost-benefit and cost-effectiveness analyses may be feasible with data obtained in stage II projects. An economic analysis may reveal that the benefits do not justify the costs of the program.
To date, cost-benefit and cost-effectiveness analyses of newborn metabolic screening have been limited by the lack of solid data on a number of variables. Economic analyses are often contingent on a variety of assumptions under the best of circumstances, including program costs, health care costs, test parameters, effectiveness of interventions, and the economic value of intangible factors such as anxiety with false-positive results and knowledge for future reproductive decision-making. Overall, the track record for cost-benefit analyses for NBS has been described as "unimpressive."56 Nevertheless, economic analyses are important for policy decisions. Several recent reports addressed MS/MS as an emerging technology. Schoen et al57 found that MS/MS compares favorably with other mass screening programs on a cost-benefit basis. In contrast, Pandor et al,58 in a systematic analysis in the United Kingdom, concluded that the evidence supports the use of MS/MS for PKU and MCAD but sufficient evidence for screening for other conditions is lacking. Both studies revealed the need for additional data to estimate actual costs and benefits. Despite the volume of literature on NBS for CF, Grosse et al48 noted that a full cost-effectiveness analysis has not been performed. Under the proposed research scheme, stage II projects could be designed to collect data on costs and benefits in a manner conducive to stage III economic analysis.
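One common summary measure in cost-effectiveness analysis is the incremental cost-effectiveness ratio: the difference in cost between 2 strategies divided by the difference in health effect. The Python sketch below shows how data of the kind a stage II project might collect could feed such a calculation. The function name and all cost and outcome figures are hypothetical and are not taken from any of the studies cited here.

```python
# Hypothetical sketch: the ratio is the standard incremental cost-effectiveness
# definition; every cost and outcome figure below is invented.
def incremental_cost_effectiveness(cost_new, cost_old, effect_new, effect_old):
    """Extra cost per extra unit of health effect (e.g., per life-year gained)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("no difference in effect; the ratio is undefined")
    return (cost_new - cost_old) / delta_effect


# Invented figures for a cohort of 100,000 newborns.
screening_cost = 2_500_000    # screening program plus follow-up and treatment
clinical_cost = 1_500_000     # costs when cases are detected clinically
screening_life_years = 250.0  # life-years among affected children, screened
clinical_life_years = 200.0   # life-years among affected children, unscreened

icer = incremental_cost_effectiveness(
    screening_cost, clinical_cost, screening_life_years, clinical_life_years
)
print(f"Incremental cost per life-year gained: ${icer:,.0f}")
```

Whether a given ratio justifies implementation is a policy judgment, and a full analysis would also need to address the intangible factors noted above, such as anxiety with false-positive results.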
Stage IV research involves projects designed to evaluate established programs on an ongoing basis. To date, state NBS programs have a limited ability to conduct formal program evaluations and quality assurance activities. The American Academy of Pediatrics/Health Resources and Services Administration Newborn Screening Task Force1 and the Council of Regional Networks for Genetic Services59 place a strong emphasis on the funding and development of these activities. Effective programs require periodic evaluation because of changes in test technology, program organization, population demographic features, and health care resources.
COLLABORATION
Central to the ability to conduct stage II and stage III research is a population of sufficient size. Because of the low incidence of many conditions, multiple states must collaborate with a single protocol to achieve adequate statistical power to draw timely conclusions about the efficacy of a screening strategy. Traditionally, development of multistate research collaborations has been a significant challenge, and such collaborations have not been common in the NBS literature beyond survey projects. New federal initiatives may help foster larger-scale, multistate projects.
Title XXVI of the Children's Health Act of 2000, Screening for Heritable Disorders, establishes a program to improve the ability of states to provide newborn and child screening. The Act "authorizes the Secretary to award grants to States, or a political subdivision of a State, or a consortium of two or more States, or political subdivisions of States to enhance, improve or expand the ability of States and local public health agencies to provide screening, counseling or health care services to newborns and children having or at risk for heritable disorders."60 Furthermore, the Act "authorizes the Secretary to award grants to eligible entities to provide for the conduct of demonstration programs to evaluate the effectiveness of screening, counseling or health care services in reducing the morbidity and mortality caused by heritable disorders in newborns and children."60 To assist in this process, the Secretary of Health and Human Services recently established the Advisory Committee on Heritable Disorders in Newborns and Children. The tasks of the committee are to provide advice and recommendations to the Secretary concerning the grants and projects authorized under the Act.
In addition, Title V of the Social Security Act provided funding for 2 new initiatives. A national coordinating center for NBS is being established, and regional genetic services and NBS collaborative systems have been created. Seven national regions have been created to "enhance and support the genetics and newborn screening capacity of States across the nation by undertaking a regional approach toward addressing the maldistribution of genetic resources. These grants are expected to improve the health of children and their families by promoting the translation of genetic medicine into public health and health care services."61
These national priorities and funding opportunities represent an exciting development in the care of children. The state-level organization of NBS services is an accident of history but should not be a barrier to evidence-based analyses of the benefits and risks of these complex programs. The adoption of an accepted sequence of research protocols through multistate collaborations should greatly facilitate the translation of research into effective public health programs. Ultimately, there are no serious methodologic or ethical barriers to conducting stage I, stage II, and stage III research to demonstrate the efficacy of NBS modalities before the implementation of population-based programs.
ACKNOWLEDGMENTS
My thanks go to Mary Ann Bailey, PhD, and colleagues at the Hastings Center for supporting this work under grant 1 R01 HG02579.
FOOTNOTES
Accepted Jan 12, 2005.
No conflict of interest declared.
REFERENCES
American Academy of Pediatrics/Health Resources and Services Administration Newborn Screening Task Force. Serving the family from birth to the medical home: newborn screening: a blueprint for the future: a call for a national agenda on state newborn screening programs. Pediatrics. 2000;106:389–422
Begley S. Research involving tests on newborns highlights need for stricter ethics. Wall Street Journal. May 3, 2002
Taylor HA, Wilfond BS. Ethical issues in newborn screening research: lessons from the Wisconsin cystic fibrosis trial. J Pediatr. 2004;145:292–296
New York State Task Force on Life and the Law. Genetic Testing and Screening in the Age of Genomic Medicine. Albany, NY: Health Education Services; 2000:143
Wilcken B, Wiley V, Hammond J, Carpenter K. Screening newborns for inborn errors of metabolism by tandem mass spectrometry. N Engl J Med. 2003;348:2304–2312
Elliman D, Dezateux C, Bedford HE. Newborn and childhood screening programmes: criteria, evidence, and current policy. Arch Dis Child. 2002;87:6
Wilfond B. Screening policy for cystic fibrosis: the role of evidence. Hastings Cent Rep. 1995;25:S21–S23
Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet. 2002;359:881–884
Miller AB. The ethics, the risks and the benefits of screening. Biomed Pharmacother. 1988;42:439–442
Mant D, Fowler G. Mass screening: theory and ethics. Br Med J. 1990;300:916–918
Kosters JP, Gotzsche PC. Regular self-examination or clinical examination for early detection of breast cancer. Cochrane Database Syst Rev. 2003;(2):CD003373
Feldman W. How serious are the adverse effects of screening? J Gen Intern Med. 1990;5:S50–S53
American Academy of Pediatrics, Committee on Genetics. Newborn screening fact sheet. Pediatrics. 1996;98:473–501
Gaston MH, Verter JI, Woods G, et al. Prophylaxis with oral penicillin in children with sickle cell anemia: a randomized trial. N Engl J Med. 1986;314:1593–1599
National Institutes of Health Consensus Conference. Newborn screening for sickle cell disease and other hemoglobinopathies. JAMA. 1987;258:1205–1209
Ramgoolam A, Steele R. Formulations of antibiotics for children in primary care: effects on compliance and efficacy. Paediatr Drugs. 2002;4:323–333
Centers for Disease Control and Prevention. Update: newborn screening for sickle cell disease: California, Illinois, and New York, 1998. MMWR Morb Mortal Wkly Rep. 2000;49:729–731
Sox CM, Cooper WO, Koepsell TD, DiGiuseppe DL, Christakis DA. Provision of pneumococcal prophylaxis for publicly insured children with sickle cell disease. JAMA. 2003;290:1057–1061
Teach SJ, Lillis KA, Grossi M. Compliance with penicillin prophylaxis in patients with sickle cell disease. Arch Pediatr Adolesc Med. 1998;152:274–278
Wurst KE, Sleath BL. Physician knowledge and adherence to prescribing antibiotic prophylaxis for sickle cell disease. Int J Qual Health Care. 2004;16:245–251
Quinn CT, Rogers ZR, Buchanan GR. Survival of children with sickle cell disease. Blood. 2004;103:4023–4027
Lees CM, Davies S, Dezateux C. Neonatal screening for sickle cell disease [Cochrane review]. In: The Cochrane Library. Issue 3. Chichester, United Kingdom: John Wiley & Sons; 2004
Levy H, Hammersen G. Newborn screening for galactosemia and other galactose metabolic defects. J Pediatr. 1978;92:871–877
Waggoner DD, Buist NR, Donnell GN. Long-term prognosis in galactosemia: results of survey of 350 cases. J Inherit Metab Dis. 1990;13:802–818
Gitzelmann R, Steinmann B. Galactosemia: how does long-term treatment change the outcome? Enzyme. 1984;32:37–46
Matalon R. Galactosemia: promise, frustration and challenge. J Am Coll Nutr. 1997;16:190–191
Widhalm K, Miranda de Cruz B, Koch M. Diet does not ensure normal development in galactosemia. J Am Coll Nutr. 1997;16:204–208
Badawi N, Cahalane SF, McDonald M, et al. Galactosaemia—a controversial disorder. Screening and outcome. Ireland 1972–1992. Ir Med J. 1996;89 :16 –17
Shah V, Friedman S, Moore AM, Platt BA, Feigenbaum AS. Selective screening for neonatal galactosemia: an alternative approach. Acta Paediatr. 2001;90:948–949
Castleberry RP. Biology and treatment of neuroblastoma. Pediatr Clin North Am. 1997;44:919–937
Schilling FH, Spix C, Berthold F, et al. Neuroblastoma screening at one year of age. N Engl J Med. 2002;346:1047–1053
Woods WG, Gao RN, Shuster JJ, et al. Screening of infants and mortality due to neuroblastoma. N Engl J Med. 2002;346:1041–1046
Refsum H, Fedriksen A, Meyer K, Ueland P, Kase BF. Birth prevalence of homocystinuria. J Pediatr. 2004;144:830–832
Wilson JMG, Jungner G. Principles and Practice of Screening for Disease. Geneva, Switzerland: World Health Organization; 1968
National Research Council, Committee for the Study of Inborn Errors of Metabolism. Genetic Screening: Programs, Principles, and Research. Washington, DC: National Academy of Sciences; 1975
Andrews LB, Fullarton JE, Holtzman NA, Motulsky AG, eds. Assessing Genetic Risks: Implications for Health and Social Policy. Washington, DC: National Academy of Sciences; 1994
National Institutes of Health. Promoting Safe and Effective Genetic Testing in the United States: Final Report of the Task Force on Genetic Testing. Bethesda, MD: National Institutes of Health; 1997
Burke W, Atkins D, Gwinn M, et al. Genetic test evaluation: information needs of clinicians, policy makers, and the public. Am J Epidemiol. 2002;156:311–318
Russell LB. Educated Guesses: Making Policy About Medical Screening Tests. Berkeley, CA: University of California Press; 1994
National Institutes of Health, Consensus Development Panel. National Institutes of Health Consensus Development Conference Statement: phenylketonuria: screening and management, October 16–18, 2000. Pediatrics. 2001;108:972–982
Farrell PM, Kosorok MR, Rock MJ, et al. Early diagnosis of cystic fibrosis through neonatal screening prevents severe malnutrition and improves long-term growth. Pediatrics. 2001;107:1–13
Chatfield S, Owen G, Ryley HC, et al. Neonatal screening for cystic fibrosis in Wales and the West Midlands: clinical assessment after five years of screening. Arch Dis Child. 1991;66:29–33
Grosse SD, Boyle CA, Botkin JR, et al. Newborn screening for cystic fibrosis: evaluation of benefits and risks and recommendations for state newborn screening programs. MMWR Recomm Rep. 2004;53(RR-13):1–36
Wilcken B. Ethical issues in newborn screening and the impact of new technologies. Eur J Pediatr. 2003;162:S62–S66
Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317:141–145
Miller PB, Weijer C. Rehabilitating equipoise. Kennedy Inst Ethics J. 2003;13:93–118
Mandl KD, Felt S, Larson C, Kohane IA. Newborn screening program practices in the United States: notification, research and consent. Pediatrics. 2002;109:269–273
Therrell BL, Hannon WH, Pass KA, et al. Guidelines for the retention, storage, and use of residual dried blood spot samples after newborn screening analysis: statement of the Council of Regional Networks for Genetic Services. Biochem Mol Med. 1996;57:116–124
Pollitt RJ. Newborn mass screening versus selective investigation: benefits and costs. J Inherit Metab Dis. 2001;24:299–302
Schoen EJ, Baker JC, Colby CJ, To TT. Cost-benefit analysis of universal tandem spectrometry for newborn screening. Pediatrics. 2002;110:781–786
Pandor A, Eastham J, Beverley C, Chilcott J, Paisley S. Clinical effectiveness and cost-effectiveness of neonatal screening for inborn errors of metabolism using tandem mass spectrometry: a systematic review. Health Technol Assess. 2004;8(12):1–121
Pass KA, Lane PA, Fernhoff PM, et al. US newborn screening system guidelines II: follow-up of children, diagnosis, management, and evaluation: statement of the Council of Regional Networks for Genetic Services (CORN). Pediatrics. 2000;137(suppl):S1–S46
Social Security Act, Title V, 501(a)(2)
(Jeffrey R. Botkin, MD, MP)
Unfortunately, there are significant barriers to conducting research on the efficacy of NBS programs. The basic question relevant to efficacy is whether morbidity and/or mortality rates are reduced for affected children identified through a universal screening program, compared with outcomes after clinical diagnosis or selective screening. Assessing the efficacy of universal screening requires a basis on which to make this comparison, with both short-term and long-term outcomes in mind. However, state departments of health often do not have funds to conduct evaluations of established programs beyond counts of true-positive, false-positive, and true-negative results and laboratory quality assessments. Programs typically do not make systematic attempts to identify affected children who had false-negative results or to evaluate formally the longer-term health benefits for affected children. Also, many of the conditions targeted in NBS programs are rare, meaning that most states identify only a few affected children with each condition per year. This makes outcome studies with sufficient statistical power through state-based projects virtually impossible in all except the largest states.
A more fundamental barrier to research in NBS is ethical concern regarding the use of randomized, controlled trials (RCTs), which are usually considered the standard in research design. The ethical concern arises when an apparently clinically beneficial intervention for affected children is proposed as a component of a population-based screening program. It becomes ethically problematic to propose a control arm for a study in which screening is not provided to a segment of the population, although the efficacy of the screening approach is unproven. This is a question of scale: can we be confident that interventions that are effective on a smaller project scale will be effective when implemented on a population basis? To date, the only RCT of NBS in the United States is the Wisconsin cystic fibrosis (CF) project. The Wisconsin CF project has been valuable in addressing the efficacy of CF NBS, but the project design, involving randomization, has been the focus of criticism in the lay press and ethical discussion in the professional literature.2, 3 This project is discussed in more detail below. In the absence of randomized designs, research on NBS often is observational after implementation of screening, with either historical control data or control through comparisons with similar populations without screening.
These barriers to research are formidable. Despite the use of this technology for 4 million infants per year in the United States and many more internationally in the past 3 decades, the research basis remains relatively poor. The New York State Task Force on Life and the Law stated, in its 2000 publication on genetic testing, "In fact, only a minority of newborn screening tests that are currently performed have been demonstrated formally to have both clinical validity and utility."4 Wilcken et al,5 in a 2003 publication, concluded more broadly, "Formal evidence of the clinical effectiveness of newborn screening is lacking." Currently, many states are adopting MS/MS for NBS programs, despite uncertainties regarding the sensitivities and specificities of the tests and the natural history and treatability of many conditions identified. Of the 30 conditions detectable with MS/MS, medium-chain acyl-coenzyme A dehydrogenase deficiency (MCAD) shows the greatest promise in terms of screening efficacy for children. Nevertheless, Elliman et al6 observed in 2002, "Despite international experience of screening well over a million newborn infants [for MCAD] ... there has been no report of a systematic follow-up of longer term outcome in affected infants detected by screening." Therefore, although we know that MS/MS can detect affected children and that early intervention can be lifesaving, we remain uncertain about the nature and magnitude of the longer-term benefits of actual population screening programs.
The implication of these concerns is not only that some modalities of NBS may prove to be ineffective when evaluated formally. Experience over several decades and a body of observational research lend support to many of these programs. In this era of evidence-based medicine, however, a less-than-rigorous approach to research on these large, expensive, and important public health programs is no longer appropriate.7–9 The recent history of medicine illustrates how research on popular screening programs can reveal limited efficacy; hospital admission chest films10 and breast self-examinations11 are prominent examples. Furthermore, not only does research identify screening programs that are ineffective and/or harmful,12 but formal evaluation can identify aspects of valuable programs that reduce efficacy in critical ways. Therefore, the goals of research are not only to make policy decisions to adopt or forgo population screening but also to design programs to maximize benefits and to minimize harm. This article reviews several examples of NBS that illustrate the strengths and weaknesses of the empirical foundation for screening. A proposal to develop a national framework for research on NBS is then outlined.
EXAMPLES
Hemoglobinopathies
Screening for hemoglobinopathies is a component of NBS programs in 49 states and the District of Columbia. Sickle cell disease (SCD) is the primary condition of interest, although other hemoglobinopathies are also detected.13 SCD occurs most commonly in the United States among African Americans, with an incidence at birth of 1 case per 375 infants. Other population groups are affected with incidences of 1 case per 3000 Native American infants, 1 case per 20000 Hispanic infants, and 1 case per 60000 white infants. Young children with SCD are susceptible to systemic infections with Streptococcus pneumoniae at a rate of 8 episodes per 100 person-years, with a case fatality rate of 35%.14 Early detection of SCD for an infant permits the prophylactic administration of penicillin to prevent pneumococcal infections, in addition to vaccination with Haemophilus influenzae type b and pneumococcal vaccines.
The seminal study demonstrating the efficacy of preventive therapy was published in 1986 by Gaston et al.15 This multicenter clinical trial randomized children <3 years of age with SCD to either penicillin or placebo. The study was terminated after 15 months of follow-up monitoring when results indicated substantial reductions in infection and mortality rates in the treatment group. These impressive results led to a federally sponsored consensus conference in 1987.16 The conference concluded, "The benefits of screening are so compelling that universal screening should be provided. State law should mandate the availability of these services while permitting parental refusal." Furthermore, the conference concluded, "To be effective, neonatal screening must be part of a comprehensive program for the care of sickle cell patients and their families."
The study by Gaston et al15 clearly demonstrated the efficacy of penicillin prophylaxis in reducing morbidity and mortality rates for young children with SCD who were monitored in a longitudinal research environment. However, a key question is whether the efficacy of a preventive treatment can be maintained when expanded to a population level as part of a routine public health program. It is worth emphasizing that the impressive benefits of penicillin prophylaxis demonstrated by Gaston et al15 were for children diagnosed clinically, not through NBS. Therefore, the benefits added by NBS are for the subsets of affected children who die or become seriously ill before a clinical diagnosis.
NBS is more than a test and an intervention; it must be viewed as a system involving a chain of decisions and actions from the heelstick of the infant through the laboratory, the health department, the primary care provider, and the parents to the effective delivery and maintenance of long-term treatment for the child. Any system is only as good as its weakest link, and the efficacy of all NBS programs is contingent on the integrity of this chain.
Gaston et al15 recognized that compliance with the penicillin regimen was critical to the success of prophylaxis, and one of the valuable aspects of their study was the administration of penicillin by mouth rather than injection. Prophylaxis with penicillin administered through injection is painful and requires frequent visits to the clinic, leading to poor compliance. However, compliance is also poor for many orally administered medicines or diets among both adults and children.17 A parallel problem involves physicians who are not compliant with standard-of-care measures. A report by the Centers for Disease Control and Prevention in 2000 presented data from 1998 on compliance in NBS programs for SCD in California, Illinois, and New York.18 Parents reported that 93% of the children received regular penicillin therapy and 75% had received the pneumococcal vaccine. However, 76% of physicians reported providing penicillin prophylaxis to their patients, and they estimated that only 44% of parents were compliant. Only 25% of patients had received the pneumococcal vaccine. A recent study of children with SCD receiving Medicaid in Washington and Tennessee found that enough prophylactic antibiotic was dispensed to cover only 40% of the year-long study period.19 Teach et al20 found a penicillin compliance rate of 43% among children with SCD, as measured with urine assays. Other reports also illustrated compliance problems with the SCD prophylactic regimen.21
The implication of these data is that it is difficult to know the magnitude of the benefit of NBS for SCD. The general consensus in the literature is that mortality and morbidity rates for young children are decreased with NBS,22 but acquiring definitive data to draw this conclusion is challenging for several reasons. First, there has been no formal controlled trial of NBS for SCD. Comparison with historical mortality rates can provide useful information, but historical control data may be biased because of changes in health care with time. Second, the adverse outcomes preventable with screening for SCD occur for a minority of affected children, whether or not prophylactic interventions are used; therefore, it is difficult to identify the benefits of screening without carefully tracking large populations of affected children over time. The ability to track the health outcomes of a large cohort of children is not a feature of our health care system. Third, the almost-universal use of SCD NBS makes it impossible to compare otherwise comparable states that use and do not use NBS for this condition.
A recent Cochrane review identified no RCTs of NBS for SCD.23 The reviewers concluded, "There is however evidence of benefit from early commencement of treatment in SCD, which is made possible by screening in the neonatal period. ... Information from a well designed prospective RCT of neonatal screening is desirable to make recommendations for practice. However such trials may now be considered unethical in view of the proven benefit of early prophylactic treatment with penicillin."23
The conclusion here is that NBS for SCD probably is effective in saving many lives per year, but we do not have solid data to demonstrate this efficacy or to define the magnitude of the benefits. It is too late to conduct an efficacy trial of population screening, but additional work on enhancing compliance is warranted. This is a frustrating state of affairs for an intervention that has been adopted for virtually all infants born in the United States and its territories.
Galactosemia
NBS for galactosemia is performed in every state and the District of Columbia. This condition is attributable to a genetic defect in an enzyme responsible for breaking down sugars present in milk and occurs at a rate of 1 case per 60000 neonates. Affected infants appear normal at birth but within 2 weeks can develop vomiting, irritability, hepatomegaly, jaundice, and sepsis. In the absence of early detection, death in the neonatal period is thought to occur for 20% to 30% of patients. Galactosemia among survivors is associated with developmental delays. Treatment consists of a diet low in lactose/galactose.
Enthusiasm for NBS for galactosemia developed in the 1960s and 1970s, with the identification of a valid test using dried blood spots. Clinical observations demonstrated that affected children experienced prompt resolution of symptoms with initiation of the appropriate diet. However, an important feature of galactosemia is that symptoms develop rapidly in the first 2 weeks of life, which means that the NBS system must be efficient to identify affected children before death or serious illness occurs. Evidence indicates that approximately two thirds of infants are symptomatic at the time of the report of a positive NBS result.13
As the technology developed to screen for this devastating condition, there was a strong push to initiate universal screening. Levy, an effective early advocate of NBS, wrote an article with Hammersen in 1978, in which they stated: "Galactosemia screening should be routine for all newborn infants. It is a disorder with definite and severe complications, but one in which the complications can be prevented with simple and inexpensive treatment."24 Subsequently, outcome studies showed that the situation is more complicated. In a study of 350 affected children (mean age: 9 years) published in 1990, Waggoner et al25 compared the outcomes of children diagnosed before the advent of NBS on the basis of clinical symptoms alone and children diagnosed shortly after birth by virtue of having an affected sibling. In this context, early detection on the basis of family history is a surrogate for early detection through population screening. The children diagnosed on the basis of clinical symptoms had a mean age of diagnosis of 63 days, whereas those diagnosed on the basis of family history had a mean age of diagnosis of 1 day. If early detection and treatment are effective in reducing morbidity rates, then we would expect that children diagnosed at birth would have better outcomes than children diagnosed late on the basis of clinical symptoms. Unfortunately, the results reported by Waggoner et al25 showed no statistical differences in intellectual function between these groups. Waggoner et al25 concluded, "It is clear that current methods of treatment, even if carefully followed, do little to ameliorate the long-term complications which occur in the majority of cases regardless of when treatment was begun or how successfully galactose intake was restricted." Other authors also raised concerns about our current understanding and treatment of galactosemia.26–28
The study by Waggoner et al25 did not address the efficacy of NBS for galactosemia in terms of reduced infant mortality rates. It may be that the goals of NBS for galactosemia should be stated only in terms of saving lives and not in terms of protecting intellectual function. However, because of the lack of relevant research trials, it is difficult to determine the reduction in mortality rates resulting from NBS. In an Irish study published in 1996, the authors reported 9 deaths among 62 affected children (15%) identified previously through screening, over a 20-year period.29 Eight of the deaths occurred in the first 10 years of life. This mortality rate compares with 7 unexplained infant deaths among 84 siblings of affected infants before the era of screening. With the assumption that 25% of the siblings were affected with galactosemia (an autosomal recessive condition), 21 of the 84 siblings would have been affected with galactosemia. Therefore, a mortality rate of 7 (33%) of 21 affected siblings can be estimated. This evidence suggests a reduction in mortality rates from 33% to 15% with screening, although there is potential for historical bias as well as uncertainty about the affected status of the siblings. In addition, advances in neonatal care over the past 30 years might have produced a lower contemporary mortality rate among affected infants in the absence of screening. Comparable data from the United States are not available, but a reduction in mortality rates of this magnitude would result in 12 fewer infant deaths resulting from galactosemia per year nationwide with screening, or 3 lives saved per 1 million children screened. By comparison, the sixth leading cause of infant death in the United States in 2002 was injuries, with a rate of 235 deaths per 1 million children.30
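The arithmetic behind these estimates can be retraced with a short Python sketch. It uses only figures quoted in this article (4 million births per year, an incidence of 1 case per 60000, and the Irish cohort counts); the variable names are introduced here for illustration, and the assumption that the nationwide figure was derived by applying the Irish mortality rates directly to US births is ours, not a documented method.

```python
# Figures quoted in the article.
BIRTHS_PER_YEAR = 4_000_000   # US newborns screened per year
INCIDENCE = 1 / 60_000        # galactosemia cases per birth

# Irish data: 9 deaths among 62 screened, affected children.
screened_mortality = 9 / 62

# Pre-screening estimate: 7 unexplained deaths among 84 siblings,
# assuming 25% were affected (autosomal recessive inheritance).
assumed_affected_siblings = 84 * 0.25
unscreened_mortality = 7 / assumed_affected_siblings

# Assumption: apply the Irish rates unchanged to US births.
affected_per_year = BIRTHS_PER_YEAR * INCIDENCE
deaths_averted = affected_per_year * (unscreened_mortality - screened_mortality)
averted_per_million = deaths_averted / (BIRTHS_PER_YEAR / 1_000_000)

print(f"screened mortality:   {screened_mortality:.0%}")
print(f"unscreened mortality: {unscreened_mortality:.0%}")
print(f"affected infants per year: {affected_per_year:.0f}")
print(f"deaths averted per year:   {deaths_averted:.1f}")
print(f"deaths averted per million screened: {averted_per_million:.1f}")
```

Under these assumptions the sketch gives roughly 12 to 13 deaths averted per year and about 3 per 1 million children screened, consistent with the rounded figures in the text. The result is sensitive to the assumed 25% affected fraction among siblings, which is one of the uncertainties noted above.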
This brief analysis suggests several conclusions. The early enthusiasm for the efficacy of NBS for galactosemia has not been supported by subsequent data, with respect to the preservation of cognitive function among affected children. These data on the relative efficacy of NBS were acquired 2 decades after some states initiated screening. Early intervention seems to reduce infant mortality rates for galactosemia, but the magnitude of this benefit remains uncertain. Some children still die as a result of galactosemia, despite NBS, and clinical diagnosis can be achieved in the absence of screening. Approaches other than universal NBS have been evaluated, with promising results.31 Again, the purpose of this discussion is not to suggest that NBS for galactosemia does not have value but to highlight the limited knowledge on which this enormous public health effort is based.
Neuroblastoma
Neuroblastoma is the most common extracranial tumor among young children, with an incidence of 1 case per 7000 children.32 Better prognoses are associated with younger age and earlier stages of the disease. These features of the condition suggested that presymptomatic diagnosis and early treatment might improve the mortality rate. In addition, the tumor secretes a characteristic pattern of catecholamines, which enables detection through blood testing before the emergence of clinical symptoms. Enthusiasm for a screening approach to neuroblastoma led to the development of programs in Japan in the early 1970s. However, there was sufficient uncertainty about the efficacy of screening that 2 large screening trials were conducted, one in Germany by Schilling et al33 and the other in Canada by Woods et al.34
In the German study, almost 2.6 million children were screened for neuroblastoma in 6 of 16 German states from 1995 to 2000. There were 2.1 million children who served as control subjects in the other German states. The incidence and outcomes of neuroblastoma cases were compared between the screened and control populations over the same time period. In the Canadian study, 476654 children were screened in Quebec Province between 1989 and 1994, and the results were compared with those for children in separate control populations in Ontario, Minnesota, Florida, and the Greater Delaware Valley.
The results of both studies demonstrated no benefit from population screening, in terms of mortality rates. Of particular interest was the finding that screening identified many more children than would have been predicted on the basis of the clinical incidence of the disease. This confirmed other observations that neuroblastomas can arise and then resolve spontaneously without producing symptoms. These children might be accurately labeled as having the condition, but they represent false-positive results in the sense that they are not destined to be ill with their neuroblastomas. However, children identified as having neuroblastomas are considered for treatment because physicians may not be able to discriminate between children who will become ill and those who have tumors that will resolve spontaneously. In this situation, screening may seem to lead to improved survival rates for children with neuroblastomas, compared with historical control subjects, but this is only because screening identifies a subset of asymptomatic children who would have fared well anyway.
To illustrate this point, imagine that there are 20 children in a population with neuroblastomas identified clinically. Assume treatment cures 10 children, and 10 children die as a result of their disease. Therefore, the cure rate is 50%. After the introduction of screening, 40 children with neuroblastomas are identified but, unbeknownst to the screeners, 20 cases would have resolved spontaneously. Forty children are treated for their cancer and 10 die, as observed previously. The apparent cure rate is now 75%, an improvement of 25 percentage points that might be falsely attributed to the benefits of the screening program.
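The same illustration can be expressed as a small Python sketch. The function name and its parameters are hypothetical and exist only to make the arithmetic explicit; the numbers are those of the example above.

```python
def apparent_cure_rate(diagnosed: int, deaths: int) -> float:
    """Fraction of diagnosed children who do not die of the disease."""
    if diagnosed <= 0:
        raise ValueError("diagnosed must be positive")
    return (diagnosed - deaths) / diagnosed

# Before screening: 20 clinically identified cases, 10 deaths.
before = apparent_cure_rate(diagnosed=20, deaths=10)

# After screening: the same 20 cases plus 20 that would have resolved
# spontaneously. The number of deaths is unchanged at 10.
after = apparent_cure_rate(diagnosed=40, deaths=10)

print(f"before screening: {before:.0%}")
print(f"after screening:  {after:.0%}")
print(f"apparent gain:    {(after - before) * 100:.0f} percentage points")
```

The number of deaths is identical in both scenarios; only the denominator changes, which is why the apparent gain reflects no real benefit.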
This problem is directly relevant to screening for metabolic diseases, because metabolic conditions usually entail a spectrum of severity and the spectrum may include a proportion of subjects with "abnormal" biochemical test results who will never become sick with the disease.5, 35 These neuroblastoma studies are excellent illustrations of the value of population-based research for assessment of the efficacy of screening approaches.
Another aspect of these studies worth mentioning is the use of separate but relevantly similar populations as control groups. Rather than randomize children within a region to screening versus clinical diagnosis, these studies screened an entire population and compared the outcomes with those for a comparable unscreened population during the same time period. This approach eliminates the problems with historical control data and avoids the complexities of randomizing children to 2 different groups within a population.
The final aspect of the neuroblastoma studies worth emphasizing is the ability to conduct large-scale, population-wide studies within a reasonable time frame. The German study required the collaboration of 6 of 16 states for the screening intervention and that of the remaining states for clinical data only as control populations. With uncommon conditions, no individual state could generate a sufficient number of cases to conduct such a study. Obviously this situation pertains to the United States, in which collaboration between multiple states would be essential to obtain a sufficient number of cases in a reasonable time with a population that is representative of the national population. The complexity of this interstate collaboration should not be underestimated but the obstacles should be confronted to generate high-quality data on population-based screening programs. These examples illustrate the need for a more consistent and comprehensive approach to evaluating screening tests and programs before implementation on a population-wide basis.
COLLABORATIVE RESEARCH AGENDA
A number of commentators, professional bodies, and state programs have developed criteria for deciding when a condition should be added to NBS programs.36–39 These criteria typically address the nature of the disease, the availability of a valid test, evidence for the benefits of screening, and the presence of all necessary service elements for a complete screening program. Here we are concerned primarily about the evidence for the benefits of screening. The criteria for what constitutes adequate evidence of benefit have not been established at the national level, leaving this determination up to individual state programs. The lack of established criteria and sufficient data on benefits is a central reason why there is substantial variation between states and countries regarding the conditions targeted in NBS programs.
We can imagine the confusion if drugs and devices were regulated and funded at the state level. Fortunately we have a national system of drug evaluation and approval through the Food and Drug Administration, by which drugs and devices proposed for human medical use are evaluated through a standard series of research protocols.40 Generally, human studies are pursued only after collection of data on safety in animals, when feasible. In phase I human studies, a small number of participants are involved, primarily for evaluation of safety and pharmacokinetic features. If the drug seems safe, then phase II studies involving up to several hundred participants are pursued to evaluate effectiveness. If these results are promising, then phase III studies are conducted with several hundred to thousands of individuals to assess safety, effectiveness, and dosage. Phase II studies may be performed with or without a control group, and phase III studies often use a randomized, double-blind, controlled protocol to maximize the quality of the data. With the results of these studies, the Food and Drug Administration is in a position to determine whether a drug should be licensed on a national basis for specific indications for specific population groups (such as adults or children). After approval, phase IV studies may be conducted for postmarketing evaluations of safety and efficacy in new or larger patient populations. The method is long, expensive, and by no means foolproof in terms of safety or efficacy, but it is a remarkably robust approach to the scientific assessment of drugs for medical applications.
A similar framework for the methodical evaluation of screening tests is necessary. The Institute of Medicine Committee on Assessing Genetic Risks concluded, in 1994, "The committee recommends the systematic development of basic data on the full range of genetic testing and screening services that is needed to provide a sound basis for policy development in the future."38(p306) Other authors also support a standardized approach to genetic test evaluation.41 The following is a preliminary proposal for a framework to study NBS tests and NBS programs.
There are 3 basic questions for research to address. First, do early detection and treatment of affected infants or children reduce morbidity and/or mortality rates? Second, if early detection seems beneficial, does a population-based screening approach result in net benefits to affected children, compared with alternative methods of detection? Third, if there are net benefits from population screening, are these benefits sufficient to warrant the use of public health resources for this purpose? The proposed research framework is designed to answer these questions in sequence.
Does early detection produce better outcomes? There is strong public confidence in the ability of medical science to identify signs of future disease and to act decisively to save lives.42 Screening tests have become quite prevalent in medicine, including mammograms, Pap tests, digital rectal examinations, sigmoidoscopies, amniocentesis, and measurements of blood pressure and blood glucose, cholesterol, and prostate-specific antigen levels, to name only a few. Commercial providers are now prominently advertising full-body computed tomography to the public as a method for early detection of a host of potential problems.43
However, early detection is not beneficial if medicine does not have the ability to affect the course of the disease. This is more common than popularly thought. The US Preventive Services Task Force conducts exhaustive analyses of preventive measures. The US Preventive Services Task Force supports screening for breast cancer, colon cancer, and cervical cancer, but it does not advocate population screening for cancers of the prostate, bladder, pancreas, ovaries, or lung. These decisions are based in large measure on the absence of data indicating that early detection improves outcomes.44
An inability to improve outcomes may mean that there is no ability to treat the condition at all or that there is no net benefit to early detection, as measured in a population of individuals. For some conditions, certain individuals may benefit from early detection whereas others are harmed. This can occur particularly when diagnostic or treatment procedures carry substantial risk. As noted above, this is also a concern when there is a broad spectrum of disease severity. In these situations, it may be that the individuals who are most severely affected do not benefit from early detection, those with mild or subclinical disease may be harmed by unnecessary interventions, and those with intermediate severity can obtain benefit from early detection. If clinicians cannot discriminate between these degrees of severity at the time of diagnosis, then an affected individual may experience burdensome or harmful interventions as often as an improved outcome resulting from a screening program.
STAGE I RESEARCH
For the purposes of this discussion, stage I research refers to projects that seek to determine whether early detection and intervention can improve clinical outcomes. This kind of research can be performed in a variety of ways that do not require population screening. For genetic conditions (the majority of NBS conditions), significant information can be obtained by comparing the outcomes of second affected siblings versus first affected siblings when there are discrepancies in the time of diagnosis. A first affected sibling is diagnosed often only after clinical symptoms emerge and frequently much later, after parents have pursued a "diagnostic odyssey." Once parents have been alerted to the risk for subsequent siblings, the second affected child can be diagnosed prenatally or in early infancy. If a proposed early treatment or preventive strategy is available, then a comparison of the outcomes for the first versus second (and subsequent) affected siblings provides evidence for the efficacy of the intervention. This approach can be used retrospectively, if an intervention is in use for the condition, or prospectively, through enrollment of sibling pairs at the time of diagnosis of the second affected child. The galactosemia study by Waggoner et al25 noted above is an example of this method.
A second option for stage I research is an RCT of the intervention among children diagnosed clinically. This approach is useful when the initial presentation of the condition is not devastating for the majority of children. Stated differently, it is more useful when only a subset of affected children experience the serious adverse outcome to be prevented. This is because investigators need to know which children are affected before they can be randomized, and the children cannot have already experienced the adverse outcome at the time of randomization. A good example here is penicillin prophylaxis for children with SCD. As discussed above, the study by Gaston et al15 demonstrated that children with SCD fare much better with penicillin prophylaxis, and it is an excellent example of stage I research.
A third approach to stage I research is a small-scale screening project. If there is a high-risk group that can be targeted for screening to produce a sufficient number of affected children, then an RCT of screening for the proposed intervention can be conducted. However, most conditions considered appropriate for population-wide NBS are rare enough in the general population, and not strongly associated with a particular racial or ethnic group, that targeted screening is not feasible.
An approach that is not as useful for stage I research is the use of historical data comparing children identified at a younger versus older age. Particularly when there is an association between an earlier era and the later age of diagnosis, there are many factors that may bias the comparison. More specifically, an earlier age of diagnosis in more recent eras may occur in conjunction with many other improvements in care.
The purpose of stage I research is to provide definitive data on the efficacy of early intervention. The move to population-wide screening should be made only when there is solid evidence that early detection and intervention can lead to improved morbidity and/or mortality rates.
STAGE II RESEARCH
Stage II research addresses the second question in sequence. That is, does a population-based screening approach result in net benefits to affected children, compared with alternative methods of detection? The central point here is that improvements in clinical outcomes that are demonstrable through stage I research may not be achievable in population-wide programs. Conversely, the benefits of early detection may be brought to affected children through clinical detection schemes in the absence of population screening. After stage I research, the question is how best to bring the benefits of early detection to affected children.
As noted above, NBS programs entail a series of activities from the heelstick through the laboratory to the physician, the family, and ultimately a sustained intervention. Systematic weaknesses in any of these components can seriously hamper the efficacy of the program. The benefits of the SCD program may be significantly reduced by poor compliance of physicians and parents with prophylaxis and vaccination. In galactosemia and congenital adrenal hyperplasia screening programs, the primary value of screening is largely contingent on the ability of the program to provide a rapid test result before the decline and death of some infants at 2 weeks of age. Even for PKU, which represents the paradigm program for NBS, the efficacy of the program may be impaired substantially by the inability of families to obtain the special foods or to comply with the diet over time.45
An ideal design, from a scientific perspective, is the RCT. Newborns can be randomized to receive either screening for the target condition or no screening. The morbidity and/or mortality rates for the condition can be compared between affected children identified through screening and those identified clinically. To date, the only RCTs of NBS are the Wisconsin CF project,46 initiated in 1984, and a CF screening project in the United Kingdom.47 All newborns in Wisconsin were screened after parental permission was obtained. Tests were run on all samples, but results were reviewed and disclosed for only one half of the newborns. For infants in the "unscreened" control group, results were disclosed at 4 years of age. Outcomes have been compared for the screened and unscreened groups over the past 20 years. Although the magnitude and nature of the benefits of CF NBS remain controversial, the Wisconsin trial has been critical in providing data for policy development.48
There are a number of potential problems with a randomized, controlled design from methodologic and ethical perspectives. First, if screening itself is randomized, then it may be difficult or impossible to identify all cases in the unscreened group. This creates a significant potential for bias. In the unscreened group, those who come to medical attention by virtue of clinical symptoms, or who do so at a younger age (within the window of a research project), tend to be those who are more severely affected with the condition. In contrast, a screened population would include children across the full spectrum of severity, including those who are mildly affected and those who may never become ill with the condition. Given this difference in sensitivity for detection, a comparison of outcomes for the screened and unscreened populations would show improved outcomes for the screened group even if the intervention confers no benefit. This problem is similar to "length bias"49 and is primarily a concern for studies that calculate outcome data in terms of number of deaths per affected population. This is because the denominator is expanded through screening to include mildly affected individuals, thereby decreasing the apparent death rate, compared with a group composed of only severely affected individuals.
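The denominator effect described above can be illustrated with a small calculation. The sketch below uses invented numbers (none are drawn from the studies cited here) and assumes that the intervention confers no benefit at all, so that deaths are identical in both arms; the only difference between the arms is how many affected children are identified.

```python
# All figures are hypothetical and chosen only to illustrate the bias.
SEVERE_CASES = 40   # affected children who become clinically ill
MILD_CASES = 60     # affected children who never come to clinical attention
DEATHS = 10         # deaths occur among severe cases; identical in both arms


def deaths_per_identified_case(deaths: int, identified_cases: int) -> float:
    """Death rate calculated with identified affected children as the denominator."""
    return deaths / identified_cases


# Unscreened arm: only clinically apparent (severe) cases are identified.
unscreened_rate = deaths_per_identified_case(DEATHS, SEVERE_CASES)

# Screened arm: screening also identifies the mild cases, enlarging the denominator.
screened_rate = deaths_per_identified_case(DEATHS, SEVERE_CASES + MILD_CASES)

print(f"Unscreened arm: {unscreened_rate:.0%} of identified cases died")
print(f"Screened arm:   {screened_rate:.0%} of identified cases died")
```

With these invented figures, the apparent death rate falls from 25% to 10% although the number of deaths is unchanged.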
There are at least 2 ways to address this problem. The neuroblastoma studies measured their outcomes in terms of deaths per 100000 population in the screened and unscreened populations. This eliminates the bias created by calculating death rates for the affected population. Another approach to eliminating this source of bias is to follow the approach used in the Wisconsin CF screening trial, in which blood samples were obtained for all infants but screening test results were disclosed on a randomized basis. As noted, the unscreened group had results reviewed and disclosed at 4 years of age. This allowed the research team to obtain outcome data for all affected members of the unscreened group as if they had been screened at the outset. Through this approach, subclinical cases could be identified in the unscreened group although they were never identified clinically.
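The first of these approaches can be sketched with a short calculation. The function name and all figures below are assumptions made for illustration and are not taken from the neuroblastoma studies; the example simply expresses deaths against the whole study population in each arm.

```python
def deaths_per_100k_population(deaths: int, population: int) -> float:
    """Mortality expressed per 100 000 members of the whole study population."""
    return deaths * 100_000 / population


# Hypothetical arms of equal size; figures are invented for illustration.
POPULATION_PER_ARM = 1_000_000

# Scenario A: screening confers no benefit, so deaths are equal in both arms.
no_benefit = (
    deaths_per_100k_population(10, POPULATION_PER_ARM),  # unscreened
    deaths_per_100k_population(10, POPULATION_PER_ARM),  # screened
)

# Scenario B: screening prevents some deaths.
with_benefit = (
    deaths_per_100k_population(10, POPULATION_PER_ARM),  # unscreened
    deaths_per_100k_population(6, POPULATION_PER_ARM),   # screened
)

print("No benefit:  ", no_benefit)
print("With benefit:", with_benefit)
```

Because the denominator is the whole population rather than the identified cases, the number of mild cases detected has no influence on the rate; a difference between arms appears only when the number of deaths actually differs.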
A second practical problem with RCTs arises from the low incidence of most conditions targeted by NBS. Because RCTs require at least 2 approximately equivalent groups for comparison and the groups must be of a size to allow determination with sufficient power that a significant difference exists in the outcome measure, trials must be quite large for most NBS conditions. This issue is discussed in greater detail below.
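The scale involved can be estimated with a standard textbook approximation for comparing two proportions. This is offered as a rough sketch, not as the method used in any trial discussed here. The incidence and outcome rates are hypothetical, the function names are invented for this example, and the calculation assumes that every affected child in both arms is eventually identified.

```python
import math
from statistics import NormalDist


def affected_children_per_arm(p_unscreened: float, p_screened: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size for comparing two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_unscreened * (1 - p_unscreened) + p_screened * (1 - p_screened)
    n = (z_alpha + z_power) ** 2 * variance / (p_unscreened - p_screened) ** 2
    return math.ceil(n)


def newborns_per_arm(affected_needed: int, births_per_case: int) -> int:
    """Newborns to enroll per arm so the expected number of affected children is reached."""
    return affected_needed * births_per_case


# Hypothetical inputs: an adverse outcome in 20% of unscreened vs 5% of screened
# affected children, and an incidence of 1 case per 15 000 births.
affected = affected_children_per_arm(0.20, 0.05)
newborns = newborns_per_arm(affected, births_per_case=15_000)
print(affected, "affected children per arm;", newborns, "newborns per arm")
```

Under these assumptions, 73 affected children, and therefore roughly 1.1 million newborns, would be needed in each arm. A smaller expected difference in outcomes or a rarer condition would increase the requirement further.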
The more fundamental challenges to the performance of RCTs in stage II research are ethical concerns. If early intervention has been shown to be effective in stage I research, is it ethical to randomize infants to an unscreened group?50 In addressing this question, a standard approach in research ethics asks whether there is equipoise between the 2 study groups.51, 52 That is, is there genuine uncertainty in the professional community about whether an intervention under study is preferable to an alternative? If there is general consensus that one option is preferable to another on the basis of solid scientific evidence, then randomization is not ethically acceptable. Conversely, if there is legitimate uncertainty about the best approach, then randomization is acceptable.
The NBS context and the proposed stage I/stage II approach offer a different level of complexity than most questions over equipoise. If stage I research demonstrates benefit, then there is no longer equipoise with respect to earlier intervention versus later intervention. However, equipoise may still exist with respect to whether the benefits of earlier intervention can be achieved through the complex mechanism of a NBS program. Therefore, the "test article" is the program, ie, the method of delivering the key intervention, rather than the intervention itself.
Let us look at the issue as if the research were to address the efficacy of a delivery method for an intervention that we know to be effective. Would it be ethical to compare a particular delivery method with no delivery at all? This is analogous to including a placebo group in a trial of an intervention of known efficacy. This is generally considered unethical, unless the risks to the placebo group are minor or there are other compelling scientific reasons to consider a placebo group.53 In the context of NBS, however, population-wide screening may be only one approach to early detection. For example, neonatal deaths resulting from congenital adrenal hyperplasia or galactosemia usually occur after symptoms have been present for several days. This symptomatic period offers the opportunity for clinical diagnosis. The more effective parents and the health care system are in recognizing and responding to characteristic symptoms in individual cases, the smaller the marginal benefit of a population screening approach would be. Therefore, for these conditions, screening is an alternative not to nothing (as with a placebo) but to the health care system that is designed to respond to sick infants. When there is no ability to detect an affected child before the time when permanent damage has been done, as with PKU and congenital hypothyroidism, then an unscreened group in a RCT would be analogous to a placebo group; this study design would not be appropriate. Therefore, depending on the condition, randomization need not be framed in terms of screening versus nothing but can be regarded as diagnosis through screening versus diagnosis through clinical care or selective screening.
Attempts can be made to promote efficient diagnosis through clinical care, with education programs for clinicians and perhaps parents, or through selective screening. A recent study in Toronto evaluated the possibility of screening for galactosemia by testing every infant <2 weeks of age who presented to the hospital for any reason and infants >2 weeks of age with clinical suspicion of galactosemia.31 The authors suggested that this selective approach to screening identifies severely affected infants with galactosemia as rapidly as does population screening.
The conclusion is that stage II screening RCTs are ethical in the context of NBS when population screening is compared with a potentially effective method of delivering a timely clinical diagnosis or with a more selective screening approach. RCTs of screening for conditions similar in their presentation to congenital adrenal hyperplasia, CF, or SCD can be justified, particularly in conjunction with efforts to enhance provider education about early clinical detection. In contrast, for conditions for which there is no prospect of early clinical detection before significant morbidity or death, a stage II RCT would not be justified.
If a RCT is not deemed ethical or feasible, an alternative is a cohort design. A prospective cohort design compares the outcomes of 2 groups that differ by virtue of the intervention in question. In this context, a screened cohort of children is compared with an unscreened cohort with respect to morbidity and mortality rates over time. For NBS, cohorts could consist of whole state newborn populations or populations of multiple states. There are several significant advantages to a cohort design for stage II screening research. Logistically, it is easier to implement a screening program in a population in a uniform manner. From an ethical perspective, the cohort design avoids explicitly assigning children to an unscreened group when screening could have been made available. Of course, infants in the unscreened cohort are not provided screening, but this is already the situation in the absence of research. After a new screening modality is introduced, some states take years to consider or to implement the program, whereas others are more rapid adopters, providing the opportunity for a comparison of cohorts according to state.
There are 2 principal drawbacks to the cohort design. The first is the potential bias created by comparing populations that may differ with respect to a number of variables in addition to the variable in question (screening). State populations may differ with respect to factors such as socioeconomic status, racial mixture, disease prevalence, health care services, insurance coverage, and efficacy of the NBS programs. If differences in morbidity or mortality rates are found between cohorts, there may be residual concern that the explanation does not depend on screening. The second problem inherent in the cohort design in this context involves the ability to identify and to monitor affected individuals in the unscreened cohort. A screening program establishes the population prevalence at an individual level and creates the infrastructure for tracking. Without a screening program, there is unlikely to be a comprehensive registry of affected children. Furthermore, children who might have died at a young age as a result of the condition might not have been identified as affected or their condition might not have been recorded in a retrievable manner. Many affected children may be known to subspecialty physicians in regional referral centers, but these are likely to be more severely affected children.
This latter problem of defining the affected group in the unscreened population is a fundamental challenge. If a cohort design is used for stage II research, then the unscreened cohort must be evaluated as thoroughly as possible to identify affected children. One way to address this problem adds a retrospective component to the project. In many states, residual NBS samples are stored for variable lengths of time, from months to decades.54, 55 If the analyte is stable with time, then stored NBS samples can be screened for the condition in question at a time when differences in morbidity or mortality rates between the screened and unscreened groups would be expected. Children identified as affected through retrospective screening of residual samples could be traced and their health status measured and compared with that of children identified prospectively through screening. Children who died before a diagnosis was made also could be identified through this approach. Furthermore, children who were mildly affected and never came to clinical recognition would be identified. Identification and tracking would not be 100% with this method, but this approach is likely to be much more comprehensive than other forms of identification. This approach would require the retention and availability of residual NBS samples. A discussion of the extent and content of parental information or permission for this kind of research would be important.3
For stage II research, the best approach from a scientific perspective is a RCT. However, this approach is likely to be expensive, and ethical concerns may be prominent. Nevertheless, a randomized design is justifiable in some circumstances. In other circumstances, a cohort design with retrospective screening of the initially unscreened cohort is most appropriate from both scientific and ethical perspectives.
STAGE III AND STAGE IV RESEARCH
Stage III research addresses the relative costs of a population-wide screening program. Stage II research may demonstrate benefits of screening, but decisions about implementation will depend on estimates of the costs necessary to achieve those benefits. Cost-benefit and cost-effectiveness analyses may be feasible with data obtained in stage II projects. An economic analysis may reveal that the benefits do not justify the costs of the program.
To date, cost-benefit and cost-effectiveness analyses of newborn metabolic screening have been limited by the lack of solid data on a number of variables. Economic analyses are often contingent on a variety of assumptions under the best of circumstances, including program costs, health care costs, test parameters, effectiveness of interventions, and the economic value of intangible factors such as anxiety with false-positive results and knowledge for future reproductive decision-making. Overall, the track record for cost-benefit analyses for NBS has been described as "unimpressive."56 Nevertheless, economic analyses are important for policy decisions. Several recent reports addressed MS/MS as an emerging technology. Schoen et al57 found that MS/MS compares favorably with other mass screening programs on a cost-benefit basis. In contrast, Pandor et al,58 in a systematic analysis in the United Kingdom, concluded that the evidence supports the use of MS/MS for PKU and MCAD but sufficient evidence for screening for other conditions is lacking. Both studies revealed the need for additional data to estimate actual costs and benefits. Despite the volume of literature on NBS for CF, Grosse et al48 noted that a full cost-effectiveness analysis has not been performed. Under the proposed research scheme, stage II projects could be designed to collect data on costs and benefits in a manner conducive to stage III economic analysis.
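One common summary measure in such analyses is the incremental cost-effectiveness ratio, that is, the additional cost of screening divided by the additional health benefit obtained. The sketch below is illustrative only: the cost and outcome figures are invented, the function name is hypothetical, and the choice of life-years as the outcome unit is an assumption rather than a recommendation.

```python
def incremental_cost_effectiveness_ratio(cost_screening: float, cost_comparator: float,
                                         effect_screening: float,
                                         effect_comparator: float) -> float:
    """Additional cost per additional unit of health benefit."""
    delta_effect = effect_screening - effect_comparator
    if delta_effect == 0:
        raise ValueError("No difference in effect; the ratio is undefined.")
    return (cost_screening - cost_comparator) / delta_effect


# Hypothetical totals for one birth cohort (invented figures).
icer = incremental_cost_effectiveness_ratio(
    cost_screening=3_000_000,   # program plus treatment costs with screening
    cost_comparator=1_000_000,  # treatment costs with clinical detection only
    effect_screening=500,       # life-years gained in the cohort with screening
    effect_comparator=400,      # life-years gained with clinical detection only
)
print(f"{icer:,.0f} per life-year gained")
```

Each input in such a calculation is itself an estimate, which is why stage II projects designed to measure costs and outcomes directly would strengthen the resulting ratio.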
Stage IV research involves projects designed to evaluate established programs on an ongoing basis. To date, state NBS programs have a limited ability to conduct formal program evaluations and quality assurance activities. The American Academy of Pediatrics/Health Resources and Services Administration Newborn Screening Task Force1 and the Council of Regional Networks for Genetic Services59 place a strong emphasis on the funding and development of these activities. Effective programs require periodic evaluation because of changes in test technology, program organization, population demographic features, and health care resources.
COLLABORATION
Central to the ability to conduct stage II and stage III research is a population of sufficient size. Because of the low incidence of many conditions, multiple states must collaborate with a single protocol to achieve adequate statistical power to draw timely conclusions about the efficacy of a screening strategy. Traditionally, development of multistate research collaborations has been a significant challenge, and such collaborations have not been common in the NBS literature beyond survey projects. New federal initiatives may help foster larger-scale, multistate projects.
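The arithmetic behind this point is simple. The sketch below takes the national figure of approximately 4 million births per year cited at the outset of this article; the enrollment target and the state and consortium birth counts are hypothetical, and complete participation is assumed.

```python
import math


def years_to_enroll(newborns_required: int, annual_births: int) -> int:
    """Whole years needed to enroll the required number of newborns."""
    return math.ceil(newborns_required / annual_births)


NEWBORNS_REQUIRED = 2_000_000   # hypothetical total across both study arms
SINGLE_STATE_BIRTHS = 50_000    # hypothetical mid-sized state
CONSORTIUM_BIRTHS = 500_000     # hypothetical multistate consortium
NATIONAL_BIRTHS = 4_000_000     # approximate US figure cited in this article

for label, births in [("single state", SINGLE_STATE_BIRTHS),
                      ("consortium", CONSORTIUM_BIRTHS),
                      ("national", NATIONAL_BIRTHS)]:
    print(f"{label}: {years_to_enroll(NEWBORNS_REQUIRED, births)} years")
```

With these invented figures, a single state would need 40 years to accrue the study population, whereas a consortium of the size assumed here would need 4 years.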
Title XXVI of the Children's Health Act of 2000, Screening for Heritable Disorders, establishes a program to improve the ability of states to provide newborn and child screening. The Act "authorizes the Secretary to award grants to States, or a political subdivision of a State, or a consortium of two or more States, or political subdivisions of States to enhance, improve or expand the ability of States and local public health agencies to provide screening, counseling or health care services to newborns and children having or at risk for heritable disorders."60 Furthermore, the Act "authorizes the Secretary to award grants to eligible entities to provide for the conduct of demonstration programs to evaluate the effectiveness of screening, counseling or health care services in reducing the morbidity and mortality caused by heritable disorders in newborns and children."60 To assist in this process, the Secretary of Health and Human Services recently established the Advisory Committee on Heritable Disorders in Newborns and Children. The tasks of the committee are to provide advice and recommendations to the Secretary concerning the grants and projects authorized under the Act.
In addition, Title V of the Social Security Act provided funding for 2 new initiatives. A national coordinating center for NBS is being established, and regional genetic services and NBS collaborative systems have been created. Seven national regions have been created to "enhance and support the genetics and newborn screening capacity of States across the nation by undertaking a regional approach toward addressing the maldistribution of genetic resources. These grants are expected to improve the health of children and their families by promoting the translation of genetic medicine into public health and health care services."61
These national priorities and funding opportunities represent an exciting development in the care of children. The state-level organization of NBS services is an accident of history but should not be a barrier to evidence-based analyses of the benefits and risks of these complex programs. The adoption of an accepted sequence of research protocols through multistate collaborations should greatly facilitate the translation of research into effective public health programs. Ultimately, there are no serious methodologic or ethical barriers to conducting stage I, stage II, and stage III research to demonstrate the efficacy of NBS modalities before the implementation of population-based programs.
ACKNOWLEDGMENTS
My thanks go to Mary Ann Bailey, PhD, and colleagues at the Hastings Center for supporting this work under grant 1 R01 HG02579.
FOOTNOTES
Accepted Jan 12, 2005.
No conflict of interest declared.
REFERENCES
American Academy of Pediatrics/Health Resources and Services Administration Newborn Screening Task Force. Serving the family from birth to the medical home: newborn screening: a blueprint for the future: a call for a national agenda on state newborn screening programs. Pediatrics. 2000;106:389–422
Begley S. Research involving tests on newborns highlights need for stricter ethics. Wall Street Journal. May 3, 2002
Taylor HA, Wilfond BS. Ethical issues in newborn screening research: lessons from the Wisconsin cystic fibrosis trial. J Pediatr. 2004;145:292–296
New York State Task Force on Life and the Law. Genetic Testing and Screening in the Age of Genomic Medicine. Albany, NY: Health Education Services; 2000:143
Wilcken B, Wiley V, Hammond J, Carpenter K. Screening newborns for inborn errors of metabolism by tandem mass spectrometry. N Engl J Med. 2003;348:2304–2312
Elliman D, Dezateux C, Bedford HE. Newborn and childhood screening programmes: criteria, evidence, and current policy. Arch Dis Child. 2002;87:6
Wilfond B. Screening policy for cystic fibrosis: the role of evidence. Hastings Cent Rep. 1995;25:S21–S23
Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet. 2002;359:881–884
Miller AB. The ethics, the risks and the benefits of screening. Biomed Pharmacother. 1988;42:439–442
Mant D, Fowler G. Mass screening: theory and ethics. Br Med J. 1990;300:916–918
Kosters JP, Gotzsche PC. Regular self-examination or clinical examination for early detection of breast cancer. Cochrane Database Syst Rev. 2003;(2):CD003373
Feldman W. How serious are the adverse effects of screening? J Gen Intern Med. 1990;5:S50–S53
American Academy of Pediatrics, Committee on Genetics. Newborn screening fact sheet. Pediatrics. 1996;98:473–501
Gaston MH, Verter JI, Woods G, et al. Prophylaxis with oral penicillin in children with sickle cell anemia: a randomized trial. N Engl J Med. 1986;314:1593–1599
National Institutes of Health Consensus Conference. Newborn screening for sickle cell disease and other hemoglobinopathies. JAMA. 1987;258:1205–1209
Ramgoolam A, Steele R. Formulations of antibiotics for children in primary care: effects on compliance and efficacy. Paediatr Drugs. 2002;4:323–333
Centers for Disease Control and Prevention. Update: newborn screening for sickle cell disease: California, Illinois, and New York, 1998. MMWR Morb Mortal Wkly Rep. 2000;49:729–731
Sox CM, Cooper WO, Koepsell TD, DiGiuseppe DL, Christakis DA. Provision of pneumococcal prophylaxis for publicly insured children with sickle cell disease. JAMA. 2003;290:1057–1061
Teach SJ, Lillis KA, Grossi M. Compliance with penicillin prophylaxis in patients with sickle cell disease. Arch Pediatr Adolesc Med. 1998;152:274–278
Wurst KE, Sleath BL. Physician knowledge and adherence to prescribing antibiotic prophylaxis for sickle cell disease. Int J Qual Health Care. 2004;16:245–251
Quinn CT, Rogers ZR, Buchanan GR. Survival of children with sickle cell disease. Blood. 2004;103:4023–4027
Lees CM, Davies S, Dezateux C. Neonatal screening for sickle cell disease [Cochrane review]. In: The Cochrane Library. Issue 3. Chichester, United Kingdom: John Wiley & Sons; 2004
Levy H, Hammersen G. Newborn screening for galactosemia and other galactose metabolic defects. J Pediatr. 1978;92:871–877
Waggoner DD, Buist NR, Donnell GN. Long-term prognosis in galactosemia: results of survey of 350 cases. J Inherit Metab Dis. 1990;13:802–818
Gitzelmann R, Steinmann B. Galactosemia: how does long-term treatment change the outcome? Enzyme. 1984;32:37–46
Matalon R. Galactosemia: promise, frustration and challenge. J Am Coll Nutr. 1997;16:190–191
Widhalm K, Miranda de Cruz B, Koch M. Diet does not ensure normal development in galactosemia. J Am Coll Nutr. 1997;16:204–208
Badawi N, Cahalane SF, McDonald M, et al. Galactosaemia—a controversial disorder. Screening and outcome. Ireland 1972–1992. Ir Med J. 1996;89:16–17
Shah V, Friedman S, Moore AM, Platt BA, Feigenbaum AS. Selective screening for neonatal galactosemia: an alternative approach. Acta Paediatr. 2001;90:948–949
Castleberry RP. Biology and treatment of neuroblastoma. Pediatr Clin North Am. 1997;44:919–937
Schilling FH, Spix C, Berthold F, et al. Neuroblastoma screening at one year of age. N Engl J Med. 2002;346:1047–1053
Woods WG, Gao RN, Shuster JJ, et al. Screening of infants and mortality due to neuroblastoma. N Engl J Med. 2002;346:1041–1046
Refsum H, Fedriksen A, Meyer K, Ueland P, Kase BF. Birth prevalence of homocystinuria. J Pediatr. 2004;144:830–832
Wilson JMG, Jungner G. Principles and Practice of Screening for Disease. Geneva, Switzerland: World Health Organization; 1968
National Research Council, Committee for the Study of Inborn Errors of Metabolism. Genetic Screening: Programs, Principles, and Research. Washington, DC: National Academy of Sciences; 1975
Andrews LB, Fullarton JE, Holtzman NA, Motulsky AG, eds. Assessing Genetic Risks: Implications for Health and Social Policy. Washington, DC: National Academy of Sciences; 1994
National Institutes of Health. Promoting Safe and Effective Genetic Testing in the United States: Final Report of the Task Force on Genetic Testing. Bethesda, MD: National Institutes of Health; 1997
Burke W, Atkins D, Gwinn M, et al. Genetic test evaluation: information needs of clinicians, policy makers, and the public. Am J Epidemiol. 2002;156:311–318
Russell LB. Educated Guesses: Making Policy About Medical Screening Tests. Berkeley, CA: University of California Press; 1994
National Institutes of Health, Consensus Development Panel. National Institutes of Health Consensus Development Conference Statement: phenylketonuria: screening and management, October 16–18, 2000. Pediatrics. 2001;108:972–982
Farrell PM, Kosorok MR, Rock MJ, et al. Early diagnosis of cystic fibrosis through neonatal screening prevents severe malnutrition and improves long-term growth. Pediatrics. 2001;107:1–13
Chatfield S, Owen G, Ryley HC, et al. Neonatal screening for cystic fibrosis in Wales and the West Midlands: clinical assessment after five years of screening. Arch Dis Child. 1991;66:29–33
Grosse SD, Boyle CA, Botkin JR, et al. Newborn screening for cystic fibrosis: evaluation of benefits and risks and recommendations for state newborn screening programs. MMWR Recomm Rep. 2004;53(RR-13):1–36
Wilcken B. Ethical issues in newborn screening and the impact of new technologies. Eur J Pediatr. 2003;162:S62–S66
Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317:141–145
Miller PB, Weijer C. Rehabilitating equipoise. Kennedy Inst Ethics J. 2003;13:93–118
Mandl KD, Felt S, Larson C, Kohane IA. Newborn screening program practices in the United States: notification, research and consent. Pediatrics. 2002;109:269–273
Therrell BL, Hannon WH, Pass KA, et al. Guidelines for the retention, storage, and use of residual dried blood spot samples after newborn screening analysis: statement of the Council of Regional Networks for Genetic Services. Biochem Mol Med. 1996;57:116–124
Pollitt RJ. Newborn mass screening versus selective investigation: benefits and costs. J Inherit Metab Dis. 2001;24:299–302
Schoen EJ, Baker JC, Colby CJ, To TT. Cost-benefit analysis of universal tandem spectrometry for newborn screening. Pediatrics. 2002;110:781–786
Pandor A, Eastham J, Beverley C, Chilcott J, Paisley S. Clinical effectiveness and cost-effectiveness of neonatal screening for inborn errors of metabolism using tandem mass spectrometry: a systematic review. Health Technol Assess. 2004;8(12):1–121
Pass KA, Lane PA, Fernhoff PM, et al. US newborn screening system guidelines II: follow-up of children, diagnosis, management, and evaluation: statement of the Council of Regional Networks for Genetic Services (CORN). Pediatrics. 2000;137(suppl):S1–S46
Social Security Act, Title V, 501(a)(2)
(Jeffrey R. Botkin, MD, MP)