Risk Factors and Individual Probabilities of Melanoma for Whites
Journal of Clinical Oncology
From the Channing Laboratory, Department of Medicine, Harvard Medical School and Brigham and Women’s Hospital
Departments of Biostatistics and Epidemiology, Harvard School of Public Health
Harvard Center for Cancer Prevention, Boston, MA
ABSTRACT
PURPOSE: Incidence and mortality of cutaneous melanoma are rising rapidly in the United States; therefore, identifying risk factors for melanoma and integrating them into a clinical and population risk estimation tool may help guide prevention efforts and identify participants for preventive interventions.
METHODS: We examined risk factors for melanoma in three large prospective studies of women and men. We observed 152,949 women and 25,206 men free of cancer at baseline for up to 14 years.
RESULTS: A total of 535 incident cases of invasive melanoma (444 women and 91 men) were included in the analysis. We combined the three studies to examine risk factors and to build a risk model to calculate a melanoma risk score. Older age, male sex, family history of melanoma, higher number of nevi, history of severe sunburn, and light hair color were each associated with significantly elevated risk of melanoma and were included in the final risk prediction. Participants in the highest decile of risk had a more than three-fold increase in risk of melanoma compared with those in the lowest decile (observed relative risk, 3.61; expected relative risk, 4.20). The measure of discriminatory accuracy, summarized by an age- and sex-adjusted concordance statistic of 0.62 (95% CI, 0.58 to 0.65), indicated that the model had reasonable ability to differentiate between those who will develop melanoma and those who will remain free from the disease.
CONCLUSION: We identified several risk factors for melanoma and developed statistical models with adequate performance and discriminatory accuracy.
INTRODUCTION
Incidence and mortality of cutaneous malignant melanoma have been rising in the United States.1 Therefore, identifying risk factors and estimating an individual’s risk for developing melanoma are important.
Although several case-control studies have examined numerous potential risk factors for melanoma,2 there have been few prospective studies to examine the relationship between these factors and melanoma risk.
Mathematical models relating risk factors to cancer risk can summarize the impact of multiple factors and provide insights to develop strategies for prevention of cancer. They can be clinically useful in determining primary prevention strategies and in directing the level of screening surveillance or identifying high-risk individuals who may be recruited to prevention trials. Although there have been a few statistical models to estimate individual risk of developing cancers, including breast3,4 and lung5 cancers, there are few statistical models for estimation of melanoma risk.6
We therefore identified risk factors for melanoma and constructed a statistical model for melanoma risk using data from three large cohort studies of women and men. We evaluated the model with respect to both goodness of fit and discriminatory accuracy at the individual level.
METHODS
Study Population
The Nurses’ Health Study (NHS) enrolled 121,700 female registered nurses aged 30 to 55 years in 1976. The NHS II enrolled 116,671 female registered nurses aged 25 to 42 years in 1989. The Health Professionals Follow-up Study (HPFS) included 51,529 male health professionals (dentists, veterinarians, pharmacists, optometrists, osteopathic physicians, and podiatrists) aged 40 to 75 years in 1986. These participants responded to a questionnaire about their medical histories and lifestyles. We have sent follow-up questionnaires to the cohorts biennially to collect and update information regarding individual characteristics, behaviors, and diagnosed diseases. Deaths were reported by family members or by the postal service in response to the follow-up questionnaire or through the National Death Index for nonresponders.7
For this analysis, we used 1986 as the baseline for the NHS, 1989 for the NHS II, and 1992 for the HPFS because the information on some of the risk factors for melanoma was collected in these years. Participants who reported a diagnosis of cancer other than nonmelanoma skin cancer were excluded at baseline and censored at time of diagnosis during follow-up. Cases of melanoma in situ were also excluded at baseline and during follow-up. Because most of the participants were white and melanoma is rare in other races, we excluded other races. We only included those who answered questions on traditional melanoma risk factors including age, family history of melanoma, number of nevi (moles), hair color, and history of sunburn. A total of 178,155 participants (62,755 in the NHS; 90,194 in the NHS II; and 25,206 in the HPFS) were included in the analysis.
The study was approved by the Human Research Committees at the Harvard School of Public Health and the Brigham and Women’s Hospital.
Assessment of Melanoma Risk Factors
Each cohort collected information on age; height; body weight; family history of melanoma; number of acquired melanocytic nevi on arms (legs in the NHS II) larger than 3 mm in diameter; natural hair color; history of severe sunburn; skin reaction to sun during childhood or adolescence; latitude of residence at birth, at age 15 years, and at age 30 years; use of sunscreen; physical activity; and among women, reproductive factors such as age at menarche, oral contraceptive use, parity, menopausal status, duration of menopause, and postmenopausal hormone use. Eye color was asked in the HPFS only.
Melanoma Case Confirmation
Within the study populations over the periods of follow-up for these analyses, 536 NHS, 414 NHS II, and 282 HPFS members reported a diagnosis of melanoma. Medical records were obtained for 451 of the NHS, 311 of the NHS II, and 203 of the HPFS. After excluding in situ melanomas (171 NHS, 103 NHS II, 82 HPFS) and skin conditions that were not melanoma (five NHS II), we included 252 NHS, 192 NHS II, and 91 HPFS confirmed cases of invasive melanoma in our analyses. These included superficial spreading and nodular types.
Statistical Analysis
We initially analyzed the cohorts separately but then combined the three studies to maximize power. We calculated incidence rates of melanoma according to categories of potential risk factors. Baseline information was used for all of the potential risk factors except age, parity, oral contraceptive use, menopausal status, and postmenopausal hormone use. For these variables, the value was updated in each questionnaire cycle. Participants contributed person-time from the date of return of the baseline questionnaire until a diagnosis of melanoma (invasive or in situ), a report of another cancer other than nonmelanoma skin cancer, the date of death, or date of end of follow-up (June 1, 2000 for NHS; June 1, 1999 for NHS II; and January 1, 2000 for HPFS), whichever came first. For participants who reported a diagnosis of melanoma with no medical record obtained, we censored them at the time of reported diagnosis of melanoma. Relative risk was calculated as the rate for a given category as compared with the referent category. Pooled logistic regression8 was employed to adjust for age and other risk factors simultaneously. We chose this method to make the three studies comparable (because the duration of follow-up was different across the cohorts). It has been shown that the pooled logistic regression is asymptotically equivalent to the Cox proportional hazard regression with time-varying covariates.9 The necessary conditions for this equivalence include relatively short time intervals and small probability of the outcome in the intervals, both of which are satisfied here.
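The pooled logistic approach described above treats each follow-up interval of each participant as a separate observation. The following is a minimal sketch of that data expansion, assuming 2-year intervals to match the biennial questionnaires. The function name, the record fields, and the example participant are hypothetical and are not taken from the study.

```python
def expand_person_periods(baseline_year, exit_year, event, baseline_age, interval=2):
    """Split one participant's follow-up into interval records.

    Each record carries the age at the start of the interval (a
    time-varying covariate) and an outcome flag that is 1 only in the
    interval in which the event occurs. The interval length is an
    assumption, not a value stated in the source.
    """
    records = []
    start = baseline_year
    while start < exit_year:
        end = min(start + interval, exit_year)
        is_last = end == exit_year
        records.append({
            "start": start,
            "end": end,
            "age": baseline_age + (start - baseline_year),
            "event": 1 if (event and is_last) else 0,
        })
        start = end
    return records


# Hypothetical participant: enters in 1986 at age 50, diagnosed in 1991.
rows = expand_person_periods(1986, 1991, event=True, baseline_age=50)
```

Stacking such records across all participants and fitting an ordinary logistic regression on `event` would give a pooled logistic model of the kind the text describes.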
We entered potential risk factors for melanoma in a multivariate model to construct a final risk model. The likelihood ratio test was used to decide whether to keep each covariate in the final model, using P = .05 as the cutoff. To examine whether the association between risk factors and melanoma risk was modified by other factors, we included a cross-product term of both factors expressed as continuous variables, in a multivariate model. P value for the tests for interaction was obtained from a likelihood ratio test with 1 df.
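As an illustration of the 1-df likelihood ratio test used for the interaction terms, the sketch below compares the log-likelihoods of two nested models. The log-likelihood values are invented for the example, and the function name is hypothetical. A covariate with several categories would need more than 1 df, which this sketch does not cover.

```python
import math


def lr_test_1df(loglik_reduced, loglik_full):
    """Return (statistic, p_value) for a 1-df likelihood ratio test."""
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented log-likelihoods for a model without and with one extra term.
stat, p = lr_test_1df(loglik_reduced=-3500.0, loglik_full=-3497.0)
keep_covariate = p < 0.05
```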
For the final risk model, we calculated a risk score for each participant by summing up the regression coefficient for the intercept, and the regression coefficient for the corresponding category of each variable as defined by the data reported in response to questions described above.
We also calculated 10-year risk of being diagnosed with melanoma using the final risk model to compare with Surveillance, Epidemiology, and End Results (SEER) Cancer Statistics. We calculated the risk for a person with lowest risk profile and with most common risk profile by sex and age groups (40, 50, and 60 years).
To assess the goodness of fit of the risk model, we ran the final risk model, calculated risk score using the regression coefficients from the model, and ranked the exponentiated risk score into deciles among a random half of the participants (model population). Using the regression coefficients and decile cutpoints from the model population, we calculated risk score and the observed and expected number of cases in each decile, stratified by 5-year age group and sex among the other half of the participants (test population). We calculated the χ² goodness-of-fit statistic with 9 df.
We estimated the concordance statistic, which represents the probability that for a randomly selected pair of individuals, one diseased and one nondiseased, the diseased individual has the higher estimated disease probability. Thus, it is an index of predictive discrimination based on the rank correlation between predicted and observed outcomes. In practice, the concordance statistic ranges from 0.5 to 1.0. A concordance statistic of 0.5 for a risk model means that the model producing the estimated probabilities performs no better than chance at ranking diseased and nondiseased individuals: 50% of the time the diseased person will have the higher estimated probability, while 50% of the time the nondiseased person will. A concordance statistic of 1.0 means that the model performs perfectly at ranking diseased and nondiseased individuals. The concordance statistic is equivalent to the area under a receiver-operating characteristic (ROC) curve created by computing sensitivity and specificity, with respect to true disease outcome, at all estimated risk cut points from 0 to 1.0. We calculated the concordance statistic in the test population, again to test the discriminatory accuracy of the risk model in an independent population. We used the Mann-Whitney U statistic to calculate the concordance statistic adjusting for age and sex (see Appendix).
RESULTS
We documented 535 incident melanoma cases (252 NHS, 192 NHS II, 91 HPFS) during up to 14 years of follow-up of women and men in the three cohorts. Table 1 presents the general characteristics of the cohorts at baseline. The mean age of the participants was 50 years (standard deviation, 12) during follow-up. The mean age of melanoma cases at diagnosis was 53 years (standard deviation, 12).
To identify risk factors for melanoma, we first analyzed the cohorts separately. However, because the effect estimates for strong risk factors were similar across these studies, we combined all of the cohorts to maximize power. We built models in the combined data set with predictors of melanoma risk in individual studies. Age, family history of melanoma, number of nevi, hair color, and history of severe and painful sunburn were added in the model one by one. All of the factors strongly predicted melanoma risk (P < .05 from likelihood ratio test). Age was initially examined in 5-year categories. However, because the association was linear, we used age as a continuous variable. We also added indicator variables for the different studies (NHS II and HPFS). The indicator variable for HPFS was highly significant, but the variable for NHS II did not add any further prediction in the multivariate model. Thus, the indicator variable for HPFS was kept in the final model. Because the HPFS included only men and the NHS and NHS II included only women, the indicator variable for HPFS corresponded to that for men. Based on this multivariate model, we also evaluated other potential risk factors one at a time, including, among men, skin reaction to sun; latitude of residence at birth, at age 15 years, and at age 30 years; body mass index; height; and physical activity; and among women, reproductive factors such as age at menarche, oral contraceptive use, parity, menopausal status, and duration of menopause and use of postmenopausal hormone. None of these variables made statistically significant contributions to the model. Use of sunscreen was not related to a reduced risk of melanoma. Eye color was examined only in men (not available for women) but was not related to melanoma risk. 
Because the incidence and mortality of cutaneous melanoma are rising rapidly and the baseline for the HPFS was 6 and 3 years after the baselines for the NHS and NHS II, respectively, it is possible that the higher incidence of melanoma in the HPFS was confounded by a time effect. Therefore, a variable for time effect was evaluated but was not statistically significant. These variables were omitted from further evaluation.
We also checked the interaction for the variables selected for the final model with latitude of residences at age 15 years and 30 years and skin reaction to sun, because it was hypothesized that the association between melanoma risk factors and melanoma risk may be different depending on sun sensitivity. None of the interactions were statistically significant.
Table 2 presents the final model for melanoma risk prediction. Risk scores for each individual can be calculated by adding up the regression coefficient for the intercept and for the level of each risk factor except age, and the regression coefficient for age multiplied by age in years. For example, a 50-year-old woman with no family history of melanoma, one to two episodes of severe and painful sunburn, three to five moles, and blond hair color will get a risk score of –6.9154 (–9.2523 + 0.0165 * 50 + 0.3358 + 0.9882 + 0.1879). This woman has an estimated relative risk of 4.54 (= exp[–6.9154 + 8.4273]) compared with a woman of the same age with no family history of melanoma, no episodes of severe and painful sunburn, no moles, and light brown hair color (risk score = –9.2523 + 0.0165 * 50 = –8.4273).
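The worked example above can be reproduced in a few lines. This is a sketch that uses only the coefficients quoted in the text. The three category coefficients are passed as a plain list because the text does not state which value belongs to which factor, and the function and constant names are hypothetical.

```python
import math

# Coefficients quoted in the worked example in the text.
INTERCEPT = -9.2523
AGE_COEF = 0.0165


def risk_score(age, category_coefs):
    """Intercept + age coefficient * age + sum of category coefficients."""
    return INTERCEPT + AGE_COEF * age + sum(category_coefs)


# 50-year-old woman from the example; the three values are the sunburn,
# nevi, and hair-color terms as listed in the text (mapping not stated).
example = risk_score(50, [0.3358, 0.9882, 0.1879])

# Reference woman of the same age with all factors at the referent level.
reference = risk_score(50, [])

# Relative risk is the exponentiated difference in risk scores.
relative_risk = math.exp(example - reference)
```

Because the age term is the same in both scores, it cancels in the difference, so the relative risk depends only on the sum of the category coefficients.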
We calculated the 10-year risk of being diagnosed with melanoma using our final risk model and compared them with SEER Cancer Statistics (Table 3). The 10-year risks of melanoma from our risk model for a person with most common phenotype were similar to those from SEER.10
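The text does not spell out how the 10-year risk was derived from the model. One plausible route, sketched here under stated assumptions, is to treat the risk score as the log-odds of diagnosis within one 2-year questionnaire cycle, advance age by 2 years per cycle, and combine five cycles. The interval length, the function name, and the omission of competing causes of death are all assumptions of this sketch, not details from the source.

```python
import math


def ten_year_risk(intercept, age_coef, other_terms, age,
                  interval_years=2, horizon_years=10):
    """Cumulative risk over the horizon from interval-specific probabilities.

    Assumes the score is the log-odds of diagnosis per interval and
    ignores competing risks (both are assumptions of this sketch).
    """
    disease_free = 1.0
    for k in range(horizon_years // interval_years):
        current_age = age + k * interval_years
        score = intercept + age_coef * current_age + other_terms
        p_interval = 1.0 / (1.0 + math.exp(-score))
        disease_free *= 1.0 - p_interval
    return 1.0 - disease_free


# Lowest-risk profile at age 50, using the intercept and age coefficient
# quoted in the worked example; other_terms=0.0 means referent categories.
risk_low = ten_year_risk(-9.2523, 0.0165, other_terms=0.0, age=50)

# Same age with the category terms from the worked example added.
risk_example = ten_year_risk(-9.2523, 0.0165, other_terms=1.5119, age=50)
```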
To evaluate the performance of the model, we conducted a goodness-of-fit test. For this purpose, we fit the final risk model in a random half of the participants (model population) and conducted a goodness-of-fit test among the other half of the participants (test population). We calculated the observed and expected number of melanoma cases in each study among the test population using deciles of predicted age-specific melanoma risk (decile cutpoints obtained from the model population). The observed and expected counts for each decile of risk score were then summed across studies and compared with a goodness-of-fit test (Table 4). The overall χ² statistic with 9 df was 9.28 (P = .41), showing that the model fit was adequate. Relative risks for each decile compared with the first decile were calculated for observed and expected cases. The model provided good spread in risk with an observed relative risk comparing the top to bottom decile of 3.61, and an expected relative risk for this comparison of 4.20.
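The split-sample procedure can be sketched as follows: decile cutpoints come from the model population, and observed and expected counts are tallied in the test population. The data here are simulated, the function names are hypothetical, and the Pearson form of the statistic, the sum of (O − E)²/E over deciles, is an assumption because the text does not give the formula. The sketch also omits the stratification by 5-year age group, sex, and study.

```python
import bisect
import random


def decile_cutpoints(risks):
    """Nine cutpoints that split the sorted risks into ten equal groups."""
    ordered = sorted(risks)
    n = len(ordered)
    return [ordered[(n * k) // 10] for k in range(1, 10)]


def observed_expected(risks, events, cutpoints):
    """Tally observed cases and summed predicted risk in each decile."""
    observed = [0] * 10
    expected = [0.0] * 10
    for risk, event in zip(risks, events):
        decile = bisect.bisect_right(cutpoints, risk)  # index 0..9
        observed[decile] += event
        expected[decile] += risk
    return observed, expected


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over deciles (assumed form of the statistic)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Simulated predicted risks and outcomes, for illustration only.
rng = random.Random(0)
model_risks = [rng.uniform(0.001, 0.02) for _ in range(5000)]
test_risks = [rng.uniform(0.001, 0.02) for _ in range(5000)]
test_events = [1 if rng.random() < r else 0 for r in test_risks]

cuts = decile_cutpoints(model_risks)
obs_counts, exp_counts = observed_expected(test_risks, test_events, cuts)
chi2 = pearson_chi_square(obs_counts, exp_counts)
```

With ten deciles, the resulting statistic would be compared against a χ² distribution with 9 df, as in the text.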
The predictive ability of the model to discriminate between persons who would develop melanoma and those who remain melanoma free was also evaluated using the area under the ROC curve among the test population. First, we calculated the predicted risk for each individual using the regression coefficients from the model population and stratified the data by 5-year age and sex group. Within each age and sex group, we computed the Mann-Whitney U statistic comparing the predicted risk of the cases with the predicted risk of the controls. The U statistic divided by nN, where n = number of cases within a 5-year age-sex group, and N = number of controls within a 5-year age-sex group, can be interpreted as the probability that a random case will have a higher risk score than a random control within a specific 5-year age and sex group. Alternatively, it can be interpreted as the area under the ROC curve based on our predicted model for a person in a specific age and sex group. We then computed a weighted average of the age- and sex-specific Mann-Whitney U statistics/(nN) with weight equal to the inverse variance of the age- and sex-specific statistics. Overall, the concordance statistic adjusted for age and sex was 0.62 (95% CI, 0.58 to 0.65; Table 5).
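The stratified calculation can be sketched as below. Each stratum stands for one 5-year age-sex group; the scores are invented. The variance formula in the source did not survive in this copy, so the sketch substitutes the Hanley-McNeil approximation, which is an assumption and may differ from what the authors used. Counting ties as half a pair is also an assumption. All function names are hypothetical.

```python
import math


def mann_whitney_auc(case_scores, control_scores):
    """U / (m * n): share of case-control pairs in which the case scores higher.

    Ties count as half a pair (an assumption of this sketch).
    """
    u = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                u += 1.0
            elif case == control:
                u += 0.5
    return u / (len(case_scores) * len(control_scores))


def hanley_mcneil_variance(auc, m, n):
    """Hanley-McNeil variance approximation (swapped in; not from the source)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    numerator = (auc * (1.0 - auc)
                 + (m - 1) * (q1 - auc ** 2)
                 + (n - 1) * (q2 - auc ** 2))
    return numerator / (m * n)


def pooled_auc(strata):
    """Inverse-variance weighted mean of stratum AUCs with a 95% interval."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cases, controls in strata:
        auc = mann_whitney_auc(cases, controls)
        var = hanley_mcneil_variance(auc, len(cases), len(controls))
        if var <= 0.0:
            continue  # a stratum with zero estimated variance cannot be weighted
        weighted_sum += auc / var
        weight_total += 1.0 / var
    if weight_total == 0.0:
        raise ValueError("no stratum with positive variance")
    pooled = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Two invented age-sex strata: (case scores, control scores).
strata = [
    ([0.6, 0.8, 0.4], [0.3, 0.5, 0.7, 0.2]),
    ([0.5, 0.9], [0.4, 0.6, 0.1]),
]
auc_first = mann_whitney_auc(*strata[0])
pooled, lower, upper = pooled_auc(strata)
```

Weighting by inverse variance gives more influence to strata with more cases and controls, which is why small age-sex groups contribute little to the pooled estimate.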
DISCUSSION
Using data from three large prospective studies, we constructed a statistical model for melanoma risk, incorporating traditional epidemiologic risk factors. We also stratified population subgroups based on risk estimates. The model performed adequately in terms of both goodness of fit (ie, predicting incidence in subgroups) and discriminatory accuracy at the individual level.
Each of the variables we included in the final risk model has been reported as a risk factor for melanoma. SEER data between 1999 and 2001 showed an increase in incidence of melanoma by age.10 The data also showed a similar incidence of melanoma for women and men up to 49 years of age and higher incidence of melanoma for men than women for ages older than 50 years. We also found that men had a higher incidence rate of melanoma. Family history of melanoma is a well-known risk factor for melanoma, and several candidate genes have been suggested.11 The relative risk for family history of melanoma in our final model was similar to that from a combined analysis of eight case-control studies (RR, 2.24; 95% CI, 1.76 to 2.86).12 In our study, history of multiple severe sunburns was a strong predictor of melanoma risk, consistent with a systematic review of 29 case-control studies.13 A combined evaluation of 16 case-control studies reported that the relative risks for melanoma were 2.0 (95% CI, 1.6 to 2.6) for ever sunburned and 3.7 (95% CI, 2.5 to 5.4) for highest sunburn exposure, comparable to our estimates.14 Several studies have reported a strong positive association between number of nevi and melanoma risk.2 We also observed that number of nevi is directly related to melanoma risk. Light or red hair color is a well-established risk factor for melanoma.2 We confirmed that hair color strongly predicted melanoma risk in our populations. In a combined analysis of 10 case-control studies,15 the relative risks for melanoma were 1.49 for light brown, 1.84 for blond, and 2.38 for red hair compared with black or dark brown hair, again comparable to our findings.
There have been risk models for other cancers. The concordance statistic was 0.62 for a risk model for breast cancer [4,16], 0.60 for ovarian cancer [17], and 0.72 for lung cancer among smokers [5]. Therefore, the concordance statistic from our risk model for melanoma was comparable to those for other cancer sites.
The risk estimates from a statistical model may help identify and counsel individuals at elevated risk of melanoma, raising awareness for an individual’s risk, which might lead to risk-minimizing behaviors. Alternatively, the model prediction may be useful to define a high-risk population to include in prevention trials or to target screening and prevention efforts. A model for melanoma risk with adequate discriminatory accuracy at the individual level may reduce the volume of surgical excisions for premalignant lesions, by allowing a more precise targeting of a high-risk population. Further evaluation is necessary to determine if such an application is feasible.
Our study had several strengths. We used data from three large prospective cohort studies, which minimized the possibility of biased recall of exposure information. We had a wide range of potential risk factors for melanoma and have evaluated them for building the final risk model.
Our study also had limitations. We did not have any direct measure of cumulative sun exposure during lifetime or during childhood or adolescence; such exposure is very difficult to measure. We did examine the latitude of residence at birth, at age 15 years, and at age 30 years as a measure of sun exposure in early life, but residence did not predict melanoma risk. Although the goodness-of-fit check gave some reassurance of the robustness of our model, it did not represent independent verification, and our model should still be verified in a distinct population. Because we only examined the risk factors in a white population, the results may not generalize to other racial groups. However, melanoma is relatively rare in other races.10
In conclusion, we found that older age, male sex, family history of melanoma, higher number of nevi, history of severe sunburn, and light hair color were independently associated with elevated risk of melanoma. The model we created based on these risk factors will be useful to estimate an individual’s risk of developing melanoma and to identify high-risk populations.
Appendix: Calculation of Area Under the ROC Curve (AUC) for Incidence Modeling
The Mann-Whitney U statistic ($U_i$) comparing risk scores for melanoma cases versus controls was computed separately for each 5-year age-sex group. $U_i$ is the number of case-control pairs in which the case has a higher risk score than the control. We computed $\hat{\theta}_i = U_i/(m_i n_i)$ as an estimate of the probability that a random case will have a higher score than a random control in the $i$th age-sex group, where $m_i$ = number of cases and $n_i$ = number of controls in the $i$th age-sex group. The variance of $\hat{\theta}_i$ is given by [expression missing in source]. We then computed a weighted average of the $\hat{\theta}_i$ as a global estimate of the AUC given by
$$\widehat{\mathrm{AUC}} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i},$$
where $w_i = 1/\mathrm{Var}(\hat{\theta}_i)$.
Confidence limits for the AUC are given by [expression missing in source].
Authors' Disclosures of Potential Conflicts of Interest
The authors indicated no potential conflicts of interest.
NOTES
Supported by Harvard Skin SPORE and research grant CA87969 from the National Institutes of Health.
Authors' disclosures of potential conflicts of interest are found at the end of this article.
REFERENCES
1. Howe HL, Wingo PA, Thun MJ, et al: Annual report to the nation on the status of cancer (1973 through 1998), featuring cancers with recent increasing trends. J Natl Cancer Inst 93:824-842, 2001
2. Armstrong BK, English DR: Cutaneous malignant melanoma, in Schottenfeld D, Fraumeni JF Jr (eds): Cancer epidemiology and prevention. Oxford, England, Oxford University Press, 1996, pp 1282-1312
3. Gail MH, Brinton LA, Byar DP, et al: Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81:1879-1886, 1989
4. Colditz GA, Rosner B: Cumulative risk of breast cancer to age 70 years according to risk factor status: Data from the Nurses' Health Study. Am J Epidemiol 152:950-964, 2000
5. Bach PB, Kattan MW, Thornquist MD, et al: Variations in lung cancer risk among smokers. J Natl Cancer Inst 95:470-478, 2003
6. Chaudru V, Chompret A, Bressac-de Paillerets B, et al: Influence of genes, nevi, and sun sensitivity on melanoma risk in a family sample unselected by family history and in melanoma-prone families. J Natl Cancer Inst 96:785-795, 2004
7. Stampfer MJ, Willett WC, Speizer FE, et al: Test of the National Death Index. Am J Epidemiol 119:837-839, 1984
8. Cupples LA, D'Agostino RB, Anderson K, et al: Comparison of baseline and repeated measure covariate techniques in the Framingham Heart Study. Stat Med 7:205-218, 1988
9. D'Agostino RB, Lee ML, Belanger AJ, et al: Relation of pooled logistic regression to time dependent Cox regression analysis: The Framingham Heart Study. Stat Med 9:1501-1515, 1990
10. Ries LAG, Eisner MP, Kosary CL, et al: SEER Cancer Statistics Review, 1975-2001. Bethesda, MD, National Cancer Institute, 2004
11. Green A, Trichopoulos D: Skin cancer, in Adami HO, Hunter D, Trichopoulos D (eds): Textbook of Cancer Epidemiology. New York, NY, Oxford University Press, 2002, pp 281-300
12. Ford D, Bliss JM, Swerdlow AJ, et al: Risk of cutaneous melanoma associated with a family history of the disease. The International Melanoma Analysis Group (IMAGE). Int J Cancer 62:377-381, 1995
13. Elwood JM, Jopson J: Melanoma and sun exposure: An overview of published studies. Int J Cancer 73:198-203, 1997
14. Whiteman D, Green A: Melanoma and sunburn. Cancer Causes Control 5:564-572, 1994
15. Bliss JM, Ford D, Swerdlow AJ, et al: Risk of cutaneous melanoma associated with pigmentation characteristics and freckling: Systematic overview of 10 case-control studies. The International Melanoma Analysis Group (IMAGE). Int J Cancer 62:367-376, 1995
16. Colditz GA, Rosner BA, Chen WY, et al: Risk factors for breast cancer according to estrogen and progesterone receptor status. J Natl Cancer Inst 96:218-228, 2004
17. Rosner BA, Colditz GA, Webb PM, et al: Mathematical models of ovarian cancer incidence in the Nurses' Health Study. Epidemiology (in press)
(Eunyoung Cho, Bernard A. )
Departments of Biostatistics and Epidemiology, Harvard School of Public Health
Harvard Center for Cancer Prevention, Boston, MA
ABSTRACT
PURPOSE: Incidence and mortality of cutaneous melanoma is rising rapidly in the United States; therefore, identifying risk factors for melanoma and integrating them into a clinical and population risk estimation tool may help guide prevention efforts and identify participants for preventive interventions.
METHODS: We examined risk factors for melanoma in three large prospective studies of women and men. We observed 152,949 women and 25,206 men free of cancer at baseline for up to 14 years.
RESULTS: A total of 535 incident cases of invasive melanoma (444 women and 91 men) were included in the analysis. We combined the three studies to examine risk factors and to build a risk model to calculate melanoma risk score. Older age, male sex, family history of melanoma, higher number of nevi, history of severe sunburn, and light hair color were each associated with significantly elevated risk of melanoma and were included in the final risk prediction. Participants at the highest decile of risk had a more than three-fold increase in risk of melanoma compared with those in the lowest decile (observed relative risk, 3.61; expected relative risk, 4.20). The measure of discriminatory accuracy as summarized by an age-and sex-adjusted concordance statistic of 0.62 (95% CI, 0.58 to 0.65) indicated that the model had reasonable ability to differentiate those who will develop melanoma and those who will remain free from the disease.
CONCLUSION: We identified several risk factors for melanoma and developed statistical models with adequate performance and discriminatory accuracy.
INTRODUCTION
Incidence and mortality of cutaneous malignant melanoma have been rising in the United States.1 Therefore, identifying risk factors and estimating an individual’s risk for developing melanoma are important.
Although several case-control studies have examined numerous potential risk factors for melanoma,2 there have been few prospective studies to examine the relationship between these factors and melanoma risk.
Mathematical models relating risk factors to cancer risk can summarize the impact of multiple factors and provide insights to develop strategies for prevention of cancer. They can be clinically useful in determining primary prevention strategies and in directing the level of screening surveillance or identifying high-risk individuals who may be recruited to prevention trials. Although there have been a few statistical models to estimate individual risk of developing cancers, including breast3,4 and lung5 cancers, there are few statistical models for estimation of melanoma risk.6
We therefore identified risk factors for melanoma and constructed a statistical model for melanoma risk using data from three large cohort studies of women and men. We evaluated the model with respect to both goodness of fit and discriminatory accuracy at the individual level.
METHODS
Study Population
The Nurses’ Health Study (NHS) enrolled 121,700 female registered nurses aged 30 to 55 years in 1976. The NHS II enrolled 116,671 female registered nurses aged 25 to 42 years in 1989. The Health Professionals Follow-up Study (HPFS) included 51,529 male health professionals (dentists, veterinarians, pharmacists, optometrists, osteopathic physicians, and podiatrists) aged 40 to 75 years in 1986. These participants responded to a questionnaire about their medical histories and lifestyles. We have sent follow-up questionnaires to the cohorts biennially to collect and update information regarding individual characteristics, behaviors, and diagnosed diseases. Deaths were reported by family members or by the postal service in response to the follow-up questionnaire or through the National Death Index for nonresponders.7
For this analysis, we used 1986 as a baseline for the NHS and 1992 for the HPFS because the information on some of the risk factors for melanoma was collected in these years. Participants who have reported diagnosis of cancer other than nonmelanoma skin cancer were excluded at baseline and censored at time of diagnosis during follow-up. Cases of melanoma in situ were also excluded at baseline and during follow-up. Because most of the participants were white and melanoma is rare in other races, we excluded other races. We only included those who answered questions on traditional melanoma risk factors including age, family history of melanoma, number of nevi (moles), hair color, and history of sunburn. A total of 178,155 participants (62,755 in the NHS; 90,194 in the NHS II; and 25,206 in the HPFS) were included in the analysis.
The study was approved by the Human Research Committees at the Harvard School of Public Health and the Brigham and Women’s Hospital.
Assessment of Melanoma Risk Factors
Each cohort collected information on age; height; body weight; family history of melanoma; number of acquired melanocytic nevi on arms (legs in the NHS II) larger than 3 mm in diameter; natural hair color; history of severe sunburn; skin reaction to sun during childhood or adolescence; latitude of residence at birth, at age 15 years, and at age 30 years; use of sunscreen; physical activity; and among women, reproductive factors such as age at menarche, oral contraceptive use, parity, menopausal status, duration of menopause, and postmenopausal hormone use. Eye color was asked in the HPFS only.
Melanoma Case Confirmation
Within the study populations over the periods of follow-up for these analyses, 536 NHS, 414 NHS II, and 282 HPFS members reported a diagnosis of melanoma. Medical records were obtained for 451 of the NHS, 311 of the NHS II, and 203 of the HPFS. After excluding in situ melanomas (171 NHS, 103 NHS II, 82 HPFS) and skin conditions that were not melanoma (five NHS II), we included 252 NHS, 192 NHS II, and 91 HPFS confirmed cases of invasive melanoma in our analyses. These included superficial spreading and nodular types.
Statistical Analysis
We initially analyzed the cohorts separately but then combined the three studies to maximize power. We calculated incidence rates of melanoma according to categories of potential risk factors. Baseline information was used for all of the potential risk factors except age, parity, oral contraceptive use, menopausal status, and postmenopausal hormone use; for these variables, the value was updated in each questionnaire cycle. Participants contributed person-time from the date of return of the baseline questionnaire until a diagnosis of melanoma (invasive or in situ), a report of any other cancer except nonmelanoma skin cancer, the date of death, or the end of follow-up (June 1, 2000 for NHS; June 1, 1999 for NHS II; and January 1, 2000 for HPFS), whichever came first. Participants who reported a diagnosis of melanoma for which no medical record was obtained were censored at the time of the reported diagnosis. Relative risk was calculated as the rate for a given category divided by the rate for the referent category. Pooled logistic regression8 was employed to adjust for age and other risk factors simultaneously. We chose this method to make the three studies comparable, because the duration of follow-up differed across the cohorts. It has been shown that pooled logistic regression is asymptotically equivalent to Cox proportional hazards regression with time-varying covariates.9 The necessary conditions for this equivalence include relatively short time intervals and a small probability of the outcome in each interval, both of which are satisfied here.
We entered potential risk factors for melanoma into a multivariate model to construct a final risk model. The likelihood ratio test was used to decide whether to keep each covariate in the final model, using P = .05 as the cutoff. To examine whether the association between a risk factor and melanoma risk was modified by another factor, we included a cross-product term of the two factors, expressed as continuous variables, in a multivariate model. The P value for each test for interaction was obtained from a likelihood ratio test with 1 df.
For the final risk model, we calculated a risk score for each participant by summing up the regression coefficient for the intercept, and the regression coefficient for the corresponding category of each variable as defined by the data reported in response to questions described above.
We also calculated the 10-year risk of being diagnosed with melanoma using the final risk model to compare with Surveillance, Epidemiology, and End Results (SEER) Cancer Statistics. We calculated the risk for a person with the lowest risk profile and for a person with the most common risk profile, by sex and age group (40, 50, and 60 years).
To assess the goodness of fit of the risk model, we ran the final risk model, calculated risk score using the regression coefficients from the model, and ranked the exponentiated risk score into deciles among a random half of the participants (model population). Using the regression coefficients and decile cutpoints from the model population, we calculated risk score and the observed and expected number of cases in each decile, stratified by 5-year age group and sex among the other half of the participants (test population). We calculated the χ² goodness-of-fit statistic with 9 df.
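The decile comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the statistic is taken to be the usual sum of (observed − expected)² / expected over deciles, the expected count in a decile is taken to be the sum of predicted probabilities, and the age and sex stratification used in the study is omitted for brevity. The input values are invented, not study data.

```python
from bisect import bisect_right

def observed_expected_by_bin(predicted, outcomes, cutpoints):
    """Tally observed cases and expected cases (sum of predicted
    probabilities) in each risk bin defined by sorted cutpoints.
    With nine cutpoints from the model population this gives deciles."""
    n_bins = len(cutpoints) + 1
    observed = [0] * n_bins
    expected = [0.0] * n_bins
    for p, y in zip(predicted, outcomes):
        k = bisect_right(cutpoints, p)  # index of the bin containing p
        observed[k] += y
        expected[k] += p
    return observed, expected

def chi_square(observed, expected):
    """Assumed form of the statistic: sum over bins of (O - E)^2 / E.
    Bins with zero expected count are skipped to avoid division by zero."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Hypothetical test-population data: predicted risks and 0/1 outcomes.
predicted = [0.1, 0.2, 0.6, 0.8]
outcomes = [0, 1, 1, 1]
obs, exp = observed_expected_by_bin(predicted, outcomes, cutpoints=[0.5])
stat = chi_square(obs, exp)
```

With ten bins, the resulting statistic would be compared against a χ² distribution with 9 df, as in the text.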
We estimated the concordance statistic, which represents the probability that for a randomly selected pair of individuals, one diseased and one nondiseased, the diseased individual has the higher estimated disease probability. Thus, it is an index of predictive discrimination based on the rank correlation between predicted and observed outcomes. In practice, the concordance statistic ranges from 0.5 to 1.0. A concordance statistic of 0.5 for a risk model means that the model producing the estimated probabilities performs no better than chance at ranking diseased and nondiseased individuals: 50% of the time the diseased person will have the higher estimated probability, and 50% of the time the nondiseased person will. A concordance statistic of 1.0 means that the model performs perfectly at ranking diseased and nondiseased individuals. The concordance statistic is equivalent to the area under a receiver-operating characteristic (ROC) curve created by computing sensitivity and specificity, with respect to true disease outcome, at all estimated risk cut points from 0 to 1.0. We calculated the concordance statistic in the test population, again to test the discriminatory accuracy of the risk model in an independent population. We used the Mann-Whitney U statistic to calculate the concordance statistic adjusting for age and sex (see Appendix).
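As a small illustration of this definition, the Python sketch below counts case-control pairs directly. The function name is hypothetical, the scores are invented, and counting tied pairs as one half is a common convention assumed here rather than something stated in the text.

```python
def concordance(case_scores, control_scores):
    """Fraction of case-control pairs in which the case has the higher
    predicted risk. Tied pairs count as 0.5 (an assumed convention)."""
    concordant = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Invented predicted risks for two cases and three controls.
c_stat = concordance([0.9, 0.4], [0.1, 0.5, 0.3])  # 5 of 6 pairs are concordant
```

A model that ranks every case above every control gives 1.0, and identical scores for everyone give 0.5, matching the two reference points described above.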
RESULTS
We documented 535 incident melanoma cases (252 NHS, 192 NHS II, 91 HPFS) during up to 14 years of follow-up of women and men in the three cohorts. Table 1 presents the general characteristics of the cohorts at baseline. The mean age of the participants was 50 years (standard deviation, 12) during follow-up. The mean age of melanoma cases at diagnosis was 53 years (standard deviation, 12).
To identify risk factors for melanoma, we first analyzed the cohorts separately. However, because the effect estimates for strong risk factors were similar across these studies, we combined all of the cohorts to maximize power. We built models in the combined data set using factors that predicted melanoma risk in the individual studies. Age, family history of melanoma, number of nevi, hair color, and history of severe and painful sunburn were added to the model one by one. All of the factors significantly predicted melanoma risk (P < .05 from likelihood ratio test). Age was initially examined in 5-year categories. However, because the association was linear, we used age as a continuous variable. We also added indicator variables for the different studies (NHS II and HPFS). The indicator variable for HPFS was highly significant, but the variable for NHS II did not add any further prediction in the multivariate model. Thus, the indicator variable for HPFS was kept in the final model. Because the HPFS included only men and the NHS and NHS II included only women, the indicator variable for HPFS corresponded to that for men. Based on this multivariate model, we also evaluated other potential risk factors one at a time, including, among men, skin reaction to sun; latitude of residence at birth, at age 15 years, and at age 30 years; body mass index; height; and physical activity; and among women, reproductive factors such as age at menarche, oral contraceptive use, parity, menopausal status, duration of menopause, and use of postmenopausal hormones. None of these variables made statistically significant contributions to the model. Use of sunscreen was not related to a reduced risk of melanoma. Eye color was examined only in men (not available for women) but was not related to melanoma risk.
Because the incidence and mortality of cutaneous melanoma are rising rapidly and the baseline for the HPFS was 6 and 3 years after the baselines for the NHS and NHS II, it is possible that the higher incidence of melanoma in the HPFS might be confounded by a time effect. Therefore, a variable for time effect was evaluated but was not statistically significant. These variables were omitted from further evaluation.
We also checked the interaction for the variables selected for the final model with latitude of residences at age 15 years and 30 years and skin reaction to sun, because it was hypothesized that the association between melanoma risk factors and melanoma risk may be different depending on sun sensitivity. None of the interactions were statistically significant.
Table 2 presents the final model for melanoma risk prediction. Risk scores for each individual can be calculated by adding up the regression coefficient for the intercept and for the level of each risk factor except age, and the regression coefficient for age multiplied by age in years. For example, a 50-year-old woman with no family history of melanoma, one to two episodes of severe and painful sunburn, three to five moles, and blond hair color will get a risk score of –6.9154 (–9.2523 + 0.0165 * 50 + 0.3358 + 0.9882 + 0.1879). This woman has an estimated relative risk of 4.54 (= exp[–6.9154 + 8.4273]) compared with a woman of the same age with no family history of melanoma, no episodes of severe and painful sunburn, no moles, and light brown hair color (risk score = –9.2523 + 0.0165 * 50 = –8.4273).
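The arithmetic of this worked example can be reproduced with a short Python sketch. Only the coefficients quoted in the example are used (Table 2 itself is not reproduced here), and the function names are illustrative rather than part of the published model.

```python
import math

INTERCEPT = -9.2523   # intercept quoted in the worked example
AGE_COEF = 0.0165     # coefficient per year of age quoted in the example

def risk_score(age_years, category_coefs):
    """Intercept + age coefficient * age + sum of the coefficients for
    the categories that apply to this person."""
    return INTERCEPT + AGE_COEF * age_years + sum(category_coefs)

def relative_risk(score, reference_score):
    """Relative risk is the exponentiated difference in risk scores."""
    return math.exp(score - reference_score)

# 50-year-old woman with the three category coefficients from the text.
woman = risk_score(50, [0.3358, 0.9882, 0.1879])
# Reference woman of the same age with all factors at the referent level.
reference = risk_score(50, [])

print(round(woman, 4), round(reference, 4),
      round(relative_risk(woman, reference), 2))
# prints: -6.9154 -8.4273 4.54
```

Because the intercept and the age term cancel when two people of the same age are compared, the relative risk depends only on the category coefficients that differ between them.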
We calculated the 10-year risk of being diagnosed with melanoma using our final risk model and compared it with SEER Cancer Statistics (Table 3). The 10-year risks of melanoma from our risk model for a person with the most common phenotype were similar to those from SEER.10
To evaluate the performance of the model, we conducted a goodness-of-fit test. For this purpose, we fit the final risk model in a random half of the participants (model population) and conducted a goodness-of-fit test among the other half of the participants (test population). We calculated the observed and expected number of melanoma cases in each study among the test population using deciles of predicted age-specific melanoma risk (decile cutpoints obtained from the model population). The observed and expected counts for each decile of risk score were then summed across studies and compared with a goodness-of-fit test (Table 4). The overall χ² test statistic with 9 df was 9.28 (P = .41), showing that the model fit was adequate. Relative risks for each decile compared with the first decile were calculated for observed and expected cases. The model provided good spread in risk with an observed relative risk comparing the top to bottom decile of 3.61, and an expected relative risk for this comparison of 4.20.
The predictive ability of the model to discriminate between persons who would develop melanoma and those who remain melanoma free was also evaluated using the area under the ROC curve among the test population. First, we calculated the predicted risk for each individual using the regression coefficients from the model population and stratified the data by 5-year age and sex group. Within each age and sex group, we computed the Mann-Whitney U statistic comparing the predicted risk of the cases with the predicted risk of the controls. The U statistic divided by nN, where n = number of cases within a 5-year age-sex group, and N = number of controls within a 5-year age-sex group, can be interpreted as the probability that a random case will have a higher risk score than a random control within a specific 5-year age and sex group. Alternatively, it can be interpreted as the area under the ROC curve based on our predicted model for a person in a specific age and sex group. We then computed a weighted average of the age- and sex-specific Mann-Whitney U statistics/(nN) with weight equal to the inverse variance of the age- and sex-specific statistics. Overall, the concordance statistic adjusted for age and sex was 0.62 (95% CI, 0.58 to 0.65; Table 5).
DISCUSSION
Using data from three large prospective studies, we constructed a statistical model for melanoma risk, incorporating traditional epidemiologic risk factors. We also stratified population subgroups based on risk estimates. The model performed adequately in terms of both goodness of fit (ie, predicting incidence in subgroups) and discriminatory accuracy at the individual level.
Each of the variables we included in the final risk model has been reported as a risk factor for melanoma. SEER data between 1999 and 2001 showed an increase in incidence of melanoma by age.10 The data also showed a similar incidence of melanoma for women and men up to 49 years of age and higher incidence of melanoma for men than women for ages older than 50 years. We also found that men had a higher incidence rate of melanoma. Family history of melanoma is a well-known risk factor for melanoma, and several candidate genes have been suggested.11 The relative risk for family history of melanoma in our final model was similar to that from a combined analysis of eight case-control studies (RR, 2.24; 95% CI, 1.76 to 2.86).12 In our study, history of multiple severe sunburns was a strong predictor of melanoma risk, consistent with a systematic review of 29 case-control studies.13 A combined evaluation of 16 case-control studies reported that the relative risks for melanoma were 2.0 (95% CI, 1.6 to 2.6) for ever sunburned and 3.7 (95% CI, 2.5 to 5.4) for highest sunburn exposure, comparable to our estimates.14 Several studies have reported a strong positive association between number of nevi and melanoma risk.2 We also observed that number of nevi is directly related to melanoma risk. Light or red hair color is a well-established risk factor for melanoma.2 We confirmed that hair color strongly predicted melanoma risk in our populations. In a combined analysis of 10 case-control studies,15 the relative risks for melanoma were 1.49 for light brown, 1.84 for blond, and 2.38 for red hair compared with black or dark brown hair, again comparable to our findings.
Risk models have been developed for other cancers. The concordance statistic for a risk model for breast cancer was 0.62 (references 4 and 16); that for ovarian cancer was 0.60 (reference 17); and that for lung cancer among smokers was 0.72 (reference 5). Therefore, the concordance statistic from our risk model for melanoma was comparable to those for other cancer sites.
The risk estimates from a statistical model may help identify and counsel individuals at elevated risk of melanoma, raising awareness for an individual’s risk, which might lead to risk-minimizing behaviors. Alternatively, the model prediction may be useful to define a high-risk population to include in prevention trials or to target screening and prevention efforts. A model for melanoma risk with adequate discriminatory accuracy at the individual level may reduce the volume of surgical excisions for premalignant lesions, by allowing a more precise targeting of a high-risk population. Further evaluation is necessary to determine if such an application is feasible.
Our study had several strengths. We used data from three large prospective cohort studies, which minimized the possibility of biased recall of exposure information. We had a wide range of potential risk factors for melanoma and have evaluated them for building the final risk model.
Our study also had limitations. We did not have any direct measure of cumulative sun exposure during lifetime or during childhood or adolescence; however, this is very difficult to measure. We did examine the latitude of residence at birth, at age 15 years, and at age 30 years as a measure of sun exposure in early life, but residence did not predict melanoma risk. Although we checked goodness of fit for our risk model, which gave some reassurance of its robustness, this did not represent independent verification, and our model should still be verified in a distinct population. Because we examined the risk factors only in a white population, the results may not generalize to other racial groups. However, melanoma is relatively rare in other races.10
In conclusion, we found that older age, male sex, family history of melanoma, higher number of nevi, history of severe sunburn, and light hair color were independently associated with elevated risk of melanoma. The model we created based on these risk factors may be useful to estimate an individual's risk of developing melanoma and to identify high-risk populations.
Appendix: Calculation of Area Under the ROC Curve (AUC) for Incidence Modeling
The Mann-Whitney U statistic ($U_i$) comparing risk scores for melanoma cases versus controls was computed separately for each 5-year age-sex group. $U_i$ is the number of case-control pairs in which the case has a higher risk score than the control. We computed $\widehat{AUC}_i = U_i/(m_i n_i)$ as an estimate of the probability that a random case will have a higher score than a random control in the $i$th age-sex group, where $m_i$ = number of cases and $n_i$ = number of controls in the $i$th age-sex group. The variance of $\widehat{AUC}_i$ is given by [expression missing from the source text]. We then computed a weighted average of the $\widehat{AUC}_i$ as a global estimate of the AUC given by
$$\widehat{AUC} = \frac{\sum_i w_i \, \widehat{AUC}_i}{\sum_i w_i}$$
where $w_i = 1/\mathrm{Var}(\widehat{AUC}_i)$.
Confidence limits for the AUC are given by [expression missing from the source text].
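A minimal Python sketch of the stratified calculation follows. Two parts are assumptions rather than the authors' stated formulas: the stratum variance is taken to be $(m_i + n_i + 1)/(12\, m_i n_i)$, the null-hypothesis variance of $U_i/(m_i n_i)$, and the confidence limits use the usual normal approximation for an inverse-variance weighted mean, $\widehat{AUC} \pm 1.96/\sqrt{\sum_i w_i}$. Function names and data are hypothetical, and tied pairs are counted as one half.

```python
import math

def stratum_auc(case_scores, control_scores):
    """U_i / (m_i * n_i) for one age-sex stratum; ties count as 0.5."""
    m, n = len(case_scores), len(control_scores)
    u = sum((c > k) + 0.5 * (c == k)
            for c in case_scores for k in control_scores)
    return u / (m * n), m, n

def pooled_auc(strata, z=1.96):
    """Inverse-variance weighted average of stratum AUCs with
    normal-approximation confidence limits.
    strata: list of (case_scores, control_scores) pairs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cases, controls in strata:
        if not cases or not controls:
            continue  # a stratum without both groups carries no information
        auc_i, m, n = stratum_auc(cases, controls)
        var_i = (m + n + 1) / (12.0 * m * n)  # ASSUMED null-hypothesis variance
        w_i = 1.0 / var_i
        weighted_sum += w_i * auc_i
        weight_total += w_i
    auc = weighted_sum / weight_total
    half_width = z / math.sqrt(weight_total)  # Var(weighted mean) = 1 / sum(w_i)
    return auc, auc - half_width, auc + half_width

# Two invented strata, each given as (case scores, control scores).
strata = [([0.9], [0.1, 0.2]), ([0.3, 0.8], [0.5])]
auc, lower, upper = pooled_auc(strata)
```

With realistic stratum sizes the limits are far narrower than in this toy example, where each stratum holds only three people.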
Authors' Disclosures of Potential Conflicts of Interest
The authors indicated no potential conflicts of interest.
NOTES
Supported by Harvard Skin SPORE and research grant CA87969 from the National Institutes of Health.
Authors' disclosures of potential conflicts of interest are found at the end of this article.
REFERENCES
1. Howe HL, Wingo PA, Thun MJ, et al: Annual report to the nation on the status of cancer (1973 through 1998), featuring cancers with recent increasing trends. J Natl Cancer Inst 93:824-842, 2001
2. Armstrong BK, English DR: Cutaneous malignant melanoma, in Schottenfeld D, Fraumeni JF Jr (eds): Cancer Epidemiology and Prevention. Oxford, England, Oxford University Press, 1996, pp 1282-1312
3. Gail MH, Brinton LA, Byar DP, et al: Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81:1879-1886, 1989
4. Colditz GA, Rosner B: Cumulative risk of breast cancer to age 70 years according to risk factor status: Data from the Nurses' Health Study. Am J Epidemiol 152:950-964, 2000
5. Bach PB, Kattan MW, Thornquist MD, et al: Variations in lung cancer risk among smokers. J Natl Cancer Inst 95:470-478, 2003
6. Chaudru V, Chompret A, Bressac-de Paillerets B, et al: Influence of genes, nevi, and sun sensitivity on melanoma risk in a family sample unselected by family history and in melanoma-prone families. J Natl Cancer Inst 96:785-795, 2004
7. Stampfer MJ, Willett WC, Speizer FE, et al: Test of the National Death Index. Am J Epidemiol 119:837-839, 1984
8. Cupples LA, D'Agostino RB, Anderson K, et al: Comparison of baseline and repeated measure covariate techniques in the Framingham Heart Study. Stat Med 7:205-218, 1988
9. D'Agostino RB, Lee ML, Belanger AJ, et al: Relation of pooled logistic regression to time dependent Cox regression analysis: The Framingham Heart Study. Stat Med 9:1501-1515, 1990
10. Ries LAG, Eisner MP, Kosary CL, et al: SEER Cancer Statistics Review, 1975-2001. Bethesda, MD, National Cancer Institute, 2004
11. Green A, Trichopoulos D: Skin cancer, in Adami HO, Hunter D, Trichopoulos D (eds): Textbook of Cancer Epidemiology. New York, NY, Oxford University Press, 2002, pp 281-300
12. Ford D, Bliss JM, Swerdlow AJ, et al: Risk of cutaneous melanoma associated with a family history of the disease. The International Melanoma Analysis Group (IMAGE). Int J Cancer 62:377-381, 1995
13. Elwood JM, Jopson J: Melanoma and sun exposure: An overview of published studies. Int J Cancer 73:198-203, 1997
14. Whiteman D, Green A: Melanoma and sunburn. Cancer Causes Control 5:564-572, 1994
15. Bliss JM, Ford D, Swerdlow AJ, et al: Risk of cutaneous melanoma associated with pigmentation characteristics and freckling: Systematic overview of 10 case-control studies. The International Melanoma Analysis Group (IMAGE). Int J Cancer 62:367-376, 1995
16. Colditz GA, Rosner BA, Chen WY, et al: Risk factors for breast cancer according to estrogen and progesterone receptor status. J Natl Cancer Inst 96:218-228, 2004
17. Rosner BA, Colditz GA, Webb PM, et al: Mathematical models of ovarian cancer incidence in the Nurses' Health Study. Epidemiology (in press)
(Eunyoung Cho, Bernard A. )