A newly discovered human α-globin gene
Blood
     the Molecular Medicine Branch and the Microarray Core Facility, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH)

    National Center for Biotechnology Information (NCBI), NIH

    Laboratory of Neurotoxicology, National Institute of Mental Health, NIH

    National Naval Hospital, Bethesda, MD.

    Abstract

    A previously undefined transcript with significant homology to the pseudo-α2 region of the α-globin locus on human chromosome 16 was detected as part of an effort to better define the transcriptional profiles of human reticulocytes. Cloning and sequencing of that transcript (GenBank AY698022; named μ-globin) revealed an insert with a 423-nucleotide open reading frame. BLASTP, ClustalW, and phylogenetic analyses of the predicted protein demonstrated a high level of homology with the avian α-D globin. In addition, the heme- and globin-binding amino acids of μ-globin and avian α-D globin are largely conserved. Using quantitative real-time polymerase chain reaction (PCR), μ-globin was detected at approximately 0.1% of the level measured for α-globin in erythroid tissues. Erythroid-specific expression was detected by Northern blot analysis, and maximal expression was detected during terminal erythroblast differentiation. Despite this highly regulated pattern of μ-globin gene transcription, μ-globin protein was not detected by mass spectrometry. These results suggest that the human genome encodes a previously unrecognized member of the avian α-D globin family that is transcribed in a highly regulated pattern in erythroid cells. (Blood. 2005;106:1466-1472)

    Introduction

    The globin genes and their products have been intensively investigated for the past 50 years. Those studies led to the description of structural and regulatory elements that are useful for the recognition and comparison of hundreds of globin gene family members. The divergence of the ancestral α- and β-globin genes is estimated to have occurred 500 million years ago. Those genes subsequently evolved and were modified by a variety of genetic processes, including duplication events.1 In humans, the α-globin gene family resides on chromosome 16p13.3 and is composed of a cluster of 3 genes (ζ2-α2-α1) whose protein products bind heme and assemble into hemoglobin. Transcription of the ζ2 gene is silenced during fetal life, and the 2 α genes (α2 and α1) are expressed in a balanced fashion for the remainder of ontogeny. In addition to those 3 genes, concerted efforts in the 1980s led to the discovery of other α-like sequences. The downstream region of the locus contains an unusual gene named θ-globin that generates no detectable globin protein in humans.2,3 θ-globin gene transcription is regulated, and the transcripts contain no obvious defects to explain the lack of detectable protein in erythroid tissues.2-5 Three pseudo-globin genes (pseudo-ζ1,6 pseudo-α2,7 and pseudo-α18) were also identified in the α-globin locus.

    During the past 2 years, investigators have begun a transition toward postgenomic approaches to basic and clinical research. Hypotheses are now generated with knowledge of whole-genome DNA sequences,9 full-length cDNA collections,10 and millions of expressed sequence tags (ESTs)11 from humans and other species. Comparisons of DNA and RNA sequences using advanced bioinformatics analyses12 have become essential. Hematology is ideally suited to this type of genome-based research because purified populations of hematopoietic cells are easily isolated. We hypothesized that human reticulocytes contain sufficient mRNA from the terminal stages of differentiation to allow high-throughput study of globin gene expression patterns. Levels of globin mRNA and differences in globin gene transcription between cord and adult blood were studied to determine the potential of this approach for the clinical assessment of hemoglobinopathies and hemoglobin switching. Using oligonucleotide arrays, significantly different globin transcription patterns were found in cord and adult blood samples, and transcription of the major globin genes was clearly demonstrated. Surprisingly, we also identified transcription from the genomic region previously thought to encode the pseudo-α2 gene. The source of that transcription is characterized in this report as a previously unrecognized globin gene.

    Materials and methods

    Preparation of reticulocyte RNA

    Blood was collected from healthy adult donors and from placental umbilical cords. All cells were collected according to approved guidelines regarding human subjects. Blood samples were centrifuged, and the plasma and buffy coat layers were removed. Packed red blood cells were diluted with 4 volumes of 1 × phosphate-buffered saline (PBS) and filtered through 2 consecutively linked RCXL2 high-efficiency leukocyte reduction filters (Pall, East Hills, NY). Platelets were removed by repeated low-speed centrifugation. RNA was isolated with Trizol LS (Invitrogen, Carlsbad, CA) and then treated with DNase I, using the RNeasy Mini Kit (Qiagen, Valencia, CA) according to the manufacturer's protocol, to degrade residual genomic DNA. For erythropoiesis assays, peripheral blood CD34+ cells from healthy donors were cultured for 14 consecutive days in erythropoietin-containing medium, as previously described.13

    Microarray data analysis

    Microarray analyses were performed using 5 μg total RNA from each sample with one cycle of complementary RNA amplification according to the Affymetrix (Santa Clara, CA) protocol. After hybridization and washing, the microarray chips were scanned using MAS 5.0 software. The collected data were analyzed using Partek Pro 6.0 software (Partek, MO). Expression levels were clustered and displayed using Spotfire DecisionSite 8.0 (Spotfire, Somerville, MA). A complete description of the 28 array data sets is being prepared as a separate publication.

    Cloning full-length coding sequences

    Fifty nanograms of first-strand cDNA, synthesized from 1 μg of adult blood reticulocyte total RNA, was amplified with the forward (5'-CCA TGC TCA GCG CCC AGG AG-3') and reverse (5'-AGC ACA GGG CTC AGC GGT ATT TTT C-3') primers using the BD Advantage-GC cDNA polymerase chain reaction (PCR) kit (BD Clontech, Palo Alto, CA) under the following cycling conditions: initial denaturation at 94°C for 3 minutes, followed by 30 cycles of 94°C for 30 seconds and 68°C (combined annealing and extension) for 3 minutes. The amplified PCR product was purified with the MinElute PCR Cleanup Kit (Qiagen) and cloned into the pcDNA3.1-V5-6His and pCR2.1 vectors (Invitrogen).
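    As a quick check on the cloning primers, the sketch below translates the forward primer from its first ATG and estimates GC content and melting temperature. It assumes, without confirmation from the text, that the ATG inside the forward primer is the μ-globin start codon, and it uses the rough salt-independent rule Tm = 64.9 + 41(GC - 16.4)/N; the helper names are hypothetical.

```python
# Hypothetical helpers for inspecting the cloning primers given in the Methods.
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

FORWARD = "CCATGCTCAGCGCCCAGGAG"        # published primer with spaces removed
REVERSE = "AGCACAGGGCTCAGCGGTATTTTTC"

def translate_from_first_atg(seq: str) -> str:
    """Translate complete codons starting at the first ATG (assumed start codon)."""
    start = seq.find("ATG")
    if start < 0:
        return ""
    coding = seq[start:]
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, len(coding) - 2, 3))

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases."""
    return sum(base in "GC" for base in seq) / len(seq)

def basic_tm(seq: str) -> float:
    """Rough Tm in degrees C: 64.9 + 41 * (G+C - 16.4) / N; ignores salt and Mg2+."""
    gc = sum(base in "GC" for base in seq)
    return 64.9 + 41 * (gc - 16.4) / len(seq)

if __name__ == "__main__":
    print(translate_from_first_atg(FORWARD))   # MLSAQE
    for name, primer in [("forward", FORWARD), ("reverse", REVERSE)]:
        print(name, round(gc_fraction(primer), 2), round(basic_tm(primer), 1))
```

    The translated peptide is only as meaningful as the start-codon assumption; the Tm rule is a coarse screen, not a substitute for the kit's recommended conditions.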

    Northern blotting

    The full-length cDNA insert was excised from the clone by restriction enzyme digestion and purified using the MinElute Gel Extraction Kit (Qiagen). Five hundred nanograms of insert was labeled with [32P]dCTP (Amersham Biosciences, Piscataway, NJ) and purified from unincorporated nucleotides using a G-50 column. The labeled probe was hybridized at 43.5°C to a nylon membrane containing 10 μg total RNA per lane from cord blood reticulocytes (5 pooled samples), adult blood reticulocytes (5 pooled samples), fetal liver (BD Clontech), and adult bone marrow (BD Clontech). The hybridized membrane was washed and exposed to BioMax MS film (Eastman Kodak, Rochester, NY).

    Quantitative real-time PCR

    For quantitative PCR, sequence-specific primers and probes were designed to span the border between exons 2 and 3 of the α-globin mRNA (forward primer, 5'-GGG TGG ACC CGG TCA ACT T-3'; reverse primer, 5'-GAG GTG GGC GGC CAG GGT-3'; probe, FAM-5'-AAG CTC CTA AGC CAC TGC CTG CTG-3'-TAMRA) and the μ-globin mRNA (forward primer, 5'-GCG TGG ACC CAG CCA ACT T-3'; reverse primer, 5'-CAG GTG GGA GGC CAG CAC-3'; probe, FAM-5'-TCC GCT GCT AAT CCA GTG TTT CCA C-3'-TAMRA). Copy numbers were calculated by comparison with standard curves. The specificity of each primer and probe set was verified using α-globin and μ-globin cDNA templates. For each PCR reaction, 5 ng cDNA made from pooled total RNA was mixed with 2 × TaqMan Master Mix (Applied Biosystems, Foster City, CA) and 10 pmol each of primer and FAM/TAMRA-labeled probe, and the reactions were amplified using the ABI 7700 Sequence Detection System (Applied Biosystems). Results were analyzed using Sequence Detector 1.7 software (Applied Biosystems).
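    Calculating copy numbers "by comparison with standard curves" typically means fitting threshold cycle (Ct) against log10(copies) for a dilution series of known template and inverting that line for unknown samples. The sketch below assumes an ordinary least-squares fit and uses hypothetical standard values, since the curves themselves are not reported; the function names are illustrative only.

```python
import math
from typing import Sequence, Tuple

def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copies for an unknown."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to ~100% efficiency."""
    return 10 ** (-1 / slope) - 1

if __name__ == "__main__":
    # Hypothetical 10-fold dilution series of a cloned cDNA standard.
    standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6, 1e7]
    standard_cts = [34.1, 30.8, 27.4, 24.1, 20.8, 17.5]
    slope, intercept = fit_standard_curve(standard_copies, standard_cts)
    print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.0%}")
    # Assumption: dividing by the 5 ng of input cDNA yields copies/ng cDNA.
    unknown = copies_from_ct(25.0, slope, intercept)
    print(f"{unknown:.3g} copies -> {unknown / 5:.3g} copies/ng cDNA")
```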

    Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry

    H4 ProteinChips and calibration standards for the surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometer were purchased from Ciphergen Biosystems (Fremont, CA). Sinapinic acid (SA) was obtained from Sigma (St Louis, MO). The SELDI-TOF mass spectrometer was externally calibrated using the [M + H]+ ion peaks of somatostatin at 1637.9 m/z, insulin β-chain at 3495.9 m/z, human recombinant insulin at 5807.6 m/z, and hirudin at 7033.6 m/z. All mass spectra were recorded in positive-ion mode using a Ciphergen PBS IIc ProteinChip Array mass spectrometer with time-lag focusing.14 Before SELDI-TOF mass spectrometry analysis, the H4 ProteinChip was prewashed with 10% aqueous acetonitrile containing 0.1% trifluoroacetic acid (TFA). After drying, 1 μL sample was applied to the ProteinChip, air dried, and washed with 5% aqueous acetonitrile. After drying, 1 μL matrix (saturated SA in 50% aqueous acetonitrile containing 0.1% TFA) was added to each feature of the ProteinChip array. Data were analyzed using the software provided by the manufacturer and are reported as average masses.15
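    External calibration maps each ion's flight time to m/z using peaks of known mass. In an idealized linear TOF analyzer, flight time grows with the square root of m/z, so sqrt(m/z) is linear in time and the calibrants fix that line. The sketch below fits this simplified two-parameter model to the four calibrants listed above. The flight times are hypothetical, and the manufacturer's actual calibration equation (which may include extra terms for time-lag focusing) is not reproduced here.

```python
import math
from typing import Sequence, Tuple

# [M + H]+ calibrant m/z values listed in the Methods.
CALIBRANTS_MZ = [1637.9, 3495.9, 5807.6, 7033.6]

def fit_tof_calibration(times: Sequence[float], mz: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of sqrt(m/z) = a * t + b (idealized linear TOF model)."""
    ys = [math.sqrt(m) for m in mz]
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(ys) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    a = sty / stt
    return a, mean_y - a * mean_t

def mz_from_time(t: float, a: float, b: float) -> float:
    """Convert a flight time to m/z with the fitted calibration."""
    return (a * t + b) ** 2

if __name__ == "__main__":
    # Hypothetical flight times generated from assumed instrument constants,
    # standing in for calibrant peak apexes read from a spectrum.
    true_a, true_b = 2.0, -1.5
    times = [(math.sqrt(m) - true_b) / true_a for m in CALIBRANTS_MZ]
    a, b = fit_tof_calibration(times, CALIBRANTS_MZ)
    print(round(mz_from_time(times[-1], a, b), 1))  # recovers ~7033.6
```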

    Bioinformatics analyses

    Comparisons of μ-globin mRNA sequences were performed using the basic local alignment search tool (BLAST),16 and alignment to the human genome was performed using the BLAST-like alignment tool (BLAT).17 For promoter analyses, the 200 base pairs (bp) upstream of the translation start site were examined using PromoterInspector18 with default settings. Phylogenetic analyses were performed by maximum parsimony (MP) using μ-globin and 291 known α-like globin protein sequences deposited in GenBank. A complete alignment with gaps was performed using ClustalX software.19 Aligned sequences were input to PAUP (version 4.0b10 for UNIX; Sinauer Associates, Sunderland, MA), which defined the MP tree using the heuristic search command. The MP tree20 was chosen when PAUP had not improved the score after several hours of searching. The final tree was drawn using PhyloDraw software.21 Maximum likelihood, neighbor-joining (NJ), BIONJ,45 least-squares, and balanced minimum evolution analyses were also performed and yielded similar results.
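    Maximum parsimony scores a candidate tree by the minimum number of character-state changes it requires; a heuristic search such as PAUP's then rearranges trees to lower that score. The scoring step for one fixed tree can be sketched with Fitch's algorithm, shown below on a hypothetical four-taxon protein alignment. This sketch treats every alignment character, including any gap symbol, as an unordered state (a simplification) and does not reproduce PAUP's tree search, gap handling, or weighting options.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch(tree: Tree, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set, minimum changes) for one alignment column."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # Disjoint child state sets force one additional change at this node.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum Fitch costs over all columns of an equal-length alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

if __name__ == "__main__":
    # Hypothetical 3-residue alignment of four taxa.
    alignment = {"A": "MVK", "B": "MVK", "C": "MLR", "D": "MLR"}
    print(parsimony_score((("A", "B"), ("C", "D")), alignment))  # 2
    print(parsimony_score((("A", "C"), ("B", "D")), alignment))  # 4
```

    The lower score favors grouping A with B; an MP search repeats this comparison across many candidate topologies.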

    Results

    Microarray comparison of globin gene expression

    To compare the mRNA profiles of reticulocytes circulating at the time of birth with those of adults, high-throughput arrays were generated from the blood of 28 separate donors (14 cord blood, 14 adult blood). Platelets were removed from the samples by low-speed centrifugation, and nucleated cells were removed by leukocyte reduction filtering. Samples were analyzed using Affymetrix HG-U133A and HG-U133B chips, and the expression of 44 229 probe sets was ranked. The focus of this report is the globin genes; a description of the other probe sets will be provided in a separate manuscript. As expected, the globin gene transcripts ranked highly because of their abundance in reticulocytes. The ranks of signal intensity for the α2, α1, β, Aγ, and Gγ globins in adult blood reticulocytes were 1st, 2nd, 3rd, 10th, and 8th, respectively, and the corresponding ranks in cord blood reticulocytes were 1st, 2nd, 3rd, 4th, and 5th. The higher ranking of Aγ compared with Gγ in cord blood was unexpected because Gγ is known to represent approximately 70% of the total γ chains at birth.22 In comparison with the α, β, and γ transcripts, the levels of ε-, ζ-, and θ-globin were low in all the samples, and δ-globin was reduced in the cord blood samples.
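    Ranking probe sets by signal intensity and expressing a rank as a fraction of all probe sets is simple bookkeeping. The sketch below uses hypothetical probe names and intensities and assumes ties keep their input order, a detail the text does not specify. Applied to the ranks reported for the 240336_AT probe set (21st and 11th of 44 229), it gives fractions of roughly 0.05% and 0.025%, both well within the top 0.2% cited in the Discussion.

```python
from typing import Dict, List

def rank_probe_sets(signals: Dict[str, float]) -> Dict[str, int]:
    """Rank probe sets by descending signal (1 = highest); ties keep input order."""
    ordered: List[str] = sorted(signals, key=lambda name: signals[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}

def top_fraction(rank: int, total: int) -> float:
    """Fraction of all probe sets ranked at or above this rank."""
    return rank / total

if __name__ == "__main__":
    # Hypothetical signals for a handful of probe sets.
    signals = {"HBA2": 52000.0, "HBA1": 50500.0, "HBB": 48000.0,
               "240336_AT": 9100.0, "OTHER": 7000.0}
    print(rank_probe_sets(signals))
    # Ranks reported in the Results, expressed against 44 229 probe sets.
    for label, rank in [("adult", 21), ("cord", 11)]:
        print(label, f"{100 * top_fraction(rank, 44229):.3f}%")
```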

    In addition to the expected globins, we identified a probe set (240336_AT) described by the Affymetrix software as having homology with a hemoglobin-based blood substitute (rHb1.1).23 The expression of that probe set ranked 21st of 44 229 probe sets in adult and 11th of 44 229 in cord blood reticulocytes. To place the expression pattern of the 240336_AT probe set in the context of the other globin probe sets, the signal intensities were grouped by unsupervised hierarchical clustering (Figure 1). The figure shows the clustered signal intensities of the 9 globin gene probe sets and 240336_AT in the 14 cord blood and 14 adult blood samples. Cord blood and adult blood samples segregated appropriately on the basis of their globin-switching intensity patterns. The α1-, α2-, and β-globin genes clustered together because of their high intensity in all the samples. The γ genes also co-clustered, and in cord blood they were expressed within the same range as β-globin. Expression of the 240336_AT probe set did not cluster with that of any other globin probe set. Its average signal intensity was higher than the intensities of δ, ε, ζ, and θ but lower than those of α-, β-, and γ-globin. Expression of 240336_AT varied between donors, with lower mean expression in adult blood than in cord blood samples.
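    The clustering step can be illustrated with average-linkage agglomerative clustering on correlation distance (1 - Pearson r) between probe-set profiles across samples. This configuration is an assumption: the text does not state which linkage or distance measure was used, and the intensities below are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(profiles: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    def dist(a: str, b: str) -> float:
        return 1.0 - pearson(profiles[a], profiles[b])

    clusters: Dict[str, List[str]] = {name: [name] for name in profiles}
    merges: List[Tuple[str, str, float]] = []
    while len(clusters) > 1:
        best: Optional[Tuple[float, str, str]] = None
        for c1, c2 in combinations(clusters, 2):
            m1, m2 = clusters[c1], clusters[c2]
            # Average of all pairwise member distances between the two clusters.
            d = sum(dist(a, b) for a in m1 for b in m2) / (len(m1) * len(m2))
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
        merges.append((c1, c2, d))
    return merges

if __name__ == "__main__":
    # Hypothetical log2 signals in 2 cord (first) and 2 adult (last) samples.
    profiles = {
        "alpha1": [15.0, 15.1, 14.9, 15.2],
        "alpha2": [15.2, 15.3, 15.0, 15.4],
        "gammaA": [14.0, 14.2, 9.0, 9.3],
        "240336_AT": [12.0, 11.5, 10.0, 11.0],
    }
    for left, right, d in average_linkage(profiles):
        print(f"merge {left} + {right} at distance {d:.3f}")
```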

    Bioinformatics analyses and cloning of reticulocyte μ-globin

    Public sequence data describing the human genome, full-length cDNAs, and ESTs provided a clear path for further investigation of the 240336_AT probe set. The probe set was designed using the more than 25 million EST sequences deposited in dbEST.11 The template EST sequence (GenBank BE244453) that aligns with the 240336_AT probe set was identified, and its sequence was used as the query for a BLAT search.17 Surprisingly, 240336_AT aligned to the same region as the pseudo-α2 gene rather than to other gene regions (Figure 2A). Unlike the pseudo-α2 globin nucleotide sequence,7 the 240336_AT EST sequences aligned to generate a gene structure similar to that of the other globin genes. Based on this bioinformatics comparison, a full-length cDNA was cloned from reticulocyte RNA, yielding a 506-bp transcript encoding the gene detected by 240336_AT. Four additional bases were identified by 5' rapid amplification of cDNA ends (RACE). The 510-bp reticulocyte-derived sequence was originally deposited in GenBank in July 2004 (AY698022, NM_001003938) and was named mu (μ) because of the smaller size of the predicted globin product (141 amino acids compared with 142 for the other human α-globin genes).
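    Locating the ORF in a cloned cDNA can be sketched as a scan for the longest ATG-to-stop stretch in each forward reading frame. This simplified sketch ignores the reverse strand and non-ATG starts, and the test sequences are hypothetical. For reference, a 423-nt ORF corresponds to 423 / 3 = 141 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> str:
    """Longest ATG...stop stretch (stop included) over the 3 forward frames."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Extend codon by codon until an in-frame stop codon.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream in this frame
            orf = seq[i:j + 3]
            if len(orf) > len(best):
                best = orf
            i = j + 3  # resume scanning after the stop codon
    return best

if __name__ == "__main__":
    orf = longest_orf("CCATGAAATTTGGGTAACC")  # hypothetical sequence
    print(orf, len(orf) // 3, "codons including the stop")
```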

    The μ-globin (HBM) sequence was aligned to the α-globin mRNA using ClustalW,20 and its overall similarity was 59% (302 of 510 bases). The protein sequence predicted from the open reading frame (ORF) is listed in GenBank as NP_001003938. We aligned this sequence to the human α-globin protein using ClustalW; the alignment showed incomplete conservation of the heme-binding and β-globin contact sites. As shown in Figure 2B, the μ-globin promoter region (200 bp upstream) was also examined using PromoterInspector.18 Unlike the and promoters, the μ-globin promoter did not contain a CAAT motif. A muscle TATA (TATAGA) core sequence was identified 60 bp upstream of the ATG. Erythroid Krüppel-like factor (EKLF) and GATA binding factor 1 (GATA1) binding sites were identified at -70 and -82 bp, respectively. The μ-globin promoter also contained a hypoxia-inducible factor (HIF) binding site, which is commonly associated with higher-affinity hemoglobins26 but has not been identified in other human globin gene promoters.
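    Reporting motif positions relative to the start codon (for example, "60 bp upstream of the ATG") amounts to scanning the upstream sequence and converting string indices to negative offsets. The sketch below does this with regular expressions for the TATAGA core named in the text plus two simplified consensus patterns that are assumptions here (WGATAR for GATA1 and CACCC for EKLF). It scans only the given strand, and it is motif matching, not the context-based prediction that PromoterInspector performs.

```python
import re
from typing import Dict, List, Tuple

# Only TATAGA comes from the text; the other two patterns are simplified assumptions.
MOTIFS: Dict[str, str] = {
    "TATA (TATAGA)": "TATAGA",
    "GATA1 (WGATAR)": "[AT]GATA[AG]",
    "EKLF (CACCC)": "CACCC",
}

def scan_upstream(upstream: str) -> List[Tuple[str, int]]:
    """Return (motif, offset) pairs sorted 5' to 3'.

    `upstream` must end immediately 5' of the ATG, so string index i maps to
    offset i - len(upstream) (the base just before the ATG is -1).
    """
    upstream = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # Zero-width lookahead so overlapping matches are all reported.
        for match in re.finditer(f"(?=({pattern}))", upstream):
            hits.append((name, match.start() - len(upstream)))
    return sorted(hits, key=lambda hit: hit[1])

if __name__ == "__main__":
    hypothetical = "AGATAG" + "T" * 16 + "TATAGA" + "C" * 54
    print(scan_upstream(hypothetical))
```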

    μ-globin gene expression

    In addition to array-based assays, μ-globin gene expression was examined by Northern blot analysis (Figure 3). Hybridizations were performed with 10 μg total RNA from each of 4 erythroid tissues (cord blood reticulocytes, adult blood reticulocytes, fetal liver, and bone marrow). As expected, α-globin expression was detected at similar levels in adult blood and cord blood reticulocytes, and equivalent bands were also detected in fetal liver and adult bone marrow (Figure 3A). In contrast to bone marrow, nonerythroid tissues showed no detectable μ-globin signal (Figure 3C). The expression of μ-globin in cord blood reticulocytes was approximately 5 times higher than in adult blood reticulocytes, and μ-globin expression in fetal liver was also higher than in adult bone marrow (Figure 3B). Consistent with the array data, these data suggest that μ-globin mRNA levels in erythroid tissues decrease during postnatal development.

    Because of concerns that the hybridization signals detected by array and Northern blot analyses might be nonspecific, quantitative PCR was performed (Figure 4). Sequence-specific primers and a probe spanning the μ-globin exon 2/exon 3 boundary were designed to avoid amplification of unprocessed RNA, genomic DNA, or other α-globin transcripts. Confirming the array and Northern data, μ-globin mRNA levels were significantly higher in cord blood reticulocytes than in adult blood reticulocytes (1.71 × 10⁵ ± 9.51 × 10⁴ copies/ng cDNA in cord blood vs 2.17 × 10⁴ ± 6.84 × 10³ copies/ng cDNA in adult blood; P < .0002). The same pattern of lower adult expression was found when fetal liver was compared with adult bone marrow (3.04 × 10⁴ ± 1.68 × 10³ copies/ng cDNA in fetal liver vs 1.15 × 10⁴ ± 1.19 × 10³ copies/ng cDNA in bone marrow; P < .0002) (Figure 4A). α-globin was amplified for comparison. Although α-globin expression was 2 to 3 orders of magnitude higher than that of μ-globin, α-globin mRNA levels were equivalent in fetal and adult tissues (Figure 4B). Therefore, at the transcriptional level, μ-globin expression is only approximately 0.1% of that of normal adult α-globin.
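    The fold differences implied by the reported means can be checked directly. The sketch below uses only the mean copy numbers given above; it does not attempt to reproduce the P values, which would require the per-sample data and the statistical test used. The cord/adult ratio (about 7.9-fold) is of the same order as the roughly 5-fold difference seen by Northern blotting.

```python
# Mean μ-globin copies/ng cDNA as reported in the Results.
MU_COPIES = {
    "cord blood reticulocytes": 1.71e5,
    "adult blood reticulocytes": 2.17e4,
    "fetal liver": 3.04e4,
    "adult bone marrow": 1.15e4,
}

def fold_change(numerator: float, denominator: float) -> float:
    """Ratio of two mean expression levels."""
    return numerator / denominator

def percent_given_orders_lower(orders: float) -> float:
    """Percent of a reference transcript that is 10**orders times more abundant."""
    return 100 / 10 ** orders

if __name__ == "__main__":
    cord_vs_adult = fold_change(MU_COPIES["cord blood reticulocytes"],
                                MU_COPIES["adult blood reticulocytes"])
    liver_vs_marrow = fold_change(MU_COPIES["fetal liver"], MU_COPIES["adult bone marrow"])
    print(f"cord/adult: {cord_vs_adult:.1f}-fold; fetal liver/marrow: {liver_vs_marrow:.1f}-fold")
    # "2 to 3 orders of magnitude" below α-globin corresponds to 1% down to 0.1%.
    print(percent_given_orders_lower(2), percent_given_orders_lower(3))
```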

    μ-globin expression during erythropoiesis was examined and compared with that of α-globin using cultures of adult CD34+ cells.13 μ-globin was not detected above background levels until day 4, when large, immature erythroblasts began to appear in culture. After day 4, μ-globin expression increased rapidly until day 10 and then declined equally rapidly as the cells underwent terminal maturation. This pattern was similar to that of α-globin, although the α-globin peak occurred later in the culture period, on day 12. Throughout the culture period, the overall expression level of μ-globin was 100-fold lower than that of α-globin (compare the scales in Figure 4C and 4D).

    SELDI-TOF mass spectrometry of red blood cell lysate

    Given the highly regulated transcription of the μ-globin gene, assays were developed to determine whether significant quantities of μ-globin protein are expressed in circulating erythrocytes. Importantly, the literature provides little evidence for the existence of this protein in humans or other mammals. Efforts to raise μ-specific antibodies against μ-globin peptide sequences have not been successful to date (data not shown). Therefore, cord blood and adult blood lysates were examined directly by SELDI-TOF mass spectrometry. This method is one of the more sensitive proteomic detection tools and may have some advantages over matrix-assisted laser desorption/ionization (MALDI).14,27 A detection sensitivity of 20 ng per sample was determined using serially diluted hemoglobin standards. In parallel assays, cation exchange high-performance liquid chromatography (HPLC)13 did not detect peaks at levels below 300 ng per sample. SELDI analysis of cell lysates from 3 cord blood and adult blood samples revealed the relative amounts of α, β, and γ protein in each sample over the 15- to 16.5-kDa mass range (Figure 5). We predicted the molecular weights of the globin proteins without the initiator methionine residue because of the adjacent valine residue.28 The α-globin chain (15 126 Da) was identified as the major peak in adult and cord blood samples. β (15 867 Da) and γ (15 996 Da) globin peaks were detected in cord blood samples, but the γ peak was not detected in the adult lysates. No peak was seen at the expected size of μ-globin (15 487 Da) in any sample (ie, no peak with a sufficient signal-to-noise ratio was observed), suggesting the absence of measurable quantities of μ-globin in the lysates. The identities of the other peaks detected by this method of globin analysis are being studied separately (data not shown).
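    Predicting the mass expected for a globin chain, as done for the values above, amounts to summing average residue masses and adding one water molecule, optionally after dropping the N-terminal methionine. The sketch below uses standard average residue masses rounded to four decimals; the peptide in the usage line is hypothetical, and full chain sequences are not reproduced here.

```python
# Standard average residue masses (Da), i.e., amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528   # added once for the free N and C termini
PROTON = 1.00728

def average_mass(sequence: str, drop_initiator_met: bool = False) -> float:
    """Average neutral mass of an unmodified chain."""
    seq = sequence.upper()
    if drop_initiator_met and seq.startswith("M"):
        seq = seq[1:]
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER

def mh_plus(mass: float) -> float:
    """Singly protonated [M + H]+ ion."""
    return mass + PROTON

if __name__ == "__main__":
    peptide = "MVLSPAD"  # hypothetical N-terminal peptide
    neutral = average_mass(peptide, drop_initiator_met=True)
    print(round(neutral, 2), round(mh_plus(neutral), 2))
```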

    Homology comparisons and phylogenetic analyses

    At the nucleotide level, several mammalian orthologues were identified by EST alignments, but no significant homology was noted with the avian or reptilian mRNAs examined to date (data not shown). The predicted protein sequence of μ-globin was also searched against the GenBank database using BLASTP with default parameters. The search demonstrated that the predicted μ-globin protein aligned most closely with the avian α-D globin chain of the bar-headed goose (GenBank gi70296), with a similarity of 55% (78 of 141 amino acids [aa]). Similarity between μ-globin and the human α-globin chain was lower (64 of 141 aa). Alignments of the heme-binding sites and the α1β1 and α1β2 contact sites were also studied. Heme-binding homologies were equivalent among μ, the human α chain, and the avian α-D chain (84% [16 of 19]). However, the α1β1 and α1β2 contact sites showed considerably more homology between μ-globin and the avian α-D globin. Interestingly, μ-globin and the avian α-D globin chain have the same length of 141 amino acid residues.
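    Figures such as "78 of 141" or "16 of 19" depend on which positions are counted. The sketch below counts identical residues over aligned columns, optionally restricted to a chosen set of site positions (for example, heme contacts), and reports the fraction of compared columns. Two assumptions apply: columns with a gap in either sequence are skipped, and the denominator is the number of compared columns. The values in the text use 141 residues as the denominator and may count similar rather than strictly identical residues, so this illustrates the bookkeeping rather than reproducing BLASTP scoring.

```python
from typing import Iterable, Optional, Tuple

def percent_identity(a: str, b: str,
                     positions: Optional[Iterable[int]] = None) -> Tuple[int, int, float]:
    """Return (matches, compared columns, percent) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = range(len(a)) if positions is None else positions
    compared = [(a[i], b[i]) for i in columns if a[i] != "-" and b[i] != "-"]
    if not compared:
        raise ValueError("no comparable columns")
    matches = sum(x == y for x, y in compared)
    return matches, len(compared), 100 * matches / len(compared)

if __name__ == "__main__":
    # Hypothetical aligned fragments; the gap column is skipped.
    print(percent_identity("MVLS-PAD", "MVLSAPAE"))
    # Site-restricted comparison at hypothetical positions 0 and 7.
    print(percent_identity("MVLS-PAD", "MVLSAPAE", positions=[0, 7]))
    # Reported fractions from the text, as percentages.
    print(round(100 * 78 / 141, 1), round(100 * 16 / 19, 1))
```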

    Based on the genetic and predicted protein homologies between μ-globin and avian α-D globin, a more complete phylogenetic comparison was performed. A total of 291 α-like globin protein sequences were collected from GenBank and used to construct a phylogenetic tree with the MP algorithm. The constructed tree demonstrated several clustered globin families (Figure 6). As expected, the human α chain clustered with other mammalian α chains, and human -globin clustered with the mammalian -globin group. The -globin group clustered most closely to the family. In comparison, human μ-globin clustered with the avian and reptilian α-D chains at the greatest distance from the -globin cluster (Figure 6). No other mammalian globin within the group of 291 was placed within the α-D family.

    Discussion

    In this report, the availability of a fully sequenced genome and high-throughput expression profiles led to a re-examination of the region of the α-globin locus identified as pseudo-α2.7 A novel globin transcript was identified and named μ-globin. μ-globin is not a pseudogene: it is transcribed from a gene containing 2 introns into a 510-nucleotide (nt) mRNA, and it has an uninterrupted ORF encoding 141 amino acids.29 The μ-globin gene also contains a promoter region with erythroid transcription factor-binding sites, a 24-nt mRNA leader sequence, a Kozak sequence,24 and a functional polyadenylation signal (Figure 2A). In contrast, the originally described pseudo-α2 pseudogene7 was reported to contain no promoter because of the proximity of its first exon to the ζ1-globin gene located just upstream. Pseudo-α2 was also reported to contain a mutated 5' splice site in intron 1, several frameshift deletions, an insertion in the second and third exons, and significant mutations in the polyadenylation signal region compared with the α2 gene. When aligned with current maps, the originally reported genomic sequence for pseudo-α2 was found to contain several unmatched nucleotides or gaps (data not shown). Therefore, the pseudogene annotation may have resulted from the DNA sequencing limitations that existed 20 years ago.

    Among the 44 229 probe sets examined by microarray, relatively high-level expression from the μ-globin probe set (240336_AT) was detected in erythroid cells in vivo (ranked among the top 0.2% of reticulocyte transcripts). In differentiating primary erythroblasts, μ-globin gene expression was highly regulated, with a pattern nearly identical to that of α-globin. However, μ-globin mRNA represented only a small percentage of the α-globin mRNA in fetal and adult erythroid tissues. The differences between μ- and α-globin transcript levels may be attributable to differences in their promoters or in the stability of their mRNAs. The delayed peak of α-globin transcripts compared with μ-globin (culture day 12 vs day 10) is consistent with greater stability of the α-globin transcripts.

    Because the expression pattern of the α-globin genes in humans during ontogeny generally follows their gene order in the cluster,30 the location of the μ-globin gene between the embryonic and adult genes in the α-globin cluster also suggests that the gene might be developmentally regulated. Northern blot analyses revealed no detectable μ-globin gene expression in nonerythroid tissues, suggesting tissue specificity. Microarray, Northern, and quantitative PCR analyses consistently demonstrated significantly higher levels of μ-globin in fetal tissues than in adult tissues. Hence, μ-globin demonstrates erythroid-specific expression with a pattern during ontogeny similar to that described for the α-globin genes in the cluster.

    To determine the similarity of the predicted μ-globin protein to other known α-like proteins, homology analyses were performed. The ORF-predicted human μ-globin protein was compared with 291 known α-like globin protein sequences deposited in GenBank using an MP algorithm. BLASTP and ClustalW alignments were also performed, including focused comparisons of the heme-binding and the α1β1 and α1β2 contact sites. In each case, the predicted human μ-globin was most closely related to the avian and reptilian α-D globins. Initial analyses of primate, bovine, and porcine genomes or associated ESTs suggest that α-D-encoded ORFs will soon be identified in a variety of mammals using comparative genomics. Hence, the μ-globin gene may represent an expressed homolog of an ancient globin gene.1 The α-D globin family was first identified as an α chain of hemoglobin M in chicken embryos.31 α-D proteins assemble into high-oxygen-affinity hemoglobins in avian31-34 and reptilian species.35-38 In both groups, hemoglobins containing the α-D chain are expressed at all stages of ontogeny. Embryonic expression may be advantageous as an adaptation to embryonic hypoxia. Adult expression may also provide a survival advantage by enabling a response to the hypoxic conditions of high-altitude flight33,34 or prolonged submersion.35,36 Thus, it has been postulated that an evolutionary advantage for α-D globin expression might have arisen from hypoxic or anoxic conditions.39 The higher level of μ-globin gene expression in cord blood is consistent with the usefulness of high-oxygen-affinity hemoglobins during fetal life. Unfortunately, the absence of detectable μ-globin protein in human erythroid tissues makes it difficult to extrapolate the avian and reptilian functional data to humans.

    Interestingly, most of the proteins predicted by recent genome mapping efforts have not yet been detected in nature.40,41 This may be attributable to low-level expression or to the low sensitivity of protein assays. In this context, our inability to detect μ-globin is not unusual. However, it is extremely curious that μ-globin is not the only gene in the human α-globin cluster that lacks a detectable hemoglobin product. The θ-globin gene is also transcribed, but no protein or hemoglobin product has been detected.2-5 Both genes are well conserved at the genomic level, with appropriate splice junctions and intact ORFs. Like μ-globin, θ-globin also has a highly regulated pattern of transcription in erythroid cells.5 In addition, both genes are transcribed at only a fraction of the level of the dominant genes, and their deletion in humans has no reported effect on the clinical phenotype.42,43 Therefore, the evolutionary conservation of the μ- and θ-globin genes in the absence of a hemoglobin product represents a biologic paradox. This is especially puzzling given how low their product levels are relative to the amount of globin required in humans for oxygen transport. In the case of μ-globin, the protein homology among human, avian, and reptilian species in the absence of any significant nucleotide homology suggests selective pressure to maintain the ORF. Therefore, it is uncertain whether this gene is evolving toward becoming a pseudogene. Instead, the possibility exists that this newly discovered, but ancient, globin has a function for which high-level protein expression is not required.

    Acknowledgements

    We thank Drs Douglas R. Higgs, Robert L. Danner, and Alan S. Schechter for helpful discussions. The ongoing cell-processing support of the National Institutes of Health Department of Transfusion Medicine is greatly appreciated.

    Footnotes

    Prepublished online as Blood First Edition Paper, April 26, 2005; DOI 10.1182/blood-2005-03-0948.

    The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked "advertisement" in accordance with 18 U.S.C. section 1734.

    References

    Higgs DR, Vickers MA, Wilkie AO, Pretorius IM, Jarman AP, Weatherall DJ. A review of the molecular genetics of the human alpha globin gene cluster. Blood. 1989;73: 1081-1104.

    Hsu SL, Marks J, Shaw JP, et al. Structure and expression of the human θ1 globin gene. Nature. 1988;331: 94-96.

    Marks J, Shaw J-P, Shen C-KJ. Sequence organization and genomic complexity of primate theta1 globin gene, a novel alpha-globin-like gene. Nature. 1986;321: 785-788.

    Clegg JB. Can the product of the θ gene be a real globin? Nature. 1987;329: 465-466.

    Albitar M, Care A, Peschle C, Liebhaber SA. Developmental switching of messenger RNA expression from the human alpha globin cluster: fetal/adult pattern of theta-globin gene expression. Blood. 1992;80: 1586-1591.

    Proudfoot NJ, Gil A, Maniatis T. The structure of the human zeta-globin gene and a closely linked, nearly identical pseudogene. Cell. 1982;31: 553-563.

    Hardison RC, Sawada I, Cheng J-F, Shen C-K, Schmid CW. A previously undetected pseudogene in the human alpha globin gene cluster. Nucleic Acids Res. 1986;14: 1903-1911.

    Proudfoot NJ, Maniatis T. The structure of a human alpha globin pseudogene and its relationship to alpha globin gene duplication. Cell. 1980;21: 537-544.

    International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001;409: 860-921.

    Baross A, Butterfield YS, Coughlin SM, et al. Systematic recovery and analysis of full-ORF human cDNA clones. Genome Res. 2004;14: 2083-2092.

    Boguski MS, Lowe TM, Tolstoshev CM. dbEST-database for "expressed sequence tags." Nat Genet. 1993;4: 332-333.

    Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. An abundance of bidirectional promoters in the human genome. Genome Res. 2004;14: 62-66.

    Wojda U, Noel P, Miller JL. Fetal and adult hemoglobin production during adult erythropoiesis: coordinate expression correlates with cell proliferation. Blood. 2002;99: 3005-3013.

    Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis. 2000;21: 1164-1177.

    Caputo E, Moharram R, Martin BM. Methods for on-chip protein analysis. Anal Biochem. 2003;321: 116-124.

    Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215: 403-410.

    Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12: 656-664.

    Scherf M, Klingenhoff A, Werner T. Highly specific localization of promoter regions in large genomic sequences by PromoterInspector: a novel context analysis approach. J Mol Biol. 2000;297: 599-606.

    Chenna R, Sugawara H, Koike T, Lopez R, et al. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003;31: 3497-3500.

    Rogers JS, Swofford DL. Multiple local maxima for likelihoods of phylogenetic trees: a simulation study. Mol Biol Evol. 1999;16: 1079-1085.

    Choi JH, Jung HY, Kim HS, Cho HG. PhyloDraw: a phylogenetic tree drawing system. Bioinformatics. 2000;16: 1056-1058.

    Steinberg MH, Forget BG, Higgs DR, Nagel RL, eds. Disorders of Hemoglobin: Genetics, Pathophysiology, and Clinical Management. Cambridge, United Kingdom: Cambridge University Press; 2001: 201-202.

    Kroeger KS, Kundrot CE. Structures of a hemoglobin-based blood substitute: insights into the function of allosteric proteins. Structure. 1997;5: 227-237.

    Kozak M. An analysis of 5'-noncoding sequences from 699 messenger RNAs. Nucleic Acids Res. 1987;15: 8125-8148.

    Waggoner SA, Liebhaber SA. Regulation of alpha globin mRNA stability. Exp Biol Med. 2003;228: 387-395.

    Gorr TA, Cahn JD, Yamagata H, Bunn HF. Hypoxia-induced synthesis of hemoglobin in the crustacean Daphnia magna is hypoxia-inducible factor-dependent. J Biol Chem. 2004;279: 36038-36047.

    Fung ET, Thulasiraman V, Weinberger SR, Dalmasso EA. Protein biochips for differential profiling. Curr Opin Biotechnol. 2001;12: 65-69.

    Boissel JP, Kasper TJ, Bunn HF. Cotranslational amino-terminal processing of cytosolic proteins: cell-free expression of site-directed mutants of human hemoglobin. J Biol Chem. 1988;263: 8443-8449.

    Grimwood J, Gordon LA, Olsen A, et al. The DNA sequence and biology of human chromosome 19. Nature. 2004;428: 529-535.

    Proudfoot NJ, Shander MH, Manley JL, Gefter ML, Maniatis T. Structure and in vitro transcription of human globin genes. Science. 1980;209: 1329-1336.

    Brown JL, Ingram V. Structural studies on chick embryonic hemoglobins. J Biol Chem. 1974;249: 3960-3972.

    Chapman BS, Hood LE, Tobin AJ. Minor early embryonic chick hemoglobin M. J Biol Chem. 1982;257: 651-659.

    Hiebl I, Schneeganss D, Braunitzer G. High-altitude respiration of birds: the primary structures of the alpha D-chains of the Bar-headed goose (Anser indicus), the Greylag goose (Anser anser) and the Canada goose (Branta canadensis). Biol Chem Hoppe-Seyler. 1986;367: 591-599.

    Hiebl I, Schneeganss D, Grimm F, Kösters J, Braunitzer G. The primary structure of the major and minor hemoglobin component of adult European black vulture (Aegypius monachus, Aegypiinae). Biol Chem Hoppe-Seyler. 1987;368: 11-18.

    Rücknagel KP, Braunitzer G. The primary structure of the major and minor hemoglobin component of adult western painted turtle (Chrysemys picta bellii). Biol Chem Hoppe-Seyler. 1988;369: 123-131.

    Petruzzelli R, Aureli G, Lania A, Galtieri A, Desideri A, Giardina B. Diving behavior and haemoglobin function: the primary structure of the alpha- and beta-chains of the sea turtle (Caretta caretta) and its functional implications. Biochem J. 1996;316: 959-965.

    Fushitani K, Higashiyama K, Moriyama EN, Imai K, Hosokawa K. The amino acid sequences of two alpha chains of hemoglobins from Komodo dragon Varanus komodoensis and phylogenetic relationships of amniotes. Mol Biol Evol. 1996;13: 1039-1043.

    Stoeckelhuber M, Gorr T, Kleinschmidt T. The primary structure of three hemoglobin chains from the Indigo snake (Drymarchon corais erebennus, Serpentes): first evidence for alpha-D chains and two beta chain types in snakes. Biol Chem. 2002;383: 1907-1916.

    Bunn HF. Evolution of mammalian hemoglobin function. Blood. 1981;58: 189-197.

    Southan C. Has the yo-yo stopped? An assessment of human protein-coding gene number. Proteomics. 2004;4: 1712-1726.

    Snyder M, Gerstein M. Genomics: defining genes in the genomics era. Science. 2003;300: 258-260.

    Fei YJ, Fujita S, Huisman TH. Two different theta globin gene deletions observed among black newborn babies. Br J Haematol. 1988;68: 249-253.

    Weatherall DJ, Clegg JB. The Thalassemia Syndromes. 4th ed. Oxford, United Kingdom: Blackwell Scientific; 2001: 133-191.

    Perutz MF, Wilkinson AJ, Paoli M, Dodson GG. The stereochemical mechanism of the cooperative effects in hemoglobin revisited. Annu Rev Biophys Biomol Struct. 1998;27: 1-34.

    Gascuel O. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol. 1997;14: 685-695.

    (Sung-Ho Goh, Y. Terry Lee)