Nucleic Acids Research, 2004, Issue 12
Generation of longer 3' cDNA fragments from massively parallel signatu
     Laboratory of Molecular Biology and Genomics, Ludwig Institute for Cancer Research, São Paulo 01509-010, Brazil, 1 Department of Medicine, University of Chicago, Chicago, IL 60637, USA and 2 Functional Genomics Laboratory, ENH Research Institute, Northwestern University, Evanston, IL 60201, USA

    * To whom correspondence should be addressed. Tel: +55 11 3388 3248; Fax: +55 11 3207 7001; Email: anamaria@compbio.ludwig.org.br

    ABSTRACT

    Massively Parallel Signature Sequencing (MPSS) is a powerful technique for genome-wide gene expression analysis which, like SAGE, relies on the production of short tags proximal to the 3' end of transcripts. A single MPSS experiment can generate over 10^7 tags, providing 10-fold coverage of the transcripts expressed in a human cell. A significant fraction of MPSS tags cannot be assigned to known transcripts (orphan tags) and are likely to be derived from transcripts expressed at very low levels (1 copy per cell). In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI protocol (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS tags into their corresponding 3' cDNA fragments. GLGI-MPSS was applied to 83 orphan tags and 41 cDNA fragments were obtained. The analysis of these 41 fragments allowed the identification of novel transcripts, alternative tags generated from polymorphic and alternatively spliced transcripts, as well as the detection of artefactual MPSS tags. A systematic large-scale analysis of the genome by MPSS, in combination with the use of the GLGI-MPSS protocol, will certainly provide a complementary approach to generate the complete catalog of human transcripts.

    INTRODUCTION

    Determining the structure and expression pattern of the genes encoded in the human genome is one of the major challenges of the post-genomic era (1,2). Several techniques have been developed for such analyses, which depend either on specific hybridization of probes to microarrays or on the counting of transcript-specific tags (3–5). These techniques provide a faithful representation of the more abundantly expressed transcripts. However, due to technical limitations, rare mRNA species are usually underrepresented.

    Recently, Brenner et al. (6) described a novel high-throughput method for genome-wide gene expression analysis, named Massively Parallel Signature Sequencing (MPSS). Similar to SAGE (Serial Analysis of Gene Expression), MPSS is capable of analyzing gene expression without a priori knowledge of the transcript sequence and irrespective of mRNA abundance. In the SAGE technique, a short sequence tag of either 10 nt (original SAGE) or 17 nt (long SAGE) adjacent to the 3'-most NlaIII restriction site is extracted from each expressed sequence. The extracted tags are then concatenated for high-throughput sequencing analysis and tag counts are used to measure the relative abundance of their corresponding transcripts (4). Usually, over 50 000 tags are generated within a single SAGE experiment. MPSS also relies on the production of short tags adjacent to DpnII restriction sites and proximal to the 3' end of transcripts. However, due to the combination of in vitro cloning of cDNA molecules on the surface of microbeads (7) with non-gel-based high-throughput signature sequencing, a single MPSS experiment can generate over 10^7 tags (100 times more than in a SAGE experiment) (6). In principle, this number is sufficient to provide 10-fold coverage of the transcripts expressed in a human cell and to characterize human transcripts expressed at very low levels (1 copy per cell), a capability matched by no other currently available technique.

    MPSS relies on efficient computational tools for the extraction of tag sequences and counts from raw sequence files, as well as for establishing comparisons of tag abundances between different libraries. Another crucial point in interpreting MPSS data is the assignment of experimentally observed tags to a specific transcript sequence. However, tag-to-gene assignment is not a straightforward process. A small percentage (4.6%) of the MPSS tags match multiple transcript sequences and a significant portion of the tags (18%) have no match to known transcript sequences (8).

    MPSS was used to characterize the transcriptome of two human cell lines, HCT-116 (colon adenocarcinoma) and HB4a (mammary luminal epithelium) (8). In the HCT-116 library, the signature sequences comprised 24 065 unique tags present at >3 t.p.m. (tags per million) and 54 704 unique tags at <3 t.p.m. In the HB4a library, the numbers were 17 354 and 36 982, respectively. Tags detected by MPSS at a frequency of <3 t.p.m. in both MPSS libraries were not considered to be reliable, since a frequency of 3 t.p.m. corresponds to roughly one transcript per cell.

    A total of 27 689 unique reliable tags found at >3 t.p.m. in at least one of the two MPSS libraries were identified, and an annotation database corresponding to a comprehensive map of the transcribed regions of the human genome was used to assign MPSS tags to their corresponding transcripts (8). A total of 17 992 of the 27 689 reliable MPSS tags were assigned to human transcripts. However, due to the deep coverage of MPSS, a significant portion of the unique tags could not be assigned to known human transcripts and were designated orphan tags. Approximately half of these 9697 orphan tags were shown to correspond to sequencing errors and genetic polymorphisms. Of the 4806 orphan tags that could neither be mapped to transcripts nor attributed to polymorphisms and sequencing errors, 3765 were mapped to the human genome sequence, of which 2645 mapped to a unique position. Of these 2645 tags, 958 (36%) mapped to introns of known genes in the expected orientation relative to the direction of transcription, suggesting that they could be derived from as yet unmapped regions of known genes. The remaining 1687 tags were considered to derive from novel human transcripts (8). The existence and characterization of these novel transcripts require further experimental verification.
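
    The tag accounting in this paragraph can be re-derived with simple arithmetic. The following is a minimal sketch that uses only the counts quoted above; the variable names are illustrative and do not come from the original analysis.

```python
# Counts quoted in the text (reference 8); variable names are illustrative.
reliable_tags = 27689        # unique tags at >3 t.p.m. in at least one library
assigned_tags = 17992        # assigned to known human transcripts
orphan_tags = reliable_tags - assigned_tags

unexplained_orphans = 4806   # not attributable to errors or polymorphisms
genome_mapped = 3765         # of those, mapped to the genome sequence
unique_position = 2645       # of those, mapped to a single genomic position
intronic = 958               # in introns of known genes, expected orientation
putative_novel = unique_position - intronic

print(orphan_tags)                                      # 9697
print(putative_novel)                                   # 1687
print(round(100 * intronic / unique_position))          # 36 (percent)
print(round(100 * unexplained_orphans / orphan_tags))   # share left unexplained
```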

    In order to explore the potential of MPSS for the characterization of the human transcriptome, we have adapted the GLGI technique (Generation of Longer cDNA fragments from SAGE tags for Gene Identification) to convert MPSS orphan tags into their corresponding 3' cDNA fragments. The GLGI technique was initially developed to further characterize SAGE tags with multiple matches to known transcripts or with no match at all to transcript sequences (9). GLGI was recently improved into a high-throughput format for simultaneous conversion of a large number of SAGE tags into their corresponding 3' cDNA sequences (10,11). In this work, we describe the adapted protocol, named GLGI-MPSS, which proved to be very useful for identifying novel transcripts, for detecting polymorphic and alternatively spliced transcripts as well as for identifying artefactual MPSS tags.

    MATERIALS AND METHODS

    Source of MPSS tag sequences

    The MPSS tag sequences used in this study were extracted from the HB4a library, in which their counts varied from 1 to 94 t.p.m. (Table 1). A comprehensive map of the transcribed regions of the human genome, including experimentally defined polyadenylation sites and information about intron–exon boundaries, was used to assign MPSS tags to their corresponding transcripts and to construct our tag reference database. Using this map, we reconstituted transcripts whose sequence is derived from the genome and whose polyadenylation sites are known. Transcripts were then scanned for the presence of DpnII restriction sites and a 13 nt sequence adjacent to the 3'-most DpnII site was extracted as the virtual MPSS tag. MPSS tags that were not represented in our reference database and, thus, could not be assigned to known transcripts were defined as orphan tags. A total of 83 randomly selected orphan tags, each with a single match to the human genome sequence, were used for GLGI amplification. In addition, MPSS tags corresponding to the MGP (NM_000900) and KRT16 (NM_005557) genes were used as controls for testing the specificity of GLGI amplification. A table containing the 85 MPSS tag sequences and counts is provided as Supplementary Material.
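
    The virtual tag extraction described above can be sketched in a few lines. This is an illustration rather than the pipeline actually used: the function name is hypothetical, the tag is returned as GATC plus the 13 adjacent nucleotides (17 nt in total), and skipping sites that lie too close to the 3' end is an assumption that the text does not address.

```python
from typing import Optional

DPNII_SITE = "GATC"   # DpnII recognition sequence
TAG_EXTENSION = 13    # nucleotides taken immediately downstream of the site


def virtual_mpss_tag(transcript: str) -> Optional[str]:
    """Return GATC + 13 nt at the 3'-most usable DpnII site, or None.

    Assumption: a site with fewer than 13 nt downstream is skipped and
    the next site towards the 5' end is tried instead.
    """
    seq = transcript.upper()
    pos = seq.rfind(DPNII_SITE)
    while pos != -1:
        end = pos + len(DPNII_SITE) + TAG_EXTENSION
        if end <= len(seq):
            return seq[pos:end]
        # Look for the next site that starts strictly to the left of pos.
        pos = seq.rfind(DPNII_SITE, 0, pos + len(DPNII_SITE) - 1)
    return None


# Toy transcript with two DpnII sites; not a real sequence.
toy = "AA" + "GATC" + "T" * 13 + "CC" + "GATC" + "ACGTACGTACGTA" + "AAAA"
print(virtual_mpss_tag(toy))  # GATCACGTACGTACGTA
```

    A reference database in the sense used here would map each such tag to the transcript it came from, so that any observed tag absent from the map is flagged as an orphan.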

    Table 1. GLGI-MPSS results for 41 MPSS orphan tags

    cDNA synthesis and DpnII digestion

    The same RNA source used for MPSS analysis was used for GLGI amplification. Total RNA was prepared from HB4a cells seeded in four 150 mm diameter plates (P150) using the cesium chloride cushion technique (12). Poly(A+) RNA was isolated from 180 μg of total RNA with oligo(dT)25 Dynabeads (Dynal) and the total yield of this purification was used for cDNA synthesis. cDNA synthesis was carried out as previously described (13) using 5' biotinylated, 3' anchored oligo (dT) primers (5' biotin-ACT ATC TAG AGC GGC CGC-T16-R where R = A/G and 5' biotin-ACT ATC TAG AGC GGC CGC-T16-C-V where V = A/G/C). Double-strand cDNA was then digested with 150 U of DpnII (New England Biolabs) for 2 h at 37°C and 3' cDNAs were isolated with streptavidin M280 beads (Dynal).

    3' cDNA amplification

    To generate sufficient 3' cDNA for large-scale GLGI analysis, we amplified 3' cDNA templates by PCR as follows: 140 ng of Linker A (5'-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGAGATC-3' and 5'-pTCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC-3') was ligated to 3' cDNA bound to the streptavidin beads. The ligation reaction was incubated overnight at 16°C in the presence of 18 U T4 DNA ligase (Invitrogen) in a final volume of 30 μl. A 1.5 μl aliquot of the linker-ligated 3' cDNA was amplified by 20 cycles of PCR at 94°C for 30 s, 55°C for 30 s and 72°C for 35 s with 2.5 U Taq Platinum DNA polymerase (Invitrogen), 100 ng of sense primer complementary to Linker A (5'-GGATTTGCTGGTGCAGTACA-3') and 100 ng of antisense primer (5'-ACTATCTAGAGCGGCCGCTT-3') complementary to the anchored oligo (dT) primer in a final volume of 50 μl. The amplified templates were extracted with phenol/chloroform and ethanol precipitated in the presence of 50 μg of glycogen. Pellets were dissolved in 50 μl of TE and 0.5–0.8 μl was used for GLGI amplification.
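
    As a consistency check on the oligonucleotides listed above, the two Linker A strands can be aligned computationally. The sketch below only inspects the sequences as printed in the text; it makes no claim about ligation chemistry or efficiency, and the helper name is illustrative.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Linker A strands as printed in the text (5'->3'); the 5' phosphate is omitted.
LINKER_TOP = "TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGAGATC"
LINKER_BOTTOM = "TCCCTATTAAGCCTAGTTGTACTGCACCAGCAAATCC"
SENSE_PRIMER = "GGATTTGCTGGTGCAGTACA"

paired = reverse_complement(LINKER_BOTTOM)
start = LINKER_TOP.find(paired)               # offset where the strands pair
unpaired_5 = LINKER_TOP[:start]               # top-strand bases before the duplex
unpaired_3 = LINKER_TOP[start + len(paired):] # top-strand bases after the duplex

print(start, unpaired_5, unpaired_3)  # 3 TTT GATC
print(SENSE_PRIMER in LINKER_TOP)     # True
```

    The bottom strand pairs with 37 bases of the top strand, leaving GATC unpaired at the 3' end of the top strand, and the sense primer is contained within the top strand.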

    GLGI-MPSS reaction

    The sense primer used for GLGI amplification included 17 bases of the MPSS tag sequence and 6 additional bases CAGGGA, giving a total of 23 bases for each primer (5'-CAGGGAGATCXXXXXXXXXXXXX-3'). A GLGI mixture was prepared in a final volume of 30 μl, including 1x PCR buffer, 2.0 mM MgCl2, 83 μM dNTPs, 2.3 ng/μl antisense primer, 2.3 ng/μl sense primer, 0.5–0.8 μl of cDNA template, 1.5 U of Taq Platinum DNA polymerase (Invitrogen). PCR conditions used for amplification were 94°C for 2 min, followed by 30 cycles at 94°C for 30 s, 64°C for 30 s and 72°C for 35 s. Reactions were kept at 72°C for 5 min after the last cycle. The amplified products were ethanol precipitated in the presence of 50 μg of glycogen. After spinning at 4 000 r.p.m. for 35 min at 4°C (swing-bucket rotor, Type A-4-62; Eppendorf), the supernatants were removed and the pellets were dissolved in 5 μl of H2O.
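
    The primer layout described above (6 fixed bases, then the 17-base tag beginning with GATC) can be expressed as a small helper. This is a sketch with a hypothetical function name; the example tag is the NASP-related orphan tag quoted in the Results.

```python
PRIMER_PREFIX = "CAGGGA"  # six additional 5' bases stated in the text
DPNII_SITE = "GATC"


def glgi_sense_primer(tag: str) -> str:
    """Build the 23-base GLGI-MPSS sense primer from a 17 nt MPSS tag."""
    tag = tag.upper()
    if len(tag) != 17 or not tag.startswith(DPNII_SITE):
        raise ValueError("expected a 17 nt tag beginning with GATC")
    if set(tag) - set("ACGT"):
        raise ValueError("tag contains characters other than A, C, G, T")
    return PRIMER_PREFIX + tag


primer = glgi_sense_primer("GATCTCTGGTTTGAAAG")
print(primer, len(primer))  # CAGGGAGATCTCTGGTTTGAAAG 23
```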

    Characterization of GLGI-MPSS fragments

    GLGI-MPSS fragments were cloned into the pGEM®-T Easy vector (Promega). Ligation mixtures were prepared in a final volume of 10 μl, containing 50 ng pGEM®-T Easy vector, 1x Rapid Ligation Buffer, 5 U T4 DNA ligase and 2 μl of GLGI-MPSS purified product. The ligation reaction was kept overnight at 4°C and 2 μl of the ligated DNA was used to transform DH10B electrocompetent cells. Eight colonies for each GLGI fragment were screened by PCR. Amplification was carried out in a final volume of 20 μl, containing 1x PCR buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 1 U Taq DNA Polymerase (Invitrogen), and 10 pmol of T7 and SP6 universal primers. Colonies were picked directly from transformation plates using sterile pipette tips. DNA was initially denatured at 95°C for 5 min and amplification was carried out for 30 cycles at 94°C for 30 s, 55°C for 30 s and 72°C for 1 min. Positive colonies were sequenced using Big-Dye Terminator (Applied Biosystems) and pGEM-T universal primers, and sequences were analyzed in an ABI3100 sequencer (Applied Biosystems).

    Sequence analysis

    Sequences lacking the 13 bp MPSS tag sequence (excluding the DpnII site) were excluded from further analysis. The selected sequences were searched against the GenBank Database (NR and dbEST) using the BLASTN program (http://www.ncbi.nlm.nih.gov/BLAST/). A sequence generated from an orphan MPSS tag was classified as novel if no matches to a transcript sequence (full-length or EST) were found; a sequence was considered to represent a known gene if it matched a full-length transcript sequence with >95% similarity in the same orientation, including the same 17 bp MPSS tag sequence; a sequence was classified as known EST if it matched ESTs; a sequence was classified either as unspecific or as an alternative tag derived from the presence of single nucleotide polymorphisms (SNPs) if it matched only partially in the region corresponding to the MPSS tag sequence; a sequence was classified as an antisense transcript if it matched known transcripts with high similarity (>95%) but in the opposite orientation. Finally, a sequence was considered either as alternative splicing or as an artefactual sequence derived from internal priming if it matched the middle of known full-length transcripts in the 5'–3' orientation; in cases of internal priming, there was a poly(A) tract immediately downstream of the matched region.
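
    The classification criteria above amount to a small decision procedure. The sketch below is one possible encoding, not the authors' implementation: the `BestHit` fields are a hypothetical summary of a BLASTN result, the order in which the rules are tested is an assumption, and the "unclassified" outcome covers cases the text does not address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Hypothetical summary of the best BLASTN hit for one fragment."""
    target_type: str         # "full_length" or "est"
    identity: float          # percent identity over the alignment
    same_orientation: bool   # fragment and transcript on the same strand
    tag_fully_matched: bool  # whole 17 nt tag lies inside the alignment
    at_3prime_end: bool      # False when the hit is in the middle of the transcript
    polya_downstream: bool   # poly(A) tract immediately 3' of the matched region


def classify_fragment(hit: Optional[BestHit], min_identity: float = 95.0) -> str:
    """Apply the stated criteria in an assumed order of precedence."""
    if hit is None:
        return "novel"
    if not hit.tag_fully_matched:
        return "unspecific or SNP-derived alternative tag"
    if hit.target_type == "est":
        return "known EST"
    if hit.identity <= min_identity:
        return "unclassified"  # not covered by the criteria in the text
    if not hit.same_orientation:
        return "antisense transcript"
    if hit.at_3prime_end:
        return "known gene"
    if hit.polya_downstream:
        return "internal priming artefact"
    return "alternative splicing"


example = BestHit("full_length", 98.0, True, True, False, True)
print(classify_fragment(example))  # internal priming artefact
```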

    RESULTS AND DISCUSSION

    GLGI-MPSS amplification and size distribution of the 3' cDNAs

    We have adapted the GLGI-SAGE protocol (11) to convert MPSS orphan tags of 17 nt into their corresponding 3' cDNA fragments. The adapted protocol, named GLGI-MPSS, was used to amplify 83 MPSS orphan tags. A few modifications were introduced in the original GLGI-SAGE protocol: changes in the linker sequence used for 3' cDNA amplification, the use of a higher number of PCR cycles in the GLGI-MPSS amplification and the screening of eight independent colonies for each GLGI fragment instead of the four colonies adopted for GLGI-SAGE. The linker sequence was modified because a different restriction enzyme (DpnII) is used for the construction of MPSS libraries. The number of PCR cycles and of colonies screened was increased because of the lower expression level of the transcripts corresponding to MPSS tags.

    Representative GLGI-MPSS amplifications are shown in Figure 1A. Two MPSS tags assigned to known transcripts were used as positive amplification controls and dominant bands of the expected sizes were obtained (Figure 1B). The specificity of these two control fragments was further confirmed by DNA sequencing. A dominant band was observed for 41 of the 83 (49.4%) GLGI-MPSS amplifications (Figure 1A). The size of the amplified fragments ranged from 29 to 509 bp (Table 1). More than half of the 3' cDNA fragments were >120 bp, which is about 10 times the length of the starting MPSS tag sequence and hence provides additional and valuable information for further characterization of the corresponding transcripts.

    Figure 1. GLGI-MPSS amplification. (A) GLGI amplifications for MPSS orphan tags (lanes 48 to 83) were analyzed on agarose gels stained with ethidium bromide. Note that most lanes show only a single amplified band whereas others have more than one band and sometimes a smear. A 100 bp ladder (M) was used as molecular weight marker. (B) Control amplification (CT1 and CT2) and schematic representation of their expected GLGI-MPSS fragments. The hatched boxes correspond to the MPSS tags used for GLGI-MPSS amplifications.

    For half (42/83) of the GLGI-MPSS reactions, a dominant band corresponding to the amplified cDNA fragment could not be visualized in agarose gels after 30 cycles of amplification (Figure 1A). This fact could be attributed to the low expression level of the transcripts from which the MPSS tags were derived. Although we have not observed any relationship between tag count and a positive GLGI-MPSS amplification for the tags analyzed in this study, it has been reported in the literature that the efficiency of GLGI-SAGE amplification is proportional to the abundance of SAGE tags and consequently to the expression level of the corresponding transcript. Amplification of SAGE tags with a high copy number (over 50 copies) usually generates a dominant band, whereas amplification of SAGE tags with a lower copy number (under 50 copies) yields less product and may contain additional bands (10). Modifications in the GLGI-MPSS protocol, especially the increase in the number of cycles, could in theory solve this problem. However, for GLGI-SAGE the amplification with a high number of PCR cycles results in non-specific amplification due to partial annealing of the sense primer with other templates (10).

    Similar results were obtained for GLGI-MPSS. A total of 10 MPSS tags that did not produce a dominant cDNA fragment were re-amplified using 35 cycles. As can be seen in Figure 2, dominant bands >100 bp were obtained for 2 of the 10 tags tested (tags 66 and 80). These fragments were cloned, sequenced and shown to be non-specific after similarity searches against GenBank. Another possible explanation for the observed GLGI-MPSS amplification efficiency is that an MPSS tag does not always provide an ideal sequence for primer design and efficient PCR amplification. In this context, we observed that, among the several sense primers that could form stable secondary structures, only one generated a specific GLGI-MPSS fragment.

    Figure 2. Non-specific GLGI-MPSS amplification using a high number of PCR cycles. Ten MPSS orphan tags (tags 49, 53, 55, 61, 66, 67, 73, 75, 79 and 80) that did not produce dominant bands in the standard GLGI-MPSS amplification were re-amplified using 35 cycles. Fragments were analyzed on agarose gels stained with ethidium bromide. A 100 bp ladder (M) was used as molecular weight marker.

    Analysis of the 3' cDNAs generated from MPSS orphan tags

    GLGI-MPSS fragments were cloned, and eight colonies for each GLGI fragment were sequenced. The average length of these sequences was 185 bp. All the sequences were searched for similarity using the BLASTN program against the GenBank Database (non-redundant and ESTs). These sequences are provided as Supplementary Material. Of the 41 fragments analyzed, 16 (39%) showed a high score match to a known human transcript (Table 1). These matches, however, were only partial in the region corresponding to the MPSS sense primers (usually the last 4–7 bases of the sense primer). These results suggest that these primers had annealed non-specifically to the mRNA molecule corresponding to the known transcript during the GLGI-MPSS amplification.

    Unlike standard PCR, in GLGI reactions only the sense primer provides specificity for the amplification. When the expression level of targeted templates is very low, partial annealing of the sense primers with other highly expressed templates can result in non-specific amplification. Similar amplification specificity (60%) has been reported for GLGI amplification of SAGE tags with low copy numbers. The amplification specificity was, however, higher (85%) for tags with high copy numbers (10). Although it has been reported that the number of specific GLGI-SAGE products can increase (10–15%) through screening additional colonies for each reaction (10), similar results were not obtained for GLGI-MPSS, possibly due to the very low expression level of the transcripts corresponding to these MPSS tags.

    Of the 41 GLGI-MPSS fragments, 10 were confirmed as specific 3' extensions by the presence of a 3' poly(dA) tail and a polyadenylation signal and by the absence of internal DpnII restriction sites within the amplified sequences. Of these 10 GLGI-MPSS fragments, 4 matched known full-length transcript sequences and 6 matched EST sequences mainly derived from normalized or subtracted cDNA libraries (Table 1). All the full-length transcript sequences and ESTs matched by the GLGI-MPSS fragments were submitted to GenBank after the construction of our tag reference database, which explains why their corresponding MPSS tags were originally classified as orphan tags. These GLGI-MPSS fragments cannot be considered as derived from novel human transcripts. However, the matches between the GLGI-MPSS fragments and recently submitted sequences further confirmed the specificity of the GLGI-MPSS protocol.

    The analysis of the remaining 15 GLGI-MPSS fragments proved very useful in the identification of putative antisense transcripts and of alternative tags generated from polymorphic transcripts, as well as in distinguishing tags derived from alternatively spliced transcripts from artefactual tags derived from internal oligo (dT) priming during MPSS library construction. First, 5 of the 15 GLGI-MPSS fragments matched known full-length cDNA sequences in the databases with >95% similarity but in the opposite orientation (3'–5'). All 5 GLGI-MPSS fragments have a 3' poly(A) tail, and 3 of them have a poly(A) signal, suggesting that these sequences are derived from antisense transcripts located on the opposite strand of the known sequence (Table 1). A better characterization of these antisense transcripts, however, will require further experimental validation.

    Second, 3 of the 15 GLGI-MPSS fragments matched the 3' region of a known human transcript almost perfectly, except for a single base substitution located within the 4 bp DpnII restriction site present on the sense primer sequence of the GLGI-MPSS fragment. After careful analysis of these sequences, we could conclude that these three orphan tags were actually derived from a known polymorphic transcript in which the presence of an SNP in the HB4a cell line (located downstream of the original MPSS tag) created an alternative DpnII restriction site not represented in the full-length cDNA sequences used for tag-to-gene assignments (Table 1). The presence of these SNPs in the transcripts expressed in the HB4a cell line produced alternative MPSS tags located downstream of the original tag. These alternative tags could not be correctly assigned to a known transcript based on the analysis of publicly available transcript sequences and were thus considered as orphan tags.

    For example, the GLGI-MPSS fragment derived from the orphan MPSS tag GATCTCTGGTTTGAAAG matched the NASP gene (NM_002482). The 3'-most DpnII site present in publicly available NASP transcript sequences is located at nucleotides 2862 to 2865 and the original MPSS tag assigned to this gene is GATCTTGCTCTTCAGTG. The observed match for the GLGI-MPSS sequence was almost perfect from nucleotides 3040 to 3323 of the NASP sequence, except for a single base substitution within the DpnII site (GA[G/T]C) of the sense primer. This base substitution present in the HB4a cell line created a 3' DpnII site (and, as a consequence, an alternative MPSS tag) not represented in NASP transcript sequences available in public databases. The existence of this SNP within the NASP gene was confirmed by consulting NCBI SNP database build 108 (SNPid RS1053941), as well as by genotyping the HB4a cell line by DNA amplification and digestion with DpnII (data not shown). Taken together, these results show that GLGI-MPSS can be successfully used to identify alternative MPSS tags derived from polymorphic transcripts.

    Finally, 7 of the 15 GLGI-MPSS fragments matched the middle of known full-length transcripts in the 5'–3' orientation. Five of these GLGI-MPSS extensions are probably derived from internal oligo (dT) priming during cDNA synthesis and, thus, these MPSS orphan tags can be classified as artefactual (Table 1). For example, the partial sequence of the GLGI-MPSS fragment derived from the orphan MPSS tag GATCCAAAAGTTCACTT matched the MBP4 gene (BC034463) from nucleotide 373 to 744. A stretch of poly(A) is present from nucleotides 751 to 762, which could have served as an internal oligo (dT) priming site. However, for the remaining two of these seven GLGI-MPSS fragments, we could not find evidence of internal priming, and they were classified as derived from as yet uncharacterized alternatively spliced transcripts (Table 1).
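
    The internal priming check described above (a poly(A) stretch shortly downstream of the matched region) can be sketched as follows. The function name, the 20-base search window and the minimum run of 10 adenines are assumptions chosen for illustration; the text does not state the thresholds used. The toy sequence only reproduces the coordinates quoted for the example and is not the real BC034463 sequence.

```python
import re
from typing import Optional, Tuple


def downstream_polya(transcript: str, match_end: int,
                     window: int = 20, min_run: int = 10) -> Optional[Tuple[int, int]]:
    """Locate an A-run starting within `window` bases after a match.

    match_end is the 1-based position of the last matched base.
    Returns the 1-based (start, end) of the run, or None if absent.
    """
    seq = transcript.upper()
    pattern = re.compile("A{%d,}" % min_run)
    # A 1-based end position equals the 0-based index of the next base.
    hit = pattern.search(seq, match_end)
    if hit and hit.start() - match_end < window:
        return (hit.start() + 1, hit.end())
    return None


# Synthetic transcript: no adenines except a 12-base run at positions 751-762.
toy_transcript = "CGT" * 250 + "A" * 12 + "CGT" * 10
print(downstream_polya(toy_transcript, match_end=744))  # (751, 762)
```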

    CONCLUSIONS

    The number of genes predicted from the human genome sequence (30 000–40 000) has turned out to be much lower than earlier estimates (14,15). However, recent data based on the analysis of transcriptional units in human chromosomes 21 and 22 (16), EST-to-genome alignments (17,18) and SAGE (11,19) showed that the number of transcribed sequences in the human genome could be an order of magnitude higher than the initial estimates. Genome and transcriptome complexity is thus greater than initially predicted, and most of the missing genes and transcript variants are probably expressed at very low abundance levels.

    As expected from the deep coverage, a high proportion of MPSS tags cannot be assigned to known transcripts and are likely to be derived from novel human transcripts and alternatively spliced variants expressed at very low levels. However, due to their short size (17 nt), it is difficult to use tag sequence information for further characterization of these novel transcripts. To overcome this limitation, we have developed a GLGI-MPSS protocol to convert MPSS orphan tags of 17 nt into their corresponding 3' cDNA fragments. GLGI-MPSS proved to be very useful in detecting novel antisense transcripts and alternative MPSS tags generated by alternatively spliced and polymorphic transcripts, as well as in identifying artefactual MPSS tags derived from experimental errors. The whole process is rapid, specific and highly efficient for large-scale analysis. A systematic large-scale analysis of the genome by MPSS, together with the use of the GLGI-MPSS protocol, provides a complementary approach to generate a complete catalog of human transcripts.

    SUPPLEMENTARY MATERIAL

    ACKNOWLEDGEMENTS

    The authors gratefully acknowledge the support of the Ludwig Institute for Cancer Research and the National Cancer Institute for conducting the MPSS analysis from which the orphan tags were extracted. In particular, we would like to thank Dr Sandro de Souza and Dr Victor Jongeneel for the bioinformatics analysis and Dr Andrew J. G. Simpson and Dr Munro Neville for the access to the MPSS database. The authors thank Dr Luís Fernando L. Reis for critically reading this manuscript. This work is supported by NIH 1R01 HG002600 (SMW) and the CEPID Program from the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP 98/14335-2). A.P.M.S. is sponsored by a fellowship from FAPESP.

    REFERENCES

    1. Lander,E.S. (1996) The new genomics: global views of biology. Science, 274, 536–539.

    2. Collins,F.S., Patrinos,A., Jordan,E., Chakravarti,A., Gesteland,R. and Walters,L. (1998) New goals for the U.S. Human Genome Project: 1998–2003. Science, 282, 682–689.

    3. Okubo,K., Hori,N., Matoba,R., Niiyama,T., Fukushima,A., Kojima,Y. and Matsubara,K. (1992) Large scale cDNA sequencing for analysis of quantitative and qualitative aspects of gene expression. Nature Genet., 2, 173–179.

    4. Velculescu,V.E., Zhang,L., Vogelstein,B. and Kinzler,K.W. (1995) Serial analysis of gene expression. Science, 270, 484–487.

    5. Duggan,D.J., Bittner,M., Chen,Y., Meltzer,P. and Trent,J.M. (1999) Expression profiling using cDNA microarrays. Nature Genet., 21, 10–14.

    6. Brenner,S., Johnson,M., Bridgham,J., Golda,G., Lloyd,D.H., Johnson,D., Luo,S., McCurdy,S., Foy,M., Ewan,M. et al. (2000) Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol., 18, 630–634.

    7. Brenner,S., Williams,S.R., Vermaas,E.H., Storck,T., Moon,K., McCollum,C., Mao,J.I., Luo,S., Kirchner,J.J., Eletr,S. et al. (2000) In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc. Natl Acad. Sci. USA, 97, 1665–1670.

    8. Jongeneel,C.V., Iseli,C., Stevenson,B.J., Riggins,G.J., Lal,A., Mackay,A., Harris,R.A., O'Hare,M.J., Neville,A.M., Simpson,A.J. and Strausberg,R.L. (2003) Comprehensive sampling of gene expression in human cell lines with massively parallel signature sequencing. Proc. Natl Acad. Sci. USA, 100, 4702–4705.

    9. Chen,J.J., Rowley,J.D. and Wang,S.M. (2000) Generation of longer cDNA fragments from serial analysis of gene expression tags for gene identification. Proc. Natl Acad. Sci. USA, 97, 349–353.

    10. Chen,J., Lee,S., Zhou,G. and Wang,S.M. (2002) High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3' complementary DNAs. Genes Chromosomes Cancer, 33, 252–261.

    11. Chen,J., Sun,M., Lee,S., Zhou,G., Rowley,J.D. and Wang,S.M. (2002) Identifying novel transcripts and novel genes in the human genome by using novel SAGE tags. Proc. Natl Acad. Sci. USA, 99, 12257–12262.

    12. Chirgwin,J.M., Przybyla,A.E., MacDonald,R.J. and Rutter,W.J. (1979) Isolation of biologically active ribonucleic acid from sources enriched in ribonuclease. Biochemistry, 18, 5294–5299.

    13. Wang,S.M. and Rowley,J.D. (1998) A strategy for genome-wide gene analysis: Integrated procedure for gene identification. Proc. Natl Acad. Sci. USA, 95, 11909–11914.

    14. Lander,E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

    15. Venter,J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.

    16. Kapranov,P., Cawley,S.E., Drenkow,J., Bekiranov,S., Strausberg,R.L., Fodor,S.P. and Gingeras,T.R. (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science, 296, 916–919.

    17. de Souza,S.J., Camargo,A.A., Briones,M.R., Costa,F.F., Nagai,M.A., Verjovski-Almeida,S., Zago,M.A., Andrade,L.E., Carrer,H., El-Dorry,H.F. et al. (2000) Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags. Proc. Natl Acad. Sci. USA, 97, 12690–12693.

    18. Reymond,A., Camargo,A.A., Deutsch,S., Stevenson,B.J., Parmigiani,R.B., Ucla,C., Bettoni,F., Rossier,C., Lyle,R., Guipponi,M. et al. (2002) Nineteen additional unpredicted transcripts from human chromosome 21. Genomics, 79, 824–832.

    19. Saha,S., Sparks,A.B., Rago,C., Akmaev,V., Wang,C.J., Vogelstein,B., Kinzler,K.W. and Velculescu,V.E. (2002) Using the transcriptome to annotate the genome. Nat. Biotechnol., 20, 508–512.

    (Ana Paula M. Silva, Jianjun Chen1, Dirce)