mtDB: Human Mitochondrial Genome Database, a resource for population g
Nucleic Acids Research
1Centre for Integrative Genomics, University of Lausanne, Switzerland; 2Department of Genetics and Pathology, Rudbeck Laboratory, University of Uppsala, Uppsala, Sweden
*To whom correspondence should be addressed. Tel: +41 21 692 3962; Fax: +46 18 471 4931; Email: max.ingman@unil.ch
ABSTRACT
The mitochondrial genome, contained in the subcellular mitochondrial network, encodes a small number of peptides pivotal for cellular energy production. Mitochondrial genes are highly polymorphic, and cataloguing existing variation is of interest both to medical scientists identifying mutations that cause mitochondrial dysfunction and to population geneticists. The Human Mitochondrial Genome Database (mtDB) (http://www.genpat.uu.se/mtDB) has provided a comprehensive database of complete human mitochondrial genomes since early 2000. At that time, owing to an increase in the number of published complete human mitochondrial genome sequences, it became necessary to provide a web-based database of human whole genome and complete coding region sequences. As of August 2005, the database contains 2104 sequences (1544 complete genomes and 560 coding regions) that can be downloaded or searched for specific polymorphisms. Of special interest to medical researchers and population geneticists evaluating specific positions is a complete list of the (currently 3311) mitochondrial polymorphisms found among these sequences. Recent expansions in the capabilities of mtDB include a haplotype search function and the ability to identify and download sequences carrying particular variants.
INTRODUCTION
The mitochondrial genome supplies parts of the protein machinery necessary for oxidative phosphorylation (OXPHOS), which is carried out by a series of five multi-subunit enzyme complexes located within the mitochondrial inner membrane. The constituents of these complexes are encoded by both nuclear and mitochondrial genes, so a genetic defect could be due to mutations in genes of either system. Since new mutations are introduced more frequently to the mitochondrial genome, a higher proportion of mitochondrial dysfunction is due to mitochondrial DNA (mtDNA) mutations. A number of human diseases have been shown to be caused by mitochondrial mutations, such as Leber's hereditary optic neuropathy (LHON) (1) and neurogenic muscle weakness, ataxia and retinitis pigmentosa (NARP) (2). To evaluate the possible functional effect of a mitochondrial variant found in a group of patients, reliable population frequency data for that variant are needed. The Human Mitochondrial Genome Database (mtDB) provides a compilation of the available genome sequence information for this purpose.
The mtDNA of most metazoan species (including humans) is predominantly maternally inherited (3). This clonal inheritance, coupled with a substitution rate that in vertebrates is typically 5 to 10 times that of nuclear DNA (4), has made mitochondria an attractive source of DNA polymorphism data for population genetics studies in a wide range of species. The lack of recombination between maternal and paternal mitochondrial genomes allows the tracing of a direct genetic line in which all polymorphism is due to mutation, and the high substitution rate makes it possible to study variation between closely related individuals (i.e. within species). mtDNA sequences have been the main tool in a large number of studies of human evolution. mtDB is a repository for these sequences and will provide scientists with access to a common resource for future studies in this field.
Since 2000, with the publication of the first comprehensive study on complete human mitochondrial genome sequences (5), the amount of data available from mitochondrial genomes has been growing rapidly. However, as the data accumulate, extracting polymorphism information from them is becoming more time-consuming. mtDB provides a unique resource to both medical and human population genetic researchers. Here, published mitochondrial genome sequences are collected from GenBank and other sources (not all sequences are submitted to GenBank) and made available for download. In addition, extensive polymorphism information from the complete dataset is easily accessible.
DATABASE CONTENT
mtDB offers three principal types of content for researchers:
(i) Download of all mitochondrial sequences, either individually or as population sets. The sequences are grouped into 10 major geographic regions based on the population origin of the donor (Table 1). Where the geographic origin of a donor differs from their presumed historical background, the sequence is listed under the heading that best fits the donor's ancestry. For example, African American, European American and Asian American sequences are listed not under North America but under the headings Africa, Europe and Asia, respectively. Large sets from the same population are available as batches of individual files. All sequences are cross-referenced to their original publications and, where available, to GenBank accession numbers. There are currently 2104 mitochondrial sequences at mtDB.
(ii) A list of all variable positions among complete, or near complete, mitochondrial sequences (Figure 1). Currently, 3311 polymorphic sites are identified and characterized in tabulated form. The table has a separate line for each variable site, giving a count of how many sequences contain each nucleotide variant at that site, the genic location of the site, the codon number and position, and details of amino acid changes. Clicking on the count for a particular variant returns a list of all sequences that contain that variant, and these sequences can then be downloaded from the list. All insertions relative to the Cambridge Reference Sequence (CRS) (6) have been removed.
(iii) A search function for mitochondrial haplotypes. This goes a step beyond the list of variable positions in that sequences carrying specific haplotypes can be retrieved by entering the position and nucleotide for up to 10 loci. Only sequences that match all of these criteria are returned. Again, these sequences can then be downloaded from the database.
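The matching rule of the haplotype search (every entered position and nucleotide must agree, with at most 10 loci) can be illustrated with a minimal Python sketch. This is not the PHP code used by mtDB; the function name, the dictionary layout and the toy sequences are assumptions made for illustration, and positions are taken to be 1-based columns of the alignment.

```python
from typing import Dict, List, Tuple

MAX_LOCI = 10  # the search accepts position/nucleotide pairs for up to 10 loci


def search_haplotype(
    aligned: Dict[str, str], criteria: List[Tuple[int, str]]
) -> List[str]:
    """Return the IDs of sequences that match every (position, nucleotide) pair.

    Positions are 1-based columns of the alignment (an assumption of this sketch).
    """
    if not 1 <= len(criteria) <= MAX_LOCI:
        raise ValueError(f"expected 1 to {MAX_LOCI} loci, got {len(criteria)}")
    hits = []
    for seq_id, seq in aligned.items():
        # A sequence is returned only if all criteria hold, not just some of them.
        if all(
            0 < pos <= len(seq) and seq[pos - 1].upper() == nt.upper()
            for pos, nt in criteria
        ):
            hits.append(seq_id)
    return hits


# Toy alignment with invented identifiers and 8-column sequences.
toy = {
    "seq_A": "GATCACAG",
    "seq_B": "GATTACAG",
    "seq_C": "GACTACAA",
}
print(search_haplotype(toy, [(4, "T"), (8, "G")]))  # prints ['seq_B']
```

Relaxing the query to the single criterion (4, "T") would also return seq_C, which shows how each additional locus narrows the result set.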
Table 1 Summary table of the number of sequences from each of the 10 geographic regions
Figure 1 Truncated table of polymorphic sites. Each row of the table shows the nucleotide position, the CRS nucleotide state at that position, the number of database sequences with A, G, C, T or a gap, and the functional region in which the site lies. If the functional region is a protein coding gene, the row also lists the codon number, the codon position, the amino acid state in CRS and in the variant, and whether the change is synonymous. Clicking the number of sequences with a particular nucleotide state retrieves a list of all sequences that carry that variant.
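The per-site counts described in the caption can be derived from an alignment roughly as follows. This is a hedged Python sketch, not the script used by mtDB: the function names, the row layout and the toy data are assumptions, all sequences are assumed to have the same length as the reference, and ambiguity codes other than A, G, C, T and the gap are simply ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATES = "AGCT-"  # column order of the table; "-" denotes a gap


def polymorphic_sites(reference: str, aligned: Dict[str, str]) -> List[dict]:
    """Tabulate each column where some sequence differs from the reference."""
    rows = []
    for i, ref_nt in enumerate(reference):
        counts = Counter(seq[i] for seq in aligned.values())
        # A site is variable if any non-reference state occurs at least once.
        if any(counts[s] for s in STATES if s != ref_nt):
            row = {"position": i + 1, "reference": ref_nt}
            row.update({s: counts[s] for s in STATES})
            rows.append(row)
    return rows


def codon_coordinates(position: int, gene_start: int) -> Tuple[int, int]:
    """Return (codon number, position within codon), both 1-based.

    Valid only for a gene read in the forward direction from gene_start;
    a gene on the opposite strand would need the mirrored calculation.
    """
    offset = position - gene_start
    return offset // 3 + 1, offset % 3 + 1


# Toy data: an invented 8-column reference and three invented sequences.
toy_reference = "GATCACAG"
toy_alignment = {
    "seq_A": "GATCACAG",
    "seq_B": "GATTACAG",
    "seq_C": "GA-TACAA",
}
for row in polymorphic_sites(toy_reference, toy_alignment):
    print(row)
```

With the toy data, columns 3, 4 and 8 are reported as variable. Determining the amino acid change additionally requires translating the affected codon with the vertebrate mitochondrial genetic code, which is left out of this sketch.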
Some population genetics researchers use predefined haplotypes (haplogroups) purported to designate specific mitochondrial lineages. As a complement to our search function, the search page has a link to a haplogroup tree in which clicking an individual haplogroup letter returns a list of all sequences that belong to that group.
DATABASE INTERFACE
To facilitate easy updating of mtDB, all data pages are produced dynamically by PHP scripts. PHP is an easy-to-use scripting language that integrates well with HTML. Data are parsed on the server and the HTML output is sent to the client, so the result is independent of the client's operating system, browser and installed options. The only exception is the list of polymorphic sites, nucleotide variants and amino acid states, which is produced by a separate script; its HTML output is saved to avoid long processing times for individual requests. The core database is a text file of aligned sequences. New sequences can simply be pasted into this file and are then included in searches.
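The design described above (a flat file of aligned sequences, plus one expensive page that is generated ahead of time and saved) can be sketched as follows. The sketch is written in Python rather than the PHP used by mtDB, and the file layout (one sequence per line, identifier and aligned sequence separated by whitespace), the function names and the simplified table are all assumptions for illustration.

```python
import html
from pathlib import Path
from typing import Dict


def load_alignment(path: Path) -> Dict[str, str]:
    """Read an assumed 'identifier<whitespace>sequence' file, one sequence per line."""
    aligned = {}
    for line in path.read_text().splitlines():
        if line.strip():
            seq_id, seq = line.split(None, 1)
            aligned[seq_id] = seq.strip().upper()
    return aligned


def render_variant_table(aligned: Dict[str, str]) -> str:
    """Build a simplified HTML table of the columns in which sequences disagree."""
    length = min(len(s) for s in aligned.values())
    rows = []
    for i in range(length):
        states = sorted({seq[i] for seq in aligned.values()})
        if len(states) > 1:
            cells = html.escape(", ".join(states))
            rows.append(f"<tr><td>{i + 1}</td><td>{cells}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>\n"


def rebuild_cache(alignment_path: Path, cache_path: Path) -> None:
    """Regenerate the saved HTML; intended to run after new sequences are added."""
    cache_path.write_text(render_variant_table(load_alignment(alignment_path)))
```

In this arrangement, individual requests only read the saved file, and the slow scan over every column of the alignment runs once per database update rather than once per page view.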
CONCLUSIONS
mtDB is the only comprehensive online source for the data contained within it. This includes the sequences themselves, as many have not been deposited in a publicly available database such as GenBank. The list of mitochondrial polymorphisms grows continually with the addition of new sequences and is an important resource for phylogenetic and medical studies. The ability to search for multiple-variant haplotypes makes it possible to extract further detail from the data. We are committed to the maintenance of this database and hope that it will be a useful resource for researchers for years to come.
ACKNOWLEDGEMENTS
This research has been supported by the Swedish National Research Council (UG). Funding to pay the Open Access publication charges for this article was provided by the Swedish National Research Council (UG).
REFERENCES
1. Wallace, D.C., Singh, G., Lott, M.T., Hodge, J.A., Schurr, T.G., Lezza, A.M., Elsas, L.J., II, Nikoskelainen, E.K. (1988) Mitochondrial DNA mutation associated with Leber's hereditary optic neuropathy. Science, 242, 1427–1430.
2. Holt, I.J., Harding, A.E., Petty, R.K., Morgan-Hughes, J.A. (1990) A new mitochondrial disease associated with mitochondrial DNA heteroplasmy. Am. J. Hum. Genet., 46, 428–433.
3. Giles, R.E., Blanc, H., Cann, H.M., Wallace, D.C. (1980) Maternal inheritance of human mitochondrial DNA. Proc. Natl Acad. Sci. USA, 77, 6715–6719.
4. Brown, W.M., George, M., Jr, Wilson, A.C. (1979) Rapid evolution of animal mitochondrial DNA. Proc. Natl Acad. Sci. USA, 76, 1967–1971.
5. Ingman, M., Kaessmann, H., Pääbo, S., Gyllensten, U. (2000) Mitochondrial genome variation and the origin of modern humans. Nature, 408, 708–713.
6. Anderson, S., Bankier, A.T., Barrell, B.G., de Bruijn, M.H., Coulson, A.R., Drouin, J., Eperon, I.C., Nierlich, D.P., Roe, B.A., Sanger, F., et al. (1981) Sequence and organization of the human mitochondrial genome. Nature, 290, 457–465.
(Max Ingman1,2,* and Ulf Gyllensten2)