language-icon Old Web
English
Sign In

Human genome

The human genome is the complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome, and the mitochondrial genome. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells (the egg and sperm gamete cells created in the meiosis phase of sexual reproduction before fertilization creates a zygote) consist of three billion DNA base pairs, while diploid genomes (found in somatic cells) have twice the DNA content. While there are significant differences among the genomes of human individuals (on the order of 0.1%), these are considerably smaller than the differences between humans and their closest living relatives, the chimpanzees (approximately 4%) and bonobos. The human genome is the complete set of nucleic acid sequences for humans, encoded as DNA within the 23 chromosome pairs in cell nuclei and in a small DNA molecule found within individual mitochondria. These are usually treated separately as the nuclear genome, and the mitochondrial genome. Human genomes include both protein-coding DNA genes and noncoding DNA. Haploid human genomes, which are contained in germ cells (the egg and sperm gamete cells created in the meiosis phase of sexual reproduction before fertilization creates a zygote) consist of three billion DNA base pairs, while diploid genomes (found in somatic cells) have twice the DNA content. While there are significant differences among the genomes of human individuals (on the order of 0.1%), these are considerably smaller than the differences between humans and their closest living relatives, the chimpanzees (approximately 4%) and bonobos. The first human genome sequences were published in nearly complete draft form in February 2001 by the Human Genome Project and Celera Corporation. Completion of the Human Genome Project's sequencing effort was announced in 2004 with the publication of a draft genome sequence, leaving just 341 gaps in the sequence, representing highly-repetitive and other DNA that could not be sequenced with the technology available at the time. The human genome was the first of all vertebrates to be sequenced to such near-completion, and as of 2018, the diploid genomes of over a million individual humans had been determined using next-generation sequencing. These data are used worldwide in biomedical science, anthropology, forensics and other branches of science. Such genomic studies have lead to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution. Although the sequence of the human genome has been (almost) completely determined by DNA sequencing, it is not yet fully understood. Most (though probably not all) genes have been identified by a combination of high throughput experimental and bioinformatics approaches, yet much work still needs to be done to further elucidate the biological functions of their protein and RNA products. Recent results suggest that most of the vast quantities of noncoding DNA within the genome have associated biochemical activities, including regulation of gene expression, organization of chromosome architecture, and signals controlling epigenetic inheritance. Prior to the acquisition of the full genome sequence, estimates of the number of human genes ranged from 50,000 to 140,000 (with occasional vagueness about whether these estimates included non-protein coding genes). As genome sequence quality and the methods for identifying protein-coding genes improved, the count of recognized protein-coding genes dropped to 19,000-20,000. However, a fuller understanding of the role played by genes expressing regulatory RNAs that do not encode proteins has raised the total number of genes to at least 46,831, plus another 2300 micro-RNA genes. By 2012, functional DNA elements that encode neither RNA nor proteins have been noted. and another 10% equivalent of human genome was found in a recent (2018) population survey. Protein-coding sequences account for only a very small fraction of the genome (approximately 1.5%), and the rest is associated with non-coding RNA genes, regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as yet no function has been determined. In June 2016, scientists formally announced HGP-Write, a plan to synthesize the human genome. The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, plus the X chromosome (one in males, two in females) and, in males only, one Y chromosome. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, are provided in the following table. (Data source: Ensembl genome browser release 87, December 2016 for most values; Ensembl genome browser release 68, July 2012 for miRNA, rRNA, snRNA, snoRNA.) Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the DNA double helix. A recent estimation of human chromosome lengths based on updated data reports 205.00 cm for the diploid male genome and 208.23 cm for female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation. The number of genes in the human genome is not entirely clear because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better known but there are still on the order of 1,400 questionable genes which may or may not encode functional proteins, usually encoded by short open reading frames. Table 2 gives estimates from various projects and shows these discrepancies. Variations are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December, 2016. The number of identified variations is expected to increase as further personal genomes are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.

[ "Genome", "Gene Wiki", "GENCODE", "Human accelerated region 1", "Human Genome Mapping Project", "Genome human" ]
Parent Topic
Child Topic
    No Parent Topic