Gene transcriptions

Micrograph of gene transcription of ribosomal RNA illustrates the growing primary transcripts. Credit: Hans-Heinrich Trepte.

DNA is a double helix of interlinked nucleotides surrounded by an epigenome. On the basis of biochemical signals, an enzyme, specifically a ribonucleic acid (RNA) polymerase, is chemically bonded to one of the strands (the template strand) of this double helix. The polymerase, once phosphorylated, begins to catalyze the formation of RNA using the template strand. Although the catalysis may have more than one beginning nucleotide (a start site) and more than one ending nucleotide (a stop site) along the DNA, each nucleotide sequence catalyzed that ultimately produces approximately the same RNA is part of a gene. The catalysis of each RNA representation from the template DNA is a transcription, specifically a gene transcription. The overall process is also referred to as gene transcription.


Main source: Heredity

Heredity is the passing on of traits from one generation to the next.


Def. the "appearance of an organism based on a multifactorial combination of genetic traits and environmental factors, especially used in pedigrees"[1] is called a phenotype.


Main source: Genetics
This is an image of Bob, the guinea pig. Credit: selbst.

Genetics involves the expression, transmission, and variation of inherited characteristics.

Theoretical gene transcriptions

Def. the "copying of DNA segments into RNA, by RNA polymerase, as the first stage of gene expression"[2] is called gene transcription.

Here's a theoretical definition:

Def. a catalysis process to produce each ribonucleic acid representation of a deoxyribonucleic acid gene, or isoform, is called gene transcription.
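
As an illustration of the definitions above, transcription can be modelled as reading a template strand and emitting the complementary RNA. This is a minimal sketch, not a biochemical model; the function name `transcribe` and the example sequence are assumptions made for illustration.

```python
# Map each DNA template base to the RNA base that pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, written 3'->5'.
rna = transcribe("TACGGCATT")
print(rna)  # AUGCCGUAA
```

Writing the template 3'->5' lets the RNA be produced left to right in its usual 5'->3' orientation without an extra reversal step.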

Nucleic acids

Main source: Nucleic acids
These images show natural and artificial nucleic acid polymers. Credit: Irina Anosova, Ewa A. Kowal, Matthew R. Dunn, John C. Chaput, Wade D. Van Horn, and Martin Egli.

"Synthetic genetics is a subdiscipline of synthetic biology that aims to develop artificial genetic polymers (also referred to as xeno-nucleic acids or XNAs) that can replicate in vitro and eventually in model cellular organisms."[3]

Def. any "acidic, chainlike biological macromolecule consisting of multiply repeat units of phosphoric acid, sugar and purine and pyrimidine bases"[4] occurring in cell nuclei is called a nucleic acid.

Def. a nucleic acid "in which the sugar component is threose"[5] is called threose nucleic acid, or threonucleic acid (TNA).

Additional DNAs may be

  1. deoxyapionucleic acid,
  2. deoxyarabinonucleic acid,
  3. deoxyxylonucleic acid (dXyNA),
  4. deoxylyxonucleic acid,
  5. deoxyribulonucleic acid, and
  6. deoxyxylulonucleic acid.

Synthesis of deoxyapionucleic acid has been accomplished.[6]

Deoxyxylonucleic acid and xylose nucleic acid have been produced.[7]

"[X]ylonucleic acid (XyloNA) [contains] a potentially prebiotic xylose sugar (a 3′-epimer of ribose) in its backbone."[7]

A "number of sugar-modified nucleic acid variants has been revealed as new genetic polymers, (2) some of them are endowed with catalytic activity (for e.g. FANA and HNA) (3). The structure of these artificial nucleic acids, however, mimics natural nucleic acid helicity (4)."[7]

"Although helices display a distinct pitch and curvature, they feature ca. 11–12 base pairs per turn, and χ/δ covariance plots indicate that the backbones of XNA:RNA or XNA:DNA heteroduplexes adopt an architecture that is either closely related to the A-form, as in the case of [1,5-anhydrohexitol nucleic acid (HNA)] HNA:RNA (96), [locked nucleic acid (LNA)] LNA:RNA (83), [cyclohexene nucleic acid (CeNA)] CeNA:RNA (85) and PNA:RNA (59), or between the A- and B-forms, as seen in the structures of DNA:RNA (97), [arabinonucleic acid (ANA)] ANA:RNA (79), [2′-deoxy-2′-fluoro-arabinonucleic acid (FANA)]FANA:RNA (79) and [peptide nucleic acid (PNA)] PNA:DNA (98)."[3]

Additional XNAs include bridged nucleic acid (BNA), glycol nucleic acid (GNA), FANA, and peptide nucleic acid (PNA).

On the right is a diagram displaying various artificial and natural nucleic acid polymers.

"Representative structures illustrate the structural diversity and plasticity of natural and artificial nucleic acid (XNA) backbones. Structures are shown in alphabetic order. (A) Natural genetic polymers: B-form DNA (black), DNA:RNA hybrid and A-form RNA (gray). (B) Representative structures of XNA heteroduplexes with RNA or DNA. The RNA strand is shown in gray, the DNA strand in black and the orientation of the XNA strand is indicated. (C) XNA homoduplexes. Homo-XNA duplexes adopt a variety of structures. (D) Representative XNA-only heteroduplexes. FAF:FAF stands for FANA(F)-ANA(A)-FANA(F) XNA:XNA heteroduplex. Alt and chim indicate the alternated or chimeric order of FANA-segments in the duplex sequences respectively. The depicted duplexes have the following PDB ID codes in the Protein Data Bank: B-DNA (3BSE); DNA:RNA (1EFS); A-RNA (3ND4); ANA(purple):RNA (2KP3); CeNA(blue):RNA (3KNC); FANA(violet):RNA (2KP4); HNA(yellow):RNA (2BJ6); LNA(cyan):RNA (1H0Q); PNA(orange):DNA (1PDT); PNA(orange):RNA (176D); CeNA:CeNA (blue, 2H0N); hDNA:hDNA (sky blue, 2H9S); FRNA:FRNA (magenta, 3P4A); GNA:GNA (red, 2XC6); HNA:HNA (yellow, 481D); LNA:LNA (cyan, 2X2Q); PNA:PNA (orange, 2K4G), TNA:TNA (green, coordinates not deposited in the PDB [...]); dXyNA:dXyNA (brown, coordinates not deposited in the PDB [...]); XyNA:XyNA (light green, 2N4J); FAF:FAF (FANA in violet, ANA in purple, 2LSC), FRNA:FANA (alt) (FRNA in magenta, FANA in violet, 2M8A); FRNA:FANA (chim) (FRNA in magenta, FANA in violet, 2M84)."[3]

Deoxyribonucleic acid

This diagram shows the chemical structure of deoxyribonucleic acid, with colored labels identifying the four nucleobases, the phosphate, and deoxyribose components. Credit: Madeleine Price Ball, Madprime.

Deoxyribonucleic acid (DNA) is a polymer composed of nucleotides, each containing the sugar deoxyribose, linked together through phosphate groups.


Main sources: Human DNA/Strands and Strands

DNA in humans consists of two complementary strands. Each parent contributes one double-stranded chromosome, or a portion of one, to every homologous pair, rather than one strand of the double helix. The portion of a strand that is transcribed to produce an RNA that is translatable into a protein is usually referred to as the template strand. The corresponding portion of the other strand is then the coding strand, because it should contain the nucleotides recorded in, or composing, the transcribed RNA.


Main source: Epigenomes
This is a schematic representation of a nucleosome. Credit: Zephyris.

Inside each eukaryotic nucleus is genetic material (DNA) surrounded by protective and regulatory proteins. These protective and regulatory proteins, and the dynamic changes to them that occur during the course of a eukaryote's existence, are the epigenome.


Main source: Genes

Def. "[a] unit of heredity; a segment of DNA or RNA that is transmitted from one generation to the next, and that carries genetic information such as the sequence of amino acids for a protein"[8] is called a gene.


Def. a "section of DNA that controls the initiation of RNA transcription as a product of a gene"[9] is called a gene promoter, or a promoter in the field of genetics.

Proximal promoters

Def. any proximal nucleotide sequence upstream of the gene that tends to contain primary regulatory elements is called a proximal promoter.

Core promoters

The core promoter is the minimal portion of the promoter required to properly initiate gene transcription.[10] It contains a binding site for RNA polymerase (RNA polymerase I, RNA polymerase II, or RNA polymerase III).

The core promoter is located approximately 34 nt upstream of the transcription start site (TSS), at about position -34.
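
Positions given relative to the TSS can be turned into string indices when a sequence is inspected by program. The following is only a sketch: the function name `upstream_window`, the 0-based TSS index, the example sequence, and the TATAAA marker motif are all assumptions for illustration, and the default of 34 nt is simply the figure quoted above.

```python
def upstream_window(sequence: str, tss_index: int, length: int = 34) -> str:
    """Return up to `length` nucleotides immediately upstream of the TSS."""
    start = max(0, tss_index - length)  # do not run off the start of the sequence
    return sequence[start:tss_index]


# Hypothetical 60-nt sequence with the TSS assumed at index 40 (0-based).
sequence = "GC" * 5 + "TATAAA" + "GC" * 12 + "ATGC" * 5
window = upstream_window(sequence, tss_index=40)
print(len(window))            # 34
print(window.find("TATAAA"))  # 4
```

The window excludes the TSS nucleotide itself, so its last character corresponds to position -1.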


"A single strand of DNA [has a positive sense (+)] if an RNA version of the same sequence is translated or translatable into protein. Its complementary strand is called antisense (or negative (-) sense). Sometimes the phrase coding strand is encountered; however, protein coding and non-coding RNA's can be transcribed similarly from both strands, in some cases being transcribed in both directions from a common promoter region, or being transcribed from within introns, on both strands".[11][12][13]

The two complementary strands of double-stranded DNA (dsDNA) are usually differentiated as the "sense" strand and the "antisense" strand. The DNA sense strand looks like the messenger RNA (mRNA) and can be used to read the expected protein code by human eyes (e.g. ATG codon = Methionine amino acid). However, the DNA sense strand itself is not used to make protein by the cell. It is the DNA antisense strand which serves as the source for the protein code, because, with bases complementary to the DNA sense strand, it is used as a template for the mRNA. Since transcription results in an RNA product complementary to the DNA template strand, the mRNA is complementary to the DNA antisense strand. The mRNA is what is used for translation (protein synthesis).
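
The idea that the sense strand "looks like" the mRNA and can be read by eye can be sketched in a few lines. The codon table below is deliberately partial (six entries of the standard genetic code), and the function name `read_sense_strand` and the example sequence are hypothetical.

```python
# Partial codon table (DNA sense-strand codons); only a few entries are included.
CODONS = {"ATG": "Met", "GCC": "Ala", "TGG": "Trp",
          "TAA": "Stop", "TAG": "Stop", "TGA": "Stop"}


def read_sense_strand(sense_5_to_3: str) -> list[str]:
    """Split a sense strand into codons and name each one ('?' if not in the table)."""
    codons = [sense_5_to_3[i:i + 3] for i in range(0, len(sense_5_to_3) - 2, 3)]
    return [CODONS.get(codon, "?") for codon in codons]


sense = "ATGGCCTGGTGA"          # hypothetical sense strand, 5'->3'
mrna = sense.replace("T", "U")  # the mRNA carries the same sequence with U for T
print(mrna)                      # AUGGCCUGGUGA
print(read_sense_strand(sense))  # ['Met', 'Ala', 'Trp', 'Stop']
```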

The only biological information that matters for labeling strands is the location of the 5' phosphate group and the 3' hydroxyl group, because these ends determine the direction of transcription and translation. A sequence 5' CGCTAT 3' is equivalent to a sequence written 3' TATCGC 5' as long as the 5' and 3' ends are noted. If the ends are not labeled, the convention is to assume that the sequence is written in the 5' to 3' direction. A good rule of thumb for identifying the "sense" strand is to look for the start codon ATG (AUG in mRNA). In the table example, the sense mRNA has the AUG codon at the end (remember that translation proceeds in the 5' to 3' direction).
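
The orientation convention and the start-codon rule of thumb can be checked mechanically. This is a small sketch; the helper names `to_five_prime` and `find_start_codon` and the test sequences other than CGCTAT/TATCGC are assumptions made for illustration.

```python
def to_five_prime(sequence: str, written_5_to_3: bool = True) -> str:
    """Return the sequence in the conventional 5'->3' orientation."""
    return sequence if written_5_to_3 else sequence[::-1]


def find_start_codon(sequence_5_to_3: str) -> int:
    """Index of the first ATG (or AUG) in a 5'->3' sequence, or -1 if absent."""
    return sequence_5_to_3.upper().replace("U", "T").find("ATG")


# The two notations from the text describe the same molecule.
same = to_five_prime("TATCGC", written_5_to_3=False) == "CGCTAT"
print(same)  # True

# Hypothetical sequences for the start-codon rule of thumb.
print(find_start_codon("CCATGGCA"))  # 2
print(find_start_codon("CGCTAT"))    # -1
```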

Preinitiation complexes

The diagram describes the eukaryotic preinitiation complex which includes the general transcription factors and RNA Polymerase II. Credit: ArneLH.
Here is a diagram of the attachment of RNA polymerase II to the de-helicized DNA. Credit: Forluvoft.

For eukaryotic transcription, the RNA polymerase II holoenzyme de-helicizes the DNA and attaches along the template strand.

Once the preinitiation complex has found its appropriate attachment section along the template strand of DNA, RNA polymerase II is attached and begins transcription.

Preinitiation complex assembly

DNA melting

Often included in this process is the separation of the DNA double helix from the epigenome.

The TATA-binding protein may serve to bend the double helix by 80°.

Generally, DNA melting involves the separation of the two strands so that transcription can begin on the template strand.

"TFIIH [...] is required for DNA melting".[14]

"TFIIE positions TFIIH in a configuration capable of melting the DNA."[14]

The "RAP30 WH domain [may play] an essential role in positioning the flexible promoter DNA downstream of BREd along the Pol II cleft, thus facilitating subsequent steps in the promoter melting process."[14]

"The INR element is sandwiched precisely between these two protein-DNA contacts, an arrangement that may be relevant in promoter melting at the correct position in the DNA. The slightly open clamp conformation seen upon DNA placement onto the cleft following TFIIF addition is likely due to the interaction of the DNA with the clamp head β sheet".[14]

Both "the TFIIB linker helix and the TFIIF arm domain align with the promoter melting start site".[14]

The "tip of the TFIIF arm domain contains seven positively charged residues, whereas four positively charged residues are present on the side of the TFIIB linker helix that faces the DNA [...]. The juxtaposition of these domains within the melting start site is consistent with their direct role in DNA interactions."[14]

The "clamp domain in the open state moves down to engage the open DNA bubble, adopting the conformation observed in the elongation state37 [...]. Thus, the clamp domain completes an open to closed transition throughout the process of [preinitiation complex assembly] PIC assembly and promoter opening [...]. [An] additional protein density now extends from the bottom of the clamp and connects to the dimerization domain of TFIIF [...]. Rigid body fitting of crystal structures suggests that this density corresponds to the stabilized rudder of Pol II and the arm domain of TFIIF. [These] elements [likely] interact with each other as the clamp closes down over the melted DNA. Interestingly, this proposed interaction would prevent re-annealing of the melted DNA. The TFIIB linker helix is near this position and likely participates in the promoter melting process as well. This [...] is consistent with our hypothesis that the flexible TFIIB linker helix and the TFIIF arm domain act together in promoter opening [...]."[14]

"Once promoter DNA melting is further extended and the Pol II clamp closes down, the TFIIB linker helix and the TFIIF arm domain work together with the Pol II rudder to maintain the upstream edge of the DNA bubble."[14]

RNA polymerase II holoenzyme complexes

RNA polymerase II is recruited to the promoters of protein-coding genes in living cells.[15] Alternatively, transcription factories are already present, the euchromatin is brought into the nearest transcription factory, and messenger RNA (mRNA), such as that of A1BG, is transcribed there.

For those circumstances in which the holoenzyme is built onto the euchromatin, it is necessary to consider the holoenzyme components, the likely sequence in which they bind, the arrival of RNA polymerase II, and its subsequent action.

RNA polymerase II (also called RNAP II and Pol II) ... catalyzes the transcription of DNA to synthesize precursors of mRNA and most snRNA and microRNA.[16] In humans, RNAP II consists of seventeen protein molecules (gene products encoded by POLR2A-L, where the proteins synthesized from POLR2C, POLR2E, and POLR2F form homodimers).

The RNA polymerase II holoenzyme complex may also have to search for one or more transcription start sites.

Transcription start sites

The transcription start site is the location where transcription starts at the 5'-end of a gene sequence.[17]

A start site is a biochemically signaled nucleotide or set of nucleotides for attachment either to the epigenome or the DNA.


Def. "the process of transferring a phosphate group [e.g., PO₄³⁻] from a donor to an acceptor; often catalysed by enzymes"[18] is called phosphorylation.


Main source: Hypotheses
  1. Gene transcription can occur for each gene, or isoform, on either strand, template (-) or coding (+).
  2. Gene transcription can occur for each gene, or isoform, in either direction (+ or -), e.g., (+) → {ATG} or (-) {ATG} ←.
  3. Gene transcription can occur for the complement (c) of each gene, or isoform, e.g., {TAC} (c).
  4. Gene transcription can occur for the inverse (i) of each gene, or isoform, e.g., {CAT} (i).
  5. Gene transcription can occur for the complement of the inverse of each gene, or isoform, e.g., {GTA}.
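
The three-letter examples in the hypotheses can be reproduced with simple string operations. The text does not define "inverse"; the sketch below assumes it means the complement read in the opposite order, which is the reading consistent with the {ATG}, {TAC}, {CAT} and {GTA} examples. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, order unchanged."""
    return "".join(COMPLEMENT[base] for base in seq)


def inverse(seq: str) -> str:
    """Assumed meaning of 'inverse': the complement read in the opposite order."""
    return complement(seq)[::-1]


def complement_of_inverse(seq: str) -> str:
    """Complement of the inverse; under the assumption above, the plain reversal."""
    return complement(inverse(seq))


gene = "ATG"
print(complement(gene))             # TAC
print(inverse(gene))                # CAT
print(complement_of_inverse(gene))  # GTA
```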

See also


  1. phenotype. San Francisco, California: Wikimedia Foundation, Inc. 12 September 2016. Retrieved 2016-10-04. 
  2. SemperBlotto (23 July 2016). gene transcription. San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-07-23. 
  3. Irina Anosova, Ewa A. Kowal, Matthew R. Dunn, John C. Chaput, Wade D. Van Horn, and Martin Egli (15 December 2015). "The structural diversity of artificial genetic polymers". Nucleic Acids Research. doi:10.1093/nar/gkv1472. Retrieved 2016-01-21. 
  4. nucleic acid. San Francisco, California: Wikimedia Foundation, Inc. January 12, 2013. Retrieved 2013-04-19. 
  5. threose nucleic acid, In: Wiktionary. San Francisco, California: Wikimedia Foundation, Inc. November 14, 2012. Retrieved 2013-04-19. 
  6. Mayumi Kataoka, Yasuo Kouda, Kousuke Sato, Noriaki Minakawa and Akira Matsuda (14 August 2011). "Highly efficient enzymatic synthesis of 3′-deoxyapionucleic acid (apioNA) having the four natural nucleobases". Chemical Communications 47 (30): 8700-2. doi:10.1039/C1CC12980E. Retrieved 2016-01-19. 
  7. Mohitosh Maiti, Munmun Maiti, Christine Knies, Shrinivas Dumbre, Eveline Lescrinier, Helmut Rosemeyer, Arnout Ceulemans and Piet Herdewijn (13 July 2015). "Xylonucleic acid: synthesis, structure, and orthogonal pairing properties". Nucleic Acids Research 43: 7189-200. doi:10.1093/nar/gkv719. Retrieved 2016-01-21. 
  8. gene, In: Wiktionary. San Francisco, California: Wikimedia Foundation, Inc. December 13, 2012. Retrieved 2012-12-13. 
  9. promoter. San Francisco, California: Wikimedia Foundation, Inc. September 20, 2012. Retrieved 2012-09-29. 
  10. Stephen T. Smale and James T. Kadonaga (July 2003). "The RNA Polymerase II Core Promoter". Annual Review of Biochemistry 72 (1): 449-79. doi:10.1146/annurev.biochem.72.121801.161520. PMID 12651739. Retrieved 2012-05-07. 
  11. Anne-Lise Haenni (2003). "Expression strategies of ambisense viruses". Virus Research 93 (2): 141–150. doi:10.1016/S0168-1702(03)00094-7. PMID 12782362. 
  12. Kakutani T, Hayano Y, Hayashi T, Minobe Y (1991). "Ambisense segment 3 of rice stripe virus: the first instance of a virus containing two ambisense segments". J Gen Virol. 72: 465–8. PMID 1993885. 
  13. Zhu Y, Hayakawa T, Toriyama S, Takahashi M (1991). "Complete nucleotide sequence of RNA 3 of rice stripe virus: an ambisense coding strategy". J Gen Virol 72: 763–7. PMID 2016591. 
  14. Yuan He, Jie Fang, Dylan J. Taatjes, and Eva Nogales (28 March 2013). "Structural visualization of key steps in human transcription initiation". Nature 495 (7442): 481-6. doi:10.1038/nature11991. PMID 23446344. Retrieved 2016-07-22. 
  15. Myer VE, Young RA (October 1998). "RNA polymerase II holoenzymes and subcomplexes". J. Biol. Chem. 273 (43): 27757–60. doi:10.1074/jbc.273.43.27757. PMID 9774381. 
  16. Kornberg R (1999). "Eukaryotic transcriptional control". Trends in Cell Biology 9 (12): M46. doi:10.1016/S0962-8924(99)01679-7. PMID 10611681. 
  17. Marketa Zvelebil, Jeremy O. Baum (2008). Understanding bioinformatics. Garland Science. ISBN 978-0815340249. 
  18. SemperBlotto (22 July 2016). phosphorylation. San Francisco, California: Wikimedia Foundation, Inc. Retrieved 2016-07-22. 
