Eukaryotic gene example
Introduction
A gene can be described by listing the linear sequence of nucleotide subunits that constitutes the gene's sequence. Eukaryotic genes exist inside cells as DNA molecules. The four nucleotide subunits of DNA are illustrated in the accompanying figure. For the DNA structure shown in the figure, the sequence of nucleotide subunits can be summarized as ACTG or TGAC, depending on which strand is read. The two strands of the molecule are complementary according to the rules of base pairing, so it is only necessary to provide the sequence of one strand; the complementary strand can be deduced from the base pairing rules.
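As a minimal sketch of how one strand can be deduced from the other, the Python below applies the base pairing rules (A with T, C with G) to the four-nucleotide example from the figure. The function names are illustrative only and do not come from any database or library.

```python
# Watson-Crick base pairing rules for DNA: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the paired base at each position, kept in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ACTG"))          # TGAC
print(reverse_complement("ACTG"))  # CAGT
```

Because the two strands run in opposite directions, sequence databases conventionally report the second strand as the reverse complement (CAGT here), whereas reading the figure position by position gives TGAC.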
Many genomes have been sequenced, and their gene sequences are stored in general DNA sequence databases (e.g. GenBank) and in species-specific databases (e.g. The Arabidopsis Information Resource, TAIR).
This tutorial and Figures 1-3, below, make use of one specific gene sequence as an example: the sequence of the AMY1 gene, which is one of the approximately 25,000 genes of Arabidopsis thaliana, the thale cress plant. The AMY1 gene encodes an alpha amylase, an enzyme. Plant cells use the genetic instructions in this gene as a guide for making the amylase protein.
Figures 1-3, as described below, are views of the AMY1 gene sequence, cDNA, and coding sequence (CDS). A cDNA sequence contains only part of a gene's entire sequence: the part that is found in a mature mRNA. The AMY1 gene sequence provides a convenient example of the important features that are found in most eukaryotic genes. Researchers use gene sequences to help them understand living organisms. Gene research for Arabidopsis might involve studies of seed germination or plant food flavour.
Questions
Q1. What is the difference between a gene sequence and a cDNA sequence?
cDNA
Several related views of the AMY1 sequence can be found in gene databases. These include views of the 'full length CDS', 'full length cDNA' (Figure 1) and 'full length genomic' (Figure 3, below) sequences. These sequences typically use the DNA alphabet (A, T, G, C) although, strictly, the CDS should be shown in the RNA alphabet (AUG etc.) since it represents an RNA sequence.
Note that the AMY1 cDNA sequence starts with 20 nucleotides (AAACCATTCA CAATCAGACA) that do not code for amino acids in the amylase enzyme. The 5' untranslated sequence and the intron/exon structure of the AMY1 gene transcript are shown in Figure 2. Only the exon sequences specify the amino acid sequence of the amylase enzyme.
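The relationship between a cDNA and its CDS can be sketched in Python. The 20-nucleotide leader below is the one quoted above; everything after it (`TOY_CODING`, `TOY_TRAILER`) is a made-up placeholder and is not the real AMY1 sequence. The sketch also assumes the CDS begins at the first ATG in the cDNA, which is a simplification that does not hold for every transcript.

```python
# The 20-nucleotide 5' untranslated leader quoted in the text.
LEADER = "AAACCATTCA CAATCAGACA".replace(" ", "")

# Hypothetical placeholders, NOT real AMY1 sequence.
TOY_CODING = "ATGGCTTGA"   # start codon, one codon, stop codon
TOY_TRAILER = "GCTTTT"     # stands in for a 3' untranslated region

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_cds(cdna: str):
    """Return (start, end) of the first ATG-to-stop reading frame, or None.

    Assumes translation starts at the first ATG, a simplification.
    """
    start = cdna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence one codon (three nucleotides) at a time.
    for i in range(start, len(cdna) - 2, 3):
        if cdna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None


toy_cdna = LEADER + TOY_CODING + TOY_TRAILER
start, end = find_cds(toy_cdna)
print(len(LEADER))          # 20
print(start)                # 20, the CDS begins right after the leader
print(toy_cdna[start:end])  # ATGGCTTGA
```

In this toy case the nucleotides before `start` play the role of the 5' untranslated sequence, and the nucleotides from `end` onward play the role of the 3' untranslated sequence.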
Question
Q1. Define "intron" and "exon".
Gene and mRNA
The mature mRNA is composed of a 5' UTR (red), a CDS (uppercase, yellow) and a 3' UTR (red again) (Figure 3). All three of these regions are exonic, not just the protein coding sequence (CDS). Introns are shown in purple (lowercase) and are not present in the mature mRNA.
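The way these regions combine can be illustrated with a small Python sketch. The segment list is a hypothetical toy gene invented for illustration, not the AMY1 annotation, and the labels are assumptions rather than a database format.

```python
# A hypothetical toy gene as an ordered list of (region type, sequence) pairs.
# These sequences are invented for illustration and are NOT from AMY1.
toy_gene = [
    ("5'UTR", "AAACC"),
    ("CDS", "ATGGC"),
    ("intron", "gtaagtttcag"),
    ("CDS", "TTGA"),
    ("3'UTR", "GCTTT"),
]


def mature_mrna(segments):
    """Join every exonic region (UTRs and CDS), skipping the introns."""
    return "".join(seq for kind, seq in segments if kind != "intron")


def coding_sequence(segments):
    """Join only the protein-coding parts of the exons."""
    return "".join(seq for kind, seq in segments if kind == "CDS")


print(mature_mrna(toy_gene))      # AAACCATGGCTTGAGCTTT
print(coding_sequence(toy_gene))  # ATGGCTTGA
```

The genomic view corresponds to joining all five segments, the cDNA view to `mature_mrna`, and the CDS view to `coding_sequence`.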
For convenience, neither the 5' cap nor the 3' tail is shown in the cDNA (Figure 1), although the mRNA will have them. The gene sequence is also shown in a form where the codons can be read (ATG...), rather than as the template DNA strand, which is the strand actually copied into mRNA.
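To see why the readable (coding) strand is the one usually displayed, the sketch below transcribes a short template strand and compares the result with the coding strand. The six-nucleotide sequences are hypothetical examples, and the template is written 3' to 5' so that each position lines up with the mRNA base it specifies.

```python
# Pairing rules used when RNA is copied from a DNA template:
# template A gives U in RNA (RNA uses uracil in place of thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3to5: str) -> str:
    """Build the mRNA (5' to 3') from a template strand written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


coding = "ATGGCT"    # hypothetical coding strand, 5' to 3'
template = "TACCGA"  # its partner strand, written 3' to 5'

mrna = transcribe(template)
print(mrna)                               # AUGGCU
print(mrna == coding.replace("T", "U"))   # True
```

The mRNA matches the coding strand with U in place of T, which is why databases can show the coding strand in the DNA alphabet and still let the codons be read directly.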
Your turn
Haemophilia A is treated with the blood clotting protein Factor VIII. Factor VIII is isolated from donated blood, and the blood supply is contaminated with a mysterious virus that is killing haemophiliacs.
It will be possible to manufacture uncontaminated Factor VIII (as is done for insulin) if you can obtain the gene sequence that codes for Factor VIII. Find the human (Homo sapiens) Factor VIII cDNA sequence in this database. Describe what you find below.
Results and questions
Q1. In order to provide full Factor VIII function to haemophiliacs, do you need to obtain the cDNAs for both Transcript variant 1 and Transcript variant 2?
References
- Free online textbook: The Cell. Chapter 4. Introns and Exons.
- Free online textbook: Molecular Biology of the Cell. Chapter 6. RNA Splicing Removes Intron Sequences from Newly Transcribed Pre-mRNAs.
- Coombes, R. (2007). "Bad blood". BMJ (Clinical research ed.) 334 (7599): 879–880. doi:10.1136/bmj.39195.621528.59. PMID 17463458. PMC 1857798. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1857798/.