Transcription
The transcription process takes place steadily along the full length of the gene's DNA, not 3 bases at a time. Triplets come into play later, during translation.
In eukaryotes, short sections of DNA within the nucleus of the cell, corresponding to genes, go through a copying and editing process which results in a strand of messenger RNA, which can leave the nucleus and travel to the ribosomes in the cytoplasm. The copying is carried out by the enzyme RNA polymerase (also known as DNA-dependent RNA polymerase).
In prokaryotes a similar, simpler process occurs, although the DNA is not organised and enclosed in a nucleus, and a different RNA polymerase is involved.
The two strands of the DNA double helix in the region of the locus of the gene become separated, breaking the hydrogen bonds between the paired bases here, and the RNA polymerase attaches to one of the strands, labelled below as the antisense strand.
RNA nucleotides are brought in, one by one, to the exposed bases on the DNA, which acts as a template. This 'matching' is by complementary base pairing, held in place by hydrogen bonds.
DNA base | RNA base
Guanine (G) | Cytosine (C)
Cytosine (C) | Guanine (G)
Thymine (T) | Adenine (A)
Adenine (A) | Uracil (U)
The RNA polymerase causes a phosphodiester bond to form between adjacent RNA nucleotides and then moves along, repeating the process, so that a continuous RNA strand - the 'RNA transcript' - is built up alongside the DNA antisense strand, which is also known as the template (non-coding) strand.
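The base-by-base matching described above can be sketched in a few lines of Python. This is only an illustration of the pairing rules in the table, not a model of the enzyme: the function name `transcribe` and the short example template are made up for this sketch, and the template is assumed to be written in the order the polymerase reads it (3' to 5'), so that the RNA comes out 5' to 3'.

```python
# Pairing rules from the table: DNA template base -> RNA base brought in.
DNA_TO_RNA = {"G": "C", "C": "G", "T": "A", "A": "U"}


def transcribe(template_3_to_5: str) -> str:
    """Build an RNA transcript one nucleotide at a time.

    The template (antisense) strand is assumed to be given in the order
    it is read, 3' to 5', so the returned RNA reads 5' to 3'.
    """
    rna = []
    for base in template_3_to_5.upper():
        if base not in DNA_TO_RNA:
            raise ValueError(f"not a DNA base: {base!r}")
        # Each new RNA nucleotide is added to the growing end of the chain.
        rna.append(DNA_TO_RNA[base])
    return "".join(rna)


# Hypothetical 9-base stretch of template strand, for illustration only.
print(transcribe("TACTGTCCC"))  # AUGACAGGG
```

Note that the whole strand is processed in a single pass, one base at a time, with no grouping into triplets.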
The transcription of a (small) gene - halfway through
Modified image - Courtesy: National Human Genome Research Institute
(Link below)
The 'sense strand' - which does not participate in the transcription process - has almost the same base sequence as the RNA produced from its sister strand.
In what way will the base sequence of the RNA transcript differ from the sense strand DNA?
RNA has U (uracil) where DNA has T (thymine)
What will be the sequence of the RNA transcript produced from the section above?
AUGACAGGGAAAUCGGUC - and then the section yet to form: CGCCUUAACCGCUGUUAG
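As a quick check on the two answers above, the relationship between the sense strand and the RNA transcript can be written as a simple substitution. This is a sketch only: the function names are invented here, and the sense strand is inferred by working backwards from the transcript given in the answer, since the figure itself is not reproduced in this text.

```python
def rna_from_sense(sense_dna: str) -> str:
    """The transcript matches the sense strand, with U in place of T."""
    return sense_dna.upper().replace("T", "U")


def sense_from_rna(rna: str) -> str:
    """Work backwards: swap U for T to recover the sense strand."""
    return rna.upper().replace("U", "T")


# Transcript from the answer above: the part already formed plus the part yet to form.
transcript = "AUGACAGGGAAAUCGGUC" + "CGCCUUAACCGCUGUUAG"

sense = sense_from_rna(transcript)
print(sense)                                # sense strand inferred from the transcript
print(rna_from_sense(sense) == transcript)  # True
```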
In prokaryotes, messenger RNA is directly produced by transcription from DNA, as described above. It then moves to a ribosome which is not far away in the cytoplasm.
In eukaryotes, which generally have larger genes, transcription produces a type of RNA known as pre-mRNA. This has non-coding sections - introns - and these need to be edited out, and the remaining sections spliced together, in a subsequent stage within the nucleus to form functional messenger RNA, which is composed of exons - sections of RNA which are expressed.
This messenger RNA then leaves the nucleus via a nuclear pore and passes out into the cytoplasm. It moves to the ribosomes, for example those on the rough endoplasmic reticulum, in order to engage in the next phase.
Messenger RNA

Messenger RNA is composed of a linear single strand of RNA nucleotides.
Similarities with DNA replication
The transcription process has several similarities to the DNA replication process, as well as some differences.
The two strands of DNA split apart, allowing access by enzymes. This gives a 'bubble' (a 'replication bubble' in DNA replication, a 'transcription bubble' in transcription). In eukaryotes this involves DNA unwinding from its histone proteins within the chromosome. For transcription, this happens in only a section of DNA, whereas all the DNA is involved in DNA replication.
In transcription only one DNA strand is 'copied' (both strands are copied in replication), and (like DNA polymerase) RNA polymerase only works in one direction, adding RNA nucleotides to the 3' end of the developing RNA polynucleotide chain, alongside the original DNA strand it is attached to.
The main difference in base pairing is that uracil (in RNA), rather than thymine, pairs with adenine (in DNA).
The two DNA strands rejoin after mRNA has been produced.
More detail
RNA polymerases consist of several sub-units and these join together to start transcription.
This aggregation happens at specific base sequences - promoter sections - just before the gene itself, 'upstream' from it.
In prokaryotes there are distinct recognisable base sequences 10 and 35 bases before the gene, and the repeated A-T sections here (with only two hydrogen bonds between bases) mean that the DNA strands can be more easily separated.
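To see why an A-T rich stretch is easier to pull apart, the hydrogen bonds holding a short section of double helix together can be counted. This is a rough sketch: it assumes 2 bonds per A-T pair (as stated above) and 3 per G-C pair, and it ignores other effects that also influence how easily DNA strands separate. The sequence TATAAT is used as an example of an A-T rich section; GCGCGC is an arbitrary comparison of the same length.

```python
# Hydrogen bonds formed by each base with its partner on the opposite strand.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between this strand and its complementary strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(hydrogen_bonds("TATAAT"))  # 12
print(hydrogen_bonds("GCGCGC"))  # 18
```

For the same length of DNA, the A-T rich section is held together by a third fewer hydrogen bonds than the G-C rich one.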
In eukaryotes there is a similar promoter section of DNA called the TATA box, 25-35 bases upstream of the gene itself. Several proteins called transcription factors become attached at this point, and RNA polymerase attaches as well. Other sections of the promoter, about 80 bases back, may have distinctive base sequences that attract other proteins which stabilise the attachment of RNA polymerase.
When the RNA polymerase is attached, it initiates the transcription of the RNA from the DNA from the beginning of the gene, and this is followed by an elongation phase as the enzyme moves along the antisense (template) strand of the DNA, allowing the developing RNA transcript strand to branch away from the DNA template.
Transcription ceases (termination) as it meets a section where the DNA sequence causes the resulting RNA to be buckled, and the RNA polymerase leaves the DNA strand, allowing it to re-form into a double helix.
It is worth saying that in the initiation and termination phases, 'abortive' sections of RNA are produced and quickly disposed of. This underlines the idea that not all of the sequence of DNA bases is converted into protein.
Exons, introns and splicing
The terms exons and introns apply to both DNA and the resulting RNA transcript in eukaryotes.
Exons are expressed, so they result in the production of a specific protein product [or ribosomal RNA (rRNA), or transfer RNA (tRNA)].
Introns are not expressed - they are intervening sequences - intragenic regions - that are 'non-coding'. They are sometimes described as nonsense sequences.
Before and after splicing
UTR - untranslated region
Pre-mRNA leaves the gene section of the DNA and, before the splicing process, it becomes protected by a modification at each end - a 7-methylguanosine cap at the 5' end and a 'poly-A tail' of about 200 A residues at the 3' end.
Introns generally have a GU base sequence at one end (5'), and an AG sequence at the other end (3'). These are recognised by a large RNA-protein complex, the spliceosome, made up of five small nuclear ribonucleoproteins (snRNPs). The intron is formed into a loop, then the RNA is cut and the two exon ends joined. After all the introns are removed, there remains a section of joined exons which is described as mature mRNA.
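The cutting and joining can be illustrated with a small Python sketch. It is heavily simplified: the pre-mRNA sequence is invented, real introns are far longer, and the intron positions are simply supplied as a list rather than being located the way a spliceosome would locate them. The function name `splice` and its `introns` parameter are assumptions made for this example; the only rule checked is that each intron starts with GU and ends with AG.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns and join the remaining exons.

    `introns` holds (start, end) positions, end not included, in order
    along the strand and not overlapping.
    """
    exons = []
    position = 0
    for start, end in introns:
        intron = pre_mrna[start:end]
        # Boundary rule from the text: GU at the 5' end, AG at the 3' end.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"not a GU...AG intron: {intron}")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Invented pre-mRNA: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUCCUAG" + "UUUGGA" + "GUACGCAG" + "CCAUAA"
introns = [(6, 17), (23, 31)]

print(splice(pre_mrna, introns))  # AUGGCCUUUGGACCAUAA
```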
Alternative splicing refers to the possibility that different combinations of exons can be joined together in this process, resulting in a single gene coding for multiple proteins. This contributes to protein diversity, and may result in different protein products in different tissues. It was once thought that there must be 100,000 protein-coding genes in the human genome, but the Human Genome Project found more like 20,000 possible genes, and the difference is thought to be accounted for, at least in part, by alternative splicing.
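A small sketch shows how quickly the number of possible products grows. It assumes one simple model only, in which the first and last exons are always kept and each exon in between may be included or skipped, with the order along the gene preserved. Real genes are more constrained than this, and other forms of alternative splicing exist. The function name `isoforms` and the exon labels E1 to E4 are placeholders.

```python
from itertools import combinations


def isoforms(exons: list[str]) -> list[str]:
    """List every mRNA that keeps the first and last exon and any
    selection of the exons in between, in their original order."""
    first, *middle, last = exons
    results = []
    for count in range(len(middle) + 1):
        # combinations() keeps the chosen exons in their original order.
        for chosen in combinations(middle, count):
            results.append("".join([first, *chosen, last]))
    return results


for mrna in isoforms(["E1", "E2", "E3", "E4"]):
    print(mrna)
# E1E4
# E1E2E4
# E1E3E4
# E1E2E3E4
```

Under this model a gene with n optional exons could give up to 2 to the power n different mRNAs, so 2 optional exons give 4 and 3 would give 8.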