Site author Richard Steane
The BioTopics website gives access to interactive resource material, developed to support the learning and teaching of Biology at a variety of levels.

DNA and protein synthesis

This topic is a natural follow-on to DNA, genes and chromosomes, Nucleic acid structure, and DNA replication (links below).

Chromosome, ribosome: What comes next?

The term genome covers all the genes in a cell, organism or species.

Similarly, the term proteome covers all the proteins that may be produced by it.

More terminology - revision really (to help below)

What is the difference between a base and a nucleotide?
A base > is a simpler (heterocyclic aromatic) structure - with one or two rings
A nucleotide > includes a base as well as a 5-C sugar and a phosphate group.

What is the difference between a DNA nucleotide and an RNA nucleotide?
DNA nucleotide > has the pentose sugar deoxyribose (and phosphate) and one of the bases A, C, G or T
RNA nucleotide > has the pentose sugar ribose (and phosphate) and one of the bases A, C, G or U

Genomics is defined as the study of genes and their functions, and related techniques.
Whereas genetics refers to the composition and functioning of individual genes, genomics addresses all genes and their interrelationships in order to identify their combined influence on the growth and development of the organism.

There are a number of other terms ending -ome and -omics within this field.


The transcription process takes place steadily along the full length of the gene's DNA - not 3 bases at a time. Triplets come into play later.
In eukaryotes, short sections of DNA corresponding to genes within the nucleus of the cell are processed by a sort of copying and editing process which results in a strand of messenger RNA, which can leave the nucleus and travel to the ribosomes in the cytoplasm. This is under the influence of the enzyme RNA polymerase (also known as DNA dependent RNA polymerase).

In prokaryotes a similar and simpler process occurs but the DNA is not organised and enclosed in a nucleus, and a different RNA polymerase is involved.

The two halves of the DNA double helix in the region of the locus of the gene become separated - breaking the hydrogen bonds between nucleotides here - and the RNA polymerase attaches to one of the strands, labelled below as the antisense strand. RNA nucleotides are brought in, one by one, to the exposed bases on the DNA, which acts as a template. This 'matching' is by complementary base pairing, reinforced by hydrogen bonds.

DNA base → RNA base
Guanine (G) → Cytosine (C)
Cytosine (C) → Guanine (G)
Thymine (T) → Adenine (A)
Adenine (A) → Uracil (U)

The RNA polymerase causes a phosphodiester bond to form between the RNA nucleotides and then moves along repeating the process, so that a continuous RNA strand - the 'RNA transcript' - is built up alongside the DNA antisense strand - also sometimes known as the template strand. (It is the other strand, the sense strand, that is sometimes called the coding strand.)

The transcription of a (small) gene - halfway through transcription. Modified image - Courtesy: National Human Genome Research Institute (link below)

The 'sense strand' - which does not participate in the transcription process - has almost the same base sequence as the RNA produced from its sister strand.

In what way will the base sequence of the RNA transcript differ from the sense strand DNA?
> RNA has U (uracil) where DNA has T (thymine)

What will be the sequence of the RNA transcript produced from the section above?
AUGACAGGGAAAUCGGUC - and then the section yet to form: > CGCCUUAACCGCUGUUAG
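
The complementary pairing rules given earlier can be tried out in a few lines of Python. This is only a sketch: the function names are made up for this example, the DNA sequences are hypothetical ones chosen to match the first 18 bases of the transcript, and the template (antisense) strand is written in the same left-to-right order as the transcript it produces, so no reversal is needed.

```python
# Pairing rules for transcription: DNA template base -> RNA base.
PAIRING = {"G": "C", "C": "G", "T": "A", "A": "U"}


def transcribe_template(template_dna):
    """Build the RNA transcript by pairing each base of the template (antisense) strand."""
    return "".join(PAIRING[base] for base in template_dna)


def rna_from_sense(sense_dna):
    """The sense strand reads like the RNA transcript, except that RNA has U where DNA has T."""
    return sense_dna.replace("T", "U")


# Hypothetical strands (assumed for illustration, not taken from the image).
template = "TACTGTCCCTTTAGCCAG"
sense = "ATGACAGGGAAATCGGTC"

print(transcribe_template(template))  # AUGACAGGGAAAUCGGUC
print(rna_from_sense(sense))          # AUGACAGGGAAAUCGGUC
```

Both routes give the same RNA sequence, which is why the sense strand is a convenient shortcut for reading off the transcript.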

In prokaryotes, messenger RNA is directly produced by transcription from DNA, as described above. It then moves to a ribosome which is not far away in the cytoplasm.

In eukaryotes, which generally have larger genes, transcription produces a type of RNA known as pre-mRNA. This has non-coding sections - introns - and these need to be edited out and spliced in another subsequent stage within the nucleus to form functional messenger RNA which is composed of exons - sections of RNA which are expressed.
This messenger RNA then leaves the nucleus via a nuclear pore and passes out into the cytoplasm. It moves to the ribosomes on the rough endoplasmic reticulum, in order to engage in the next phase.
Messenger RNA
Messenger RNA is composed of a linear single strand of RNA nucleotides.

Similarities with DNA replication

The transcription process has several similarities to the DNA replication process, as well as some differences.

The two strands of DNA split apart, allowing access by enzymes. This gives a 'bubble' - a replication bubble in DNA replication, and a transcription bubble in transcription. In eukaryotes this involves DNA unwinding from its histone protein within the chromosome. For transcription, this happens in only a section of DNA, whereas all the DNA is involved in DNA replication.

Only one DNA strand is 'copied', and (like DNA polymerase) RNA polymerase only works in one direction, adding RNA nucleotides to the 3' end of the developing RNA polynucleotide chain, alongside the original DNA strand it is attached to.

The main difference is that uracil (in RNA) pairs with adenine (in DNA).

The two DNA strands rejoin after mRNA has been produced.

More detail

RNA polymerases consist of several sub-units and these join together to start transcription.

This aggregation happens at specific base sequences - promoter sections - just before the gene itself - 'upstream' from it.

In prokaryotes there are distinct recognisable base sequences 10 and 35 bases before the gene, and the repeated A-T sections here (with only two hydrogen bonds between bases) mean that the DNA strands can be more easily separated.

In eukaryotes there is a similar promoter section of DNA called the TATA box, 25-35 bases upstream of the gene itself. Several proteins called transcription factors become attached at this point, and RNA polymerase attaches as well. Other sections of the promoter, about 80 bases back may have distinctive base sequences that attract other proteins which stabilise the attachment of RNA polymerase.

When the RNA polymerase is attached, it initiates the transcription of the RNA from the DNA from the beginning of the gene, and this is followed by an elongation phase as the enzyme moves along the antisense (template) strand of the DNA, allowing the developing RNA transcript strand to branch away from the DNA template.
Transcription ceases (termination) as it meets a section where the DNA sequence causes the resulting RNA to be buckled, and the RNA polymerase leaves the DNA strand, allowing it to re-form into a double helix.

It is worth saying that in the initiation and termination phases, 'abortive' sections of RNA are produced and quickly disposed of. This underlines the idea that not all of the sequence of DNA bases is converted into protein.

Exons, introns and splicing

The terms exons and introns apply to both DNA and the resulting RNA transcript in eukaryotes.

Exons are expressed, so they result in the production of a specific protein product, [or ribosomal RNA (rRNA), or transfer RNA (tRNA)] .
Introns are not expressed - they are intervening sequences - intragenic regions - that are 'non-coding'. They are sometimes described as nonsense sequences.

Before and after splicing (UTR - untranslated region)
Pre-mRNA leaves the gene section of the DNA, and before the splicing process, it becomes protected by being capped at each end - a 7-methylguanosine cap at the 5' end and a 'poly-A tail' of about 200 A residues at the 3' end.

Introns generally have a GU base sequence at one end (5'), and an AG sequence at the other end (3'). These are recognised by a large RNA-protein complex, the spliceosome, made up of five small nuclear ribonucleoproteins (snRNPs). The intron is formed into a loop, then the RNA is cut and the two exon ends joined. After all the introns are removed, there remains a section of joined exons which is described as mature mRNA.
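
The cutting and joining can be pictured with a small Python sketch. It is a deliberate simplification: the pre-mRNA sequence is made up, the intron positions are supplied by hand rather than found by anything resembling a spliceosome, and the function name `splice` is just a label for this example. The only biological check it makes is that each intron starts with GU and ends with AG.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) positions (end not included) and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Check the usual boundary signals: GU at the 5' end, AG at the 3' end.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # exon lying before this intron
        position = end
    exons.append(pre_mrna[position:])  # final exon after the last intron
    return "".join(exons)


# Made-up pre-mRNA: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "GUAAGUCCUAG" + "UUUUAA"
print(splice(pre_mrna, [(6, 17)]))  # AUGGCCUUUUAA
```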

Alternative splicing refers to the possibility that different combinations of exons can be joined together in this process, resulting in a single gene coding for multiple proteins. This contributes to genetic diversity, and may result in different protein products in different tissues. It was once thought that there must be 100,000 protein coding genes in the human genome, but the Human Genome Project has discovered more like 20,000 possible genes, and the difference is presumably accounted for by alternative splicing.


Translation is the production of polypeptides whose amino acid sequence is determined by the sequence of codons carried by mRNA. In other words the primary level of structure of the protein product comes from the order of bases in the RNA, which in turn came from the order of the DNA bases.

The translation process occurs at ribosomes - which have slightly different form, and size, in prokaryotes and eukaryotes, as well as in mitochondria and chloroplasts. In eukaryotes, many ribosomes are attached to the (rough) endoplasmic reticulum, while others are free in the cytoplasm.

Messenger RNA attaches to the ribosome (which also consists of sub-units that form around the RNA strand). These are composed of ribosomal RNA (rRNA), as well as proteins.

Simplified version of a tRNA molecule
There are over 20 different versions of tRNA and this one would bring the amino acid glycine to the ribosome.
Once in position within the ribosome, each group of 3 bases - triplets or codons - attracts in another form of RNA - transfer RNA (tRNA) - which displays a complementary set of bases - the 'anti-codon'. This brings in a specific amino acid to be added to the polypeptide chain.

There are more than 20 forms of transfer RNA, each of which has a distinctive 'clover leaf' shape caused by folding and hydrogen bonding within sections of the single stranded molecule, and each has a different 3-base anticodon exposed at one end, and a different amino acid attached at the other end of the molecule.

In the preparation of the various forms of tRNA, specific enzymes are activated by ATP and some of its energy is used to combine the individual tRNA with its specific amino acid. This also effectively powers the combination of each amino acid with the developing polypeptide chain by a condensation reaction.

As each transfer RNA enters into the ribosome, its amino acid is joined by a covalent peptide bond to the previous one, so that a chain of amino acid residues is built up as the ribosome moves along the mRNA strand. The transfer RNA which has given up its amino acid then leaves the ribosome and is 'reloaded' with another molecule (of the same amino acid) before repeating the process.
The translation process at the ribosome

Amino acids (black) are being brought in and added to the polypeptide chain.
tRNAs are shown in green.
There are two sites within the ribosome that position each incoming amino acid relative to the chain, as specified by the codon-anticodon pairing.

More about ribonucleic acids

Messenger RNA

The functional piece of mRNA begins with the 'start codon' and ends with the 'end codon'. It is usually between 75 and 300 bases in length.

In fact the start codon codes for the amino acid methionine, so all polypeptides have methionine as the first amino acid, but this is often removed after translation has taken place.

The end codon does not bring in an amino acid.

The number of amino acids coded for by a section of mRNA is thus a little less than 1/3 of the number of nucleotides in the RNA.

Transfer RNA

Each tRNA molecule consists of 75-90 RNA bases, partly folded back on themselves and held in this position by hydrogen bonding between complementary base pairs. This gives the molecule 3 loops and a tail, which is sometimes drawn as a 'clover leaf'. In 3 dimensions they have an L-shape similar to a foot!

These molecules are known by the name or code of the amino acid they may carry, such as tRNAtyr. This reminds us that the tRNA exists as an RNA polymer and as an RNA-amino acid complex.

There may be quite a lot more than 20 transfer RNA molecules. The genetic code has 61 triplet combinations that code for amino acids, but there may not be 61 tRNAs to participate in codon-anticodon pairings.

Some of the RNA bases in tRNA may be modified or substituted with groups that interact with a number of other bases. This may allow different versions of transfer RNA to bring in the same amino acid in response to different triplet codons, which is the basis for the degeneracy of the genetic code. In particular, the nucleoside inosine (I), which contains the base hypoxanthine derived from adenine, can pair with adenine (A), cytosine (C), and uracil (U) and it is sometimes found in the anti-codon of tRNAs.

The first base in the anticodon pairs with the 3rd base in the mRNA codon, and this is sometimes known as the 'wobble position', which allows for several different bases in this position on the mRNA codon to bind with the same unusual base - probably I - on the tRNA anticodon (usually position 1). This can be seen in the third bases of the codons in the genetic code, which may be described as N (any base), Y (pyrimidines C or U) or R (purines A or G). See the unit on DNA, genes and chromosomes.

In fact this is the 34th base in the tRNA molecule, and inosine has been found in tRNAs carrying the amino acids threonine, alanine, proline, serine, leucine, isoleucine, valine and arginine.
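
A short Python sketch shows how one anticodon containing inosine could read several codons. It assumes only ordinary complementary pairing plus the inosine pairing described above (I with A, C or U); other wobble pairings are left out, and the function name `codons_read` is invented for this example.

```python
# Standard complementary pairs between RNA bases.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Inosine pairing as described above: I can pair with A, C or U.
INOSINE_PARTNERS = "ACU"


def codons_read(anticodon):
    """List the mRNA codons (5'->3') that a tRNA anticodon (5'->3') could pair with.

    Anticodon base 1 pairs with codon base 3, and anticodon base 3 with codon
    base 1, because the two strands run in opposite directions.
    """
    first, second, third = anticodon
    codon_start = WATSON_CRICK[third] + WATSON_CRICK[second]
    if first == "I":
        wobble_options = INOSINE_PARTNERS  # several possible third bases
    else:
        wobble_options = WATSON_CRICK[first]  # one possible third base
    return [codon_start + base for base in wobble_options]


print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU'] - all code for alanine
print(codons_read("CAU"))  # ['AUG'] - methionine
```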

It has been found that mitochondrial tRNAs (which are made independently of other cell organelles) include 'remo(u)lded' leucine ('LUUR') tRNA genes making tRNA responding to UUA and UUG as well as 'LCUN' leucine tRNAs responding to CUA, CUU, CUC and CUG.

Connecting each tRNA with its amino acid

There are also 20 (or more?) versions of aminoacyl-tRNA synthetase, also called tRNA-ligase, and these enzymes attach the appropriate amino acid onto its tRNA, using an ester linkage from the last base in the 3' end, away from the 3 loops.
The synthetase first binds ATP and the corresponding amino acid to form an aminoacyl-adenylate, aa-AMP, releasing inorganic pyrophosphate (PPi).
The adenylate-aminoacyl-tRNA synthetase complex then binds to the appropriate tRNA molecule, and the amino acid is transferred from the aa-AMP to the last tRNA nucleotide at the 3'-end, and AMP is released.
In fact there are sometimes unusual substituted bases in the loops to the sides of the tRNA molecule, and it is thought that these are important in the identification of different tRNAs by or in the active sites of specific aminoacyl-tRNA synthetases.
You should be able to name the numbered amino acids that make up the polypeptide chain.
You will need to refer to the genetic code, which I have put in the other column
The first one is easiest:
> AUG (start) gives methionine (met) - as shown in red at the top
[although the -NH2 and -C=O are part of it]

The next five are:
> ACA (2) → threonine (thr) - GGG (3) → glycine (gly) - AAA (4) → lysine (lys) - UCG (5) → serine (ser) - GUC (6) → valine (val)
The last ones - not numbered, running across the bottom and up the side - are:
> CGC → arginine (arg) - CUU → leucine (leu) - AAC → asparagine (asn) - CGC → arginine (arg) - UGU → cysteine (cys) - UAG (STOP)
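
If you want to check answers like these, the lookup can be sketched in Python. The codon table here is the standard genetic code written in a compact form (one letter per codon, in U, C, A, G order), and the three-letter abbreviations are the usual ones; the function name `translate` and the simple "start at the first AUG" rule are assumptions made for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one letter per codon in UCAG order; '*' marks a stop codon.
ONE_LETTER = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
              "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), ONE_LETTER)}

THREE_LETTER = {
    "A": "ala", "R": "arg", "N": "asn", "D": "asp", "C": "cys",
    "Q": "gln", "E": "glu", "G": "gly", "H": "his", "I": "ile",
    "L": "leu", "K": "lys", "M": "met", "F": "phe", "P": "pro",
    "S": "ser", "T": "thr", "W": "trp", "Y": "tyr", "V": "val",
}


def translate(mrna):
    """Read codons from the first AUG until a stop codon; return three-letter codes."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon found
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: no amino acid is added
            break
        chain.append(THREE_LETTER[amino_acid])
    return chain


mrna = "AUGACAGGGAAAUCGGUC" + "CGCCUUAACCGCUGUUAG"
print(translate(mrna))
# ['met', 'thr', 'gly', 'lys', 'ser', 'val', 'arg', 'leu', 'asn', 'arg', 'cys']
```

The 36 bases give 12 codons but only 11 amino acids, because the final codon (UAG) is a stop.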

When it reaches the stop codon, the ribosome detaches from the mRNA strand, but other ribosomes can join on to produce more copies of the protein. In fact groups of ribosomes can move along the same mRNA strand - making assemblies called polysomes.

The resulting polypeptide chain then usually folds itself to give the secondary and tertiary levels of structure that define the shape of the resulting protein.

Messenger RNA is eventually broken down in the cytoplasm and the nucleotides are re-used (in transcription and translation).

A gene consists of 480 bases. How many amino acids in the resulting polypeptide?
Several possible answers, depending on assumptions you make.
Answer         Assumption
> 160      > = 480/3 codons - 1 aa per codon - assuming all bases count
> 159     > as above but ignoring stop codon
> 159     > as above but ignoring start codon - initial methionine removed
> 158     > as above but ignoring both start & stop codon
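
The same arithmetic can be written as a tiny Python function, which makes it easy to try other gene lengths. The function name and the labels for each assumption are invented for this sketch.

```python
def amino_acid_counts(n_bases):
    """Possible answers for the number of amino acids, under different assumptions."""
    codons = n_bases // 3  # three bases per codon
    return {
        "all codons count": codons,
        "stop codon ignored": codons - 1,
        "start codon ignored (initial methionine removed)": codons - 1,
        "both start and stop codons ignored": codons - 2,
    }


for assumption, count in amino_acid_counts(480).items():
    print(count, "-", assumption)
# 160 - all codons count
# 159 - stop codon ignored
# 159 - start codon ignored (initial methionine removed)
# 158 - both start and stop codons ignored
```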

The genetic codes for each amino acid

RNA triplet(s)             amino acid       3lc*
AAA AAG                    lysine           lys
AAC AAU                    asparagine       asn
ACA ACC ACG ACU            threonine        thr
AGA AGG CGA CGC CGG CGU    arginine         arg
AGC AGU UCA UCC UCG UCU    serine           ser
AUG                        methionine       met (start)
AUA AUC AUU                isoleucine       ile
CAA CAG                    glutamine        gln
CAC CAU                    histidine        his
CCA CCC CCG CCU            proline          pro
CUA CUC CUG CUU UUA UUG    leucine          leu
GAA GAG                    glutamic acid    glu
GAC GAU                    aspartic acid    asp
GCA GCC GCG GCU            alanine          ala
GGA GGC GGG GGU            glycine          gly
GUA GUC GUG GUU            valine           val
UAA UAG UGA                (stop)
UAC UAU                    tyrosine         tyr
UGC UGU                    cysteine         cys
UGG                        tryptophan       try
UUC UUU                    phenylalanine    phe

RNA world hypothesis

The activities of rRNA, tRNA, and spliceosomes all point to RNA as having possibly more of a physical role in the cell - like enzymes - rather than purely carrying genetic information.
In fact there is a concept that in early life RNA had a higher profile in cells, which was eventually taken over by DNA and proteins, relegating RNA to a secondary role.
And this is supported by the fact that the most important coenzymes and cofactors in respiration - Coenzyme A, ATP, NAD, FAD - are all built up from ribonucleotides, specifically adenosine.

Other related topics on this site


This series (genetic information)
DNA, genes and chromosomes

Genetic diversity
Genetic diversity and adaptation
Investigating diversity

Similar level
Structure of Nucleic Acids (DNA and RNA)
DNA replication
Prokaryotic cells
Eukaryotic cells
Endosymbiont theory
Virus particles

Web references

15.2 Prokaryotic Transcription - Part of an impressive OpenStax online textbook from Rice University
Transcription - from NIH National Human Genome Research Institute
Changing identities: tRNA duplication and remolding within animal mitochondrial genomes
Codon adaptation to tRNAs with Inosine modification at position 34 is widespread among Eukaryotes and present in two Bacterial phyla
