Site author Richard Steane
The BioTopics website gives access to interactive resource material, developed to support the learning and teaching of Biology at a variety of levels.

Alteration of the sequence of bases in DNA can alter the structure of proteins

Sections of DNA which function as genes are transcribed into 'pre-mRNA', then edited and spliced to form mature messenger RNA (mRNA). This then moves to the ribosomes, where the genetic code from the nucleic acid is translated into amino acids which make the polypeptide chain or chains which fold to form protein. The bases in DNA and RNA have a sequence which dictates the sequence of amino acids, so any alteration of the sequence of bases in DNA can potentially alter the structure of proteins for which they code.
Mutations and meiosis were dealt with in a previous topic on Genetic diversity (links below).

This topic extends the range of mutation types covered.

And at the end there is a section on base sequence and amino acids in Covid 19, which is a good match to the title of this topic.

Gene mutation

This is a spontaneous change in the bases within a gene. It may produce a different protein product. There are several types of base mutation. A point mutation involves only one nucleotide/base; where such a single-base variant becomes established in a population it is called a single nucleotide polymorphism (SNP). However some changes involve more than one base, and cause major changes to the DNA molecule, which may even be seen as alterations to the structure of the chromosome.

This process results in the production of alternative forms of a gene - alleles.


Addition

is the insertion of one or more bases into the DNA.


Deletion

is the removal of one or more bases - the opposite of addition.

Addition and deletion may occur during DNA replication, perhaps by misalignment of template and primer strands as DNA polymerase moves along the separated strands of DNA during the copying process, or larger chunks of DNA may be added or lost when (uneven) crossing-over occurs during prophase I of meiosis.


Substitution

Here one base (pair) in DNA is replaced by a different base pair. There are actually two types of substitution:

Transitions are interchanges of two-ring purines (A to G, or G to A) or of one-ring pyrimidines (C to T, or T to C): they therefore involve bases of similar size and shape.

Transversions are interchanges of purine for pyrimidine bases, which therefore involve the exchange of one-ring and two-ring structures (A or G to C or T, or the reverse).
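The distinction between the two types of substitution can be expressed as a simple rule: if both bases belong to the same chemical class the change is a transition, otherwise it is a transversion. The following is a minimal sketch only; the function name `substitution_type` is made up for illustration.

```python
PURINES = {"A", "G"}            # two-ring bases
PYRIMIDINES = {"C", "T", "U"}   # one-ring bases (U replaces T in RNA)


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    valid = PURINES | PYRIMIDINES
    if old not in valid or new not in valid:
        raise ValueError("bases must be one of A, C, G, T or U")
    if old == new:
        raise ValueError("not a substitution: the bases are identical")
    # Same chemical class (purine-purine or pyrimidine-pyrimidine) = transition
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "C"))  # purine to pyrimidine
```

There are 4 possible transitions but 8 possible transversions, so a random change of base would be a transversion twice as often as a transition.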

This changing of bases can be seen in the section on Single nucleotide substitution revealed by sequencing within the Topic Using genome projects on this site.


Inversion

This is a more major effect. If DNA breaks at two points, the section in between may move out of alignment and then be re-joined by cellular repair mechanisms, but facing in the opposite direction. The base sequence in this region is then effectively reversed, and the sense and antisense strands may also be swapped. Unlike insertion or deletion, inversion does not change the number of DNA bases.
(I have seen references to inversion being reversal of bases within a triplet, but I think it is rarely limited to such a small number of bases.
And I see that in 12 out of the 64 possible triplets such a 3-base inversion would code for the same amino acid, so the resulting protein/polypeptide would still be functional.
See the genetic code table, below.)

Differences in DNA structure can be used to study evolutionary relationships between different animal groups over time.

Mitochondrial DNA extracted from bones is often used in these studies.

Most human sequences differ from each other by an average of 8.0 substitutions, while the human and chimpanzee sequences differ by about 55.0 substitutions. Neanderthal and modern human sequences differed by approximately 27.2 substitutions. Using this mtDNA information, the last common ancestor of Neanderthals and modern humans dates to approximately 550,000 to 690,000 years ago.

Comparing human DNA with that of our closest relatives (chimpanzees Pan troglodytes and bonobo Pan paniscus) shows a difference of only about 1.2% in terms of substitutions, but over the whole genome there are other differences (deletions, duplications, or insertions) that bring another 4-5% of distinction between them and us. It is thought that our closest common ancestor lived 6-8 million years ago.

Interestingly, humans have 23 pairs of chromosomes, while all other great apes have 24 pairs. Human chromosome 2, our second largest chromosome, is the product of a fusion between two smaller chromosomes found in the great apes. It has been described as a 'head-to-head' fusion.

Chromosomes have readily identifiable, repeating DNA sequences called telomeres at both ends, and human chromosome 2 has these telomere sequences not only at both ends but also in the middle. There is even an identifiable remnant of an inactive second centromere. Human chromosome 2 is described as metacentric (centromere more or less in the middle, so the 2 arms are about the same length), whilst the two component chromosomes in apes are acrocentric (centromere close to one end, so the 2 arms differ greatly in length).


Duplication

This involves the production of more than one copy of a section of DNA, possibly by breakage as above and reincorporation of DNA, presumably from a sister chromatid. Sometimes extra copies of genes formed in this way function alongside one another, or minor changes may make them perform completely new functions. This is often seen when comparing similar sections of DNA from different but fairly closely related species.

Of course, duplication of whole chromosomes can occur as a result of non-disjunction in meiosis, and it can result in conditions such as trisomy 21 (Down syndrome).

Polyploidy is a type of duplication, and it is sometimes found in plants. It is thought to contribute to evolution, as well as sometimes being effective in producing more productive strains of crop plants.


Translocation

involves changes in the location of bases within the DNA. It is more often known as a relocation of specific chromosome material. For example it is possible for a section of chromosome 21 to become attached to chromosome 14. As a result, there are effectively 3 copies of chromosome 21, resulting in a different form of trisomy 21.


Transposons, or transposable elements (TEs), also known as jumping genes, are DNA sequences that can change their position within a genome. As such they can create or act like a mutation, as well as adding to the amount of genetic material in cells, and causing duplication of genes.

In fact, approximately 50% of the human genome is thought to originate from transposable elements. 'Alu elements' - about 300 bases long - are the most abundant transposable elements, with over one million copies dispersed throughout the human genome. The mobilome is the total of all the mobile genetic elements in a genome.

There are two main sorts:


Retrotransposons

These are widespread in mammals, and resemble retroviruses. An RNA transcript of the transposable element's DNA is produced, then the enzyme reverse transcriptase converts it back into a DNA version. This DNA sequence can then be inserted back into a random location of the genome.

DNA transposons

are more simply segments of DNA that can move to a new location by a 'cut-and-paste' action. A transposase enzyme cuts the transposon out of its original position and makes a staggered cut, leaving 'sticky ends', at the target site, into which the transposon is then inserted. Gaps are filled by DNA polymerase, and DNA ligase closes the sugar-phosphate backbone. If this cutting out occurs during the S phase of the cell cycle it can result in gene duplication.

The randomness of placement of this extra genetic material can cause problems in the cell. It may have different effects depending on where it lands. If it lands in an exon, it can alter the expected base sequence, but it may instead affect a promoter ahead of this, or a regulatory segment within an intron.

Alterations to tumour suppressor genes can lead to the development of tumours, and if a transposon affects a proto-oncogene, producing an oncogene, it is likely to cause unregulated cell division, leading to cancer.

Mutagens and mutation rate

Mutations are said to be caused by mutagens, which may be a form of radiation such as x-rays, ultraviolet radiation and radioactive substances, as well as certain chemicals. There are several classes of chemical compounds that damage and change nucleotide bases, and these also act as carcinogens.

So-called spontaneous mutations may be caused by exposure to small amounts of radiation or chemicals in the natural environment. These are quite infrequent, but the chance of mutations occurring can be greatly increased as the intensity of radiation or the concentration of mutagenic chemicals is increased.

In genetics laboratories it is normal to expose experimental organisms to these at life-threatening dosages in order to maximise the mutation rate; the effect sought is changes in the DNA of their gametes. The organisms which survive exposure must then be allowed to reproduce, and their offspring are monitored for new phenotypes, which can be tested in breeding crosses.

Not all mutations result in a change

Point mutations (mutations changing only one base, not one triplet code) can have different effects.

Substitutions sometimes result in minor changes.

Due to the degenerate nature of the genetic code (64 possible triplets but only 20 amino acids) there are often several triplets coding for the same amino acid. This is especially so for the third base of the triplet, so a change here may still result in the incorporation of the same amino acid into the polypeptide chain: no noticeable change in the resulting protein. This could be called a silent mutation.

This is quite noticeable in the table of the genetic code - shown below. All triplets ending in N would be unaffected by a change in the third base, and those ending in R or Y would be unaffected by a transition substitution (see above) in the third base, but not by a transversion.

Alternatively a mutation may change a base so that the triplet containing it codes for a different amino acid. If this amino acid has a similar structure (similar sidechain) it may have little effect, perhaps slightly altering the packing of the polypeptide chain - sections of which are often coiled into an α-helix. Or it may have a different - probably deleterious - effect if it is in an important section such as the active site of an enzyme, and small changes often result in malformations of proteins with several chains. This is seen in the mutations which cause sickle cell anaemia and cystic fibrosis. These are described as missense mutations.

Some triplets (giving mRNA codes UAA UAG UGA) are stop codons, resulting in the termination of translation of mRNA into amino acids. So changing the base sequence so that it can be read as a stop codon has a drastic effect, producing a shorter, probably ineffective amino acid chain and resulting in a non-functional protein. This is a nonsense mutation.
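The three outcomes described above (silent, missense and nonsense) can be told apart by translating the triplet before and after the change. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `classify_point_mutation` are made up for illustration, and `*` is used here to stand for a stop codon.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each base position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution within one mRNA codon.

    position is 0, 1 or 2 (first, middle or third base of the triplet).
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "silent"          # same amino acid (or still a stop codon)
    if after == "*":
        return "nonsense"        # an amino acid codon has become a stop codon
    if before == "*":
        return "stop codon lost"  # translation would read through
    return "missense"            # a different amino acid is incorporated


# Sickle cell change: GAG (glutamic acid) to GUG (valine), a middle-base change
print(classify_point_mutation("GAG", 1, "U"))
# Third-base change within GCN (alanine)
print(classify_point_mutation("GCU", 2, "C"))
# UAU (tyrosine) to UAA (stop)
print(classify_point_mutation("UAU", 2, "A"))
```

Only the effect on one triplet is considered here; the effect on the finished protein depends on where in the chain the amino acid lies.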

Here we are considering changes within the exon section of a gene; changes within the intron sections may have different effects. Mutations at or near the splicing sites may lead to incorrect splicing, perhaps causing the expression of normally hidden base sequences. Or the (generally not well known) functions of the intron may be disrupted: perhaps affecting gene regulation.

The genetic code

A, C, G and U : 4 nucleotides/bases
N = any of them, R = purines (A or G) Y = pyrimidines (C or U)

RNA triplet codes   amino acid      abbreviation    RNA triplet codes   amino acid      abbreviation
AAR                 lysine          lys  K          GAR                 glutamic acid   glu  E
AAY                 asparagine      asn  N          GAY                 aspartic acid   asp  D
ACN                 threonine       thr  T          GCN                 alanine         ala  A
AGR and CGN         arginine        arg  R          GGN                 glycine         gly  G
AGY and UCN         serine          ser  S          GUN                 valine          val  V
AUG                 methionine      met  M          UAA UAG UGA         STOP
AUY and AUA         isoleucine      ile  I          UAY                 tyrosine        tyr  Y
CAR                 glutamine       gln  Q
CAY                 histidine       his  H          UGY                 cysteine        cys  C
CCN                 proline         pro  P          UGG                 tryptophan      trp  W
CUN and UUR         leucine         leu  L          UUY                 phenylalanine   phe  F
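The flexible base notation (N, R and Y) is a compact way of writing all 64 triplets of the standard genetic code. As a check, the compact entries can be expanded back into individual codons. This is a minimal sketch; the names `WILDCARDS`, `DEGENERATE_CODE`, `expand` and `build_codon_table` are made up for illustration, one-letter amino acid codes are used, and `*` stands for a stop codon.

```python
from itertools import product

# Meaning of the flexible base letters (RNA alphabet)
WILDCARDS = {"N": "ACGU", "R": "AG", "Y": "CU"}

# Standard genetic code in compact notation
DEGENERATE_CODE = {
    "AAR": "K", "AAY": "N", "ACN": "T", "AGR": "R", "CGN": "R",
    "AGY": "S", "UCN": "S", "AUG": "M", "AUY": "I", "AUA": "I",
    "CAR": "Q", "CAY": "H", "CCN": "P", "CUN": "L", "UUR": "L",
    "GAR": "E", "GAY": "D", "GCN": "A", "GGN": "G", "GUN": "V",
    "UAA": "*", "UAG": "*", "UGA": "*",
    "UAY": "Y", "UGY": "C", "UGG": "W", "UUY": "F",
}


def expand(pattern: str) -> list[str]:
    """List every real triplet matched by a pattern such as 'GCN'."""
    choices = [WILDCARDS.get(letter, letter) for letter in pattern]
    return ["".join(bases) for bases in product(*choices)]


def build_codon_table(code: dict[str, str] = DEGENERATE_CODE) -> dict[str, str]:
    """Expand the compact notation into a full codon-to-amino-acid table."""
    table = {}
    for pattern, amino_acid in code.items():
        for codon in expand(pattern):
            if codon in table:
                raise ValueError(f"{codon} is listed twice")
            table[codon] = amino_acid
    return table


table = build_codon_table()
print(expand("AAR"))   # the two lysine codons
print(len(table))      # number of distinct triplets covered
```

Expanding the entries shows that every triplet is covered exactly once, with 61 triplets coding for amino acids and 3 acting as stop codons.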

Downstream changes

Insertions and deletions of single bases affect the translation process because mRNA is read in triplets, i.e. 3 bases at a time.

So these changes in mRNA base sequence result in a completely different sequence of amino acids in the polypeptide, from the mutation site onwards. And there can be other consequences: the normal stop codon will not operate, so the polypeptide may be longer than normal, or triplets of other bases may be read as a stop codon so the polypeptide may be shorter than normal.

These mutations change the nature of all the base triplets downstream from the mutation, and they are said to cause a frame shift.
This is almost certain to result in no or non-functional proteins being formed.

However, if the number of bases inserted or deleted is a multiple of three (three, six, nine ...), and the changes are close together, the rest of the bases will be read correctly, cancelling the frameshift effect.
See the six-nucleotide deletion in Coronavirus below.
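The frame shift effect can be demonstrated by translating a short mRNA before and after bases are removed. The following is a minimal sketch: the 18-base mRNA is invented purely for illustration, the function names `translate` and `delete` are made up, and amino acids are shown as one-letter codes from the standard genetic code.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time until a stop codon or the end."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        chain.append(amino_acid)
    return "".join(chain)


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


normal = "AUGGCUGAAUUCGGAUAA"      # invented example: AUG GCU GAA UUC GGA UAA
print(translate(normal))                 # MAEFG
print(translate(delete(normal, 3, 1)))   # MLNSD - every later triplet is altered
print(translate(delete(normal, 3, 3)))   # MEFG  - one amino acid lost, frame kept
```

In this example the single-base deletion also means that the original stop codon is no longer read in frame, whereas the three-base deletion simply removes one amino acid from the chain.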

'New Variant' Covid 19 - Changes in the SARS-CoV-2 spike protein caused by mutations

This virus came to our attention towards the end of 2019 - hence the name Covid 19 - and the sequence of the virus's RNA, together with the proteins it codes for, was published in early 2020. This was quickly used in the development of mRNA vaccines.

As time has passed, a number of changes in the amino acid sequences of the virus proteins have been recorded in viruses circulating in the population, especially in the spikes projecting from the surface.

There are also some other genetic differences in the virus from mink, as reported from Denmark and the Netherlands, as well as developments in a number of other countries, including Mexico and South Africa.

These have been described as mutants, new variants and simply new strains. There is concern about their infectivity and the possibility that they may be less susceptible to control by vaccines.

Attention has been focussed on antibodies to the spike protein as well as the action of T-cells. Groups of mutations have been used to describe and monitor the movement of the virus through populations, and geographical areas.

There is a convention that substitutions show the original amino acid (as a single letter code) followed by its location in the sequence and then the replacement amino acid.

Sometimes it is accompanied by a similar code showing changes in the DNA bases (A, C, G, T), and their location within the genome, which is generally a larger number.
"Cluster 5" is a mutated variant of the SARS-CoV-2 virus, discovered in Northern Jutland, Denmark. It is believed to have been spread from minks to humans via mink farms.
Its spike protein changes include Y453F, I692V and M1229I, as well as two amino acid deletions, H69del/V70del.

The first three are substitutions, and the deletion of 6 bases from 21765-21770 caused the loss of two amino acids from the polypeptide chain.
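Labels written in this convention are regular enough to be decoded automatically. The following is a minimal sketch that handles only the two forms used above (a substitution such as Y453F and a deletion such as H69del); the function names `parse_change` and `describe` are made up for illustration.

```python
import re

AMINO_ACID_NAMES = {
    "A": "alanine", "C": "cysteine", "D": "aspartic acid", "E": "glutamic acid",
    "F": "phenylalanine", "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine", "N": "asparagine",
    "P": "proline", "Q": "glutamine", "R": "arginine", "S": "serine",
    "T": "threonine", "V": "valine", "W": "tryptophan", "Y": "tyrosine",
}

# original amino acid, position, then replacement amino acid or 'del'
CHANGE_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z]|del)$")


def parse_change(label: str) -> tuple[str, int, str]:
    """Split a label such as 'Y453F' into (original, position, result)."""
    match = CHANGE_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    original, position, result = match.groups()
    return original, int(position), result


def describe(label: str) -> str:
    """Give a plain-English description of one amino acid change."""
    original, position, result = parse_change(label)
    old_name = AMINO_ACID_NAMES[original]
    if result == "del":
        return f"{old_name} deleted at position {position}"
    return f"{old_name} replaced by {AMINO_ACID_NAMES[result]} at position {position}"


for label in ["Y453F", "I692V", "M1229I", "H69del", "V70del"]:
    print(label, "-", describe(label))
```

Other forms of label seen in the literature (for example insertions, or ranges of deleted positions) are not covered by this pattern.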

Cluster 5 Substitutions (and base code changes
- using the flexible base notation for triplet degeneracy)

Y453F is a substitution of phenylalanine (F) for tyrosine (Y) at position 453
[base code switched from UAY to UUY]
I692V is a substitution of valine (V) for isoleucine (I) at position 692
[base code switched from AUY to GUY]
M1229I is a substitution of isoleucine (I) for methionine (M) at position 1229
[base code switched from AUG to AUY or AUA - a transversion if to AUY, a transition if to AUA]

Spike deletion 21765-21770

[Figure: (Partial) Base and amino acid sequence H69V70] A comparison of sections of (DNA) base sequences between reference sequence Wuhan-Hu-1 (at the top, in blue) and deletion 21765-21770 (in green).
The loss of 6 bases gives a triplet of ATC (bases 21764-21771-21772 in 'old numbers') still coding for an isoleucine ('same amino acid'), so the chain is simply shortened by 2 amino acids.
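The link between genome base numbers and amino acid numbers in the spike protein is simple arithmetic. The following is a minimal sketch which assumes that the spike (S) gene begins at base 21563 of the Wuhan-Hu-1 reference sequence; that start position is not stated in the text above and should be checked against the reference listing. The names `S_GENE_START` and `spike_codon` are made up for illustration.

```python
# Assumption (not given in the text): first base of the spike gene's start
# codon in the Wuhan-Hu-1 reference genome.
S_GENE_START = 21563


def spike_codon(genome_position: int) -> tuple[int, int]:
    """Return (amino acid number, base within the triplet: 1, 2 or 3)."""
    offset = genome_position - S_GENE_START
    if offset < 0:
        raise ValueError("position lies before the start of the spike gene")
    return offset // 3 + 1, offset % 3 + 1


# The six deleted bases
for position in range(21765, 21771):
    print(position, spike_codon(position))

# The three bases that come together to form the new triplet
for position in (21764, 21771, 21772):
    print(position, spike_codon(position))
```

On this assumption the deletion runs from the second base of triplet 68 to the first base of triplet 70, so the surviving first base of triplet 68 joins the last two bases of triplet 70, which agrees with the 'old numbers' 21764-21771-21772 quoted above.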

What are the two lost amino acids?

> H histidine
> V valine

I have taken the liberty of annotating base sequence data and amino acid listings to show the effects of that deletion. - Just look for lines I have inserted between the original data!

In fact this deletion is responsible for an effect known as S-gene dropout or S-gene target failure. When testing for possible coronavirus infection using PCR, it is conventional to use kits targeting a limited number of virus genes. If some of these are detected but the S-gene is not, this is strongly indicative of a variant of concern, such as the Omicron variant, and full viral genome sequencing is called for.

Other mutants - rather a saga

The D614G mutation causes a modification of the virus's surface spike protein, and it has become increasingly common. This notation means that the 614th amino acid in the polypeptide chain is altered from aspartate - aspartic acid - (one-letter amino acid code D) to glycine (G). This is likely to be the result of a change (A to G) in the middle base of a codon in the viral RNA. This mutation appears to give the virus greater transmissibility in humans rather than greater pathogenicity.

The BBC stated that a strain A222V spread across Europe and was linked to summer holidays in Spain.
What is the change in amino acid, and the base sequence? (Use the table above)
Amino acid 222 changed from > alanine to > valine, triplet changed from > GCN to > GUN

More recent versions, which are a cause of concern because of apparently greater (70% ?) transmissibility, include a number of mutations known from other parts of the world. Virus strains with different combinations of mutations have been given descriptions such as 'Variant of interest', 'Variant under investigation', or 'Variant of concern'. These are changing all the time.

Variant of Concern 202012/01 (VOC-202012/01) has 23 mutations: 14 changes to protein-coding codons, 3 deletions, and 6 'synonymous' mutations that code for the same amino acids due to degeneracy of the genetic code, i.e. there are 17 mutations that change proteins and six that do not.

Some countries have complained about names given to virus strains being used in a critical way. The WHO named the most recent cause for concern ('the Indian strain') as the Delta variant on 31 May 2021, which has effectively replaced a previous version ('the Kent variant') - now known as the Alpha variant.
The Delta Plus variant also has the K417N substitution (asparagine instead of lysine in the spike protein) - although this has also been found in other variants.
A further development (19/10/21) of this, AY.4.2, has two mutations: A222V and Y145H are changes to the spike protein which may give it 10% more transmissibility. These are alanine to valine (triplet change GCN to GUN) and tyrosine to histidine (triplet change UAY to CAY) substitutions.

This has been followed by the Omicron variant - B.1.1.529 (which was first identified in South Africa). . .

Issues of concern include factors such as increased transmissibility and disease severity of the infections, potential 'immune escape' (not covered by vaccinations), and diagnostic or therapeutic escape (testing and caring problems). Additionally, community transmission or multiple COVID-19 clusters are expected.

More recently (March 2023) other variants have been noted:
BQ.1 and BQ.1.1 contain mutations to the spike protein on the surface of the virus allowing it to attach to and infect cells.
These mutations include changes to amino acids which are fairly close to each other:
K444T - substitution of lysine with threonine - a change to the middle base of the triplet,
N460K - substitution of asparagine with lysine - a change to the third base, from AAY to AAR (a transversion substitution),
L452R - leucine to arginine - another middle base change,
F486V - phenylalanine replaced by valine - a first base change
and R346T - arginine to threonine - yet another middle base change
These changes have also been associated with significant immune escape and antibody evasion, meaning that they are less under control by antibodies from previous infections and vaccination.

In fact there are many lists of different 'mutations of interest' - including one (Omicron XBB.1.5) with several changes to amino acids, including two to adjacent amino acids:
L455F and F456L, which surprisingly represent opposite changes (leucine to phenylalanine and phenylalanine to leucine)

Other related topics on this site


This series (The control of gene expression)
Base sequence alteration
Cell potency
Regulation of transcription and translation
Gene expression and cancer

Genetic diversity - mutations and meiosis (AS)

DNA and protein synthesis

DNA replication - action of DNA polymerase

Interactive 3-D molecular graphic models on this site


DNA bases - rotatable in 3 dimensions

DNA - rotatable in 3 dimensions

Web references

Neanderthal Mitochondrial and Nuclear DNA - from the Smithsonian Institute

Screening of the H69 and V70 deletions in the SARS-CoV-2 spike protein with a RT-PCR diagnosis assay reveals low prevalence in Lyon, France


SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo

Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK defined by a novel set of spike mutations

Novel 2019 coronavirus genome SARS-CoV-2 coronavirus Listing of genetic code (as DNA) and amino acid sequence for an early isolate

Variant of Concern 202012/01 - From Wikipedia, the free encyclopedia

SARS-CoV-2 variants of concern and variants under investigation in England: Omicron VOC-21NOV-01 (B.1.1.529) update on cases, S gene target failure and risk assessment

SARS-CoV-2 variants of concern as of 20 October 2023
