Base sequence alteration

Gene mutation

This is a spontaneous change in the bases within a gene. It may produce a different protein product. There are several types of base mutation. A point mutation involves only one nucleotide/base, and it is sometimes called a single nucleotide polymorphism. However, some changes involve more than one base, and cause major changes to the DNA molecule, which may even be seen as alterations to the structure of the chromosome.

This process results in the production of alternative forms of a gene - alleles.

Addition

is the insertion of one or more bases into the DNA.

Deletion

is the removal of one or more bases - the opposite of addition.

Addition and deletion may occur during DNA replication, perhaps by misalignment of template and primer strands as DNA polymerase moves along the separated strands of DNA during the copying process, or larger chunks of DNA may be added or lost when (uneven) crossing-over occurs during prophase I of meiosis.

Substitution

Here one base (pair) in DNA is replaced by a different base pair. There are actually two types of substitution:

Transitions are interchanges of two-ring purines (A to G, or G to A) or of one-ring pyrimidines (C to T, or T to C): they therefore involve bases of similar size and shape.

Transversions are interchanges of purine for pyrimidine bases, which therefore involve the exchange of one-ring and two-ring structures (A or G to C or T, or the reverse).
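The distinction between the two types can be expressed as a simple rule: a substitution is a transition if both bases belong to the same class (purine or pyrimidine), and a transversion otherwise. The following is only an illustrative sketch; the function name substitution_type is invented for this example.

```python
# Two-ring purines and one-ring pyrimidines (U included so RNA bases also work).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    valid = PURINES | PYRIMIDINES
    if old not in valid or new not in valid:
        raise ValueError("bases must be one of A, C, G, T or U")
    if old == new:
        return "no change"
    # Same ring class on both sides means the size and shape are similar.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(substitution_type("A", "G"))  # purine to purine
print(substitution_type("A", "C"))  # purine to pyrimidine
```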

This changing of bases can be seen in the section on Single nucleotide substitution revealed by sequencing within the Topic Using genome projects on this site.

Inversion

This is a more major effect. If DNA breaks at two points, the section in between may move out of alignment and then be re-joined by cellular repair mechanisms, but facing in the opposite direction. Clearly the base sequence in this region is effectively completely reversed, as well as possibly swapping the sense and antisense strands. Unlike insertion or deletion, inversion would not result in a change in the number of DNA bases.
(I have seen references to inversion being reversal of bases within a triplet but I think it is rarely limited to such a small number of bases.
And I see that in 12 out of the 64 possible triplets such a 3-base inversion would code for the same amino acid, and so the resulting protein/polypeptide would still be functional.
See the genetic code table, below. )
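One way to picture an inversion is sketched below. It assumes that the re-joined section, read along the same strand as before, appears as the reverse complement of the original section; the function name invert_segment and the short sequence are invented for illustration.

```python
# Complementary base pairing in DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(seq: str, start: int, end: int) -> str:
    """Return seq with the section seq[start:end] inverted.

    The section is replaced by its reverse complement, which is how the
    flipped double-stranded piece reads along the original strand.
    """
    segment = seq[start:end]
    inverted = segment.translate(COMPLEMENT)[::-1]
    return seq[:start] + inverted + seq[end:]


original = "AAACCGTTT"
mutant = invert_segment(original, 3, 6)  # the two break points are at 3 and 6
print(original, "->", mutant)
# Unlike an insertion or deletion, the number of bases is unchanged.
print(len(original) == len(mutant))
```

Inverting the same section a second time restores the original sequence.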

Differences in DNA structure can be used to study evolutionary relationships between different animal groups over time.

Mitochondrial DNA extracted from bones is often used in these studies.

Most human sequences differ from each other by an average of 8.0 substitutions, while the human and chimpanzee sequences differ by about 55.0 substitutions. Neanderthal and modern human sequences differed by approximately 27.2 substitutions. Using this mtDNA information, the last common ancestor of Neanderthals and modern humans dates to approximately 550,000 to 690,000 years ago.
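Figures like these come from counting the positions at which two aligned sequences differ. A minimal sketch of that counting step is given below; the two short sequences are invented for illustration and are not real mtDNA data, and real studies also have to align the sequences first.

```python
def count_substitutions(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Invented example sequences, not real data.
individual_1 = "ACGTACGT"
individual_2 = "ACGTTCGA"
print(count_substitutions(individual_1, individual_2))
```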

Comparing human DNA with that of our closest relatives (chimpanzees Pan troglodytes and bonobo Pan paniscus) shows a difference of only about 1.2% in terms of substitutions, but over the whole genome there are other differences (deletions, duplications, or insertions) that bring another 4-5% of distinction between them and us. It is thought that our last common ancestor lived 6-8 million years ago.

Interestingly, humans have 23 pairs of chromosomes, while all other great apes have 24 pairs. Human chromosome 2, our second largest chromosome, is the product of a fusion between two smaller chromosomes found in the great apes. It has been described as a 'head-to-head' fusion.

Chromosomes have readily identifiable, repeating DNA sequences called telomeres at both ends, and human chromosome 2 has these telomere sequences not only at both ends but also in the middle. There is even an identifiable remnant of a second, inactive centromere. Human chromosome 2 is described as metacentric (centromere more or less in the middle, so the 2 arms are about the same length), whilst the two component chromosomes in apes are acrocentric (centromere near one end, so the 2 arms differ greatly in length).

Duplication

This involves the production of more than one copy of a section of DNA, possibly by breakage as above and reincorporation of DNA, presumably from a sister chromatid. Sometimes extra copies of genes formed in this way function alongside one another, or minor changes may make them perform completely new functions. This is often seen when comparing similar sections of DNA from different but fairly closely related species.

Of course, duplication of whole chromosomes can occur as a result of non-disjunction in meiosis, and it can result in conditions such as trisomy 21 (Down syndrome).

Polyploidy is a type of duplication, and it is sometimes found in plants. It is thought to contribute to evolution, as well as sometimes being effective in producing more productive strains of crop plants.

Translocation

involves changes in the location of bases within the DNA. It is more often known as a relocation of specific chromosome material. For example it is possible for a section of chromosome 21 to become attached to chromosome 14. As a result, there are effectively 3 copies of chromosome 21, resulting in a different form of trisomy 21.

Transposons

Transposons, or transposable elements (TEs), also known as jumping genes, are DNA sequences that can change their position within a genome. As such they can create or act like a mutation, as well as adding to the amount of genetic material in cells, and causing duplication of genes.

In fact, approximately 50% of the human genome is thought to originate from transposable elements. 'Alu elements' - about 300 bases long - are the most abundant transposable elements, with over one million copies dispersed throughout the human genome. The mobilome is the total of all the mobile genetic elements in a genome.

There are two main sorts:

Retrotransposons

These are widespread in mammals, and resemble retroviruses. An RNA transcript of the transposable element's DNA is produced, then the enzyme reverse transcriptase converts it back into a DNA version. This DNA sequence can then be inserted back into a random location of the genome.

DNA transposons

are more simply segments of DNA that can move to a new location by a 'cut-and-paste' action. A transposase enzyme cuts the transposon out of its original position and makes a staggered cut at the target site, leaving 'sticky ends' so that the transposon can be inserted there. Gaps are filled by DNA polymerase and DNA ligase closes the sugar-phosphate backbone. If this cutting out occurs during the S phase of the cell cycle it can result in gene duplication.

The randomness of placement of this extra genetic material can cause problems in the cell. It may have different effects depending on where it lands. If it lands in an exon, it can alter the expected base sequence, but it may instead affect a promoter ahead of this, or a regulatory segment within an intron.

Alterations to tumour suppressor genes can lead to the development of tumours, and if a transposon affects a proto-oncogene, producing an oncogene, it is likely to cause unregulated cell division, leading to cancer.

Mutagens and mutation rate

Mutations are said to be caused by mutagens, which may be a form of radiation such as x-rays, ultraviolet radiation and radioactive substances, as well as certain chemicals. There are several classes of chemical compounds that damage and change nucleotide bases, and these also act as carcinogens:

Reactive oxygen species (ROS) - superoxide, hydroxyl radicals and hydrogen peroxide
Alkylating agents such as ethylnitrosourea
Deaminating agents, e.g. nitrous acid can cause transition mutations by converting cytosine to uracil.
Base analogues can substitute for DNA bases during replication and cause transition mutations.
Intercalating agents, such as ethidium bromide may insert between bases in DNA, causing frameshift mutation during replication.

So-called spontaneous mutations may be caused by exposure to small amounts of radiation or chemicals in the natural environment. These are quite infrequent, but the chance of mutations occurring can be greatly increased as the intensity of radiation or the concentration of mutagenic chemicals is increased.

In genetics laboratories it is normal to expose experimental organisms to these at life-threatening dosages in order to maximise the mutation rate; the effect sought is changes to the DNA in their gametes. The organisms that survive exposure must then be allowed to reproduce, and their offspring are monitored for new phenotypes, which can then be tested in breeding crosses.

Not all mutations result in a change

Point mutations (mutations changing only one base, not one triplet code) can have different effects.

Substitutions sometimes result in minor changes.

Due to the degenerate nature of the genetic code (64 possible triplets but only 20 amino acids) there are often several triplets coding for the same amino acid. This is especially so for the third base of the triplet, so a change here may still result in the incorporation of the same amino acid into the polypeptide chain: no noticeable change in the resulting protein. This could be called a silent mutation.

This is quite noticeable in the table of the genetic code - shown alongside. Triplets written with a final N would be unaffected by any change in the third base, and those written with a final R or Y would be unaffected by a transition substitution (see above) in the third base, although a transversion there would generally change the amino acid.

Alternatively a mutation may change a base so that the triplet containing it codes for a different amino acid. These are described as missense mutations. If the new amino acid has a similar structure (similar sidechain) it may have little effect, perhaps slightly altering the packing of the polypeptide chain - parts of which are often coiled into an α-helix. Or it may have a different - probably deleterious - effect if it is in an important section such as the active site of an enzyme, and often small changes result in malformations of proteins with several chains. This is seen in the mutation which causes sickle cell anaemia, where valine replaces glutamic acid in the β chain of haemoglobin. Small changes of other kinds can have similar consequences: the commonest mutation causing cystic fibrosis is a deletion of three bases, removing a single phenylalanine from the protein.

Some triplets (giving mRNA codes UAA UAG UGA) are stop codons, resulting in the termination of translation of mRNA into amino acids. So changing the base sequence so that it can be read as a stop codon has a drastic effect, producing a shorter, probably ineffective amino acid chain and resulting in a non-functional protein. This is a nonsense mutation.
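The three outcomes described above (silent, missense and nonsense) can be told apart by comparing what the original and the mutated triplet code for. The sketch below assumes the standard genetic code, written with the usual compact string of one-letter amino acid codes ('*' for stop); the function name classify_point_mutation is invented for this example.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(old_codon: str, new_codon: str) -> str:
    """Classify a change of one mRNA triplet into another."""
    old_aa = CODON_TABLE[old_codon]
    new_aa = CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"      # same amino acid (or still a stop codon)
    if new_aa == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid (this sketch also puts
                             # the loss of a stop codon in this category)


print(classify_point_mutation("GAA", "GAG"))  # third-base change, both glu
print(classify_point_mutation("GAG", "GUG"))  # glu to val
print(classify_point_mutation("UAC", "UAA"))  # tyr to stop
```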

Here we are considering changes within the exon section of a gene; changes within the intron sections may have different effects. Mutations at or near the splicing sites may lead to incorrect splicing, perhaps causing the expression of normally hidden base sequences. Or the (generally not well known) functions of the intron may be disrupted: perhaps affecting gene regulation.

The genetic code

A, C, G and U : 4 nucleotides/bases
N = any of them, R = purines (A or G), Y = pyrimidines (C or U)

RNA triplet codes	amino acid	abbreviation	RNA triplet codes	amino acid	abbreviation
AAR	lysine	lys K	GAR	glutamic acid	glu E
AAY	asparagine	asn N	GAY	aspartic acid	asp D
ACN	threonine	thr T	GCN	alanine	ala A
AGR CGN	arginine	arg R	GGN	glycine	gly G
AGY UCN	serine	ser S	GUN	valine	val V
AUG (start)	methionine	met M	UAA UAG UGA	(stop)
AUY and AUA	isoleucine	ile I	UAY	tyrosine	tyr Y
CAR	glutamine	gln Q
CAY	histidine	his H	UGY	cysteine	cys C
CCN	proline	pro P	UGG	tryptophan	trp W
CUN UUR	leucine	leu L	UUY	phenylalanine	phe F
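The compact N, R and Y notation in this table can be expanded mechanically into individual triplets. The sketch below does this and checks that every one of the 64 possible triplets is accounted for exactly once; the dictionary and function names are invented for this example, and the three-letter abbreviations are used only as labels.

```python
from itertools import product

# Which bases each letter in the compact notation stands for.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "ACGU",  # any base
    "R": "AG",    # purines
    "Y": "CU",    # pyrimidines
}

# The table above, re-keyed by amino acid.
COMPACT_TABLE = {
    "lys": ["AAR"], "asn": ["AAY"], "thr": ["ACN"], "arg": ["AGR", "CGN"],
    "ser": ["AGY", "UCN"], "met": ["AUG"], "ile": ["AUY", "AUA"],
    "gln": ["CAR"], "his": ["CAY"], "pro": ["CCN"], "leu": ["CUN", "UUR"],
    "glu": ["GAR"], "asp": ["GAY"], "ala": ["GCN"], "gly": ["GGN"],
    "val": ["GUN"], "stop": ["UAA", "UAG", "UGA"], "tyr": ["UAY"],
    "cys": ["UGY"], "trp": ["UGG"], "phe": ["UUY"],
}


def expand(pattern: str) -> list[str]:
    """Expand a compact triplet such as 'AAR' into ['AAA', 'AAG']."""
    return ["".join(c) for c in product(*(AMBIGUITY[ch] for ch in pattern))]


codon_to_aa: dict[str, str] = {}
for amino_acid, patterns in COMPACT_TABLE.items():
    for pattern in patterns:
        for codon in expand(pattern):
            if codon in codon_to_aa:
                raise ValueError(f"{codon} appears twice in the table")
            codon_to_aa[codon] = amino_acid

print(expand("AAR"))
print(len(codon_to_aa), "triplets covered")
```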

Downstream changes

Insertions and deletions of single bases affect the translation process because mRNA is read in triplets, i.e. 3 bases at a time.

So these changes in mRNA base sequence result in a completely different sequence of amino acids in the polypeptide, from the mutation site onwards. And there can be other consequences: the normal stop codon will not operate, so the polypeptide may be longer than normal, or triplets of other bases may be read as a stop codon so the polypeptide may be shorter than normal.

These mutations change the nature of all the base triplets downstream from the mutation, and they are said to cause a frame shift.
This is almost certain to result in no or non-functional proteins being formed.

However, the insertion or deletion of three bases (or six, or nine: any multiple of three) close together may result in the rest of the bases being read correctly, cancelling the frameshift effect.
See the six-nucleotide deletion in Coronavirus, described below.
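The difference between deleting one base and deleting three can be seen by translating a short message before and after each change. This is a sketch only: the 18-base mRNA is invented for illustration, the standard genetic code is assumed, and translate is a helper written for this example that stops at the first stop codon and ignores any incomplete final triplet.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read mRNA three bases at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented message: AUG GCU GAA UUC AAA UGA
normal = "AUGGCUGAAUUCAAAUGA"
one_deleted = normal[:4] + normal[5:]    # lose one base from the 2nd triplet
three_deleted = normal[:3] + normal[6:]  # lose the whole 2nd triplet

print(translate(normal))         # MAEFK
print(translate(one_deleted))    # MVNSN - every later triplet is misread
print(translate(three_deleted))  # MEFK  - one amino acid missing, rest intact
```

In the one-base deletion the original stop codon is no longer read in frame, which matches the point made above that the normal stop codon may not operate.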

'New Variant' Covid 19 - Changes in the SARS-CoV-2 spike protein caused by mutations

This virus came to our attention towards the end of 2019 - hence the name Covid 19 - and the sequence of the virus's RNA, together with the proteins it codes for, was published in early 2020. This was quickly used in the development of mRNA vaccines.

As time passes, a number of changes in the amino acid sequence in the virus protein have been recorded in viruses in circulation in the population, especially the spikes projecting from the surface.

And then there are some other genetic differences reported in the virus from mink, as reported from Denmark and the Netherlands, as well as developments in a number of other countries, including Mexico and South Africa.

These have been described as mutants, new variants and simply new strains. There is concern about their infectivity and the possibility that they may be less susceptible to control by vaccines.

Attention has been focussed on antibodies to the spike protein as well as the action of T-cells. Groups of mutations have been used to describe and monitor the movement of the virus through populations, and geographical areas.

There is a convention that substitutions show the original amino acid (as a single letter code) followed by the location in the sequence and the replacement amino acid.

Sometimes it is accompanied by a similar code showing changes in the DNA bases (A, C, G, T), and their location within the genome, which is generally a larger number.
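A label written in this convention can be split into its three parts quite easily. The sketch below handles amino acid substitutions only (labels such as H69del would need separate treatment); the names Substitution and parse_substitution are invented for this example.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str      # one-letter code of the amino acid that was replaced
    position: int      # location in the polypeptide chain
    replacement: str   # one-letter code of the new amino acid


# One capital letter, a number, then one capital letter, e.g. Y453F.
LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'Y453F' into its three parts."""
    match = LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    original, position, replacement = match.groups()
    return Substitution(original, int(position), replacement)


print(parse_substitution("Y453F"))
```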

"Cluster 5" is a mutated variant of the SARS-CoV-2 virus, discovered in Northern Jutland, Denmark. It is believed to have been spread from minks to humans via mink farms.
Its spike protein changes include the substitutions Y453F, I692V and M1229I, as well as two amino acid deletions, H69del/V70del.

The first three are substitutions, and the deletion of 6 bases from 21765-21770 caused the loss of two amino acids from the polypeptide chain.

Cluster 5 Substitutions (and base code changes - using the flexible base notation for triplet degeneracy)

Y453F is a substitution of phenylalanine (F) for tyrosine (Y) at position 453
[base code switched from UAY to UUY]
I692V is a substitution of valine (V) for isoleucine (I) at position 692
[base code switched from AUY to GUY]
M1229I is a substitution of isoleucine (I) for methionine (M) at position 1229
[base code switched from AUG to AUY - a transversion]

Spike deletion 21765-21770

(Partial) Base and amino acid sequence H69V70 (45K)

A comparison of sections of (DNA) base sequences between reference sequence Wuhan-Hu-1 (at the top, in blue) and deletion 21765-21770 (in green).

The loss of 6 bases gives a triplet of ATC (bases 21764-21771-21772 in 'old numbers') still coding for an isoleucine ('same amino acid'), so the chain is simply shortened by 2 amino acids.

What are the two lost amino acids?

> H histidine
> V valine
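The effect of this deletion can be reproduced with a few lines of code. The nine bases used here are an assumption, chosen to be consistent with the description above (isoleucine, histidine and valine, leaving an ATC triplet after the deletion) rather than copied from the annotated sequence data; the helper names are invented for this example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code written with DNA letters; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete triplets of a coding-strand DNA sequence."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def delete_bases(seq: str, first_number: int, start: int, end: int) -> str:
    """Delete genome positions start..end (inclusive) from seq,
    where first_number is the genome position of seq[0]."""
    i = start - first_number
    j = end - first_number + 1
    return seq[:i] + seq[j:]


# Assumed bases 21764-21772: the triplets for amino acids 68, 69 and 70.
region = "ATACATGTC"
shortened = delete_bases(region, 21764, 21765, 21770)

print(translate(region))     # I H V
print(shortened)             # the single remaining triplet
print(translate(shortened))  # still isoleucine
```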

I have taken the liberty of annotating base sequence data and amino acid listings to show the effects of that deletion. - Just look for lines I have inserted between the original data!

In fact this deletion is responsible for an effect known as S-gene dropout or S-gene target failure. When testing for possible coronavirus infection using PCR, it is conventional to use kits targeting a limited number of virus genes. If some of these are detected but the S-gene is not, this is strongly indicative of a variant of concern, such as the omicron variant, and full viral genome sequencing is called for.

Other mutants - rather a saga

The D614G mutation, which has become increasingly common, modifies the virus's surface spike protein. This notation means that the 614th amino acid in its polypeptide chain is altered from aspartate - aspartic acid - (one-letter amino acid code D) to glycine (G). This is likely to be the result of a change (A to G) in the middle base of a codon in the viral RNA. It is said that this mutation appears to confer greater transmissibility in humans rather than greater pathogenicity.

The BBC stated that a strain A222V spread across Europe and was linked to summer holidays in Spain.
What is the change in amino acid, and the base sequence? (Use the table above)
Amino acid 222 changed from > alanine to > valine, triplet changed from > GCN to > GUN

More recent versions, which are a cause of concern because of apparently greater (70% ?) transmissibility, include a number of mutations known from other parts of the world.

N501Y (substitution of tyrosine for asparagine at 501 - triplet change AAY to UAY) changes one of six key contact residues within the receptor-binding domain (RBD) of the spike protein.
It has been identified as increasing binding affinity to the target protein ACE2 in lungs.
The spike deletion 69-70del - See Spike deletion 21765-21770 above - may make the virus more resistant to the human immune response. It has also occurred a number of times in association with other RBD changes.
Mutation P681H (substitution of histidine for proline at 681 - triplet change CCN to CAY) is immediately adjacent to the furin cleavage site, which enables the virus to easily enter into the host cell for infection, thus efficiently aiding its spread throughout the human population. Another version of this, P681R, is a proline-to-arginine substitution at the same position.
Mutation E484K (substitution of lysine for glutamic acid at 484 - triplet change GAR to AAR) is feared because it may defeat some antibodies (formed after vaccination or infection by other strains). This has been followed by E484Q (giving the amino acid glutamine - triplet CAR), which may be of even more concern.
L452R (substitution of arginine for leucine at 452 - triplet change CUN to CGN) may also increase the effectiveness of some emerging variants.
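Triplet changes of this kind can be checked by searching the genetic code for every single-base change that turns a triplet for one amino acid into a triplet for another. The following sketch assumes the standard genetic code; the function name single_base_routes is invented for this example.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_base_routes(old_aa: str, new_aa: str) -> list[tuple[str, str, int]]:
    """List (old triplet, new triplet, base position 1-3) for every
    one-base change that converts old_aa into new_aa."""
    routes = []
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != old_aa:
            continue
        for position in range(3):
            for base in BASES:
                if base == codon[position]:
                    continue
                mutant = codon[:position] + base + codon[position + 1:]
                if CODON_TABLE[mutant] == new_aa:
                    routes.append((codon, mutant, position + 1))
    return routes


# N501Y: asparagine (N) to tyrosine (Y)
print(single_base_routes("N", "Y"))
# P681H: proline (P) to histidine (H)
print(single_base_routes("P", "H"))
```

An empty list would mean that the substitution cannot be produced by changing just one base.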

Virus strains with different combinations of mutations have been given descriptions such as 'Variant of interest', 'Variant under investigation', or 'Variant of concern'. These are changing all the time.

Variant of Concern 202012/01 (VOC-202012/01) has 23 mutations: 14 changes to protein-coding codons, 3 deletions, and 6 'synonymous' mutations that code for the same amino acids due to degeneracy of the genetic code, i.e. there are 17 mutations that change proteins and six that do not.

Some countries have complained about names given to virus strains being used in a critical way. The WHO named the most recent cause for concern ('the Indian strain') as the Delta variant on 31 May 2021, which has effectively replaced a previous version ('the Kent variant') - now known as the Alpha variant.
The Delta Plus variant also has the K417N substitution (asparagine instead of lysine in the spike protein) - although this has also been found in other variants.
A further development (19/10/21) of this, AY.4.2, has two mutations: A222V and Y145H are changes to spike proteins which may give it 10% more transmissibility. These are alanine to valine (triplet change GCN to GUN) and tyrosine to histidine (triplet change UAY to CAY) substitutions.

This has been followed by the Omicron variant - B.1.1.529 (which was first identified in South Africa). . .

Issues of concern include increased transmissibility and disease severity, potential 'immune escape' (not covered by vaccinations), and diagnostic or therapeutic escape (testing and caring problems). Additionally, community transmission or multiple COVID-19 clusters are expected.

More recently (March 2023) other variants have been noted:
BQ.1 and BQ.1.1 contain mutations to the spike protein on the surface of the virus allowing it to attach to and infect cells.
These mutations include changes to amino acids which are fairly close to each other:
K444T - substitution of lysine with threonine - a change to the middle base of the triplet,
N460K - substitution of asparagine with lysine - a third base change from AAY to AAR (a transversion substitution),
L452R - leucine to arginine - another middle base change,
F486V - phenylalanine replaced by valine - a first base change
and R346T - arginine to threonine - yet another middle base change
These changes have also been associated with significant immune escape and antibody evasion, meaning that they are less under control by antibodies from previous infections and vaccination.

In fact there are many lists of different 'mutations of interest' - including one (Omicron XBB.1.5) with several changes to amino acids, including two to adjacent amino acids:
L455F and F456L, which surprisingly represent opposite changes (Leucine to phenylalanine and phenylalanine to leucine)

Alteration of the sequence of bases in DNA can alter the structure of proteins

Gene mutation

Addition

Deletion

Substitution

Inversion

Duplication

Translocation

Transposons

Retrotransposons

DNA transposons

Mutagens and mutation rate

Not all mutations result in a change

The genetic code

Downstream changes

'New Variant' Covid 19 - Changes in the SARS-CoV-2 spike protein caused by mutations

Cluster 5 Substitutions (and base code changes
- using the flexible base notation for triplet degeneracy)

Spike deletion 21765-21770

Other mutants - rather a saga

Other related topics on this site

Interactive 3-D molecular graphic models on this site

Web references

Site author Richard Steane	The BioTopics website gives access to interactive resource material, developed to support the learning and teaching of Biology at a variety of levels.