Using genome projects

Genome sequencing projects

A number of organisms have had the sequence of the (DNA or RNA) bases or nucleotides in their genome determined.

The Human Genome Project has been extremely useful to many areas within Biology, but the large amount of data collected still requires further interpretation. In particular, the role of sections of DNA that do not specifically code for proteins - 'non-coding DNA' - and its relationship to actual coding DNA is under scrutiny.

Simpler organisms have smaller genomes, generally with fewer regulatory genes and less non-coding DNA, and it is thought that this permits a better understanding of the relationship between the genome and its products - the proteome.

The Human Genome Project was completed in 2003, but other species were sequenced before this. Other species continue to be sequenced . . .

You may be interested to see what species have been sequenced, and why they were significant.

Here are some suggestions:

Drosophila melanogaster - the fruit fly

Caenorhabditis elegans - a nematode worm

Danio rerio (formerly Brachydanio rerio) - the zebrafish

bacteriophage φX174.

And of course the most recent application of this process has centred around variants of SARS-CoV-2, the coronavirus that causes COVID-19.

Vaccine production

Vaccines generally rely on interactions between the immune system and antigens on the surface of potentially infecting organisms, including viruses. Historically, this has involved the development of 'attenuated strains', often grown in sub-optimal conditions, or organisms killed with chemicals like formaldehyde. In the past, there have been a few mishaps with these processes. The development of vaccines is quite time-consuming, and it requires a number of testing and checking stages.

It has proved more effective to base vaccines on antigens on the surface of pathogens, rather than whole cells or virions. Such subunit vaccines include proteins, polysaccharides or glycoproteins.

Most recently, vaccines have been developed more efficiently using proteins identified by reading the genetic material (nucleic acid) of viruses, notably the coronavirus responsible for COVID-19. Much attention has been focussed on the spike protein on the outside of the virus particles (and changes causing the emergence of different variants).

Most of the vaccines involve the spike protein itself, or sometimes the section of RNA coding for it, which can be persuaded to undergo translation so that the foreign spike protein is expressed in body cells. The immune system then responds to this antigen, producing antibodies against it, so that immunity develops.

Other related topics on this site include 'Base sequence and amino acids in Covid 19' and 'Vaccination and immunity' - (links below).

Ebola vaccines

A number of vaccines have been developed against the virus responsible for Ebola haemorrhagic fever.
The first Ebola vaccine consisted of whole virions inactivated by heat, formalin and gamma-irradiation and was largely ineffective. Glycoproteins forming trimeric spikes throughout the viral envelope and an internal nucleoprotein were seen as potential targets for the development of immune responses.
The current vaccine rVSV-ZEBOV is a live, attenuated recombinant vesicular stomatitis virus (VSV) in which the gene for the native envelope glycoprotein (P03522) is replaced with that from the Zaire strain of the Ebola virus.

Influenza vaccines

These are usually based upon three or four strains of virus, and these are sometimes grown in live (fertilised) hens' eggs or in cultivated cell lines using recombinant technology.
The WHO recommends a combination of different influenza virus strains each year for the 'northern hemisphere influenza season' and separately for the 'southern hemisphere influenza season'.
The external proteins which seem to vary from season to season are haemagglutinin H and neuraminidase N, and several distinct numbered forms of each are known.
The quadrivalent vaccines for use in 2020 - 2021 include
an A/Guangdong-Maonan/SWL1536/2019 (H1N1)pdm09-like virus;
an A/Hong Kong/2671/2019 (H3N2)-like virus;
a B/Washington/02/2019 (B/Victoria lineage)-like virus; and
a B/Phuket/3073/2013 (B/Yamagata lineage)-like virus.
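The strain names above follow a regular pattern: virus type, place of isolation, isolate number and year, followed by the subtype or lineage in brackets. As a minimal sketch of how that pattern can be picked apart, assuming only the four names listed here, the Python below uses a regular expression. The group names (type, place, isolate, year, detail) are labels invented for this illustration.

```python
import re

# Pattern for names of the form  type/place/isolate/year (subtype or lineage).
# The group names are labels chosen for this sketch only.
STRAIN = re.compile(
    r"(?P<type>[AB])/(?P<place>[^/]+)/(?P<isolate>[^/]+)/(?P<year>\d{4})"
    r"\s*\((?P<detail>[^)]+)\)"
)

names = [
    "A/Guangdong-Maonan/SWL1536/2019 (H1N1)pdm09-like virus",
    "A/Hong Kong/2671/2019 (H3N2)-like virus",
    "B/Washington/02/2019 (B/Victoria lineage)-like virus",
    "B/Phuket/3073/2013 (B/Yamagata lineage)-like virus",
]

parsed = [STRAIN.search(name).groupdict() for name in names]
for fields in parsed:
    print(fields["type"], fields["place"], fields["year"], fields["detail"])
```

For the two influenza A strains the bracketed part gives the haemagglutinin and neuraminidase forms (H1N1, H3N2); for the influenza B strains it gives the lineage instead.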

DNA Sequencing methods

Methods of determining base or nucleotide sequence utilise the specificity of complementary base pairing, as well as the strength of bonds in the backbone of DNA.

Originally it was a batch process with individual tubes set to identify the position of each nucleotide, but it has been improved in a number of ways and become automated.

Putting things in order

Frederick Sanger

from the Nobel Prize website

Frederick Sanger (1918-2013) was a British biochemist who twice won the Nobel Prize in Chemistry: once for establishing the sequence of amino acids in insulin (1958), and once for developing the process for reading the base sequence in DNA (1980).

He received a number of honours (OM CH CBE FRS FAA) but turned down the offer of a knighthood.

The Wellcome Trust Sanger Institute near Cambridge is named after him - he declared 'it had better be good'.

The Institute has educational as well as purely scientific aims.

25 Genomes for 25 Years is a project to sequence 25 novel genomes representing UK biodiversity, as part of the Wellcome Sanger Institute's wider 25th Anniversary celebrations. Some nice videos here

The Sanger or chain termination procedure

There are some points of amplification, and questions to test your comprehension, in the section 'More details, and a few questions about the chain termination procedure' below.

This technique requires four reactions to be carried out at the same time, in separate tubes.

All 4 tubes contain

- a large quantity of the sample DNA, in the single strand form

- a large quantity of the four nucleotides containing thymine, cytosine, guanine and adenine

- DNA polymerase enzyme

- radioactively labelled primers

- To each tube is also added a lesser amount of a different (single) modified nucleotide - which lacks an -OH group on the deoxyribose section. This may also be called a 'terminator base', although it is obviously bonded to (modified) deoxyribose and phosphate(s).

The primers act as starter points for the attachment of DNA polymerase, which adds nucleotides to the second strand (rebuilding the double helix).

However once a modified nucleotide is added, the DNA polymerase cannot make the next phosphodiester bond in the backbone of the developing DNA strand, so the chain is terminated.
It has sometimes been suggested that the modified nucleotide does not "fit" the active site of DNA polymerase, but the actual reason is that the condensation reaction forming the phosphodiester bond needs an -OH group, which the modified nucleotide lacks.

The two DNA strands are separated again. There will be a range of strand lengths. Half will be complete (original strands), and half will be of variable length - probably incomplete due to chain termination. On the end of these will be the modified nucleotide added to that tube.
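As a rough illustration of why each tube ends up with its own set of strand lengths, here is a minimal Python sketch. It assumes a made-up 10-base template, ignores the primer and the direction of the strands, and simply lists the lengths at which each terminator could stop the chain. The function names are invented for this illustration.

```python
# Simplified model of the four reaction tubes (a sketch, not lab software).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template):
    """Bases added opposite the template, in the order they are added."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_lengths(template):
    """For each tube (named by its terminator base), list the lengths of
    the incomplete strands that can form. Primer length is ignored."""
    built = new_strand(template)
    tubes = {base: [] for base in "ACGT"}
    for position, base in enumerate(built, start=1):
        # A terminator can be incorporated wherever its base is required,
        # ending the chain at that length.
        tubes[base].append(position)
    return tubes


# Hypothetical template strand, chosen only as an example.
tubes = terminated_lengths("GAAGTCTCAG")
for base, lengths in tubes.items():
    print(f"dd{base}TP tube: incomplete strands of length {lengths}")
```

Each possible length turns up in exactly one tube, which is what allows the order of bases to be worked out once the strands are separated by size.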

The contents of each tube are then loaded into wells in a film of polyacrylamide urea gel which does not allow the DNA strands to anneal. This gel layer is then subjected to electrophoresis to separate the components according to their molecular size. It is then overlaid with photographic film and this is 'developed' after some time to give an autoradiograph.

Autoradiograph

Letters indicate the modified nucleotide in each tube

The sequence of bases in the DNA can be read from the autoradiograph. Smallest sections of DNA travel furthest, so the lowest bars show the first bases in the sequence.

This sequence is CTTCAGAGTC
so the original strand was
> GAAGTCTCAG
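The reading process can be sketched in a few lines of Python. The band positions below are assumed values, chosen to be consistent with the sequence quoted above (they are not measurements taken from the autoradiograph), and the function names are invented for this illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumed bands: for each lane (terminator base), the fragment lengths present.
# Shorter fragments travel further, so they appear lowest on the gel.
lanes = {
    "A": [5, 7],
    "C": [1, 4, 10],
    "G": [6, 8],
    "T": [2, 3, 9],
}


def read_gel(lanes):
    """Read bases from the shortest fragment (lowest band) upwards."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


def complement(sequence):
    """Pair each base with its partner: A with T, G with C."""
    return "".join(COMPLEMENT[base] for base in sequence)


read = read_gel(lanes)
print(read)              # CTTCAGAGTC
print(complement(read))  # GAAGTCTCAG
```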

What causes the DNA fragments to move in the electrophoresis process?
> Negatively charged phosphate groups (on the outside of the helix) are pulled towards the anode (+ve electrode)

Single nucleotide substitution revealed by sequencing

(Section of) autoradiographs - normal allele on the left, mutant allele on the right
From Wikimedia Commons, the free media repository

I noticed this pair of autoradiographs when searching for another topic.
It ties up with the topic of Base sequence alteration on this site (you can check the genetic codes there, too).

Factor V Leiden is the most common hereditary blood clotting disorder amongst ethnic Europeans.

Factor V is a cofactor allowing factor Xa to activate prothrombin, resulting in the enzyme thrombin which in turn cleaves fibrinogen to form fibrin, forming fibres that result in a clot. This clotting process is then controlled by (activated) protein C - a natural anticoagulant that acts to limit the extent of clotting by cleaving and degrading Factor V.

The mutant ('Leiden') form of the gene responsible for this protein is caused by a 1691G → A substitution (shown by arrows). In fact, reading up from one below the arrow to one above it, you can see the triplet of bases is CGA on the left, whereas on the right the triplet is CAA.

This results in a change in the amino acid sequence of the protein coded for by the gene - changing a single amino acid arginine R to a glutamine Q. This change in structure prevents the cleaving action of protein C so that any events causing blood clotting can have more serious consequences than usual.
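The effect of the substitution can be checked against the standard genetic code. The Python sketch below builds the usual 64-codon table (DNA coding-strand triplets, one-letter amino acid symbols, * for stop) and looks up the two triplets read from the autoradiographs. The variable names are chosen for this illustration.

```python
from itertools import product

# Standard genetic code, with codons listed in the order TTT, TTC, TTA, TTG, TCT ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

normal, mutant = "CGA", "CAA"

# Positions (counting from 0) at which the two triplets differ.
changed = [i for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]

print(CODON_TABLE[normal], CODON_TABLE[mutant], changed)  # R Q [1]
```

Only the middle base differs, yet the amino acid changes from arginine (R) to glutamine (Q).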

More details, and a few questions about the chain termination procedure

The sample DNA is amplified using the PCR process, and 'denatured' - so it is in a single-stranded form.
How is the sample DNA converted into a single-stranded form?
> heated - temperatures over 90 °C prevent hydrogen bonds forming

The nucleotides are actually triphosphates dATP, dGTP, dCTP, and dTTP.
What is the significance of the nucleotides being in the form of triphosphates?
> This gives energy (think of ATP) for the formation of phosphodiester bonds which hold the backbone together

Dideoxyribose has no -OH groups at positions 2 and 3

The modified nucleotides are in fact dideoxynucleotides (ddNTPs - ddATP, ddGTP, ddCTP, or ddTTP). They do not have an -OH group (just 2 -H) in position 3 of the deoxyribose, so it is not possible to form a phosphodiester bond, and the DNA backbone of the developing second DNA strand will stop here.
Why is only a small quantity of dideoxynucleotide needed?
> It is only needed once in each tube, whereas the other ordinary nucleotides are needed to fill in at positions leading up to that
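A simple model may help to show the effect of the quantity of dideoxynucleotide. This Python sketch assumes, purely for illustration, that at every position needing that base there is the same fixed chance of the terminator being picked up instead of the ordinary nucleotide. The chances used (0.05 and 0.9) are made-up values, not laboratory figures, and the function name is invented.

```python
def termination_profile(sites, p_terminate):
    """Fraction of new strands ending at each successive position where the
    terminator could be used, plus the fraction that pass all of them."""
    fractions = []
    still_growing = 1.0
    for _ in range(sites):
        fractions.append(still_growing * p_terminate)
        still_growing *= 1.0 - p_terminate
    return fractions, still_growing


for p in (0.05, 0.9):
    stopped, full_length = termination_profile(sites=3, p_terminate=p)
    print(p, [round(f, 4) for f in stopped], round(full_length, 4))
```

With the small value, strands stop at each of the possible positions in similar proportions and many run on past them all. With the large value nearly every strand stops at the first possible position, so the later bands would be too faint to read.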

Primers migrate into position alongside the 3' end of the single DNA strands, and bind ('anneal') to it.
What causes this binding process?
> hydrogen bonding, using complementary base pairing - NOT phosphodiester bond

However once a dideoxynucleotide has paired up with its complementary base on the DNA template, the DNA polymerase cannot make a phosphodiester bond between the carbon at position 3' and the carbon at position 5' on the next nucleotide, so the (second) chain terminates at this point.

In the diagram alongside on the left three nucleotides have successfully formed a chain alongside the template strand, but the next nucleotide (with the base cytosine) cannot join on because the nucleotide above it (with the base thymine) has no -OH group at 3' to form the phosphodiester bond (just -H).

Polyacrylamide urea gel does not allow the DNA strands to anneal, and prevents single-stranded DNA forming into a circular form.
Suggest how this binding process is prevented.
> Urea prevents hydrogen bonds forming between bases.

What causes the bars seen on the autoradiograph?
> radioactivity (in the primers)

Elements used in radioisotope labelling include sulphur (Sulphur-35), phosphorus (Phosphorus-32) and iodine (Iodine-125).
Which of these would be used in this case?
> phosphorus - which is in the 'backbone' of the DNA

Alternatively, DNA fragments may be visualised using a variety of dyes.
Ethidium bromide migrates into the gaps between bases and it can be seen when illuminated by ultraviolet light. It is considered to be a potential carcinogen so it is not used in educational contexts. Other coloured chemicals may be used, however.

Automated sequencing

A number of improved techniques involve nucleotides labelled with different coloured dyes, which anneal to a section of the sample DNA strand, hundreds of bases long. This does not require 4 separate reaction vessels, and can result in increases in the speed of sequencing.

The DNA may be drawn by electrophoresis through a fine capillary, and as it emerges it passes a laser which causes the dyes to emit light of different colours/wavelengths. These are detected by a probe, which sends data as a series of peaks corresponding to individual bases/nucleotides.
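Turning the peaks into a sequence ('base calling') can be sketched very simply in Python, assuming each peak arrives as four intensity readings, one per dye. The channel order and the intensity values below are made up for illustration, since real instruments and dye sets define their own.

```python
# Hypothetical assignment of detector channels to bases.
CHANNELS = ("A", "C", "G", "T")


def call_bases(trace):
    """trace: list of 4-tuples of detector intensities, one tuple per peak,
    in the order given by CHANNELS. The strongest channel gives the base."""
    calls = []
    for intensities in trace:
        strongest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[strongest])
    return "".join(calls)


# Made-up intensities for four successive peaks.
trace = [(5, 90, 3, 4), (2, 6, 4, 80), (3, 5, 2, 95), (4, 88, 6, 1)]
print(call_bases(trace))  # CTTC
```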

Computers can easily identify overlapping sections of code on different fragments of DNA and combine these into an integrated sequence. They can also scan sections of base code data to identify genes (ORFs).
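Both tasks can be illustrated with a short Python sketch. It is a simplified outline under stated assumptions, not how real assembly or gene-finding software works: the two reads are made up, only exact overlaps are accepted, and the ORF search looks on one strand only, from an ATG to the first stop codon in the same reading frame. The function names are invented.

```python
def merge_overlap(left, right, min_overlap=3):
    """Join two reads if the end of `left` matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None  # no sufficiently long overlap


STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence):
    """Return (start, end) index pairs running from an ATG to the first
    in-frame stop codon. Searches one strand only."""
    orfs = []
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


# Made-up reads sharing the overlap "GCGATT".
contig = merge_overlap("CCATGGCGATT", "GCGATTCGATAGGC")
print(contig)
print(find_orfs(contig))
```

Here the two reads combine into CCATGGCGATTCGATAGGC, and the scan reports one open reading frame, from the ATG at index 2 to the end of the TAG stop codon at index 17.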

4 nucleotides labelled with different fluorophores

The ring-shaped sections fluoresce with different colours when illuminated by laser light.

Sequencing by synthesis (SBS) on a chip

Each nucleotide analogue consists of a different fluorophore attached to the 5 position of the pyrimidines and the 7 position of the purines through a photocleavable 2-nitrobenzyl linker.

All four fluorophores can be detected and then efficiently cleaved using near-UV irradiation, allowing continuous identification of the DNA template sequence.

Site author Richard Steane	The BioTopics website gives access to interactive resource material, developed to support the learning and teaching of Biology at a variety of levels.