Proteins

Proteins can be structural or functional.
Proteins form parts of all cells. They also perform important biochemical functions, working as enzymes, hormones and other chemical messengers, and components of blood.
Like other biological chemicals in this section, proteins are polymers, built up from simpler monomers by condensation reactions.

Amino acids

Amino acids are the monomers from which proteins are constructed. Twenty types are found in most organisms.

All amino acids share the same basic structure.

[Figure: general structure of an amino acid, H₂N-CH(R)-COOH]

-NH₂ (also written H₂N-) is an amino group, and -COOH is a carboxylic acid group (also known as a carboxyl group), the same as in fatty acids.

Most of the variation between amino acids is due to different organic side groups, labelled above as -R.

[Figure: six different amino acids]

Some R groups are quite large, and may contain groups which interact with others through ionic attraction, hydrophobic or hydrophilic interactions, and even covalent bonding.

The amino group, the carboxyl group and the R group are all attached to the same carbon atom (the alpha-carbon, α-C), which also has a hydrogen atom attached.

[Figure: condensation reaction between two amino acids]

Amino acids can undergo condensation reactions, losing -H from the amino group of one and -OH from the carboxylic acid group of the other to form water. As a result a peptide bond (-CONH-) is formed between the two amino acid residues.

Joining two amino acids in this way produces a dipeptide.

[Figure: a dipeptide]

Similarly, joining three amino acids produces a tripeptide, four gives a tetrapeptide, and so on.
Several of these oligopeptides have functions in the body, for example as hormones.
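The counting behind these names can be sketched in a few lines of Python. This is only an illustration: the function name `condensation_summary` and the fallback label used for longer chains are assumptions made here, not standard terminology. It relies on the fact that joining n amino acids into a linear chain forms n - 1 peptide bonds and releases n - 1 molecules of water.

```python
# Prefixes for the short peptides named in the text.
PREFIXES = {2: "di", 3: "tri", 4: "tetra"}


def condensation_summary(residues):
    """Return (name, peptide_bonds, water_molecules) for a linear chain."""
    n = len(residues)
    if n < 2:
        raise ValueError("at least two amino acids are needed to form a peptide bond")
    # Each condensation reaction forms one peptide bond and releases one water.
    bonds = n - 1
    if n in PREFIXES:
        name = PREFIXES[n] + "peptide"
    else:
        # Hypothetical fallback label for chains longer than a tetrapeptide.
        name = f"{n}-residue peptide"
    return name, bonds, bonds


print(condensation_summary(["Gly", "Ala"]))
print(condensation_summary(["Gly", "Ala", "Ser", "Leu", "Val"]))
```

The residue names passed in are arbitrary examples; only the length of the list matters for the count.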

Continuing the condensation process many times results in a polypeptide.
[Figure: straight(ish) chain of amino acids]

A polypeptide chain can then become at least partly coiled or folded and then perhaps further modified to take on a particular shape which is stabilised by various bonds and attractions between the different R groups.
[Figure: coiled chain of amino acids]

As a result of this folding and bonding, a polypeptide chain becomes transformed into a protein.
Some proteins consist of more than one polypeptide chain.

Different protein molecules have different (3-dimensional) shapes according to their functions: structural proteins are mostly linear (fibrous), and functional proteins are mainly globular in shape.
However, it would be wrong to think that proteins have very rigid fixed shapes. For example, enzymes mould themselves around their substrates, and haemoglobin changes shape slightly as it binds oxygen.

Levels of protein structure

There are said to be four levels of protein structure, and each is maintained by different types of bonds between amino acid residues.

Level	Description	Maintained by
Primary	Sequence of amino acid residues in polypeptide	Peptide bonds between amino acid residues
Secondary	(Fairly regular) coiling and folding of polypeptide chain, e.g. into alpha helix, beta sheet	Hydrogen bonds within polypeptide chain (between >C=O and HN<)
Tertiary	Further folding of helices/sheets to give distinctive 3-D shape	Hydrogen bonds between R groups; acid/base or hydrophobic/hydrophilic interactions between R groups; disulphide bonds between cysteine residues; other covalent bonds
Quaternary (not in all proteins)	Association of two or more polypeptide chains	Hydrogen bonds and other, usually non-covalent, bonds between polypeptide chains

Plant proteins

Green plants produce proteins. In the ecological context, the term producer is used to describe the role of plants within food webs and ecosystems. It is normally stressed that plants produce sugars as a result of photosynthesis, and that the energy in the bonds of organic compounds derived from these sugars is passed from producers to consumers.

However, it is not emphasised enough that plants (along with many micro-organisms, but not animals) can combine the element nitrogen from inorganic sources (mostly nitrate, NO₃⁻) in the soil with carbon, hydrogen and oxygen contained in carbohydrates to produce amino acids, the building blocks of proteins. All animal protein effectively comes from plant protein.

Plant material is often seen as having a low protein content compared with animal material. In the living world there is obviously a lot of plant protein, but it is widely spread - in effect diluted. The most abundant protein on the planet is thought to be RuBisCO (ribulose bisphosphate carboxylase), which is the key carbon-fixing enzyme in photosynthesis.

The main sources of plant protein are (seeds of) legumes (soya and other beans, peas and other pulses) and cereals (grains). These seeds contain protein bodies, and when they germinate they produce protease enzymes which break down the insoluble protein into soluble amino acids which can be taken to the tips of roots and shoots to support growth.

There are several categories of plant proteins, some named according to the plant groups in which they are principally found.

Plant globulins include legumins and vicilins, the main storage proteins in the seeds of legumes and pulses.

Other types include albumins, and the gliadins and glutelins, which are protein types found mainly in cereal grains, especially wheat. In wheat, the gliadins and the glutelins known as glutenins together make up gluten, to which some people are intolerant or allergic. The elongated nature of these molecules can be seen when dough is made for bread and pasta.

Animal proteins

In studying the functioning of the body, it is normal to consider anatomy (details of bodily structure) in relation to physiology (how the body functions at the chemical level). In a similar way, the main uses for protein can be said to be either structural or functional.

Structural proteins

Most structural proteins have a fibrous shape or texture at the molecular level.

These actually make up parts of the body, and reinforce tissues to increase rigidity, or generate mechanical forces for movement.

Examples

keratin
This is found within the skin, forming a waterproof layer. It is also found in distinctive structures which grow out from it to cover the body: mammals have fibres known as hair, fur or wool, birds have feathers and reptiles have scales. Other structures such as claws, nails, hooves, horns, beaks are also made of keratin.

collagen and elastin
These form an intercellular matrix in several types of connective tissue.

actin and myosin
These are the proteins responsible for muscle contraction, so they are a main component of meat.

Functional proteins

Functional proteins usually have a globular shape at the molecular level, and details within the shape have a connection with the way they perform their function.

These perform a more active role in body processes.

Example categories

enzymes

e.g. carbohydrases, proteases, lipases (digestive enzymes);
restriction endonucleases and ligases (DNA cut and paste enzymes)

(some) hormones and their receptors

e.g. insulin (and insulin receptor)

antibodies

e.g. immunoglobulins. Each type of cell has molecules on its surface that identify it, including proteins that act as antigens. Antibodies bind to these antigens, enabling the immune system to identify and interact with pathogens and abnormal body cells.

transport proteins

e.g. haemoglobin. Possibly serum albumin could be included here.

membrane channels

e.g. carrier proteins and channel proteins involved in facilitated diffusion;
ion channel proteins: voltage-sensitive Na⁺ and K⁺ gates, stretch-mediated Na⁺ gates

pumping molecules

e.g. carrier proteins using energy from the hydrolysis of ATP in active transport, such as Na⁺/K⁺ pumps in nerve cell membranes

food storage

Milk contains many proteins, the main one being casein, which is digested and absorbed by baby mammals. Humans also convert milk into dairy products, especially cheeses.
Ovalbumin is the main protein in the white of birds' eggs, providing nourishment for the developing embryo (and human consumers).

Viral proteins

In a viral infection, proteins are produced inside the host cell using the host's protein synthesis mechanisms (ribosomes etc) but these are coded by viral nucleic acid (DNA or RNA).

Some proteins are needed for the replication cycle of the virus and to act against defence mechanisms.

Other proteins form the outer coat - also known as the capsid - of the virus particles which are produced as part of the process. These coats usually form geometric shapes consisting of only a few types of subunit, which assemble themselves spontaneously. There may also be so-called attachment proteins which latch on to molecules on the surface of host cells and enable infection to occur again.

Similarities and differences between amino acids

Most proteins are built up from combinations of about 20 ('proteinogenic') amino acids.

Most amino acids have names ending '-ine'.
Each amino acid has a three letter code, and a single letter code.
Amino acids in alphabetical order:
alanine, arginine, asparagine, aspartic acid, cysteine, glutamic acid, glutamine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tryptophan, tyrosine, valine
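As an illustration of the two coding systems, the sketch below pairs each of the twenty names with its standard three-letter and single-letter code and converts a short single-letter sequence into three-letter form. The names `AMINO_ACID_CODES` and `to_three_letter`, and the example sequence, are assumptions chosen for this example.

```python
# name: (three-letter code, single-letter code)
AMINO_ACID_CODES = {
    "alanine": ("Ala", "A"),
    "arginine": ("Arg", "R"),
    "asparagine": ("Asn", "N"),
    "aspartic acid": ("Asp", "D"),
    "cysteine": ("Cys", "C"),
    "glutamic acid": ("Glu", "E"),
    "glutamine": ("Gln", "Q"),
    "glycine": ("Gly", "G"),
    "histidine": ("His", "H"),
    "isoleucine": ("Ile", "I"),
    "leucine": ("Leu", "L"),
    "lysine": ("Lys", "K"),
    "methionine": ("Met", "M"),
    "phenylalanine": ("Phe", "F"),
    "proline": ("Pro", "P"),
    "serine": ("Ser", "S"),
    "threonine": ("Thr", "T"),
    "tryptophan": ("Trp", "W"),
    "tyrosine": ("Tyr", "Y"),
    "valine": ("Val", "V"),
}

# Reverse lookup: single-letter code -> three-letter code.
THREE_FROM_ONE = {one: three for three, one in AMINO_ACID_CODES.values()}


def to_three_letter(sequence):
    """Convert a string of single-letter codes into hyphen-joined three-letter codes."""
    return "-".join(THREE_FROM_ONE[letter] for letter in sequence.upper())


print(to_three_letter("MKV"))  # Met-Lys-Val

# How many of the twenty names end in '-ine'?
print(sum(name.endswith("ine") for name in AMINO_ACID_CODES))  # 17
```

A letter that is not one of the twenty codes raises a `KeyError`, which is deliberate here so that mistakes in a sequence are not silently ignored.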


Essential amino acids

Nine of the twenty amino acids found in proteins are considered essential in the human diet as they cannot be made in the cells of the body:
Leucine, lysine, valine, isoleucine, phenylalanine, threonine, methionine, histidine, tryptophan.
These must therefore come from the diet.
The other amino acids can be made in the body, by conversion from these or from other metabolic intermediates.
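A simple way to use this list is to check which essential amino acids a given supply lacks. The sketch below does this with a set difference; the function name `missing_essential` and the example input are hypothetical, and the check ignores quantities entirely.

```python
# The nine amino acids listed above as essential in the human diet.
ESSENTIAL = {
    "leucine", "lysine", "valine", "isoleucine", "phenylalanine",
    "threonine", "methionine", "histidine", "tryptophan",
}


def missing_essential(supplied):
    """Return, in alphabetical order, the essential amino acids absent from `supplied`."""
    supplied_names = {name.lower() for name in supplied}
    return sorted(ESSENTIAL - supplied_names)


# Hypothetical supply containing two essential and one non-essential amino acid.
print(missing_essential(["Leucine", "lysine", "glycine"]))
```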

Other amino acids may be considered important in the diet in some circumstances.
Premature infants and those with challenging metabolic conditions may require more than the nine above.
People with the genetic condition phenylketonuria lack the enzyme phenylalanine hydroxylase and so cannot convert phenylalanine to tyrosine. From birth they need a modified (low-phenylalanine) diet, including supplemental tyrosine.

Primary structure of proteins

This is the order or sequence of amino acid residues joined by peptide bonds in the polypeptide chain. Peptide bonds are quite strong covalent bonds.

At one end ("N-terminal") there is an unreacted amino group, and similarly there is a free carboxylic acid group at the other end ("C-terminal"), although depending on the surrounding pH these may be converted into NH₃⁺ or COO⁻. The sequence is always numbered starting from the N-terminal end.

For example, consider the protein human chorionic gonadotrophin (hCG), which has an alpha chain and a beta chain, each with its own primary structure.
See more about this protein below.

It all depends on DNA . . .

A polypeptide is formed at a cell's ribosomes in the process of translation, so the primary structure can be directly related to the sequence of base triplets ('codons') in the mRNA. The mRNA is in turn produced by transcription from the section of DNA corresponding to a gene. More specifically, in eukaryotes only the exon sections of the gene are represented in the final mRNA: the non-coding intron sections are 'edited out' of the pre-mRNA to produce the final spliced mRNA.
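The link between codons and primary structure can be sketched as a lookup, reading the mRNA three bases at a time. The table below is a small subset of the standard genetic code, chosen only for illustration; the function name `translate` and the example mRNA are assumptions, and a real table would need all 64 codons.

```python
# A deliberately incomplete subset of the standard genetic code (mRNA codons).
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "GCU": "Ala",
    "UUU": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna):
    """Read an mRNA string codon by codon and return the list of residues.

    Translation ends at the first stop codon. Codons missing from the
    illustrative subset above raise a KeyError.
    """
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break
        residues.append(amino_acid)
    return residues


# Hypothetical spliced mRNA; the residues are listed from the N-terminal end.
print("-".join(translate("AUGGCUUUUGGCUAA")))  # Met-Ala-Phe-Gly
```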

[Some micro-organisms use nonribosomal peptide synthetase enzymes - independent of RNA - to synthesise peptides, often with unusual chemical structures, and these include a number of antibiotics and other compounds with a variety of uses.]

. . . but it can still be changed

Polypeptide chains can be altered by post-translational modifications: sections can be cut off at the ends or out of the middle, and groups can be added to individual amino acid residues, mostly to the R groups forming side chains extending out from the main polypeptide chain. These extra groups can be small (e.g. phosphate, hydroxyl, acetyl, methyl or amide groups) or larger (e.g. carbohydrate, lipid).

These changes can alter the shape adopted by the polypeptide chain as it folds.

Secondary structure

The backbone of the polypeptide chain consists of a repeating unit of 3 atoms: - N - (α-)C - C -.

[Figure: one amino acid residue in the polypeptide backbone]

Each α-C has an R group, which varies. However, the most regular feature is that every N has -H (with a slight charge δ+), and every non α-C has =O (with a slight charge δ-) projecting from it. The NH (H bond donor site) and C=O (H bond acceptor site) groups on each amino acid residue point in different directions ('up' and 'down' in these diagrams). Different sections of the polypeptide chain can become aligned so that NH and O=C groups from different residues face one another, and form hydrogen bonds between them if they are close enough together: about 0.3 nm (3 Å, angstroms).

[Figure: three amino acid residues in the polypeptide backbone]

In fact, sharing of electrons causes a degree of resonance which means that the bonds between each non α-C and N can be considered to some extent equivalent to a double bond, so that the peptide bond section can be considered to be planar i.e. flat. The other bonds involving α-C are single and allow more rotation.

Hydrogen bonds are responsible for spontaneous re-arrangement of the polypeptide chain into fairly regular sections, with a distinctive 3-dimensional shape, usually coiling into a helix or aligning itself into fairly flat sheets.

The alpha-helix

Sections of the polypeptide chain may coil into an alpha-helix, so that each C=O interacts with H-N on another amino acid 4 residues further along the polypeptide chain. This electrostatic attraction stabilises the polypeptide form in this coiled shape. In other words it would take energy to break this attraction, to unwind it and change it into another shape.
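The regular spacing of these hydrogen bonds can be listed with a short sketch. It assumes an idealised helix in which every residue takes part, numbers residues from 1 at the N-terminal end, and pairs the C=O of residue i with the N-H of residue i + 4; the function name `helix_hbond_pairs` is made up for this example.

```python
def helix_hbond_pairs(n_residues, spacing=4):
    """List (C=O residue, N-H residue) pairs in an idealised alpha helix.

    Residues are numbered from 1. The C=O of residue i is paired with the
    N-H of residue i + spacing, so the last `spacing` residues have no
    partner further along the chain.
    """
    return [(i, i + spacing) for i in range(1, n_residues - spacing + 1)]


# A hypothetical 8-residue helical section.
print(helix_hbond_pairs(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

A section of n residues therefore has n - 4 such hydrogen bonds, which is why very short sections are not well stabilised as helices in this simple picture.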

[Figure: section of alpha helix with hydrogen bonds, shown in three views]

These diagrams show a section of alpha helix with hydrogen bonds (cyan, dotted) between C=O and N-H on different amino acid residues, R groups reduced to simple stubs.
On the left: just the shape of the polypeptide backbone.
In the middle, molecular detail of polypeptide backbone shown.
On the right: elements identified with letters, outside edge of helix coloured.
The alpha helix is described as a right-handed helix, with 3.6 amino acid residues per turn. The pitch of the helix is 0.54 nm (5.4 Å), so 36 amino acids would make a helix with 10 turns, 5.4 nm (54 Å) long.
Its diameter is about 1.2 nm (12 Å).
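The arithmetic for helix length can be checked with a small calculation, using only the two figures quoted here (3.6 residues per turn and a pitch of 0.54 nm). The constant and function names are chosen for this sketch, and the helix is treated as perfectly regular.

```python
RESIDUES_PER_TURN = 3.6  # from the text
PITCH_NM = 0.54          # length along the axis for one full turn, in nm

# Rise along the helix axis contributed by each residue (about 0.15 nm).
RISE_PER_RESIDUE_NM = PITCH_NM / RESIDUES_PER_TURN


def helix_dimensions(n_residues):
    """Return (number of turns, length in nm) for an idealised alpha helix."""
    turns = n_residues / RESIDUES_PER_TURN
    length_nm = turns * PITCH_NM
    return turns, length_nm


turns, length_nm = helix_dimensions(36)
print(f"{turns:.1f} turns, {length_nm:.2f} nm long")  # 10.0 turns, 5.40 nm long
```

The values are formatted on output because floating-point division does not always give exactly round numbers.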

R groups extend out sideways. Large R groups may limit the continuation of the helix, and proline will put a kink in it.

Other helical forms are sometimes found. For example, collagen consists of three left-handed helical chains twisted together into a right-handed triple helix.

The beta-pleated sheet

If the polypeptide chain does not have bulky R groups, it can interact with another section of polypeptide chain without coiling up. The two sections may run in the same direction (parallel) or in opposite directions (antiparallel); in either case hydrogen bonds can form between each C=O on one section and NH on the other. The relative rigidity of the peptide bond section means that polypeptide chains form fairly flat sheets alongside one another, but the sheet is pleated because the other bonds allow more rotation, leaving an angle between successive sections.

[Figure: parallel beta-pleated sheet]

Tertiary structure

The polypeptide chain undergoes further twisting and folding, without alterations to the primary and secondary structure described above. This arrangement is mostly due to attractive forces between the R-groups projecting from the polypeptide backbone, and effectively results in a stabilised, low-energy state. It is sometimes assisted by other - 'chaperone' - proteins (chaperonins) which use energy from ATP to achieve the folding.

Hydrophobic side chains orientate themselves towards the inside, hydrophilic side chains towards the outside, of most globular protein molecules.
Hydrogen bonding can occur between R-groups, if they are suitably close.
Similarly, salt bridges can form between acidic and basic side groups.
In some cases, disulphide bonds form between pairs of cysteine residues. Being covalent, these provide greatest stability.

In many cases, reaching the tertiary level of protein structure is the final modification of the polypeptide chain and the resulting protein molecule now has a distinctive shape which enables it to function.

However, it is worth noting that abnormally folded proteins are seen in a number of protein aggregation diseases, such as the prion diseases bovine spongiform encephalopathy (BSE) and the associated variant Creutzfeldt-Jakob disease (vCJD). Alzheimer's disease is characterised by abnormal amounts of malformed protein ('amyloid plaques') and fibres ('tau tangles') in the brain, and other conditions are associated with the accumulation of abnormal proteins.

Quaternary structure

Some proteins consist of a combination of polypeptide chains with the structure described above.

Expressions such as dimer, trimer, tetramer .. describe the total number (2,3,4 ..) of polypeptide sub-units, and homo- or hetero- signify that each is the same or different. For example the enzyme alkaline phosphatase with 2 identical sub-units is said to be homodimeric and so is glucose-6-phosphate isomerase, whereas haemoglobin, which has 2 α and 2 β chains, is heterotetrameric.
These sub-units are held together by a variety of bonds which are often non-covalent (e.g. hydrogen bonds, salt bridges) but occasionally covalent (disulphide bonds).
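The naming scheme can be expressed as a small function that counts the sub-units and checks whether they are all the same. This is a sketch only: `describe_quaternary` is a hypothetical name, the chain labels are arbitrary strings, and only the three sizes named above are covered.

```python
# Names for the sub-unit counts mentioned in the text.
COUNT_NAMES = {2: "dimer", 3: "trimer", 4: "tetramer"}


def describe_quaternary(chains):
    """Name a protein from a list of labels, one per polypeptide sub-unit."""
    n = len(chains)
    if n not in COUNT_NAMES:
        raise ValueError(f"no name defined in this sketch for {n} sub-units")
    # 'homo' if every sub-unit carries the same label, otherwise 'hetero'.
    prefix = "homo" if len(set(chains)) == 1 else "hetero"
    return prefix + COUNT_NAMES[n]


print(describe_quaternary(["α", "α", "β", "β"]))  # heterotetramer (haemoglobin)
print(describe_quaternary(["A", "A"]))            # homodimer (e.g. alkaline phosphatase)
```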

Human Chorionic Gonadotrophin

[Interactive figure: hCG displayed at each level of structure. Primary: coloured by amino acid. Secondary: shown as cartoon, with main chain H bonds. Tertiary: cysteines, disulphide bonds and sidechain H bonds shown. Quaternary: the two polypeptide chains shown.]
These images are taken from the interactive jsmol file HCG on this site.

Protein denaturation

The structure of proteins as described above can be disrupted by a number of factors: heat, variations of pH and salt concentration.

Heat causes increased vibrations within molecules, and this overcomes the weak attractive forces responsible for the higher levels of protein structure (hydrogen bonds, salt bridges etc.), so the molecule unfolds. A denatured protein generally lacks some of the properties of the original form: enzymes lose their catalytic capability when their active site becomes altered, and 'egg white' (mainly the protein ovalbumin) changes from transparent and fluid to white and firm when heated. A similar process occurs when eggs are pickled in vinegar (ethanoic acid). Eggs may also be preserved in lime (alkaline solution), and in salt. In these cases decay is prevented because bacterial enzymes are also denatured.

Moderate heating does not affect peptide bonds or disulphide bridges, so the primary structure is unaffected. Enzymes produced by thermophilic organisms tend to contain a higher proportion of amino acid residues with charged side chains, which can form additional stabilising salt bridges.

The denaturation process is usually irreversible. However, under some circumstances denaturation can be prevented by 'heat shock proteins' which are similar to chaperonins mentioned above.