The Protein Synthesis Machinery
Melissa Hardy
The synthesis of proteins consumes more of a cell’s energy than any other metabolic process. In turn, proteins account for more mass than any other component of living organisms (with the exception of water), and proteins perform many different functions in a cell. The process of translation, or protein synthesis, involves the decoding of an mRNA message into a polypeptide product. Amino acids are covalently strung together by interlinking peptide bonds in lengths ranging from approximately 50 to more than 1000 amino acid residues. Each individual amino acid has an amino group (NH2) and a carboxyl group (COOH). Polypeptides are formed when the amino group of one amino acid forms an amide (i.e., peptide) bond with the carboxyl group of another amino acid. This reaction is catalyzed by ribosomes and generates one water molecule (it is a dehydration reaction).
The Protein Synthesis Machinery
In addition to the mRNA template, many molecules and macromolecules contribute to the process of translation. Translation requires the input of an mRNA template, ribosomes, tRNAs, and various enzymatic factors.
Ribosomes
Even before an mRNA is translated, a cell must invest energy to build its ribosomes. In an E. coli cell, there are between 10,000 and 70,000 ribosomes present at any given time. A ribosome is a complex macromolecule composed of ribosomal RNAs (rRNAs) and many polypeptides. In eukaryotes, the nucleolus is completely specialized for the synthesis and assembly of rRNAs.
Ribosomes are found in the cytoplasm of prokaryotes. In eukaryotes, they are also found in the cytoplasm and are sometimes associated with the outer surface of the rough endoplasmic reticulum. Mitochondria and chloroplasts also have their own ribosomes, which look more similar to prokaryotic ribosomes (and have similar drug sensitivities) than the ribosomes just outside their outer membranes in the cytoplasm.
Ribosomes dissociate into large and small subunits when they are not synthesizing proteins and reassociate during the initiation of translation. In E. coli, the small subunit is described as 30S, and the large subunit is 50S, for a total of 70S (Svedberg units are not additive). Mammalian ribosomes have a small 40S subunit and a large 60S subunit, for a total of 80S. The small subunit is responsible for binding the mRNA template, whereas the large subunit sequentially binds tRNAs. Each mRNA molecule is simultaneously translated by many ribosomes (although in different locations along the mRNA), all synthesizing protein in the same direction: reading the mRNA from 5′ to 3′ and synthesizing the polypeptide from the N terminus to the C terminus.

tRNAs
The transfer RNAs (tRNAs) are structural RNA molecules that were transcribed from their corresponding genes by RNA polymerase III. Depending on the species, 40 to 60 types of tRNAs exist in the cytoplasm. Transfer RNAs serve as adaptor molecules. Each tRNA carries a specific amino acid and recognizes one or more of the mRNA codons that define the order of amino acids in a protein. Aminoacyl-tRNAs (tRNA with attached amino acid) bind to the ribosome and add the corresponding amino acid to the polypeptide chain. Therefore, tRNAs are the molecules that actually “translate” the language of RNA into the language of proteins.
Of the 64 possible mRNA codons—or triplet combinations of A, U, G, and C—three specify the termination of protein synthesis and 61 specify the addition of amino acids to the polypeptide chain. Of these 61, one codon (AUG) also encodes the initiation of translation. Each tRNA anticodon can base pair with one or more of the mRNA codons for its amino acid.
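The codon counts above can be checked with a short enumeration. The following Python sketch is illustrative only: the three stop codons (UAA, UAG, UGA) are those of the standard genetic code, which this section counts but does not list, and the variable names are hypothetical.

```python
from itertools import product

BASES = "AUGC"
# Stop codons of the standard genetic code (an assumption: the text gives only their number).
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

# Every ordered triplet of the four bases: 4 * 4 * 4 combinations.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid rather than termination.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

print(len(all_codons))              # 64
print(len(sense_codons))            # 61
print(START_CODON in sense_codons)  # True: AUG both starts translation and encodes an amino acid
```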
Aminoacyl tRNA Synthetases
The process of pre-tRNA synthesis by RNA polymerase III only creates the RNA portion of the adaptor molecule. The corresponding amino acid must be added later, once the tRNA is processed and exported to the cytoplasm. Through the process of tRNA “charging,” each tRNA molecule is linked to its correct amino acid by one of a group of enzymes called aminoacyl tRNA synthetases. At least one type of aminoacyl tRNA synthetase exists for each of the 20 amino acids; the exact number of aminoacyl tRNA synthetases varies by species. The term “charging” is appropriate, since the high-energy bond that attaches an amino acid to its tRNA is later used to drive the formation of the peptide bond. Each tRNA is named for its amino acid.

Media Attributions
- Ribosome_shape © Vossman is licensed under a CC BY-SA (Attribution ShareAlike) license
- aminoacyl tRNA synthetase © Melissa Hardy is licensed under a CC BY-SA (Attribution ShareAlike) license
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by interrogating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand a protein's shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the β chain of human hemoglobin may be found on UniProt, entry P68871. The N-terminal amino acid is valine (Val, V), and the C-terminal amino acid is histidine (His, H) (Figure 1). The amino acid sequence of the β chain is the same every time it is expressed, and it is the only protein that has exactly this sequence of amino acids.
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Figure 1: Primary structure of human hemoglobin β chain. The β chain of human hemoglobin has 146 amino acids, all linked together in sequence with peptide bonds.
The gene encoding the protein ultimately determines the unique sequence of amino acids for every protein. A change in the nucleotide sequence of the gene’s coding region may lead to a change in the amino acid sequence, causing a change in the protein's structure and, sometimes, its function. In people who have sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution, causing a change in the protein's structure and function. Specifically, at the sixth position in the primary sequence of the β chain, the wild type amino acid, glutamate (Glu, E), is substituted by valine (Val, V). What is most remarkable to consider is that a hemoglobin molecule is composed of two α and two β chains that each consist of about 150 amino acids. The full hemoglobin protein, therefore, has about 600 amino acids. Because each molecule contains two β chains, the structural difference between a normal hemoglobin molecule and a sickle cell molecule – which dramatically decreases life expectancy – is two amino acids of the ~600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary (amino acid) sequence of the β chain of hemoglobin, hemoglobin proteins form long fibers that distort the normally disc-shaped red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In wild type hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A. Ryan, K. and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
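The sickle cell substitution can be made concrete by applying it to the β chain sequence from Figure 1. This is a minimal Python sketch; the helper name `substitute` is hypothetical, and the sequence is copied from Figure 1 in blocks of ten residues.

```python
# Human hemoglobin β chain from Figure 1, split into blocks of ten for readability.
BETA_CHAIN = "".join([
    "VHLTPEEKSA", "VTALWGKVNV", "DEVGGEALGR", "LLVVYPWTQR", "FFESFGDLST",
    "PDAVMGNPKV", "KAHGKKVLGA", "FSDGLAHLDN", "LKGTFATLSE", "LHCDKLHVDP",
    "ENFRLLGNVL", "VCVLAHHFGK", "EFTPPVQAAY", "QKVVAGVANA", "LAHKYH",
])

def substitute(sequence, position, new_residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    return sequence[:index] + new_residue + sequence[index + 1:]

# Glutamate (E) at position six is replaced by valine (V).
sickle_beta = substitute(BETA_CHAIN, 6, "V")

# Collect every position where the two sequences differ.
differences = [
    (i + 1, wild_type, mutant)
    for i, (wild_type, mutant) in enumerate(zip(BETA_CHAIN, sickle_beta))
    if wild_type != mutant
]

print(BETA_CHAIN[:7], "->", sickle_beta[:7])  # VHLTPEE -> VHLTPVE
print(differences)                            # [(6, 'E', 'V')]
print(2 * len(differences))                   # 2 changed residues per tetramer (two β chains)
```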
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds. In α-helices, for example, hydrogen bonds form between the oxygen atom of the carbonyl group in one amino acid and the hydrogen atom of the amide (N–H) group of another amino acid that is four amino acids farther along in the primary sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K. Fletcher, S. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
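The spacing of hydrogen bonds in an α-helix can be sketched as a simple pairing rule. The Python example below assumes an idealized helix in which every possible pair forms; the function name is hypothetical, and real helices deviate from this pattern at their ends.

```python
def helix_hbond_pairs(n_residues, offset=4):
    """List (acceptor, donor) residue numbers, 1-based.

    The carbonyl oxygen of residue i pairs with the amide hydrogen
    of residue i + offset (four residues along for an α-helix).
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

print(helix_hbond_pairs(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```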
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure forms primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains determines which amino acids are energetically favorable to be near other amino acids. For example, side chains with like charges repel each other and those with opposite charges are attracted to each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds that form during protein folding. When protein folding takes place, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas the hydrophilic side chains tend to position themselves on the protein's surface, where they interact with water. In general, a given protein folds into the same tertiary structure each time it is translated, as determined by its primary structure.
Figure 4: A variety of chemical interactions determine a protein's three-dimensional tertiary structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some – but not all – proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that is the result of hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists of α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A. Ryan, K. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure, which consists of a central carbon atom, or the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms are considered the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom, known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. The R group is considered the side chain, and all atoms not in the R group are part of the backbone. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxylic acid group in their basic structure. The 20 common amino acids make up most of the proteins in our bodies. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's chemical properties, such as whether it is acidic, basic, polar, or hydrophobic. Each amino acid has both a single-letter code and a three-letter abbreviation. For example, valine is abbreviated with the single letter V or the three-letter symbol, Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter codes are also provided. Backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. A covalent bond forms when the amino group from one amino acid reacts with the carboxyl group of another in a dehydration reaction, releasing a water molecule. In vivo this process happens in the ribosome. The resulting bond is the peptide bond (Figure 8), which has partial double-bond character due to resonance in the amide group.
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, a water molecule is released. [Image Description]
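Because each peptide bond releases one water molecule, a linear chain of n amino acids contains n − 1 peptide bonds and has lost n − 1 waters. The Python sketch below illustrates this bookkeeping; the molar masses are standard average values that are not given in this chapter, only four amino acids are included, and the function names are hypothetical.

```python
WATER_MASS = 18.015  # g/mol, average molar mass of H2O

# Average molar masses (g/mol) of a few free amino acids.
# Standard reference values, not taken from this chapter.
FREE_AMINO_ACID_MASS = {"G": 75.067, "A": 89.094, "S": 105.093, "V": 117.148}

def peptide_bond_count(sequence):
    """A linear chain of n residues has n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)

def linear_peptide_mass(sequence):
    """Sum the free amino acid masses, then subtract one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - peptide_bond_count(sequence) * WATER_MASS

print(peptide_bond_count("GAS"))            # 2
print(round(linear_peptide_mass("GG"), 3))  # 132.119
```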
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the carboxyl or C terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are likewise written from the N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a long polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups: 1) those with polar side chains, 2) those with hydrophobic side chains, and 3) those with charged side chains. Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to define as polar, while others are subject to disagreement. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar since they carry a hydroxyl (-OH) group (Figure 9). Furthermore, this group can form a hydrogen bond with another polar group by donating or accepting a proton (a table showing hydrogen bond donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzymatic sites. Asparagine (Asn, N) and glutamine (Gln, Q) belong to this group as well and may also donate or accept a hydrogen bond.
Histidine (His, H), on the other hand, can be polar or carry a charge, depending on the environment and pH. It has two –NH groups with a pKa value of around 6. At pH values below 6, when both groups are protonated, the side chain has a charge of +1. Within protein molecules, the pKa may be modulated by the environment so that the side chain may donate a proton and become neutral or accept a proton, becoming charged. This ability makes histidine useful in enzyme active sites when the chemical reaction requires a proton extraction.
Figure 9: The polar amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
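How the charge state of histidine depends on pH can be estimated with the Henderson-Hasselbalch relation, a standard acid-base formula that this chapter does not introduce. The sketch below assumes the approximate pKa of 6 quoted above and ignores any shift caused by the protein environment; the function name is hypothetical.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of side chains in the protonated (+1) form.

    Rearranged Henderson-Hasselbalch relation:
    fraction = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))

for ph in (5.0, 6.0, 7.4):
    print(ph, round(fraction_protonated(ph), 3))
# 5.0 0.909
# 6.0 0.5
# 7.4 0.038
```

Under these assumptions, about half of the histidine side chains are charged at pH 6, and only a few percent are charged at pH 7.4.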
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), cysteine (Cys, C), valine (Val, V), isoleucine (Ile, I), leucine (Leu, L), phenylalanine (Phe, F) and proline (Pro, P) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the tertiary structure of the protein. In addition, cysteine residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges between their sulfur atoms, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of cysteine is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and tyrosine (Tyr, Y) and the non-aromatic methionine (Met, M) are sometimes called amphipathic because they have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other, stabilized by π-π interactions. They are also highly conserved within protein families, with tryptophan having the highest conservation rate.
Figure 10: The hydrophobic amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
The charged amino acids carry a single charge in the side chain at neutral pH (around 7). There are four of them: the two basic ones are lysine (Lys, K) and arginine (Arg, R), which carry a positive charge at neutral pH, and the two acidic ones are aspartate or aspartic acid (Asp, D) and glutamate or glutamic acid (Glu, E), which carry a negative charge at neutral pH (Figure 11). A so-called salt bridge is often formed by the interaction of closely located positively and negatively charged side chains. Such bridges are often involved in stabilizing three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. The binding of positively charged metal ions is another function of the negatively charged carboxylic groups of aspartate and glutamate. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
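The charges described above allow a rough estimate of the net side-chain charge of a sequence at neutral pH. This Python sketch is an approximation under stated assumptions: histidine is treated as neutral, the charges of the N and C termini are ignored, and the function name is hypothetical.

```python
POSITIVE = set("KR")  # lysine and arginine: +1 at neutral pH
NEGATIVE = set("DE")  # aspartate and glutamate: -1 at neutral pH

def net_side_chain_charge(sequence):
    """Approximate net side-chain charge at neutral pH.

    Histidine is counted as neutral and the chain termini are ignored.
    """
    positive = sum(1 for residue in sequence if residue in POSITIVE)
    negative = sum(1 for residue in sequence if residue in NEGATIVE)
    return positive - negative

# First eight residues of the hemoglobin β chain: two E (-2) and one K (+1).
print(net_side_chain_charge("VHLTPEEK"))  # -1
# The sickle cell variant replaces one glutamate with valine.
print(net_side_chain_charge("VHLTPVEK"))  # 0
```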
Glycine & proline
Glycine (Gly, G), one of the common amino acids, does not have a side chain – its R group is just a hydrogen atom – and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), providing high flexibility to the polypeptide chain. This flexibility is required in sharp polypeptide turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also often found on the surface of proteins, presumably due to its presence in turn and loop regions. In contrast to glycine, which provides the polypeptide chain high flexibility, proline provides rigidity by imposing certain torsion angles on the segment of the structure. The reason for this is that its side chain makes a covalent bond with the main chain, which constrains the backbone shape of the polypeptide in this location. Sometimes proline is called a helix breaker since it is often found at the end of α-helices (Figure 12).
Figure 12: The special amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
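The grouping of the 20 common amino acids can be written as a lookup table. The Python sketch below follows the four groups shown in Figure 7; as the text notes, some assignments (for example tyrosine, tryptophan, methionine, and proline) are open to debate, and the names used here are hypothetical.

```python
from collections import Counter

# One-letter codes grouped as in Figure 7.
GROUPS = {
    "polar": "STHNQY",
    "hydrophobic": "ACVILMFW",
    "charged": "RKDE",
    "special": "GP",
}

# Invert the table so each residue maps to its group.
SIDE_CHAIN_CLASS = {
    residue: group for group, residues in GROUPS.items() for residue in residues
}

def composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    return Counter(SIDE_CHAIN_CLASS[residue] for residue in sequence)

print(len(SIDE_CHAIN_CLASS))  # 20
# First seven residues of the hemoglobin β chain.
print(dict(composition("VHLTPEE")))
# {'hydrophobic': 2, 'polar': 2, 'special': 1, 'charged': 2}
```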
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
  - Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
  - Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
  - Normal: A blue 3D ellipsoid shape representing the normal β subunit.
  - Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
  - Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
  - Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
  - Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
  - Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
  - A right-handed helical structure is shown in orange, twisting in a clockwise direction.
  - The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
  - Hydrogen bonds are represented by dashed lines connecting parts of the helix.
  - The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
  - Several strands are aligned next to each other, forming a pleated sheet structure in orange.
  - Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
  - Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
  - The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 5: The image illustrates the hierarchical structure of proteins from the primary structure to the quaternary structure, using hemoglobin as an example. The background is a gradient blue, transitioning from a darker blue at the top to a lighter blue at the bottom.
From left to right:
- Primary Structure: Depicts a sequence of amino acids connected via peptide bonds. Four amino acids are shown (labeled 1, 2, 3, and 4). Each amino acid consists of an amino group (NH2), carboxyl group (COOH), hydrogen atom (H), and side chain (R1, R2, R3, R4).
- Secondary Structure (α Helix): Shows the formation of an alpha helix from the amino acid chain. The helix is represented by an orange spiraling ribbon with dotted lines indicating hydrogen bonds stabilizing the structure.
- Tertiary Structure: Illustrates a β-globin polypeptide chain folded into a specific three-dimensional shape. It appears as a purple, looped, and twisted structure.
- Quaternary Structure: Demonstrates the assembly of multiple polypeptide chains. The β-globin (purple) and α-globin (yellow, green, and blue) polypeptides combine to form a hemoglobin molecule.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
  - Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
  - Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
  - Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
  - Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
  - Divided into Positive and Negative sections.
  - The Positive section includes Arginine (R) and Lysine (K).
  - The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
  - Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
  - Contains two amino acids: Glycine (G) and Proline (P).
  - Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image illustrates peptide bond formation between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC BY-NC-SA 4.0. "Protein Structure & Function" is licensed under CC BY-NC-SA 4.0.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by investigating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand how the protein gets its final shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the human hemoglobin α chain may be found on UniProt, entry P69905. The N-terminal amino acid is methionine (Met, M), and the C-terminal amino acid is arginine (Arg, R) (Figure 1). The amino acid sequence of the hemoglobin α chain is the same every time it is expressed, and it is the only protein that has exactly this sequence of amino acids.
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Figure 1: Primary structure of human hemoglobin α chain. The α chain of human hemoglobin has 142 amino acids, all linked together in sequence with peptide bonds.
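The facts stated about Figure 1 can be checked directly from the sequence. The following is a minimal Python sketch; the variable names are chosen here for illustration and do not come from UniProt or any library.

```python
# Primary structure of the human hemoglobin alpha chain, copied from Figure 1
# and written from the N terminus to the C terminus.
hba_sequence = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNA"
    "VAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
)

n_residues = len(hba_sequence)        # number of amino acids in the chain
n_terminal = hba_sequence[0]          # first residue (free amino group)
c_terminal = hba_sequence[-1]         # last residue (free carboxyl group)
n_peptide_bonds = n_residues - 1      # one bond between each neighboring pair

print(n_residues, n_terminal, c_terminal, n_peptide_bonds)
```

Because a linear chain of n amino acids is joined by n - 1 peptide bonds, the count of bonds follows from the length alone.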
The gene encoding the protein ultimately determines the unique sequence of amino acids for every protein. A change in the nucleotide sequence of the gene’s coding region may lead to a different amino acid being added to the growing polypeptide chain, causing a change in protein structure and, sometimes, function. In sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution, causing a change in the protein's structure and function. Specifically, glutamate at position six of the β chain is replaced with valine. What is most remarkable to consider is that a hemoglobin molecule is composed of two α and two β chains that each consist of about 150 amino acids. The molecule, therefore, has about 600 amino acids. Because each of the two β chains carries the substitution, the structural difference between a normal hemoglobin molecule and a sickle cell molecule – which dramatically decreases life expectancy – is two amino acids of the 600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary (amino acid) sequence of the β chain of hemoglobin, hemoglobin molecules form long fibers that distort the biconcave, or disc-shaped, red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In normal hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A. Ryan, K. and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
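The sickle cell substitution can be sketched as a single-residue edit to a sequence string. The example below is a minimal Python illustration: the helper `substitute` is hypothetical (written for this example), and the seven-residue fragment and its numbering follow Figure 2, which starts counting at valine.

```python
def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the residue at a 1-based position replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1  # convert 1-based residue number to 0-based index
    return sequence[:index] + new_residue + sequence[index + 1:]


# First seven residues of the normal beta chain as drawn in Figure 2:
# Val-His-Leu-Thr-Pro-Glu-Glu
normal_beta_start = "VHLTPEE"

# Replace glutamate (E) at position six with valine (V).
sickle_beta_start = substitute(normal_beta_start, 6, "V")

# Report which positions differ between the two fragments.
changed_positions = [
    i + 1
    for i, (a, b) in enumerate(zip(normal_beta_start, sickle_beta_start))
    if a != b
]

print(sickle_beta_start, changed_positions)
```

Only one position differs per β chain, which is why a complete hemoglobin molecule with two β chains differs at two residues in total.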
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds, which form between the oxygen atom in the carbonyl group of one amino acid and the hydrogen atom attached to the nitrogen in the amide group of another amino acid. In an α-helix, the second amino acid is four residues further along the sequence. In a β-pleated sheet, the hydrogen bonds instead connect amino acids on adjacent strands, which may be far apart in sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K. Fletcher, S. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
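The "four amino acids away" spacing of hydrogen bonds in an α-helix can be made concrete by listing the bonded pairs. This is a minimal Python sketch of an idealized helix; the function name is hypothetical, and real helices often have frayed ends where some of these bonds are missing.

```python
def helix_hydrogen_bonds(n_residues):
    """List (carbonyl residue, amide residue) pairs for an ideal alpha-helix.

    The carbonyl oxygen of residue i accepts a hydrogen bond from the
    amide hydrogen of residue i + 4 (residues numbered from 1).
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


pairs = helix_hydrogen_bonds(10)
for carbonyl, amide in pairs:
    print(f"C=O of residue {carbonyl} ... H-N of residue {amide}")
```

For a ten-residue helix this gives six backbone hydrogen bonds, from the pair (1, 5) through the pair (6, 10). A helix of four or fewer residues is too short to contain any.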
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure is primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains involved determines which amino acids are energetically favorable to be next to other amino acids. For example, side chains with like charges repel each other and those with unlike charges are attracted to each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen. These are the only covalent bonds that form during protein folding. When protein folding takes place, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas the hydrophilic side chains tend to be positioned on the surface of the protein, interacting with water. In general, whenever a protein is translated, it folds into the same tertiary structure, as determined by the primary structure of its amino acids.
Figure 4: A variety of chemical interactions determine a protein's three-dimensional (tertiary) structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that is the result of hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure is entirely α-helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A. Ryan, K. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure, which consists of a central carbon atom, or the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms are considered the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom, known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxylic acid group in their basic structure. As we mentioned, there are 20 common amino acids present in proteins. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's nature (that is, whether it is acidic, basic, polar, or nonpolar). Each amino acid has both a single-letter and a three-letter abbreviation. For example, valine is abbreviated with the letter V or the three-letter symbol Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter abbreviations are also provided. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. Each amino acid is attached to the next by a covalent bond, called a peptide bond, which forms in a dehydration reaction. One amino acid's carboxyl group and the incoming amino acid's amino group combine, releasing a water molecule. The resulting bond is the peptide bond (Figure 8).
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, it releases a water molecule. [Image Description]
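Because each peptide bond releases one water molecule, the mass of a peptide is slightly less than the summed masses of its free amino acids. The Python sketch below illustrates this bookkeeping. The function names are hypothetical, and the table holds approximate average molar masses for only three amino acids, included as assumptions for the example.

```python
WATER_MASS = 18.015  # g/mol, approximate molar mass of H2O

# Approximate average molar masses (g/mol) of a few free amino acids.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "V": 117.15,  # valine
}


def waters_released(sequence):
    """One water is released per peptide bond: n residues form n - 1 bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence):
    """Sum the free amino acid masses, then subtract the water lost on bonding."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - waters_released(sequence) * WATER_MASS


print(waters_released("GA"), round(peptide_mass("GA"), 2))
```

For the dipeptide Gly-Ala, one water is released, so the mass is 75.07 + 89.09 - 18.015, or about 146.1 g/mol.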
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the C or carboxyl terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are written from N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups:
- Those with polar side chains.
- Those with hydrophobic side chains.
- Those with charged side chains.
Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to define as polar, while in other cases, we may encounter disagreements. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar since they carry a hydroxyl (-OH) group (Figure 9). Furthermore, this group can form a hydrogen bond with another polar group by donating or accepting a proton (a table showing donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzymatic sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and may donate or accept a hydrogen bond.
Histidine (His, H), on the other hand, depending on the environment and pH, can be polar or carry a charge. It has two –NH groups with a pKa value of around 6. At pHs below 6, when both groups are protonated, the side chain has a charge of +1. Within protein molecules, the pKa may be modulated by the environment so that the side chain may give away a proton and become neutral or accept a proton, becoming charged. This ability makes histidine useful in enzyme active sites when the chemical reaction requires proton extraction.
Figure 9: The polar amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
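The pH dependence of the histidine side chain described above can be estimated with the Henderson-Hasselbalch relation, a standard acid-base formula that this chapter does not otherwise introduce. The Python sketch below assumes a pKa of 6, as quoted in the text; the function name is hypothetical, and the pKa of a histidine inside a real protein may differ.

```python
def fraction_protonated(pH, pKa=6.0):
    """Fraction of side chains in the protonated (charged) form.

    From the Henderson-Hasselbalch relation:
    [protonated] / [total] = 1 / (1 + 10 ** (pH - pKa))
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


for pH in (5.0, 6.0, 7.4):
    print(f"pH {pH}: {fraction_protonated(pH):.3f} protonated")
```

When the pH equals the pKa, half of the side chains carry the +1 charge. At pH 7.4 only a few percent do, which is why histidine is usually grouped with the polar uncharged amino acids at neutral pH.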
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F) and cysteine (Cys, C) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the structure. In addition, Cys residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of Cys is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic due to their ability to have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. We should also note here that the side chains of histidine and tyrosine, together with the hydrophobic phenylalanine and tryptophan, can also form weak hydrogen bonds of the types OH−π and CH−O, using electron clouds within their ring structures. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other. They are also highly conserved within protein families, with Trp having the highest conservation rate.
Figure 10: The hydrophobic amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
At neutral pH (around 7.4), the charged amino acids carry a single charge in the side chain. There are four of them. The two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge at neutral pH. The two acidic residues, aspartate (Asp, D) and glutamate (Glu, E), carry a negative charge at neutral pH (Figure 11). A so-called salt bridge is often formed by the interaction of closely located positively and negatively charged side chains. Such bridges are often involved in stabilizing three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of 80–90 °C or even higher. The binding of positively charged metal ions is another function of the negatively charged carboxyl groups of Asp and Glu. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
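The side-chain charges described above can be tallied for any sequence. The Python sketch below is a deliberately simplified estimate: it counts only Lys, Arg, Asp, and Glu, treats histidine as neutral, and ignores the charges on the N and C termini. The names are hypothetical, chosen for this example.

```python
# Side-chain charge at neutral pH for the four charged amino acids.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}


def net_side_chain_charge(sequence):
    """Sum side-chain charges; all other residues are counted as neutral."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence)


# Beta chain fragments from Figure 2 (normal and sickle cell).
print(net_side_chain_charge("VHLTPEE"))  # two glutamates
print(net_side_chain_charge("VHLTPVE"))  # one glutamate replaced by valine
```

Comparing the two fragments shows that the sickle cell substitution removes one negative charge from the surface of each β chain and replaces it with a hydrophobic side chain.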
Glycine & proline
Glycine (Gly), one of the common amino acids, does not have a side chain – its R group is just a hydrogen atom – and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), providing high flexibility to the polypeptide chain. This flexibility is required in sharp polypeptide turns in loop structures. Proline (Pro), although considered hydrophobic, is also found at the surface, presumably due to its presence in turn and loop regions. In contrast to Gly, which provides the polypeptide chain high flexibility, Pro provides rigidity by imposing certain torsion angles on the segment of the structure. The reason for this is that its side chain makes a covalent bond with the main chain, which constrains the backbone shape of the polypeptide in this location. Sometimes Pro is called a helix breaker since it is often found at the end of α-helices.
Figure 12: The special amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
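The grouping of the 20 common amino acids can be written as a lookup table and used to summarize the composition of a sequence. The Python sketch below follows the four groups as drawn in Figure 7. This is one possible convention: as noted in the text, some residues (for example proline, tyrosine, and methionine) can reasonably be placed in more than one group. All names in the code are hypothetical.

```python
from collections import Counter

# One-letter codes grouped as in Figure 7.
GROUPS = {
    "polar": "STHNQY",
    "hydrophobic": "ACVILMFW",
    "charged": "RKDE",
    "special": "GP",
}

# Invert the table so each residue maps to its group name.
RESIDUE_GROUP = {
    residue: group for group, residues in GROUPS.items() for residue in residues
}


def composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    return Counter(RESIDUE_GROUP.get(residue, "unknown") for residue in sequence)


# Normal beta chain fragment from Figure 2.
counts = composition("VHLTPEE")
print(dict(counts))
```

A summary like this gives a first impression of whether a segment is likely to sit in the hydrophobic core or on the solvent-exposed surface, although real predictions also depend on the three-dimensional context.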
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image is a diagram depicting peptide bond formation between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC-BY-SA 4.0. "Protein Structure & Function" is licensed under ???.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by investigating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand how the protein gets its final shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of hemoglobin may be found on Uniprot, entry P69905. The N-terminal amino acid is methionine (Met, M), and the C-terminal amino acid is arginine (Arg, R) (Figure 1). The amino acid sequence of hemoglobin is the same every time it is expressed, and hemoglobin is the only protein that has exactly this sequence of amino acids.
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Figure 1: Primary structure of human hemoglobin α chain. The α chain of human hemoglobin has 142 amino acids, all linked together in sequence with peptide bonds.
The gene encoding the protein ultimately determines the unique sequence of amino acids for every protein. A change in nucleotide sequence of the gene’s coding region may lead to adding a different amino acid to the growing polypeptide chain, causing a change in protein structure and sometimes, therefore function. In sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution, causing a change in the protein's structure and function. Specifically, valine in the β chain is substituted with the amino acid, glutamate. What is most remarkable to consider is that a hemoglobin molecule is comprised of two alpha and two beta chains that each consist of about 150 amino acids. The molecule, therefore, has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule – which dramatically decreases life expectancy – is two amino acids of the 600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary, amino acid sequence of the beta chain of hemoglobin, hemoglobin molecules form long fibers that distort the biconcave, or disc-shaped, red blood cells and causes them to assume a crescent or “sickle” shape, which clogs blood vessels. In normal hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A. Ryan, K. and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds. Hydrogen bonds form between the oxygen atom in the carbonyl group in one amino acid and hydrogen and nitrogen atoms in the amide group of another amino acid that is four amino acids away in sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K. Fletcher, S. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure is primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chain in the amino acids involved determine which amino acids are energetically favorable to be next to other amino acids. For example, side chains with like charges repel each other and those with unlike charges are attracted to each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen, the only covalent bond that forms during protein folding. When protein folding takes place, the nonpolar amino acids' hydrophobic side chains repel water in the protein's environment and pack into the protein's interior; whereas, the hydrophilic side chains tend position on the surface of the protein, interacting with water. In general, whenever a protein is translated, it always folds into the same tertiary structure, as determined by the primary structure of its amino acids.
Figure 4: A variety of chemical interactions determine the proteins' 3D, tertiary structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into a their tertiary structures, and then two copies of the α chain come to interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that is the result of hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of a hemoglobin is its amino acid sequence. It secondary structure is entirely α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A. Ryan, K. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that comprise the polymeric molecules, proteins. Each amino acid has the same fundamental structure, which consists of a central carbon atom, or the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms are considered the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxyl-acid-group in their basic structure. As we mentioned, there are 20 common amino acids present in proteins. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's nature (that is, whether it is acidic, basic, polar, or nonpolar). Each amino acid has both a single-letter and a three-letter abbreviation. For example, valine is abbreviated with the letter V or the three-letter symbol, Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter abbreviations are also provided. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. Each amino acid is attached to its neighbor by a covalent bond, known as a peptide bond, which is formed by a dehydration reaction. One amino acid's carboxyl group and the incoming amino acid's amino group combine, releasing a water molecule. The resulting bond is the peptide bond (Figure 8).
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, it releases a water molecule. [Image Description]
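Because each peptide bond releases one water molecule, a linear chain of n amino acids contains n − 1 peptide bonds and has lost n − 1 water molecules. The short Python sketch below works through that arithmetic. The function names and the small mass table are illustrative assumptions, not part of the original chapter, and the molar masses are approximate values for the free amino acids.

```python
# Approximate molar masses (g/mol) of three free amino acids.
# Illustrative subset only; values are rounded.
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09, "Val": 117.15}
WATER_MASS = 18.015  # g/mol, lost once per peptide bond


def water_released(n_residues):
    """Water molecules released when n amino acids form one linear chain."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one amino acid")
    return n_residues - 1  # one water per peptide bond


def peptide_mass(residues):
    """Approximate molar mass of a linear peptide given three-letter codes."""
    total = sum(FREE_AMINO_ACID_MASS[r] for r in residues)
    return total - water_released(len(residues)) * WATER_MASS


if __name__ == "__main__":
    dipeptide = ["Gly", "Ala"]
    print("peptide bonds / waters released:", water_released(len(dipeptide)))
    print("approximate mass (g/mol):", peptide_mass(dipeptide))
```

The same bookkeeping scales to longer chains: a 100-residue polypeptide has 99 peptide bonds.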
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the C or carboxyl terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are written from N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups:
- Those with polar side chains.
- Those with hydrophobic side chains.
- Those with charged side chains.
Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to define as polar, while in other cases, we may encounter disagreements. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar since they carry a hydroxyl (-OH) group (Figure 9). Furthermore, this group can form a hydrogen bond with another polar group by donating or accepting a proton (a table showing donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzymatic sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and may donate or accept a hydrogen bond.
Histidine (His, H), on the other hand, depending on the environment and pH, can be polar or carry a charge. It has two –NH groups with a pKa value of around 6. At pHs below 6, when both groups are protonated, the side chain has a charge of +1. Within protein molecules, the pKa may be modulated by the environment so that the side chain may give away a proton and become neutral or accept a proton, becoming charged. This ability makes histidine useful in enzyme active sites when the chemical reaction requires proton extraction.
Figure 9: The polar amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
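The pH dependence of the histidine side chain described above can be sketched numerically. The example below uses the standard Henderson-Hasselbalch relation, which the chapter does not name explicitly, together with the approximate pKa of 6 quoted in the text. The function name is hypothetical, and the calculation ignores any shift in pKa caused by the protein environment.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of side chains carrying the extra proton (charge +1).

    Uses the Henderson-Hasselbalch relation; pka=6.0 is the approximate
    value quoted for histidine and is an assumption of this sketch.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


if __name__ == "__main__":
    for ph in (5.0, 6.0, 7.4):
        frac = fraction_protonated(ph)
        print(f"pH {ph}: about {frac:.0%} of histidine side chains are charged")
```

At a pH equal to the pKa, half of the side chains are protonated. Below that pH the charged form dominates, and above it the neutral form dominates, which is why small changes in the local environment can switch histidine between the two states.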
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F) and cysteine (Cys, C) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the structure. In addition, Cys residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of Cys is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic due to their ability to have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. We should also note here that the side chains of histidine and tyrosine, together with the hydrophobic phenylalanine and tryptophan, can also form weak hydrogen bonds of the types OH−π and CH−O, using electron clouds within their ring structures. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other. They are also highly conserved within protein families, with Trp having the highest conservation rate.
Figure 10: The hydrophobic amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
The charged amino acids at neutral pH (around 7.4) carry a single charge in the side chain. There are four of them; the two basic ones include lysine (Lys, K) and arginine (Arg, R), with a positive charge at neutral pH. The two acidic residues include aspartate (Asp, D) and glutamate (Glu, E), which carry a negative charge at neutral pH (Figure 11). A so-called salt bridge is often formed by the interaction of closely located positively and negatively charged side chains. Such bridges are often involved in stabilizing three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. The binding of positively charged metal ions is another function of the negatively charged carboxylic groups of Asp and Glu. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Glycine & proline
Glycine (Gly), one of the common amino acids, does not have a side chain – its R group is just a hydrogen atom – and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), providing high flexibility to the polypeptide chain. This flexibility is required in sharp polypeptide turns in loop structures. Proline (Pro), although considered hydrophobic, is also found at the surface, presumably due to its presence in turn and loop regions. In contrast to Gly, which provides the polypeptide chain high flexibility, Pro provides rigidity by imposing certain torsion angles on the segment of the structure. The reason for this is that its side chain makes a covalent bond with the main chain, which constrains the backbone shape of the polypeptide in this location. Sometimes Pro is called a helix breaker since it is often found at the end of α-helices.
Figure 12: The special amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
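The four groupings used in this section can be captured in a small lookup table. The Python sketch below follows the grouping shown in Figure 7 (polar, hydrophobic, charged, and special cases). As noted above, some residues, such as histidine, tyrosine, and proline, are classified differently by different sources, so the table is one reasonable assignment rather than a definitive one. The names `GROUPS`, `CLASS_OF`, and `classify` are illustrative.

```python
# One-letter codes grouped as in Figure 7 of this chapter.
GROUPS = {
    "polar": "STHNQY",
    "hydrophobic": "ACVILMFW",
    "charged": "RKDE",
    "special": "GP",
}

# Invert the grouping into a per-residue lookup table.
CLASS_OF = {aa: group for group, letters in GROUPS.items() for aa in letters}


def classify(sequence):
    """Return the side-chain class of each residue, read from N to C terminus."""
    return [CLASS_OF[aa] for aa in sequence.upper()]


if __name__ == "__main__":
    # First seven residues of the normal and sickle-cell hemoglobin beta chain,
    # as listed in the description of Figure 2.
    for name, seq in (("normal", "VHLTPEE"), ("sickle-cell", "VHLTPVE")):
        print(name, list(zip(seq, classify(seq))))
```

Comparing the two outputs shows the sickle-cell substitution replacing a charged residue with a hydrophobic one at the sixth position.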
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image is a diagram of peptide bond formation between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC-BY-SA 4.0. "Protein Structure & Function" is licensed under ???.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict how environmental conditions regulate gene expression
1. Review of prokaryotic transcription & translation
Genes are composed of DNA and are linearly arranged on chromosomes. Genes specify the sequences of amino acids, which are the building blocks of proteins. In turn, proteins are responsible for orchestrating nearly every function of the cell. Both genes and the proteins they encode are absolutely essential to life as we know it.
Transcription in bacteria
The bacterial chromosome is a closed circle of double-stranded DNA that is found in the central region of the cell called the nucleoid region. Prokaryotic genomes are very compact, and prokaryotic transcripts often cover more than one gene or cistron, a coding sequence for a single protein. Polycistronic mRNAs are then translated to produce multiple proteins.
A promoter is a DNA sequence onto which the transcription machinery, including RNA polymerase, binds and initiates transcription. In most cases, promoters exist upstream of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding gene is transcribed all the time, some of the time, or infrequently.
Transcription in prokaryotes (and in eukaryotes) requires the DNA double helix to partially unwind in the region of mRNA synthesis. The region of unwinding is called a transcription bubble. The protein, RNA polymerase, carries out transcription, reading from the template strand of the DNA, in the 3’ to 5’ direction.
RNA polymerase proceeds along the DNA template strand, pairing A, C, U, and G nucleotides to the DNA’s T, G, A, and C nucleotides, respectively. RNA polymerase catalyzes the formation of phosphodiester bonds between the mRNA nucleotides in sequence, synthesizing mRNA in the 5' to 3' direction at a rate of approximately 40 nucleotides per second. As elongation proceeds, the DNA is continuously unwound ahead of the core enzyme and rewound behind it, unchanged. Transcription terminates when the RNA polymerase reaches a termination sequence downstream of the genes being transcribed, which causes the RNA polymerase to fall off of the template DNA strand.
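The base-pairing rule in this paragraph can be written as a short Python sketch. The function names and the example sequence are illustrative assumptions. The template strand is given in the 3' to 5' direction, the direction in which RNA polymerase reads it, so the output is the mRNA in the 5' to 3' direction. The timing estimate simply applies the approximate rate of 40 nucleotides per second quoted above.

```python
# DNA template base -> RNA base added by RNA polymerase.
PAIRING = {"T": "A", "G": "C", "A": "U", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') made from a DNA template written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5.upper())


def elongation_seconds(n_nucleotides, rate=40.0):
    """Rough elongation time at the approximate rate quoted in the text."""
    return n_nucleotides / rate


if __name__ == "__main__":
    template = "TACGGCTTTATT"  # hypothetical template strand, 3'->5'
    mrna = transcribe(template)
    print("mRNA 5'->3':", mrna)
    print("approx. seconds for a 1200-nt transcript:", elongation_seconds(1200))
```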
Translation in bacteria
For cells that are growing and dividing, the synthesis of proteins consumes more of a cell’s energy than any other metabolic process. In turn, proteins account for more mass than any other component of living organisms (with the exception of water), and proteins perform virtually every function of a cell. The process of translation, or protein synthesis, involves the decoding of an mRNA message into a polypeptide product. Amino acids are covalently strung together by interlinking peptide bonds in lengths ranging from approximately 50 to more than 1000 amino acid residues. Each individual amino acid has an amino group (NH2) and a carboxyl (COOH) group. Polypeptides are formed when the amino group of one amino acid forms an amide (i.e., peptide) bond with the carboxyl group of another amino acid. This reaction is catalyzed by ribosomes and generates one water molecule.
Translation requires the input of an mRNA template, ribosomes, tRNAs, and various enzymatic factors. A ribosome is a complex macromolecule composed of structural and catalytic rRNAs and many proteins. Ribosomes exist in the cytoplasm of prokaryotes. Each mRNA molecule is simultaneously translated by many ribosomes, all synthesizing protein in the same direction: reading the mRNA from 5' to 3' and synthesizing the polypeptide from the N terminus to the C terminus.
The tRNAs are structural RNA molecules that serve as adapter molecules. Each tRNA carries a specific amino acid and recognizes one or more of the mRNA codons that define the order of amino acids in a protein. Aminoacyl-tRNAs bind to the ribosome and add the corresponding amino acid to the growing polypeptide chain. Therefore, tRNAs are the molecules that actually “translate” the language of RNA into the language of proteins.
As with mRNA synthesis, protein synthesis can be divided into three phases: initiation, elongation, and termination. In initiation, a sequence upstream of the first AUG codon interacts with the rRNA molecules that compose the ribosome and anchors the ribosome at the correct location on the mRNA template. The initiator tRNA, which carries the amino acid methionine, then interacts with the start codon AUG. During elongation, the mRNA template provides tRNA binding specificity. As the ribosome moves along the mRNA, each mRNA codon comes into register, and specific binding with the corresponding charged tRNA anticodon is ensured. Termination of translation occurs when a stop codon (UAA, UAG, or UGA) is encountered. Stop codons are recognized by protein release factors that resemble tRNAs. This recognition forces the last amino acid to detach from its tRNA, and the newly made protein is released.
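The three phases can be mimicked in a few lines of Python. This is a simplified sketch under stated assumptions: it locates the start site by scanning for the first AUG rather than modeling the upstream ribosome-binding sequence, and it uses only a small, partial codon table chosen for illustration (a codon missing from the table raises a `KeyError`). The function and variable names are hypothetical.

```python
# Partial codon table for illustration only; the full genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "Met", "CCG": "Pro", "AAA": "Lys", "UUU": "Phe",
    "GGC": "Gly", "GAA": "Glu", "GUG": "Val",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Translate an mRNA (5'->3') into a list of residues, N to C terminus."""
    # Initiation (simplified): begin at the first AUG in the message.
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    # Elongation: read one codon (three nucleotides) at a time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Termination: a stop codon releases the finished chain.
        if codon in STOP_CODONS:
            break
        peptide.append(CODON_TABLE[codon])
    return peptide


if __name__ == "__main__":
    print(translate("GGAUGCCGAAAUAAGG"))
```

Reading stops at UAA in this example, so the nucleotides after the stop codon are never translated.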
2. Operons as units of transcriptional regulation
For a cell to function properly, necessary proteins must be synthesized at the proper time and place. All cells control or regulate the synthesis of proteins from information encoded in their DNA. The process of activating the molecules that produce RNA through transcription and protein through translation is called gene expression. Whether in a simple unicellular organism or a complex multi-cellular organism, each cell controls when and how its genes are expressed. For this to occur, there must be internal chemical mechanisms that control when a gene is expressed to make RNA and protein, how much of the protein is made, and when it is time to stop making that protein because it is no longer needed.
The regulation of gene expression conserves energy and matter. Transcription and translation require ATP, so it would require a significant amount of energy for an organism to express every gene at all times. In addition, every RNA and protein molecule is composed of nucleotides or amino acids, respectively. These subunit molecules must ultimately be made by recycling other molecules or consumed by the organism from the environment. A cell conserves energy and matter by only expressing the subset of genes that are required for its function at any given time.
The control of gene expression is extremely complex. Malfunctions in this process are detrimental to the cell and can lead to the development of many diseases in humans, including cancer.
The DNA of prokaryotes is organized into a circular chromosome within the nucleoid region of the cell cytoplasm. Proteins that are needed for a specific function, or that are involved in the same biochemical pathway, are encoded together in the coding region of an operon, which is transcribed as a single mRNA molecule (Figure 1). For example, all of the genes needed to use lactose as an energy source are coded next to each other in the coding region of the lac operon. The promoter and operator are located upstream of the coding region and are regions of the DNA where the proteins that regulate and carry out transcription bind. For example, RNA polymerase binds to the promoter site and then slides downstream, transcribing the coding region as it passes. An operator is a region of DNA where regulatory proteins that either activate or repress transcription of the operon bind. Promoters and operators are not themselves genes because they do not encode proteins; rather, they are non-coding DNA.
Figure 1: Components of an Operon. The promoter and operator are located upstream of the coding region of an operon. Created in BioRender.com. [Image Description]
In prokaryotic cells, there are three types of regulatory molecules that can affect the expression of operons: repressors, activators, and inducers. Repressors and activators are proteins produced in the cell. Both repressors and activators regulate gene expression by binding to specific DNA sites upstream of the genes they control. Repressors prevent transcription of a gene in response to an external stimulus, whereas activators increase the transcription of a gene in response to an external stimulus. How an operator is positioned relative to the promoter is critical to the function of the regulatory protein, positioning it to either support the binding of RNA polymerase (an activator) or interfere with RNA polymerase (a repressor). Inducers are small molecules that may be produced by the cell or that are in the cell’s environment. Inducers either activate activator proteins or repress repressor proteins to cause a gene to be expressed.
3. Transcriptional repression
Most organisms, including the bacterium Escherichia coli, need glucose to perform cellular respiration in order to survive. If glucose is not available, E. coli are able to take up other sugars from the environment and metabolize them into glucose. Lactose is one such sugar that E. coli can ingest from its environment and metabolize into glucose (Figure 2). It is a disaccharide, made of the two monosaccharide molecules, glucose and galactose, bound together. If glucose is present in the environment, there is no need for the E. coli to expend energy and matter expressing the proteins needed to take up and digest lactose. However, if glucose is not present and lactose is, the E. coli would benefit from using lactose as a source of glucose.
Figure 2: Lactose digestion. β-galactosidase breaks a bond in the disaccharide lactose to break it into its component monosaccharides, glucose and galactose. Source: lactase.svg. Created in BioRender.com. [Image Description]
The genes required for E. coli to take up lactose and break it down into glucose are located together on its chromosome in the lac operon. The coding region of the lac operon includes the genes for three proteins (Figure 3). lacZ encodes the enzyme, β-galactosidase or LacZ, which breaks down lactose into its monosaccharides, glucose and galactose. lacY encodes the membrane transport protein, β-galactoside permease or LacY, which moves lactose from outside the cell to inside the cell. lacA encodes the enzyme β-galactoside transacetylase or LacA, and its role in lactose metabolism is unclear.
The lac operon includes three important regions: the lac promoter, the lac operator, and the coding region. The promoter region is where RNA polymerase binds to initiate transcription and where transcriptional activators bind to help RNA polymerase bind. Just downstream is the operator region where repressors can bind. Next comes the transcriptional start site, immediately followed by the coding region, which contains the three genes, lacZ, lacY, and lacA.
Figure 3: An E. coli cell that is in the presence of lactose and transcribing its lac operon. Lactose from the lumen of the large intestine enters the E. coli cell through lac permease. Once in the cell, lactose is broken down by β-galactosidase into glucose and galactose or isomerized into allolactose. Allolactose binds to the lac repressor and cAMP binds to CRP allowing RNA polymerase to transcribe the lac operon. In this way, E. coli is able to digest lactose into glucose to use in cellular respiration. Created in BioRender.com. [Image Description]
The lac operator contains the DNA code to which the lac repressor protein can bind. The lac repressor binds to the operator region of the lac operon, physically blocking RNA polymerase and halting transcription (Figure 4). The lac repressor is expressed constitutively, meaning always, in E. coli. Its gene is located elsewhere on the E. coli chromosome, not in the lac operon.
Figure 4: State of the lac operon in the absence of lactose. The lac repressor binds to the operator region, blocking the progression of RNA polymerase downstream, off the promoter region. Created in BioRender.com. [Image Description]
When β-galactosidase acts on lactose, 90% of the time the lactose breaks down all the way to glucose and galactose, but the other 10% of the time, it isomerizes into allolactose. When allolactose is present, it binds to the lac repressor and causes it to change shape such that it no longer fits on the lac operator (Figure 5). Thus, RNA polymerase is not blocked, and it can initiate transcription of the lac genes to digest lactose.
Figure 5: State of the lac operon in the presence of lactose. Allolactose binds to the lac repressor, which causes it to change shape and unbind from the operator region. RNA polymerase can progress downstream off the promoter region to the coding region, where it transcribes the lacZ, lacY, and lacA genes into mRNA. This mRNA transcript is translated by the ribosome into β-galactosidase, lac permease, and transacetylase. Created in BioRender.com. [Image Description]
Images created in BioRender.com.
4. Transcriptional activation
Just as the lac operon is negatively regulated by the lac repressor binding to the operator, there are activator proteins that bind to the promoter that act as positive regulators to turn on transcription. When glucose is scarce, E. coli can turn to other sugar sources for fuel. To do this, the genes to digest these alternate sugars and transform them into glucose must be transcribed.
When glucose levels drop, cyclic AMP (cAMP) begins to accumulate in the cell. cAMP is a small, signaling molecule that is involved in glucose and energy metabolism in E. coli. Accumulating cAMP binds to the positive regulator, cAMP receptor protein (CRP), a protein that binds to the promoters of many operons that control the processing of alternative sugars, including the lac operon (Figure 6). CRP is also known as catabolite activator protein (CAP), and the names CRP and CAP may be used interchangeably.
When cAMP binds to CRP, the complex then binds to the promoter region of the lac operon, just upstream of the RNA-polymerase-binding site on the promoter. The binding of CRP stabilizes the binding of RNA polymerase to the promoter region and increases transcription of the genes in the coding region of the operon.
Figure 6: State of the lac operon in the absence of glucose. When glucose levels are low, cAMP accumulates in the cell and binds CRP. CRP helps RNA polymerase bind to the promoter region and transcribe the lac operon. Created in BioRender.com. [Image Description]
If E. coli cells are in an environment where there is plentiful glucose, glucose molecules will enter the cell through the glucose permease membrane protein. When glucose levels are high in the cell, cAMP is consumed by other cellular processes, and cellular levels of cAMP are low. Without cAMP bound, CRP changes conformation and cannot bind the promoter region (Figure 7). The lac operon has evolved to have a very weak promoter, meaning RNA polymerase does not bind the promoter region very tightly unless cAMP+CRP are present to increase its binding affinity. Therefore, in the presence of glucose, RNA polymerase does not bind as tightly to the promoter region, and expression of the lac operon is low. There is no need for the cell to waste energy and matter building the proteins encoded by the lac operon when there is abundant glucose available for cellular respiration.
Figure 7: State of the lac operon in the presence of glucose. When glucose levels are high, cAMP levels are low. Without cAMP bound, CRP does not bind to the promoter region. RNA polymerase does not bind very well to the promoter without cAMP-CRP, so very little transcription occurs. Created in BioRender.com. [Image Description]
5. Activation and repression to control lac operon expression
E. coli only need to expend energy and matter expressing the genes necessary to digest lactose when two conditions are both met: 1) glucose is not present and 2) lactose is present. There is no reason to make the proteins that import and digest lactose if glucose is present, and there is no need to make these proteins if there is no lactose available in the environment (Table 1).
Only when glucose is absent and lactose is present will the lac operon be transcribed. In the absence of glucose, the cAMP+CRP complex binds to the lac promoter and makes transcription of the lac operon more effective. When lactose is present, its metabolite, allolactose, binds to the lac repressor and changes its shape so that it cannot bind to the lac operator to prevent transcription. This combination of conditions makes sense for the cell, because it would be energetically wasteful to synthesize the enzymes to process lactose if glucose were plentiful or lactose were not available.
If glucose is present, then CRP fails to bind to the promoter sequence to activate transcription. If lactose is absent, then the lac repressor binds to the operator to prevent transcription. If either of these conditions is met, then transcription remains off. Only when glucose is absent and lactose is present is the lac operon transcribed at a high rate.
Table 1: Positive and negative regulation of the lac operon
Glucose present | cAMP levels | RNA polymerase bound to promoter | Lactose/allolactose present | lac repressor bound to operator | Transcription of lac genes |
Present | Low | Very little | Absent | Yes | No |
Present | Low | Very little | Present | No | No (very little) |
Absent | High | Yes | Absent | Yes | No |
Absent | High | Yes | Present | No | Yes |
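The logic of Table 1 can be sketched as a small truth-table function. This is only an illustrative model under simplifying assumptions: each condition is treated as strictly present or absent, and the function name `lac_operon_state` and the labels "none", "very low", and "high" are hypothetical choices made for this sketch, not measured expression levels.

```python
def lac_operon_state(glucose_present: bool, lactose_present: bool) -> dict:
    """Return the regulatory state of the lac operon for one growth condition."""
    # Positive control: cAMP accumulates only when glucose is scarce,
    # and CRP binds the promoter only when cAMP is bound to it.
    camp_high = not glucose_present
    crp_bound = camp_high

    # Negative control: allolactose (made from lactose) keeps the
    # repressor off the operator, so the repressor binds only without lactose.
    repressor_bound = not lactose_present

    if repressor_bound:
        transcription = "none"        # operator blocked
    elif crp_bound:
        transcription = "high"        # activator bound, operator free
    else:
        transcription = "very low"    # operator free, but weak promoter alone

    return {
        "cAMP": "high" if camp_high else "low",
        "CRP bound": crp_bound,
        "repressor bound": repressor_bound,
        "transcription": transcription,
    }


if __name__ == "__main__":
    # Walk through the four rows of Table 1 in the same order.
    for glucose in (True, False):
        for lactose in (False, True):
            state = lac_operon_state(glucose, lactose)
            print(f"glucose={glucose!s:5} lactose={lactose!s:5} -> {state['transcription']}")
```

Only the combination of no glucose and available lactose returns "high", which mirrors the last row of the table.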
Image Descriptions
Figure 1: The image depicts a segment of DNA as part of an operon structure, visualized as a double helix. The DNA strand runs horizontally across the image from left to right. The left side is marked as the 5' end and the right side as the 3' end, indicating the direction of the DNA sequence. The DNA is color-coded into three distinct regions: the Promoter region is colored in blue, the Operator region in purple, and the Coding Region in orange. Above the DNA strand, a black bracket labeled 'Operon' spans across these three regions. The background is white, emphasizing the colored segments. [Return to Figure]
Figure 2: The image depicts a chemical reaction process in which lactose is broken down into D-galactose and D-glucose. At the top, there are structural chemical formulas. On the left, lactose is shown as two connected hexagonal ring structures. To the right, an arrow points towards two separate hexagonal ring structures, representing D-galactose and D-glucose, with a plus sign in between them. Below the formulas, a simplified graphical representation mirrors the above process using colored hexagons. On the left, a light blue hexagon is linked to a dark blue hexagon representing lactose. An arrow points to the right toward two separate hexagons, one light blue representing D-galactose and one dark blue representing D-glucose, with a plus sign in between them. [Return to Figure]
Figure 3: This is a schematic diagram depicting the lactose operon regulation mechanism within an E. coli cell. The diagram is framed by a red oval outline representing the boundary of the E. coli cell. The background inside the cell is a light pink color. There are several key elements and labels in the image, elaborating the process of lactose metabolism.
Starting from the top right, blue hexagons represent lactose molecules outside the cell, entering through a membrane protein labeled “lac permease.” Once inside, lactose can be converted into allolactose, which is shown as hexagons with various shades of blue. Allolactose binds to a purple irregular shape labeled “lac repressor,” inactivating it and allowing for transcription.
The middle part of the diagram shows "beta-galactosidase" depicted as a green enzyme, converting lactose into glucose and galactose, represented by blue hexagons.
At the bottom left, the DNA is shown as a double helix with different segments marked as lacZ, lacY, and lacA, respectively. Colored proteins including “cAMP,” “CRP,” and “RNA polymerase” interact with the DNA strand to regulate the transcription process. [Return to Figure]
Figure 4: The image is a schematic representation of DNA with associated proteins, illustrating the process of transcription regulation. The double-helix DNA strand runs horizontally across the image, transitioning from left to right. At the left end of the DNA strand, there is a blue-shaded structure labeled "RNA polymerase" that is bound to the DNA. This structure is represented with a slight transparency and is roughly cubic in shape, with smooth contours. Adjacent to this, towards the right, is a purple, irregularly shaped structure labeled "lac repressor" also bound to a segment of the DNA. The DNA sequence includes three distinct sections labeled "lacZ," "lacY," and "lacA" in green, teal, and turquoise text respectively. The DNA strand transitions in color between segments, visually differentiating these regions. The ends of the DNA strand are marked with 5' and 3' labels indicating the orientation, with "5'" and "3'" labeled at both the leftmost and rightmost ends. [Return to Figure]
- DNA Segment: At the top-center, there is a double-helical strand of DNA. The DNA strand has labels indicating its 5' and 3' ends on the left and right respectively. The DNA is color-coded, with different regions marked in purple (lac repressor binding site), light blue (around RNA polymerase), green (lacZ), blue-green (lacY), and dark cyan (lacA).
- RNA Polymerase: On the left, a light blue, rounded structure labeled "RNA polymerase" is shown bound to the DNA strand, indicating the start of transcription.
- Lac Repressor and Allolactose: Towards the upper-center, a purple-colored, irregular-shaped structure labeled "lac repressor" is shown with hexagonal allolactose molecules in blue attached to it.
- Transcription Process: Below the DNA, a wavy green strand represents the mRNA being synthesized, labeled "mRNA 5'" on the left and "3'" on the right.
- Translation and Enzyme Production: The mRNA strand is split into segments coded for three different enzymes:
- β-galactosidase: Shown as a green, cloud-like structure.
- Lac permease: Illustrated as a teal, tubular structure.
- Transacetylase: Depicted as a darker green, cloud-like structure.
- Black arrows indicate the processes of transcription (DNA to mRNA) and translation (mRNA to proteins)
Figure 6: The image depicts a molecular biological process involving the transcription and translation of the lac operon, structured in a linear format from top to bottom. At the top left, a DNA strand is shown with a nucleotide sequence represented by a helical structure. Bound to the DNA are a red, irregularly shaped protein labeled "CRP," and an orange circular molecule labeled "cAMP." Adjacent to this, a light blue structure representing RNA polymerase is shown, initiating transcription from the DNA template. The DNA sequence is segmented into colored regions labeled "lacZ" in green, "lacY" in turquoise, and "lacA" in teal. Below this, the mRNA transcript is shown emerging, signified by a wavy green and blue line. The mRNA transcript undergoes translation, denoted by arrows, producing three distinct proteins. On the bottom row are illustrations of the resulting proteins: a green, amorphous shape labeled "β-galactosidase," a turquoise, cylindrical shape labeled "lac permease," and a dark green, irregular shape labeled "Transacetylase." [Return to Figure]
Figure 7: The image depicts a molecular diagram related to gene transcription. Three distinct elements are illustrated above a sequence of DNA, and labeled as "RNA polymerase," "CRP," and "glucose."
- RNA polymerase is shown on the left as a light blue, abstract shape with smooth curves, resembling a cap or enzyme model.
- CRP (cAMP receptor protein) is shown in the middle as a dark red, irregular shape with a lumpy texture.
- Glucose is illustrated on the right with five small, dark blue hexagons arranged in a square formation with one hexagon in the center.
Below these elements is a DNA strand depicted as a double helix with segments colored in shades of blue, purple, green, and aqua. The DNA sequence is labeled at the 5' and 3' ends with corresponding numbers. Three specific genes within the DNA sequence are labeled:
- "lacZ" in green,
- "lacY" in teal,
- "lacA" in aqua.
Licenses and Attributions
"Transcriptional regulation of the lac operon" by Michelle McCully is adapted from "15-2-prokaryotic-transcription", "15-5-ribosomes-and-protein-synthesis", "16-1-regulation-of-gene-expression", and "16-2-prokaryotic-gene-regulation" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC BY 4.0. "Transcriptional regulation of the lac operon" is licensed under CC BY-NC 4.0.
Images created with BioRender are licensed with permission as CC BY-NC 4.0.
Learning Objectives
By the end of this section, you will be able to do the following:
- Explain the significance of photosynthesis to other living organisms
- Describe the main structures involved in photosynthesis
- Identify the substrates and products of photosynthesis
Photosynthesis is essential to all life on earth; both plants and animals depend on it. It is the only biological process that can capture energy that originates from sunlight and convert it into chemical compounds (carbohydrates) that every organism uses to power its metabolism. It is also a source of oxygen necessary for many living organisms. In brief, the energy of sunlight is “captured” to energize electrons, whose energy is then stored in the covalent bonds of sugar molecules. How long lasting and stable are those covalent bonds? The energy extracted today by the burning of coal and petroleum products represents sunlight energy captured and stored by photosynthesis hundreds of millions of years ago, including during the Carboniferous Period (roughly 360 to 300 million years ago).
Plants, algae, and a group of bacteria called cyanobacteria are the only organisms capable of performing photosynthesis (Figure 8.2). Because they use light to manufacture their own food, they are called photoautotrophs (literally, “self-feeders using light”). Other organisms, such as animals, fungi, and most other bacteria, are termed heterotrophs (“other feeders”), because they must rely on the sugars produced by photosynthetic organisms for their energy needs. A third very interesting group of bacteria synthesize sugars, not by using sunlight’s energy, but by extracting energy from inorganic chemical compounds. For this reason, they are referred to as chemoautotrophs.

The importance of photosynthesis is not just that it can capture sunlight’s energy. After all, a lizard sunning itself on a cold day can use the sun’s energy to warm up in a process called behavioral thermoregulation. In contrast, photosynthesis is vital because it evolved as a way to store the energy from solar radiation (the “photo-” part) as energy in the carbon-carbon bonds of carbohydrate molecules (the “-synthesis” part). Those carbohydrates are the energy source that heterotrophs use to power the synthesis of ATP via respiration. Therefore, photosynthesis powers 99 percent of Earth’s ecosystems. When a top predator, such as a wolf, preys on a deer (Figure 8.3), the wolf is at the end of an energy path that went from nuclear reactions in the sun, to visible light, to photosynthesis, to vegetation, to deer, and finally to the wolf.

Main Structures and Summary of Photosynthesis
Photosynthesis is a multi-step process that requires specific wavelengths of visible sunlight, carbon dioxide (which is low in energy), and water as substrates (Figure 8.4). After the process is complete, it releases oxygen and produces glyceraldehyde-3-phosphate (G3P), as well as simple carbohydrate molecules (high in energy) that can then be converted into glucose, sucrose, or any of dozens of other sugar molecules. These sugar molecules contain energy and the energized carbon that all living things need to survive.

The following is the chemical equation for photosynthesis (Figure 8.5):
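Written out, the overall reaction is commonly summarized as follows. This is the standard simplified form, which uses glucose as the representative sugar product; as noted above, the immediate carbohydrate product is G3P, which is later converted into glucose and other sugars.

```latex
6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} \xrightarrow{\text{light energy}} \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}
```

Each side carries 6 carbon, 12 hydrogen, and 18 oxygen atoms, so the equation is balanced.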

Basic Photosynthetic Structures
In plants, photosynthesis generally takes place in leaves, which consist of several layers of cells. The process of photosynthesis occurs in a middle layer called the mesophyll. The gas exchange of carbon dioxide and oxygen occurs through small, regulated openings called stomata (singular: stoma), which also play roles in the regulation of gas exchange and water balance. The stomata are typically located on the underside of the leaf, which helps to minimize water loss due to high temperatures on the upper surface of the leaf. Each stoma is flanked by guard cells that regulate the opening and closing of the stomata by swelling or shrinking in response to osmotic changes.
In all autotrophic eukaryotes, photosynthesis takes place inside an organelle called a chloroplast. For plants, chloroplast-containing cells exist mostly in the mesophyll. Chloroplasts have a double membrane envelope (composed of an outer membrane and an inner membrane), and are ancestrally derived from ancient free-living cyanobacteria. Within the chloroplast are stacked, disc-shaped structures called thylakoids. Embedded in the thylakoid membrane is chlorophyll, a pigment (molecule that absorbs light) responsible for the initial interaction between light and plant material, and numerous proteins that make up the electron transport chain. The thylakoid membrane encloses an internal space called the thylakoid lumen. As shown in Figure 8.6, a stack of thylakoids is called a granum, and the liquid-filled space surrounding the granum is called stroma or “bed” (not to be confused with stoma or “mouth,” an opening on the leaf epidermis).

The Two Parts of Photosynthesis
Photosynthesis takes place in two sequential stages: the light-dependent reactions and the light-independent reactions. In the light-dependent reactions, energy from sunlight is absorbed by chlorophyll and that energy is converted into stored chemical energy. In the light-independent reactions, the chemical energy harvested during the light-dependent reactions drives the assembly of sugar molecules from carbon dioxide. Therefore, although the light-independent reactions do not use light as a reactant, they require the products of the light-dependent reactions to function. In addition, several enzymes of the light-independent reactions are activated by light. The light-dependent reactions use certain molecules, referred to as energy carriers, to store the energy temporarily. The energy carriers that move energy from the light-dependent reactions to the light-independent reactions can be thought of as “full” because they are rich in energy. After the energy is released, the “empty” energy carriers return to the light-dependent reactions to obtain more energy. Figure 8.7 illustrates the components inside the chloroplast where the light-dependent and light-independent reactions take place.

Mutations
Mutations are changes to the DNA sequence of a gene, and they often affect the protein produced when the gene's RNA is translated. A mutation can occur in a non-coding region of the gene, such as the promoter, or in the coding region, where the protein sequence is established. Mutations in a non-coding region can affect how much a gene is activated, where it is activated, or when it is activated. Changes to the coding region can affect the protein sequence produced during translation.
Within the coding region, mutations are grouped into a few classes according to the change they make. These classes also give a rough indication of how severe the change is. A silent mutation is generally considered harmless. Silent mutations do not alter the amino acid sequence of the protein, because the genetic code is redundant: most amino acids are encoded by more than one codon. For example, GAA and GAG both code for the amino acid glutamate. Missense mutations result from a single nucleotide change within a codon that causes a different amino acid to be coded for. Glutamate (GAA) becomes alanine (GCA) if the middle A nucleotide is mutated to a C. A nonsense mutation changes a codon into a premature stop codon (UAA, UAG, or UGA).
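The three classes of point mutation can be told apart by comparing the amino acid encoded before and after the change. The sketch below is a minimal illustration, assuming the standard genetic code and single mRNA codons as input; the names `CODON_TABLE` and `classify_point_mutation` are hypothetical helpers introduced here, and the function assumes the original codon is not itself a stop codon.

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(original: str, mutated: str) -> str:
    """Classify a change from one mRNA codon to another."""
    before = CODON_TABLE[original]
    after = CODON_TABLE[mutated]
    if before == after:
        return "silent"      # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"    # premature stop codon
    return "missense"        # a different amino acid is coded for


if __name__ == "__main__":
    print(classify_point_mutation("GAA", "GAG"))  # glutamate -> glutamate
    print(classify_point_mutation("GAA", "GCA"))  # glutamate -> alanine
    print(classify_point_mutation("GAA", "UAA"))  # glutamate -> stop
```

The three calls reproduce the examples in the text: GAA to GAG is silent, GAA to GCA is missense, and GAA to UAA is nonsense.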
Codon table from - https://lmu.pressbooks.pub/conceptsinbiology/chapter/the-genetic-code/?preview_id=664&preview_nonce=e9d7228378&preview=true
A frameshift mutation is caused by the addition or removal of a number of nucleotides that is not a multiple of three, so that the entire reading frame downstream shifts. Frameshifts do not result when insertions or deletions occur in a multiple of three, because each amino acid is coded for by a triplet. Removing or inserting whole codons adds or removes amino acids but does not change how the codons after them are read. These additions and removals are called insertions and deletions. For example, inserting a single G in front of UUA gives GUU A. UUA normally codes for the amino acid leucine, but after the insertion the first codon read is GUU, which codes for valine. All of the codons downstream of this insertion will be altered.
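The effect of an insertion on the reading frame can be seen by translating a short sequence before and after the change. This is a minimal sketch assuming the standard genetic code; the example mRNA `UUAGAAGCA` and the helper name `translate` are made up for illustration, and the function simply reads complete codons from the first nucleotide without looking for a start codon.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G order ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, ignoring leftover nucleotides."""
    usable = len(mrna) - len(mrna) % 3  # drop an incomplete final codon
    return "".join(CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    original = "UUAGAAGCA"                 # UUA GAA GCA
    one_inserted = "G" + original          # GUU AGA AGC (A): frame shifted
    three_inserted = "GGG" + original      # GGG UUA GAA GCA: frame kept

    print(translate(original))        # leucine, glutamate, alanine
    print(translate(one_inserted))    # every codon after the insertion changes
    print(translate(three_inserted))  # one extra amino acid, rest unchanged
```

Inserting one nucleotide changes every amino acid that follows, while inserting three nucleotides only adds one amino acid and leaves the rest of the sequence intact.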
Case Study: Sickle Cell Anemia
Sickle cell anemia is an example of a genetic mutation that affects the lives of approximately 8 million people worldwide. Sickle cell disease is caused by a missense mutation in the HBB gene, which codes for a protein component of hemoglobin called beta globin. Functional hemoglobin is important: its job is to carry oxygen inside red blood cells to the tissues of the body. The mutation changes a GAG codon to GTG (GUG in the mRNA), so that glutamic acid is substituted with valine. Glutamic acid has a negatively charged R group while valine has a non-polar R group, so this amino acid change can have a dramatic effect on the folding of the hemoglobin protein in the aqueous environment of the cell. Sickled cells look physically different from unaffected cells, resembling a crescent moon or sickle rather than a round red blood cell. The sickled cells are sticky and stiff, leading to serious problems when they clump together and block blood vessels. Blocked blood flow is particularly damaging when it occurs in small vessels, where tissue and organ damage can result.
Sickle cell anemia is a genetic disorder determined by a person's two copies of the HBB gene. A person with the disorder has the genotype SS, meaning they have two copies of the mutated HBB gene. An unaffected individual has the genotype AA, with no mutated copies of the HBB gene. The genotype AS is referred to as sickle cell trait. An individual with this genotype is considered a carrier and will usually not experience complications unless extreme conditions arise from altitude, intense exercise, or dehydration. Notably, carriers have an advantage over both homozygous genotypes where malaria is common: they are less likely than AA individuals to suffer from malaria, because sickle hemoglobin makes it more difficult for the malaria parasite to grow, and they do not have the sickle cell disease of SS individuals.