19.3 Protein Structure
Michelle McCully
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Describe the levels of protein structure
- Identify the twenty common amino acids
- Chemically classify the twenty common amino acids
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by interrogating their structures, we can make predictions about their functions.
1. Protein structure
A protein’s shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand a protein’s shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the β chain of human hemoglobin can be found in UniProt, entry P68871 (the database entry begins with an initiator methionine that is removed from the mature chain). The N-terminal amino acid is valine (Val, V), and the C-terminal amino acid is histidine (His, H) (Figure 19.12). The β chain has the same amino acid sequence every time it is made, and no other protein has exactly this sequence of amino acids.
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Figure 19.12. Primary structure of human hemoglobin β chain. The β chain of human hemoglobin has 146 amino acids, all linked together in sequence with peptide bonds.
The gene encoding a protein ultimately determines its unique sequence of amino acids. A change in the nucleotide sequence of the gene’s coding region may change the amino acid sequence, which can change the protein’s structure and, therefore, sometimes its function. In people who have sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 19.13) has a single amino acid substitution that changes the protein’s structure and function. Specifically, at the sixth position in the primary sequence of the β chain, the wild-type amino acid, glutamate (Glu, E), is replaced by valine (Val, V). Remarkably, a hemoglobin molecule is composed of two α chains and two β chains, each about 150 amino acids long, so the full protein has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, which dramatically decreases life expectancy, is therefore just two amino acids out of about 600 (one in each β chain).
Figure 19.13. Structure and function of hemoglobin. Because of a single change in the primary (amino acid) sequence of the β chain, sickle cell hemoglobin proteins form long fibers that distort normally disc-shaped red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In wild-type hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
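The sickle cell substitution can be expressed as a point mutation on a sequence string. The sketch below is a minimal illustration, not a bioinformatics tool: it uses only the first eight residues of the β chain from Figure 19.12, and the helper names `substitute` and `differences` are hypothetical. Positions are counted from 1 at the N terminus, as in the text, so position 6 corresponds to Python index 5.

```python
def substitute(seq: str, position: int, expected: str, replacement: str) -> str:
    """Return seq with the residue at a 1-based position replaced.

    Raises ValueError if the residue found there is not the expected one,
    which guards against off-by-one numbering mistakes.
    """
    index = position - 1  # convert 1-based residue number to 0-based index
    if seq[index] != expected:
        raise ValueError(f"position {position} is {seq[index]}, not {expected}")
    return seq[:index] + replacement + seq[index + 1:]


def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) wherever a and b differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First eight residues of the wild-type beta chain, written N terminus to C terminus.
wild_type = "VHLTPEEK"
sickle = substitute(wild_type, 6, "E", "V")  # Glu6 -> Val

print(sickle)                          # VHLTPVEK
print(differences(wild_type, sickle))  # [(6, 'E', 'V')]
```

Checking the expected residue before replacing it is a deliberate design choice: numbering errors (0-based versus 1-based, or counting an initiator methionine) are a common source of mistakes when mapping mutations onto sequences.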
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common secondary structures are the α-helix and the β-pleated sheet (Figure 19.14). Both are held in shape by hydrogen bonds between backbone atoms. In an α-helix, for example, the oxygen atom of the carbonyl group (C=O) of one amino acid forms a hydrogen bond with the hydrogen atom of the amide group (N–H) of another amino acid located four residues further along the primary sequence.
Figure 19.14. The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen of one amino acid and the amide hydrogen (N–H) of another in the peptide backbone. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K., Fletcher, S., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
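The "four residues away" rule for α-helices can be enumerated directly. The sketch below assumes an idealized helix in which the C=O of every residue i is hydrogen-bonded to the N–H of residue i + 4; real helices have frayed ends where some of these bonds are missing, and the function name is hypothetical.

```python
def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """Backbone hydrogen bonds in an idealized alpha-helix.

    Each pair is (i, i + 4), numbered from 1: the carbonyl oxygen of
    residue i accepts a hydrogen bond from the N-H of residue i + 4.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


pairs = helix_hbond_pairs(8)
print(pairs)  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

Under this idealization, a helical segment of n residues has n - 4 backbone hydrogen bonds, because the last four residues have no partner four positions further along.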
Tertiary Structure
The polypeptide’s unique three-dimensional structure is its tertiary structure (Figure 19.15). This structure forms primarily through chemical interactions between the side chains of the amino acids in the polypeptide chain. The chemical nature of each side chain determines which other side chains it is energetically favorable for it to be near. For example, side chains with like charges repel each other, while those with opposite charges attract each other and can form ionic bonds. The sulfur atoms of cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds that form during protein folding. As the protein folds, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein’s interior, whereas hydrophilic side chains tend to lie on the protein’s surface, where they interact with water. In general, a given protein folds into the same tertiary structure each time it is made, as determined by its primary structure.
Figure 19.15. A variety of chemical interactions determine a protein’s three-dimensional (tertiary) structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
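The side-chain interactions listed above can be summarized as a lookup. The sketch below is a deliberately crude, hypothetical classifier: it assigns each residue to a category by its one-letter code and reports which of the interactions in Figure 19.15 a pair could form. It ignores geometry, pH, and hydrogen bonding, so it describes what is chemically plausible, not what actually happens in a folded protein.

```python
POSITIVE = set("KR")           # basic side chains, positive at neutral pH
NEGATIVE = set("DE")           # acidic side chains, negative at neutral pH
HYDROPHOBIC = set("AVILMFWC")  # nonpolar side chains (grouping of Figure 19.21)


def possible_interaction(a: str, b: str) -> str:
    """Crudely classify the interaction two side chains (one-letter codes) could form."""
    if a == "C" and b == "C":
        return "disulfide linkage (covalent, if oxidized)"
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "ionic bond (attraction)"
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return "like charges (repulsion)"
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return "hydrophobic interaction"
    return "other / not classified here"


for pair in [("K", "E"), ("D", "E"), ("C", "C"), ("L", "F"), ("S", "L")]:
    print(pair, "->", possible_interaction(*pair))
```

The cysteine-cysteine case is checked first because cysteine also appears in the hydrophobic set; otherwise a Cys pair would be reported only as a hydrophobic contact.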
Quaternary Structure
In nature, some, but not all, proteins are built from several polypeptides, or subunits, and the interaction of these subunits forms the protein’s quaternary structure. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, each fold into their tertiary structures, and then two α chains and two β chains associate to form a tetramer of four chains (Figure 19.16). Silk, a fibrous protein, by contrast, has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 19.16. Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists mainly of α-helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
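The α2β2 assembly can be checked with simple arithmetic. The β-chain length of 146 residues comes from Figure 19.12; the α-chain length of 141 residues is not given in this section and is an outside value assumed here for illustration. The sketch counts chains and residues in the tetramer and the fraction of residues changed in sickle cell hemoglobin, where each β chain carries one substitution.

```python
# Chain lengths in residues: beta from Figure 19.12; alpha (141) is an assumed outside value.
CHAIN_LENGTH = {"alpha": 141, "beta": 146}
# Quaternary structure of hemoglobin: two alpha chains and two beta chains.
STOICHIOMETRY = {"alpha": 2, "beta": 2}

total_chains = sum(STOICHIOMETRY.values())
total_residues = sum(CHAIN_LENGTH[chain] * n for chain, n in STOICHIOMETRY.items())

# Sickle cell hemoglobin: one Glu -> Val change in each beta chain.
changed = 1 * STOICHIOMETRY["beta"]

print(total_chains)                       # 4
print(total_residues)                     # 574
print(f"{changed / total_residues:.2%}")  # 0.35%
```

The exact total of 574 residues is consistent with the estimate of "about 600" used in the discussion of sickle cell anemia.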
2. Amino acids
Amino acids are the monomers that make up proteins. Every amino acid has the same fundamental structure: a central carbon atom, called the alpha carbon (Cα), bonded to an amino group (–NH2), a carboxyl group (–COOH), and a hydrogen atom. These atoms make up the backbone of the amino acid. The Cα is also bonded to a fourth atom or group of atoms, known as the R group or side chain (Figure 19.17).
Figure 19.17. Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. The R group is considered the side chain, and all atoms not in the R group are part of the backbone. [Image Description]
Scientists use the name “amino acid” because these acids contain both an amino group and a carboxylic acid group in their basic structure. The 20 common amino acids make up most of the proteins in our bodies. For each amino acid, the side chain (or R group) is different (Figure 19.18). The chemical nature of the side chain determines the amino acid’s chemical properties, such as whether it is acidic, basic, polar, or hydrophobic. Each amino acid has both a single-letter code and a three-letter abbreviation. For example, valine is abbreviated with the single letter V or the three-letter symbol, Val.
Figure 19.18. The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter codes are also provided. Backbone atoms are indicated with a gray box. [Image Description]
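Because each amino acid has both a single-letter and a three-letter code, converting between them is a simple table lookup. The sketch below assumes only the 20 common amino acids (no ambiguity codes) and uses hypothetical function names.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse table: three-letter code -> single-letter code.
ONE_LETTER = {three: one for one, three in THREE_LETTER.items()}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence (e.g. 'VHL') to hyphenated three-letter form."""
    return "-".join(THREE_LETTER[aa] for aa in seq.upper())


def to_one_letter(seq: str) -> str:
    """Convert hyphenated three-letter form (e.g. 'Val-His-Leu') back to one-letter codes."""
    return "".join(ONE_LETTER[res.capitalize()] for res in seq.split("-"))


print(to_three_letter("VHLTPEE"))    # Val-His-Leu-Thr-Pro-Glu-Glu
print(to_one_letter("Val-His-Leu"))  # VHL
```

An unknown letter raises a `KeyError`, which is a reasonable default for a teaching example; a real tool would need to handle nonstandard or ambiguous residues explicitly.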
The sequence and the number of amino acids ultimately determine the protein’s shape, size, and function. A covalent bond forms when the amino group of one amino acid reacts with the carboxyl group of another in a dehydration reaction, releasing a water molecule. In vivo, this process takes place in the ribosome. The resulting bond is the peptide bond (Figure 19.19), which has partial double-bond character due to resonance in the amide group.
Figure 19.19. Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid’s amino group. In the process, a water molecule is released. [Image Description]
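The dehydration reaction implies simple bookkeeping: a chain of n amino acids contains n - 1 peptide bonds, and n - 1 water molecules were released to make it. The sketch below applies this to approximate molecular masses. The mass values are standard average masses of the free amino acids that do not appear in this section, and only five amino acids are included to keep the example short.

```python
# Approximate average masses (g/mol) of a few free amino acids, for illustration only.
FREE_MASS = {"G": 75.07, "A": 89.09, "V": 117.15, "E": 147.13, "H": 155.16}
WATER = 18.02  # g/mol; one water is released per peptide bond formed


def peptide_mass(seq: str) -> float:
    """Approximate mass of a linear peptide: free amino acids minus one water per peptide bond."""
    n_bonds = len(seq) - 1
    return sum(FREE_MASS[aa] for aa in seq) - n_bonds * WATER


print(round(peptide_mass("GA"), 2))   # 146.14
print(round(peptide_mass("GGG"), 2))  # 189.17
```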
Chains formed by such linkages are called peptides. As more amino acids join the growing chain, it becomes a polypeptide. Each polypeptide has a free amino group at one end, called the N terminus or amino terminus, and a free carboxyl group at the other end, called the C terminus or carboxyl terminus. The ribosome builds a polypeptide by adding amino acids from the N terminus toward the C terminus, and polypeptide sequences are likewise written from the N to the C terminus. Although the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a long polypeptide that is folded into its functional form.
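Because sequences are written from the N terminus to the C terminus, the first character of a one-letter sequence string is the N-terminal residue and the last is the C-terminal residue. The hypothetical helper below relies on that convention; reversing the string would describe a different peptide, not the same one read backward.

```python
def describe_chain(seq: str) -> dict:
    """Summarize a polypeptide written N terminus -> C terminus in one-letter codes."""
    return {
        "N_terminus": seq[0],           # residue with the free amino group
        "C_terminus": seq[-1],          # residue with the free carboxyl group
        "peptide_bonds": len(seq) - 1,  # one bond links each adjacent pair
    }


print(describe_chain("VHLTPEEK"))
# {'N_terminus': 'V', 'C_terminus': 'K', 'peptide_bonds': 7}
```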
Each of the 20 common amino acids has specific chemical characteristics and a distinct role in protein structure and function. Based on how readily their side chains contact water (a polar environment), amino acids can be classified into three main groups: 1) those with polar side chains, 2) those with hydrophobic side chains, and 3) those with charged side chains. Glycine and proline are considered separately as special cases. Below we look at each of these classes and briefly discuss their roles in protein structure and function.
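This classification can be applied to a sequence by mapping each one-letter code to a class. The sketch below follows the grouping shown in Figure 19.18, which places glycine and proline in a separate "special" group. Sources draw some boundaries differently (for example, for histidine, tyrosine, tryptophan, or methionine), so the table is one convention rather than a definitive assignment.

```python
from collections import Counter

# Grouping follows Figure 19.18. Histidine is listed as polar there,
# although its charge depends on pH.
GROUPS = {
    "polar": "STHNQY",
    "hydrophobic": "ACVILMFW",
    "charged": "RKDE",
    "special": "GP",
}
CLASS_OF = {aa: group for group, letters in GROUPS.items() for aa in letters}


def composition(seq: str) -> Counter:
    """Count how many residues of seq fall into each chemical class."""
    return Counter(CLASS_OF[aa] for aa in seq)


# First ten residues of the hemoglobin beta chain.
print(composition("VHLTPEEKSA"))
```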
Polar amino acids
When considering polarity, some amino acids are straightforward to classify as polar, while for others, sources may disagree. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar because they carry a hydroxyl (–OH) group (Figure 19.20). This group can form a hydrogen bond with another polar group, acting either as a hydrogen-bond donor or as an acceptor (a table showing hydrogen bond donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzyme active sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and can likewise donate or accept a hydrogen bond.
Histidine (His, H), on the other hand, can be either polar (neutral) or charged, depending on its environment and the pH. Its side chain is an imidazole ring containing two nitrogen atoms that can each carry a hydrogen (–NH), and the ring has a pKa of around 6. At pH values below 6, when both nitrogens are protonated, the side chain has a charge of +1. Within a protein, the local environment can shift the pKa, so that the side chain may donate a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites when a chemical reaction requires a proton to be removed (abstracted) or donated.
Figure 19.20. The polar amino acids. The single- and three-letter codes are provided, and backbone atoms are indicated with a gray box. [Image Description]
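How much of histidine's side chain carries the +1 charge at a given pH can be estimated with the Henderson-Hasselbalch relation, a standard result not stated in this section. The sketch below treats the side chain as a single ionizable group with the approximate pKa of 6 given above; as noted, neighboring groups in a real protein can shift that value.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Fraction of a single ionizable group that is protonated.

    From Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


HIS_PKA = 6.0  # approximate value from the text; the real value depends on the environment

for pH in (5.0, 6.0, 7.0):
    print(pH, round(fraction_protonated(pH, HIS_PKA), 3))
# 5.0 0.909
# 6.0 0.5
# 7.0 0.091
```

At pH 7, only about 9% of histidine side chains are protonated under this idealization, which is why histidine is usually grouped with the polar rather than the charged amino acids at neutral pH.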
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), cysteine (Cys, C), valine (Val, V), isoleucine (Ile, I), leucine (Leu, L), methionine (Met, M), phenylalanine (Phe, F), and tryptophan (Trp, W) (Figure 19.21); proline (Pro, P) is also hydrophobic but is discussed below with glycine as a special case. These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the tertiary structure of the protein. In addition, cysteine residues help stabilize the three-dimensional structure by forming disulfide (S-S) bridges between their sulfur atoms, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of cysteine is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic due to their ability to have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other, stabilized by π-π interactions. They are also highly conserved within protein families, with tryptophan having the highest conservation rate.
Figure 19.21. The hydrophobic amino acids. The single- and three-letter codes are provided, and backbone atoms are indicated with a gray box. [Image Description]
Charged amino acids
At neutral pH (around 7), each charged amino acid carries a single charge on its side chain. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge at neutral pH, and the two acidic residues, aspartate or aspartic acid (Asp, D) and glutamate or glutamic acid (Glu, E), carry a negative charge (Figure 19.22). A so-called salt bridge often forms when positively and negatively charged side chains are close to each other. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. The negatively charged carboxylate groups of aspartate and glutamate can also bind positively charged metal ions. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 19.22. The charged amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. [Image Description]
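Because each of the four charged amino acids carries a single charge at neutral pH, a rough net side-chain charge for a sequence is just a count. The sketch below is a hypothetical simplification: it uses the fixed charges stated in the text, and it ignores histidine and the charged N- and C-terminal groups.

```python
# Side-chain charges at neutral pH (about 7), as described in the text.
# Histidine and the chain termini are ignored in this simplified sketch.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}


def approx_side_chain_charge(seq: str) -> int:
    """Sum the +1/-1 side-chain charges of Lys, Arg, Asp, and Glu in seq."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)


print(approx_side_chain_charge("VHLTPEEK"))  # -1 (two Glu, one Lys)
```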
Glycine & proline
Glycine (Gly, G) has no true side chain (its R group is just a hydrogen atom) and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), where it gives the polypeptide chain high flexibility. This flexibility is required for sharp polypeptide turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also often found on the surface of proteins, presumably because it occurs in turn and loop regions. In contrast to glycine, proline makes the chain rigid by constraining the backbone torsion angles at its position. This is because its side chain is covalently bonded to the backbone nitrogen, forming a ring that restricts the shape of the polypeptide at that location. Proline is sometimes called a helix breaker because it is often found at the ends of α-helices (Figure 19.23).
Figure 19.23. The special amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. [Image Description]
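To see where these special residues fall in a sequence, one can simply list their positions. The hypothetical helper below does only that; it is not a structure predictor, since a glycine or proline does not by itself guarantee a turn, a loop, or the end of a helix.

```python
def flexibility_markers(seq: str) -> dict[str, list[int]]:
    """Return 1-based positions of glycine (flexible) and proline (rigid) residues."""
    return {
        "glycine": [i + 1 for i, aa in enumerate(seq) if aa == "G"],
        "proline": [i + 1 for i, aa in enumerate(seq) if aa == "P"],
    }


# First twenty residues of the hemoglobin beta chain.
print(flexibility_markers("VHLTPEEKSAVTALWGKVNV"))
# {'glycine': [16], 'proline': [5]}
```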
Figure Descriptions
Figure 19.13. The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure, divided into two vertical sections labeled “Normal” and “Sickle-Cell,” each with subsections for primary, secondary, tertiary, quaternary structures, and function. In the primary structure, the normal sequence shows seven circular molecules labeled sequentially from 1 to 7 with the amino acids Val, His, Leu, Thr, Pro, Glu, Glu, while the sickle-cell sequence is the same except the sixth molecule, Glu, is replaced with Val, highlighted in red. In the secondary and tertiary structures, normal hemoglobin is represented by a blue 3D ellipsoid shape for the β subunit, whereas sickle-cell hemoglobin has a reddish-brown 3D ellipsoid shape. In the quaternary structure, normal hemoglobin combines blue and purple ellipsoid shapes, while sickle-cell hemoglobin combines reddish-brown and purple ellipsoid shapes. In terms of function, normal hemoglobin is shown as individual, unassociated globular molecules capable of carrying oxygen, whereas sickle-cell hemoglobin is shown with abnormal aggregation into fibers, impairing oxygen-carrying capacity. [Return to Figure 19.13]
Figure 19.14. The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet, divided horizontally into two sections. In the top section, the alpha-helix is shown as a right-handed helical structure in orange, twisting in a clockwise direction, with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure. Hydrogen bonds are indicated by dashed lines connecting parts of the helix, with labels including “α Helix” and “Hydrogen Bond.” In the bottom section, the beta-pleated sheet consists of several aligned strands forming a pleated sheet structure in orange, with strands composed of colored spheres connected by lines. Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent ones, with labels including “β Pleated Sheet,” “β Strand,” and “Hydrogen Bond.” [Return to Figure 19.14]
Figure 19.15. The image depicts a simplified diagram of a polypeptide backbone, represented by a red, ribbon-like structure that loops and twists to show the complex folding of a protein, with various interactions and bonds labeled. The polypeptide backbone winds through the image, with an “Ionic Bond” shown between an NH₃⁺ group and an O⁻ group, a “Hydrogen Bond” highlighted in light blue between O-H groups, a “Disulfide linkage” marked by two sulfur atoms connected by a line (S-S), and “Hydrophobic interactions” illustrated by CH₃ groups interacting with each other. [Return to Figure 19.15]
Figure 19.16. The image illustrates the hierarchical structure of proteins from the primary to the quaternary level using hemoglobin as an example, set against a gradient blue background that transitions from dark at the top to light at the bottom. From left to right, the primary structure shows a sequence of four amino acids (labeled 1–4), each with an amino group (NH₂), carboxyl group (COOH), hydrogen atom (H), and side chain (R1–R4), connected by peptide bonds. The secondary structure depicts an α helix, represented by an orange spiraling ribbon stabilized by dotted hydrogen bonds. The tertiary structure presents a β-globin polypeptide chain folded into a specific three-dimensional purple form with loops and twists. The quaternary structure shows the assembly of multiple polypeptide chains, where β-globin (purple) combines with α-globin (yellow, green, and blue) to form a complete hemoglobin molecule. [Return to Figure 19.16]
Figure 19.17. The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled “Amino group,” “Side chain,” and “Carboxyl group.” The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an “R” group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled “α carbon.” [Return to Figure 19.17]
Figure 19.18. The image is an educational chart titled “20 Common Amino Acids,” divided into four main sections distinguished by background colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow). The Polar Uncharged section contains six amino acids—Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y)—each shown with its chemical structure and a red circle indicating its one-letter code. The Hydrophobic section contains eight amino acids—Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W)—each depicted with its chemical structure and one-letter code in a red circle. The Charged section is divided into Positive (Arginine (R), Lysine (K)) and Negative (Aspartic Acid (D), Glutamic Acid (E)) groups, each amino acid shown with its chemical structure and red-coded letter. The Special Cases section contains Glycine (G) and Proline (P), both displayed with their chemical structures and one-letter codes in red circles. [Return to Figure 19.18]
Figure 19.19. The top left structure shows an amino acid with an amino group (H₂N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH) with the hydroxyl group (OH) highlighted in red. The top right structure depicts another amino acid with the same core components but a different variable side chain (R). These two top structures are separated by space and connected by an arrow pointing to a single structure at the bottom. The bottom structure represents the resulting dipeptide, with a peptide bond (highlighted in a blue rectangle) linking the carbon (C) of one amino acid to the nitrogen (N) of the other. The term “Peptide Bond” appears below the blue rectangle. [Return to Figure 19.19]
Figure 19.20. The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure 19.20]
Figure 19.21. The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure 19.21]
Figure 19.22. The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters “R” and “K”. Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters “D” and “E”. [Return to Figure 19.22]
Figure 19.23. The image has a yellow background and is titled “Special Cases” at the top in black font, featuring two sections for the amino acids Glycine (Gly) and Proline (Pro). On the left, under the heading “Glycine (Gly)” in black text, a red circle contains a white uppercase letter “G,” with the structural formula of Glycine displayed below inside a beige rectangle, showing a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms. On the right, under the heading “Proline (Pro)” in black text, a red circle contains a white uppercase letter “P,” with the structural formula of Proline shown below in the same beige rectangle, depicting a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms. [Return to Figure 19.23]
Licenses and Attributions
“Protein Structure” by Michelle McCully is adapted from “3.4 Proteins” by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e used under CC BY 4.0 and “The 20 Amino Acids and Their Role in Protein Structures” by Salam Al-Karadaghi used under CC BY-NC-SA 4.0. “Protein Structure” is licensed under CC BY-NC-SA 4.0.
Media Attributions
- Sickle Cell Anemia-XY © Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University is licensed under a CC BY (Attribution) license
- 1C-J-3.1 Figure – secondary structure © Rao, A., Ryan, K., Fletcher, S., and Tag, A. Department of Biology, Texas A&M University is licensed under a CC BY (Attribution) license
- 1C-J-3.1 Figure – tertiary structure is licensed under a CC BY (Attribution) license
- 3.29 and 3.30 Protein Structure-ZZ © Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University is licensed under a CC BY (Attribution) license
- 1C-J-3.2 Figure – amino acid is licensed under a CC BY (Attribution) license
- 1C-J-3.2 Figure – 20 amino acids © Dan Cojocari adapted by Michelle McCully is licensed under a CC BY-SA (Attribution ShareAlike) license
- 1C-J-3.2-Figure-peptide-bond is licensed under a CC BY (Attribution) license
- 1C-J-3.2 Figure – polar amino acids © Dan Cojocari adapted by Michelle McCully is licensed under a CC BY-SA (Attribution ShareAlike) license
- 1C-J-3.2 Figure – hydrophobic amino acids © Dan Cojocari adapted by Michelle McCully is licensed under a CC BY-SA (Attribution ShareAlike) license
- 1C-J-3.2 Figure – charged amino acids © Dan Cojocari adapted by Michelle McCully is licensed under a CC BY-SA (Attribution ShareAlike) license
- 1C-J-3.2 – Figure – special amino acids © Dan Cojocari adapted by Michelle McCully is licensed under a CC BY-SA (Attribution ShareAlike) license