3.4 Protein Structure and Function
Melissa Hardy and Christelle Sabatier
Learning Objectives
By the end of this section, you will be able to do the following:
- Describe the functions proteins perform in the cell and in tissues
- Discuss the relationship between amino acids and proteins
Proteins are one of the most abundant organic molecules in living systems. About half of the dry weight of a cell is protein. Proteins have the most diverse range of functions of all the macromolecules. They are the main molecules that carry out the functions of the cell.
Proteins are so varied in their functions mainly because of their ability to bind other molecules. Various proteins can bind other proteins, DNA, RNA, lipids, carbohydrates, ions, or small molecules. Proteins are specific in their binding abilities, meaning that a given protein can bind only certain molecules. The generic name for a molecule that a protein binds is a ligand, and the place on the protein where it binds is the binding site.
A protein's ability to bind other molecules specifically depends on its shape.

Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, communication, or defense; or they may be toxins or enzymes. Each cell in a living system contains thousands of different proteins, each with a unique function. Their structures, like their functions, vary greatly, but they are all amino acid polymers arranged in a linear sequence.
Proteins are synthesized by ribosomes, which join amino acids together to form a polypeptide; the polypeptide then folds into its three-dimensional shape.
Amino Acids
Amino acids are the monomers that make up proteins. There are 20 common amino acids present in proteins. Each amino acid has the same fundamental structure: a central carbon atom, or alpha (α) carbon, bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. Every amino acid also has another atom or group of atoms bonded to the central carbon, known as the R group.
We use the name “amino acid” because they contain both an amino group and a carboxyl group (which is acidic) in their structure. For each of the 20 amino acids, the R group (or side chain) is different.
The chemical nature of the side chain determines the amino acid's properties (that is, whether it is acidic, basic, polar, or nonpolar). For example, the amino acid glycine has a hydrogen atom as its R group. Each amino acid is represented by a single uppercase letter or a three-letter abbreviation. For example, the letter V or the three-letter symbol Val represents valine.
Figure 2. There are 20 common amino acids found in proteins, each with a different R group. (Amino Acid Table by OpenStax is used under a Creative Commons Attribution license.)
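The one-letter and three-letter codes are easiest to see as a lookup table. The sketch below converts a one-letter sequence to three-letter form and looks up a side-chain class. The class assignments follow one common textbook grouping, and that grouping is an assumption: sources disagree on borderline residues such as glycine, cysteine, histidine, and tyrosine. The dictionary and function names are hypothetical.

```python
# One-letter -> three-letter codes for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Side-chain classes: one common textbook grouping (an assumption; other
# sources place glycine, cysteine, histidine, or tyrosine differently).
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("GAVLIMFWP", "nonpolar"),
    **dict.fromkeys("STCNQY", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}

def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence such as 'VHL' to 'Val-His-Leu'."""
    return "-".join(THREE_LETTER[aa] for aa in sequence.upper())

def side_chain_class(aa: str) -> str:
    """Return the class assigned to one amino acid in SIDE_CHAIN_CLASS."""
    return SIDE_CHAIN_CLASS[aa.upper()]

print(to_three_letter("VHLTPE"))                  # Val-His-Leu-Thr-Pro-Glu
print(side_chain_class("V"), side_chain_class("E"))  # nonpolar acidic
```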
The sequence and number of amino acids ultimately determine the protein's shape, size, and function. Each amino acid is attached to the next by a covalent bond called a peptide bond, which is formed by a dehydration reaction: the carboxyl group of one amino acid and the amino group of the incoming amino acid combine, releasing a water molecule.
Figure 3. Peptide bonds are formed by dehydration synthesis. The carboxyl group of one amino acid is linked to the amino group of a second amino acid. A water molecule is released when a peptide bond is formed. (Peptide Bond by OpenStax is used under a Creative Commons Attribution license.)
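Dehydration synthesis has a simple bookkeeping consequence. A linear chain of n amino acids has n − 1 peptide bonds, so n − 1 water molecules are released. The sketch below uses that rule to estimate the molar mass of a short peptide. The molar masses of free glycine, alanine, valine, and water are rounded standard reference values added for illustration; they do not come from this text. The function names are hypothetical.

```python
# Rounded average molar masses in g/mol (standard reference values, added here).
WATER = 18.015
FREE_AMINO_ACID = {"G": 75.07, "A": 89.09, "V": 117.15}

def peptide_bonds(n_residues: int) -> int:
    """A linear chain of n amino acids contains n - 1 peptide bonds."""
    return max(n_residues - 1, 0)

def peptide_mass(sequence: str) -> float:
    """Sum of free amino acid masses minus one water per peptide bond formed."""
    total = sum(FREE_AMINO_ACID[aa] for aa in sequence)
    return total - peptide_bonds(len(sequence)) * WATER

# Glycylglycine (Gly-Gly): two glycines joined by one peptide bond.
print(f"{peptide_mass('GG'):.1f} g/mol")  # about 132.1 g/mol
print(peptide_bonds(146))                 # bonds in a 146-residue chain
```

The same n − 1 rule applies to the 146-residue hemoglobin β chain discussed later in this chapter.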
The products that such linkages form are peptides. As more amino acids join the growing chain, the resulting chain is called a polypeptide. Each polypeptide has a free amino group at one end, called the N terminal or amino terminal, and a free carboxyl group at the other end, called the C terminal or carboxyl terminal. The terms polypeptide and protein are sometimes used interchangeably, but they differ technically. A polypeptide is a polymer of amino acids. A protein is one or more polypeptides that have combined, often with bound non-peptide prosthetic groups, and that have a distinct shape and a unique function.
Nine of the 20 common amino acids are essential amino acids in humans. We need them to build proteins, but the human body cannot produce them, so we must obtain them from our diet. Which amino acids are essential varies from organism to organism.
Media Attributions
- Protein shape © Opabinia regalis is licensed under a CC BY-SA (Attribution ShareAlike) license
- amino acid © OpenStax is licensed under a CC BY (Attribution) license
- peptide bond © OpenStax is licensed under a CC BY (Attribution) license
Learning Objectives
By the end of this section, you will be able to do the following:
- Explain the processes of digestion and absorption
- Compare and contrast different types of digestive systems
- Explain the specialized functions of the organs involved in processing food in the body
- Describe the ways in which organs work together to digest food and absorb nutrients
Animals obtain their nutrition from the consumption of other organisms. Depending on their diet, animals can be classified into the following categories: plant eaters (herbivores), meat eaters (carnivores), and those that eat both plants and animals (omnivores). Once consumed, nutrients and macromolecules present in food are not immediately accessible to the cells. A number of processes modify food within the animal body to make the nutrients and organic molecules usable for cellular function. As animals evolved in form and function, their digestive systems also evolved to accommodate their various dietary needs. Digestion involves more than the passage of material through the body. It is a complex system of breakdown and nutrient transfer that can either fuel the body or fail it.
Herbivores, Omnivores, and Carnivores
Herbivores are animals whose food sources are plant-based by dietary need and biology. Examples of herbivores, as shown in Figure X, include vertebrates like deer, koalas, and some bird species, as well as invertebrates such as crickets and caterpillars. Herbivorous animals have evolved digestive systems capable of handling large amounts of plant material. Herbivores can be further classified into frugivores (fruit eaters), granivores (seed eaters), nectarivores (nectar feeders), and folivores (leaf eaters).
Figure X Herbivores, like this (a) mule deer and (b) monarch caterpillar, eat primarily plant material. (credit a: modification of work by Bill Ebbesen; credit b: modification of work by Doug Bowman)
Carnivores are animals that eat other animals. The word carnivore is derived from Latin and literally means “meat eater.” Wild cats such as lions (shown in Figure Xa) and tigers are examples of vertebrate carnivores, as are snakes and sharks. Invertebrate carnivores include sea stars, spiders, and ladybugs (shown in Figure Xb). Obligate carnivores rely entirely on animal flesh to obtain their nutrients; examples are members of the cat family, such as lions and cheetahs. Facultative carnivores also eat non-animal food in addition to animal food. Note that there is no clear line separating facultative carnivores from omnivores. Dogs, for example, would be considered facultative carnivores.
Figure X Carnivores like the (a) lion eat primarily meat. The (b) ladybug is also a carnivore that consumes small insects called aphids. (credit a: modification of work by Kevin Pluck; credit b: modification of work by Jon Sullivan)
Omnivores are animals that eat both plant- and animal-derived food. The word omnivore comes from Latin and literally means “eats everything.” Humans, bears (shown in Figure Xa), and chickens are examples of vertebrate omnivores. Invertebrate omnivores include cockroaches and crayfish (shown in Figure Xb).
Figure X Omnivores like the (a) bear and (b) crayfish eat both plant- and animal-based food. (credit a: modification of work by Dave Menke; credit b: modification of work by Jon Sullivan)
Parts of the Digestive System
The vertebrate digestive system is designed to facilitate the transformation of food matter into the nutrient components that sustain organisms.
Oral Cavity
The oral cavity, or mouth, is the point of entry of food into the digestive system, illustrated in Figure X. The food consumed is broken into smaller particles by mastication, the chewing action of the teeth. Most mammals have teeth and can chew their food.
The extensive chemical process of digestion begins in the mouth. Some animals produce saliva to help break down food by moistening it and buffering its pH. There are three major glands that secrete saliva: the parotid, the submandibular, and the sublingual. Saliva contains immunoglobulins and lysozyme, which have antibacterial action and reduce tooth decay by inhibiting the growth of some bacteria. Saliva also contains an enzyme called salivary amylase, which begins converting starches in the food into a disaccharide called maltose.
Another enzyme that aids initial digestion, lingual lipase, is produced by cells in the tongue. Lipases are a class of enzymes that can break down triglycerides, so lingual lipase begins the breakdown of the fat components in food. The chewing and wetting action of the teeth and saliva shapes the food into a mass called the bolus, ready for swallowing. The tongue helps in swallowing by moving the bolus from the mouth into the pharynx. The pharynx opens to two passageways: the trachea, which leads to the lungs, and the esophagus, which leads to the stomach. The trachea has an opening called the glottis, which is covered by a cartilaginous flap called the epiglottis. During swallowing, the epiglottis closes over the glottis, so food passes into the esophagus and is kept out of the trachea.
Figure X Digestion of food begins in the (a) oral cavity. Food is masticated by teeth and moistened by saliva secreted from the (b) salivary glands. Enzymes in the saliva begin to digest starches and fats. With the help of the tongue, the resulting bolus is moved into the esophagus by swallowing. (credit: modification of work by the National Cancer Institute)
Esophagus
The esophagus is a tubular organ that connects the mouth to the stomach. The chewed and softened food passes through the esophagus after being swallowed. The smooth muscles of the esophagus undergo a series of wavelike movements called peristalsis that push the food toward the stomach, as illustrated in Figure X. The peristalsis wave is unidirectional—it moves food from the mouth to the stomach, and reverse movement is not possible. The peristaltic movement of the esophagus is an involuntary reflex; it takes place in response to the act of swallowing.
Figure X The esophagus transfers food from the mouth to the stomach through peristaltic movements.
Ring-like muscles called sphincters form valves in the digestive system. The gastro-esophageal sphincter is located at the stomach end of the esophagus. In response to swallowing and the pressure exerted by the bolus of food, this sphincter opens, and the bolus enters the stomach. When there is no swallowing action, this sphincter is shut and prevents the contents of the stomach from traveling up the esophagus. Many animals have a true sphincter here. Humans do not, but the human esophagus still remains closed when there is no swallowing action. Acid reflux, or “heartburn,” occurs when acidic digestive juices escape into the esophagus.
Stomach
A large part of digestion occurs in the stomach, shown in Figure X. The stomach is a saclike organ that secretes gastric digestive juices. The pH in the stomach is between 1.5 and 2.5. This highly acidic environment is required for the chemical breakdown of food and the extraction of nutrients. When empty, the stomach is a rather small organ; however, it can expand to up to 20 times its resting size when filled with food. This characteristic is particularly useful for animals that need to eat when food is available.
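To see how acidic a pH of 1.5 to 2.5 is, convert pH to hydrogen ion concentration. The standard definition pH = −log10[H+] is added here and is not stated in the text. It implies that each drop of one pH unit means ten times more H+. The function name below is hypothetical.

```python
def hydrogen_ion_molarity(ph: float) -> float:
    """Return [H+] in mol/L from pH, using the definition pH = -log10[H+]."""
    return 10 ** (-ph)

# The stomach pH range given in the text.
for ph in (1.5, 2.5):
    print(f"pH {ph}: [H+] is about {hydrogen_ion_molarity(ph):.4f} M")

# One pH unit apart means a tenfold difference in [H+].
print(hydrogen_ion_molarity(1.5) / hydrogen_ion_molarity(2.5))
```

So the most acidic end of the stomach's range carries roughly ten times the hydrogen ion concentration of the least acidic end.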
Visual Connection
Figure X The human stomach has an extremely acidic environment where most of the protein gets digested. (credit: modification of work by Mariana Ruiz Villareal)
Which of the following statements about the digestive system is false?
- Chyme is a mixture of food and digestive juices that is produced in the stomach.
- Food enters the large intestine before the small intestine.
- In the small intestine, chyme mixes with bile, which emulsifies fats.
- The stomach is separated from the small intestine by the pyloric sphincter.
The stomach is the major site of protein digestion in animals other than ruminants. In the stomach chamber, protein digestion is mediated by an enzyme called pepsin. Pepsin is secreted by the chief cells of the stomach in an inactive form called pepsinogen. Pepsin breaks peptide bonds and cleaves proteins into smaller polypeptides. It also helps activate more pepsinogen, starting a positive feedback mechanism that generates more pepsin. Another cell type, the parietal cells, secretes hydrogen and chloride ions, which combine in the lumen to form hydrochloric acid, the primary acidic component of the stomach juices. Hydrochloric acid helps convert the inactive pepsinogen to pepsin. The highly acidic environment also kills many microorganisms in the food and, combined with the action of pepsin, results in the hydrolysis of protein in the food. Chemical digestion is aided by the churning action of the stomach: the contraction and relaxation of smooth muscles mix the stomach contents about every 20 minutes. The mixture of partially digested food and gastric juice is called chyme. Chyme passes from the stomach to the small intestine, where further protein digestion takes place. Gastric emptying occurs within two to six hours after a meal, and only a small amount of chyme is released into the small intestine at a time. The movement of chyme from the stomach into the small intestine is regulated by the pyloric sphincter.
When digesting protein and some fats, the stomach lining must be protected from being digested by pepsin. The stomach lining protects itself in two ways: 1) by synthesizing pepsin in its inactive form, which protects the chief cells, and 2) by maintaining a thick mucus lining that protects the underlying tissue from digestive juices.
Rupture of the mucus lining can lead to the formation of stomach ulcers. Ulcers are open wounds in or on an organ. In the stomach, they often form when the bacterium Helicobacter pylori is present and the mucus lining is ruptured and fails to re-form.
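The positive feedback in pepsin activation can be illustrated with a toy model. In this sketch, pepsinogen is converted to pepsin at a rate with two parts: an acid-driven part and a pepsin-driven part. The second part grows as pepsin accumulates, and that growth is the feedback. The rate constants, time step, and starting amounts are hypothetical dimensionless values chosen for illustration, not physiological measurements. The model uses simple forward Euler integration.

```python
def simulate_activation(k_auto, k_acid=0.05, dt=0.01, steps=1000):
    """Toy model of pepsinogen -> pepsin conversion (forward Euler).

    Conversion rate = (k_acid + k_auto * pepsin) * pepsinogen
      k_acid: acid-driven activation (hypothetical constant)
      k_auto: pepsin activating more pepsinogen, i.e., the positive feedback
    Amounts are fractions of the starting pepsinogen pool.
    Returns the amount of pepsin after each step.
    """
    pepsinogen, pepsin = 1.0, 0.0
    history = [pepsin]
    for _ in range(steps):
        rate = (k_acid + k_auto * pepsin) * pepsinogen
        converted = min(rate * dt, pepsinogen)  # never convert more than remains
        pepsinogen -= converted
        pepsin += converted
        history.append(pepsin)
    return history

with_feedback = simulate_activation(k_auto=1.0)
acid_only = simulate_activation(k_auto=0.0)
print(f"with feedback: {with_feedback[-1]:.2f}, acid only: {acid_only[-1]:.2f}")
```

With these made-up constants, the acid-only run converts roughly 0.39 of the pool over the simulated time. The feedback run converts nearly all of it, because each new pepsin molecule speeds further activation.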
Small Intestine
Chyme moves from the stomach to the small intestine. The small intestine is the organ where the digestion of protein, fats, and carbohydrates is completed. The small intestine is a long tube-like organ with a highly folded surface containing finger-like projections called villi. The apical surface of each villus has many microscopic projections called microvilli. These structures, illustrated in Figure X, are lined with epithelial cells on the luminal side. These cells absorb nutrients from the digested food and pass them into the bloodstream on the other side. The villi and microvilli, with their many folds, increase the surface area of the intestine and the efficiency of nutrient absorption. Absorbed nutrients in the blood are carried into the hepatic portal vein, which leads to the liver. There, the liver regulates the distribution of nutrients to the rest of the body and removes toxic substances, including drugs, alcohol, and some pathogens.
Visual Connection
Figure X Villi are folds on the small intestine lining that increase the surface area to facilitate the absorption of nutrients.
Which of the following statements about the small intestine is false?
- Absorptive cells that line the small intestine have microvilli, small projections that increase surface area and aid in the absorption of food.
- The inside of the small intestine has many folds, called villi.
- Microvilli are lined with blood vessels as well as lymphatic vessels.
- The inside of the small intestine is called the lumen.
The human small intestine is over 6 m long and is divided into three parts: the duodenum, the jejunum, and the ileum. The C-shaped, fixed part of the small intestine is called the duodenum and is shown in Figure X. The duodenum is separated from the stomach by the pyloric sphincter, which opens to allow chyme to move from the stomach to the duodenum. In the duodenum, chyme is mixed with pancreatic juices. These juices form an alkaline solution rich in bicarbonate that neutralizes the acidity of chyme and acts as a buffer. Pancreatic juices also contain several digestive enzymes. Digestive juices from the pancreas, liver, and gallbladder, as well as from gland cells of the intestinal wall itself, enter the duodenum. Bile, which contains bile salts that emulsify lipids, is produced in the liver and stored and concentrated in the gallbladder.
The pancreas produces enzymes that catabolize starches, disaccharides, proteins, and fats. The digestive juices of bile and pancreatic enzymes break down the food particles in the chyme into glucose, fatty acids and monoglycerides, and amino acids. The bulk of chemical digestion of food takes place in the duodenum. Absorption of fatty acids also takes place in the duodenum.
The second part of the small intestine is the jejunum, shown in Figure X. Here, hydrolysis of nutrients continues while most of the carbohydrates and amino acids are absorbed through the intestinal lining. The bulk of nutrient absorption occurs in the jejunum.
The ileum, also illustrated in Figure X, is the last part of the small intestine. Here, bile salts and vitamins are absorbed into the bloodstream. Undigested food is sent from the ileum to the colon by peristaltic movements of the muscle. The ileum ends and the large intestine begins at the ileocecal valve. The vermiform (“worm-like”) appendix is located at the ileocecal valve. The human appendix secretes no enzymes and has only a minor role in immunity.
Large Intestine
The large intestine, illustrated in Figure X, reabsorbs water from the undigested food material and processes the waste material. The human large intestine is much shorter than the small intestine but larger in diameter. It has three parts: the cecum, the colon, and the rectum. The cecum joins the ileum to the colon and is the receiving pouch for the waste matter. The colon is home to many bacteria, called “intestinal flora” or “gut flora,” that aid in the digestive processes. The colon can be divided into four regions: the ascending colon, the transverse colon, the descending colon, and the sigmoid colon. The main functions of the colon are to extract water and mineral salts from undigested food and to store waste material. Because of their diet, carnivorous mammals have a shorter large intestine than herbivorous mammals.
Figure X The large intestine reabsorbs water from undigested food and stores waste material until it is eliminated.
Rectum and Anus
The rectum is the terminal end of the large intestine, as shown in Figure X. The primary role of the rectum is to store the feces until defecation. The feces are propelled by peristaltic movements during elimination. The anus is an opening at the far end of the digestive tract and is the exit point for the waste material. Two sphincters between the rectum and anus control elimination: the inner sphincter is involuntary, and the outer sphincter is voluntary.
Accessory Organs
The organs discussed above are the organs of the digestive tract, through which food passes. Accessory organs add secretions, such as enzymes and bile, that help catabolize food into nutrients. Accessory organs include the salivary glands, the liver, the pancreas, and the gallbladder. The liver, pancreas, and gallbladder are regulated by hormones in response to the food consumed.
The liver is the largest internal organ in humans and it plays a very important role in digestion of fats and detoxifying blood. The liver produces bile, a digestive juice that is required for the breakdown of fatty components of the food in the duodenum. The liver also processes the vitamins and fats and synthesizes many plasma proteins.
The pancreas is another important gland that secretes digestive juices. The chyme produced by the stomach is highly acidic. Pancreatic juices contain high levels of bicarbonate, an alkali that neutralizes the acidic chyme. They also contain a large variety of enzymes required for the digestion of proteins, carbohydrates, and fats.
The gallbladder is a small organ that aids the liver by storing bile and concentrating bile salts. When chyme containing fatty acids enters the duodenum, the bile is secreted from the gallbladder into the duodenum.
Food as Culture, Culture as Food - Consider This
Insects are high in protein, fiber, and vitamins. Recently, American food producers have begun to fund more research into insects as an environmentally friendly nutritional source for snack foods. Tyson Foods, an Arkansas-based company, and several startups across the country are developing insect-based high-protein products. The United States is relatively late to this market but is pursuing high-level investments in the opportunity to lower environmental waste.
Their abundance and high nutritional value have made invertebrates such as crickets and silkworms relatively common in the diets of some parts of Africa and Asia. For example, Thailand and Cambodia are known for their street food, in which crickets are a staple. Moreover, the larvae of Bombyx mori, the silkworm, are a delicacy in Korea, Japan, and China.
The prominence of silkworms and crickets in certain Asian cuisines aligns with their cultural significance. In Chinese culture, the constant chirping of crickets is thought to bring good luck, and silkworms are symbols of diligence and a dynamic spirit.
Learning Objectives
By the end of this section, you will be able to do the following:
- Explain the significance of photosynthesis to other living organisms
- Describe the main structures involved in photosynthesis
- Identify the substrates and products of photosynthesis
Photosynthesis is essential to all life on Earth; both plants and animals depend on it. It is the only biological process that can capture energy that originates from sunlight and convert it into chemical compounds (carbohydrates) that every organism uses to power its metabolism. It is also a source of the oxygen that many living organisms need. In brief, the energy of sunlight is “captured” to energize electrons, whose energy is then stored in the covalent bonds of sugar molecules. How long-lasting and stable are those covalent bonds? The energy extracted today by burning coal and petroleum products represents sunlight energy captured and stored by photosynthesis hundreds of millions of years ago. Much of it was stored during the Carboniferous Period (roughly 359 to 299 million years ago).
Plants, algae, and a group of bacteria called cyanobacteria are the only organisms capable of performing oxygenic (oxygen-producing) photosynthesis (Figure 8.2). Because they use light to manufacture their own food, they are called photoautotrophs (literally, “self-feeders using light”). Other organisms, such as animals, fungi, and most other bacteria, are termed heterotrophs (“other feeders”), because they must rely on the sugars produced by photosynthetic organisms for their energy needs. A third group of bacteria synthesizes sugars not by using sunlight's energy but by extracting energy from inorganic chemical compounds. For this reason, they are referred to as chemoautotrophs.

The importance of photosynthesis is not just that it can capture sunlight's energy. After all, a lizard sunning itself on a cold day can use the sun's energy to warm up in a process called behavioral thermoregulation. In contrast, photosynthesis is vital because it evolved as a way to convert the energy of solar radiation (the “photo-” part) into energy stored in the carbon-carbon bonds of carbohydrate molecules (the “-synthesis” part). Those carbohydrates are the energy source that heterotrophs use to power the synthesis of ATP via respiration. Therefore, photosynthesis powers 99 percent of Earth's ecosystems. When a top predator, such as a wolf, preys on a deer (Figure 8.3), the wolf is at the end of an energy path. That path runs from nuclear reactions in the core of the sun, to visible light, to photosynthesis, to vegetation, to deer, and finally to the wolf.

Main Structures and Summary of Photosynthesis
Photosynthesis is a multi-step process that requires specific wavelengths of visible sunlight, carbon dioxide (which is low in energy), and water as substrates (Figure 8.4). After the process is complete, it releases oxygen and produces glyceraldehyde-3-phosphate (G3P). G3P is a simple carbohydrate molecule (high in energy) that can then be converted into glucose, sucrose, or any of dozens of other sugar molecules. These sugar molecules contain energy and the energized carbon that all living things need to survive.

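The following is the chemical equation for photosynthesis (Figure 8.5):

$$6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} \xrightarrow{\text{light energy}} \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$$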
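A chemical equation must have the same number of each kind of atom on both sides. The sketch below checks this for the commonly written overall equation of photosynthesis, 6CO2 + 6H2O → C6H12O6 + 6O2, by counting atoms. The formula parser is a hypothetical helper that handles only simple formulas without parentheses or charges.

```python
import re
from collections import Counter

def count_atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    atoms = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(number) if number else 1
    return atoms

def side_total(side):
    """Total atoms on one side of an equation, given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in count_atoms(formula).items():
            total[element] += coefficient * n
    return total

reactants = [(6, "CO2"), (6, "H2O")]
products = [(1, "C6H12O6"), (6, "O2")]

print(dict(side_total(reactants)))  # {'C': 6, 'O': 18, 'H': 12}
print(side_total(reactants) == side_total(products))  # True
```

All 18 oxygen atoms on the left are accounted for on the right: 6 in glucose and 12 in the six O2 molecules.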

Basic Photosynthetic Structures
In plants, photosynthesis generally takes place in leaves, which consist of several layers of cells. The process of photosynthesis occurs in a middle layer called the mesophyll. The exchange of carbon dioxide and oxygen occurs through small, regulated openings called stomata (singular: stoma), which also play a role in regulating water balance. The stomata are typically located on the underside of the leaf, which helps to minimize water loss due to high temperatures on the upper surface of the leaf. Each stoma is flanked by guard cells that regulate the opening and closing of the stomata by swelling or shrinking in response to osmotic changes.
In all autotrophic eukaryotes, photosynthesis takes place inside an organelle called a chloroplast. In plants, chloroplast-containing cells exist mostly in the mesophyll. Chloroplasts have a double membrane envelope (composed of an outer membrane and an inner membrane) and are derived from ancient free-living cyanobacteria. Within the chloroplast are stacked, disc-shaped structures called thylakoids. Two components are embedded in the thylakoid membrane. One is chlorophyll, a pigment (a molecule that absorbs light) responsible for the initial interaction between light and plant material. The other is a set of numerous proteins that make up the electron transport chain. The thylakoid membrane encloses an internal space called the thylakoid lumen. As shown in Figure 8.6, a stack of thylakoids is called a granum, and the liquid-filled space surrounding the granum is called stroma or “bed” (not to be confused with stoma or “mouth,” an opening on the leaf epidermis).

The Two Parts of Photosynthesis
Photosynthesis takes place in two sequential stages: the light-dependent reactions and the light-independent reactions. In the light-dependent reactions, energy from sunlight is absorbed by chlorophyll and converted into stored chemical energy. In the light-independent reactions, the chemical energy harvested during the light-dependent reactions drives the assembly of sugar molecules from carbon dioxide. Therefore, although the light-independent reactions do not use light as a reactant, they require the products of the light-dependent reactions to function. In addition, several enzymes of the light-independent reactions are activated by light. The light-dependent reactions use certain molecules, referred to as energy carriers, to store energy temporarily. The energy carriers that move energy from the light-dependent reactions to the light-independent reactions can be thought of as “full” because they are rich in energy. After the energy is released, the “empty” energy carriers return to the light-dependent reactions to obtain more energy. Figure 8.7 illustrates the components inside the chloroplast where the light-dependent and light-independent reactions take place.

Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport or storage or as membrane components, or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by examining their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand a protein's shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the β chain of human hemoglobin can be found in UniProt, entry P68871. The N-terminal amino acid is valine (Val, V), and the C-terminal amino acid is histidine (His, H) (Figure 1). The amino acid sequence of the β chain is the same every time it is expressed, and no other protein has exactly this sequence of amino acids.
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Figure 1: Primary structure of human hemoglobin β chain. The β chain of human hemoglobin has 146 amino acids, all linked together in sequence with peptide bonds.
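Because a primary structure is just an ordered string of one-letter codes written from the N terminus to the C terminus, its basic properties can be read off directly. The sketch below copies the sequence from Figure 1 and reports its length, its terminal residues, and the number of peptide bonds. One caution: the UniProt record may include an initial methionine that is removed from the mature chain, so its residue numbering can be offset by one from the numbering used here. The helper names are hypothetical.

```python
# Mature human hemoglobin beta chain, copied from Figure 1 (N terminus -> C terminus).
HBB = ("VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV"
       "KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK"
       "EFTPPVQAAYQKVVAGVANALAHKYH")

def describe(sequence: str) -> dict:
    """Summarize a primary structure written N terminus to C terminus."""
    return {
        "length": len(sequence),
        "n_terminal": sequence[0],
        "c_terminal": sequence[-1],
        "peptide_bonds": len(sequence) - 1,
    }

def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position, as biologists number them."""
    return sequence[position - 1]

print(describe(HBB))
# {'length': 146, 'n_terminal': 'V', 'c_terminal': 'H', 'peptide_bonds': 145}
print(residue_at(HBB, 6))  # E (glutamate)
```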
The gene encoding the protein ultimately determines the unique sequence of amino acids for every protein. A change in nucleotide sequence in the gene's coding region may change the amino acid sequence. That change alters the protein's structure and therefore, sometimes, its function. In people who have sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution that changes the protein's structure and function. Specifically, at the sixth position in the primary sequence of the β chain, the wild type amino acid glutamate (Glu, E) is replaced by valine (Val, V). Remarkably, a hemoglobin molecule is composed of two α and two β chains that each consist of about 150 amino acids, so the full hemoglobin protein has about 600 amino acids. Sickle cell hemoglobin dramatically decreases life expectancy, yet it differs from normal hemoglobin in just two of those ~600 amino acids.
Figure 2: Structure and function of hemoglobin. A single change in the primary (amino acid) sequence of the β chain causes sickle cell hemoglobin proteins to form long fibers. These fibers distort normally disc-shaped red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In wild type hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University)
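The sickle cell change can be written as a point substitution in the primary sequence, usually labeled E6V. The sketch below applies it to the first ten residues of the β chain from Figure 1 and confirms that exactly one position differs. It also does the whole-molecule arithmetic, using the standard chain lengths of 141 residues for each α chain and 146 for each β chain. The α-chain length is added from standard references, where the text says only "about 150." The `substitute` helper is hypothetical.

```python
def substitute(sequence: str, position: int, expected: str, replacement: str) -> str:
    """Return a copy of sequence with one residue replaced (1-based position).

    Raises ValueError if the residue at that position is not the expected one,
    which guards against numbering mistakes.
    """
    found = sequence[position - 1]
    if found != expected:
        raise ValueError(f"position {position} is {found}, not {expected}")
    return sequence[:position - 1] + replacement + sequence[position:]

wild_type = "VHLTPEEKSA"                 # beta chain residues 1-10 (Figure 1)
sickle = substitute(wild_type, 6, "E", "V")
print(sickle)                            # VHLTPVEKSA

differences = [i + 1 for i, (a, b) in enumerate(zip(wild_type, sickle)) if a != b]
print(differences)                       # [6]

# Whole tetramer: two alpha chains (141 residues) and two beta chains (146 residues).
total = 2 * 141 + 2 * 146
changed = 2                              # one substitution in each beta chain
print(total, f"{100 * changed / total:.2f}%")  # 574 0.35%
```

Fewer than half a percent of the residues differ, yet the change is enough to alter how hemoglobin molecules pack together.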
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds. In α-helices, for example, hydrogen bonds form between the oxygen atom in the carbonyl group in one amino acid and hydrogen and nitrogen atoms in the amide group of another amino acid that is four amino acids away in the primary sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K., Fletcher, S., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
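The "four residues away" pattern in the α-helix can be enumerated directly. This minimal Python sketch assumes an idealized helix in which every residue takes part, numbered from 1 at the N terminus; real helices have irregular ends, so the hypothetical `helix_hbond_pairs` function only illustrates the counting rule.

```python
def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """Backbone hydrogen bonds in an idealized alpha helix.

    Each pair (i, j) means the carbonyl oxygen (C=O) of residue i accepts a
    hydrogen bond from the amide N-H of residue j = i + 4. Residues are
    numbered from 1 at the N terminus.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


pairs = helix_hbond_pairs(8)
print(pairs)  # [(1, 5), (2, 6), (3, 7), (4, 8)]
print(len(helix_hbond_pairs(20)))  # 16, i.e. n - 4
```

Under this idealization, a helical segment of n residues (n of at least 4) contains n - 4 backbone hydrogen bonds, and a segment shorter than five residues contains none.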
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure forms primarily through chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains involved determines which amino acids are energetically favorable to be near one another. For example, side chains with like charges repel each other, and those with opposite charges attract each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds among the interactions that form during protein folding. As the protein folds, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas the hydrophilic side chains tend to be positioned on the protein's surface, where they interact with water. In general, a given protein folds into the same tertiary structure each time it is translated, because that structure is determined by its primary structure.
Figure 4: A variety of chemical interactions determine a protein's three-dimensional (tertiary) structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some, but not all, proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists of α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure: a central carbon atom, the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms make up the backbone of the amino acid. Every amino acid also has another atom or group of atoms, known as the R group or side chain, bonded to the central Cα atom (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded; in every common amino acid except glycine, whose R group is a hydrogen atom, this carbon is asymmetric. The R group is the side chain, and all atoms not in the R group are part of the backbone. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxylic acid group in their basic structure. The 20 common amino acids make up most of the proteins in our bodies. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's chemical properties, such as whether it is acidic, basic, polar, or hydrophobic. Each amino acid has both a single-letter code and a three-letter abbreviation. For example, valine is abbreviated with the single letter V or the three-letter symbol, Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter codes are also provided. Backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. A covalent bond forms when the amino group of one amino acid reacts with the carboxyl group of another in a dehydration reaction, releasing a water molecule. In cells, this reaction takes place in the ribosome. The resulting bond is the peptide bond (Figure 8), which has partial double-bond character due to resonance in the amide group.
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, a water molecule is released. [Image Description]
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the carboxyl or C terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are likewise written from the N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a long polypeptide that is folded into its functional form.
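The dehydration rule implies simple bookkeeping: a chain of n amino acids has n - 1 peptide bonds and releases n - 1 water molecules, and it is read from the N terminus to the C terminus. A minimal Python sketch follows; the average masses of glycine, alanine, and valine are approximate values added for illustration and are not from the text, and `describe_peptide` is a hypothetical helper.

```python
# Approximate average masses (daltons) of three free amino acids. These values
# are illustrative assumptions, not taken from the text.
FREE_AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "V": 117.15}
WATER_MASS = 18.015  # one water molecule is released per peptide bond


def describe_peptide(sequence: str) -> dict:
    """Summarize a linear peptide written from the N terminus to the C terminus."""
    if not sequence:
        raise ValueError("sequence must contain at least one amino acid")
    peptide_bonds = len(sequence) - 1  # n residues are joined by n - 1 bonds
    mass = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence) - peptide_bonds * WATER_MASS
    return {
        "n_terminus": sequence[0],   # residue with the free amino group
        "c_terminus": sequence[-1],  # residue with the free carboxyl group
        "peptide_bonds": peptide_bonds,
        "water_released": peptide_bonds,
        "approx_mass_da": round(mass, 2),
    }


print(describe_peptide("GAV"))
```

Because sequences are written N to C, the string "VAG" would describe a different tripeptide from "GAV", with valine at the N terminus instead of glycine.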
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups: 1) those with polar side chains, 2) those with hydrophobic side chains, and 3) those with charged side chains. Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to classify as polar, while in other cases we may encounter disagreements. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar because they carry a hydroxyl (-OH) group (Figure 9). This group can form a hydrogen bond with another polar group by acting as a hydrogen bond donor or acceptor (a table showing hydrogen bond donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzyme active sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group, and their amide side chains can donate or accept hydrogen bonds.
Histidine (His, H), on the other hand, can be polar or carry a charge, depending on its environment and the pH. Its side chain is an imidazole ring with two nitrogen atoms and a pKa of about 6. At pH values below 6, both ring nitrogens carry a hydrogen (both are protonated), and the side chain has a charge of +1. Within proteins, the local environment can shift the pKa, so the side chain may donate a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites where the chemical reaction requires removing a proton (proton abstraction).
Figure 9: The polar amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
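Because histidine's side chain has a pKa near 6, small pH changes around that value shift the balance between its charged and neutral forms. The sketch below applies the Henderson-Hasselbalch relationship in Python; it assumes the free-solution pKa of about 6 quoted above, whereas inside a protein the local environment can move the pKa, which is why `pka` is a parameter of the hypothetical `fraction_protonated` function.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of histidine side chains in the protonated (+1) form.

    Henderson-Hasselbalch: [protonated] / [deprotonated] = 10 ** (pKa - pH),
    so the protonated fraction is 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


for ph in (5.0, 6.0, 7.0):
    print(f"pH {ph}: {fraction_protonated(ph):.2f} protonated")
```

In this simple model, about 91% of side chains are protonated at pH 5, half at pH 6, and only about 9% at pH 7, so a pKa shifted by a few tenths of a unit inside a protein noticeably changes histidine's behavior near neutral pH.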
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), cysteine (Cys, C), valine (Val, V), isoleucine (Ile, I), leucine (Leu, L), phenylalanine (Phe, F) and proline (Pro, P) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the tertiary structure of the protein. In addition, cysteine residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges between their sulfur atoms, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of cysteine is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic due to their ability to have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other, stabilized by π-π interactions. They are also highly conserved within protein families, with tryptophan having the highest conservation rate.
Figure 10: The hydrophobic amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
At neutral pH (around 7), the charged amino acids carry a single charge on the side chain. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), are positively charged at neutral pH, and the two acidic residues, aspartate or aspartic acid (Asp, D) and glutamate or glutamic acid (Glu, E), are negatively charged at neutral pH (Figure 11). A so-called salt bridge often forms when positively and negatively charged side chains are close to each other. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures as high as 80 to 90 °C or even higher. Another function of the negatively charged carboxyl groups of aspartate and glutamate is binding positively charged metal ions. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
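Using the charges listed above, a rough net side-chain charge at neutral pH can be tallied. This is a minimal Python sketch under simplifying assumptions: only Lys, Arg, Asp, and Glu are counted, histidine is treated as neutral at pH 7, and the terminal amino and carboxyl groups are ignored. The example fragments are the β chain residues drawn in Figure 2, and the function name is hypothetical.

```python
# Side-chain charges at neutral pH (around 7), as listed in the text.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}


def net_side_chain_charge(sequence: str) -> int:
    """Rough net charge contributed by Lys, Arg, Asp, and Glu side chains.

    Every other residue (including histidine, treated as neutral at pH 7)
    counts as 0, and the terminal amino and carboxyl groups are ignored.
    """
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence)


print(net_side_chain_charge("VHLTPEE"))  # -2 (normal beta chain fragment)
print(net_side_chain_charge("VHLTPVE"))  # -1 (sickle cell fragment)
```

Replacing glutamate with valine removes one negative charge from this stretch of each β chain, which is one way to see how a single substitution changes the chemistry of the protein surface.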
Glycine & proline
Glycine (Gly, G), one of the common amino acids, does not have a side chain (its R group is just a hydrogen atom) and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), where it gives the polypeptide chain high flexibility. This flexibility is required for sharp turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also often found on the surface of proteins, presumably because of its presence in turn and loop regions. In contrast to glycine, which makes the polypeptide chain highly flexible, proline provides rigidity by imposing certain torsion angles on that segment of the structure. This is because its side chain forms a covalent bond with the backbone nitrogen, which constrains the shape of the backbone at that location. Proline is sometimes called a helix breaker because it is often found at the ends of α-helices (Figure 12).
Figure 12: The special amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
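The classes described in this section can be encoded as a lookup table and used to summarize a sequence. This Python sketch follows the grouping drawn in Figure 7, which places proline with glycine as a special case and lists methionine and tryptophan as hydrophobic; as the text notes, the boundaries for residues such as histidine and tyrosine are debatable, so the table is an assumption rather than a fixed rule. The function names are hypothetical.

```python
from collections import Counter

# Side-chain classes as grouped in Figure 7 (an assumption for borderline cases).
GROUPS = {
    "polar": set("STHNQY"),
    "hydrophobic": set("ACVILMFW"),
    "charged": set("RKDE"),
    "special": set("GP"),
}


def classify(residue: str) -> str:
    """Return the class of a one-letter amino acid code."""
    for group, members in GROUPS.items():
        if residue in members:
            return group
    raise ValueError(f"unknown one-letter code: {residue!r}")


def composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each class."""
    return Counter(classify(residue) for residue in sequence)


print(composition("VHLTPEE"))  # normal beta chain fragment from Figure 2
print(composition("VHLTPVE"))  # sickle cell fragment
```

Comparing the two fragments shows the sickle cell substitution converting one charged residue into a hydrophobic one.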
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 5: The image illustrates the hierarchical structure of proteins from the primary structure to the quaternary structure, using hemoglobin as an example. The background is a gradient blue, transitioning from a darker blue at the top to a lighter blue at the bottom.
From left to right:
- Primary Structure: Depicts a sequence of amino acids connected via peptide bonds. Four amino acids are shown (labeled 1, 2, 3, and 4). Each amino acid consists of an amino group (NH2), carboxyl group (COOH), hydrogen atom (H), and side chain (R1, R2, R3, R4).
- Secondary Structure (α Helix): Shows the formation of an alpha helix from the amino acid chain. The helix is represented by an orange spiraling ribbon with dotted lines indicating hydrogen bonds stabilizing the structure.
- Tertiary Structure: Illustrates a β-globin polypeptide chain folded into a specific three-dimensional shape. It appears as a purple, looped, and twisted structure.
- Quaternary Structure: Demonstrates the assembly of multiple polypeptide chains. The β-globin (purple) and α-globin (yellow, green, and blue) polypeptides combine to form a hemoglobin molecule.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image illustrates the formation of a peptide bond between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids glycine (Gly) and proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of proline is also shown within the same beige rectangle. The proline structure shows a carbon atom bonded to a carboxyl group (COOH) and to a nitrogen atom that is part of a five-membered ring, along with hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC BY-NC-SA 4.0. "Protein Structure & Function" is licensed under CC BY-NC-SA 4.0.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport or storage, function in membranes, or act as toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by investigating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand how the protein gets its final shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the human hemoglobin α chain can be found in UniProt, entry P69905. Its N-terminal amino acid is methionine (Met, M), and its C-terminal amino acid is arginine (Arg, R) (Figure 1). The amino acid sequence of the hemoglobin α chain is the same every time it is expressed, and it is the only protein that has exactly this sequence of amino acids.
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Figure 1: Primary structure of human hemoglobin α chain. The α chain of human hemoglobin has 142 amino acids, all linked together in sequence with peptide bonds.
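The sequence in Figure 1 can be checked programmatically. The minimal Python sketch below stores it as a string written from the N terminus to the C terminus and confirms the length and terminal residues stated in the text; the variable names are hypothetical.

```python
# Human hemoglobin alpha chain (UniProt P69905) as shown in Figure 1,
# written from the N terminus to the C terminus.
HBA_HUMAN = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNA"
    "LSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
)

length = len(HBA_HUMAN)
n_terminal_residue = HBA_HUMAN[0]   # carries the free amino group
c_terminal_residue = HBA_HUMAN[-1]  # carries the free carboxyl group
peptide_bonds = length - 1          # consecutive residues are joined pairwise

print(length, n_terminal_residue, c_terminal_residue, peptide_bonds)  # 142 M R 141
```

Splitting the string across two adjacent literals is only a readability choice; Python joins them into one 142-character sequence.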
The gene encoding the protein ultimately determines the unique sequence of amino acids for every protein. A change in the nucleotide sequence of the gene’s coding region may cause a different amino acid to be added to the growing polypeptide chain, which can change the protein's structure and, therefore, sometimes its function. In sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution that changes the protein's structure and function. Specifically, the glutamate at position six of the β chain is replaced by valine. What is most remarkable is that a hemoglobin molecule is composed of two alpha and two beta chains that each consist of about 150 amino acids, so the molecule has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, which dramatically decreases life expectancy, is two amino acids (one in each β chain) out of about 600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary (amino acid) sequence of the beta chain of hemoglobin, hemoglobin molecules form long fibers that distort the biconcave, or disc-shaped, red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In normal hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds. In α-helices, hydrogen bonds form between the oxygen atom of the carbonyl group (C=O) in one amino acid and the hydrogen attached to the nitrogen of the amide group (N-H) in another amino acid that is four residues away in the sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K., Fletcher, S., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure is primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains involved determines which amino acids are energetically favorable to be next to other amino acids. For example, side chains with like charges repel each other, and those with unlike charges attract each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds among the interactions that form during protein folding. As the protein folds, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas the hydrophilic side chains tend to be positioned on the protein's surface, where they interact with water. In general, a given protein folds into the same tertiary structure each time it is translated, because that structure is determined by its primary structure.
Figure 4: A variety of chemical interactions determine a protein's three-dimensional (tertiary) structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists of α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure: a central carbon atom, the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms make up the backbone of the amino acid. Every amino acid also has another atom or group of atoms, known as the R group or side chain, bonded to the central Cα atom (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded; in every common amino acid except glycine, this carbon is asymmetric. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxylic acid group in their basic structure. As we mentioned, there are 20 common amino acids present in proteins. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's chemical properties (that is, whether it is acidic, basic, polar, or nonpolar). Each amino acid has both a single-letter and a three-letter abbreviation. For example, valine is abbreviated with the letter V or the three-letter symbol, Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter abbreviations are also provided. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. Amino acids are joined by covalent bonds that form through a dehydration reaction: the carboxyl group of one amino acid combines with the amino group of the incoming amino acid, releasing a water molecule. The resulting bond is the peptide bond (Figure 8).
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group, and a water molecule is released in the process. [Image Description]
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the C or carboxyl terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are written from N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups:
- Those with polar side chains.
- Those with hydrophobic side chains.
- Those with charged side chains.
Below we look at each of these classes and briefly discuss their role in protein structure and function.
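The grouping can be written as a lookup table, which makes it easy to summarize the composition of a sequence. A minimal Python sketch, assuming the four-group scheme of Figure 7 (polar uncharged, hydrophobic, charged, and the special cases Gly and Pro); other sources draw the boundaries differently, and the names `GROUPS`, `classify`, and `composition` are illustrative:

```python
from collections import Counter

# Grouping follows Figure 7 of this chapter; borderline residues such as
# His, Tyr, Trp, and Met are classified differently elsewhere.
GROUPS = {
    "polar": set("STHNQY"),
    "hydrophobic": set("ACVILMFW"),
    "charged": set("RKDE"),
    "special": set("GP"),
}


def classify(aa: str) -> str:
    """Return the group name for a one-letter amino acid code."""
    for group, members in GROUPS.items():
        if aa in members:
            return group
    raise ValueError(f"unknown amino acid code: {aa!r}")


def composition(seq: str) -> Counter:
    """Count how many residues of each group occur in a sequence."""
    return Counter(classify(aa) for aa in seq.upper())


print(composition("VHLTPEE"))
```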
Polar amino acids
When considering polarity, some amino acids are straightforward to classify as polar, while others are open to debate. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar because they carry a hydroxyl (-OH) group (Figure 9). This group can form a hydrogen bond with another polar group, acting as either a hydrogen-bond donor or acceptor (a table showing donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzyme active sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and can donate or accept hydrogen bonds.
Histidine (His, H), on the other hand, can be polar or charged depending on the environment and pH. Its side chain has two –NH groups and a pKa of around 6. Below pH 6, when both groups are protonated, the side chain carries a charge of +1. Within protein molecules, the pKa may be shifted by the local environment, so the side chain may give up a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites where the chemical reaction requires proton transfer.
Figure 9: The polar amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
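The effect of pH on the histidine side chain follows the Henderson–Hasselbalch relationship, pH = pKa + log10([neutral]/[protonated]). The Python sketch below assumes the approximate pKa of 6 given above (real values inside proteins shift with the environment) and computes the fraction of side chains carrying the +1 charge; `fraction_protonated` is a hypothetical helper name:

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of side chains in the protonated (+1) form.

    From Henderson-Hasselbalch: pH - pKa = log10([neutral] / [protonated]),
    so f_protonated = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


for ph in (5.0, 6.0, 7.4):
    print(f"pH {ph}: {fraction_protonated(ph):.2f} protonated")
```

Under these assumptions, about 91% of histidine side chains are protonated at pH 5, half at pH 6, and only about 4% at pH 7.4. A shifted pKa inside a protein changes these numbers, which is what lets histidine switch between forms in an active site.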
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F) and cysteine (Cys, C) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the structure. In addition, Cys residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of Cys is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic because they have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between the protein and the solvent. In addition, the side chains of histidine and tyrosine, together with the hydrophobic phenylalanine and tryptophan, can form weak hydrogen bonds of the OH−π and CH−O types, using the electron clouds of their ring structures. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other. They are also highly conserved within protein families, with Trp having the highest conservation rate.
Figure 10: The hydrophobic amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
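The differing preference of side chains for water can be quantified with a hydropathy scale. The Python sketch below uses the Kyte–Doolittle scale, a published index that is not part of this chapter and is swapped in here only for illustration; `mean_hydropathy` and `window_profile` are hypothetical helper names. A positive average suggests a hydrophobic stretch that may be buried in the core, and a negative average suggests a stretch that is more likely exposed to solvent:

```python
from typing import List

# Kyte-Doolittle hydropathy index (positive = hydrophobic, negative = hydrophilic).
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy over a whole sequence."""
    return sum(KD[aa] for aa in seq.upper()) / len(seq)


def window_profile(seq: str, window: int = 5) -> List[float]:
    """Mean hydropathy of each consecutive window of `window` residues."""
    return [mean_hydropathy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


print(round(mean_hydropathy("IVL"), 2))  # 4.17 (strongly hydrophobic)
print(round(mean_hydropathy("DEK"), 2))  # negative (hydrophilic)
```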
Charged amino acids
At physiological pH (around 7.4), the charged amino acids carry a single charge on their side chains. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge, and the two acidic residues, aspartate (Asp, D) and glutamate (Glu, E), carry a negative charge (Figure 11). A so-called salt bridge often forms when positively and negatively charged side chains lie close together. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. Another function of the negatively charged carboxyl groups of Asp and Glu is binding positively charged metal ions. The study of metalloproteins and the role of metal centers in protein function is a fascinating field of structural biology research.
Figure 11: The charged amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
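Because only four side chains are charged at pH 7.4, a first approximation of a peptide's side-chain charge is simply a count. This Python sketch is a deliberate simplification: it assumes Lys and Arg each contribute +1 and Asp and Glu each contribute −1, and it ignores histidine, the free N- and C-termini, and environment-induced pKa shifts. The function name `approximate_net_charge` is illustrative:

```python
def approximate_net_charge(seq: str) -> int:
    """Rough side-chain charge near pH 7.4: Lys/Arg count +1, Asp/Glu count -1.

    Ignores His (mostly neutral at pH 7.4), the free N- and C-termini,
    and any pKa shifts caused by the protein environment.
    """
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


print(approximate_net_charge("VHLTPEE"))  # -2 (normal beta-chain fragment)
print(approximate_net_charge("VHLTPVE"))  # -1 (sickle-cell beta-chain fragment)
```

Applied to the normal and sickle-cell β-chain fragments described for Figure 2, the count shows that the Glu-to-Val substitution removes one negative charge.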
Glycine & proline
Glycine (Gly, G), one of the common amino acids, has no side chain (its R group is just a hydrogen atom) and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), where it gives the polypeptide chain high flexibility. This flexibility is required for sharp turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also found at the surface, presumably because it occurs in turn and loop regions. In contrast to Gly, which makes the polypeptide chain highly flexible, Pro makes it rigid by imposing certain torsion angles on that segment of the structure. This is because its side chain forms a covalent bond with the backbone nitrogen, creating a ring that constrains the backbone shape at that position and leaves the nitrogen without the hydrogen it would otherwise contribute to a backbone hydrogen bond. For these reasons, Pro is sometimes called a helix breaker, and it is often found at the ends of α-helices.
Figure 12: The special amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: Peptide bond formation.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group that is part of a five-membered ring, and hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC-BY-SA 4.0. "Protein Structure & Function" is licensed under ???.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are among the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport or storage, or as membrane components; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by investigating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand how the protein gets its final shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the human hemoglobin α chain can be found in UniProt, entry P69905. Its N-terminal amino acid is methionine (Met, M), and its C-terminal amino acid is arginine (Arg, R) (Figure 1). The sequence of the hemoglobin α chain is the same every time it is expressed, and no other protein has exactly this sequence of amino acids.
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Figure 1: Primary structure of human hemoglobin α chain. The α chain of human hemoglobin has 142 amino acids, all linked together in sequence with peptide bonds.
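Because sequences are written from the N terminus to the C terminus, the first and last characters of the sequence string give the terminal residues. A short Python check using the α-chain sequence from Figure 1 (the variable name `HBA_HUMAN` is illustrative):

```python
# Human hemoglobin alpha chain as shown in Figure 1, written N terminus -> C terminus.
HBA_HUMAN = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS"
    "HGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFK"
    "LLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR"
)

print(len(HBA_HUMAN))               # 142 residues
print(HBA_HUMAN[0], HBA_HUMAN[-1])  # M R  (N-terminal Met, C-terminal Arg)
print(len(HBA_HUMAN) - 1)           # 141 peptide bonds in the linear chain
```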
The gene encoding a protein ultimately determines its unique sequence of amino acids. A change in the nucleotide sequence of the gene's coding region may lead to a different amino acid being added to the growing polypeptide chain, causing a change in protein structure and sometimes, therefore, in function. In sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution that changes the protein's structure and function: glutamate in the β chain is replaced by valine. Remarkably, a hemoglobin molecule is composed of two α and two β chains that each consist of about 150 amino acids, so the molecule has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, which dramatically decreases life expectancy, is just two of those 600 amino acids (one in each β chain).
Figure 2: Structure and function of hemoglobin. Because of a single change in the primary structure (amino acid sequence) of the β chain, sickle cell hemoglobin molecules form long fibers that distort the biconcave, or disc-shaped, red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In normal hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A. Ryan, K. and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
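Comparing a normal and a variant sequence position by position is a simple way to locate a substitution like the one in sickle cell anemia. A minimal Python sketch, assuming both sequences have the same length (no insertions or deletions) and using the seven-residue β-chain fragments described for Figure 2; `point_substitutions` is a hypothetical helper:

```python
from typing import List, Tuple


def point_substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (position, reference_residue, variant_residue) for every mismatch.

    Positions are 1-based, counted from the N terminus. Assumes both sequences
    are the same length (substitutions only, no insertions or deletions).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


normal_beta = "VHLTPEE"  # Val-His-Leu-Thr-Pro-Glu-Glu (normal, Figure 2)
sickle_beta = "VHLTPVE"  # Val-His-Leu-Thr-Pro-Val-Glu (sickle-cell, Figure 2)
print(point_substitutions(normal_beta, sickle_beta))  # [(6, 'E', 'V')]
```

Because each hemoglobin tetramer contains two β chains, this one change per chain amounts to two altered residues per molecule.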
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the protein's secondary structure. The most common secondary structures are the α-helix and the β-pleated sheet (Figure 3). Both are held in shape by hydrogen bonds between backbone atoms: the oxygen atom of the carbonyl group in one amino acid bonds to the hydrogen attached to the nitrogen of the amide group in another amino acid. In an α-helix, these partners are four amino acids apart in the sequence; in a β-pleated sheet, the hydrogen bonds link residues on adjacent strands, which may be far apart in the sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K. Fletcher, S. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
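For an idealized α-helix, the hydrogen-bonding pattern can be enumerated directly: the carbonyl oxygen of residue i pairs with the amide N–H of residue i + 4. This Python sketch assumes an ideal, uninterrupted helix numbered from 1 at the N terminus; it does not model β-sheets, whose hydrogen bonds connect residues on neighboring strands rather than at a fixed sequence spacing. `alpha_helix_hbonds` is a hypothetical name:

```python
from typing import List, Tuple


def alpha_helix_hbonds(n_residues: int) -> List[Tuple[int, int]]:
    """Backbone hydrogen-bond pairs (i, i + 4) in an ideal alpha-helix.

    The C=O of residue i accepts a hydrogen bond from the N-H of residue i + 4.
    Residues are numbered from 1 at the N terminus.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(alpha_helix_hbonds(8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```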
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure is primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains determines which amino acids are energetically favorable next to one another. For example, side chains with like charges repel each other, and those with unlike charges attract each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; this is the only covalent bond that forms during protein folding. During folding, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas hydrophilic side chains tend to be positioned on the surface of the protein, where they interact with water. In general, each time a protein is translated, it folds into the same tertiary structure, as determined by its primary structure.
Figure 4: A variety of chemical interactions determine the proteins' 3D, tertiary structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
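The pairwise rules described for tertiary structure can be turned into a toy classifier. This Python sketch is a rough illustration only: the residue sets are assumptions loosely following this chapter's grouping, and real interactions also depend on distance, geometry, pH, and the surrounding environment. `likely_interaction` is a hypothetical name:

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVILMFWC")
POLAR = set("STNQYH")
HBOND_CAPABLE = POLAR | POSITIVE | NEGATIVE


def likely_interaction(a: str, b: str) -> str:
    """Very rough guess at how two side chains in close contact might interact."""
    a, b = a.upper(), b.upper()
    if a == "C" and b == "C":
        return "disulfide linkage"
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "ionic bond"
    if (a in POSITIVE and b in POSITIVE) or (a in NEGATIVE and b in NEGATIVE):
        return "repulsion"
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return "hydrophobic interaction"
    if a in HBOND_CAPABLE and b in HBOND_CAPABLE:
        return "possible hydrogen bond"
    return "no specific interaction"


for pair in [("K", "E"), ("K", "R"), ("C", "C"), ("L", "V"), ("S", "N")]:
    print(pair, likely_interaction(*pair))
```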
Quaternary Structure
In nature, some proteins are formed from several polypeptides, or subunits, and the interaction of these subunits forms the protein's quaternary structure. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, each fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, instead has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure is entirely α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A. Ryan, K. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins. Each amino acid has the same fundamental structure, which consists of a central carbon atom, or alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms make up the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom, known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. [Image Description]
Scientists use the name "amino acid" because these molecules contain both an amino group and a carboxylic acid group in their basic structure. As we mentioned, there are 20 common amino acids present in proteins. Each amino acid has a different side chain (or R group) (Figure 7). The chemical nature of the side chain determines the amino acid's properties (that is, whether it is acidic, basic, polar, or nonpolar). Each amino acid has both a single-letter and a three-letter abbreviation. For example, valine is abbreviated with the letter V or the three-letter symbol Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter abbreviations are also provided. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
The sequence and number of amino acids ultimately determine the protein's shape, size, and function. Amino acids are joined by a covalent bond called a peptide bond, which forms in a dehydration reaction: the carboxyl group of one amino acid combines with the amino group of the incoming amino acid, releasing a molecule of water. The resulting bond is the peptide bond (Figure 8).
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group, releasing a water molecule. [Image Description]
The products of such linkages are called peptides. As more amino acids join the growing chain, the resulting chain is called a polypeptide. Each polypeptide has a free amino group at one end, called the N terminus or amino terminus, and a free carboxyl group at the other end, called the C terminus or carboxyl terminus. When the ribosome builds a polypeptide, it adds amino acids from the N terminus to the C terminus, and polypeptide sequences are likewise written from the N terminus to the C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein refers to a polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups:
- Those with polar side chains.
- Those with hydrophobic side chains.
- Those with charged side chains.
Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to classify as polar, while others are open to debate. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar because they carry a hydroxyl (-OH) group (Figure 9). This group can form a hydrogen bond with another polar group, acting as either a hydrogen-bond donor or acceptor (a table showing donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzyme active sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and can donate or accept hydrogen bonds.
Histidine (His, H), on the other hand, can be polar or charged depending on the environment and pH. Its side chain has two –NH groups and a pKa of around 6. Below pH 6, when both groups are protonated, the side chain carries a charge of +1. Within protein molecules, the pKa may be shifted by the local environment, so the side chain may give up a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites where the chemical reaction requires proton transfer.
Figure 9: The polar amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F) and cysteine (Cys, C) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the structure. In addition, Cys residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of Cys is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic because they have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between the protein and the solvent. In addition, the side chains of histidine and tyrosine, together with the hydrophobic phenylalanine and tryptophan, can form weak hydrogen bonds of the OH−π and CH−O types, using the electron clouds of their ring structures. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other. They are also highly conserved within protein families, with Trp having the highest conservation rate.
Figure 10: The hydrophobic amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
At physiological pH (around 7.4), the charged amino acids carry a single charge on their side chains. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge, and the two acidic residues, aspartate (Asp, D) and glutamate (Glu, E), carry a negative charge (Figure 11). A so-called salt bridge often forms when positively and negatively charged side chains lie close together. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. Another function of the negatively charged carboxyl groups of Asp and Glu is binding positively charged metal ions. The study of metalloproteins and the role of metal centers in protein function is a fascinating field of structural biology research.
Figure 11: The charged amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Glycine & proline
Glycine (Gly, G), one of the common amino acids, has no side chain (its R group is just a hydrogen atom) and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), where it gives the polypeptide chain high flexibility. This flexibility is required for sharp turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also found at the surface, presumably because it occurs in turn and loop regions. In contrast to Gly, which makes the polypeptide chain highly flexible, Pro makes it rigid by imposing certain torsion angles on that segment of the structure. This is because its side chain forms a covalent bond with the backbone nitrogen, creating a ring that constrains the backbone shape at that position and leaves the nitrogen without the hydrogen it would otherwise contribute to a backbone hydrogen bond. For these reasons, Pro is sometimes called a helix breaker, and it is often found at the ends of α-helices.
Figure 12: The special amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image illustrates peptide bond formation between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), to an amine group that is part of a five-membered ring, and to a single hydrogen atom.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC-BY-SA 4.0. "Protein Structure & Function" is licensed under ???.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport or storage, or as membrane components; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by interrogating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand a protein's shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence of a polypeptide chain is its primary structure. For example, the primary sequence of the β chain of human hemoglobin can be found in UniProt, entry P68871. (The UniProt entry lists 147 residues because it includes the initiator methionine, which is removed from the mature chain.) The N-terminal amino acid is valine (Val, V), and the C-terminal amino acid is histidine (His, H) (Figure 1). The amino acid sequence of the hemoglobin β chain is the same every time it is expressed, and no other protein has exactly this sequence of amino acids.
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Figure 1: Primary structure of human hemoglobin β chain. The β chain of human hemoglobin has 146 amino acids, all linked together in sequence with peptide bonds.
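The N-to-C convention and the 1-based numbering used for Figure 1 can be made concrete with a minimal Python sketch. It assumes the sequence is stored as a plain string of one-letter codes copied from Figure 1 (the mature chain, without the initiator methionine); the names `HBB_BETA` and `residue_at` are hypothetical, chosen only for this illustration.

```python
# Mature human hemoglobin beta chain from Figure 1, written N terminus -> C terminus.
HBB_BETA = (
    "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV"
    "KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK"
    "EFTPPVQAAYQKVVAGVANALAHKYH"
)


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position (position 1 is the N terminus)."""
    if not 1 <= position <= len(seq):
        raise IndexError(f"position {position} is outside 1..{len(seq)}")
    # Python strings are 0-based, so shift by one.
    return seq[position - 1]


print(len(HBB_BETA))                        # 146
print(residue_at(HBB_BETA, 1))              # V (valine, N terminus)
print(residue_at(HBB_BETA, len(HBB_BETA)))  # H (histidine, C terminus)
print(residue_at(HBB_BETA, 6))              # E (glutamate)
```

Keeping the 1-based convention inside a helper avoids the off-by-one errors that come from mixing biological numbering with Python's 0-based indexing.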
The gene encoding a protein ultimately determines that protein's unique sequence of amino acids. A change in the nucleotide sequence of the gene's coding region may change the amino acid sequence, which can alter the protein's structure and therefore, sometimes, its function. In people who have sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution that changes the protein's structure and function. Specifically, at the sixth position in the primary sequence of the β chain, the wild-type amino acid, glutamate (Glu, E), is replaced by valine (Val, V). What is most remarkable is that a hemoglobin molecule is composed of two α chains and two β chains, each consisting of about 150 amino acids, so the full hemoglobin protein has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, a difference that dramatically decreases life expectancy, is just two amino acids (one in each β chain) out of roughly 600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary (amino acid) sequence of the β chain of hemoglobin, hemoglobin proteins form long fibers that distort normally disc-shaped red blood cells and cause them to assume a crescent or “sickle” shape, which clogs blood vessels. In wild-type hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
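The sickle cell substitution can be sketched the same way. This minimal Python example works on the first seven β-chain residues shown in Figure 2 and uses a hypothetical helper, `substitute`, that checks the expected wild-type residue before replacing it. It also assumes the standard mature chain lengths of 141 residues for the α chain and 146 for the β chain, which puts the tetramer at 574 residues, consistent with the "about 600" estimate in the text.

```python
def substitute(seq: str, position: int, expected: str, new: str) -> str:
    """Replace the residue at a 1-based position after checking the wild-type residue."""
    found = seq[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at position {position}, found {found}")
    return seq[: position - 1] + new + seq[position:]


wild_type = "VHLTPEE"                     # first seven beta-chain residues (Figure 2)
sickle = substitute(wild_type, 6, "E", "V")
print(sickle)                             # VHLTPVE

# 1-based positions where the two chains differ.
diffs = [i + 1 for i, (a, b) in enumerate(zip(wild_type, sickle)) if a != b]
print(diffs)                              # [6]

# Assumed mature chain lengths for the alpha2-beta2 tetramer.
ALPHA_LEN, BETA_LEN = 141, 146
total = 2 * ALPHA_LEN + 2 * BETA_LEN
changed = 2 * len(diffs)                  # the substitution is present in both beta chains
print(total, changed)                     # 574 2
```

Checking the expected residue before substituting is a cheap guard against applying a mutation to the wrong position or the wrong sequence.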
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common secondary structures are the α-helix and the β-pleated sheet (Figure 3). Both are held in shape by hydrogen bonds between backbone atoms. In an α-helix, for example, the oxygen atom of the carbonyl group (C=O) of one amino acid forms a hydrogen bond with the hydrogen attached to the nitrogen of the amide group (N-H) of the amino acid four positions further along the primary sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K., Fletcher, S., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
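The α-helix hydrogen-bonding pattern (residue i to residue i + 4) can be written as a small rule. The Python sketch below is an idealized illustration that assumes a perfectly regular helix in which every backbone C=O of residue i bonds to the N-H of residue i + 4; real helices, especially near their ends, deviate from this. The function name `helix_hbond_pairs` is hypothetical.

```python
def helix_hbond_pairs(first: int, last: int, step: int = 4) -> list[tuple[int, int]]:
    """List (C=O residue, N-H residue) pairs for an idealized alpha-helical segment.

    `first` and `last` are 1-based residue numbers spanning the helix. The C=O of
    residue i is paired with the N-H of residue i + step (4 in an alpha helix).
    """
    return [(i, i + step) for i in range(first, last - step + 1)]


print(helix_hbond_pairs(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

A segment shorter than five residues returns no pairs, which reflects why very short stretches cannot form a stable α-helix on their own.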
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure forms primarily through chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains determines which amino acids can favorably sit near one another. For example, side chains with like charges repel each other, and those with opposite charges attract each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds that form between side chains during protein folding. As the protein folds, the hydrophobic side chains of nonpolar amino acids are excluded from the surrounding water and pack into the protein's interior, whereas the hydrophilic side chains tend to be positioned on the surface of the protein, where they interact with water. In general, each time a given protein is translated, it folds into the same tertiary structure, as determined by its primary structure.
Figure 4: A variety of chemical interactions determine a protein's three-dimensional, tertiary structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
Quaternary Structure
In nature, some, but not all, proteins are made of several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, by contrast has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists mainly of α helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure, which is the functional hemoglobin protein. Credit: Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure, which consists of a central carbon atom, the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms are considered the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom, known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central carbon (Cα), asymmetric in every common amino acid except glycine, to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. The R group is considered the side chain, and all atoms not in the R group are part of the backbone. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxylic acid group in their basic structure. The 20 common amino acids make up most of the proteins in our bodies. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's chemical properties, such as whether it is acidic, basic, polar, or hydrophobic. Each amino acid has both a single-letter code and a three-letter abbreviation. For example, valine is abbreviated with the single letter V or the three-letter symbol, Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter codes are also provided. Backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
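The grouping in Figure 7 can be expressed as a simple lookup table keyed by the single-letter code. This minimal Python sketch follows the figure's four categories, so methionine and tryptophan are listed as hydrophobic and glycine and proline as special cases; the dictionary `AMINO_ACIDS` and the `describe` helper are hypothetical names, and other sources group some residues differently.

```python
from collections import Counter

# One-letter code -> (three-letter code, name, group), following Figure 7.
AMINO_ACIDS = {
    "S": ("Ser", "serine", "polar uncharged"),
    "T": ("Thr", "threonine", "polar uncharged"),
    "H": ("His", "histidine", "polar uncharged"),
    "N": ("Asn", "asparagine", "polar uncharged"),
    "Q": ("Gln", "glutamine", "polar uncharged"),
    "Y": ("Tyr", "tyrosine", "polar uncharged"),
    "A": ("Ala", "alanine", "hydrophobic"),
    "C": ("Cys", "cysteine", "hydrophobic"),
    "V": ("Val", "valine", "hydrophobic"),
    "I": ("Ile", "isoleucine", "hydrophobic"),
    "L": ("Leu", "leucine", "hydrophobic"),
    "M": ("Met", "methionine", "hydrophobic"),
    "F": ("Phe", "phenylalanine", "hydrophobic"),
    "W": ("Trp", "tryptophan", "hydrophobic"),
    "R": ("Arg", "arginine", "charged (+)"),
    "K": ("Lys", "lysine", "charged (+)"),
    "D": ("Asp", "aspartate", "charged (-)"),
    "E": ("Glu", "glutamate", "charged (-)"),
    "G": ("Gly", "glycine", "special"),
    "P": ("Pro", "proline", "special"),
}

# How many amino acids fall into each group.
group_sizes = Counter(group for _, _, group in AMINO_ACIDS.values())


def describe(code: str) -> str:
    """Format the name, codes, and group for a single-letter code."""
    letter = code.upper()
    three, name, group = AMINO_ACIDS[letter]
    return f"{name} ({three}, {letter}): {group}"


print(describe("V"))  # valine (Val, V): hydrophobic
```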
The sequence and the number of amino acids ultimately determine the protein's shape, size, and function. A covalent bond forms when the amino group from one amino acid reacts with the carboxyl group of another in a dehydration reaction, releasing a water molecule. In vivo, this process takes place in the ribosome. The resulting bond is the peptide bond (Figure 8), which has partial double-bond character due to resonance in the amide group.
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, a water molecule is released. [Image Description]
The products that such linkages form are peptides. As more amino acids join this growing chain, the resulting chain is called a polypeptide. Each polypeptide has a free amino group at one end, called the N terminus or amino terminus, and a free carboxyl group at the other end, called the C terminus or carboxyl terminus. When a polypeptide is built by the ribosome, amino acids are added in the N-to-C direction, and polypeptide sequences are likewise written from the N terminus to the C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a long polypeptide that is folded into its functional form.
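A little bookkeeping follows directly from the dehydration reaction described above: a linear chain of n amino acids contains n - 1 peptide bonds, and forming them releases n - 1 water molecules. The Python sketch below uses hypothetical helper names and a lookup table limited to the residues in the example; it writes the first seven β-chain residues from Figure 2 in three-letter form, N terminus first, where each hyphen stands for one peptide bond.

```python
# Minimal one- to three-letter lookup covering only the residues used here.
THREE_LETTER = {"V": "Val", "H": "His", "L": "Leu", "T": "Thr", "P": "Pro", "E": "Glu"}


def to_three_letter(seq: str) -> str:
    """Write a one-letter sequence N -> C in three-letter form; each hyphen marks a peptide bond."""
    return "-".join(THREE_LETTER[aa] for aa in seq)


def peptide_bonds(n_residues: int) -> int:
    """A linear chain of n residues has n - 1 peptide bonds (and released n - 1 waters)."""
    return max(n_residues - 1, 0)


print(to_three_letter("VHLTPEE"))  # Val-His-Leu-Thr-Pro-Glu-Glu
print(peptide_bonds(7))            # 6
print(peptide_bonds(146))          # 145, for the full beta chain
```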
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups: 1) those with polar side chains, 2) those with hydrophobic side chains, and 3) those with charged side chains. Below we look at each of these classes and briefly discuss their role in protein structure and function.
Polar amino acids
When considering polarity, some amino acids are straightforward to classify as polar, while in other cases, we may encounter disagreements. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are clearly polar since they carry a hydroxyl (-OH) group (Figure 9). This group can form a hydrogen bond with another polar group, acting as either a hydrogen-bond donor or acceptor (a table showing hydrogen bond donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzymatic sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and can likewise donate or accept hydrogen bonds.
Histidine (His, H), on the other hand, can be polar or charged, depending on its environment and the pH. Its imidazole side chain has two ring nitrogens and a pKa of around 6. At pH values below 6, when both nitrogens are protonated, the side chain carries a charge of +1. Within protein molecules, the pKa may be shifted by the local environment, so the side chain can either donate a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites when the chemical reaction requires a proton to be removed from or delivered to another molecule.
Figure 9: The polar amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
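The statement that histidine's side chain is mostly charged below pH 6 and mostly neutral above it follows from the Henderson-Hasselbalch equation, which gives the protonated fraction as 1 / (1 + 10^(pH - pKa)). The minimal Python sketch below assumes the approximate pKa of 6 given in the text; inside a folded protein the real value can be shifted by the environment. The function name is hypothetical. At pH equal to the pKa, half of the side chains are protonated; at pH 7, only about one in eleven is.

```python
def fraction_protonated(pH: float, pKa: float = 6.0) -> float:
    """Fraction of histidine side chains in the protonated (+1) form.

    Henderson-Hasselbalch: protonated / total = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is the approximate value quoted in the text.
    """
    return 1.0 / (1.0 + 10 ** (pH - pKa))


for pH in (5.0, 6.0, 7.0, 7.4):
    print(f"pH {pH}: {fraction_protonated(pH):.2f} protonated")
```

Because the curve is steepest near the pKa, a histidine whose pKa sits close to the surrounding pH can switch between charged and neutral forms easily, which is the property the text links to proton transfer in active sites.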
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), cysteine (Cys, C), valine (Val, V), isoleucine (Ile, I), leucine (Leu, L), phenylalanine (Phe, F), and proline (Pro, P) (Figure 10); because of its special structural role, proline is shown with glycine in Figure 12 rather than in Figure 10. These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the tertiary structure of the protein. In addition, cysteine residues help stabilize three-dimensional structure by forming disulfide (S-S) bridges between their sulfur atoms, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of cysteine is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and tyrosine (Tyr, Y) and the non-aromatic methionine (Met, M) are sometimes called amphipathic because they have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between the protein and the solvent. Aromatic residues are also frequently found within the core of a protein structure, with their side chains packed against each other and stabilized by π-π interactions. They are also highly conserved within protein families, with tryptophan having the highest conservation rate.
Figure 10: The hydrophobic amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
At neutral pH (around 7), the charged amino acids carry a single charge on their side chains. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge at neutral pH, and the two acidic residues, aspartate or aspartic acid (Asp, D) and glutamate or glutamic acid (Glu, E), carry a negative charge at neutral pH (Figure 11). A so-called salt bridge often forms when positively and negatively charged side chains lie close together. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. Another function of the negatively charged carboxyl groups of aspartate and glutamate is binding positively charged metal ions. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
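Using the charge rules above, a rough side-chain charge count shows one consequence of the sickle cell substitution: replacing glutamate with valine removes a negative charge. This Python sketch is a deliberate simplification, assuming +1 for Lys and Arg, -1 for Asp and Glu, and ignoring histidine and the charged chain termini; the names are hypothetical, and the result is not a true net charge. It uses the first ten β-chain residues from Figure 1.

```python
# Approximate side-chain charges at neutral pH, per the text.
# Histidine and the N- and C-terminal groups are ignored in this simplified count.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}


def approx_charge(seq: str) -> int:
    """Sum approximate side-chain charges; residues not in the table count as 0."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)


wild_type = "VHLTPEEKSA"                       # first ten beta-chain residues (Figure 1)
sickle = wild_type[:5] + "V" + wild_type[6:]   # Glu at position 6 replaced by Val
print(sickle)                                  # VHLTPVEKSA
print(approx_charge(wild_type), approx_charge(sickle))  # -1 0
```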
Glycine & proline
Glycine (Gly, G), one of the common amino acids, does not have a side chain – its R group is just a hydrogen atom – and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), providing high flexibility to the polypeptide chain. This flexibility is required in sharp polypeptide turns in loop structures. Proline (Pro, P), although considered hydrophobic, is also often found on the surface of proteins, presumably due to its presence in turn and loop regions. In contrast to glycine, which provides the polypeptide chain high flexibility, proline provides rigidity by imposing certain torsion angles on the segment of the structure. The reason for this is that its side chain makes a covalent bond with the main chain, which constrains the backbone shape of the polypeptide in this location. Sometimes proline is called a helix breaker since it is often found at the end of α-helices (Figure 12).
Figure 12: The special amino acids. The single- and three-letter codes are provided and backbone atoms are indicated with a gray box. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 5: The image illustrates the hierarchical structure of proteins from the primary structure to the quaternary structure, using hemoglobin as an example. The background is a gradient blue, transitioning from a darker blue at the top to a lighter blue at the bottom.
From left to right:
- Primary Structure: Depicts a sequence of amino acids connected via peptide bonds. Four amino acids are shown (labeled 1, 2, 3, and 4). Each amino acid consists of an amino group (NH2), carboxyl group (COOH), hydrogen atom (H), and side chain (R1, R2, R3, R4).
- Secondary Structure (α Helix): Shows the formation of an alpha helix from the amino acid chain. The helix is represented by an orange spiraling ribbon with dotted lines indicating hydrogen bonds stabilizing the structure.
- Tertiary Structure: Illustrates a β-globin polypeptide chain folded into a specific three-dimensional shape. It appears as a purple, looped, and twisted structure.
- Quaternary Structure: Demonstrates the assembly of multiple polypeptide chains. The β-globin (purple) and α-globin (yellow, green, and blue) polypeptides combine to form a hemoglobin molecule.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image illustrates peptide bond formation between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC BY-NC-SA 4.0. "Protein Structure & Function" is licensed under CC BY-NC-SA 4.0.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict the functional effects of mutations in β-galactosidase
Proteins are one of the most abundant biological macromolecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective. They may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of proteins, each with a unique function. Their structures, like their functions, vary greatly, and by interrogating their structures, we can make predictions about their functions.
1. Protein structure
A protein's shape is critical to its function. For example, an enzyme can bind to a specific substrate at an active site. If this active site is altered because of local changes or changes in overall protein structure, the enzyme may be unable to bind to the substrate. To understand a protein's shape or conformation, we need to understand the four levels of protein structure: primary, secondary, tertiary, and quaternary.
Primary Structure
The amino acid sequence in a polypeptide chain is its primary structure. For example, the primary sequence of the β chain of human hemoglobin may be found on UniProt, entry P68871. The N-terminal amino acid is valine (Val, V), and the C-terminal amino acid is histidine (His, H) (Figure 1). The β chain has the same amino acid sequence every time it is expressed, and no other protein has exactly this sequence of amino acids.
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Figure 1: Primary structure of human hemoglobin β chain. The β chain of human hemoglobin has 146 amino acids, all linked together in sequence with peptide bonds.
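To make the primary-structure conventions concrete, here is a minimal Python sketch that stores the β chain sequence from Figure 1 as a string of one-letter codes (N terminus first) and checks the facts stated above. The helper `residue_at` is a hypothetical name introduced for illustration; it converts the 1-based residue numbering used in biology to Python's 0-based indexing.

```python
# Primary structure of the human hemoglobin beta chain, as printed in Figure 1
# (one-letter codes, written from the N terminus to the C terminus).
HBB = (
    "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKV"
    "KAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGK"
    "EFTPPVQAAYQKVVAGVANALAHKYH"
)

def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position (residues are numbered from 1)."""
    return seq[position - 1]

print(len(HBB))            # 146 residues
print(HBB[0], HBB[-1])     # V H  (N-terminal Val, C-terminal His)
print(residue_at(HBB, 6))  # E  (Glu6, the site changed in sickle cell hemoglobin)
```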
The gene encoding a protein ultimately determines that protein's unique sequence of amino acids. A change in the nucleotide sequence of the gene's coding region may change the amino acid sequence, which can alter the protein's structure and therefore, sometimes, its function. In people who have sickle cell anemia, the hemoglobin β chain (a small portion of which is shown in Figure 2) has a single amino acid substitution that changes the protein's structure and function. Specifically, at the sixth position in the primary sequence of the β chain, the wild type amino acid, glutamate (Glu, E), is replaced by valine (Val, V). What is most remarkable is that a hemoglobin molecule is composed of two α and two β chains, each consisting of about 150 amino acids. The full hemoglobin protein therefore has about 600 amino acids. The structural difference between a normal hemoglobin molecule and a sickle cell molecule, a difference that dramatically decreases life expectancy, is just two amino acids (one in each β chain) out of the ~600.
Figure 2: Structure and function of hemoglobin. Because of one change in the primary (amino acid) sequence of the β chain of hemoglobin, sickle cell hemoglobin proteins form long fibers that distort normally disc-shaped red blood cells and cause them to assume a crescent or "sickle" shape, which clogs blood vessels. In wild type hemoglobin, the amino acid at position six is glutamate, but in sickle cell hemoglobin, it is valine. (Credit: Rao, A., Tag, A., Ryan, K., and Fletcher, S. Department of Biology, Texas A&M University) [Image Description]
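The sickle cell change can be sketched the same way. The snippet below is illustrative only: `substitute` and `differences` are hypothetical helpers, and it works on the seven-residue fragment shown in Figure 2 rather than on the full chain.

```python
def substitute(seq: str, position: int, expected: str, new: str) -> str:
    """Apply a single amino acid substitution at a 1-based position.

    `expected` guards against numbering mistakes: the residue currently at
    `position` must match it, otherwise a ValueError is raised.
    """
    if seq[position - 1] != expected:
        raise ValueError(f"residue {position} is {seq[position - 1]}, not {expected}")
    return seq[: position - 1] + new + seq[position:]

def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) wherever two sequences differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

normal = "VHLTPEE"                        # first seven residues of the beta chain (Figure 2)
sickle = substitute(normal, 6, "E", "V")  # Glu6 -> Val
print(sickle)                             # VHLTPVE
print(differences(normal, sickle))        # [(6, 'E', 'V')]
```

Because each hemoglobin tetramer contains two β chains, the same Glu6 to Val change appears twice, which gives the two differing amino acids out of roughly 600 described above.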
Secondary Structure
The local folding of the polypeptide in some regions gives rise to the secondary structure of the protein. The most common are the α-helix and β-pleated sheet structures (Figure 3). Both structures are held in shape by backbone hydrogen bonds. In α-helices, for example, hydrogen bonds form between the oxygen atom in the carbonyl group in one amino acid and hydrogen and nitrogen atoms in the amide group of another amino acid that is four amino acids away in the primary sequence.
Figure 3: The α-helix and β-pleated sheet are secondary structures formed in proteins. These structures occur when hydrogen bonds form between the carbonyl oxygen and the amino hydrogen and nitrogen in the peptide backbone of two amino acids in a protein. Black = carbon, White = hydrogen, Blue = nitrogen, and Red = oxygen. Credit: Rao, A., Ryan, K. Fletcher, S. and Tag, A. Department of Biology, Texas A&M University. [Image Description]
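The i to i + 4 pattern of α-helix hydrogen bonds can be enumerated directly. This sketch assumes an idealized helix in which every residue's carbonyl oxygen pairs with the amide N-H four residues later; real helices fray at their ends, and the function name `helix_hbonds` is hypothetical.

```python
def helix_hbonds(first: int, last: int) -> list[tuple[int, int]]:
    """Backbone hydrogen bonds in an idealized alpha helix spanning residues first..last.

    Each pair is (i, i + 4): the C=O oxygen of residue i accepts a hydrogen
    bond from the N-H of residue i + 4, four residues along the primary sequence.
    """
    return [(i, i + 4) for i in range(first, last - 3)]

print(helix_hbonds(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

In this idealized picture, a helix of n residues has n - 4 such backbone hydrogen bonds.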
Tertiary Structure
The polypeptide's unique three-dimensional structure is its tertiary structure (Figure 4). This structure forms primarily due to chemical interactions between the side chains of amino acids in the polypeptide chain. The chemical nature of the side chains involved determines which amino acids are energetically favorable to be near one another. For example, side chains with like charges repel each other, and those with opposite charges attract each other (ionic bonds). The sulfur atoms in cysteine side chains can form disulfide linkages in the presence of oxygen; these are the only covalent bonds that form during protein folding. As the protein folds, the hydrophobic side chains of nonpolar amino acids avoid contact with water and pack into the protein's interior, whereas hydrophilic side chains tend to be positioned on the protein's surface, where they interact with water. In general, a given protein folds into the same tertiary structure each time it is translated, because that structure is determined by its primary structure.
Figure 4: A variety of chemical interactions determine the proteins' 3D, tertiary structure. These include hydrophobic interactions, ionic bonding, hydrogen bonding, and disulfide linkages. [Image Description]
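As a rough illustration of how side-chain chemistry decides which residue pairs attract, the toy function below names the interaction this section would predict for two side chains brought close together. It is a teaching heuristic, not a folding model: it ignores geometry and distance, the name `likely_interaction` is hypothetical, and it borrows the side-chain classes of Figure 7 (where Cys counts as hydrophobic, so the disulfide check runs first).

```python
# Side-chain classes following Figure 7 (G and P are left unclassified here).
POSITIVE = {"K", "R"}
NEGATIVE = {"D", "E"}
HYDROPHOBIC = set("ACVILMFW")
POLAR = set("STHNQY")

def likely_interaction(a: str, b: str) -> str:
    """Name the side-chain interaction suggested by two one-letter residue codes."""
    a, b = a.upper(), b.upper()
    if a == "C" and b == "C":
        return "disulfide linkage"        # covalent S-S bond between two cysteines
    if (a in POSITIVE and b in NEGATIVE) or (a in NEGATIVE and b in POSITIVE):
        return "ionic bond"               # opposite charges attract
    if {a, b} <= POSITIVE or {a, b} <= NEGATIVE:
        return "repulsion"                # like charges repel
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return "hydrophobic interaction"  # nonpolar side chains pack in the core
    hydrophilic = POLAR | POSITIVE | NEGATIVE
    if a in hydrophilic and b in hydrophilic:
        return "hydrogen bond"            # polar or charged groups can share hydrogens
    return "no strong interaction"

for pair in [("C", "C"), ("K", "E"), ("D", "E"), ("L", "F"), ("S", "N")]:
    print(pair, likely_interaction(*pair))
```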
Quaternary Structure
In nature, some – but not all – proteins form from several polypeptides, or subunits, and the interaction of these subunits forms the quaternary structure of the protein. Weak interactions between the subunits help to stabilize the overall structure. For example, the α and β chains of human hemoglobin, a globular protein, each fold into their tertiary structures, and then two copies of the α chain interact with two copies of the β chain to form a tetramer of four chains (Figure 5). Silk, a fibrous protein, however, has a β-pleated sheet structure that results from hydrogen bonding between many different chains.
Figure 5: Primary, secondary, tertiary, and quaternary structure of hemoglobin. The primary structure of hemoglobin is its amino acid sequence. Its secondary structure consists entirely of α-helices. Its tertiary structure is globular. Four protein chains come together to form the quaternary structure that is the functional hemoglobin protein. Credit: Rao, A., Ryan, K., and Tag, A. Department of Biology, Texas A&M University. [Image Description]
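The quaternary arithmetic for hemoglobin can be checked with a short script. Assumption: the α chain length of 141 residues is a standard reference value that this chapter does not state; the β chain length of 146 comes from Figure 1. The helper name `total_residues` is hypothetical.

```python
# Chain lengths in residues. Beta: 146 (Figure 1). Alpha: 141, a standard
# reference value assumed here (not stated in this chapter).
CHAIN_LENGTH = {"alpha": 141, "beta": 146}

# Adult human hemoglobin is a tetramer of two alpha and two beta chains.
STOICHIOMETRY = {"alpha": 2, "beta": 2}

def total_residues(stoichiometry: dict[str, int], lengths: dict[str, int]) -> int:
    """Sum residues over all copies of every subunit in a multi-chain protein."""
    return sum(copies * lengths[chain] for chain, copies in stoichiometry.items())

n_chains = sum(STOICHIOMETRY.values())
n_residues = total_residues(STOICHIOMETRY, CHAIN_LENGTH)
print(n_chains, n_residues)  # 4 574
# One substituted residue per beta chain in sickle cell hemoglobin:
print(f"{STOICHIOMETRY['beta']} of {n_residues} residues differ")
```

The exact total of 574 residues is consistent with the estimate of about 600 given earlier.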
2. Amino acids
Amino acids are the monomers that make up proteins, which are polymers. Each amino acid has the same fundamental structure, which consists of a central carbon atom, or the alpha carbon (Cα), bonded to an amino group (NH2), a carboxyl group (COOH), and a hydrogen atom. These atoms are considered the backbone of the amino acid. Every amino acid also has another atom or group of atoms bonded to the central Cα atom, known as the R group or side chain (Figure 6).
Figure 6: Structure of an amino acid. Amino acids have a central asymmetric carbon (Cα) to which an amino group, a carboxyl group, a hydrogen atom, and a side chain (R group) are covalently bonded. [Image Description]
Scientists use the name "amino acid" because these acids contain both an amino group and a carboxyl (carboxylic acid) group in their basic structure. There are 20 common amino acids present in proteins. For each amino acid, the side chain (or R group) is different (Figure 7). The chemical nature of the side chain determines the amino acid's nature (that is, whether it is acidic, basic, polar, or nonpolar). Each amino acid has both a single-letter and a three-letter abbreviation. For example, valine is abbreviated with the letter V or the three-letter symbol Val.
Figure 7: The 20 common amino acids. The chemical structure for each amino acid is given, grouped by chemical property. The single- and three-letter abbreviations are also provided. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
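Converting between the single-letter and three-letter abbreviations is a simple lookup. A minimal sketch, using a hypothetical helper name `to_three_letter`:

```python
# One-letter -> three-letter abbreviations for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

def to_three_letter(seq: str) -> str:
    """Rewrite a one-letter sequence (N -> C) in hyphenated three-letter form."""
    return "-".join(THREE_LETTER[aa] for aa in seq.upper())

print(to_three_letter("VHLTPEE"))  # Val-His-Leu-Thr-Pro-Glu-Glu
```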
The sequence and number of amino acids ultimately determine the protein's shape, size, and function. Each amino acid is attached to the next by a covalent bond, known as a peptide bond, which forms through a dehydration reaction. The carboxyl group of one amino acid and the amino group of the incoming amino acid combine, releasing a water molecule. The resulting bond is the peptide bond (Figure 8).
Figure 8: Peptide bond formation. The carboxyl group of one amino acid is linked to the incoming amino acid's amino group. In the process, it releases a water molecule. [Image Description]
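Because each peptide bond releases one water molecule, a linear chain of n amino acids has n - 1 peptide bonds and has lost n - 1 water molecules. The sketch below applies that rule to estimate a peptide's mass. Assumption: the average masses in `FREE_MASS` are standard reference values for a handful of amino acids, not data from this chapter, and `peptide_bonds` and `peptide_mass` are hypothetical helpers.

```python
# Average molecular masses (g/mol) of a few free amino acids; standard
# reference values assumed here, not taken from this chapter.
FREE_MASS = {"G": 75.07, "A": 89.09, "S": 105.09, "V": 117.15, "L": 131.17}
WATER = 18.02  # g/mol, released once per peptide bond (dehydration reaction)

def peptide_bonds(n_residues: int) -> int:
    """A linear chain of n amino acids contains n - 1 peptide bonds."""
    return max(n_residues - 1, 0)

def peptide_mass(seq: str) -> float:
    """Approximate mass: sum of the free amino acids minus one water per peptide bond."""
    return sum(FREE_MASS[aa] for aa in seq) - WATER * peptide_bonds(len(seq))

print(peptide_bonds(146))            # 145 bonds in the 146-residue hemoglobin beta chain
print(round(peptide_mass("GA"), 2))  # 146.14 for the dipeptide Gly-Ala
```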
The products that such linkages form are peptides. As more amino acids join to this growing chain, the resulting chain is a polypeptide. Each polypeptide has a free amino group at one end. This end is called the N terminus, or the amino terminus, and the other end has a free carboxyl group, also called the C or carboxyl terminus. When a polypeptide is built by the ribosome, amino acids are added from the N terminus to the C terminus. When polypeptide sequences are written out, they are written from N to C terminus. While the terms polypeptide and protein are sometimes used interchangeably, a polypeptide is technically a polymer of amino acids, whereas the term protein is used for a polypeptide that is folded into its functional form.
Each of the 20 most common amino acids has specific chemical characteristics and a unique role in protein structure and function. Based on the propensity of the side chains to be in contact with water (polar environment), amino acids can be classified into three groups: 1) those with polar side chains, 2) those with hydrophobic side chains, and 3) those with charged side chains. Below we look at each of these classes and briefly discuss their role in protein structure and function.
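One way to apply this classification is to tally the side-chain classes in a sequence. This sketch follows the grouping of Figure 7, which adds a separate "special" class for Gly and Pro (the hydrophobic paragraph below also counts Pro as hydrophobic); the names `GROUPS`, `CLASS_OF`, and `composition` are hypothetical.

```python
from collections import Counter

# Side-chain classes as grouped in Figure 7.
GROUPS = {
    "polar": "STHNQY",
    "hydrophobic": "ACVILMFW",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
# Invert to a per-residue lookup: one-letter code -> class name.
CLASS_OF = {aa: group for group, members in GROUPS.items() for aa in members}

def composition(seq: str) -> Counter:
    """Count how many residues of each side-chain class a sequence contains."""
    return Counter(CLASS_OF[aa] for aa in seq.upper())

print(composition("VHLTPEE"))  # first seven residues of the hemoglobin beta chain
```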
Polar amino acids
When considering polarity, some amino acids are straightforward to define as polar, while in other cases we may encounter disagreements. For example, serine (Ser, S), threonine (Thr, T), and tyrosine (Tyr, Y) are polar since they carry a hydroxyl (-OH) group (Figure 9). Furthermore, this group can form a hydrogen bond with another polar group by donating or accepting a proton (a table showing donors and acceptors in polar and charged amino acid side chains can be found at the FoldIt site). Tyrosine is also involved in metal binding in many enzymatic sites. Asparagine (Asn, N) and glutamine (Gln, Q) also belong to this group and may donate or accept a hydrogen bond.
Histidine (His, H), on the other hand, can be polar or carry a charge, depending on the environment and pH. Its side chain contains two nitrogen atoms that can each carry a hydrogen (–NH), and it has a pKa of around 6. At pH values below 6, when both nitrogens are protonated, the side chain has a charge of +1. Within protein molecules, the pKa may be shifted by the local environment, so the side chain may give up a proton and become neutral or accept a proton and become charged. This ability makes histidine useful in enzyme active sites when the chemical reaction requires proton extraction.
Figure 9: The polar amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
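Histidine's dual behavior follows from its pKa. A minimal sketch using the Henderson-Hasselbalch relation (a standard acid-base equation not given in this chapter) and the approximate pKa of 6 quoted above estimates what fraction of His side chains carry the +1 charge at a given pH. As the text notes, the protein environment can shift the real pKa, so the `pka` argument of the hypothetical `fraction_protonated` helper is adjustable.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of side chains in the protonated (+1) form.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    The default pKa of 6.0 is the approximate value quoted for histidine.
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))

for ph in (5.0, 6.0, 7.4):
    print(ph, round(fraction_protonated(ph), 3))
```

Under these assumptions, His is about 91% protonated at pH 5, half protonated at pH 6, and only about 4% protonated at pH 7.4, which is why modest shifts in its local environment can switch it between the neutral and charged forms.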
Hydrophobic amino acids
The hydrophobic amino acids include alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), phenylalanine (Phe, F) and cysteine (Cys, C) (Figure 10). These residues typically form the hydrophobic core of proteins, which is isolated from the polar solvent. The side chains within the core are tightly packed and participate in van der Waals interactions, which are essential for stabilizing the structure. In addition, Cys residues are involved in three-dimensional structure stabilization through the formation of disulfide (S-S) bridges, which sometimes connect different secondary structure elements or different subunits in a complex. Another essential function of Cys is metal binding, sometimes in enzyme active sites and sometimes in structure-stabilizing metal centers.
The aromatic amino acids tryptophan (Trp, W) and Tyr and the non-aromatic methionine (Met, M) are sometimes called amphipathic due to their ability to have both polar and nonpolar character. In protein molecules, these residues are often found close to the interface between a protein and solvent. We should also note here that the side chains of histidine and tyrosine, together with the hydrophobic phenylalanine and tryptophan, can also form weak hydrogen bonds of the types OH−π and CH−O, using electron clouds within their ring structures. A characteristic feature of aromatic residues is that they are often found within the core of a protein structure, with their side chains packed against each other. They are also highly conserved within protein families, with Trp having the highest conservation rate.
Figure 10: The hydrophobic amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Charged amino acids
At neutral pH (around 7.4), the charged amino acids carry a single charge in their side chains. There are four of them: the two basic residues, lysine (Lys, K) and arginine (Arg, R), carry a positive charge at neutral pH, and the two acidic residues, aspartate (Asp, D) and glutamate (Glu, E), carry a negative charge at neutral pH (Figure 11). A so-called salt bridge often forms through the interaction of closely located positively and negatively charged side chains. Such bridges often help stabilize three-dimensional protein structure, especially in proteins from thermophilic organisms, which live at elevated temperatures of up to 80–90 °C or even higher. Another function of the negatively charged carboxyl groups of Asp and Glu is binding positively charged metal ions. Metalloproteins and the role of metal centers in protein function are a fascinating field of structural biology research.
Figure 11: The charged amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
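Counting the four charged residues gives a crude estimate of the net side-chain charge of a sequence at neutral pH. The simplifications are assumptions, flagged in the comments: His is treated as neutral, the terminal amino and carboxyl groups are ignored, and environmental pKa shifts are not modeled. The helper name is hypothetical.

```python
# Side-chain charges at neutral pH (~7.4) for the four charged amino acids.
# Assumptions: His counts as neutral, the free N- and C-termini are ignored,
# and pKa shifts caused by the protein environment are not modeled.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}

def net_side_chain_charge(seq: str) -> int:
    """Sum the side-chain charges of a one-letter sequence; uncharged residues count 0."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq.upper())

print(net_side_chain_charge("VHLTPEE"))  # -2: two Glu, no Lys or Arg
print(net_side_chain_charge("VHLTPVE"))  # -1: the sickle fragment has lost one Glu
```

Applied to the seven-residue β chain fragment from Figure 2, the sickle cell substitution removes one negative charge from each β chain.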
Glycine & proline
Glycine (Gly), one of the common amino acids, does not have a side chain – its R group is just a hydrogen atom – and is often found at the surface of proteins within loop or coil regions (regions without defined secondary structure), providing high flexibility to the polypeptide chain. This flexibility is required in sharp polypeptide turns in loop structures. Proline (Pro), although considered hydrophobic, is also found at the surface, presumably due to its presence in turn and loop regions. In contrast to Gly, which provides the polypeptide chain high flexibility, Pro provides rigidity by imposing certain torsion angles on the segment of the structure. The reason for this is that its side chain makes a covalent bond with the main chain, which constrains the backbone shape of the polypeptide in this location. Sometimes Pro is called a helix breaker since it is often found at the end of α-helices (Figure 12).
Figure 12: The special amino acids. Adapted from "Molecular structures of the 21 proteinogenic amino acids.svg" by Dan Cojocari licensed under CC-BY-SA. [Image Description]
Figure Descriptions
Figure 2: The image is a comparative illustration of the structural and functional differences between normal hemoglobin and sickle-cell hemoglobin across various levels of protein structure. The layout is divided into two vertical sections labeled "Normal" and "Sickle-Cell," each with subsections depicting the primary, secondary, tertiary, quaternary structures, and function.
- Primary Structure:
- Normal: Seven circular molecules labeled sequentially from 1 to 7 with the respective amino acids: Val, His, Leu, Thr, Pro, Glu, Glu.
- Sickle-Cell: Same seven circular molecules labeled sequentially with the amino acids: Val, His, Leu, Thr, Pro, Val, Glu. The sixth molecule, Glu, is replaced with Val, highlighted in red.
- Secondary and Tertiary Structures:
- Normal: A blue 3D ellipsoid shape representing the normal β subunit.
- Sickle-Cell: A reddish-brown 3D ellipsoid shape representing the sickle-cell β subunit.
- Quaternary Structure:
- Normal: Combination of blue and purple ellipsoid shapes to form normal hemoglobin.
- Sickle-Cell: Combination of reddish-brown and purple ellipsoid shapes to form sickle-cell hemoglobin.
- Function:
- Normal: Depicts individual globular hemoglobin molecules scattered and unassociated, each capable of carrying oxygen.
- Sickle-Cell: Illustrates abnormal aggregation of hemoglobin molecules into fibers, impairing oxygen-carrying capacity.
Figure 3: The image illustrates two types of secondary protein structures against a light blue background: an alpha-helix and a beta-pleated sheet. The illustration is divided horizontally into two sections.
- Top Section: Alpha Helix
- A right-handed helical structure is shown in orange, twisting in a clockwise direction.
- The helix is depicted with a string of colored spheres (atoms) connected by lines (chemical bonds) representing the molecular structure.
- Hydrogen bonds are represented by dashed lines connecting parts of the helix.
- The labels include "α Helix" and "Hydrogen Bond".
- Bottom Section: Beta Pleated Sheet
- Several strands are aligned next to each other, forming a pleated sheet structure in orange.
- Similar to the helix, the strands are composed of colored spheres (atoms) connected by lines (chemical bonds).
- Hydrogen bonds are depicted as dashed lines running perpendicular to the strands, connecting adjacent strands.
- The labels include "β Pleated Sheet," "β Strand," and "Hydrogen Bond".
Figure 4: The image depicts a simplified diagram of a polypeptide backbone, illustrating various interactions and bonds that occur within a protein structure. The backbone is represented by a red, ribbon-like structure that loops and twists, showing the complex folding of the protein.
- Polypeptide Backbone: The main red ribbon represents the polypeptide backbone which loops around the image.
- Ionic Bond: There is a highlighted section showing a segment with a labeled "Ionic Bond," featuring an NH₃⁺ group connected to an O⁻ group.
- Hydrogen Bond: A light blue segment indicates a "Hydrogen bond" between O-H groups.
- Disulfide Linkage: An adjacent part shows a connection labeled "Disulfide linkage" marked by two sulfur atoms connected by a line (represented by "S-S").
- Hydrophobic Interactions: Another section indicates "Hydrophobic interactions," involving CH₃ groups interacting with one another.
Figure 5: The image illustrates the hierarchical structure of proteins from the primary structure to the quaternary structure, using hemoglobin as an example. The background is a gradient blue, transitioning from a darker blue at the top to a lighter blue at the bottom.
From left to right:
- Primary Structure: Depicts a sequence of amino acids connected via peptide bonds. Four amino acids are shown (labeled 1, 2, 3, and 4). Each amino acid consists of an amino group (NH2), carboxyl group (COOH), hydrogen atom (H), and side chain (R1, R2, R3, R4).
- Secondary Structure (α Helix): Shows the formation of an alpha helix from the amino acid chain. The helix is represented by an orange spiraling ribbon with dotted lines indicating hydrogen bonds stabilizing the structure.
- Tertiary Structure: Illustrates a β-globin polypeptide chain folded into a specific three-dimensional shape. It appears as a purple, looped, and twisted structure.
- Quaternary Structure: Demonstrates the assembly of multiple polypeptide chains. The β-globin (purple) and α-globin (yellow, green, and blue) polypeptides combine to form a hemoglobin molecule.
Figure 6: The image is a diagram depicting the structure of an amino acid. The diagram is divided into three sections vertically, from left to right, labeled "Amino group," "Side chain," and "Carboxyl group." The amino group section contains a nitrogen atom (N) colored blue at the center, bonded to two hydrogen atoms (H) represented in white and labeled. Moving rightwards, the central section contains a carbon atom (C) depicted in black, bonded to one hydrogen atom (H) in white and to an "R" group representing the side chain. The carbon is also bonded to another carbon atom (C), also in black, positioned to the right in the carboxyl group section. This carbon is double-bonded to an oxygen atom (O) colored in red, and single-bonded to another oxygen (O) with a single hydrogen (H) attached. An arrow points to the central carbon labeled "α carbon." [Return to Figure]
Figure 7: The image is an educational chart titled "20 Common Amino Acids." It is divided into four main sections by backgrounds of different colors: Polar Uncharged (light blue), Hydrophobic (light green), Charged (light pink), and Special Cases (light yellow).
- Polar Uncharged (light blue background):
- Contains six amino acids: Serine (S), Threonine (T), Histidine (H), Asparagine (N), Glutamine (Q), and Tyrosine (Y).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Hydrophobic (light green background):
- Contains eight amino acids: Alanine (A), Cysteine (C), Valine (V), Isoleucine (I), Leucine (L), Methionine (M), Phenylalanine (F), and Tryptophan (W).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Charged (light pink background):
- Divided into Positive and Negative sections.
- The Positive section includes Arginine (R) and Lysine (K).
- The Negative section includes Aspartic Acid (D) and Glutamic Acid (E).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
- Special Cases (light yellow background):
- Contains two amino acids: Glycine (G) and Proline (P).
- Each amino acid is depicted with its chemical structure and a red circle indicating its one-letter code inside the circle.
Figure 8: The image illustrates the formation of a peptide bond between two amino acids.
- The top left structure represents an amino acid, featuring an amino group (H2N), a central carbon (C) bonded to a hydrogen atom (H), a variable side chain (R), and a carboxyl group (COOH). The hydroxyl group (OH) in the carboxyl group is highlighted in red.
- The top right structure represents another amino acid with a similar structure but differing variable side chains (R).
- The two structures at the top are separated by a space and linked by an arrow pointing to a single structure at the bottom.
- The bottom structure represents the resulting dipeptide with a peptide bond formed. The peptide bond is highlighted within a blue rectangle, showing the linkage between the carbon (C) of one amino acid and the nitrogen (N) of the other amino acid.
- The term "Peptide Bond" is written below the blue rectangle.
Figure 9: The image categorizes polar uncharged amino acids and visually represents their structures. It displays six amino acids: Serine, Threonine, Histidine, Asparagine, Glutamine, and Tyrosine. Each amino acid shows its backbone and distinct side chain. The background is light blue, with the structures depicted in black. Each amino acid name is followed by its three-letter and one-letter code, represented within a red circle. [Return to Figure]
Figure 10: The image is a diagram depicting the molecular structures of eight hydrophobic amino acids. The background is light green, and each amino acid is illustrated with its chemical structure, the three-letter abbreviation, and the single-letter code. The amino acids are aligned horizontally. From left to right, the amino acids are Alanine (Ala, A), Cysteine (Cys, C), Valine (Val, V), Isoleucine (Ile, I), Leucine (Leu, L), Methionine (Met, M), Phenylalanine (Phe, F), and Tryptophan (Trp, W). Each single-letter code is presented in a red circle. [Return to Figure]
Figure 11: The image is a diagram that categorizes amino acids based on their charge properties and atomic structure. The background is a light pink color, and there is a shaded rectangular area in the center where the chemical structures are displayed. The diagram is divided into two main groups labeled “Positive” and “Negative”. Under the “Positive” group, two amino acids are listed: Arginine (Arg) and Lysine (Lys), each represented with their respective chemical structures and a red circle with the letters "R" and "K". Under the “Negative” group, two amino acids are listed: Aspartic Acid (Asp) and Glutamic Acid (Glu), each represented with their respective chemical structures and a red circle with the letters "D" and "E". [Return to Figure]
Figure 12: The image has a yellow background and is titled "Special Cases" at the top in black font. Below the title, there are two sections dedicated to the amino acids Glycine (Gly) and Proline (Pro).
To the left, under the heading "Glycine (Gly)" in black text, there is a red circle with a white uppercase letter "G" inside. Below this, a structural formula of Glycine is depicted within a beige rectangle. The formula shows a carbon atom bonded to an amine group (NH₂), a carboxyl group (COOH), and two hydrogen atoms.
To the right, under the heading "Proline (Pro)" in black text, there is a red circle with a white uppercase letter "P" inside. Below this, a structural formula of Proline is also shown within the same beige rectangle. The Proline structure shows a carbon atom bonded to a carboxyl group (COOH), an amine group in a five-membered ring structure, and single hydrogen atoms.
Licenses and Attributions
"Protein Structure & Function" by Michelle McCully is adapted from "3.4 Proteins" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC-BY 4.0 and "The 20 Amino Acids and Their Role in Protein Structures" by Salam Al-Karadaghi under CC-BY-SA 4.0. "Protein Structure & Function" is licensed under ???.
Learning Objectives
By the end of this chapter, you will be able to do the following:
- Predict how environmental conditions regulate gene expression
1. Review of prokaryotic transcription & translation
Genes are composed of DNA and are linearly arranged on chromosomes. Genes specify the sequences of amino acids, which are the building blocks of proteins. In turn, proteins are responsible for orchestrating nearly every function of the cell. Both genes and the proteins they encode are absolutely essential to life as we know it.
Transcription in bacteria
The bacterial chromosome is a closed circle of double-stranded DNA that is found in the central region of the cell called the nucleoid region. Prokaryotic genomes are very compact, and prokaryotic transcripts often cover more than one gene or cistron, a coding sequence for a single protein. Polycistronic mRNAs are then translated to produce multiple proteins.
A promoter is a DNA sequence onto which the transcription machinery, including RNA polymerase, binds and initiates transcription. In most cases, promoters exist upstream of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding gene is transcribed all the time, some of the time, or infrequently.
Transcription in prokaryotes (and in eukaryotes) requires the DNA double helix to partially unwind in the region of mRNA synthesis. The region of unwinding is called a transcription bubble. The protein, RNA polymerase, carries out transcription, reading from the template strand of the DNA, in the 3’ to 5’ direction.
RNA polymerase proceeds along the DNA template strand, pairing A, C, U, and G nucleotides to the DNA’s T, G, A, and C nucleotides, respectively. RNA polymerase catalyzes the formation of phosphodiester bonds between the mRNA nucleotides in sequence, synthesizing mRNA in the 5' to 3' direction at a rate of approximately 40 nucleotides per second. As elongation proceeds, the DNA is continuously unwound ahead of the core enzyme and rewound behind it, unchanged. Transcription terminates when the RNA polymerase reaches a termination sequence downstream of the genes being transcribed, which causes the RNA polymerase to fall off of the template DNA strand.
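The base-pairing and direction rules above can be sketched in a few lines. Assumptions: the template strand is supplied as a string already written 3' to 5', the example template and the 1,200-nucleotide transcript length are made up, the timing uses the approximate rate of 40 nucleotides per second quoted above, and `transcribe` and `transcription_time_s` are hypothetical helpers.

```python
# Base pairing used by RNA polymerase: DNA template base -> RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build mRNA (5' -> 3') from a template strand written 3' -> 5'.

    The polymerase reads the template 3' -> 5' and adds each complementary
    ribonucleotide to the 3' end of the growing mRNA, so walking the template
    in the order given yields the mRNA in 5' -> 3' order.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

def transcription_time_s(length_nt: int, rate_nt_per_s: float = 40.0) -> float:
    """Rough elongation time at the ~40 nucleotides per second rate."""
    return length_nt / rate_nt_per_s

print(transcribe("TACGGACAT"))     # AUGCCUGUA
print(transcription_time_s(1200))  # 30.0 seconds for a 1,200-nt transcript
```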
Translation in bacteria
For cells that are growing and dividing, the synthesis of proteins consumes more of a cell’s energy than any other metabolic process. In turn, proteins account for more mass than any other component of living organisms (with the exception of water), and proteins perform virtually every function of a cell. The process of translation, or protein synthesis, involves the decoding of an mRNA message into a polypeptide product. Amino acids are covalently strung together by interlinking peptide bonds in lengths ranging from approximately 50 to more than 1000 amino acid residues. Each individual amino acid has an amino group (NH2) and a carboxyl (COOH) group. Polypeptides are formed when the amino group of one amino acid forms an amide (i.e., peptide) bond with the carboxyl group of another amino acid. This reaction is catalyzed by ribosomes and generates one water molecule.
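The condensation reaction described above implies simple bookkeeping: a chain of n amino acids contains n - 1 peptide bonds, and n - 1 water molecules are released in making it. The minimal Python sketch below applies that rule; the function name is hypothetical, and the masses are average masses in daltons (about 18.015 Da for water and about 75.07 Da for free glycine).

```python
WATER_MASS_DA = 18.015  # average mass of one water molecule, in daltons


def condensation_summary(amino_acid_masses_da):
    """Return (peptide_bonds, polypeptide_mass_da) for a linear chain.

    Each peptide bond joins the carboxyl group of one amino acid to the amino
    group of the next and releases one water molecule, so a chain of n amino
    acids has n - 1 bonds and weighs the sum of the free amino acids minus
    n - 1 waters.
    """
    n = len(amino_acid_masses_da)
    bonds = max(n - 1, 0)  # a single amino acid (or none) has no peptide bonds
    mass = sum(amino_acid_masses_da) - bonds * WATER_MASS_DA
    return bonds, mass


if __name__ == "__main__":
    bonds, mass = condensation_summary([75.07, 75.07, 75.07])  # three glycines
    print(bonds, round(mass, 2))
```

For example, joining three glycines forms 2 peptide bonds and gives a mass of about 189.18 Da (3 × 75.07 minus 2 × 18.015).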
Translation requires the input of an mRNA template, ribosomes, tRNAs, and various enzymatic factors. A ribosome is a complex macromolecule composed of structural and catalytic rRNAs and many proteins. Ribosomes exist in the cytoplasm of prokaryotes. Each mRNA molecule is simultaneously translated by many ribosomes, all synthesizing protein in the same direction: reading the mRNA from 5' to 3' and synthesizing the polypeptide from the N terminus to the C terminus.
The tRNAs are structural RNA molecules that serve as adapter molecules. Each tRNA carries a specific amino acid and recognizes one or more of the mRNA codons that define the order of amino acids in a protein. Aminoacyl-tRNAs bind to the ribosome and add the corresponding amino acid to the growing polypeptide chain. Therefore, tRNAs are the molecules that actually “translate” the language of RNA into the language of proteins.
As with mRNA synthesis, protein synthesis can be divided into three phases: initiation, elongation, and termination. In initiation, a sequence upstream of the first AUG codon interacts with the rRNA molecules that compose the ribosome and anchors the ribosome at the correct location on the mRNA template. The initiator tRNA, which carries the amino acid methionine, then pairs with the start codon AUG. During elongation, the mRNA template provides tRNA binding specificity: as the ribosome moves along the mRNA, each codon comes into register, ensuring that only the charged tRNA with the matching anticodon binds. Termination of translation occurs when a stop codon (UAA, UAG, or UGA) is encountered. Stop codons are recognized by protein release factors that resemble tRNAs. Binding of a release factor causes the last amino acid to detach from its tRNA, and the newly made protein is released.
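The reading-frame logic of elongation and termination can be sketched in Python. This is a simplified model: it starts at the first AUG instead of modeling how the upstream sequence positions the ribosome, and its codon table is a deliberately tiny subset of the 64-codon genetic code containing only the codons used in the example. The function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Partial codon table: only the codons used in the example below.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "AAA": "Lys",
    "GGC": "Gly", "GCU": "Ala", "CCG": "Pro",
}


def translate(mrna: str) -> list:
    """Read codons 5' to 3' from the first AUG until a stop codon.

    Returns amino acids in N-terminus to C-terminus order. Stops early if
    the mRNA ends before a stop codon is reached.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, no protein
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # step one codon at a time
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # a release factor, not a tRNA, recognizes the stop codon
        peptide.append(CODON_TABLE[codon])
    return peptide


if __name__ == "__main__":
    print(translate("GGAUGUUUAAAGGCUAAGC"))
```

Here translation starts at the AUG, reads UUU, AAA, and GGC, and stops at UAA, so the stop codon itself contributes no amino acid.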
2. Operons as units of transcriptional regulation
For a cell to function properly, necessary proteins must be synthesized at the proper time and place. All cells control or regulate the synthesis of proteins from information encoded in their DNA. The process by which a gene's information is used to produce RNA through transcription and protein through translation is called gene expression. Whether in a simple unicellular organism or a complex multicellular organism, each cell controls when and how its genes are expressed. For this to occur, there must be internal chemical mechanisms that control when a gene is expressed to make RNA and protein, how much of the protein is made, and when it is time to stop making that protein because it is no longer needed.
The regulation of gene expression conserves energy and matter. Transcription and translation require ATP, so it would require a significant amount of energy for an organism to express every gene at all times. In addition, every RNA and protein molecule is composed of nucleotides or amino acids, respectively. These subunits must either be synthesized by the cell, often by recycling other molecules, or taken up from the environment. A cell conserves energy and matter by expressing only the subset of genes required for its function at any given time.
The control of gene expression is extremely complex. Malfunctions in this process are detrimental to the cell and can lead to the development of many diseases in humans, including cancer.
The DNA of prokaryotes is organized into a circular chromosome within the nucleoid region of the cell cytoplasm. Proteins that are needed for a specific function, or that are involved in the same biochemical pathway, are encoded together in the coding region of an operon, which is transcribed as a single mRNA molecule (Figure 1). For example, all of the genes needed to use lactose as an energy source are coded next to each other in the coding region of the lac operon. The promoter and operator are located upstream of the coding region and are regions of the DNA where the proteins that regulate and carry out transcription bind. For example, RNA polymerase binds to the promoter site and then slides downstream, transcribing the coding region as it passes. An operator is a region of DNA where regulatory proteins that either activate or repress transcription of the operon bind. Promoters and operators are not themselves genes because they do not encode proteins; rather, they are non-coding DNA.
Figure 1: Components of an Operon. The promoter and operator are located upstream of the coding region of an operon. Created in BioRender.com. [Image Description]
In prokaryotic cells, there are three types of regulatory molecules that can affect the expression of operons: repressors, activators, and inducers. Repressors and activators are proteins produced in the cell. Both repressors and activators regulate gene expression by binding to specific DNA sites upstream of the genes they control. Repressors prevent transcription of a gene in response to an external stimulus, whereas activators increase the transcription of a gene in response to an external stimulus. How an operator is positioned relative to the promoter is critical to the function of the regulatory protein: its position determines whether a bound protein supports the binding of RNA polymerase (an activator) or interferes with RNA polymerase (a repressor). Inducers are small molecules that may be produced by the cell or that are in the cell's environment. Inducers either activate activator proteins or inactivate repressor proteins to cause a gene to be expressed.
3. Transcriptional repression
Many organisms, including the bacterium Escherichia coli, use glucose as their preferred fuel for cellular respiration. If glucose is not available, E. coli can take up other sugars from the environment and metabolize them into glucose. Lactose is one such sugar that E. coli can take in from its environment and metabolize into glucose (Figure 2). Lactose is a disaccharide made of two monosaccharides, glucose and galactose, bound together. If glucose is present in the environment, E. coli has no need to expend energy and matter expressing the proteins needed to take up and digest lactose. However, if glucose is absent and lactose is present, E. coli benefits from using lactose as a source of glucose.
Figure 2: Lactose digestion. β-galactosidase breaks the bond that joins the two sugars in the disaccharide lactose, splitting it into its component monosaccharides, glucose and galactose. Source: lactase.svg. Created in BioRender.com. [Image Description]
The genes required for E. coli to take up lactose and break it down into glucose are located together on its chromosome in the lac operon. The coding region of the lac operon includes the genes for three proteins (Figure 3). lacZ encodes the enzyme β-galactosidase (LacZ), which breaks down lactose into its monosaccharides, glucose and galactose. lacY encodes the membrane transport protein β-galactoside permease (LacY), which moves lactose from outside the cell to inside the cell. lacA encodes the enzyme β-galactoside transacetylase (LacA), whose role in lactose metabolism is unclear.
The lac operon includes three important regions: the lac promoter, the lac operator, and the coding region. The promoter region is where RNA polymerase binds to initiate transcription and where transcriptional activators bind to help RNA polymerase bind. Just downstream is the operator region where repressors can bind. Next comes the transcriptional start site, immediately followed by the coding region, which contains the three genes, lacZ, lacY, and lacA.
Figure 3: An E. coli cell that is in the presence of lactose and transcribing its lac operon. Lactose from the lumen of the large intestine enters the E. coli cell through lac permease. Once in the cell, lactose is either broken down by β-galactosidase into glucose and galactose or isomerized into allolactose. Allolactose binds to the lac repressor, and cAMP binds to CRP, allowing RNA polymerase to transcribe the lac operon. In this way, E. coli is able to digest lactose into glucose to use in cellular respiration. Created in BioRender.com. [Image Description]
The lac operator contains the DNA sequence to which the lac repressor protein binds. The lac repressor binds to the operator region of the lac operon, physically blocking RNA polymerase and halting transcription (Figure 4). The lac repressor is expressed constitutively, meaning continuously, in E. coli. Its gene is located elsewhere on the E. coli chromosome, not in the lac operon.
Figure 4: State of the lac operon in the absence of lactose. The lac repressor binds to the operator region, blocking the progression of RNA polymerase downstream, off the promoter region. Created in BioRender.com. [Image Description]
When β-galactosidase acts on lactose, 90% of the time it breaks lactose all the way down to glucose and galactose, but the other 10% of the time it isomerizes lactose into allolactose. Allolactose binds to the lac repressor and causes it to change shape so that it no longer fits on the lac operator (Figure 5). With the repressor unable to bind the operator, RNA polymerase is no longer blocked and can initiate transcription of the lac genes needed to digest lactose.
Figure 5: State of the lac operon in the presence of lactose. Allolactose binds to the lac repressor, which causes it to change shape and unbind from the operator region. RNA polymerase can progress downstream off the promoter region to the coding region, where it transcribes the lacZ, lacY, and lacA genes into mRNA. This mRNA transcript is translated by the ribosome into β-galactosidase, lac permease, and transacetylase. Created in BioRender.com. [Image Description]
Images created in BioRender.com.
4. Transcriptional activation
Just as the lac operon is negatively regulated by the lac repressor binding to the operator, it is positively regulated by activator proteins that bind to the promoter and turn on transcription. When glucose is scarce, E. coli can turn to other sugar sources for fuel. To do this, the genes to digest these alternate sugars and transform them into glucose must be transcribed.
When glucose levels drop, cyclic AMP (cAMP) begins to accumulate in the cell. cAMP is a small signaling molecule that is involved in glucose and energy metabolism in E. coli. Accumulating cAMP binds to the positive regulator cAMP receptor protein (CRP), a protein that binds to the promoters of many operons that control the processing of alternative sugars, including the lac operon (Figure 6). CRP is also known as catabolite activator protein (CAP), and the names CRP and CAP may be used interchangeably.
When cAMP binds to CRP, the complex then binds to the promoter region of the lac operon, just upstream of the RNA-polymerase-binding site on the promoter. The binding of CRP stabilizes the binding of RNA polymerase to the promoter region and increases transcription of the genes in the coding region of the operon.
Figure 6: State of the lac operon in the absence of glucose. When glucose levels are low, cAMP accumulates in the cell and binds CRP. CRP helps RNA polymerase bind to the promoter region and transcribe the lac operon. Created in BioRender.com. [Image Description]
If E. coli cells are in an environment with plentiful glucose, glucose molecules enter the cell through the glucose permease membrane protein. When glucose levels are high, the cell makes little cAMP, so cellular levels of cAMP are low. Without cAMP bound, CRP changes conformation and cannot bind the promoter region (Figure 7). The lac operon has evolved to have a very weak promoter, meaning RNA polymerase does not bind the promoter region very tightly unless cAMP-CRP is present to increase its binding affinity. Therefore, in the presence of glucose, RNA polymerase does not bind as tightly to the promoter region, and expression of the lac operon is low. There is no need for the cell to waste energy and matter building the proteins encoded by the lac operon when there is abundant glucose available for cellular respiration.
Figure 7: State of the lac operon in the presence of glucose. When glucose levels are high, cAMP levels are low. Without cAMP bound, CRP does not bind to the promoter region. RNA polymerase does not bind very well to the promoter without cAMP-CRP, so transcription is very low. Created in BioRender.com. [Image Description]
5. Activation and repression to control lac operon expression
E. coli only need to expend energy and matter expressing the genes necessary to digest lactose when two conditions are both met: 1) glucose is not present and 2) lactose is present. There is no reason to make the proteins that import and digest lactose if glucose is present, and there is no need to make these proteins if there is no lactose available in the environment (Table 1).
Only when glucose is absent and lactose is present will the lac operon be transcribed at a high rate. In the absence of glucose, the cAMP-CRP complex binds to the lac promoter and makes transcription of the lac operon more efficient. When lactose is present, its metabolite allolactose binds to the lac repressor and changes its shape so that it cannot bind to the lac operator and block transcription. This combination of conditions makes sense for the cell, because it would be energetically wasteful to synthesize the enzymes to process lactose if glucose were plentiful or lactose were not available.
If glucose is present, then CRP fails to bind to the promoter sequence to activate transcription. If lactose is absent, then the lac repressor binds to the operator to prevent transcription. If either of these conditions is met, then transcription remains off. Only when glucose is absent and lactose is present is the lac operon transcribed at a high rate.
Table 1: Positive and negative regulation of the lac operon
Glucose | cAMP level | RNA polymerase bound to promoter | Lactose/allolactose | lac repressor bound to operator | Transcription of lac genes |
Present | Low | Very little | Absent | Yes | No |
Present | Low | Very little | Present | No | No (very little) |
Absent | High | Yes | Absent | Yes | No |
Absent | High | Yes | Present | No | Yes |
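The logic in Table 1 amounts to two independent switches, one controlled by glucose (through cAMP-CRP) and one controlled by lactose (through the lac repressor). The short Python sketch below encodes the table as written; the function name and the output labels are illustrative choices, not standard terminology.

```python
def lac_operon_state(glucose_present: bool, lactose_present: bool) -> dict:
    """Return the regulatory state of the lac operon, following Table 1."""
    # Low glucose -> high cAMP -> cAMP-CRP binds the promoter and helps RNA polymerase bind.
    camp_high = not glucose_present
    crp_bound = camp_high
    # Lactose present -> allolactose releases the lac repressor from the operator.
    repressor_bound = not lactose_present

    if repressor_bound:
        transcription = "no"           # repressor blocks RNA polymerase
    elif crp_bound:
        transcription = "yes"          # repressor off and activator on
    else:
        transcription = "very little"  # repressor off, but weak promoter lacks cAMP-CRP
    return {
        "cAMP": "high" if camp_high else "low",
        "repressor_bound": repressor_bound,
        "transcription": transcription,
    }


if __name__ == "__main__":
    for glucose in (True, False):
        for lactose in (False, True):
            print(glucose, lactose, lac_operon_state(glucose, lactose))
```

Running the loop reproduces the last column of Table 1: only the combination of glucose absent and lactose present gives "yes".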
Image Descriptions
Figure 1: The image depicts a segment of DNA as part of an operon structure, visualized as a double helix. The DNA strand runs horizontally across the image from left to right. The left side is marked as the 5' end and the right side as the 3' end, indicating the direction of the DNA sequence. The DNA is color-coded into three distinct regions: the Promoter region is colored in blue, the Operator region in purple, and the Coding Region in orange. Above the DNA strand, a black bracket labeled 'Operon' spans across these three regions. The background is white, emphasizing the colored segments. [Return to Figure]
Figure 2: The image depicts a chemical reaction process in which lactose is broken down into D-galactose and D-glucose. At the top, there are structural chemical formulas. On the left, lactose is shown as two connected hexagonal ring structures. To the right, an arrow points towards two separate hexagonal ring structures, representing D-galactose and D-glucose, with a plus sign in between them. Below the formulas, a simplified graphical representation mirrors the above process using colored hexagons. On the left, a light blue hexagon is linked to a dark blue hexagon representing lactose. An arrow points to the right toward two separate hexagons, one light blue representing D-galactose and one dark blue representing D-glucose, with a plus sign in between them. [Return to Figure]
Figure 3: This is a schematic diagram depicting the lactose operon regulation mechanism within an E. coli cell. The diagram is framed by a red oval outline representing the boundary of the E. coli cell. The background inside the cell is a light pink color. There are several key elements and labels in the image, elaborating the process of lactose metabolism.
Starting from the top right, blue hexagons represent lactose molecules outside the cell, entering through a membrane protein labeled “lac permease.” Once inside, lactose can be converted into allolactose, which is shown as hexagons with various shades of blue. Allolactose binds to a purple irregular shape labeled “lac repressor,” inactivating it and allowing for transcription.
The middle part of the diagram shows "beta-galactosidase" depicted as a green enzyme, converting lactose into glucose and galactose, represented by blue hexagons.
At the bottom left, the DNA is shown as a double helix with different segments marked as lacZ, lacY, and lacA, respectively. Colored proteins including “cAMP,” “CRP,” and “RNA polymerase” interact with the DNA strand to regulate the transcription process. [Return to Figure]
Figure 4: The image is a schematic representation of DNA with associated proteins, illustrating the process of transcription regulation. The double-helix DNA strand runs horizontally across the image from left to right. At the left end of the DNA strand, there is a blue-shaded structure labeled "RNA polymerase" that is bound to the DNA. This structure is represented with a slight transparency and is roughly cubic in shape, with smooth contours. Adjacent to this, towards the right, is a purple, irregularly shaped structure labeled "lac repressor" also bound to a segment of the DNA. The DNA sequence includes three distinct sections labeled "lacZ," "lacY," and "lacA" in green, teal, and turquoise text respectively. The DNA strand changes color between segments, visually differentiating these regions. Both ends of the double helix carry 5' and 3' labels, indicating the orientation of the two strands. [Return to Figure]
Figure 5: The image is a schematic of the lac operon in the presence of lactose, with the following elements:
- DNA Segment: At the top-center, there is a double-helical strand of DNA. The DNA strand has labels indicating its 5' and 3' ends on the left and right respectively. The DNA is color-coded, with different regions marked in purple (lac repressor binding site), light blue (around RNA polymerase), green (lacZ), blue-green (lacY), and dark cyan (lacA).
- RNA Polymerase: On the left, a light blue, rounded structure labeled "RNA polymerase" is shown bound to the DNA strand, indicating the start of transcription.
- Lac Repressor and Allolactose: Towards the upper-center, a purple-colored, irregular-shaped structure labeled "lac repressor" is shown with hexagonal allolactose molecules in blue attached to it.
- Transcription Process: Below the DNA, a wavy green strand represents the mRNA being synthesized, labeled "mRNA 5'" on the left and "3'" on the right.
- Translation and Enzyme Production: The mRNA strand is split into segments coded for three different enzymes:
- β-galactosidase: Shown as a green, cloud-like structure.
- Lac permease: Illustrated as a teal, tubular structure.
- Transacetylase: Depicted as a darker green, cloud-like structure.
- Black arrows indicate the processes of transcription (DNA to mRNA) and translation (mRNA to proteins).
Figure 6: The image depicts a molecular biological process involving the transcription and translation of the lac operon, structured in a linear format from top to bottom. At the top left, a DNA strand is shown with a nucleotide sequence represented by a helical structure. Bound to the DNA are a red, irregularly shaped protein labeled "CRP," and an orange circular molecule labeled "cAMP." Adjacent to this, a light blue structure representing RNA polymerase is shown, initiating transcription from the DNA template. The DNA sequence is segmented into colored regions labeled "lacZ" in green, "lacY" in turquoise, and "lacA" in teal. Below this, the mRNA transcript is shown emerging, signified by a wavy green and blue line. The mRNA transcript undergoes translation, denoted by arrows, producing three distinct proteins. On the bottom row are illustrations of the resulting proteins: a green, amorphous shape labeled "β-galactosidase," a turquoise, cylindrical shape labeled "lac permease," and a dark green, irregular shape labeled "Transacetylase." [Return to Figure]
Figure 7: The image depicts a molecular diagram related to gene transcription. Three distinct elements are illustrated above a sequence of DNA, and labeled as "RNA polymerase," "CRP," and "glucose."
- RNA polymerase is shown on the left as a light blue, abstract shape with smooth curves, resembling a cap or enzyme model.
- CRP (cAMP receptor protein) is shown in the middle as a dark red, irregular shape with a lumpy texture.
- Glucose is illustrated on the right with five small, dark blue hexagons arranged in a square formation with one hexagon in the center.
Below these elements is a DNA strand depicted as a double helix with segments colored in shades of blue, purple, green, and aqua. The DNA sequence is labeled at the 5' and 3' ends with corresponding numbers. Three specific genes within the DNA sequence are labeled:
- "lacZ" in green,
- "lacY" in teal,
- "lacA" in aqua.
Licenses and Attributions
"Transcriptional regulation of the lac operon" by Michelle McCully is adapted from "15-2-prokaryotic-transcription", "15-5-ribosomes-and-protein-synthesis", "16-1-regulation-of-gene-expression", and "16-2-prokaryotic-gene-regulation" by Mary Ann Clark, Matthew Douglas, Jung Choi for OpenStax Biology 2e under CC BY 4.0. "Transcriptional regulation of the lac operon" is licensed under CC BY-NC 4.0.
Images created with BioRender are licensed with permission as CC BY-NC 4.0.