Bioinformatics - Structure and Systems Biology

Understand protein structure prediction methods, subcellular localization inference, and network-based systems biology simulations.

Summary

Structural Bioinformatics and Systems Biology

Protein Structure: From Sequence to Function

Understanding Protein Structure Levels

Proteins are biological molecules with a hierarchical structure, organized into distinct levels of complexity. Understanding these levels is essential because they determine how a protein functions.

Primary structure is the linear sequence of amino acids that make up a protein. This sequence is directly encoded by the DNA of genes: the order of codons in a gene determines which amino acids are linked together in which order. Primary structure is deceptively simple in appearance but critically important, because the specific sequence of amino acids contains all the information needed for a protein to fold into its functional three-dimensional shape.

Secondary structure refers to local patterns of hydrogen bonding between the protein backbone atoms. These patterns form recognizable structural motifs like alpha helices (spiral-shaped regions) and beta sheets (extended, zigzag-shaped regions). These structures arise spontaneously as the amino acid chain folds, stabilized by hydrogen bonds along the backbone.

Tertiary structure describes the complete three-dimensional shape of the entire protein molecule. This structure emerges as the chain folds further, with amino acid side chains interacting with each other through various forces: hydrophobic interactions (nonpolar side chains clustering away from water), hydrogen bonds, ionic interactions, and disulfide bonds between cysteine residues. Tertiary structure determines the protein's functional properties: where its active site is located, how it binds substrates, and how it interacts with other molecules.

Quaternary structure applies only to proteins made up of multiple subunits. It describes how these subunits are arranged relative to each other and how they interact. Not all proteins have quaternary structure; single-subunit proteins do not.
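
The link between codon order and primary structure can be made concrete with a short sketch. The code below is an illustration only: it assumes the standard genetic code, reads the coding strand from its first base, and uses names (`CODON_TABLE`, `translate`) invented for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Assumes reading frame 0 on the coding strand. Translation stops at the
    first stop codon, and a trailing partial codon is ignored. Characters
    other than A, C, G, T raise a KeyError.
    """
    dna = coding_dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # ATG GCC AAG TAA -> M, A, K, stop
```

Changing a single base in the input changes at most one amino acid in the output, which is the sense in which the gene's codon order fixes the primary structure.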

The critical principle here is that primary structure generally determines all higher levels of structure. In most cases, a protein with a given amino acid sequence will spontaneously fold into its correct three-dimensional form in the cell. This is why knowing the sequence is so powerful: it tells you, indirectly, what the protein will look like and how it will function.

However, there are important exceptions. Prion proteins demonstrate what happens when things go wrong: misfolded prions can force correctly folded copies of the same protein to adopt their abnormal shape, propagating the misfolding. This shows that while primary structure usually determines structure, the cellular environment and protein interactions matter too.

[Figure: example of a complete three-dimensional protein structure, illustrating the complex folding that arises from a linear amino acid sequence.]

Why Sequence Similarity Predicts Structure: Homology Modeling

The most important principle in structural bioinformatics is that homologous proteins, those sharing evolutionary origin and sequence similarity, also share similar three-dimensional structures. This is the foundation of homology modeling, one of the most practical techniques in the field.

When proteins are homologous, their amino acid sequences are similar because they evolved from a common ancestral sequence. During evolution, many amino acid positions remain unchanged because changing them would damage the protein's function. These conserved regions are particularly important functionally or structurally. Other positions tolerate amino acid substitutions better because the specific identity matters less.

The key insight is: if two proteins have similar sequences, they likely have similar three-dimensional structures, even if the structure hasn't been experimentally determined for one of them.
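
Sequence similarity is usually quantified on an alignment. The sketch below computes percent identity between two already-aligned sequences. It is a minimal illustration with assumptions of its own: the function name is hypothetical, gaps are written as "-", and identity is counted only over columns where both sequences have a residue (other tools use other denominators, such as the full alignment length).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 8 gap-free columns, 7 of them identical.
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```

The two toy sequences are invented for the example; in practice the alignment would come from an alignment program rather than being typed by hand.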

Homology modeling is a structure prediction technique that exploits this principle:

1. You have a protein of unknown structure (the "target" or "query").
2. You search databases for homologous proteins with known experimental structures (for example from X-ray crystallography, NMR spectroscopy, or cryo-EM).
3. You align the sequence of your target protein with the sequence of the known structure.
4. You use the known structure as a template, adjusting it based on the sequence differences.
5. You build a three-dimensional model of your unknown protein.

[Figure: example of homology relationships and sequence alignments. Colored regions indicate where sequences match or differ, helping identify which parts of the template structure can be reliably borrowed for the target.]

This method works best when sequence identity is high (above roughly 30-40%), because greater similarity means the structures are more likely to be nearly identical. As sequence similarity decreases, the predictions become less reliable.

The AlphaFold Revolution

For decades, homology modeling was the best available method for predicting protein structures when experimental determination was impractical. However, a major limitation remained: homology modeling only works when you can find a homologous protein with a known structure. For many proteins, especially those from less-studied organisms or with unusual sequences, no good templates existed.

AlphaFold, a deep learning system developed by DeepMind, fundamentally changed protein structure prediction. Rather than depending on a close experimental template for each target, AlphaFold uses a neural network trained on experimentally determined protein structures to predict structure from the target sequence and the evolutionary information in alignments of related sequences.

AlphaFold markedly outperforms earlier prediction methods, including traditional homology modeling, and it remains accurate for many proteins even when no good homologous structure exists. It achieves this by learning patterns in how amino acid sequences fold into three dimensions.

The system analyzes:

- The primary sequence itself
- Multiple sequence alignments (patterns of conservation across many related sequences)
- Structural patterns it learned from training data

The result is that protein structure prediction has shifted from being a bottleneck in biological research to being relatively routine. Where experimental structure determination of a protein might take months or years, an AlphaFold prediction typically takes minutes to hours of computation.

<extrainfo>
AlphaFold placed first in the CASP13 competition (Critical Assessment of protein Structure Prediction) in 2018, and its successor, AlphaFold 2, won CASP14 in 2020 by a wide margin. This achievement marked the point at which AI-based methods definitively surpassed traditional bioinformatics approaches for this fundamental problem.
</extrainfo>

Cellular Organization: Predicting Protein Localization

Knowing where a protein is located within the cell provides crucial clues about its function. Protein localization prediction uses computational methods to infer which cellular compartment or region a protein is likely to inhabit based on its sequence characteristics.

The logic is straightforward: proteins destined for the nucleus often contain specific signal sequences that direct them there. Proteins headed for the mitochondria have different targeting sequences. Membrane proteins have characteristic hydrophobic regions. By analyzing these sequence features, bioinformaticians can predict a protein's location.

Why does location prediction matter? Consider that nuclear proteins are enriched for DNA-binding and gene regulation functions, while secreted proteins are often signaling molecules or enzymes meant to act outside cells. A protein predicted to localize to the mitochondria likely participates in energy metabolism. By knowing where a protein goes, you gain insight into its biological role, even before conducting any experiments.
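
One of the sequence features mentioned above, the hydrophobic stretches typical of membrane proteins, can be detected with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle hydropathy scale. The 19-residue window and the 1.6 cutoff are commonly quoted defaults for spotting candidate membrane-spanning segments and are treated here as assumptions, as are the function names. Real localization predictors combine many more features than this.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    seq = sequence.upper()
    if len(seq) < window:
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def has_hydrophobic_segment(sequence: str, window: int = 19,
                            threshold: float = 1.6) -> bool:
    """True if any window's mean hydropathy exceeds the (assumed) threshold."""
    return any(avg > threshold for avg in hydropathy_profile(sequence, window))


# Invented toy sequences: a leucine-rich core flanked by charged residues,
# and a fully charged sequence.
membrane_like = "DDKKEE" + "L" * 20 + "DDKKEE"
soluble_like = "DEKR" * 8
print(has_hydrophobic_segment(membrane_like))  # True
print(has_hydrophobic_segment(soluble_like))   # False
```

A flagged segment is only a hint that the protein may span a membrane; it does not by itself say which membrane or compartment.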

Localization prediction is an example of how sequence information can be leveraged to infer function through intermediate characteristics of the protein.

Network and Systems Biology

Understanding Molecular Interaction Networks

Proteins do not function in isolation. They interact with other molecules constantly, forming complex networks of interactions that collectively carry out cellular processes.

Molecular interaction networks include several types:

- Protein-protein interactions (PPI): Direct physical contacts between two or more proteins. These might involve enzyme-substrate pairs, regulatory protein complexes, or structural assemblies.
- Protein-ligand interactions: Proteins binding small molecules like drugs, metabolites, or signaling molecules.
- Protein-peptide interactions: Proteins binding short peptide sequences, important in signaling and immune recognition.

These networks can be visualized as graphs where proteins (or molecules) are nodes and interactions are edges connecting them.

[Figure: a protein-protein interaction network. Each node is a protein, and lines represent known interactions.]

The complexity and interconnectedness are striking: individual proteins can interact with dozens of partners, and removing or altering one protein can affect many others in the network.

Understanding these networks is essential because:

- Robustness and redundancy: Networks often have backup pathways, so single failures don't crash the system
- Information flow: Interaction networks determine how signals propagate through cells
- Disease mechanisms: Disease often involves disrupted or altered interactions
- Drug discovery: Drugs work by interfering with specific molecular interactions

Systems Biology Simulations

While studying individual proteins and their interactions is valuable, systems biology takes a broader view: it uses computer simulations to model entire cellular subsystems and understand their behavior as integrated units.

Systems biology simulates several important types of cellular networks:

Metabolic pathways are networks of enzymatic reactions that transform molecules (like glucose) into energy and building blocks. A simulation can model how changing one enzyme's activity affects the production of end products several steps downstream.

Signal transduction pathways are communication cascades where one protein activates the next, transmitting signals from cell surface receptors to the nucleus. Simulations can predict how a cell responds to hormones or growth factors by modeling the cascade of protein activations.

Gene regulatory networks model how transcription factors (proteins that control gene expression) regulate each other and downstream genes. These networks often contain feedback loops where a protein can regulate its own production, leading to complex behaviors like oscillations or bistability.

The power of systems biology is that it allows predictions impossible from studying components alone. A simple metabolic pathway might seem to have obvious behavior, but when you add feedback regulation and multiple interconnections, surprising behaviors emerge: oscillations, switch-like responses, or unexpected sensitivities to parameter changes.

Systems biology simulations typically:

1. Define the molecular components and their interactions
2. Assign reaction rates and binding constants (often from experimental data)
3. Run computer simulations to predict system behavior
4. Compare predictions to experimental observations
5. Refine the model based on discrepancies

This iterative cycle between computation and experiment has become central to modern molecular cell biology.
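
The first three steps of this workflow can be sketched for the smallest possible feedback loop: a protein that represses its own production. The model below is an illustration under stated assumptions, not a published model. It uses a Hill-type repression term, dx/dt = beta / (1 + (x/K)^n) - gamma * x, with made-up parameter values, and integrates it with the simple forward-Euler method; the function name is hypothetical.

```python
def simulate_autoregulation(beta: float = 10.0, K: float = 1.0, n: float = 2.0,
                            gamma: float = 1.0, x0: float = 0.0,
                            dt: float = 0.01, steps: int = 2000) -> list[float]:
    """Forward-Euler integration of dx/dt = beta / (1 + (x/K)**n) - gamma * x.

    x     : protein concentration (arbitrary units)
    beta  : maximal production rate
    K, n  : repression threshold and steepness (Hill coefficient)
    gamma : degradation/dilution rate
    """
    x = x0
    trajectory = [x]
    for _ in range(steps):
        production = beta / (1.0 + (x / K) ** n)  # falls as x rises (negative feedback)
        degradation = gamma * x
        x += dt * (production - degradation)
        trajectory.append(x)
    return trajectory


trajectory = simulate_autoregulation()
# With these illustrative parameters the steady state satisfies x**3 + x = 10, i.e. x = 2.
print(round(trajectory[-1], 3))  # 2.0
```

Steps 4 and 5 would compare such a trajectory with measured concentrations and adjust the parameters. For stiff or larger models, a dedicated ODE solver would be a better choice than fixed-step Euler.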

Summary

The field of structural bioinformatics and systems biology provides powerful tools for understanding proteins and cellular organization:

- Protein structure exists at multiple hierarchical levels, with primary sequence generally determining all higher levels
- Homology modeling exploits evolutionary conservation to predict unknown structures from known ones
- AlphaFold has revolutionized structure prediction using deep learning
- Protein localization prediction connects sequence to cellular function
- Molecular interaction networks visualize the complexity of protein relationships
- Systems biology simulations integrate networks of interactions to predict emergent cellular behaviors

These approaches together transform sequence information, the raw output of DNA sequencing, into functional understanding of how proteins and cells work.

Flashcards

What is the primary structure of a protein?

The linear amino acid sequence

From what genetic sequence is the primary structure of a protein derived?

The codon sequence of the encoding gene

What does the primary structure of a protein generally determine?

The three-dimensional native structure

What is a notable exception where the primary structure does not correctly determine the native structure?

Misfolded prion proteins

What are the four levels of protein structure?

Primary structure, secondary structure, tertiary structure, and quaternary structure

How does homology modeling predict the structure of an unknown protein?

By using known structures of homologous proteins

What type of software is AlphaFold?

Deep-learning software for protein structure prediction

Which organization developed the AlphaFold software?

DeepMind

What are the three types of interactions included in molecular interaction networks?

Protein-protein interactions, protein-ligand interactions, and protein-peptide interactions

Quiz

What term describes the linear amino acid sequence of a protein that is derived from the coding gene’s codon sequence?

Key Concepts

Protein Structure and Prediction

Primary structure

Homology modeling

AlphaFold

Structural bioinformatics

Biological Networks and Pathways

Molecular interaction network

Gene regulatory network

Metabolic pathway

Signal transduction pathway

Systems biology

Protein Localization

Protein subcellular localization

Definitions

Primary structure

The linear amino‑acid sequence of a protein, encoded directly by the gene’s codons.

Homology modeling

A computational method that predicts the three‑dimensional structure of an unknown protein using the known structures of homologous proteins.

AlphaFold

A deep‑learning system developed by DeepMind that predicts protein structures with accuracy surpassing earlier computational approaches.

Protein subcellular localization

The cellular compartment where a protein resides; predicting it from sequence provides clues to the protein's biological function.

Molecular interaction network

A graph representing physical or functional interactions among biomolecules such as proteins, ligands, and peptides.

Systems biology

An interdisciplinary field that uses computational models to simulate and analyze complex cellular processes and networks.

Metabolic pathway

A series of enzymatically catalyzed chemical reactions within a cell that convert substrates into products, supporting metabolism.

Signal transduction pathway

A cascade of molecular events by which cells respond to external signals and convert them into specific cellular actions.

Gene regulatory network

An interconnected system of DNA, RNA, proteins, and other molecules that control the expression levels of genes.

Structural bioinformatics

The sub‑discipline of bioinformatics focused on the analysis and prediction of the three‑dimensional structures of biological macromolecules.