Protein conformational transitions calculations
Protein Conformational Transitions calculations tutorial using BioExcel Building Blocks (biobb) and GOdMD
This tutorial aims to illustrate the process of computing a conformational transition between two known structural conformations of a protein, step by step, using the BioExcel Building Blocks library (biobb).
Examples shown are the calculation of the conformational transition for the Adenylate Kinase protein, from the closed state (PDB Code 1AKE, https://doi.org/10.2210/pdb1AKE/pdb) to the open state (PDB Code 4AKE, https://doi.org/10.2210/pdb4AKE/pdb). Adenylate Kinases are phosphotransferase enzymes that catalyze the interconversion of the various adenosine phosphates (ATP, ADP, and AMP), and are known to undergo large conformational changes during their catalytic cycle.
The code wrapped is the GOdMD method, developed in the Molecular Modeling and Bioinformatics group (IRB Barcelona). GOdMD determines pathways for conformational transitions in macromolecules using discrete molecular dynamics and biasing techniques based on a combination of essential dynamics and Maxwell-Demon sampling techniques. A web implementation of the method can be found here: https://mmb.irbbarcelona.org/GOdMD/index.php
Exploration of conformational transition pathways from coarse-grained simulations.
Sfriso P, Hospital A, Emperador A, Orozco M.
Bioinformatics, 129(16):1980-6.
Available at: https://doi.org/10.1093/bioinformatics/btt324
Settings
Biobb modules used
- biobb_godmd: Tools to compute protein conformational transitions using GOdMD.
- biobb_io: Tools to fetch biomolecular data from public databases.
- biobb_structure_utils: Tools to modify or extract information from a PDB structure.
- biobb_analysis: Tools to analyse Molecular Dynamics trajectories.
Auxiliary libraries used
- emboss: Software that automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web.
- jupyter: Free software, open standards, and web services for interactive computing across all programming languages.
- plotly: Python interactive graphing library integrated in Jupyter notebooks.
- nglview: Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
- simpletraj: Lightweight coordinate-only trajectory reader based on code from GROMACS, MDAnalysis and VMD.
Conda Installation and Launch
git clone https://github.com/bioexcel/biobb_wf_godmd.git
cd biobb_wf_godmd
conda env create -f conda_env/environment.yml
conda activate biobb_wf_godmd
jupyter-notebook biobb_wf_godmd/notebooks/biobb_wf_godmd.ipynb
Pipeline steps
- Input Parameters
- Fetching Structures
- Preparing Structures
- Residue Mapping
- Conformational Transition
- Transition Visualization
- Questions & Comments
Initializing colab
The two cells below are used only in case this notebook is executed via Google Colab. Take into account that, for running conda on Google Colab, the condacolab library must be installed. As explained here, the installation requires a kernel restart, so when running this notebook in Google Colab, don't run all cells until this installation is properly finished and the kernel has restarted.
# Only executed when using google colab
import sys
if 'google.colab' in sys.modules:
import subprocess
from pathlib import Path
try:
subprocess.run(["conda", "-V"], check=True)
except FileNotFoundError:
subprocess.run([sys.executable, "-m", "pip", "install", "condacolab"], check=True)
import condacolab
condacolab.install()
# Clone repository
repo_URL = "https://github.com/bioexcel/biobb_wf_godmd.git"
repo_name = Path(repo_URL).name.split('.')[0]
if not Path(repo_name).exists():
subprocess.run(["mamba", "install", "-y", "git"], check=True)
subprocess.run(["git", "clone", repo_URL], check=True)
print("⏬ Repository properly cloned.")
# Install environment
print("⏳ Creating environment...")
env_file_path = f"{repo_name}/conda_env/environment.yml"
subprocess.run(["mamba", "env", "update", "-n", "base", "-f", env_file_path], check=True)
print("🎨 Install NGLView dependencies...")
subprocess.run(["mamba", "install", "-y", "-c", "conda-forge", "nglview==3.0.8", "ipywidgets=7.7.2"], check=True)
print("👍 Conda environment successfully created and updated.")
# Enable widgets for colab
if 'google.colab' in sys.modules:
from google.colab import output
output.enable_custom_widget_manager()
# Change working dir
import os
os.chdir("biobb_wf_godmd/biobb_wf_godmd/notebooks")
print(f"📂 New working directory: {os.getcwd()}")
Input parameters
Input parameters needed:
-
y libraries: Libraries to be used in the workflow are imported once at the beginning
-
pdbOrigin: PDB code for the origin structure (e.g. 1AKE, https://doi.org/10.2210/pdb1AKE/pdb)
-
chainOrigin: Chain for the origin structure (e.g. A)
-
ligandOrigin: Ligand (if present) for the origin structure (e.g. AP5, Drugbank DB01717)
-
pdbTarget: PDB code for the target structure (e.g. 4AKE, https://doi.org/10.2210/pdb4AKE/pdb)
-
chainTarget: Chain for the target structure (e.g. A)
-
ligandTarget: Ligand (if present) for the target structure (e.g. None)
import os
import plotly
import plotly.graph_objs as go
import nglview
import ipywidgets
# Adenylate Kinase (ADK)
pdbOrigin = "1ake"
chainOrigin = "A"
ligandOrigin = "AP5"
pdbTarget = "4ake"
chainTarget = "A"
ligandTarget = None
# Other Examples (taken from https://mmb.irbbarcelona.org/TransAtlas/)
# Estrogen Receptor Alpha (ERα)
# pdbOrigin = "1a52"
# pdbTarget = "3dt3"
# chainOrigin = "A"
# chainTarget = "B"
# ligandOrigin = "EST"
# ligandTarget = "369"
# Calmodulin (CaM)
# pdbOrigin = "1deg"
# pdbTarget = "2f2o"
# chainOrigin = "A"
# chainTarget = "A"
# ligandOrigin = None
# ligandTarget = None
Fetching PDB structures
Downloading PDB structures with the origin and target protein molecules from the RCSB PDB database.
Alternatively, PDB files can be used as starting structures.
Building Blocks used:
- Pdb from biobb_io.api.pdb
# Import module
from biobb_io.api.pdb import pdb
# Create properties dict and inputs/outputs
originPDB = pdbOrigin+'.pdb'
targetPDB = pdbTarget+'.pdb'
prop_origin = {
'pdb_code': pdbOrigin,
'filter': False
}
prop_target = {
'pdb_code': pdbTarget,
'filter': False
}
# Launch bb for Origin PDB
pdb(output_pdb_path=originPDB,
properties=prop_origin)
# Launch bb for Target PDB
pdb(output_pdb_path=targetPDB,
properties=prop_target)
# Show different structures (for comparison)
view1 = nglview.show_structure_file(originPDB)
#view1.add_representation(repr_type='ball+stick')
view1._remote_call('setSize', target='Widget', args=['400px','400px'])
view1.camera='orthographic'
view1
view2 = nglview.show_structure_file(targetPDB)
#view2.add_representation(repr_type='ball+stick')
view2._remote_call('setSize', target='Widget', args=['400px','400px'])
view2.camera='orthographic'
view2
ipywidgets.HBox([view1, view2])
Preparing the structures
Preparing the structures to be used for the conformational transition calculation.
- Extracting the interesting chains from the PDB structures (chain A in both cases).
- Removing the inhibitor AP5A from the closed conformation.
Building Blocks used:
- extract_chain from biobb_structure_utils.utils.extract_chain
- remove_molecules from biobb_structure_utils.utils.remove_molecules
Extracting the interesting chains from the PDB structures (chain A in both cases).
from biobb_structure_utils.utils.extract_chain import extract_chain
originPDB_chain = pdbOrigin + ".chains.pdb"
targetPDB_chain = pdbTarget + ".chains.pdb"
prop = {
'chains': [ chainOrigin ]
}
# Launch bb for Origin PDB
extract_chain(
input_structure_path=originPDB,
output_structure_path=originPDB_chain,
properties=prop
)
prop = {
'chains': [ chainTarget ]
}
# Launch bb for Target PDB
extract_chain(
input_structure_path=targetPDB,
output_structure_path=targetPDB_chain,
properties=prop
)
Removing the inhibitor AP5A from the closed conformation.
from biobb_structure_utils.utils.remove_molecules import remove_molecules
originPDB_chain_nolig = pdbOrigin + ".chains.nolig.pdb"
targetPDB_chain_nolig = pdbTarget + ".chains.nolig.pdb"
if ligandOrigin:
prop = {
'molecules': [
{
'name' : ligandOrigin
}
]
}
remove_molecules(input_structure_path=originPDB_chain,
output_molecules_path=originPDB_chain_nolig,
properties=prop)
else:
originPDB_chain_nolig = originPDB_chain
if ligandTarget:
prop = {
'molecules': [
{
'name' : ligandTarget
}
]
}
remove_molecules(input_structure_path=targetPDB_chain,
output_molecules_path=targetPDB_chain_nolig,
properties=prop)
else:
targetPDB_chain_nolig = targetPDB_chain
# Show different structures generated (for comparison)
view1 = nglview.show_structure_file(originPDB_chain_nolig)
#view1.add_representation(repr_type='ball+stick')
view1._remote_call('setSize', target='Widget', args=['400px','400px'])
view1.camera='orthographic'
view1
view2 = nglview.show_structure_file(targetPDB_chain_nolig)
#view2.add_representation(repr_type='ball+stick')
view2._remote_call('setSize', target='Widget', args=['400px','400px'])
view2.camera='orthographic'
view2
ipywidgets.HBox([view1, view2])
Computing the mapping
GOdMD works by moving the atoms from the position of the origin structure to the one in the target structure. For that, a one-to-one correspondance is needed. This step builds a couple of mapping files (.aln) from an internal sequence alignment (using EMBOSS water pairwise sequence alignment tool).
Building Blocks used:
- godmd_prep from biobb_godmd.godmd.godmd_prep
from biobb_godmd.godmd.godmd_prep import godmd_prep
originALN = pdbOrigin + ".aln"
targetALN = pdbTarget + ".aln"
prop = {
'gapopen' : '12.0',
'gapextend' : '2'
}
godmd_prep( input_pdb_orig_path=originPDB_chain_nolig,
input_pdb_target_path=targetPDB_chain_nolig,
output_aln_orig_path=originALN,
output_aln_target_path=targetALN,
properties=prop)
Running GOdMD
Computing the conformational transition, from the origin to the target structure, using GOdMD and the mappings generated in the previous step. The output file is a trajectory file in mdcrd format.
Building Blocks used:
- godmd_run from biobb_godmd.godmd.godmd_run
from biobb_godmd.godmd.godmd_run import godmd_run
godmd_log = pdbOrigin + "-" + pdbTarget + ".godmd.log"
godmd_trj = pdbOrigin + "-" + pdbTarget + ".godmd.mdcrd"
godmd_ene = pdbOrigin + "-" + pdbTarget + ".godmd.ene.out"
godmd_pdb = pdbOrigin + "-" + pdbTarget + ".godmd.pdb"
prop = {
'godmdin':{
'temp' : 400
}
}
godmd_run( input_pdb_orig_path=originPDB_chain_nolig,
input_pdb_target_path=targetPDB_chain_nolig,
input_aln_orig_path=originALN,
input_aln_target_path=targetALN,
output_log_path=godmd_log,
output_ene_path=godmd_ene,
output_trj_path=godmd_trj,
output_pdb_path=godmd_pdb,
properties=prop)
Converting trajectory to XTC (visualization)
Converting the generated GOdMD trajectory from the mdcrd format to a xtc format, for the sake of visualization with NGL (see next cell).
Building Blocks used:
- cpptraj_convert from biobb_analysis.ambertools.cpptraj_convert
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert
godmd_trj_xtc = pdbOrigin + "-" + pdbTarget + ".godmd.xtc"
prop = {
'format': 'xtc'
}
cpptraj_convert(input_top_path=godmd_pdb,
input_traj_path=godmd_trj,
output_cpptraj_path=godmd_trj_xtc,
properties=prop)
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(godmd_trj_xtc, godmd_pdb), gui=True)
view.add_representation(repr_type='tube', colorScheme = 'atomindex')
# Origin Structure (comment when executing in google colab)
b = view.add_component(nglview.FileStructure(originPDB_chain_nolig))
b.clear_representations()
b.add_representation(repr_type='tube',
opacity=.2,
color='blue')
# Target Structure (comment when executing in google colab)
c = view.add_component(nglview.FileStructure(targetPDB_chain_nolig))
c.clear_representations()
c.add_representation(repr_type='tube',
opacity=.2,
color='red')
# Align origin and target
code = """
var stage = this.stage;
var clist_len = stage.compList.length;
var i = 0;
var s = [];
for(i = 0; i <= clist_len; i++){
if(stage.compList[i] != undefined && stage.compList[i].structure != undefined) {
s.push(stage.compList[i])
}
}
NGL.superpose(s[1].structure, s[2].structure, true, ".CA")
s[ 1 ].updateRepresentations({ position: true })
s[ 1 ].autoView()
"""
# (comment when executing in google colab)
view._execute_js_code(code)
view._remote_call('setSize', target='Widget', args=['800px','600px'])
view