AMBER Constant pH MD setup tutorial using BioExcel Building Blocks (biobb)
AMBER Constant pH MD setup tutorial using BioExcel Building Blocks (biobb)
Partly based on:
- AMBER Advanced Tutorial 18: Constant pH MD Example, Calculating pKas for titratable side chains in HEWL by Jason Swails and T. Dwight McGee Jr.
- AMBER Advanced Tutorial 33: Constant pH and Redox Potential MD Example: Predicting pH-dependent standard Redox Potential values by Vinícius Wilian D. Cruzeiro.
- Modeling of pH sensors for CLN025 beta-hairpin by Jordi Juárez, Barril Lab, University of Barcelona.
- Constant pH MD simulation tutorial of BPTI protein in implicit solvent by Wei Zhang, University of the Pacific Stockton.
This tutorial aims to illustrate the process of setting up a simulation system to run constant pH Molecular Dynamics simulations with AMBER, step by step, using the BioExcel Building Blocks library (biobb) wrapping the AmberTools utility from the AMBER package. The particular example used is the Bovine Pancreatic Trypsin Inhibitor (BPTI) protein (PDB code 6PTI, https://doi.org/10.2210/pdb6PTI/pdb).
Settings
Biobb modules used
- biobb_io: Tools to fetch biomolecular data from public databases.
- biobb_amber: Tools to setup and run Molecular Dynamics simulations with AmberTools.
Auxiliary libraries used
- jupyter: Free software, open standards, and web services for interactive computing across all programming languages.
- plotly: Python interactive graphing library integrated in Jupyter notebooks.
- nglview: Jupyter/IPython widget to interactively view molecular structures and trajectories in notebooks.
- simpletraj: Lightweight coordinate-only trajectory reader based on code from GROMACS, MDAnalysis and VMD.
- gfortran: Fortran 95/2003/2008/2018 compiler for GCC, the GNU Compiler Collection.
- libgfortran5: Fortran compiler and libraries from the GNU Compiler Collection.
Conda Installation
git clone https://github.com/bioexcel/biobb_wf_amber.git
cd biobb_wf_amber
conda env create -f conda_env/environment.yml
conda activate biobb_wf_amber
jupyter-notebook biobb_wf_amber/notebooks/md_setup_ph/biobb_wf_amber_md_setup_ph.ipynb
Pipeline steps
- Input Parameters
- Fetching the PDB structure
- Preparing the PDB file for Amber
- Create Protein System Topology
- Create Solvent Box and Solvating the System
- Adding Ions
- Generating the constant pH input file
- Energetically Minimize the System
- Heating the System
- Equilibrate the System (NVT)
- Equilibrate the System (NPT)
- Constant pH Molecular Dynamics Simulation
- Output Files
- Questions & Comments
Initializing colab
The two cells below are used only in case this notebook is executed via Google Colab. Take into account that, for running conda on Google Colab, the condacolab library must be installed. As explained here, the installation requires a kernel restart, so when running this notebook in Google Colab, don't run all cells until this installation is properly finished and the kernel has restarted.
# Only executed when using google colab
import sys
if 'google.colab' in sys.modules:
import subprocess
from pathlib import Path
try:
subprocess.run(["conda", "-V"], check=True)
except FileNotFoundError:
subprocess.run([sys.executable, "-m", "pip", "install", "condacolab"], check=True)
import condacolab
condacolab.install()
# Clone repository
repo_URL = "https://github.com/bioexcel/biobb_wf_amber.git"
repo_name = Path(repo_URL).name.split('.')[0]
if not Path(repo_name).exists():
subprocess.run(["mamba", "install", "-y", "git"], check=True)
subprocess.run(["git", "clone", repo_URL], check=True)
print("⏬ Repository properly cloned.")
# Install environment
print("⏳ Creating environment...")
env_file_path = f"{repo_name}/conda_env/environment.yml"
subprocess.run(["mamba", "env", "update", "-n", "base", "-f", env_file_path], check=True)
print("🎨 Install NGLView dependencies...")
subprocess.run(["mamba", "install", "-y", "-c", "conda-forge", "nglview==3.0.8", "ipywidgets=7.7.2"], check=True)
print("👍 Conda environment successfully created and updated.")
# Enable widgets for colab
if 'google.colab' in sys.modules:
from google.colab import output
output.enable_custom_widget_manager()
# Change working dir
import os
os.chdir("biobb_wf_amber/biobb_wf_amber/notebooks/md_setup_ph")
print(f"📂 New working directory: {os.getcwd()}")
Input parameters
Input parameters needed:
- pdbCode: PDB code of the protein structure (e.g. 6PTI, https://doi.org/10.2210/pdb6PTI/pdb)
import nglview
import ipywidgets
import plotly
import sys
from plotly import subplots
import plotly.graph_objs as go
pdbCode="6PTI"
Fetching PDB structure
Downloading PDB structure with the protein molecule from the RCSB PDB database.
Alternatively, a PDB file can be used as starting structure.
Building Blocks used:
- pdb from biobb_io.api.pdb
# Import module
from biobb_io.api.pdb import pdb
# Create properties dict and inputs/outputs
downloaded_pdb = pdbCode+'.pdb'
prop = {
'pdb_code': pdbCode
}
#Create and launch bb
pdb(output_pdb_path=downloaded_pdb,
properties=prop)
# Show protein
view = nglview.show_structure_file(downloaded_pdb)
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
Preparing PDB file for AMBER
Before starting a protein MD setup, it is always strongly recommended to take a look at the initial structure and try to identify important properties and also possible issues. These properties and issues can be serious, as for example the definition of disulfide bridges, the presence of a non-standard aminoacids or ligands, or missing residues. Other properties and issues might not be so serious, but they still need to be addressed before starting the MD setup process. Missing hydrogen atoms, presence of alternate atomic location indicators or inserted residue codes (see PDB file format specification) are examples of these not so crucial characteristics. Please visit the AMBER tutorial: Building Protein Systems in Explicit Solvent for more examples. AmberTools utilities from AMBER MD package contain a tool able to analyse PDB files and clean them for further usage, especially with the AmberTools LEaP program: the pdb4amber tool. The next step of the workflow is running this tool to analyse our input PDB structure.
For the particular BPTI example, the most important features that are going to be used from the pdb4amber utility is the identification of disulfide bridges in the structure and the renaming of ionizable residues for our constant pH calculation. Disulfide bridges are marked changing the residue names from CYS to CYX, which is the code that AMBER force fields use to distinguish between cysteines forming or not forming disulfide bridges. This will be used in the following step to correctly form a bond between these cysteine residues. Ionizable residues are also marked changing the residue names, from the deprotonated form to the protonated one (e.g. from ASN to AS4, from GLN to GL4, and from HIS to HIP). Cysteine, Lysine and Tyrosine residue names are not changed as they are already in their protonated form at phisiological pH (pKa > 7.4).
Building Blocks used:
- pdb4amber_run from biobb_amber.pdb4amber.pdb4amber_run
# Import module
from biobb_amber.pdb4amber.pdb4amber_run import pdb4amber_run
# Create prop dict and inputs/outputs
output_pdb4amber_path = 'structure.pdb4amber.pdb'
prop = {
'constant_pH' : True
}
# Create and launch bb
pdb4amber_run(input_pdb_path=downloaded_pdb,
output_pdb_path=output_pdb4amber_path,
properties=prop)
# Show protein
view = nglview.show_structure_file(output_pdb4amber_path)
view.add_representation(repr_type='ball+stick', selection='all')
view.add_representation(repr_type='ball+stick', radius='0.5', selection='GL4 AS4')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
Create protein system topology
Building AMBER topology corresponding to the protein structure.
The force field used in this tutorial is ff14SB, an evolution of the ff99SB force field with improved accuracy of protein side chains and backbone parameters, and the constph force field, including the constant pH parameters. Water molecules type used in this tutorial is tip3p.
Generating three output files:
- AMBER structure (PDB file)
- AMBER topology (AMBER Parmtop file)
- AMBER coordinates (AMBER Coordinate/Restart file)
Building Blocks used:
- leap_gen_top from biobb_amber.leap.leap_gen_top
# Import module
from biobb_amber.leap.leap_gen_top import leap_gen_top
# Create prop dict and inputs/outputs
output_pdb_path = 'structure.leap.pdb'
output_top_path = 'structure.leap.top'
output_crd_path = 'structure.leap.crd'
prop = {
"forcefield" : ["protein.ff14SB","constph"]
}
# Create and launch bb
leap_gen_top(input_pdb_path=output_pdb4amber_path,
output_pdb_path=output_pdb_path,
output_top_path=output_top_path,
output_crd_path=output_crd_path,
properties=prop)
# Show protein
view = nglview.show_structure_file(output_pdb_path)
view.add_representation(repr_type='ball+stick', selection='all')
view.add_representation(repr_type='ball+stick', radius='0.3', selection='GL4 AS4')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
Create solvent box and solvating the system
Define the unit cell for the protein structure MD system to fill it with water molecules.
A truncated octahedron box is used to define the unit cell, with a distance from the structure to the box edge of 9.0 Angstroms.
The solvent type used is the default TIP3P water model, a generic 3-point solvent model.
Building Blocks used:
- leap_solvate from biobb_amber.leap.leap_solvate
# Import module
from biobb_amber.leap.leap_solvate import leap_solvate
# Create prop dict and inputs/outputs
output_solv_pdb_path = 'structure.solv.pdb'
output_solv_top_path = 'structure.solv.parmtop'
output_solv_crd_path = 'structure.solv.crd'
prop = {
"forcefield" : ["protein.ff14SB","constph"],
"water_type": "TIP3PBOX",
"distance_to_molecule": "9.0",
"box_type": "truncated_octahedron"
}
# Create and launch bb
leap_solvate(input_pdb_path=output_pdb_path,
output_pdb_path=output_solv_pdb_path,
output_top_path=output_solv_top_path,
output_crd_path=output_solv_crd_path,
properties=prop)
# Show protein
view = nglview.show_structure_file(output_solv_pdb_path)
view.clear_representations()
view.add_representation(repr_type='cartoon', selection='protein')
view.add_representation(repr_type='ball+stick', selection='protein')
view.add_representation(repr_type='line', selection='solvent')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
Adding ions
Neutralizing the system and adding an additional ionic concentration using the leap tool from the AMBER MD package.
Using Sodium (Na+) and Chloride (Cl-) counterions and an additional ionic concentration of 150mM.
Building Blocks used:
- leap_add_ions from biobb_amber.leap.leap_add_ions
# Import module
from biobb_amber.leap.leap_add_ions import leap_add_ions
# Create prop dict and inputs/outputs
output_ions_pdb_path = 'structure.ions.pdb'
output_ions_top_path = 'structure.ions.parmtop'
output_ions_crd_path = 'structure.ions.crd'
prop = {
"forcefield" : ["protein.ff14SB","constph"],
"neutralise" : True,
"box_type": "truncated_octahedron"
}
# Create and launch bb
leap_add_ions(input_pdb_path=output_solv_pdb_path,
output_pdb_path=output_ions_pdb_path,
output_top_path=output_ions_top_path,
output_crd_path=output_ions_crd_path,
properties=prop)
# Show protein
view = nglview.show_structure_file(output_ions_pdb_path)
view.clear_representations()
view.add_representation(repr_type='cartoon', selection='protein')
view.add_representation(repr_type='ball+stick', selection='protein')
view.add_representation(repr_type='line', selection='solvent')
view.add_representation(repr_type='spacefill', selection='Cl- Na+', color='green')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
Generating the constant pH input file
Generating the constant pH input file (cpin file) using the cpinutil.py program from the AmberTools MD package. This step is identifying which residues should be titrated during the course of the MD simulation.
In this particular example, the residues we are interested in titrating during the simulation are Glutamates (GL4), Aspartates (AS4), Cysteines (CYS), Lysines (LYS) and Tyrosines (TYR) (the structure used in this example doesn't contain Histidine residues).
Building Blocks used:
- parmed_cpinutil from biobb_amber.parmed.parmed_cpinutil
# Import module
from biobb_amber.parmed.parmed_cpinutil import parmed_cpinutil
# Create prop dict and inputs/outputs
output_cpin_path = 'structure.cpin'
output_top_cpin_path = 'structure.cpH.parmtop'
prop = {
"igb" : 2,
"resnames": "AS4 GL4 CYS LYS TYR", # No Histidines in our structure
"system": "BPTI"
}
# Create and launch bb
parmed_cpinutil(input_top_path=output_ions_top_path,
output_cpin_path=output_cpin_path,
output_top_path=output_top_cpin_path,
properties=prop)
Energetically minimize the system
Energetically minimize the system (protein structure + solvent + ions) using the sander tool from the AMBER MD package. Restraining backbone atoms with a force constant of 10 Kcal/mol.Å2 to their initial positions.
- Step 1: Energetically minimize the system through 500 minimization cycles.
- Step 2: Checking energy minimization results. Plotting energy by time during the minimization process.
Building Blocks used:
- sander_mdrun from biobb_amber.sander.sander_mdrun
- process_minout from biobb_amber.process.process_minout
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun
# Create prop dict and inputs/outputs
output_min_traj_path = 'sander.cpH.x'
output_min_rst_path = 'sander.cpH.rst'
output_min_log_path = 'sander.cpH.log'
prop = {
"simulation_type" : "minimization",
"mdin" : {
'maxcyc' : 500,
'ntr' : 1, # Turn on positional restraints
'restraint_wt' : 10, # 10 kcal/mol/A**2 restraint force constant
'restraintmask' : '\"@CA,C,O,N\"' # Restraints on the backbone atoms only
}
}
# Create and launch bb
sander_mdrun(input_top_path=output_top_cpin_path,
input_crd_path=output_ions_crd_path,
input_ref_path=output_ions_crd_path,
output_traj_path=output_min_traj_path,
output_rst_path=output_min_rst_path,
output_log_path=output_min_log_path,
properties=prop)
# Import module
from biobb_amber.process.process_minout import process_minout
# Create prop dict and inputs/outputs
output_h_min_dat_path = 'sander.min.energy.dat'
prop = {
"terms" : ['ENERGY']
}
# Create and launch bb
process_minout(input_log_path=output_min_log_path,
output_dat_path=output_h_min_dat_path,
properties=prop)
import plotly.graph_objs as go
with open(output_h_min_dat_path, 'r') as energy_file:
x, y = zip(*[
(float(line.split()[0]), float(line.split()[1]))
for line in energy_file
if not line.startswith(("#", "@"))
if float(line.split()[1]) < 1000
])
# Create a scatter plot
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines'))
# Update layout
fig.update_layout(title="Energy Minimization",
xaxis_title="Energy Minimization Step",
yaxis_title="Potential Energy kcal/mol",
height=600)
# Show the figure (renderer changes for colab and jupyter)
rend = 'colab' if 'google.colab' in sys.modules else ''
fig.show(renderer=rend)
Heating the system
Warming up the prepared system using the sander tool from the AMBER MD package. Going from 0 to the desired temperature, in this particular example, 300K. Protein backbone atoms restrained (force constant of 2 Kcal/mol). Length 5ps.
- Step 1: Warming up the system through 500 MD steps.
- Step 2: Checking results for the system warming up. Plotting temperature along time during the heating process.
Building Blocks used:
- sander_mdrun from biobb_amber.sander.sander_mdrun
- process_mdout from biobb_amber.process.process_mdout
Step 1: Warming up the system
The heat type of the simulation_type property contains the main default parameters to run a system warming up:
- imin = 0; Run MD (no minimization)
- ntx = 5; Read initial coords and vels from restart file
- cut = 10.0; Cutoff for non bonded interactions in Angstroms
- ntr = 0; No restrained atoms
- ntc = 2; SHAKE for constraining length of bonds involving Hydrogen atoms
- ntf = 2; Bond interactions involving H omitted
- ntt = 3; Constant temperature using Langevin dynamics
- ig = -1; Seed for pseudo-random number generator
- ioutfm = 1; Write trajectory in netcdf format
- iwrap = 1; Wrap coords into primary box
- nstlim = 5000; Number of MD steps
- dt = 0.002; Time step (in ps)
- tempi = 0.0; Initial temperature (0 K)
- temp0 = 300.0; Final temperature (300 K)
- irest = 0; No restart from previous simulation
- ntb = 1; Periodic boundary conditions at constant volume
- gamma_ln = 1.0; Collision frequency for Langevin dynamics (in 1/ps)
In this particular example, the heating of the system is done in 2500 steps (5ps) and is going from 0K to 300K (note that the number of steps has been reduced in this tutorial, for the sake of time).
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun
# Create prop dict and inputs/outputs
output_heat_traj_path = 'sander.heat.netcdf'
output_heat_rst_path = 'sander.heat.rst'
output_heat_log_path = 'sander.heat.log'
prop = {
"simulation_type" : "heat",
"mdin" : {
'nstlim' : 2500, # Reducing the number of steps for the sake of time (5ps)
'ntr' : 1, # Turn on positional restraints
'restraintmask' : '\"@CA,C,O,N\"', # Restraining protein backbone atoms
'restraint_wt' : 2.0 # With a force constant of 2 Kcal/mol*A2
}
}
# Create and launch bb
sander_mdrun(input_top_path=output_top_cpin_path,
input_crd_path=output_min_rst_path,
input_ref_path=output_min_rst_path,
output_traj_path=output_heat_traj_path,
output_rst_path=output_heat_rst_path,
output_log_path=output_heat_log_path,
properties=prop)
# Import module
from biobb_amber.process.process_mdout import process_mdout
# Create prop dict and inputs/outputs
output_dat_heat_path = 'sander.md.temp.dat'
prop = {
"terms" : ['TEMP']
}
# Create and launch bb
process_mdout(input_log_path=output_heat_log_path,
output_dat_path=output_dat_heat_path,
properties=prop)
import plotly.graph_objs as go
with open(output_dat_heat_path, 'r') as energy_file:
x, y = zip(*[
(float(line.split()[0]), float(line.split()[1]))
for line in energy_file
if not line.startswith(("#", "@"))
if float(line.split()[1]) < 1000
])
# Create a scatter plot
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines'))
# Update layout
fig.update_layout(title="Heating process",
xaxis_title="Heating Step (ps)",
yaxis_title="Temperature (K)",
height=600)
# Show the figure (renderer changes for colab and jupyter)
rend = 'colab' if 'google.colab' in sys.modules else ''
fig.show(renderer=rend)
Equilibrate the system (NVT)
Equilibrate the protein system in NVT ensemble (constant Number of particles, Volume and Temperature). Protein backbone atoms will be restrained using position restraining forces: movement is permitted, but only after overcoming a substantial energy penalty.
- Step 1: Equilibrate the protein system with NVT ensemble.
- Step 2: Checking NVT Equilibration results. Plotting system temperature by time during the NVT equilibration process.
Building Blocks used:
- sander_mdrun from biobb_amber.sander.sander_mdrun
- process_mdout from biobb_amber.process.process_mdout
Step 1: Equilibrating the system (NVT)
The nvt type of the simulation_type property contains the main default parameters to run a system equilibration in NVT ensemble:
- imin = 0; Run MD (no minimization)
- ntx = 5; Read initial coords and vels from restart file
- cut = 10.0; Cutoff for non bonded interactions in Angstroms
- ntr = 0; No restrained atoms
- ntc = 2; SHAKE for constraining length of bonds involving Hydrogen atoms
- ntf = 2; Bond interactions involving H omitted
- ntt = 3; Constant temperature using Langevin dynamics
- ig = -1; Seed for pseudo-random number generator
- ioutfm = 1; Write trajectory in netcdf format
- iwrap = 1; Wrap coords into primary box
- nstlim = 5000; Number of MD steps
- dt = 0.002; Time step (in ps)
- irest = 1; Restart previous simulation
- ntb = 1; Periodic boundary conditions at constant volume
- gamma_ln = 5.0; Collision frequency for Langevin dynamics (in 1/ps)
In this particular example, the NVT equilibration of the system is done in 500 steps (note that the number of steps has been reduced in this tutorial, for the sake of time).
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun
# Create prop dict and inputs/outputs
output_nvt_traj_path = 'sander.nvt.netcdf'
output_nvt_rst_path = 'sander.nvt.rst'
output_nvt_log_path = 'sander.nvt.log'
prop = {
"simulation_type" : 'nvt',
"mdin" : {
'nstlim' : 500, # Reducing the number of steps for the sake of time (1ps)
'ntr' : 1, # Turn on positional restraints
'restraintmask' : '\"@CA,C,O,N\"', # Restraining protein backbone atoms
'restraint_wt' : 0.1 # With a force constant of 0.1 Kcal/mol*A2
}
}
# Create and launch bb
sander_mdrun(input_top_path=output_top_cpin_path,
input_crd_path=output_heat_rst_path,
input_ref_path=output_heat_rst_path,
output_traj_path=output_nvt_traj_path,
output_rst_path=output_nvt_rst_path,
output_log_path=output_nvt_log_path,
properties=prop)
# Import module
from biobb_amber.process.process_mdout import process_mdout
# Create prop dict and inputs/outputs
output_dat_nvt_path = 'sander.md.nvt.temp.dat'
prop = {
"terms" : ['TEMP']
}
# Create and launch bb
process_mdout(input_log_path=output_nvt_log_path,
output_dat_path=output_dat_nvt_path,
properties=prop)
import plotly.graph_objs as go
with open(output_dat_nvt_path, 'r') as energy_file:
x, y = zip(*[
(float(line.split()[0]), float(line.split()[1]))
for line in energy_file
if not line.startswith(("#", "@"))
if float(line.split()[1]) < 1000
])
# Create a scatter plot
fig = go.Figure(data=go.Scatter(x=x, y=y, mode='lines'))
# Update layout
fig.update_layout(title="NVT equilibration",
xaxis_title="Equilibration Step (ps)",
yaxis_title="Temperature (K)",
height=600)
# Show the figure (renderer changes for colab and jupyter)
rend = 'colab' if 'google.colab' in sys.modules else ''
fig.show(renderer=rend)
Equilibrate the system (NPT)
Equilibrate the protein system in NPT ensemble (constant Number of particles, Pressure and Temperature). Protein backbone atoms will be restrained using position restraining forces: movement is permitted, but only after overcoming a substantial energy penalty.
- Step 1: Equilibrate the protein system with NPT ensemble.
- Step 2: Checking NPT Equilibration results. Plotting system pressure and density by time during the NVT equilibration process.
Building Blocks used:
- sander_mdrun from biobb_amber.sander.sander_mdrun
- process_mdout from biobb_amber.process.process_mdout
Step 1: Equilibrating the system (NPT)
The npt type of the simulation_type property contains the main default parameters to run a system equilibration in NPT ensemble:
- imin = 0; Run MD (no minimization)
- ntx = 5; Read initial coords and vels from restart file
- cut = 10.0; Cutoff for non bonded interactions in Angstroms
- ntr = 0; No restrained atoms
- ntc = 2; SHAKE for constraining length of bonds involving Hydrogen atoms
- ntf = 2; Bond interactions involving H omitted
- ntt = 3; Constant temperature using Langevin dynamics
- ig = -1; Seed for pseudo-random number generator
- ioutfm = 1; Write trajectory in netcdf format
- iwrap = 1; Wrap coords into primary box
- nstlim = 5000; Number of MD steps
- dt = 0.002; Time step (in ps)
- irest = 1; Restart previous simulation
- gamma_ln = 5.0; Collision frequency for Langevin dynamics (in 1/ps)
- pres0 = 1.0; Reference pressure
- ntp = 1; Constant pressure dynamics: md with isotropic position scaling
- taup = 2.0; Pressure relaxation time (in ps)
In this particular example, the NPT equilibration of the system is done in 500 steps (note that the number of steps has been reduced in this tutorial, for the sake of time).
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun
# Create prop dict and inputs/outputs
output_npt_traj_path = 'sander.npt.netcdf'
output_npt_rst_path = 'sander.npt.rst'
output_npt_log_path = 'sander.npt.log'
prop = {
"simulation_type" : 'npt',
"mdin" : {
'nstlim' : 500, # Reducing the number of steps for the sake of time (1ps)
'ntr' : 1, # Turn on positional restraints
'restraintmask' : '\"@CA,C,O,N\"', # Restraining protein backbone atoms
'restraint_wt' : 0.1 # With a force constant of 0.1 Kcal/mol*A2
}
}
# Create and launch bb
sander_mdrun(input_top_path=output_top_cpin_path,
input_crd_path=output_nvt_rst_path,
input_ref_path=output_nvt_rst_path,
output_traj_path=output_npt_traj_path,
output_rst_path=output_npt_rst_path,
output_log_path=output_npt_log_path,
properties=prop)
# Import module
from biobb_amber.process.process_mdout import process_mdout
# Create prop dict and inputs/outputs
output_dat_npt_path = 'sander.md.npt.dat'
prop = {
"terms" : ['PRES','DENSITY']
}
# Create and launch bb
process_mdout(input_log_path=output_npt_log_path,
output_dat_path=output_dat_npt_path,
properties=prop)
import plotly.graph_objs as go
# Read pressure and density data from file
with open(output_dat_npt_path, 'r') as pd_file:
x, y, z = zip(*[
(float(line.split()[0]), float(line.split()[1]), float(line.split()[2]))
for line in pd_file
if not line.startswith(("#", "@"))
if float(line.split()[1]) < 1000
])
# Create a scatter plot
trace1 = go.Scatter(
x=x,y=y
)
trace2 = go.Scatter(
x=x,y=z
)
fig = subplots.make_subplots(rows=1, cols=2, print_grid=False)
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig['layout']['xaxis1'].update(title='Time (ps)')
fig['layout']['xaxis2'].update(title='Time (ps)')
fig['layout']['yaxis1'].update(title='Pressure (bar)')
fig['layout']['yaxis2'].update(title='Density (Kg*m^-3)')
fig['layout'].update(title='Pressure and Density during NPT Equilibration')
fig['layout'].update(showlegend=False)
fig['layout'].update(height=500)
# Show the figure (renderer changes for colab and jupyter)
rend = 'colab' if 'google.colab' in sys.modules else ''
fig.show(renderer=rend)
Constant pH Molecular Dynamics Simulation
Upon completion of the two equilibration phases (NVT and NPT), the system is now well-equilibrated at the desired temperature and pressure. The position restraints can now be released. The last step of the protein MD setup is a short, free MD simulation, to ensure the robustness of the system.
- Step 1: Run short MD simulation of the protein system.
- Step 2: Checking results for the final step of the setup process, the free MD run. Plotting Root Mean Square deviation (RMSd) and Radius of Gyration (Rgyr) by time during the free MD run step.
Building Blocks used:
- sander_mdrun from biobb_amber.sander.sander_mdrun
- process_mdout from biobb_amber.process.process_mdout
- cpptraj_rms from biobb_analysis.cpptraj.cpptraj_rms
- cpptraj_rgyr from biobb_analysis.cpptraj.cpptraj_rgyr
Step 1: Creating portable binary run file to run a Constant pH MD simulation
The free type of the simulation_type property contains the main default parameters to run an unrestrained MD simulation:
- imin = 0; Run MD (no minimization)
- ntx = 5; Read initial coords and vels from restart file
- cut = 10.0; Cutoff for non bonded interactions in Angstroms
- ntr = 0; No restrained atoms
- ntc = 2; SHAKE for constraining length of bonds involving Hydrogen atoms
- ntf = 2; Bond interactions involving H omitted
- ntt = 3; Constant temperature using Langevin dynamics
- ig = -1; Seed for pseudo-random number generator
- ioutfm = 1; Write trajectory in netcdf format
- iwrap = 1; Wrap coords into primary box
- nstlim = 5000; Number of MD steps
- dt = 0.002; Time step (in ps)
In this particular example, a short, 5ps-length simulation (2500 steps) is run, for the sake of time.
On top of these parameters, we will include the constant pH specific properties:
- icnstph = 2; Turn on constant pH for explicit solvent
- saltcon = 0.1; Use the salt concentration CpHMD was parameterized for
- ntcnstph = 100; Protonation state change attempt every 100 steps
- ntrelax = 100; Number of relaxation steps after a successful protonation state change
- solvph = 7.0; Solvent pH
# Import module
from biobb_amber.sander.sander_mdrun import sander_mdrun
# Create prop dict and inputs/outputs
output_pH_traj_path = 'sander.pH.netcdf'
output_pH_rst_path = 'sander.pH.rst'
output_pH_cpout_path = 'sander.pH.cpout'
output_pH_cprst_path = 'sander.pH.cprst'
output_pH_log_path = 'sander.pH.log'
output_pH_mdinfo_path = 'sander.pH.mdinfo'
prop = {
"simulation_type" : 'free',
"mdin" : {
'nstlim' : 2500, # Reducing the number of steps for the sake of time (5ps)
'ntwx' : 500, # Print coords to trajectory every 500 steps (1 ps)
'icnstph' : 2, # Turn on constant pH for explicit solvent
'saltcon' : 0.1, # Use the salt concentration CpHMD was parameterized for
'ntcnstph' : 100, # Protonation state change attempt every 100 steps
'ntrelax' : 100, # Number of relaxation steps after a successful protonation state change
'solvph' : 7.0, # Solvent pH
# 'solvph' : 3.0, # Acid pH
# 'solvph' : 10.0, # Basic (alkaline) pH
}
}
# Create and launch bb
sander_mdrun(input_top_path=output_top_cpin_path,
input_crd_path=output_npt_rst_path,
input_cpin_path=output_cpin_path,
output_traj_path=output_pH_traj_path,
output_rst_path=output_pH_rst_path,
output_cpout_path=output_pH_cpout_path,
output_cprst_path=output_pH_cprst_path,
output_log_path=output_pH_log_path,
output_mdinfo_path=output_pH_mdinfo_path,
properties=prop)
Step 2: Checking constant pH MD simulation results
Protonation states that are sampled throughout the course of the constant pH simulations are written to a cpout-formatted file. The program cphstats can be used to parse this cpout file and extract the predicted pKa values along different parameters:
- The difference between the predicted pKa and the system pH (Offset)
- The predicted pKa (Pred)
- The fraction of time the residue spends protonated (Frac Prot)
- The number of accpeted protonations state transitions (Transitions)
- The sum of the fractional protonations (Average total molecular protonation)
An additional population file is generated, containing the populations of every state for every titratable residue, the fraction of snapshots that the system spent in each state for each residue.
# Import module
from biobb_amber.cphstats.cphstats_run import cphstats_run
# Create prop dict and inputs/outputs
output_pH_dat_path = 'cphstats.pH.dat'
output_pH_pop_path = 'cphstats.pH.pop.dat'
prop = {
'verbose' : True,
'running_avg_window' : 1
}
# Create and launch bb
cphstats_run(input_cpin_path=output_cpin_path,
input_cpout_path=output_pH_cpout_path,
output_dat_path=output_pH_dat_path,
output_population_path=output_pH_pop_path,
properties=prop)
Last Remarks
When checking the information from the predicted pKa values cphstats.pH.dat and the state population cphstats.pH.pop.dat coming from the constant pH simulation at physiological pH (~7), you will find that no different states other than the major species appear during the simulation. Try to re-run the constant pH simulation again, modifying the pH parameter, using acid (<7) or basic (>7) pH and analyse again the results of the checking (cphstats) step.
An additional recommended and useful study is to repeat the constant pH simulation done in the previous step with different pH values (typically from 0 to 14), and then use the output deprotonated fractions for each residue as a function of the pH to plot titration curves. See the AMBER tutorial nº18 or AMBER tutorial nº33 for more information.
Output files
Important Output files generated:
- output_pH_dat_path (cphstats.pH.dat): Predicted pKa values extracted from the constant pH MD simulation.
- output_pH_pop_path (cphstats.pH.pop.dat): Populations of every state for every titratable residue, fraction of snapshots that the system spent in each state for each residue.