MoDEL. Molecular Dynamics Extended Library

B-Factors

The B-factor is the standard measure of residue/atom flexibility. It is determined from the oscillations of a residue with respect to its equilibrium position:

$\text{B-factor}=\frac{8}{3}\pi^2\left<\Delta r^2\right>$

(1)

where $\left<\Delta r^2\right>$ is the mean square displacement of the residue (or atom) around its equilibrium position.

B-factor profiles represent the distribution of residue harmonic oscillations. They can be compared with X-ray data, but caution is needed, since crystal lattice effects tend to rigidify exposed protein residues. Very large B-factors should also be taken with caution, since they indicate very flexible residues that might undergo conformational changes along the trajectory, and such changes are difficult to follow within the harmonic approximation implicit in B-factor analysis.

The data provided include the global value and basic statistics, B-factors plotted by residue, and a measure of their anisotropy.
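As an illustration of the B-factor formula above, the following minimal sketch computes per-atom B-factors from a list of snapshots. It is not MoDEL's implementation: the function names and the toy trajectory are hypothetical, the equilibrium position is assumed to be the time-averaged position, and the snapshots are assumed to be already superposed on a common reference.

```python
import math


def mean_square_fluctuations(frames):
    """Return <Δr²> for every atom.

    `frames` is a list of snapshots; each snapshot is a list of (x, y, z)
    tuples in Å. Assumption of this sketch: all snapshots are already
    superposed on a common reference structure.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    msf = []
    for i in range(n_atoms):
        # Equilibrium position taken as the time average of the atom position.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Squared displacement from the mean, summed over all snapshots.
        total = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        )
        msf.append(total / n_frames)
    return msf


def b_factors(frames):
    """B-factor = (8/3) π² <Δr²>, in Å² when the coordinates are in Å."""
    return [8.0 / 3.0 * math.pi ** 2 * m for m in mean_square_fluctuations(frames)]


# Toy trajectory: atom 0 oscillates by ±1 Å along x, atom 1 does not move.
toy_frames = [
    [(1.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(-1.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
toy_b = b_factors(toy_frames)
```

Per-residue values can be obtained in the same way by passing one representative position per residue (for example the CA atom) instead of all atoms.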

Contacts & Hydrogen bonds.

Native Contacts. A contact is recorded in a snapshot of the simulation when the CA atoms of two residues are less than 7 Å apart. When the same contact is present in the experimental structure, the contact is defined as “native”. We consider that a native contact is lost when it is not preserved in more than half of the collected snapshots.

Hydrogen bonds. We consider that two residues are hydrogen bonded when donor and acceptor are less than 3.5 Å apart and the acceptor-H-donor angle is less than 120 degrees. We consider that a hydrogen bond is maintained in a portion of the trajectory when it is detected in more than half of the collected snapshots.
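The native-contact criterion can be sketched as follows. This is only an illustration: the function names are hypothetical, all residue pairs are considered (no sequence-separation filter is mentioned in the text), and "lost" is interpreted as "present in no more than half of the snapshots", which is an assumption about the wording above.

```python
import math
from itertools import combinations

CA_CUTOFF = 7.0  # Å, contact threshold quoted in the text


def ca_contacts(ca_coords, cutoff=CA_CUTOFF):
    """Return the set of residue index pairs (i, j), i < j, whose CA atoms
    are closer than `cutoff`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(ca_coords), 2)
        if math.dist(a, b) < cutoff
    }


def lost_native_contacts(native_ca, snapshots, cutoff=CA_CUTOFF):
    """Return the native contacts that are lost along the trajectory."""
    native = ca_contacts(native_ca, cutoff)
    counts = dict.fromkeys(native, 0)
    for frame in snapshots:
        # Only contacts that also exist in the experimental structure count.
        for pair in ca_contacts(frame, cutoff) & native:
            counts[pair] += 1
    half = len(snapshots) / 2.0
    # Assumption: a contact is lost if it is seen in no more than half
    # of the snapshots.
    return {pair for pair, n in counts.items() if n <= half}


# Toy example with three residues on a line (coordinates in Å).
toy_native = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_open = [(0.0, 0.0, 0.0), (12.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_lost = lost_native_contacts(toy_native, [toy_native, toy_open, toy_open])
```

In the toy example the only native contact, between residues 0 and 1, is present in one of three snapshots and is therefore reported as lost.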

Entropy

Strand entropies. They are computed using a pseudoharmonic approach following either the Schlitter or the Andricioaei-Karplus method, based on the mass-weighted covariance matrix derived from the MD ensembles.

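Schlitter Entropy

$S \approx \frac{k}{2}\sum_{i}\ln\left(1+\frac{e^{2}}{\alpha_{i}^{2}}\right)$

Andricioaei Entropy

$S = k\sum_{i}\left[\frac{\alpha_{i}}{e^{\alpha_{i}}-1}-\ln\left(1-e^{-\alpha_{i}}\right)\right]$

where $\alpha _{i} = \hbar\omega _{i}/kT$, $e$ is Euler's number, $\omega_{i}$ are the frequencies derived from the eigenvalues obtained by diagonalization of the mass-weighted covariance matrix, and the sum extends to all the non-trivial vibrations.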
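The two entropy estimates can be sketched as below, assuming the standard forms of the Schlitter and Andricioaei-Karplus expressions and the usual quasi-harmonic relation ω_i = sqrt(kT/λ_i) between frequencies and the eigenvalues λ_i of the mass-weighted covariance matrix. The function names, the input units (amu·Å²) and the default temperature of 300 K are assumptions of this sketch; only strictly positive eigenvalues (the non-trivial modes) should be passed in.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
N_A = 6.02214076e23      # Avogadro constant, 1/mol
AMU_A2 = 1.66053906660e-27 * 1e-20  # 1 amu*Å² expressed in kg*m²


def alphas(eigenvalues, temperature=300.0):
    """alpha_i = hbar*omega_i/(k*T), with omega_i = sqrt(k*T/lambda_i).

    `eigenvalues` are the positive eigenvalues of the mass-weighted
    covariance matrix, in amu*Å² (unit choice assumed for this sketch).
    """
    kt = K_B * temperature
    return [HBAR * math.sqrt(kt / (lam * AMU_A2)) / kt for lam in eigenvalues]


def schlitter_entropy(eigenvalues, temperature=300.0):
    """Schlitter estimate, (k/2) * sum ln(1 + e²/alpha_i²), in J/(mol K)."""
    total = sum(
        0.5 * math.log1p(math.e ** 2 / a ** 2)
        for a in alphas(eigenvalues, temperature)
    )
    return K_B * N_A * total


def andricioaei_entropy(eigenvalues, temperature=300.0):
    """Quantum harmonic oscillator entropy summed over modes, in J/(mol K)."""
    total = sum(
        # a/(e^a - 1) - ln(1 - e^-a), written with expm1 for numerical stability
        a / math.expm1(a) - math.log(-math.expm1(-a))
        for a in alphas(eigenvalues, temperature)
    )
    return K_B * N_A * total


# Hypothetical eigenvalues, for illustration only.
toy_eigs = [1.0, 0.5, 0.1]
s_schlitter = schlitter_entropy(toy_eigs)
s_andricioaei = andricioaei_entropy(toy_eigs)
```

For any set of modes the Schlitter value is an upper bound to the Andricioaei-Karplus value, which provides a quick sanity check on the two numbers.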

Values are extrapolated to infinite simulation time by fitting the following relationship:

$S(t)= S_{\infty} - \frac{a}{t^{\beta }}$

where t is the simulation time, $S_{\infty}$ is the entropy at infinite simulation time, and a and β are fitted parameters.
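One simple way to fit this relationship is sketched below. It is not necessarily the optimization used by MoDEL: for each trial β on a grid, the model is linear in $S_{\infty}$ and a, so these two are obtained by ordinary least squares and the β with the smallest residual is kept. The function name, the β grid and the synthetic data are assumptions of this sketch.

```python
def fit_entropy_extrapolation(times, entropies, betas=None):
    """Fit S(t) = S_inf - a / t**beta and return (S_inf, a, beta).

    `times` must be positive and not all identical. For a fixed beta the
    model is a straight line in x = t**(-beta), so S_inf and a follow from
    a linear least-squares fit; beta is chosen by grid search.
    """
    if betas is None:
        betas = [i / 100.0 for i in range(5, 201)]  # 0.05, 0.06, ..., 2.00
    n = len(times)
    y_mean = sum(entropies) / n
    best = None
    for beta in betas:
        x = [t ** -beta for t in times]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, entropies))
        slope = sxy / sxx                  # equals -a
        s_inf = y_mean - slope * x_mean    # intercept at x = 0, i.e. t -> infinity
        sse = sum((s_inf + slope * xi - yi) ** 2 for xi, yi in zip(x, entropies))
        if best is None or sse < best[0]:
            best = (sse, s_inf, -slope, beta)
    return best[1], best[2], best[3]


# Synthetic data generated with S_inf = 100, a = 50, beta = 0.5.
toy_times = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
toy_entropies = [100.0 - 50.0 / t ** 0.5 for t in toy_times]
toy_s_inf, toy_a, toy_beta = fit_entropy_extrapolation(toy_times, toy_entropies)
```

With real data the entropies would be computed on trajectory windows of increasing length, and the quality of the extrapolation depends on how well converged the longest windows are.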

Lindemann coefficients

This coefficient is an estimate of the solid-liquid behaviour of the protein. Values below 0.15 are considered solid-like, while values above 0.15 are considered liquid-like. Typically, the residues buried inside the protein have a lower Lindemann coefficient (are more solid-like) than the residues that remain on the surface of the protein.

This coefficient is taken from the paper by Y. Zhou, D. Vitkup and M. Karplus (1999), Native proteins are surface-molten solids: application of the Lindemann criterion for the solid versus liquid state.

In this analysis you can observe the values for all the residues, classified according to accessibility and secondary structure.

The equations used to compute the coefficient are as follows:

$\Delta_L=\frac{\sqrt{\sum_i\left<\Delta r_i^2/N\right>}}{a^\prime}$

$\Delta r_i^2=\left(r_i-\left<r_i\right>\right)^2$

where N is the number of atoms, $a^\prime$ is an empirical constant, and $r_i$ is the position of atom i.
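The two equations above can be sketched as follows. The function names are hypothetical, the snapshots are assumed to be already superposed, and the value a' = 4.5 Å is an assumption of this sketch (the text only states that a' is an empirical constant).

```python
import math

A_PRIME = 4.5  # Å; assumed value for the empirical constant a'


def lindemann_coefficient(frames, atom_indices=None, a_prime=A_PRIME):
    """Lindemann coefficient for a group of atoms.

    `frames` is a list of snapshots, each a list of (x, y, z) tuples in Å.
    `atom_indices` selects the atoms of the group (default: all atoms).
    """
    n_frames = len(frames)
    if atom_indices is None:
        atom_indices = range(len(frames[0]))
    atom_indices = list(atom_indices)
    total_msf = 0.0
    for i in atom_indices:
        # <r_i>: time-averaged position of atom i.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # <Δr_i²>: mean square deviation from that average position.
        total_msf += sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
    return math.sqrt(total_msf / len(atom_indices)) / a_prime


def is_solid_like(delta_l, threshold=0.15):
    """Apply the 0.15 threshold quoted in the text."""
    return delta_l < threshold


# Toy trajectory: atom 0 oscillates by ±1 Å, atom 1 does not move.
toy_frames = [
    [(1.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(-1.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
toy_all = lindemann_coefficient(toy_frames)
toy_static = lindemann_coefficient(toy_frames, atom_indices=[1])
```

Passing different `atom_indices` (for example buried versus exposed residues) gives the per-group values that the analysis compares.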

Principal component Analysis

Following the essential dynamics (ED) algorithm, the orthogonal movements describing the variance of a system are obtained by diagonalization of the covariance matrix. The required trajectory can be obtained from atomistic, NMA-based MC, DMD or BD simulations. Note that for a complete space definition NMA eigenvectors and ED (NMA-based) eigenvectors are equivalent, but if a reduced set of NMA eigenvectors/eigenvalues is used, divergence between NMA and ED/NMA results is possible.

The result of the analysis is a set of eigenvectors (the modes or principal components), which describe the nature of the deformation movements of the protein, and a set of eigenvalues, which indicate the stiffness associated with each mode. By default the eigenvalues appear in distance² units, but they can be converted to energy units using the harmonic approximation.

After a principal component analysis the eigenvectors appear ranked: the first one explains the largest part of the variance (as indicated by the associated eigenvalue). Since the eigenvectors form a complete basis set, the original Cartesian trajectory can always be projected onto the eigenvector space without loss of information. If a restricted set of eigenvectors is used, information is lost, but the level of error introduced by the simplification is always under user control through the annihilated variance (the difference between the total variance and the variance explained by the set of eigenvectors considered).

Inspection of the atomic components of the most important eigenvectors provided by this option helps to determine the contribution of different residues to the key essential deformations of the protein.

Variance-related metrics. The total variance (and the variance relative to the number of residues) is computed to give a rough estimate of the degree of movement in the protein along the dynamics. The reported magnitudes are defined as follows:

Total variance: Total variability in cartesian coordinates
Essential variance: Amount of variance explained by eigenvalues greater than 1 Å²
Explained variance: Percentage of total variance included in essential variance
Reduced variance: Sum of 5 most significant eigenvalues
Dimension: Number of components included in the essential variance.
Complexity: The number of essential movements needed to explain 90% of total variance.
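The definitions listed above can be sketched as a single function operating on the PCA eigenvalues (in Å²). The function name, the dictionary keys and the example eigenvalues are hypothetical; the eigenvalue list is assumed to be non-empty with a positive sum.

```python
def variance_metrics(eigenvalues, essential_cutoff=1.0, n_reduced=5, target=0.90):
    """Compute the variance-related metrics from PCA eigenvalues in Å²."""
    eigs = sorted(eigenvalues, reverse=True)  # most significant first
    total = sum(eigs)
    essential = [e for e in eigs if e > essential_cutoff]
    essential_variance = sum(essential)

    # Complexity: number of leading components needed to reach 90% of the total.
    cumulative = 0.0
    complexity = 0
    for e in eigs:
        cumulative += e
        complexity += 1
        if cumulative >= target * total:
            break

    return {
        "total_variance": total,
        "essential_variance": essential_variance,
        "explained_variance": 100.0 * essential_variance / total,  # percentage
        "reduced_variance": sum(eigs[:n_reduced]),
        "dimension": len(essential),
        "complexity": complexity,
    }


# Hypothetical eigenvalues, for illustration only.
toy_metrics = variance_metrics([6.0, 2.0, 1.5, 0.3, 0.2])
```

In this toy case three eigenvalues exceed 1 Å², so the dimension is 3, and the same three components are enough to pass 90% of the total variance.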

Protein similarity

BLAST is an acronym for Basic Local Alignment Search Tool and is used to search for matching proteins.

This tool searches a specific database for sequences that are similar to the sequence provided. MoDEL uses BLAST to locate the closest simulated homologue. Searches from PDB and UniProt have been pre-calculated to ensure a quick response; sequence searches are performed on the fly and may take some time. Output hits are ordered by increasing BLAST E-value. The alignment coverage and similarity are also reported to help in assessing the validity of the hit. Users are advised that even small sequence differences may affect dynamics significantly, so homologies have to be taken with care.

Radius of gyration

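The radius of gyration is used to describe the dimensions of a polymer chain. It is calculated according to the following expression, where N is the number of atoms, $r_{k}$ is the position of atom k and $r_{mean}$ is the mean position of all atoms: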

$R_{g}^{2}=\frac{1}{N}\sum_{k=1}^{N}(r_{k}-r_{mean})^{2}$
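A minimal sketch of this expression is given below. The function name is hypothetical, and the sum is unweighted, as in the formula above (no atomic masses are used).

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration of a set of (x, y, z) positions.

    Follows Rg² = (1/N) Σ (r_k - r_mean)², with r_mean the plain average
    of the positions (no mass weighting).
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    rg2 = sum(
        sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords
    ) / n
    return math.sqrt(rg2)


# Two points at ±1 Å from their centre.
toy_rg = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

Applying the function to every snapshot gives the time evolution of the compactness of the chain.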

Root mean square deviation RMSd

RMSd is the standard measure of the deviation of a structure with respect to a reference conformation. It is computed as:

$RMSd=\sqrt{\frac{\sum_i\left(R_i-R_i^0\right)^2}{N}}$

(1),

where the sum extends over the total number of atoms/residues considered in the calculation ( $N$ ), and the position vectors of the considered structure ( $R_i$ ) and of the reference conformation ( $R_i^0$ ) are taken after aligning the structure to the reference conformation so as to minimize the value of the RMSd. MoDEL uses the Kabsch algorithm (Kabsch, 1976) to align the structures.

The analyses provided include RMSd average values and plots over time for CA, Backbone, Heavy and All Atoms versus the average structure of the simulation and versus the experimental structure. Linear or exponential trends of the RMSd data over time are included in the plots to assess the stability of the structure.
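The RMSd formula itself can be sketched as below for coordinates that have already been superposed on the reference. The Kabsch alignment step is deliberately not included in this sketch, so the value returned is only the minimal RMSd if the optimal superposition has been applied beforehand. The function name is hypothetical.

```python
import math


def rmsd(coords, reference):
    """RMSd between two equally sized lists of (x, y, z) positions.

    Assumption of this sketch: `coords` has already been aligned to
    `reference`; no superposition is performed here.
    """
    if len(coords) != len(reference):
        raise ValueError("both structures must have the same number of atoms")
    n = len(coords)
    squared = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(squared / n)


# Toy example: every atom displaced by 1 Å along z.
toy_reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_structure = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_structure, toy_reference)
```

Restricting the input lists to CA, backbone or heavy atoms gives the different RMSd flavours reported in the plots.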

SASA. Solvent accessible surface area

SASA is defined as the surface traced by the center of an ideal spherical solvent molecule rolling over the molecular surface. This attribute is a standard measure to evaluate protein shapes and their ability to interact with other molecules.

MoDEL provides analysis of the global SASA values for all, polar and apolar amino acids, and plots of these values along the trajectory. SASA is evaluated using the Naccess program.

Secondary Structure

Secondary structure is obtained according to Kabsch & Sander (1983) as implemented in Ptraj. Global values for helix, sheet and coil are reported. Plots of the variation over time are also reported according to the following classes:

H	Alpha Helix
G	3_10 Helix
I	Pi Helix
B	Parallel Beta Sheet
b	Antiparallel Beta Sheet
T	Turn
0	Coil

TMScore. Template modelling score

TMScore is a scoring method originally developed for the assessment of threading templates and adapted here to assess the stability of molecular dynamics trajectories. TMScore is based on the comparison of local distances and is independent of the protein size.

$\text{TMScore}=\mathrm{Max}\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(d_i/d_0\right)^2}\right]$

where $L_N$ is the length of the native structure, $L_T$ is the number of residues aligned to the template structure, $d_i$ is the distance between the ith pair of aligned residues and $d_0$ is a scale to normalize the match difference depending on the size of the protein ( $d_0 = 1.24\sqrt[3]{L_N-15}-1.8$ ). Max denotes the maximum value after optimal spatial superposition (Zhang, Y. and Skolnick J. (2004)). TMScore is always a value between 0 and 1, and higher values indicate more similarity to the reference structure. Note that the analysis reported corresponds to the RMSd associated with the TMScore, computed as:

RMSd TMScore
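The TMScore sum described in this section can be sketched for one given superposition as follows. The sketch assumes the Zhang and Skolnick form, in which each aligned pair contributes 1/(1 + (d_i/d_0)²) and the sum is divided by L_N. It does not perform the search over superpositions implied by Max, so it returns the score of the superposition supplied by the caller. The function names are hypothetical, and the check on d_0 is a choice of this sketch for very short chains.

```python
def tm_d0(l_native):
    """Distance scale d0 = 1.24 * (L_N - 15)^(1/3) - 1.8, in Å."""
    if l_native <= 15:
        raise ValueError("d0 is not defined for L_N <= 15 in this sketch")
    d0 = 1.24 * (l_native - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("chain too short: d0 is not positive")
    return d0


def tm_score_for_superposition(distances, l_native):
    """TMScore sum for one fixed superposition.

    `distances` holds d_i (Å) for the L_T aligned residue pairs;
    `l_native` is L_N, the length of the native structure.
    """
    d0 = tm_d0(l_native)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_native


# Toy cases for a 100-residue protein.
toy_perfect = tm_score_for_superposition([0.0] * 100, 100)
toy_half_aligned = tm_score_for_superposition([0.0] * 50, 100)
toy_at_d0 = tm_score_for_superposition([tm_d0(100)] * 100, 100)
```

A perfect match of all residues gives 1, while aligning only half of the residues, or placing every pair at a distance d_0, gives 0.5.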