Molecular Modeling and Bioinformatics Group

Tutorial 4 - Experimental vs MD Analysis

This tutorial shows the information integrated in BIGNASim to easily compare theoretical results obtained from MD simulations against values obtained from experimental structures/trajectories. Comparison with experimental structures in BIGNASim is done at three different levels:

Individual helical parameters obtained from the corresponding experimental structure from the PDB

Averaged analysis of helical parameters from a complete dataset made from available nucleic acids experimental structures from the PDB

Experimental ensembles with PDB structures of equivalent sequences, analyzed as pseudotrajectories, superimposed to the appropriate analysis

Individual helical parameters obtained from the corresponding experimental structure from the PDB

In simulations started from a reference PDB structure, the most direct analysis is the comparison between geometrical values extracted from the crystallographic structure and those averaged over the simulation trajectory. Among the broad range of possible analyses, the helical parameters, and more specifically the base-pair step helical parameters, are the most widely used in recent studies (see references section, Ascona B-DNA Consortium articles). For these cases, the BIGNASim database stores not only the analyses of the simulated trajectory but also those computed on the PDB structure, so a direct comparison is obtained simply by combining both sets of data.

This comparison is most informative for nucleic acids complexed with proteins or ligands. The typical helical conformation of the nucleic acid can be distorted by the bound molecule. The time-averaged values of a given simulated trajectory, together with their standard deviations, indicate whether the nucleic acid explores the conformation needed to accommodate the binding partner or whether a complete change of shape is required. Additionally, binding regions can be easily identified from the distortions in the helical parameters (e.g. intercalating ligands).

To illustrate the procedure, we will use a protein-nucleic acid complex, PDB code 1hlv. The Quick Search section can be used to select all available simulations related to this PDB entry. Quick Search performs a global search across all fields of the database, including ontology terms and even sequences. It is particularly useful for identifying simulations based on experimental structures.

In this case, “1hlv” is found both as the PDB reference and in the internal identifier of the simulation. The description column gives the details of the specific simulation and shows that it corresponds to “naked” DNA, meaning that the available simulation was run with just the nucleic acid portion of the complex; otherwise, the description would indicate “complex”. This can also be seen in the simulation details, where a straight DNA molecule is depicted.

To see the set of pre-computed analyses offered by BIGNASim, we click on the Trajectory Analyses section. From those available, we are interested in the ones named Curves.

This will open a new interface, NAFlex. For a complete description of the NAFlex analysis interface and its interactive sequence possibilities, please refer to the NAFlex help pages or the NAFlex paper. To obtain what we are looking for, we will just select Average Results and then Inter-Base Pair Helical Parameters.

This will show a set of six inter-base pair step helical parameters: Rise, Roll, Shift, Slide, Tilt and Twist. Each of them has an associated plot comparing the time-averaged theoretical results with the experimental ones. To see how they look, let's open the Twist parameter.

This plot shows the time-averaged Twist parameter along the sequence. Four lines with different colors are shown: the NAFlex_1hlv theoretical MD simulation in red, with standard deviation values; average values from a set of MD simulations in green and from X-ray structures in blue (see P. Dans et al, NAR (2012) 40, 10668-10678); and finally, in violet, the values computed for the corresponding PDB experimental structure 1HLV. One can easily identify the regions of the nucleic acid where the protein is attached, disturbing the helical conformation. The same analysis can be obtained for the rest of the base-pair step parameters.
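The reasoning behind this plot can be sketched in a few lines of Python. The following is only an illustration under stated assumptions: the function name `flag_distorted_steps`, the threshold `k`, and all numbers are hypothetical and are not part of BIGNASim or NAFlex. It flags the base-pair steps whose experimental value lies further than `k` standard deviations from the MD time average.

```python
from statistics import mean, stdev

def flag_distorted_steps(md_frames, exp_values, k=2.0):
    """Return indices of base-pair steps whose experimental value lies
    more than k standard deviations from the MD time average.

    md_frames:  list of frames, each a list with one value per step
    exp_values: one experimental value per step
    """
    flagged = []
    for i, exp in enumerate(exp_values):
        series = [frame[i] for frame in md_frames]
        avg, sd = mean(series), stdev(series)
        if abs(exp - avg) > k * sd:
            flagged.append(i)
    return flagged

# Hypothetical Twist values (degrees): 5 MD frames x 4 base-pair steps
md_twist = [
    [34.0, 36.5, 31.0, 35.0],
    [35.5, 35.0, 32.5, 36.0],
    [33.0, 37.0, 30.5, 34.5],
    [36.0, 36.0, 31.5, 35.5],
    [34.5, 35.5, 32.0, 36.5],
]
exp_twist = [34.8, 22.0, 31.2, 35.9]  # invented "PDB" values

print(flag_distorted_steps(md_twist, exp_twist))  # → [1]
```

In this invented example only the second step is flagged, which is the kind of local distortion one would look for near a protein contact.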


Averaged analysis of helical parameters from a complete dataset made from available nucleic acids experimental structures from the PDB

The second way BIGNASim offers to compare theoretical and experimental values gives a more global view of nucleic acid flexibility. For that purpose, the global analyses include an averaged analysis of standard fragments (bases, base pairs, and base-pair steps) computed on a complete dataset of the nucleic acid experimental structures available in the PDB. See the corresponding help section for more information. These averaged values can be compared with the global values extracted from our database, which gathers information from many different simulations for a single base, base pair, or base-pair step, and thus give a direct picture of how well our simulations reproduce the experimental observations. To illustrate how this information is presented in the BIGNASim interface, we will go through a brief example. The first step is to go to the Global analyses section of the webpage:

As an example, we are going to go through the set of global analyses for the CG base-pair step, but you could follow this short tutorial with any of the available base-pair or base-pair step fragments. So, in the global analyses page, we select the CG bullet:

The next step is to select the CURVES analysis:

From the different CURVES analyses, we are interested in the helical parameters:

Now, from the six helical base-pair step parameters (Rise, Roll, Shift, Slide, Tilt and Twist), we will open the Roll one:

BIGNASim automatically generates a histogram with all the Roll values extracted from the database. These values correspond to the Roll angle (see the NAFlex helical parameters description for more information about helical parameters). Two vertical lines are plotted on the histogram: a blue one, indicating the mean of the represented set of values (MDs), and a red one, corresponding to the experimental average value (PDBs) explained at the beginning of this section.

In this case one can easily see that the values observed in the simulated systems agree well with the experimental average. The distribution of values is very close to a normal distribution, with the mean value (6.225) only about 1.1 degrees away from the experimental average value (7.35). Before closing this short tutorial, let's open another base-pair step parameter, Twist. From (P. Dans et al, NAR 2014) we know that CG Twist has an interesting behavior, showing a polymorphism between a low twist (~20°) and a high twist (~40°). To see it, click on the [helical_bpstep] section of the navigation breadcrumb to go one step back (please do not use the browser back button, as it will leave the global analyses page) and then select the Twist parameter:

This is the generated histogram:

The difference from the previous Roll distribution is clear. In this case, two different distributions can be easily identified, centered at ~25° and ~35° (in good agreement with P. Dans et al, NAR 2014). Another interesting point regarding the comparison with experimental values is that, although the theoretical mean value (33.639) is again close to the experimental one (30.66), both lose the information about the bimodality of the Twist parameter, and their values lie between the two observed conformations.
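The loss of information in the mean can be reproduced with a small synthetic example. The sketch below is not BIGNASim data: the two populations (centered at 25° and 35°, equal weights, 2° spread), the bin limits, and the helper names `bin_index` and `histogram` are all assumptions chosen for illustration. It builds a bimodal sample, computes its mean, and compares the histogram count at the mean with the counts at the two peaks.

```python
import random
from statistics import mean

def bin_index(value, lo, width):
    """Index of the fixed-width bin that contains value."""
    return int((value - lo) // width)

def histogram(values, lo, hi, width):
    """Count values in fixed-width bins covering [lo, hi)."""
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        i = bin_index(v, lo, width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

LO, HI, WIDTH = 14.0, 46.0, 2.0
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic Twist sample (degrees): two invented populations
twist = [rng.gauss(25.0, 2.0) for _ in range(5000)]
twist += [rng.gauss(35.0, 2.0) for _ in range(5000)]

counts = histogram(twist, LO, HI, WIDTH)
avg = mean(twist)

print(f"mean = {avg:.1f}")
print("count in the bin holding the mean:", counts[bin_index(avg, LO, WIDTH)])
print("counts in the bins holding 25 and 35:",
      counts[bin_index(25.0, LO, WIDTH)], counts[bin_index(35.0, LO, WIDTH)])
```

The mean falls in a sparsely populated bin between the two peaks, so a single average value (theoretical or experimental) says little about which conformations are actually visited.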


Experimental ensembles with PDB structures of equivalent sequences, analyzed as pseudotrajectories, superimposed to the appropriate analysis

The last experimental comparison in BIGNASim uses analyses taken from experimental ensembles. To learn how we built these ensembles, please see the corresponding experimental trajectories help section. Having these experimental ensembles allows us to compare our theoretical values not just against static experimental information but also against dynamic behavior. If a particular helical parameter exists in two different conformations (see previous section, CG Twist example), that information can be obtained from the experimental ensemble and thus be compared with our MD simulation values. Unfortunately, because few PDB structures are deposited for any given nucleotide sequence, only 15 experimental trajectories could be generated (see experimental trajectories table).

In this tutorial, we are going to see what this type of comparison looks like with a particular example: a protein-DNA complex with PDB code 2LEF, the structure of a Lymphoid Enhancer-Binding Factor (Lef1 HMG domain, from mouse) complexed with DNA (15 bp). As the structure deposited in the PDB was obtained by Nuclear Magnetic Resonance (NMR), it contains 12 different structures, which allows us to construct the experimental ensemble and compute the corresponding helical parameters. Simulation information for this structure stored in our database can be found here. The simulation can be easily found with our Quick Search button. Once in the simulation page for our 2LEF complex, we are going to open the Trajectory Analyses:

For the nucleotide sequences in our database that have an associated experimental ensemble, a new analysis section named Experimental vs MD Analysis automatically appears. Clicking on this element redirects to a NAFlex interface with the comparison results.

For a complete description of the NAFlex analysis interface and its interactive sequence possibilities, please refer to the NAFlex help pages or the NAFlex paper. As in the first section of this tutorial, to obtain what we are looking for we are going to select Average Results and then Inter-Base Pair Helical Parameters:

Please note that direct access to this NAFlex interface for all the sequences having a calculated experimental trajectory is also available in the Experimental Trajectories help section. As this text is intended to be a tutorial, we preferred to explain all the previous steps needed to get to this point. Here, we have two different kinds of comparative analysis: one single parameter (2D), or two parameters (3D). Let's start by choosing a single parameter: Roll.

This is the resulting 2D plot:

The plot shows two lines representing time-averaged values for the Roll parameter: the red line corresponds to the MD simulation and the green line to the experimental ensemble. A distorted region of the DNA can be clearly identified, from the 4th to the 6th base-pair step of the sequence, probably influenced by the attached protein. Our MD simulation does not explore these Roll values, so the behavior of the nucleic acid is in this case clearly influenced by the protein. Now we will go back and select another analysis, in this case one involving two different parameters: Roll and Twist (3D).
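The two lines of the 2D plot are per-step averages over frames, where each model of the experimental ensemble is treated as one frame of a pseudotrajectory. A minimal sketch of that averaging follows; the helper `per_step_average` and every number in it are hypothetical and only meant to show the idea, not how NAFlex computes its values.

```python
from statistics import mean

def per_step_average(frames):
    """Average each base-pair step over all frames of a (pseudo)trajectory.

    frames: list of frames, each a list with one value per base-pair step
    """
    return [mean(step_values) for step_values in zip(*frames)]

# Hypothetical Roll values (degrees): rows are frames, columns are steps
nmr_models = [[2.0, 21.0, 3.5], [1.0, 25.0, 4.5], [3.0, 23.0, 4.0]]
md_frames = [[1.5, 5.0, 4.0], [2.5, 7.0, 3.0], [2.0, 6.0, 5.0], [2.0, 6.0, 4.0]]

exp_avg = per_step_average(nmr_models)  # experimental ensemble line
md_avg = per_step_average(md_frames)    # MD simulation line

# Difference per step: large values mark regions the MD does not reproduce
diff = [round(e - m, 2) for e, m in zip(exp_avg, md_avg)]
print(diff)  # → [0.0, 17.0, 0.0]
```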

Analyses of helical parameter pairs are done for each of the base-pair steps present in the sequence. In this case, we have 8 different plots, corresponding to AA (including TT), AC, AG, CC (including GG), CT, GA, GC and TG.

We will choose the one corresponding to the base-pair step AA (TT) Twist vs Roll to analyse in more detail:

These 3D plots show points corresponding to the correlation between both parameters, where colors represent different densities, from higher density in yellow to lower density in blue. Experimental values are shown as red asterisks. As expected from the results found in the previous Roll plot, there is a strong deviation in the Roll parameter for some of the base-pair steps, which are placed in a region of the plot almost unexplored by the complete MD simulation. Unsurprisingly, they also have distorted values of the Twist parameter. This is just one example (inter-base pair step helical parameters) of the type of comparisons offered in this NAFlex interface; comparisons are also made for backbone torsions, axis base pairs and intra-base pair helical parameters. As a last example, going back to the Average Results selection, we can now choose Backbone Torsions.
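The Twist vs Roll comparison described above amounts to binning the MD (Twist, Roll) pairs on a grid and checking where the experimental points fall. The sketch below illustrates this under stated assumptions: the function names `density_grid` and `unexplored`, the 5-degree bin width, and all values are invented for the example and do not come from NAFlex.

```python
from collections import Counter

def grid_cell(twist, roll, width):
    """Grid cell (column, row) that contains a (twist, roll) point."""
    return (int(twist // width), int(roll // width))

def density_grid(pairs, width=5.0):
    """Count MD (twist, roll) pairs per square bin of the given width."""
    return Counter(grid_cell(t, r, width) for t, r in pairs)

def unexplored(exp_pairs, grid, width=5.0):
    """Experimental points falling in bins the MD sample never visited."""
    return [(t, r) for t, r in exp_pairs
            if grid[grid_cell(t, r, width)] == 0]

# Hypothetical (Twist, Roll) pairs in degrees for one base-pair step type
md_pairs = [(36.0, 2.0), (37.5, 1.0), (35.5, 3.0), (33.0, 6.0), (38.0, -2.0)]
exp_pairs = [(36.5, 2.5), (24.0, 22.0)]

grid = density_grid(md_pairs)
print(unexplored(exp_pairs, grid))  # → [(24.0, 22.0)]
```

A `Counter` returns 0 for cells that were never filled, so no explicit grid bounds are needed.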

There are 4 different analyses available: BI/BII populations and puckering percentages, and two torsion-pair analyses: α/γ and ε/ζ. We choose the Backbone α/γ analysis:

The resulting plot again shows the MD (black) vs the experimental (red) values:

No distortion is observed in this case, with the experimental ensemble exploring just a single conformation, whereas the MD simulation is exploring different well-known regions of the space.
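One way to quantify "a single conformation" versus "different regions" is to assign each α/γ pair to a rotamer state and count populations. The sketch below assumes the conventional 120-degree windows (g+ for 0 to 120°, t for 120 to 240°, g- for 240 to 360°); the function names and all angle values are hypothetical and are not taken from the 2LEF analysis.

```python
from collections import Counter

def rotamer(angle):
    """Classify a torsion (degrees) as g+, t or g- using 120-degree
    windows (a conventional choice, assumed for this sketch)."""
    a = angle % 360.0
    if a < 120.0:
        return "g+"
    if a < 240.0:
        return "t"
    return "g-"

def alpha_gamma_populations(pairs):
    """Fraction of (alpha, gamma) pairs found in each combined state."""
    counts = Counter(f"{rotamer(a)}/{rotamer(g)}" for a, g in pairs)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Hypothetical (alpha, gamma) torsions in degrees
md_pairs = [(-65.0, 55.0), (-70.0, 60.0), (-60.0, 50.0), (70.0, 180.0)]
exp_pairs = [(-62.0, 58.0), (-68.0, 52.0)]

print(alpha_gamma_populations(md_pairs))   # → {'g-/g+': 0.75, 'g+/t': 0.25}
print(alpha_gamma_populations(exp_pairs))  # → {'g-/g+': 1.0}
```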


BIGNASim database structure and analysis portal for nucleic acids simulation data
