BIGNASim database structure and analysis portal for nucleic acids simulation data


Example 1. Obtaining information about the Drew-Dickerson dodecamer

This use case shows several examples of using the BIGNASim search engine to locate a particular nucleic acid sequence, nucleic acid fragment or base-pair step. The search section of the portal is accessed through the main menu.

The search section offers three search modes (see the help section on searching for more details):

  • Search by sequence or specific sequence fragments (using regular expressions)
  • Search by specific base-pair-steps (with or without flanking regions)
  • Search by an extensive nucleic acid ontology (as defined here)

The examples below show how to find three different types of information in the database:


Drew Dickerson Dodecamer (DDD)


In this case, the nucleotide sequence of the DDD (CGCGAATTCGCG) can be searched directly using the "Search by Sequence" section of the portal:

Javascript code examples

// Finding DDD simulations
SimulationList = db.simData.find({'sequence' : 'CGCGAATTCGCG'}).toArray()

// Finding simulations containing DDD sequence using regular expressions
SimulationList = db.simData.find({'sequence' : /CGCGAATTCGCG/}).toArray()

// Finding DDD simulations allowing a sequence variation (A or C at position 6);
// the ^ and $ anchors require the whole sequence to match
SimulationList = db.simData.find({'sequence' : /^CGCGA[AC]TTCGCG$/}).toArray()
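
The difference between the three queries above can be checked without a database connection. The following is a minimal sketch in plain JavaScript, assuming a handful of illustrative sequences (they are not actual BIGNASim records); it applies the same exact-match, substring and anchored patterns to a local array.

```javascript
// Illustrative sequences only; these are NOT actual BIGNASim records.
const DDD = "CGCGAATTCGCG";
const samples = [
  "CGCGAATTCGCG",     // DDD itself
  "GGCGCGAATTCGCGCC", // hypothetical longer sequence embedding DDD
  "CGCGACTTCGCG",     // hypothetical variant with C instead of A at position 6
  "GCAAAATTTTGC",     // a different dodecamer
];

// Equivalent of {'sequence': 'CGCGAATTCGCG'}: the whole string must be equal.
const exact = samples.filter((s) => s === DDD);

// Equivalent of {'sequence': /CGCGAATTCGCG/}: DDD may appear anywhere.
const containing = samples.filter((s) => /CGCGAATTCGCG/.test(s));

// Equivalent of {'sequence': /^CGCGA[AC]TTCGCG$/}: whole-sequence match,
// accepting either A or C at position 6.
const variants = samples.filter((s) => /^CGCGA[AC]TTCGCG$/.test(s));

console.log("exact:", exact);
console.log("containing:", containing);
console.log("variants:", variants);
```

The unanchored pattern is the one to use when DDD may be embedded in a longer construct, while the anchored pattern restricts the results to dodecamers of exactly that length.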

Due to the importance of DDD in the field, it is specifically included in the Sequence Features section of the nucleic acids ontology, and can be located directly:

Javascript example code

// Finding DDD simulation from ontology search
SimulationList = db.simData.find({'ontology' : '10603'}).toArray()

Both routes open a browse page showing the simulations stored in the database for this particular sequence. In this case, 5 different trajectories are found, the longest being 10 µs long. Each simulation can be opened individually to inspect the MD simulation metadata and trajectory analyses. Combined information from more than one simulation can be obtained by selecting the desired entries and clicking on Open analyses for selected simulations (see browsing section for more information).


AT Base-Pair Step (central in DDD)


The central base-pair step of DDD (CGCGAATTCGCG) can be searched from the Search by Base Pair Step section:

Javascript equivalent code

// Finding simulations containing AT BpStep with 2 flanking bases
SimulationList = db.simData.find(
    {'sequence':/..AT../},
    {_id:1}
).toArray();
// Finding simulations containing AT BpStep on any strand (not needed for ApT due to symmetry)
SimulationList = db.simData.find(
    { $or:[ {'sequence':/..AT../}, {'rev-sequence':/..AT../} ]},
    {_id:1}
).toArray();

In the selector, the desired base-pair step, in this case AT, must be chosen. A number of required flanking nucleotides can also be specified, to ensure that the information obtained does not come from base-pair steps located in terminal regions, which can show distorted flexibility parameters. In this example, two flanking nucleotides are required on each side. The same procedure can be applied to any base-pair step.
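
The flanking requirement and the two-strand check can be illustrated with a short sketch in plain JavaScript. It assumes that the second strand is the reverse complement of the stored sequence, read 5' to 3'; the helper names (reverseComplement, flankedStep, hasStep) are hypothetical and are not part of the BIGNASim code.

```javascript
// Reverse complement of a DNA strand written 5'->3'.
function reverseComplement(seq) {
  const pair = { A: "T", T: "A", C: "G", G: "C" };
  return seq.split("").reverse().map((b) => pair[b]).join("");
}

// Regular expression for a step with `flank` arbitrary bases on each side,
// e.g. flankedStep("AT", 2) is equivalent to /..AT../.
function flankedStep(step, flank) {
  return new RegExp(".".repeat(flank) + step + ".".repeat(flank));
}

// True when the step occurs, with the required flanks, on either strand.
function hasStep(seq, step, flank) {
  const re = flankedStep(step, flank);
  return re.test(seq) || re.test(reverseComplement(seq));
}

// AT is its own reverse complement, so one strand is enough for this step.
console.log(reverseComplement("AT"));

// DDD contains a central AT step with at least two flanking bases.
console.log(hasStep("CGCGAATTCGCG", "AT", 2));

// Here AT sits at a terminus on both strands, so the flanks are missing.
console.log(hasStep("ATGCGC", "AT", 2));

// A non-symmetric step (GA) found only on the complementary strand.
console.log(hasStep("GGTCAGG", "GA", 2));
```

For a step that is not self-complementary, such as GA, searching a single strand would miss occurrences that appear as TC on the stored sequence, which is why the second query above checks both fields.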

With the current content of the database, the search for the AT base-pair step returns 51 different simulations. In the sequence column, the AT step of interest and its flanking region are easily identified, as they are highlighted in yellow and orange, respectively. The browse page shows first of all that the database contains simulated systems other than DDD that also include the AT base-pair step. Specifically, 46 sequences are recovered, some of them with more than one occurrence of the step. That offers enough information to compare the flexibility parameters obtained for the 5 DDD simulations found in the previous section of this example with those of the remaining sequences containing the AT base-pair step. To exclude the DDD simulations from the recovered analyses, set the selector of records shown to the maximum (100 records), select all simulations using the checkbox at the left of the table header, next to the Id. title, and then uncheck the ones corresponding to the DDD sequence (sorting the results by Id will help in finding the 5 sequences next to each other).
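
The selection described above (keep every simulation with a flanked AT step, then set the DDD ones aside) can be sketched in plain JavaScript. The records, identifiers and field names below are hypothetical and only mimic the shape of a result list; they are not taken from the database.

```javascript
const DDD = "CGCGAATTCGCG";

// Hypothetical result list; ids and sequences are for illustration only.
const simulations = [
  { idSim: "sim-001", sequence: "CGCGAATTCGCG" },
  { idSim: "sim-002", sequence: "GCAAAATTTTGC" },
  { idSim: "sim-003", sequence: "CGCGAATTCGCG" },
  { idSim: "sim-004", sequence: "GGCGATATCGCC" },
  { idSim: "sim-005", sequence: "GCGCGCGCGC" },
];

// Count occurrences of the AT step with two flanking bases on each side.
// The lookahead allows overlapping windows to be counted.
function countFlankedAT(seq) {
  return (seq.match(/(?=..AT..)/g) || []).length;
}

// Simulations that contain the step at least once.
const withStep = simulations.filter((s) => countFlankedAT(s.sequence) > 0);

// Split the hits into the DDD set and the comparison set.
const onlyDDD = withStep.filter((s) => s.sequence === DDD);
const nonDDD = withStep.filter((s) => s.sequence !== DDD);

for (const s of nonDDD) {
  console.log(s.idSim, s.sequence, "occurrences:", countFlankedAT(s.sequence));
}
console.log("DDD simulations:", onlyDDD.map((s) => s.idSim));
```

Keeping the two sets separate mirrors the two browser windows suggested in this example: one for the analyses of the non-DDD sequences and one for the DDD simulations alone.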

The final step consists of clicking on the Open Analyses for selected simulations button at the bottom of the browse page, which leads to the analysis section of BIGNASim (see analysis section).

In this section, the AT base-pair step button will open the available analyses for the AT base-pair step. To compare the results with those corresponding to the AT base-pair steps from DDD sequences only, the procedure can be repeated for these particular simulations in an additional browser window.

Continuation Javascript code to retrieve analysis data for ApT steps

(...)
idSim = SimulationList[i]._id.idSim;
// retrieve the position of the AT bpSteps (stored as class:'ATAT')
ATPos = db.groupDef.find(
    {'_id.idSim': idSim, 'class': 'ATAT'}, 
    {_id: 1}
    ).toArray()
(...)
// obtain available data for a given group
// (i indexes SimulationList and j indexes ATPos, inside the elided loops)
dataCur = db.analData.find(
    {'_id.idSim': idSim, '_id.nGroup': ATPos[j]._id.n, '_id.idGroup': ATPos[j]._id.idGroup}
);
while (dataCur.hasNext()) {
    printjson(dataCur.next());
}

Naked duplex B-DNA structure with a particular nucleotide fragment


The third example shows a more specific search: trajectories containing the DDD central tetramer (AATT), computed on naked B-DNA duplex structures and simulated in equilibrium conditions with an electroneutral charge scheme. The AATT sequence should be entered in the Search by Sequence section, and the search then refined using the Search by Ontology section.

In the Search by Ontology section, the search can be refined using keywords organized in a series of groups. In this case, the keywords to choose are DNA in the Nucleic Acid Type area, Duplex in the Structure area, Naked in the System Type area, Equilibrium in Trajectory Type, B in Helical Conformation and, finally, Electroneutral in Simulation Conditions, Ionic Concentration. Every time a search parameter is chosen, the search engine computes the number of results stored for the currently selected refinement and shows it on the fly in the top right part of the Search by Ontology section.
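
The incremental refinement described above can be sketched in plain JavaScript: each added keyword narrows the set of matching records, and the count is recomputed after every selection. The ontology codes are the ones quoted in the code comments of this example; the records themselves and the helper names (matchesAll, refinementCounts) are hypothetical, and how the portal actually computes the count is not documented here.

```javascript
// Hypothetical records; only the ontology codes come from this example.
const records = [
  { idSim: "sim-001", ontology: ["10101", "10202", "10301", "20201", "10402", "2010501"] },
  { idSim: "sim-002", ontology: ["10101", "10202", "10301", "20201", "10402"] },
  { idSim: "sim-003", ontology: ["10101", "10202"] },
  { idSim: "sim-004", ontology: ["10101"] },
];

// Keywords in the order in which they are chosen in the text.
const keywords = [
  ["DNA", "10101"],
  ["Duplex", "10202"],
  ["Naked", "10301"],
  ["Equilibrium", "20201"],
  ["B", "10402"],
  ["Electroneutral", "2010501"],
];

// Same semantics as a MongoDB $all condition on an array field:
// every required tag must be present in the record.
function matchesAll(record, required) {
  return required.every((tag) => record.ontology.includes(tag));
}

// Number of matching records after each keyword is added to the selection.
function refinementCounts(recs, kws) {
  const selected = [];
  return kws.map(([label, tag]) => {
    selected.push(tag);
    const count = recs.filter((r) => matchesAll(r, selected)).length;
    return { label, count };
  });
}

for (const { label, count } of refinementCounts(records, keywords)) {
  console.log(`+ ${label}: ${count} result(s)`);
}
```

Because every keyword is combined with a logical AND, the count can only stay the same or decrease as keywords are added.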

Javascript equivalent code

// Finding simulations containing the AATT fragment with 2 flanking bases.
// AATT is palindromic, so only one strand needs to be considered.
// Ontology tags: 'DNA' (10101), 'Duplex' (10202), 'Naked' (10301), 'B' (10402),
//                'Equilibrium' (20201), 'Electroneutral' (2010501)
//                Further checks on subclasses have been omitted for clarity.
SimulationList = db.simData.find(
    {
        'sequence': /..AATT../,
        'ontology': {$all: ['10101','10202','10301','2010501','20201']}
    },
    {_id:1}
).toArray();

Again, the results are shown in a browse page. The descriptions show the keywords assigned to each simulation, and confirm that the results are indeed naked duplex B-DNA simulations, as defined in the search. The results also include one sequence other than DDD that contains the AATT tetramer: 1rvh (GCAAAATTTTGC). The remaining entries are DDD trajectories, which differ in the particular simulation parameters used in the MD, e.g. solvent type, ionic parameters or total length.

From the simulation list, flexibility analyses can be obtained for individual simulations or for a combined set by clicking on the Open Analysis for selected simulations button. Alternatively, a meta-trajectory for this particular nucleotide fragment can be generated by joining together the atomic coordinates of the selected set of simulations. More details on how to build a meta-trajectory with BIGNASim can be found in the meta-trajectory tutorial and the meta-trajectory section of the BIGNASim help pages.