MoDEL's main objective is to provide information on the near-equilibrium dynamics of proteins under near-physiological conditions. This information can then be used for many different purposes, ranging from evolutionary studies to biophysical analysis and drug-design processes. Additionally, MoDEL is an excellent reference set for the calibration, refinement and validation of coarse-grained methods of flexibility (Rueda et al., 2007a; Emperador et al., 2008a) and for benchmarking force fields, computer programs and simulation procedures (Rueda et al., 2007a). MoDEL is not a closed project; extending and maintaining it is one of the group's main commitments.
MoDEL designates a complex infrastructure of software and databases developed in our group over several years (Figure 1). It is divided into five main blocks: i) tools for the automatic set-up of MD simulations, ii) tools for trajectory validation and error detection, iii) the MoDEL data warehouse, comprising a relational database and the underlying trajectory database, iv) tools for basic and advanced analysis, and v) the MoDEL web server and related web applications. All tools combine in-house software with external software modules, organized and integrated through Perl scripts. System preparation, simulation and analysis modules are available as web services following BioMoby, the standard framework adopted by the Spanish National Institute of Bioinformatics (BioMoby Consortium, 2008). This allows us to combine them in fully automated workflows that reduce human intervention to a minimum and facilitate maintenance, updating, and integration with the wide range of bioinformatics services available in the community. The MoDEL platform is linked directly to a battery of tools for in-depth trajectory analysis and to our coarse-grained platform FlexServ (Camps et al., 2009), which allows fast coarse-grained analysis using normal mode analysis, Brownian Go-like dynamics or Discrete Molecular Dynamics (Emperador et al., 2008a; Rueda et al., 2007a).
Simulations in MoDEL vary along four axes: i) protein, ii) trajectory length, iii) force field, and iv) solvent environment. Only cytoplasmic monomeric proteins, selected according to diversity criteria, are currently available in the database, but extensions to membrane proteins and specific protein families are in progress. At the time of writing, the MoDEL relational database contains more than 1,700 protein trajectories, ranging from 10 ns (the shortest) to 1 µs (the longest). The raw trajectories collected amount to nearly 15 TB of data, corresponding to around 250,000 residues, 4.5 million protein atoms and around 19 million water molecules. The computational effort required to build MoDEL was equivalent to many centuries on a personal desktop computer; it required massive use of the MareNostrum supercomputer at the Barcelona Supercomputing Center together with local platforms in our group, and took more than four years to reach its current state.
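The four axes can be pictured as a composite key identifying each trajectory, and the quoted totals give rough per-trajectory figures. The sketch below is illustrative only: `TrajectoryKey`, `select` and the example values are hypothetical and do not reflect the actual MoDEL database schema, and the data volume is assumed to be terabytes with 1 TB = 1000 GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryKey:
    """Hypothetical key: one trajectory per combination of the four axes."""
    protein: str       # e.g. a PDB identifier
    length_ns: float   # trajectory length in nanoseconds
    force_field: str
    solvent: str


def select(records, **criteria):
    """Return the records whose fields match every keyword criterion."""
    return [r for r in records
            if all(getattr(r, name) == value for name, value in criteria.items())]


# Rounded totals quoted in the text (assumption: the data volume is in terabytes).
TOTAL_TB = 15
N_TRAJECTORIES = 1_700
N_RESIDUES = 250_000
N_PROTEIN_ATOMS = 4_500_000


def averages():
    """Rough averages derived only from the quoted totals."""
    return {
        # More than 1,700 trajectories exist, so this is an upper bound.
        "GB_per_trajectory": TOTAL_TB * 1000 / N_TRAJECTORIES,
        # Both totals refer to the same set of protein structures.
        "atoms_per_residue": N_PROTEIN_ATOMS / N_RESIDUES,
    }
```

Under these assumptions, the collection averages at most about 8.8 GB per trajectory and 18 protein atoms per residue. A frozen dataclass is used so that keys are hashable and can index a dictionary of trajectory files; a relational database would express the same idea as a composite key over four columns.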