Bioactive Conformational Ensemble Help - Upload data
The instructions for submitting new molecules to the Bioactive Conformational Ensemble (BCE) database, and the curation protocol applied to each submission, are described below:
- Curation protocol
Important points to consider
Only fully atomistic Molecular Dynamics (MD) trajectories of small molecules are accepted in the BCE database. Trajectories should preferably be computed using an enhanced sampling technique that includes an unbiased replica, since this is the replica used for the analysis. Long plain MD simulations are a possible alternative. Please note that, besides cluster population, no other automatic tool will be used to detect convergence of the simulations. Future versions may include advanced methods to assess the convergence of the dihedral population distributions. The project aims to obtain conformational ensembles representing the broadest possible conformational space for the chosen thermodynamic ensemble. Parallel Tempering, also known as Replica Exchange MD (REMD), and Hamiltonian Replica Exchange (HREX) are two good examples of techniques suited to this aim. A minimum trajectory length of 10ns is highly recommended.
Most of the analyses stored in the database and shown in the graphical interface are computed on-site. This automated process takes time and computational resources, so a delay should be expected between the upload of the files and the publication of the results in the database. During this process, the author of the deposition may be contacted directly to speed things up.
Simulation files generated with the Gromacs MD package are recommended, as all the analyses presented in this study were optimized for Gromacs-formatted files. Simulation files generated with other MD packages (e.g. Amber, NAMD) are accepted, but the number of analyses shown in the web interface may be reduced in this version of the database and associated web server. Full compatibility with these formats is planned for future versions.
Fill in the metadata form
Fill in the metadata form in the graphical interface (https://mmb.irbbarcelona.org/BCE/db/upload). The information required is (mandatory fields are highlighted with *):
- E-mail*: Contact information, needed for possible interactions during the curation process. (e.g. email@example.com)
- Ligand code (optional): Ligand three-letter code, used in the interface to link the small molecule to several useful databases. (e.g. IBP)
- Enhanced sampling method*: Method used in the enhanced sampling to overcome energy barriers (e.g. REMD, HREX)
- Number of MD replicas*: Number of different replicas computed in the enhanced sampling simulation (e.g. 16)
- Length of the MD simulations (ns)*: Length of the computed simulation replicas, in ns (e.g. 10ns)
- Number of steps between exchanges*: Number of MD integration steps between replica exchange attempts (e.g. 100 steps)
- Force Field*: Particular force-field used for the small molecule parameters (e.g. GAFF, CGenFF)
- Charge scheme*: Method used to generate the small molecule atomic charges (e.g. AM1, AM1-BCC, RESP)
- REMD Progression*: Distribution (difference) of temperatures (REMD) or scaling factors (HREX) across the replicas (e.g. geometric, arithmetic)
- Initial (low) temperature (K)*: Initial (low) temperature for the enhanced sampling simulation. Can be converted to a scaling factor if HREX is used (e.g. 298K)
- Final (high) temperature (K)*: Final (high) temperature for the enhanced sampling simulation. Can be converted to a scaling factor if HREX is used (e.g. 498K)
- Molecule number of atoms*: Number of atoms of the small molecule (e.g. 41)
- Molecule charge (simulated)*: Net charge of the small molecule used in the simulation (e.g. -1)
- System number of atoms*: Total number of atoms of the simulated systems, including ions and/or water molecules (e.g. 2000)
- Box size in simulations (nm)*: Size of the system box used in the simulation, measured as the distance between the edges of the box and the small molecule atoms in nanometers (e.g. 0.8nm, 1nm)
- Box type in simulations*: Type of the system box shape (e.g. cubic, octahedron)
- Solvent model*: Type of water molecule model (other solvents are also accepted) (e.g. TIP3P, SPC/E)
- MD ensemble*: MD ensemble used in the simulation (e.g. NVT, NPT)
- Clustering method (optional): Clustering method used in the extraction of representative conformations from the computed trajectory (e.g. gromos, K-means)
- Final clustering cutoff (nm) (optional): Cutoff applied in the clustering algorithm to extract the representative conformations. If an iterative process similar to the one presented in this work has been applied, this is the final cutoff used to extract the final set of conformations (e.g. 0.07nm)
- Initial clustering cutoff (nm) (optional): If an iterative process similar to the one presented in this work has been applied, this is the initial cutoff used to extract the first set of conformations (e.g. 0.05nm)
- Number of clusters (optional): Final number of representative conformations (e.g. 30)
- Number of clusters representing 95% (optional): Subset of the final number of representative conformations needed to represent 95% of the conformational landscape explored by the enhanced sampling simulation (e.g. 8)
- QM Theory Level (optional): Theory level used in the Quantum Mechanics calculation (e.g. B3LYP, M06)
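The "REMD Progression", "Initial (low) temperature" and "Final (high) temperature" fields together describe a temperature ladder. As a minimal sketch of what "geometric" and "arithmetic" mean here, the snippet below rebuilds a ladder from those three values plus the number of replicas. The function name `temperature_ladder` is hypothetical and is not part of the BCE server; the numbers are the example values from the form (298K, 498K, 16 replicas).

```python
def temperature_ladder(t_low, t_high, n_replicas, progression="geometric"):
    """Return n_replicas temperatures from t_low to t_high, both included.

    Illustrative helper only; it is an assumption, not the BCE implementation.
    """
    if n_replicas < 2:
        raise ValueError("at least two replicas are needed")
    if progression == "geometric":
        # Constant ratio between neighbouring replicas: T_i = T_low * r**i
        ratio = (t_high / t_low) ** (1.0 / (n_replicas - 1))
        return [t_low * ratio ** i for i in range(n_replicas)]
    if progression == "arithmetic":
        # Constant difference between neighbouring replicas: T_i = T_low + i * step
        step = (t_high - t_low) / (n_replicas - 1)
        return [t_low + step * i for i in range(n_replicas)]
    raise ValueError(f"unknown progression: {progression}")


# Example values from the metadata form: 298K to 498K over 16 replicas
ladder = temperature_ladder(298.0, 498.0, 16)
print([round(t, 1) for t in ladder])
```

Comparing such a reconstructed ladder with the temperatures reported in the simulation log is one way to confirm that the metadata agree with the uploaded files.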
Files required to submit a new entry in the BCE database are:
- Trajectory of the unbiased/cold replica in the enhanced sampling simulation. Recommended formats are Gromacs-compatible formats: xtc and trr. Other formats (netcdf, dcd, mdcrd) might be accepted, but some of the analyses may not yet be available for them. Uploading a solvent-free trajectory is recommended, as it substantially reduces the size of the file to be uploaded.
- Topology of the system in the enhanced sampling simulation. Recommended format is the Gromacs-compatible format tpr. Other formats (parm7, psf, gro, pdb) might be accepted, but some of the analyses may not yet be available for them. Note that the topology should match the trajectory. If the solvent has been removed from the trajectory, the topology should likewise describe the small molecule without solvent.
- Force-field parameters for the small molecule. Recommended format is the Gromacs-compatible format itp. Other formats (prm, lib) might be accepted, but some of the analyses may not yet be available for them.
- Simulation Log file. Recommended format is the gromacs log file. Log files coming from other MD packages might be accepted, but REMD stats may not be available.
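Before uploading, it can help to check which category each file falls into. The sketch below only encodes the extensions named in the list above; the dictionary layout, the `format_status` helper and the file names are hypothetical, and the server's own checks may differ.

```python
from pathlib import Path

# Extensions copied from the list of required files; the grouping is illustrative
RECOMMENDED = {
    "trajectory": {"xtc", "trr"},
    "topology": {"tpr"},
    "parameters": {"itp"},
}
MIGHT_BE_ACCEPTED = {
    "trajectory": {"netcdf", "dcd", "mdcrd"},
    "topology": {"parm7", "psf", "gro", "pdb"},
    "parameters": {"prm", "lib"},
}


def format_status(kind, filename):
    """Classify a file by its extension for the given kind of input."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext in RECOMMENDED[kind]:
        return "recommended"
    if ext in MIGHT_BE_ACCEPTED[kind]:
        return "might be accepted, with fewer analyses"
    return "not listed"


# Hypothetical file names
print(format_status("trajectory", "cold_replica.xtc"))
print(format_status("topology", "ligand.parm7"))
print(format_status("parameters", "ligand.frcmod"))
```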
Optional files, which complement the trajectory information with a set of associated analyses, are:
- Ensemble of conformations for the small molecule extracted from the simulation using a clustering method specified in the metadata form. Recommended formats are pdb or gro files.
- Population of each conformation, taken from the clustering calculation. Recommended formats are csv or xvg.
- Convergence of the conformations during the clustering calculation. Recommended formats are csv or xvg.
- QM-refined conformation structures obtained using Quantum Mechanics calculations on the ensemble of conformations extracted from the simulation. If more than one level of theory has been used, multiple files are accepted (with the corresponding annotation). Recommended formats are pdb or gro files.
- QM-refined conformations final energy (in Ha), extracted from the QM calculation. If more than one level of theory has been used, multiple files are accepted (with the corresponding annotation). Recommended formats are csv or xvg.
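Both csv and xvg are plain-text formats. In xvg files, lines starting with `#` are comments and lines starting with `@` are plotting directives, so a population or energy file can be read with a few lines of code. The sketch below assumes a two-column layout (identifier, value); that layout, the function name `read_two_columns` and the sample data are made up for illustration and are not a format required by the BCE database.

```python
def read_two_columns(text):
    """Parse two numeric columns from csv or xvg text.

    Skips blank lines, xvg comments (#) and directives (@), and any
    row whose first two fields are not numbers (e.g. a csv header).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        # Accept either comma-separated or whitespace-separated fields
        fields = line.replace(",", " ").split()
        try:
            rows.append((float(fields[0]), float(fields[1])))
        except (ValueError, IndexError):
            continue
    return rows


# Made-up sample in xvg style: cluster id, number of frames in the cluster
xvg_text = """# made-up example
@ title "Cluster populations"
1 5200
2 3100
"""
print(read_two_columns(xvg_text))
```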
The following general requirements should be met for a simulated dataset to be accepted in the BCE database:
- Files should correspond to an enhanced-sampling Molecular Dynamics simulation trajectory of a small molecule.
- Trajectories should represent a minimum simulation length of 10ns, and should contain a minimum of 10,000 frames (e.g. 10ns with a frame resolution of 1ps).
- Topology and trajectory files must be coherent with the number and order of atoms.
- Metadata submitted must be coherent with the uploaded files information.
- Files uploaded should fulfill the quality requirements shown in the curation protocol (see next section).
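The length and frame-count requirements are linked by the interval at which frames were saved: the number of frames is the simulation length divided by that interval. A minimal sketch of this arithmetic follows; the function names are hypothetical, and the thresholds are the ones stated above.

```python
MIN_LENGTH_NS = 10.0   # minimum simulation length stated in the requirements
MIN_FRAMES = 10_000    # minimum number of frames stated in the requirements


def frames_in_trajectory(length_ns, frame_interval_ps):
    """Number of saved frames for a given length (ns) and saving interval (ps)."""
    return int(round(length_ns * 1000.0 / frame_interval_ps))


def meets_minimum(length_ns, frame_interval_ps):
    """True if both the length and the frame-count requirements are met."""
    n_frames = frames_in_trajectory(length_ns, frame_interval_ps)
    return length_ns >= MIN_LENGTH_NS and n_frames >= MIN_FRAMES


print(meets_minimum(10.0, 1.0))   # the example from the text: 10ns saved every 1ps
print(meets_minimum(10.0, 2.0))   # same length, but saved every 2ps
```

With these definitions, a 10ns trajectory saved every 2ps is long enough but holds only 5,000 frames, so it would fall short of the frame requirement.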
After uploading the required files and associated metadata, a manual curation process of the input data will start to ensure correctness and completeness. An automatic check-up procedure will be pursued in future versions. The curation protocol covers these issues:
- Technical suitability: Coherence between topology and trajectory files (number of atoms, order). Compatibility of trajectory and topology files format with BCE backend analysis pipeline. Uncorrupted files.
- Simulation Quality Control: Coherence between metadata and uploaded files. Check for the correct number and percentage of replica exchanges (if any), and for RMSD, Radius of Gyration and atomic fluctuation values consistent with the size of the small molecule.
- Ensemble Quality Control (if uploaded): Sufficient coverage (≥95% of the total population) of the conformational landscape explored by the simulation by the structures included in the conformational ensemble. The summed cluster populations should account for at least 95% of the trajectory snapshots.
- QM-Refined Ensemble Quality Control (if uploaded): Coherence between conformational ensemble and QM-refined structures.
If any of these points is not satisfied, the author of the deposition will be contacted directly to resolve the problems encountered.
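The Ensemble Quality Control criterion can be checked before submission from the cluster sizes alone. The sketch below computes the fraction of snapshots covered by the uploaded clusters and the number of largest clusters needed to reach 95%, which corresponds to the "Number of clusters representing 95%" metadata field. The function names and the cluster sizes are made up for illustration; the curators' actual procedure is not described here.

```python
def coverage(cluster_sizes, total_frames):
    """Fraction of trajectory snapshots that belong to the given clusters."""
    return sum(cluster_sizes) / total_frames


def clusters_for_fraction(cluster_sizes, total_frames, fraction=0.95):
    """Smallest number of clusters, largest first, covering the given fraction.

    Returns None if the clusters never reach that fraction.
    """
    covered = 0
    for n, size in enumerate(sorted(cluster_sizes, reverse=True), start=1):
        covered += size
        if covered >= fraction * total_frames:
            return n
    return None


# Made-up example: six clusters extracted from a 10,000-frame trajectory
sizes = [5200, 3100, 900, 400, 250, 100]
print(coverage(sizes, 10_000))
print(clusters_for_fraction(sizes, 10_000))
```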