R PACKAGE: veriNA3d
veriNA3d is an R package for the analysis of nucleic acid structural data. The software was developed on top of bio3d (Grant et al., 2006) and works at a higher level of abstraction. In addition to single-structure analyses, veriNA3d also implements pipelines to handle whole datasets of mmCIF/PDB structures. As far as we know, no similar software has been distributed before, so veriNA3d aims to fill a gap in data-mining pipelines for PDB structural data.
Installation
Instructions
1- Make sure all the dependencies are already installed in R. If they are not, open R and run:
install.packages(c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN"))
2- Install veriNA3d according to your R version:
install.packages("http://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/raw/master/veriNA3d.tar.gz", repos = NULL)
3- To start using it, just load the package!
library(veriNA3d)
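If some of the dependencies from step 1 are already present, it may be enough to install only the missing ones. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper written for this example and is not part of veriNA3d.

```r
# Dependencies listed in step 1.
deps <- c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN")

# Hypothetical helper: return the packages in `pkgs` that are not installed
# in any of the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(deps)
if (length(to_install) > 0) {
  message("Missing dependencies: ", paste(to_install, collapse = ", "))
  # Uncomment the next line to install them:
  # install.packages(to_install)
} else {
  message("All dependencies are already installed.")
}
```

The install call is left commented out so that the check can be run without modifying the R library.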
Documentation
Dataset level
getRNAList
: Get list of representative/non-redundant RNA structures organized in Equivalence Classes (source: Leontis & Zirbel, 2012).
getAltRepres
: Apply filters (e.g. just protein-RNA structures) to select other representatives from the members of each class.
represAsDataFrame
: From the output of getRNAList or getAltRepres, generate a data.frame in which each row corresponds to an RNA chain, rather than an Equivalence Class.
pipeNucData
: From a list of RNA structures/chains, computes and returns structural data at the nucleotide level.
pipeProtNucData
: From a list of protein-RNA structures, computes and returns the distances and atoms of the interaction sites.
applyToPDB
: Applies a desired function to a list of PDB IDs.
queryEntryList
: Returns the whole list of PDB IDs in the database.
queryObsoleteList
: Returns the list of Obsolete PDB IDs in the database.
cleanByPucker
: From the output of pipeNucData, selects the subset of nucleotides in a given puckering conformation.
Single-structure level
Functions to query PDB data using the PDBe (EMBL-EBI) REST API or a mirror API from the MMB Lab (all of them take a PDB ID as input)
queryAuthors
: List of authors.
queryReldate
: Release date.
queryDepdate
: Deposition date.
queryRevdate
: Revision date.
queryDescription
: Author description.
queryCompType
: Compound type (e.g. Nuc or Prot-nuc).
queryChains
: Chain information.
queryEntities
: Entity information.
countEntities
: Counts the number of entities of each kind (RNA, DNA, Protein ...) in a given pdbID.
queryFormats
: File formats for the given ID.
queryHeader
: PDB Header.
queryHetAtms
: HETATM entities in structure (includes modified residues, ions and ligands).
hasHetAtm
: Checks whether a given structure contains a particular HETATM entity. It makes use of queryHetAtms.
queryModres
: Modified residues.
queryLigands
: Ligands in structure.
queryOrgLigands
: Ligands in structure (excluding ions).
queryResol
: Resolution (if applicable).
queryTechnique
: Experimental Technique.
queryStatus
: Released/Obsolete and related status information.
queryNDBId
: Cross-reference NDB ID.
queryAPI
: Subfunction used by all the previous functions; it can also be used to make alternative queries.
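Since all the query functions take a PDB ID as input, a typical call might look like the sketch below. The ID `"1bau"` is only an example chosen for illustration, the calls are guarded so the snippet does nothing when veriNA3d is not installed, and the format of the returned values is not assumed here. An internet connection is needed for the queries.

```r
# Example PDB ID (an assumption for illustration; any valid ID should work).
pdb_id <- "1bau"

if (requireNamespace("veriNA3d", quietly = TRUE)) {
  # Each query function receives the PDB ID as its input.
  technique  <- veriNA3d::queryTechnique(pdb_id)
  resolution <- veriNA3d::queryResol(pdb_id)

  print(technique)
  print(resolution)
} else {
  message("veriNA3d is not installed; skipping the example queries.")
}
```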
Classify PDB structures (PDB ID as input)
classifyRNA
: Categorizes a structure in different RNA groups.
classifyDNA
: Categorizes a structure in different DNA groups.
Input mmCIF data
cifDownload
: Downloads structure from Protein Data Bank.
cifParser
: Reads the 14 common sections of all mmCIF files in the PDB and generates a CIF S4 object.
cifAsPDB
: Wrapper of cifParser that generates a pdb object (bio3d compatible S3 object).
CIF accessors (find descriptions in the mmCIF dictionary: http://mmcif.wwpdb.org/)
cifAtom_site
: Access the coordinates of a CIF object (read by cifParser). The resulting object is not compatible with bio3d functions, see cifAsPDB for that.
cifAtom_sites
cifAtom_type
cifAudit_author
cifAudit_conform
cifChem_comp
cifDatabase_2
cifEntity
cifEntry
cifExptl
cifPdbx_database_status
cifStruct
cifStruct_asym
cifStruct_keywords
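The relation between cifParser, the accessors and cifAsPDB could be sketched as follows. This assumes that cifParser and cifAsPDB both accept a PDB ID string and that the accessors take the CIF object as their only required argument; the ID `"1bau"` is just an example. The calls are guarded so the snippet does nothing when veriNA3d is not installed.

```r
# Example PDB ID (an assumption for illustration).
pdb_id <- "1bau"

if (requireNamespace("veriNA3d", quietly = TRUE)) {
  # Parse the mmCIF file into a CIF S4 object.
  cif <- veriNA3d::cifParser(pdb_id)

  # Accessor: coordinates table of the CIF object (not bio3d compatible).
  coords <- veriNA3d::cifAtom_site(cif)
  print(head(coords))

  # For a bio3d-compatible S3 pdb object, use the wrapper instead.
  pdb <- veriNA3d::cifAsPDB(pdb_id)
  print(class(pdb))
} else {
  message("veriNA3d is not installed; skipping the mmCIF example.")
}
```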
Structure analysis
selectModel
: Selects the model of interest.
findBindingSite
: Same as pipeProtNucData for a single structure.
measureEntityDist
: Measures distances between given entities.
measureElenoDist
: Measures distances between given atoms.
trimSphere
: Trim a pdb object and a surrounding sphere of atoms.
trimByID
: Same as trimSphere using the IDs and output of pipeNucData.
checkNuc
: Checks the integrity of all the nucleotides in a given Nucleic Acid structure.
measureNuc
: Measures a default/desired set of distances, angles and torsional angles for a given Nucleic Acid structure.
rVector
: Computes the rVectors between all nucleobases of a structure (source: Bottaro et al, 2014).
eRMSD
: Compares structures with the same number of residues using the rVectors (source: Bottaro et al, 2014).
RMSD
: Compares structures with the RMSD measure.
dssr
: Wrapper of DSSR software (source: Lu et al, 2015), if installed.
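As a reminder of what the RMSD measure computes, here is a minimal base R sketch for two sets of atomic coordinates with the same number of atoms. It does not superpose the structures first and is not necessarily how the RMSD function of veriNA3d is implemented; `rmsd_coords` is a hypothetical name used only for this example.

```r
# Hypothetical helper: RMSD between two N x 3 coordinate matrices whose rows
# correspond to the same atoms in the same order (no superposition is done).
rmsd_coords <- function(a, b) {
  stopifnot(is.matrix(a), is.matrix(b), all(dim(a) == dim(b)), ncol(a) == 3)
  # Squared distance per atom, averaged over atoms, then square-rooted.
  sqrt(mean(rowSums((a - b)^2)))
}

# Toy example: two atoms, the second structure shifted by 2 units along z.
a <- matrix(c(0, 0, 0,
              1, 0, 0), ncol = 3, byrow = TRUE)
b <- a
b[, 3] <- b[, 3] + 2

rmsd_coords(a, b)  # 2
```

Because every atom is displaced by the same 2 units, the RMSD equals that displacement.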
Exploratory analysis
findHDR
: Finds High Density Regions in a 2D scatter plot.
plot2D
: Scatter plot of angles.
plot3Ddens
: 3D view of the density of 2D data.
plotCategorical
plotCircularDistribution
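To illustrate the idea of a high density region in 2D data, the sketch below bins simulated points on a grid and keeps the most populated cells that together hold about 90% of the points. It is a binned approximation in base R, not the method used by findHDR; the 20 x 20 grid, the 90% threshold and all variable names are assumptions made for this example.

```r
set.seed(42)

# Simulated 2D data standing in for a pair of angles.
x <- rnorm(1000)
y <- rnorm(1000)

# Bin the points on a 20 x 20 grid and convert counts to probabilities.
counts <- unclass(table(cut(x, breaks = 20), cut(y, breaks = 20)))
prob <- counts / sum(counts)

# Sort cells from most to least populated and find the density level at which
# the accumulated probability first reaches 90%.
ord <- order(prob, decreasing = TRUE)
cum <- cumsum(prob[ord])
k <- which(cum >= 0.9)[1]
level <- prob[ord][k]

# Logical mask of the grid cells that belong to the high density region.
hdr_mask <- prob >= level

# Fraction of grid cells covered by the region.
mean(hdr_mask)
```

A kernel density estimate would give smoother regions than this histogram, at the cost of an extra dependency.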
Developers
Diego Gallego
Eric Matamoros
Leonardo Darré (Former Developer)
Molecular Modeling and Bioinformatics Group.
License
GPL-3