<snippet>
  <content>
# R PACKAGE: veriNA3d

veriNA3d is an R package for the analysis of Nucleic Acid structural data. The software was developed on top of bio3d (Grant et al, 2006) and offers a higher level of abstraction. In addition to single-structure analyses, veriNA3d implements pipelines to handle whole datasets of mmCIF/PDB structures. As far as we know, no similar software has been distributed before, so veriNA3d aims to fill a gap in the data mining pipelines used to analyse PDB structural data.

## Installation
---------------

Instructions for Unix systems

1- Make sure all the dependencies are already installed in R. If that is not the case, open R and run:
&nbsp;

    install.packages(c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN"))

2- Download veriNA3d from GitLab. The zip file contains two equivalent versions of the package:

`veriNA3d_R-3.5.tar.gz`

`veriNA3d_R-3.4.tar.gz`

The whole package has been developed and tested in R-3.5, which makes it the recommended option. Using R-3.5 also speeds up the cifParser function, which makes a dramatic difference when working with large mmCIF files. The package has also been made available for R-3.4, since some Unix users have experienced problems when installing R-3.5.

3- Unzip the file and copy the desired version of the package into your working directory.

4- Open R and run (use `veriNA3d_R-3.4.tar.gz` instead if you chose the R-3.4 version):
&nbsp;

    install.packages("veriNA3d_R-3.5.tar.gz", repos = NULL, type="source")

5- If desired, remove the unnecessary .tar.gz files and the .zip file.

6- To start using it, just load the package!

    library(veriNA3d)
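
Step 1 can also be scripted so that only the dependencies that are actually missing get installed. The sketch below is not part of veriNA3d: `missing_pkgs()` and `install_missing()` are hypothetical helper names built only on base R's `requireNamespace()` and `install.packages()`.

```r
# Dependencies listed in step 1
deps <- c("bio3d", "circlize", "jsonlite", "plot3D", "MASS",
          "RColorBrewer", "RANN")

# Hypothetical helper: return the packages in 'pkgs' that cannot be loaded
missing_pkgs <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install only what is missing and return it invisibly
# (extra arguments, e.g. 'repos', are passed on to install.packages)
install_missing <- function(pkgs, ...) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) install.packages(todo, ...)
  invisible(todo)
}
```

Calling `install_missing(deps)` before step 2 leaves packages that are already installed untouched.
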


## Documentation
----------------

### Dataset level

`getLeontisList`: Gets the list of representative/non-redundant RNA structures organized in Equivalence Classes (source: Leontis & Zirbel, 2012).

`getAltRepres`: Applies filters (e.g. just protein-RNA structures) to select alternative representatives from the members of each class.

`represAsDataFrame`: From the output of getLeontisList or getAltRepres, generates a data.frame in which each row corresponds to an RNA chain, rather than an Equivalence Class.

`pipeNucData`: From a list of RNA structures/chains, computes and returns structural data at the nucleotide level.

`pipeProtNucData`: From a list of protein-RNA structures, computes and returns the distances and atoms at the interaction sites.

`applyToPDB`: Applies a desired function to a list of PDB IDs.

`queryEntryList`: Returns the whole list of PDB IDs in the database.

`queryObsoleteList`: Returns the list of obsolete PDB IDs in the database.

`cleanByPucker`: From the output of pipeNucData, extracts the subset of nucleotides in a given puckering conformation.
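
`cleanByPucker` selects nucleotides by sugar puckering. To illustrate the idea only (this is not veriNA3d's implementation), the sketch below computes the pseudorotation phase angle P from the five endocyclic ribose torsions with the Altona & Sundaralingam (1972) formula and keeps the rows in the requested range. The column names `nu0` to `nu4`, the helper names and the angular windows (C3'-endo: 0° ≤ P < 36°, C2'-endo: 144° ≤ P < 180°) are assumptions; check the actual columns returned by pipeNucData before adapting it.

```r
# Pseudorotation phase angle P (degrees, in [0, 360)) from the five
# endocyclic ribose torsions nu0..nu4 (degrees), Altona & Sundaralingam (1972)
pseudorotation_phase <- function(nu0, nu1, nu2, nu3, nu4) {
  d2r <- pi / 180
  num <- (nu4 + nu1) - (nu3 + nu0)
  den <- 2 * nu2 * (sin(36 * d2r) + sin(72 * d2r))
  # atan2() resolves the quadrant, i.e. adds 180 degrees when nu2 < 0
  (atan2(num, den) / d2r) %% 360
}

# Assumed windows: C3'-endo (north) 0-36 deg, C2'-endo (south) 144-180 deg
pucker_label <- function(P) {
  ifelse(P < 36, "C3'-endo",
         ifelse(P >= 144 & P < 180, "C2'-endo", "other"))
}

# Hypothetical analogue of cleanByPucker: keep the rows of a nucleotide table
# (columns nu0..nu4 assumed) whose pucker matches 'pucker'.
# which() drops rows with missing torsions instead of returning NA rows.
subset_by_pucker <- function(nt, pucker = "C3'-endo") {
  P <- pseudorotation_phase(nt$nu0, nt$nu1, nt$nu2, nt$nu3, nt$nu4)
  nt[which(pucker_label(P) == pucker), , drop = FALSE]
}
```
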
&nbsp;

&nbsp;


### Single-structure level

#### **Functions to query PDB data using the PDBe (EMBL-EBI) REST API or a mirror API from the MMB Lab** (All of them take a PDB ID as input)

`queryAuthors`: List of authors.

`queryReldate`: Release date.

`queryDepdate`: Deposition date.

`queryRevdate`: Revision date.

`queryDescription`: Author description.

`queryCompType`: Compound type (e.g. Nuc or Prot-nuc).

`queryChains`: Chain information.

`queryEntities`: Entity information.

`countEntities`: Counts, for a given PDB ID, the total number of entities of each kind (RNA, DNA, Protein...).

`queryFormats`: File formats for the given ID.

`queryHeader`: PDB header.

`queryHetAtms`: HETATM entities in the structure (including modified residues, ions and ligands).

`hasHetAtm`: Checks whether a given structure contains a particular HETATM entity. It makes use of queryHetAtms.

`queryModres`: Modified residues.

`queryLigands`: Ligands in the structure.

`queryOrgLigands`: Ligands in the structure, excluding ions.

`queryResol`: Resolution (if applicable).

`queryTechnique`: Experimental technique.

`queryStatus`: Released/Obsolete and related status information.

`queryNDBId`: Cross-referenced NDB ID.

`queryAPI`: Underlying function of all the previous ones, which can also be used to make alternative queries.
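
For orientation, the sketch below shows the kind of request these functions wrap, calling the PDBe REST API directly through jsonlite (one of the dependencies). The endpoint pattern `https://www.ebi.ac.uk/pdbe/api/pdb/entry/<endpoint>/<pdb id>`, the `molecule_type` field and the keyword mapping to RNA/DNA/Protein are assumptions based on the public PDBe API, not on veriNA3d's code. `pdbe_url()`, `fetch_molecules()` and `count_entity_types()` are hypothetical names; the last one only mimics what `countEntities` is described to do.

```r
# Hypothetical helper: PDBe REST API URL for one entry-level endpoint
pdbe_url <- function(endpoint, pdbID) {
  sprintf("https://www.ebi.ac.uk/pdbe/api/pdb/entry/%s/%s",
          endpoint, tolower(pdbID))
}

# Hypothetical helper: fetch the molecules (entities) of an entry.
# Needs network access and the jsonlite package.
fetch_molecules <- function(pdbID) {
  res <- jsonlite::fromJSON(pdbe_url("molecules", pdbID))
  res[[tolower(pdbID)]]  # the response is assumed to be keyed by the lower-case ID
}

# Count entities per coarse type from a data.frame with a 'molecule_type'
# column. The keyword mapping is an assumption; DNA/RNA hybrids count as RNA.
count_entity_types <- function(molecules) {
  type <- as.character(molecules$molecule_type)
  coarse <- ifelse(grepl("polyribonucleotide", type), "RNA",
            ifelse(grepl("polydeoxyribonucleotide", type), "DNA",
            ifelse(grepl("^polypeptide", type), "Protein", "Other")))
  table(factor(coarse, levels = c("RNA", "DNA", "Protein", "Other")))
}
```

With network access, `count_entity_types(fetch_molecules("1S72"))` would return a small table of counts per entity type.
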
&nbsp;

#### **Classify PDB structures** (PDB ID as input)

`classifyRNA`: Classifies a structure into different RNA groups.

`classifyDNA`: Classifies a structure into different DNA groups.
&nbsp;

#### **Input mmCIF data**

`cifDownload`: Downloads a structure from the Protein Data Bank.

`cifParser`: Reads the 14 sections common to all mmCIF files in the PDB and generates a CIF S4 object.

`cifAsPDB`: Wrapper of cifParser that generates a pdb object (bio3d-compatible S3 object).
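
A minimal sketch of downloading and caching an mmCIF file, in the spirit of `cifDownload`. It assumes the RCSB file service URL pattern `https://files.rcsb.org/download/<ID>.cif`; veriNA3d itself may download from a different server, and `cif_url()` and `download_cif()` are hypothetical names.

```r
# Hypothetical helper: RCSB download URL for a structure's mmCIF file
cif_url <- function(pdbID) {
  sprintf("https://files.rcsb.org/download/%s.cif", toupper(pdbID))
}

# Hypothetical helper: download once into 'destdir' and return the local path;
# an existing file is reused, so repeated calls do not hit the network
download_cif <- function(pdbID, destdir = tempdir()) {
  dest <- file.path(destdir, paste0(toupper(pdbID), ".cif"))
  if (!file.exists(dest)) {
    utils::download.file(cif_url(pdbID), dest, quiet = TRUE, mode = "wb")
  }
  dest
}
```

The returned path can then be read with `readLines()` or handed to any mmCIF reader.
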
&nbsp;

#### **CIF accessors** (Find descriptions in the mmCIF dictionary: https://mmcif.wwpdb.org/)

`cifAtom_site`: Access the coordinates of a CIF object (read by cifParser). The resulting object is not compatible with bio3d functions, see cifAsPDB for that.

`cifAtom_sites`

`cifAtom_type`

`cifAudit_author`

`cifAudit_conform`

`cifChem_comp`

`cifDatabase_2`

`cifEntity`

`cifEntry`

`cifExptl`

`cifPdbx_database_status`

`cifStruct`

`cifStruct_asym`

`cifStruct_keywords`
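
The accessors above correspond to mmCIF categories. Some of them, such as `atom_site`, are stored as `loop_` tables: a block of `_category.item` header lines followed by one whitespace-separated row per record. The sketch below parses one such table from `readLines()` output to show that layout. It is not veriNA3d's cifParser; `parse_cif_loop()` is a hypothetical name, and it assumes contiguous header lines, one record per line, double-quoted values only and no `;`-delimited multi-line text fields.

```r
# Parse one mmCIF loop_ category (e.g. "atom_site") from readLines() output
parse_cif_loop <- function(lines, category) {
  prefix <- paste0("_", category, ".")
  hdr <- which(startsWith(lines, prefix))
  if (length(hdr) == 0) stop("category not found: ", category)
  cols <- trimws(substring(lines[hdr], nchar(prefix) + 1))
  # Data rows run from after the last header line up to the next
  # '#', 'loop_' or '_' line
  first <- max(hdr) + 1
  last <- first - 1
  while (last + 1 <= length(lines) && !grepl("^(#|loop_|_)", lines[last + 1])) {
    last <- last + 1
  }
  if (last < first) stop("no data rows for category: ", category)
  # Read everything as character: values such as "T" or "F" would otherwise
  # become logicals, and '?' / '.' mark unknown / inapplicable values
  df <- read.table(text = lines[first:last], col.names = cols, quote = "\"",
                   colClasses = "character", comment.char = "")
  coords <- intersect(c("Cartn_x", "Cartn_y", "Cartn_z"), names(df))
  df[coords] <- lapply(df[coords], as.numeric)
  df
}
```
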
&nbsp;

#### **Structure analysis**

`selectModel`: Selects the model of interest.

`findBindingSite`: Same as pipeProtNucData, for a single structure.

`measureEntityDist`: Measures distances between given entities.

`measureElenoDist`: Measures distances between given atoms.

`trimSphere`: Trims a pdb object to a given selection and its surrounding sphere of atoms.

`trimByID`: Same as trimSphere, using the IDs and output of pipeNucData.

`checkNuc`: Checks the integrity of all the nucleotides in a given Nucleic Acid structure.

`measureNuc`: Measures a default/desired set of distances, angles and torsional angles for a given Nucleic Acid structure.

`rVector`: Computes the rVectors between all nucleobases of a structure (source: Bottaro et al, 2014).

`eRMSD`: Compares structures with the same number of residues using the rVectors (source: Bottaro et al, 2014).

`dssr`: Wrapper of the DSSR software (source: Lu et al, 2015), if it is installed.
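
To make the `rVector`/`eRMSD` entries more concrete, the sketch below computes the eRMSD between two structures from precomputed r-vectors, following our reading of Bottaro et al. (2014): each r-vector is scaled anisotropically (a = 5 Å in-plane, b = 3 Å out-of-plane), mapped to a 4-D G-vector that vanishes beyond a cutoff of 2.4, and the eRMSD is the square root of the summed squared G-vector differences divided by the number of nucleotides. These constants, the N x N x 3 array layout and the helper names `gvec()` and `ermsd()` are assumptions. The construction of the base reference frames, which `rVector` performs, is not shown, and veriNA3d's implementation may differ.

```r
# Map one r-vector (Angstrom) to the 4-D G-vector used by eRMSD.
# a, b and cutoff follow our reading of Bottaro et al. (2014); veriNA3d's
# defaults may differ. Assumes distinct base origins (scaled length > 0).
gvec <- function(r, a = 5, b = 3, cutoff = 2.4) {
  rs <- r / c(a, a, b)                       # anisotropic scaling, unitless
  d <- sqrt(sum(rs^2))
  if (d >= cutoff) return(c(0, 0, 0, 0))     # distant pairs contribute nothing
  gam <- pi / cutoff
  c(sin(gam * d) * rs / d, 1 + cos(gam * d)) / gam
}

# eRMSD between two structures given N x N x 3 arrays of r-vectors,
# where rA[j, k, ] is the position of base k in the frame of base j
ermsd <- function(rA, rB, ...) {
  stopifnot(identical(dim(rA), dim(rB)), length(dim(rA)) == 3, dim(rA)[3] == 3)
  n <- dim(rA)[1]
  total <- 0
  for (j in seq_len(n)) {
    for (k in seq_len(n)) {
      if (j == k) next                       # a base is not paired with itself
      diff <- gvec(rA[j, k, ], ...) - gvec(rB[j, k, ], ...)
      total <- total + sum(diff^2)
    }
  }
  sqrt(total / n)
}
```

Because the G-vectors are zero beyond the cutoff, only bases that are close in space (stacked or paired) influence the score, which is the property that motivates eRMSD over a plain coordinate RMSD.
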
&nbsp;

&nbsp;

### Exploratory analysis

`plotCategorical`

`plotCircularDistribution`

`plotEtaTheta`

`plot_et`

`plotSetOfDistributions`

`rvec_plot`


## Developers
-------------

Diego Gallego

Leonardo Darré (Former Developer)
&nbsp;

&nbsp;

*Molecular Modeling and Bioinformatics Group.*


## License
----------

GPL-3 (See LICENSE)