<snippet>
  <content>
# R PACKAGE: veriNA3d

veriNA3d is an R package for the analysis of nucleic acid structural data. It was developed on top of bio3d (Grant et al., 2006) and works at a higher level of abstraction. In addition to single-structure analyses, veriNA3d implements pipelines to handle whole datasets of mmCIF/PDB structures. As far as we know, no similar software has been distributed before, so veriNA3d aims to fill a gap in data-mining pipelines for PDB structural data.

## Installation
---------------

Instructions for Unix systems

1- Make sure all the dependencies are installed in R. If they are not, open R and run:
&nbsp;

    install.packages(c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN"))
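
To install only the packages that are actually missing, a small base-R check can be used instead. This is a sketch: the helper name `missing_deps` is hypothetical and not part of veriNA3d, and the install call is only attempted in an interactive session, where R can ask for a CRAN mirror.

```r
# Dependencies listed in step 1
deps <- c("bio3d", "circlize", "jsonlite", "plot3D", "MASS",
          "RColorBrewer", "RANN")

# Hypothetical helper: return the packages in 'pkgs' that are not installed.
# system.file() returns "" when a package is not found in the library paths.
missing_deps <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1), USE.NAMES = FALSE)
  pkgs[!installed]
}

to_install <- missing_deps(deps)
print(to_install)

# Install only what is missing (needs an internet connection)
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```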

2- Download veriNA3d from GitLab ("http://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/repository/archive.zip?ref=master").
The zip file contains two equivalent versions of the package:

    A- veriNA3d_R-3.5.tar.gz
    B- veriNA3d_R-3.4.tar.gz 

The package was developed and tested in R-3.5, so version A is the recommended option. Using R-3.5 also speeds up the cifParser function, which makes a dramatic difference when working with large mmCIF files.

3- Unzip the file and copy the desired version of the package into your R working directory.

4- Open R and run (replace the file name with veriNA3d_R-3.4.tar.gz if you chose that version):
&nbsp;

    install.packages("veriNA3d_R-3.5.tar.gz", repos = NULL, type="source")
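
If you are unsure which tarball matches your R installation, the choice can be scripted. This sketch assumes the R-3.4 build is the one meant for R versions older than 3.5, that newer R versions should use the R-3.5 build, and that the chosen file sits in the working directory; `pick_tarball` is a hypothetical helper.

```r
# Hypothetical helper: choose the tarball for a given R version,
# assuming the R-3.4 build is meant for R < 3.5
pick_tarball <- function(rver = getRversion()) {
  if (rver >= "3.5.0") "veriNA3d_R-3.5.tar.gz" else "veriNA3d_R-3.4.tar.gz"
}

tarball <- pick_tarball()

# Install from the local source file, if it is present in the working directory
if (file.exists(tarball)) {
  install.packages(tarball, repos = NULL, type = "source")
} else {
  message("File not found in ", getwd(), ": ", tarball)
}
```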

5- If desired, remove the unnecessary .tar.gz files and the .zip file.


## Documentation
----------------

### Dataset level

`getLeontisList`: Get list of representative/non-redundant RNA structures organized in Equivalence Classes (source: Leontis & Zirbel, 2012)

`getAltRepres`: Apply filters (e.g. only protein-RNA structures) to select alternative representatives from the members of each class

`represAsDataFrame`: From the output of getLeontisList or getAltRepres, generate a data.frame in which each row corresponds to an RNA chain, rather than an Equivalence Class

`pipeNucData`: From a list of RNA structures/chains, computes and returns nucleotide-level structural data

`pipeProtNucData`: From a list of protein-RNA structures, computes and returns the distances and atoms at the interaction sites

`applyToPDB`: Applies a desired function to a list of PDB IDs

`queryEntryList`: Returns the whole list of PDB IDs in the database

`queryObsoleteList`: Returns the list of obsolete PDB IDs in the database

`cleanByPucker`: From the output of pipeNucData, selects the subset of nucleotides in a given puckering conformation
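
Sugar puckers are commonly described by the pseudorotation phase angle P: C3'-endo (the usual A-form RNA pucker) spans roughly 0° to 36°, and C2'-endo spans roughly 144° to 180°. The sketch below labels nucleotides by P and keeps one class. It illustrates the idea behind cleanByPucker and is not its implementation; the helper `pucker_class`, the column name `pu_phase` and the toy values are hypothetical.

```r
# Label a sugar pucker from its pseudorotation phase angle P (degrees).
# Only the C3'-endo (0-36) and C2'-endo (144-180) sectors are named here.
pucker_class <- function(phase) {
  p <- phase %% 360  # wrap angles into [0, 360)
  ifelse(p < 36, "C3'-endo",
         ifelse(p >= 144 & p < 180, "C2'-endo", "other"))
}

# Toy nucleotide table; 'pu_phase' is a hypothetical column name
nt <- data.frame(ntID = 1:5,
                 pu_phase = c(12.5, 160.2, 20.1, 90.0, -345.0))
nt$pucker <- pucker_class(nt$pu_phase)

# Keep only the nucleotides in the C3'-endo conformation
north <- nt[nt$pucker == "C3'-endo", ]
print(north)
```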
&nbsp;

&nbsp;


### Single-structure level

#### **Functions to query PDB data using the PDBe (EMBL-EBI) REST API or a mirror API from the MMB Lab** (All of them take a PDB ID as input)

`queryAuthors`: List of authors

`queryReldate`: Release date

`queryDepdate`: Deposition date

`queryRevdate`: Revision date

`queryCompound`: PDB structure title

`queryCompType`: Compound type (e.g. Nuc or Prot-nuc)

`queryChains`: Chain information

`queryEntities`: Entity information

`countEntities`: Counts the number of entities of each kind (RNA, DNA, protein, ...) in a given PDB ID

`queryFormats`: File formats for the given ID

`queryHeader`: PDB Header

`queryHetAtms`: HETATM entities in structure (includes modified residues, ions and ligands)

`hasHetAtm`: Checks whether a given structure contains a particular HETATM entity. It makes use of queryHetAtms

`queryModres`: Modified residues

`queryLigands`: Ligands in structure

`queryOrgLigands`: Ligands in structure (excluding ions)

`queryResol`: Resolution (if applicable)

`queryTechnique`: Experimental technique

`queryStatus`: Released/Obsolete and related status information

`queryNDBId`: Cross-reference NDB ID

`queryAPI`: Underlying function used by all of the above; it can also be called directly to make other queries
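
As an illustration of the kind of request these functions wrap, the sketch below fetches an entry summary directly from the PDBe REST API with jsonlite (already a dependency). The endpoint path and the shape of the returned JSON are assumptions based on the public PDBe API, not on veriNA3d's code, and both helper names are hypothetical.

```r
# Hypothetical helpers; the PDBe "entry summary" endpoint path is an assumption
pdbe_summary_url <- function(pdbID) {
  paste0("https://www.ebi.ac.uk/pdbe/api/pdb/entry/summary/", tolower(pdbID))
}

query_pdbe_summary <- function(pdbID) {
  # The response is expected to be keyed by the lower-case PDB ID
  res <- jsonlite::fromJSON(pdbe_summary_url(pdbID))
  res[[tolower(pdbID)]]
}

# The actual query needs network access:
# str(query_pdbe_summary("1EHZ"))
```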
&nbsp;

#### **Classify PDB structures** (PDB ID as input)

`classifyRNA`: Classifies a structure as "nakedRNA", "protRNA", "ligandRNA", "DNARNA" or "NoRNA"

`classifyDNA`: Classifies a structure as "nakedDNA", "protDNA", "ligandDNA", "DNARNA" or "NoDNA"
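
The categories can be read as a decision over the entity types present in a structure. The sketch below takes hypothetical per-type counts (the kind of information countEntities provides) and applies the checks in an order that is an assumption; veriNA3d may resolve mixed cases differently, and `classify_rna_sketch` is not a package function.

```r
# Hypothetical decision rule; the order of the checks is an assumption
classify_rna_sketch <- function(n_rna, n_dna = 0, n_protein = 0, n_ligand = 0) {
  if (n_rna == 0) return("NoRNA")
  if (n_dna > 0) return("DNARNA")
  if (n_protein > 0) return("protRNA")
  if (n_ligand > 0) return("ligandRNA")  # assumes ions are not counted as ligands
  "nakedRNA"
}

classify_rna_sketch(n_rna = 1, n_protein = 2)  # "protRNA"
```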
&nbsp;

#### **Input mmCIF data**

`cifParser`: Reads the 14 sections common to all mmCIF files in the PDB and generates a CIF S4 object.

`cifAsPDB`: Wrapper of cifParser that generates a pdb object (bio3d compatible S3 object).
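
In mmCIF, each section (category) groups items named `_category.item`. Categories with a single row are written as one `_category.item value` pair per line, while multi-row categories such as atom_site use `loop_` blocks. The toy sketch below extracts the single-row items of one category from text lines. It ignores loops and multi-line (semicolon-delimited) values, so it only illustrates the format and is not how cifParser works; `cif_items` is a hypothetical helper and the entry lines are made up.

```r
# Toy parser for single-row mmCIF categories (no loop_ or multi-line values)
cif_items <- function(lines, category) {
  prefix <- paste0("_", category, ".")
  hits <- lines[startsWith(lines, prefix)]
  # Item name: text between the first "." and the first whitespace
  keys <- sub("^_[^.]+\\.(\\S+).*$", "\\1", hits)
  # Value: everything after the tag, with surrounding single quotes dropped
  vals <- trimws(sub("^\\S+", "", hits))
  vals <- sub("^'(.*)'$", "\\1", vals)
  setNames(vals, keys)
}

# Made-up lines in mmCIF syntax (not a real PDB entry)
lines <- c("data_9XYZ",
           "_entry.id   9XYZ",
           "_struct.entry_id   9XYZ",
           "_struct.title   'Toy RNA hairpin'",
           "_struct_keywords.pdbx_keywords   RNA")
cif_items(lines, "struct")
```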
&nbsp;

#### **CIF accessors** (Find descriptions in the mmCIF dictionary: http://mmcif.wwpdb.org/)

`cifAtom_site`: Access the coordinates of a CIF object (read by cifParser). The resulting object is not compatible with bio3d functions, see cifAsPDB for that.

`cifAtom_sites`

`cifAtom_type`

`cifAudit_author`

`cifAudit_conform`

`cifChem_comp`

`cifDatabase_2`

`cifEntity`

`cifEntry`

`cifExptl`

`cifPdbx_database_status`

`cifStruct`

`cifStruct_asym`

`cifStruct_keywords`
&nbsp;

#### **Structure analysis**

`selectModel`: Selects the model of interest

`findBindingSite`: Same as pipeProtNucData for a single structure

`measureEntityDist`: Measures distances between given entities

`measureElenoDist`: Measures distances between given atoms

`trimSphere`: Trims a pdb object to a selection of atoms plus a surrounding sphere of atoms

`trimByID`: Same as trimSphere using the IDs and output of pipeNucData

`checkNuc`: Checks the integrity of all the nucleotides in a given nucleic acid structure

`measureNuc`: Measures a default/desired set of distances, angles and torsional angles for a given nucleic acid structure

`rVector`: Computes the rVectors between all nucleobases of a structure (source: Bottaro et al, 2014)

`eRMSD`: Compares structures with the same number of residues using the rVectors (source: Bottaro et al, 2014)

`dssr`: Wrapper of DSSR software (source: Lu et al, 2015), if installed.
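
Two geometric operations behind this list can be sketched in base R: the torsion (dihedral) angle defined by four atoms, the kind of torsional angle measureNuc reports, and the selection of atoms within a cutoff distance of a reference selection, the idea behind trimSphere. The functions `dihedral` and `atoms_within` are hypothetical illustrations that do not reproduce the package code, and the coordinates are toy values.

```r
# Cross product of two 3D vectors
cross3 <- function(a, b) {
  c(a[2] * b[3] - a[3] * b[2],
    a[3] * b[1] - a[1] * b[3],
    a[1] * b[2] - a[2] * b[1])
}

# Torsion angle (degrees, between -180 and 180) for atoms p1-p2-p3-p4,
# using the atan2 formulation with the usual sign convention
dihedral <- function(p1, p2, p3, p4) {
  b1 <- p2 - p1
  b2 <- p3 - p2
  b3 <- p4 - p3
  n1 <- cross3(b1, b2)
  n2 <- cross3(b2, b3)
  y <- sqrt(sum(b2^2)) * sum(b1 * n2)
  x <- sum(n1 * n2)
  atan2(y, x) * 180 / pi
}

# Indices of atoms (rows of an n x 3 matrix) lying within 'cutoff'
# of any atom in the reference selection 'ref_idx'
atoms_within <- function(xyz, ref_idx, cutoff) {
  ref <- t(xyz[ref_idx, , drop = FALSE])  # 3 x k matrix of reference atoms
  min_dist <- apply(xyz, 1, function(p) min(sqrt(colSums((ref - p)^2))))
  which(min_dist <= cutoff)
}

dihedral(c(1, 0, 0), c(0, 0, 0), c(0, 0, 1), c(0, 1, 1))  # 90

xyz <- rbind(c(0, 0, 0), c(1, 0, 0), c(5, 0, 0), c(0, 2, 0))
atoms_within(xyz, ref_idx = 1, cutoff = 2)  # atoms 1, 2 and 4
```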
&nbsp;

&nbsp;

### Exploratory analysis

`plotCategorical`

`plotCircularDistribution`

`plotEtaTheta`

`plot_et`

`plotSetOfDistributions`

`rvec_plot`
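
Torsion and phase angles are circular quantities, so summaries of them need care: the arithmetic mean of 350° and 10° is 180°, although both angles lie close to 0°. A circular mean avoids that artifact. This is a generic base-R sketch, not code from veriNA3d, and `circular_mean` is a hypothetical name.

```r
# Mean direction of angles given in degrees; result in degrees within [-180, 180]
circular_mean <- function(angles) {
  rad <- angles * pi / 180
  atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
}

mean(c(350, 10))           # 180: misleading for angles
circular_mean(c(350, 10))  # approximately 0
```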


## Developers
-------------

Diego Gallego

Leonardo Darré (Former Developer)
&nbsp;

&nbsp;

*Molecular Modeling and Bioinformatics Group.*


## License
----------

GPL-3 (See LICENSE)