<snippet>
  <content>
# R PACKAGE: veriNA3d

VeriNA3d is an R package for the analysis of nucleic acid structural data. The software was developed on top of bio3d (Grant et al, 2006) with a higher level of abstraction. In addition to single-structure analyses, veriNA3d also implements pipelines to handle whole datasets of mmCIF/PDB structures. As far as we know, no similar software has been previously distributed, so it aims to fill a gap in the data mining pipelines for PDB structural data analysis.

## Installation
---------------

Instructions for Unix systems

1- Make sure you have all the dependencies already installed in R. If that is not the case, open R and run:
&nbsp;

    install.packages(c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN"))

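To avoid reinstalling packages that are already present, the dependency step can be narrowed to the missing ones. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name, not part of veriNA3d, and the install call is left commented out so the snippet only reports what is missing.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation.
# `missing_packages` is a hypothetical helper, not part of veriNA3d.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Dependencies as listed in this README.
deps <- c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN")
to_install <- missing_packages(deps)

if (length(to_install) > 0) {
  message("Missing dependencies: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install only the missing ones
}
```
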
2- Install veriNA3d according to your R version:

For R >= 3.5:
&nbsp;

    install.packages("https://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/raw/master/veriNA3d_R-3.5.tar.gz", repos = NULL)

If you have R 3.4:
&nbsp;

    install.packages("https://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/raw/master/veriNA3d_R-3.4.tar.gz", repos = NULL)

The whole package has been developed and tested in R 3.5, which makes it the recommended version. R 3.5 also speeds up the cifParser function, which makes a dramatic difference when working with large mmCIF files. The package is also available for R 3.4 because some Unix users have had problems installing R 3.5.

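The choice between the two tarballs can also be made programmatically from the running R version. This is only a sketch: `verina3d_tarball` is a hypothetical helper, the file names and base URL are the ones given in this README, and the install call is commented out.

```r
# Base URL and file names copied from the installation instructions in this README.
base_url <- "https://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/raw/master/"

# Pick the tarball name for a given R version (hypothetical helper).
verina3d_tarball <- function(r_version = getRversion()) {
  r_version <- as.numeric_version(r_version)
  if (r_version >= "3.5.0") {
    "veriNA3d_R-3.5.tar.gz"
  } else if (r_version >= "3.4.0") {
    "veriNA3d_R-3.4.tar.gz"
  } else {
    stop("This README only lists tarballs for R 3.4 and R >= 3.5")
  }
}

tarball_url <- paste0(base_url, verina3d_tarball())
message("Tarball for this R version: ", tarball_url)
# install.packages(tarball_url, repos = NULL)  # uncomment to install
```
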
3- To start using it, just load the package!
&nbsp;

    library(veriNA3d)

## Documentation
----------------

### Dataset level

`getLeontisList`: Gets the list of representative/non-redundant RNA structures organized in Equivalence Classes (source: Leontis & Zirbel, 2012).

`getAltRepres`: Applies filters (e.g. just protein-RNA structures) to select other representatives from the members of each class.

`represAsDataFrame`: From the output of getLeontisList or getAltRepres, generates a data.frame in which each row corresponds to an RNA chain, rather than an Equivalence Class.

`pipeNucData`: From a list of RNA structures/chains, computes and returns structural data at the nucleotide level.

`pipeProtNucData`: From a list of protein-RNA structures, computes and returns the distances and atoms of the interaction sites.

`applyToPDB`: Applies a desired function to a list of PDB IDs.

`queryEntryList`: Returns the whole list of PDB IDs in the database.

`queryObsoleteList`: Returns the list of obsolete PDB IDs in the database.

`cleanByPucker`: From the output of pipeNucData, selects the subset of nucleotides in a given puckering conformation.
&nbsp;

&nbsp;

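The idea behind `applyToPDB`, running one function across many PDB IDs, can be illustrated with a base-R analogue. This is a sketch under stated assumptions and not the package's implementation: `apply_to_ids` and `toy_fun` are hypothetical names, and the real `applyToPDB` arguments should be checked in the package help.

```r
# Apply FUN to every ID, capturing errors so one bad entry does not stop the run.
# `apply_to_ids` is a hypothetical stand-in, not the veriNA3d function.
apply_to_ids <- function(ids, FUN, ...) {
  results <- lapply(ids, function(id) {
    tryCatch(FUN(id, ...), error = function(e) NA)
  })
  names(results) <- ids
  results
}

# Toy function standing in for a real query: it fails on IDs that are not
# 4 characters long and otherwise returns the ID in upper case.
toy_fun <- function(id) {
  if (nchar(id) != 4) stop("not a 4-character ID: ", id)
  toupper(id)
}

out <- apply_to_ids(c("1bau", "4y4o", "bad"), toy_fun)
str(out)
```

Returning `NA` for failed entries keeps the result the same length as the input, which makes it easy to see which IDs need a second look.
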

### Single-structure level

#### **Functions to query PDB data using the PDBe (EMBL-EBI) REST API or a mirror API from the MMB Lab** (All of them take a PDB ID as input)

`queryAuthors`: List of authors.

`queryReldate`: Release date.

`queryDepdate`: Deposition date.

`queryRevdate`: Revision date.

`queryDescription`: Author description.

`queryCompType`: Compound type (e.g. Nuc or Prot-nuc).

`queryChains`: Chain information.

`queryEntities`: Entity information.

`countEntities`: For a given PDB ID, counts the total number of each kind of entity (RNA, DNA, Protein ...).

`queryFormats`: File formats for the given ID.

`queryHeader`: PDB Header.

`queryHetAtms`: HETATM entities in structure (includes modified residues, ions and ligands).

`hasHetAtm`: Checks whether a given structure contains a particular HETATM entity. It makes use of queryHetAtms.

`queryModres`: Modified residues.

`queryLigands`: Ligands in structure.

`queryOrgLigands`: Ligands in structure (excluding ions).

`queryResol`: Resolution (if applicable).

`queryTechnique`: Experimental Technique.

`queryStatus`: Released/Obsolete and related status information.

`queryNDBId`: Cross-reference NDB ID.

`queryAPI`: Subfunction of all the previous ones, which can be used to make alternative queries.
&nbsp;

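Since every query function above takes a PDB ID, it can help to check the ID's shape before sending a request. The sketch below assumes classic 4-character IDs (one digit followed by three alphanumeric characters); `is_pdb_id` is a hypothetical helper, and the `queryTechnique` call only runs if veriNA3d is installed and needs internet access.

```r
# TRUE for strings shaped like a classic PDB ID: one digit followed by
# three alphanumeric characters. Hypothetical helper, not part of veriNA3d.
is_pdb_id <- function(x) {
  is.character(x) & grepl("^[0-9][A-Za-z0-9]{3}$", x)
}

is_pdb_id(c("1bau", "4Y4O", "bau1", "1ba"))
# TRUE TRUE FALSE FALSE

# Only attempt the query when the package is available.
if (requireNamespace("veriNA3d", quietly = TRUE) && is_pdb_id("1bau")) {
  print(veriNA3d::queryTechnique("1bau"))
}
```
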
#### **Classify PDB structures** (PDB ID as input)

`classifyRNA`: Categorizes a structure in different RNA groups.

`classifyDNA`: Categorizes a structure in different DNA groups.
&nbsp;

#### **Input mmCIF data**

`cifDownload`: Downloads a structure from the Protein Data Bank.

`cifParser`: Reads the 14 sections common to all mmCIF files in the PDB and generates a CIF S4 object.

`cifAsPDB`: Wrapper of cifParser that generates a pdb object (bio3d compatible S3 object).
&nbsp;

#### **CIF accessors** (Find descriptions in the mmCIF dictionary: https://mmcif.wwpdb.org/)

`cifAtom_site`: Access the coordinates of a CIF object (read by cifParser). The resulting object is not compatible with bio3d functions, see cifAsPDB for that.

`cifAtom_sites`

`cifAtom_type`

`cifAudit_author`

`cifAudit_conform`

`cifChem_comp`

`cifDatabase_2`

`cifEntity`

`cifEntry`

`cifExptl`

`cifPdbx_database_status`

`cifStruct`

`cifStruct_asym`

`cifStruct_keywords`
&nbsp;

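A parse-then-access workflow with the functions listed above might look like the following sketch. It assumes that `cifParser` accepts a PDB ID string (check `?cifParser` for the accepted inputs) and that internet access is available; `atom_table` is a hypothetical wrapper that returns `NULL` when veriNA3d is not installed.

```r
# Parse an entry and return its atom_site section, or NULL when veriNA3d
# is not installed. `atom_table` is a hypothetical wrapper.
# Assumption: cifParser() accepts a PDB ID string; see ?cifParser.
atom_table <- function(pdb_id) {
  if (!requireNamespace("veriNA3d", quietly = TRUE)) {
    message("veriNA3d is not installed; skipping")
    return(NULL)
  }
  cif <- veriNA3d::cifParser(pdb_id)  # CIF S4 object
  veriNA3d::cifAtom_site(cif)         # coordinates section of the CIF object
}

atoms <- atom_table("1bau")
if (!is.null(atoms)) print(head(atoms))
```

For bio3d-compatible output, the README points to `cifAsPDB` rather than `cifAtom_site`.
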
#### **Structure analysis**

`selectModel`: Selects the model of interest.

`findBindingSite`: Same as pipeProtNucData for a single structure.

`measureEntityDist`: Measures distances between given entities.

`measureElenoDist`: Measures distances between given atoms.

`trimSphere`: Trims a pdb object, keeping a selection and a surrounding sphere of atoms.

`trimByID`: Same as trimSphere using the IDs and output of pipeNucData.

`checkNuc`: Checks the integrity of all the nucleotides in a given Nucleic Acid structure.

`measureNuc`: Measures a default/desired set of distances, angles and torsional angles for a given Nucleic Acid structure.

`rVector`: Computes the rVectors between all nucleobases of a structure (source: Bottaro et al, 2014).

`eRMSD`: Compares structures with the same number of residues using the rVectors (source: Bottaro et al, 2014).

`RMSD`: Compares structures with the RMSD measure.

`dssr`: Wrapper of DSSR software (source: Lu et al, 2015), if installed.
&nbsp;

&nbsp;

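As a reminder of what the RMSD measure computes, here is a base-R sketch for two sets of coordinates with atoms in the same order. It is not the package's `RMSD` function: `rmsd_coords` is a hypothetical name, and no superposition is performed, which the package function may or may not do.

```r
# RMSD between two N x 3 coordinate matrices with atoms in the same order.
# No superposition is performed here. Hypothetical helper, not veriNA3d::RMSD.
rmsd_coords <- function(a, b) {
  stopifnot(is.matrix(a), is.matrix(b), identical(dim(a), dim(b)))
  sqrt(mean(rowSums((a - b)^2)))
}

a <- matrix(c(0, 0, 0,
              1, 0, 0,
              0, 1, 0), ncol = 3, byrow = TRUE)
b <- a
b[, 1] <- b[, 1] + 1  # shift every atom 1 unit along x
rmsd_coords(a, b)     # 1
```
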
### Exploratory analysis

`findHDR`: Finds High Density Regions in a 2D scatter plot.

`plot2D`: Scatter plot of angles.

`plot3Ddens`: 3D view of the density of 2D data.

`plotCategorical`

`plotCircularDistribution`


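One common way to find high density regions in 2D data is to estimate a kernel density on a grid and keep the densest cells that together hold a chosen share of the probability mass. The sketch below uses `MASS::kde2d` (MASS is one of the listed dependencies) on simulated data; it illustrates the general technique and is not necessarily how `findHDR` is implemented. The 50% mass level and the variable names are arbitrary choices.

```r
set.seed(1)
x <- rnorm(500)
y <- rnorm(500)

# Kernel density estimate on a 50 x 50 grid spanning the data range.
dens <- MASS::kde2d(x, y, n = 50)
cell_area <- diff(dens$x[1:2]) * diff(dens$y[1:2])

# Sort cells from densest to sparsest and accumulate probability mass.
z_sorted <- sort(as.vector(dens$z), decreasing = TRUE)
cum_mass <- cumsum(z_sorted) * cell_area

# Density threshold at which the densest cells hold at least 50% of the mass.
threshold <- z_sorted[which(cum_mass >= 0.5)[1]]
in_hdr <- dens$z >= threshold  # logical matrix marking the HDR cells

mean(in_hdr)  # fraction of the grid covered by the region
```
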
## Developers
-------------

Diego Gallego

Leonardo Darré (Former Developer)
&nbsp;

&nbsp;

*Molecular Modeling and Bioinformatics Group.*


## License
----------

GPL-3