<snippet>
  <content>
# R PACKAGE: veriNA3d

veriNA3d is an R package for the analysis of nucleic acid structural data. It is built on top of bio3d (Grant et al, 2006) and works at a higher level of abstraction. In addition to single-structure analyses, veriNA3d implements pipelines to handle whole datasets of mmCIF/PDB structures. As far as we know, no similar software has been distributed previously, so it aims to fill a gap in data mining pipelines for PDB structural data.

## Installation
---------------


1- Make sure all the dependencies are already installed in R. If they are not, open R and run:
&nbsp;

    install.packages(c("bio3d", "circlize", "jsonlite", "plot3D", "MASS", "RColorBrewer", "RANN"))

2- Install veriNA3d according to your R version:
&nbsp;

    install.packages("https://mmb.irbbarcelona.org/gitlab/dgallego/veriNA3d/raw/master/veriNA3d.tar.gz", repos = NULL)

3- To start using it, just load the package!
&nbsp;

    library(veriNA3d)
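
Step 1 above reinstalls every dependency even when some are already present. As a minimal sketch using only base R, the helper below (`missingPackages` is a hypothetical name, not part of veriNA3d) reports which packages are absent so that only those are installed:

```r
# Return the subset of 'pkgs' that cannot be loaded in this R library.
# 'missingPackages' is an illustrative helper, not a veriNA3d function.
missingPackages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Dependencies as listed in step 1
deps <- c("bio3d", "circlize", "jsonlite", "plot3D", "MASS",
          "RColorBrewer", "RANN")
toInstall <- missingPackages(deps)

# Install only what is missing, and only in an interactive session
if (interactive() && length(toInstall) > 0) {
    install.packages(toInstall)
}
```

The `interactive()` guard is a design choice for this sketch: it keeps the snippet from triggering downloads when sourced from a script.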

## Documentation
----------------

### Dataset level

`getLeontisList`: Gets the list of representative/non-redundant RNA structures organized in Equivalence Classes (source: Leontis & Zirbel, 2012).

`getAltRepres`: Applies filters (e.g. just protein-RNA structures) to select alternative representatives from the members of each class.

`represAsDataFrame`: From the output of getLeontisList or getAltRepres, generates a data.frame in which each row corresponds to an RNA chain, rather than an Equivalence Class.

`pipeNucData`: From a list of RNA structures/chains, computes and returns structural data at the nucleotide level.

`pipeProtNucData`: From a list of protein-RNA structures, computes and returns the distances and atoms of the interaction sites.

`applyToPDB`: Applies a given function to a list of PDB IDs.

`queryEntryList`: Returns the whole list of PDB IDs in the database.

`queryObsoleteList`: Returns the list of obsolete PDB IDs in the database.

`cleanByPucker`: From the output of pipeNucData, selects the subset of nucleotides in a given puckering conformation.
&nbsp;

&nbsp;


### Single-structure level

#### **Functions to query PDB data using the PDBe (EMBL-EBI) REST API or a mirror API from the MMB Lab** (All of them take a PDB ID as input)

`queryAuthors`: List of authors.

`queryReldate`: Release date.

`queryDepdate`: Deposition date.

`queryRevdate`: Revision date.

`queryDescription`: Author description.

`queryCompType`: Compound type (e.g. Nuc or Prot-nuc).

`queryChains`: Chain information.

`queryEntities`: Entity information.

`countEntities`: Counts how many entities of each kind (RNA, DNA, Protein ...) a given PDB ID contains.

`queryFormats`: File formats for the given ID.

`queryHeader`: PDB Header.

`queryHetAtms`: HETATM entities in structure (includes modified residues, ions and ligands).

`hasHetAtm`: Checks whether a given structure contains a particular HETATM entity. It makes use of queryHetAtms.

`queryModres`: Modified residues.

`queryLigands`: Ligands in structure.

`queryOrgLigands`: Ligands in structure (excluding ions).

`queryResol`: Resolution (if applicable).

`queryTechnique`: Experimental technique.

`queryStatus`: Released/Obsolete and related status information.

`queryNDBId`: Cross-reference NDB ID.

`queryAPI`: Helper function used by all of the above; it can also be called directly to make alternative queries.
&nbsp;
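
Since all the query functions take a PDB ID as input, several of them can be collected in a single call. The sketch below is only an illustration: `collectQueries` is a hypothetical helper (not part of veriNA3d), the PDB ID "1bau" is an arbitrary choice, and the calls assume the PDB ID can be passed as the first positional argument. The queries need an internet connection, so that part only runs when veriNA3d is installed.

```r
# Apply each function in a named list to the same PDB ID.
# 'collectQueries' is an illustrative helper, not a veriNA3d function.
collectQueries <- function(pdbID, queries) {
    lapply(queries, function(query) query(pdbID))
}

if (requireNamespace("veriNA3d", quietly = TRUE)) {
    library(veriNA3d)
    # Assumption: each query accepts the PDB ID as its first argument
    info <- collectQueries("1bau",
                           list(technique = queryTechnique,
                                released  = queryReldate,
                                status    = queryStatus))
    str(info)
}
```

The result is a named list with one element per query, which is convenient to inspect with `str` or to extend with further query functions.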

#### **Classify PDB structures** (PDB ID as input)

`classifyRNA`: Classifies a structure into one of several RNA groups.

`classifyDNA`: Classifies a structure into one of several DNA groups.
&nbsp;

#### **Input mmCIF data**

`cifDownload`: Downloads a structure from the Protein Data Bank.

`cifParser`: Reads the 14 sections common to all mmCIF files in the PDB and generates a CIF S4 object.

`cifAsPDB`: Wrapper of cifParser that generates a pdb object (bio3d compatible S3 object).
&nbsp;

#### **CIF accessors** (Find descriptions in the mmCIF dictionary: https://mmcif.wwpdb.org/)

`cifAtom_site`: Access the coordinates of a CIF object (read by cifParser). The resulting object is not compatible with bio3d functions, see cifAsPDB for that.

`cifAtom_sites`

`cifAtom_type`

`cifAudit_author`

`cifAudit_conform`

`cifChem_comp`

`cifDatabase_2`

`cifEntity`

`cifEntry`

`cifExptl`

`cifPdbx_database_status`

`cifStruct`

`cifStruct_asym`

`cifStruct_keywords`
&nbsp;

#### **Structure analysis**

`selectModel`: Selects the model of interest.

`findBindingSite`: Same as pipeProtNucData for a single structure.

`measureEntityDist`: Measures distances between given entities.

`measureElenoDist`: Measures distances between given atoms.

`trimSphere`: Trim a pdb object and a surrounding sphere of atoms.

`trimByID`: Same as trimSphere using the IDs and output of pipeNucData.

`checkNuc`: Checks the integrity of all the nucleotides in a given Nucleic Acid structure.

`measureNuc`: Measures a default/desired set of distances, angles and torsional angles for a given Nucleic Acid structure.

`rVector`: Computes the rVectors between all nucleobases of a structure (source: Bottaro et al, 2014).

`eRMSD`: Compares structures with the same number of residues using the rVectors (source: Bottaro et al, 2014).

`RMSD`: Compares structures with the RMSD measure.

`dssr`: Wrapper of DSSR software (source: Lu et al, 2015), if installed.
&nbsp;
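
For readers unfamiliar with the RMSD measure mentioned above, the following base R sketch computes it for two sets of coordinates with matching atom order. It is not the package's `RMSD` function, whose arguments and handling of superposition may differ; `coordRMSD` is a hypothetical name, and no superposition is performed here.

```r
# Plain RMSD between two N x 3 coordinate matrices (same atom order).
# 'coordRMSD' is an illustrative helper, not a veriNA3d function.
coordRMSD <- function(a, b) {
    stopifnot(is.matrix(a), is.matrix(b), identical(dim(a), dim(b)))
    # Squared distance per atom, averaged over atoms, then square root
    sqrt(mean(rowSums((a - b)^2)))
}

# Two atoms, one per row (x, y, z)
a <- matrix(c(0, 0, 0,
              1, 0, 0), ncol = 3, byrow = TRUE)
b <- a
b[, 3] <- b[, 3] + 2   # shift every atom by 2 along z

coordRMSD(a, b)        # 2
```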

&nbsp;

### Exploratory analysis

`findHDR`: Finds High Density Regions in a 2D scatter plot.

`plot2D`: Scatter plot of angles.

`plot3Ddens`: 3D view of the density of 2D data.

`plotCategorical`

`plotCircularDistribution`


## Developers
-------------

Diego Gallego

Leonardo Darré (Former Developer)
&nbsp;

&nbsp;

*Molecular Modeling and Bioinformatics Group.*


## License
----------

GPL-3