Nucleosome Dynamics Help - Method

Method

Method Help

nucleR

Reads from an MNase-seq experiment, aligned to a reference genome and saved as BAM files are processed with nucleR to detect position of nucleosomes in a given population of cells. Following signal theory, the nucleosome coverage is described as a combination of periodic waves which are then subjected to fast Fourier transformation to remove the high frequency components responsible of the noise. Peaks are detected in the smoothed profile to annotate the nucleosome dyads.

An automatic scoring of the peaks is provided, which accounts for peak height and sharpness.

  • The height of a peak is determined based on the number of reads mapped to the nucleosome dyad, but represented as a probability inside a Normal distribution (score between 0 and 1).
  • The sharpness is a measure of nucleosome fuzziness based on the dispersion of reads’ centers with respect to the nucleosome dyad. If a peak is very narrow and the surrounding regions have low coverage, nucleosome is well positioned, while wide or overlapping peaks are probably fuzzy nucleosomes. Width score can have values between 0 and 1.

Those high and sharp peaks (score height>0.4, score width>0.6) are labelled as “well positioned” nucleosomes, while flat peaks are labelled as “fuzzy”.

NucDyn

Changes in nucleosome architecture between two MNase-seq experiments can be detected with NucDyn. Each experiment must be aligned to the reference genome and stored in BAM format. The method identifies two types of changes:

  • displacement of nucleosomes (shifts)
  • occupancy difference (insertions and evictions)

The first step is to discard reads unchanged between the two experiments as follows:

  • Reads that are identical in both experiments are removed from further analysis.
  • Reads with the same starting or ending point or those that can be fitted in longer read in the paired experiment are also removed.

Then, the remaining reads are paired between the two experiments to identify shifts using a dynamic programming algorithm designed to maximize:

  1. the number of pairs,
  2. the proximity in the middle points of the paired reads,
  3. the assignment of the paired reads to the same nucleosome.

To achieve these objectives the dynamic programming highly penalizes gaps and scores read pairs inversely proportional to their distance, with a –infinite score when the distance between the middle point of the reads is longer than half the length of the nucleosome. The output of this procedure is a set of pairs where reads have been shifting from one experiment to the other. These shifts are accumulated to define hotspots which are further analyzed.

Changes in occupancy between the two experiments are then identified. To find locally significant differences in coverage, a Z-score is computed for every position x across the genome, normalizing it in 10 000 bp windows:

$$Z = {m - E(m) \over (V(m))^{1/2}}$$

where `m` is the number of reads covering position x in experiment 1, `E(m)=nf` (with `f` being the fraction of total reads in the window `(N)` that corresponds to experiment 1 `(M)`; `n` is the number of reads covering position x in both experiments) and `V(m)` is the expected variance of a hypergeometric distribution, i.e. $V(m)=nf(1-f){N-n \over n-1}$ . Positive Z- peaks mean that at that point the read coverage found at experiment 1 is higher than the coverage at experiment 2 and a (1→2) deletion hotspot is annotated. Similarly, negative Z- peaks signal (1→2) insertions.

The statistical significance of the changes found (shifts, insertions and deletions) is evaluated from the p-values derived from Fisher’s test using a contingency table between the reads in each experiment (columns) and the reads at a given position (rows):

`M-m``N-M-n+m``N-n`
`m``n-m``n`
`M``N-M``N`

Nucleosome Free Regions (NFR)

NFRs are regions that do not contain any nucleosome call. The minimum and maximum length considered can be defined by the user. Default values are 110 (“Minimum width” parameter) and 400 (“Maximum width” parameter).

Nucleosome Periodicity

To study the periodicity in the nucleosome coverage inside a gene, we define a theoretical nucleosome coverage formed by two periodic signals following an exponential decay:

  • starting from the +1 nucleosome (located just after the transcription start site; TSS) and going downstream,
  • starting at end of the gene (the –last nucleosome located just before the transcription termination site; TTS) and going upstream.

When +1 and –last originated signals sum up, the gene is considered “phased” and nucleosomes are typically well located inside the gene body, while when they cancel, the gene is considered “anti-phased” and typically show fuzzy nucleosomes in the gene body.

To quantify periodicity of genes we compute two numeric scores per gene:

  1. An autocorrelation score

    $$R(T) = \int_{X_1}^{X_2}I(x) * I(x-T)dx$$

    where `I(x)` stands for the nucleosome coverage, `X_1` and `X_2` are the dyads of +1 and –last nucleosomes and `T` is the periodic distance between two nucleosomes. This values range from 0 to 1, the closest to 1 the more periodic the signal is.

  2. A “phasing” score defined as the modulus of dividing the nucleosomal distance (distance between the dyads of +1 and -last nucleosomes) by `T`:

    $$nuc_{dist}-{T\left[{nuc_{dist} \over T}\right]}$$

    A “phased” gene is defined when the “phasing” score is close to 0, and an “anti-phased” gene is defined when the score is close to `T/2`.

TSS classification

Classification of genes according to their nucleosome architecture around TSS can also be performed.

  • -1 and +1 nucleosomes are detected in a window of 300 base-pairs around the TSS (this distance can be modified setting the parameter “Window”). They can be classified as well positioned, fuzzy or uncertain, or they can be missing if they are farther than the defined distance.
  • Depending on the distance between -1 and +1 nucleosome dyads, TSS can be defined as open or close (if the distance between the two dyads is shorter than 215 then the TSS is close, otherwise it is open. This value can be modified setting the parameter “Open threshold”). If +1 and -1 are closer than the required distance, they are marked as “overlap”.

Stiffness

Analysis of the sliding propensities of nucleosomes are also reported, by computing the stiffness associated to the sequence displacement. We estimate nucleosome stiffness using the standard deviation of reads mapped to a given nucleosome, fitting a Gaussian function to their coverage. Stiffness is derived assuming an harmonic potential

$$θ=2{k_BT \over sd^2}$$

where `k_B` is the Boltzmann’s constant, T is the temperature at which experiment is done and sd is the standard distribution of the fitted Gaussian.