DSSR in the visualization of DNA/RNA structures

By following DSSR citations, I recently came across the article Interactive Visualization of RNA and DNA Structures by Lindow et al. The paper introduced a DNA/RNA visualization tool that integrates 1D sequence, 2D secondary structure in linear and graph representations, and 3D backbone ribbons and base ladders, all in one package. Notably, the 3D visualization was tailored for DNA/RNA structures and achieved quite impressive results. A nice feature of the 2D graph representation is the handling of multiple chains.

Reading through the main text and the supplementary material, I was surprised to see so many places where DSSR was mentioned, especially the following:

Our approach detects all standard and many modified nucleotides as well as the most common base pairs. Further special cases could be easily added. Yet, the system we developed should not be seen as a replacement for well established tools like DSSR. Rather, it shows what can be achieved with modern techniques in terms of both computation and rendering.

Overall, DSSR is an analysis/annotation tool that is designed to be agnostic to visualization programs. It derives a huge number of structural features that are unlikely to be matched elsewhere. I collaborated with Bob Hanson so that Jmol can directly take advantage of what DSSR has to offer: not just the visualization of (modified) nucleotides and some common base pairs, but also the interactive selection of loops, pseudoknots, coaxial stacks, and various motifs. In particular, the SQL-like selection syntax Bob developed is really flexible and extremely powerful. I also collaborated with Thomas Holder so that PyMOL can gain DNA/RNA domain knowledge. The resulting dssr_block PyMOL plugin is quite useful for creating base/base-pair block images with many revealing features, especially for small to medium-sized DNA/RNA structures. It is obvious to me that PyMOL (or any other molecular visualization tool) would benefit greatly from SQL-like selections of DSSR-derived features of nucleic acid structures, just as Jmol does.

In the Lindow et al. paper, some of the references to DSSR are technical in nature. Here, I'd like to respond to and clarify each of them. Since DSSR is being actively developed and supported, I always welcome feedback on the 3DNA Forum. Following and responding to the literature is another way I strive to make DSSR a better tool for the community.

Built on their experience from 3DNA, Lu et al. developed DSSR [27], a very powerful tool to analyze RNA structures that uses Jmol for the 3D visualization. Recently, Hanson and Lu described this integration [10], which is based on a JSON-interface that directly couples DSSR and the 3D visualization of Jmol. This is a great improvement, but still missing is the integration of 2D secondary structure visualizations and brushing & linking techniques to enable simple selection with and exploration of the 3D molecular structure. One contribution of this paper is to show how a full linking between 3D and 2D visualizations can be done and what benefits arise from such a tight coupling (see Sects. 8 and 9).

This is a valid point, and the authors did a good job. In fact, one of the reviewers of our DSSR-Jmol paper raised this point, and we acknowledged the limitation. While passing DSSR-derived secondary structure (in DBN or .ct format) to a 2D visualization tool is straightforward, the connection would not be as smooth as we'd like it to be.
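To illustrate why the hand-off itself is the easy part, here is a minimal Python sketch that turns a dot-bracket (DBN) string into a list of base pairs, which is roughly what a 2D viewer would do first. It assumes '.' marks unpaired nucleotides, the bracket types (), [], {}, <> mark successive pseudoknot levels, and '&' separates chains; the function name `dbn_to_pairs` is hypothetical and not part of DSSR. Letter brackets (Aa, Bb, ...) for higher-order pseudoknots are deliberately not handled.

```python
# Bracket types for nested pairing levels; letter brackets (Aa, Bb, ...) are not handled.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {close: open_ for open_, close in OPEN_TO_CLOSE.items()}


def dbn_to_pairs(dbn: str) -> list[tuple[int, int]]:
    """Return sorted 1-based (i, j) pairs encoded in a dot-bracket string."""
    stacks: dict[str, list[int]] = {open_: [] for open_ in OPEN_TO_CLOSE}
    pairs: list[tuple[int, int]] = []
    position = 0
    for ch in dbn.strip():
        if ch == "&":  # assumed chain break: no nucleotide, so no position consumed
            continue
        position += 1
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(position)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {position}")
            pairs.append((stack.pop(), position))
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r} at position {position}")
    leftover = [pos for stack in stacks.values() for pos in stack]
    if leftover:
        raise ValueError(f"unmatched opening bracket(s) at {sorted(leftover)}")
    return sorted(pairs)
```

For example, `dbn_to_pairs("((..[[..))..]]")` gives `[(1, 10), (2, 9), (5, 14), (6, 13)]`, with the square brackets carrying the pseudoknotted pairs. The harder part, as noted above, is keeping the 2D and 3D views linked interactively, which a static pair list alone does not provide.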

For this purpose, other approaches rely on the unique naming and ordering of the atoms [27], for example, N1, C2, N3, C4, C5, C6 etc. We found that this information is not always reliable.

The naming of purine and pyrimidine atoms follows the IUPAC standard and is a requirement for DNA/RNA structures in the PDB. In my experience, I have never come across a single case where such information is unreliable. See below for the abasic sites in PDB id 3BWP and 4SU (4-thiouridine) in PDB id 5AFI.

We compared these results with the latest version of DSSR [27]. Our approach is able to correctly detect all regular nucleotides and most of the modified and undefined nucleotides. In the following, we describe the minor differences.

It is not clear which “latest” version of DSSR was actually used in the paper. Note that DSSR reports its version in the form v1.8.3-2018oct29: I deliberately put the release date along with the version number.
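Because the tag carries both a version number and a release date, it can be checked mechanically. The following is a small, hypothetical Python helper (not shipped with DSSR) that splits a tag such as v1.8.3-2018oct29 into a version tuple and a date, assuming every tag follows that exact pattern. The abasic threshold it uses, v1.7.3-2017dec26, is the release mentioned later in this post.

```python
import re
from datetime import date

# Three-letter month abbreviations as they appear in tags like "2018oct29".
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)-(\d{4})([a-z]{3})(\d{1,2})")


def parse_dssr_version(tag: str) -> tuple[tuple[int, int, int], date]:
    """Split a tag such as 'v1.8.3-2018oct29' into (version tuple, release date)."""
    m = VERSION_RE.fullmatch(tag.strip())
    if m is None or m.group(5) not in MONTHS:
        raise ValueError(f"unrecognized DSSR version tag: {tag!r}")
    major, minor, patch, year, month, day = m.groups()
    return (int(major), int(minor), int(patch)), date(int(year), MONTHS[month], int(day))


def abasic_detected_by_default(tag: str) -> bool:
    """Abasic sites are analyzed by default as of v1.7.3-2017dec26 (per this post)."""
    version, _ = parse_dssr_version(tag)
    return version >= (1, 7, 3)
```

Comparing version tuples rather than raw strings avoids surprises such as "v1.10.0" sorting before "v1.9.0".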

For dataset 4RGE, we detected 3 modified uracil nucleotides that were not labeled as modified by DSSR. These nucleotides have a DNA backbone instead of an RNA one.

DSSR takes A, C, G, T, U as standard nucleotides, even if T is in RNA or U is in DNA. So this result is expected.
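The key point is that base identity and sugar type are two independent questions. The Python sketch below makes that separation explicit. It assumes a residue is described by a one-letter base code plus its set of atom names, and that the presence of O2' distinguishes ribose from deoxyribose. The function `classify_nucleotide` is hypothetical and only mirrors the convention described above, not DSSR's actual code.

```python
STANDARD_BASES = frozenset("ACGTU")


def classify_nucleotide(base: str, atom_names: set[str]) -> tuple[bool, str]:
    """Return (is_standard_base, sugar_type) for one residue.

    The base is judged on its own; the sugar is judged only by the O2' atom.
    A U on a DNA backbone therefore stays a standard U, just with a DNA sugar.
    """
    is_standard = base in STANDARD_BASES
    sugar = "ribose (RNA)" if "O2'" in atom_names else "deoxyribose (DNA)"
    return is_standard, sugar
```

Under this convention, a uracil with a deoxyribose backbone, as in the 4RGE case, is reported as a standard U with a DNA sugar rather than as a modified nucleotide.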

Dataset 3BWP contains 7 nucleotides that only consist of the backbone part without bases. While our approach marks these as undefined, in DSSR they are not detected at all.

The 7 nucleotides in 3BWP are abasic sites, i.e., they lack base atoms (N1, C2, N3, C4, C5, C6, etc.), so they do not possess base reference frames. From early on, DSSR has had the --abasic option for such cases. As of v1.7.3-2017dec26, DSSR incorporates abasic sites directly into the analysis, so they have since been detected by DSSR by default.
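The distinction can be expressed as a simple atom-name test, because the six ring atoms N1, C2, N3, C4, C5, C6 are shared by purines and pyrimidines under the standard naming. Here is a minimal Python sketch, assuming a residue is given as a set of atom names; `is_abasic` and `can_define_base_frame` are hypothetical helpers, and the sugar-atom set is an illustrative choice, not DSSR's internal definition.

```python
# Six ring atoms common to purines and pyrimidines (standard IUPAC names).
RING_ATOMS = frozenset({"N1", "C2", "N3", "C4", "C5", "C6"})
# Illustrative sugar atoms used to confirm the residue is a nucleotide at all.
SUGAR_ATOMS = frozenset({"C1'", "C2'", "C3'", "C4'", "O4'"})


def can_define_base_frame(atom_names: set[str]) -> bool:
    """A base reference frame needs the full six-membered ring."""
    return RING_ATOMS <= atom_names


def is_abasic(atom_names: set[str]) -> bool:
    """Sugar-phosphate backbone present, but no base ring atoms at all."""
    return bool(atom_names & SUGAR_ATOMS) and not (atom_names & RING_ATOMS)
```

Such a residue has no base frame, yet it is still part of the backbone, so dropping it silently would break the chain; that is the case the --abasic option was originally added for.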

Furthermore, in 5AFI we mark 3 nucleotides as undefined, while these are detected as a modified uracil by DSSR. This is due to the base containing sulfur instead of oxygen, so they possibly are sulfur analogs of uracil.

Presumably, the authors are referring to 4SU (4-thiouridine), clearly a modified nucleotide occurring in 137 PDB entries (as of 2018oct28). DSSR detects the three cases in 5AFI, as shown here:

4SU-u 3 v.4SU8,w.4SU8,y.4SU8
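For readers who want to process such summaries, the following Python sketch parses a line of this shape. The field meanings are inferred from this one example, not from a format specification: "4SU-u" is taken as the modified name followed by its parent (the lowercase u presumably marking a modified uracil), "3" as the count, and each "v.4SU8" as chain, residue name, and residue number. `parse_modified_line` and `ModifiedEntry` are hypothetical names, and insertion codes are not handled.

```python
from dataclasses import dataclass


@dataclass
class ModifiedEntry:
    name: str                          # e.g. "4SU"
    parent: str                        # e.g. "u"
    residues: list[tuple[str, int]]    # (chain, residue number)


def parse_modified_line(line: str) -> ModifiedEntry:
    """Parse a line like '4SU-u 3 v.4SU8,w.4SU8,y.4SU8' (format inferred)."""
    label, count_text, ids = line.split()
    name, _, parent = label.rpartition("-")
    residues = []
    for token in ids.split(","):
        chain, _, rest = token.partition(".")
        if not rest.startswith(name):
            raise ValueError(f"residue {token!r} does not match name {name!r}")
        residues.append((chain, int(rest[len(name):])))  # number follows the name
    if len(residues) != int(count_text):
        raise ValueError(f"count {count_text} disagrees with {len(residues)} residues")
    return ModifiedEntry(name, parent, residues)
```

Checking the stated count against the listed residues is a cheap consistency test when comparing results across programs.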

We also compared the results of our base pair detection (Suppl. Tab. 1). We determined all Watson-Crick, Hoogsteen, and Wobble pairs, and the reverse versions of the first two. For most of the datasets, our method returned the same results as DSSR. In particular, both approaches never created contradicting results, which means all common base pairs had identical pair type. In general, our geometrical approach generates slightly more base pairs compared to DSSR. However, when investigating both, the base pairs determined by DSSR but not by our approach and vice versa, we found that most of these pairs are border-line cases, where the decision was made depending on the threshold of the geometrical heuristic. Only in a few cases, the differences were not clear for both approaches, see Suppl. Fig. 3.

In Suppl. Fig. 3,

… However, the hydrogen bonds for classical G-U Wobble pairs seem to be quite unrealistic for the bottom left pair. Either this is a limitation of DSSR or it is some kind of specific Wobble pair with other hydrogen bonds than the depicted ones that our approach does not detect.

I echo the point that border-line cases could cause discrepancies between different methods. However, things can easily be clarified with concrete examples. Unfortunately, the authors did not specify the cases used in their Suppl. Fig. 3. I eventually tracked down the DSSR-assigned G-U Wobble pair in PDB id 1S72: U2586-G2592. As shown in the figure below, DSSR detects two H-bonds (dashed pink lines), "N3(imino)*N2(amino)[3.05],O4(carbonyl)-N1(imino)[2.77]". Note that one of the H-bonds is between two donors, N3(U) and N2(G), hence the * symbol. These H-bonds are by no means those of a “classical G-U Wobble pair”. Yet the pair is clearly Wobble-like, which is why it was assigned “Wobble”. To avoid such confusion, I've revised DSSR to tighten the criteria for G-U Wobble pairs. As of v1.8.3-2018oct29, this pair is called "~Wobble".

[Figure: DSSR-assigned ~Wobble pair]
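The H-bond string can be checked against the classical wobble geometry directly. Below is a minimal Python sketch that parses strings of the form shown above, assuming each entry is atom1(type), a separator ('-' or '*', the only two appearing here), atom2(type), then a distance in brackets, with atom1 on U and atom2 on G as in this pair. The classical G-U wobble H-bonds (U O2 to G N1, U N3 to G O6) are standard textbook knowledge, not DSSR output; `parse_hbonds` and `shared_with_classical_wobble` are hypothetical helpers.

```python
import re
from typing import NamedTuple

HBOND_RE = re.compile(
    r"(?P<a1>[A-Z0-9']+)\((?P<t1>\w+)\)(?P<sep>[-*])"
    r"(?P<a2>[A-Z0-9']+)\((?P<t2>\w+)\)\[(?P<dist>\d+(?:\.\d+)?)\]"
)


class HBond(NamedTuple):
    atom1: str
    atom2: str
    starred: bool     # '*' separator, e.g. the donor-donor N3(U)*N2(G) case
    distance: float   # as printed in the string


def parse_hbonds(desc: str) -> list[HBond]:
    """Parse 'N3(imino)*N2(amino)[3.05],O4(carbonyl)-N1(imino)[2.77]'-style text."""
    bonds = []
    for part in desc.split(","):
        m = HBOND_RE.fullmatch(part.strip())
        if m is None:
            raise ValueError(f"cannot parse H-bond: {part!r}")
        bonds.append(HBond(m["a1"], m["a2"], m["sep"] == "*", float(m["dist"])))
    return bonds


# Classical G-U wobble H-bonds, written in (U atom, G atom) order.
CLASSICAL_WOBBLE_UG = {("O2", "N1"), ("N3", "O6")}


def shared_with_classical_wobble(bonds: list[HBond]) -> set[tuple[str, str]]:
    """Return the observed atom pairs that coincide with classical wobble H-bonds."""
    return {(b.atom1, b.atom2) for b in bonds} & CLASSICAL_WOBBLE_UG
```

For the U2586-G2592 pair in 1S72 the overlap with the classical set is empty, which is exactly why the looser "~Wobble" label is more appropriate.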

Nevertheless, our evaluation (Sect. 8.1) shows that with the proposed approaches in terms of quality we get very similar results to the ones obtained by tools like DSSR. In terms of speed, DSSR needs much longer run times. For example, for 4U4O, DSSR needed ~15 min for the secondary and tertiary structure analysis [27], while our algorithm only needs ~0.2 s (see Tab. 1).

As noted above, DSSR provides far more structural features than just the identification of nucleotides and several common base pairs. Even for the identified base pairs, DSSR provides many more annotations and structural parameters than just the named pairs picked by the authors. Not surprisingly, DSSR is slower than a method dedicated to a single, specific purpose.

In DSSR v1.8.3-2018oct29, I added the --pair-only option, which outputs a complete listing of base-pairing information and then stops. Some sample runs are shown below:

x3dna-dssr -i=1ehz.pdb --pair-only
x3dna-dssr -i=1ehz.pdb --pair-only --more
x3dna-dssr -i=1ehz.pdb --pair-only --json
x3dna-dssr -i=1ehz.pdb --pair-only --json | jq '.pairs[] | select(.name=="WC")'
x3dna-dssr -i=1ehz.pdb --pair-only --more --json | jq .
x3dna-dssr -i=4u4o.cif --pair-only -o=4u4o-pairs.txt
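The jq filter above has a direct Python counterpart for those who prefer scripting. This sketch relies only on what that filter implies: a top-level "pairs" array whose entries carry a "name" field such as "WC". The helper names are hypothetical, and any other field in the JSON is left untouched.

```python
import json
from collections import Counter


def load_dssr_json(path: str) -> dict:
    """Read DSSR JSON output saved to a file."""
    with open(path) as handle:
        return json.load(handle)


def select_pairs(dssr: dict, name: str) -> list[dict]:
    """Python analogue of jq '.pairs[] | select(.name=="WC")' for any name."""
    return [pair for pair in dssr.get("pairs", []) if pair.get("name") == name]


def pair_name_counts(dssr: dict) -> Counter:
    """Tally base pairs by their 'name' field (e.g. WC, Wobble, ~Wobble)."""
    return Counter(pair.get("name") for pair in dssr.get("pairs", []))
```

A possible workflow is `x3dna-dssr -i=1ehz.pdb --pair-only --json > 1ehz-pairs.json`, then `pair_name_counts(load_dssr_json("1ehz-pairs.json")).most_common()` for a quick overview of pair types.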

Compared to the default settings, DSSR runs ~10 times faster when the --pair-only option is set: 36s vs 5m48s for 4U4O on my MacBook Pro 2017 (2.9 GHz Intel Core i7). Note that the timing here is for a complete run of the DSSR program (as shown above), from reading the mmCIF file to writing out all the derived features. In my hands, simply reading and parsing the 85MB 4U4O.cif takes ~5s. As a reference, just loading 4U4O.cif into PyMOL takes >6s. I'm thus more than surprised (and remain to be convinced) by the claim that their new algorithm “only needs” ~0.2s “for the secondary and tertiary structure analysis” of 4U4O.
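The arithmetic behind these comparisons is simple enough to spell out. The Python snippet below converts the timings quoted above into seconds and computes the ratios; `to_seconds` is a hypothetical helper that only understands the "5m48s"/"36s" style used here.

```python
import re

DURATION_RE = re.compile(r"(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def to_seconds(text: str) -> float:
    """Convert '5m48s', '36s', or '0.2s' into seconds."""
    m = DURATION_RE.fullmatch(text.strip())
    if m is None or not any(m.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = m.groups()
    return int(minutes or 0) * 60 + float(seconds or 0)


default_run = to_seconds("5m48s")     # full DSSR run on 4U4O: 348 s
pair_only_run = to_seconds("36s")     # same input with --pair-only
speedup = default_run / pair_only_run           # about 9.7, i.e. "~10 times"
parse_vs_claim = to_seconds("5s") / to_seconds("0.2s")  # parsing alone vs the claimed total
```

Reading and parsing 4U4O.cif alone (~5 s) is thus roughly 25 times the ~0.2 s claimed for the entire analysis.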





Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu