As mentioned in the blog post Integrating DSSR into Jmol and PyMOL,
“The small size, zero configuration, extensive features, and robust performance make DSSR ideal to be integrated into other bioinformatics tools.” In addition to the DSSR-Jmol and DSSR-PyMOL integrations, which I initiated and was personally involved in, other bioinformatics resources are increasingly taking advantage of what DSSR has to offer. Here are a few examples:
Before aligning structures, STAR3D preprocesses PDB files with base-pairing annotation using either MC-Annotate (Gendron et al., 2001; Lemieux and Major, 2002) (for PDB inputs) or DSSR (Lu et al., 2015) (for PDB and mmCIF inputs) and pseudo-knot removal using RemovePseudoknots (Smit et al., 2008).
2014, RNApdbee: In order to facilitate a more comprehensive study, the webserver integrates the functionality of RNAView, MC-Annotate and 3DNA/DSSR, being the most common tools used for automated identification and classification of RNA base pairs.
2018, RNApdbee 2.0: Base pairs can be identified by 3DNA/DSSR (default) (4), RNAView (5), MC-Annotate (3) or newly added FR3D (15).
- The Universe of RNA Structures (URS) web interface to the URS database (URSDB) makes extensive use of DSSR. For each analyzed structure (including PDB entries), the DSSR text output file (termed “DSSR-file”) is also available. Impressively, the maintainers of URS keep up quickly with DSSR updates: the URS website currently uses DSSR v1.7.4-2018jan30.
Forty years after the yeast phenylalanine tRNA structure was solved, modified nucleotides should no longer be an issue for RNA structural analysis, especially for this classic molecule. Automatic processing of modified nucleotides is just one aspect of DSSR’s substantial set of features. Based on my understanding of the field, more structural bioinformatics resources/tools could benefit from DSSR. Simply put, if one’s project is related to 3D DNA or RNA structures, DSSR may be of help. It is just a matter of time before DSSR benefits a (much) larger community.
It is textbook knowledge that Watson-Crick (WC) pairs are specific, forming only between A and T/U (A–T/U or T/U–A) or between G and C (G–C or C–G). Furthermore, an A forms only one WC pair with a T/U, and likewise a G forms only one with a C. The widely used dot-bracket notation (DBN) of DNA/RNA secondary structure depends crucially on this specificity and uniqueness, using matched parentheses to represent WC pairs, such as ((....)) for a GCGA (GNRA-type) tetraloop of sequence GCGCGAGC.
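To make the uniqueness requirement concrete, here is a minimal Python sketch that writes a DBN string from a list of base pairs and refuses any input in which a base appears in two pairs. The function name `pairs_to_dbn` is hypothetical (it is not part of DSSR), positions are assumed to be 1-based, and only one bracket type is used, so crossing (pseudoknotted) pairs are out of scope.

```python
def pairs_to_dbn(length, pairs):
    """Build a dot-bracket string from 1-based (i, j) base pairs.

    Plain DBN (one bracket type, no pseudoknots) only works if every
    position is in at most one pair, so a conflict raises ValueError.
    """
    dbn = ["."] * length
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if i == j:
            raise ValueError(f"position {i} cannot pair with itself")
        for pos in (i, j):
            if dbn[pos - 1] != ".":
                raise ValueError(f"position {pos} is in more than one pair")
        dbn[i - 1] = "("
        dbn[j - 1] = ")"
    return "".join(dbn)


# GCGA (GNRA-type) tetraloop of sequence GCGCGAGC: pairs 1-8 and 2-7
print(pairs_to_dbn(8, [(1, 8), (2, 7)]))  # ((....))
```

A base listed in two WC pairs, e.g. `[(4, 20), (5, 20)]`, makes the function raise instead of emitting an invalid string.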
The reality is more complicated, even for what is presumably a ‘simple’ question of deriving RNA secondary structure from 3D coordinates in the PDB. One subtlety arises from ambiguous atomic coordinates that make one base appear to form WC pairs with two other complementary bases. As always, the case is best illustrated with a concrete example. The image shown below is taken from PDB entry 1qp5, where C20 (on chain B) forms two WC pairs, one with G4 and the other with G5 (both on chain A).

Clearly, accepting both as valid WC G–C pairs would make the resulting DBN invalid. DSSR resolves such conflicts by taking structural context into consideration, ensuring that each base forms a WC pair with at most one other base. Here the G5–C20 WC pair is retained, whereas the G4–C20 pair is removed.
This issue, where one base appears to form two WC pairs according to the PDB coordinates, has been noticed for a long time. Two examples from the literature are shown below:
The crystal structure data files were downloaded from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (Berman et al. 2000). For each crystal structure, the set of canonical base pairs was extracted by selecting all Watson–Crick and standard G-U wobble pairs found by RNAview (Yang et al. 2003). Occasional conflicts in this list, where RNAview annotates two bases, x and y, as a standard base pair and also y and z as another conflicting base pair, were removed manually by visual inspection of the crystal structure in the program PyMOL (http://pymol.sourceforge.net/). The helix-extension data set was created by taking the canonical pairs and adding all additional base–base interactions identified by RNAview (excluding stacked bases and tertiary interactions) for which the direct neighbor was already in the collection. This means each base pair (i,j) was added if both i and j were still unpaired and if either (i + 1, j – 1) or (i –1, j + 1) were already in the set.
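The helix-extension rule quoted above can be sketched in Python as follows. This is not the quoted authors' code: the function name is hypothetical, pairs are assumed to be (i, j) tuples with i < j in a single residue numbering, and we assume passes are repeated until no further pair qualifies (the quote does not say whether one pass or several were used).

```python
def extend_helices(canonical, candidates):
    """Grow a set of canonical pairs with extra pairs that stack on it.

    canonical, candidates: iterables of (i, j) tuples with i < j.
    A candidate (i, j) is added only if i and j are both still unpaired
    and (i + 1, j - 1) or (i - 1, j + 1) is already in the set.
    """
    pairs = set(canonical)
    paired = {n for p in pairs for n in p}
    candidates = sorted(candidates)  # materialize so it can be re-scanned
    changed = True
    while changed:  # assumption: repeat until nothing more can be added
        changed = False
        for i, j in candidates:
            if i in paired or j in paired:
                continue
            if (i + 1, j - 1) in pairs or (i - 1, j + 1) in pairs:
                pairs.add((i, j))
                paired.update((i, j))
                changed = True
    return pairs


# Hypothetical numbering: (1, 20), (4, 17) and (5, 16) extend the 2-19/3-18
# helix; (8, 12) has no paired neighbor and is left out.
print(sorted(extend_helices({(2, 19), (3, 18)}, [(1, 20), (4, 17), (5, 16), (8, 12)])))
# [(1, 20), (2, 19), (3, 18), (4, 17), (5, 16)]
```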
… From these complexes, we retrieved all RNA chains also marked as non-redundant by RNA3DHub. Each chain was annotated by FR3D. Because FR3D cannot analyze modified nucleotides or those with missing atoms, our present method does not include them either. If several models exist for a same chain, the first one only was considered. For the rest of this paper, the base pairs extracted from the FR3D annotations are those defined in the Leontis–Westhof geometric classification (24).
For each chain a secondary structure without pseudoknots was deduced from the annotated interactions, as follows. First all canonical Watson–Crick and wobble base pairs (i.e. A-U, G-C and G-U) were identified. Then, since many structures are naturally pseudoknotted, we used the K2N (25) implementation in the PyCogent (26) Python module to remove pseudoknots. Problems arise when a nucleotide is involved in several Watson–Crick base pairs (which is geometrically not feasible), probably due to an error of the automatic annotation. Those discrepancies were removed with a ad hoc algorithm such that if a nucleotide is involved in several Watson–Crick base pairs, we remove the base pair which belongs to the shortest helix.
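The ad hoc rule in the second quote (when a nucleotide is in several WC pairs, drop the pair belonging to the shortest helix) can be sketched as below. This illustrates the quoted literature rule, not DSSR's own context-based method. Assumptions not taken from the quoted paper: a ‘helix’ is a run of pairs (i, j), (i + 1, j − 1), … in one residue numbering, ties are broken by the pair indices, and conflicts are resolved one at a time.

```python
from collections import defaultdict


def helix_length(pair, pairs):
    """Count the pairs in the run (i - k, j + k) ... (i + k, j - k) containing pair."""
    i, j = pair
    n = 1
    for step in (-1, 1):
        k = 1
        while (i + step * k, j - step * k) in pairs:
            n += 1
            k += 1
    return n


def remove_conflicts(pairs):
    """Drop pairs until every residue is in at most one pair,
    each time removing the conflicting pair from the shortest helix."""
    pairs = set(pairs)
    while True:
        usage = defaultdict(list)
        for p in sorted(pairs):
            for n in p:
                usage[n].append(p)
        conflicts = [usage[n] for n in sorted(usage) if len(usage[n]) > 1]
        if not conflicts:
            return pairs
        # tie-break on the pair itself so the result is deterministic
        shortest = min(conflicts[0], key=lambda p: (helix_length(p, pairs), p))
        pairs.discard(shortest)


# Hypothetical numbering: residues 4 and 20 each appear in two pairs;
# (4, 20) is an isolated pair, so it is the one removed.
print(sorted(remove_conflicts({(3, 22), (4, 21), (5, 20), (6, 19), (4, 20)})))
# [(3, 22), (4, 21), (5, 20), (6, 19)]
```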
By design, DSSR takes care of these ‘little details’, among other handy features (such as handling modified nucleotides and removing pseudoknots). By providing a robust infrastructure and comprehensive framework, DSSR allows users to focus on their research topics. If you have experience with other tools, such as RNAView and FR3D cited above, give DSSR a try: it may fit your needs better.
An abasic site is a location in DNA or RNA where a purine or pyrimidine base is missing. It is also termed an AP (apurinic/apyrimidinic) site in biochemistry and molecular genetics. Abasic sites can form either spontaneously (e.g., by depurination) or as a result of DNA damage (e.g., as intermediates in base excision repair). According to Wikipedia, “It has been estimated that under physiological conditions 10,000 apurinic sites and 500 apyrimidinic may be generated in a cell daily.”
In DSSR and 3DNA v2.x, nucleotides are recognized using standard atom names and base planarity. Thus, by default, abasic sites are not treated as nucleotides, simply because they lack base atoms. DSSR introduced the --abasic option to account for abasic sites, a feature useful for detecting loops with backbone connectivity.
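The atom-name part of this logic can be illustrated with a minimal Python sketch. The atom-name sets, the function name, and the `abasic` keyword are assumptions for illustration; DSSR's actual criteria (which also check base planarity) are not reproduced here. The test residue uses the atom names from the HETATM records of C.HPD18 in 1l2c shown below.

```python
# Hypothetical atom-name sets for illustration only.
SUGAR_ATOMS = {"C1'", "C2'", "C3'", "C4'", "O4'"}
# six-membered ring atoms shared by purines and pyrimidines
BASE_RING_ATOMS = {"N1", "C2", "N3", "C4", "C5", "C6"}


def classify_residue(atom_names, abasic=False):
    """Return 'nucleotide', 'abasic', or 'other' from a residue's atom names."""
    names = set(atom_names)
    has_sugar = SUGAR_ATOMS <= names
    has_base = BASE_RING_ATOMS <= names
    if has_sugar and has_base:
        return "nucleotide"
    if has_sugar and abasic:  # mimics the effect of an --abasic-style switch
        return "abasic"
    return "other"


# Atom names of C.HPD18 in PDB entry 1l2c (from its HETATM records)
hpd18 = ["P", "O5'", "O1P", "O2P", "C5'", "O4'", "O3'",
         "C1'", "O1'", "C3'", "C4'", "C2'"]
print(classify_residue(hpd18))               # other
print(classify_residue(hpd18, abasic=True))  # abasic
```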
For example, by default, DSSR identifies one internal loop (no. 1 in the list below) in PDB entry 1l2c. With the --abasic option, two internal loops (including the one with the abasic site C.HPD18, no. 2) are detected.
List of 2 internal loops
1 symmetric internal loop: nts=6; [1,1]; linked by [#-1,#1]
summary: [2] 1 1 [B.1 C.24 B.3 C.22] 1 4
nts=6 GTATAC B.DG1,B.DT2,B.DA3,C.DT22,C.DA23,C.DC24
nts=1 T B.DT2
nts=1 A C.DA23
2 symmetric internal loop: nts=6; [1,1]; linked by [#1,#2]
summary: [2] 1 1 [B.6 C.19 B.8 C.17] 4 5
nts=6 CTTA?G B.DC6,B.DT7,B.DT8,C.DA17,C.HPD18,C.DG19
nts=1 T B.DT7
nts=1 ? C.HPD18
Note that C.HPD18 in 1l2c is a non-standard residue, as shown in the HETATM records below. Since the identity of C.HPD18 cannot be deduced from the atomic records, its one-letter code is designated as ?.
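Assigning the ‘?’ placeholder can be sketched with a simple lookup. The table of standard residue names and the function name are assumptions for illustration; DSSR additionally maps many modified nucleotides to their parent bases, which this sketch does not attempt.

```python
# Residue names treated as standard in this sketch (an assumption).
STANDARD_CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",
}


def one_letter(resname):
    """One-letter code for a residue name, or '?' if its identity is unknown."""
    return STANDARD_CODES.get(resname.strip().upper(), "?")


# Residue names of internal loop no. 2 in 1l2c:
# B.DC6, B.DT7, B.DT8, C.DA17, C.HPD18, C.DG19
loop2 = ["DC", "DT", "DT", "DA", "HPD", "DG"]
print("".join(one_letter(r) for r in loop2))  # CTTA?G
```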
HETATM 346 P HPD C 18 -14.637 52.299 29.949 1.00 49.12 P
HETATM 347 O5' HPD C 18 -14.658 52.173 28.359 1.00 48.28 O
HETATM 348 O1P HPD C 18 -15.167 51.040 30.537 1.00 49.35 O
HETATM 349 O2P HPD C 18 -13.303 52.798 30.369 1.00 46.43 O
HETATM 350 C5' HPD C 18 -15.703 51.469 27.687 1.00 45.70 C
HETATM 351 O4' HPD C 18 -16.364 50.501 25.561 1.00 44.15 O
HETATM 352 O3' HPD C 18 -13.990 51.738 24.335 1.00 45.75 O
HETATM 353 C1' HPD C 18 -16.105 54.187 25.684 1.00 52.47 C
HETATM 354 O1' HPD C 18 -17.309 54.085 26.496 1.00 56.16 O
HETATM 355 C3' HPD C 18 -14.756 52.250 25.426 1.00 46.23 C
HETATM 356 C4' HPD C 18 -15.263 51.093 26.291 1.00 45.72 C
HETATM 357 C2' HPD C 18 -16.030 52.889 24.898 1.00 49.05 C
In contrast, the R.U-8 in PDB entry 4ifd is a standard U, and is properly labeled by DSSR.
ATOM 26418 P U R -8 139.362 21.962 129.430 1.00208.29 P
ATOM 26419 OP1 U R -8 140.062 20.821 130.074 1.00207.30 O
ATOM 26420 OP2 U R -8 140.113 23.208 129.129 1.00208.44 O1+
ATOM 26421 O5' U R -8 138.712 21.439 128.071 1.00157.60 O
ATOM 26422 C5' U R -8 139.507 20.790 127.087 1.00155.47 C
ATOM 26423 C4' U R -8 138.843 20.804 125.731 1.00152.27 C
ATOM 26424 O4' U R -8 138.538 22.172 125.352 1.00149.29 O
ATOM 26425 C3' U R -8 139.677 20.275 124.572 1.00152.70 C
ATOM 26426 O3' U R -8 139.670 18.859 124.478 1.00155.04 O
ATOM 26427 C2' U R -8 139.053 20.969 123.369 1.00150.26 C
ATOM 26428 O2' U R -8 137.849 20.322 122.984 1.00146.83 O
ATOM 26429 C1' U R -8 138.700 22.334 123.958 1.00147.35 C
This is yet another little detail that DSSR takes care of. It is close attention to many such subtle points that sets DSSR apart. Overall, DSSR represents my view of what a scientific software program could (or should) be.
DSSR deliberately makes a distinction between ‘stem’ and ‘helix’, as shown below:
a helix is defined by base-stacking interactions, regardless of bp type and backbone connectivity, and may contain more than one stem.
a stem is defined as a helix consisting of only canonical WC/wobble pairs, with a continuous backbone.
By definition, a helix or stem consists of at least two base pairs with stacking interactions, and a helix is the more inclusive of the two. This differentiation between ‘helix’ and ‘stem’ naturally leads to the definition of coaxial stacking, another widely used yet vaguely specified concept.
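The stem/helix distinction can be sketched in Python, assuming a helix is already given as its base pairs in stacking order. Each pair is written (i, j, name), with i on one strand and j on the other in a single residue numbering; the "X-Y" pair labels and the function name are hypothetical. The sketch splits the helix into stems: maximal runs of at least two canonical WC/wobble pairs whose backbone is continuous on both strands. It does not reproduce DSSR's own stacking or pair-classification criteria.

```python
# Canonical WC and G-U wobble pairs, written with hypothetical "X-Y" labels
CANONICAL = {"A-U", "U-A", "G-C", "C-G", "G-U", "U-G", "A-T", "T-A"}


def stems_in_helix(helix):
    """Split a helix, given as (i, j, name) pairs in stacking order, into stems.

    A stem is a maximal run of >= 2 canonical pairs whose backbone is
    continuous on both strands (i goes up by 1 while j goes down by 1).
    """
    stems, run = [], []

    def close_run():
        if len(run) >= 2:
            stems.append(list(run))
        run.clear()

    for i, j, name in helix:
        if name not in CANONICAL:
            close_run()  # a non-canonical pair ends the current stem
            continue
        if run and (i, j) != (run[-1][0] + 1, run[-1][1] - 1):
            close_run()  # backbone break: start a new stem
        run.append((i, j))
    close_run()
    return stems


# Hypothetical helix: three stacked segments, one ending in a G-A pair
helix = [(1, 20, "G-C"), (2, 19, "C-G"), (3, 18, "G-U"),
         (10, 17, "A-U"), (11, 16, "G-C"), (12, 15, "G-A")]
print(stems_in_helix(helix))
# [[(1, 20), (2, 19), (3, 18)], [(10, 17), (11, 16)]]
```

In this reading, a single helix that yields two or more stems is a case of coaxial stacking, as in the 1ehz example that follows.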
Again, the abstract notion can be best illustrated with a concrete example. In the classic yeast phenylalanine tRNA (PDB id: 1ehz), DSSR identifies that two stems [the acceptor stem (right) and the T stem (left)] are coaxially stacked within one double helix. See the figure below.

In the schematic cartoon-block representation above, each Watson-Crick base pair is rendered as a single, long rectangular block. The base identities of the G–U wobble pair and of the two non-canonical pairs (at the left terminus) are shown separately, with a larger block size for purines (G and A) and a smaller one for pyrimidines (C, U, and T).
I chose ‘stem’ for the more specialized duplex because the term is widely used in the RNA stem-loop structure, and in describing the four ‘paired regions’ of the classic tRNA cloverleaf secondary structure. On the other hand, ‘helix’ is (to me at least) a more general, and thus more inclusive, term. It is worth noting that other terms, such as ‘arm’, ‘paired region’, or ‘helix’, have also been used interchangeably in the literature to refer to what DSSR designates as ‘stem’.
As a side note, the basic algorithm for identifying helices/stems in DSSR is also applicable to detecting G-quadruplexes. The same idea of ‘helix’ versus ‘stem’ also applies here (see the figure below for PDB entry 5dww). Indeed, as of v1.7.0-2017oct19, DSSR contains a new section for the identification and characterization of G-quadruplexes.

DSSR is “an integrated software tool for dissecting the spatial structure of RNA”. It excels at consolidating diverse pieces into a coherent framework, readily accessible in a solid software product. DSSR may well serve as a cornerstone of RNA structural bioinformatics and facilitate communication across the broad areas related to nucleic acid structures.
Among the rich set of RNA structural features derived by DSSR, the “List of stacks” section apparently has not drawn much attention from the user community. As noted in the DSSR output,
a stack is an ordered list of nucleotides assembled together via base-stacking interactions, regardless of backbone connectivity. Stacking interactions within a stem are not included.
As always, the concept is best illustrated via concrete examples. Shown below are two such base stacks automatically identified by DSSR in PDB entry 4p5j, the crystal structure of the tRNA mimic from Turnip Yellow Mosaic Virus (TYMV), which was analyzed in detail in the 2015 DSSR NAR paper.
This critical linchpin in the tRNA mimic is stabilized by extensive base-stacking interactions.
The intricate interactions between the D- and T-loops in the tRNA mimic include a five-base stack.
The DSSR-introduced schematic block representation makes the base-stacking interactions immediately obvious. One can even easily discern the identity of the bases, given the color-coding convention: A-red; C-yellow; G-green; T-blue; U-cyan. For example, the five stacked bases involved in the interaction of the D- and T-loops are CAAAC.
Moreover, longer and more complicated base stacks can also be auto-detected by DSSR, as shown below for the asymmetric unit of PDB entry 1j8g, the crystal structure of an RNA quadruplex r(UGGGGU)4 at 0.61 Å resolution. Here DSSR identifies two 10-base stacks, each with the sequence UGGGGGGGGU (UG8U).

The corresponding DSSR output is shown below:
List of 2 stacks
Note: a stack is an ordered list of nucleotides assembled together via
base-stacking interactions, regardless of backbone connectivity.
Stacking interactions within a stem are *not* included.
1 nts=10 UGGGGGGGGU A.U6,A.G5,A.G4,A.G3,A.G2,C.G22,C.G23,C.G24,C.G25,C.U26
2 nts=10 UGGGGGGGGU B.U16,B.G15,B.G14,B.G13,B.G12,D.G32,D.G33,D.G34,D.G35,D.U36
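The assembly step behind such a list can be sketched as chaining pairwise stacking contacts into ordered lists of nucleotides. This is a minimal sketch, not DSSR's algorithm: the function name is hypothetical, intra-stem stacking is assumed to be already excluded, and each nucleotide is assumed to stack on at most two others (rings and branch points are not handled). Because the per-pair contacts are not printed in the output, the demo simply reuses consecutive neighbors from stack 1 of 1j8g as input, so it only shows the ordering step, not the detection of stacking.

```python
from collections import defaultdict


def assemble_stacks(stacking_contacts):
    """Chain (nt1, nt2) stacking contacts into ordered stacks."""
    neighbors = defaultdict(set)
    for a, b in stacking_contacts:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, stacks = set(), []
    # start from chain ends (one neighbor), in sorted order for reproducibility
    for start in sorted(n for n in neighbors if len(neighbors[n]) == 1):
        if start in seen:
            continue
        stack, prev, cur = [start], None, start
        seen.add(start)
        while True:
            nxt = [n for n in neighbors[cur] if n != prev and n not in seen]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            stack.append(cur)
            seen.add(cur)
        stacks.append(stack)
    return stacks


chain = ["A.U6", "A.G5", "A.G4", "A.G3", "A.G2",
         "C.G22", "C.G23", "C.G24", "C.G25", "C.U26"]
contacts = list(zip(chain, chain[1:]))
for stack in assemble_stacks(contacts):
    # base letter is the first character after the chain ID, e.g. "U6" -> "U"
    print(len(stack), "".join(nt.split(".")[1][0] for nt in stack))
# 10 UGGGGGGGGU
```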
G-quadruplexes (hereafter referred to as G4) are a common type of higher-order DNA and RNA structures formed from G-rich sequences. The building block of G4 is a tetrad of guanines in a cyclic planar alignment, with four G+G pairs (cW+M type, see Figure below). A G4 structure is formed by stacking of G-tetrads and stabilized by cations at the center of the layers. G4 structures are polymorphic: the four strands can be parallel or anti-parallel, and loops connecting them can be of different types: lateral (edgewise), diagonal, or propeller (double-chain reversal). Moreover, G4 structures can be intra- or intermolecular, and even contain bulges.
Since its initial releases, DSSR has been able to detect G-tetrads and list them in a separate section. As of v1.7.0-2017oct19, DSSR has integrated existing features into a new module that automatically identifies and fully characterizes G4 structures. The underlying algorithms were further refined in v1.7.1-2017nov01, which was tested against all nucleic-acid-containing structures in the PDB.
Characterizations of three representative G4 examples (PDB entries 2m4p, 2hy9, and 5hix) are shown below, illustrating salient features (e.g., different types of loops) automatically extracted by DSSR.
2m4p
stem#1[#1] layers=3 INTRA-molecular parallel bulged-strands=1
1 syn=---- WC-->Major area=8.38 rise=3.64 twist=33.34 nts=4 GGGG A.DG3,A.DG8,A.DG12,A.DG16
2 syn=---- WC-->Major area=10.73 rise=3.23 twist=32.42 nts=4 GGGG A.DG5,A.DG9,A.DG13,A.DG17
3 syn=---- WC-->Major nts=4 GGGG A.DG6,A.DG10,A.DG14,A.DG18
strand#1* +1 DNA syn=--- nts=3 GGG A.DG3,A.DG5,A.DG6 bulged-nts=1 T A.DT4
strand#2 +1 DNA syn=--- nts=3 GGG A.DG8,A.DG9,A.DG10
strand#3 +1 DNA syn=--- nts=3 GGG A.DG12,A.DG13,A.DG14
strand#4 +1 DNA syn=--- nts=3 GGG A.DG16,A.DG17,A.DG18
loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT7
loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT11
loop#3 type=propeller strands=[#3,#4] nts=1 T A.DT15
2hy9
stem#1[#1] layers=3 INTRA-molecular anti-parallel
1 syn=ss-s Major-->WC area=13.69 rise=3.14 twist=19.08 nts=4 GGGG 1.DG4,1.DG10,1.DG18,1.DG22
2 syn=--s- WC-->Major area=13.40 rise=3.05 twist=28.05 nts=4 GGGG 1.DG5,1.DG11,1.DG17,1.DG23
3 syn=--s- WC-->Major nts=4 GGGG 1.DG6,1.DG12,1.DG16,1.DG24
strand#1 +1 DNA syn=s-- nts=3 GGG 1.DG4,1.DG5,1.DG6
strand#2 +1 DNA syn=s-- nts=3 GGG 1.DG10,1.DG11,1.DG12
strand#3 -1 DNA syn=-ss nts=3 GGG 1.DG18,1.DG17,1.DG16
strand#4 +1 DNA syn=s-- nts=3 GGG 1.DG22,1.DG23,1.DG24
loop#1 type=propeller strands=[#1,#2] nts=3 TTA 1.DT7,1.DT8,1.DA9
loop#2 type=lateral strands=[#2,#3] nts=3 TTA 1.DT13,1.DT14,1.DA15
loop#3 type=lateral strands=[#3,#4] nts=3 TTA 1.DT19,1.DT20,1.DA21
5hix
stem#1[#1] layers=4 inter-molecular anti-parallel
1 syn=s--s Major-->WC area=12.93 rise=3.64 twist=16.82 nts=4 GGGG A.DG1,B.DG4,A.DG12,B.DG9
2 syn=-ss- WC-->Major area=18.96 rise=3.71 twist=35.87 nts=4 GGGG A.DG2,B.DG3,A.DG11,B.DG10
3 syn=s--s Major-->WC area=15.16 rise=3.64 twist=18.64 nts=4 GGGG A.DG3,B.DG2,A.DG10,B.DG11
4 syn=-ss- WC-->Major nts=4 GGGG A.DG4,B.DG1,A.DG9,B.DG12
strand#1 +1 DNA syn=s-s- nts=4 GGGG A.DG1,A.DG2,A.DG3,A.DG4
strand#2 -1 DNA syn=-s-s nts=4 GGGG B.DG4,B.DG3,B.DG2,B.DG1
strand#3 -1 DNA syn=-s-s nts=4 GGGG A.DG12,A.DG11,A.DG10,A.DG9
strand#4 +1 DNA syn=s-s- nts=4 GGGG B.DG9,B.DG10,B.DG11,B.DG12
loop#1 type=diagonal strands=[#1,#3] nts=4 TTTT A.DT5,A.DT6,A.DT7,A.DT8
loop#2 type=diagonal strands=[#2,#4] nts=4 TTTT B.DT5,B.DT6,B.DT7,B.DT8
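The loop labels in these listings follow the standard topological rule: a propeller loop joins two parallel strands, while lateral and diagonal loops join anti-parallel strands that are adjacent or diagonally opposite in the tetrad, respectively. Below is a minimal Python sketch of that rule plus two simpler stem-level labels. The function names are hypothetical, and strands #1 to #4 are assumed to be numbered in cyclic order around the tetrad (so #1/#3 and #2/#4 are diagonal partners); this is consistent with the 2hy9 and 5hix listings but is not a description of DSSR's internals.

```python
def loop_type(a, b, directions):
    """Classify the loop joining G4 strands a and b (numbered 1-4).

    directions: strand number -> +1 or -1, as in the strand lines above.
    Assumes cyclic strand numbering, so 1/3 and 2/4 are diagonal partners.
    """
    if directions[a] == directions[b]:
        return "propeller"  # links two parallel strands
    return "diagonal" if abs(a - b) == 2 else "lateral"


def g4_orientation(directions):
    """'parallel' if all four strands run the same way, else 'anti-parallel'."""
    return "parallel" if len(set(directions.values())) == 1 else "anti-parallel"


def molecularity(nt_ids):
    """'intra' if all nucleotides (e.g. 'A.DG1') share one chain, else 'inter'."""
    return "intra" if len({nt.split(".")[0] for nt in nt_ids}) == 1 else "inter"


# Strand directions copied from the 2hy9 and 5hix listings above
dirs_2hy9 = {1: +1, 2: +1, 3: -1, 4: +1}
dirs_5hix = {1: +1, 2: -1, 3: -1, 4: +1}
print(g4_orientation(dirs_2hy9), [loop_type(a, b, dirs_2hy9) for a, b in [(1, 2), (2, 3), (3, 4)]])
# anti-parallel ['propeller', 'lateral', 'lateral']
print(g4_orientation(dirs_5hix), [loop_type(a, b, dirs_5hix) for a, b in [(1, 3), (2, 4)]])
# anti-parallel ['diagonal', 'diagonal']
print(molecularity(["A.DG1", "B.DG4", "A.DG12", "B.DG9"]))  # inter
```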

The molecular structure of the G-tetrad and two G4 structures in schematic representation. Upper left: atomic structure of the G-tetrad, the building block of G4 structures. Here the green ‘square’ is created by connecting the C1’ atoms of the guanosines, and is used to simplify the representation of the G4 structures of PDB entries 2m4p (lower left) and 5dww (right). Note that the asymmetric unit of 5dww contains four biological units, which are coaxially stacked in two columns.
The DSSR output for PDB entry 5dww is listed below, illustrating the difference between a G4-helix and a G4-stem.
5dww
Note: a G4-helix is defined by stacking interactions of G4-tetrads, regardless
of backbone connectivity, and may contain more than one G4-stem.
helix#1[#2] layers=6 inter-molecular stems=[#1,#2]
1 syn=---- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
2 syn=.--- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
3 syn=---- WC-->Major area=28.36 rise=3.31 twist=-9.78 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
4 syn=---- Major-->WC area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG14,C.DG9,C.DG5
5 syn=---- Major-->WC area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG15,C.DG10,C.DG6
6 syn=---- Major-->WC nts=4 GGGG C.DG3,C.DG16,C.DG11,C.DG7
strand#1 DNA syn=-.---- nts=6 GGGGGG A.DG3,A.DG2,A.DG1,C.DG1,C.DG2,C.DG3
strand#2 DNA syn=------ nts=6 GGGGGG A.DG7,A.DG6,A.DG5,C.DG14,C.DG15,C.DG16
strand#3 DNA syn=------ nts=6 GGGGGG A.DG11,A.DG10,A.DG9,C.DG9,C.DG10,C.DG11
strand#4 DNA syn=------ nts=6 GGGGGG A.DG16,A.DG15,A.DG14,C.DG5,C.DG6,C.DG7
......
List of 4 G4-stems
Note: a G4-stem is defined as a G4-helix with backbone connectivity.
Bulges are also allowed along each of the four strands.
stem#1[#1] layers=3 INTRA-molecular parallel
1 syn=---- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
2 syn=.--- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
3 syn=---- WC-->Major nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
strand#1 +1 DNA syn=-.- nts=3 GGG A.DG1,A.DG2,A.DG3
strand#2 +1 DNA syn=--- nts=3 GGG A.DG5,A.DG6,A.DG7
strand#3 +1 DNA syn=--- nts=3 GGG A.DG9,A.DG10,A.DG11
strand#4 +1 DNA syn=--- nts=3 GGG A.DG14,A.DG15,A.DG16
loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT4
loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT8
loop#3 type=propeller strands=[#3,#4] nts=2 TT A.DT12,A.DT13
--------------------------------------------------------------------------
stem#2[#1] layers=3 INTRA-molecular parallel
1 syn=---- WC-->Major area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG5,C.DG9,C.DG14
2 syn=---- WC-->Major area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG6,C.DG10,C.DG15
3 syn=---- WC-->Major nts=4 GGGG C.DG3,C.DG7,C.DG11,C.DG16
strand#1 +1 DNA syn=--- nts=3 GGG C.DG1,C.DG2,C.DG3
strand#2 +1 DNA syn=--- nts=3 GGG C.DG5,C.DG6,C.DG7
strand#3 +1 DNA syn=--- nts=3 GGG C.DG9,C.DG10,C.DG11
strand#4 +1 DNA syn=--- nts=3 GGG C.DG14,C.DG15,C.DG16
loop#1 type=propeller strands=[#1,#2] nts=1 T C.DT4
loop#2 type=propeller strands=[#2,#3] nts=1 T C.DT8
loop#3 type=propeller strands=[#3,#4] nts=2 TT C.DT12,C.DT13