3DNA is a versatile, integrated software system for the analysis, rebuilding, and visualization of three-dimensional nucleic-acid-containing structures. The software is applicable not only to DNA (as the name 3DNA may imply) but also to complicated RNA structures and DNA-protein complexes. In 3DNA, structural analysis and model rebuilding are two sides of the same coin: the description of the structure is rigorous and reversible, thus allowing for its exact reconstruction based on the derived parameters. 3DNA automatically detects all non-canonical base pairs, base triplets and higher-order associations (collectively termed multiplets), and coaxially stacked helices; provides a comprehensive collection of fiber models of regular DNA and RNA helices; generates highly effective schematic presentations that reveal key features of nucleic-acid structures; performs undisturbed base mutations; and has facilities for the analysis of molecular dynamics simulation trajectories.

DSSR is an integrated software tool for dissecting the spatial structure of RNA. It is a representative of what would become the brand new version 3 of 3DNA. DSSR consolidates, refines, and significantly extends the functionality of 3DNA v2.x for RNA structural analysis. Among other features, DSSR denotes base-pairs by common names (e.g., WC, reverse WC, Hoogsteen A+U, reverse Hoogsteen A–U, wobble G–U, sheared G–A), the Saenger classification of 28 H-bonding types, and the Leontis-Westhof nomenclature of 12 basic geometric classes; determines double-helical regions, differentiates stems from helices, and provides a pragmatic definition of coaxial stacking interactions; identifies hairpin loops, bulges, internal loops, and multi-branch (junction) loops; characterizes pseudoknots of arbitrary complexity; outputs RNA secondary structure in commonly used formats (including the dot-bracket notation and connectivity table); and identifies A-minor interactions, splayed-apart dinucleotide conformations, base-capping interactions, ribose zippers, G quadruplexes, i-motifs, kissing loops, U-turns, and k-turns. By connecting dots in RNA structural bioinformatics, it makes many common tasks simple and advanced applications feasible. DSSR comes with a professional User Manual, and some of its features have been integrated into Jmol and PyMOL. Moreover, the DSSR-Jmol paper, titled DSSR-enhanced visualization of nucleic acid structures in Jmol, has been featured on the cover of the 2017 Web-server issue of Nucleic Acids Research (NAR).

3DNA version 3 is under active development. The SNAP program has been created from scratch for an integrated characterization of the three-dimensional Structures of Nucleic Acid-Protein complexes. Sharing the same new codebase as DSSR, SNAP works for DNA-protein as well as RNA-protein interactions. Other 3DNA v2.x programs (e.g., fiber and rebuild) are gradually being distilled into version 3, and a new atomic coordinates-based homology searching tool is also being developed. In the end, 3DNA version 3 will consist of a suite of fully independent (like DSSR and SNAP) yet closely related programs, serving as cornerstones of DNA/RNA structural bioinformatics.

All 3DNA-related questions are welcome and should be directed to the 3DNA Forum. For the benefit of the community at large, I do not provide private support of 3DNA via email or personal message. As a general rule, I strive to provide a prompt and concrete response to each and every question posted on the Forum.



One base forming two Watson-Crick pairs?

It is textbook knowledge that the Watson-Crick (WC) pairs are specific, forming only between A and T/U (A–T/U or T/U–A) or G and C (G–C or C–G). Furthermore, an A forms a WC pair with only one T (or U) at a time, and likewise a G with only one C. The widely used dot-bracket notation (DBN) of DNA/RNA secondary structure depends crucially on this specificity and uniqueness: it uses matched parentheses to represent WC pairs, such as ((....)) for a GCGA (GNRA-type) tetraloop of sequence GCGCGAGC.
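To make the uniqueness requirement concrete, here is a minimal Python sketch (not part of DSSR; the function name `dbn_to_pairs` is made up for illustration) that decodes matched parentheses into index pairs. Each position carries exactly one symbol, so by construction it can have at most one partner.

```python
def dbn_to_pairs(dbn):
    """Decode a pseudoknot-free dot-bracket string into sorted (i, j) pairs.

    Indices are 0-based. Only '(', ')' and '.' are handled in this sketch.
    """
    stack, pairs = [], []
    for i, ch in enumerate(dbn):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # close the innermost open pair
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GCGCGAGC"   # the GNRA-type tetraloop example from the text
    dbn = "((....))"
    for i, j in dbn_to_pairs(dbn):
        # report 1-based positions, as is usual for sequences
        print(f"{seq[i]}{i + 1}-{seq[j]}{j + 1}")
```

For the example in the text, the two decoded pairs are G1–C8 and C2–G7, both of the WC type. A base claimed by two pairs simply cannot be written in this notation.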

The reality is more complicated, even for the presumably ‘simple’ question of deriving RNA secondary structure from 3D coordinates in the PDB. One subtlety is related to the ambiguity of atomic coordinates, which can make one base appear to form two WC pairs with two other complementary bases. As always, the case is best illustrated with a concrete example. The image shown below is taken from PDB entry 1qp5, where C20 (on chain B) forms two WC pairs, one with G4 and one with G5 (both on chain A).

C forming two WC G-C pairs in PDB entry 1qp5

Clearly, taking both as valid WC G–C pairs would make the resultant DBN illegitimate. DSSR resolves such discrepancies by taking structural context into consideration, ensuring that each base has at most one WC pairing partner. Here the G5–C20 WC pair is retained whilst the G4–C20 WC pair is removed.

This issue, where one base appears to form two WC pairs in annotations derived from PDB coordinates, has long been recognized. Two examples from the literature are quoted below:

The crystal structure data files were downloaded from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (Berman et al. 2000). For each crystal structure, the set of canonical base pairs was extracted by selecting all Watson–Crick and standard G-U wobble pairs found by RNAview (Yang et al. 2003). Occasional conflicts in this list, where RNAview annotates two bases, x and y, as a standard base pair and also y and z as another conflicting base pair, were removed manually by visual inspection of the crystal structure in the program PyMOL (http://pymol.sourceforge.net/). The helix-extension data set was created by taking the canonical pairs and adding all additional base–base interactions identified by RNAview (excluding stacked bases and tertiary interactions) for which the direct neighbor was already in the collection. This means each base pair (i,j) was added if both i and j were still unpaired and if either (i + 1, j – 1) or (i –1, j + 1) were already in the set.

… From these complexes, we retrieved all RNA chains also marked as non-redundant by RNA3DHub. Each chain was annotated by FR3D. Because FR3D cannot analyze modified nucleotides or those with missing atoms, our present method does not include them either. If several models exist for a same chain, the first one only was considered. For the rest of this paper, the base pairs extracted from the FR3D annotations are those defined in the Leontis–Westhof geometric classification (24).

For each chain a secondary structure without pseudoknots was deduced from the annotated interactions, as follows. First all canonical Watson–Crick and wobble base pairs (i.e. A-U, G-C and G-U) were identified. Then, since many structures are naturally pseudoknotted, we used the K2N (25) implementation in the PyCogent (26) Python module to remove pseudoknots. Problems arise when a nucleotide is involved in several Watson–Crick base pairs (which is geometrically not feasible), probably due to an error of the automatic annotation. Those discrepancies were removed with a ad hoc algorithm such that if a nucleotide is involved in several Watson–Crick base pairs, we remove the base pair which belongs to the shortest helix.
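The rule in the last quoted sentence (drop the conflicting pair that belongs to the shortest helix) can be sketched in a few lines of Python. This is only an illustration of that published heuristic under simplifying assumptions; it is not DSSR's algorithm, which the post describes only as using structural context. Nucleotides are represented by hypothetical integer positions on a single chain, and a "helix" is taken to be a run of directly nested pairs (i+k, j-k).

```python
def helix_length(pair, pairs):
    """Number of consecutively nested pairs (i+k, j-k) that include `pair`."""
    i, j = pair
    length = 1
    k = 1
    while (i + k, j - k) in pairs:   # extend inward
        length += 1
        k += 1
    k = 1
    while (i - k, j + k) in pairs:   # extend outward
        length += 1
        k += 1
    return length


def resolve_conflicts(pairs):
    """Remove pairs until every nucleotide has at most one partner.

    At each step the conflicting pair sitting in the shortest helix is dropped.
    """
    pairs = set(pairs)
    while True:
        partners = {}
        for pair in pairs:
            for nt in pair:
                partners.setdefault(nt, []).append(pair)
        conflicting = {p for ps in partners.values() if len(ps) > 1 for p in ps}
        if not conflicting:
            return sorted(pairs)
        # sorted() makes tie-breaking deterministic
        worst = min(sorted(conflicting), key=lambda p: helix_length(p, pairs))
        pairs.remove(worst)


if __name__ == "__main__":
    # Toy data (hypothetical numbering): position 9 is claimed by both 4 and 5.
    annotated = [(1, 12), (2, 11), (3, 10), (4, 9), (5, 9)]
    print(resolve_conflicts(annotated))
```

In the toy input, the pair (4, 9) extends a four-pair helix whereas (5, 9) stands alone, so the latter is the one removed.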

By design, DSSR takes care of these ‘little details’, among other handy features (such as handling modified nucleotides and removing pseudoknots). By providing a robust infrastructure and a comprehensive framework, DSSR allows users to focus on their research topics. If you have experience with other tools, such as RNAview and FR3D cited above, give DSSR a try: it may fit your needs better.



DNA conformational changes may play an active role in viral genome packaging

An article titled Simulations and electrostatic analysis suggest an active role for DNA conformational changes during genome packaging by bacteriophages has recently been posted on bioRxiv. I was honored to have the opportunity to collaborate with fellow researchers from the University of Pennsylvania and Thomas Jefferson University on this significant piece of work.

Here is the abstract. Please download the PDF version to learn more.

Motors that move DNA, or that move along DNA, play essential roles in DNA replication, transcription, recombination, and chromosome segregation. The mechanisms by which these DNA translocases operate remain largely unknown. Some double-stranded DNA (dsDNA) viruses use an ATP-dependent motor to drive DNA into preformed capsids. These include several human pathogens, as well as dsDNA bacteriophages (viruses that infect bacteria). We previously proposed that DNA is not a passive substrate of bacteriophage packaging motors but is, instead, an active component of the machinery. Computational studies on dsDNA in the channel of viral portal proteins reported here reveal DNA conformational changes consistent with that hypothesis. dsDNA becomes longer (“stretched”) in regions of high negative electrostatic potential, and shorter (“scrunched”) in regions of high positive potential. These results suggest a mechanism that couples the energy released by ATP hydrolysis to DNA translocation: The chemical cycle of ATP binding, hydrolysis and product release drives a cycle of protein conformational changes. This produces changes in the electrostatic potential in the channel through the portal, and these drive cyclic changes in the length of dsDNA. The DNA motions are captured by a coordinated protein-DNA grip-and-release cycle to produce DNA translocation. In short, the ATPase, portal and dsDNA work synergistically to promote genome packaging.



Handling of abasic sites in DSSR

An abasic site is a location in DNA or RNA where a purine or pyrimidine base is missing. It is also termed an AP site (i.e., apurinic/apyrimidinic site) in biochemistry and molecular genetics. The abasic site can be formed either spontaneously (e.g., depurination) or due to DNA damage (occurring as intermediates in base excision repair). According to Wikipedia, “It has been estimated that under physiological conditions 10,000 apurinic sites and 500 apyrimidinic may be generated in a cell daily.”

In DSSR and 3DNA v2.x, nucleotides are recognized using standard atom names and base planarity. Thus, abasic sites are not taken as nucleotides (by default), simply because they do not have base atoms. DSSR introduced the --abasic option to account for abasic sites, a feature useful for detecting loops with backbone connectivity.

For example, by default, DSSR identifies one internal loop (no. 1 in the list below) in PDB entry 1l2c. With the --abasic option, two internal loops (including the one with the abasic site C.HPD18, no. 2) are detected.

List of 2 internal loops
   1 symmetric internal loop: nts=6; [1,1]; linked by [#-1,#1]
     summary: [2] 1 1 [B.1 C.24 B.3 C.22] 1 4
     nts=6 GTATAC B.DG1,B.DT2,B.DA3,C.DT22,C.DA23,C.DC24
       nts=1 T B.DT2
       nts=1 A C.DA23
   2 symmetric internal loop: nts=6; [1,1]; linked by [#1,#2]
     summary: [2] 1 1 [B.6 C.19 B.8 C.17] 4 5
     nts=6 CTTA?G B.DC6,B.DT7,B.DT8,C.DA17,C.HPD18,C.DG19
       nts=1 T B.DT7
       nts=1 ? C.HPD18

Note that C.HPD18 in 1l2c is a non-standard residue, as shown in the HETATM records below. Since the identity of C.HPD18 cannot be deduced from the atomic records, its one-letter code is designated as ?.

HETATM  346  P   HPD C  18     -14.637  52.299  29.949  1.00 49.12           P
HETATM  347  O5' HPD C  18     -14.658  52.173  28.359  1.00 48.28           O
HETATM  348  O1P HPD C  18     -15.167  51.040  30.537  1.00 49.35           O
HETATM  349  O2P HPD C  18     -13.303  52.798  30.369  1.00 46.43           O
HETATM  350  C5' HPD C  18     -15.703  51.469  27.687  1.00 45.70           C
HETATM  351  O4' HPD C  18     -16.364  50.501  25.561  1.00 44.15           O
HETATM  352  O3' HPD C  18     -13.990  51.738  24.335  1.00 45.75           O
HETATM  353  C1' HPD C  18     -16.105  54.187  25.684  1.00 52.47           C
HETATM  354  O1' HPD C  18     -17.309  54.085  26.496  1.00 56.16           O
HETATM  355  C3' HPD C  18     -14.756  52.250  25.426  1.00 46.23           C
HETATM  356  C4' HPD C  18     -15.263  51.093  26.291  1.00 45.72           C
HETATM  357  C2' HPD C  18     -16.030  52.889  24.898  1.00 49.05           C

In contrast, the R.U-8 in PDB entry 4ifd is a standard U, and is properly labeled by DSSR.

ATOM  26418  P     U R  -8     139.362  21.962 129.430  1.00208.29           P
ATOM  26419  OP1   U R  -8     140.062  20.821 130.074  1.00207.30           O
ATOM  26420  OP2   U R  -8     140.113  23.208 129.129  1.00208.44           O1+
ATOM  26421  O5'   U R  -8     138.712  21.439 128.071  1.00157.60           O
ATOM  26422  C5'   U R  -8     139.507  20.790 127.087  1.00155.47           C
ATOM  26423  C4'   U R  -8     138.843  20.804 125.731  1.00152.27           C
ATOM  26424  O4'   U R  -8     138.538  22.172 125.352  1.00149.29           O
ATOM  26425  C3'   U R  -8     139.677  20.275 124.572  1.00152.70           C
ATOM  26426  O3'   U R  -8     139.670  18.859 124.478  1.00155.04           O
ATOM  26427  C2'   U R  -8     139.053  20.969 123.369  1.00150.26           C
ATOM  26428  O2'   U R  -8     137.849  20.322 122.984  1.00146.83           O
ATOM  26429  C1'   U R  -8     138.700  22.334 123.958  1.00147.35           C
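The contrast between the two listings can be checked with a short Python sketch. It is an illustration only, under loudly stated assumptions: the whitespace-split parsing is a shortcut that works for the records shown (real PDB files are fixed-column), the set of base ring atoms and the `STANDARD_CODES` table are my own simplifications, and the fallback to `?` is a stand-in for the "identity cannot be deduced" rule described above rather than DSSR's actual logic.

```python
# Ring atoms shared by purine and pyrimidine bases (assumed sufficient here).
BASE_RING_ATOMS = {"N1", "C2", "N3", "C4", "C5", "C6"}

# Standard residue names and their one-letter codes (illustrative table).
STANDARD_CODES = {"A": "A", "C": "C", "G": "G", "U": "U",
                  "DA": "A", "DC": "C", "DG": "G", "DT": "T"}

HPD18 = """\
HETATM  346  P   HPD C  18     -14.637  52.299  29.949  1.00 49.12           P
HETATM  353  C1' HPD C  18     -16.105  54.187  25.684  1.00 52.47           C
HETATM  354  O1' HPD C  18     -17.309  54.085  26.496  1.00 56.16           O
"""

U_MINUS_8 = """\
ATOM  26418  P     U R  -8     139.362  21.962 129.430  1.00208.29           P
ATOM  26428  O2'   U R  -8     137.849  20.322 122.984  1.00146.83           O
ATOM  26429  C1'   U R  -8     138.700  22.334 123.958  1.00147.35           C
"""


def describe_residue(records):
    """Return (residue name, has base atoms?, one-letter code) for one residue."""
    atoms, resnames = set(), set()
    for line in records.splitlines():
        fields = line.split()
        if fields and fields[0] in ("ATOM", "HETATM"):
            atoms.add(fields[2])      # atom name, e.g. C1'
            resnames.add(fields[3])   # residue name, e.g. HPD
    (resname,) = resnames             # the sketch expects exactly one residue
    has_base = bool(atoms & BASE_RING_ATOMS)
    return resname, has_base, STANDARD_CODES.get(resname, "?")


if __name__ == "__main__":
    print(describe_residue(HPD18))
    print(describe_residue(U_MINUS_8))
```

Neither residue has any base ring atom in the records listed, yet only the standard residue name U maps to a one-letter code; HPD falls through to `?`.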

This is yet another little detail that DSSR takes care of. It is the close attention to many such subtle points that makes DSSR different. Overall, DSSR represents my view of what a scientific software program could be (or should be).



Stem, helix, and coaxial stacking in DSSR

DSSR deliberately makes a distinction between ‘stem’ and ‘helix’, as shown below:

a helix is defined by base-stacking interactions, regardless of bp type and backbone connectivity, and may contain more than one stem.

a stem is defined as a helix consisting of only canonical WC/wobble pairs, with a continuous backbone.

By definition, a helix or stem consists of at least two base-pairs with stacking interactions. Helix is more inclusive and may contain more than one stem. This differentiation between ‘helix’ and ‘stem’ naturally leads to the definition of coaxial stacking, another widely used yet vaguely specified concept.
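As a rough illustration of how the two definitions relate, the Python sketch below splits one helix into its stems. It assumes a hypothetical input format that is not DSSR's: the helix is an ordered list of stacked base pairs, each tagged with whether it is a canonical WC/wobble pair and whether the backbone continues to the next pair. A stem is then a maximal run of at least two canonical pairs with no backbone break.

```python
def stems_in_helix(helix):
    """Split a helix into stems.

    `helix` is a list of (label, is_canonical, linked_to_next) tuples,
    ordered along the helical axis. Returns a list of stems, each a list
    of labels with at least two pairs.
    """
    stems, run = [], []

    def flush():
        if len(run) >= 2:            # a stem needs at least two stacked pairs
            stems.append(list(run))
        run.clear()

    for label, is_canonical, linked_to_next in helix:
        if not is_canonical:
            flush()                  # a non-canonical pair interrupts the stem
            continue
        run.append(label)
        if not linked_to_next:
            flush()                  # a backbone break ends the stem here
    flush()
    return stems


if __name__ == "__main__":
    # Toy helix with made-up labels: two canonical runs separated by a
    # backbone break, followed by one stacked non-canonical pair.
    helix = [
        ("a1", True, True), ("a2", True, True), ("a3", True, False),
        ("t1", True, True), ("t2", True, True), ("t3", True, True),
        ("n1", False, False),
    ]
    stems = stems_in_helix(helix)
    print(stems)
    print("coaxial stacking" if len(stems) > 1 else "single stem")
```

In this picture, coaxial stacking is simply the case where one helix yields more than one stem.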

Again, the abstract notion can be best illustrated with a concrete example. In the classic yeast phenylalanine tRNA (PDB id: 1ehz), DSSR identifies that two stems [the acceptor stem (right) and the T stem (left)] are coaxially stacked within one double helix. See the figure below.

tRNA acceptor and T stems in one helix (1ehz)

In the schematic cartoon-block representation above, each Watson-Crick base pair is rendered as a single, long rectangular block. Base identities of the G–U wobble and the two non-canonical pairs (left terminal) are illustrated separately, with a larger block size for purines (G and A) and a smaller size for pyrimidines (C, U, and T).

I picked ‘stem’ for the more specialized duplex because it is widely used in the RNA stem-loop structure, and in describing the four ‘paired regions’ of the classic tRNA cloverleaf secondary structure. On the other hand, ‘helix’ is (to me at least) a more general term, and thus more inclusive. It is worth noting that other terms, such as ‘arm’, ‘paired region’, or ‘helix’, have also been used interchangeably in the literature to refer to what DSSR designates as ‘stem’.

As a side note, the basic algorithm for identifying helices/stems in DSSR is also applicable to detecting G-quadruplexes. The same idea of ‘helix’ or ‘stem’ also applies here (see the figure below for PDB entry 5dww). Indeed, as of v1.7.0-2017oct19, DSSR contains a new section for the identification and characterization of G-quadruplexes.

G-quadruplex (PDB entry: 5dww)

DSSR is “an integrated software tool for dissecting the spatial structure of RNA”. It excels in consolidating the diverse pieces via a coherent framework, readily accessible in a solid software product. DSSR may well serve as a cornerstone in RNA structural bioinformatics and facilitate communication across the broad areas related to nucleic acid structures.



Base stacks in non-stem regions

Among the rich set of RNA structural features derived by DSSR, the “List of stacks” section apparently has not drawn much attention from the user community. As noted in the DSSR output,

a stack is an ordered list of nucleotides assembled together via base-stacking interactions, regardless of backbone connectivity. Stacking interactions within a stem are not included.

As always, the concept is best illustrated via concrete examples. Shown below are two such base stacks automatically identified by DSSR in PDB entry 4p5j, the crystal structure of the tRNA mimic from Turnip Yellow Mosaic Virus (TYMV), which was analyzed in detail in the 2015 DSSR NAR paper.

tRNA mimic linchpin stabilized by base-stacking
This critical linchpin in the tRNA mimic is stabilized by extensive base-stacking interactions.

The D- and T-loops stabilized by base-stacking
The intricate interactions between the D- and T-loops in the tRNA mimic include a five-base stack.

The DSSR-introduced schematic block representation makes the base-stacking interactions immediately obvious. One can even easily discern the identity of bases, given the color-coding convention: A, red; C, yellow; G, green; T, blue; U, cyan. For example, the five stacked bases involved in the interaction of the D- and T-loops are CAAAC.

Moreover, longer and more complicated base stacks can also be auto-detected by DSSR, as shown below for the asymmetric unit of PDB entry 1j8g, the crystal structure of an RNA quadruplex r(UGGGGU)4 at 0.61 Å resolution. Here DSSR identifies two 10-base stacks, each of sequence UGGGGGGGGU (UG8U).

Two 10-base stacks in 1j8g

The corresponding DSSR output is shown below:

List of 2 stacks
  Note: a stack is an ordered list of nucleotides assembled together via
        base-stacking interactions, regardless of backbone connectivity.
        Stacking interactions within a stem are *not* included.
   1 nts=10 UGGGGGGGGU A.U6,A.G5,A.G4,A.G3,A.G2,C.G22,C.G23,C.G24,C.G25,C.U26
   2 nts=10 UGGGGGGGGU B.U16,B.G15,B.G14,B.G13,B.G12,D.G32,D.G33,D.G34,D.G35,D.U36
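For downstream scripting, the stack lines in the listing above are regular enough to parse with a single regular expression. The Python sketch below is an illustration written against the text output shown here only; the function name `parse_stacks` and the dictionary layout are my own, and other DSSR versions or sections may format lines differently.

```python
import re

# index, nts=<count>, one-letter sequence, comma-separated nucleotide ids
STACK_LINE = re.compile(r"^\s*(\d+)\s+nts=(\d+)\s+(\S+)\s+(\S+)\s*$")

EXAMPLE = """\
List of 2 stacks
  Note: a stack is an ordered list of nucleotides assembled together via
        base-stacking interactions, regardless of backbone connectivity.
        Stacking interactions within a stem are *not* included.
   1 nts=10 UGGGGGGGGU A.U6,A.G5,A.G4,A.G3,A.G2,C.G22,C.G23,C.G24,C.G25,C.U26
   2 nts=10 UGGGGGGGGU B.U16,B.G15,B.G14,B.G13,B.G12,D.G32,D.G33,D.G34,D.G35,D.U36
"""


def parse_stacks(text):
    """Extract stacks from a 'List of stacks' section of DSSR text output."""
    stacks = []
    for line in text.splitlines():
        match = STACK_LINE.match(line)
        if not match:
            continue                      # header and note lines are skipped
        index, count, seq, nts = match.groups()
        nt_ids = nts.split(",")
        if len(nt_ids) != int(count) or len(seq) != int(count):
            raise ValueError(f"inconsistent stack line: {line!r}")
        stacks.append({"index": int(index), "seq": seq, "nts": nt_ids})
    return stacks


if __name__ == "__main__":
    for stack in parse_stacks(EXAMPLE):
        chains = sorted({nt.split(".")[0] for nt in stack["nts"]})
        print(stack["index"], stack["seq"], "chains:", ",".join(chains))
```

Grouping the nucleotide identifiers by chain makes it easy to see that each 10-base stack spans two strands, which is what "regardless of backbone connectivity" means in practice.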



Identification and characterization of G-quadruplexes

G-quadruplexes (hereafter referred to as G4) are a common type of higher-order DNA and RNA structures formed from G-rich sequences. The building block of G4 is a tetrad of guanines in a cyclic planar alignment, with four G+G pairs (cW+M type, see Figure below). A G4 structure is formed by stacking of G-tetrads and stabilized by cations at the center of the layers. G4 structures are polymorphic: the four strands can be parallel or anti-parallel, and loops connecting them can be of different types: lateral (edgewise), diagonal, or propeller (double-chain reversal). Moreover, G4 structures can be intra- or intermolecular, and even contain bulges.

From its initial releases, DSSR was able to detect G-tetrads, and listed them in a separate section. As of v1.7.0-2017oct19, DSSR has integrated existing features and created a new module to automatically identify and fully characterize G4 structures. The underlying algorithms have been further refined in v1.7.1-2017nov01, which was tested against all nucleic-acid-containing structures in the PDB.

Characterizations of three representative G4 examples (PDB entries 2m4p, 2hy9, and 5hix) are shown below, illustrating salient features (e.g., different types of loops) automatically extracted by DSSR.


stem#1[#1] layers=3 INTRA-molecular parallel bulged-strands=1
   1 syn=---- WC-->Major area=8.38  rise=3.64 twist=33.34 nts=4 GGGG A.DG3,A.DG8,A.DG12,A.DG16
   2 syn=---- WC-->Major area=10.73 rise=3.23 twist=32.42 nts=4 GGGG A.DG5,A.DG9,A.DG13,A.DG17
   3 syn=---- WC-->Major                                  nts=4 GGGG A.DG6,A.DG10,A.DG14,A.DG18
    strand#1* +1 DNA syn=--- nts=3 GGG A.DG3,A.DG5,A.DG6 bulged-nts=1 T A.DT4
    strand#2  +1 DNA syn=--- nts=3 GGG A.DG8,A.DG9,A.DG10
    strand#3  +1 DNA syn=--- nts=3 GGG A.DG12,A.DG13,A.DG14
    strand#4  +1 DNA syn=--- nts=3 GGG A.DG16,A.DG17,A.DG18
    loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT7
    loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT11
    loop#3 type=propeller strands=[#3,#4] nts=1 T A.DT15


stem#1[#1] layers=3 INTRA-molecular anti-parallel
   1 syn=ss-s Major-->WC area=13.69 rise=3.14 twist=19.08 nts=4 GGGG 1.DG4,1.DG10,1.DG18,1.DG22
   2 syn=--s- WC-->Major area=13.40 rise=3.05 twist=28.05 nts=4 GGGG 1.DG5,1.DG11,1.DG17,1.DG23
   3 syn=--s- WC-->Major                                  nts=4 GGGG 1.DG6,1.DG12,1.DG16,1.DG24
    strand#1  +1 DNA syn=s-- nts=3 GGG 1.DG4,1.DG5,1.DG6
    strand#2  +1 DNA syn=s-- nts=3 GGG 1.DG10,1.DG11,1.DG12
    strand#3  -1 DNA syn=-ss nts=3 GGG 1.DG18,1.DG17,1.DG16
    strand#4  +1 DNA syn=s-- nts=3 GGG 1.DG22,1.DG23,1.DG24
    loop#1 type=propeller strands=[#1,#2] nts=3 TTA 1.DT7,1.DT8,1.DA9
    loop#2 type=lateral   strands=[#2,#3] nts=3 TTA 1.DT13,1.DT14,1.DA15
    loop#3 type=lateral   strands=[#3,#4] nts=3 TTA 1.DT19,1.DT20,1.DA21


stem#1[#1] layers=4 inter-molecular anti-parallel
   1 syn=s--s Major-->WC area=12.93 rise=3.64 twist=16.82 nts=4 GGGG A.DG1,B.DG4,A.DG12,B.DG9
   2 syn=-ss- WC-->Major area=18.96 rise=3.71 twist=35.87 nts=4 GGGG A.DG2,B.DG3,A.DG11,B.DG10
   3 syn=s--s Major-->WC area=15.16 rise=3.64 twist=18.64 nts=4 GGGG A.DG3,B.DG2,A.DG10,B.DG11
   4 syn=-ss- WC-->Major                                  nts=4 GGGG A.DG4,B.DG1,A.DG9,B.DG12
    strand#1  +1 DNA syn=s-s- nts=4 GGGG A.DG1,A.DG2,A.DG3,A.DG4
    strand#2  -1 DNA syn=-s-s nts=4 GGGG B.DG4,B.DG3,B.DG2,B.DG1
    strand#3  -1 DNA syn=-s-s nts=4 GGGG A.DG12,A.DG11,A.DG10,A.DG9
    strand#4  +1 DNA syn=s-s- nts=4 GGGG B.DG9,B.DG10,B.DG11,B.DG12
    loop#1 type=diagonal  strands=[#1,#3] nts=4 TTTT A.DT5,A.DT6,A.DT7,A.DT8
    loop#2 type=diagonal  strands=[#2,#4] nts=4 TTTT B.DT5,B.DT6,B.DT7,B.DT8
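The header and loop lines in these listings follow a fixed pattern, so a short script can summarize each G4 stem. The Python sketch below is written against the text shown above only; the regular expressions and the function name `summarize_g4_stem` are my own assumptions, not a documented DSSR format specification. The example input is the second listing, abbreviated to its header and loop lines.

```python
import re
from collections import Counter

# e.g. "stem#1[#1] layers=3 INTRA-molecular anti-parallel"
STEM_HEADER = re.compile(r"stem#(\d+)\[#\d+\]\s+layers=(\d+)\s+(\S+)\s+(\S+)")
# e.g. "loop#2 type=lateral   strands=[#2,#3] nts=3 TTA 1.DT13,..."
LOOP_LINE = re.compile(
    r"loop#(\d+)\s+type=(\S+)\s+strands=\[#(\d+),#(\d+)\]\s+nts=(\d+)\s+(\S+)"
)

EXAMPLE = """\
stem#1[#1] layers=3 INTRA-molecular anti-parallel
    loop#1 type=propeller strands=[#1,#2] nts=3 TTA 1.DT7,1.DT8,1.DA9
    loop#2 type=lateral   strands=[#2,#3] nts=3 TTA 1.DT13,1.DT14,1.DA15
    loop#3 type=lateral   strands=[#3,#4] nts=3 TTA 1.DT19,1.DT20,1.DA21
"""


def summarize_g4_stem(text):
    """Collect layer count, molecularity, topology and loop types of one stem."""
    summary = {"layers": None, "molecularity": None, "topology": None,
               "loops": Counter()}
    for line in text.splitlines():
        header = STEM_HEADER.search(line)
        if header:
            summary["layers"] = int(header.group(2))
            summary["molecularity"] = header.group(3)
            summary["topology"] = header.group(4)
            continue
        loop = LOOP_LINE.search(line)
        if loop:
            summary["loops"][loop.group(2)] += 1   # count by loop type
    return summary


if __name__ == "__main__":
    info = summarize_g4_stem(EXAMPLE)
    print(info["layers"], info["molecularity"], info["topology"])
    print(dict(info["loops"]))
```

A tally like this, one propeller and two lateral loops for the example, is a compact way to compare G4 topologies across many entries.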

Representative G4 structures

The molecular structure of the G-tetrad and two G4 structures in schematic representation. Upper left: atomic structure of the G-tetrad, the building block of G4 structures. Here the green ‘square’ is created by connecting the C1’ atoms of the guanosines, and it is used to simplify the representation of the G4 structures of PDB entries 2m4p (lower left) and 5dww (right). Note that the asymmetric unit of 5dww contains four biological units, which are coaxially stacked in two columns.

The DSSR output for PDB entry 5dww is listed below, showing the difference between a G4-helix and a G4-stem.


 Note: a G4-helix is defined by stacking interactions of G4-tetrads, regardless
        of backbone connectivity, and may contain more than one G4-stem.
  helix#1[#2] layers=6 inter-molecular stems=[#1,#2]
   1 syn=---- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
   2 syn=.--- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
   3 syn=---- WC-->Major area=28.36 rise=3.31 twist=-9.78 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
   4 syn=---- Major-->WC area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG14,C.DG9,C.DG5
   5 syn=---- Major-->WC area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG15,C.DG10,C.DG6
   6 syn=---- Major-->WC                                  nts=4 GGGG C.DG3,C.DG16,C.DG11,C.DG7
    strand#1 DNA syn=-.---- nts=6 GGGGGG A.DG3,A.DG2,A.DG1,C.DG1,C.DG2,C.DG3
    strand#2 DNA syn=------ nts=6 GGGGGG A.DG7,A.DG6,A.DG5,C.DG14,C.DG15,C.DG16
    strand#3 DNA syn=------ nts=6 GGGGGG A.DG11,A.DG10,A.DG9,C.DG9,C.DG10,C.DG11
    strand#4 DNA syn=------ nts=6 GGGGGG A.DG16,A.DG15,A.DG14,C.DG5,C.DG6,C.DG7
List of 4 G4-stems
  Note: a G4-stem is defined as a G4-helix with backbone connectivity.
        Bulges are also allowed along each of the four strands.
  stem#1[#1] layers=3 INTRA-molecular parallel
   1 syn=---- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
   2 syn=.--- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
   3 syn=---- WC-->Major                                  nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
    strand#1  +1 DNA syn=-.- nts=3 GGG A.DG1,A.DG2,A.DG3
    strand#2  +1 DNA syn=--- nts=3 GGG A.DG5,A.DG6,A.DG7
    strand#3  +1 DNA syn=--- nts=3 GGG A.DG9,A.DG10,A.DG11
    strand#4  +1 DNA syn=--- nts=3 GGG A.DG14,A.DG15,A.DG16
    loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT4
    loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT8
    loop#3 type=propeller strands=[#3,#4] nts=2 TT A.DT12,A.DT13
  stem#2[#1] layers=3 INTRA-molecular parallel
   1 syn=---- WC-->Major area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG5,C.DG9,C.DG14
   2 syn=---- WC-->Major area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG6,C.DG10,C.DG15
   3 syn=---- WC-->Major                                  nts=4 GGGG C.DG3,C.DG7,C.DG11,C.DG16
    strand#1  +1 DNA syn=--- nts=3 GGG C.DG1,C.DG2,C.DG3
    strand#2  +1 DNA syn=--- nts=3 GGG C.DG5,C.DG6,C.DG7
    strand#3  +1 DNA syn=--- nts=3 GGG C.DG9,C.DG10,C.DG11
    strand#4  +1 DNA syn=--- nts=3 GGG C.DG14,C.DG15,C.DG16
    loop#1 type=propeller strands=[#1,#2] nts=1 T C.DT4
    loop#2 type=propeller strands=[#2,#3] nts=1 T C.DT8
    loop#3 type=propeller strands=[#3,#4] nts=2 TT C.DT12,C.DT13



Detection of multiplets in DSSR

In addition to base pairs, DSSR also automatically detects higher-order base associations. They are generally termed multiplets, consisting of three or more co-planar bases arranged together via H-bonding interactions. The simplest multiplets are base triplets. For example, the yeast phenylalanine tRNA (PDB entry 1ehz) contains four base triplets, as shown below:

Four base triplets in tRNA 1ehz detected by DSSR

The well-known (types I and II) A-minor motifs are also multiplets of three bases. Similarly, the G-tetrad where four guanine bases associate via Hoogsteen H-bonding to form a square planar structure is also a special multiplet. The G-tetrad is the building block of the G-quadruplexes. As of v1.7.0-2017oct19, DSSR can automatically identify and characterize G-quadruplexes (see the DSSR User Manual).

The DSSR algorithm for detecting multiplets is generally applicable. It can identify as many co-planar bases as are available in a given structure. Shown below is an octad, consisting of a G-tetrad in the middle and four Us on the periphery. The octad is derived from PDB entry 1j8g using atomic coordinates from biological assemblies 1 and 3.

Octad detected by DSSR in PDB entry 1j8g



DSSR-Jmol featured in cover image of NAR'17 web-server issue

The DSSR-Jmol paper, titled "DSSR-enhanced visualization of nucleic acid structures in Jmol", has been officially published in the 2017 web-server issue of Nucleic Acids Research (NAR). Notably, the work has been featured on the cover, as shown below:

Cover image featuring the DSSR-Jmol paper
Caption: 3D interactive visualization of selected RNA structural features enabled by the DSSR-Jmol integration (http://jmol.x3dna.org). Clockwise from upper left: Structure of the xpt-pbuX guanine riboswitch in complex with hypoxanthine (PDB id: 4fe5) in ‘base blocks’ representation. The three-way junction loop encompassing the metabolite (in space-filling representation) is color-coded by base identity: A, red; C, yellow; G, green; U, cyan. The loop-loop interaction (a kissing-loop motif) at the top is highlighted in red (upper left corner). Structure of the Thermus thermophilus 30S ribosomal subunit in complex with antibiotics (PDB id: 1fjg) in step diagram. The 16S ribosomal RNA is color-coded in spectrum with the 5′-end in blue and the 3′-end in red (upper middle). Structure of the classic L-shaped yeast phenylalanine tRNA (PDB id: 1ehz) in step diagram, with the three hairpin loops highlighted in red and the [2,1,5,0] four-way junction loop in blue (upper right corner). Structure of the Pistol self-cleaving ribozyme (PDB id: 5ktj), showcasing (in red) the horizontal helix in space-filling representation. The helix is composed of six short stems stabilized via coaxial stacking interactions (bottom).

The DSSR-Jmol integration seamlessly bridges the DSSR command-line analysis tool and the Jmol molecular viewer via the standard JSON interface. Users can now select DSSR-derived RNA structural features (such as base pairs, double helices, and various loops) and visualize them interactively in novel representations in Jmol. Moreover, fine-grained characteristics of these features can be queried via the Jmol SQL for DSSR. The DSSR-Jmol integration fills a gap in RNA structural bioinformatics, and brings RNA visualization to an entirely new level. The web interface (http://jmol.x3dna.org) is fully functional and easy to use, serving a huge user base of researchers, educators, and students alike.

With the DSSR-Jmol integration featured as the cover image of the 2017 NAR web-server issue, DSSR should reach a wider audience. Additionally, I've written a new post (on the 3DNA Forum) that provides the scripts and data files used to create the cover image.



DSSR-Jmol paper in NAR

I am pleased to announce the (advance online, May 3, 2017) publication of a new paper titled "DSSR-enhanced visualization of nucleic acid structures in Jmol" in Nucleic Acids Research (NAR). Co-authored by Robert Hanson (Jmol) and me (DSSR), the article will appear in the July 2017 web-server issue of NAR. Here are the key links related to the paper:

The DSSR-Jmol integration project was initiated in October 2013 when I approached Bob at a meeting organized by RCSB PDB at Rutgers. Thereafter, we met only once, in July 2014 in Paris. Over the years, we have mostly communicated via email, occasionally facilitated by Skype. Our work bridges the DSSR command-line analysis tool and the Jmol molecular viewer via a simple JSON interface and a powerful query language. Users can now select DSSR-derived RNA structural features (such as base pairs, double helices, and various loops) as easily as they can select protein alpha-helices and beta-strands. Moreover, fine-grained characteristics of these features can be queried via Jmol SQL for DSSR (see examples below). Notably, the novel representation styles (step diagram and base blocks) and coloring schemes bring RNA visualization to an entirely new level (see Figure 3 of the paper).

load =1ehz/dssr   # load yeast phenylalanine tRNA to Jmol with DSSR annotation
SELECT hairpins   # select the three hairpin loops
SELECT junctions  # select the four-way junction loop
SELECT within(dssr, "nts WHERE is_modified")  # select modified nucleotides (14 total)
SELECT within(dssr, "pairs WHERE name != 'WC'")  # select non-Watson-Crick pairs
SELECT within(dssr, "pairs WHERE name = 'WC' OR name = 'Wobble'")  # select canonical pairs
SELECT within(dssr, "pairs WHERE name != 'WC' AND name != 'Wobble'")  # select non-canonical pairs
SELECT within(dssr, "pairs WHERE LW = 'tSW'")  # select pairs of type tSW per Leontis-Westhof

The DSSR-Jmol integration fills a gap in RNA structural bioinformatics, serving a huge user base of researchers, educators, and students alike. Its functionality is freely accessible either via the Jmol application, or the JSmol-based website (http://jmol.x3dna.org). By adhering to web standards, the website is fully functional in all modern browsers on various computer/operating systems (including handheld devices, such as tablets and smart phones). The web interface is simple and intuitive, and new users can get started easily. It also allows power users to take full advantage of Jmol scripting via a command-line console.

This work also provides an example for integrating DSSR-derived features into other molecular graphics programs or bioinformatics pipelines involving nucleic acid structures. By design, DSSR is a stand-alone, command-line program written in ANSI C. The binary executables are only ~1MB in size, and self-contained. With zero dependencies, no setup or configuration, it is trivial to get DSSR up and running. DSSR uncovers a wide range of RNA/DNA structural features in a consistent, easily accessible framework. It possesses a much richer set of functionalities for nucleic acid structural analysis (see the DSSR User Manual) than any other existing tools I am aware of. Moreover, the program is efficient and robust, making it an ideal component to be integrated into other pipelines, especially via the standard and structured JSON interface.
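As a rough idea of what such a pipeline step might look like, here is a hedged Python sketch. Several things in it are assumptions rather than documented facts: that the executable is named `x3dna-dssr` and is on the PATH, that `-i=<file> --json` writes JSON to standard output, and that the JSON contains a `pairs` list whose entries carry `name` and `LW` fields mirroring the Jmol queries shown earlier. Please check the DSSR User Manual for the actual command line and schema.

```python
import json
import subprocess


def run_dssr(pdb_path, exe="x3dna-dssr"):
    """Run DSSR on a coordinate file and return its JSON output as a dict.

    Assumes (not verified here) that `exe -i=<file> --json` prints JSON
    to standard output.
    """
    proc = subprocess.run(
        [exe, f"-i={pdb_path}", "--json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def select_pairs(dssr, predicate):
    """Return the base pairs for which `predicate(pair)` is true.

    `dssr` is the parsed JSON; the 'pairs' key is an assumed field name.
    """
    return [pair for pair in dssr.get("pairs", []) if predicate(pair)]


def is_canonical(pair):
    """Canonical pairs, using the same names as the Jmol queries above."""
    return pair.get("name") in ("WC", "Wobble")


if __name__ == "__main__":
    # A tiny mock of the assumed structure, so the sketch runs without DSSR.
    # With DSSR installed, one would call: mock = run_dssr("1ehz.pdb")
    mock = {"pairs": [
        {"name": "WC", "LW": "cWW"},
        {"name": "Wobble", "LW": "cWW"},
        {"name": "--", "LW": "tSW"},
    ]}
    print(len(select_pairs(mock, is_canonical)), "canonical pairs")
    print(len(select_pairs(mock, lambda p: p.get("LW") == "tSW")), "tSW pairs")
```

Keeping the DSSR call separate from the filtering functions means the selection logic can be tested on saved JSON files without re-running the program.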

Collaborating with Bob has been a truly exciting experience. The NAR-web publication represents a gratifying intermediate result along an ongoing journey. Hopefully, others (maybe some of you) can join us in pushing forward the field of RNA structural bioinformatics.




Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu