It gives me great pleasure to announce that the 3DNA/DSSR project is now funded by the NIH R24GM153869 grant, titled "X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids". I am deeply grateful for the opportunity to continue working on a project that has basically defined who I am. It was a tough time during the funding gap over the past few years. Nevertheless, I have experienced and learned a lot, and witnessed miracles enabled by enthusiastic users.
Since late 2020, when I lost my R01 grant, DSSR has been licensed through Columbia Technology Ventures (CTV). I appreciate the numerous users (including big pharma) who purchased a DSSR Pro License or a DSSR Basic paid License. Thanks to the NIH R24GM153869 grant, we are pleased to provide DSSR Basic free of charge to the academic community. Academic users may submit a license request for DSSR Basic or DSSR Pro by clicking "Express Licensing" on the CTV landing page. Commercial users may inquire about pricing and licensing terms by emailing techtransfer@columbia.edu, copying xiangjun@x3dna.org.
The current version of DSSR is v2.4.5-2024sep24, which contains miscellaneous bug fixes (e.g., chain ids longer than 4 characters) and minor improvements. This release synchronizes with the new R24 funding, which will bring the project to the next level. All existing users are encouraged to upgrade their installation.
Lots of exciting things will happen for the project. The first is to make DSSR freely accessible to the academic community. In the past couple of weeks, CTV has already issued quite a few DSSR Basic Academic licenses to users from all over the world. So the demand is high, and it will become stronger as more academic users become aware of DSSR. I am closely monitoring the 3DNA Forum, and am always ready to answer users' questions.
I am committed to making DSSR a brand that stands for quality and value. By virtue of its unmatched functionality, usability, and support, DSSR saves users a substantial amount of time and effort when compared to other options. My track record throughout the years has unambiguously demonstrated my dedication to this solid software product.
DSSR Basic contains all features described in the three DSSR-related papers, and includes the originally separate SNAP program (still unpublished) for analyzing DNA/RNA-protein complexes. The Pro version integrates the classic 3DNA functionality, plus advanced modeling routines, with email/Zoom/phone support.
An article titled "Simulations and electrostatic analysis suggest an active role for DNA conformational changes during genome packaging by bacteriophages" has recently been posted on bioRxiv. I was honored to have the opportunity to collaborate with fellow researchers from the University of Pennsylvania and Thomas Jefferson University on this significant piece of work.
Here is the abstract. Please download the PDF version to learn more.
Motors that move DNA, or that move along DNA, play essential roles in DNA replication, transcription, recombination, and chromosome segregation. The mechanisms by which these DNA translocases operate remain largely unknown. Some double-stranded DNA (dsDNA) viruses use an ATP-dependent motor to drive DNA into preformed capsids. These include several human pathogens, as well as dsDNA bacteriophages (viruses that infect bacteria). We previously proposed that DNA is not a passive substrate of bacteriophage packaging motors but is, instead, an active component of the machinery. Computational studies on dsDNA in the channel of viral portal proteins reported here reveal DNA conformational changes consistent with that hypothesis. dsDNA becomes longer (“stretched”) in regions of high negative electrostatic potential, and shorter (“scrunched”) in regions of high positive potential. These results suggest a mechanism that couples the energy released by ATP hydrolysis to DNA translocation: The chemical cycle of ATP binding, hydrolysis and product release drives a cycle of protein conformational changes. This produces changes in the electrostatic potential in the channel through the portal, and these drive cyclic changes in the length of dsDNA. The DNA motions are captured by a coordinated protein-DNA grip-and-release cycle to produce DNA translocation. In short, the ATPase, portal and dsDNA work synergistically to promote genome packaging.
An abasic site is a location in DNA or RNA where a purine or pyrimidine base is missing. It is also termed an AP site (i.e., apurinic/apyrimidinic site) in biochemistry and molecular genetics. The abasic site can be formed either spontaneously (e.g., depurination) or due to DNA damage (occurring as intermediates in base excision repair). According to Wikipedia, “It has been estimated that under physiological conditions 10,000 apurinic sites and 500 apyrimidinic may be generated in a cell daily.”
In DSSR and 3DNA v2.x, nucleotides are recognized using standard atom names and base planarity. Thus, abasic sites are not taken as nucleotides (by default), simply because they do not have base atoms. DSSR introduced the --abasic option to account for abasic sites, a feature useful for detecting loops with backbone connectivity.
For example, by default, DSSR identifies one internal loop (no. 1 in the list below) in PDB entry 1l2c. With the --abasic option, two internal loops (including the one with the abasic site C.HPD18, no. 2) are detected.
List of 2 internal loops
1 symmetric internal loop: nts=6; [1,1]; linked by [#-1,#1]
summary: [2] 1 1 [B.1 C.24 B.3 C.22] 1 4
nts=6 GTATAC B.DG1,B.DT2,B.DA3,C.DT22,C.DA23,C.DC24
nts=1 T B.DT2
nts=1 A C.DA23
2 symmetric internal loop: nts=6; [1,1]; linked by [#1,#2]
summary: [2] 1 1 [B.6 C.19 B.8 C.17] 4 5
nts=6 CTTA?G B.DC6,B.DT7,B.DT8,C.DA17,C.HPD18,C.DG19
nts=1 T B.DT7
nts=1 ? C.HPD18
Note that C.HPD18 in 1l2c is a non-standard residue, as shown in the HETATM records below. Since the identity of C.HPD18 cannot be deduced from the atomic records, its one-letter code is designated as ?.
HETATM 346 P HPD C 18 -14.637 52.299 29.949 1.00 49.12 P
HETATM 347 O5' HPD C 18 -14.658 52.173 28.359 1.00 48.28 O
HETATM 348 O1P HPD C 18 -15.167 51.040 30.537 1.00 49.35 O
HETATM 349 O2P HPD C 18 -13.303 52.798 30.369 1.00 46.43 O
HETATM 350 C5' HPD C 18 -15.703 51.469 27.687 1.00 45.70 C
HETATM 351 O4' HPD C 18 -16.364 50.501 25.561 1.00 44.15 O
HETATM 352 O3' HPD C 18 -13.990 51.738 24.335 1.00 45.75 O
HETATM 353 C1' HPD C 18 -16.105 54.187 25.684 1.00 52.47 C
HETATM 354 O1' HPD C 18 -17.309 54.085 26.496 1.00 56.16 O
HETATM 355 C3' HPD C 18 -14.756 52.250 25.426 1.00 46.23 C
HETATM 356 C4' HPD C 18 -15.263 51.093 26.291 1.00 45.72 C
HETATM 357 C2' HPD C 18 -16.030 52.889 24.898 1.00 49.05 C
In contrast, the R.U-8 in PDB entry 4ifd is a standard U, and is properly labeled by DSSR.
ATOM 26418 P U R -8 139.362 21.962 129.430 1.00208.29 P
ATOM 26419 OP1 U R -8 140.062 20.821 130.074 1.00207.30 O
ATOM 26420 OP2 U R -8 140.113 23.208 129.129 1.00208.44 O1+
ATOM 26421 O5' U R -8 138.712 21.439 128.071 1.00157.60 O
ATOM 26422 C5' U R -8 139.507 20.790 127.087 1.00155.47 C
ATOM 26423 C4' U R -8 138.843 20.804 125.731 1.00152.27 C
ATOM 26424 O4' U R -8 138.538 22.172 125.352 1.00149.29 O
ATOM 26425 C3' U R -8 139.677 20.275 124.572 1.00152.70 C
ATOM 26426 O3' U R -8 139.670 18.859 124.478 1.00155.04 O
ATOM 26427 C2' U R -8 139.053 20.969 123.369 1.00150.26 C
ATOM 26428 O2' U R -8 137.849 20.322 122.984 1.00146.83 O
ATOM 26429 C1' U R -8 138.700 22.334 123.958 1.00147.35 C
This is yet another little detail that DSSR takes care of. It is the careful attention to many such subtle points that makes DSSR different. Overall, DSSR represents my view of what a scientific software program could be (or should be).
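The distinction above can be sketched in a few lines of Python. This is only an illustration, not DSSR's actual recognition algorithm: it assumes that a residue counts as abasic when it has a sugar C1' atom but none of the six base ring atoms, and that the one-letter code comes from the residue name when that name is in a small, assumed list of standard names. It also splits records on whitespace, which works for the records quoted above but not for fixed-column PDB files in general.

```python
# Illustrative sketch only; this is not DSSR's actual recognition algorithm.
BASE_RING_ATOMS = {"N1", "C2", "N3", "C4", "C5", "C6"}  # ring atoms common to all five bases
STANDARD_NAMES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}  # assumed "standard" residue names


def classify_residues(lines):
    """Flag residues that have a sugar C1' atom but no base ring atoms."""
    atoms_by_residue = {}
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Whitespace-split fields: record, serial, atom, residue, chain, number, ...
        fields = line.split()
        atom, resname, chain, resnum = fields[2], fields[3], fields[4], fields[5]
        atoms_by_residue.setdefault((chain, resname, resnum), set()).add(atom)

    report = {}
    for (chain, resname, resnum), atoms in atoms_by_residue.items():
        abasic = "C1'" in atoms and not (atoms & BASE_RING_ATOMS)
        code = resname[-1] if resname in STANDARD_NAMES else "?"
        report[f"{chain}.{resname}{resnum}"] = {"abasic": abasic, "code": code}
    return report


# Two records copied from the 1l2c and 4ifd listings above.
records = [
    "HETATM 353 C1' HPD C 18 -16.105 54.187 25.684 1.00 52.47 C",
    "ATOM 26429 C1' U R -8 138.700 22.334 123.958 1.00147.35 C",
]
for nt_id, info in classify_residues(records).items():
    print(nt_id, info)
```

Both residues come out as abasic, but only the standard U receives a one-letter code from its name; the HPD residue falls back to ?.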
DSSR deliberately makes a distinction between ‘stem’ and ‘helix’, as shown below:
a helix is defined by base-stacking interactions, regardless of bp type and backbone connectivity, and may contain more than one stem.
a stem is defined as a helix consisting of only canonical WC/wobble pairs, with a continuous backbone.
By definition, a helix or stem consists of at least two base-pairs with stacking interactions. Helix is more inclusive and may contain more than one stem. This differentiation between ‘helix’ and ‘stem’ naturally leads to the definition of coaxial stacking, another widely used yet vaguely specified concept.
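To make the two definitions concrete, here is a minimal Python sketch of how a helix could be split into stems. It is a toy model, not DSSR's implementation: the `Pair` class, its two boolean flags, and the `bp1`..`bp6` labels are hypothetical, and the input is assumed to be an ordered list of base pairs that already stack on one another.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    name: str             # hypothetical label for a base pair
    canonical: bool       # True for a Watson-Crick or G-U wobble pair
    linked_to_next: bool  # True if both strands continue unbroken to the next pair


def split_into_stems(helix):
    """Return maximal runs of >= 2 canonical pairs joined by a continuous backbone."""
    stems, run = [], []
    for pair in helix:
        if pair.canonical:
            run.append(pair)
            if pair.linked_to_next:
                continue  # the run may extend to the following pair
        # A non-canonical pair or a backbone break ends the current run.
        if len(run) >= 2:
            stems.append(run)
        run = []
    if len(run) >= 2:
        stems.append(run)
    return stems


# A made-up helix of six stacked pairs.
helix = [
    Pair("bp1", True, True),
    Pair("bp2", True, True),
    Pair("bp3", True, False),   # backbone break after bp3
    Pair("bp4", True, True),
    Pair("bp5", True, True),
    Pair("bp6", False, False),  # non-canonical terminal pair
]
stems = split_into_stems(helix)
coaxial = len(stems) > 1  # more than one stem within a single helix
print([[p.name for p in stem] for stem in stems], coaxial)
```

In this toy case the helix contains two stems (bp1 to bp3, and bp4 to bp5), so it would count as an instance of coaxial stacking, while the non-canonical bp6 belongs to the helix only.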
Again, the abstract notion can be best illustrated with a concrete example. In the classic yeast phenylalanine tRNA (PDB id: 1ehz), DSSR identifies that two stems [the acceptor stem (right) and the T stem (left)] are coaxially stacked within one double helix. See the figure below.
In the above schematic cartoon-block representation, each Watson-Crick base pair is rendered as a single, long rectangular block. Base identities of the G–U wobble and the two non-canonical pairs (left terminal) are illustrated separately, with a larger block size for purines (G and A), and a smaller size for pyrimidines (C, U, and T).
I picked ‘stem’ for the more specialized duplex because it is widely used in the RNA stem-loop structure, and in describing the four ‘paired regions’ of the classic tRNA cloverleaf secondary structure. On the other hand, ‘helix’ is (to me at least) a more general term, and thus more inclusive. It is worth noting that other terms such as ‘arm’, ‘paired region’, or ‘helix’ have also been used interchangeably in the literature to refer to what DSSR designates as ‘stem’.
As a side note, the basic algorithm for identifying helices/stems in DSSR is also applicable for detecting G-quadruplexes. The same idea of ‘helix’ or ‘stem’ also applies here (see figure below for PDB entry 5dww). Indeed, as of v1.7.0-2017oct19, DSSR contains a new section for the identification and characterization of G-quadruplexes.
DSSR is “an integrated software tool for dissecting the spatial structure of RNA”. It excels in consolidating the diverse pieces together via a coherent framework, readily accessible in a solid software product. DSSR may well serve as a cornerstone in RNA structural bioinformatics and would facilitate communications in the broad areas related to nucleic acids structures.
Among the rich set of RNA structural features derived by DSSR, the section of “List of stacks” apparently has not drawn much attention from the user community. As noted in the DSSR output,
a stack is an ordered list of nucleotides assembled together via base-stacking interactions, regardless of backbone connectivity. Stacking interactions within a stem are not included.
As always, the concept is best illustrated via concrete examples. Shown below are two such base stacks automatically identified by DSSR in PDB entry 4p5j, the crystal structure of the tRNA mimic from Turnip Yellow Mosaic Virus (TYMV), which was analyzed in detail in the 2015 DSSR NAR paper.
This critical linchpin in the tRNA mimic is stabilized by extensive base-stacking interactions.
The intricate interactions between the D- and T-loops in the tRNA mimic include a five-base stack.
The DSSR-introduced schematic block representation makes the base-stacking interactions immediately obvious. One can even easily discern the identity of bases, given the color-coding convention: A-red; C-yellow; G-green; T-blue; U-cyan. For example, the five stacked bases involved in the interaction of the D- and T-loops are: CAAAC.
Moreover, longer and more complicated base stacks can also be auto-detected by DSSR, as shown below for the asymmetric unit of PDB entry 1j8g, the crystal structure of an RNA quadruplex r(UGGGGU)4 at 0.61 Å resolution. Here DSSR identifies two 10-base stacks, each of UGGGGGGGGU (UG8U).
The corresponding DSSR output is as below:
List of 2 stacks
Note: a stack is an ordered list of nucleotides assembled together via
base-stacking interactions, regardless of backbone connectivity.
Stacking interactions within a stem are *not* included.
1 nts=10 UGGGGGGGGU A.U6,A.G5,A.G4,A.G3,A.G2,C.G22,C.G23,C.G24,C.G25,C.U26
2 nts=10 UGGGGGGGGU B.U16,B.G15,B.G14,B.G13,B.G12,D.G32,D.G33,D.G34,D.G35,D.U36
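Because each stack is reported on a single line, the listing is easy to post-process. The following Python sketch parses lines in the layout shown above. The regular expression and the `parse_stacks` helper are assumptions based only on this sample, not on a documented format specification.

```python
import re

# index, "nts=<count>", one-letter sequence, comma-separated nucleotide ids
STACK_LINE = re.compile(r"^\s*(\d+)\s+nts=(\d+)\s+(\S+)\s+(\S+)\s*$")


def parse_stacks(text):
    """Parse stack lines into dictionaries, checking the reported counts."""
    stacks = []
    for line in text.splitlines():
        match = STACK_LINE.match(line)
        if not match:
            continue  # skip the header and the explanatory note
        index, count, bases, ids = match.groups()
        nts = ids.split(",")
        if not (int(count) == len(nts) == len(bases)):
            raise ValueError(f"inconsistent counts in stack {index}")
        chains = sorted({nt.split(".")[0] for nt in nts})
        stacks.append({"index": int(index), "bases": bases, "nts": nts, "chains": chains})
    return stacks


listing = """List of 2 stacks
1 nts=10 UGGGGGGGGU A.U6,A.G5,A.G4,A.G3,A.G2,C.G22,C.G23,C.G24,C.G25,C.U26
2 nts=10 UGGGGGGGGU B.U16,B.G15,B.G14,B.G13,B.G12,D.G32,D.G33,D.G34,D.G35,D.U36
"""
for stack in parse_stacks(listing):
    print(stack["index"], stack["bases"], stack["chains"])
```

The chain summary makes it explicit that each 10-base stack spans two chains (A and C, B and D), which is what "regardless of backbone connectivity" means in practice.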
G-quadruplexes (hereafter referred to as G4) are a common type of higher-order DNA and RNA structures formed from G-rich sequences. The building block of G4 is a tetrad of guanines in a cyclic planar alignment, with four G+G pairs (cW+M type, see Figure below). A G4 structure is formed by stacking of G-tetrads and stabilized by cations at the center of the layers. G4 structures are polymorphic: the four strands can be parallel or anti-parallel, and loops connecting them can be of different types: lateral (edgewise), diagonal, or propeller (double-chain reversal). Moreover, G4 structures can be intra- or intermolecular, and even contain bulges.
From its initial releases, DSSR was able to detect G-tetrads, and listed them in a separate section. As of v1.7.0-2017oct19, DSSR has integrated existing features and created a new module to automatically identify and fully characterize G4 structures. The underlying algorithms have been further refined in v1.7.1-2017nov01, which was tested against all nucleic-acid-containing structures in the PDB.
Characterizations of three representative G4 examples (PDB entries 2m4p, 2hy9, and 5hix) are shown below, illustrating salient features (e.g., different types of loops) automatically extracted by DSSR.
2m4p
stem#1[#1] layers=3 INTRA-molecular parallel bulged-strands=1
1 syn=---- WC-->Major area=8.38 rise=3.64 twist=33.34 nts=4 GGGG A.DG3,A.DG8,A.DG12,A.DG16
2 syn=---- WC-->Major area=10.73 rise=3.23 twist=32.42 nts=4 GGGG A.DG5,A.DG9,A.DG13,A.DG17
3 syn=---- WC-->Major nts=4 GGGG A.DG6,A.DG10,A.DG14,A.DG18
strand#1* +1 DNA syn=--- nts=3 GGG A.DG3,A.DG5,A.DG6 bulged-nts=1 T A.DT4
strand#2 +1 DNA syn=--- nts=3 GGG A.DG8,A.DG9,A.DG10
strand#3 +1 DNA syn=--- nts=3 GGG A.DG12,A.DG13,A.DG14
strand#4 +1 DNA syn=--- nts=3 GGG A.DG16,A.DG17,A.DG18
loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT7
loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT11
loop#3 type=propeller strands=[#3,#4] nts=1 T A.DT15
2hy9
stem#1[#1] layers=3 INTRA-molecular anti-parallel
1 syn=ss-s Major-->WC area=13.69 rise=3.14 twist=19.08 nts=4 GGGG 1.DG4,1.DG10,1.DG18,1.DG22
2 syn=--s- WC-->Major area=13.40 rise=3.05 twist=28.05 nts=4 GGGG 1.DG5,1.DG11,1.DG17,1.DG23
3 syn=--s- WC-->Major nts=4 GGGG 1.DG6,1.DG12,1.DG16,1.DG24
strand#1 +1 DNA syn=s-- nts=3 GGG 1.DG4,1.DG5,1.DG6
strand#2 +1 DNA syn=s-- nts=3 GGG 1.DG10,1.DG11,1.DG12
strand#3 -1 DNA syn=-ss nts=3 GGG 1.DG18,1.DG17,1.DG16
strand#4 +1 DNA syn=s-- nts=3 GGG 1.DG22,1.DG23,1.DG24
loop#1 type=propeller strands=[#1,#2] nts=3 TTA 1.DT7,1.DT8,1.DA9
loop#2 type=lateral strands=[#2,#3] nts=3 TTA 1.DT13,1.DT14,1.DA15
loop#3 type=lateral strands=[#3,#4] nts=3 TTA 1.DT19,1.DT20,1.DA21
5hix
stem#1[#1] layers=4 inter-molecular anti-parallel
1 syn=s--s Major-->WC area=12.93 rise=3.64 twist=16.82 nts=4 GGGG A.DG1,B.DG4,A.DG12,B.DG9
2 syn=-ss- WC-->Major area=18.96 rise=3.71 twist=35.87 nts=4 GGGG A.DG2,B.DG3,A.DG11,B.DG10
3 syn=s--s Major-->WC area=15.16 rise=3.64 twist=18.64 nts=4 GGGG A.DG3,B.DG2,A.DG10,B.DG11
4 syn=-ss- WC-->Major nts=4 GGGG A.DG4,B.DG1,A.DG9,B.DG12
strand#1 +1 DNA syn=s-s- nts=4 GGGG A.DG1,A.DG2,A.DG3,A.DG4
strand#2 -1 DNA syn=-s-s nts=4 GGGG B.DG4,B.DG3,B.DG2,B.DG1
strand#3 -1 DNA syn=-s-s nts=4 GGGG A.DG12,A.DG11,A.DG10,A.DG9
strand#4 +1 DNA syn=s-s- nts=4 GGGG B.DG9,B.DG10,B.DG11,B.DG12
loop#1 type=diagonal strands=[#1,#3] nts=4 TTTT A.DT5,A.DT6,A.DT7,A.DT8
loop#2 type=diagonal strands=[#2,#4] nts=4 TTTT B.DT5,B.DT6,B.DT7,B.DT8
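The loop lines in these listings follow a regular key=value layout, so they can be tallied with a short script. The Python sketch below is an assumption-laden example: the regular expression is inferred from the output shown above, and `parse_loops` and `loop_type_counts` are hypothetical helper names, not part of DSSR.

```python
import re
from collections import Counter

LOOP_LINE = re.compile(
    r"loop#(?P<index>\d+)\s+type=(?P<type>\S+)\s+"
    r"strands=\[#(?P<s1>\d+),#(?P<s2>\d+)\]\s+"
    r"nts=(?P<count>\d+)\s+(?P<bases>\S+)\s+(?P<ids>\S+)"
)


def parse_loops(text):
    """Extract loop type, connected strands, and nucleotides from G4 output."""
    loops = []
    for match in LOOP_LINE.finditer(text):
        loops.append({
            "type": match["type"],
            "strands": (int(match["s1"]), int(match["s2"])),
            "bases": match["bases"],
            "nts": match["ids"].split(","),
        })
    return loops


def loop_type_counts(text):
    """Count how many loops of each type appear in the text."""
    return Counter(loop["type"] for loop in parse_loops(text))


# The three loop lines of 2hy9, copied from the listing above.
loops_2hy9 = """
loop#1 type=propeller strands=[#1,#2] nts=3 TTA 1.DT7,1.DT8,1.DA9
loop#2 type=lateral strands=[#2,#3] nts=3 TTA 1.DT13,1.DT14,1.DA15
loop#3 type=lateral strands=[#3,#4] nts=3 TTA 1.DT19,1.DT20,1.DA21
"""
print(loop_type_counts(loops_2hy9))
```

Applied to the three examples, such a tally separates the all-propeller 2m4p from 2hy9 (one propeller and two lateral loops) and 5hix (two diagonal loops).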
The molecular structure of the G-tetrad and two G4 structures in schematic representation. Upper left: atomic structure of the G-tetrad, the building block of G4 structures. Here the green ‘square’ is created by connecting the C1′ atoms of the guanosines, and it is used to simplify the representation of the G4 structures of PDB entries 2m4p (lower left) and 5dww (right). Note that the asymmetric unit of 5dww contains four biological units, which are coaxially stacked in two columns.
The DSSR output for PDB entry 5dww is listed below, showing the differences between a G4-helix and a G4-stem.
5dww
Note: a G4-helix is defined by stacking interactions of G4-tetrads, regardless
of backbone connectivity, and may contain more than one G4-stem.
helix#1[#2] layers=6 inter-molecular stems=[#1,#2]
1 syn=---- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
2 syn=.--- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
3 syn=---- WC-->Major area=28.36 rise=3.31 twist=-9.78 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
4 syn=---- Major-->WC area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG14,C.DG9,C.DG5
5 syn=---- Major-->WC area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG15,C.DG10,C.DG6
6 syn=---- Major-->WC nts=4 GGGG C.DG3,C.DG16,C.DG11,C.DG7
strand#1 DNA syn=-.---- nts=6 GGGGGG A.DG3,A.DG2,A.DG1,C.DG1,C.DG2,C.DG3
strand#2 DNA syn=------ nts=6 GGGGGG A.DG7,A.DG6,A.DG5,C.DG14,C.DG15,C.DG16
strand#3 DNA syn=------ nts=6 GGGGGG A.DG11,A.DG10,A.DG9,C.DG9,C.DG10,C.DG11
strand#4 DNA syn=------ nts=6 GGGGGG A.DG16,A.DG15,A.DG14,C.DG5,C.DG6,C.DG7
......
List of 4 G4-stems
Note: a G4-stem is defined as a G4-helix with backbone connectivity.
Bulges are also allowed along each of the four strands.
stem#1[#1] layers=3 INTRA-molecular parallel
1 syn=---- WC-->Major area=11.63 rise=3.65 twist=31.14 nts=4 GGGG A.DG1,A.DG5,A.DG9,A.DG14
2 syn=.--- WC-->Major area=10.64 rise=3.54 twist=28.10 nts=4 GGGG A.DG2,A.DG6,A.DG10,A.DG15
3 syn=---- WC-->Major nts=4 GGGG A.DG3,A.DG7,A.DG11,A.DG16
strand#1 +1 DNA syn=-.- nts=3 GGG A.DG1,A.DG2,A.DG3
strand#2 +1 DNA syn=--- nts=3 GGG A.DG5,A.DG6,A.DG7
strand#3 +1 DNA syn=--- nts=3 GGG A.DG9,A.DG10,A.DG11
strand#4 +1 DNA syn=--- nts=3 GGG A.DG14,A.DG15,A.DG16
loop#1 type=propeller strands=[#1,#2] nts=1 T A.DT4
loop#2 type=propeller strands=[#2,#3] nts=1 T A.DT8
loop#3 type=propeller strands=[#3,#4] nts=2 TT A.DT12,A.DT13
--------------------------------------------------------------------------
stem#2[#1] layers=3 INTRA-molecular parallel
1 syn=---- WC-->Major area=11.60 rise=3.75 twist=29.43 nts=4 GGGG C.DG1,C.DG5,C.DG9,C.DG14
2 syn=---- WC-->Major area=10.35 rise=3.49 twist=28.74 nts=4 GGGG C.DG2,C.DG6,C.DG10,C.DG15
3 syn=---- WC-->Major nts=4 GGGG C.DG3,C.DG7,C.DG11,C.DG16
strand#1 +1 DNA syn=--- nts=3 GGG C.DG1,C.DG2,C.DG3
strand#2 +1 DNA syn=--- nts=3 GGG C.DG5,C.DG6,C.DG7
strand#3 +1 DNA syn=--- nts=3 GGG C.DG9,C.DG10,C.DG11
strand#4 +1 DNA syn=--- nts=3 GGG C.DG14,C.DG15,C.DG16
loop#1 type=propeller strands=[#1,#2] nts=1 T C.DT4
loop#2 type=propeller strands=[#2,#3] nts=1 T C.DT8
loop#3 type=propeller strands=[#3,#4] nts=2 TT C.DT12,C.DT13
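The relationship between a G4-helix and its G4-stems can be checked directly from the header lines. The Python sketch below assumes that the numbers in `stems=[#1,#2]` refer to the entries in the list of G4-stems, which is how the 5dww output reads. The regular expressions and the `layers_by_helix` helper are illustrative, not part of DSSR.

```python
import re

HELIX_HEADER = re.compile(r"helix#(\d+)\[#\d+\]\s+layers=(\d+)\s+\S+\s+stems=\[([^\]]*)\]")
STEM_HEADER = re.compile(r"stem#(\d+)\[#\d+\]\s+layers=(\d+)")


def layers_by_helix(text):
    """Map each G4-helix to its layer count and the layer counts of its stems."""
    stem_layers = {int(index): int(layers) for index, layers in STEM_HEADER.findall(text)}
    result = {}
    for index, layers, stems in HELIX_HEADER.findall(text):
        members = [int(s) for s in re.findall(r"#(\d+)", stems)]
        result[int(index)] = {
            "helix_layers": int(layers),
            "stem_layers": [stem_layers[s] for s in members],
        }
    return result


# Header lines copied from the 5dww listing above.
headers_5dww = """
helix#1[#2] layers=6 inter-molecular stems=[#1,#2]
stem#1[#1] layers=3 INTRA-molecular parallel
stem#2[#1] layers=3 INTRA-molecular parallel
"""
print(layers_by_helix(headers_5dww))
```

For the portion of the output shown here, the six layers of helix#1 are accounted for by the two three-layer stems stacked on each other, one from chain A and one from chain C.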
In addition to base pairs, DSSR also automatically detects higher-order base associations. They are generally termed multiplets, consisting of three or more co-planar bases arranged together via H-bonding interactions. The simplest multiplets are base triplets. For example, the yeast phenylalanine tRNA (PDB entry 1ehz) contains four base triplets, as shown below:
The well-known (types I and II) A-minor motifs are also multiplets of three bases. Similarly, the G-tetrad where four guanine bases associate via Hoogsteen H-bonding to form a square planar structure is also a special multiplet. The G-tetrad is the building block of the G-quadruplexes. As of v1.7.0-2017oct19, DSSR can automatically identify and characterize G-quadruplexes (see the DSSR User Manual).
The DSSR algorithm for detecting multiplets is generally applicable. It can identify as many co-planar bases as available in a given structure. Shown below is an octad, consisting of a G-tetrad in the middle and four Us on the periphery. The octad is derived from PDB entry 1j8g using atomic coordinates from biological assemblies 1 and 3.
The DSSR-Jmol paper, titled "DSSR-enhanced visualization of nucleic acid structures in Jmol", has been officially published in the 2017 web-server issue of Nucleic Acids Research (NAR). Notably, the work has been featured in the cover image, as shown below:
Caption: 3D interactive visualization of selected RNA structural features enabled by the DSSR-Jmol integration (http://jmol.x3dna.org). Clockwise from upper left: Structure of the xpt-pbuX guanine riboswitch in complex with hypoxanthine (PDB id: 4fe5) in ‘base blocks’ representation. The three-way junction loop encompassing the metabolite (in space-filling representation) is color-coded by base identity: A, red; C, yellow; G, green; U, cyan. The loop-loop interaction (a kissing-loop motif) at the top is highlighted in red (upper left corner). Structure of the Thermus thermophilus 30S ribosomal subunit in complex with antibiotics (PDB id: 1fjg) in step diagram. The 16S ribosomal RNA is color-coded in spectrum with the 5′-end in blue and the 3′-end in red (upper middle). Structure of the classic L-shaped yeast phenylalanine tRNA (PDB id: 1ehz) in step diagram, with the three hairpin loops highlighted in red and the [2,1,5,0] four-way junction loop in blue (upper right corner). Structure of the Pistol self-cleaving ribozyme (PDB id: 5ktj), showcasing (in red) the horizontal helix in space-filling representation. The helix is composed of six short stems stabilized via coaxial stacking interactions (bottom).
The DSSR-Jmol integration bridges the DSSR command-line analyzing tool and the Jmol molecular viewer seamlessly together via the standard JSON interface. Now users can select DSSR-derived RNA structural features (such as base pairs, double helices, various loops, etc.) and visualize them in novel representations in Jmol interactively. Moreover, fine-grained characteristics of these features can be queried via the Jmol SQL for DSSR. The DSSR-Jmol integration fills a gap in RNA structural bioinformatics, and brings RNA visualization to an entirely new level. The web interface (http://jmol.x3dna.org) is fully functional and easy to use, serving a huge user base of researchers, educators, and students alike.
With the work featured as the cover image of the 2017 NAR web-server issue, DSSR's visibility should increase through the DSSR-Jmol integration. Additionally, I've written a new post (on the 3DNA Forum) that provides the scripts and data files used to create the cover image.
I am pleased to announce the (advance online, May 3, 2017) publication of a new paper titled "DSSR-enhanced visualization of nucleic acid structures in Jmol" in Nucleic Acids Research (NAR). Co-authored by Robert Hanson (Jmol) and me (DSSR), the article will appear in the July 2017 web-server issue of NAR. Here are the key links related to the paper:
The DSSR-Jmol integration project was initiated in October 2013 when I approached Bob at a meeting organized by RCSB PDB at Rutgers. Thereafter, we met only once in July 2014 in Paris. Over the years, we have mostly communicated via email, occasionally facilitated by Skype. Our work bridges the DSSR command-line analyzing tool and the Jmol molecular viewer together via a simple JSON interface and a powerful query language. Users can now select DSSR-derived RNA structural features (such as base pairs, double helices, and various loops) as easily as they can select protein alpha-helices and beta-strands. Moreover, fine-grained characteristics of these features can be queried via Jmol SQL for DSSR (see examples below). Notably, the novel representation styles (step diagram and base blocks) and coloring schemes bring RNA visualization to an entirely new level (see Figure 3 of the paper).
load =1ehz/dssr # load yeast phenylalanine tRNA to Jmol with DSSR annotation
select hairpins # select the three hairpin loops
select junctions # select the four-way junction loop
select within(dssr, "nts WHERE is_modified") # select modified nucleotides (14 total)
select within(dssr, "pairs WHERE name != 'WC'") # select non-Watson-Crick pairs
select within(dssr, "pairs WHERE name = 'WC' OR name = 'Wobble'") # select canonical pairs
select within(dssr, "pairs WHERE name != 'WC' AND name != 'Wobble'") # select non-canonical pairs
select within(dssr, "pairs WHERE LW = 'tSW'") # select pairs of type tSW per Leontis-Westhof
The DSSR-Jmol integration fills a gap in RNA structural bioinformatics, serving a huge user base of researchers, educators, and students alike. Its functionality is freely accessible either via the Jmol application, or the JSmol-based website (http://jmol.x3dna.org). By adhering to web standards, the website is fully functional in all modern browsers on various computer/operating systems (including handheld devices, such as tablets and smart phones). The web interface is simple and intuitive, and new users can get started easily. It also allows power users to take full advantage of Jmol scripting via a command-line console.
This work also provides an example for integrating DSSR-derived features into other molecular graphics programs or bioinformatics pipelines involving nucleic acid structures. By design, DSSR is a stand-alone, command-line program written in ANSI C. The binary executables are only ~1MB in size, and self-contained. With zero dependencies and no setup or configuration, it is trivial to get DSSR up and running. DSSR uncovers a wide range of RNA/DNA structural features in a consistent, easily accessible framework. It possesses a much richer set of functionalities for nucleic acid structural analysis (see the DSSR User Manual) than any other existing tool I am aware of. Moreover, the program is efficient and robust, making it an ideal component to be integrated into other pipelines, especially via the standard and structured JSON interface.
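As a rough idea of what such an integration might look like outside Jmol, the Python sketch below applies the same kinds of filters as the Jmol SQL queries above to a JSON document. The sample data are hand-written and hypothetical. The `pairs`, `nts`, `name`, `LW`, and `is_modified` names are taken from the queries above, whereas `nt_id`, `nt1`, `nt2`, the placeholder identifiers, and the `where` helper are assumptions made for illustration.

```python
import json

# Hypothetical, hand-written sample; real DSSR JSON output has many more fields.
SAMPLE = json.loads("""
{
  "nts": [
    {"nt_id": "X.N1", "is_modified": false},
    {"nt_id": "X.N2", "is_modified": true}
  ],
  "pairs": [
    {"nt1": "X.N1", "nt2": "X.N9", "name": "WC", "LW": "cWW"},
    {"nt1": "X.N2", "nt2": "X.N8", "name": "Wobble", "LW": "cWW"},
    {"nt1": "X.N3", "nt2": "X.N7", "name": "--", "LW": "tSW"}
  ]
}
""")


def where(records, predicate):
    """Return the records for which the predicate holds (a tiny WHERE clause)."""
    return [record for record in records if predicate(record)]


modified = where(SAMPLE["nts"], lambda nt: nt["is_modified"])
canonical = where(SAMPLE["pairs"], lambda bp: bp["name"] in ("WC", "Wobble"))
non_canonical = where(SAMPLE["pairs"], lambda bp: bp["name"] not in ("WC", "Wobble"))
tsw = where(SAMPLE["pairs"], lambda bp: bp["LW"] == "tSW")

print(len(modified), len(canonical), len(non_canonical), len(tsw))
```

In a real pipeline, `SAMPLE` would be replaced by the JSON that DSSR writes for a structure; the filters themselves would stay the same.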
Collaborating with Bob has been a truly exciting experience. The NAR-web publication represents a gratifying intermediate result along an ongoing journey. Hopefully, others (maybe some of you) can join us in pushing forward the field of RNA structural bioinformatics.
Recently, while analyzing a representative set of RNA structures from the PDB, I came across three weird entries. They are documented below, primarily for my own record.
- 5els: “Structure of the KH domain of T-STAR in complex with AAAUAA RNA”. There are two alternative conformations for the six-nt AAAUAA RNA component, labeled A and B, respectively. Normally, the A/B alternative coordinates for each atom are put directly next to each other, and assigned the same chain id, as in 1msy for the phosphate group of G2669 on chain A. In 5els, however, the two alternative conformations (A/B) are separated into two chains: chain H for A, and chain I for B.
- 1vql: “The structure of the transition state analogue ‘DCSN’ bound to the large ribosomal subunit of Haloarcula marismortui”. The three-nt fragment DA179-C180-C181 on chain 4 is in the 3'→5' direction.
- 4r3i: “The crystal structure of m(6)A RNA with the YTHDC1 YTH domain”. The mmCIF file has a model number of 0, instead of 1 (as in other cases I am aware of).