3DNA is a versatile, integrated software system for the analysis, rebuilding, and visualization of three-dimensional nucleic-acid-containing structures. The software is applicable not only to DNA (as the name 3DNA may imply) but also to complicated RNA structures and DNA-protein complexes. In 3DNA, structural analysis and model rebuilding are two sides of the same coin: the description of the structure is rigorous and reversible, thus allowing for its exact reconstruction based on the derived parameters. 3DNA automatically detects all non-canonical base pairs, base triplets and higher-order associations (collectively termed multiplets), and coaxially stacked helices; provides a comprehensive collection of fiber models of regular DNA and RNA helices; generates highly effective schematic presentations that reveal key features of nucleic-acid structures; performs undisturbed base mutations; and has facilities for the analysis of molecular dynamics simulation trajectories.

DSSR is an integrated software tool for dissecting the spatial structure of RNA. It is representative of what would become the brand-new version 3 of 3DNA. DSSR consolidates, refines, and significantly extends the functionality of 3DNA v2.x for RNA structural analysis. Among other features, DSSR denotes base pairs by common names (e.g., WC, reverse WC, Hoogsteen A+U, reverse Hoogsteen A–U, wobble G–U, sheared G–A), by the Saenger classification of 28 H-bonding types, and by the Leontis-Westhof nomenclature of 12 basic geometric classes; determines double-helical regions, differentiates stems from helices, and provides a pragmatic definition of coaxial stacking interactions; identifies hairpin loops, bulges, internal loops, and multi-branch (junction) loops; characterizes pseudoknots of arbitrary complexity; outputs RNA secondary structure in commonly used formats (including the dot-bracket notation and connectivity table); and identifies A-minor interactions, splayed-apart dinucleotide conformations, base-capping interactions, ribose zippers, G-quadruplexes, i-motifs, kissing loops, U-turns, k-turns, and more. By connecting the dots in RNA structural bioinformatics, it makes many common tasks simple and advanced applications feasible. DSSR comes with a professional User Manual, and some of its features have been integrated into Jmol and PyMOL. Moreover, the DSSR-Jmol paper, titled DSSR-enhanced visualization of nucleic acid structures in Jmol, was featured on the cover of the 2017 Web Server issue of Nucleic Acids Research (NAR).

3DNA version 3 is under active development. The SNAP program has been created from scratch for an integrated characterization of the three-dimensional Structures of Nucleic Acid-Protein complexes. Sharing the same new codebase as DSSR, SNAP works for DNA-protein as well as RNA-protein interactions. Other 3DNA v2.x programs (e.g., fiber and rebuild) are gradually being distilled into version 3, and a new atomic-coordinate-based homology search tool is also being developed. In the end, 3DNA version 3 will consist of a suite of fully independent (like DSSR and SNAP) yet closely related programs, serving as cornerstones of DNA/RNA structural bioinformatics.

All 3DNA-related questions are welcome and should be directed to the 3DNA Forum. For the benefit of the community at large, I do not provide private support of 3DNA via email or personal message. As a general rule, I strive to provide a prompt and concrete response to each and every question posted on the Forum.



SNAP for the analysis of TF-DNA complexes containing 5-methyl-cytosines

The Kribelbauer et al. article, Towards a mechanistic understanding of DNA methylation readout by transcription factors, has recently been published in the Journal of Molecular Biology (JMB). I am honored to be among the authors, and I learned a lot during the process. For the project, I added the --methyl-C (short form: --5mc) option to SNAP (v1.0.6-2019sep30) for the automatic identification and annotation of DNA-transcription factor (TF) complexes containing 5-methyl-cytosine (5mC). The results are presented in a dynamic table, easily accessible at http://snap-5mc.x3dna.org, and summarized in Fig. 1, “Structural basis of how TFs recognize methylated DNA” (see below), of the JMB paper.

Fig. 1. Structural basis of how TFs recognize methylated DNA

Details on the SNAP-enabled curation of TF-DNA complexes containing 5mC from atomic coordinates in the Protein Data Bank (PDB) are available in a tutorial page at http://snap-5mc.x3dna.org/tutorial. In essence, the process can be easily understood via a concrete example with PDB id 4m9e, as shown below.

x3dna-snap --methyl-C --type=base -i=4m9e.pdb -o=4m9e-5mC.out

Here the --methyl-C option is specific for 5mC-DNA, and --type=base ensures that at least one DNA base atom is contacting protein amino acid(s). If these conditions are fulfilled, SNAP would produce two additional 5mC-related files, apart from the normal output file (i.e., 4m9e-5mC.out, as specified in the example):

  • 4m9e-5mC.txt — a simple text file with the following contents:
4m9e:B.5CM5: stacking-with-A.ARG443 is-WC-paired is-in-duplex [+]:GcG/cGC
4m9e:C.5CM5: other-contacts is-WC-paired is-in-duplex [-]:cGT/AcG
  • 4m9e-5mC.pdb — a corresponding PDB file, which may contain multiple models (two in this case). Moreover, the cluster of interacting residues (DNA nucleotides and protein amino acids) is oriented in the standard base reference frame of 5mC, allowing for easy comparison and direct overlap of multiple clusters.
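For downstream scripting, each line of the .txt file can be split into fields. The sketch below assumes a layout inferred only from the two sample lines above: PDB id, nucleotide id, whitespace-separated annotations, a bracketed sign, and a sequence context. The field names (`nt_id`, `sign`, `context`) are hypothetical labels, not SNAP terminology, and the pattern may need adjusting for other entries.

```python
import re
from dataclasses import dataclass
from typing import List

# Layout inferred from the two sample lines (an assumption, not a SNAP spec):
#   <pdb_id>:<nt_id>: <annotation> <annotation> ... [<sign>]:<context>
LINE_RE = re.compile(
    r"^(?P<pdb>\w{4}):(?P<nt>\S+?): (?P<notes>.+?) \[(?P<sign>[+-])\]:(?P<context>\S+)$"
)


@dataclass
class MethylCRecord:
    pdb_id: str
    nt_id: str              # e.g. "B.5CM5"
    annotations: List[str]  # e.g. ["stacking-with-A.ARG443", "is-WC-paired", ...]
    sign: str               # "+" or "-"
    context: str            # e.g. "GcG/cGC"; lowercase c appears to mark 5mC


def parse_5mc_line(line: str) -> MethylCRecord:
    """Split one line of a SNAP --methyl-C .txt file into fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return MethylCRecord(
        pdb_id=m["pdb"],
        nt_id=m["nt"],
        annotations=m["notes"].split(),
        sign=m["sign"],
        context=m["context"],
    )


rec = parse_5mc_line(
    "4m9e:B.5CM5: stacking-with-A.ARG443 is-WC-paired is-in-duplex [+]:GcG/cGC"
)
print(rec.nt_id, rec.sign, rec.context)  # B.5CM5 + GcG/cGC
```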

In practice, SNAP needs to take care of many details for the automatic identification and annotation of 5mC-DNA-TF complexes directly from PDB entries. For example, 5mC in DNA is designated 5CM, and its 5-methyl carbon atom is named C5A in the PDB (see the blogpost 5CM and 5MC, two forms of 5-methylcytosine in the PDB). Moreover, the --type=base option is employed to ensure that base atoms of 5mC (regardless of any sugar-phosphate contacts) are directly involved in interactions with amino acids.
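To illustrate the PDB naming detail, the sketch below scans fixed-column ATOM/HETATM records for the C5A atom of 5CM residues. It relies on the standard PDB column positions (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26). It is a hypothetical helper, not how SNAP handles modified nucleotides internally, and it does not cover mmCIF input.

```python
from typing import List, Tuple


def find_5cm_methyl_atoms(pdb_text: str) -> List[Tuple[str, str]]:
    """Return (chain, residue number) for each 5CM residue that has a C5A atom.

    Uses fixed PDB columns (1-based): atom name 13-16, residue name 18-20,
    chain ID 22, residue sequence number 23-26.
    """
    hits = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        line = line.ljust(80)  # guard against truncated records
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        if res_name == "5CM" and atom_name == "C5A":
            chain_id = line[21]
            res_seq = line[22:26].strip()
            hits.append((chain_id, res_seq))
    return hits
```

Checking residue and atom names together matters because an unmodified DC also has a C5 atom; only the 5CM residue carries the extra C5A methyl carbon.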

It is also worth noting the combined use of DSSR for generating molecular images (rendered with PyMOL), as shown below. Here the DSSR options --block-file=fill-hbond (fill to fill base rings, and hbond to draw hydrogen bonds) and --cartoon-block=sticks-label are used. The 3DNA DSSR/SNAP combo is a unique and powerful toolset for structural bioinformatics, as demonstrated in DNAproDB from the Rohs lab (see my blogpost SNAP and DSSR in DNAproDB). The JMB paper represents yet another example. I expect to see more combined DSSR/SNAP applications in the future.

DSSR-PyMOL image for PDB id: 4m9e



3DNA-DSSR is linked in the G4-society website

A couple of months ago, I came across the homepage of the newly established G4 Society on G-quadruplexes (G4s). I checked the “Online tools” section and found a few links to G4 databases and sequence-based prediction programs (e.g., G4Hunter). No tools, however, were listed for G4 identification and characterization from 3D atomic coordinates, such as those deposited in the Protein Data Bank (PDB). So I filled out the contact form and provided a brief description of 3DNA-DSSR, including a link to the website of G4s auto-curated with DSSR from the PDB.

I’ve recently visited the G4-society website again. I am pleased to see that 3DNA-DSSR is now listed under Online tools as a “program for detections/annotations of G4 from atomic coordinates in PDB or PDBx/mmCIF format”. The G4 module of 3DNA-DSSR has been created to streamline the identification and annotation of 3D structures of G4s. The collection of G4s in the PDB, available at G4.x3dna.org, is updated weekly. It represents a unique resource for the G4 community. Hopefully, its value will be more widely appreciated thanks to the link from the G4-society website.

At the G4-society homepage, I noticed the following two items in the “News” section (on December 13, 2019):

The Quadruplex Meeting Report

Meeting report: Seventh International Meeting on Quadruplex Nucleic Acids (Changchun, P.R. China, September 6-9, 2019) written by Jean-Louis Mergny. Reading through the report, I noticed the following:

Jonathan B. Chaires (U. Louisville, KY, USA) provided an overview and historical perspective of the quadruplex field in his inaugural lecture. As of August 2019, the quadruplex field gathers 8467 articles and 253,174 citations in the Science Citation Index. Over 200 G4 structures are available in the PDB.

I did not know how the survey of G4s in the PDB was performed. Based on my data, the number of G4 structures in the PDB was already over 300 as of August 2019. As of December 11, 2019, the number is 329. Importantly, the PDB-G4 website compiled using 3DNA-DSSR contains not only citation information but also detailed annotations and schematic images not available elsewhere. Here are a few recent examples:

  • PDB id: 6ge1 — “Unraveling the structural basis for the exceptional stability of RNA G-quadruplexes capped by a uridine tetrad at the 3’ terminus.” by Andralojc et al. in RNA (2019).
  • PDB id: 6gh0 — “Two-quartet kit* G-quadruplex is formed via double-stranded pre-folded structure.” by Kotar et al. in Nucleic Acids Res. (2019).
  • PDB id: 6e8u — “Structure and functional reselection of the Mango-III fluorogenic RNA aptamer.” by Trachman et al. in Nat. Chem. Biol. (2019).
  • PDB id: 6ac7 — “Structure of a (3+1) hybrid G-quadruplex in the PARP1 promoter.” by Sengar et al. in Nucleic Acids Res. (2019).

The Important Paper

A guide to computational methods for G-quadruplex prediction by Emilia Puig Lombardi and Arturo Londoño-Vallejo in Nucleic Acids Res. (2019), which presents an updated overview of G4 prediction algorithms. I am impressed by the large number of sequence-based G4 prediction software tools, including the most recent G4-iM Grinder. Nevertheless, as noted by the authors in the concluding remarks, “All computational G-quadruplex prediction approaches have their drawbacks and limitations despite the recent advances in the field and the introduction of validation steps based on experimental data.”

The G4 module in 3DNA-DSSR belongs to a completely different category of software tool. It does not ‘predict’ G4 propensity/stability from a base sequence, but identifies and annotates G4s in a 3D atomic coordinate file. It complements sequence-based prediction tools by providing insights into 3D G4 structures, which may help refine folding rules and improve the performance of prediction tools. To my knowledge, 3D G4 structures contain features that are not captured by any of the sequence-based prediction tools.

While reading the review article, I found Fig. 1 informative (see below). The right side of Fig. 1A shows a “cartoon representation of the Oxytricha telomeric DNA G4 crystal structure (PDB accession 1JPQ (112))” using PyMOL. For comparison, the cartoon-block image auto-generated via 3DNA-DSSR and PyMOL for PDB id 1jpq is shown at the bottom. The DSSR-PyMOL version is obviously different from the one in Fig. 1A, and arguably simpler and more informative.

Figure 1. From guanines to G-quadruplexes

3DNA-DSSR cartoon-block schematic for PDB entry 1jpq, rendered with PyMOL



3DNA/blocview-PyMOL images on the covers of the RNA journal

I recently performed a quick survey of the cover images of the RNA journal in 2019. I was pleased to find that 9 of the 12 cover images were provided by the Nucleic Acid Database (NDB), which employed 3DNA/blocview and PyMOL, as noted below:

The RNA backbone is displayed as a red ribbon; bases are shown as blocks with NDB coloring: A—red, C—yellow, G—green, U—cyan; geneticin ligands are shown in spacefill with element colors: C—white, N—blue, O—red. The image was generated using 3DNA/blocview and PyMol software.

Details of the 9 cover images are listed below:

  1. January 2019 Rhodobacter sphaeroides Argonaute with guide RNA/target DNA duplex containing noncanonical A-G pair (PDB code: 6d9k)
  2. April 2019 Group I self-splicing intron P4-P6 domain mutant U131A (PDB code: 6d8l)
  3. May 2019 Crystal structure of T. thermophilus 50S ribosomal protein L1 in complex with helices H76, H77, and H78 of 23S RNA (PDB code: 5npm)
  4. June 2019 Crystal structure of ykoY-mntP riboswitch chimera bound to cadmium (PDB code: 6cc3)
  5. July 2019 G96A mutant of the PRPP riboswitch from T. mathranii bound to ppGpp (PDB code: 6ck4)
  6. August 2019 Crystal structure of the metY SAM V riboswitch (PDB code: 6fz0)
  7. October 2019 Crystal structure of protease factor Xa bound to RNA aptamer 11F7t and rivaroxaban (PDB code: 5vof)
  8. November 2019 Drosophila melanogaster nucleosome remodeling complex (PDB code: 6f4g)
  9. December 2019 Crystal structure of the Homo sapiens cytoplasmic ribosomal decoding site in complex with Geneticin (PDB code: 5xz1)

Here is the composite figure of the 9 cover images.

3DNA/blocview-PyMOL cartoon-block schematics on the covers of the RNA journal in 2019




Web API to 3DNA

I’ve created a web API to 3DNA programs, which currently includes x3dna-dssr, x3dna-snap, and x3dna-fiber. The overall help message is available via http://api.x3dna.org. Individually, each program is accessed as below.

Help message on x3dna-dssr (DSSR): http://api.x3dna.org/dssr/help

Usage with 'http' (HTTPie):
    http -f http://api.x3dna.org/dssr [options] url=|model@
    http http://api.x3dna.org/dssr/help   -- display this help message

    json=true-or-FALSE(default)    [e.g., json=true # JSON output]
    pair=true-or-FALSE(default)    [e.g., pair=1    # base-pair only]
    hbond=true-or-FALSE(default)   [e.g., hbond=t   # H-bonding info]
    more=true-or-FALSE(default)    [e.g., more=y    # further details]

Required parameter:
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1ehz.pdb.gz]
    model@coordinate-file      [e.g., model@1ehz.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.

    http -f http://api.x3dna.org/dssr url=https://files.rcsb.org/download/1ehz.pdb.gz
    http -f http://api.x3dna.org/dssr model@1ehz.cif pair=1
    # with 'curl'
    curl http://api.x3dna.org/dssr -F 'url=https://files.rcsb.org/download/1ehz.pdb.gz'
    curl http://api.x3dna.org/dssr -F 'model=@1msy.pdb' -F 'pair=1'

    The web API has an upper limit on coordinate file size (gzipped): < 6 MB
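Besides HTTPie and curl, the DSSR endpoint can be called from a script. The Python sketch below only builds the request with the standard library. It assumes that the server accepts a form-encoded POST with a `url` field, which is what `http -f ... url=...` sends, and that leaving an option out keeps its FALSE default. The function name is hypothetical, and the request has not been tested against the live server.

```python
from urllib.parse import urlencode
from urllib.request import Request

DSSR_API = "http://api.x3dna.org/dssr"
DSSR_OPTIONS = {"json", "pair", "hbond", "more"}  # boolean options from the help text


def build_dssr_request(coord_url: str, **options: bool) -> Request:
    """Build, without sending, a form-encoded POST like `http -f ... url=...`.

    Options left out (or set to False) are omitted, relying on their
    documented FALSE default.
    """
    fields = {"url": coord_url}
    for name, enabled in options.items():
        if name not in DSSR_OPTIONS:
            raise ValueError(f"unknown DSSR API option: {name}")
        if enabled:
            fields[name] = "true"
    return Request(
        DSSR_API,
        data=urlencode(fields).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_dssr_request("https://files.rcsb.org/download/1ehz.pdb.gz", json=True)
# To actually send it: urllib.request.urlopen(req).read().decode()
```

Uploading a local file (the `model@` form) would need a multipart body, which this sketch does not build.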

Help message on x3dna-snap (SNAP): http://api.x3dna.org/snap/help

Usage with 'http' (HTTPie):
    http -f http://api.x3dna.org/snap [options] url=|model@
    http http://api.x3dna.org/snap/help   -- display this help message

    json=true-or-FALSE(default)    [e.g., json=true # JSON output]
    hbond=true-or-FALSE(default)   [e.g., hbond=t   # H-bonding info]

Required parameter:
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1oct.pdb.gz]
    model@coordinate-file      [e.g., model@1oct.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.

    http -f http://api.x3dna.org/snap url=https://files.rcsb.org/download/1oct.pdb.gz
    http -f http://api.x3dna.org/snap model@1oct.cif json=1
    # with 'curl'
    curl http://api.x3dna.org/snap -F 'url=https://files.rcsb.org/download/1oct.pdb.gz'
    curl http://api.x3dna.org/snap -F 'model=@1oct.cif' -F 'json=1'

    The web API has an upper limit on coordinate file size (gzipped): < 6 MB

Help message on x3dna-fiber (56 fiber models): http://api.x3dna.org/fiber/help

Usage with 'http' (HTTPie):
    http http://api.x3dna.org/fiber/help    # display this help message
    http http://api.x3dna.org/fiber/list    # show a list of available fiber models (56 in total)
    http http://api.x3dna.org/fiber/str_id  # build model 'str_id' in the range of [1, 56]
    http http://api.x3dna.org/fiber/name    # generate a model with common names as shown below:
              A-DNA, B-dna, C_DNA, D-DNA, ZDNA, RNA, RNAduplex, PaulingTriplex, G4
              Case does not matter, and the separator can be '-' or '_' or omitted.
              So a-dna, A-dNA, a_DNA, or ADNA is valid for building an A-DNA model.

Options (via query strings, or form fields):
    seq=base-sequence # A, C, G, T, U for generic model
    repeat=number     # number of repeats of the sequence
    cif=1             # output file in mmCIF format

Examples with 'http' (HTTPie):
    http http://api.x3dna.org/fiber/1       # model no. 1 (i.e., calf thymus A-DNA model)
    http -f http://api.x3dna.org/fiber/1 seq=A3TTT repeat=2  # specific sequence, repeated twice
    http http://api.x3dna.org/fiber/rna     # single-stranded RNA model
    http http://api.x3dna.org/fiber/rna-ds  # double-stranded RNA model
    http http://api.x3dna.org/fiber/pauling # the triplex model of Pauling & Corey
    http http://api.x3dna.org/fiber/g4      # G-quadruplex model
    # with 'curl'
    curl http://api.x3dna.org/fiber/1
    curl http://api.x3dna.org/fiber/1 -d 'seq=A3TTT' -d 'repeat=2'
    curl http://api.x3dna.org/fiber/rna
    curl http://api.x3dna.org/fiber/rna-ds
    curl http://api.x3dna.org/fiber/pauling
    curl http://api.x3dna.org/fiber/g4

    The web API has two upper limits: repeats < 1,000, and nucleotides < 10,000.
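The naming rule above (case does not matter; the separator may be '-', '_' or omitted) and the query-string options fit into a small URL builder. This Python sketch assumes that the normalized name (e.g., `adna`) is accepted in the path, as the help text suggests. It checks only the documented repeat limit and leaves the nucleotide limit to the server, because shorthand such as `A3TTT` makes a client-side count uncertain. All names in the code are hypothetical.

```python
import re
from typing import Optional, Union
from urllib.parse import urlencode

FIBER_API = "http://api.x3dna.org/fiber"
N_MODELS = 56       # fiber models listed by /fiber/list
MAX_REPEATS = 1000  # documented limit: repeats < 1,000


def normalize_model_name(name: str) -> str:
    """Drop '-' and '_' separators and ignore case, per the help text."""
    return re.sub(r"[-_]", "", name).lower()


def fiber_url(model: Union[int, str], seq: Optional[str] = None,
              repeat: Optional[int] = None, cif: bool = False) -> str:
    """Build a GET URL, passing options as a query string."""
    if isinstance(model, int):
        if not 1 <= model <= N_MODELS:
            raise ValueError(f"str_id must be in [1, {N_MODELS}]")
        path = str(model)
    else:
        path = normalize_model_name(model)
    params = {}
    if seq is not None:
        params["seq"] = seq
    if repeat is not None:
        if not 1 <= repeat < MAX_REPEATS:
            raise ValueError(f"repeat must be in [1, {MAX_REPEATS - 1}]")
        params["repeat"] = str(repeat)
    if cif:
        params["cif"] = "1"
    base = f"{FIBER_API}/{path}"
    return f"{base}?{urlencode(params)}" if params else base
```

For example, `fiber_url("A-DNA", seq="ACGT", repeat=3)` yields `http://api.x3dna.org/fiber/adna?seq=ACGT&repeat=3`.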



DSSR-enhanced visualization of nucleic acid structures in PyMOL

The skmatic.x3dna.org website (see screenshot below) aims to showcase DSSR-enabled cartoon-block schematics of nucleic acid structures rendered with PyMOL. It presents a simple interface to browse pre-calculated PDB entries with a set of default settings: long rectangular blocks for Watson-Crick base pairs, square blocks for G-tetrads in G-quadruplexes, and minor-groove edges in black. Users can also specify a URL to a PDB- or mmCIF-formatted file, or upload such an atomic coordinate file directly, and set several common options to customize the rendered image.

Moreover, a web API to DSSR-PyMOL schematics is available to allow for its easy integration into third-party tools.

Screenshot of the homepage of DSSR/PyMOL schematics

Input a PDB id

Pre-calculated cartoon-block images together with summary information are available for PDB entries of nucleic-acid-containing structures. Note that gigantic structures like ribosomes that are only represented in mmCIF format are excluded from the resource. The base block images are most effective for small to medium-sized structures.

Here are a few examples:

  • 1ehz, the crystal structure of yeast phenylalanine tRNA at 1.93-Å resolution
  • 2lx1, the major conformation of the internal loop 5’GAGU/3’UGAG
  • 2grb, the crystal structure of an RNA quadruplex containing inosine-tetrad
  • 4da3, the crystal structure of an intramolecular human telomeric DNA G-quadruplex 21-mer bound by the naphthalene diimide compound MM41
  • 1oct, the crystal structure of the Oct-1 POU domain bound to an octamer site
  • 2hoj, the crystal structure of an E. coli thi-box riboswitch bound to thiamine pyrophosphate and manganese ions

Each entry is shown with images in six orthogonal perspectives: front, back, right, left, top, bottom. The ‘front’ image (upper-left in the panel) is oriented into the most-extended view with the DSSR --blocview option. The corresponding PyMOL session file and PDB coordinate file are available for download. One can also visualize the structure interactively via 3Dmol.js.

Sample PDB entries

Users can browse random samples of pre-calculated PDB entries. The number of samples should be between 3 and 99, with a default of 12 entries (see below for an example). Simply click the ‘Submit’ button, or the “Random samples (3 to 99)” link (http://skmatic.x3dna.org/pdb_entry), to see a fresh set of 12 randomly picked PDB entries each time.

Specify a coordinate file

The atomic coordinate file must be in PDB or mmCIF format, and can optionally be gzipped (.gz). One can either specify a URL to a coordinate file or select a local file for upload. Several common options are available for user customization.

Web API help message

Usage with 'http' (HTTPie):
    http -f http://skmatic.x3dna.org/api [options] url=|model@
    http http://skmatic.x3dna.org/api/pdb/pdb_id  -- for a pre-calculated PDB entry
    http http://skmatic.x3dna.org/api/help        -- display this help message
    block_file=styles-in-free-text-format [e.g., block_file=wc-minor]
    block_color=nt-selection-and-color    [e.g., block_color='A:pink']
    block_depth=thickness-of-base-block   [e.g., block_depth=1.2]
    r3d_file=true-or-FALSE(default)       [e.g., r3d_file=true]
    raw_xyz=true-or-FALSE(default)        [e.g., raw_xyz=true]
Required parameter
    url=URL-to-coordinate-file [e.g., url=https://files.rcsb.org/download/1ehz.pdb.gz]
    model@coordinate-file      [e.g., model@1ehz.cif]
    # Only one must be specified. 'url' precedes 'model' when both are specified.
    # The coordinate file must be in PDB or PDBx/mmCIF format, optionally gzipped.
    http -f http://skmatic.x3dna.org/api block_file='wc-minor' model@1ehz.cif r3d_file=t
    http -f http://skmatic.x3dna.org/api url=https://files.rcsb.org/download/1ehz.pdb.gz -d -o 1ehz.png
    http http://skmatic.x3dna.org/api/pdb/1ehz -d -o 1ehz.png
    # with 'curl'
    curl http://skmatic.x3dna.org/api -F 'model=@1msy.pdb' -F 'block_file=wc-minor' -F 'r3d_file=1'
    curl http://skmatic.x3dna.org/api -F 'url=https://files.rcsb.org/download/1ehz.pdb.gz' -o 1ehz.png
    curl http://skmatic.x3dna.org/api/pdb/1ehz -o 1ehz.png
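For a pre-calculated entry, the API path is simply /api/pdb/ followed by the PDB id. The sketch below validates a classic four-character PDB id (a digit 1-9 followed by three alphanumerics) before building the URL, and mirrors the `curl .../api/pdb/1ehz -o 1ehz.png` example for the download. Lowercasing the id is an assumption based on the examples, the helper names are hypothetical, and entries excluded from the resource (e.g., mmCIF-only ribosomes) will have no image to fetch.

```python
import re
from urllib.request import urlopen

SKMATIC_API = "http://skmatic.x3dna.org/api"
# Classic PDB id: a digit 1-9 followed by three alphanumeric characters.
PDB_ID_RE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def precalculated_image_url(pdb_id: str) -> str:
    """URL of the pre-calculated cartoon-block image for a PDB entry."""
    if not PDB_ID_RE.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB id: {pdb_id!r}")
    return f"{SKMATIC_API}/pdb/{pdb_id.lower()}"


def download_image(pdb_id: str, out_path: str) -> None:
    """Save the image to out_path (network access required)."""
    with urlopen(precalculated_image_url(pdb_id)) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
```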

Sample images




SNAP and DSSR in DNAproDB

While reading DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes, I noticed that SNAP and DSSR are mentioned. The relevant passages are quoted below:

Information about individual nucleotide–residue interactions is also provided, such as hydrogen bonding, interaction geometry (based on SNAP (10)), buried solvent accessible surface areas and identification of the interacting residue and nucleotide moieties …

DNAproDB assigns a geometry for every nucleotide–residue interaction identified using SNAP, a component of the 3DNA program suite (10). The residues for which probabilities are shown are those with planar side chains so that a stacking conformation can be defined.

Base pairing and base stacking between nucleotides are identified using the program DSSR (20).

SNAP and DSSR are two (relatively) new programs in the 3DNA software suite. As the author, I am always glad to see them cited explicitly in the literature. The fact that SNAP and DSSR are cited together by DNAproDB, however, is especially significant. I am aware of the initial version of DNAproDB, but I definitely like the updated one better. This is what I recently wrote in response to a question on the 3DNA Forum:

Regarding DNA-protein interactions in general, you may want to have a look at DNAproDB from the Remo Rohs laboratory. A new paper has just been published in NAR, ‘DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes’.

I have no doubt that SNAP and DSSR will be widely used in applications related to DNA/RNA structural bioinformatics. DSSR (and, to a lesser extent, SNAP) represents my view of what a scientific software tool should be.



ONZ classification of G-tetrads

Recently I read the article Topology-based classification of tetrads and quadruplex structures in Bioinformatics by Popenda et al. In this work, the authors proposed an ONZ classification scheme of G-tetrads in intramolecular G-quadruplexes (G4) as shown below (Fig. 2 in the publication):

ONZ classification of G-tetrads in intramolecular G-quadruplexes

I am glad to find that DSSR has been used as a component in their computational tool ElTetrado to automatically identify and classify tetrads and quadruplexes.

Structures from both sets were analysed using self-implemented programs along with DSSR software from the 3DNA suite (Lu et al. (2015)). From DSSR, we acquired the information about base pairs and stacking.

I like the ONZ classification scheme: it is simple in concept yet provides a new perspective on the topologies of G-tetrads in intramolecular G4 structures. So I implemented the idea in DSSR v1.9.8-2019oct16, where this feature is available via the --g4-onz option. Note that ElTetrado, according to the authors, is applicable to ONZ classification of general types of tetrads and quadruplexes. The DSSR implementation, on the other hand, is strictly limited to G-tetrads in intramolecular G4 structures.

The DSSR ONZ classification results match the ones reported in Figs. 1, 5, and 6 of the Popenda et al. paper. For example, for PDB entry 6H1K (Fig. 6), the relevant results with the --g4-onz option and without it are listed below:

# x3dna-dssr -i=6h1k.pdb --g4-onz
List of 3 G-tetrads
   1 glyco-bond=s--- groove=w--n planarity=0.149 type=planar Z- nts=4 GGGG A.DG1,A.DG20,A.DG16,A.DG27
   2 glyco-bond=-sss groove=w--n planarity=0.136 type=planar Z+ nts=4 GGGG A.DG2,A.DG19,A.DG15,A.DG26
   3 glyco-bond=--s- groove=-wn- planarity=0.307 type=other  O+ nts=4 GGGG A.DG17,A.DG21,A.DG25,A.DG28
# ---------------------------------------
# x3dna-dssr -i=6h1k.pdb 
#   without option --g4-onz
List of 3 G-tetrads
   1 glyco-bond=s--- groove=w--n planarity=0.149 type=planar nts=4 GGGG A.DG1,A.DG20,A.DG16,A.DG27
   2 glyco-bond=-sss groove=w--n planarity=0.136 type=planar nts=4 GGGG A.DG2,A.DG19,A.DG15,A.DG26
   3 glyco-bond=--s- groove=-wn- planarity=0.307 type=other  nts=4 GGGG A.DG17,A.DG21,A.DG25,A.DG28
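The plain-text listing above is regular enough to parse: key=value fields, an optional ONZ token such as `Z-` or `O+` (printed only with --g4-onz), then the base sequence and the comma-separated nucleotide ids. The sketch below is inferred from these six sample lines only, so other DSSR versions may format the listing differently; the --json route is the more robust choice. Function and key names such as `nt_ids` are hypothetical.

```python
import re
from typing import Dict, List

ONZ_RE = re.compile(r"[ONZ][+-]")  # e.g. "Z-", "O+"


def parse_tetrad_line(line: str) -> Dict[str, object]:
    """Parse one numbered row of DSSR's 'List of N G-tetrads' section."""
    tokens = line.split()
    record: Dict[str, object] = {"index": int(tokens[0]), "onz": None}
    for tok in tokens[1:-2]:  # the last two tokens are the sequence and nt ids
        if "=" in tok:
            key, value = tok.split("=", 1)
            record[key] = value
        elif ONZ_RE.fullmatch(tok):
            record["onz"] = tok  # present only with --g4-onz
    if "planarity" in record:
        record["planarity"] = float(record["planarity"])
    record["sequence"] = tokens[-2]           # e.g. "GGGG"
    record["nt_ids"] = tokens[-1].split(",")  # e.g. ["A.DG1", "A.DG20", ...]
    return record


def parse_tetrad_listing(text: str) -> List[Dict[str, object]]:
    """Keep only the numbered rows; skip headers and comment lines."""
    return [parse_tetrad_line(ln) for ln in text.splitlines() if ln.strip()[:1].isdigit()]
```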

With the --json option, the ONZ classification results are always included, even without --g4-onz. An example is shown below for PDB entry 6H1K (Fig. 6):

# x3dna-dssr -i=6h1k.pdb --json | jq -c '.G4tetrads[] | [.nts_long, .topo_class]'
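The same extraction can be done in Python when jq is not available. The key names `G4tetrads`, `nts_long`, and `topo_class` are taken from the jq command above. The helper functions, including the tally of classes, are hypothetical additions for illustration.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def tetrad_topologies(dssr: Dict[str, Any]) -> List[list]:
    """Python counterpart of: jq -c '.G4tetrads[] | [.nts_long, .topo_class]'."""
    return [[t.get("nts_long"), t.get("topo_class")] for t in dssr.get("G4tetrads", [])]


def count_topologies(dssr: Dict[str, Any]) -> Counter:
    """Tally the ONZ class labels across all G-tetrads."""
    return Counter(t.get("topo_class") for t in dssr.get("G4tetrads", []))


def print_compact(rows: List[list]) -> None:
    """Print one row per line as compact JSON, similar to jq -c."""
    for row in rows:
        print(json.dumps(row, separators=(",", ":")))
```

One way to use it: save the output with `x3dna-dssr -i=6h1k.pdb --json > 6h1k.json`, load the file with `json.load`, and pass the resulting dictionary to these functions.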



H-bonds reported by DSSR and SNAP

I recently read a short communication by Pavel Afonine, titled phenix.hbond: a new tool for annotation hydrogen bonds, in the July 2019 issue of the Computational Crystallography Newsletter (CCN). It appears that every bioinformatics tool (e.g., PyMOL or Jmol) has its own implementation of an algorithm for calculating H-bonds, one of the fundamental forces stabilizing protein and DNA/RNA structures. So does 3DNA/DSSR, as noted in my 2014-04-11 blogpost Get hydrogen bonds with DSSR.

Both DSSR and SNAP have the --get-hbond option, and they use the same underlying algorithm. However, the default outputs of the two programs differ: DSSR reports the H-bonds within nucleic acids, whereas SNAP covers only those at the DNA/RNA-protein interface. Using PDB entry 1oct as an example, running DSSR with the --get-hbond option gives 33 H-bonds in the DNA duplex, while SNAP reports 38 H-bonds at the DNA-protein interface. By design, the default output caters to the most common use case of each program.

Behind the scenes, however, there are variations of the seemingly simple --get-hbond option. One can append the text ‘nucleic’ (or ‘nuc’, ‘nt’), as in --get-hbond-nucleic, to output H-bonds within nucleic acids. Similarly, --get-hbond-protein (or ‘amino’, ‘aa’) outputs H-bonds within proteins. Not surprisingly, the --get-hbond-nt-aa option lists H-bonds in nucleic acids and proteins, including those at their interface. These variations apply to both DSSR and SNAP, even though some are redundant with the default.

Notably, in combination with --json, the --get-hbond option by default outputs all H-bonds, as if --get-hbond-nt-aa had been set. For PDB entry 1oct, DSSR or SNAP would report 208 H-bonds. Moreover, the JSON output has a residue_pair field for each identified H-bond, with values "nt:nt", "nt:aa", or "aa:aa". Using 1oct as an example:

# x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[0]'
{
  "index": 1,
  "atom1_serNum": 34,
  "atom2_serNum": 608,
  "donAcc_type": "standard",
  "distance": 3.304,
  "atom1_id": "O6@A.DG202",
  "atom2_id": "N4@B.DC230",
  "atom_pair": "O:N",
  "residue_pair": "nt:nt"
}
# x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[60]'
{
  "index": 61,
  "atom1_serNum": 462,
  "atom2_serNum": 1187,
  "donAcc_type": "standard",
  "distance": 3.692,
  "atom1_id": "O2@B.DT223",
  "atom2_id": "NH2@C.ARG102",
  "atom_pair": "O:N",
  "residue_pair": "nt:aa"
}
# x3dna-dssr -i=1oct.pdb --get-hbond --json | jq '.hbonds[100]'
{
  "index": 101,
  "atom1_serNum": 791,
  "atom2_serNum": 818,
  "donAcc_type": "standard",
  "distance": 2.871,
  "atom1_id": "N@C.THR26",
  "atom2_id": "OD2@C.ASP29",
  "atom_pair": "N:O",
  "residue_pair": "aa:aa"
}

In the above three cases, using SNAP instead of DSSR would give the same results.

Also, one can take advantage of the residue_pair value to filter H-bonds by type. For example, the following command would extract only H-bonds at the DNA-protein interface (38 occurrences, same as the number noted above):

x3dna-snap -i=1oct.pdb --get-hbond --json | jq '.hbonds[] | select(.residue_pair=="nt:aa")'
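The jq filter above has a direct Python counterpart, which also makes it easy to tally all three residue_pair categories at once. The sketch assumes the JSON has been saved to a file first (e.g., `x3dna-snap -i=1oct.pdb --get-hbond --json > 1oct-hbonds.json`, a hypothetical file name); the function names are likewise hypothetical. Per the text above, selecting "nt:aa" for 1oct should give 38 entries.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def load_dssr_json(path: str) -> Dict[str, Any]:
    """Read JSON output previously saved from DSSR or SNAP."""
    with open(path) as fh:
        return json.load(fh)


def hbonds_of_type(data: Dict[str, Any], residue_pair: str) -> List[Dict[str, Any]]:
    """Same selection as jq '.hbonds[] | select(.residue_pair=="nt:aa")'."""
    return [hb for hb in data.get("hbonds", []) if hb.get("residue_pair") == residue_pair]


def count_residue_pairs(data: Dict[str, Any]) -> Counter:
    """Tally H-bonds per residue_pair value ("nt:nt", "nt:aa", "aa:aa")."""
    return Counter(hb.get("residue_pair") for hb in data.get("hbonds", []))
```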

Back to the phenix.hbond tool, the author noted that:

Running phenix.hbond requires atomic model in PDB or mmCIF format with all hydrogen atoms added, as well as ligand restraint files if the model contains unknown to the library items.

While there is no particular reason why this should not work for all bio-macromolecules, currently phenix.hbond is only optimized and tested to work with proteins, which is the limitation that will be removed in future.

In contrast, the H-bond identification algorithm in DSSR/SNAP does not require hydrogen atoms. In fact, hydrogen atoms are simply ignored if present. As shown above, the H-bond method implemented in DSSR/SNAP works for DNA, RNA, proteins, or their complexes. This does not necessarily mean that the 3DNA way is superior to other similar tools. It just works well in my hands, and it may serve as a pragmatic choice for other users.
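To make the "no hydrogens needed" point concrete, the sketch below flags N/O heavy-atom pairs from different residues within a distance cutoff, so hydrogen atoms never enter the calculation. This is a generic geometric screen, not the DSSR/SNAP algorithm, whose details are not given here; the actual method also classifies donors and acceptors (cf. the donAcc_type field above). The 4.0 Å cutoff, the `Atom` type, and the function names are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

CUTOFF = 4.0  # Å; an illustrative value, not a documented DSSR/SNAP default


@dataclass(frozen=True)
class Atom:
    atom_id: str                     # DSSR-style "name@residue", e.g. "O6@A.DG202"
    element: str                     # "N", "O", "C", "H", ...
    xyz: Tuple[float, float, float]


def residue_of(atom: Atom) -> str:
    return atom.atom_id.split("@", 1)[1]


def candidate_hbonds(atoms: Sequence[Atom], cutoff: float = CUTOFF) -> List[Tuple[str, str, float]]:
    """Flag N/O pairs from different residues within `cutoff` Å.

    Only N and O atoms are considered, so hydrogens may be present or absent.
    """
    polar = [a for a in atoms if a.element in ("N", "O")]
    found = []
    for i, a in enumerate(polar):
        for b in polar[i + 1:]:
            if residue_of(a) == residue_of(b):
                continue  # skip intra-residue pairs
            d = math.dist(a.xyz, b.xyz)
            if d <= cutoff:
                found.append((a.atom_id, b.atom_id, round(d, 3)))
    return found
```

A pure distance screen like this over-reports (for example, two acceptors close together); a donor/acceptor check is the natural next refinement.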



DSSR is used in RNAMake and 3dRNA 2.0

Recently I noticed two new citations to DSSR, an integrated software tool for dissecting the spatial structure of RNA. One is from the Yesselman et al. article Computational design of three-dimensional RNA structure and function in Nature Nanotechnology, and the other is from the Wang et al. article 3dRNA v2.0: An Updated Web Server for RNA 3D Structure Prediction in International Journal of Molecular Sciences.

Yesselman et al. used DSSR in RNAMake to build the motif library. The relevant section is as follows:

We processed each RNA structure to extract every motif with Dissecting the Spatial Structure of RNA (DSSR)54 with the following command:

x3dna-dssr -i file.pdb -o file_dssr.out

We manually checked each extracted motif to confirm that it was the correct type, as DSSR sometimes classifies tertiary contacts as higher-order junctions and vice versa. For each motif collected from DSSR, we ran the X3DNA find_pair and analyze programs to determine the reference frame for the first and last base pair of each motif to allow for the alignment between motifs:

find_pair file.pdb 2> /dev/null stdout | analyze stdin >& /dev/null

It is worth noting the sentence “DSSR sometimes classifies tertiary contacts as higher-order junctions and vice versa.” Presumably, the authors are referring to the inclusion of ‘isolated canonical pairs’ in junctions by default in DSSR. Overall, the default DSSR settings follow the most common practice in the RNA literature. Meanwhile, I am aware that the community may not agree on every detail. Thus DSSR provides many options (documented or otherwise) to cater to other potential use cases. See the Stems of junction structure have only one base pair and Junction definition threads on the 3DNA Forum for two examples. In the long run, DSSR is likely to help consolidate RNA nomenclature that can be applied in a pragmatic, consistent manner.

Note also that DSSR provides the reference frame of each identified base pair via the JSON option. Using 1ehz as an example, the following command provides detailed information about base pairs:

x3dna-dssr -i=1ehz.pdb --json --more | jq .pairs

In the 3dRNA 2.0 paper, DSSR is cited as below. This is the first time DSSR has been integrated into the 3dRNA pipeline.

The predicted structures are built from the sequence and secondary structure, while the former is obtained from their native structures fetched from PDB (https://www.rcsb.org/), and the latter is calculated from DSSR (Dissecting the Spatial Structure of RNA) [39].




Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu