Crystal structure of SARS-CoV-2 stem–loop 5 (SL5) (PDB id: 9E9Q; Jones CP, Ferré-D'Amaré AR. 2025. Crystallographic and cryoEM analyses reveal SARS-CoV-2 SL5 is a mobile T-shaped four-way junction with deep pockets. RNA 31: 949–960). The T-shaped four-way junction of the coronavirus SL5 structural element provides a starting point for examining the structures of larger RNA motifs and their interactions with other molecules. Image highlighting the four arms of the junction. The RNA backbone is depicted by a gray ribbon. The bases within the arms of the junction are colored respectively in blue, red, yellow, and cyan. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

As the developer of DSSR, I am thrilled to see its application in cutting-edge research across multiple disciplines. Below is a list of four recent publications that highlight how DSSR has been utilized, underscoring its versatility and significance in structural bioinformatics.

In the Geng et al. (2025) Nucleic Acids Research (NAR) paper, titled 'Revealing hidden protonated conformational states in RNA dynamic ensembles', DSSR is simply cited as follows:

All bp geometries, hydrogen-bond, backbone, stacking, and sugar dihedral angles were calculated using X3DNA-DSSR [77].

In the preprint by Gordan et al. (2025), titled 'High-throughput characterization of transcription factors that modulate UV damage formation and repair at single-nucleotide resolution', DSSR is cited as follows:

Step base stacking, base pair shift, base pair slide, interbase angle, pseudorotation angle, and sugar puckering classifications of nucleobases were computed using X3DNA-DSSR (v2.5.0)⁷⁵. Base stacking was defined as the overlapping polygon area in Å² when projecting the dipyrimidine base ring atoms (excluding exocyclic atoms) into the mean base pair plane⁷⁶. The sugar ring pseudorotation phase angle of each pyrimidine was also calculated using X3DNA-DSSR as described by Altona, C. & Sundaralingam, M.⁷⁷ Interbase angle was defined as sqrt(propeller²+buckle²) per the X3DNA-DSSR documentation.

Figure 6: TF Binding Induces Structural Distortion Favorable to UV Dimerization is highly informative, particularly panel (a), which illustrates the ensemble of structural parameters that predispose dipyrimidines to cyclobutane pyrimidine dimers (CPD) or 6-4 pyrimidine-pyrimidones (6-4 PP) formation. DSSR is designed as an integrated software tool, offering a comprehensive suite of structural parameters not found in any other single tool I am aware of. Despite this, the innovative use of DSSR by Gordan et al. exceeds my expectations and demonstrates its versatility.

In the preprint by Kubaney et al. (2025) from the Baker group, titled 'RNA sequence design and protein-DNA specificity prediction with NA-MPNN', DSSR is cited as follows:

On the pseudoknot subset, we evaluate additional structure‐ and reactivity‐based metrics. DSSR v2.3.2⁴¹ is used to extract the ground‐truth secondary structure from the native crystal structures. For each designed sequence, RibonanzaNet predicts 2A3 reactivity profiles, from which we compute predicted OpenKnot scores (see https://github.com/eternagame/OpenKnotScore)³¹ using the predicted reactivity together with the DSSR ground truth.

In a recent NSMB paper from the Baker group, titled 'Computational design of sequence-specific DNA-binding proteins', 3DNA is cited as follows:

RIF docking of scaffolds onto DNA targets (DBP design step 1) Structures of B-DNA for each target (Supplementary Table 2) were generated by (1) using the DNA portion of PDB 1BC8 (ref. 60), PDB 1YO5 (ref. 61), PDB 1L3L (ref. 51) or PDB 2O4A (ref. 62) or (2) using the software X3DNA⁶³, followed by a constrained Rosetta relax of the DNA structure.

Please note that 3DNA has been replaced by DSSR. The functionality for constructing B-DNA models, previously provided by 3DNA, is now directly available in DSSR via its fiber and rebuild modules.

In the preprint by Si et al. (2025), titled 'End-to-End Single-Stranded DNA Sequence Design with All-Atom Structure Reconstruction', DSSR is cited as follows:

Since ViennaRNA and NUPACK require secondary structures as input, we used DSSR³⁵ to extract secondary structures from the corresponding ssDNA three-dimensional structures.

The above use cases are merely a sample of how DSSR is utilized in the scientific literature. It is reasonable to state that DSSR has emerged as a de facto standard tool within the field of nucleic acid structural bioinformatics. Overall, DSSR is a mature, robust, and efficient software product that is actively developed and maintained. I am committed to making DSSR synonymous with quality and value. Its unmatched functionality, usability, and support save users significant time and effort compared to alternative solutions.

DSSR is available free of charge for academic users. Additionally, it has been integrated into other high-profile bioinformatics resources, including NAKB, PDB-redo, and N•ESPript.

References

Geng A, Roy R, Ganser L, Li L, Al-Hashimi HM. Revealing hidden protonated conformational states in RNA dynamic ensembles. Nucleic Acids Research. 2025;53:gkaf1366. https://doi.org/10.1093/nar/gkaf1366.
Gordan R, Wasserman H, Chi B, Bohm K, Duan M, Sahay H, et al. High-throughput characterization of transcription factors that modulate UV damage formation and repair at single-nucleotide resolution. 2025. https://doi.org/10.21203/rs.3.rs-8197218/v1.
Kubaney A, Favor A, McHugh L, Mitra R, Pecoraro R, Dauparas J, et al. RNA sequence design and protein–DNA specificity prediction with NA-MPNN. 2025. https://doi.org/10.1101/2025.10.03.679414.
Glasscock CJ, Pecoraro RJ, McHugh R, Doyle LA, Chen W, Boivin O, et al. Computational design of sequence-specific DNA-binding proteins. Nat Struct Mol Biol. 2025;32:2252–61. https://doi.org/10.1038/s41594-025-01669-4.
Si Y, Xu Y, Chen L. End-to-end single-stranded DNA sequence design with all-atom structure reconstruction. 2025. https://doi.org/10.64898/2025.12.05.692525.

Cartoon-block representation of quadruplex-duplex interface

Recently I read the article titled Structural Insights into the Quadruplex−Duplex 3′ Interface Formed from a Telomeric Repeat: A Potential Molecular Target by Krauss et al.. I quickly ran DSSR on the corresponding PDB entry is 5dww. Not surprisingly, DSSR can automatically identify reported key structural features (see output file 5dww.out for details), including the TAT triplet at the quadruplex−duplex junction, and the three G-quartets. Note that the result is based on biological assembly 1 in PDB file 5dww.pdb1 since the asymmetric unit contains four such molecules.

List of 4 multiplets
   1 nts=3 TAT 1:A.DT17,1:A.DA19,1:B.DT7
   2 nts=4 GGGG 1:A.DG1,1:A.DG5,1:A.DG9,1:A.DG14
   3 nts=4 GGGG 1:A.DG2,1:A.DG6,1:A.DG10,1:A.DG15
   4 nts=4 GGGG 1:A.DG3,1:A.DG7,1:A.DG11,1:A.DG16

As its title suggests, however, this blog post is about the cartoon-block representations. Four styles of such schematics are shown below, which can all be easily generated using DSSR/PyMOL.


in default style	with base-pair blocks

minor-groove highlighted	top-face highlighted

The cartoon-block representations possess unique features not seen elsewhere. With the help of the dssr_block in PyMOL, they are extremely easy to generate. Such schematics are likely to become popular in illustrations of nucleic acid structures.

Comment

Integrating DSSR into Jmol and PyMOL

Over the past couple of years, one of the most significant achievements of DSSR has been its integration into Jmol and PyMOL, two widely used molecular graphics programs. None of the projects had been ‘planned’, and I am honored to have the opportunities collaborating directly with Bob Hanson (Jmol) and Thomas Holder (PyMOL). The integrations make salient features of DSSR readily accessible to the Jmol and PyMOL user communities. Moreover, Jmol and PyMOL take different approaches to interoperate with DSSR, and so far they have employed separate features that the program has to offer.

Key features of DSSR

DSSR was implemented in strict ANSI C as a self-contained command-line program. The binaries for common operating systems (Mac OS X, Linux and Windows) are tiny (<1MB), and without runtime dependencies on third-party libraries. DSSR also comes with an extensive PDF user manual.

Since its initial release in early 2013, DSSR has been continuously refined/expanded based on user feedback and my improved knowledge of RNA structures. User questions are always promptly addressed on the public 3DNA Forum. Over the years, DSSR has gradually established itself as an accountable software product.

The small size, zero configuration, extensive features, and robust performance make DSSR ideal to be integrated into other bioinformatics tools.

DSSR and Jmol

From the very beginning, Jmol has been employing a web-service at Columbia University, where all DSSR analyses take place. In addition to the sample DSSR-Jmol web interface, DSSR is also directly accessible from the console (see Fig.1 below). Jmol includes a sophisticated SQL syntax to drill down the various DSSR-derived structure features. Search ‘DSSR’ on the Jmol/JSmol interactive scripting documentation for details.

Fig. 1 DSSR is available from the Jmol/JSmol console via scripting.

The initial version of the integration (Jmol v14.2) was facilitated by the DSSR --jmol option to produce a Jmol-specific (e.g., residue id [C]2658:A) plain text output. However, ad hoc text file are rigid and fragile for programs to communicate with. As DSSR had been evolving, changes to existing features or newly added functionality were known to break the established DSSR-Jmol interface. Having to write extra code to maintain the same old --jmol output did not feel right.

JSON (JavaScript Object Notation) came to the rescue! The current DSSR-Jmol integration (Jmol v14.4) takes advantage of JSON, a standard, lightweight data-interchange format. Since JSON is structured, parsing its contents is straightforward. DSSR and Jmol can evolve independently, as always, but they no longer need to worry about touching each other’s toes.

Overall, Jmol has incorporated the most fundamental analysis features of DSSR. The Jmol SQL mini-language is very powerful for selecting arbitrary DSSR parameters. Background information about this collaboration can be found in the blog post Jmol and DSSR.

DSSR and PyMOL

So far, the DSSR-PyMOL integration has focused on visualization, i.e., the cartoon-block schematic representations of DNA/RNA structures. Moreover, instead of relying on a remote DSSR web-service as for Jmol, the PyMOL dssr_block command calls a locally installed DSSR executable for the job. As illustrated in the blogpost DSSR base blocks in PyMOL, interactively, the ‘dssr_block’ command makes it trivial to incorporate the highly effective rectangular blocks into PyMOL.

From early on, 3DNA includes the blocview script (first written in Perl, later converted to Ruby) to generate schematic images in the ‘best view’, by combining block representation of bases with backbone ribbon of proteins or nucleic acids. The script is essentially a glue, calling MolScript, Raster3D, ImageMagick, and several 3DNA utility programs to perform various tasks. With these dependencies, it’s a bit involved to set up blocview. Nevertheless, the resultant images are simple and revealing, and are still being used by NDB and RCSB PDB (among others) as of today.

DSSR does not depend on MolScript and Raster3D, or any other programs to generate .r3d output of rectangular blocks. The schematic blocks can be directly fed into PyMOL, combined with other representations, and ray-traced for high resolution images. The integration of DSSR into PyMOL by the dssr_block command is likely to prompt an even wider adoption of the cartoon-block representation. In this regard, it is well worth noting the news item “dssr_block is a wrapper for DSSR (3dna) and creates block-shaped nucleic acid cartoons” on the main page of PyMOLWiki (see Fig. 2 below). It will certainly bring this neat feature into the attention of many PyMOL users.

Fig. 2 Screenshot of the PyMOLWiki main page (2016-01-27) with ‘dssr_block’ in the news. A sample cartoon-block image of 355d is inserted as an example.

Integration of DSSR analysis results into PyMOL is underway, using the same JSON output. Before long, PyMOL users should be able to have access to the numerous DNA/RNA structural features derived by DSSR as in Jmol, along with the cartoon-block images enabled by dssr_block. Background information about DSSR-PyMOL can be found in blog post Open invitation on writing a DSSR plugin for PyMOL.

Notes

The DSSR-Jmol and DSSR-PyMOL integrations are two salient examples of what can be achieved via direct collaboration of dedicated scientists with complementary expertise. In addition to benefit the involved projects in particular and the (structural biology) community at large, technical and scientific advances are more likely to be achieved.
Both projects are still on going, with continued refinements of existing functionality and additions of new features. As an example, it is desirable and likely that Jmol would allow local access to DSSR for efficiency and data privacy.
JSON is the way to go for connecting DSSR to the outside world. Period. The obsolete --jmol will be removed from the next release of DSSR (v1.5). The default plain text output is useful for easy comprehension and will stilled be maintained. But do not count on its exact format for computer parsing — occasional changes to existing items are likely, and new features are bound to be added.
If you’d like to incorporate DSSR into your pipeline and need some customizations of its output, please let me know. It’s always easier to set things right at the source than to fix them downstream. Where practical, I’ll try to implement your requested features, quickly. Working together, we can and will build a better world.

Comment

Characterization of base-pair geometry

This post is a recap of the recently introduced ‘simple’ base-pair (bp) parameters (Fig. 1) useful for describing non-Waton-Crick pairs, and the highly effective cartoon-block representations of nucleic acid structures. Both features are readily available from 3DNA/DSSR, as detailed here using four examples of representative DNA/RNA structures (Fig. 2). Links to related blog posts are provided at the end.

Note added on Feb. 2, 2016: in fact, this post had been intended to supplement a short communication titled Characterization of base-pair geometry that Dr. Wilma Olson and I recently contributed to the January 2016 issue of Computational Crystallography Newsletter (CCN). That’s why the URL of this post is ‘http://home.x3dna.org/highlights/CCN-on-base-pair-geometry’ instead of what one would expect from the title. The data files, scripts, images, and linked herein should enable interested users a thorough understanding of the ‘simple’ base-pair parameters. If you have problems in reproducing our reported results, please do not hesitate to let me know (publicly). You are welcome to either leave comments to this post or ask any related questions on the 3DNA Forum.

Six rigid-body parameters

Fig. 1: Schematic diagrams of the six rigid-body parameters commonly used for the characterization of base-pair geometry.

Cartoon-block representations

Fig. 2: DSSR-introduced cartoon-block representations of DNA and RNA structures that combine PyMOL cartoon schematics with color-coded rectangular base blocks: A, red; C, yellow; G, green; T, blue; and U, cyan. (A) The Dickerson B-DNA dodecamer solved at 1.4-Å resolution [PDB id: 355d (Shui et al., 1998)], with significant negative Propeller. (B) The Z-DNA dodecamer [PDB id: 4ocb (Luo et al., 2014)], with virtually co-planar C–G pairs at the ends, and noticeable Buckle in the middle. © The GUAA tetraloop mutant of the sarcin/ricin domain from E. coli 23 S rRNA [PDB id: 1msy (Correll et al., 2003)], with large Buckle in the A+C pair, and base-stacking interactions of UAA in the GUAA tetraloop (upper-right corner). (D) The parallel double-stranded poly(A) RNA helix [PDB id: 4jrd (Safaee et al., 2013)], with up to +14° Propeller. The simple, informative cartoon-block representations facilitate understanding of the base interactions in small to mid-sized nucleic acid structures like these. The base identity, pairing geometry, and stacking interactions are obvious.

Scripts and data files (Lu-CCN-examples.tar.gz)

find_pair 355d.pdb | analyze   # 355d.out
x3dna-dssr -i=355d.pdb -more -o=355d-dssr.out
x3dna-dssr -i=355d.pdb --cartoon-block -o=355d.pml

find_pair 4jrd.pdb | analyze   # 4jrd.out
x3dna-dssr -i=4jrd.pdb -more -o=4jrd-dssr.out
x3dna-dssr -i=4jrd.pdb --cartoon-block -o=4jrd.pml

find_pair 1msy.pdb | analyze   # 1msy.out
x3dna-dssr -i=1msy.pdb -more -o=355d-dssr.out
x3dna-dssr -i=1msy.pdb --cartoon-block -o=1msy.pml

find_pair --symm 4ocb.pdb1 | analyze --symm  # 4ocb.out
x3dna-dssr -i=4ocb.pdb1 --symm -more -o=4ocb-dssr.out
x3dna-dssr -i=4ocb.pdb1 --symm --cartoon-block -o=4ocb.pml

Please note the following points:

The above examples are based on 3DNA v2.3-2016jan20 and DSSR v1.4.8-2016jan16.
All data files (including PyMOL ray-traced PNG images used in Fig. 2) are packed into a tarball named Lu-CCN-examples.tar.gz for download.
For PDB entry 4ocb, the biological unit (with suffix .pdb1) is used to get a complete duplex structure. The symm option must be specified.
PDB files are used in the above illustration. In fact, the corresponding mmCIF files (.cif) also work just fine.
The DSSR-derived .pml files can be fed into PyMOL for rendering. In addition to the directly generated *.pml files (e.g., 355d.pml), the PyMOL transformed version (i.e., orient; turn z, -90) are also included, with names *-orient.pml (e.g., 355d-orient.pml). The PNG images (as shown in Fig. 2) are ray-traced using these reoriented pml files for the most extended vertical view.
The ‘simple’ base-pair parameters for 4jrd is shown below.

This structure contains 10 non-Watson-Crick (with leading *) base pair(s)
----------------------------------------------------------------------------
Simple base-pair parameters based on RC8--YC6 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
*    1 A+A      -7.96      0.41     -0.03    -13.64     -4.06   -179.47   14.2
*    2 A+A      -7.86      0.38     -0.33    -10.20     -3.53   -179.34   10.8
*    3 A+A      -7.96      0.43      0.02    -10.15      5.23    179.91   11.4
*    4 A+A      -7.95      0.50      0.10     -9.24      8.04    179.15   12.2
*    5 A+A      -7.95      0.46      0.08     -7.36     10.12   -179.98   12.5
*    6 A+A      -7.97      0.60      0.06     -5.15     12.87   -176.75   13.9
*    7 A+A      -7.88      0.66     -0.02     -7.82     11.89   -179.55   14.2
*    8 A+A      -7.91      0.56     -0.05     -7.03     13.68    179.22   15.4
*    9 A+A      -7.94      0.47     -0.03     -3.78     13.76   -179.24   14.3
*   10 A+A      -7.92      0.42      0.10     -3.03      4.34   -178.91    5.3

Comment [2]

DSSR base blocks in PyMOL, interactively

In early 2015, Thomas Holder (the PyMOL Principal Developer at Schrodinger) and I agreed to work together on connecting DSSR to PyMOL. Moreover, we called for the community’s involvement in writing a DSSR plugin for PyMOL and received a few enthusiastic replies. Over the past few months, many significant progresses have been made in DSSR, including an article titled DSSR: an integrated software tool for dissecting the spatial structure of RNA published in Nucleic Acids Research (NAR) and a more streamlined DSSR-Jmol integration based on the --json output.

From the very beginning, Thomas and I had envisioned that the DSSR-PyMOL integration would include two components: one is to bring DSSR-derived RNA/DNA structural features into PyMOL (similar to the DSSR-Jmol interface, funcationality-wise), and the other is to render DSSR’s simple yet informative base-rectangular representations with PyMOL. While the ‘analysis’ component is a work in progress, the ‘visualization’ part is ready for the community to take advantage of.

Thomas has written a Python script named dssr_block.py. When the script is run in PyMOL, it adds the “dssr_block” command. The dssr_block.py script is less than 100 lines including documentation, with the real code taking no more than half of the total line number. The detailed documentation section (with two examples), when condensed, is as follows:

DESCRIPTION
    Create a nucleid acid cartoon with DSSR
USAGE
    dssr_block [selection [, state [, block_file [, block_depth [, name [, exe]]]]]]
ARGUMENTS
    selection = str: atom selection {default: all}
    state = int: object state (0 for all states) {default: -1, current state}
    block_file = face|edge|wc|equal|minor|gray {default: face}
    block_depth = float: thickness of rectangular blocks {default: 0.5}
    name = str: name of new CGO object {default: dssr_block##}
    exe = str: path to "x3dna-dssr" executable {default: x3dna-dssr}
EXAMPLE
    fetch 1ehz, async=0
    as cartoon
    dssr_block
    set cartoon_ladder_radius, 0.1
    set cartoon_ladder_color, gray
    set cartoon_nucleic_acid_mode, 1
    # multi-state
    fetch 2n2d, async=0
    dssr_block 2n2d, 0
    set all_states

Download the dssr_block.py script into a folder (directory) of your choice. Within PyMOL command window, type:

run dssr_block.py  # to make the 'dssr_block' command avaible
help dssr_block    # to get the help message, with contents shown above

The resultant cartoon-block image for running the documented commands (except for the additional orient command for best view) for case 1ehz is shown in Fig. 1 below.

Fig. 1: Cartoon-block image generated by dssr_block.py for PDB entry 1ehz (yeast phenylalanine tRNA)

For the NMR ensemble 2n2d, the corresponding image (after running orient) is illustrated in Fig. 2 as follows:

Fig. 2: Cartoon-block image generated by dssr_block.py for PDB entry 2n2d (an NMR ensemble).

In addition to the default settings, DSSR offers quite a few variations for the size and coloring of rectangular blocks, as demonstrated in Fig.3. The main settings are through the block_file option in PyMOL (note the underscore), corresponding to DSSR --block-file (or --block_file). The corresponding PyMOL commands are also listed for your reference. You can easily play around with the various styles interactively in PyMOL by toggling objects (dssr_block##) on or off. Enjoy!

Fig. 3: Cartoon-block image generated by dssr_block.py for PDB entry 355d (the Dickerson B-DNA dodecamer).

Fig. 3 is created with the following PyMOL commands:

reinitialize
fetch 355d, async=0
bg_color white

as cartoon
orient
turn z, -90
turn y, 180

set cartoon_ladder_mode, 1
set cartoon_ladder_radius, 0.1
set cartoon_ladder_color, black

set cartoon_tube_radius, 0.5
set cartoon_nucleic_acid_mode, 1
set cartoon_color, gold

dssr_block 355d                  # default base blocks in solid color
dssr_block block_file=edge       # rectangular blocks in wireframe (black)
dssr_block block_file=face+edge  # solid color with outline
dssr_block block_file=equal      # bases blocks in equal size
dssr_block block_file=minor      # with minor-groove colord black
dssr_block block_file=wc         # Watson-Crick base pairs in long bp blocks
dssr_block block_file=wc-minor   # Watson-Crick pairs + minor-groove edge
dssr_block block_file=gray       # rectangular blocks all in gray
dssr_block block_depth=1.8       # with increased thickness

Notes

The dssr_block.py script described here is the original version Thomas communicated to me. Current version of this script and related topics can be found in the Dssr block PyMOLWiki page.
For this script to work, DSSR needs to be installed and x3dna-dssr in the PATH.
In PyMOL, set cartoon_nucleic_acid_mode, 1 employs C3′ instead of the default P (‘mode 0’) for the smooth backbone trace. Since 5′ terminal phosphate groups are normally not available from X-ray crystal structures (e.g., 355d), ‘mode 1’ is used to avoid orphan base blocks from the backbone trace.

Comment [2]

3DNA Forum is spam free

As of today (2016-01-16), the number of registrations on the 3DNA Forum has reached 2,562. Moreover, all the members (as far as I can tell) are legitimate since the Forum has remained spam free. From the very beginning, ensuring a high information-to-noice ratio has been a top priority. The goal has been achieved by taking the following measures:

State the rules clearly in the “Registration Agreement”

This forum is dedicated to topics generally related to the 3DNA suite of software programs for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. To make the 3DNA forum a more pleasant virtual community for all of us to learn from and contribute to, please be considerate and practice good netiquette (http://www.albion.com/netiquette/).

I strive to make the forum spam free. Specifically, posts that are not 3DNA related in the broad sense are taken as spams, and are strictly forbidden. You are solely responsible for the content of your posts. We reserve the right to remove any post deemed as inappropriate, deactivate the account and ban the IP address of any abuser of the forum, WITHOUT NOTICE.

When posting on the Forum, please abide by the following rules: …

In a nutshell, you are welcome to participate and should not hesitate to ask questions, but remember to play nice and preferably share what you’ve learned! Please note that we do not tolerate spamming or off-topic trolling of any form.

Take advantage of anti-spam software

In additional to the verification of email address and check for black-listed IP addresses, the topic-specific questions have been very effective. Three examples of such questions are shown below:

What does the 'A' in 3DNA stand for? (hint: 4-char long)

How many standard bases does RNA have (hint: 1-digit number)

What is the value of the expression (3.1498 * 0 + 168)?

Overall, I do not like CAPTCHA — I’ve found the highly-distorted images in some websites especially troublesome. For the first few of years (to ~2014), the 3DNA Forum did not contain a captcha image in the registration page. Later on, however, I’ve noticed quite a few spam registrations/posts. In addition to quickly cleaning them up manually, I had refined the topic-specific questions, and turned on the visual verification image at level “Medium — Overlapping colored letters, with noise/lines”. Experience over the past couple of year has demonstrated the effectiveness of the combined strategy. As shown in the screen capture below, as of this writing, 177,562 spammers have been blocked by the anti-spam software!

Verify and approve ‘suspect’ accounts quickly

The above mentioned anti-spaming measures have blocked virtually all the “bad guys” so I do not need to waste time fighting them. I receive an email notification for each successful registration. The vast majority of registrants can then immediately access the member-only download section or post questions on the 3DNA Forum after registration. A significant portion (~1 out of 6) of the registrations, however, would be masked as suspicious and need my action. The email message for such cases reads like this:

‘xxxx’ has just signed up as a new member of your forum. Click the link below to view their profile. …
Before this member can begin posting they must first have their account approved. Click the link below to go to the approval screen. …

Wherever I have access to the Internet (including after hours with an iPad Air 2), I’ve always been quick in verifying and (mostly) approving these registrations.

Overall, since http://forum.x3dna.org was created in December 2011, the Forum has received significant attention in the field of DNA/RNA structural bioinformatics. As the community begins to appreciate and fully take advantage of what DSSR and SNAP have to offer, I have no doubt the Forum will gain even wider-spread recognition.

Comment

Ask reproducible questions, publicly

In recent years, reproducibility of ‘scientific’ publications has become quite a topic. See a recent essay Five selfish reasons to work reproducibly by Markowetz in Genome Biology (2015, 16:274). There are numerous reasons why reproducibility could become an issue at all in science. What I have continuously strived for in my scientific career, however, is to ensure that my published results are reproducible. As a concrete example, I created a dedicated section titled DSSR-NAR paper on the 3DNA Forum that provides full details (scripts and data files) so that any interested parties can rigorously reproduce the results reported in the DSSR Nucleic Acids Research (NAR) paper.

In my support of 3DNA for over a decade, the #1 issue I experienced is undoubtedly vague (non-reproducible) questions. For example, I have recently been asked via email why the 3DNA find_pair/analyze programs miss “some basepair … even though it is in the pdb file”. Without access to the PDB file to reproduce the problem, however, I cannot provide a concrete answer. In an effect to prevent ambiguous questions, I made the following explicit point in the “Registration Agreement” of the 3DNA Forum (no. 2 on the list):

Be specific with your questions; provide a minimal, reproducible example if possible; use attachments where appropriate.

The #2 issue is receiving 3DNA-related questions privately instead of on the intended public 3DNA Forum. I turned off “personal messaging” to receive private messages on the Forum long time ago, yet I have kept receiving questions via emails. In several locations on the 3DNA Forum, I have made this ‘public-question’ policy crystal clear:

Ask your questions in the public 3DNA forum instead of sending xiangjun emails or personal messages. (no. 1 on the ‘Registration Agreement’)

Please be aware that for the benefit of the 3DNA-user community at large, I do not provide private email/personal message support; the forum has been created specifically for open discussions of all 3DNA-related issues. In other words, any 3DNA-associated questions are welcome and should be directed here. Presumably I’ve made the message simple and clear enough to get across without further explanation. (in ‘Site announcements » Download instructions’ and ‘Downloads » 3DNA download’)

In response to the many 3DNA-related questions that still keep coming via email, I created the following entry of Canned Responses in gmail:

Thanks for your interest in using 3DNA. Please be aware that for the benefit of the 3DNA-user community at large, I do not provide private email support; the 3DNA Forum (http://forum.x3dna.org/) has been created specifically for open discussions of all 3DNA-related issues. In other words, any 3DNA-associated questions are welcome and should be directed there. I monitor the forum regularly and respond to posts promptly.

I look forward to seeing you on the 3DNA Forum (http://forum.x3dna.org/).

Overall, I’ve learned from experience that addressing reproducible questions publicly does the best for the 3DNA community. Users can register with personal (free) email address, and post simulated data to illustrate the problem at hand. Moreover, questions on the Forum have always received quick responses. Over time, the Forum has served as an archive that everyone can benefit from.

Comment [2]

Details on the simple base-pair parameters

With the foundation laid by the previous two posts on Fitting of base reference frame and Automatic identification of nucleotides, we can now get into the details on how the ‘simple’ base-pair (bp) parameters are derived. To make the point clear, I am using two concrete examples from the yeast phenylalanine tRNA (PDB id: 1ehz): the first pair is 2MG10+G45, of type M+N (shortened to g+G) in 3DNA/DSSR; and the second example is a Watson-Crick pair U6–A67, of type M–N (shortened to U–A).

Pair 2MG10+G45 (g+G, of type `M+N`, see Fig. 1)

Base reference frames

Fig. 1: Base pair 2MG10+G45 (g+G) of type M+N in yeast phenylalanine tRNA 1ehz

In the original coordinate system (as in 1ehz.pdb downloaded from the RCSB PDB), the base-reference frames for 2MG10 and G45 are:

# base reference frame of 2MG10
{
  "rsmd": 0.018218,
  "origin": [65.696016, 45.134944, 18.125044],  # o1
  "x_axis": [0.690346, 0.713907, -0.117302],    # x1
  "y_axis": [-0.706849, 0.700116, 0.101003],    # y1
  "z_axis": [0.154232, 0.013188, 0.987947]      # z1
}
# base reference frame of G45
{
  "rsmd": 0.025865,
  "origin": [70.584399, 50.526567, 17.229626],  # o2
  "x_axis": [0.818521, 0.49914, -0.284399],     # x2
  "y_axis": [-0.574112, 0.728382, -0.373973],   # y2
  "z_axis": [0.020486, 0.469381, 0.882758]      # z2
}

The base-pair reference frame

Since dot(z1, z2) = 0.88 (positive), this pair is of type M+N in 3DNA/DSSR. The ‘mean’ z-axis of the pair is the average of z1 and z2, which is z = [0.090069, 0.248769, 0.964366] (normalized). This is the z-axis of the bp frame, as in 3DNA/DSSR.

The ‘long’ axis employs RC8 (purines) and YC6 (pyrimidines) base atoms. Here 2MG10 and G45 are all purines, so the following two C8 atoms are used:

# C8 atoms of 2MG10 and G45 in 1ehz
HETATM  208  C8  2MG A  10      62.199  48.621  18.635  1.00 40.38           C
ATOM    987  C8    G A  45      67.772  54.149  15.386  1.00 40.45           C

The vector from C8 of G45 to C8 of 2MG10 is:

y0 = [62.199  48.621  18.635] - [67.772  54.149  15.386]
   = [-5.573  -5.528   3.249]

Normally, y0 and z-axis are not orthogonal. Here they have an angle of ~81º. The orthogonal component of y0 with reference to the z-axis, when normalized, is the y-axis:

y = [-0.676751, -0.695120, 0.242520]

The x-axis is defined by the right-handed rule:

x = [-0.730682, 0.674479, -0.105746]

Overall, the orthonormal x-, y- and z-axes of the pair defined thus far are:

x = [-0.730682, 0.674479, -0.105746]
y = [-0.676751, -0.695120, 0.242520]
z = [0.090069, 0.248769, 0.964366]

Derivation of the six ‘simple’ base-pair parameters (Fig. 2)

Fig. 2: Schematic diagram of six rigid-body base-pair parameters

Propeller is the ‘torsion’ angle of z2 to z1 with reference to the y-axis, and is calculated using the method detailed in the blog post How to calculate torsion angle?. Here Propeller is: -24.24º. Similarly, Buckle is defined as the ‘torsion’ angle of z2 to z1 with reference to the x-axis, and is -14.81º. Opening is defined as the angle from y2 to y1 with reference to the z-axis, and is: 13.32º.

The corresponding translational parameters are simply projects of the o2 to o1 vector onto the x-, y- and z-axis, respectively. Here, they have values:

d = o1 - o2 = [-4.888383, -5.391623, 0.895418]
Shear = dot(d, x) = -0.16
Stretch = dot(d, y) = 7.27
Stagger = dot(d, z) = -0.92

‘Corrections’ of Buckle and Propeller

Base-pair non-planarity is due to the following three parameters: Buckle, Propeller, and Stagger. In particular, Buckle and Propeller cause the two bases to be non-parallel, the most noticeable characteristic of a pair. These two angular parameters are well-documented in literature, even among the canonical Watson-Crick base pairs. In 3DNA/DSSR, the angle between the two base normal vectors (in range [0, 90º]) is related to Buckle and Propeller with the formula:

interBase-angle = sqrt(Buckle^2+Propeller^2)

For the 2MG10+G45 pair, the angle between z1 and z2 is 28.18º, and sqrt(Buckle^2+Propeller^2) = 28.405º. So the following ‘corrections’ are made:

Buckle = -14.81 * 28.18 / 28.405 = -14.69
Propeller = -24.24 * 28.18 / 28.405 = -24.05

Overall, the ‘corrections’ have only small influence on the numerical values of the reported Buckle and Propeller parameters. It is ‘sensible’ that the ‘simple’ parameters have the property interBase-angle = sqrt(Buckle^2+Propeller^2), just as the original 3DNA/DSSR bp parameters.

Now, the six ‘simple’ bp parameters for 2MG10+G45, reported in 3DNA analyze program as of v2.3-2016jan01 are:

Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
*    1 g+G      -0.16      7.27     -0.92    -14.69    -24.05     13.32   28.2

The corresponding local bp parameters as originally reported by 3DNA/DSSR are as follows. Note the significant differences in Shear vs. Stretch, and Buckle vs. Propeller in the two sets of bp parameters. On the other hand, Stagger is identical and Opening should be quite close, by definition. Due to the similarity in Stagger and Opening, DSSR only reports four ‘simple’ parameters (i.e., Shear, Stretch, Buckle, and Propeller).

Local base-pair parameters
     bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening
    1 g+G      -7.21     -0.97     -0.92     25.58    -11.83     13.07

Base-pair U6–A67 (Watson-Crick U–A, of type `M–N`, see Fig. 3)

Fig. 3: Base pair U6–A67 (U–A) of type M–N in yeast phenylalanine tRNA 1ehz

Base reference frames

In the original coordinate system (as in 1ehz.pdb downloaded from the RCSB PDB), the base-reference frames for U6 and A67 are:

# base reference frame of U6 (white in Fig. 3)
{
  "rsmd": 0.010835,
  "origin": [60.441988, 48.83479, 41.242523],  # o1
  "x_axis": [0.28491, 0.503019, 0.815965],     # x1
  "y_axis": [0.887155, -0.460753, -0.025726],  # y1
  "z_axis": [0.363018, 0.731217, -0.577529]    # z1
}
# base reference frame of A67 (colored yellow in Fig. 3)
{
  "rsmd": 0.01992,
  "origin": [60.578326, 48.823104, 41.154211], # o2
  "x_axis": [0.034097, 0.205538, 0.978055],    # x2
  "y_axis": [-0.90687, 0.417653, -0.056155],   # y2
  "z_axis": [-0.420029, -0.885054, 0.200637]   # z2
}

The base-pair reference frame

Since dot(z1, z2) = -0.92 (negative), this pair is of type M–N in 3DNA/DSSR. The y- and z-axis are thus reversed (corresponding to a 180º rotation around the x-axis) to align z2 with z1.

# base reference frame of A67, with y- and z-axes reversed
{
  "origin": [60.578326, 48.823104, 41.154211], # o2
  "x_axis": [0.034097, 0.205538, 0.978055],    # x2
  "y_axis": [0.90687, -0.417653, 0.056155],    # y2 -- reversed
  "z_axis": [0.420029, 0.885054, -0.200637]    # z2 -- reversed
}

Thereafter, the procedure is similar to the one for the M+N type above. Note here U6 is a pyrimidine, so its C6 atom is used. The final results are:

# C6 atom of U6 and C8 atom A67 in 1ehz
ATOM    132  C6    U A   6      64.926  46.497  41.084  1.00 35.72           C  
ATOM   1457  C8    A A  67      56.129  50.866  40.893  1.00 40.04           C  
#--------- 
y0 = [64.926  46.497  41.084] - [56.129  50.866  40.893]
   = [8.797  -4.369   0.191]
x = [0.160777, 0.363836, 0.917482]
y = [0.902274, -0.430972, 0.012793]
z = [0.400064, 0.825764, -0.397570]

The six ‘simple’ and original base-pair parameters

Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
     1 U-A       0.06     -0.13     -0.08     -0.59    -23.71      5.39   23.7
# ------------
Local base-pair parameters
     bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening
    1 U-A       0.06     -0.13     -0.08     -0.63    -23.71      5.50

As can be seen, for Watson-Crick pairs, the ‘simple’ and the original bp parameters are very similar.

Special notes on the ‘simple’ base-pair parameters

For the most common Watson-Crick pairs, the newly introduced ‘simple’ bp parameters match those of the original 3DNA/DSSR parameters very well (as shown by the U6–A67 pair). For non-canonical pairs, significant differences in Shear, Stretch, Buckle and Propeller are expected (as illustrated by the 2MG10+G45 pair). The differences come from the divergent definitions of the bp reference frame, which is distinct for each type of non-canonical pairs.
Only the original 3DNA/DSSR six bp parameters can be used for exact reconstruction (with the 3DNA rebuild program) of the corresponding bp geometry. The ‘simple’ bp parameters are for description only, and they could be more intuitive than the original 3DNA/DSSR counterparts. They complement, buy by no means replace, the classic “local” bp parameters. The term ‘simple’ is used to distinguish the new from the original closely related, yet quite different bp parameters.
As details for the 2MG10+G45 pair, several ad hoc decisions are made in deriving the ‘simple’ bp parameters. For example, instead of using RC8–YC6 to define the y-axis, one can also use RN9–YN1 (as did by Richardson). Each such choice will lead (slightly) different numerical values, depending on the type of the non-canonical pairs. In some cases, Buckle and Propeller could differ by several degrees. Since RC8 and YC6 atoms lie near the ‘center’ of purines and pyrimidines, they are used to define the y-axis (by default). DSSR has provisions of selecting RN9–YN1, as well as a couple of other choices, for the definition of the y-axis.
When the M+N pair is counted as N+M, Shear, Stretch, Buckle, and Propeller remain the same, but Stagger and Opening reverse their signs. For example, here are the results of 2MG10+G45 vs. G45+2MG10:

# 2MG10+G45
Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
*    1 G+g      -0.16      7.27      0.92    -14.69    -24.05    -13.32   28.2
# Reverse the order: treated as G45+2MG10
Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
*    1 g+G      -0.16      7.27     -0.92    -14.69    -24.05     13.32   28.2

When the M–N pair is counted as N–M, Stretch, Stagger, Propeller, and Opening remain the same, but Shear and Buckle reverse their signs. For example, here are the results of U6–A67 vs. A67–U6:

# U6–A67
Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
     1 U-A       0.06     -0.13     -0.08     -0.59    -23.71      5.39   23.7
# Reverse the order: treated as A67–U6
Simple base-pair parameters based on YC6-RC8 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening  angle
     1 A-U      -0.06     -0.13     -0.08      0.59    -23.71      5.39   23.7

Comment

Fitting of base reference frame

Once a nucleotide (nt) is identified, and matched to A (C, G, T, U) for the standard case or a (c, g, t, u) for a modified one, 3DNA/DSSR performs a least-squares fitting procedure to locate the base reference frame in three-dimensional space. The basic idea is very simple and widely applicable. The algorithm constitutes one of the key components of 3DNA/DSSR. As always, the details can be most effectively illustrated with a worked example. Using G1 in the yeast phenylalanine tRNA (PDB id: 1ehz) as an example, the atomic coordinates of its nine base-ring atoms are:

# G1, nine base-ring atoms for ls-fitting
ATOM     14  N9    G A   1      51.628  45.992  53.798  1.00 93.67           N  
ATOM     15  C8    G A   1      51.064  46.007  52.547  1.00 92.60           C  
ATOM     16  N7    G A   1      51.379  44.966  51.831  1.00 91.19           N  
ATOM     17  C5    G A   1      52.197  44.218  52.658  1.00 91.47           C  
ATOM     18  C6    G A   1      52.848  42.992  52.425  1.00 90.68           C  
ATOM     20  N1    G A   1      53.588  42.588  53.534  1.00 90.71           N  
ATOM     21  C2    G A   1      53.685  43.282  54.716  1.00 91.21           C  
ATOM     23  N3    G A   1      53.077  44.429  54.946  1.00 91.92           N  
ATOM     24  C4    G A   1      52.356  44.836  53.879  1.00 92.62           C

The corresponding nine base-ring atoms of G in its standard base reference frame are listed below. See Table 1 of the report A Standard Reference Frame for the Description of Nucleic Acid Base-pair Geometry, and file Atomic_G.pdb distributed with 3DNA ($X3DNA/config/Atomic_G.pdb). In DSSR, the content has been integrated into the source code to make the program self-contained.

# G in standard base reference frame
ATOM      2  N9    G A   1      -1.289   4.551   0.000
ATOM      3  C8    G A   1       0.023   4.962   0.000
ATOM      4  N7    G A   1       0.870   3.969   0.000
ATOM      5  C5    G A   1       0.071   2.833   0.000
ATOM      6  C6    G A   1       0.424   1.460   0.000
ATOM      8  N1    G A   1      -0.700   0.641   0.000
ATOM      9  C2    G A   1      -1.999   1.087   0.000
ATOM     11  N3    G A   1      -2.342   2.364   0.001
ATOM     12  C4    G A   1      -1.265   3.177   0.000

A least-squares fitting of the standard onto the experimental set of base-ring atoms defines the base reference frame (Fig. 1). The information is available via the following commands:

# find_pair -s 1ehz.pdb # in file 'ref_frames.dat'
...     1 G   # A:...1_:[..G]G
   53.7571    41.8678    52.9303  # origin
   -0.2589    -0.2496    -0.9331  # x-axis
   -0.5430     0.8365    -0.0731  # y-axis
    0.7988     0.4878    -0.3521  # z-axis
# --------
# x3dna-dssr -i=1ehz.pdb --json | jq .nts[0].frame
{
  rsmd: 0.008,
  origin: [53.757, 41.868, 52.93],
  x_axis: [-0.259, -0.25, -0.933],
  y_axis: [-0.543, 0.837, -0.073],
  z_axis: [0.799, 0.488, -0.352]
}

Fig. 1: G1 in tRNA 1ehz, with base reference frame attached

Please note the following subtle points:

The standard base (Atomic_G.pdb) is already set in its reference frame: the _z_-coordinates are virtually zeros, _y_-coordinates are positive, the atoms along the minor-groove edge have negative _x_-coordinates, as can be visualized clearly from the attached coordinate frame. In 3DNA, the five standard standard bases are in stored in files Atomic_[ACGTU].pdb, and the corresponding modified ones are in Atomic_[acgtu].pdb. For simplicity, Atomic_A.pdb and Atomic_a.pdb are the same by default, as are the other four cases.

The translation and rotation of the least-squares fitting process define the experimental base reference frame (for G1 in the above example), and its three axes are orthonormal by definition.

By design, the base rings of Atomic_A.pdb and Atomic_G.pdb match each other closely (see below), as are the pyrimidines bases. The least-square fitted root-mean-square deviation (rmsd) of the nine base-ring atoms between standard A and G is only 0.04 Å. Fitting the standard A (instead of G) onto G1 of 1ehz leads to a base reference frame that is essentially indistinguishable from the one above (see below). This feature shows that any ambiguity in assigning modified purines to A or G, or pyrimidines to C, T, or U causes no notable differences in 3DNA/DSSR results.

Comparison of base-ring atomic coordinates in standard G and A
          Atomic_G.pdb                         Atomic_A.pdb
N9 G   -1.289   4.551   0.000   |   N9  A   -1.291   4.498   0.000
C8 G    0.023   4.962   0.000   |   C8  A    0.024   4.897   0.000
N7 G    0.870   3.969   0.000   |   N7  A    0.877   3.902   0.000
C5 G    0.071   2.833   0.000   |   C5  A    0.071   2.771   0.000
C6 G    0.424   1.460   0.000   |   C6  A    0.369   1.398   0.000
N1 G   -0.700   0.641   0.000   |   N1  A   -0.668   0.532   0.000
C2 G   -1.999   1.087   0.000   |   C2  A   -1.912   1.023   0.000
N3 G   -2.342   2.364   0.001   |   N3  A   -2.320   2.290   0.000
C4 G   -1.265   3.177   0.000   |   C4  A   -1.267   3.124   0.000

Comparison of G1 (1ehz) base reference frame derived using standard G or A
             Atomic_G.pdb                |             Atomic_A.pdb
 53.7571    41.8678    52.9303  # origin | 53.7286    41.9276    52.9482  # origin
 -0.2589    -0.2496    -0.9331  # x-axis | -0.2562    -0.2540    -0.9327  # x-axis
 -0.5430     0.8365    -0.0731  # y-axis | -0.5444     0.8352    -0.0780  # y-axis
  0.7988     0.4878    -0.3521  # z-axis |  0.7988     0.4878    -0.3522  # z-axis

Automatic identification of nucleotides

Any analysis of nucleic acid structures start with the identification of nucleotides (nts), the basic building unit. As per the PDB convention, each nt (like any other ligands) is specified by a three-letter identifier. For example, the four standard RNA nts are ..A, ..C, ..G, and ..U, respectively. The four corresponding standard DNA nts are .DA, .DC, .DG, and .DT, respectively. Note that here, for visualization purpose, each space is represented by a dot (.). In practice, the following codes for the five standard DNA/RNA nts — ADE, CYT, GUA, THY, and URA — are also commonly encountered, among other variants.

On top of the standard nts, there are numerous modified ones, each assigned a unique three-letter code. In the classic yeast phenylalanine tRNA (PDB id: 1ehz), 14 out of the 76 nts are modified, as shown in Fig. 1 below.

Fig. 1: Modified nucleotides in yeast phenylalanine tRNA 1ehz

It is challenging to maintain a comprehensive and updated list of ever-inceasing nts encountered in the PDB and molecular dynamics (MD) simulation packages (e.g., AMBER, GROMACS, and CHARMM). Thus, as of today, some well-known DNA/RNA structural bioinformatics tools can handle only standard nts or a limited list of modified ones.

From early on in the development of 3DNA, I observed that all recognized nts have a core six-membered ring, with atoms named N1,C2,N3,C4,C5,C6 consecutively (see Fig. 2 below). Purines have three additional atoms, named N7,C8,N9. So it is feasible to automatically identify nts, and classify them as pyrimidines and purines, based on the common core skeleton shared by all of them. Moreover, the ‘skeleton’ is not effected by any possible tautomeric or protonation state.

Fig. 2: Identification of nts in 3DNA/DSSR based on atomic names and planar geometry

Early versions of 3DNA employed only three atoms (N1, C2 and C6) and three distances to decide a nt. Purines were further discriminated by the N9 atom, and the N1–N9 distance. While developing DSSR, I revised the nt-identification algorithm by using a least-squares fitting procedure that makes use of all available base ring atoms instead of selected ones. The same new algorithm has also been adapted into the find_pair/analyze etc programs in 3DNA, as of v2.2.

As always, the idea can be best illustrated with a worked example. Guanine in its standard base reference frame, with the following list of nine ring atoms coordinates, is chosen for the least-squares fitting. See file Atomic_G.pdb in the 3DNA distribution, and also Table 1 of the report A Standard Reference Frame for the Description of Nucleic Acid Base-pair Geometry.

ATOM      2  N9    G A   1      -1.289   4.551   0.000
ATOM      3  C8    G A   1       0.023   4.962   0.000
ATOM      4  N7    G A   1       0.870   3.969   0.000
ATOM      5  C5    G A   1       0.071   2.833   0.000
ATOM      6  C6    G A   1       0.424   1.460   0.000
ATOM      8  N1    G A   1      -0.700   0.641   0.000
ATOM      9  C2    G A   1      -1.999   1.087   0.000
ATOM     11  N3    G A   1      -2.342   2.364   0.001
ATOM     12  C4    G A   1      -1.265   3.177   0.000

By using a ls-fitting procedure, only (any) three atoms are needed. We no longer need to make explicit selection, as we did previously (N1,C2,C6 and N9), thus allowing for possible modification on these atoms.

Using four nts (G1, 2MG10, H2U16, and PSU39, see Fig. 1 above top) of 1ehz as examples, the following list gives the atomic coordinates of base ring atoms, and root-mean-squres devisions (rmsd) of the least-squares fit. Of course, when performing least-squares fitting, the names of corresponding atoms must match (note the different ordering of atoms for H2U and PSU in the list vs the above standard G reference).

#G1, rmsd=0.008
ATOM     14  N9    G A   1      51.628  45.992  53.798  1.00 93.67           N  
ATOM     15  C8    G A   1      51.064  46.007  52.547  1.00 92.60           C  
ATOM     16  N7    G A   1      51.379  44.966  51.831  1.00 91.19           N  
ATOM     17  C5    G A   1      52.197  44.218  52.658  1.00 91.47           C  
ATOM     18  C6    G A   1      52.848  42.992  52.425  1.00 90.68           C  
ATOM     20  N1    G A   1      53.588  42.588  53.534  1.00 90.71           N  
ATOM     21  C2    G A   1      53.685  43.282  54.716  1.00 91.21           C  
ATOM     23  N3    G A   1      53.077  44.429  54.946  1.00 91.92           N  
ATOM     24  C4    G A   1      52.356  44.836  53.879  1.00 92.62           C

#2MG10, rmsd=0.018
HETATM  207  N9  2MG A  10      61.581  47.402  18.752  1.00 42.14           N  
HETATM  208  C8  2MG A  10      62.199  48.621  18.635  1.00 40.38           C  
HETATM  209  N7  2MG A  10      63.494  48.534  18.422  1.00 40.70           N  
HETATM  210  C5  2MG A  10      63.745  47.167  18.395  1.00 43.82           C  
HETATM  211  C6  2MG A  10      64.965  46.449  18.205  1.00 43.45           C  
HETATM  213  N1  2MG A  10      64.767  45.086  18.293  1.00 44.71           N  
HETATM  214  C2  2MG A  10      63.541  44.482  18.486  1.00 47.21           C  
HETATM  217  N3  2MG A  10      62.411  45.125  18.614  1.00 45.85           N  
HETATM  218  C4  2MG A  10      62.574  46.451  18.582  1.00 43.27           C

#H2U16, rmsd=0.188
HETATM  336  N1  H2U A  16      77.347  53.323  34.582  1.00 91.19           N  
HETATM  337  C2  H2U A  16      76.119  52.865  34.160  1.00 92.39           C  
HETATM  339  N3  H2U A  16      75.123  52.894  35.107  1.00 93.28           N  
HETATM  340  C4  H2U A  16      75.289  52.711  36.458  1.00 93.34           C  
HETATM  342  C5  H2U A  16      76.696  52.479  36.909  1.00 93.77           C  
HETATM  343  C6  H2U A  16      77.717  53.238  36.039  1.00 93.22           C

#PSU39, rmsd=0.004
HETATM  845  N1  PSU A  39      74.080  36.066   5.459  1.00 75.82           N  
HETATM  846  C2  PSU A  39      74.415  36.835   4.354  1.00 75.59           C  
HETATM  847  N3  PSU A  39      75.735  36.769   3.984  1.00 76.29           N  
HETATM  848  C4  PSU A  39      76.728  36.038   4.591  1.00 77.28           C  
HETATM  849  C5  PSU A  39      76.307  35.280   5.732  1.00 77.93           C  
HETATM  850  C6  PSU A  39      75.025  35.316   6.112  1.00 76.07           C

As noted in the DSSR paper, the rmsd is normally <0.1 Å since base rings are rigid. To account for experimental error and special non-planar cases, such as H2U in 1ehz, the default rmsd cutoff is set to 0.28 Å by default.

With the above detailed algorithm, DSSR (and the 3DNA find_pair/analyze programs) can automatically identify virtually all ‘recognizable’ nts in the PDB. A survey performed in June 2015 detected 630 different types of modified nucleotides in the PDB.

It is worth noting the following points:

The choice of standard G instead of A as the reference base has no impact on the results. As a matter of fact, the rmsd between G and A is only 0.04 Å. Note also the generous default cutoff of 0.28 Å.
The method obviously depends on proper naming of the ring atoms. Specially, the base ring atoms must be named N1,C2,N3,C4,C5,C6 consecutively, with purines having three additional atoms named N7,C8,N9. Thus, under this scheme, TPP (thiamine diphosphate) would not be recognized as a nt by default, simply because of the extra prime (′) of atoms in the six-membered ring. In nucleic acid structures, the prime symbol is normally associated with atoms of the sugar moiety (e.g., the C5′ atom).

Fig. 3: TPP (thiamine diphosphate) would not be recognized as a nt.

On the other hand, nt cofactors in an otherwise ‘pure’ protein structure will also be recognized. One example is the two AMP (adenosine monophosphate) ligands in PDB entry 12as. This extra identification of nts does no harm in such cases. As shown in the analysis of the SAM-I riboswitch in the DSSR paper, taking the SAM ligand as a nt in base triplet recognition is a neat feature.

Once a nucleotide has been identified and classified into purines and pyrimidines, exocyclic atoms can be used for further assignment: O6 or N2 distinguishes guanine from adenine, N4 separates cytosine from thymine and uracil, and C7 (or C5M, the methyl group) differentiates thymine from uracil. For some modified nts, the distinctions within purines or pyrimidines may not be that obvious. For example, inosine may be taken as a modified guanine or adenine. However, this ambiguity does not pose any significant effect on the calculated base-pair parameters.

In DSSR and 3DNA, each identified nt is assigned a one-letter shorthand code: the standard ..A, .DA, and ADE (among a few other common variations) is shortened to upper-case A, and similarly for C, G, T, and U. Modified nts, on the other hand, are shortened to their corresponding lower-case symbol. For example, modified guanine such as 2MG and M2G in the yeast phenylalanine tRNA (see Fig. 1 above) is assigned g. So in 3DNA/DSSR output, the upper and lower cases of bases (e.g., nts=3 gCG A.2MG10,A.C25,A.G45) convey special meanings.

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)

References

Cartoon-block representation of quadruplex-duplex interface

Integrating DSSR into Jmol and PyMOL

Key features of DSSR

DSSR and Jmol

DSSR and PyMOL

Notes

Characterization of base-pair geometry

Six rigid-body parameters

Cartoon-block representations

Scripts and data files (Lu-CCN-examples.tar.gz)

Related posts

DSSR base blocks in PyMOL, interactively

Notes

3DNA Forum is spam free

State the rules clearly in the “Registration Agreement”

Take advantage of anti-spam software

Verify and approve ‘suspect’ accounts quickly

Ask reproducible questions, publicly

Details on the simple base-pair parameters

Pair 2MG10+G45 (g+G, of type `M+N`, see Fig. 1)

Base reference frames

The base-pair reference frame

Derivation of the six ‘simple’ base-pair parameters (Fig. 2)

‘Corrections’ of Buckle and Propeller

Base-pair U6–A67 (Watson-Crick U–A, of type `M–N`, see Fig. 3)

Base reference frames

The base-pair reference frame

The six ‘simple’ and original base-pair parameters

Special notes on the ‘simple’ base-pair parameters

Related posts

Fitting of base reference frame

Please note the following subtle points:

Related topics:

Automatic identification of nucleotides

It is worth noting the following points:

Related topics:

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids(An NIGMS National Resource supported by NIH grant R24GM153869)

References

Key features of DSSR

DSSR and Jmol

DSSR and PyMOL

Notes

Six rigid-body parameters

Cartoon-block representations

Scripts and data files (Lu-CCN-examples.tar.gz)

Related posts

Notes

State the rules clearly in the “Registration Agreement”

Take advantage of anti-spam software

Verify and approve ‘suspect’ accounts quickly

Pair 2MG10+G45 (g+G, of type M+N, see Fig. 1)

Base reference frames

The base-pair reference frame

Derivation of the six ‘simple’ base-pair parameters (Fig. 2)

‘Corrections’ of Buckle and Propeller

Base-pair U6–A67 (Watson-Crick U–A, of type M–N, see Fig. 3)

Base reference frames

The base-pair reference frame

The six ‘simple’ and original base-pair parameters

Special notes on the ‘simple’ base-pair parameters

Related posts

Please note the following subtle points:

Related topics:

It is worth noting the following points:

Related topics:

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)

Pair 2MG10+G45 (g+G, of type `M+N`, see Fig. 1)

Base-pair U6–A67 (Watson-Crick U–A, of type `M–N`, see Fig. 3)