X3DNA-DSSR Homepage -- Nucleic Acid Structures

Cover images provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

See the 2020 paper titled "DSSR-enabled innovative schematics of 3D nucleic acid structures with PyMOL" in Nucleic Acids Research and the corresponding Supplemental PDF for details. Many thanks to Drs. Wilma Olson and Cathy Lawson for their help in the preparation of the illustrations.

Details on how to reproduce the cover images are available on the 3DNA Forum.

November 2025

Structure of the human minor spliceosome pre-B complex (PDB id: 8Y7E; Bai R, Yuan M, Zhang P, Luo T, Shi Y, Wan R. 2024. Structural basis of U12-type intron engagement by the fully assembled human minor spliceosome. Science 383: 1245–1252). The protein–RNA assembly reveals the mechanisms of recognition and recruitment of several small nuclear ribonucleoproteins (snRNPs) involved in the splicing of U12-type introns. The pre-mRNA is depicted by a red ribbon, and the U12 small nuclear RNA (snRNA) by a green ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the proteins are shown as gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

October 2025

Human tRNA splicing endonuclease (TSEN) complex bound to pre-tRNAArg (PDB id: 7UXA; Hayne CK, Butay KJ, Stewart ZD, Krahn JM, Perera L, Williams JG, Petrovitch RM, Deterding LJ, Matera AG, Borgnia MJ, Stanley RE. 2023. Structural basis for pre-tRNA recognition and processing by the human tRNA splicing endonuclease complex. Nat Struct Mol Biol 30: 824–833). Cryo-EM structure of the TSEN protein assembly with pre-tRNAArg provides insights into the recognition and splicing of an intron that must be removed from the pre-tRNA before translation. The pre-tRNAArg is depicted by a red ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the TSEN subunits are shown as gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

September 2025

Systemic RNA interference defective protein 1 (SID1) in complex with dsRNA (PDB id: 8XC1; Wang R, Cong Y, Qian D, Yan C, Gong D. 2024. Structural basis for double-stranded RNA recognition by SID1. Nucleic Acids Res 52: 6718–6727). The cryo-EM structure provides a major step towards understanding the mechanism of dsRNA recognition by SID1, involving extensive interactions between basic amino-acid residues and the sugar-phosphate backbone. The dsRNA chains are depicted by red, green, blue, and yellow ribbons, with bases and Watson-Crick base pairs represented as color-coded blocks and minor-groove edges colored white: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; SID1 is shown by a gold ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

August 2025

Complex of arginyl-tRNA-protein transferase 1 (ATE1) with tRNA^Arg and a short peptide substrate (PDB id: 8UAU; Lan X, Huang W, Kim SB, Fu D, Abeywansha T, Lou J, Balamurugan U, Kwon YT, Ji CH, Taylor DJ, Zhang Y. 2024. Oligomerization and a distinct tRNA-binding loop are important regulators of human arginyl-transferase function. Nat Commun 15: 6350). The ATE1 homodimer dissociates upon binding the peptide and forms a loop that wraps around tRNA^Arg. The tRNA^Arg is depicted by a red ribbon, with bases and Watson–Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; ATE1 is shown by a gold ribbon and the peptide by a white ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

July 2025

Structure of endoribonuclease P (RNase P) in complex with pre-tRNA^His-Ser (PDB id: 8CBK; Meynier V, Hardwick SW, Catala M, Roske JJ, Oerum S, Chirgadze DY, Barraud P, Yue WW, Luisi BF, Tisné C. 2024. Structural basis for human mitochondrial tRNA maturation. Nat Commun 15: 4683). The structure reveals the first step of human mitochondrial tRNA maturation by RNase P, processing the 5′-leader of pre-tRNA. The RNA is depicted by a red ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the protein assembly is shown by the gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

June 2025

Structure of a group II intron ribonucleoprotein in the pre-ligation state (PDB id: 8T2R; Xu L, Liu T, Chung K, Pyle AM. 2023. Structural insights into intron catalysis and dynamics during splicing. Nature 624: 682–688). The pre-ligation complex of the Agathobacter rectalis group II intron reverse transcriptase/maturase with intron and 5′-exon RNAs makes it possible to construct a picture of the splicing active site. The intron is depicted by a green ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the 5′-exon is shown by white spheres and the protein by a gold ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

May 2025

Complex of terminal uridylyltransferase 7 (TUT7) with pre-miRNA and Lin28A (PDB id: 8OPT; Yi G, Ye M, Carrique L, El-Sagheer A, Brown T, Norbury CJ, Zhang P, Gilbert RJ. 2024. Structural basis for activity switching in polymerases determining the fate of let-7 pre-miRNAs. Nat Struct Mol Biol 31: 1426–1438). The RNA-binding pluripotency factor LIN28A invades and melts the RNA and affects the mechanism of action of the TUT7 enzyme. The RNA backbone is depicted by a red ribbon, with bases and Watson-Crick base pairs represented as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; TUT7 is represented by a gold ribbon and LIN28A by a white ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

April 2025

Cryo-EM structure of the pre-B complex (PDB id: 8QP8; Zhang Z, Kumar V, Dybkov O, Will CL, Zhong J, Ludwig SE, Urlaub H, Kastner B, Stark H, Lührmann R. 2024. Structural insights into the cross-exon to cross-intron spliceosome switch. Nature 630: 1012–1019). The pre-B complex is thought to be critical in the regulation of splicing reactions. Its structure suggests how the cross-exon and cross-intron spliceosome assembly pathways converge. The U4, U5, and U6 snRNA backbones are depicted respectively by blue, green, and red ribbons, with bases and Watson-Crick base pairs shown as color-coded blocks: A/A-U in red, C/C-G in yellow, G/G-C in green, U/U-A in cyan; the proteins are represented by gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

February 2025

Structure of the Hendra henipavirus (HeV) nucleoprotein (N) protein-RNA double-ring assembly (PDB id: 8C4H; Passchier TC, White JB, Maskell DP, Byrne MJ, Ranson NA, Edwards TA, Barr JN. 2024. The cryoEM structure of the Hendra henipavirus nucleoprotein reveals insights into paramyxoviral nucleocapsid architectures. Sci Rep 14: 14099). The HeV N protein adopts a bi-lobed fold, where the N- and C-terminal globular domains are bisected by an RNA binding cleft. Neighboring N proteins assemble laterally and completely encapsidate the viral genomic and antigenomic RNAs. The two RNAs are depicted by green and red ribbons. The U bases of the poly(U) model are shown as cyan blocks. Proteins are represented as semitransparent gold ribbons. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

January 2025

Structure of the helicase and C-terminal domains of Dicer-related helicase-1 (DRH-1) bound to dsRNA (PDB id: 8T5S; Consalvo CD, Aderounmu AM, Donelick HM, Aruscavage PJ, Eckert DM, Shen PS, Bass BL. 2024. Caenorhabditis elegans Dicer acts with the RIG-I-like helicase DRH-1 and RDE-4 to cleave dsRNA. eLife 13: RP93979. Cryo-EM structures of Dicer-1 in complex with DRH-1, RNAi deficient-4 (RDE-4), and dsRNA provide mechanistic insights into how these three proteins cooperate in antiviral defense. The dsRNA backbone is depicted by green and red ribbons. The U-A pairs of the poly(A)·poly(U) model are shown as long rectangular cyan blocks, with minor-groove edges colored white. The ADP ligand is represented by a red block and the protein by a gold ribbon. Cover image provided by X3DNA-DSSR, an NIGMS National Resource for structural bioinformatics of nucleic acids (R24GM153869; skmatics.x3dna.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

Moreover, the following 30 [12(2021) + 12(2022) + 6(2023)] cover images of the RNA Journal were generated by the NAKB (nakb.org).

Cover image provided by the Nucleic Acid Database (NDB)/Nucleic Acid Knowledgebase (NAKB; nakb.org). Image generated using DSSR and PyMOL (Lu XJ. 2020. Nucleic Acids Res 48: e74).

Jmol and DSSR

From the Jmol mailing list, I noticed Jmol 14.4.0 was released yesterday (October 13, 2015) by Dr. Bob Hanson. Among the development highlights is the following item:

biomolecule annotations including DSSR, RNA3D, EBI sequence domains, and PDB validation data

I am glad to see that DSSR has been integrated into Jmol, one of the most popular molecular graphics visualization programs. To enable easy access to the DSSR functionality from Jmol, I’ve set up two websites with easy-to-remember URLs: http://jmol.x3dna.org and http://jsmol.x3dna.org. They both point to the same jsmol/ folder extracted from jsmol.zip of the Jmol distribution.

In retrospect, I first met Bob at the Workshop on the PDBx/mmCIF Data Exchange Format for Structural Biology held at Rutgers University during October 21-22, 2013. I approached him during a lunch break, asking for a possible collaboration on integrating DSSR into Jmol. The name DSSR may have played a role in convincing Bob, since it matches the well-known DSSP program for proteins. In the end, we were both excited about the project, talked into details after the meeting, and continued our conversation the next morning while I drove him to the airport.

Nothing real happened until early April 2014. Once getting started, however, we moved forward rapidly: it took less then three weeks to get the first functional version ready for the community to play. See Bob’s announcement RNA/DNA Secondary Structure, anyone? in the Jmol mailing list on April 9, 2014. During this process, we communicated extensively via email, up to 30 messages per day, on technical details for better communication between the two programs. The integration works by using Jmol as a front-end, which calls a web-serivce hosted at Columbia University for DSSR analysis. Jmol’s parsing of the DSSR output is facilitated by the dedicated --jmol option.

The above preliminary, yet functional, DSSR-Jmol integration had be in service without infrastructural changes until two months ago. In August 10, 2015, Bob contacted me:

I might make a significant request though. That would be for the server to deliver all this in JSON format. This is really the way to go. It is what people want and it is perfect for Jmol as well.

I’d played around with JSON or SQLite as a structured data exchange format for quite some time, and Bob’s request finally convinced me that JSON is the (better) way to go. And that began another around of intensive collaborative work that has switched the exchange format between DSSR and Jmol from plain text output to JSON. From August 10 to September 22, we had a total of over 170-email exchanges, plus Skype. JSON has really simplified lives of both parties, especially in the long run.

Overall, collaborating with Bob has been truly an enjoyable and rewarding experience. The DSSR-Jmol integration also serves as a concrete example of what can be achieved by two dedicated minds with complementary expertise.

Comment

Analyzing DNA/RNA structures with Curves+ and 3DNA

Curves+ and 3DNA are currently the most widely used programs for analyzing nucleic acid structures (predominantly double helices). As noted in my blog post, Curves+ vs 3DNA, these two programs also complement each other in terms of features. It thus makes sense to run both to get a better understanding of the DNA/RNA structures one is interested in.

Indeed, over the past few years, I have seen quite a few articles citing both 3DNA and Curves+. Listed below are three recent examples:

Structures and energetics of four adjacent G·U pairs that stabilize an RNA helix by Gu et al. (2015) in J Phys Chem B.

The helical parameters were measured with 3DNA³³ and Curves+.³⁴ The local helical parameters are defined with regard to base steps and without regard to a global axis.

5-Formylcytosine alters the structure of the DNA double helix by Raiber et al. (2015) in Nat Struct Mol Biol.

Structure analysis. Helix, base and base pair parameters were calculated with 3DNA or curve+ software packages^23,24.

Structural insights into the effects of 2′-5′ linkages on the RNA duplex by Sheng et al. (2014) in Proc Natl Acad Sci USA.

The major global difference between the native and mixed backbone structures is that the RNA backbone is compressed or kinked in strands containing the modified linkage (Fig. 3 B and C, by CURVES) (30). … To compare the three RNA structures at a more detailed and local level, we calculated the base pair helical and step parameters for all three structures using the 3DNA software tools (31) (Fig. 4 and Table S2). [In the Results section]

For each snapshot, the structural parameters—including six base pair parameters, six local base pair step parameters, and pseudorotation angles for each nucleotide—were calculated using 3DNA (31). The two terminal base pairs are omitted for the 3DNA analysis, because they unwind frequently in the triple 2′-5′-linked duplex. [In the Materials and Methods section]

Reading through these papers, however, it is not clear to me if the authors took advantage of the find_pair -curves+ option in 3DNA, as detailed in Building a bridge between Curves+ and 3DNA. Hopefully, this post will help draw more attention to this connection between Curves+ and 3DNA.

Comment

DSSR --symmetry/--nmr options and MODEL/ENDMDL ensemble

Over the past couple of weeks, I’ve added two more DSSR options, --symmetry and --nmr, that are closely related to an ensemble of MODEL/ENDMDL-delineated structures in PDB files. However, there exist subtle differences between the two cases, and the usage of the same MODEL/ENDMDL ensemble format can be ambiguous to the uninitiated. This blog post aims to clarify the issues, using concrete examples.

The --symmetry options applies to X-ray crystal structures where an asymmetric unit represents only part of the whole biological assembly. In standard PDB format, the asymmetric unit contains instructions to produce crystallographic symmetry
related molecules.. Nevertheless, the biological assembly are also provided by the PDB (or NDB), with coordinate files ending with .pdb1 or such. For example, the PDB entry 2d94 has the single-stranded sequence GGGCGCCC in its asymmetric unit (2d94.pdb). It is the biological assembly in file 2d94.pdb1 that contains the DNA double helix.

x3dna-dssr -i=2d94.pdb # no pairs found
x3dna-dssr -i=2d94.pdb1 # still no pairs found
x3dna-dssr -i=2d94.pdb1 --symm # 8 pairs found
x3dna-dssr -i=2d94.pdb --symm # no pairs found

As shown by the above examples, DSSR by default reads only the first model even given the biological assemble file 2d94.pdb1. It is with --symmetry (abbreviated to --symm) explicitly specified that DSSR takes all models in the input biological assemble file into consideration. The last case also illustrates that DSSR does not generate crystallographic symmetry related molecules. The --symm simply informs DSSR to take all models, which already exist in the input file, into consideration.

On the other hand, the --nmr option is for auto-processing an ensemble of structures solved by solution NMR method (or trajectories of molecular dynamics simulations). The key point here is that each of the MODEL/ENDMDL-delinated structures is independent and thus can be processed separately, even though they are obviously closely related. Using the PDB entry 2n2d as an example, here are some sample usages:

x3dna-dssr -i=2n2d.pdb -o= 2n2d-first.out # only the first structure is processed
x3dna-dssr -i=2n2d.pdb --nmr -o=2n2d-all.out # all 10 structures are processed
x3dna-dssr -i=2n2d.pdb --nmr --json -o=2n2d-all.json # ibid., with output in JSON

Note that the NMR file is named 2n2d.pdb, and it contains 10 structures.

Interesting mixes show up when an X-ray biological assembly with multiple MODEL/ENDMDL entries is analyzed with --nmr, or an NMR entry is handled with --symmetry. Here are two such examples:

x3dna-dssr -i=2d94.pdb1 --nmr -o=temp # models 1 and 2 are handled sepatately
x3dna-dssr -i=2n2d.pdb --symm -o=temp # wrong -- does not make sense!

In summary, the --symmetry option is intended to treat symmetry-related molecules as a whole, as in a biological assembly of X-ray crystal structures. In contrast, the --nmr option aims to automate the analysis of each structure in a MODEL/ENDMDL-delineated ensemble, as in NMR structures or trajectories of MD simulations. The distinction between the two MODEL/ENDMDL usages is most clearly seen via a molecular visualization program: for example, check the figure below for 2d94.pdb1 (left) and 2n2d.pdb (right) when all frames are selected using Jmol.

2d94 (2 models)	2n2d (10 models)

Comment

Parsing DSSR json output

JSON (JavaScript Object Notation) is a simple human-readable format that expresses data objects in name-value pairs. Over the years, it has surpassed XML to become the preferred data exchange format between applications. As a result, I’ve recently added the --json command-line option to DSSR to make its numerous derived parameters easily accessible.

The DSSR JSON output is contained in a compact one-line text string that may look cryptic to the uninitiated. Yet, with commonly available JSON parsers or libraries, it is straightforward to make sense of the DSSR JSON output. In this blogpost, I am illustrating how to parse DSSR-derived .json file via two command-line tools, jq and Underscore-CLI.

jq — lightweight and flexible command-line JSON processor

According to its website,

jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

Moreover, like DSSR per se, “jq is written in portable C, and it has zero runtime dependencies.” Prebuilt binaries are available for Linux, OS X and Windows. So it is trivial to get jq up and running. The current stable version is 1.5, released on August 15, 2015.

Using the crystal structure of yeast phenylalanine tRNA (1ehz) as an example, here are some sample usages with DSSR-derived JSON output:

    # Pretty print JSON
x3dna-dssr -i=1ehz.pdb --json | jq .
    # Extract the top-level keys, in insertion order 
x3dna-dssr -i=1ehz.pdb --json | jq keys_unsorted
    # Extract parameters for nucleotides
x3dna-dssr -i=1ehz.pdb --json | jq .nts
    # Extract nucleotide id and its base reference frame
x3dna-dssr -i=1ehz.pdb --json | jq '.nts[] | (.nt_id, .frame)'

Underscore-CLI — command-line utility-belt for hacking JSON and Javascript.

Underscore-CLI is built upon Node.js, and can be installed using the npm package manager. It is claimed as ‘the “swiss-army-knife” tool for processing JSON data – can be used as a simple pretty-printer, or as a full-powered Javascript command-line.’

Following the above examples illustrating jq, here are the corresponding commands for Underscore-CLI:

x3dna-dssr -i=1ehz.pdb --json | underscore print --color
x3dna-dssr -i=1ehz.pdb --json | underscore keys --color
x3dna-dssr -i=1ehz.pdb --json | underscore select .nts --color
x3dna-dssr -i=1ehz.pdb --json | underscore select .nts | underscore select '.nt_id, .frame' --color

jq or Underscore-CLI — which one to use?

As always, it depends. While jq feels more like a standard Unix utility (as sed, awk, grep etc), Underscore-CLI is better integrated into the Javascript language. For simple applications such as parsing DSSR output, either jq or Underscore-CLI is more than sufficient.

I use jq most of the time, but resort to Underscore-CLI for its “smart whitespace”. Here is an example to illustrate the difference between the two:

# z-axis of A.G1 (1ehz) base reference frame
# jq output, split in 5 lines
    "z_axis": [
      0.799,
      0.488,
      -0.352
    ]
# Underscore-CLI, in a more-readable one line
    "z_axis": [0.799, 0.488, -0.352]

Comment

Simple base-pair parameters

Recently, I read with great interest an article titled A context-sensitive guide to RNA & DNA base-pair & base-stack geometry by Dr. Jane Richardson, published in CCN (Computational Crystallography Newsletter, 2015, 5, 42—49). Highlighted in the article are Buckle and Propeller twist (see bottom left of the figure below), two of the angular parameters that characterize base-pair (bp) non-planarity. Particularly, I was intrigued by the “Notes on measures and figures” at the end:

Base normals were constructed in Mage (Richardson 2001) and twist torsions and buckle angles were measured from them; propeller-twists were measured as dihedral angles around an axis between N1/9 atoms.

The Richardson CCN article prompted me to think more on intuitive description of bp geometry that can be easily grasped by experimentalist, especially X-ray crystallographers or cryo-EM practitioners. Without worrying about model building as with the six rigid-body parameters, it is straightforward to come up with a new set of four ‘simple’ parameters (Shear, Stretch, Buckle and Propeller) with the following characteristics:

Each parameter can be positive or negative. For type M–N pairs (as in the canonical cases), Shear and Buckle reverse their signs when the two bases are swapped (i.e. counted as N–M instead of M–N). In all other cases, the signs of the parameters remain unchanged. See the DSSR paper for the definition of M+N vs M–N type of pairs.
Intuitive results for non-canonical pairs, even when Opening is ~180º.
Consistent definition between Shear/Buckle (_x_-axis) vs Stretch/Propeller (_y_-axis).
As in 3DNA and DSSR, Buckle^2 + Propeller^2 = interBase_angle^2. Either Buckle or Propeller can render the two base planes of a pair non-parallel. Combined together, they introduce a non-zero inter-base angle. By definition, each parameter should not be larger than the overall inter-base angle.

With the cartoon-block representation introduced in DSSR, base-stacking interactions and bp deformations (especially Buckle and Propeller) are immediately obvious. Two example are illustrated in the figure below: one is the classic Dickerson B-DNA dodecamer (355d, DSSR output), and the other is the parallel double-stranded helix of poly(A) RNA (4jrd, DSSR output).


DSSR Output for 355d	DSSR Output for 4jrd

A portion of DSSR output for the B-DNA duplex 355d is shown below. Note that the first bp (at the bottom left in the figure above) has a Propeller of –17º (and a Buckle of +7º). As beautifully explained by Calladine et al. in their book Understanding DNA,
The Molecule & How It Works, Watson-Crick pairs prefer to have negative Propeller in right-handed DNA double helices to improve same-strand base-stacking interactions. The average value of Propeller in A- and B-DNA crystal structures is around –11º (see Table 3 of the Olson et al. standard base reference frame paper).

     nt1            nt2           bp  name        Saenger    LW  DSSR
   1 A.DC1          B.DG24         C-G WC          19-XIX    cWW  cW-W
       [-105.9(anti) ~C2'-endo lambda=53.5] [-141.3(anti) ~C3'-endo lambda=52.7]
       d(C1'-C1')=10.71 d(N1-N9)=8.96 d(C6-C8)=9.88 tor(C1'-N1-N9-C1')=-21.4
       H-bonds[3]: "O2(carbonyl)-N2(amino)[2.83],N3-N1(imino)[2.90],N4(amino)-O6(carbonyl)[2.98]"
       interBase-angle=19  Simple-bpParams: Shear=0.28 Stretch=-0.13 Buckle=7.3 Propeller=-17.2
       bp-pars: [0.28    -0.14   0.07    6.93    -17.31  -0.61]
   2 A.DG2          B.DC23         G-C WC          19-XIX    cWW  cW-W
       [-85.4(anti) ~C2'-endo lambda=53.4] [-150.3(anti) ~C3'-endo lambda=55.4]
       d(C1'-C1')=10.61 d(N1-N9)=8.92 d(C6-C8)=9.83 tor(C1'-N1-N9-C1')=-21.7
       H-bonds[3]: "O6(carbonyl)-N4(amino)[2.91],N1(imino)-N3[2.88],N2(amino)-O2(carbonyl)[2.88]"
       interBase-angle=17  Simple-bpParams: Shear=-0.24 Stretch=-0.18 Buckle=9.0 Propeller=-14.5
       bp-pars: [-0.24   -0.18   0.49    9.34    -14.30  -2.08]

A portion of DSSR output for the parallel A-DNA duplex 4jrd is shown below. Note that the values of ‘simple’ Propeller are positive for both bps #7 and #8. In contrast, the rigid-body bp parameters have their signs flipped over when Opening is switched from –179.56º for bp#7 to +179.23º for bp#8. This sign ‘ambiguity’ around 180º Opening could be confusing. Yet, all the six bp parameters must be kept as they are for rigorous rebuilding, especially within a larger context than a bp per se. From the very beginning, 3DNA has adopted the convention of keeping angular parameters in the range of [–180º, +180º] instead of [0, 360º], allowing left-handed Z-DNA to have negative twist.

   7 A.A8           B.A7           A+A --          02-II     tHH  tM+M
       [-175.8(anti) ~C3'-endo lambda=10.2] [-172.7(anti) ~C3'-endo lambda=12.6]
       d(C1'-C1')=11.15 d(N1-N9)=8.29 d(C6-C8)=6.31 tor(C1'-N1-N9-C1')=160.1
       H-bonds[4]: "OP2-N6(amino)[2.97],N7-N6(amino)[2.97],N6(amino)-OP2[2.92],N6(amino)-N7[2.91]"
       interBase-angle=14  Simple-bpParams: Shear=-7.88 Stretch=0.66 Buckle=-7.8 Propeller=11.9
       bp-pars: [-6.00   5.15    -0.02   0.63    14.22   -179.56]
   8 A.A9           B.A8           A+A --          02-II     tHH  tM+M
       [-177.4(anti) ~C3'-endo lambda=12.4] [-175.8(anti) ~C3'-endo lambda=10.3]
       d(C1'-C1')=11.01 d(N1-N9)=8.15 d(C6-C8)=6.18 tor(C1'-N1-N9-C1')=158.5
       H-bonds[4]: "OP2-N6(amino)[2.93],N7-N6(amino)[2.88],N6(amino)-OP2[2.97],N6(amino)-N7[2.92]"
       interBase-angle=15  Simple-bpParams: Shear=-7.91 Stretch=0.56 Buckle=-7.0 Propeller=13.7
       bp-pars: [6.11    -5.06   -0.05   -2.26   -15.22  179.23]

Comment

Updates on x3dna.org

From early on, the x3dna.org domain and its related sub-domains (e.g., for the forum and the web-interface to DSSR) has been served via shared hosting. By and large, this simple arrangement has worked quite well. Over the years, though, I’ve gradually realized some of its inherent limitations. One is the limited resources available to the 3DNA-related websites. Another is the accessibility issue from countries like China.

To remedy such issues, I’ve recently moved the 3DNA Forum and the web-interface to DSSR to a dedicated web server at Columbia University. Moreover, a duplicate copy of the 3DNA homepage is made available via http://home.x3dna.org hosted at Columbia. The three new websites have been verified to be accessible directly from China.

These updates on x3dna.org not only ensure global accessibility to 3DNA/DSSR, but also allow for more web services to be made available.

Comment

Output of reference frames in DSSR JSON output

As of v1.3.3-2015sep03, DSSR outputs the reference frame of any base or base-pair (bp). With an explicit list of such reference frames, one can better understand how the 3DNA/DSSR bp parameters are calculated. Moreover, third-party bioinformatics tools can take advantage of the frames for further exploration of nucleic acid structures, including visualization.

Let’s use the G1–C72 bp (detailed below) in the yeast phenylalanine tRNA (1ehz) as an example:

1 A.G1           A.C72          G-C WC          19-XIX    cWW  cW-W

The standard base reference frame for A.G1 is:

{
  rsmd: 0.008,
  origin: [53.757, 41.868, 52.93],
  x_axis: [-0.259, -0.25, -0.933],
  y_axis: [-0.543, 0.837, -0.073],
  z_axis: [0.799, 0.488, -0.352]
}

And the one for A.C72 is:

{
  rsmd: 0.006,
  origin: [53.779, 42.132, 52.224],
  x_axis: [-0.402, -0.311, -0.861],
  y_axis: [0.451, -0.886, 0.109],
  z_axis: [-0.797, -0.345, 0.497]
}

The G1–C72 bp reference frame is:

{
  rsmd: null,
  origin: [53.768, 42, 52.577],
  x_axis: [-0.331, -0.283, -0.9],
  y_axis: [-0.497, 0.863, -0.089],
  z_axis: [0.802, 0.418, -0.427]
}

The beauty of the DSSR JSON output is that the above information can be extracted on the fly. For example, the following commands extract the above frames:

x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.G1") | .frame'
x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.C72") | .frame'
x3dna-dssr -i=1ehz.pdb --json --more | jq .pairs[0].frame

Note that in JSON, the array is 0-indexed, so the first bp (G1–C72) has an index of 0. In addition to jq, I also used underscore to pretty-print the frames.

Comment

Quantifying base-pair geometry by six rigid-body parameters

Standard nitrogenous bases in DNA and RNA (A, C, G, T, and U) are aromatic compounds, each with a planar geometry. In the analyses of three-dimensional (3D) nucleic acid structures, the planar bases are normally taken as rigid bodies. The relative geometry of the two bases in base pair (bp) can then be rigorously quantified by six rigid-body parameters (see figure below). The three translations along the _x_-, _y_- and _z_-axes are termed Shear, Stretch, and Stagger, respectively. The three corresponding rotations are called Buckle, Propeller (twist), and Opening.

3DNA is unique with its coupled analyze and rebuild programs. The former calculates six bp parameters given 3D atomic coordinates (in PDB or PDBx/mmCIF format), while the later takes a set of such parameters to generate the corresponding structure. The rigor of the description can be easily verified in two equivalent ways: the close to zero root-mean-square deviation (RMSD) between the rebuilt structure and the original coordinates, after a least-squares superposition; or the identical six bp parameters when the rebuilt structure is analyzed.

As is often the case, a concrete example would make the point clear. Here I am using the reverse Hoogsteen (rHoogsteen) bp between U8 and A14 (see image below) in the yeast phenylalanine tRNA (1ehz) as an example. The PDB atomic coordinates of the U8–A14 rHoogsteen pair, excluding backbone atoms except for C1′, is stored in file 1ehz-U8-A14.pdb.

find_pair 1ehz-U8-A14.pdb stdout | analyze stdin
    # bp parameters in file '1ehz-U8-A14.out'
    # also generated 'bp_step.par' for rebuilding below
rebuild -atomic bp_step.par 1ehz-U8-A14-3DNA.pdb
    # rmsd is 0.044 Å between '1ehz-U8-A14.pdb' and '1ehz-U8-A14-3DNA.pdb'
find_pair 1ehz-U8-A14-3DNA.pdb stdout | analyze stdin
    # bp parameters of the rebuilt structure in '1ehz-U8-A14-3DNA.out'
rebuild -atomic bp_step.par 1ehz-U8-A14-3DNA-new.pdb
    # rmsd is 0 Å between '1ehz-U8-A14-3DNA.pdb' and '1ehz-U8-A14-3DNA-new.pdb'

Note that the above commands should be performed in order, since the file bp_step.par is overwritten after each analyze run. For your verification, here are the links to the five files:

The 0.044 Å rmsd between the original PDB coordinates in 1ehz-U8-A14.pdb and the 3DNA rebuilt structure in 1ehz-U8-A14-3DNA.pdb is due to the slight non-planarity of experimental bases. The rmsd is 0 between the two rounds of 3DNA rebuilt structures, 1ehz-U8-A14-3DNA.pdb and 1ehz-U8-A14-3DNA-new.pdb, as expected.

The bp parameters in 1ehz-U8-A14.out and 1ehz-U8-A14-3DNA.out are identical, as expected, and they are shown below.

Local base-pair parameters
     bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening
    1 U-A       4.14     -1.91      0.77     -4.62     12.12   -103.09

Running DSSR on 1ehz-U8-A14.pdb gives the following results. Note that the six bp parameters (last row prefixed with bp-pars) are the exactly same as in 3DNA — we are consistent.

# x3dna-dssr -i=1ehz-U8-A14.pdb --more
List of 1 base pair
      nt1            nt2           bp  name        Saenger    LW  DSSR
   1 A.U8           A.A14          U-A rHoogsteen  24-XXIV   tWH  tW-M
       [n/a(n/a) ---- lambda=28.3] [n/a(n/a) ---- lambda=21.5]
       d(C1'-C1')=9.63 d(N1-N9)=7.06 d(C6-C8)=6.00 tor(C1'-N1-N9-C1')=174.4
       H-bonds[2]: "O2(carbonyl)-N6(amino)[3.00],N3(imino)-N7[2.74]"
       interBase-angle=12.97  Simple-bpParams: Shear=4.28 Stretch=1.55 Buckle=-11.8 Propeller=5.4
       bp-pars: [4.14    -1.91   0.77    -4.62   12.12   -103.09]

As mentioned in the recent DSSR paper:

As in 3DNA (6,7), DSSR takes advantage of the six standard base-pair parameters––three translations (Shear, Stretch, Stagger) and three rotations (Buckle, Propeller, Opening)––to quantify the relative spatial position and orientation of any two interacting bases rigorously. Among the six parameters, only Shear, Stretch, and Opening are critical for characterizing different types of pairs. Buckle, Propeller and Stagger, on the other hand, describe the nonplanarity of a given pair (6). By virtue of the definition of the standard base reference frame, Shear, Stretch, and Opening are all close to zero for Watson-Crick pairs. Moreover, every other type of pair has a set of characteristic parameters. For example, the wobble G–U pair is characterized by an average Shear of –2.2 Å, and the Hoogsteen A+U pair is distinguished by a Stretch of approximately –3.5 Å and an Opening of near 66º.

In a follow-up post, I will talk about the “simple” bp parameters (Simple-bpParams in the above DSSR output list) recently introduced into DSSR — stay tuned!

Comment

DSSR output in JSON format

As of DSSR v1.3.0-2015aug27, the --json option is available for producing analysis results that is strictly compliant with the JSON data exchange format. The JSON file contains numerous DSSR-derived structural features, including those in the default main output, backbone torsions in dssr-torsions.txt, and a detailed list of hydrogen bonds.

According to the official JSON website:

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language… JSON is a text format that is completely language independent… These properties make JSON an ideal data-interchange language.

Indeed, the JSON output file makes DSSR readily accessible for integration with other bioinformatics tools or normal usages from the command line. Using the classic yeast phenylalanine tRNA 1ehz as an example (1ehz.pdb), let’s go over some simple use-cases. Note that the following examples take advantage of jq, a lightweight and flexible command-line JSON processor.

x3dna-dssr -i=1ehz.pdb --json -o=1ehz-dssr.json
jq . 1ehz-dssr.json  # reformatted for pretty output
x3dna-dssr -i=1ehz.pdb --json | jq .  # the above 2 steps combined

With 1ehz-dssr.json in hand, we can easily extract DSSR-derived structural features of interest:

jq .pairs 1ehz-dssr.json   # list of 34 pairs
jq .multiplets 1ehz-dssr.json  # list of 4 base triplets
jq .hbonds 1ehz-dssr.json  # list of hydrogen bonds
jq .helices 1ehz-dssr.json
jq .stems 1ehz-dssr.json
  # list of nucleotide parameters, including torsion angles and suites
jq .ntParams 1ehz-dssr.json
  # list of 14 modified nucleotides
jq '.ntParams[] | select(.is_modified)' 1ehz-dssr.json
  # select nucleotide id, delta torsion, sugar puckering and cluster of suite name
jq '.ntParams[] | {nt_id, delta, puckering, cluster}' 1ehz-dssr.json
  # same selection as above, but in 'Comma Separated Values' format
jq -r '.ntParams[] | [.nt_id, .delta, .puckering, .cluster] | @csv' 1ehz-dssr.json

Here is the result of running jq (v1.5) to select multiplets:

# jq .multiplets 1ehz-dssr.json
[
  {
    "index": 1,
    "num_nts": 3,
    "nts_short": "UAA",
    "nts_long": "A.U8,A.A14,A.A21"
  },
  {
    "index": 2,
    "num_nts": 3,
    "nts_short": "AUA",
    "nts_long": "A.A9,A.U12,A.A23"
  },
  {
    "index": 3,
    "num_nts": 3,
    "nts_short": "gCG",
    "nts_long": "A.2MG10,A.C25,A.G45"
  },
  {
    "index": 4,
    "num_nts": 3,
    "nts_short": "CGg",
    "nts_long": "A.C13,A.G22,A.7MG46"
  }
]

With the JSON file, DSSR can now be connected with the bioinformatics community in a ‘structured’ way, with a clearly delineated boundary. Now I can enjoy the freedom of refining the default main output format, without worrying too much about breaking third-party parsers. Moreover, I no longer need to write an adapter for each integration of DSSR with other tools. So nice!

For your reference, here is the output file 1ehz-dssr.json. It may be possible that the identifiers (names) of the JSON output will be refined in the next few iterations. I welcome your comments to make the DSSR-derived JSON better suite your needs.

Comment

« Older · Newer »

Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids(An NIGMS National Resource supported by NIH grant R24GM153869)

jq — lightweight and flexible command-line JSON processor

Underscore-CLI — command-line utility-belt for hacking JSON and Javascript.

jq or Underscore-CLI — which one to use?

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)