X3DNA-DSSR Homepage -- Nucleic Acid Structures

Ask reproducible questions, publicly

In recent years, reproducibility of ‘scientific’ publications has become quite a topic. See a recent essay Five selfish reasons to work reproducibly by Markowetz in Genome Biology (2015, 16:274). There are numerous reasons why reproducibility could become an issue at all in science. What I have continuously strived for in my scientific career, however, is to ensure that my published results are reproducible. As a concrete example, I created a dedicated section titled DSSR-NAR paper on the 3DNA Forum that provides full details (scripts and data files) so that any interested parties can rigorously reproduce the results reported in the DSSR Nucleic Acids Research (NAR) paper.

In my support of 3DNA for over a decade, the #1 issue I experienced is undoubtedly vague (non-reproducible) questions. For example, I have recently been asked via email why the 3DNA find_pair/analyze programs miss “some basepair … even though it is in the pdb file”. Without access to the PDB file to reproduce the problem, however, I cannot provide a concrete answer. In an effect to prevent ambiguous questions, I made the following explicit point in the “Registration Agreement” of the 3DNA Forum (no. 2 on the list):

Be specific with your questions; provide a minimal, reproducible example if possible; use attachments where appropriate.

The #2 issue is receiving 3DNA-related questions privately instead of on the intended public 3DNA Forum. I turned off “personal messaging” to receive private messages on the Forum long time ago, yet I have kept receiving questions via emails. In several locations on the 3DNA Forum, I have made this ‘public-question’ policy crystal clear:

Ask your questions in the public 3DNA forum instead of sending xiangjun emails or personal messages. (no. 1 on the ‘Registration Agreement’)

Please be aware that for the benefit of the 3DNA-user community at large, I do not provide private email/personal message support; the forum has been created specifically for open discussions of all 3DNA-related issues. In other words, any 3DNA-associated questions are welcome and should be directed here. Presumably I’ve made the message simple and clear enough to get across without further explanation. (in ‘Site announcements » Download instructions’ and ‘Downloads » 3DNA download’)

In response to the many 3DNA-related questions that still keep coming via email, I created the following entry of Canned Responses in gmail:

Thanks for your interest in using 3DNA. Please be aware that for the benefit of the 3DNA-user community at large, I do not provide private email support; the 3DNA Forum (http://forum.x3dna.org/) has been created specifically for open discussions of all 3DNA-related issues. In other words, any 3DNA-associated questions are welcome and should be directed there. I monitor the forum regularly and respond to posts promptly.

I look forward to seeing you on the 3DNA Forum (http://forum.x3dna.org/).

Overall, I’ve learned from experience that addressing reproducible questions publicly does the best for the 3DNA community. Users can register with personal (free) email address, and post simulated data to illustrate the problem at hand. Moreover, questions on the Forum have always received quick responses. Over time, the Forum has served as an archive that everyone can benefit from.

Comment [2]

'Simple' parameters for non-Watson-Crick base pairs

As of v2.3-2016jan01, the 3DNA analyze program outputs a list of new ‘simple’ base-pair and step parameters, by default. Shown below is a sample output for PDB entry 1xvk. This echinomycin-(GCGTACGC)2 complex has a single DNA strand as the asymmetric unit. 3DNA needs the the biological unit (1xvk.pdb1) to analyze the duplex (with the -symm option). This structure contains two Hoogsteen base pairs, and has popped up on the 3DNA Forum for the zero or negative Rise values. Note that the ‘simple’ Rise values are all positive; for the middle (#4) TA/TA step, it is now 3.09 Å instead of 0.

# find_pair -symm 1xvk.pdb1 1xvk.bps
# analyze -symm 1xvk.bps
#   OR by combing the above two commands:
# find_pair -symm 1xvk.pdb1 | analyze -symm
# The output is in file '1xvk.out'
This structure contains 4 non-Watson-Crick (with leading *) base pair(s)
----------------------------------------------------------------------------
Simple base-pair parameters based on RC8--YC6 vectors
      bp        Shear    Stretch   Stagger    Buckle  Propeller  Opening
*    1 G+C      -3.07      1.55     -0.35     -6.98      0.29     67.33
     2 C-G       0.27     -0.17      0.35    -22.34      3.33     -2.80
     3 G-C      -0.39     -0.17      0.41     22.91      1.81     -2.73
*    4 T+A      -3.29      1.56      0.31     -8.03      1.59    -70.46
*    5 A+T      -3.29      1.56     -0.31     -8.03      1.59     70.46
     6 C-G       0.39     -0.17      0.41    -22.91      1.81     -2.72
     7 G-C      -0.27     -0.17      0.35     22.34      3.32     -2.80
*    8 C+G      -3.07      1.55      0.35     -6.98      0.30    -67.33
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       ave.     -1.59      0.69      0.19     -3.75      1.75     -1.38
       s.d.      1.72      0.92      0.32     17.57      1.15     52.11
----------------------------------------------------------------------------
Simple base-pair step parameters based on consecutive C1'-C1' vectors
      step       Shift     Slide      Rise      Tilt      Roll     Twist
*    1 GC/GC     -0.55      0.39      7.41      6.40     -4.22     23.36
     2 CG/CG     -0.05      0.87      2.44     -0.55      3.94     -0.81
*    3 GT/AC      0.38      0.47      7.23     -8.62      3.75     25.70
*    4 TA/TA     -0.00      4.73      3.09     -0.00      7.49     25.67
*    5 AC/GT     -0.38      0.47      7.23      8.62      3.75     25.70
     6 CG/CG      0.05      0.87      2.44      0.55      3.94     -0.82
*    7 GC/GC      0.55      0.39      7.41     -6.40     -4.22     23.36
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        ave.     -0.00      1.17      5.32     -0.00      2.06     17.45
        s.d.      0.39      1.59      2.50      6.21      4.49     12.52

The simple parameters are ‘intuitive’ for non-Watson-Crick base pairs and associated base-pair steps, where the existing standard-reference-frame-based 3DNA parameters may look weird. Note that these simple parameters are for structural description only, not to be fed into the ‘rebuild’ program. Overall, they complement the rigorous characterization of base-pair geometry, as demonstrated by the original analyze/rebuild pair of programs in 3DNA.

In short, the ‘simple’ base-pair parameters employ the YC6—RC8 vector as the y-axis whereas the ‘simple’ step parameters use consecutive C1’—C1’ vectors. As before, the z-axis is the average of two base normals, taking consideration of the M–N vs M+N base-pair classification. In essence, the ‘simple’ parameters make geometrical sense by introducing an ad hoc base-pair reference frame in each case. More details will be provided in a series of blog posts shortly.

Overall, this new section of ‘simple’ parameters should be taken as experimental. The output can be turned off by specifying the analyze -simple=false command-line option explicitly. As always, I greatly appreciate your feedback.

Comment

Quality control of DSSR (3DNA) source code

Over the years, I have played quite a few computer programming languages. ANSI C has become my top choice for ‘serious’ software projects, due to its small size, efficiency, flexibility, and ubiquitous support. Moreover, C is a mature language, with a rich ecosystem. As it turns out, C has also been consistently rated as one of the most popular computer languages (#1 or #2) over the past thirty years.

Needless to say, ANSI C has its own quirks, and it takes a steep learning curve. However, once you get over the hurdles, the language serves you. I cannot remember when, but it has been a long while that coding in ANSI C is no longer an issue. It is the understanding of scientific questions that takes most of my time, and coding helps greatly in refining my thoughts.

Not surprisingly, ANSI C was chosen as the sole language for DSSR (and SNAP, or 3DNA in general). The ensure the overall quality of the DSSR codebase, I have taken the following steps:

The whole project is under git.
The ANSI C source code is compiled with strict GCC options for full compliance to the standard:

-ansi -pedantic -W -Wall -Wextra -Wunused -Wshadow -Werror -O3

The executable is checked with valgrind for any memory leak:

valgrind --leak-check=full x3dna-dssr -i=1ehz.pdb -o=1ehz.out --quiet
==19624== Memcheck, a memory error detector
==19624== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==19624== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==19624== Command: x3dna-dssr -i=1ehz.pdb -o=1ehz.out --quiet
==19624==
==19624==
==19624== HEAP SUMMARY:
==19624==     in use at exit: 0 bytes in 0 blocks
==19624==   total heap usage: 52,829 allocs, 52,829 frees, 92,878,578 bytes allocated
==19624==
==19624== All heap blocks were freed -- no leaks are possible
==19624==
==19624== For counts of detected and suppressed errors, rerun with: -v
==19624== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Extensive tests (with the simple diff command) to ensure the program is working as expected.

The above four measures combined allow me to add new features, refactor the code, and fix bugs, without worrying about accidentally breaking existing functionality. Reading literature (including citations to 3DNA/DSSR) and responding to user feedback on the 3DNA Forum keep me continuously improve DSSR. Some of the recent refinements to DSSR came about this way.

Comment

Metallo-base pairs can be identified by DSSR

Recently, I became aware of the metallo-base pairs, such as T-Hg-T (PDB id: 4l24) and C-Ag-C (5ay2) from the work of Kondo et al (Pubmed: 24478025 and 26448329). As of v1.4.3-2015oct23, DSSR can detect such metallo-bps automatically, as shown below:

# x3dna-dssr -i=4l24.pdb -o=4l24.out
List of 12 base pairs
      nt1            nt2           bp  name        Saenger    LW  DSSR
   1 A.DC1          B.DG24         C-G WC          19-XIX    cWW  cW-W
   2 A.DG2          B.DC23         G-C WC          19-XIX    cWW  cW-W
   3 A.DC3          B.DG22         C-G WC          19-XIX    cWW  cW-W
   4 A.DG4          B.DC21         G-C WC          19-XIX    cWW  cW-W
   5 A.DA5          B.DT20         A-T WC          20-XX     cWW  cW-W
   6 A.DT6          B.DT19         T-T Metal       n/a       cWW  cW-W
   7 A.DT7          B.DT18         T-T Metal       n/a       cWW  cW-W
   8 A.DT8          B.DA17         T-A WC          20-XX     cWW  cW-W
   9 A.DC9          B.DG16         C-G WC          19-XIX    cWW  cW-W
  10 A.DG10         B.DC15         G-C WC          19-XIX    cWW  cW-W
  11 A.DC11         B.DG14         C-G WC          19-XIX    cWW  cW-W
  12 A.DG12         B.DC13         G-C WC          19-XIX    cWW  cW-W

and

# x3dna-dssr -i=5ay2.pdb -o=5ay2.out
List of 24 base pairs
      nt1            nt2           bp  name        Saenger    LW  DSSR
   1 A.G1           B.C12          G-C WC          19-XIX    cWW  cW-W
   2 A.G2           B.C11          G-C WC          19-XIX    cWW  cW-W
   3 A.A3           B.U10          A-U WC          20-XX     cWW  cW-W
   4 A.C4           B.C9           C-C Metal       n/a       cWW  cW-W
   5 A.U5           B.A8           U-A WC          20-XX     cWW  cW-W
   6 A.CBR6         B.G7           c-G WC          19-XIX    cWW  cW-W
   7 A.G7           B.CBR6         G-c WC          19-XIX    cWW  cW-W
   8 A.A8           B.U5           A-U WC          20-XX     cWW  cW-W
   9 A.C9           B.C4           C-C Metal       n/a       cWW  cW-W
  10 A.U10          B.A3           U-A WC          20-XX     cWW  cW-W
  11 A.C11          B.G2           C-G WC          19-XIX    cWW  cW-W
  12 A.C12          B.G1           C-G WC          19-XIX    cWW  cW-W
  13 C.G1           D.C12          G-C WC          19-XIX    cWW  cW-W
  14 C.G2           D.C11          G-C WC          19-XIX    cWW  cW-W
  15 C.A3           D.U10          A-U WC          20-XX     cWW  cW-W
  16 C.C4           D.C9           C-C Metal       n/a       cWW  cW-W
  17 C.U5           D.A8           U-A WC          20-XX     cWW  cW-W
  18 C.CBR6         D.G7           c-G WC          19-XIX    cWW  cW-W
  19 C.G7           D.CBR6         G-c WC          19-XIX    cWW  cW-W
  20 C.A8           D.U5           A-U WC          20-XX     cWW  cW-W
  21 C.C9           D.C4           C-C Metal       n/a       cWW  cW-W
  22 C.U10          D.A3           U-A WC          20-XX     cWW  cW-W
  23 C.C11          D.G2           C-G WC          19-XIX    cWW  cW-W
  24 C.C12          D.G1           C-G WC          19-XIX    cWW  cW-W

Note the name “Metal” for the metallo-bps. Moreover, the corresponding entries in the ‘dssr-pairs.pdb’ file also include the metal ions, as shown below:

Metallo T-Hg-T base pair (PDB id: 4l24) Metallo C-Ag-C base pair (PDB id: 5ay2)

It is worth noting that in a metallo-bp, the metal ion lies approximately in the bp plane. Moreover, it is in the middle of the two bases, which would otherwise not form a pair in the conventional sense.

Comment

Analyzing DNA/RNA structures with Curves+ and 3DNA

Curves+ and 3DNA are currently the most widely used programs for analyzing nucleic acid structures (predominantly double helices). As noted in my blog post, Curves+ vs 3DNA, these two programs also complement each other in terms of features. It thus makes sense to run both to get a better understanding of the DNA/RNA structures one is interested in.

Indeed, over the past few years, I have seen quite a few articles citing both 3DNA and Curves+. Listed below are three recent examples:

Structures and energetics of four adjacent G·U pairs that stabilize an RNA helix by Gu et al. (2015) in J Phys Chem B.

The helical parameters were measured with 3DNA³³ and Curves+.³⁴ The local helical parameters are defined with regard to base steps and without regard to a global axis.

5-Formylcytosine alters the structure of the DNA double helix by Raiber et al. (2015) in Nat Struct Mol Biol.

Structure analysis. Helix, base and base pair parameters were calculated with 3DNA or curve+ software packages^23,24.

Structural insights into the effects of 2′-5′ linkages on the RNA duplex by Sheng et al. (2014) in Proc Natl Acad Sci USA.

The major global difference between the native and mixed backbone structures is that the RNA backbone is compressed or kinked in strands containing the modified linkage (Fig. 3 B and C, by CURVES) (30). … To compare the three RNA structures at a more detailed and local level, we calculated the base pair helical and step parameters for all three structures using the 3DNA software tools (31) (Fig. 4 and Table S2). [In the Results section]

For each snapshot, the structural parameters—including six base pair parameters, six local base pair step parameters, and pseudorotation angles for each nucleotide—were calculated using 3DNA (31). The two terminal base pairs are omitted for the 3DNA analysis, because they unwind frequently in the triple 2′-5′-linked duplex. [In the Materials and Methods section]

Reading through these papers, however, it is not clear to me if the authors took advantage of the find_pair -curves+ option in 3DNA, as detailed in Building a bridge between Curves+ and 3DNA. Hopefully, this post will help draw more attention to this connection between Curves+ and 3DNA.

Comment

Parsing DSSR json output

JSON (JavaScript Object Notation) is a simple human-readable format that expresses data objects in name-value pairs. Over the years, it has surpassed XML to become the preferred data exchange format between applications. As a result, I’ve recently added the --json command-line option to DSSR to make its numerous derived parameters easily accessible.

The DSSR JSON output is contained in a compact one-line text string that may look cryptic to the uninitiated. Yet, with commonly available JSON parsers or libraries, it is straightforward to make sense of the DSSR JSON output. In this blogpost, I am illustrating how to parse DSSR-derived .json file via two command-line tools, jq and Underscore-CLI.

jq — lightweight and flexible command-line JSON processor

According to its website,

jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

Moreover, like DSSR per se, “jq is written in portable C, and it has zero runtime dependencies.” Prebuilt binaries are available for Linux, OS X and Windows. So it is trivial to get jq up and running. The current stable version is 1.5, released on August 15, 2015.

Using the crystal structure of yeast phenylalanine tRNA (1ehz) as an example, here are some sample usages with DSSR-derived JSON output:

    # Pretty print JSON
x3dna-dssr -i=1ehz.pdb --json | jq .
    # Extract the top-level keys, in insertion order 
x3dna-dssr -i=1ehz.pdb --json | jq keys_unsorted
    # Extract parameters for nucleotides
x3dna-dssr -i=1ehz.pdb --json | jq .nts
    # Extract nucleotide id and its base reference frame
x3dna-dssr -i=1ehz.pdb --json | jq '.nts[] | (.nt_id, .frame)'

Underscore-CLI — command-line utility-belt for hacking JSON and Javascript.

Underscore-CLI is built upon Node.js, and can be installed using the npm package manager. It is claimed as ‘the “swiss-army-knife” tool for processing JSON data – can be used as a simple pretty-printer, or as a full-powered Javascript command-line.’

Following the above examples illustrating jq, here are the corresponding commands for Underscore-CLI:

x3dna-dssr -i=1ehz.pdb --json | underscore print --color
x3dna-dssr -i=1ehz.pdb --json | underscore keys --color
x3dna-dssr -i=1ehz.pdb --json | underscore select .nts --color
x3dna-dssr -i=1ehz.pdb --json | underscore select .nts | underscore select '.nt_id, .frame' --color

jq or Underscore-CLI — which one to use?

As always, it depends. While jq feels more like a standard Unix utility (as sed, awk, grep etc), Underscore-CLI is better integrated into the Javascript language. For simple applications such as parsing DSSR output, either jq or Underscore-CLI is more than sufficient.

I use jq most of the time, but resort to Underscore-CLI for its “smart whitespace”. Here is an example to illustrate the difference between the two:

# z-axis of A.G1 (1ehz) base reference frame
# jq output, split in 5 lines
    "z_axis": [
      0.799,
      0.488,
      -0.352
    ]
# Underscore-CLI, in a more-readable one line
    "z_axis": [0.799, 0.488, -0.352]

Comment

Updates on x3dna.org

From early on, the x3dna.org domain and its related sub-domains (e.g., for the forum and the web-interface to DSSR) has been served via shared hosting. By and large, this simple arrangement has worked quite well. Over the years, though, I’ve gradually realized some of its inherent limitations. One is the limited resources available to the 3DNA-related websites. Another is the accessibility issue from countries like China.

To remedy such issues, I’ve recently moved the 3DNA Forum and the web-interface to DSSR to a dedicated web server at Columbia University. Moreover, a duplicate copy of the 3DNA homepage is made available via http://home.x3dna.org hosted at Columbia. The three new websites have been verified to be accessible directly from China.

These updates on x3dna.org not only ensure global accessibility to 3DNA/DSSR, but also allow for more web services to be made available.

Comment

Output of reference frames in DSSR JSON output

As of v1.3.3-2015sep03, DSSR outputs the reference frame of any base or base-pair (bp). With an explicit list of such reference frames, one can better understand how the 3DNA/DSSR bp parameters are calculated. Moreover, third-party bioinformatics tools can take advantage of the frames for further exploration of nucleic acid structures, including visualization.

Let’s use the G1–C72 bp (detailed below) in the yeast phenylalanine tRNA (1ehz) as an example:

1 A.G1           A.C72          G-C WC          19-XIX    cWW  cW-W

The standard base reference frame for A.G1 is:

{
  rsmd: 0.008,
  origin: [53.757, 41.868, 52.93],
  x_axis: [-0.259, -0.25, -0.933],
  y_axis: [-0.543, 0.837, -0.073],
  z_axis: [0.799, 0.488, -0.352]
}

And the one for A.C72 is:

{
  rsmd: 0.006,
  origin: [53.779, 42.132, 52.224],
  x_axis: [-0.402, -0.311, -0.861],
  y_axis: [0.451, -0.886, 0.109],
  z_axis: [-0.797, -0.345, 0.497]
}

The G1–C72 bp reference frame is:

{
  rsmd: null,
  origin: [53.768, 42, 52.577],
  x_axis: [-0.331, -0.283, -0.9],
  y_axis: [-0.497, 0.863, -0.089],
  z_axis: [0.802, 0.418, -0.427]
}

The beauty of the DSSR JSON output is that the above information can be extracted on the fly. For example, the following commands extract the above frames:

x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.G1") | .frame'
x3dna-dssr -i=1ehz.pdb --json | jq '.ntParams[] | select(.nt_id=="A.C72") | .frame'
x3dna-dssr -i=1ehz.pdb --json --more | jq .pairs[0].frame

Note that in JSON, the array is 0-indexed, so the first bp (G1–C72) has an index of 0. In addition to jq, I also used underscore to pretty-print the frames.

Comment

Conformation of the sugar ring in nucleic acid structures

The conformation of the five-membered sugar ring in DNA/RNA structures can be characterized using the five corresponding endocyclic torsion angles (shown below).

v0: C4'-O4'-C1'-C2'
v1: O4'-C1'-C2'-C3'
v2: C1'-C2'-C3'-C4'
v3: C2'-C3'-C4'-O4'
v4: C3'-C4'-O4'-C1'

On account of the five-member ring constraint, the conformation can be characterized approximately by 5 - 3 = 2 parameters. Using the concept of pseudorotation of the sugar ring, the two parameters are the amplitude (τ_m) and phase angle (P, in the range of 0° to 360°).

One set of widely used formula to convert the five torsion angles to the pseudorotation parameters is due to Altona & Sundaralingam (1972): “Conformational Analysis of the Sugar Ring in Nucleosides and Nucleotides. A New Description Using the Concept of Pseudorotation” [J. Am. Chem. Soc., 94(23), pp 8205–8212]. As always, the concept is best illustrated with an example. Here I use the sugar ring of G4 (chain A) of the Dickerson-Drew dodecamer (1bna), with Matlab/Octave code:

# xyz coordinates of the sugar ring: G4 (chain A), 1bna
ATOM     63  C4'  DG A   4      21.393  16.960  18.505  1.00 53.00
ATOM     64  O4'  DG A   4      20.353  17.952  18.496  1.00 38.79
ATOM     65  C3'  DG A   4      21.264  16.229  17.176  1.00 56.72
ATOM     67  C2'  DG A   4      20.793  17.368  16.288  1.00 40.81
ATOM     68  C1'  DG A   4      19.716  17.901  17.218  1.00 30.52

# endocyclic torsion angles:
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4
Pconst = sin(pi/5) + sin(pi/2.5)  # 1.5388
P0 = atan2(v4 + v1 - v3 - v0, 2.0 * v2 * Pconst);  # 2.9034
tm = v2 / cos(P0);  # amplitude: 48.469
P = 180/pi * P0;  # phase angle: 166.35 [P + 360 if P0 < 0]

The Altona & Sundaralingam (1972) pseudorotation parameters are what have been adopted in 3DNA, following the NewHelix program of Dr. Dickerson. The Curves+ program, on the other hand, uses another (newer) set of formula due to Westhof & Sundaralingam (1983): “A Method for the Analysis of Puckering Disorder in Five-Membered Rings: The Relative Mobilities of Furanose and Proline Rings and Their Effects on Polynucleotide and Polypeptide Backbone Flexibility” [J. Am. Chem. Soc., 105(4), pp 970–976]. The two sets of formula, by Altona & Sundaralingam (1972) and Westhof & Sundaralingam (1983), give slightly different numerical values for the two pseudorotation parameters (τ_m and P).

Since 3DNA and Curves+ are currently two of the most widely used programs for conformational analysis of nucleic acid structures, the subtle differences in pseudorotation parameters may cause confusions for users who use (or are familiar with) both programs. Over the past few years, I have indeed received such questions via email.

With the same G4 (chain A, 1bna) sugar ring, here is the Matlab/Octave script showing how Curve+ calculates the pseudorotation parameters:

# xyz coordinates of sugar ring G4 (chain A, 1bna)

# endocyclic torsion angles, same as above
v0 = -26.7; v1 = 46.3; v2 = -47.1; v3 = 33.4; v4 = -4.4

v = [v2, v3, v4, v0, v1]; # reorder them into vector v[]
A = 0; B = 0;
for i = 1:5
    t = 0.8 * pi * (i - 1);
    A += v(i) * cos(t);
    B += v(i) * sin(t);
end
A *= 0.4;   # -48.476
B *= -0.4;  # 11.516

tm = sqrt(A * A + B * B);  # 49.825

c = A/tm; s = B/tm;
P = atan2(s, c) * 180 / pi;  # 166.64

For this specific example, i.e., the sugar ring of G4 (chain A, 1bna), the pseudorotation parameters as calculated by 3DNA per Altona & Sundaralingam (1972) and Curves+ per Westhof & Sundaralingam (1983) are as follows:

           amplitude        phase angle
3DNA        48.469             166.35
Curves+     49.825             166.64

Needless to say, the differences are subtle, and few people will notice/bother at all. For those who do care about such little details, however, hopefully this post will help you understand where the differences actually come from.

For consistency with the 3DNA output, DSSR (by default) also follows the Altona & Sundaralingam (1972) definitions of sugar pseudorotation. Nevertheless, DSSR also contains an undocumented option, --sugar-pucker=westhof83, to output τ_m and P according to the Westhof & Sundaralingam (1983) definitions.

Each sugar is assigned into one of the following ten puckering modes, by dividing the phase angle (P, in the range of 0° to 360°) into 36° ranges reach.

C3'-endo, C4'-exo,   O4'-endo, C1'-exo,  C2'-endo,
C3'-exo,  C4'-endo,  O4'-exo,  C1'-endo, C2'-exo

For sugars in nucleic acid structures, C3’-endo [0°, 36°) and C2’-endo [144°, 180°) are predominant. The former corresponds to sugars in ‘canonical’ RNA or A-form DNA, and the latter in sugars of standard B-form DNA. In reality, RNA structures as deposited in the PDB could also contain C2′-endo sugars. One significant example is the GpU dinucleotide platforms, where the 5′-ribose sugar (G) is in the C2′-endo form and the 3′-sugar (U) in the C3′-endo form — see my blog post, titled ‘Is the O2′(G)…O2P H-bond in GpU platforms real?’.

Notes:

This post is based on my 2011-06-11 blog post with the same title.
While visiting Lyon in July 2014, I had the opportunity to hear Dr. Lavery’s opinion on adopting the Westhof & Sundaralingam (1983) sugar-pucker definitions in Curves+. I learned that the new formula are more robust in rare, extreme cases of sugar conformation than the 1972 variants. After all, Dr. Sundaralingam is a co-author on both papers. It is possible that in future releases of DSSR, the new 1983 formula for sugar pucker would become the default.

Comment

« Older · Newer »

Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids(An NIGMS National Resource supported by NIH grant R24GM153869)

jq — lightweight and flexible command-line JSON processor

Underscore-CLI — command-line utility-belt for hacking JSON and Javascript.

jq or Underscore-CLI — which one to use?

X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids
(An NIGMS National Resource supported by NIH grant R24GM153869)