It gives me great pleasure to announce that the 3DNA/DSSR project is now funded by the NIH R24GM153869 grant, titled "X3DNA-DSSR: a resource for structural bioinformatics of nucleic acids". I am deeply grateful for the opportunity to continue working on a project that has basically defined who I am. It was a tough time during the funding gap over the past few years. Nevertheless, I have experienced and learned a lot, and witnessed miracles enabled by enthusiastic users.
Since late 2020, when I lost my R01 grant, DSSR has been licensed through Columbia Technology Ventures (CTV). I appreciate the numerous users (including big pharma) who purchased a DSSR Pro License or a paid DSSR Basic License. Thanks to the NIH R24GM153869 grant, we are pleased to provide DSSR Basic free of charge to the academic community. Academic users may submit a license request for DSSR Basic or DSSR Pro by clicking "Express Licensing" on the CTV landing page. Commercial users may inquire about pricing and licensing terms by emailing techtransfer@columbia.edu, copying xiangjun@x3dna.org.
The current version of DSSR is v2.4.5-2024sep24, which contains miscellaneous bug fixes (e.g., chain id with > 4 chars) and minor improvements. This release synchronizes with the new R24 funding, which will bring the project to the next level. All existing users are encouraged to upgrade their installation.
Lots of exciting things will happen for the project. The first thing is to make DSSR freely accessible to the academic community. In the past couple of weeks, CTV has already issued quite a few DSSR Basic Academic licenses to users from all over the world. So the demand is high, and it will become stronger as more academic users become aware of DSSR. I'm closely monitoring the 3DNA Forum, and am always ready to answer users' questions.
I am committed to making DSSR a brand that stands for quality and value. By virtue of its unmatched functionality, usability, and support, DSSR saves users a substantial amount of time and effort when compared to other options. My track record throughout the years has unambiguously demonstrated my dedication to this solid software product.
DSSR Basic contains all features described in the three DSSR-related papers, and includes the originally separate SNAP program (still unpublished) for analyzing DNA/RNA-protein complexes. The Pro version integrates the classic 3DNA functionality, plus advanced modeling routines, with email/Zoom/phone support.
From the very beginning, 3DNA has calculated a set of nucleic acid backbone parameters, including the six main chain torsion angles (α, β, γ, δ, ε, and ζ) around the covalent bonds, χ about the glycosidic bond, and the sugar pucker (see figure below). For double helical structures, the standard analyze output (.out file) has a section for “Main chain and chi torsion angles,” and another dedicated to “Sugar conformational parameters”. Based on my experience/understanding, these two parts are well recognized and utilized by 3DNA users. What has received little attention (in spite of the several posts I’ve written on the topic), though, is 3DNA’s applicability to single-stranded (ss) RNA structures for the backbone torsions, among other parameters. Using the fully refined crystal structure of the Haloarcula marismortui large ribosomal subunit (PDB entry 1jj2) as an example, the procedure is as follows:
find_pair -s 1jj2.pdb 1jj2.nts
analyze 1jj2.nts
# or the above two steps can be combined:
find_pair -s 1jj2.pdb stdout | analyze stdin
# see output file '1jj2.outs'
In retrospect, the fact that 3DNA has been little used for RNA backbone conformational analysis is no surprise:
- While base-pair parameters have different (oftentimes confusing) definitions, these backbone parameters are pretty “standard” — thus, for example, any program for DNA/RNA structural analysis would give the same numerical values for α or χ torsion angles.
- The two-step process as illustrated above is a bit awkward, and the torsions are “buried” among many other parameters.
- 3DNA is more directly “linked” (conceivably) to DNA base pairs than to RNA backbone.
So while adapting the Zp parameter for ss DNA/RNA structures in 3DNA v2.1, I also took the opportunity to add the -torsion option to analyze with the following handy features:
- Streamline the calculation by starting directly from a PDB file and outputting only backbone parameters. So the above example can be shortened to analyze -t=1jj2.tor 1jj2.pdb; the output file is named 1jj2.tor.
- Classify backbone into BI/BII conformation, and base χ into syn / anti.
- Add pseudo-torsions, and Zp and Dp as defined by Richardson et al.
- Handle pseudouridine sensibly, and also work for nucleic acid structures with only backbone atoms.
- Be easy to use, efficient and robust — it takes ~1 second to process the large ribosomal subunit 1jj2 (with 2876 nucleotides consisting of 23S rRNA and 5S rRNA) on my MacBook Air.
Overall, analyze -torsion is designed to be pragmatic and allows for automatic processing of all NDB entries or molecular dynamics trajectories. Given below is an excerpt of the three sections from an analyze -torsion run on 1jj2:
****************************************************************************
Main chain and chi torsion angles:
Note: alpha: O3'(i-1)-P-O5'-C5'
beta: P-O5'-C5'-C4'
gamma: O5'-C5'-C4'-C3'
delta: C5'-C4'-C3'-O3'
epsilon: C4'-C3'-O3'-P(i+1)
zeta: C3'-O3'-P(i+1)-O5'(i+1)
chi for pyrimidines(Y): O4'-C1'-N1-C2
chi for purines(R): O4'-C1'-N9-C4
If chi is in range [-90, +90], syn conformation
otherwise, it is in anti conformation
e-z: epsilon - zeta
BI: e-z = [-160, +20]
BII: e-z = [+20, +200]
base chi alpha beta gamma delta epsilon zeta e-z
1 0:..10_:[..U]U -62.5(syn) --- --- 56.2 74.0 142.2 -87.8 -130.1(BI)
2 0:..11_:[..A]A 171.5(anti) 173.2 -161.0 168.5 84.0 -112.1 -65.4 -46.7(BI)
3 0:..12_:[..U]U -167.7(anti) -70.7 168.4 53.0 78.5 -128.5 -46.4 -82.1(BI)
4 0:..13_:[..G]G -172.5(anti) -61.8 170.4 67.7 73.5 -166.7 -79.6 -87.1(BI)
5 0:..14_:[..C]C -166.0(anti) -73.0 -172.5 55.1 83.2 -143.3 -77.7 -65.6(BI)
6 0:..15_:[..C]C -155.5(anti) -60.9 174.1 47.3 80.3 -154.4 -71.2 -83.2(BI)
****************************************************************************
Pseudo (virtual) eta/theta torsion angles:
Note: eta: C4'(i-1)-P(i)-C4'(i)-P(i+1)
theta: P(i)-C4'(i)-P(i+1)-C4'(i+1)
eta': C1'(i-1)-P(i)-C1'(i)-P(i+1)
theta': P(i)-C1'(i)-P(i+1)-C1'(i+1)
eta": Borg(i-1)-P(i)-Borg(i)-P(i+1)
theta": P(i)-Borg(i)-P(i+1)-Borg(i+1)
base eta theta eta' theta' eta" theta"
1 0:..10_:[..U]U --- --- --- --- --- ---
2 0:..11_:[..A]A -174.6 -129.7 177.0 -127.7 -157.5 -75.5
3 0:..12_:[..U]U 149.1 -105.1 174.1 -101.2 -111.0 -69.4
4 0:..13_:[..G]G 169.0 -172.5 -156.6 -169.2 -93.3 -137.1
5 0:..14_:[..C]C 176.2 -143.4 179.6 -140.6 -144.6 -120.6
6 0:..15_:[..C]C 165.0 -147.7 177.4 -146.8 -149.2 -121.7
****************************************************************************
Sugar conformational parameters:
Note: v0: C4'-O4'-C1'-C2'
v1: O4'-C1'-C2'-C3'
v2: C1'-C2'-C3'-C4'
v3: C2'-C3'-C4'-O4'
v4: C3'-C4'-O4'-C1'
tm: the amplitude of pucker
P: the phase angle of pseudorotation
Zp: z-coordinate of the 3' phosphorus atom (P) expressed in the
standard base reference frame; it's POSITIVE when P is on
the +z-axis side (base in anti conformation); NEGATIVE if
P is on the -z-axis side (base in syn conformation)
Dp: perpendicular distance of the 3' P atom to the glycosydic bond
[as per the MolProbity paper of Richardson et al. (2010)]
base v0 v1 v2 v3 v4 tm P Puckering Zp Dp
1 0:..10_:[..U]U -11.3 -15.4 34.5 -41.8 33.5 41.6 33.8 C3'-endo -0.13 3.53
2 0:..11_:[..A]A 11.4 -30.2 36.9 -31.5 12.6 36.9 1.2 C3'-endo 4.74 4.78
3 0:..12_:[..U]U 3.6 -29.3 42.4 -41.3 23.8 43.6 13.9 C3'-endo 4.67 4.82
4 0:..13_:[..G]G -13.0 -17.8 39.8 -47.9 38.5 47.8 33.7 C3'-endo 4.45 4.46
5 0:..14_:[..C]C 6.0 -28.4 38.9 -36.5 19.2 39.5 10.1 C3'-endo 4.57 4.70
6 0:..15_:[..C]C 1.9 -26.4 39.6 -39.6 23.7 41.2 16.0 C3'-endo 4.32 4.61
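The syn/anti and BI/BII rules stated in the notes of the output above can be sketched in a few lines of Python. This is only an illustration of the stated ranges, not 3DNA's own code; the function names are made up for this example, and assigning the shared boundary value of exactly +20 to BI is my assumption.

```python
def wrap180(angle):
    """Map an angle in degrees onto the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def chi_class(chi):
    """'syn' if chi lies in [-90, +90], otherwise 'anti'."""
    return "syn" if -90.0 <= wrap180(chi) <= 90.0 else "anti"


def backbone_class(epsilon, zeta):
    """Return (e-z, label) with BI for e-z in [-160, +20] and BII for [+20, +200].

    After wrapping to [-180, 180), the BII range [+20, +200] becomes
    (+20, +180) plus [-180, -160). The boundary +20 itself is assigned
    to BI here (an assumption; the note lists it in both ranges).
    """
    ez = wrap180(epsilon - zeta)
    label = "BI" if -160.0 <= ez <= 20.0 else "BII"
    return ez, label


# First two nucleotides of the 1jj2 excerpt: (chi, epsilon, zeta)
for chi, eps, zeta in [(-62.5, 142.2, -87.8), (171.5, -112.1, -65.4)]:
    ez, label = backbone_class(eps, zeta)
    print(chi_class(chi), round(ez, 1), label)
```

For the first nucleotide, epsilon minus zeta is 230.0 before wrapping; the listed value of -130.1 differs in the last digit only because the table angles are already rounded.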
Given the x-, y-, and z-coordinates of four points (a-b-c-d) in three-dimensional (3D) space, how does one calculate the torsion angle? Overall, this is a well-solved problem in structural biology and chemistry; one can find a description of the torsion angle in many textbooks and online documents. The algorithm for its calculation is implemented in virtually every software package in computational structural biology and chemistry.
As basic as the concept is, however, it is important (based on my experience) to have a clear understanding of how torsion angle is defined in order to really get into the 3D world. Here is a worked example using Octave/Matlab of my simplified, geometry-based implementation of calculating torsion angle, including how to determine its sign. No theory or (complicated) mathematical formula, just a step-by-step illustration of how I solve this problem.
- Coordinates of four points are given in variable abcd:
abcd = [ 21.350 31.325 22.681
22.409 31.286 21.483
22.840 29.751 21.498
23.543 29.175 22.594 ];
- Two auxiliary functions: norm_vec() to normalize a vector; get_orth_norm_vec() to get the orthogonal component (normalized) of a vector with reference to another vector, which should have already been normalized.
function ovec = norm_vec(vec)
ovec = vec / norm(vec);
endfunction
function ovec = get_orth_norm_vec(vec, vref)
temp = vec - vref * dot(vec, vref);
ovec = norm_vec(temp);
endfunction
- Get three vectors: b_c is the normalized vector b→c; b_a_orth is the orthogonal component (normalized) of vector b→a with reference to b→c; c_d_orth is similarly defined, as the orthogonal component (normalized) of vector c→d with reference to b→c.
b_c = norm_vec(abcd(3, :) - abcd(2, :))
% [0.2703158 -0.9627257 0.0094077]
b_a_orth = get_orth_norm_vec(abcd(1, :) - abcd(2, :), b_c)
% [-0.62126 -0.16696 0.76561]
c_d_orth = get_orth_norm_vec(abcd(4, :) - abcd(3, :), b_c)
% [0.41330 0.12486 0.90199]
- Now the torsion angle is defined as the angle between the two vectors, b_a_orth and c_d_orth, and can be easily calculated by their dot product. The sign of the torsion angle is determined by the relative orientation of the cross product of the same two vectors with reference to the middle vector b→c. Here they are in opposite direction, thus the torsion angle is negative.
angle_deg = acos(dot(b_a_orth, c_d_orth)) * 180 / pi % 65.609
% named sgn (not sign) to avoid shadowing the built-in sign() function
sgn = dot(cross(b_a_orth, c_d_orth), b_c) % -0.91075
if (sgn < 0)
angle_deg = -angle_deg % -65.609
endif
A related concept is the so-called dihedral angle, or more generally the angle between two planes. As long as the normal vectors to the two corresponding planes are defined, the angle between them is easy to work out.
It’s worth noting that the helical twist angle in SCHNAaP and 3DNA is calculated similarly.
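For comparison, here is a minimal Python sketch of the plane-normal (dihedral) formulation mentioned above, applied to the same four points. It is a different route to the same number, not the step-by-step method illustrated in the Octave code: the normals of planes a-b-c and b-c-d are combined through atan2, which yields the signed angle directly. The helper names are made up for this example.

```python
import math


def sub(u, v):
    return [u[i] - v[i] for i in range(3)]


def dot(u, v):
    return sum(u[i] * v[i] for i in range(3))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def torsion(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)  # normal to the plane a-b-c
    n2 = cross(b2, b3)  # normal to the plane b-c-d
    b2_len = math.sqrt(dot(b2, b2))
    y = dot(cross(n1, n2), b2) / b2_len  # proportional to sin(torsion)
    x = dot(n1, n2)                      # proportional to cos(torsion)
    return math.degrees(math.atan2(y, x))


abcd = [[21.350, 31.325, 22.681],
        [22.409, 31.286, 21.483],
        [22.840, 29.751, 21.498],
        [23.543, 29.175, 22.594]]

print(round(torsion(*abcd), 3))
```

The result should agree with the -65.609 obtained above. Using atan2 avoids the separate sign test and is numerically better behaved than acos when the angle is close to 0° or 180°.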
Backbone conformation of nucleic acid structures is most commonly characterized by a set of six torsion angles (α, β, γ, δ, ε, and ζ) around the consecutive chemical bonds, chi (χ) quantifying the relative base/sugar orientation, plus the sugar pucker.
This large number of DNA/RNA backbone conformational parameters is in striking contrast to the two torsion angles (φ and ψ) in protein structures, routinely employed in the Ramachandran plot. Over the years, the nucleic acid community has come up with simplified ways to represent DNA/RNA backbone conformation. Thus far, the most widely used representation is the pair of pseudo-torsion angles (see figure below) η: C4′(i-1)-P(i)-C4′(i)-P(i+1) and θ: P(i)-C4′(i)-P(i+1)-C4′(i+1).
The history of the P–C4′ virtual-bond concept and its application in RNA structure analysis have recently been reviewed by Pyle et al. in A new way to see RNA [Q Rev Biophys. 2011, 44(4), 433–466], where the following three contributions are highlighted:
- Olson (1980). Configurational statistics of polynucleotide chains. An updated virtual bond model to treat effects of base stacking. Macromolecules, 13(3), 721–728.
- Malathi & Yathindra (1980). A novel virtual bond scheme to probe ordered and random coil conformations of nucleic acids: Configurational statistics of polynucleotide chains. Current Science, 49, 803–807.
- Duarte & Pyle (1998). Stepping through an RNA structure: A novel approach to conformational analysis. Journal of Molecular Biology, 284, 1465–1478.
More recently, Pyle et al. also employed a modified version of the pseudo-torsions, η′: C1′(i-1)-P(i)-C1′(i)-P(i+1) and θ′: P(i)-C1′(i)-P(i+1)-C1′(i+1), i.e., using C1′ instead of C4′, and found that:
The η′ and θ′ torsions are more suitable when interpreting crystallographic density because the C1′ atom is covalently bound to the nucleoside base and therefore can be more easily and accurately located within a low-resolution map.
While implementing the -torsion option to analyze to make it more explicit that 3DNA readily calculates conventional backbone torsion angles, I also took the opportunity to add the pseudo-torsion angles (η/θ and η′/θ′), among other new parameters. Moreover, while I was at it, I could not help but also compute yet another set of pseudo-torsion angles: η″/θ″. Here, instead of C1′ or C4′, the origin of the base reference frame is employed; it can be taken as a pseudo-atom more accurately defined by the base plane than any real single atom.
The usefulness of η″/θ″, especially in comparison with η/θ and η′/θ′, remains to be determined. However, only η″/θ″ takes advantage of the two most accurately determined entities in a nucleic acid structure, the heavy phosphorus atom and the rigid base plane [see discussion (p.16) in the Richardson et al. MolProbity paper, Acta Cryst. (2010). D66, 12–21]. Presumably, η″/θ″ provides a new perspective in RNA structural analysis by combining the backbone and the base.
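All three flavors of pseudo-torsions share the same pattern and differ only in the "pivot" point used per nucleotide (C4′, C1′, or the base origin). The Python sketch below illustrates that pattern. It is not 3DNA's code: the function names are made up, the coordinates are synthetic (chosen so that one angle is exactly 90°), and it assumes every nucleotide has both a P atom and a pivot point.

```python
import math


def torsion(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees (atan2 formulation)."""
    def sub(u, v):
        return [u[i] - v[i] for i in range(3)]

    def dot(u, v):
        return sum(u[i] * v[i] for i in range(3))

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0]]

    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(cross(n1, n2), b2) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, dot(n1, n2)))


def pseudo_torsions(p_atoms, pivots):
    """eta(i):   pivot(i-1)-P(i)-pivot(i)-P(i+1)
       theta(i): P(i)-pivot(i)-P(i+1)-pivot(i+1)
    Undefined angles at the chain ends are returned as None."""
    n = len(p_atoms)
    eta, theta = [None] * n, [None] * n
    for i in range(n - 1):
        theta[i] = torsion(p_atoms[i], pivots[i], p_atoms[i + 1], pivots[i + 1])
        if i > 0:
            eta[i] = torsion(pivots[i - 1], p_atoms[i], pivots[i], p_atoms[i + 1])
    return eta, theta


# Synthetic three-nucleotide chain (NOT real structural data)
p_atoms = [[1, 0, 0], [0, 0, 1], [1, 1, 2]]
pivots = [[0, 0, 0], [0, 1, 1], [1, 2, 3]]
eta, theta = pseudo_torsions(p_atoms, pivots)
print(eta)
print(theta)
```

Passing C4′ coordinates as pivots gives η/θ, C1′ gives η′/θ′, and base-frame origins give η″/θ″. In a real structure a 5′-terminal nucleotide may lack a P atom, in which case additional angles become undefined.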
Here are the pseudo-torsions for the yeast phenylalanine transfer RNA (PDB entry 6tna), obtained by simply running analyze -torsion=6tna.tor 6tna.pdb:
Pseudo (virtual) eta/theta torsion angles:
Note: eta: C4'(i-1)-P(i)-C4'(i)-P(i+1)
theta: P(i)-C4'(i)-P(i+1)-C4'(i+1)
eta': C1'(i-1)-P(i)-C1'(i)-P(i+1)
theta': P(i)-C1'(i)-P(i+1)-C1'(i+1)
eta": Borg(i-1)-P(i)-Borg(i)-P(i+1)
theta": P(i)-Borg(i)-P(i+1)-Borg(i+1)
base eta theta eta' theta' eta" theta"
1 A:...1_:[..G]G --- -126.6 --- -141.5 --- -130.4
2 A:...2_:[..C]C 167.8 -168.3 174.6 -152.5 -151.4 -115.4
3 A:...3_:[..G]G 160.4 -119.8 -171.9 -138.9 -123.6 -119.2
4 A:...4_:[..G]G 148.0 -164.2 162.1 -159.2 -154.4 -124.6
5 A:...5_:[..A]A 168.7 -137.6 -175.9 -137.8 -129.5 -115.0
6 A:...6_:[..U]U 171.8 -145.7 -172.5 -140.5 -131.3 -124.7
7 A:...7_:[..U]U -151.0 -47.8 -136.0 -58.6 -117.7 -30.2
8 A:...8_:[..U]U 160.9 159.7 -161.0 -163.6 -144.2 178.0
9 A:...9_:[..A]A -137.0 -48.6 -158.1 -108.9 161.5 -104.7
10 A:..10_:[2MG]g 33.1 -135.8 93.4 -134.6 134.1 -113.0
11 A:..11_:[..C]C 167.2 -138.3 -179.4 -137.7 -142.4 -118.7
12 A:..12_:[..U]U 165.5 -120.7 -179.3 -128.0 -145.8 -106.7
13 A:..13_:[..C]C 174.1 -173.6 -165.5 179.6 -120.9 -180.0
14 A:..14_:[..A]A 173.0 -144.0 172.7 -132.4 177.6 -72.7
15 A:..15_:[..G]G 154.7 110.6 -176.2 85.5 -97.7 -76.9
16 A:..16_:[H2U]u 76.3 94.1 65.3 119.7 -152.8 -123.8
17 A:..17_:[H2U]u -36.7 -79.6 -50.7 -136.6 -142.7 -159.0
18 A:..18_:[..G]G -9.7 -166.8 41.7 -158.6 28.9 -120.4
19 A:..19_:[..G]G -131.6 -35.8 -122.9 -67.8 -104.3 -10.5
20 A:..20_:[..G]G 160.9 -93.2 -161.6 -98.9 -174.1 -112.3
21 A:..21_:[..A]A -83.6 152.5 -72.8 155.7 -59.1 155.4
22 A:..22_:[..G]G 164.1 169.4 160.0 -178.5 159.1 -157.6
23 A:..23_:[..A]A 177.6 -148.5 -174.5 -142.7 -154.5 -114.3
24 A:..24_:[..G]G 167.2 -98.9 -171.7 -128.6 -127.6 -99.1
25 A:..25_:[..C]C 151.6 -153.5 167.3 -140.8 -137.7 -84.8
26 A:..26_:[M2G]g 156.2 -137.4 -175.2 -135.2 -100.0 -104.2
27 A:..27_:[..C]C 166.2 -145.5 -177.9 -140.4 -129.1 -116.8
28 A:..28_:[..C]C 164.7 -140.5 175.8 -145.3 -152.7 -123.4
29 A:..29_:[..A]A 161.2 -145.3 175.7 -144.9 -142.0 -126.0
30 A:..30_:[..G]G -173.5 -120.3 -158.4 -133.2 -126.6 -94.4
31 A:..31_:[..A]A 169.8 -153.1 177.7 -140.4 -124.5 -81.5
32 A:..32_:[OMC]c 154.4 -126.8 -178.7 -131.3 -104.1 -128.0
33 A:..33_:[..U]U 170.0 -103.9 -179.9 -152.7 -164.6 143.6
34 A:..34_:[OMG]g -4.7 -123.7 41.8 -124.8 31.6 -99.6
35 A:..35_:[..A]A 163.5 -104.3 176.9 -127.9 -137.5 -128.2
36 A:..36_:[..A]A 175.9 173.6 180.0 -167.7 -156.4 -118.3
37 A:..37_:[.YG]g 166.8 -131.7 -174.5 -133.0 -115.1 -82.9
38 A:..38_:[..A]A 167.7 -121.6 -175.7 -114.3 -109.9 -79.9
39 A:..39_:[PSU]P 168.3 -146.8 -160.2 -146.4 -98.6 -116.5
40 A:..40_:[5MC]c 160.6 -138.7 174.0 -141.8 -139.7 -126.5
41 A:..41_:[..U]U 164.8 -161.4 175.9 -152.3 -150.5 -117.6
42 A:..42_:[..G]G 174.3 -140.9 -170.3 -145.4 -129.1 -121.3
43 A:..43_:[..G]G 169.6 -159.0 -176.2 -154.9 -133.7 -133.1
44 A:..44_:[..A]A 174.0 -121.5 -174.2 -122.0 -143.1 -74.9
45 A:..45_:[..G]G 174.4 -132.5 -166.2 -128.1 -101.8 -128.9
46 A:..46_:[7MG]g -112.8 -113.4 -127.2 -138.3 -139.8 -152.1
47 A:..47_:[..U]U -63.2 -53.8 -1.1 -92.0 22.8 -124.7
48 A:..48_:[..C]C -84.7 59.6 -20.1 8.9 19.3 -104.5
49 A:..49_:[5MC]c -56.8 -140.1 -29.9 -143.6 98.1 -125.4
50 A:..50_:[..U]U 173.6 -146.4 -178.3 -140.6 -147.6 -117.8
51 A:..51_:[..G]G 160.8 -148.1 -178.6 -150.7 -140.7 -121.9
52 A:..52_:[..U]U 164.9 -144.0 175.8 -143.5 -139.9 -114.3
53 A:..53_:[..G]G 168.2 -140.9 -171.1 -144.0 -121.6 -117.3
54 A:..54_:[5MU]u 167.0 -131.1 178.3 -124.9 -139.9 -77.0
55 A:..55_:[PSU]P 167.6 -114.2 -172.8 -155.6 -113.0 146.0
56 A:..56_:[..C]C 35.0 -121.5 52.6 -126.2 26.5 -83.8
57 A:..57_:[..G]G 168.4 -148.1 -177.1 -131.1 -115.4 -111.7
58 A:..58_:[1MA]a -136.3 -133.3 -106.5 -176.7 -105.3 149.6
59 A:..59_:[..U]U 23.0 -130.9 33.0 -115.4 48.2 -68.2
60 A:..60_:[..C]C -163.6 -54.3 -123.2 -76.4 -79.6 -36.4
61 A:..61_:[..C]C 125.5 -153.3 169.7 -144.7 -153.8 -123.4
62 A:..62_:[..A]A 172.5 -139.3 -177.0 -137.6 -150.7 -114.6
63 A:..63_:[..C]C 165.8 -146.6 -178.5 -149.8 -139.2 -127.8
64 A:..64_:[..A]A 164.7 -144.9 176.5 -145.8 -145.3 -118.1
65 A:..65_:[..G]G 170.4 -152.3 -175.5 -151.5 -132.3 -122.1
66 A:..66_:[..A]A 168.0 -152.0 -177.4 -150.2 -133.0 -118.7
67 A:..67_:[..A]A 170.9 -141.8 -178.4 -140.4 -134.8 -123.1
68 A:..68_:[..U]U 164.8 -135.1 -178.9 -137.9 -143.7 -95.2
69 A:..69_:[..U]U 168.2 -154.9 -174.3 -157.1 -112.2 -144.8
70 A:..70_:[..C]C 160.6 -153.2 170.7 -153.5 -164.4 -125.1
71 A:..71_:[..G]G 161.8 -144.3 172.1 -143.1 -145.7 -124.2
72 A:..72_:[..C]C 176.7 -136.4 -169.3 -134.5 -134.9 -87.1
73 A:..73_:[..A]A 160.6 -142.8 -179.7 -139.7 -112.8 -104.4
74 A:..74_:[..C]C -176.9 -115.9 -163.1 -115.4 -117.2 -68.7
75 A:..75_:[..C]C 169.8 80.9 -170.0 74.9 -108.5 -91.3
76 A:..76_:[..A]A --- --- --- --- --- ---
In nucleic acid structures, the chi (χ) torsion angle is about the glycosidic bond (N-C1′) that connects the sugar and the A/C/G/T/U bases (or their modified variants). Specifically, for pyrimidines (C, T and U), χ is defined by O4′-C1′-N1-C2; and for purines (A and G) by O4′-C1′-N9-C4 (see figure below).
Pseudouridine (5-ribosyluracil, PSU) was the first identified modified nucleoside in RNA and is the most abundant. PSU is unique in that it has a C-glycosidic bond (C-C1′) instead of the N-glycosidic bond common to all other nucleosides, canonical or modified. It thus poses a problem as to how to calculate the χ torsion angle: should it be O4′-C1′-C5-C4, reflecting the actual glycosidic bond connection, or should the conventional definition O4′-C1′-N1-C2 still be applied literally? As a concrete example, the figure below shows the (slightly) different numerical values (–162.7° vs. –163.9°), as given by the two definitions, for PSU 6 on chain A of the PDB entry 3cgp (based on the 2009 Biochemistry article by Lin & Kielkopf titled X-ray structures of U2 snRNA-branchpoint duplexes containing conserved pseudouridines).
Needless to say, the specific definition of the χ torsion angle for PSU in RNA structures is a very subtle point, and I am not aware of any discussion on this issue in the literature. In 3DNA, PSU is identified explicitly, and χ is defined by O4′-C1′-C5-C4. In NDB and a couple of other tools I am familiar with, χ for PSU is defined by O4′-C1′-N1-C2. Again using 3cgp (figure below) as an example, 3DNA gives –162.7°, whilst NDB gives –163.9°. Additionally, this distinction in N-C1′ vs. C-C1′ connection also comes into play when calculating the perpendicular distance from the 3′ phosphorus atom to the glycosidic bond, as per Richardson et al.
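The two conventions can be summarized as a small lookup of the four atoms that define χ. The Python sketch below is purely illustrative: the function name and the flag are hypothetical, only the canonical bases and PSU are handled, and the atom choices are exactly those stated in the text above.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def chi_atoms(residue, psu_c_glycosidic=True):
    """Return the four atom names that define chi for a residue.

    psu_c_glycosidic=True follows the actual C-glycosidic bond of
    pseudouridine (O4'-C1'-C5-C4, as described for 3DNA above);
    False applies the conventional pyrimidine definition literally.
    """
    if residue in PURINES:
        return ("O4'", "C1'", "N9", "C4")
    if residue in PYRIMIDINES:
        return ("O4'", "C1'", "N1", "C2")
    if residue == "PSU":
        if psu_c_glycosidic:
            return ("O4'", "C1'", "C5", "C4")
        return ("O4'", "C1'", "N1", "C2")
    raise ValueError("unhandled residue type: " + residue)


print(chi_atoms("G"))
print(chi_atoms("PSU"))
print(chi_atoms("PSU", psu_c_glycosidic=False))
```

The returned names can then be used to pick coordinates for any torsion routine. Other modified nucleotides would need to be mapped onto their parent purine or pyrimidine first, which this sketch does not attempt.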
Except for pseudouridine, a nucleoside in DNA/RNA contains an N-glycosidic bond that connects the base to the sugar. The chi (χ) torsion angle, which characterizes the relative base/sugar orientation, is defined by O4′-C1′-N1-C2 for pyrimidines (C, T and U), and O4′-C1′-N9-C4 for purines (A and G).
Normally (as in A- and B-form DNA/RNA duplexes), χ falls into the range of +90° to +180° or –90° to –180° (equivalently, 180° to 270°), corresponding to the anti conformation (Figure below, top). Occasionally, χ has values in the range of –90° to +90°, referring to the syn conformation (Figure below, bottom). Note that in left-handed Z-DNA with a CG repeating sequence, the purine G is in the syn conformation whilst the pyrimidine C is anti.
Presumably, the χ-related anti / syn conformation is a simple geometric concept. Nevertheless, the N-glycosidic bond and the corresponding χ torsion angle illustrate that the base and the sugar are two separate entities, i.e. there is an internal degree of freedom between them. In this respect, it is worth noting that the Leontis-Westhof sugar edge for base-pair classification corresponds to the anti form (as applied to RNA) only. When a base is flipped over into the syn conformation, the “sugar edge”, defined in connection with the minor (shallow) groove side of a nitrogenous base, simply does not exist.
Base-flipping (anti / syn conformation switch) is one of the factors associated with the two possible relative orientations of the two bases in a pair, characterized explicitly in 3DNA as of type M+N or M–N since the 2003 NAR paper (Figure 2, linked below). I re-emphasized this distinction in our 2010 GpU dinucleotide platform paper (in particular, see supplementary Figure S2). Unfortunately, this subtle (but crucial, in my opinion) point has never been taken seriously (or at all) by the RNA community, even with 3DNA’s wide adoption. However, as people get to know 3DNA better and treat RNA base-pair classification more rigorously, I have no doubt that the simplicity of this explicit distinction and the resultant full quantification of each and every possible base pair using standard geometric parameters will gradually be appreciated.
As of 3DNA v2.1, the output of the χ torsion angle is also associated with its classification in anti / syn conformation, among other new features (see for example the output for 6tna).
The sugar puckers in DNA/RNA structures are predominately in either C3′-endo (A-DNA or RNA) or C2′-endo (B-DNA; see Figure below, left), corresponding to the A- or B-form conformation in a duplex. In these two sugar conformations, the distance between neighboring phosphorus (P) atoms and the orientation of P relative to the sugar/bases are also dramatically different (figure below, right).
Recently, I carefully re-read some articles on RNA backbone conformation by Richardson et al., including:
I became intrigued by one of their observations: i.e., the correlation between the sugar pucker and a simple distance parameter:
C3′-endo and C2′-endo sugar puckers are highly correlated to the perpendicular distance between the C1′–N1/9 glycosidic bond vector and the following phosphate: > 2.9 Å for C3′-endo and < 2.9 Å for C2′-endo. (p.16 from the MolProbity paper).
Out of curiosity and for a better understanding of this correlation, I played around with some sample cases both visually and numerically. Overall, this involves a simple geometric calculation, i.e., the shortest distance from a point to a line in three-dimensional space. Given below is the Octave/Matlab script for calculating the distances for G175 and U176 of PDB entry 1jj2 (the large ribosomal subunit of Haloarcula marismortui):
function d = get_p3_nc_dist(P3, C1, N)
C1_N = N - C1; # vector from C1′ to N
nv_C1_N = C1_N / norm(C1_N); # normalized vector
C1_P3 = P3 - C1; # vector from C1′ to P3
proj = dot(C1_P3, nv_C1_N);
d = norm(C1_P3 - proj * nv_C1_N);
end
## G175
P3 = [70.104 112.366 44.586];
C1 = [73.017 109.666 45.304];
N9 = [74.445 109.380 45.288];
d1 = get_p3_nc_dist(P3, C1, N9) # 2.2 Å -- C2′-endo
## U176
P3 = [66.871 116.402 46.804];
C1 = [68.213 112.454 49.279];
N1 = [69.678 112.480 49.438];
d2 = get_p3_nc_dist(P3, C1, N1) # 4.6 Å -- C3′-endo
The GpU dinucleotide used in the above example forms a platform (see figure below), where the sugar of G175 adopts a C2′-endo conformation, and that of U176 C3′-endo. Indeed, the distance for G175 is 2.2 Å (< 2.9 Å); whilst the value for U176 is 4.6 Å (> 2.9 Å).
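The same point-to-line calculation, together with the 2.9 Å cutoff quoted above, can be sketched in Python as follows. The function names are made up for this example, and the hard cutoff is a simplification of the correlation reported by Richardson et al., not a full pucker assignment.

```python
import math


def point_to_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len = math.sqrt(sum(x * x for x in ab))
    proj = sum(ap[i] * ab[i] for i in range(3)) / ab_len  # length of ap along ab
    ap_sq = sum(x * x for x in ap)
    return math.sqrt(max(ap_sq - proj * proj, 0.0))


def pucker_from_distance(d, cutoff=2.9):
    """C3'-endo if the distance exceeds the cutoff (in Å), else C2'-endo."""
    return "C3'-endo" if d > cutoff else "C2'-endo"


# 3' P, C1' and N9/N1 coordinates of G175 and U176 in 1jj2 (from the script above)
residues = {
    "G175": ([70.104, 112.366, 44.586], [73.017, 109.666, 45.304], [74.445, 109.380, 45.288]),
    "U176": ([66.871, 116.402, 46.804], [68.213, 112.454, 49.279], [69.678, 112.480, 49.438]),
}

distances = {}
for name, (p3, c1, n) in residues.items():
    distances[name] = point_to_line_distance(p3, c1, n)
    print(name, round(distances[name], 1), pucker_from_distance(distances[name]))
```

The two distances should round to the 2.2 Å and 4.6 Å obtained with the Octave script.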
Note that the Richardson et al. articles focus on the RNA backbone, without paying attention to the base (pair) geometry. The 3DNA Zp parameter, which is the mean z-coordinate of the two P atoms in the mean reference frame of a dinucleotide step (see figure below), has been readily adapted to single-stranded RNA structures. For example, the vertical distances of the 3′ P atoms to the G175 and U176 base planes are 1.9 Å and 4.4 Å, respectively. Since base planes and the P atoms are the two most accurately located entities in a given nucleic acid structure, the nucleotide-based Zp variant is presumably more robust and discriminative than the distance from P to the glycosidic bond.
This new single-strand-based “Zp” parameter is available as of 3DNA v2.1.
RNA has three salient structural features (compared to DNA): it contains the ribose (not deoxyribose) sugar, it has the uracil (not thymine) base, and it is normally single (not double)-stranded. The O2′(G)…O2P(U) H-bond stabilized GpU dinucleotide platform may turn out to be the smallest unit with all those RNA hallmarks.
First, it must have the guanosine ribose to have the 2′-hydroxyl group form the O2′(G)…O2P(U) H-bond.
Second, the methyl group in position 5 of thymine would cause steric clash with guanosine, thus disrupting the N2(G)…O4(U) base-base H-bond to form the GpU dinucleotide platform.
Third, a dinucleotide, by definition, is single-stranded. The two H-bonds, plus the covalent linkage, make the GpU platform extremely rigid (see Figure 1 of our 2010 NAR paper).
Moreover, the GpU platform is directional: swapping the two bases while keeping the sugar-phosphate backbone fixed does not allow for a base-base H-bond, thus no UpG dinucleotide platform.
It is worth noting that state-of-the-art quantum chemistry calculations have verified the importance of the O2′(G)…O2P(U) H-bond in stabilizing the GpU dinucleotide platform.
The least-squares (LS) fitting procedures presented below make use of well known mathematics. Indeed, the methods are so well known and widely used that it is somewhat difficult to locate the original references. In our previous effort to resolve the discrepancies among nucleic acid conformational analysis programs, we came across a variety of LS fitting procedures. Here we provide a detailed description, with step-by-step examples, of our implementation in 3DNA of two LS fitting algorithms based on a covariance matrix and its eigen-system. This post is the revised version of a note first made available in the “Technical Details” section of earlier 3DNA websites.
LS fitting between standard and experimental bases
Three analysis schemes (CompDNA, Curves/Curves+, and RNA) use LS procedures to fit a standard base with an embedded reference frame to an observed base structure. CompDNA and Curves/Curves+ take advantage of the conventional approach of McLachlan [“Least Squares Fitting of Two Structures.” J. Mol. Biol., 128, 74-79 (1979)], while the RNA program implements a closed-form solution of absolute orientation using unit quaternions first introduced by Horn. The two algorithms are mathematically equivalent for the most general cases, since the unit quaternion can be transformed to the rotation matrix given by McLachlan. The Horn method, however, is more straightforward and generally applicable; it can be applied even when one or both of the structures are perfectly planar, whereas the McLachlan approach fails in that case.
Here we use the ideal adenine geometry derived from the high resolution crystal structures of model nucleosides, nucleotides, and bases. The x-, y-, and z-coordinates of the standard base, taken from the NDB, are listed below in the columns labeled sx, sy, and sz, respectively. s_(average) is the geometric center of the base.
sx sy sz
1 N9 0.213 0.660 1.287
2 C4 0.250 2.016 1.509
3 N3 0.016 2.995 0.619
4 C2 0.142 4.189 1.194
5 N1 0.451 4.493 2.459
6 C6 0.681 3.485 3.329
7 N6 0.990 3.787 4.592
8 C5 0.579 2.170 2.844
9 N7 0.747 0.934 3.454
10 C8 0.520 0.074 2.491
------------------------------------
s_(average): 0.4589 2.4803 2.3778
We similarly describe the coordinates of one of the adenine bases (the fifth nucleotide in the sequence strand) from the high resolution (1.4 Å) self-complementary d(CGCGAATTCGCG) dodecamer duplex determined by Williams and co-workers (PDB id: 355d). The experimental xyz coordinates are listed below in the columns labeled ex, ey, and ez. The geometric center is e_(average). Note that the atomic serial numbers from the PDB (first column) have been rearranged so that the atoms are in the same order as those of the ideal base listed above.
ex ey ez
91 N9 16.461 17.015 14.676
100 C4 15.775 18.188 14.459
99 N3 14.489 18.449 14.756
98 C2 14.171 19.699 14.406
97 N1 14.933 20.644 13.839
95 C6 16.223 20.352 13.555
96 N6 16.984 21.297 12.994
94 C5 16.683 19.056 13.875
93 N7 17.918 18.439 13.718
92 C8 17.734 17.239 14.207
------------------------------------
e_(average):16.1371 19.0378 14.0485
We collect the two sets of xyz coordinates in the 10 × 3 matrices S and E corresponding respectively to the standard and experimental bases. We then construct the 3 × 3 covariance matrix C between S and E using the following formula:
C = 1/(n-1) [S' E - (1/n) S' i i' E]
=
0.2782 0.2139 -0.1601
-1.4028 1.9619 -0.2744
1.0443 0.9712 -0.6610
Here n, the number of atoms in each base, is 10, and i is an n x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively.
From the nine elements of the C matrix, we subsequently generate the 4 × 4 real symmetric matrix M using the expression:
| c11+c22+c33 c23-c32 c31-c13 c12-c21 |
M = | c23-c32 c11-c22-c33 c12+c21 c31+c13 |
| c31-c13 c12+c21 -c11+c22-c33 c23+c32 |
| c12-c21 c31+c13 c23+c32 -c11-c22+c33 |
=
1.5792 -1.2456 1.2044 1.6167
-1.2456 -1.0228 -1.1890 0.8842
1.2044 -1.1890 2.3447 0.6968
1.6167 0.8842 0.6968 -2.9011
The largest eigenvalue of matrix M is 4.0335, and its corresponding unit eigenvector is:
[ q0 q1 q2 q3 ] = [ 0.6135 -0.2878 0.7135 0.1780 ]
The rotation matrix R is deduced from the above eigenvector as below:
| q0q0+q1q1-q2q2-q3q3 2(q1q2-q0q3) 2(q1q3+q0q2) |
R = | 2(q2q1+q0q3) q0q0-q1q1+q2q2-q3q3 2(q2q3-q0q1) |
| 2(q3q1-q0q2) 2(q3q2+q0q1) q0q0-q1q1-q2q2+q3q3 |
=
-0.0817 -0.6291 0.7730
-0.1923 0.7710 0.6072
-0.9779 -0.0990 -0.1839
Following coordinate transformation with matrix R, the origin of the standard base is found to be displaced from the experimental structure by:
o = e_(average) - s_(average) R' = [15.8969 15.7701 15.1802]
The least-squares fitted coordinates (F) of the standard base atoms on the experimental structure are then given by:
F = S R' + i o
=
16.4592 17.0194 14.6699
15.7747 18.1925 14.4586
14.4899 18.4519 14.7542
14.1729 19.6974 14.4070
14.9343 20.6404 13.8420
16.2222 20.3472 13.5569
16.9832 21.2875 12.9925
16.6829 19.0585 13.8760
17.9183 18.4437 13.7219
17.7335 17.2396 14.2062
Here S is the (n x 3) matrix of original coordinates of the standard base, and as noted above, i is an n x 1 column vector consisting of only ones.
The difference matrix (D) between F and E, the (n x 3) matrix of original coordinates of the experimental base, and the root-mean-square (RMS) deviation between the two structures are found as:
D = E - F
=
0.0018 -0.0044 0.0061
0.0003 -0.0045 0.0004
-0.0009 -0.0029 0.0018
-0.0019 0.0016 -0.0010
-0.0013 0.0036 -0.0030
0.0008 0.0048 -0.0019
0.0008 0.0095 0.0015
0.0001 -0.0025 -0.0010
-0.0003 -0.0047 -0.0039
0.0005 -0.0006 0.0008
RMS deviation = 0.0054
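The RMS deviation can be recomputed directly from the D matrix printed above. The sketch below assumes the usual definition, the square root of the summed squared atomic displacements divided by the number of atoms n; that normalization is an assumption here, although it does reproduce the quoted value.

```python
import math

# Difference matrix D = E - F as printed above (one row per atom).
D = [
    [0.0018, -0.0044, 0.0061],
    [0.0003, -0.0045, 0.0004],
    [-0.0009, -0.0029, 0.0018],
    [-0.0019, 0.0016, -0.0010],
    [-0.0013, 0.0036, -0.0030],
    [0.0008, 0.0048, -0.0019],
    [0.0008, 0.0095, 0.0015],
    [0.0001, -0.0025, -0.0010],
    [-0.0003, -0.0047, -0.0039],
    [0.0005, -0.0006, 0.0008],
]

def rms_deviation(d):
    """sqrt( sum of squared atomic displacements / number of atoms )."""
    return math.sqrt(sum(x * x for row in d for x in row) / len(d))

rms = rms_deviation(D)
print(f"{rms:.4f}")  # prints 0.0054
```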
It should be noted that if the standard base is already defined in terms of its reference frame, as in 3DNA (e.g., $X3DNA/config/Atomic_A.pdb), the vector o and the matrix R represent the best-fitted coordinate frame of the experimental base. Moreover, the three axes of the frame given by R are guaranteed to be orthonormal. If you want to gain insight into the LS fitting algorithm and a better understanding of how 3DNA derives its base reference frame, it would be a valuable exercise to repeat the above procedure with $X3DNA/config/Atomic_A.pdb.
Note: the algorithm does not apply to a molecule vs its inversion, since superimposing the two would require an improper rotation. Thanks to Boris Averkiev for reporting this subtle point (see comments below). One possible remedy is to treat this edge case separately.
Base normal
Rather than fitting a standard base to experimental coordinates, the CEHS, FREEHELIX, and NUPARM analyses fit an LS plane to a set of atoms in order to define the base and base-pair normals. The covariance matrix based on the n x 3 matrix of experimental Cartesian coordinates E is diagonalized to find the vector normal to the best plane. Specifically, C is obtained using the above formula with S substituted by E. The normal vector then lies along the eigenvector that corresponds to the smallest eigenvalue. Note that the coefficient 1/(n-1) in the formula for calculating C has no effect on the direction of the eigenvectors but scales the magnitudes of the eigenvalues.
Using the above adenine base from the high-resolution dodecamer duplex as an example, the covariance matrix C is:
C =
1.6680 -0.5015 -0.3253
-0.5015 2.0670 -0.5840
-0.3253 -0.5840 0.3061
The smallest eigenvalue of C, 8.26e-5, indicates that the base is almost perfectly planar. The corresponding unit eigenvector, which defines the base normal, is:
Base normal: 0.2737 0.3224 0.9062
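The base normal can be reproduced from the printed C with a short, library-free sketch. It uses power iteration on (shift·I − C), which converges to the eigenvector of the smallest eigenvalue of C; this is an illustrative technique, not necessarily the diagonalization used by CEHS, FREEHELIX, NUPARM, or 3DNA, and the function name `smallest_eigenpair` is an assumption.

```python
import math

# Covariance matrix C as printed above (four decimal places).
C = [
    [1.6680, -0.5015, -0.3253],
    [-0.5015, 2.0670, -0.5840],
    [-0.3253, -0.5840, 0.3061],
]

def smallest_eigenpair(c, iterations=500):
    """Smallest eigenvalue of a symmetric matrix and its unit eigenvector."""
    n = len(c)
    # Gershgorin bound: shift is at least as large as the largest eigenvalue of c,
    # so the dominant eigenvector of (shift * I - c) belongs to the smallest
    # eigenvalue of c.
    shift = max(sum(abs(x) for x in row) for row in c)
    v = [1.0] * n
    for _ in range(iterations):
        w = [shift * v[i] - sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * cv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v

smallest, normal = smallest_eigenpair(C)
print(smallest, [round(x, 4) for x in normal])
```

The direction of the normal should match the values above up to an overall sign. Because C is given to only four decimals, the computed smallest eigenvalue is close to zero but need not equal 8.26e-5 exactly.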
Over the years, the fiber utility program has become a handy way to generate standard B-DNA and A-DNA structures, as evident from citations to 3DNA. Nevertheless, the currently collected 55 experimental fiber models, comprehensive as they are, do not include one for canonical double-stranded (ds) RNA or single-stranded (ss) RNA structures of generic A/C/G/U sequence.
This situation is best illustrated by a recent article by Charles Brooks, Hashim Al-Hashimi, and their co-workers, titled "Unraveling the structural complexity in a single-stranded RNA tail: implications for efficient ligand binding in the prequeuosine riboswitch" [Nucleic Acids Research, 40(3) 1345–1355 (2012)], where they wrote:
Idealized A-form structures were constructed using Insight II (Molecular Simulations, Inc.) correcting the propeller twist angles from +15° to –15° using an in-house program, as previously described (47). The complementary strand was removed and the resulting ssRNA used in NMR data analysis. B-form helices were constructed using W3DNA (48).
As of 3DNA v2.1, however, that’s no longer the case: now the fiber utility provides direct support for generating idealized dsRNA or ssRNA structures of arbitrary A/C/G/U sequence. As always, the new functionality can be best illustrated with examples. Let’s build ssRNAs of the wild-type (5’-AUAAAAAACUAA-3’) and A29C mutated form (5’-AUAACAAACUAA-3’) used in the work cited above:
fiber -r -s -seq=AUAAAAAACUAA wt-12nt.pdb
fiber -r -s -seq=AUAACAAACUAA mt-12nt.pdb
Here the -r option is for RNA, -s for a ss structure, and -seq for the specific base sequence. The generated ssRNA structure for the wild-type sequence is named wt-12nt.pdb, and that for the mutated sequence is named mt-12nt.pdb.
Note that the new RNA model is based on Struther Arnott’s fiber model of A-DNA from calf thymus (#1 in the list). The dsRNA, like its dsDNA counterpart, has a helical twist of 32.7° and a helical rise of 2.548 Å. Relevant to the above citation, here the propeller twist angle of each base pair is –10.5°, a negative value similar to that observed in high-resolution X-ray crystal structures. Furthermore, you can easily verify the three numbers with the following commands:
fiber -r -seq=AUAAAAAACUAA wt-12nt.pdb
find_pair wt-12nt.pdb stdout | analyze stdin
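As a quick sanity check on the quoted twist and rise, the arithmetic below derives the number of residues per helical turn and the helical pitch. It is a back-of-the-envelope sketch using only the two numbers stated above; the variable names are illustrative, and it does not replace running the analysis on the generated model.

```python
twist_deg = 32.7       # helical twist per step, as quoted above
rise_angstrom = 2.548  # helical rise per step, as quoted above

# A full turn is 360 degrees, so this is the number of residues per turn.
residues_per_turn = 360.0 / twist_deg

# Pitch: distance travelled along the helix axis in one full turn.
pitch_angstrom = residues_per_turn * rise_angstrom

# A 12-nt strand spans 11 steps along the helix axis.
axial_span_12nt = (12 - 1) * rise_angstrom

print(f"{residues_per_turn:.2f} residues per turn")
print(f"{pitch_angstrom:.2f} A pitch")
print(f"{axial_span_12nt:.2f} A axial span for 12 nt")
```

The result, about 11 residues per turn, is consistent with the A-form geometry the model is based on.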
In summary, it is very easy to generate canonical RNA structures with the revised fiber command. Through its integrated analysis routine, 3DNA can also be used to check structural features of the resultant RNA models. Moreover, as mentioned in the opening post What can 3DNA do for RNA structures? on the forum, 3DNA has much to offer in the field of RNA structural bioinformatics.