As of v2.1, I’ve switched from Perl to Ruby as the scripting language for 3DNA. Consequently, the Perl scripts in previous versions of 3DNA (v1.5 and v2.0) are now obsolete. I’ll only correct bugs in existing Perl scripts, but will not add any new features.
For reference, the scripts are still available from a separate directory, $X3DNA/perl_scripts, with the following contents:
OP_Mxyz* dcmnfile* nmr_strs*
README del_ms* pdb_frag*
block_atom* expand_ids* x3dna2charmm_pdb*
blocview.pl* manalyze* x3dna_r3d2png*
bp_mutation* mstack2img* x3dna_setup.pl*
cp_std* nmr_ensemble* x3dna_utils.pm
Among them, x3dna_setup.pl and blocview.pl have corresponding Ruby versions: x3dna_setup and blocview. Actually, the .pl file extension (for Perl) was added to avoid confusion with the new Ruby scripts.
Some of the functionality has been incorporated into the Ruby script x3dna_utils:
------------------------------------------------------------------------
A miscellaneous collection of 3DNA utilities
Usage: x3dna_utils [-h|-v] sub-command [-h] [options]
where sub-command must be one of:
block_atom -- generate a base block schematic representation
cp_std -- select standard PDB datasets for analyze/rebuild
dcmnfile -- remove fixed-name files generated with 3DNA
x3dna_r3d2png -- convert .r3d to image with Raster3D or PyMOL
------------------------------------------------------------------------
--version, -v: Print version and exit
--help, -h: Show this message
Along the same lines, ensemble-related functionality (for NMR or molecular dynamics simulations) has been consolidated and extended in the new Ruby script x3dna_ensemble:
------------------------------------------------------------------------
Utilities for the analysis and visualization of an ensemble
Usage: x3dna_ensemble [-h|-v] sub-command [-h] [options]
where sub-command must be one of:
analyze -- analyze MODEL/ENDMDL delineated ensemble (NMR or MD)
block_image -- generate a base block schematic image
extract -- extract structural parameters after running 'analyze'
reorient -- reorient models to a particular frame/orientation
------------------------------------------------------------------------
--version, -v: Print version and exit
--help, -h: Show this message
Conceivably, C programs in 3DNA can also be consolidated. For backward compatibility, however, all existing C programs will be kept (and refined as necessary) in the current 3DNA v2.x series. Starting with v3.x, I’ll completely reorganize 3DNA, incorporating my years of experience in programming languages and knowledge of macromolecular structures.
In 3DNA, each base pair (bp) is specified by the identity of its two constituent nucleotides (nts) and their interactions. Some examples are shown below based on the PDB entry 1ehz (the crystal structure of yeast phenylalanine tRNA at 1.93 Å resolution), with the shorthand form on the right:
....>A:...1_:[..G]G-----C[..C]:..72_:A<.... G-C
....>A:...4_:[..G]G-*---U[..U]:..69_:A<.... G-U
....>A:...9_:[..A]A-**+-A[..A]:..23_:A<.... A+A
....>A:..15_:[..G]G-**+-C[..C]:..48_:A<.... G+C
....>A:..26_:[M2G]g-**--A[..A]:..44_:A<.... g-A
Specification of a nucleotide
The nt specification string consists of 6 fields and follows the pattern below, with the number of characters in each field inside the parentheses:
modelNum(4)>chainId(1):ntNum(4)insCode(1):[ntName(3)]baseName(1)
- modelNum(4) — the model number is up to 4 digits, right-justified, with each leading space replaced by a dot. If no model number is available, as is the case for 1ehz (and virtually all other x-ray crystal structures in the PDB), it is written as .... (4 dots).
- chainId(1) — the chain id is 1-char long, with space replaced by underscore.
- ntNum(4) — the nt residue number, handled as for the model number.
- insCode(1) — insertion code, handled as for the chain id.
- ntName(3) — the nt residue name is up to 3-char long, right-justified, with each leading space replaced by a dot.
- baseName(1) — the base name is 1-char long, mapped from ntName(3) following $X3DNA/config/baselist.dat. Note that modified nucleotides are put in lower case to distinguish them from the canonical ones (for example, M2G is mapped to g).
For the complementary base in a bp, the order of the 6 fields is reversed (see the examples above). To see the full list of nts in a PDB data file, run find_pair -s 1ehz.pdb stdout (here using 1ehz as an example).
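The field layout above can be read back mechanically. The following is a minimal Python sketch, not part of 3DNA, that splits a nucleotide string from the left-hand side of a bp line into its 6 fields. The names `Nucleotide`, `NT_SPEC`, and `parse_nt` are hypothetical, and the sketch assumes that an underscore always stands for a blank chain id or insertion code and that a residue number may carry a minus sign.

```python
import re
from typing import NamedTuple, Optional


class Nucleotide(NamedTuple):
    """The 6 fields of a nucleotide specification string (hypothetical type)."""
    model_num: Optional[int]  # None when the field is "...." (no model number)
    chain_id: str             # " " when written as "_"
    nt_num: int
    ins_code: str             # " " when written as "_"
    nt_name: str              # leading dots stripped, e.g. "..G" -> "G"
    base_name: str            # lower case marks a modified nucleotide

    @property
    def is_modified(self) -> bool:
        return self.base_name.islower()


# modelNum(4)>chainId(1):ntNum(4)insCode(1):[ntName(3)]baseName(1)
# Assumption: residue numbers may include a minus sign.
NT_SPEC = re.compile(
    r"^(?P<model>[.\d]{4})>(?P<chain>.):(?P<num>[.\d-]{4})(?P<ins>.):"
    r"\[(?P<name>.{3})\](?P<base>.)$"
)


def parse_nt(spec: str) -> Nucleotide:
    """Parse a string such as '....>A:..26_:[M2G]g' into its fields."""
    m = NT_SPEC.match(spec)
    if m is None:
        raise ValueError(f"not a nucleotide specification: {spec!r}")
    model = m["model"].lstrip(".")
    return Nucleotide(
        model_num=int(model) if model else None,
        chain_id=" " if m["chain"] == "_" else m["chain"],
        nt_num=int(m["num"].lstrip(".")),
        ins_code=" " if m["ins"] == "_" else m["ins"],
        nt_name=m["name"].lstrip("."),
        base_name=m["base"],
    )


if __name__ == "__main__":
    print(parse_nt("....>A:..26_:[M2G]g"))
```

The sketch handles only the left-to-right field order; the string for the complementary base would need its own pattern.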
Specification of a base pair
The pattern of a bp is M-xyz-N, where M and N are 1-char base names (field #6 above), and the three characters xyz have the following meaning:
- z — the sign of the dot product of the z-axes of the M and N base reference frames. It is positive (+) if the two z-axes point in similar directions, as in Hoogsteen or reverse Watson-Crick bps. Conversely, it is negative (-) when the two z-axes point in opposite directions, as in the canonical Watson-Crick and Wobble bps. See figure below.
- y — it is - if M and N are in a so-called Watson-Crick geometry (the two y-axes of the M and N base reference frames are anti-parallel, so are the two z-axes, whilst the two x-axes are parallel), e.g., the G-U Wobble pair; otherwise, *.
- x — it is - for Watson-Crick bps; otherwise, *.
By design, Watson-Crick bps have the pattern M-----N, Wobble bps M-*---N, and non-canonical bps M-**+-N or M-**--N. Thus, by browsing through the 3DNA output, users can readily identify these three bp types.
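The three bp types can also be picked out programmatically. The Python sketch below is an illustration rather than 3DNA code: `classify_bp` and `shorthand` are hypothetical helpers that operate on the 7-character M-xyz-N string. Any xyz combination other than the Watson-Crick and Wobble patterns is reported as non-canonical, which is an assumption for combinations the text does not list.

```python
def _split(pattern: str):
    """Validate an M-xyz-N string and return (M, x, y, z, N)."""
    if len(pattern) != 7 or pattern[1] != "-" or pattern[5] != "-":
        raise ValueError(f"not an M-xyz-N pattern: {pattern!r}")
    m, x, y, z, n = pattern[0], pattern[2], pattern[3], pattern[4], pattern[6]
    if x not in "-*" or y not in "-*" or z not in "+-":
        raise ValueError(f"unexpected xyz characters in {pattern!r}")
    return m, x, y, z, n


def classify_bp(pattern: str) -> str:
    """Return 'Watson-Crick', 'Wobble', or 'non-canonical'."""
    _, x, y, z, _ = _split(pattern)
    if (x, y, z) == ("-", "-", "-"):
        return "Watson-Crick"
    if (x, y, z) == ("*", "-", "-"):
        return "Wobble"
    # Assumption: everything else (e.g. **+ or **-) counts as non-canonical.
    return "non-canonical"


def shorthand(pattern: str) -> str:
    """Collapse M-xyz-N to the short form shown on the right of each bp line."""
    m, _, _, z, n = _split(pattern)
    return f"{m}{z}{n}"


if __name__ == "__main__":
    for bp in ("G-----C", "G-*---U", "A-**+-A", "G-**+-C", "g-**--A"):
        print(bp, classify_bp(bp), shorthand(bp))
```

The five strings in the usage loop are the 1ehz examples listed earlier, so the output can be compared directly with the shorthand column there.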
The shortened form is represented as MzN; following the aforementioned notation, it can be either M-N or M+N. The relative direction of the two z-axes is critical in determining 3DNA-calculated bp (and step) parameters, as detailed in the 2003 3DNA NAR paper:
To calculate the six complementary base pair parameters of an M–N pair (Shear, Stretch, Stagger, Buckle, Propeller and Opening), where the two z‐axes run in opposite directions, the reference frame of the complementary base N is rotated about the x2‐axis by 180°, i.e. reversing the y2‐ and z2‐axes in Figure 2a. Under this convention, if the base pair is reckoned as an N–M pair, rather than an M–N pair, the x‐axis parameters (Shear and Buckle) reverse their signs. For an M+N pair, e.g. the Hoogsteen A+U in Figure 2b, the x2‐, y2‐ and z2‐axes do not change sign; thus all six parameters for an N+M pair are of opposite sign(s) from those for an M+N pair.
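The frame convention in the quoted passage can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the 3DNA implementation: a frame is represented as a tuple of three unit axes (x, y, z), the helper names are hypothetical, and a dot product of exactly zero is arbitrarily assigned the "-" sign because the text does not cover that case.

```python
from typing import Tuple

Vec = Tuple[float, float, float]
Frame = Tuple[Vec, Vec, Vec]  # (x-axis, y-axis, z-axis) of a base reference frame


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def z_sign(frame_m: Frame, frame_n: Frame) -> str:
    """'+' if the two z-axes point in similar directions, otherwise '-'."""
    # Assumption: a dot product of exactly 0 falls into the '-' branch.
    return "+" if dot(frame_m[2], frame_n[2]) > 0 else "-"


def flip_about_x(frame: Frame) -> Frame:
    """Rotate a frame by 180 degrees about its own x-axis: y and z reverse."""
    x, y, z = frame
    return (x, tuple(-c for c in y), tuple(-c for c in z))


def frame_for_parameters(frame_m: Frame, frame_n: Frame) -> Frame:
    """Frame of N as used for the six bp parameters.

    For an M-N pair the frame of N is flipped about its x-axis;
    for an M+N pair it is used unchanged.
    """
    if z_sign(frame_m, frame_n) == "-":
        return flip_about_x(frame_n)
    return frame_n


if __name__ == "__main__":
    base_m: Frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Idealized anti-parallel partner: same x-axis, reversed y- and z-axes.
    base_n: Frame = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))
    print(z_sign(base_m, base_n))
    print(frame_for_parameters(base_m, base_n))
```

In the idealized example, the flipped frame of N coincides with the frame of M, which is why a perfectly planar pair of this kind has zero values for the rotational parameters.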
The M-N and M+N bp designation is unique to 3DNA. In combination with the corresponding 6 bp parameters (shear, stretch, stagger, buckle, propeller, and opening), 3DNA provides a rigorous description of all possible bps. This contrasts with, and complements, the conventional Saenger scheme and the 3-edge-based Leontis/Westhof notation.
The 3DNA M-N vs M+N bp designation is base-centric and does not take the sugar-phosphate backbone into account. The chi (χ) torsion angle, which characterizes the relative orientation of base and sugar, can be in either the anti or the syn conformation; thus similar backbones can accommodate either M-N or M+N.
As the old saying goes, a picture is worth a thousand words. To help you have a better idea of what 3DNA/DSSR is about, we’ve collected the following pictures; they serve to demonstrate selected features from 3DNA/DSSR’s versatile functionality.
Schematic diagram of base-pair parameters
Influence of Slide and Roll on DNA helical conformation
Roll-introduced DNA bending
Global bending of DNA associated with selective B → A conformational transformation
Canonical fiber models of A-, B-, C- and Z-DNA
3DNA-generated view of a four-way DNA–RNA junction (1egk)
3DNA-detected pentaplets in the large ribosomal subunit (1jj2)
Nucleic-acid-containing structures generated with w3DNA
Analysis of DNA with a B-Z junction (2acj, left) and detection of hydration patterns (right)
Schematic images auto-generated via blocview
A video overview of DSSR
DSSR (Dissecting the Spatial Structure of RNA) is an integrated software tool for the analysis/annotation, model building, and schematic visualization of 3D nucleic acid structures (see the figures below and the video overview). It is built upon the well-known, tested, and trusted 3DNA suite of programs. DSSR has been made possible by the developer’s extensive user-support experience, detail-oriented software engineering skills, and expert domain knowledge accumulated over two decades. It streamlines tasks in RNA/DNA structural bioinformatics, and outperforms its ‘competitors’ by far in terms of functionality, usability, and support.
Wide citations. DSSR has been widely cited in scientific literature, including: (i) “Selective small-molecule inhibition of an RNA structural element” (Nature, 2015; Merck Research Laboratories), (ii) “The structure of the yeast mitochondrial ribosome” (Science, 2017), (iii) “RNA force field with accuracy comparable to state-of-the-art protein force fields” (PNAS, 2018; D. E. Shaw Research), (iv) “Predicting site-binding modes of ions and water to nucleic acids using molecular solvation theory” (JACS, 2019), (v) “RIC-seq for global in situ profiling of RNA-RNA spatial interactions” (Nature, 2020), and (vi) “DNA mismatches reveal conformational penalties in protein-DNA recognition” (Nature, 2020).
Broad integrations. To make DSSR as widely accessible as possible, I have initiated collaborations with the principal developers of Jmol and PyMOL. The DSSR-Jmol and DSSR-PyMOL integrations bring unparalleled search capabilities (e.g., ‘select junctions’ for all multi-branch loops) and innovative visualization styles into 3D nucleic acid structures. DSSR has also been adopted into numerous other structural bioinformatics resources, including: (i) URS, (ii) RiboSketch, (iii) RNApdbee, (iv) forgi, (v) RNAvista, (vi) VeriNA3d, (vii) RNAMake, (viii) ElTetrado, (ix) DNAproDB, (x) LocalSTAR3D, (xi) IPANEMAP, and (xii) RNANet.
Advanced features. DSSR may be licensed from Columbia University. DSSR Pro is the commercial version. It has more functionalities than DSSR basic (the free academic version), including: (i) homology modeling via in silico base mutations, a feature employed by Merck scientists, (ii) easy generation of regular helical models, including circular or super-helical DNA (see figures below), (iii) creation of customized structures with user-specified base sequences and rigid-body parameters, (iv) efficient processing of molecular dynamics (MD) trajectories, (v) detailed characterization of DNA-protein or RNA-protein spatial interactions, and (vi) template-based modeling of DNA-protein complexes (see figures below). DSSR Pro supersedes 3DNA. It integrates the disparate analysis and modeling programs of 3DNA under one umbrella, and offers new advanced features, through a convenient interface. For example, with the mutate module of DSSR Pro, one can automatically perform the following tasks: (i) mutate all bases to Us, (ii) mutate bases in hairpin loops to Gs, and (iii) mutate G–C Watson-Crick pairs to C–G, and A–U to U–A. Moreover, DSSR Pro includes an in-depth user manual and one-year technical support from the developer.
Quality control. DSSR is a solid software product that excels in RNA structural bioinformatics. It is written in strict ANSI C, as a single command-line program. It is self-contained, with zero runtime dependencies on third-party libraries. The binary executables for macOS, Linux, and Windows are just ~2MB. DSSR has been extensively tested using all nucleic-acid-containing structures in the PDB. It is also routinely checked with Valgrind to avoid memory leaks. DSSR requires no setup or configuration: it simply works.
Theoretical models of G-quadruplexes, created using DSSR Pro.
Template-based modeling of DNA-protein complexes using DSSR Pro.
Here are two chromatin-like models using PDB entry 4xzq as the template.
Circular DNA duplexes modeled using DSSR Pro.
DNA super helices modeled using DSSR Pro.
Innovative cartoon-block schematics enabled by the DSSR-PyMOL integration for six representative PDB entries. Watson-Crick pairs are shown as long blocks with minor-groove edges in black (A, B), G-tetrads represented as square blocks and the metal ion as a sphere (C), the ligand rendered as balls-and-sticks (D), and proteins depicted as purple cartoons (E, F). Color code for base blocks: A, red; C, yellow; G, green; T, blue; U, cyan; G-tetrad, green; WC-pairs, per base in the leading strand. Visit http://skmatic.x3dna.org.
Recommended in Faculty Opinions: “simple and effective”, “Good for Teaching”.
Employed by the NDB to create cover images of the RNA Journal.
The following links point to tools that are relevant to 3DNA.
- Curves+ — an updated version of the well-known Curves program, and it conforms to the standard base reference frame.
- 3D-DART — 3DNA-Driven DNA Analysis and Rebuilding Tool. Another web-interface to commonly used 3DNA functionality.
- do_x3dna — “do_x3dna has been developed for analysis of the DNA/RNA dynamics during the molecular dynamics simulations. It uses the 3DNA package to calculate several structural descriptors of DNA/RNA from the GROMACS MD trajectory. It executes 3DNA tools to calculate these descriptors and subsequently, extracts these output and saves into external output files as a function of time.”
- SwS — a Solvation web Service for Nucleic Acids where 3DNA plays a role.
- Raster3D — a set of tools for generating high-quality raster images of proteins or other molecules.
- MolScript — a program for displaying molecular 3D structures, such as proteins, in both schematic and detailed representations.
- Jmol — an open-source Java viewer for chemical structures in 3D with features for chemicals, crystals, materials, and biomolecules.
- PyMOL — a user-sponsored molecular visualization system on an open-source foundation.
- ImageMagick — a software suite to create, edit, compose, or convert bitmap images.
- NDB — the Nucleic Acid Database.
- SBGrid — excellent services for structural biology laboratories as well as software developers.
The v2.1 release of 3DNA, currently in beta, contains many refinements of existing C programs, a complete migration from Perl scripts to Ruby, and the addition of several significant new programs. All known bugs in v2.0 have been fixed. Highlights include:
- Added mutate_bases to perform in silico base mutations in nucleic-acid-containing structures (DNA, RNA, and their complexes with ligands and proteins). The program has two key and unique features: (1) the sugar-phosphate backbone conformation is untouched; (2) the base reference frame (position and orientation) is preserved, i.e., the mutated structure shares the same base-pair/step parameters as those of the native structure.
- Added x3dna_ensemble, a Ruby script to automate the processing of an NMR structure ensemble or MD trajectories in MODEL/ENDMDL delineated PDB format. It has the sub-commands analyze, extract, reorient, and block_image. To be added: convert, to transform Amber, Gromacs or CHARMM trajectories.
- Enhanced find_pair with the -c+ option for generating input to Curves+.
- Expanded fiber with the -s option for generating single-stranded structures; the -seq option for specifying the base sequence directly on the command line; and the -r option for generating RNA structures (single or double stranded) of arbitrary ACGU sequences.
- Updated the ‘baselist.dat’ file to incorporate all types of NDB/PDB nucleotides as of February 15, 2015; refined find_pair/analyze/mutate_bases etc. to automatically detect and assign modified bases.
- Renamed Atomic_a.pdb and Atomic.a.pdb etc. for modified bases to account for the Mac OS X filesystem case-sensitivity issue; copied all Perl scripts to a new directory, perl_scripts/.
- 3DNA now generates PDB files that are compliant with PDB format v3.x, and also has an option to allow for three-letter nucleotide names, making the output directly compatible with PdbViewer and HADDOCK. An option is provided to convert 3DNA-generated base rectangular blocks in Alchemy format to the more widely accepted MDL molfile format (e.g., by PyMOL).