Thanks to Google Scholar, I recently became aware of the article by Mohammed AlQuraishi & Harley McAdams (2012), “Three enhancements to the inference of statistical protein-DNA potentials”, in Proteins: Structure, Function, and Bioinformatics. Reading through the text, I like it quite a bit. The abstract summarizes the work well:
The energetics of protein-DNA interactions are often modeled using so-called statistical potentials, that is, energy models derived from the atomic structures of protein-DNA complexes. Many statistical protein-DNA potentials based on differing theoretical assumptions have been investigated, but little attention has been paid to the types of data and the parameter estimation process used in deriving the statistical potentials. We describe three enhancements to statistical potential inference that significantly improve the accuracy of predicted protein-DNA interactions: (i) incorporation of binding energy data of protein-DNA complexes, in conjunction with their X-ray crystal structures, (ii) use of spatially-aware parameter fitting, and (iii) use of ensemble-based parameter fitting. We apply these enhancements to three widely-used statistical potentials and use the resulting enhanced potentials in a structure-based prediction of the DNA binding sites of proteins. These enhancements are directly applicable to all statistical potentials used in protein-DNA modeling, and we show that they can improve the accuracy of predicted DNA binding sites by up to 21%.
I’m glad to find that the 3DNA mutate_bases program was used in deriving the statistical potentials of protein-DNA interactions:
The relative binding affinity of a protein to two different DNA sequences can be evaluated by computing the binding energy of the protein to those two sequences. This is done by mutating the DNA sequence in silico while keeping the protein fixed. We used the 3DNA software package for mutating DNA [23,24], which maintains the backbone atoms of the DNA molecule but replaces the basepair atoms in a way that is consistent with the backbone orientation of the DNA.
For each base position, in silico structural mutants are generated using 3DNA [23,24] to mutate the basepair to include all four possibilities.
This is exactly one of the use cases I had in mind when creating the program:
Overall, mutate_bases has been designed to solve the in silico base mutation problem in a practical sense: robust and efficient, getting its job done and then out of the way. The program can have many possible applications: in addition to performing base-pair mutations in DNA-protein complexes, it should also prove handy in RNA modeling and in providing initial structures for QM/MM/MD energy calculations, and in DNA/RNA modeling studies.
With the recent refinement to allow for 3-letter nucleotide names in the standard base-reference frame file, mutate_bases now makes it exceedingly easy to mutate cytosine to 5-methylcytosine.
As more people get to know this 3DNA functionality, I am confident that mutate_bases will be more widely used.
Base-stacking interactions stabilize nucleic acid structures. Many ways exist to account for such interactions, including quantum chemical calculations (see, for example, the review by Sponer et al. [2008], “Nature and magnitude of aromatic stacking of nucleic acid bases”). In 3DNA, base-stacking interactions are assessed from planar projections of the ring and exocyclic atoms in consecutive bases or base pairs: the larger the overlap area, the stronger the stacking interactions, and vice versa.
Over the years, I’ve seen a few publications taking advantage of this 3DNA parameter. Here are two recent ones:
To analyze the role of the sequence regularity for the double-helical structure, we calculated the overall overlapping of base pairs (stacking) at every step of the two duplexes of 20mer pG(CUG)6C and the duplex of 19mer pGG(CGG)3(CUG)2CC using the program 3DNA (Lu & Olson, 2003).
Basepair overlap values are calculated by 3DNA software [35].
Hopefully, more 3DNA users will notice this ‘little’ feature and make good use of it.
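To make the overlap-area idea described above concrete, here is a minimal sketch in Python. It is not 3DNA's actual implementation: it assumes the two base outlines have already been projected onto a common plane, that both outlines are convex polygons, and that the clipping polygon is listed counter-clockwise. The function names (`polygon_area`, `clip_convex`, `overlap_area`) and the two unit squares standing in for bases are purely illustrative.

```python
def polygon_area(points):
    """Unsigned area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex,
    counter-clockwise `clipper`; returns the intersection polygon."""

    def inside(p, a, b):
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0

    def intersection(p, q, a, b):
        # Point where segment p-q crosses the infinite line through a-b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break  # nothing left: the polygons do not overlap
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        previous_points, output = output, []
        prev = previous_points[-1]
        for cur in previous_points:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersection(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersection(prev, cur, a, b))
            prev = cur
    return output


def overlap_area(poly_a, poly_b):
    """Overlap area of two convex polygons lying in the same plane."""
    return polygon_area(clip_convex(poly_a, poly_b))


# Hypothetical example: two unit squares, the second shifted by 0.5 along x.
square_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_b = [(0.5, 0.0), (1.5, 0.0), (1.5, 1.0), (0.5, 1.0)]
shared = overlap_area(square_a, square_b)
```

A real base outline would use the projected ring (and optionally exocyclic) atom positions in place of the squares; outlines that are not convex would need a general polygon-intersection routine instead.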
In the ‘Advance Access’ section of Nucleic Acids Research, published on September 12, 2012 (DOI: 10.1093/nar/gks856), I came across the paper “FRETmatrix: a general methodology for the simulation and analysis of FRET in nucleic acids” by Søren Preus et al. In this work, the authors developed a methodological platform (implemented in the MATLAB package ‘FRETmatrix’) to simulate base-base FRET in order to elucidate the structure and dynamics of nucleic acids.
Reading through the text, I am pleased to find that the authors take advantage of the matrix-based Calladine and El Hassan Scheme (CEHS) for ‘building nucleic acid geometrical models’, and kindly cite SCHNArP, 3DNA, and the standard base-reference frame paper. They provide a succinct description of the model building process, and also note the connection between CEHS and SCHNArP. From the very beginning, I appreciated the elegance of the CEHS method: it is simple, mathematically rigorous, and generally applicable for quantifying the relative position and orientation between any two rigid bodies. SCHNAaP/SCHNArP implements the analysis/rebuilding components of CEHS in an expanded form, and CEHS further serves as a cornerstone of 3DNA.
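As a rough illustration of how a CEHS-style analysis quantifies the relative position and orientation of two rigid bodies, here is a simplified sketch in Python. It is written from the general description of the scheme, not taken from the SCHNAaP or 3DNA source code. It assumes each body is given as a right-handed orthonormal frame `(origin, x, y, z)`, and it does not handle degenerate cases such as antiparallel z-axes or a twist near 180 degrees. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(p / length for p in a)


def rotate(v, axis, angle):
    """Rodrigues rotation of v about a unit axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    kxv, kv = cross(axis, v), dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kv * (1.0 - c) for i in range(3))


def angle_about(a, b, ref):
    """Signed angle (radians) from a to b, right-handed about unit vector ref."""
    return math.atan2(dot(cross(a, b), ref), dot(a, b))


def step_parameters(frame1, frame2):
    """Return (shift, slide, rise, tilt, roll, twist); angles in degrees."""
    o1, x1, y1, z1 = frame1
    o2, x2, y2, z2 = frame2
    hinge = cross(z1, z2)
    sin_gamma = math.sqrt(dot(hinge, hinge))
    gamma = math.atan2(sin_gamma, dot(z1, z2))  # net bending angle
    # With parallel z-axes the hinge is undefined; any axis works since gamma = 0.
    hinge = unit(hinge) if sin_gamma > 1e-10 else y1
    # Rotate each frame halfway toward the other so that the z-axes coincide.
    x1r, y1r = rotate(x1, hinge, gamma / 2), rotate(y1, hinge, gamma / 2)
    x2r, y2r = rotate(x2, hinge, -gamma / 2), rotate(y2, hinge, -gamma / 2)
    zm = unit(rotate(z1, hinge, gamma / 2))
    twist = angle_about(y1r, y2r, zm)
    # Middle frame: average of the two aligned frames.
    xm = unit(tuple(p + q for p, q in zip(x1r, x2r)))
    ym = unit(tuple(p + q for p, q in zip(y1r, y2r)))
    phase = angle_about(hinge, ym, zm)
    roll, tilt = gamma * math.cos(phase), gamma * math.sin(phase)
    d = tuple(q - p for p, q in zip(o1, o2))
    return (dot(d, xm), dot(d, ym), dot(d, zm),
            math.degrees(tilt), math.degrees(roll), math.degrees(twist))


# Hypothetical example: second frame twisted by 36 degrees and raised by 3.4.
t = math.radians(36.0)
bp1 = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
bp2 = ((0.0, 0.0, 3.4), (math.cos(t), math.sin(t), 0.0),
       (-math.sin(t), math.cos(t), 0.0), (0.0, 0.0, 1.0))
params = step_parameters(bp1, bp2)
```

The key design point is the symmetric middle frame: because both frames are rotated by half the bending angle toward each other, swapping the roles of the two bodies changes only the signs of the parameters, not their magnitudes.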
Another point worth noting is Figure 3 (see below), where the authors present “(a–c) Representative examples of output geometries produced by FRETmatrix (right) along with the block representation of the corresponding structures produced by 3DNA (28) (left).” To the best of my memory, this is one of the very few times that 3DNA’s blocview functionality is explicitly cited.
While browsing the August 2012 40(14) issue of Nucleic Acids Research (NAR), I noticed the following four papers that cite 3DNA:
The local base pair step parameters as calculated by x3dna (37,38) are represented in the Supplementary Figure S2.
The initial extended single-stranded DNA structure was obtained using the 3DNA program (15).
DNA structures were analyzed using 3DNA (31).
Each of these DNA structural models consists of values for all base-pair step parameters (roll, twist, tilt, rise, shift and slide) for each dinucleotide or trinucleotide. This enabled us to convert DNA sequences into 3D coordinates by using the rebuilding part of 3DNA (39), a program for analysis, rebuilding and visualization of 3D nucleic acid structures.
The above four NAR papers appear in the sections “Nucleic Acid Enzymes” (1), “Structural Biology” (2) and “Methods Online” (1), and cover research areas of DNA-protein interactions (3) and G-quadruplex structures (1). As quoted above, two papers employ the analysis components of 3DNA, while the other two take advantage of its rebuilding facilities.
Between the two primary 3DNA publications, the 2003 NAR paper (NAR03) is cited twice, while the 2008 Nature Protocols paper (NP08) is cited three times. Apparently, after some time lag, NP08 has gradually overtaken NAR03 to become the community’s favorite citation for 3DNA.
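The rebuilding use case quoted above (converting base-pair step parameters into 3D coordinates) can be sketched in Python as follows. This is a simplified illustration of a CEHS-style rebuild, not the code of the 3DNA rebuild program: it produces only the base-pair reference frames (origin plus orientation matrix), not atomic coordinates, and the function names are hypothetical. Angles are in degrees, and each step is given as `(shift, slide, rise, tilt, roll, twist)`.

```python
import math


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def step_transform(shift, slide, rise, tilt, roll, twist):
    """Rotation and displacement of the next frame, expressed in the current one."""
    gamma = math.hypot(roll, tilt)                # net bending angle
    phi = math.degrees(math.atan2(tilt, roll))    # direction of the bending axis
    half = twist / 2.0
    rotation = matmul(matmul(rot_z(half - phi), rot_y(gamma)), rot_z(half + phi))
    # The translation is defined along the axes of the middle frame.
    middle = matmul(matmul(rot_z(half - phi), rot_y(gamma / 2.0)), rot_z(phi))
    return rotation, matvec(middle, [shift, slide, rise])


def build_frames(steps):
    """Chain step transforms, starting from an identity frame at the origin."""
    origin = [0.0, 0.0, 0.0]
    orient = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    frames = [(origin, orient)]
    for params in steps:
        rotation, displacement = step_transform(*params)
        origin = [o + d for o, d in zip(origin, matvec(orient, displacement))]
        orient = matmul(orient, rotation)
        frames.append((origin, orient))
    return frames


# Hypothetical example: ten identical steps with rise 3.4 and twist 36 degrees,
# i.e. one full turn of an idealized straight helix.
frames = build_frames([(0.0, 0.0, 3.4, 0.0, 0.0, 36.0)] * 10)
last_origin, last_orient = frames[-1]
```

To go from these frames to atomic coordinates, one would place a standard base-pair geometry in each frame, which is the part of the process that a program such as 3DNA handles and this sketch omits.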