Over the past couple of weeks, I’ve added two more DSSR options, --symmetry and --nmr, that are closely related to an ensemble of MODEL/ENDMDL-delineated structures in PDB files. However, there are subtle differences between the two cases, and the use of the same MODEL/ENDMDL ensemble format can be ambiguous to the uninitiated. This blog post aims to clarify the issues using concrete examples.
The --symmetry option applies to X-ray crystal structures where the asymmetric unit represents only part of the whole biological assembly. In standard PDB format, the asymmetric unit contains instructions for producing crystallographic symmetry-related molecules. Nevertheless, biological assemblies are also provided by the PDB (or NDB), in coordinate files ending with .pdb1 or similar. For example, the PDB entry 2d94 has the single-stranded sequence GGGCGCCC in its asymmetric unit (2d94.pdb). It is the biological assembly in file 2d94.pdb1 that contains the DNA double helix.
x3dna-dssr -i=2d94.pdb           # no pairs found
x3dna-dssr -i=2d94.pdb1          # still no pairs found
x3dna-dssr -i=2d94.pdb1 --symm   # 8 pairs found
x3dna-dssr -i=2d94.pdb --symm    # no pairs found
As the above examples show, DSSR by default reads only the first model, even when given the biological assembly file 2d94.pdb1. Only when --symmetry (abbreviated to --symm) is explicitly specified does DSSR take all models in the input biological assembly file into consideration. The last case also illustrates that DSSR does not generate crystallographic symmetry-related molecules: the --symm option simply tells DSSR to use all models that already exist in the input file.
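Because --symm only uses models that are already present in the input, it can be useful to check how many MODEL records a file contains before running DSSR. The following is a minimal Python sketch, assuming a standard PDB-format text file; the helper name `count_models` is hypothetical and is not part of DSSR.

```python
def count_models(pdb_text: str) -> int:
    """Count MODEL records in PDB-format text (hypothetical helper, not part of DSSR)."""
    # In PDB format the record name occupies columns 1-6, so a MODEL record
    # has "MODEL" there, padded with a space. ENDMDL records do not match.
    return sum(
        1
        for line in pdb_text.splitlines()
        if line[:6].rstrip() == "MODEL"
    )


# Tiny synthetic example with two MODEL/ENDMDL blocks (not real coordinates).
sample = "MODEL        1\nATOM\nENDMDL\nMODEL        2\nATOM\nENDMDL\nEND\n"
print(count_models(sample))  # prints 2
```

A count of zero or one suggests that --symm would have nothing extra to work with, which matches the 2d94.pdb case above.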
On the other hand, the --nmr option is for automatically processing an ensemble of structures solved by solution NMR (or trajectories from molecular dynamics simulations). The key point here is that each of the MODEL/ENDMDL-delineated structures is independent and thus can be processed separately, even though they are obviously closely related. Using the PDB entry 2n2d as an example, here are some sample usages:
x3dna-dssr -i=2n2d.pdb -o=2n2d-first.out              # only the first structure is processed
x3dna-dssr -i=2n2d.pdb --nmr -o=2n2d-all.out          # all 10 structures are processed
x3dna-dssr -i=2n2d.pdb --nmr --json -o=2n2d-all.json  # same as above, with output in JSON
Note that the NMR file is named 2n2d.pdb, and it contains 10 structures.
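To make the idea of independent, separately processed structures concrete, here is a small Python sketch that splits PDB-format text into one block per MODEL/ENDMDL pair. It only illustrates the file layout and is not how DSSR itself is implemented; the helper name `split_models` is an assumption made for this example.

```python
from typing import List


def split_models(pdb_text: str) -> List[str]:
    """Return one text block per MODEL/ENDMDL pair (hypothetical helper)."""
    models: List[str] = []
    current = None  # lines of the model being collected, or None outside a model
    for line in pdb_text.splitlines():
        record = line[:6].rstrip()  # PDB record name lives in columns 1-6
        if record == "MODEL":
            current = []  # start collecting a new model
        elif record == "ENDMDL":
            if current is not None:
                models.append("\n".join(current))
            current = None  # back outside any model
        elif current is not None:
            current.append(line)
    return models
```

Each returned block could then be analyzed on its own, which is conceptually what --nmr automates for every structure in the ensemble.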
Interesting mixes show up when an X-ray biological assembly with multiple MODEL/ENDMDL entries is analyzed with --nmr, or an NMR entry is handled with --symmetry. Here are two such examples:
x3dna-dssr -i=2d94.pdb1 --nmr -o=temp   # models 1 and 2 are handled separately
x3dna-dssr -i=2n2d.pdb --symm -o=temp   # wrong -- does not make sense!
In summary, the --symmetry option is intended to treat symmetry-related molecules as a whole, as in a biological assembly of X-ray crystal structures. In contrast, the --nmr option aims to automate the analysis of each structure in a MODEL/ENDMDL-delineated ensemble, as in NMR structures or trajectories of MD simulations. The distinction between the two MODEL/ENDMDL usages is most clearly seen via a molecular visualization program: for example, check the figure below for 2d94.pdb1 (left) and 2n2d.pdb (right) when all frames are selected using Jmol.
Figure: 2d94 (2 models), left; 2n2d (10 models), right.
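The three behaviors described in this post (default, --symmetry, --nmr) can be summarized with a toy Python model. This is only a conceptual sketch of which models end up in each unit of analysis, not DSSR's actual code; the function name `units_to_analyze` and the mode strings are assumptions made for illustration.

```python
from typing import List


def units_to_analyze(models: List[List[str]], mode: str = "default") -> List[List[str]]:
    """Group per-model atom lists into units of analysis (conceptual sketch only)."""
    if not models:
        return []
    if mode == "default":
        # Only the first model is read.
        return [list(models[0])]
    if mode == "symmetry":
        # All models are merged and treated as one whole assembly.
        return [[atom for model in models for atom in model]]
    if mode == "nmr":
        # Every model is an independent structure, analyzed separately.
        return [list(model) for model in models]
    raise ValueError(f"unknown mode: {mode!r}")
```

With two models, the "symmetry" mode yields a single combined unit (as for 2d94.pdb1), whereas the "nmr" mode yields two separate units (as for each structure in 2n2d.pdb).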