Names of base atoms in PDB format

Canonical bases (A, C, G, T and U) in nucleic acid structures have standard atom names, shown below using the Watson-Crick A–T and G–C pairs. For example, the nine ring atoms of adenine are named N1, C2, N3, C4, C5, C6, N7, C8 and N9.

Watson-Crick base pairs

Four characters (columns 13-16 of an ATOM or HETATM record) are reserved for atom names in the PDB format. The convention, as seen in files downloaded from the RCSB PDB, is to put a two-character base atom name in the middle of the field, as in .N1. (where each dot stands for a space character, written this way to make it stand out).

A long time ago, I became aware of a PDB format variant where the base atom name is left-aligned, as in N1.. (two trailing spaces). This case has been handled properly by 3DNA (including DSSR and SNAP) ever since. While checking entries submitted to web-DSSR, I recently noticed yet another variation, in which base atom names are right-aligned, as in ..N1 (two leading spaces). Because this variant had not been taken into consideration, 3DNA/DSSR reported “no nucleotides found!” for such files. Once the issue was known, however, fixing it was straightforward. As of May 4, 2015, 3DNA v2.2, DSSR and SNAP all handle this PDB variant correctly.
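To make the three alignments concrete, here is a minimal Python sketch of how a reader of PDB files might tolerate all of them. It is not the code used in 3DNA, DSSR or SNAP. The function names (atom_name_field, classify_alignment, normalize_atom_name) and the make_record helper are hypothetical, introduced only for illustration. The one fact it relies on is that the atom name occupies columns 13-16 of an ATOM or HETATM record.

```python
def atom_name_field(record: str) -> str:
    """Return the raw 4-character atom-name field (columns 13-16, 1-based)."""
    # Python slices are 0-based, so columns 13-16 are record[12:16].
    # ljust pads the field if the record happens to be truncated.
    return record[12:16].ljust(4)


def classify_alignment(field: str) -> str:
    """Classify how a two-character atom name sits in its 4-character field."""
    name = field.strip()
    if len(field) != 4 or len(name) != 2:
        return "other"  # not a two-character name such as N1 or C2
    if field == f" {name} ":
        return "middle"  # the convention seen in RCSB PDB files
    if field == f"{name}  ":
        return "left"
    if field == f"  {name}":
        return "right"
    return "other"


def normalize_atom_name(field: str) -> str:
    """Strip surrounding spaces so all three variants compare equal."""
    return field.strip()


def make_record(field: str) -> str:
    """Hypothetical helper: build the start of an ATOM record around a name field."""
    # "ATOM  " (cols 1-6) + serial (cols 7-11) + blank (col 12) + name (cols 13-16)
    # + altLoc (col 17) + residue name (cols 18-20) + blank + chain + residue number
    return "ATOM  " + "    1" + " " + field + " " + "  A" + " " + "A" + "   1"


if __name__ == "__main__":
    for raw in (" N1 ", "N1  ", "  N1"):
        field = atom_name_field(make_record(raw))
        print(repr(field), classify_alignment(field), normalize_atom_name(field))
```

Comparing the stripped name rather than the raw field is the simplest way to accept all three variants. A stricter reader could instead use classify_alignment to warn about files that depart from the RCSB convention.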

Over the years, I have come across many PDB variants that claim to be compliant with the loosely defined format. If you find that 3DNA or DSSR is not working as expected, it is likely that the coordinate file in the self-claimed ‘PDB format’ is at fault. Wherever practical, I’ve tried to accommodate as many non-standard variants as possible.

---


Thank you for printing this article from http://home.x3dna.org/. Please do not forget to visit back for more 3DNA-related information. — Xiang-Jun Lu