From early on, DSSR-derived nucleic acid secondary structures have been written in the compact dot-bracket notation (.dbn) with pseudo-knot information. To better connect DSSR to the 2D world, I recently looked into the connect (.ct) format, which was first introduced by Zuker’s mfold program. Over time, the .ct format has become one of the most commonly used RNA secondary structure formats, and it is more expressive than the .dbn format (see below).
As of v1.0, for each analyzed structure, DSSR produces two secondary structure files with default names dssr-2ndstrs.dbn and dssr-2ndstrs.ct, in .dbn and .ct formats, respectively. Using the 27-nucleotide (nt) RNA fragment 1msy as an example, the DSSR-derived secondary structures in .dbn and .ct formats are shown below:
In dot-bracket notation (.dbn) [dssr-2ndstrs.dbn]
------------------------------------------------------
>1msy nts=27 DSSR-derived secondary structure
UGCUCCUAGUACGUAAGGACCGGAGUG
.(((((.....(....)....))))).
------------------------------------------------------
In connect format (.ct) [dssr-2ndstrs.ct]
------------------------------------------------------
27 DSSR-derived secondary structure in '1msy'
1 U 0 2 0 2647 # name=A.U2647
2 G 1 3 26 2648 # name=A.G2648, pairedNt=A.U2672
3 C 2 4 25 2649 # name=A.C2649, pairedNt=A.G2671
4 U 3 5 24 2650 # name=A.U2650, pairedNt=A.A2670
5 C 4 6 23 2651 # name=A.C2651, pairedNt=A.G2669
6 C 5 7 22 2652 # name=A.C2652, pairedNt=A.G2668
7 U 6 8 0 2653 # name=A.U2653
8 A 7 9 0 2654 # name=A.A2654
9 G 8 10 0 2655 # name=A.G2655
10 U 9 11 0 2656 # name=A.U2656
11 A 10 12 0 2657 # name=A.A2657
12 C 11 13 17 2658 # name=A.C2658, pairedNt=A.G2663
13 G 12 14 0 2659 # name=A.G2659
14 U 13 15 0 2660 # name=A.U2660
15 A 14 16 0 2661 # name=A.A2661
16 A 15 17 0 2662 # name=A.A2662
17 G 16 18 12 2663 # name=A.G2663, pairedNt=A.C2658
18 G 17 19 0 2664 # name=A.G2664
19 A 18 20 0 2665 # name=A.A2665
20 C 19 21 0 2666 # name=A.C2666
21 C 20 22 0 2667 # name=A.C2667
22 G 21 23 6 2668 # name=A.G2668, pairedNt=A.C2652
23 G 22 24 5 2669 # name=A.G2669, pairedNt=A.C2651
24 A 23 25 4 2670 # name=A.A2670, pairedNt=A.U2650
25 G 24 26 3 2671 # name=A.G2671, pairedNt=A.C2649
26 U 25 27 2 2672 # name=A.U2672, pairedNt=A.G2648
27 G 26 0 0 2673 # name=A.G2673
------------------------------------------------------
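The two listings above carry the same pairing information, and the link between them can be sketched in a few lines of Python. This is only an illustration, not how DSSR itself produces the .dbn file: the function name `pairs_to_dbn` is made up here, the input is just the 5th column of the 1msy .ct listing typed in by hand, and the sketch assumes a nested (pseudoknot-free) structure, so it does not cover the pseudo-knot information that DSSR can encode in .dbn.

```python
def pairs_to_dbn(pairs):
    """Turn the 5th .ct column into a dot-bracket string.

    pairs[i-1] is the index of the base paired with nucleotide i,
    or 0 when nucleotide i is unpaired. Nested structures only.
    """
    symbols = []
    for i, j in enumerate(pairs, start=1):
        if j == 0:
            symbols.append(".")   # unpaired
        elif i < j:
            symbols.append("(")   # partner lies downstream
        else:
            symbols.append(")")   # partner lies upstream
    return "".join(symbols)


# 5th column of the 1msy .ct listing, nucleotides 1..27
pairs_1msy = [0, 26, 25, 24, 23, 22, 0, 0, 0, 0, 0, 17, 0, 0,
              0, 0, 12, 0, 0, 0, 0, 6, 5, 4, 3, 2, 0]

print(pairs_to_dbn(pairs_1msy))  # .(((((.....(....)....))))).
```

The printed string matches the third line of the .dbn listing, which is a quick sanity check that the two files agree.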
At first glance, the .ct format is very simple, and examining a sample file such as the one above gives a pretty good sense of what each column is about. While many oversimplified descriptions of the .ct format exist on the web, the most detailed and accurate explanation is from the mfold manual:
The "ct" file (connect table) contains the sequence and base pair information, and is meant to be an input file for a structure drawing program. In addition to containing base pair information, it also lists the 5′ and 3′ neighbor of each base, allowing for the representation of circular RNA or multiple molecules. The ct file also lists the historical base numbering in the original sequence, as bases and base pairs are numbered according from 1 to the size of the folded segment. A portion of a ct file is displayed in Figure 12.
Figure 12: The ct file for the second and final folding of S. cerevisiae Phe-tRNA at 37°, with default parameters. The first record displays the fragment size (76), ΔG and sequence name. The ith subsequent record contains, in order, i, r_i, the index of the 5′-connecting base, the index of the 3′-connecting base, the index of the paired base and the historical numbering of the ith base in the original sequence. The 5′, 3′ and base pair indices are 0 when there is no connection or base pair.
In particular, the 3rd, 4th, and 6th columns in the .ct format each convey distinct information; by design, they are not redundant with the information in the 1st column. Note that in the above ‘1msy’ example, the 6th column gives the nt sequence numbers (as in the PDB data file) instead of the serial numbers (as in the 1st column). The DSSR-produced .ct files also contain extra information after ‘#’, in comma-separated key=value format.
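To make the column layout concrete, here is a minimal Python sketch that splits one record line into its six columns plus the key=value items after ‘#’. The `CtRecord` class, its field names, and `parse_ct_line` are all hypothetical names chosen for this illustration; they are not part of DSSR or mfold. The sketch also assumes the 6th column is a plain integer, as in the examples shown here.

```python
from dataclasses import dataclass, field


@dataclass
class CtRecord:
    index: int         # column 1: serial number, 1..N
    base: str          # column 2: base letter
    five_prime: int    # column 3: index of the 5' neighbor, 0 if none
    three_prime: int   # column 4: index of the 3' neighbor, 0 if none
    pair: int          # column 5: index of the paired base, 0 if unpaired
    number: int        # column 6: historical numbering (PDB number in DSSR output)
    extra: dict = field(default_factory=dict)  # key=value items after '#'


def parse_ct_line(line: str) -> CtRecord:
    """Parse one record line (not the header) of a DSSR-style .ct file."""
    columns, _, comment = line.partition("#")
    idx, base, prev_idx, next_idx, pair, number = columns.split()
    extra = {}
    for item in comment.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            extra[key.strip()] = value.strip()
    return CtRecord(int(idx), base, int(prev_idx), int(next_idx),
                    int(pair), int(number), extra)


rec = parse_ct_line("2 G 1 3 26 2648 # name=A.G2648, pairedNt=A.U2672")
print(rec.index, rec.number, rec.pair, rec.extra["pairedNt"])
```

For this record the serial number (2) and the PDB number (2648) differ, which is exactly the distinction between the 1st and 6th columns.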
As an example of the usefulness of the 3rd and 4th columns, have a look at the DSSR-derived .ct file for the Dickerson DNA dodecamer duplex with sequence CGCGAATTCGCG:
24 DSSR-derived secondary structure in '355d'
1 C 0 2 24 1 # name=A.DC1, pairedNt=B.DG24
2 G 1 3 23 2 # name=A.DG2, pairedNt=B.DC23
3 C 2 4 22 3 # name=A.DC3, pairedNt=B.DG22
4 G 3 5 21 4 # name=A.DG4, pairedNt=B.DC21
5 A 4 6 20 5 # name=A.DA5, pairedNt=B.DT20
6 A 5 7 19 6 # name=A.DA6, pairedNt=B.DT19
7 T 6 8 18 7 # name=A.DT7, pairedNt=B.DA18
8 T 7 9 17 8 # name=A.DT8, pairedNt=B.DA17
9 C 8 10 16 9 # name=A.DC9, pairedNt=B.DG16
10 G 9 11 15 10 # name=A.DG10, pairedNt=B.DC15
11 C 10 12 14 11 # name=A.DC11, pairedNt=B.DG14
12 G 11 0 13 12 # name=A.DG12, pairedNt=B.DC13
13 C 0 14 12 13 # name=B.DC13, pairedNt=A.DG12
14 G 13 15 11 14 # name=B.DG14, pairedNt=A.DC11
15 C 14 16 10 15 # name=B.DC15, pairedNt=A.DG10
16 G 15 17 9 16 # name=B.DG16, pairedNt=A.DC9
17 A 16 18 8 17 # name=B.DA17, pairedNt=A.DT8
18 A 17 19 7 18 # name=B.DA18, pairedNt=A.DT7
19 T 18 20 6 19 # name=B.DT19, pairedNt=A.DA6
20 T 19 21 5 20 # name=B.DT20, pairedNt=A.DA5
21 C 20 22 4 21 # name=B.DC21, pairedNt=A.DG4
22 G 21 23 3 22 # name=B.DG22, pairedNt=A.DC3
23 C 22 24 2 23 # name=B.DC23, pairedNt=A.DG2
24 G 23 0 1 24 # name=B.DG24, pairedNt=A.DC1
Note the 0 in the 4th column for A.DG12, which is at the 3′ end of chain A, and the 0 in the 3rd column for B.DC13, which is at the 5′ end of chain B.
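Those zeros are all that is needed to recover the chain boundaries from a .ct file. The following Python sketch shows the idea; `split_chains` is a hypothetical helper written for this post, and it assumes the records are listed in 5′ to 3′ order within each chain, as they are in the 355d listing. The input tuples simply reproduce the 1st, 3rd, and 4th columns of that listing.

```python
def split_chains(records):
    """Group (index, five_prime, three_prime) tuples into chains.

    A 0 in the 5' column opens a new chain and a 0 in the 3' column
    closes the current one. Records are assumed to be in chain order.
    """
    chains, current = [], []
    for index, five_prime, three_prime in records:
        if five_prime == 0 and current:
            chains.append(current)   # previous chain had no explicit 3' end
            current = []
        current.append(index)
        if three_prime == 0:
            chains.append(current)   # reached the 3' end of this chain
            current = []
    if current:
        chains.append(current)
    return chains


# Columns 1, 3 and 4 of the 355d listing: breaks at nt 12 (3' end of
# chain A) and nt 13 (5' end of chain B).
records_355d = [
    (i, 0 if i in (1, 13) else i - 1, 0 if i in (12, 24) else i + 1)
    for i in range(1, 25)
]

chains = split_chains(records_355d)
print([(c[0], c[-1]) for c in chains])  # [(1, 12), (13, 24)]
```

Without the 3rd and 4th columns, the 24 records would look like one continuous strand, and the break between nt 12 and nt 13 would be lost.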

![1msy [GUAA tetra loop] in 3d and 2d representations](http://forum.x3dna.org/images/1msy-3d-2d.png)